C# (CSharp) Lucene.Net.Analysis.Util Namespace

Classes

Name Description
AbstractAnalysisFactory Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory.

The typical lifecycle for a factory consumer is:

  1. Create the factory via its constructor (or via XXXFactory.forName).
  2. (Optional) If the factory uses resources such as files, ResourceLoaderAware#inform(ResourceLoader) is called to initialize those resources.
  3. Consumer calls create() to obtain instances (see the sketch after this list).
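
The sketch below walks through this lifecycle, assuming the Lucene.NET 4.8 factory APIs; the "lowercase" factory name, the luceneMatchVersion value, and the current-directory resource loader are illustrative assumptions, not requirements.

    using System.Collections.Generic;
    using System.IO;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Util;

    public static class FactoryLifecycleSketch
    {
        public static TokenStream Wrap(TokenStream input)
        {
            // 1. Create the factory by name; the factory consumes its arguments.
            var args = new Dictionary<string, string> { { "luceneMatchVersion", "4.8" } };
            TokenFilterFactory factory = TokenFilterFactory.ForName("lowercase", args);

            // 2. (Optional) Give resource-aware factories access to their files.
            if (factory is IResourceLoaderAware aware)
            {
                aware.Inform(new FilesystemResourceLoader(new DirectoryInfo(".")));
            }

            // 3. Ask the factory for instances.
            return factory.Create(input);
        }
    }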

BufferedCharFilter LUCENENET specific class to mimic Java's BufferedReader (that is, a reader that is seekable) so it supports Mark() and Reset() (which are part of the Java Reader class), but also provides the Correct() method of BaseCharFilter. At some point we might be able to make some readers accept streams (that are seekable) so this functionality can be .NET-ified.
CharArrayIterator A CharacterIterator used internally with BreakIterator. @lucene.internal
CharArrayIterator.CharArrayIteratorAnonymousInnerClassHelper2
CharArrayIterator.CharArrayIteratorAnonymousInnerClassHelper4
CharTokenizer An abstract base class for simple, character-oriented tokenizers.

You must specify the required LuceneVersion compatibility when creating CharTokenizer:

A new CharTokenizer API has been introduced with Lucene 3.1. This API moved from UTF-16 code units to UTF-32 code points to eventually add support for supplementary characters. The old char-based API has been deprecated and should be replaced with the int-based methods #isTokenChar(int) and #normalize(int).

As of Lucene 3.1 each CharTokenizer constructor expects a LuceneVersion argument. Based on the given LuceneVersion, either the new API or a backwards compatibility layer is used at runtime. For LuceneVersion < 3.1 the backwards compatibility layer ensures correct behavior even for indexes built with previous versions of Lucene. If a LuceneVersion >= 3.1 is used, CharTokenizer requires the new API to be implemented by the instantiated class. Yet, the old char-based API is no longer required even if backwards compatibility must be preserved. CharTokenizer subclasses implementing the new API are fully backwards compatible if instantiated with LuceneVersion < 3.1.

Note: If you use a subclass of CharTokenizer with LuceneVersion >= 3.1 on an index built with a version < 3.1, created tokens might not be compatible with the terms in your index.
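
As a minimal sketch of the int-based (UTF-32) API, assuming the Lucene.NET 4.8 class surface, a subclass only has to classify code points; the LetterOnlyTokenizer name below is hypothetical.

    using System.IO;
    using Lucene.Net.Analysis.Util;
    using Lucene.Net.Util;

    // Hypothetical tokenizer that keeps runs of letters, implementing the
    // int-based IsTokenChar(int) required when LuceneVersion >= 3.1 is passed.
    public sealed class LetterOnlyTokenizer : CharTokenizer
    {
        public LetterOnlyTokenizer(LuceneVersion matchVersion, TextReader input)
            : base(matchVersion, input)
        {
        }

        protected override bool IsTokenChar(int c)
        {
            // c is a UTF-32 code point, so convert it before classifying.
            return char.IsLetter(char.ConvertFromUtf32(c), 0);
        }
    }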

CharacterUtils CharacterUtils provides a unified interface to Character-related operations to implement backwards compatible character operations based on a Version instance. @lucene.internal
CharacterUtils.CharacterBuffer A simple IO buffer to use with CharacterUtils#fill(CharacterBuffer, Reader).
CharacterUtils.Java4CharacterUtils
CharacterUtils.Java5CharacterUtils
ElisionFilter Removes elisions from a TokenStream. For example, "l'avion" (the plane) will be tokenized as "avion" (plane).
ElisionFilterFactory Factory for ElisionFilter.
    <fieldType name="text_elsn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ElisionFilterFactory"
                articles="stopwordarticles.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>
FilteringTokenFilter Abstract base class for TokenFilters that may remove tokens. You have to implement #accept() and return true if the current token should be preserved; #incrementToken() uses this method to decide whether a token should be passed to the caller.

As of Lucene 4.4, an IllegalArgumentException is thrown when trying to disable position increments when filtering terms.
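
For example, here is a sketch of a filter that drops short terms, assuming the Lucene.NET 4.8 API; the MinLengthFilter name and the length threshold are hypothetical.

    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.TokenAttributes;
    using Lucene.Net.Analysis.Util;
    using Lucene.Net.Util;

    // Hypothetical filter that keeps only terms of a minimum length.
    public sealed class MinLengthFilter : FilteringTokenFilter
    {
        private readonly ICharTermAttribute termAtt;
        private readonly int minLength;

        public MinLengthFilter(LuceneVersion version, TokenStream input, int minLength)
            : base(version, input)
        {
            this.minLength = minLength;
            this.termAtt = AddAttribute<ICharTermAttribute>();
        }

        // Return true to keep the current token, false to drop it.
        protected override bool Accept()
        {
            return termAtt.Length >= minLength;
        }
    }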

OpenStringBuilder A StringBuilder that allows one to access the underlying character array.
RollingCharBuffer Acts like a forever growing char[] as you read characters into it from the provided reader, but internally it uses a circular buffer to only hold the characters that haven't been freed yet. This is like a PushbackReader, except you don't have to specify up-front the max size of the buffer, but you do have to periodically call #freeBefore.
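
A usage sketch, assuming the Lucene.NET 4.8 port of this internal class; the input text and the freeing interval are illustrative.

    using System.IO;
    using Lucene.Net.Analysis.Util;

    var buffer = new RollingCharBuffer();
    buffer.Reset(new StringReader("text to scan with unbounded lookahead"));

    int pos = 0;
    int ch;
    while ((ch = buffer.Get(pos)) != -1)   // -1 signals end of input
    {
        // ... inspect the character at absolute position 'pos' ...
        pos++;
        if (pos % 64 == 0)
        {
            buffer.FreeBefore(pos);        // release characters already consumed
        }
    }
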
SegmentingTokenizerBase Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.

This can be used by subclasses that need sentence context for tokenization purposes, such as CJK segmenters.

Additionally, it can be used by subclasses that want to mark sentence boundaries (with a custom attribute, extra token, position increment, etc.) for downstream processing. @lucene.experimental

StopwordAnalyzerBase Base class for Analyzers that need to make use of stopword sets.
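
A sketch of a concrete subclass, assuming the Lucene.NET 4.8 API; the SimpleStopAnalyzer name and the particular tokenizer/filter chain are illustrative choices, not the only option.

    using System.IO;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Core;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Analysis.Util;
    using Lucene.Net.Util;

    // Hypothetical analyzer built on StopwordAnalyzerBase: standard tokenization,
    // lowercasing, then removal of the supplied stopword set.
    public sealed class SimpleStopAnalyzer : StopwordAnalyzerBase
    {
        private readonly LuceneVersion version;
        private readonly CharArraySet stopwords;

        public SimpleStopAnalyzer(LuceneVersion version, CharArraySet stopwords)
            : base(version, stopwords)
        {
            this.version = version;
            this.stopwords = stopwords;
        }

        protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
        {
            var source = new StandardTokenizer(version, reader);
            TokenStream result = new LowerCaseFilter(version, source);
            result = new StopFilter(version, result, stopwords);
            return new TokenStreamComponents(source, result);
        }
    }
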
TestCharArrayIterator
TestCharArrayMap_
TestCharArraySet
TestCharTokenizers
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper.LetterTokenizerAnonymousInnerClassHelper
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper2
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper2.LetterTokenizerAnonymousInnerClassHelper2
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper3
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper3.NumberAndSurrogatePairTokenizer
TestCharacterUtils
TestElision
TestElision.AnalyzerAnonymousInnerClassHelper
TestElisionFilterFactory Simple tests to ensure the French elision filter factory is working.
TestFilesystemResourceLoader
TestRollingCharBuffer
TestWordlistLoader
TokenFilterFactory Abstract parent class for analysis factories that create TokenFilter instances.
TokenizerFactory Abstract parent class for analysis factories that create Tokenizer instances.
WordlistLoader Loader for text files that represent a list of stopwords.
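
A loading sketch, assuming the Lucene.NET 4.8 API; the file name is illustrative.

    using System.IO;
    using Lucene.Net.Analysis.Util;
    using Lucene.Net.Util;

    // Hypothetical load of a one-word-per-line stopword file into a CharArraySet.
    CharArraySet stopwords;
    using (TextReader reader = new StreamReader("stopwords.txt"))
    {
        stopwords = WordlistLoader.GetWordSet(reader, LuceneVersion.LUCENE_48);
    }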