Name | Description |
---|---|
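AbstractAnalysisFactory | Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory. The typical lifecycle for a factory consumer is: (1) create the factory via its constructor (or via XXXFactory.forName); (2) optionally, if the factory uses resources such as files, ResourceLoaderAware#inform(ResourceLoader) is called to initialize those resources; (3) the consumer calls create() to obtain instances. |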
BufferedCharFilter | LUCENENET-specific class that mimics Java's BufferedReader (a seekable reader), so it supports Mark() and Reset() (part of the Java Reader class) while also providing the Correct() method of BaseCharFilter. Eventually, some readers may accept seekable streams instead, so that this functionality can be made more idiomatic to .NET. |
CharArrayIterator | A CharacterIterator used internally with BreakIterator. @lucene.internal |
CharArrayIterator.CharArrayIteratorAnonymousInnerClassHelper2 | |
CharArrayIterator.CharArrayIteratorAnonymousInnerClassHelper4 | |
CharTokenizer | An abstract base class for simple, character-oriented tokenizers. You must specify the required LuceneVersion compatibility when creating a CharTokenizer. A new CharTokenizer API was introduced in Lucene 3.1. This API moved from UTF-16 code units to UTF-32 code points to eventually support supplementary characters. The old char-based API has been deprecated and should be replaced with the int-based methods #isTokenChar(int) and #normalize(int). As of Lucene 3.1, each CharTokenizer constructor expects a LuceneVersion argument. Depending on the given LuceneVersion, either the new API or a backwards-compatibility layer is used at runtime. For LuceneVersion < 3.1, the backwards-compatibility layer ensures correct behavior even for indexes built with previous versions of Lucene. If LuceneVersion >= 3.1 is used, CharTokenizer requires the instantiated class to implement the new API. The old char-based API is no longer required, even when backwards compatibility must be preserved: CharTokenizer subclasses implementing the new API are fully backwards compatible if instantiated with LuceneVersion < 3.1. Note: if you use a subclass of CharTokenizer with LuceneVersion >= 3.1 on an index built with a version < 3.1, the created tokens might not be compatible with the terms in your index. |
CharacterUtils | Provides a unified interface to character-related operations, implementing backwards-compatible character handling based on a Version instance. @lucene.internal |
CharacterUtils.CharacterBuffer | A simple IO buffer to use with CharacterUtils#fill(CharacterBuffer, Reader). |
CharacterUtils.Java4CharacterUtils | |
CharacterUtils.Java5CharacterUtils | |
ElisionFilter | Removes elisions from a TokenStream. For example, "l'avion" (the plane) will be tokenized as "avion" (plane). |
ElisionFilterFactory | Factory for ElisionFilter. Example configuration: `<fieldType name="text_elsn" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ElisionFilterFactory" articles="stopwordarticles.txt" ignoreCase="true"/> </analyzer> </fieldType>` |
FilteringTokenFilter | Abstract base class for TokenFilters that may remove tokens. Subclasses must implement #accept, returning true if the current token should be preserved; #incrementToken uses this method to decide whether a token should be passed to the caller. |
OpenStringBuilder | A StringBuilder that gives direct access to its underlying char array. |
RollingCharBuffer | Acts like an ever-growing char[] as you read characters into it from the provided reader, but internally uses a circular buffer that holds only the characters that have not yet been freed. It is similar to a PushbackReader, except that you do not have to specify the maximum buffer size up front; instead, you must periodically call #freeBefore. |
SegmentingTokenizerBase | Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words. This can be used by subclasses that need sentence context for tokenization, such as CJK segmenters. It can also be used by subclasses that want to mark sentence boundaries (with a custom attribute, extra token, position increment, etc.) for downstream processing. @lucene.experimental |
StopwordAnalyzerBase | Base class for Analyzers that need to make use of stopword sets. |
TestCharArrayIterator | |
TestCharArrayMap_ | |
TestCharArraySet | |
TestCharTokenizers | |
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper | |
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper.LetterTokenizerAnonymousInnerClassHelper | |
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper2 | |
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper2.LetterTokenizerAnonymousInnerClassHelper2 | |
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper3 | |
TestCharTokenizers.AnalyzerAnonymousInnerClassHelper3.NumberAndSurrogatePairTokenizer | |
TestCharacterUtils | |
TestElision | |
TestElision.AnalyzerAnonymousInnerClassHelper | |
TestElisionFilterFactory | Simple tests to ensure the French elision filter factory is working. |
TestFilesystemResourceLoader | |
TestRollingCharBuffer | |
TestWordlistLoader | |
TokenFilterFactory | Abstract parent class for analysis factories that create TokenFilter instances. |
TokenizerFactory | Abstract parent class for analysis factories that create Tokenizer instances. |
WordlistLoader | Loader for text files that represent a list of stopwords. |
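The elision rule described for ElisionFilter can be illustrated outside the analysis pipeline. The Java class below is a minimal sketch under stated assumptions, not the Lucene.NET API: ElisionSketch and stripElision are hypothetical names, it works on a plain String instead of a TokenStream, and it assumes the filter removes everything up to and including the first apostrophe (ASCII `'` or U+2019) only when the text before that apostrophe is in the article set. Matching here is case-sensitive, whereas the factory's ignoreCase option indicates the real filter can match case-insensitively.

```java
import java.util.Set;

/**
 * Hypothetical standalone sketch of the elision rule described for ElisionFilter.
 * Assumption: strip "article + apostrophe" only when the prefix is a known article.
 */
public class ElisionSketch {

    /** Returns the term without its leading article and apostrophe, if the prefix is an article. */
    public static String stripElision(String term, Set<String> articles) {
        int index = -1;
        for (int i = 0; i < term.length(); i++) {
            char ch = term.charAt(i);
            // Treat both the ASCII apostrophe and U+2019 (right single quotation mark) as elision marks.
            if (ch == '\'' || ch == '\u2019') {
                index = i; // only the first apostrophe is considered
                break;
            }
        }
        // No apostrophe, or the prefix is not a known article: keep the term unchanged.
        if (index < 0 || !articles.contains(term.substring(0, index))) {
            return term;
        }
        return term.substring(index + 1);
    }

    public static void main(String[] args) {
        Set<String> articles = Set.of("l", "d", "qu"); // hypothetical article list
        System.out.println(stripElision("l'avion", articles));     // avion
        System.out.println(stripElision("aujourd'hui", articles)); // aujourd'hui
    }
}
```

Checking the whole prefix against the article set, rather than cutting at any apostrophe, is what keeps words such as "aujourd'hui" intact.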
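The RollingCharBuffer entry describes a position-addressed, unbounded view over a reader that is backed by a circular array. The Java class below is a minimal sketch of that idea under our own assumptions: RollingBufferSketch, get, retained, and the doubling growth policy are hypothetical, and only the freeBefore name is borrowed from the table entry, with an assumed signature. Positions are absolute and keep increasing, while freeBefore lets the buffer reuse the slots of characters the caller no longer needs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch of a rolling, circular char buffer over a Reader (not the Lucene.NET API). */
public class RollingBufferSketch {
    private final Reader reader;
    private char[] buffer = new char[4]; // deliberately small so that growth is exercised
    private int start = 0;    // array index of the oldest retained char
    private int count = 0;    // number of retained (not yet freed) chars
    private int firstPos = 0; // absolute position of buffer[start]
    private boolean eof = false;

    public RollingBufferSketch(Reader reader) { this.reader = reader; }

    /** Returns the char at absolute position pos, reading ahead as needed; -1 past end of input. */
    public int get(int pos) throws IOException {
        if (pos < firstPos) {
            throw new IllegalArgumentException("position " + pos + " was already freed");
        }
        while (pos >= firstPos + count) {
            if (eof) return -1;
            int c = reader.read();
            if (c == -1) { eof = true; return -1; }
            append((char) c);
        }
        return buffer[(start + (pos - firstPos)) % buffer.length];
    }

    /** Releases every retained char before absolute position pos, making its slot reusable. */
    public void freeBefore(int pos) {
        int toFree = pos - firstPos;
        if (toFree < 0 || toFree > count) {
            throw new IllegalArgumentException("cannot free before " + pos);
        }
        start = (start + toFree) % buffer.length;
        count -= toFree;
        firstPos = pos;
    }

    /** Number of chars currently held in memory. */
    public int retained() { return count; }

    private void append(char c) {
        if (count == buffer.length) {
            // Full: copy retained chars, oldest first, into an array twice as large.
            char[] bigger = new char[buffer.length * 2];
            for (int i = 0; i < count; i++) {
                bigger[i] = buffer[(start + i) % buffer.length];
            }
            buffer = bigger;
            start = 0;
        }
        buffer[(start + count) % buffer.length] = c;
        count++;
    }

    public static void main(String[] args) throws IOException {
        RollingBufferSketch buf = new RollingBufferSketch(new StringReader("abcdefghij"));
        System.out.println((char) buf.get(5)); // f
        buf.freeBefore(3);                     // drops a, b, c
        System.out.println((char) buf.get(9)); // j
        System.out.println(buf.retained());    // 7
        System.out.println(buf.get(10));       // -1
    }
}
```

Because freed slots are reused, memory stays proportional to the distance between the oldest unfreed position and the furthest position read, which is why a caller of such a buffer has to call freeBefore periodically.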