Name | Description |
---|---|
Analyzer | An Analyzer represents a policy for extracting terms that are indexed from text. The Analyzer builds TokenStreams, which break the text down into tokens. |
Analyzer.GlobalReuseStrategy | |
Analyzer.PerFieldReuseStrategy | |
Analyzer.ReuseStrategy | Strategy defining how TokenStreamComponents are reused per call to Analyzer#tokenStream(String, java.io.Reader). |
Analyzer.TokenStreamComponents | This class encapsulates the outer components of a token stream. It provides access to the source (a Tokenizer) and to the outer end (the sink), an instance of TokenFilter that also serves as the TokenStream returned by Analyzer#tokenStream(String, Reader). |
BaseCharFilter | Base utility class for implementing a CharFilter. Subclass this, record mappings by calling AddOffCorrectMap, and then invoke the correct method to correct an offset. |
BaseTokenStreamTestCase | Base class for all Lucene unit tests that use TokenStreams. This class runs all tests twice: once with {@link TokenStream#setOnlyUseNewAPI} disabled and once with it enabled. |
BaseTokenStreamTestCase.AnalysisThread | |
BaseTokenStreamTestCase.CheckClearAttributesAttribute | Attribute that records whether or not it was cleared. This is used to test that ClearAttributes() was called correctly. |
ChainedFilter | |
ChainedFilterTest | |
CharFilter | Subclasses of CharFilter can be chained to filter a Reader; they can be used as a java.io.Reader with additional offset correction. Tokenizers will automatically use #correctOffset if a CharFilter subclass is used. This class is abstract: at a minimum you must implement #read(char[], int, int), transforming the input in some way from #input, and #correct(int) to adjust the offsets to match the originals. You can optionally provide more efficient implementations of additional methods like #read(), #read(char[]), and #read(java.nio.CharBuffer), but this is not required. For examples and integration with Analyzer, see the Lucene.Net.Analysis package documentation. |
CollationTestbase | Base test class for testing Unicode collation. |
CollationTestbase.ThreadAnonymousInnerClassHelper | |
MockAnalyzer | Analyzer for testing. This analyzer is a replacement for the Whitespace, Simple, and Keyword analyzers in unit tests. If you are testing a custom component, such as a query parser or analyzer wrapper, that consumes analysis streams, it is a good idea to test it with this analyzer instead. |
MockTokenizer | Tokenizer for testing. This tokenizer is a replacement for the #WHITESPACE, #SIMPLE, and #KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a good idea to test it wrapping this tokenizer instead, for extra checks. |
PayloadSetter | |
ReusableStringReader | Internal class that enables reuse of the string reader by Analyzer#tokenStream(String, String). |
ReverseStringFilter | |
StopFilter | Removes stop words from a token stream. |
TestAnalyzers | |
TestAnalyzers.MyStandardAnalyzer | |
TestCachingTokenFilter | |
TestCachingTokenFilter.AnonymousClassTokenStream | |
TestCachingTokenFilter.TokenStreamAnonymousInnerClassHelper | |
TestCharArraySet | |
TestCharFilter | |
TestCharFilter.CharFilter1 | |
TestCharFilter.CharFilter2 | |
TestGraphTokenizers | |
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper | |
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper2 | |
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper3 | |
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper4 | |
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper5 | |
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper6 | |
TestGraphTokenizers.GraphTokenizer | |
TestGraphTokenizers.MGTFAHAnalyzerAnonymousInnerClassHelper2 | |
TestGraphTokenizers.MGTFBHAnalyzerAnonymousInnerClassHelper | |
TestGraphTokenizers.RemoveATokens | |
TestISOLatin1AccentFilter | |
TestKeywordAnalyzer | |
TestLengthFilter | |
TestMappingCharFilter | |
TestMockAnalyzer | |
TestMockAnalyzer.AnalyzerAnonymousInnerClassHelper | |
TestMockAnalyzer.AnalyzerAnonymousInnerClassHelper2 | |
TestMockAnalyzer.AnalyzerWrapperAnonymousInnerClassHelper | |
TestMockAnalyzer.AnalyzerWrapperAnonymousInnerClassHelper2 | |
TestNumericTokenStream | |
TestPerFieldAnalzyerWrapper | |
TestStandardAnalyzer | |
TestStopAnalyzer | |
TestStopFilter | |
TestTeeSinkTokenFilter | |
TestTeeSinkTokenFilter.AnonymousClassSinkFilter | |
TestTeeSinkTokenFilter.AnonymousClassSinkFilter1 | |
TestTeeSinkTokenFilter.ModuloSinkFilter | |
TestTeeSinkTokenFilter.ModuloTokenFilter | |
TestToken | |
TestToken.SenselessAttribute | |
TokenStream | A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text. This is an abstract class; concrete subclasses are Tokenizer (a TokenStream whose input is a Reader) and TokenFilter (a TokenStream whose input is another TokenStream). The TokenStream API was introduced with Lucene 2.9 and has moved from being Token-based to Attribute-based. While Token still exists in 2.9 as a convenience class, the preferred way to store the information of a Token is to use AttributeImpls. You can find example code for the new API in the analysis package-level Javadoc. Sometimes it is desirable to capture the current state of a TokenStream, e.g. for buffering purposes; for this use case, AttributeSource#captureState and AttributeSource#restoreState can be used. The {@code TokenStream} API in Lucene is based on the decorator pattern; therefore all non-abstract subclasses must be final or have at least a final implementation of #incrementToken. This is checked when Java assertions are enabled. |
Tokenizer | A Tokenizer is a TokenStream whose input is a Reader. This is an abstract class; subclasses must override #IncrementToken(). NOTE: subclasses overriding #IncrementToken() must call AttributeSource#ClearAttributes() before setting attributes. |
Tokenizer.ReaderAnonymousInnerClassHelper | |
VocabularyAssert | Utility class for vocabulary-based stemming tests. |
WordlistLoader | Loads a text file, adding every line as an entry to a Hashtable. Each line should contain only one word. If the file is not found, or on any error, an empty table is returned. |
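
The decorator relationship between TokenStream, Tokenizer, and TokenFilter described above can be illustrated with a small standalone sketch. This is not the real Lucene API (the classes `MiniTokenStream`, `WhitespaceSource`, and `LowerCaseMiniFilter` are hypothetical names invented here); it only shows the pattern: a source breaks text into tokens, and a filter wraps another stream and transforms what flows through it.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal token-stream sketch (NOT the real Lucene API).
abstract class MiniTokenStream {
    // Returns the next token, or null when the stream is exhausted.
    public abstract String incrementToken();
}

// Plays the role of a Tokenizer: breaks raw text on whitespace.
class WhitespaceSource extends MiniTokenStream {
    private final String[] parts;
    private int pos = 0;
    WhitespaceSource(String text) { this.parts = text.trim().split("\\s+"); }
    public String incrementToken() {
        return pos < parts.length ? parts[pos++] : null;
    }
}

// Plays the role of a TokenFilter: decorates another stream.
class LowerCaseMiniFilter extends MiniTokenStream {
    private final MiniTokenStream input;
    LowerCaseMiniFilter(MiniTokenStream input) { this.input = input; }
    public String incrementToken() {
        String t = input.incrementToken();
        return t == null ? null : t.toLowerCase();
    }
}

public class MiniAnalysisDemo {
    // Drains a stream into a list, mirroring the consume loop callers write.
    static List<String> drain(MiniTokenStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.incrementToken(); t != null; t = ts.incrementToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        MiniTokenStream chain = new LowerCaseMiniFilter(new WhitespaceSource("The Quick Fox"));
        System.out.println(drain(chain)); // [the, quick, fox]
    }
}
```

Because each filter wraps another stream behind the same interface, arbitrarily long chains can be composed, which is exactly why Lucene requires non-abstract subclasses to have a final `#incrementToken`: the decorator contract must not be subverted partway down the chain.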
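
The table describes StopFilter as removing stop words from a token stream. A hedged sketch of that idea, over plain lists rather than Lucene's attribute-based streams (the class and method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch of what a stop-word filter does (not Lucene's StopFilter):
// tokens found in the stop set are dropped; everything else passes through in order.
public class StopWordDemo {
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) { // drop stop words, keep the rest
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = Set.of("the", "a", "of");
        System.out.println(removeStopWords(List.of("the", "quick", "fox"), stops)); // [quick, fox]
    }
}
```

In real Lucene the filter operates incrementally inside `incrementToken` rather than over a materialized list, and it must also manage position-increment attributes for the removed tokens.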
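
BaseCharFilter's record-then-correct workflow (AddOffCorrectMap followed by offset correction) can be sketched as follows. This is a simplified stand-in, not the real class: it assumes corrections are recorded as "at filtered offset N and beyond, add this cumulative difference to recover the original offset".

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the offset-correction idea behind BaseCharFilter (hypothetical class,
// not the real implementation). Each recorded entry says: filtered offsets at or
// after this position differ from the original by the given cumulative amount.
public class OffsetCorrectionDemo {
    // key = offset in the filtered text, value = cumulative chars removed before it
    private final TreeMap<Integer, Integer> corrections = new TreeMap<>();

    void addOffCorrectMap(int off, int cumulativeDiff) {
        corrections.put(off, cumulativeDiff);
    }

    int correct(int currentOff) {
        // apply the last recorded correction at or before this offset
        Map.Entry<Integer, Integer> entry = corrections.floorEntry(currentOff);
        return entry == null ? currentOff : currentOff + entry.getValue();
    }

    public static void main(String[] args) {
        // Suppose a 5-char entity "&amp;" was rewritten to the 1-char "&":
        // filtered offsets >= 4 sit 4 chars earlier than their originals.
        OffsetCorrectionDemo f = new OffsetCorrectionDemo();
        f.addOffCorrectMap(4, 4);
        System.out.println(f.correct(2));  // 2  (before the edit, unchanged)
        System.out.println(f.correct(10)); // 14 (shifted past the collapsed entity)
    }
}
```

This is why Tokenizers call `#correctOffset` automatically when a CharFilter is in the chain: token offsets computed on the filtered text would otherwise point at the wrong characters in the original input.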
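
WordlistLoader's behavior (one word per line; an empty table on any error) can be sketched with a small hypothetical helper; `WordlistDemo` and `loadWordlist` are names invented here, not the real API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Sketch of what WordlistLoader does (hypothetical helper, not the real class):
// read a word list where each line holds exactly one word, returning a set.
public class WordlistDemo {
    static Set<String> loadWordlist(Reader reader) {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            return new HashSet<>(); // per the description: on any error, return an empty table
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> stops = loadWordlist(new StringReader("the\na\nof\n"));
        System.out.println(stops.size()); // 3
    }
}
```

A set produced this way is the natural input for a stop filter or for CharArraySet-style lookups during analysis.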