C# (CSharp) Lucene.Net.Analysis Namespace

Nested Namespaces

Lucene.Net.Analysis.AR
Lucene.Net.Analysis.Ar
Lucene.Net.Analysis.BR
Lucene.Net.Analysis.Bg
Lucene.Net.Analysis.Br
Lucene.Net.Analysis.CJK
Lucene.Net.Analysis.CWSharp
Lucene.Net.Analysis.Ca
Lucene.Net.Analysis.CharFilter
Lucene.Net.Analysis.CharFilters
Lucene.Net.Analysis.Cjk
Lucene.Net.Analysis.Ckb
Lucene.Net.Analysis.Cn
Lucene.Net.Analysis.CommonGrams
Lucene.Net.Analysis.Compound
Lucene.Net.Analysis.Core
Lucene.Net.Analysis.Cz
Lucene.Net.Analysis.Da
Lucene.Net.Analysis.De
Lucene.Net.Analysis.El
Lucene.Net.Analysis.En
Lucene.Net.Analysis.Es
Lucene.Net.Analysis.Eu
Lucene.Net.Analysis.Ext
Lucene.Net.Analysis.Fa
Lucene.Net.Analysis.Fi
Lucene.Net.Analysis.Fr
Lucene.Net.Analysis.Ga
Lucene.Net.Analysis.Gl
Lucene.Net.Analysis.Hi
Lucene.Net.Analysis.Hu
Lucene.Net.Analysis.Hunspell
Lucene.Net.Analysis.Hy
Lucene.Net.Analysis.Id
Lucene.Net.Analysis.In
Lucene.Net.Analysis.It
Lucene.Net.Analysis.Lv
Lucene.Net.Analysis.Miscellaneous
Lucene.Net.Analysis.NGram
Lucene.Net.Analysis.Ngram
Lucene.Net.Analysis.Nl
Lucene.Net.Analysis.No
Lucene.Net.Analysis.PanGu
Lucene.Net.Analysis.Path
Lucene.Net.Analysis.Pattern
Lucene.Net.Analysis.Payloads
Lucene.Net.Analysis.Pl
Lucene.Net.Analysis.Position
Lucene.Net.Analysis.Pt
Lucene.Net.Analysis.Query
Lucene.Net.Analysis.RU
Lucene.Net.Analysis.Reverse
Lucene.Net.Analysis.Ro
Lucene.Net.Analysis.Ru
Lucene.Net.Analysis.Shingle
Lucene.Net.Analysis.Sinks
Lucene.Net.Analysis.Snowball
Lucene.Net.Analysis.Standard
Lucene.Net.Analysis.Stempel
Lucene.Net.Analysis.Sv
Lucene.Net.Analysis.Synonym
Lucene.Net.Analysis.Th
Lucene.Net.Analysis.Tokenattributes
Lucene.Net.Analysis.Tr
Lucene.Net.Analysis.Util
Lucene.Net.Analysis.Wikipedia

Classes

Name Description
Analyzer An Analyzer represents a policy for extracting index terms from text. The Analyzer builds TokenStreams, which break text down into tokens.
Analyzer.GlobalReuseStrategy
Analyzer.PerFieldReuseStrategy
Analyzer.ReuseStrategy Strategy defining how TokenStreamComponents are reused per call to Analyzer#tokenStream(String, java.io.Reader).
Analyzer.TokenStreamComponents This class encapsulates the outer components of a token stream. It provides access to the source (Tokenizer) and the outer end (sink), an instance of TokenFilter which also serves as the TokenStream returned by Analyzer#tokenStream(String, Reader).
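
A minimal sketch of how a custom Analyzer might pair a source Tokenizer with a sink filter through TokenStreamComponents. It assumes the Lucene.Net 4.8 API (CreateComponents, LuceneVersion.LUCENE_48) and the Lucene.Net.Analysis.Common package; the class name LowercaseWhitespaceAnalyzer is hypothetical, and older Lucene.Net releases expose a different surface.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical analyzer (not part of Lucene.Net): splits on whitespace,
// then lowercases each term.
public sealed class LowercaseWhitespaceAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // Source: the Tokenizer that reads characters from the TextReader.
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);

        // Sink: the outermost filter, which is what consumers receive as the TokenStream.
        TokenStream sink = new LowerCaseFilter(LuceneVersion.LUCENE_48, source);

        return new TokenStreamComponents(source, sink);
    }
}
```

The source and sink are handed over together so that the Analyzer can reuse the whole chain for later calls instead of rebuilding it.
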
BaseCharFilter Base utility class for implementing a CharFilter. You subclass this, record mappings by calling AddOffCorrectMap, and then invoke the Correct method to correct an offset.
BaseTokenStreamTestCase Base class for all Lucene unit tests that use TokenStreams.

This class runs all tests twice: once with TokenStream#setOnlyUseNewAPI set to false, and then once with it set to true.

BaseTokenStreamTestCase.AnalysisThread
BaseTokenStreamTestCase.CheckClearAttributesAttribute Attribute that records whether it was cleared. This is used for testing that ClearAttributes() was called correctly.
ChainedFilter
ChainedFilterTest
CharFilter Subclasses of CharFilter can be chained to filter a Reader. They can be used as a java.io.Reader with additional offset correction. Tokenizers will automatically use #correctOffset if a CharFilter subclass is used.

This class is abstract: at a minimum you must implement #read(char[], int, int), transforming the input in some way from #input, and #correct(int) to adjust the offsets to match the originals.

You can optionally provide more efficient implementations of additional methods like #read(), #read(char[]), #read(java.nio.CharBuffer), but this is not required.

For examples and integration with Analyzer, see the Lucene.Net.Analysis package documentation.

CollationTestbase Base test class for testing Unicode collation.
CollationTestbase.ThreadAnonymousInnerClassHelper
MockAnalyzer Analyzer for testing.

This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers in unit tests. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it is a good idea to test it with this analyzer instead. MockAnalyzer has the following behavior:

  • By default, the assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with #setEnableChecks(boolean).
  • Payload data is randomly injected into the stream for more thorough testing of payloads.
MockTokenizer Tokenizer for testing.

This tokenizer is a replacement for the #WHITESPACE, #SIMPLE, and #KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a good idea to test it by wrapping this tokenizer instead, for extra checks. This tokenizer has the following behavior:

  • An internal state-machine is used for checking consumer consistency. These checks can be disabled with #setEnableChecks(boolean).
  • For convenience, optionally lowercases terms that it outputs.
PayloadSetter
ReusableStringReader Internal class to enable reuse of the string reader by Analyzer#tokenStream(String,String).
ReverseStringFilter
StopFilter Removes stop words from a token stream.
TestAnalyzers
TestAnalyzers.MyStandardAnalyzer
TestCachingTokenFilter
TestCachingTokenFilter.AnonymousClassTokenStream
TestCachingTokenFilter.TokenStreamAnonymousInnerClassHelper
TestCharArraySet
TestCharFilter
TestCharFilter.CharFilter1
TestCharFilter.CharFilter2
TestGraphTokenizers
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper2
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper3
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper4
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper5
TestGraphTokenizers.AnalyzerAnonymousInnerClassHelper6
TestGraphTokenizers.GraphTokenizer
TestGraphTokenizers.MGTFAHAnalyzerAnonymousInnerClassHelper2
TestGraphTokenizers.MGTFBHAnalyzerAnonymousInnerClassHelper
TestGraphTokenizers.RemoveATokens
TestISOLatin1AccentFilter
TestKeywordAnalyzer
TestLengthFilter
TestMappingCharFilter
TestMockAnalyzer
TestMockAnalyzer.AnalyzerAnonymousInnerClassHelper
TestMockAnalyzer.AnalyzerAnonymousInnerClassHelper2
TestMockAnalyzer.AnalyzerWrapperAnonymousInnerClassHelper
TestMockAnalyzer.AnalyzerWrapperAnonymousInnerClassHelper2
TestNumericTokenStream
TestPerFieldAnalzyerWrapper
TestStandardAnalyzer
TestStopAnalyzer
TestStopFilter
TestTeeSinkTokenFilter
TestTeeSinkTokenFilter.AnonymousClassSinkFilter
TestTeeSinkTokenFilter.AnonymousClassSinkFilter1
TestTeeSinkTokenFilter.ModuloSinkFilter
TestTeeSinkTokenFilter.ModuloTokenFilter
TestToken
TestToken.SenselessAttribute
TokenStream A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.

This is an abstract class; concrete subclasses are:

  • Tokenizer, a TokenStream whose input is a Reader; and
  • TokenFilter, a TokenStream whose input is another TokenStream.
A new TokenStream API was introduced with Lucene 2.9. This API has moved from being Token-based to Attribute-based. While Token still exists in 2.9 as a convenience class, the preferred way to store the information of a Token is to use AttributeImpls.

TokenStream now extends AttributeSource, which provides access to all of the token Attributes for the TokenStream. Note that only one instance per AttributeImpl is created and reused for every token. This approach reduces object creation and allows local caching of references to the AttributeImpls. See #IncrementToken() for further details.

The workflow of the new TokenStream API is as follows:

  1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
  2. The consumer calls TokenStream#reset().
  3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
  4. The consumer calls #IncrementToken() until it returns false, consuming the attributes after each call.
  5. The consumer calls #end() so that any end-of-stream operations can be performed.
  6. The consumer calls #close() to release any resource when finished using the TokenStream.
To make sure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in #IncrementToken().
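
The consumer side of this workflow can be sketched roughly as follows. The sketch assumes the Lucene.Net 4.8 API (GetTokenStream, AddAttribute<T>, Dispose in place of #close()) and the Lucene.Net.Analysis.Common package for WhitespaceAnalyzer; the class TokenStreamWorkflow, the method Tokenize, and the field name "body" are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenStreamWorkflow
{
    // Returns each token as "term[start-end]".
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();

        // Step 1: instantiate the stream (here via an Analyzer).
        using (Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            // Step 2: reset the stream before consuming it.
            stream.Reset();

            // Step 3: keep local references to the attributes of interest.
            // The same attribute instances are reused for every token.
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = stream.AddAttribute<IOffsetAttribute>();

            // Step 4: advance until IncrementToken() returns false.
            while (stream.IncrementToken())
            {
                tokens.Add($"{term}[{offset.StartOffset}-{offset.EndOffset}]");
            }

            // Step 5: allow end-of-stream operations to run.
            stream.End();
        } // Step 6: leaving the using blocks disposes (closes) the stream.

        return tokens;
    }
}
```

Because the attribute instances are reused, the loop copies the values it needs into a string on each iteration rather than storing the attribute objects themselves.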

You can find some example code for the new API in the analysis package level Javadoc.

Sometimes it is desirable to capture the current state of a TokenStream, e.g., for buffering purposes (see CachingTokenFilter, TeeSinkTokenFilter). For this use case, AttributeSource#captureState and AttributeSource#restoreState can be used.
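
As an illustration of state capture, here is a sketch of a filter that emits every token twice by saving the attribute state and replaying it on the next call. It assumes the Lucene.Net 4.8 names CaptureState, RestoreState, AttributeSource.State, and the protected m_input field of TokenFilter; the class RepeatFilter is hypothetical and is not one of the filters listed on this page.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical filter: emits each incoming token twice.
public sealed class RepeatFilter : TokenFilter
{
    private readonly IPositionIncrementAttribute posInc;
    private AttributeSource.State saved; // snapshot of the last token, if a replay is pending

    public RepeatFilter(TokenStream input) : base(input)
    {
        posInc = AddAttribute<IPositionIncrementAttribute>();
    }

    public override bool IncrementToken()
    {
        if (saved != null)
        {
            // Replay the buffered token at the same position as the original.
            RestoreState(saved);
            saved = null;
            posInc.PositionIncrement = 0;
            return true;
        }

        if (!m_input.IncrementToken())
        {
            return false;
        }

        // Snapshot all attributes so the token can be emitted again on the next call.
        saved = CaptureState();
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        saved = null; // drop any pending replay when the stream is reused
    }
}
```

The replayed copy is given a position increment of 0 so that both copies occupy the same position, which is the usual convention for stacked tokens.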

The TokenStream API in Lucene is based on the decorator pattern. Therefore, all non-abstract subclasses must be final or have at least a final implementation of #incrementToken. This is checked when Java assertions are enabled.

Tokenizer A Tokenizer is a TokenStream whose input is a Reader.

This is an abstract class; subclasses must override #IncrementToken().

NOTE: Subclasses overriding #IncrementToken() must call AttributeSource#ClearAttributes() before setting attributes.
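
A sketch of a Tokenizer subclass that follows this rule, assuming the Lucene.Net 4.8 API (the protected m_input reader, CorrectOffset, and the attribute interfaces in Lucene.Net.Analysis.TokenAttributes). The class CommaTokenizer is hypothetical: it splits its input on commas and skips empty segments.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical tokenizer: one token per non-empty, comma-separated segment.
public sealed class CommaTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;
    private readonly IOffsetAttribute offsetAtt;
    private int position; // number of characters read so far

    public CommaTokenizer(TextReader input) : base(input)
    {
        // Attributes are added during instantiation so consumers can rely on them.
        termAtt = AddAttribute<ICharTermAttribute>();
        offsetAtt = AddAttribute<IOffsetAttribute>();
    }

    public override bool IncrementToken()
    {
        // Must come first: wipes the values left over from the previous token.
        ClearAttributes();

        int start = -1;
        int c;
        while ((c = m_input.Read()) != -1)
        {
            int current = position++;
            if (c == ',')
            {
                if (start < 0) continue; // skip empty segments
                offsetAtt.SetOffset(CorrectOffset(start), CorrectOffset(current));
                return true;
            }
            if (start < 0) start = current;
            termAtt.Append((char)c);
        }

        if (start < 0) return false; // input exhausted and nothing buffered
        offsetAtt.SetOffset(CorrectOffset(start), CorrectOffset(position));
        return true;
    }

    public override void End()
    {
        base.End();
        // Report the final offset, i.e. the total number of characters consumed.
        int finalOffset = CorrectOffset(position);
        offsetAtt.SetOffset(finalOffset, finalOffset);
    }

    public override void Reset()
    {
        base.Reset();
        position = 0;
    }
}
```

Without the ClearAttributes() call, the term buffer would still hold the previous token's characters and Append would extend it instead of starting a new term.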

Tokenizer.ReaderAnonymousInnerClassHelper
VocabularyAssert Utility class for doing vocabulary-based stemming tests
WordlistLoader Loads a text file and adds every line as an entry to a Hashtable. Every line should contain only one word. If the file is not found, or if any other error occurs, an empty table is returned.