C# (CSharp) org.apache.lucene.analysis.standard Namespace

Classes

Name	Description
ClassicAnalyzer	Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words. You must specify the required Version compatibility when creating ClassicAnalyzer: as of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords; as of 2.9, StopFilter preserves position increments; as of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068). ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.
ClassicAnalyzer.TokenStreamComponentsAnonymousInnerClassHelper
ClassicFilter	Normalizes tokens extracted with ClassicTokenizer.
ClassicFilterFactory	Factory for ClassicFilter. Example Solr field type: <fieldType name="text_clssc" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.ClassicTokenizerFactory"/> <filter class="solr.ClassicFilterFactory"/> </analyzer> </fieldType>
ClassicTokenizer	A grammar-based tokenizer constructed with JFlex. This should be a good tokenizer for most European-language documents: it splits words at punctuation characters, removing punctuation (however, a dot that's not followed by whitespace is considered part of a token); it splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split; and it recognizes email addresses and internet hostnames as one token. Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer. ClassicTokenizer was named StandardTokenizer in Lucene versions prior to 3.1. As of 3.1, StandardTokenizer implements Unicode text segmentation, as specified by UAX#29.
ClassicTokenizerImpl	This class implements the classic Lucene StandardTokenizer as it behaved up to version 3.0.
StandardFilter	Normalizes tokens extracted with StandardTokenizer.
StandardFilterFactory	Factory for StandardFilter. Example Solr field type: <fieldType name="text_stndrd" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> </analyzer> </fieldType>
StandardTokenizerFactory	Factory for StandardTokenizer. Example Solr field type: <fieldType name="text_stndrd" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="255"/> </analyzer> </fieldType>
StandardTokenizerImpl	This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Tokens produced are of the following types: <ALPHANUM>: a sequence of alphabetic and numeric characters; <NUM>: a number; <SOUTHEAST_ASIAN>: a sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer; <IDEOGRAPHIC>: a single CJKV ideographic character; <HIRAGANA>: a single hiragana character; <KATAKANA>: a sequence of katakana characters; <HANGUL>: a sequence of Hangul characters.
UAX29URLEmailAnalyzer	Filters org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer with org.apache.lucene.analysis.standard.StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words. You must specify the required org.apache.lucene.util.Version compatibility when creating UAX29URLEmailAnalyzer.
UAX29URLEmailAnalyzer.TokenStreamComponentsAnonymousInnerClassHelper
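
Example

The analyzers listed above are described as a tokenizer followed by a chain of filters (normalization, lower-casing, stop-word removal). The following is a minimal, self-contained sketch of that chaining idea only. It does not use or reproduce the Lucene classes: AnalyzerChainSketch, Tokenize, LowerCase, RemoveStopWords and Analyze are hypothetical names, the tokenizer is a crude letters-and-digits regex rather than the JFlex or UAX#29 grammar, the stop list is a small illustrative set rather than Lucene's English stop words, and the ClassicFilter/StandardFilter normalization stage and position increments are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a tokenizer + filter chain; not the Lucene API.
public static class AnalyzerChainSketch
{
    // Small illustrative stop list (assumption: NOT Lucene's actual English stop set).
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "of", "the", "to" };

    // Stage 1: crude tokenizer that emits runs of Unicode letters and decimal digits.
    public static IEnumerable<string> Tokenize(string text)
    {
        foreach (Match m in Regex.Matches(text, @"[\p{L}\p{Nd}]+"))
        {
            yield return m.Value;
        }
    }

    // Stage 2: lower-case every token (the role LowerCaseFilter plays in the chain).
    public static IEnumerable<string> LowerCase(IEnumerable<string> tokens)
    {
        return tokens.Select(t => t.ToLowerInvariant());
    }

    // Stage 3: drop tokens found in the stop list (the role StopFilter plays in the chain).
    public static IEnumerable<string> RemoveStopWords(IEnumerable<string> tokens)
    {
        return tokens.Where(t => !StopWords.Contains(t));
    }

    // Each stage wraps the previous one, mirroring "tokenizer filtered with ...".
    public static List<string> Analyze(string text)
    {
        return RemoveStopWords(LowerCase(Tokenize(text))).ToList();
    }

    public static void Main()
    {
        List<string> tokens = Analyze("The Quick Brown Fox and the Lazy Dog");
        Console.WriteLine(string.Join(" | ", tokens));
        // prints: quick | brown | fox | lazy | dog
    }
}
```

The order of the stages matters in this sketch: lower-casing runs before stop-word removal so that "The" and "the" are both matched by a lower-case stop list.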