C# (CSharp) Lucene.Net.Analysis.Cjk Namespace

Classes

Name Description
CJKAnalyzer An Analyzer that tokenizes text with StandardTokenizer, normalizes width with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK terms with CJKBigramFilter, and filters stopwords with StopFilter.
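
As a usage illustration, the sketch below runs text through CJKAnalyzer and collects the resulting terms. It assumes the Lucene.Net 4.8 API (LuceneVersion.LUCENE_48, Analyzer.GetTokenStream, ICharTermAttribute) and the Lucene.Net.Analysis.Common package. The class name CjkAnalyzerSketch, the method Tokenize, and the field name "body" are hypothetical and chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cjk;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CjkAnalyzerSketch
{
    // Hypothetical helper: returns the terms CJKAnalyzer produces for the text.
    // Assumes the Lucene.Net 4.8 API; other versions may differ.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();
        using (var analyzer = new CJKAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", text))
        {
            var termAttr = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                  // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAttr.ToString());
            }
            stream.End();                    // finalizes end-of-stream state
        }
        return terms;
    }

    public static void Main()
    {
        foreach (string term in Tokenize("多くの学生が試験に落ちた"))
        {
            Console.WriteLine(term);
        }
    }
}
```

With the default configuration described above, the printed terms for CJK input should be mostly two-character bigrams rather than whole words.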
CJKAnalyzer.DefaultSetHolder
CJKBigramFilter Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

CJK types are set by these tokenizers, but you can also use the CJKBigramFilter(TokenStream, int) constructor to explicitly control which of the CJK scripts are turned into bigrams.

By default, when a CJK character has no adjacent characters to form a bigram, it is output in unigram form. If you want to always output both unigrams and bigrams, set the outputUnigrams flag in the CJKBigramFilter(TokenStream, int, bool) constructor. This can be used for a combined unigram+bigram approach.

In all cases, all non-CJK input is passed through unmodified.
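
The bigram behavior described above can be sketched without Lucene.Net. This is a simplified illustration, not the filter's implementation: CjkBigramSketch and Bigrams are hypothetical names, the input is assumed to be a single run of adjacent CJK characters from the Basic Multilingual Plane, and the order in which unigrams and bigrams are emitted is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class CjkBigramSketch
{
    // Hypothetical helper, not part of Lucene.Net: forms overlapping bigrams
    // from one run of adjacent CJK characters (no surrogate-pair handling).
    public static List<string> Bigrams(string run, bool outputUnigrams)
    {
        var result = new List<string>();
        for (int i = 0; i < run.Length; i++)
        {
            // Emit a unigram when requested, or when the character is isolated
            // and therefore cannot take part in any bigram.
            if (outputUnigrams || run.Length == 1)
            {
                result.Add(run.Substring(i, 1));
            }
            // Emit the bigram that starts at this character, if one exists.
            if (i + 1 < run.Length)
            {
                result.Add(run.Substring(i, 2));
            }
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Bigrams("多くの", false))); // 多く くの
        Console.WriteLine(string.Join(" ", Bigrams("多くの", true)));  // 多 多く く くの の
        Console.WriteLine(string.Join(" ", Bigrams("学", false)));     // 学
    }
}
```

The last call shows the default fallback: an isolated character is emitted as a unigram even when outputUnigrams is false.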

CJKTokenizerFactory
CJKWidthFilter A TokenFilter that normalizes CJK width differences:
  • Folds fullwidth ASCII variants into the equivalent Basic Latin
  • Folds halfwidth Katakana variants into the equivalent kana

NOTE: this filter can be viewed as a (practical) subset of NFKC/NFKD Unicode normalization. See the normalization support in the ICU package for full normalization.
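
The first folding rule can be illustrated with a small standalone sketch. It relies on the fact that the fullwidth ASCII variants occupy U+FF01 through U+FF5E, exactly 0xFEE0 above their Basic Latin counterparts. CjkWidthSketch and FoldFullwidthAscii are hypothetical names, and the sketch does not cover the halfwidth Katakana rule, which needs a mapping table and handling of voiced sound marks. For comparison, Main also calls the .NET method string.Normalize with NormalizationForm.FormKC, the fuller normalization that the note above refers to.

```csharp
using System;
using System.Text;

public static class CjkWidthSketch
{
    // Hypothetical helper, not part of Lucene.Net: folds fullwidth ASCII
    // variants (U+FF01..U+FF5E) into Basic Latin (U+0021..U+007E).
    public static string FoldFullwidthAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c >= '\uFF01' && c <= '\uFF5E')
            {
                sb.Append((char)(c - 0xFEE0)); // fixed offset between the two blocks
            }
            else
            {
                sb.Append(c);                  // everything else is left untouched
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(FoldFullwidthAscii("Ｌｕｃｅｎｅ４")); // Lucene4

        // Full NFKC normalization also folds halfwidth Katakana, among much else.
        Console.WriteLine("ｶﾞ".Normalize(NormalizationForm.FormKC));
    }
}
```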

CJKWidthFilterFactory Factory for CJKWidthFilter.
 <fieldType name="text_cjk" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.CJKWidthFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.CJKBigramFilterFactory"/>
   </analyzer>
 </fieldType>
TestCJKAnalyzer Most tests adapted from TestCJKTokenizer.
TestCJKAnalyzer.AnalyzerAnonymousInnerClassHelper
TestCJKAnalyzer.AnalyzerAnonymousInnerClassHelper2
TestCJKAnalyzer.AnalyzerAnonymousInnerClassHelper3
TestCJKAnalyzer.FakeStandardTokenizer
TestCJKBigramFilter
TestCJKBigramFilter.AnalyzerAnonymousInnerClassHelper
TestCJKBigramFilter.AnalyzerAnonymousInnerClassHelper2
TestCJKBigramFilter.AnalyzerAnonymousInnerClassHelper3
TestCJKBigramFilter.AnalyzerAnonymousInnerClassHelper4
TestCJKBigramFilter.AnalyzerAnonymousInnerClassHelper5
TestCJKBigramFilterFactory Simple tests to ensure the CJK bigram factory is working.
TestCJKTokenizer
TestCJKTokenizer.TestToken
TestCJKTokenizerFactory
TestCJKWidthFilterFactory Simple tests to ensure the CJKWidthFilterFactory is working.