C# (CSharp) Lucene.Net.Analysis.Th Namespace

Classes

Name Description
TestThaiAnalyzer Test case for ThaiAnalyzer, modified from TestFrenchAnalyzer
TestThaiWordFilterFactory
ThaiAnalyzer
ThaiAnalyzer.SavedStreams
ThaiTokenizer Tokenizer that uses BreakIterator to tokenize Thai text.

WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.

ThaiTokenizerFactory Factory for ThaiTokenizer.
 <fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.ThaiTokenizerFactory"/>
   </analyzer>
 </fieldType>
ThaiWordBreaker LUCENENET-specific class that patches the behavior of the ICU BreakIterator. It corrects word breaking by finding transitions between Thai and non-Thai characters. This logic assumes that the Java BreakIterator also separates Thai numerals from Arabic numerals (1, 2, 3, etc.); that is, it assumes the first test below passes and the second fails in Lucene (this has not been tested).
 ThaiAnalyzer analyzer = new ThaiAnalyzer(TEST_VERSION_CURRENT, CharArraySet.EMPTY_SET);
 AssertAnalyzesTo(analyzer, "๑๒๓456", new string[] { "๑๒๓", "456" });
 AssertAnalyzesTo(analyzer, "๑๒๓456", new string[] { "๑๒๓456" });
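The Thai/non-Thai transition rule that ThaiWordBreaker relies on can be illustrated with a small, self-contained C# sketch. ThaiScriptSplitter and Split are hypothetical names, not Lucene.NET APIs. The sketch covers only the script-transition step: it treats the Unicode Thai block (U+0E00 to U+0E7F, which includes the Thai digits U+0E50 to U+0E59) as Thai and performs no dictionary-based word segmentation.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Lucene.NET.
// Splits text at every position where the characters switch between the
// Thai Unicode block and anything else, mirroring the transition check
// described for ThaiWordBreaker (but without any word segmentation).
public static class ThaiScriptSplitter
{
    // Thai block U+0E00..U+0E7F; Thai digits U+0E50..U+0E59 fall inside it.
    private static bool IsThai(char c) => c >= '\u0E00' && c <= '\u0E7F';

    public static List<string> Split(string text)
    {
        var segments = new List<string>();
        if (string.IsNullOrEmpty(text))
            return segments;

        int start = 0;
        for (int i = 1; i < text.Length; i++)
        {
            // A boundary exists wherever "is Thai" differs from the previous char.
            if (IsThai(text[i]) != IsThai(text[i - 1]))
            {
                segments.Add(text.Substring(start, i - start));
                start = i;
            }
        }

        // Add the trailing segment after the last transition.
        segments.Add(text.Substring(start));
        return segments;
    }
}
```

For example, `ThaiScriptSplitter.Split("๑๒๓456")` returns the two segments `"๑๒๓"` and `"456"`, matching the first expected result in the test above. Thai characters lie entirely in the Basic Multilingual Plane, so comparing individual `char` values is sufficient here.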
ThaiWordFilter
ThaiWordFilterFactory