Name | Description |
---|---|
TestThaiAnalyzer | Test case for ThaiAnalyzer, modified from TestFrenchAnalyzer |
TestThaiWordFilterFactory | |
ThaiAnalyzer | |
ThaiAnalyzer.SavedStreams | |
ThaiTokenizer | Tokenizer that uses BreakIterator to tokenize Thai text. WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available. |
ThaiTokenizerFactory | Factory for ThaiTokenizer. `<fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.ThaiTokenizerFactory"/> </analyzer> </fieldType>` |
ThaiWordBreaker | LUCENENET-specific class that patches the behavior of the ICU BreakIterator. It corrects word breaking by finding transitions between Thai and non-Thai characters. This logic assumes that the Java BreakIterator also separates Thai numerals from Arabic numerals (1, 2, 3, etc.). That is, it assumes that the first test below passes and the second test fails in Lucene (not attempted). `ThaiAnalyzer analyzer = new ThaiAnalyzer(TEST_VERSION_CURRENT, CharArraySet.EMPTY_SET);` `AssertAnalyzesTo(analyzer, "๑๒๓456", new string[] { "๑๒๓", "456" });` `AssertAnalyzesTo(analyzer, "๑๒๓456", new string[] { "๑๒๓456" });` |
ThaiWordFilter | |
ThaiWordFilterFactory | |
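
The ThaiWordBreaker entry above describes splitting text at transitions between Thai and non-Thai characters. The following is a minimal sketch of that idea only, not the LUCENENET implementation. The class and method names (`ThaiTransitionSketch`, `IsThai`, `SplitAtThaiBoundaries`) are hypothetical, and the sketch assumes that "Thai" means any character in the Unicode Thai block (U+0E00 to U+0E7F), which includes the Thai digits.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; these names do not exist in Lucene.NET.
public static class ThaiTransitionSketch
{
    // Assumption: a character counts as Thai when it lies in the
    // Unicode Thai block, U+0E00..U+0E7F (Thai digits included).
    public static bool IsThai(char c) => c >= '\u0E00' && c <= '\u0E7F';

    // Splits text wherever adjacent characters differ in "Thai-ness".
    public static List<string> SplitAtThaiBoundaries(string text)
    {
        var segments = new List<string>();
        if (string.IsNullOrEmpty(text))
        {
            return segments;
        }

        int start = 0;
        for (int i = 1; i < text.Length; i++)
        {
            // A transition occurs when one neighbor is Thai and the other is not.
            if (IsThai(text[i]) != IsThai(text[i - 1]))
            {
                segments.Add(text.Substring(start, i - start));
                start = i;
            }
        }

        // Add the final run, which has no trailing transition.
        segments.Add(text.Substring(start));
        return segments;
    }
}
```

Under these assumptions, `SplitAtThaiBoundaries("๑๒๓456")` yields two segments, `๑๒๓` and `456`, which mirrors the expectation in the first test quoted in the ThaiWordBreaker description. A real word breaker would still need a dictionary-based BreakIterator to find word boundaries inside each Thai run.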