C# Class Lucene.Net.Analysis.Th.ThaiWordBreaker

LUCENENET-specific class that patches the behavior of the ICU BreakIterator. It corrects word breaking by finding transitions between Thai and non-Thai characters. This logic assumes that the Java BreakIterator also separates Thai numerals from Arabic numerals (1, 2, 3, etc.). That is, it assumes that in Lucene the first test below passes and the second fails (not attempted):

    ThaiAnalyzer analyzer = new ThaiAnalyzer(TEST_VERSION_CURRENT, CharArraySet.EMPTY_SET);
    AssertAnalyzesTo(analyzer, "๑๒๓456", new string[] { "๑๒๓", "456" });
    AssertAnalyzesTo(analyzer, "๑๒๓456", new string[] { "๑๒๓456" });
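
The transition-finding idea in the description can be illustrated with a small standalone sketch. This is not the class's actual implementation: it does not use a BreakIterator at all, the names ThaiTransitionSketch, IsThai, FindTransitions and Split are hypothetical, and it assumes that "Thai" means the Unicode Thai block (U+0E00 to U+0E7F). It only shows how a run of Thai numerals can be cut away from adjacent Arabic numerals.

```csharp
using System;
using System.Collections.Generic;

public static class ThaiTransitionSketch
{
    // Assumption: a character counts as Thai if it lies in the
    // Unicode Thai block, U+0E00..U+0E7F (this includes Thai digits).
    private static bool IsThai(char c)
    {
        return c >= '\u0E00' && c <= '\u0E7F';
    }

    // Returns every index i (0 < i < text.Length) at which text[i - 1]
    // and text[i] fall on opposite sides of the Thai / non-Thai divide.
    public static List<int> FindTransitions(string text)
    {
        var transitions = new List<int>();
        for (int i = 1; i < text.Length; i++)
        {
            if (IsThai(text[i - 1]) != IsThai(text[i]))
            {
                transitions.Add(i);
            }
        }
        return transitions;
    }

    // Cuts the text at each transition offset found above.
    public static List<string> Split(string text)
    {
        var pieces = new List<string>();
        int start = 0;
        foreach (int end in FindTransitions(text))
        {
            pieces.Add(text.Substring(start, end - start));
            start = end;
        }
        if (start < text.Length)
        {
            pieces.Add(text.Substring(start));
        }
        return pieces;
    }
}
```

With this sketch, ThaiTransitionSketch.Split("๑๒๓456") yields the two pieces "๑๒๓" and "456", which matches the expectation of the first test above. The real class applies the same kind of check to the segments reported by the wrapped BreakIterator rather than to the whole string.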
Project: apache/lucenenet

Public Methods

Method Description
Current ( ) : int
Next ( ) : int
SetText ( string text ) : void
ThaiWordBreaker ( BreakIterator wordBreaker )  Constructor; takes the ICU4NET BreakIterator whose behavior is patched.

Private Methods

Method Description
GetNext ( ) : int

Method Details

Current() public method

public Current ( ) : int
return int

Next() public method

public Next ( ) : int
return int

SetText() public method

public SetText ( string text ) : void
text string
return void

ThaiWordBreaker() public constructor

public ThaiWordBreaker ( BreakIterator wordBreaker )
wordBreaker BreakIterator (ICU4NET)