C# Class Lucene.Net.Analysis.Cn.ChineseTokenizer

Tokenizes Chinese text into individual Chinese characters.

ChineseTokenizer and CJKTokenizer differ in how they split text into tokens.

For example, if the Chinese text "C1C2C3C4" is to be indexed:

  • The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
  • The tokens returned from CJKTokenizer are C1C2, C2C3, C3C4.

As a result, the index created by CJKTokenizer is much larger, because there are far more distinct two-character combinations than distinct single characters.

The problem is at search time: for queries such as C1, C1C2, C1C3, C4C2, C1C2C3 ..., ChineseTokenizer works, but CJKTokenizer does not.
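The two splitting strategies described above can be illustrated with a minimal, self-contained sketch. It does not use Lucene.Net and is not the library's implementation; the class name NGramSketch and its methods Unigrams and Bigrams are hypothetical names introduced only for this illustration, and the sketch assumes the input contains only Chinese characters from the Basic Multilingual Plane.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not part of Lucene.Net.
public static class NGramSketch
{
    // One token per character, mirroring the C1, C2, C3, C4 example.
    public static List<string> Unigrams(string text)
    {
        var tokens = new List<string>();
        foreach (char c in text)
        {
            tokens.Add(c.ToString());
        }
        return tokens;
    }

    // Overlapping two-character tokens, mirroring the C1C2, C2C3, C3C4 example.
    public static List<string> Bigrams(string text)
    {
        var tokens = new List<string>();
        for (int i = 0; i + 1 < text.Length; i++)
        {
            tokens.Add(text.Substring(i, 2));
        }
        return tokens;
    }

    public static void Main()
    {
        string text = "中文分词"; // four characters, standing in for C1C2C3C4

        // Four single-character tokens: 中, 文, 分, 词
        Console.WriteLine(string.Join(", ", Unigrams(text)));

        // Three overlapping pairs: 中文, 文分, 分词
        Console.WriteLine(string.Join(", ", Bigrams(text)));
    }
}
```

With single-character tokens, any query character can be matched against the index on its own. With two-character tokens, the index contains only adjacent pairs, which is consistent with the note above that queries such as C1 or C1C3 do not work with CJKTokenizer.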

Inheritance: Lucene.Net.Analysis.Tokenizer
Project: apache/lucenenet

Public Methods

Method Description
ChineseTokenizer ( AttributeFactory factory, TextReader @in )
ChineseTokenizer ( TextReader @in )
End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void

Private Methods

Method Description
Init ( ) : void
flush ( ) : bool
push ( char c ) : void

Method Details

ChineseTokenizer() public method

public ChineseTokenizer ( AttributeFactory factory, TextReader @in )
factory AttributeFactory
@in TextReader

ChineseTokenizer() public method

public ChineseTokenizer ( TextReader @in )
@in TextReader

End() public method

public End ( ) : void
return void

IncrementToken() public method

public IncrementToken ( ) : bool
return bool

Reset() public method

public Reset ( ) : void
return void
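
The public methods listed above are typically called in a fixed order when consuming a token stream: Reset, then IncrementToken until it returns false, then End. The following is a minimal usage sketch, not taken from the project itself. It assumes the Lucene.Net 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common), in which the attribute namespace is spelled Lucene.Net.Analysis.TokenAttributes and the term text is read through ICharTermAttribute; other versions may differ. The class name ChineseTokenizerUsage is hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cn;
using Lucene.Net.Analysis.TokenAttributes; // assumption: 4.8-style namespace casing

// Hypothetical example class; requires the Lucene.Net NuGet packages.
public static class ChineseTokenizerUsage
{
    public static void Main()
    {
        using (var reader = new StringReader("中文分词"))
        using (var tokenizer = new ChineseTokenizer(reader))
        {
            // Register the attribute that exposes the current token's text.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                 // must be called before the first IncrementToken
            while (tokenizer.IncrementToken()) // returns false when the input is exhausted
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                   // finalizes end-of-stream state
        }
    }
}
```

In recent Lucene.Net versions ChineseTokenizer is marked obsolete, so compiling this sketch may produce a deprecation warning.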