C# Class Lucene.Net.Analysis.Cn.ChineseTokenizer

Tokenizes Chinese text into individual Chinese characters.

The difference between ChineseTokenizer and CJKTokenizer is that they have different token parsing logic.

For example, if the Chinese text "C1C2C3C4" is to be indexed:

  • The tokens returned from ChineseTokenizer are C1, C2, C3, C4
  • The tokens returned from the CJKTokenizer are C1C2, C2C3, C3C4.

Therefore the index created by CJKTokenizer is much larger.

The problem is that when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ... the ChineseTokenizer works, but the CJKTokenizer will not work.
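The difference between the two tokenization schemes can be sketched without Lucene.Net at all. The helper class below is purely illustrative: `TokenizationSketch`, `Unigrams`, and `Bigrams` are hypothetical names, not part of the library, and the input is assumed to be already split into characters. It ignores edge cases (such as Latin letters, digits, or single-character input) that the real tokenizers handle.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helpers only; these are NOT part of Lucene.Net.
public static class TokenizationSketch
{
    // One token per character, mirroring the ChineseTokenizer output described above.
    public static List<string> Unigrams(IReadOnlyList<string> chars)
    {
        return new List<string>(chars);
    }

    // One token per pair of adjacent characters, mirroring the CJKTokenizer output.
    public static List<string> Bigrams(IReadOnlyList<string> chars)
    {
        var tokens = new List<string>();
        for (int i = 0; i + 1 < chars.Count; i++)
        {
            tokens.Add(chars[i] + chars[i + 1]);
        }
        return tokens;
    }
}
```

For the input C1, C2, C3, C4, `Unigrams` returns C1, C2, C3, C4 and `Bigrams` returns C1C2, C2C3, C3C4. A query such as C1C3 or C4C2 never appears among the bigrams, which is why it cannot be matched against a bigram index.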

Inheritance: Lucene.Net.Analysis.Tokenizer
Project: apache/lucenenet

Public Methods

Method Description
ChineseTokenizer ( AttributeFactory factory, TextReader @in )
ChineseTokenizer ( TextReader @in )
End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void

Private Methods

Method Description
Init ( ) : void
flush ( ) : bool
push ( char c ) : void

Method Details

ChineseTokenizer() public constructor

public ChineseTokenizer ( AttributeFactory factory, TextReader @in )
factory AttributeFactory
@in TextReader

ChineseTokenizer() public constructor

public ChineseTokenizer ( TextReader @in )
@in TextReader

End() public method

public End ( ) : void
Returns void

IncrementToken() public method

public IncrementToken ( ) : bool
Returns bool

Reset() public method

public Reset ( ) : void
Returns void
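
Usage sketch

The public methods listed above are normally called in the order Reset, IncrementToken (repeatedly), End. The following is a minimal sketch of that consumer loop, assuming a Lucene.Net 4.8-style API with the Lucene.Net.Analysis.Common package referenced. `ChineseTokenizerDemo` and `Tokenize` are hypothetical names introduced here for illustration, and the exact namespace of `ICharTermAttribute` is an assumption that may differ between library versions.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Cn;
// Assumption: spelled "TokenAttributes" in recent 4.8 builds;
// some older previews use "Tokenattributes" instead.
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical wrapper, not part of Lucene.Net.
public static class ChineseTokenizerDemo
{
    // Collects the text of every token that ChineseTokenizer produces for the input.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        using (var reader = new StringReader(text))
        using (var tokenizer = new ChineseTokenizer(reader))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken
            while (tokenizer.IncrementToken())  // returns false once the input is exhausted
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();                    // finalizes end-of-stream state
        }
        return tokens;
    }
}
```

Under these assumptions, passing a string of four Chinese characters should yield four single-character tokens. The compiler may warn that ChineseTokenizer is marked obsolete in newer releases.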