C# class Lucene.Net.Analysis.Cn.ChineseTokenizer

Tokenizes Chinese text into individual Chinese characters.

ChineseTokenizer and CJKTokenizer differ in their token parsing logic.

For example, if the Chinese text "C1C2C3C4" is to be indexed:

  • The tokens returned from ChineseTokenizer are C1, C2, C3, C4
  • The tokens returned from the CJKTokenizer are C1C2, C2C3, C3C4.

Therefore the index created by CJKTokenizer is much larger.

The problem is that when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ..., the ChineseTokenizer works, but the CJKTokenizer does not.
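The contrast between the two token streams can be sketched without Lucene.Net at all. The class below is a simplified model, not the library's implementation: `TokenizationSketch`, `Unigrams`, `Bigrams`, and `AllTokensIndexed` are hypothetical names introduced here for illustration, and the match check assumes a query counts as a hit only when every one of its tokens is present in the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of the two tokenization strategies described above.
// This is NOT the Lucene.Net code; all names here are hypothetical.
public static class TokenizationSketch
{
    // One token per character, modeling the ChineseTokenizer output (C1, C2, C3, C4).
    // Assumes the text contains no surrogate pairs.
    public static List<string> Unigrams(string text) =>
        text.Select(c => c.ToString()).ToList();

    // Overlapping two-character tokens, modeling the CJKTokenizer output (C1C2, C2C3, C3C4).
    // Emitting a lone character unchanged is an assumption of this sketch.
    public static List<string> Bigrams(string text)
    {
        if (text.Length == 0) return new List<string>();
        if (text.Length == 1) return new List<string> { text };
        return Enumerable.Range(0, text.Length - 1)
                         .Select(i => text.Substring(i, 2))
                         .ToList();
    }

    // True when every query token occurs among the indexed tokens (order is ignored).
    public static bool AllTokensIndexed(IEnumerable<string> indexed, IEnumerable<string> query)
    {
        var set = new HashSet<string>(indexed);
        return query.All(set.Contains);
    }
}
```

Under these assumptions, indexing a four-character text and querying for its first and third characters (the "C1C3" case) finds both query tokens in the unigram index, while the bigram index contains no matching two-character token.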

Inherits: Lucene.Net.Analysis.Tokenizer
Project: apache/lucenenet

Public methods

Method Description
ChineseTokenizer ( AttributeFactory factory, TextReader @in )
ChineseTokenizer ( TextReader @in )
End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void

Private methods

Method Description
Init ( ) : void
flush ( ) : bool
push ( char c ) : void

Method details

ChineseTokenizer() public method

public ChineseTokenizer ( AttributeFactory factory, TextReader @in )
factory AttributeFactory
@in TextReader

ChineseTokenizer() public method

public ChineseTokenizer ( TextReader @in )
@in TextReader

End() public method

public End ( ) : void
Returns void

IncrementToken() public method

public IncrementToken ( ) : bool
Returns bool

Reset() public method

public Reset ( ) : void
Returns void