C# class Lucene.Net.Analysis.CJK.CJKTokenizer

CJKTokenizer was modified from StopTokenizer, which does a decent job for most European languages. It uses a different tokenization method for double-byte characters: a token is returned for every two characters, with overlapping matches.
Example: "java C1C2C3C4" is segmented into "java", "C1C2", "C2C3", "C3C4". Zero-length tokens ("") also need to be filtered out.
For digits: digits, '+', and '#' are tokenized as letters.
For more information on Asian-language (Chinese, Japanese, Korean) text segmentation, please search Google.
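
The overlapping two-character segmentation described above can be sketched without Lucene.Net at all. The following is a minimal illustration, not the library's implementation: the class name `CjkBigramSketch`, the method `Segment`, and the rule "any non-ASCII letter counts as a double-byte character" are assumptions made for this example. It does not lowercase input or handle surrogate pairs, and it emits a lone double-byte character as a one-character token.

```csharp
using System.Collections.Generic;

// Hypothetical helper; not part of Lucene.Net.
public static class CjkBigramSketch
{
    // ASCII letters, digits, '+' and '#' are grouped into one token,
    // following the "tokenized as letters" rule in the description.
    private static bool IsSingleByteTokenChar(char c)
    {
        return c < 128 && (char.IsLetterOrDigit(c) || c == '+' || c == '#');
    }

    // Assumption: every non-ASCII letter is treated as a double-byte character.
    private static bool IsDoubleByteChar(char c)
    {
        return c >= 128 && char.IsLetter(c);
    }

    public static List<string> Segment(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            int start = i;
            if (IsSingleByteTokenChar(text[i]))
            {
                // Consume the whole single-byte run as one token.
                while (i < text.Length && IsSingleByteTokenChar(text[i])) i++;
                tokens.Add(text.Substring(start, i - start));
            }
            else if (IsDoubleByteChar(text[i]))
            {
                // Consume the double-byte run, then emit overlapping pairs.
                while (i < text.Length && IsDoubleByteChar(text[i])) i++;
                if (i - start == 1)
                {
                    tokens.Add(text.Substring(start, 1));
                }
                else
                {
                    for (int j = start; j + 1 < i; j++)
                    {
                        tokens.Add(text.Substring(j, 2));
                    }
                }
            }
            else
            {
                // Separator: skip it, so no zero-length token is ever produced.
                i++;
            }
        }
        return tokens;
    }
}
```

Under these assumptions, `CjkBigramSketch.Segment("java 中文分词")` yields "java", "中文", "文分", "分词", which mirrors the "C1C2", "C2C3", "C3C4" pattern in the example above.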

@author Che, Dong @version $Id: CJKTokenizer.java,v 1.3 2003/01/22 20:54:47 otis Exp $
Inherits: Lucene.Net.Analysis.Tokenizer
Project: synhershko/lucene.net

Public Methods

Method Description
CJKTokenizer ( AttributeFactory factory, TextReader _in )
CJKTokenizer ( Lucene.Net.Util.AttributeSource source, TextReader _in )
CJKTokenizer ( TextReader _in )
Construct a token stream processing the given input.

End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void
Reset ( TextReader reader ) : void

Private Methods

Method Description
Init ( ) : void

Method Details

CJKTokenizer() public method

public CJKTokenizer ( AttributeFactory factory, TextReader _in )
factory AttributeFactory
_in TextReader

CJKTokenizer() public method

public CJKTokenizer ( Lucene.Net.Util.AttributeSource source, TextReader _in )
source Lucene.Net.Util.AttributeSource
_in TextReader

CJKTokenizer() public method

Construct a token stream processing the given input.
public CJKTokenizer ( TextReader _in )
_in TextReader I/O reader

End() public method

public End ( ) : void
Returns void

IncrementToken() public method

public IncrementToken ( ) : bool
Returns bool
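
A typical way to consume the tokenizer is to loop on IncrementToken() until it returns false. The sketch below assumes the Lucene.Net 3.0.x API, where term text is exposed through `ITermAttribute` in the `Lucene.Net.Analysis.Tokenattributes` namespace and token streams are disposable; other versions may use different attribute names. The class name `CjkTokenizerUsage` and the sample input are illustrative only.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.CJK;
using Lucene.Net.Analysis.Tokenattributes; // assumption: Lucene.Net 3.0.x namespace

public static class CjkTokenizerUsage
{
    public static void Main()
    {
        // Uses the CJKTokenizer ( TextReader _in ) constructor listed above.
        using (var tokenizer = new CJKTokenizer(new StringReader("java 中文分词")))
        {
            // Assumption: the term attribute is how this version exposes token text.
            ITermAttribute term = tokenizer.AddAttribute<ITermAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                // Print each token produced by the stream.
                Console.WriteLine(term.Term);
            }
            tokenizer.End();
        }
    }
}
```

Running this requires a reference to the Lucene.Net core and contrib analyzers assemblies that contain this class.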

Reset() public method

public Reset ( ) : void
Returns void

Reset() public method

public Reset ( TextReader reader ) : void
reader TextReader
Returns void