C# Class Lucene.Net.Analysis.Standard.UAX29URLEmailTokenizerImpl

This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.

Tokens produced are of the following types:

<ALPHANUM>: A sequence of alphabetic and numeric characters
<NUM>: A number
<URL>: A URL
<EMAIL>: An email address
<SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
<IDEOGRAPHIC>: A single CJKV ideographic character
<HIRAGANA>: A single hiragana character
<KATAKANA>: A sequence of katakana characters
<HANGUL>: A sequence of Hangul characters
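
The labels above correspond to the integer type constants the class exposes (WORD_TYPE, NUMERIC_TYPE, and so on). The following is a minimal sketch of a lookup from type code to label. `SketchTypeCodes` and `TokenTypeLabels` are hypothetical helpers that are not part of Lucene.Net, the numeric values are placeholders, and the pairing of each constant with a label is inferred from the names rather than taken from the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the scanner's integer type constants.
// The numeric values are placeholders for this sketch, not the library's values.
public static class SketchTypeCodes
{
    public const int WORD_TYPE = 0;
    public const int NUMERIC_TYPE = 1;
    public const int SOUTH_EAST_ASIAN_TYPE = 2;
    public const int IDEOGRAPHIC_TYPE = 3;
    public const int HIRAGANA_TYPE = 4;
    public const int KATAKANA_TYPE = 5;
    public const int HANGUL_TYPE = 6;
    public const int URL_TYPE = 7;
    public const int EMAIL_TYPE = 8;
}

public static class TokenTypeLabels
{
    // Assumed pairing of each constant with a label from the list above.
    private static readonly Dictionary<int, string> Labels = new Dictionary<int, string>
    {
        { SketchTypeCodes.WORD_TYPE, "<ALPHANUM>" },
        { SketchTypeCodes.NUMERIC_TYPE, "<NUM>" },
        { SketchTypeCodes.SOUTH_EAST_ASIAN_TYPE, "<SOUTHEAST_ASIAN>" },
        { SketchTypeCodes.IDEOGRAPHIC_TYPE, "<IDEOGRAPHIC>" },
        { SketchTypeCodes.HIRAGANA_TYPE, "<HIRAGANA>" },
        { SketchTypeCodes.KATAKANA_TYPE, "<KATAKANA>" },
        { SketchTypeCodes.HANGUL_TYPE, "<HANGUL>" },
        { SketchTypeCodes.URL_TYPE, "<URL>" },
        { SketchTypeCodes.EMAIL_TYPE, "<EMAIL>" }
    };

    // Returns the label for a type code, or "<UNKNOWN>" for an unmapped code.
    public static string Describe(int typeCode)
    {
        string label;
        return Labels.TryGetValue(typeCode, out label) ? label : "<UNKNOWN>";
    }
}
```

In real code the keys would presumably be the class's own constants, such as UAX29URLEmailTokenizerImpl.URL_TYPE, in place of the placeholders.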

Inherits: IStandardTokenizerInterface

Project: apache/lucenenet

Public Properties

Property	Type	Description
EMAIL_TYPE	int
HANGUL_TYPE	int
HIRAGANA_TYPE	int
IDEOGRAPHIC_TYPE	int
KATAKANA_TYPE	int
NUMERIC_TYPE	int
SOUTH_EAST_ASIAN_TYPE	int
URL_TYPE	int
WORD_TYPE	int
YYEOF	int

Public Methods

Method	Description
GetNextToken ( ) : int
GetText ( ICharTermAttribute t ) : void
UAX29URLEmailTokenizerImpl ( TextReader @in )
YyBegin ( int newState ) : void
YyCharAt ( int pos ) : char
YyClose ( ) : void
YyPushBack ( int number ) : void
YyReset ( TextReader reader ) : void

Private Methods

Method	Description
ZzRefill ( ) : bool
ZzScanError ( int errorCode ) : void
ZzUnpackAction ( string packed, int offset, int result ) : int
ZzUnpackAction ( ) : int[]
ZzUnpackAttribute ( string packed, int offset, int result ) : int
ZzUnpackAttribute ( ) : int[]
ZzUnpackCMap ( string packed ) : char[]
ZzUnpackRowMap ( string packed, int offset, int result ) : int
ZzUnpackRowMap ( ) : int[]
ZzUnpackTrans ( string packed, int offset, int result ) : int
ZzUnpackTrans ( ) : int[]

Method Details

GetNextToken() Public Method

public GetNextToken ( ) : int
Returns	int
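
GetNextToken() returns an int type code for each token matched, and generated scanners of this kind conventionally return YYEOF once the input is exhausted. The following is a minimal sketch of the resulting read loop. `FakeScanner` and `ScanLoop` are hypothetical stand-ins (the real class needs the Lucene.Net package and a TextReader), and the sentinel value -1 is an assumption made for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in with the same GetNextToken()/YYEOF shape as the
// documented class; it simply replays a fixed sequence of type codes.
public sealed class FakeScanner
{
    // Placeholder end-of-input sentinel for this sketch.
    public const int YYEOF = -1;

    private readonly Queue<int> _codes;

    public FakeScanner(IEnumerable<int> codes)
    {
        _codes = new Queue<int>(codes);
    }

    // Returns the next type code, or YYEOF when nothing is left.
    public int GetNextToken()
    {
        return _codes.Count > 0 ? _codes.Dequeue() : YYEOF;
    }
}

public static class ScanLoop
{
    // Collects every type code until the scanner reports end of input.
    public static List<int> Drain(FakeScanner scanner)
    {
        var seen = new List<int>();
        int code;
        while ((code = scanner.GetNextToken()) != FakeScanner.YYEOF)
        {
            seen.Add(code);
        }
        return seen;
    }
}
```

With the real class, the loop body would typically call GetText() to copy the matched characters into an ICharTermAttribute before requesting the next token.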

GetText() Public Method

public GetText ( ICharTermAttribute t ) : void
t	ICharTermAttribute
Returns	void

UAX29URLEmailTokenizerImpl() Public Method

public UAX29URLEmailTokenizerImpl ( TextReader @in )
@in	System.IO.TextReader

YyBegin() Public Method

public YyBegin ( int newState ) : void
newState	int
Returns	void

YyCharAt() Public Method

public YyCharAt ( int pos ) : char
pos	int
Returns	char

YyClose() Public Method

public YyClose ( ) : void
Returns	void

YyPushBack() Public Method

public YyPushBack ( int number ) : void
number	int
Returns	void

YyReset() Public Method

public YyReset ( TextReader reader ) : void
reader	System.IO.TextReader
Returns	void

Property Details

EMAIL_TYPE Public Static Property

public static int EMAIL_TYPE
Returns	int

HANGUL_TYPE Public Static Property

public static int HANGUL_TYPE
Returns	int

HIRAGANA_TYPE Public Static Property

public static int HIRAGANA_TYPE
Returns	int

IDEOGRAPHIC_TYPE Public Static Property

public static int IDEOGRAPHIC_TYPE
Returns	int

KATAKANA_TYPE Public Static Property

public static int KATAKANA_TYPE
Returns	int

NUMERIC_TYPE Public Static Property

public static int NUMERIC_TYPE
Returns	int

SOUTH_EAST_ASIAN_TYPE Public Static Property

public static int SOUTH_EAST_ASIAN_TYPE
Returns	int

URL_TYPE Public Static Property

public static int URL_TYPE
Returns	int

WORD_TYPE Public Static Property

public static int WORD_TYPE
Returns	int

YYEOF Public Static Property

public static int YYEOF
Returns	int