C# class org.apache.lucene.analysis.standard.StandardTokenizerImpl

This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

Tokens produced are of the following types:

  • <ALPHANUM>: A sequence of alphabetic and numeric characters
  • <NUM>: A number
  • <SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
  • <IDEOGRAPHIC>: A single CJKV ideographic character
  • <HIRAGANA>: A single hiragana character
  • <KATAKANA>: A sequence of katakana characters
  • <HANGUL>: A sequence of Hangul characters
Inherits: StandardTokenizerInterface
Project: paulirwin/lucene.net
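
To give a feel for the word boundaries that UAX #29 describes, here is a minimal Java sketch. It does not use StandardTokenizerImpl; it relies on the JDK's java.text.BreakIterator, whose word instance follows similar but not identical rules. The class name WordBreakSketch and the letter-or-digit filter are assumptions made for illustration, standing in for the scanner's own token types.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: the JDK word BreakIterator stands in for the
// UAX #29 word break rules; it is not the scanner documented here.
class WordBreakSketch {
    // Returns the segments between word boundaries that contain at least
    // one letter or digit, so whitespace and punctuation runs are dropped.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello world"));
    }
}
```

The filter is the simplest way to skip separators in this sketch; the real scanner instead assigns each match one of the token types listed above.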

Public methods

Method Description
StandardTokenizerImpl ( java @in ) : System

Creates a new scanner.

getText ( CharTermAttribute t ) : void

Fills CharTermAttribute with the current token text.

yybegin ( int newState ) : void

Enters a new lexical state.

yychar ( ) : int
yycharat ( int pos ) : char

Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.

yyclose ( ) : void

Closes the input stream.

yylength ( ) : int

Returns the length of the matched text region.

yypushback ( int number ) : void

Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.

yyreset ( java reader ) : void

Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset, and the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to ZZ_INITIAL. The internal scan buffer is resized down to its initial length if it has grown.

yystate ( ) : int

Returns the current lexical state.

yytext ( ) : string

Returns the text matched by the current regular expression.

Private methods

Method Description
zzRefill ( ) : bool

Refills the input buffer.

zzScanError ( int errorCode ) : void

Reports an error that occurred while scanning. In a well-formed scanner (no or only correct usage of yypushback(int) and a match-all fallback rule), this method will only be called with things that "Can't Possibly Happen". If this method is called, something is seriously wrong (e.g. a JFlex bug producing a faulty scanner). Usual syntax/scanner-level error handling should be done in error fallback rules.

zzUnpackAction ( string packed, int offset, int result ) : int
zzUnpackAction ( ) : int[]
zzUnpackAttribute ( string packed, int offset, int result ) : int
zzUnpackAttribute ( ) : int[]
zzUnpackCMap ( string packed ) : char[]

Unpacks the compressed character translation table.

zzUnpackRowMap ( string packed, int offset, int result ) : int
zzUnpackRowMap ( ) : int[]
zzUnpackTrans ( string packed, int offset, int result ) : int
zzUnpackTrans ( ) : int[]
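
The zzUnpack helpers decode tables that are stored as packed string constants. The source does not show the packed layout, so the following Java sketch assumes a simple run-length encoding of (count, value) character pairs, which is a common scheme in generated scanners. The class name CMapUnpackSketch and the tableSize parameter are hypothetical.

```java
// Sketch of run-length decoding for a character translation table.
// Assumed layout: packed = count0, value0, count1, value1, ...
class CMapUnpackSketch {
    // Expands each (count, value) pair into `count` copies of `value`.
    // Writing stops once the table is full or the input is exhausted.
    static char[] unpack(String packed, int tableSize) {
        char[] map = new char[tableSize];
        int i = 0; // read position in packed
        int j = 0; // write position in map
        while (i + 1 < packed.length() && j < tableSize) {
            int count = packed.charAt(i++);
            char value = packed.charAt(i++);
            for (int k = 0; k < count && j < tableSize; k++) {
                map[j++] = value;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Three copies of class 1 followed by two copies of class 7.
        for (char c : unpack("\3\1\2\7", 5)) {
            System.out.print((int) c + " ");
        }
        System.out.println();
    }
}
```

Storing the table as a string keeps the generated source compact; the cost is a one-time decode when the class is initialized.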

Method details

StandardTokenizerImpl() public method

Creates a new scanner.
public StandardTokenizerImpl ( java @in ) : System
@in java
Returns System

getText() public method

Fills CharTermAttribute with the current token text.
public getText ( CharTermAttribute t ) : void
t Lucene.Net.Analysis.Tokenattributes.CharTermAttribute
Returns void

yybegin() public method

Enters a new lexical state.
public yybegin ( int newState ) : void
newState int the new lexical state
Returns void

yychar() public method

public yychar ( ) : int
Returns int

yycharat() public method

Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.
public yycharat ( int pos ) : char
pos int the position of the character to fetch. A value from 0 to yylength()-1.
Returns char
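
One plausible reason yycharat() can be faster than yytext().charAt(pos) is that yytext() has to build a new string, while yycharat() can index straight into the scan buffer. The Java sketch below illustrates that idea under stated assumptions: the class MatchRegionSketch and the fields buffer, startRead, and markedPos are hypothetical names, not taken from the source.

```java
// Sketch of a matched region inside a scan buffer.
// Field names are illustrative assumptions, not the real internals.
class MatchRegionSketch {
    private final char[] buffer;
    private final int startRead; // index of the first matched char
    private final int markedPos; // index one past the last matched char

    MatchRegionSketch(String input, int startRead, int markedPos) {
        this.buffer = input.toCharArray();
        this.startRead = startRead;
        this.markedPos = markedPos;
    }

    // Length of the matched text region.
    int yylength() {
        return markedPos - startRead;
    }

    // Copies the matched region into a new String (allocates).
    String yytext() {
        return new String(buffer, startRead, markedPos - startRead);
    }

    // Reads one char directly from the buffer (no allocation).
    // pos must be in the range 0 to yylength()-1.
    char yycharat(int pos) {
        return buffer[startRead + pos];
    }

    public static void main(String[] args) {
        MatchRegionSketch m = new MatchRegionSketch("foo bar baz", 4, 7);
        System.out.println(m.yytext() + " " + m.yylength() + " " + m.yycharat(1));
    }
}
```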

yyclose() public method

Closes the input stream.
public yyclose ( ) : void
Returns void

yylength() public method

Returns the length of the matched text region.
public yylength ( ) : int
Returns int

yypushback() public method

Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.
public yypushback ( int number ) : void
number int the number of characters to be read again. This number must not be greater than yylength().
Returns void
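
The effect of pushing characters back can be pictured as shrinking the end of the current match, so that the next scan starts earlier. The Java sketch below models this with a hypothetical class PushbackSketch. Its match(int) method is a stand-in for the real scanning method, and it throws an exception on an oversized pushback, which is an assumption of the sketch and not necessarily how the documented class reports the error.

```java
// Sketch of pushback semantics over a fixed buffer.
// All names except yypushback/yytext/yylength are hypothetical.
class PushbackSketch {
    private final char[] buffer;
    private int startRead = 0; // start of the current match
    private int markedPos = 0; // one past the end of the current match

    PushbackSketch(String input) {
        this.buffer = input.toCharArray();
    }

    // Stand-in for a scanning step: matches up to the next n chars,
    // starting where the previous match ended.
    void match(int n) {
        startRead = markedPos;
        markedPos = Math.min(buffer.length, markedPos + n);
    }

    int yylength() {
        return markedPos - startRead;
    }

    String yytext() {
        return new String(buffer, startRead, markedPos - startRead);
    }

    // Gives back the last `number` chars of the match so that the next
    // scanning step reads them again.
    void yypushback(int number) {
        if (number < 0 || number > yylength()) {
            throw new IllegalArgumentException("pushback exceeds match length");
        }
        markedPos -= number;
    }

    public static void main(String[] args) {
        PushbackSketch s = new PushbackSketch("abcdef");
        s.match(4);
        s.yypushback(2);
        System.out.println(s.yytext());
        s.match(4);
        System.out.println(s.yytext());
    }
}
```

In this sketch, matching "abcd" and pushing back two characters leaves "ab" as the current text, and the following step matches "cdef".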

yyreset() public method

Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset, and the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to ZZ_INITIAL. The internal scan buffer is resized down to its initial length if it has grown.
public yyreset ( java reader ) : void
reader java the new input stream
Returns void
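
The reset steps described above can be sketched as follows. This is a minimal Java model and not the actual implementation: the class ResetSketch, its field names, and the constants INITIAL_STATE and INITIAL_BUFFER_SIZE are hypothetical, and the initial buffer length of 16 is chosen only to keep the example small.

```java
import java.io.Reader;
import java.io.StringReader;

// Sketch of the reset behaviour: swap the reader, clear positions,
// return to the initial lexical state, and shrink a grown buffer.
class ResetSketch {
    static final int INITIAL_STATE = 0;        // stand-in for the initial lexical state
    static final int INITIAL_BUFFER_SIZE = 16; // hypothetical initial length

    Reader reader;
    char[] buffer = new char[INITIAL_BUFFER_SIZE];
    int startRead;
    int markedPos;
    int endRead;
    int lexicalState = INITIAL_STATE;

    ResetSketch(Reader reader) {
        this.reader = reader;
    }

    void yyreset(Reader newReader) {
        reader = newReader; // the old reader is NOT closed here
        startRead = 0;
        markedPos = 0;
        endRead = 0;
        lexicalState = INITIAL_STATE;
        if (buffer.length > INITIAL_BUFFER_SIZE) {
            // Old buffered input is discarded along with the grown buffer.
            buffer = new char[INITIAL_BUFFER_SIZE];
        }
    }

    public static void main(String[] args) {
        ResetSketch s = new ResetSketch(new StringReader("first input"));
        s.buffer = new char[64]; // pretend the buffer grew while scanning
        s.lexicalState = 2;
        s.yyreset(new StringReader("second input"));
        System.out.println(s.buffer.length + " " + s.lexicalState);
    }
}
```

Because the old reader is left open, the caller remains responsible for closing it, for example with yyclose() before the reset.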

yystate() public method

Returns the current lexical state.
public yystate ( ) : int
Returns int

yytext() public method

Returns the text matched by the current regular expression.
public yytext ( ) : string
Returns string