C# Class Lucene.Net.Analysis.Standard.UAX29URLEmailTokenizerImpl

This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.

Tokens produced are of the following types:

  • <ALPHANUM>: A sequence of alphabetic and numeric characters
  • <NUM>: A number
  • <URL>: A URL
  • <EMAIL>: An email address
  • <SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
  • <IDEOGRAPHIC>: A single CJKV ideographic character
  • <HIRAGANA>: A single hiragana character
  • <KATAKANA>: A sequence of katakana characters
  • <HANGUL>: A sequence of Hangul characters
Inheritance: IStandardTokenizerInterface
Project: apache/lucenenet
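
The token types listed above are easiest to observe through the public tokenizer that wraps this scanner. The following is a minimal sketch, not part of the original reference: it assumes the Lucene.NET 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common), the `UAX29URLEmailTokenizer` wrapper class, and the `Lucene.Net.Analysis.TokenAttributes` namespace, which is spelled `Tokenattributes` in some older builds. `TokenTypeDemo` and `Tokenize` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes; // assumption: "Tokenattributes" in some older builds
using Lucene.Net.Util;

public static class TokenTypeDemo
{
    // Hypothetical helper: returns (term, type) pairs, where type is a
    // label such as "<ALPHANUM>", "<URL>" or "<EMAIL>".
    public static List<(string Term, string Type)> Tokenize(string text)
    {
        var result = new List<(string Term, string Type)>();

        // Assumes the Lucene.NET 4.8 constructor (LuceneVersion, TextReader).
        using (var tokenizer = new UAX29URLEmailTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            var term = tokenizer.AddAttribute<ICharTermAttribute>();
            var type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();                  // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                result.Add((term.ToString(), type.Type));
            }
            tokenizer.End();
        }
        return result;
    }

    public static void Main()
    {
        var text = "Mail user@example.com or visit https://lucenenet.apache.org/ today";
        foreach (var (term, type) in Tokenize(text))
        {
            Console.WriteLine($"{type,-12} {term}");
        }
    }
}
```

Returning the pairs instead of printing inside the loop keeps the helper easy to check against whichever token types matter in a given application.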

Public Properties

Property Type Description
EMAIL_TYPE int
HANGUL_TYPE int
HIRAGANA_TYPE int
IDEOGRAPHIC_TYPE int
KATAKANA_TYPE int
NUMERIC_TYPE int
SOUTH_EAST_ASIAN_TYPE int
URL_TYPE int
WORD_TYPE int
YYEOF int

Public Methods

Method Description
GetNextToken ( ) : int
GetText ( ICharTermAttribute t ) : void
UAX29URLEmailTokenizerImpl ( TextReader @in )
YyBegin ( int newState ) : void
YyCharAt ( int pos ) : char
YyClose ( ) : void
YyPushBack ( int number ) : void
YyReset ( TextReader reader ) : void
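
The public methods above can be combined into a direct scan loop: call `GetNextToken()` until it returns `YYEOF`, and copy each match out with `GetText()`. This is a sketch under stated assumptions rather than the library's prescribed usage: it assumes `CharTermAttribute` has a public parameterless constructor (as in Lucene.NET 4.8) and that the attribute namespace is `TokenAttributes`. `ScannerLoopDemo` and `CountEmails` are hypothetical names.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes; // assumption: "Tokenattributes" in some older builds

public static class ScannerLoopDemo
{
    // Hypothetical helper: scans the text and counts tokens typed as email addresses.
    public static int CountEmails(string text)
    {
        var scanner = new UAX29URLEmailTokenizerImpl(new StringReader(text));
        var term = new CharTermAttribute();   // receives the text of each match
        int emails = 0;
        int tokenType;

        // GetNextToken() returns one of the *_TYPE codes, or YYEOF at end of input.
        while ((tokenType = scanner.GetNextToken()) != UAX29URLEmailTokenizerImpl.YYEOF)
        {
            scanner.GetText(term);            // copies the current match into the attribute
            Console.WriteLine($"{tokenType}: {term}");

            if (tokenType == UAX29URLEmailTokenizerImpl.EMAIL_TYPE)
            {
                emails++;
            }
        }
        return emails;
    }

    public static void Main()
    {
        Console.WriteLine(CountEmails("Contact alice@example.com or bob@example.org."));
    }
}
```

In most applications the wrapping tokenizer class is the more convenient entry point; driving the scanner directly is mainly useful when only the raw type codes are needed.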

Private Methods

Method Description
ZzRefill ( ) : bool
ZzScanError ( int errorCode ) : void
ZzUnpackAction ( string packed, int offset, int result ) : int
ZzUnpackAction ( ) : int[]
ZzUnpackAttribute ( string packed, int offset, int result ) : int
ZzUnpackAttribute ( ) : int[]
ZzUnpackCMap ( string packed ) : char[]
ZzUnpackRowMap ( string packed, int offset, int result ) : int
ZzUnpackRowMap ( ) : int[]
ZzUnpackTrans ( string packed, int offset, int result ) : int
ZzUnpackTrans ( ) : int[]

Method Details

GetNextToken() public method

public GetNextToken ( ) : int
Returns int

GetText() public method

public GetText ( ICharTermAttribute t ) : void
t ICharTermAttribute
Returns void

UAX29URLEmailTokenizerImpl() public constructor

public UAX29URLEmailTokenizerImpl ( TextReader @in )
@in System.IO.TextReader

YyBegin() public method

public YyBegin ( int newState ) : void
newState int
Returns void

YyCharAt() public method

public YyCharAt ( int pos ) : char
pos int
Returns char

YyClose() public method

public YyClose ( ) : void
Returns void

YyPushBack() public method

public YyPushBack ( int number ) : void
number int
Returns void

YyReset() public method

public YyReset ( TextReader reader ) : void
reader System.IO.TextReader
Returns void
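
As an illustration of `YyReset()`, one scanner instance can be pointed at a new reader for each input instead of being reconstructed. This sketch assumes the usual behavior of JFlex-generated scanners, where resetting discards buffered state and restarts scanning from the new reader. `ScannerReuseDemo` and `CountTokens` are hypothetical names.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;

public static class ScannerReuseDemo
{
    // Hypothetical helper: counts the tokens in each input, reusing one scanner.
    public static int[] CountTokens(params string[] inputs)
    {
        // Start with an empty reader; every input is attached via YyReset().
        var scanner = new UAX29URLEmailTokenizerImpl(new StringReader(string.Empty));
        var counts = new int[inputs.Length];

        for (int i = 0; i < inputs.Length; i++)
        {
            scanner.YyReset(new StringReader(inputs[i]));
            while (scanner.GetNextToken() != UAX29URLEmailTokenizerImpl.YYEOF)
            {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CountTokens("one two", "three")));
    }
}
```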

Property Details

EMAIL_TYPE public static property

public static int EMAIL_TYPE
Returns int

HANGUL_TYPE public static property

public static int HANGUL_TYPE
Returns int

HIRAGANA_TYPE public static property

public static int HIRAGANA_TYPE
Returns int

IDEOGRAPHIC_TYPE public static property

public static int IDEOGRAPHIC_TYPE
Returns int

KATAKANA_TYPE public static property

public static int KATAKANA_TYPE
Returns int

NUMERIC_TYPE public static property

public static int NUMERIC_TYPE
Returns int

SOUTH_EAST_ASIAN_TYPE public static property

public static int SOUTH_EAST_ASIAN_TYPE
Returns int

URL_TYPE public static property

public static int URL_TYPE
Returns int

WORD_TYPE public static property

public static int WORD_TYPE
Returns int

YYEOF public static property

public static int YYEOF
Returns int