C# Class Lucene.Net.Analysis.Standard.StandardTokenizer

A grammar-based tokenizer constructed with JFlex.

This should be a good tokenizer for most European-language documents:

Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
Recognizes email addresses and internet hostnames as one token.
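The rules above can be checked against your own text with a short console program. This is a minimal sketch, assuming the Lucene.NET 4.8 packages are referenced; the sample sentence, the class name `TokenizeSample`, and the choice of `LuceneVersion.LUCENE_48` are illustrative assumptions. The tokens actually produced may differ with the `matchVersion` you pass, so no output is shown here.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenizeSample
{
    public static void Main()
    {
        // Illustrative input containing an e-mail address, a hyphenated
        // token with a number, and a hostname.
        const string text =
            "Mail jane.doe@example.com about part XJ-200, or visit www.example.com.";

        // LUCENE_48 is an assumed version constant; use the one that
        // matches your index.
        using (var tokenizer = new StandardTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // The term attribute exposes the text of the current token.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();
        }
    }
}
```

Running this against representative documents is a quick way to decide whether the tokenizer suits an application before committing to it.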

Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.

Inherits: Lucene.Net.Analysis.Tokenizer

Project: apache/lucenenet

Public Properties

Property	Type	Description
TOKEN_TYPES	string[]

Public Methods

Method	Description
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void
StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, AttributeFactory factory, System.IO.TextReader input )	Creates a new StandardTokenizer with a given Lucene.Net.Util.AttributeSource.AttributeFactory.
StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, System.IO.TextReader input )	Creates a new instance of the StandardTokenizer. Attaches the `input` to the newly created JFlex scanner.

Private Methods

Method	Description
Init ( Lucene.Net.Util.LuceneVersion matchVersion ) : void

Method Details

Dispose() public method

public Dispose ( ) : void
Returns	void

End() public method

public End ( ) : void
Returns	void

IncrementToken() public method

public IncrementToken ( ) : bool
Returns	bool

Reset() public method

public Reset ( ) : void
Returns	void
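The four methods documented here are normally called in a fixed order: Reset(), then IncrementToken() until it returns false, then End(), then Dispose(). The sketch below illustrates that order under the assumption of Lucene.NET 4.8; the class name `ConsumeSample`, the input string, and the version constant are illustrative, not part of the documented API.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class ConsumeSample
{
    public static void Main()
    {
        var tokenizer = new StandardTokenizer(
            LuceneVersion.LUCENE_48, new StringReader("quick brown fox"));
        try
        {
            // Register the attributes before consuming the stream.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();

            // 1. Reset() must be called before the first IncrementToken().
            tokenizer.Reset();

            // 2. IncrementToken() returns false once the input is exhausted.
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(
                    term.ToString() + " [" + offset.StartOffset + ", " + offset.EndOffset + ")");
            }

            // 3. End() performs end-of-stream work, such as setting the final offset.
            tokenizer.End();
            Console.WriteLine("final offset: " + offset.EndOffset);
        }
        finally
        {
            // 4. Dispose() releases the underlying reader.
            tokenizer.Dispose();
        }
    }
}
```

The explicit try/finally is used only to make the position of Dispose() visible; a `using` statement has the same effect.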

StandardTokenizer() public method

Creates a new StandardTokenizer with a given Lucene.Net.Util.AttributeSource.AttributeFactory.

public StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, AttributeFactory factory, System.IO.TextReader input )
matchVersion	Lucene.Net.Util.LuceneVersion
factory	AttributeFactory
input	System.IO.TextReader
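A minimal sketch of calling the three-argument constructor follows. It assumes, as in Lucene.NET 4.8, that the factory type is the nested `AttributeSource.AttributeFactory` in `Lucene.Net.Util` and that it exposes a `DEFAULT_ATTRIBUTE_FACTORY` field; if your build differs, substitute whichever factory instance you have. The helper `FactorySample.Create` is hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

public static class FactorySample
{
    // Hypothetical helper: builds a tokenizer with an explicit attribute factory.
    public static StandardTokenizer Create(TextReader input)
    {
        return new StandardTokenizer(
            LuceneVersion.LUCENE_48,                                    // assumed version constant
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, // assumed default factory
            input);
    }
}
```

Passing the default factory should behave like the two-argument constructor; the overload matters mainly when a custom factory is needed to control how attribute instances are created.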

StandardTokenizer() public method

Creates a new instance of the StandardTokenizer. Attaches the input to the newly created JFlex scanner.

public StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, System.IO.TextReader input )
matchVersion	Lucene.Net.Util.LuceneVersion
input	System.IO.TextReader	The input reader. See http://issues.apache.org/jira/browse/LUCENE-1068

Property Details

TOKEN_TYPES public static property

String token types that correspond to token type int constants

public static string[] TOKEN_TYPES
Returns	string[]
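The array can be listed by index, and the type string of each emitted token can be read through the type attribute. The sketch below assumes Lucene.NET 4.8 and its `ITypeAttribute` interface; the class name `TokenTypesSample`, the input text, and the version constant are illustrative. The array contents are printed rather than listed here, since they depend on the library build.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenTypesSample
{
    public static void Main()
    {
        // Print every type string together with its index; the index is the
        // int constant that the string corresponds to.
        for (int i = 0; i < StandardTokenizer.TOKEN_TYPES.Length; i++)
        {
            Console.WriteLine(i + ": " + StandardTokenizer.TOKEN_TYPES[i]);
        }

        // Show the type string assigned to each token of a sample input.
        using (var tokenizer = new StandardTokenizer(
            LuceneVersion.LUCENE_48, new StringReader("Order 42 widgets")))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine(term.ToString() + " -> " + type.Type);
            }
            tokenizer.End();
        }
    }
}
```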