C# Class Lucene.Net.Analysis.Standard.ClassicTokenizer

A grammar-based tokenizer constructed with JFlex.

This should be a good tokenizer for most European-language documents:

Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
Recognizes email addresses and internet hostnames as one token.
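The rules above are easiest to check by running a sample string through the tokenizer and printing each term with its type. The following is a minimal sketch, assuming a Lucene.Net 4.8 build where the attribute interfaces live in `Lucene.Net.Analysis.TokenAttributes` (some older builds spell the namespace `Tokenattributes`) and where `LuceneVersion.LUCENE_48` is available. The class name, the `Tokenize` helper, and the sample text are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes; // assumption: "Tokenattributes" in some older builds
using Lucene.Net.Util;

public static class ClassicTokenizerDemo
{
    // Hypothetical helper: returns (term, type) pairs for the given text.
    public static List<KeyValuePair<string, string>> Tokenize(string text)
    {
        var tokens = new List<KeyValuePair<string, string>>();
        using (var tokenizer = new ClassicTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)))
        {
            // Attributes are views onto the current token; fetch them once up front.
            var term = tokenizer.AddAttribute<ICharTermAttribute>();
            var type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                tokens.Add(new KeyValuePair<string, string>(term.ToString(), type.Type));
            }
            tokenizer.End();
        }
        return tokens;
    }

    public static void Main()
    {
        // Sample text chosen to exercise the email, hostname, and hyphen rules.
        var sample = "Mail bob@example.com about part XR-500 or visit www.example.com today.";
        foreach (var token in Tokenize(sample))
        {
            Console.WriteLine(token.Key + "\t" + token.Value);
        }
    }
}
```

Printing the type next to each term shows which grammar rule matched, which helps when deciding whether this tokenizer suits a given corpus.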

Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.

ClassicTokenizer was named StandardTokenizer in Lucene versions prior to 3.1. As of 3.1, StandardTokenizer implements Unicode text segmentation, as specified by UAX#29.

Inheritance: Tokenizer


Public Properties

Property	Type	Description
TOKEN_TYPES	string[]	String token types that correspond to token type int constants

Public Methods

Method	Description
ClassicTokenizer ( LuceneVersion matchVersion, AttributeFactory factory, System.IO.TextReader input )	Creates a new ClassicTokenizer with a given Lucene.Net.Util.AttributeSource.AttributeFactory.
ClassicTokenizer ( LuceneVersion matchVersion, System.IO.TextReader input )	Creates a new instance of the ClassicTokenizer. Attaches the `input` to the newly created JFlex scanner.
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void

Private Methods

Method	Description
Init ( LuceneVersion matchVersion ) : void

Method Details

ClassicTokenizer() public method

Creates a new ClassicTokenizer with a given Lucene.Net.Util.AttributeSource.AttributeFactory.

public ClassicTokenizer ( LuceneVersion matchVersion, AttributeFactory factory, System.IO.TextReader input )
matchVersion	LuceneVersion
factory	AttributeFactory
input	System.IO.TextReader
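As a rough illustration of the factory overload, the sketch below passes the stock attribute factory explicitly. It assumes a Lucene.Net 4.8 build in which `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` is exposed under `Lucene.Net.Util`. The class name and the `Create` and `CountTokens` helpers are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

public static class ClassicTokenizerFactoryDemo
{
    // Hypothetical helper: builds a tokenizer with an explicitly supplied attribute factory.
    public static ClassicTokenizer Create(TextReader input)
    {
        // Assumption: this is the default factory that the two-argument constructor would use anyway.
        AttributeSource.AttributeFactory factory =
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;
        return new ClassicTokenizer(LuceneVersion.LUCENE_48, factory, input);
    }

    // Hypothetical helper: counts the tokens produced for the given text.
    public static int CountTokens(string text)
    {
        int count = 0;
        using (var tokenizer = Create(new StringReader(text)))
        {
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                count++;
            }
            tokenizer.End();
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountTokens("alpha beta gamma"));
    }
}
```

Supplying a factory only matters when custom attribute implementations are needed. For ordinary use the two-argument constructor is the simpler choice.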

ClassicTokenizer() public method

Creates a new instance of the ClassicTokenizer. Attaches the input to the newly created JFlex scanner.

public ClassicTokenizer ( LuceneVersion matchVersion, System.IO.TextReader input )
matchVersion	LuceneVersion
input	System.IO.TextReader	The input reader. See http://issues.apache.org/jira/browse/LUCENE-1068

Dispose() public method

public Dispose ( ) : void
Returns	void

End() public method

public End ( ) : void
Returns	void

IncrementToken() public method

public IncrementToken ( ) : bool
Returns	bool

Reset() public method

public Reset ( ) : void
Returns	void
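The four methods above are normally called in a fixed order: Reset() once, IncrementToken() until it returns false, End() once, then Dispose(). The sketch below follows that order and also records character offsets. It assumes a Lucene.Net 4.8 build where `IOffsetAttribute` exposes `StartOffset` and `EndOffset` as properties. The class name and the `TokenizeWithOffsets` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes; // assumption: "Tokenattributes" in some older builds
using Lucene.Net.Util;

public static class TokenLifecycleDemo
{
    // Hypothetical helper: returns (term, startOffset, endOffset) for each token.
    public static List<Tuple<string, int, int>> TokenizeWithOffsets(string text)
    {
        var result = new List<Tuple<string, int, int>>();
        var tokenizer = new ClassicTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        try
        {
            var term = tokenizer.AddAttribute<ICharTermAttribute>();
            var offset = tokenizer.AddAttribute<IOffsetAttribute>();

            tokenizer.Reset();                 // 1. prepare the scanner for a new stream
            while (tokenizer.IncrementToken()) // 2. advance until no tokens remain
            {
                result.Add(Tuple.Create(term.ToString(), offset.StartOffset, offset.EndOffset));
            }
            tokenizer.End();                   // 3. perform end-of-stream work
        }
        finally
        {
            tokenizer.Dispose();               // 4. release the underlying reader
        }
        return result;
    }

    public static void Main()
    {
        foreach (var token in TokenizeWithOffsets("foo bar"))
        {
            Console.WriteLine(token.Item1 + " [" + token.Item2 + ", " + token.Item3 + ")");
        }
    }
}
```

The try/finally here is equivalent to a `using` block. It is spelled out only to make the position of Dispose() in the sequence visible.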

Property Details

TOKEN_TYPES public static property

String token types that correspond to token type int constants

public static string[] TOKEN_TYPES
Returns	string[]
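To see how the array relates to the int constants, the sketch below lists every entry and looks one up by constant. It assumes the class exposes public int constants such as `ClassicTokenizer.ALPHANUM` and `ClassicTokenizer.EMAIL`, as in Lucene.Net 4.8; these constants are not listed on this page. The class name and the `NameOf` helper are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Standard;

public static class TokenTypesDemo
{
    // Hypothetical helper: maps a token type int constant to its string name.
    public static string NameOf(int typeId)
    {
        return ClassicTokenizer.TOKEN_TYPES[typeId];
    }

    public static void Main()
    {
        // List every type name together with its index in the array.
        for (int i = 0; i < ClassicTokenizer.TOKEN_TYPES.Length; i++)
        {
            Console.WriteLine(i + " -> " + ClassicTokenizer.TOKEN_TYPES[i]);
        }

        // Assumption: EMAIL is one of the public int constants on ClassicTokenizer.
        Console.WriteLine(NameOf(ClassicTokenizer.EMAIL));
    }
}
```

The strings in this array are the values a type attribute reports for each token, so they can be compared directly when filtering tokens by type.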