C# Class Lucene.Net.Analysis.Standard.StandardTokenizer

A grammar-based tokenizer constructed with JFlex.

This should be a good tokenizer for most European-language documents:

  • Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
  • Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
  • Recognizes email addresses and internet hostnames as one token.

Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.

Inheritance: Lucene.Net.Analysis.Tokenizer
Project: apache/lucenenet

Public Properties

Property Type Description
TOKEN_TYPES string[]

Public Methods

Method Description
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void
StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, AttributeFactory factory, System.IO.TextReader input )

Creates a new StandardTokenizer with a given AttributeFactory.

StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, System.IO.TextReader input )

Creates a new instance of the StandardTokenizer. Attaches the input to the newly created JFlex scanner.

Private Methods

Method Description
Init ( Lucene.Net.Util.LuceneVersion matchVersion ) : void

Method Details

Dispose() public method

public Dispose ( ) : void
Returns void

End() public method

public End ( ) : void
Returns void

IncrementToken() public method

public IncrementToken ( ) : bool
Returns bool

Reset() public method

public Reset ( ) : void
Returns void
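
The four methods above are normally called in a fixed order: Reset(), then IncrementToken() until it returns false, then End(), then Dispose(). The following is a minimal sketch of that sequence, assuming the Lucene.NET 4.8 packages are referenced, that LuceneVersion.LUCENE_48 is the desired match version, and that the term text is read through ICharTermAttribute. TokenizerDemo and Tokenize are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenizerDemo
{
    // Hypothetical helper: returns the term text of every token found in `text`.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();
        using (var reader = new StringReader(text))
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader))
        {
            // The attribute instance is reused; its content changes on each IncrementToken().
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // must be called before the first IncrementToken()
            while (tokenizer.IncrementToken())  // false once the input is exhausted
            {
                terms.Add(term.ToString());
            }
            tokenizer.End();                    // end-of-stream bookkeeping (for example, final offset)
        }                                       // leaving the using blocks calls Dispose()
        return terms;
    }

    public static void Main()
    {
        // Sample input chosen to touch the punctuation, hyphen, and email rules.
        foreach (string t in Tokenize("Write to support@example.com about model XL-500."))
        {
            Console.WriteLine(t);
        }
    }
}
```

The exact tokens produced for the sample input depend on the matchVersion passed to the constructor, so no expected output is listed here.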

StandardTokenizer() public method

Creates a new StandardTokenizer with a given AttributeFactory.
public StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, AttributeFactory factory, System.IO.TextReader input )
matchVersion Lucene.Net.Util.LuceneVersion
factory AttributeFactory
input System.IO.TextReader
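
As a rough illustration of the three-argument constructor, the sketch below passes the library's default factory explicitly. It assumes Lucene.NET 4.8, where the default is exposed as AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY in Lucene.Net.Util; a custom factory would be passed in the same position. FactoryDemo and CountTokens are hypothetical names.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

public static class FactoryDemo
{
    // Hypothetical helper: counts the tokens in `text` using an explicitly supplied factory.
    public static int CountTokens(string text)
    {
        int count = 0;
        using (var reader = new StringReader(text))
        using (var tokenizer = new StandardTokenizer(
            LuceneVersion.LUCENE_48,
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, // assumed default factory
            reader))
        {
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                count++;
            }
            tokenizer.End();
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountTokens("one two three"));
    }
}
```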

StandardTokenizer() public method

Creates a new instance of the StandardTokenizer. Attaches the input to the newly created JFlex scanner.
public StandardTokenizer ( Lucene.Net.Util.LuceneVersion matchVersion, System.IO.TextReader input )
matchVersion Lucene.Net.Util.LuceneVersion
input System.IO.TextReader The input reader. See http://issues.apache.org/jira/browse/LUCENE-1068

Property Details

TOKEN_TYPES public static property

String token types that correspond to token type int constants.
public static string[] TOKEN_TYPES
Returns string[]
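
To show how these type strings surface during tokenization, the sketch below lists the entries of TOKEN_TYPES and then reads the type of each token through ITypeAttribute. It assumes Lucene.NET 4.8 and that ITypeAttribute and ICharTermAttribute live in Lucene.Net.Analysis.TokenAttributes. TokenTypeDemo and TermsWithTypes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenTypeDemo
{
    // Hypothetical helper: pairs each token's text with its type string.
    public static List<(string Term, string Type)> TermsWithTypes(string text)
    {
        var result = new List<(string Term, string Type)>();
        using (var reader = new StringReader(text))
        using (var tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                result.Add((term.ToString(), type.Type));
            }
            tokenizer.End();
        }
        return result;
    }

    public static void Main()
    {
        // Print every type string together with its index (the int constant it maps from).
        for (int i = 0; i < StandardTokenizer.TOKEN_TYPES.Length; i++)
        {
            Console.WriteLine($"{i}: {StandardTokenizer.TOKEN_TYPES[i]}");
        }

        foreach (var (term, type) in TermsWithTypes("hello 42"))
        {
            Console.WriteLine($"{term} -> {type}");
        }
    }
}
```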