C# Class Lucene.Net.Analysis.Core.LetterTokenizer

A LetterTokenizer is a tokenizer that divides text at non-letters. That is to say, it defines tokens as maximal strings of adjacent letters, as defined by the java.lang.Character.isLetter() predicate.

Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces: an unbroken run of, for example, Chinese characters is emitted as a single token.
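The splitting rule can be illustrated with a small self-contained sketch. This is not Lucene.Net code: the class LetterSplitSketch and its Tokenize method are hypothetical, and .NET's char.IsLetter is used as a stand-in for Character.isLetter (assumed to agree on the usual Unicode letter categories). It also omits details of the real tokenizer such as offsets, normalization and a maximum token length.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Lucene.Net: approximates LetterTokenizer's
// splitting rule using .NET's char.IsLetter instead of Character.isLetter.
public static class LetterSplitSketch
{
    // Returns the maximal runs of adjacent letters in text, in order.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            // Step over a whole code point so supplementary letters
            // (surrogate pairs) are tested as one character.
            int width = char.IsSurrogatePair(text, i) ? 2 : 1;
            if (char.IsLetter(text, i))
            {
                current.Append(text, i, width);
            }
            else if (current.Length > 0)
            {
                // Any non-letter (space, punctuation, digit) ends the token.
                tokens.Add(current.ToString());
                current.Clear();
            }
            i += width;
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

With this sketch, Tokenize("Grüße aus Köln") returns three tokens, whereas Tokenize("我爱北京") returns the whole unspaced sentence as one token, which is the weakness described in the note on Asian languages.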

You must specify the required LuceneVersion compatibility when creating a LetterTokenizer.

Public methods

Method Description
LetterTokenizer ( LuceneVersion matchVersion, Lucene.Net.Util.AttributeSource factory, TextReader @in )

Construct a new LetterTokenizer using a given org.apache.lucene.util.AttributeSource.AttributeFactory.

LetterTokenizer ( LuceneVersion matchVersion, TextReader @in )

Construct a new LetterTokenizer.

Protected methods

Method Description
IsTokenChar ( int c ) : bool

Collects only characters which satisfy Character#isLetter(int).

Method Details

IsTokenChar() protected method

Collects only characters which satisfy Character#isLetter(int).
protected IsTokenChar ( int c ) : bool
c int: the Unicode code point to test
Returns bool
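Because IsTokenChar receives an int code point rather than a single UTF-16 char, letters outside the Basic Multilingual Plane can be recognized as a whole. The sketch below illustrates the difference with a hypothetical CodePointLetterSketch class (not part of Lucene.Net), using System.Text.Rune in place of Character.isLetter(int); it assumes .NET Core 3.0 or later, where Rune is available.

```csharp
using System.Text;

// Hypothetical helper, not part of Lucene.Net: a code-point letter test
// analogous to the check LetterTokenizer.IsTokenChar(int c) performs.
public static class CodePointLetterSketch
{
    // c is a Unicode code point (scalar value), not a UTF-16 code unit.
    // Invalid values such as lone surrogates are treated as non-letters.
    public static bool IsTokenChar(int c) =>
        Rune.IsValid(c) && Rune.IsLetter(new Rune(c));

    // Assumes s holds exactly one code point. Returns whether that code point
    // is a letter, and whether any of its individual UTF-16 units is a letter.
    public static (bool WholeCodePoint, bool AnyCodeUnit) Compare(string s)
    {
        int codePoint = char.ConvertToUtf32(s, 0);
        bool anyCodeUnit = false;
        foreach (char unit in s)
        {
            anyCodeUnit |= char.IsLetter(unit);
        }
        return (IsTokenChar(codePoint), anyCodeUnit);
    }
}
```

For U+1D400 (MATHEMATICAL BOLD CAPITAL A), Compare("\U0001D400") reports that the code point is a letter while neither of its two surrogate code units is, which is why a char-at-a-time test would split or drop such letters.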

LetterTokenizer() public constructor

Construct a new LetterTokenizer using a given org.apache.lucene.util.AttributeSource.AttributeFactory.
public LetterTokenizer ( LuceneVersion matchVersion, Lucene.Net.Util.AttributeSource factory, TextReader @in )
matchVersion LuceneVersion: the Lucene version to match; see the compatibility note above
factory Lucene.Net.Util.AttributeSource: the attribute factory to use for this tokenizer
@in System.IO.TextReader: the input to split up into tokens

LetterTokenizer() public constructor

Construct a new LetterTokenizer.
public LetterTokenizer ( LuceneVersion matchVersion, TextReader @in )
matchVersion LuceneVersion: the Lucene version to match; see the compatibility note above
@in System.IO.TextReader: the input to split up into tokens