C# Class Lucene.Net.Analysis.MockTokenizer

Tokenizer for testing.

This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a great idea to test it by wrapping this tokenizer instead, for extra checks. This tokenizer has the following behavior:

An internal state machine is used for checking consumer consistency. These checks can be disabled with setEnableChecks(boolean), as the Java-derived documentation names it.
For convenience, optionally lowercases terms that it outputs.
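
The two behaviors above can be sketched with a small, self-contained class. This is a hedged illustration and not the Lucene.Net implementation: the class name CheckedWhitespaceTokenizer, its State enum, and the Term property are hypothetical, and the sketch throws InvalidOperationException where the real class may report violations differently. It assumes the consumer order Reset(), IncrementToken() until it returns false, End(), Dispose().

```csharp
using System;

// Hypothetical illustration only: NOT the Lucene.Net MockTokenizer source.
// Splits on whitespace, optionally lowercases, and checks the call order.
public sealed class CheckedWhitespaceTokenizer
{
    // Assumed consumer workflow states for this sketch.
    private enum State { Created, Reset, Exhausted, Ended, Disposed }

    private readonly string text;
    private readonly bool lowerCase;
    private State state = State.Created;
    private int pos;

    // The most recently produced term.
    public string Term { get; private set; } = "";

    public CheckedWhitespaceTokenizer(string text, bool lowerCase)
    {
        this.text = text ?? throw new ArgumentNullException(nameof(text));
        this.lowerCase = lowerCase;
    }

    public void Reset()
    {
        Require(state == State.Created, "Reset");
        pos = 0;
        state = State.Reset;
    }

    public bool IncrementToken()
    {
        Require(state == State.Reset, "IncrementToken");

        // Skip separators between tokens.
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
        if (pos == text.Length)
        {
            state = State.Exhausted;
            return false;
        }

        // Consume one run of non-whitespace characters.
        int start = pos;
        while (pos < text.Length && !char.IsWhiteSpace(text[pos])) pos++;
        string term = text.Substring(start, pos - start);
        Term = lowerCase ? term.ToLowerInvariant() : term;
        return true;
    }

    public void End()
    {
        Require(state == State.Exhausted, "End");
        state = State.Ended;
    }

    public void Dispose()
    {
        Require(state == State.Ended, "Dispose");
        state = State.Disposed;
    }

    // Fails loudly when a method is called out of order.
    private void Require(bool ok, string call)
    {
        if (!ok)
            throw new InvalidOperationException(call + "() called in wrong state: " + state);
    }
}
```

In this sketch, calling IncrementToken() before Reset() fails immediately, which is the kind of consumer mistake the state machine is meant to surface in tests.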

Inheritance: Tokenizer

Project: apache/lucenenet

Public Properties

Property	Type	Description
DEFAULT_MAX_TOKEN_LENGTH	int
KEYWORD	CharacterRunAutomaton
SIMPLE	CharacterRunAutomaton
WHITESPACE	CharacterRunAutomaton

Public Methods

Method	Description
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
MockTokenizer ( AttributeFactory factory, TextReader input )	Calls MockTokenizer(AttributeFactory, TextReader, WHITESPACE, true)
MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
MockTokenizer ( TextReader input )	Calls MockTokenizer(TextReader, WHITESPACE, true)
MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
Reset ( ) : void

Protected Methods

Method	Description
IsTokenChar ( int c ) : bool
Normalize ( int c ) : int
ReadChar ( ) : int
ReadCodePoint ( ) : int

Private Methods

Method	Description
SetReaderTestPoint ( ) : bool

Method Details

Dispose() public method

public Dispose ( ) : void
Returns	void

End() public method

public End ( ) : void
Returns	void

IncrementToken() public sealed method

public sealed IncrementToken ( ) : bool
Returns	bool

IsTokenChar() protected method

protected IsTokenChar ( int c ) : bool
c	int
Returns	bool

MockTokenizer() public method

Calls MockTokenizer(AttributeFactory, TextReader, WHITESPACE, true).

public MockTokenizer ( AttributeFactory factory, TextReader input )
factory	AttributeFactory
input	System.IO.TextReader

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
factory	AttributeFactory
input	System.IO.TextReader
runAutomaton	CharacterRunAutomaton
lowerCase	bool

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
factory	AttributeFactory
input	System.IO.TextReader
runAutomaton	CharacterRunAutomaton
lowerCase	bool
maxTokenLength	int

MockTokenizer() public method

Calls MockTokenizer(TextReader, WHITESPACE, true).

public MockTokenizer ( TextReader input )
input	System.IO.TextReader

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
input	System.IO.TextReader
runAutomaton	CharacterRunAutomaton
lowerCase	bool

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
input	System.IO.TextReader
runAutomaton	CharacterRunAutomaton
lowerCase	bool
maxTokenLength	int

Normalize() protected method

protected Normalize ( int c ) : int
c	int
Returns	int

ReadChar() protected method

protected ReadChar ( ) : int
Returns	int

ReadCodePoint() protected method

protected ReadCodePoint ( ) : int
Returns	int

Reset() public method

public Reset ( ) : void
Returns	void

Property Details

DEFAULT_MAX_TOKEN_LENGTH public static property

public static int DEFAULT_MAX_TOKEN_LENGTH
Returns	int

KEYWORD public static property

Acts similarly to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...

public static CharacterRunAutomaton KEYWORD
Returns	CharacterRunAutomaton

SIMPLE public static property

Acts like LetterTokenizer.

public static CharacterRunAutomaton SIMPLE
Returns	CharacterRunAutomaton

WHITESPACE public static property

Acts similarly to WhitespaceTokenizer.

public static CharacterRunAutomaton WHITESPACE
Returns	CharacterRunAutomaton
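
To make the difference between the three automata concrete, the following self-contained sketch approximates each mode with a plain character predicate. It is an assumption-laden illustration, not the Lucene.Net code: TokenModeSketch and its methods are hypothetical names, CharacterRunAutomaton is not used, lowercasing is left out, and the empty-reader edge case mentioned in the KEYWORD TODO is not modeled (empty input simply yields no tokens here).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: approximates the three modes with predicates.
public static class TokenModeSketch
{
    // WHITESPACE-like: a token is a run of non-whitespace characters.
    public static List<string> Whitespace(string text)
    {
        return Split(text, c => !char.IsWhiteSpace(c));
    }

    // SIMPLE-like: a token is a run of letters.
    public static List<string> Simple(string text)
    {
        return Split(text, c => char.IsLetter(c));
    }

    // KEYWORD-like: the whole input is a single token.
    public static List<string> Keyword(string text)
    {
        return Split(text, c => true);
    }

    // Collects maximal runs of characters accepted by isTokenChar.
    private static List<string> Split(string text, Func<char, bool> isTokenChar)
    {
        var tokens = new List<string>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= text.Length; i++)
        {
            bool inToken = i < text.Length && isTokenChar(text[i]);
            if (inToken && start < 0)
            {
                start = i;
            }
            else if (!inToken && start >= 0)
            {
                tokens.Add(text.Substring(start, i - start));
                start = -1;
            }
        }
        return tokens;
    }
}
```

For the input "Hello, big World", this sketch yields "Hello," / "big" / "World" in the whitespace mode, "Hello" / "big" / "World" in the simple mode, and the unchanged input as one token in the keyword mode.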