C# Class Lucene.Net.Analysis.MockTokenizer

Tokenizer for testing.

This tokenizer is a replacement for the #WHITESPACE, #SIMPLE, and #KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a good idea to test it by wrapping this tokenizer instead, because it performs extra checks. This tokenizer has the following behavior:

  • An internal state machine checks that the consumer calls the tokenizer consistently. These checks can be disabled with #setEnableChecks(boolean).
  • For convenience, it can optionally lowercase the terms that it outputs.
Inheritance: Tokenizer
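
The consumer workflow that the internal state machine checks can be sketched as follows. This is a minimal sketch, not taken from the project's own tests: the class name `MockTokenizerDemo` and the method `Tokenize` are hypothetical, and the `Lucene.Net.Analysis.TokenAttributes` namespace and `ICharTermAttribute` interface are assumed from Lucene.NET 4.8 (this page spells the namespace `Tokenattributes`, so adjust it to the version in use).

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
// Assumption: Lucene.NET 4.8 namespace; older ports may spell it differently.
using Lucene.Net.Analysis.TokenAttributes;

public static class MockTokenizerDemo // hypothetical helper, for illustration only
{
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();

        // Split on whitespace and lowercase each term (lowerCase = true).
        var tokenizer = new MockTokenizer(
            new StringReader(text), MockTokenizer.WHITESPACE, true);
        ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

        // Call order the consumer is expected to follow:
        // Reset, IncrementToken until it returns false, End, Dispose.
        tokenizer.Reset();
        while (tokenizer.IncrementToken())
        {
            terms.Add(termAtt.ToString());
        }
        tokenizer.End();
        tokenizer.Dispose();

        return terms;
    }
}
```

With the checks enabled, skipping `Reset()` or `End()` in a consumer like this one is the kind of mistake the tokenizer is meant to surface during testing.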
Project: apache/lucenenet

Public Properties

Property Type Description
DEFAULT_MAX_TOKEN_LENGTH int
KEYWORD CharacterRunAutomaton
SIMPLE CharacterRunAutomaton
WHITESPACE CharacterRunAutomaton

Public Methods

Method Description
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
MockTokenizer ( AttributeFactory factory, TextReader input ) : Lucene.Net.Analysis.Tokenattributes

Calls MockTokenizer(AttributeFactory, TextReader, CharacterRunAutomaton, bool) with WHITESPACE as the automaton and lowerCase set to true.

MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
MockTokenizer ( TextReader input ) : Lucene.Net.Analysis.Tokenattributes

Calls MockTokenizer(TextReader, CharacterRunAutomaton, bool) with WHITESPACE as the automaton and lowerCase set to true.

MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
Reset ( ) : void

Protected Methods

Method Description
IsTokenChar ( int c ) : bool
Normalize ( int c ) : int
ReadChar ( ) : int
ReadCodePoint ( ) : int

Private Methods

Method Description
SetReaderTestPoint ( ) : bool

Method Details

Dispose() public method

public Dispose ( ) : void
Returns void

End() public method

public End ( ) : void
Returns void

IncrementToken() public final method

public final IncrementToken ( ) : bool
Returns bool

IsTokenChar() protected method

protected IsTokenChar ( int c ) : bool
c int
Returns bool

MockTokenizer() public method

Calls MockTokenizer(AttributeFactory, TextReader, CharacterRunAutomaton, bool) with WHITESPACE as the automaton and lowerCase set to true.
public MockTokenizer ( AttributeFactory factory, TextReader input ) : Lucene.Net.Analysis.Tokenattributes
factory AttributeFactory
input System.IO.TextReader
Returns Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
factory AttributeFactory
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
Returns Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
factory AttributeFactory
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
maxTokenLength int
Returns Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

Calls MockTokenizer(TextReader, CharacterRunAutomaton, bool) with WHITESPACE as the automaton and lowerCase set to true.
public MockTokenizer ( TextReader input ) : Lucene.Net.Analysis.Tokenattributes
input System.IO.TextReader
Returns Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
Returns Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
maxTokenLength int
Returns Lucene.Net.Analysis.Tokenattributes

Normalize() protected method

protected Normalize ( int c ) : int
c int
Returns int

ReadChar() protected method

protected ReadChar ( ) : int
Returns int

ReadCodePoint() protected method

protected ReadCodePoint ( ) : int
Returns int

Reset() public method

public Reset ( ) : void
Returns void

Property Details

DEFAULT_MAX_TOKEN_LENGTH public static property

public static int DEFAULT_MAX_TOKEN_LENGTH
Returns int

KEYWORD public static property

Acts similarly to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...
public static CharacterRunAutomaton KEYWORD
Returns CharacterRunAutomaton

SIMPLE public static property

Acts like LetterTokenizer.
public static CharacterRunAutomaton SIMPLE
Returns CharacterRunAutomaton

WHITESPACE public static property

Acts similarly to WhitespaceTokenizer.
public static CharacterRunAutomaton WHITESPACE
Returns CharacterRunAutomaton
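
To compare the three automata described above, one option is to run the same input through each of them. The following is a rough sketch under assumptions: `AutomatonComparison` and its `Tokenize` method are hypothetical names, and the `Lucene.Net.Analysis.TokenAttributes` and `Lucene.Net.Util.Automaton` namespaces are assumed from Lucene.NET 4.8 and may differ in the version documented here.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
// Assumptions: namespaces as found in Lucene.NET 4.8.
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util.Automaton;

public static class AutomatonComparison // hypothetical helper, for illustration only
{
    // Tokenizes text with the given automaton, leaving case untouched.
    public static List<string> Tokenize(string text, CharacterRunAutomaton automaton)
    {
        var terms = new List<string>();
        var tokenizer = new MockTokenizer(new StringReader(text), automaton, false);
        ICharTermAttribute termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

        tokenizer.Reset();
        while (tokenizer.IncrementToken())
        {
            terms.Add(termAtt.ToString());
        }
        tokenizer.End();
        tokenizer.Dispose();

        return terms;
    }
}

// Usage (inside a test method or similar):
//   var keyword    = AutomatonComparison.Tokenize("Hello, World 42", MockTokenizer.KEYWORD);
//   var simple     = AutomatonComparison.Tokenize("Hello, World 42", MockTokenizer.SIMPLE);
//   var whitespace = AutomatonComparison.Tokenize("Hello, World 42", MockTokenizer.WHITESPACE);
```

Going by the descriptions above, KEYWORD should keep the whole input as one token, SIMPLE should keep only runs of letters, and WHITESPACE should split on whitespace while keeping punctuation and digits inside the tokens.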