C# Class Lucene.Net.Analysis.MockTokenizer

Tokenizer for testing.

This tokenizer is a replacement for the #WHITESPACE, #SIMPLE, and #KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it by wrapping this tokenizer instead, for extra checks. This tokenizer has the following behavior:

  • An internal state machine checks that the consumer calls the tokenizer consistently. These checks can be disabled with #setEnableChecks(boolean).
  • For convenience, optionally lowercases terms that it outputs.

Inheritance: Tokenizer
Project: apache/lucenenet
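
The consumer-consistency checks mentioned above concern the order in which a caller uses the public methods listed on this page. As a minimal sketch, assuming the usual TokenStream workflow (Reset, then IncrementToken until it returns false, then End, then Dispose), a consumer might look like the following. The helper name `Tokenize` and the class `MockTokenizerWorkflow` are hypothetical, and the `Lucene.Net.Analysis.TokenAttributes` namespace is an assumption based on newer Lucene.NET releases; older ports spell it `Tokenattributes`, as this page does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
// Assumption: namespace casing of newer Lucene.NET releases;
// older ports use Lucene.Net.Analysis.Tokenattributes.
using Lucene.Net.Analysis.TokenAttributes;

public static class MockTokenizerWorkflow
{
    // Hypothetical helper: collects every term MockTokenizer emits for `text`.
    public static List<string> Tokenize(string text)
    {
        var terms = new List<string>();

        // The single-argument constructor delegates to (input, WHITESPACE, true),
        // so terms are split on whitespace and lowercased.
        var tokenizer = new MockTokenizer(new StringReader(text));
        var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

        tokenizer.Reset();                    // must come before the first IncrementToken
        while (tokenizer.IncrementToken())    // returns false once the input is exhausted
        {
            terms.Add(termAtt.ToString());
        }
        tokenizer.End();                      // signals end of stream
        tokenizer.Dispose();                  // releases the underlying reader

        return terms;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Tokenize("Hello Mock Tokenizer")));
    }
}
```

Calling these methods out of order (for example, IncrementToken before Reset) is the kind of mistake the internal state machine is meant to catch while the checks are enabled.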

Public Properties

Property Type Description
DEFAULT_MAX_TOKEN_LENGTH int
KEYWORD CharacterRunAutomaton
SIMPLE CharacterRunAutomaton
WHITESPACE CharacterRunAutomaton

Public Methods

Method Description
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
MockTokenizer ( AttributeFactory factory, TextReader input ) : Lucene.Net.Analysis.Tokenattributes

Calls MockTokenizer(factory, input, WHITESPACE, true).

MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
MockTokenizer ( TextReader input ) : Lucene.Net.Analysis.Tokenattributes

Calls MockTokenizer(input, WHITESPACE, true).

MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
Reset ( ) : void

Protected Methods

Method Description
IsTokenChar ( int c ) : bool
Normalize ( int c ) : int
ReadChar ( ) : int
ReadCodePoint ( ) : int

Private Methods

Method Description
SetReaderTestPoint ( ) : bool

Method Details

Dispose() public method

public Dispose ( ) : void
return void

End() public method

public End ( ) : void
return void

IncrementToken() public sealed method

public sealed IncrementToken ( ) : bool
return bool

IsTokenChar() protected method

protected IsTokenChar ( int c ) : bool
c int
return bool

MockTokenizer() public method

Calls MockTokenizer(factory, input, WHITESPACE, true).
public MockTokenizer ( AttributeFactory factory, TextReader input ) : Lucene.Net.Analysis.Tokenattributes
factory AttributeFactory
input System.IO.TextReader
return Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
factory AttributeFactory
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
return Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
factory AttributeFactory
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
maxTokenLength int
return Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

Calls MockTokenizer(input, WHITESPACE, true).
public MockTokenizer ( TextReader input ) : Lucene.Net.Analysis.Tokenattributes
input System.IO.TextReader
return Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase ) : Lucene.Net.Analysis.Tokenattributes
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
return Lucene.Net.Analysis.Tokenattributes

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength ) : Lucene.Net.Analysis.Tokenattributes
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
maxTokenLength int
return Lucene.Net.Analysis.Tokenattributes

Normalize() protected method

protected Normalize ( int c ) : int
c int
return int

ReadChar() protected method

protected ReadChar ( ) : int
return int

ReadCodePoint() protected method

protected ReadCodePoint ( ) : int
return int

Reset() public method

public Reset ( ) : void
return void

Property Details

DEFAULT_MAX_TOKEN_LENGTH public static property

public static int DEFAULT_MAX_TOKEN_LENGTH
return int

KEYWORD public static property

Acts similar to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...
public static CharacterRunAutomaton KEYWORD
return CharacterRunAutomaton

SIMPLE public static property

Acts like LetterTokenizer.
public static CharacterRunAutomaton SIMPLE
return CharacterRunAutomaton

WHITESPACE public static property

Acts similar to WhitespaceTokenizer.
public static CharacterRunAutomaton WHITESPACE
return CharacterRunAutomaton
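
To illustrate how the three automata above select a tokenization mode, here is a rough sketch that passes each one to the (TextReader, CharacterRunAutomaton, bool) constructor. The class `MockTokenizerModes` and helper `Tokenize` are hypothetical names, and the namespaces `Lucene.Net.Analysis.TokenAttributes` and `Lucene.Net.Util.Automaton` are assumptions taken from newer Lucene.NET releases; adjust them to the version you build against.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes; // assumption: "Tokenattributes" in older ports
using Lucene.Net.Util.Automaton;           // assumption: home of CharacterRunAutomaton

public static class MockTokenizerModes
{
    // Hypothetical helper: tokenizes `text` with the given automaton and casing option.
    public static List<string> Tokenize(string text, CharacterRunAutomaton mode, bool lowerCase)
    {
        var terms = new List<string>();
        var tokenizer = new MockTokenizer(new StringReader(text), mode, lowerCase);
        var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();

        tokenizer.Reset();
        while (tokenizer.IncrementToken())
        {
            terms.Add(termAtt.ToString());
        }
        tokenizer.End();
        tokenizer.Dispose();

        return terms;
    }

    public static void Main()
    {
        const string text = "Quick brown-fox";

        // WHITESPACE: split on whitespace, similar to WhitespaceTokenizer.
        Console.WriteLine(string.Join("|", Tokenize(text, MockTokenizer.WHITESPACE, false)));

        // SIMPLE: runs of letters, like LetterTokenizer.
        Console.WriteLine(string.Join("|", Tokenize(text, MockTokenizer.SIMPLE, true)));

        // KEYWORD: the whole input as one token, similar to KeywordTokenizer.
        Console.WriteLine(string.Join("|", Tokenize(text, MockTokenizer.KEYWORD, false)));
    }
}
```

Passing the automaton explicitly keeps a test independent of the default (WHITESPACE with lowercasing), which can make the intent of a TokenFilter test easier to read.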