C# Class Lucene.Net.Analysis.MockTokenizer

Tokenizer for testing.

This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a good idea to test it by wrapping this tokenizer instead, for extra checks. This tokenizer has the following behavior:

  • An internal state machine checks that the consumer calls the tokenizer consistently. These checks can be disabled with #setEnableChecks(boolean).
  • For convenience, optionally lowercases terms that it outputs.
Inheritance: Tokenizer
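
The consumer-consistency checks are easiest to understand next to a consumer that follows the usual token stream call order (Reset, IncrementToken until it returns false, End, Dispose). The following is a minimal sketch, not the project's own test code. It assumes the Lucene.Net test framework assembly is referenced, that term text is read through ICharTermAttribute obtained via AddAttribute, and that the attribute namespace is spelled TokenAttributes (older builds may spell it Tokenattributes). The class name MockTokenizerWorkflowSketch is hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
// Assumption: namespace spelling varies between builds (TokenAttributes vs. Tokenattributes).
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical class name, used only for this illustration.
public static class MockTokenizerWorkflowSketch
{
    public static void Main()
    {
        // The single-argument constructor is documented as equivalent to
        // MockTokenizer(input, WHITESPACE, true), i.e. lowercasing is on.
        var tokenizer = new MockTokenizer(new StringReader("Hello Mock Tokenizer"));

        // Assumption: the term text is exposed through ICharTermAttribute.
        var term = tokenizer.AddAttribute<ICharTermAttribute>();

        try
        {
            tokenizer.Reset();                  // must come before the first IncrementToken()
            while (tokenizer.IncrementToken())  // advance until the stream is exhausted
            {
                Console.WriteLine(term.ToString());
            }
            tokenizer.End();                    // signal end of stream after the loop
        }
        finally
        {
            tokenizer.Dispose();                // release the underlying reader
        }
    }
}
```

Calling these methods in a different order (for example, IncrementToken before Reset) is the kind of consumer mistake the internal state machine is meant to catch.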
Project: apache/lucenenet

Public Properties

Property                  Type                   Description
DEFAULT_MAX_TOKEN_LENGTH  int
KEYWORD                   CharacterRunAutomaton  Acts similar to KeywordTokenizer.
SIMPLE                    CharacterRunAutomaton  Acts like LetterTokenizer.
WHITESPACE                CharacterRunAutomaton  Acts similar to WhitespaceTokenizer.

Public Methods

Method Description
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
MockTokenizer ( AttributeFactory factory, TextReader input )

Calls MockTokenizer(AttributeFactory, TextReader, WHITESPACE, true).

MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
MockTokenizer ( TextReader input )

Calls MockTokenizer(TextReader, WHITESPACE, true).

MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
Reset ( ) : void

Protected Methods

Method Description
IsTokenChar ( int c ) : bool
Normalize ( int c ) : int
ReadChar ( ) : int
ReadCodePoint ( ) : int

Private Methods

Method Description
SetReaderTestPoint ( ) : bool

Method Details

Dispose() public method

public Dispose ( ) : void
return void

End() public method

public End ( ) : void
return void

IncrementToken() public sealed method

public sealed IncrementToken ( ) : bool
return bool

IsTokenChar() protected method

protected IsTokenChar ( int c ) : bool
c int
return bool

MockTokenizer() public method

Calls MockTokenizer(AttributeFactory, TextReader, WHITESPACE, true).
public MockTokenizer ( AttributeFactory factory, TextReader input )
factory AttributeFactory
input System.IO.TextReader

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
factory AttributeFactory
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool

MockTokenizer() public method

public MockTokenizer ( AttributeFactory factory, TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
factory AttributeFactory
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
maxTokenLength int

MockTokenizer() public method

Calls MockTokenizer(TextReader, WHITESPACE, true).
public MockTokenizer ( TextReader input )
input System.IO.TextReader

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase )
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool

MockTokenizer() public method

public MockTokenizer ( TextReader input, CharacterRunAutomaton runAutomaton, bool lowerCase, int maxTokenLength )
input System.IO.TextReader
runAutomaton CharacterRunAutomaton
lowerCase bool
maxTokenLength int

Normalize() protected method

protected Normalize ( int c ) : int
c int
return int

ReadChar() protected method

protected ReadChar ( ) : int
return int

ReadCodePoint() protected method

protected ReadCodePoint ( ) : int
return int

Reset() public method

public Reset ( ) : void
return void

Property Details

DEFAULT_MAX_TOKEN_LENGTH public static property

public static int DEFAULT_MAX_TOKEN_LENGTH
return int

KEYWORD public static property

Acts similar to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...
public static CharacterRunAutomaton KEYWORD
return CharacterRunAutomaton

SIMPLE public static property

Acts like LetterTokenizer.
public static CharacterRunAutomaton SIMPLE
return CharacterRunAutomaton

WHITESPACE public static property

Acts similar to WhitespaceTokenizer.
public static CharacterRunAutomaton WHITESPACE
return CharacterRunAutomaton
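
To compare the three automata described above, one option is to run the same input through each and inspect the tokens. This is a hedged sketch: the helper Tokenize and the class name are hypothetical, the namespaces Lucene.Net.Util.Automaton and Lucene.Net.Analysis.TokenAttributes are assumptions that may differ between builds, and no output is shown because it depends on the build in use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
// Assumptions: namespace names below may differ in older builds.
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util.Automaton;

// Hypothetical class name, used only for this illustration.
public static class MockTokenizerAutomatonSketch
{
    // Hypothetical helper: collects the terms produced for one automaton.
    private static List<string> Tokenize(string text, CharacterRunAutomaton automaton)
    {
        var tokens = new List<string>();
        // lowerCase = false so that differences come only from the automaton.
        var tokenizer = new MockTokenizer(new StringReader(text), automaton, false);
        var term = tokenizer.AddAttribute<ICharTermAttribute>();
        try
        {
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();
        }
        finally
        {
            tokenizer.Dispose();
        }
        return tokens;
    }

    public static void Main()
    {
        const string text = "Hello, World";

        // WHITESPACE is documented as similar to WhitespaceTokenizer,
        // SIMPLE as like LetterTokenizer, KEYWORD as similar to KeywordTokenizer.
        Console.WriteLine("WHITESPACE: " + string.Join(" | ", Tokenize(text, MockTokenizer.WHITESPACE)));
        Console.WriteLine("SIMPLE:     " + string.Join(" | ", Tokenize(text, MockTokenizer.SIMPLE)));
        Console.WriteLine("KEYWORD:    " + string.Join(" | ", Tokenize(text, MockTokenizer.KEYWORD)));
    }
}
```

Passing lowerCase explicitly keeps the comparison focused on how each automaton splits the input rather than on case normalization.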