C# Class Lucene.Net.Analysis.Pattern.PatternTokenizer

This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

  • "pattern" is the regular expression.
  • "group" says which group to extract into tokens.

group=-1 (the default) is equivalent to "split". In this case, the tokens are equivalent to the output of Java's String#split(java.lang.String), with empty tokens removed.

Using group >= 0 selects the matching group as the token. For example, if you have:

 pattern = \'([^\']+)\'
 group = 0
 input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).

NOTE: This Tokenizer does not output tokens that are of zero length.
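
The group semantics described above can be sketched with plain System.Text.RegularExpressions, without Lucene.Net. This is a minimal illustration, not the tokenizer's actual implementation: the class PatternTokenSketch and its Tokenize method are hypothetical names introduced here, and the sketch works on a string rather than a TextReader. It shows split mode (group = -1), group extraction (group >= 0), and the dropping of zero-length tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net: mimics the documented
// "group" semantics using only the .NET regex classes.
public static class PatternTokenSketch
{
    public static List<string> Tokenize(string input, Regex pattern, int group)
    {
        var tokens = new List<string>();

        if (group < 0)
        {
            // Split mode: the text between matches becomes the tokens.
            int start = 0;
            foreach (Match m in pattern.Matches(input))
            {
                if (m.Index > start)
                {
                    tokens.Add(input.Substring(start, m.Index - start));
                }
                start = m.Index + m.Length;
            }
            // Trailing text after the last match, if any.
            if (start < input.Length)
            {
                tokens.Add(input.Substring(start));
            }
        }
        else
        {
            // Group mode: each match contributes the selected group.
            foreach (Match m in pattern.Matches(input))
            {
                string value = m.Groups[group].Value;
                if (value.Length > 0) // zero-length tokens are never emitted
                {
                    tokens.Add(value);
                }
            }
        }

        return tokens;
    }

    public static void Main()
    {
        var quoted = new Regex(@"'([^']+)'");
        string input = "aaa 'bbb' 'ccc'";

        // group = 0: the whole match, quotes included
        Console.WriteLine(string.Join(" | ", Tokenize(input, quoted, 0))); // 'bbb' | 'ccc'

        // group = 1: the first capture group, quotes stripped
        Console.WriteLine(string.Join(" | ", Tokenize(input, quoted, 1))); // bbb | ccc

        // group = -1: split on the pattern, empty tokens dropped
        Console.WriteLine(string.Join(" | ", Tokenize("a, b,,c", new Regex(@",\s*"), -1))); // a | b | c
    }
}
```

The split branch walks the matches by hand instead of calling Regex.Split, because Regex.Split also returns the text of any capture groups in the pattern, which would not match the behavior described above.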

Inheritance: Tokenizer
Project: apache/lucenenet

Public Methods

Method Description
End ( ) : void
IncrementToken ( ) : bool
PatternTokenizer ( AttributeFactory factory, TextReader input, Regex pattern, int group )

Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).

PatternTokenizer ( TextReader input, Regex pattern, int group )

Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).

Reset ( ) : void

Private Methods

Method Description
FillBuffer ( StringBuilder sb, TextReader input ) : void

Method Details

End() public method

public End ( ) : void
return void

IncrementToken() public method

public IncrementToken ( ) : bool
return bool

PatternTokenizer() public method

Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).
public PatternTokenizer ( AttributeFactory factory, TextReader input, Regex pattern, int group )
factory AttributeFactory
input TextReader
pattern System.Text.RegularExpressions.Regex
group int

PatternTokenizer() public method

Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).
public PatternTokenizer ( TextReader input, Regex pattern, int group )
input TextReader
pattern System.Text.RegularExpressions.Regex
group int

Reset() public method

public Reset ( ) : void
return void