C# Class Lucene.Net.Analysis.Pattern.PatternTokenizerFactory

Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

  • "pattern" is the regular expression.
  • "group" selects which regex group of each match is emitted as a token.

group=-1 (the default) is equivalent to "split": the pattern acts as a delimiter, and the tokens are equivalent to the output of Java's String#split(java.lang.String), except that empty tokens are dropped.
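The split behaviour can be approximated in plain .NET as follows. This is an illustrative sketch, not the Lucene.Net source: `SplitModeSketch` and `Split` are hypothetical names. It walks `Regex.Matches` by hand instead of calling `Regex.Split`, because .NET's `Regex.Split` also returns the text captured by any parenthesized groups in the pattern, which Java's `String#split` does not.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper approximating group = -1 ("split") mode:
// the pattern acts as a delimiter and the text between matches becomes tokens.
public static class SplitModeSketch
{
    public static List<string> Split(string input, Regex pattern)
    {
        var tokens = new List<string>();
        int segmentStart = 0; // start of the text not yet consumed by a match

        foreach (Match m in pattern.Matches(input))
        {
            // Emit the text between the previous match and this one,
            // skipping it when it is empty (e.g. two adjacent delimiters).
            if (m.Index > segmentStart)
                tokens.Add(input.Substring(segmentStart, m.Index - segmentStart));
            segmentStart = m.Index + m.Length;
        }

        // Emit any trailing text after the last delimiter.
        if (segmentStart < input.Length)
            tokens.Add(input.Substring(segmentStart));

        return tokens;
    }
}
```

For example, `SplitModeSketch.Split(",a,,b,", new Regex(","))` yields only `a` and `b`, whereas Java's `",a,,b,".split(",")` returns `["", "a", "", "b"]`, keeping the leading and interior empty strings.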

Setting group >= 0 emits the specified group of each match as a token. For example, given:

  pattern = \'([^\']+)\'
  group   = 0
  input   = aaa 'bbb' 'ccc'
the output is two tokens, 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output is bbb and ccc (without the ' marks).
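The group selection can be sketched with `System.Text.RegularExpressions` alone. `GroupModeSketch` and `Extract` are hypothetical names used for illustration, not Lucene.Net API. Group 0 is the whole match, and 1, 2, ... are the parenthesized groups. The sketch also skips groups that matched nothing, in line with the rule that zero-length tokens are never emitted.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper approximating group >= 0 mode:
// each match of the pattern yields one token, taken from the chosen group.
public static class GroupModeSketch
{
    public static List<string> Extract(string input, Regex pattern, int group)
    {
        var tokens = new List<string>();
        foreach (Match m in pattern.Matches(input))
        {
            // Group 0 is the whole match; 1, 2, ... are capturing groups.
            Group g = m.Groups[group];
            // Skip groups that did not participate or matched the empty string.
            if (g.Success && g.Length > 0)
                tokens.Add(g.Value);
        }
        return tokens;
    }
}
```

With the pattern `'([^']+)'` and the input `aaa 'bbb' 'ccc'`, `Extract(..., 0)` returns `'bbb'` and `'ccc'`, while `Extract(..., 1)` returns `bbb` and `ccc`.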

NOTE: This Tokenizer does not output tokens that are of zero length.

 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>
Inheritance: Lucene.Net.Analysis.Util.TokenizerFactory
Project: apache/lucenenet

Protected Properties

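Property   Type                                   Description
group      int                                    Index of the regex group emitted as tokens (-1 = split mode)
pattern    System.Text.RegularExpressions.Regex   The regular expression used to match or split the input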

Public Methods

Method Description
Create ( Lucene.Net.Util.AttributeSource factory, TextReader input ) : Tokenizer

Splits the input using the configured pattern.

PatternTokenizerFactory ( IDictionary<string, string> args )

Creates a new PatternTokenizerFactory

Method Details

Create() public method

Splits the input using the configured pattern.
public Create ( Lucene.Net.Util.AttributeSource factory, TextReader input ) : Tokenizer
factory Lucene.Net.Util.AttributeSource
input System.IO.TextReader
return Tokenizer
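A real Lucene.Net `Tokenizer` exposes each token through attributes as it is consumed. The following sketch instead collects tokens, with their character offsets, into a list so that the combined behaviour of both modes is easy to inspect. `SketchToken` and `PatternTokenizeSketch` are hypothetical names. The sketch assumes the whole `TextReader` is buffered before matching, which is how the Java `PatternTokenizer` works, and it reports offsets as plain indices into that buffer without any character-filter offset correction.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical token record: term text plus character offsets in the input.
public readonly struct SketchToken
{
    public SketchToken(string term, int start, int end) { Term = term; Start = start; End = end; }
    public string Term { get; }
    public int Start { get; } // inclusive
    public int End { get; }   // exclusive
}

public static class PatternTokenizeSketch
{
    // Assumption: the entire reader is read into memory before matching.
    public static List<SketchToken> Tokenize(TextReader input, Regex pattern, int group)
    {
        string text = input.ReadToEnd();
        var tokens = new List<SketchToken>();

        if (group < 0)
        {
            // Split mode: tokens are the non-empty gaps between matches.
            int start = 0;
            foreach (Match m in pattern.Matches(text))
            {
                if (m.Index > start)
                    tokens.Add(new SketchToken(text.Substring(start, m.Index - start), start, m.Index));
                start = m.Index + m.Length;
            }
            if (start < text.Length)
                tokens.Add(new SketchToken(text.Substring(start), start, text.Length));
        }
        else
        {
            // Group mode: one token per match, taken from the chosen group;
            // zero-length or non-participating groups are skipped.
            foreach (Match m in pattern.Matches(text))
            {
                Group g = m.Groups[group];
                if (g.Success && g.Length > 0)
                    tokens.Add(new SketchToken(g.Value, g.Index, g.Index + g.Length));
            }
        }
        return tokens;
    }
}
```

For the input `aaa 'bbb' 'ccc'` with pattern `'([^']+)'` and group 1, the sketch yields `bbb` at offsets 5 to 8 and `ccc` at offsets 11 to 14.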

PatternTokenizerFactory() public method

Creates a new PatternTokenizerFactory
public PatternTokenizerFactory ( IDictionary<string, string> args )
args System.Collections.Generic.IDictionary<string, string>
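This page does not describe how the constructor consumes `args`. The following is a hypothetical sketch of how the two documented keys could be handled. `PatternArgsSketch` is illustrative and not part of Lucene.Net. "pattern" is treated as required, and "group" defaults to -1 as documented above. Rejecting leftover keys is an assumption modelled on the Java factory. The real factory's base class may also consume shared keys such as `luceneMatchVersion`, which this sketch ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical sketch of consuming the factory's args dictionary.
public sealed class PatternArgsSketch
{
    public Regex Pattern { get; }
    public int GroupIndex { get; }

    public PatternArgsSketch(IDictionary<string, string> args)
    {
        // "pattern" is required.
        if (!args.TryGetValue("pattern", out string patternText))
            throw new ArgumentException("missing required parameter: pattern");
        args.Remove("pattern");
        Pattern = new Regex(patternText);

        // "group" is optional and defaults to -1 (split mode).
        GroupIndex = -1;
        if (args.TryGetValue("group", out string groupText))
        {
            GroupIndex = int.Parse(groupText, CultureInfo.InvariantCulture);
            args.Remove("group");
        }

        // Assumption: any keys left over are configuration errors.
        if (args.Count > 0)
            throw new ArgumentException("unknown parameters: " + string.Join(", ", args.Keys));
    }
}
```

Removing each key as it is read makes the final check simple: whatever remains in the dictionary was not recognized.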

Property Details

group protected property

protected int group
return int

pattern protected property

protected System.Text.RegularExpressions.Regex pattern
return System.Text.RegularExpressions.Regex