Lucene.Net.Analysis.Pattern Namespace (C#)

Classes

Name	Description
PatternCaptureGroupFilterFactory	Factory for PatternCaptureGroupTokenFilter. Example Solr field type: <fieldType name="text_ptncapturegroup" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.PatternCaptureGroupFilterFactory" pattern="([^a-z])" preserve_original="true"/> </analyzer> </fieldType>
PatternCaptureGroupTokenFilter	CaptureGroup uses regular expressions to emit multiple tokens, one for each capture group in one or more patterns. For example, the pattern `"(https?://([a-zA-Z\-_0-9.]+))"` matched against the string "http://www.foo.com/index" returns the tokens "http://www.foo.com" and "www.foo.com". If none of the patterns match, or if preserveOriginal is true, the original token is preserved. Each pattern is matched as often as it can be, so the pattern `"(...)"` matched against `"abcdefghi"` produces `["abc","def","ghi"]`. A filter that splits camelCase words could be written with the patterns `"([A-Z]{2,})", "(?<![A-Z])([A-Z][a-z]+)", "(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)", "([0-9]+)"`; applied to "camelCaseFilter", these yield "camel", "Case" and "Filter", and if preserveOriginal is true the filter also returns `"camelCaseFilter"`.
PatternReplaceCharFilter	A CharFilter that replaces text matching a regular expression with a replacement string. Pattern matching is done within each "block" of the character stream. Example 1: source="aa bb aa bb", pattern="(aa)\\s+(bb)", replacement="$1#$2" gives output="aa#bb aa#bb". NOTE: if the replacement produces text whose length differs from the source, and the field is used to highlight a term inside that text, the highlighted region may be wrong. Example 2: source="aa123bb", pattern="(aa)\\d+(bb)", replacement="$1 $2" gives output="aa bb"; if you then search for bb and highlight it, you get the snippet "aa1<em>23bb</em>". Since Solr 1.5.
PatternReplaceCharFilterFactory	Factory for PatternReplaceCharFilter. Example Solr field type: <fieldType name="text_ptnreplace" class="solr.TextField" positionIncrementGap="100"> <analyzer> <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^a-z])" replacement=""/> <tokenizer class="solr.KeywordTokenizerFactory"/> </analyzer> </fieldType> Since Solr 3.1.
PatternReplaceFilter	A TokenFilter that applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string. Note: depending on the pattern and the input TokenStream, this TokenFilter may produce tokens whose text is the empty string.
PatternReplaceFilterFactory	Factory for PatternReplaceFilter. Example Solr field type: <fieldType name="text_ptnreplace" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/> </analyzer> </fieldType>
PatternTokenizer	This tokenizer uses regex pattern matching to construct distinct tokens from the input stream. It takes two arguments: "pattern" and "group". "pattern" is the regular expression; "group" says which group to extract into tokens. group=-1 (the default) is equivalent to "split": the tokens are the same as the output of String#split(java.lang.String), minus any empty tokens. Using group >= 0 selects the matching group as the token. For example, with pattern = '([^']+)', group = 0 and input = aaa 'bbb' 'ccc', the output is two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output is bbb and ccc (no ' marks). NOTE: This Tokenizer does not output tokens of zero length.
PatternTokenizerFactory	Factory for PatternTokenizer. It takes the same two arguments as PatternTokenizer: "pattern" is the regular expression, and "group" selects which group to extract into tokens (group=-1, the default, splits the input on the pattern; group >= 0 emits that capture group of each match). Zero-length tokens are never output. Example Solr field type: <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/> </analyzer> </fieldType>
TestPatternCaptureGroupTokenFilter
TestPatternCaptureGroupTokenFilter.AnalyzerAnonymousInnerClassHelper
TestPatternReplaceCharFilter	Tests PatternReplaceCharFilter
TestPatternReplaceCharFilter.AnalyzerAnonymousInnerClassHelper
TestPatternReplaceFilter
TestPatternReplaceFilter.AnalyzerAnonymousInnerClassHelper
TestPatternReplaceFilter.AnalyzerAnonymousInnerClassHelper2
TestPatternReplaceFilter.AnalyzerAnonymousInnerClassHelper3
TestPatternReplaceFilterFactory	Simple tests to ensure this factory is working
TestPatternTokenizer
TestPatternTokenizer.AnalyzerAnonymousInnerClassHelper
TestPatternTokenizer.AnalyzerAnonymousInnerClassHelper2
TestPatternTokenizerFactory	Simple tests to ensure this factory is working
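The token output described for PatternCaptureGroupTokenFilter can be approximated with plain .NET regular expressions. The following is a minimal sketch, not the Lucene.Net implementation: `CaptureGroupSketch` and `Extract` are hypothetical names, the sketch works on a single token string rather than a TokenStream, and it ignores positions and offsets. Ordering the captures by start index and skipping a capture that duplicates the whole token are assumptions of this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net: approximates the tokens that
// PatternCaptureGroupTokenFilter emits for ONE input token. Positions,
// offsets and other token attributes are ignored.
public static class CaptureGroupSketch
{
    public static List<string> Extract(string token, bool preserveOriginal, params Regex[] patterns)
    {
        var captures = new List<(int Start, string Text)>();

        foreach (Regex pattern in patterns)
        {
            // Each pattern is applied as many times as it matches.
            foreach (Match m in pattern.Matches(token))
            {
                // Group 0 is the whole match; only groups 1..n become tokens.
                for (int g = 1; g < m.Groups.Count; g++)
                {
                    Group grp = m.Groups[g];
                    if (!grp.Success || grp.Length == 0)
                        continue; // group did not participate, or captured nothing
                    if (preserveOriginal && grp.Length == token.Length)
                        continue; // sketch choice: avoid a duplicate of the original
                    captures.Add((grp.Index, grp.Value));
                }
            }
        }

        // Sketch choice: order tokens by where they start in the input.
        // OrderBy is stable, so ties keep pattern/group order.
        List<string> result = captures.OrderBy(c => c.Start).Select(c => c.Text).ToList();

        // Keep the original token if no capture was produced
        // (e.g. no pattern matched) or if preserveOriginal is set.
        if (result.Count == 0 || preserveOriginal)
            result.Insert(0, token);
        return result;
    }
}
```

Passing the four camelCase patterns from the table with preserveOriginal set to true returns "camelCaseFilter", "camel", "Case" and "Filter" for the input "camelCaseFilter". In a C# verbatim string the third pattern's `\\b` is written `@"...\b..."`.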
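The two PatternReplaceCharFilter examples can be reproduced with .NET's `Regex.Replace`, which uses the same `$1`/`$2` group-reference syntax in the replacement string. This standalone sketch (the `CharFilterReplaceSketch` class is hypothetical) performs only the text substitution on a whole string; it does not model blocks or the offset correction a CharFilter is responsible for. It also reports how many characters the rewrite removed, which is the shift behind the highlighting problem in Example 2.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net: shows only the string rewrite
// that the PatternReplaceCharFilter examples describe.
public static class CharFilterReplaceSketch
{
    // Replaces every match of `pattern` in `source`; $1, $2 refer to capture groups.
    public static string Apply(string source, string pattern, string replacement)
        => Regex.Replace(source, pattern, replacement);

    // Number of characters removed by the rewrite (negative if text was added).
    // With a single replacement, every character after the rewritten region
    // shifts by this amount, so offsets taken from the output no longer line
    // up with the source text.
    public static int LengthDelta(string source, string output)
        => source.Length - output.Length;
}
```

Note that a C# verbatim string writes the first pattern as `@"(aa)\s+(bb)"`; the doubled backslashes in the table come from Java string-literal escaping. For Example 2, "aa123bb" (7 characters) becomes "aa bb" (5 characters), so the "bb" token sits two characters earlier in the output than in the source.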
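PatternReplaceFilter works token by token rather than on the raw character stream. A minimal sketch over a plain list of token strings follows; `ReplaceFilterSketch` is a hypothetical name, and the `replaceAll` flag is assumed to mirror the factory's replace="all" attribute (false is taken here to mean only the first match in each token is replaced). With the factory example's pattern `([^a-z])` and an empty replacement, a token made only of non-letters becomes the empty string, which is the case the note above warns about.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net: applies a pattern to each
// token independently, the way the PatternReplaceFilter description reads.
public static class ReplaceFilterSketch
{
    // replaceAll = true rewrites every match in a token; false rewrites only
    // the first match. Empty results are kept, not dropped.
    public static List<string> Apply(IEnumerable<string> tokens, Regex pattern, string replacement, bool replaceAll)
    {
        return tokens
            .Select(t => replaceAll
                ? pattern.Replace(t, replacement)      // every match
                : pattern.Replace(t, replacement, 1))  // first match only
            .ToList();
    }
}
```

For the tokens "Hello-World", "123" and "abc", replacing all matches yields "elloorld", "" and "abc"; replacing only the first match yields "ello-World", "23" and "abc".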
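The two PatternTokenizer modes ("split" for group=-1, capture-group extraction for group >= 0) can also be sketched with .NET regular expressions. The hypothetical `PatternTokenizerSketch.Tokenize` below returns token strings only (no offsets) and drops zero-length tokens in both modes, as the NOTE requires.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net: returns token strings only.
public static class PatternTokenizerSketch
{
    // group < 0: the pattern matches separators and the text between them
    // becomes tokens ("split" mode). group >= 0: each match contributes the
    // text of that capture group (0 = the whole match).
    public static List<string> Tokenize(string input, Regex pattern, int group = -1)
    {
        var tokens = new List<string>();
        if (group < 0)
        {
            int start = 0;
            foreach (Match m in pattern.Matches(input))
            {
                // Text before this separator; skipped if empty.
                if (m.Index > start)
                    tokens.Add(input.Substring(start, m.Index - start));
                start = m.Index + m.Length;
            }
            // Trailing text after the last separator, if any.
            if (start < input.Length)
                tokens.Add(input.Substring(start));
        }
        else
        {
            foreach (Match m in pattern.Matches(input))
            {
                Group g = m.Groups[group];
                // Zero-length tokens are never emitted.
                if (g.Success && g.Length > 0)
                    tokens.Add(g.Value);
            }
        }
        return tokens;
    }
}
```

The sketch walks the matches itself instead of calling .NET's `Regex.Split`, because `Regex.Split` inserts captured group text into its result when the pattern contains capturing parentheses and keeps empty strings, neither of which matches the String#split-without-empty-tokens behavior described for group=-1.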