C# (CSharp) Lucene.Net.Analysis.Compound Namespace

Nested Namespaces

Lucene.Net.Analysis.Compound.Hyphenation

Classes

Name Description
CompoundWordTokenFilterBase Base class for decomposition token filters.

You must specify the required LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

  • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
  • As of 4.4, CompoundWordTokenFilterBase doesn't update offsets.

CompoundWordTokenFilterBase.CompoundToken Helper class to hold decompounded token information.
DictionaryCompoundWordTokenFilter A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this.

You must specify the required LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

  • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
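
The brute-force idea described above for DictionaryCompoundWordTokenFilter can be illustrated with a minimal, self-contained sketch. It is not the Lucene.Net implementation: the class name BruteForceDecompounder and its Decompose method are hypothetical, and the parameter names and defaults are borrowed from the factory options listed on this page. The sketch tries every start position and every allowed length and keeps the substrings found in the dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class BruteForceDecompounder
{
    public static List<string> Decompose(
        string word,
        ISet<string> dictionary,
        int minWordSize = 5,
        int minSubwordSize = 2,
        int maxSubwordSize = 15,
        bool onlyLongestMatch = false)
    {
        var subwords = new List<string>();

        // Words shorter than minWordSize are left alone.
        if (word.Length < minWordSize) return subwords;

        // Try every start position in the word.
        for (int start = 0; start <= word.Length - minSubwordSize; start++)
        {
            string longest = "";

            // Try every allowed subword length at this position.
            for (int len = minSubwordSize; len <= maxSubwordSize; len++)
            {
                if (start + len > word.Length) break;

                string candidate = word.Substring(start, len);
                if (!dictionary.Contains(candidate)) continue;

                if (onlyLongestMatch)
                    longest = candidate; // len only grows, so the last hit is the longest
                else
                    subwords.Add(candidate);
            }

            // With onlyLongestMatch, keep one subword per start position.
            if (longest.Length > 0) subwords.Add(longest);
        }

        return subwords;
    }
}
```

With a case-insensitive set such as new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "donau", "dampf", "schiff" }, calling BruteForceDecompounder.Decompose("Donaudampfschiff", dict) returns Donau, dampf, schiff. The cost grows with word length times the number of allowed subword lengths, which is why the description calls the approach brute force.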

DictionaryCompoundWordTokenFilterFactory Factory for DictionaryCompoundWordTokenFilter.
 <fieldType name="text_dictcomp" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
             dictionary="dictionary.txt"
             minWordSize="5"
             minSubwordSize="2"
             maxSubwordSize="15"
             onlyLongestMatch="true"/>
   </analyzer>
 </fieldType>
HyphenationCompoundWordTokenFilter A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a hyphenation grammar and a word dictionary to achieve this.

You must specify the required LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

  • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
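
To make the contrast with the brute-force filter concrete, here is a small sketch of how hyphenation points can narrow the search. It is an illustration under stated assumptions, not the Lucene.Net code: HyphenationDecompounder is a hypothetical class, and the hyphenation points are passed in as a sorted array of character offsets (including 0 and the word length) instead of being computed from a hyphenation grammar. Only substrings that start and end on a hyphenation point are considered, and each is kept if the dictionary contains it, or unconditionally when no dictionary is supplied.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class HyphenationDecompounder
{
    // hyphenationPoints: ascending offsets into word, assumed to come from
    // some hyphenator; must include 0 and word.Length.
    // dictionary: may be null, in which case every candidate is accepted.
    public static List<string> Decompose(
        string word,
        int[] hyphenationPoints,
        ISet<string> dictionary,
        int minSubwordSize = 2,
        int maxSubwordSize = 15)
    {
        var subwords = new List<string>();

        for (int i = 0; i < hyphenationPoints.Length; i++)
        {
            for (int j = i + 1; j < hyphenationPoints.Length; j++)
            {
                int start = hyphenationPoints[i];
                int length = hyphenationPoints[j] - start;

                // Points are ascending, so later end points only get longer.
                if (length > maxSubwordSize) break;
                if (length < minSubwordSize) continue;

                string candidate = word.Substring(start, length);
                if (dictionary == null || dictionary.Contains(candidate))
                    subwords.Add(candidate);
            }
        }

        return subwords;
    }
}
```

For example, assuming the points { 0, 2, 5, 10, 16 } for "Donaudampfschiff" (Do-nau-dampf-schiff) and a case-insensitive dictionary containing donau, dampf and schiff, the call returns Donau, dampf, schiff. Far fewer candidates are checked than in the brute-force case, because only boundaries proposed by the hyphenator are tried.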

HyphenationCompoundWordTokenFilterFactory Factory for HyphenationCompoundWordTokenFilter.

This factory accepts the following parameters:

  • hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
  • encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
  • dictionary (optional): dictionary of words. Defaults to no dictionary.
  • minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
  • minSubwordSize (optional): minimum length of subwords. Defaults to 2.
  • maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
  • onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.

 <fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
             hyphenator="hyphenator.xml"
             encoding="UTF-8"
             dictionary="dictionary.txt"
             minWordSize="5"
             minSubwordSize="2"
             maxSubwordSize="15"
             onlyLongestMatch="false"/>
   </analyzer>
 </fieldType>

TestCompoundWordTokenFilter
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper2
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper3
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper4
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper5
TestCompoundWordTokenFilter.MockRetainAttribute
TestCompoundWordTokenFilter.MockRetainAttributeFilter
TestDictionaryCompoundWordTokenFilterFactory Simple tests to ensure the Dictionary compound filter factory is working.