C# (CSharp) Lucene.Net.Analysis.Compound Namespace

Nested Namespaces

Lucene.Net.Analysis.Compound.Hyphenation

Classes

Name	Description
CompoundWordTokenFilterBase	Base class for decomposition token filters. You must specify the required LuceneVersion compatibility when creating a CompoundWordTokenFilterBase. As of 3.1, it correctly handles Unicode 4.0 supplementary characters in the strings and char arrays provided as compound word dictionaries. As of 4.4, it no longer updates offsets.
CompoundWordTokenFilterBase.CompoundToken	Helper class that holds information about a decompounded token.
DictionaryCompoundWordTokenFilter	A TokenFilter that decomposes compound words found in many Germanic languages. "Donaudampfschiff" becomes "Donau", "dampf", "schiff", so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to match substrings of the word against a word dictionary. You must specify the required LuceneVersion compatibility when creating a CompoundWordTokenFilterBase. As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in the strings and char arrays provided as compound word dictionaries.
DictionaryCompoundWordTokenFilterFactory	Factory for DictionaryCompoundWordTokenFilter. Example configuration: <fieldType name="text_dictcomp" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/> </analyzer> </fieldType>
HyphenationCompoundWordTokenFilter	A TokenFilter that decomposes compound words found in many Germanic languages. "Donaudampfschiff" becomes "Donau", "dampf", "schiff", so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a hyphenation grammar and a word dictionary to achieve this. You must specify the required LuceneVersion compatibility when creating a CompoundWordTokenFilterBase. As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in the strings and char arrays provided as compound word dictionaries.
HyphenationCompoundWordTokenFilterFactory	Factory for HyphenationCompoundWordTokenFilter. This factory accepts the following parameters: `hyphenator` (mandatory): path to the FOP XML hyphenation pattern file, see http://offo.sourceforge.net/hyphenation/; `encoding` (optional): encoding of the XML hyphenation file, defaults to UTF-8; `dictionary` (optional): dictionary of words, defaults to no dictionary; `minWordSize` (optional): minimum length a word must have to be decomposed, defaults to 5; `minSubwordSize` (optional): minimum length of subwords, defaults to 2; `maxSubwordSize` (optional): maximum length of subwords, defaults to 15; `onlyLongestMatch` (optional): if true, adds only the longest matching subword to the stream, defaults to false. Example configuration: <fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/> </analyzer> </fieldType>
TestCompoundWordTokenFilter
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper2
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper3
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper4
TestCompoundWordTokenFilter.AnalyzerAnonymousInnerClassHelper5
TestCompoundWordTokenFilter.MockRetainAttribute
TestCompoundWordTokenFilter.MockRetainAttributeFilter
TestDictionaryCompoundWordTokenFilterFactory	Simple tests to ensure the Dictionary compound filter factory is working.
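
The brute-force decomposition described for DictionaryCompoundWordTokenFilter can be illustrated without Lucene.Net at all. What follows is a minimal sketch under stated assumptions, not the library's implementation: `CompoundSketch` and `Decompose` are hypothetical names, a plain `ISet<string>` stands in for the word dictionary, and the default sizes are borrowed from the factory parameters listed above (minWordSize 5, minSubwordSize 2, maxSubwordSize 15). The sketch returns only the subwords and does not model token streams, offsets, or positions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; this class is not part of Lucene.Net.
public static class CompoundSketch
{
    // Returns every dictionary word found inside `word`, scanning each start
    // position and each allowed subword length (a brute-force search).
    public static List<string> Decompose(
        string word,
        ISet<string> dictionary,
        int minWordSize = 5,
        int minSubwordSize = 2,
        int maxSubwordSize = 15,
        bool onlyLongestMatch = false)
    {
        var subwords = new List<string>();

        // Words shorter than minWordSize are not decomposed.
        if (word.Length < minWordSize)
        {
            return subwords;
        }

        for (int start = 0; start <= word.Length - minSubwordSize; start++)
        {
            string longest = string.Empty;

            for (int len = minSubwordSize; len <= maxSubwordSize; len++)
            {
                if (start + len > word.Length)
                {
                    break; // candidate would run past the end of the word
                }

                string candidate = word.Substring(start, len);
                if (!dictionary.Contains(candidate))
                {
                    continue;
                }

                if (onlyLongestMatch)
                {
                    // len only grows, so a later match is always the longer one.
                    longest = candidate;
                }
                else
                {
                    subwords.Add(candidate);
                }
            }

            if (onlyLongestMatch && longest.Length > 0)
            {
                subwords.Add(longest);
            }
        }

        return subwords;
    }
}
```

Called with a case-insensitive dictionary containing donau, dampf and schiff (for example a `HashSet<string>` built with `StringComparer.OrdinalIgnoreCase`) and the word "Donaudampfschiff", this sketch returns Donau, dampf, schiff. A word shorter than `minWordSize`, such as "Rad", yields an empty list.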