C# Class Lucene.Net.Analysis.Compound.DictionaryCompoundWordTokenFilter

A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this.
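The brute-force idea can be sketched without any Lucene.Net dependency: try every start offset and every allowed length, and keep each substring that appears in the dictionary. This is a minimal illustration, not the library's actual source. The class CompoundSplitSketch, its Decompose helper, the default sizes, and the reading of onlyLongestMatch as "longest hit per start offset" are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class CompoundSplitSketch
{
    // Hypothetical helper, not part of Lucene.Net: returns every dictionary
    // word found as a substring of `word`, scanning all start offsets and
    // all lengths between minSubwordSize and maxSubwordSize (inclusive here,
    // which is an assumption of this sketch).
    public static List<string> Decompose(
        string word,
        ISet<string> dictionary,
        int minSubwordSize = 2,
        int maxSubwordSize = 15,
        bool onlyLongestMatch = false)
    {
        var parts = new List<string>();
        string lower = word.ToLowerInvariant(); // dictionary is assumed lowercase

        for (int start = 0; start <= lower.Length - minSubwordSize; start++)
        {
            string longest = null;
            for (int len = minSubwordSize; len <= maxSubwordSize; len++)
            {
                if (start + len > lower.Length) break;

                string candidate = lower.Substring(start, len);
                if (!dictionary.Contains(candidate)) continue;

                if (onlyLongestMatch)
                    longest = candidate; // len only grows, so a later hit is longer
                else
                    parts.Add(candidate);
            }
            if (onlyLongestMatch && longest != null) parts.Add(longest);
        }
        return parts;
    }

    public static void Main()
    {
        var dictionary = new HashSet<string> { "donau", "dampf", "schiff" };
        foreach (string part in Decompose("Donaudampfschiff", dictionary))
            Console.WriteLine(part); // prints donau, dampf, schiff on separate lines
    }
}
```

The cost grows with word length times the number of candidate lengths, which is why the size limits matter for long tokens.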

You must specify the required LuceneVersion compatibility when creating CompoundWordTokenFilterBase:

  • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.

Inheritance: CompoundWordTokenFilterBase

Public Methods

Method Description
DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary )

Creates a new DictionaryCompoundWordTokenFilter

DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch )

Creates a new DictionaryCompoundWordTokenFilter

Protected Methods

Method Description
Decompose ( ) : void

Method Details

Decompose() protected method

protected Decompose ( ) : void
Returns void

DictionaryCompoundWordTokenFilter() public method

Creates a new DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary )
matchVersion LuceneVersion Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input TokenStream the TokenStream to process
dictionary CharArraySet the word dictionary to match against.

DictionaryCompoundWordTokenFilter() public method

Creates a new DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch )
matchVersion LuceneVersion Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input TokenStream the TokenStream to process
dictionary CharArraySet the word dictionary to match against.
minWordSize int only words longer than this get processed
minSubwordSize int only subwords longer than this get to the output stream
maxSubwordSize int only subwords shorter than this get to the output stream
onlyLongestMatch bool add only the longest matching subword to the stream
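
A usage sketch for the seven-argument constructor follows. It assumes the Lucene.Net 4.8 packages (core plus the common analysis package) are referenced, that LuceneVersion.LUCENE_48 is the desired compatibility level, and that a WhitespaceTokenizer is an acceptable token source; none of these choices come from the page above. The size values 5, 2, and 15 are illustrative, and the class name CompoundFilterDemo is hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundFilterDemo
{
    public static void Main()
    {
        // Assumed compatibility level; pick the one matching your index.
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Lowercase dictionary entries, matched case-insensitively (ignoreCase = true).
        var dictionary = new CharArraySet(
            version, new[] { "donau", "dampf", "schiff" }, true);

        // Any TokenStream can serve as input; a whitespace tokenizer keeps the sketch short.
        TokenStream source = new WhitespaceTokenizer(
            version, new StringReader("Donaudampfschiff"));

        using (TokenStream filter = new DictionaryCompoundWordTokenFilter(
            version, source, dictionary,
            5,      // minWordSize (illustrative)
            2,      // minSubwordSize (illustrative)
            15,     // maxSubwordSize (illustrative)
            false)) // onlyLongestMatch
        {
            ICharTermAttribute term = filter.AddAttribute<ICharTermAttribute>();

            filter.Reset();
            while (filter.IncrementToken())
            {
                // Emits the tokens produced by the filter, one per line.
                Console.WriteLine(term.ToString());
            }
            filter.End();
        }
    }
}
```

In practice the filter is usually wired into a custom Analyzer so that the same decomposition runs at index time and at query time.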