Factory for HyphenationCompoundWordTokenFilter.
This factory accepts the following parameters:
hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
dictionary (optional): dictionary of words. Defaults to no dictionary.
minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
minSubwordSize (optional): minimum length of subwords. Defaults to 2.
maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="hyphenator.xml"
            encoding="UTF-8"
            dictionary="dictionary.txt"
            minWordSize="5"
            minSubwordSize="2"
            maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
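Since hyphenator is the only mandatory parameter, a shorter configuration that relies on the documented defaults should also be valid. The following is a minimal sketch under that assumption; the field type name text_hyphncomp_min and the file name de_hyph.xml are placeholders chosen for illustration and do not come from the original documentation.

```xml
<!-- Sketch only: "text_hyphncomp_min" and "de_hyph.xml" are hypothetical names. -->
<fieldType name="text_hyphncomp_min" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Only the mandatory hyphenator is set. Per the parameter list, the rest
         fall back to their defaults: encoding=UTF-8, no dictionary,
         minWordSize=5, minSubwordSize=2, maxSubwordSize=15,
         onlyLongestMatch=false. -->
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="de_hyph.xml"/>
  </analyzer>
</fieldType>
```

Spelling out every attribute, as in the full example, makes the effective settings visible in the schema; the short form is easier to read when the defaults are acceptable.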