C# Class org.apache.lucene.analysis.miscellaneous.HyphenatedWordsFilter

When plain text is extracted from documents, many words end up hyphenated and split across two lines. This is common in documents with narrow text columns, such as newsletters. To improve search quality, this filter joins hyphenated words that were broken across two lines back together. This filter should be used at index time only. Example field definition in schema.xml:
 <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.HyphenatedWordsFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldtype>
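The joining behaviour described above can be illustrated with a small, self-contained sketch. It is not the filter's actual implementation: `HyphenJoinSketch` and `JoinHyphenated` are hypothetical names, and the sketch works on plain strings rather than on a TokenStream. It assumes that the tokens come from a whitespace tokenizer, that a token ending in `-` is a fragment to be joined with the token that follows, and that a dangling fragment at the end of the input keeps its hyphen.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Lucene.NET: a simplified model of
// joining words that were hyphenated across a line break.
public static class HyphenJoinSketch
{
    public static List<string> JoinHyphenated(IEnumerable<string> tokens)
    {
        var output = new List<string>();
        var pending = new StringBuilder(); // fragments collected so far, trailing hyphens stripped
        bool inHyphenatedWord = false;

        foreach (string token in tokens)
        {
            if (token.Length > 0 && token[token.Length - 1] == '-')
            {
                // First or middle fragment: keep the text, drop the trailing hyphen.
                pending.Append(token, 0, token.Length - 1);
                inHyphenatedWord = true;
            }
            else if (inHyphenatedWord)
            {
                // Final fragment: emit the joined word and clear the buffer.
                pending.Append(token);
                output.Add(pending.ToString());
                pending.Clear();
                inHyphenatedWord = false;
            }
            else
            {
                // Ordinary token: pass it through unchanged.
                output.Add(token);
            }
        }

        if (inHyphenatedWord)
        {
            // Input ended on a fragment; this sketch puts the hyphen back (an assumption).
            pending.Append('-');
            output.Add(pending.ToString());
        }

        return output;
    }
}
```

Under these assumptions, calling `HyphenJoinSketch.JoinHyphenated` on the tokens `ecologi-`, `cal`, `devel-`, `op`, `and` would give `ecological`, `develop`, `and`. Hyphens inside a token, as in `hands-on`, are left alone because only a trailing hyphen marks a fragment.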
Inheritance: TokenFilter
Open project: paulirwin/lucene.net

Public Methods

Method Description
HyphenatedWordsFilter ( TokenStream @in )

Creates a new HyphenatedWordsFilter

incrementToken ( ) : bool

Advances the stream to the next token. Returns true if a token was produced, false at the end of the stream.

reset ( ) : void

Resets this stream to a clean state. Called by a consumer before it starts calling incrementToken().

Private Methods

Method Description
unhyphenate ( ) : void

Writes the joined unhyphenated term

Method Details

HyphenatedWordsFilter() public method

Creates a new HyphenatedWordsFilter
public HyphenatedWordsFilter ( TokenStream @in )
@in TokenStream

incrementToken() public method

Advances the stream to the next token. Returns true if a token was produced, false at the end of the stream.
public incrementToken ( ) : bool
return bool

reset() public method

Resets this stream to a clean state. Called by a consumer before it starts calling incrementToken().
public reset ( ) : void
return void