C# Class org.apache.lucene.analysis.miscellaneous.HyphenatedWordsFilter

When plain text is extracted from documents, many words are often hyphenated and split across two lines. This is common in documents that use narrow text columns, such as newsletters. To increase search efficiency, this filter joins hyphenated words that were broken across two lines back into single words. This filter should be used at index time only. Example field definition in schema.xml:
 <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.HyphenatedWordsFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldtype>
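
The joining rule described above can be sketched independently of Lucene.NET. The following is a minimal illustration, not the library's implementation: it assumes tokens arrive as plain strings (for example from a whitespace tokenizer), and it ignores offsets and other token attributes that a real filter would have to maintain. The names `HyphenJoinSketch` and `Join` are hypothetical. It also assumes that a hyphenated fragment at the very end of the stream is emitted with its hyphen intact.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class HyphenJoinSketch
{
    // Joins each token ending in '-' with the token(s) that follow it.
    public static List<string> Join(IEnumerable<string> tokens)
    {
        var output = new List<string>();
        var pending = new StringBuilder(); // fragments of a word being rebuilt
        bool hasPending = false;

        foreach (string token in tokens)
        {
            if (token.Length > 0 && token[token.Length - 1] == '-')
            {
                // Hyphenated fragment: keep it without the trailing hyphen.
                pending.Append(token, 0, token.Length - 1);
                hasPending = true;
            }
            else if (hasPending)
            {
                // Final portion of a hyphenated word: emit the joined term.
                pending.Append(token);
                output.Add(pending.ToString());
                pending.Clear();
                hasPending = false;
            }
            else
            {
                // Ordinary token: pass through unchanged.
                output.Add(token);
            }
        }

        if (hasPending)
        {
            // Assumption: a dangling fragment at end of input keeps its hyphen.
            pending.Append('-');
            output.Add(pending.ToString());
        }
        return output;
    }
}
```

For example, `HyphenJoinSketch.Join(new[] { "ecologi-", "cal", "news" })` yields the two tokens `ecological` and `news`.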
Inheritance: TokenFilter
Project: paulirwin/lucene.net

Public Methods

Method Description
HyphenatedWordsFilter ( TokenStream @in )

Creates a new HyphenatedWordsFilter

incrementToken ( ) : bool

Advances the stream to the next token, joining hyphenated fragments; returns false when the stream is exhausted

reset ( ) : void

Resets the stream and discards any partially joined word

Private Methods

Method Description
unhyphenate ( ) : void

Writes the joined unhyphenated term

Method Details

HyphenatedWordsFilter() public method

Creates a new HyphenatedWordsFilter
public HyphenatedWordsFilter ( TokenStream @in )
@in TokenStream The token stream to be filtered

incrementToken() public method

Advances the stream to the next token, joining hyphenated fragments; returns false when the stream is exhausted
public incrementToken ( ) : bool
Returns bool

reset() public method

Resets the stream and discards any partially joined word
public reset ( ) : void
Returns void