C# class org.apache.lucene.analysis.miscellaneous.HyphenatedWordsFilter

When plain text is extracted from documents, many words are often hyphenated and broken across two lines. This is common in documents that use narrow text columns, such as newsletters. To increase search efficiency, this filter puts hyphenated words that were broken across two lines back together. This filter should be used at index time only. Example field definition in schema.xml:
 <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.HyphenatedWordsFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldtype>
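To make the joining behavior concrete, here is a minimal Java sketch of the idea, written without any Lucene dependency. Java is used because the class and method names on this page follow the original Java source. The names HyphenJoinSketch and joinHyphenated are hypothetical. Two behaviors are assumptions of this sketch, not a statement about the filter's actual implementation: treating any whitespace-delimited token that ends in '-' as the first half of a broken word, and keeping the hyphen on a trailing fragment that has no continuation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Lucene or Lucene.NET.
class HyphenJoinSketch {

    // Joins every token that ends with '-' to the token(s) that follow it.
    // Assumption: a trailing '-' always marks a word broken across lines.
    static List<String> joinHyphenated(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // fragments awaiting a continuation
        boolean hasPending = false;

        for (String token : tokens) {
            if (token.endsWith("-")) {
                // Save the fragment without its trailing hyphen.
                pending.append(token, 0, token.length() - 1);
                hasPending = true;
            } else if (hasPending) {
                // Continuation found: emit the joined word.
                out.add(pending.append(token).toString());
                pending.setLength(0);
                hasPending = false;
            } else {
                out.add(token);
            }
        }
        if (hasPending) {
            // Assumption: a final fragment with no continuation keeps its hyphen.
            out.add(pending.append('-').toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Whitespace tokenization, as in the index analyzer of the example field type.
        String text = "ecologi-\ncal devel-\nop compre-\thensive-hands-on and";
        List<String> tokens = Arrays.asList(text.split("\\s+"));
        System.out.println(joinHyphenated(tokens));
        // prints [ecological, develop, comprehensive-hands-on, and]
    }
}
```

Hyphens inside a token, as in "hensive-hands-on", are left alone; only a hyphen at the very end of a token triggers joining. This is why the example field type places the filter after a whitespace tokenizer and before WordDelimiterFilterFactory, which would otherwise split on the hyphen first.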
Inherits: TokenFilter
Project: paulirwin/lucene.net

Public Methods

Method Description
HyphenatedWordsFilter ( TokenStream @in )

Creates a new HyphenatedWordsFilter

incrementToken ( ) : bool

{@inheritDoc}

reset ( ) : void

{@inheritDoc}

Private Methods

Method Description
unhyphenate ( ) : void

Writes the joined unhyphenated term

Method Details

HyphenatedWordsFilter() public method

Creates a new HyphenatedWordsFilter
public HyphenatedWordsFilter ( TokenStream @in )
@in TokenStream

incrementToken() public method

{@inheritDoc}
public incrementToken ( ) : bool
Returns bool

reset() public method

{@inheritDoc}
public reset ( ) : void
Returns void
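
The methods listed above (a constructor that wraps an input stream, incrementToken, reset, and the private unhyphenate helper) fit the usual pull-style token stream pattern: the consumer calls reset() once, then calls incrementToken() until it returns false. The Java sketch below imitates that shape with a hypothetical MiniTokenStream interface. None of these types are the Lucene or Lucene.NET API, and the joining rules are the same simplifying assumptions a whitespace-based reading of the class description suggests, not the library's verified behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal stand-in for a token stream; not the Lucene API.
interface MiniTokenStream {
    void reset();
    boolean incrementToken(); // advances to the next token, false when exhausted
    String term();            // current token text, valid after incrementToken() returned true
}

// Hypothetical source stream backed by a fixed list of tokens.
class ListTokenStream implements MiniTokenStream {
    private final List<String> tokens;
    private int pos = -1;

    ListTokenStream(List<String> tokens) { this.tokens = tokens; }

    public void reset() { pos = -1; }

    public boolean incrementToken() {
        if (pos + 1 >= tokens.size()) return false;
        pos++;
        return true;
    }

    public String term() { return tokens.get(pos); }
}

// Hypothetical filter that mirrors the method list on this page.
class MiniHyphenFilter implements MiniTokenStream {
    private final MiniTokenStream in;
    private final StringBuilder saved = new StringBuilder(); // fragments seen so far
    private boolean hasSaved = false;
    private String current;

    MiniHyphenFilter(MiniTokenStream in) { this.in = in; }

    public void reset() {
        in.reset();
        saved.setLength(0);
        hasSaved = false;
        current = null;
    }

    public boolean incrementToken() {
        while (in.incrementToken()) {
            String t = in.term();
            if (t.endsWith("-")) {
                // Buffer the fragment without its hyphen and keep reading.
                saved.append(t, 0, t.length() - 1);
                hasSaved = true;
            } else {
                current = hasSaved ? unhyphenate(t) : t;
                return true;
            }
        }
        if (hasSaved) {
            // Assumption: a dangling final fragment is emitted with its hyphen restored.
            current = unhyphenate("-");
            return true;
        }
        return false;
    }

    public String term() { return current; }

    // Builds the joined term from the buffered fragments plus the given tail.
    private String unhyphenate(String tail) {
        String joined = saved.append(tail).toString();
        saved.setLength(0);
        hasSaved = false;
        return joined;
    }
}

class MiniStreamDemo {
    // Standard consumer loop: reset once, then pull tokens until exhausted.
    static List<String> drain(MiniTokenStream stream) {
        stream.reset();
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        MiniTokenStream source = new ListTokenStream(Arrays.asList("devel-", "op", "and", "ecologi-"));
        System.out.println(drain(new MiniHyphenFilter(source)));
        // prints [develop, and, ecologi-]
    }
}
```

Because reset() clears the buffered fragments as well as resetting the wrapped stream, the same filter instance can be drained again without leftovers from a previous pass leaking into the next one.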