C# class Lucene.Net.Analysis.CommonGrams.CommonGramsFilter

Inherits: TokenFilter

Project: apache/lucenenet

Public Methods

Method	Description
CommonGramsFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet commonWords )	Constructs a token stream that filters the given input, using a set of common words to create bigrams. Outputs both unigrams (with their regular position increment) and bigrams (with position increment 0 and type="gram") wherever one or both of the words in a potential bigram are in the set of common words.
IncrementToken ( ) : bool	Inserts bigrams for common words into a token stream. For each input token, outputs the token. If the token and/or the following token are in the list of common words, also outputs a bigram with position increment 0 and type="gram". TODO: Consider adding an option to not emit unigram stopwords, as in CDL XTF BigramStopFilter; CommonGramsQueryFilter would need to be changed to work with this. TODO: Consider optimizing for the case of three common grams, e.g. "man of the year" normally produces 3 bigrams: "man-of", "of-the", "the-year", but with proper management of positions we could eliminate the middle bigram "of-the" and save a disk seek and a whole set of position lookups.
Reset ( ) : void	Documentation inherited from the base class (TokenFilter).

Private Methods

Method	Description
GramToken ( ) : void	Constructs a compound token.
SaveTermBuffer ( ) : void	Saves this information to form the left part of a gram.

Method Details

CommonGramsFilter() public method

Constructs a token stream that filters the given input, using a set of common words to create bigrams. Outputs both unigrams (with their regular position increment) and bigrams (with position increment 0 and type="gram") wherever one or both of the words in a potential bigram are in the set of common words.

public CommonGramsFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet commonWords )
matchVersion	LuceneVersion
input	TokenStream	The input TokenStream in the filter chain.
commonWords	CharArraySet	The set of common words.
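
As a rough illustration of the constructor in use, the sketch below wraps a WhitespaceTokenizer in a CommonGramsFilter and collects each emitted token with its type and position increment. It assumes the Lucene.Net 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) and their namespace layout; the class name CommonGramsDemo, the Analyze helper, and the choice of tokenizer are made up for this example and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CommonGrams;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical demo class; only the Lucene.Net types come from the library.
public static class CommonGramsDemo
{
    // Tokenizes the text on whitespace, passes it through CommonGramsFilter,
    // and returns one "term|type|positionIncrement" entry per emitted token.
    public static List<string> Analyze(string text, ICollection<string> common)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48; // assumed target version
        var commonWords = new CharArraySet(version, common, true); // true = ignore case
        var lines = new List<string>();

        using (TokenStream stream = new CommonGramsFilter(
            version,
            new WhitespaceTokenizer(version, new StringReader(text)),
            commonWords))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();
            var type = stream.AddAttribute<ITypeAttribute>();
            var posInc = stream.AddAttribute<IPositionIncrementAttribute>();

            stream.Reset(); // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                lines.Add($"{term}|{type.Type}|{posInc.PositionIncrement}");
            }
            stream.End();
        }
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Analyze("man of the year", new[] { "of", "the" }))
        {
            Console.WriteLine(line);
        }
    }
}
```

Going by the description above, the bigram entries should carry the type "gram" and a position increment of 0, while the unigrams advance the position as usual.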

IncrementToken() public method

Inserts bigrams for common words into a token stream. For each input token, outputs the token. If the token and/or the following token are in the list of common words, also outputs a bigram with position increment 0 and type="gram".

TODO: Consider adding an option to not emit unigram stopwords, as in CDL XTF BigramStopFilter; CommonGramsQueryFilter would need to be changed to work with this.

TODO: Consider optimizing for the case of three common grams, e.g. "man of the year" normally produces 3 bigrams: "man-of", "of-the", "the-year", but with proper management of positions we could eliminate the middle bigram "of-the" and save a disk seek and a whole set of position lookups.

public IncrementToken ( ) : bool
Returns	bool
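
The emission rule described for IncrementToken() can be sketched without any Lucene.Net dependency. The code below is only an approximation of that rule applied to a plain list of strings, not the filter's actual implementation: the Token class, the Expand method, the "word" type given to unigrams, and the underscore separator are all assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, library-free sketch of the common-grams emission rule.
public static class CommonGramsSketch
{
    // Simple token record for this sketch; not a Lucene.Net type.
    public sealed class Token
    {
        public string Term { get; }
        public string Type { get; }
        public int PositionIncrement { get; }

        public Token(string term, string type, int positionIncrement)
        {
            Term = term;
            Type = type;
            PositionIncrement = positionIncrement;
        }
    }

    // Emits every input term as a unigram and, when the term or its successor
    // is a common word, a bigram stacked at the same position (increment 0).
    public static List<Token> Expand(
        IReadOnlyList<string> terms, ISet<string> commonWords, char separator = '_')
    {
        var output = new List<Token>();
        for (int i = 0; i < terms.Count; i++)
        {
            // The unigram always advances the position by one.
            output.Add(new Token(terms[i], "word", 1));

            bool hasNext = i + 1 < terms.Count;
            if (hasNext &&
                (commonWords.Contains(terms[i]) || commonWords.Contains(terms[i + 1])))
            {
                // The bigram shares the unigram's position, hence increment 0.
                output.Add(new Token(terms[i] + separator + terms[i + 1], "gram", 0));
            }
        }
        return output;
    }

    public static void Main()
    {
        var common = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "of", "the" };
        foreach (var t in Expand(new[] { "man", "of", "the", "year" }, common))
        {
            Console.WriteLine($"{t.Term}\t{t.Type}\t{t.PositionIncrement}");
        }
    }
}
```

For the input "man of the year" with "of" and "the" as common words, this sketch yields man, man_of, of, of_the, the, the_year, year: three bigrams, each at the position of the unigram it follows. The second TODO above is about dropping the middle one, of_the.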

Reset() public method

Documentation inherited from the base class (TokenFilter).

public Reset ( ) : void
Returns	void