C# class Lucene.Net.Analysis.Cjk.CJKBigramFilter

Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

CJK types are set by these tokenizers, but you can also use the CJKBigramFilter(TokenStream, int) constructor to explicitly control which of the CJK scripts are turned into bigrams.

By default, a CJK character that has no adjacent character to form a bigram with is output as a unigram. To always output both unigrams and bigrams, set the outputUnigrams flag in the CJKBigramFilter(TokenStream, int, bool) constructor. This can be used for a combined unigram+bigram approach.

In all cases, all non-CJK input is passed through unmodified.
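
The bigramming rule described above can be illustrated with a small, self-contained sketch. It is a simplified model, not the Lucene.Net implementation: the class `CjkBigramSketch` and its `Bigrams` method are hypothetical names, the input is assumed to be one uninterrupted run of CJK characters made of single UTF-16 code units, and the interleaved order of unigrams and bigrams is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Simplified model only; this is NOT the Lucene.Net implementation.
public static class CjkBigramSketch
{
    // Returns the terms produced for one uninterrupted run of CJK characters.
    // Assumes each character is a single UTF-16 code unit (no surrogate pairs).
    public static List<string> Bigrams(string run, bool outputUnigrams)
    {
        var terms = new List<string>();
        if (string.IsNullOrEmpty(run))
        {
            return terms;
        }

        // An isolated character has no neighbour to pair with,
        // so it is emitted as a unigram regardless of the flag.
        if (run.Length == 1)
        {
            terms.Add(run);
            return terms;
        }

        for (int i = 0; i < run.Length; i++)
        {
            // Optionally emit the single character itself.
            if (outputUnigrams)
            {
                terms.Add(run.Substring(i, 1));
            }
            // Emit the overlapping pair starting at this character, if one exists.
            if (i + 1 < run.Length)
            {
                terms.Add(run.Substring(i, 2));
            }
        }
        return terms;
    }
}
```

With this sketch, `CjkBigramSketch.Bigrams("ABC", false)` yields AB, BC, while `CjkBigramSketch.Bigrams("ABC", true)` yields A, AB, B, BC, C. Non-CJK tokens are outside the model; in the real filter they pass through unchanged.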

Inherits: TokenFilter

Project: apache/lucenenet

Public Methods

Method	Description
CJKBigramFilter ( TokenStream @in )	Calls CJKBigramFilter(@in, HAN | HIRAGANA | KATAKANA | HANGUL).
CJKBigramFilter ( TokenStream @in, int flags )	Calls CJKBigramFilter(@in, flags, false).
CJKBigramFilter ( TokenStream @in, int flags, bool outputUnigrams )	Creates a new CJKBigramFilter, specifying which writing systems should be bigrammed and whether unigrams should also be output.
IncrementToken ( ) : bool
Reset ( ) : void

Private Methods

Method	Description
DoNext ( ) : bool	Looks at the next input token, returning false if none is available.
FlushBigram ( ) : void	Flushes a bigram token to output from our buffer. This is the normal case, e.g. ABC -> AB BC.
FlushUnigram ( ) : void	Flushes a unigram token to output from our buffer. This happens when we encounter an isolated CJK character: either the whole CJK string is a single character, or the character is surrounded by spaces, punctuation, English text, etc., and is not beside any other CJK character.
Refill ( ) : void	Refills the buffers with new data from the current token.

Method Details

CJKBigramFilter() public method

Calls CJKBigramFilter(@in, HAN | HIRAGANA | KATAKANA | HANGUL).

public CJKBigramFilter ( TokenStream @in )
@in	TokenStream

CJKBigramFilter() public method

Calls CJKBigramFilter(@in, flags, false).

public CJKBigramFilter ( TokenStream @in, int flags )
@in	TokenStream
flags	int

CJKBigramFilter() public method

Creates a new CJKBigramFilter, specifying which writing systems should be bigrammed and whether unigrams should also be output.

public CJKBigramFilter ( TokenStream @in, int flags, bool outputUnigrams )
@in	TokenStream
flags	int	OR'ed set of HAN, HIRAGANA, KATAKANA, and HANGUL.
outputUnigrams	bool	true if unigrams for the selected writing systems should also be output. When this is false, unigrams are output only when there are no adjacent characters to form a bigram.
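
As a rough illustration of how an OR'ed flags argument selects writing systems, the sketch below models the four script flags as a bit set. Everything in it is hypothetical: the enum `ScriptFlagsSketch`, the helper `ScriptFlagSketchDemo.ShouldBigram`, and the numeric values are assumptions made for illustration and should be checked against the constants of the Lucene.Net version in use.

```csharp
using System;

// Hypothetical stand-in for the filter's script flags.
// Names mirror HAN, HIRAGANA, KATAKANA and HANGUL from the constructor
// documentation; the numeric values are assumptions of this sketch.
[Flags]
public enum ScriptFlagsSketch
{
    None = 0,
    Han = 1,
    Hiragana = 2,
    Katakana = 4,
    Hangul = 8
}

public static class ScriptFlagSketchDemo
{
    // True when the given script is included in the OR'ed flag set.
    public static bool ShouldBigram(ScriptFlagsSketch flags, ScriptFlagsSketch script)
    {
        return (flags & script) != 0;
    }
}
```

For example, with `ScriptFlagsSketch.Han | ScriptFlagsSketch.Hangul` as the flag set, `ShouldBigram` returns true for `Han` and false for `Katakana`, so in this model only Han and Hangul text would be turned into bigrams.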

IncrementToken() public method

public IncrementToken ( ) : bool
Returns	bool

Reset() public method

public Reset ( ) : void
Returns	void