C# class Lucene.Net.Analysis.Compound.DictionaryCompoundWordTokenFilter

A TokenFilter that decomposes compound words found in many Germanic languages.

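Given a dictionary that contains "donau", "dampf" and "schiff", the token "Donaudampfschiff" is decomposed into Donau, dampf and schiff, so a search for "schiff" alone also finds "Donaudampfschiff". The original token is kept, and the subwords are added alongside it. The filter uses a brute-force algorithm: at every start position in the token, it looks up each substring whose length lies within the configured subword bounds in the dictionary.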
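The brute-force search can be sketched without Lucene.Net. The following is a simplified, hypothetical reimplementation: `CompoundSketch` and its `Decompose` method are illustrative names, not part of the library. It assumes the filter tries every start position and every subword length from `minSubwordSize` to `maxSubwordSize` inclusive, and it returns only the subwords, whereas the real filter also keeps the original token. The default values (5, 2, 15) are intended to mirror the defaults in `CompoundWordTokenFilterBase`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified sketch of brute-force dictionary decomposition.
// Not the Lucene.Net implementation: it works on plain strings and returns
// only the subwords (the real filter also emits the original token).
public static class CompoundSketch
{
    public static List<string> Decompose(
        string word, ISet<string> dictionary,
        int minWordSize = 5, int minSubwordSize = 2, int maxSubwordSize = 15,
        bool onlyLongestMatch = false)
    {
        var result = new List<string>();
        // Words shorter than minWordSize are not decomposed at all.
        if (word.Length < minWordSize) return result;

        // Try every start position...
        for (int i = 0; i <= word.Length - minSubwordSize; i++)
        {
            string longest = null;
            // ...and every subword length in [minSubwordSize, maxSubwordSize]
            // that still fits inside the word.
            for (int j = minSubwordSize; j <= maxSubwordSize && i + j <= word.Length; j++)
            {
                string candidate = word.Substring(i, j);
                if (!dictionary.Contains(candidate)) continue;
                if (onlyLongestMatch)
                    longest = candidate; // j only grows, so the last hit is the longest
                else
                    result.Add(candidate);
            }
            // With onlyLongestMatch, keep one subword per start position.
            if (onlyLongestMatch && longest != null) result.Add(longest);
        }
        return result;
    }

    public static void Main()
    {
        var dict = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
            { "donau", "dampf", "schiff" };
        // Prints: Donau, dampf, schiff
        Console.WriteLine(string.Join(", ", Decompose("Donaudampfschiff", dict)));
    }
}
```

In this sketch, `onlyLongestMatch` picks the longest dictionary hit per start position, so subwords that start at different offsets are all still reported. The cost is roughly one dictionary lookup per (start, length) pair, which is why the length bounds matter.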

You must specify the required LuceneVersion compatibility (the matchVersion argument) when creating a CompoundWordTokenFilterBase subclass such as this filter. As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.

Inherits from: CompoundWordTokenFilterBase

Source project: apache/lucenenet

Public constructors

Constructor	Description
DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary )	Creates a new DictionaryCompoundWordTokenFilter
DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch )	Creates a new DictionaryCompoundWordTokenFilter

Protected methods

Method	Description
Decompose ( ) : void	Decomposes the current token into the dictionary subwords it contains

Method details

Decompose() protected method

protected Decompose ( ) : void
Returns	void

DictionaryCompoundWordTokenFilter() public constructor

Creates a new DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary )
matchVersion	LuceneVersion	Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if the version is later than 3.0. See CompoundWordTokenFilterBase for details.
input	TokenStream	the TokenStream to process
dictionary	CharArraySet	the word dictionary to match against
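A usage sketch for this constructor follows. It assumes a Lucene.Net 4.8 project that references the Lucene.Net and Lucene.Net.Analysis.Common packages; the whitespace tokenizer, the `LuceneVersion.LUCENE_48` value, and the names `DictionaryCompoundExample` and `Analyze` are choices made for this example, not requirements of the filter. The dictionary is created case-insensitively so that "Donau" matches "donau".

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical example class: tokenizes on whitespace, decomposes compounds,
// and collects every emitted term.
public static class DictionaryCompoundExample
{
    public static List<string> Analyze(string text)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Case-insensitive dictionary of known subwords (ignoreCase: true).
        var dictionary = new CharArraySet(version, new[] { "donau", "dampf", "schiff" }, true);

        var terms = new List<string>();
        using (TokenStream stream = new DictionaryCompoundWordTokenFilter(
            version, new WhitespaceTokenizer(version, new StringReader(text)), dictionary))
        {
            var termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                 // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Expected, assuming Lucene.Net 4.8 behavior:
        // Donaudampfschiff Donau dampf schiff
        Console.WriteLine(string.Join(" ", Analyze("Donaudampfschiff")));
    }
}
```

The original token comes first, followed by the subwords; the subword text is cut from the token itself, so it keeps the token's original casing ("Donau", not "donau").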

DictionaryCompoundWordTokenFilter() public constructor

Creates a new DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch )
matchVersion	LuceneVersion	Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if the version is later than 3.0. See CompoundWordTokenFilterBase for details.
input	TokenStream	the TokenStream to process
dictionary	CharArraySet	the word dictionary to match against
minWordSize	int	only words of at least this length are decomposed
minSubwordSize	int	only subwords of at least this length are added to the output stream
maxSubwordSize	int	only subwords of at most this length are added to the output stream
onlyLongestMatch	bool	add only the longest matching subword at each start position to the stream
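The effect of `onlyLongestMatch` can be checked with a sketch like the one below. It makes the same assumptions as a typical Lucene.Net 4.8 setup (Lucene.Net and Lucene.Net.Analysis.Common referenced, a whitespace tokenizer as input); the dictionary, the size values 5, 2 and 15, and the names `LongestMatchExample` and `Analyze` are hypothetical. The dictionary deliberately contains both "dampf" and "dampfschiff", which start at the same offset in "Donaudampfschiff".

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical example class comparing onlyLongestMatch = false / true.
public static class LongestMatchExample
{
    public static List<string> Analyze(string text, bool onlyLongestMatch)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        // "dampf" and "dampfschiff" overlap and share a start position.
        var dictionary = new CharArraySet(version,
            new[] { "donau", "dampf", "dampfschiff", "schiff" }, true);

        var terms = new List<string>();
        using (TokenStream stream = new DictionaryCompoundWordTokenFilter(
            version,
            new WhitespaceTokenizer(version, new StringReader(text)),
            dictionary,
            5,    // minWordSize
            2,    // minSubwordSize
            15,   // maxSubwordSize
            onlyLongestMatch))
        {
            var termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Expected, assuming Lucene.Net 4.8 behavior:
        // Donaudampfschiff Donau dampf dampfschiff schiff
        Console.WriteLine(string.Join(" ", Analyze("Donaudampfschiff", false)));
        // Donaudampfschiff Donau dampfschiff schiff
        Console.WriteLine(string.Join(" ", Analyze("Donaudampfschiff", true)));
    }
}
```

With `onlyLongestMatch` set, "dampf" is dropped in favor of "dampfschiff" because both start at the same offset, but "schiff" is still emitted: the longest-match choice is made per start position, not across the whole token.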