C# Class Lucene.Net.Analysis.Compound.HyphenationCompoundWordTokenFilter

A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a hyphenation grammar and a word dictionary to achieve this.
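The following is a minimal usage sketch, not code from the project. It assumes the Lucene.NET 4.8 analysis package is referenced and that `WhitespaceTokenizer`, `CharArraySet`, `ICharTermAttribute` and `LuceneVersion.LUCENE_48` are available under the namespaces shown. The class name `CompoundFilterUsage`, the method `Tokenize`, the `grammarPath` parameter and the three-word dictionary are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Compound;
using Lucene.Net.Analysis.Compound.Hyphenation;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CompoundFilterUsage
{
    // grammarPath: hypothetical path to a hyphenation grammar XML file.
    public static List<string> Tokenize(string text, string grammarPath)
    {
        // Load the hyphenation grammar through the static factory method.
        HyphenationTree hyphenator =
            HyphenationCompoundWordTokenFilter.GetHyphenationTree(grammarPath);

        // Illustrative dictionary; 'true' asks for case-insensitive matching.
        var dictionary = new CharArraySet(
            LuceneVersion.LUCENE_48,
            new[] { "donau", "dampf", "schiff" },
            true);

        var terms = new List<string>();
        using (TokenStream stream = new HyphenationCompoundWordTokenFilter(
            LuceneVersion.LUCENE_48,
            new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text)),
            hyphenator,
            dictionary))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                // Collect every term the filter emits.
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

Which subwords come back depends on the grammar file and the dictionary supplied, so no output is shown here.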

You must specify the required Version compatibility when creating CompoundWordTokenFilterBase:

  • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.

Inheritance: CompoundWordTokenFilterBase
Project: apache/lucenenet

Public Methods

Method Description
GetHyphenationTree ( FileInfo hyphenationFile ) : HyphenationTree

Create a hyphenator tree

GetHyphenationTree ( FileInfo hyphenationFile, Encoding encoding ) : HyphenationTree

Create a hyphenator tree

GetHyphenationTree ( Stream hyphenationSource ) : HyphenationTree

Create a hyphenator tree

GetHyphenationTree ( Stream hyphenationSource, Encoding encoding ) : HyphenationTree

Create a hyphenator tree

GetHyphenationTree ( string hyphenationFilename ) : HyphenationTree

Create a hyphenator tree

GetHyphenationTree ( string hyphenationFilename, Encoding encoding ) : HyphenationTree

Create a hyphenator tree

HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator )

Create a HyphenationCompoundWordTokenFilter with no dictionary.

Calls HyphenationCompoundWordTokenFilter(matchVersion, input, hyphenator, DEFAULT_MIN_WORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MAX_SUBWORD_SIZE).

HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary )

Creates a new HyphenationCompoundWordTokenFilter instance.

HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch )

Creates a new HyphenationCompoundWordTokenFilter instance.

HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize )

Create a HyphenationCompoundWordTokenFilter with no dictionary.

Calls HyphenationCompoundWordTokenFilter(matchVersion, input, hyphenator, null, minWordSize, minSubwordSize, maxSubwordSize).

Protected Methods

Method Description
Decompose ( ) : void

Method Details

Decompose() protected method

protected Decompose ( ) : void
return void

GetHyphenationTree() public static method

Create a hyphenator tree
Throws IOException if there is a low-level I/O error.
public static GetHyphenationTree ( FileInfo hyphenationFile ) : HyphenationTree
hyphenationFile System.IO.FileInfo the file of the XML grammar to load
return Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

GetHyphenationTree() public static method

Create a hyphenator tree
Throws IOException if there is a low-level I/O error.
public static GetHyphenationTree ( FileInfo hyphenationFile, Encoding encoding ) : HyphenationTree
hyphenationFile System.IO.FileInfo the file of the XML grammar to load
encoding Encoding
return Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

GetHyphenationTree() public static method

Create a hyphenator tree
Throws IOException if there is a low-level I/O error.
public static GetHyphenationTree ( Stream hyphenationSource ) : HyphenationTree
hyphenationSource System.IO.Stream the stream containing the XML grammar
return Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

GetHyphenationTree() public static method

Create a hyphenator tree
Throws IOException if there is a low-level I/O error.
public static GetHyphenationTree ( Stream hyphenationSource, Encoding encoding ) : HyphenationTree
hyphenationSource System.IO.Stream the stream containing the XML grammar
encoding Encoding
return Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

GetHyphenationTree() public static method

Create a hyphenator tree
Throws IOException if there is a low-level I/O error.
public static GetHyphenationTree ( string hyphenationFilename ) : HyphenationTree
hyphenationFilename string the filename of the XML grammar to load
return Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

GetHyphenationTree() public static method

Create a hyphenator tree
Throws IOException if there is a low-level I/O error.
public static GetHyphenationTree ( string hyphenationFilename, Encoding encoding ) : HyphenationTree
hyphenationFilename string the filename of the XML grammar to load
encoding Encoding
return Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

HyphenationCompoundWordTokenFilter() public method

Create a HyphenationCompoundWordTokenFilter with no dictionary.

Calls HyphenationCompoundWordTokenFilter(matchVersion, input, hyphenator, DEFAULT_MIN_WORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MAX_SUBWORD_SIZE).

public HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator )
matchVersion LuceneVersion
input TokenStream
hyphenator Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

HyphenationCompoundWordTokenFilter() public method

Creates a new HyphenationCompoundWordTokenFilter instance.
public HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary )
matchVersion LuceneVersion Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input TokenStream the TokenStream to process
hyphenator Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree the hyphenation pattern tree to use for hyphenation
dictionary CharArraySet the word dictionary to match against.

HyphenationCompoundWordTokenFilter() public method

Creates a new HyphenationCompoundWordTokenFilter instance.
public HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, bool onlyLongestMatch )
matchVersion LuceneVersion Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input TokenStream the TokenStream to process
hyphenator Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree the hyphenation pattern tree to use for hyphenation
dictionary CharArraySet the word dictionary to match against.
minWordSize int only words longer than this get processed
minSubwordSize int only subwords longer than this get to the output stream
maxSubwordSize int only subwords shorter than this get to the output stream
onlyLongestMatch bool add only the longest matching subword to the stream
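To make the roles of the dictionary, the subword size limits and onlyLongestMatch more concrete, here is a simplified, self-contained sketch. It is not Lucene.NET's implementation. The class `CompoundSketch` and its `Decompose` method are hypothetical, the hyphenation points are passed in by hand instead of being computed from a HyphenationTree, and the size limits are treated as inclusive, which is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class CompoundSketch
{
    // Returns the subwords of 'word' that span two hyphenation points,
    // respect the size limits, and are contained in 'dictionary'.
    public static List<string> Decompose(
        string word,
        int[] hyphenationPoints,   // assumed sorted, including 0 and word.Length
        ISet<string> dictionary,   // null means "accept every candidate"
        int minSubwordSize,
        int maxSubwordSize,
        bool onlyLongestMatch)
    {
        var result = new List<string>();
        for (int i = 0; i < hyphenationPoints.Length; i++)
        {
            int start = hyphenationPoints[i];
            string longest = null;
            for (int j = i + 1; j < hyphenationPoints.Length; j++)
            {
                int length = hyphenationPoints[j] - start;
                if (length > maxSubwordSize) break;     // later candidates are even longer
                if (length < minSubwordSize) continue;  // too short to emit

                string part = word.Substring(start, length);
                if (dictionary != null && !dictionary.Contains(part)) continue;

                if (!onlyLongestMatch)
                {
                    result.Add(part);
                }
                else if (longest == null || part.Length > longest.Length)
                {
                    longest = part; // remember the longest match for this start point
                }
            }
            if (longest != null) result.Add(longest);
        }
        return result;
    }
}
```

For example, with the word "Donaudampfschiff", hyphenation points {0, 5, 10, 16}, a case-insensitive set containing donau, dampf and schiff, limits 2 and 15, and onlyLongestMatch set to false, the sketch returns Donau, dampf, schiff.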

HyphenationCompoundWordTokenFilter() public method

Create a HyphenationCompoundWordTokenFilter with no dictionary.

Calls HyphenationCompoundWordTokenFilter(matchVersion, input, hyphenator, null, minWordSize, minSubwordSize, maxSubwordSize).

public HyphenationCompoundWordTokenFilter ( LuceneVersion matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize )
matchVersion LuceneVersion
input TokenStream
hyphenator Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree
minWordSize int
minSubwordSize int
maxSubwordSize int