C# Class Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree

This tree structure stores the hyphenation patterns in an efficient way for fast lookup and provides the method to hyphenate a word. This class has been taken from the Apache FOP project (http://xmlgraphics.apache.org/fop/) and slightly modified.
Inheritance: TernaryTree, PatternConsumer
Project: apache/lucenenet

Protected Properties

Property Type Description
classmap TernaryTree
stoplist IDictionary<string, IList<object>>
vspace Lucene.Net.Analysis.Compound.Hyphenation.ByteVector

Public Methods

Method Description
AddClass ( string chargroup ) : void

Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation. If a word contains a character not defined in any of the classes, it is not hyphenated. A class also defines how to normalize characters so that they can be compared with the stored patterns. Pattern files usually use only lower-case characters; in that case a class for the letter 'a', for example, should be defined as "aA", the first character being the normalization character.

AddException ( string word, List<object> hyphenatedword ) : void

Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.

AddPattern ( string pattern, string ivalue ) : void

Add a pattern to the tree. It is mainly used by the PatternParser class as a callback to add a pattern to the tree.

FindPattern ( string pat ) : string
Hyphenate ( char[] w, int offset, int len, int remainCharCount, int pushCharCount ) : Hyphenation

Hyphenate a word held in a char array and return a Hyphenation object containing the hyphenation points.

Hyphenate ( string word, int remainCharCount, int pushCharCount ) : Hyphenation

Hyphenate word and return a Hyphenation object.

HyphenationTree ( )
LoadPatterns ( FileInfo f ) : void

Read hyphenation patterns from an XML file.

LoadPatterns ( FileInfo f, Encoding encoding ) : void

Read hyphenation patterns from an XML file.

LoadPatterns ( System.Stream source ) : void

Read hyphenation patterns from an XML file.

LoadPatterns ( System.Stream source, Encoding encoding ) : void

Read hyphenation patterns from an XML file.

LoadPatterns ( XmlReader source ) : void
LoadPatterns ( string filename ) : void

Read hyphenation patterns from an XML file.

LoadPatterns ( string filename, Encoding encoding ) : void

Read hyphenation patterns from an XML file.

Protected Methods

Method Description
GetValues ( int k ) : sbyte[]
HStrCmp ( char[] s, int si, char[] t, int ti ) : int

String compare; returns 0 if the strings are equal or if t is a substring of s.

PackValues ( string values ) : int

Packs the values by storing them in 4 bits, two values to a byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored.

SearchPatterns ( char[] word, int index, sbyte[] il ) : void

Search for all possible partial matches of word starting at index and update the interletter values. In other words, it does something like:

for (i = 0; i < patterns.length; i++) { if (word.substring(index).startsWith(patterns[i])) update_interletter_values(patterns[i]); }

But it is done efficiently, since the patterns are stored in a ternary tree. In fact, this is the whole purpose of having the tree: doing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.

UnpackValues ( int k ) : string

Method Details

AddClass() public method

Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation. If a word contains a character not defined in any of the classes, it is not hyphenated. A class also defines how to normalize characters so that they can be compared with the stored patterns. Pattern files usually use only lower-case characters; in that case a class for the letter 'a', for example, should be defined as "aA", the first character being the normalization character.
public AddClass ( string chargroup ) : void
chargroup string
Returns void
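
The normalization rule above can be pictured with a plain lookup table. The following is a minimal sketch, not Lucene.NET code: BuildClassMap and TryNormalize are hypothetical helpers, and a Dictionary stands in for the TernaryTree that the class actually uses for classmap.

```csharp
// C# 9+ top-level program; an illustrative sketch only.
using System;
using System.Collections.Generic;

// Hypothetical helper: maps every character of a group to the group's first character.
static Dictionary<char, char> BuildClassMap(IEnumerable<string> charGroups)
{
    var map = new Dictionary<char, char>();
    foreach (string group in charGroups)
    {
        if (group.Length == 0) continue;
        char normalizationChar = group[0];
        foreach (char c in group) map[c] = normalizationChar;
    }
    return map;
}

// Hypothetical helper: fails when a character belongs to no class,
// which is the case where the word would not be hyphenated.
static bool TryNormalize(string word, Dictionary<char, char> classMap, out string normalized)
{
    var chars = new char[word.Length];
    for (int i = 0; i < word.Length; i++)
    {
        if (!classMap.TryGetValue(word[i], out char mapped))
        {
            normalized = string.Empty;
            return false;
        }
        chars[i] = mapped;
    }
    normalized = new string(chars);
    return true;
}

var classes = BuildClassMap(new[] { "aA", "bB", "cC" });
Console.WriteLine(TryNormalize("CaB", classes, out string result) ? result : "(not hyphenated)");  // prints "cab"
Console.WriteLine(TryNormalize("ca-b", classes, out result) ? result : "(not hyphenated)");        // prints "(not hyphenated)"
```

The second call fails because '-' appears in none of the three classes.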

AddException() public method

Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.
public AddException ( string word, List<object> hyphenatedword ) : void
word string normalized word
hyphenatedword List<object> a vector of alternating strings and Hyphen objects.
Returns void

AddPattern() public method

Add a pattern to the tree. It is mainly used by the PatternParser class as a callback to add a pattern to the tree.
public AddPattern ( string pattern, string ivalue ) : void
pattern string the hyphenation pattern
ivalue string interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
Returns void

FindPattern() public method

public FindPattern ( string pat ) : string
pat string
Returns string

GetValues() protected method

protected GetValues ( int k ) : sbyte[]
k int
Returns sbyte[]

HStrCmp() protected method

String compare; returns 0 if the strings are equal or if t is a substring of s.
protected HStrCmp ( char[] s, int si, char[] t, int ti ) : int
s char[]
si int
t char[]
ti int
Returns int
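
A comparison with this contract can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: both arrays are assumed to be null terminated, "t is a substring of s" is read as "t, starting at ti, matches the beginning of s, starting at si", and PrefixCompare is a hypothetical name.

```csharp
// C# 9+ top-level program; an illustrative sketch only.
using System;

// Hypothetical helper: returns 0 if s (from si) and t (from ti) are equal,
// or if t ends while it still matches s. Otherwise returns the difference
// of the first pair of characters that disagree.
static int PrefixCompare(char[] s, int si, char[] t, int ti)
{
    while (s[si] == t[ti])
    {
        if (s[si] == '\0') return 0;  // both ended together: equal
        si++;
        ti++;
    }
    if (t[ti] == '\0') return 0;      // t ended first: t matches the start of s
    return s[si] - t[ti];
}

char[] longer = "hyphen\0".ToCharArray();
char[] shorter = "hy\0".ToCharArray();
Console.WriteLine(PrefixCompare(longer, 0, shorter, 0));      // prints 0
Console.WriteLine(PrefixCompare(shorter, 0, longer, 0) < 0);  // prints True
```

The result is not symmetric: the shorter string matches the start of the longer one, but not the other way round.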

Hyphenate() public method

Hyphenate a word held in a char array and return a Hyphenation object containing the hyphenation points.
public Hyphenate ( char[] w, int offset, int len, int remainCharCount, int pushCharCount ) : Hyphenation
w char[] char array that contains the word
offset int Offset of the first character of the word
len int Length of the word
remainCharCount int Minimum number of characters allowed before the hyphenation point.
pushCharCount int Minimum number of characters allowed after the hyphenation point.
Returns Hyphenation

Hyphenate() public method

Hyphenate word and return a Hyphenation object.
public Hyphenate ( string word, int remainCharCount, int pushCharCount ) : Hyphenation
word string the word to be hyphenated
remainCharCount int Minimum number of characters allowed before the hyphenation point.
pushCharCount int Minimum number of characters allowed after the hyphenation point.
Returns Hyphenation
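
The effect of remainCharCount and pushCharCount can be illustrated without the library. The sketch below assumes that a break position is expressed as the number of characters in front of the break. FilterBreaks is a hypothetical helper and the candidate positions are made up for the example, so this shows the constraint rather than how Hyphenate is implemented.

```csharp
// C# 9+ top-level program; an illustrative sketch only.
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps the break positions that leave at least
// remainCharCount characters before the break and pushCharCount after it.
static List<int> FilterBreaks(int wordLength, IEnumerable<int> candidates, int remainCharCount, int pushCharCount)
{
    var kept = new List<int>();
    foreach (int position in candidates)
    {
        int before = position;
        int after = wordLength - position;
        if (before >= remainCharCount && after >= pushCharCount) kept.Add(position);
    }
    return kept;
}

// "hyphenation" has 11 characters; the candidate list is invented for the example.
var breaks = FilterBreaks(11, new[] { 1, 2, 6, 10 }, remainCharCount: 2, pushCharCount: 2);
Console.WriteLine(string.Join(", ", breaks));  // prints "2, 6"
```

Position 1 is dropped because only one character would remain before the break, and position 10 because only one character would be pushed after it.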

HyphenationTree() public constructor

public HyphenationTree ( )

LoadPatterns() public method

Read hyphenation patterns from an XML file.
Throws an exception if parsing fails.
public LoadPatterns ( FileInfo f ) : void
f FileInfo the pattern file
Returns void

LoadPatterns() public method

Read hyphenation patterns from an XML file.
Throws an exception if parsing fails.
public LoadPatterns ( FileInfo f, Encoding encoding ) : void
f FileInfo the pattern file
encoding System.Text.Encoding
Returns void

LoadPatterns() public method

Read hyphenation patterns from an XML file.
Throws an exception if parsing fails.
public LoadPatterns ( System.Stream source ) : void
source System.Stream the input source for the file
Returns void

LoadPatterns() public method

Read hyphenation patterns from an XML file.
Throws an exception if parsing fails.
public LoadPatterns ( System.Stream source, Encoding encoding ) : void
source System.Stream the input source for the file
encoding System.Text.Encoding
Returns void

LoadPatterns() public method

public LoadPatterns ( XmlReader source ) : void
source XmlReader
Returns void

LoadPatterns() public method

Read hyphenation patterns from an XML file.
Throws an exception if parsing fails.
public LoadPatterns ( string filename ) : void
filename string
Returns void

LoadPatterns() public method

Read hyphenation patterns from an XML file.
Throws an exception if parsing fails.
public LoadPatterns ( string filename, Encoding encoding ) : void
filename string
encoding System.Text.Encoding
Returns void

PackValues() protected method

Packs the values by storing them in 4 bits, two values to a byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored.
protected PackValues ( string values ) : int
values string a string of digits from '0' to '9' representing the interletter values.
Returns int
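
The packing scheme described above can be sketched in a few lines. This is an illustration, not the library's code: Pack and Unpack are hypothetical helpers, the first value of each pair is assumed to go into the high 4 bits, and the sketch returns the packed bytes directly, whereas PackValues returns an int.

```csharp
// C# 9+ top-level program; an illustrative sketch only.
using System;
using System.Text;

// Hypothetical helper: stores each digit as (digit + 1) in 4 bits, two per byte,
// followed by a zero byte as terminator.
static byte[] Pack(string values)
{
    int n = values.Length;
    var packed = new byte[(n + 1) / 2 + 1];  // last byte stays 0
    for (int i = 0; i < n; i++)
    {
        if (values[i] < '0' || values[i] > '9')
            throw new ArgumentException("Only digits '0' to '9' are allowed.", nameof(values));
        int nibble = values[i] - '0' + 1;    // 1..10, so 0 remains free as terminator
        if ((i & 1) == 0)
            packed[i / 2] = (byte)(nibble << 4);  // first value of the pair: high 4 bits
        else
            packed[i / 2] |= (byte)nibble;        // second value of the pair: low 4 bits
    }
    return packed;
}

// Hypothetical helper: reads 4-bit values until it meets the zero terminator.
static string Unpack(byte[] packed)
{
    var sb = new StringBuilder();
    foreach (byte b in packed)
    {
        int high = b >> 4;
        if (high == 0) break;
        sb.Append((char)('0' + high - 1));
        int low = b & 0x0F;
        if (low == 0) break;
        sb.Append((char)('0' + low - 1));
    }
    return sb.ToString();
}

byte[] bytes = Pack("0102");
Console.WriteLine(BitConverter.ToString(bytes));  // prints "12-13-00"
Console.WriteLine(Unpack(bytes));                 // prints "0102"
```

Adding 1 is what lets a stored 0 mean "end of values" even though '0' is itself a valid interletter value.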

SearchPatterns() protected method

Search for all possible partial matches of word starting at index and update the interletter values. In other words, it does something like:

for (i = 0; i < patterns.length; i++) { if (word.substring(index).startsWith(patterns[i])) update_interletter_values(patterns[i]); }

But it is done efficiently, since the patterns are stored in a ternary tree. In fact, this is the whole purpose of having the tree: doing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.

protected SearchPatterns ( char[] word, int index, sbyte[] il ) : void
word char[] null terminated word to match
index int start index from word
il sbyte[] interletter values array to update
Returns void
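
For comparison, the naive loop that the tree replaces can be written out as runnable code. This is a sketch of the slow approach only, not of the ternary tree search. SearchPatternsNaive is a hypothetical helper, the two patterns are invented for the example, and keeping the larger value at each position is an assumption about what update_interletter_values does.

```csharp
// C# 9+ top-level program; an illustrative sketch only.
using System;
using System.Collections.Generic;

// Hypothetical helper: tests every pattern against the word at the given index
// and merges the values of each matching pattern into il.
static void SearchPatternsNaive(string word, int index, int[] il, Dictionary<string, int[]> patterns)
{
    foreach (KeyValuePair<string, int[]> entry in patterns)
    {
        if (!word.Substring(index).StartsWith(entry.Key, StringComparison.Ordinal)) continue;
        int[] values = entry.Value;
        for (int k = 0; k < values.Length && index + k < il.Length; k++)
        {
            // Assumption: the larger interletter value wins at each position.
            il[index + k] = Math.Max(il[index + k], values[k]);
        }
    }
}

// Invented patterns: letters mapped to one value per gap (pattern length + 1 values).
var demoPatterns = new Dictionary<string, int[]>
{
    ["hyph"] = new[] { 0, 0, 3, 0, 0 },
    ["hen"] = new[] { 0, 0, 2, 0 }
};
string sample = "hyphen";
var levels = new int[sample.Length + 1];
for (int i = 0; i < sample.Length; i++) SearchPatternsNaive(sample, i, levels, demoPatterns);
Console.WriteLine(string.Join(", ", levels));  // prints "0, 0, 3, 0, 0, 2, 0"
```

Every call walks the whole pattern set, which is the cost the ternary tree avoids when there are thousands of patterns.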

UnpackValues() protected method

protected UnpackValues ( int k ) : string
k int
Returns string

Property Details

classmap protected property

This map stores the character classes.
protected TernaryTree classmap
Returns TernaryTree

stoplist protected property

This map stores hyphenation exceptions.
protected IDictionary<string, IList<object>> stoplist
Returns IDictionary<string, IList<object>>

vspace protected property

Value space: stores the interletter values.
protected Lucene.Net.Analysis.Compound.Hyphenation.ByteVector vspace
Returns Lucene.Net.Analysis.Compound.Hyphenation.ByteVector