Property | Type | Description | |
---|---|---|---|
substCount | int |
Method | Description | |
---|---|---|
Substitute ( |
Performs some substitutions on the term to reduce overstemming: (1) umlauts are replaced with their corresponding vowel (äöü -> aou), and "ß" is replaced with "ss"; (2) the second character of a pair of equal characters is replaced with an asterisk (?? -> ?*); (3) some common character combinations are replaced with a token (sch/ch/ei/ie/ig/st -> $/§/%/&/#/!).
|
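To illustrate the substitution rules described for Substitute, here is a minimal Python sketch. It is not the library's implementation: the function name `substitute`, the order of the three passes, and the assumption that the term is already lowercase are all choices made for this example, and the sketch does not model `substCount`.

```python
# Illustrative sketch only; names and pass order are assumptions.
UMLAUTS = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}

# Longest combination first, so "sch" is matched before "ch".
TOKENS = [("sch", "$"), ("ch", "§"), ("ei", "%"),
          ("ie", "&"), ("ig", "#"), ("st", "!")]


def substitute(term: str) -> str:
    """Apply the three substitution rules to a lowercase term."""
    # Rule 1: umlauts become plain vowels, "ß" becomes "ss".
    text = "".join(UMLAUTS.get(ch, ch) for ch in term)

    # Rule 2: the second character of an equal pair becomes "*".
    chars = list(text)
    for i in range(1, len(chars)):
        if chars[i] == chars[i - 1]:
            chars[i] = "*"
    text = "".join(chars)

    # Rule 3: common character combinations become single tokens.
    out = []
    i = 0
    while i < len(text):
        for combo, token in TOKENS:
            if text.startswith(combo, i):
                out.append(token)
                i += len(combo)
                break
        else:
            # No combination starts here; keep the character as it is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Under these assumptions, `substitute("schiff")` gives `$if*` and `substitute("straße")` gives `!ras*e`: the "ß" first becomes "ss", and the second "s" is then masked by the pair rule.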
Method | Description | |
---|---|---|
IsStemmable ( String term ) : bool |
Checks if a term could be stemmed.
|
|
Optimize ( |
Performs some optimizations on the term. These optimizations are contextual.
|
|
RemoveParticleDenotion ( |
Removes a particle denotation ("ge") from a term.
|
|
Resubstitute ( |
Undoes the changes made by Substitute(), namely the character pairs and character combinations. Umlauts remain as their corresponding vowels, and "ß" remains as "ss".
|
|
Stem ( String term ) : String |
Stems the given term to a unique discriminator.
|
|
Strip ( |
Performs suffix stripping (stemming) on the current term. The stripping is reduced to the seven "base" suffixes "e", "s", "n", "t", "em", "er" and "nd", from which all regular suffixes are built. This simplification causes some overstemming and many more irregular stems, but still provides unique discriminators in most of those cases. The algorithm is context-free, except for the length restrictions.
|
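The Resubstitute and Strip descriptions above can be sketched in Python as follows. This is a hedged illustration, not the library's code: the function names, the order in which suffixes are tried, and the `min_stem_len` parameter are assumptions. In particular, `min_stem_len` is a hypothetical stand-in for the length restrictions, whose exact values this page does not give.

```python
# Illustrative sketch only; names, suffix order and the length
# threshold are assumptions, not taken from the library.
EXPANSIONS = {"$": "sch", "§": "ch", "%": "ei",
              "&": "ie", "#": "ig", "!": "st"}

# The seven "base" suffixes, two-letter ones tried first.
BASE_SUFFIXES = ("nd", "em", "er", "e", "s", "n", "t")


def resubstitute(term: str) -> str:
    """Undo pair masking and combination tokens (umlauts stay as vowels)."""
    out = ""
    for ch in term:
        if ch == "*" and out:
            # "*" stands for a repeat of the preceding character.
            out += out[-1]
        else:
            out += EXPANSIONS.get(ch, ch)
    return out


def strip(term: str, min_stem_len: int = 3) -> str:
    """Repeatedly remove base suffixes while the stem stays long enough."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in BASE_SUFFIXES:
            remaining = len(term) - len(suffix)
            if term.endswith(suffix) and remaining >= min_stem_len:
                term = term[:-len(suffix)]
                stripped = True
                break
    return term
```

With the assumed threshold of 3, `strip("kindern")` removes "n" and then "er" and stops at `kind`, because removing "nd" would leave too short a stem. `resubstitute("$if*")` restores `schiff`, while `resubstitute("!ras*e")` gives `strasse`, since the original "ß" is not recovered.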
protected Substitute ( |
||
buffer | ||
Returns | void |