C# Class Lucene.Net.Analysis.De.GermanStemmer

A stemmer for German words.

The algorithm is based on the report "A Fast and Simple Stemming Algorithm for German Words" by Jörg Caumanns (joerg.caumanns at isst.fhg.de).

Project: synhershko/lucene.net

Protected Properties

Property Type Description
substCount : int

Number of characters removed by Substitute() while stemming.

Protected Methods

Method Description
Substitute ( StringBuilder buffer ) : void

Performs some substitutions on the term to reduce overstemming:
- Substitute umlauts with their corresponding vowel: äöü -> aou; "ß" is substituted by "ss".
- Substitute the second character of a pair of equal characters with an asterisk: ?? -> ?* (where ? stands for any character).
- Substitute some common character combinations with a token: sch/ch/ei/ie/ig/st -> $/§/%/&/#/!
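The masking rules above can be illustrated with a small standalone method. This is a minimal sketch, not the Lucene.Net implementation: the class `GermanSubstitutionSketch` and its helper `MatchesAt` are hypothetical, the sketch returns the masked string instead of mutating a shared buffer, and it does not update `substCount` the way the real `Substitute()` does. The order of checks is assumed to follow the Java GermanStemmer this class was ported from.

```csharp
using System.Text;

// Hypothetical helper class: illustrates the masking rules only and does not
// track substCount the way GermanStemmer.Substitute() does.
public static class GermanSubstitutionSketch
{
    // Character combinations and their tokens; "sch" must be tried before "ch".
    private static readonly (string Seq, char Token)[] Combinations =
    {
        ("sch", '$'), ("ch", '§'), ("ei", '%'), ("ie", '&'), ("ig", '#'), ("st", '!'),
    };

    public static string Substitute(string term)
    {
        var buffer = new StringBuilder(term);
        for (int c = 0; c < buffer.Length; c++)
        {
            if (c > 0 && buffer[c] == buffer[c - 1])
                buffer[c] = '*';                      // second char of an equal pair -> '*'
            else if (buffer[c] == 'ä') buffer[c] = 'a';
            else if (buffer[c] == 'ö') buffer[c] = 'o';
            else if (buffer[c] == 'ü') buffer[c] = 'u';
            else if (buffer[c] == 'ß')
            {
                buffer[c] = 's';                      // 'ß' -> "ss"; the inserted 's' is
                buffer.Insert(c + 1, 's');            // masked as '*' on the next iteration
            }

            // Mask the first matching combination that starts at position c.
            foreach (var (seq, token) in Combinations)
            {
                if (MatchesAt(buffer, c, seq))
                {
                    buffer[c] = token;                    // token takes the first position
                    buffer.Remove(c + 1, seq.Length - 1); // drop the remaining characters
                    break;
                }
            }
        }
        return buffer.ToString();
    }

    // True if seq occurs in buffer starting at index.
    private static bool MatchesAt(StringBuilder buffer, int index, string seq)
    {
        if (index + seq.Length > buffer.Length) return false;
        for (int i = 0; i < seq.Length; i++)
            if (buffer[index + i] != seq[i]) return false;
        return true;
    }
}
```

Under these assumptions, "schiff" is masked as "$if*" and "straße" as "!ras*e": the doubled consonant and the expanded "ß" both end up as a character followed by an asterisk.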

Private Methods

Method Description
IsStemmable ( String term ) : bool

Checks whether a term can be stemmed, i.e. whether it consists of letters only.
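A minimal sketch of this check, assuming (as in the Java GermanStemmer this class was ported from) that a term is stemmable only when every character is a letter. The class name `StemmabilitySketch` is hypothetical; `char.IsLetter` is the standard .NET test.

```csharp
// Hypothetical standalone version of the letters-only check.
public static class StemmabilitySketch
{
    // Returns true only if every character of the term is a letter, so terms
    // containing digits, hyphens or punctuation are left unstemmed.
    public static bool IsStemmable(string term)
    {
        foreach (char ch in term)
        {
            if (!char.IsLetter(ch))
                return false;
        }
        return true;
    }
}
```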

Optimize ( StringBuilder buffer ) : void

Performs some additional optimizations on the term. These optimizations are contextual.
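A hedged sketch of two contextual rules that the Java GermanStemmer, from which this class was ported, applies at this step; whether this Lucene.Net version matches exactly is an assumption. A masked feminine plural ending "erin*" (as left over from "-erinnen") is unmasked and stripped again, and a trailing "z" becomes "x" for irregular plurals such as "Matrizen" -> "Matrix". The class `OptimizeSketch` is hypothetical, and the re-stripping step is passed in as a delegate to keep the example self-contained.

```csharp
using System;
using System.Text;

// Hypothetical standalone version of the contextual optimizations.
public static class OptimizeSketch
{
    // strip: stands in for the Strip() step, which is re-run after unmasking.
    public static void Optimize(StringBuilder buffer, Action<StringBuilder> strip)
    {
        // Feminine plurals of professions and inhabitants: "lehrerinnen" is masked
        // and stripped to "lehrerin*"; drop the '*' and strip again.
        if (buffer.Length > 5 && buffer.ToString(buffer.Length - 5, 5) == "erin*")
        {
            buffer.Remove(buffer.Length - 1, 1);
            strip(buffer);
        }

        // Irregular plurals such as "Matrizen" -> "Matrix": trailing 'z' becomes 'x'.
        if (buffer.Length > 0 && buffer[buffer.Length - 1] == 'z')
        {
            buffer[buffer.Length - 1] = 'x';
        }
    }
}
```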

RemoveParticleDenotion ( StringBuilder buffer ) : void

Removes a particle denotation ("ge") from a term.
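In the Java GermanStemmer this class was ported from, this step is narrower than the one-line description suggests: it looks for the first occurrence of "gege" in a term longer than four characters and removes the leading "ge". The sketch below assumes Lucene.Net behaves the same way; the class `ParticleSketch` is hypothetical.

```csharp
using System.Text;

// Hypothetical standalone version of the particle removal.
public static class ParticleSketch
{
    public static string RemoveParticleDenotion(string term)
    {
        var buffer = new StringBuilder(term);
        if (buffer.Length > 4)
        {
            for (int c = 0; c + 4 <= buffer.Length; c++)
            {
                // At the first "gege", drop the leading "ge" particle and stop.
                if (buffer.ToString(c, 4) == "gege")
                {
                    buffer.Remove(c, 2);
                    break;
                }
            }
        }
        return buffer.ToString();
    }
}
```

With this rule, "aufgegeb" (the stripped form of "aufgegeben") becomes "aufgeb", the same form that "aufgeben" strips to, while a participle such as "gefund" is left alone.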

Resubstitute ( StringBuilder buffer ) : void

Undoes the changes made by Substitute(), namely the masked character pairs and character combinations. Umlauts remain as their corresponding vowels, and "ß" remains as "ss".
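A minimal sketch of the reverse mapping, assuming the token set listed for Substitute(): each asterisk repeats the preceding character and each token is expanded back into its character combination. The class `ResubstitutionSketch` is hypothetical and works on a string rather than the stemmer's shared buffer.

```csharp
using System.Text;

// Hypothetical standalone version of the resubstitution step.
public static class ResubstitutionSketch
{
    public static string Resubstitute(string masked)
    {
        var buffer = new StringBuilder(masked);
        for (int c = 0; c < buffer.Length; c++)
        {
            switch (buffer[c])
            {
                // '*' repeats the preceding character, undoing the pair masking.
                case '*' when c > 0: buffer[c] = buffer[c - 1]; break;
                // Tokens expand back into their combinations; later indices shift right.
                case '$': buffer[c] = 's'; buffer.Insert(c + 1, "ch"); break;
                case '§': buffer[c] = 'c'; buffer.Insert(c + 1, 'h'); break;
                case '%': buffer[c] = 'e'; buffer.Insert(c + 1, 'i'); break;
                case '&': buffer[c] = 'i'; buffer.Insert(c + 1, 'e'); break;
                case '#': buffer[c] = 'i'; buffer.Insert(c + 1, 'g'); break;
                case '!': buffer[c] = 's'; buffer.Insert(c + 1, 't'); break;
            }
        }
        return buffer.ToString();
    }
}
```

Because "ß" was expanded to "ss" before masking, unmasking "!ras*e" yields "strasse", not "straße", which matches the note that "ß" remains as "ss".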

Stem ( String term ) : String

Stems the given term to a unique discriminator.
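The overall flow can be sketched as a pipeline over a single StringBuilder. The step order assumed here (lowercase, IsStemmable, then Substitute, Strip, Optimize, Resubstitute, RemoveParticleDenotion) is taken from the Java GermanStemmer this class was ported from. The class `StemPipelineSketch` and its delegate parameters are hypothetical; in GermanStemmer the steps are the private and protected methods listed on this page, and `ToLowerInvariant` is used here to avoid culture-dependent casing.

```csharp
using System;
using System.Text;

// Hypothetical pipeline: the individual steps are supplied as delegates so the
// sketch stays self-contained.
public static class StemPipelineSketch
{
    public static string Stem(
        string term,
        Func<string, bool> isStemmable,
        params Action<StringBuilder>[] steps)
    {
        term = term.ToLowerInvariant();
        if (!isStemmable(term))
            return term;               // unstemmable terms come back lowercased only

        var buffer = new StringBuilder(term);
        // Expected order: Substitute, Strip, Optimize, Resubstitute, RemoveParticleDenotion.
        foreach (var step in steps)
            step(buffer);
        return buffer.ToString();
    }
}
```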

Strip ( StringBuilder buffer ) : void

Performs suffix stripping (stemming) on the current term. Stripping is reduced to the seven "base" suffixes "e", "s", "n", "t", "em", "er" and "nd", from which all regular suffixes are built. This simplification causes some overstemming and many more irregular stems, but still yields unique discriminators in most of those cases. The algorithm is context-free, except for the length restrictions.
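A sketch of the stripping loop, assuming the length thresholds used by the Java GermanStemmer this class was ported from: stripping continues while more than three characters remain, "nd" is only removed when the length plus `substCount` exceeds 5, and "em"/"er" only when it exceeds 4. Adding `substCount` compensates for characters that Substitute() folded into tokens. The class `StripSketch` is hypothetical and takes `substCount` as a parameter instead of reading the protected field.

```csharp
using System.Text;

// Hypothetical standalone version of the suffix-stripping loop.
public static class StripSketch
{
    // substCount: characters removed by Substitute(), added back so the length
    // restrictions refer to the term's unmasked length.
    public static string Strip(string masked, int substCount)
    {
        var buffer = new StringBuilder(masked);
        while (buffer.Length > 3)
        {
            int len = buffer.Length;
            string lastTwo = buffer.ToString(len - 2, 2);
            char last = buffer[len - 1];

            if (len + substCount > 5 && lastTwo == "nd")
                buffer.Remove(len - 2, 2);
            else if (len + substCount > 4 && (lastTwo == "em" || lastTwo == "er"))
                buffer.Remove(len - 2, 2);
            else if (last == 'e' || last == 's' || last == 'n' || last == 't')
                buffer.Remove(len - 1, 1);
            else
                break;                 // no base suffix left to strip
        }
        return buffer.ToString();
    }
}
```

Under this sketch, "hauser" (the masked form of "häuser") and "haus" both reduce to "hau", while "kind" is kept intact because it is too short for the "nd" rule unless Substitute() had removed characters from it.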

Method Details

Substitute() protected method

Performs some substitutions on the term to reduce overstemming:
- Substitute umlauts with their corresponding vowel: äöü -> aou; "ß" is substituted by "ss".
- Substitute the second character of a pair of equal characters with an asterisk: ?? -> ?* (where ? stands for any character).
- Substitute some common character combinations with a token: sch/ch/ei/ie/ig/st -> $/§/%/&/#/!
protected Substitute ( StringBuilder buffer ) : void
buffer System.Text.StringBuilder
return void

Property Details

substCount protected property

Number of characters removed by Substitute() while stemming.
protected int substCount
return int