C# class Lucene.Net.Analysis.De.GermanStemmer

A stemmer for German words.

The algorithm is based on the report "A Fast and Simple Stemming Algorithm for German Words" by Jörg Caumanns (joerg.caumanns at isst.fhg.de).

Project: synhershko/lucene.net

Protected properties

Property Type Description
substCount int Number of characters that are removed by Substitute() while stemming.

Protected methods

Method Description
Substitute ( StringBuilder buffer ) : void

Do some substitutions for the term to reduce overstemming:
- Substitute umlauts with their corresponding vowel: äöü -> aou; "ß" is substituted by "ss"
- Substitute the second char of a pair of equal characters with an asterisk: ?? -> ?*
- Substitute some common character combinations with a token: sch/ch/ei/ie/ig/st -> $/§/%/&/#/!

Private methods

Method Description
IsStemmable ( String term ) : bool

Checks if a term could be stemmed.

Optimize ( StringBuilder buffer ) : void

Does some optimizations on the term. These optimizations are contextual.

RemoveParticleDenotion ( StringBuilder buffer ) : void

Removes a participle prefix ("ge") from a term.

Resubstitute ( StringBuilder buffer ) : void

Undoes the changes made by Substitute(), that is, the masking of character pairs and character combinations. Umlauts remain as their corresponding vowel, and "ß" remains as "ss".

Stem ( String term ) : String

Stems the given term to a unique discriminator.

Strip ( StringBuilder buffer ) : void

Suffix stripping (stemming) on the current term. The stripping is reduced to the seven "base" suffixes "e", "s", "n", "t", "em", "er" and "nd", from which all regular suffixes are built. This simplification causes some overstemming and many more irregular stems, but still provides unique discriminators in most of those cases. The algorithm is context free, except for the length restrictions.
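
The suffix stripping described for Strip() can be sketched as a small loop. This is a standalone illustration and not the library's source: the class name `StripSketch` and both length constants are assumptions, since the description only states that length restrictions exist without giving their values.

```csharp
using System;
using System.Text;

public static class StripSketch
{
    // Assumed limits (not taken from the documentation above).
    const int MinLengthToStrip = 4;          // stop once 3 or fewer characters remain
    const int MinLengthForTwoCharSuffix = 5; // "nd", "em", "er" need a longer term

    public static string Strip(string term)
    {
        var buffer = new StringBuilder(term);
        bool doMore = true;
        while (doMore && buffer.Length >= MinLengthToStrip)
        {
            string current = buffer.ToString();
            bool hasTwoCharSuffix =
                current.EndsWith("nd", StringComparison.Ordinal) ||
                current.EndsWith("em", StringComparison.Ordinal) ||
                current.EndsWith("er", StringComparison.Ordinal);

            if (buffer.Length >= MinLengthForTwoCharSuffix && hasTwoCharSuffix)
            {
                buffer.Length -= 2; // drop "nd", "em" or "er"
            }
            else if ("esnt".IndexOf(current[current.Length - 1]) >= 0)
            {
                buffer.Length -= 1; // drop "e", "s", "n" or "t"
            }
            else
            {
                doMore = false;     // no base suffix left
            }
        }
        return buffer.ToString();
    }
}
```

With these assumed limits, `StripSketch.Strip("kindern")` yields "kind" and `StripSketch.Strip("laufend")` yields "lauf", while `StripSketch.Strip("hauses")` yields "hau", a small instance of the overstemming mentioned above.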

Method details

Substitute() protected method

Do some substitutions for the term to reduce overstemming:
- Substitute umlauts with their corresponding vowel: äöü -> aou; "ß" is substituted by "ss"
- Substitute the second char of a pair of equal characters with an asterisk: ?? -> ?*
- Substitute some common character combinations with a token: sch/ch/ei/ie/ig/st -> $/§/%/&/#/!
protected Substitute ( StringBuilder buffer ) : void
buffer System.Text.StringBuilder
Returns void
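
The three substitutions, and their reversal as described for Resubstitute(), can be illustrated with a minimal standalone sketch. It is written from the description on this page rather than copied from the library: the class `SubstitutionSketch`, the helper `StartsWithAt`, and the string-in, string-out signatures are hypothetical, and the sketch does not maintain substCount.

```csharp
using System;
using System.Text;

public static class SubstitutionSketch
{
    // Combinations and their tokens, in the order listed above.
    // "sch" is checked before "ch" so the longer match wins.
    static readonly string[] Combos = { "sch", "ch", "ei", "ie", "ig", "st" };
    static readonly char[] Tokens = { '$', '§', '%', '&', '#', '!' };

    public static string Substitute(string term)
    {
        var buffer = new StringBuilder(term);
        for (int c = 0; c < buffer.Length; c++)
        {
            // Second char of a pair of equal characters becomes '*'.
            if (c > 0 && buffer[c] == buffer[c - 1]) buffer[c] = '*';
            // Umlauts become their base vowel, "ß" becomes "ss".
            else if (buffer[c] == 'ä') buffer[c] = 'a';
            else if (buffer[c] == 'ö') buffer[c] = 'o';
            else if (buffer[c] == 'ü') buffer[c] = 'u';
            else if (buffer[c] == 'ß') { buffer[c] = 's'; buffer.Insert(c + 1, 's'); }

            // Mask a common character combination with a single token.
            for (int i = 0; i < Combos.Length; i++)
            {
                if (StartsWithAt(buffer, c, Combos[i]))
                {
                    buffer[c] = Tokens[i];
                    buffer.Remove(c + 1, Combos[i].Length - 1);
                    break;
                }
            }
        }
        return buffer.ToString();
    }

    public static string Resubstitute(string masked)
    {
        var result = new StringBuilder();
        foreach (char ch in masked)
        {
            int token = Array.IndexOf(Tokens, ch);
            if (ch == '*' && result.Length > 0) result.Append(result[result.Length - 1]);
            else if (token >= 0) result.Append(Combos[token]);
            else result.Append(ch);
        }
        return result.ToString();
    }

    // Hypothetical helper: true if 'text' occurs in 'buffer' at index 'start'.
    static bool StartsWithAt(StringBuilder buffer, int start, string text)
    {
        if (start + text.Length > buffer.Length) return false;
        for (int i = 0; i < text.Length; i++)
            if (buffer[start + i] != text[i]) return false;
        return true;
    }
}
```

In this sketch, `Substitute("schön")` gives "$on" and `Substitute("straße")` gives "!ras*e". Reversing them with `Resubstitute` gives "schon" and "strasse", which shows that umlauts and "ß" are not restored.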

Property details

substCount protected property

Number of characters that are removed by Substitute() while stemming.
protected int substCount
Returns int