C# (CSharp) Lucene.Net.Analysis.Ar Namespace

Classes

Name Description
ArabicAnalyzer Analyzer for Arabic.

This analyzer implements light stemming as specified in "Light Stemming for Arabic Information Retrieval": http://www.mtholyoke.edu/~lballest/Pubs/arab_stem05.pdf

The analysis package contains three primary components:

  • ArabicNormalizationFilter: Arabic orthographic normalization.
  • ArabicStemFilter: Arabic light stemming.
  • Arabic stop words file: a set of default Arabic stop words.

ArabicAnalyzer.DefaultSetHolder Atomically and lazily loads DEFAULT_STOP_SET the first time the outer class accesses the static set.
ArabicLetterTokenizer
ArabicLetterTokenizerFactory
ArabicNormalizationFilter A TokenFilter that applies ArabicNormalizer to normalize the orthography.
ArabicNormalizationFilterFactory Factory for ArabicNormalizationFilter.
 <fieldType name="text_arnormal" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.ArabicNormalizationFilterFactory"/>
   </analyzer>
 </fieldType>
ArabicNormalizer Normalizer for Arabic.

Normalization is done in-place for efficiency, operating on a termbuffer.

Normalization is defined as:

  • Normalization of hamza with alef seat to a bare alef.
  • Normalization of teh marbuta to heh.
  • Normalization of dotless yeh (alef maksura) to yeh.
  • Removal of Arabic diacritics (the harakat).
  • Removal of tatweel (stretching character).
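
The rules above can be sketched as a single in-place pass over a character buffer. This is a minimal, self-contained illustration and not the library's ArabicNormalizer source: the class name ArabicNormalizationSketch is hypothetical, and the exact code points chosen (including folding alef with madda, and treating U+064B through U+0652 as the harakat) are assumptions made for the example.

```csharp
using System;

// Hypothetical sketch class; not part of Lucene.Net.
public static class ArabicNormalizationSketch
{
    // Normalizes buffer[0..length) in place and returns the new logical length.
    // A read index scans every character; a write index only advances for
    // characters that are kept, so removals need no extra allocation.
    public static int Normalize(char[] buffer, int length)
    {
        int write = 0;
        for (int read = 0; read < length; read++)
        {
            char c = buffer[read];
            switch (c)
            {
                case '\u0622': // alef with madda above (assumed to be folded too)
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    buffer[write++] = '\u0627'; // bare alef
                    break;
                case '\u0649': // dotless yeh (alef maksura)
                    buffer[write++] = '\u064A'; // yeh
                    break;
                case '\u0629': // teh marbuta
                    buffer[write++] = '\u0647'; // heh
                    break;
                case '\u0640': // tatweel: dropped
                    break;
                default:
                    // Assumed harakat range: fathatan (U+064B) through sukun (U+0652).
                    bool isHaraka = c >= '\u064B' && c <= '\u0652';
                    if (!isHaraka) buffer[write++] = c;
                    break;
            }
        }
        return write;
    }

    // Convenience wrapper for strings; copies the term into a scratch buffer first.
    public static string Normalize(string term)
    {
        char[] buffer = term.ToCharArray();
        int newLength = Normalize(buffer, buffer.Length);
        return new string(buffer, 0, newLength);
    }
}
```

Because the write index never overtakes the read index, the buffer can be reused as its own output, which is the in-place property the description refers to.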

ArabicStemFilter A TokenFilter that applies ArabicStemmer to stem Arabic words.

To prevent terms from being stemmed, place an instance of SetKeywordMarkerFilter, or a custom TokenFilter that sets the KeywordAttribute, before this TokenStream.
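
The idea of protecting terms from stemming can be illustrated without the library. The sketch below is conceptual only: KeywordGuardSketch, StemUnlessProtected, and the stem delegate are hypothetical names, and a plain set lookup stands in for the KeywordAttribute that the real filter chain would carry on each token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch class; not part of Lucene.Net.
public static class KeywordGuardSketch
{
    // Yields each token unchanged if it is in protectedTerms,
    // otherwise yields the result of applying the supplied stemmer.
    public static IEnumerable<string> StemUnlessProtected(
        IEnumerable<string> tokens,
        ISet<string> protectedTerms,
        Func<string, string> stem)
    {
        foreach (string token in tokens)
        {
            yield return protectedTerms.Contains(token) ? token : stem(token);
        }
    }
}
```

In the real pipeline the decision is made upstream of the stem filter, which is why the marker filter has to come before it in the chain.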

ArabicStemFilterFactory Factory for ArabicStemFilter.
 <fieldType name="text_arstem" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.ArabicNormalizationFilterFactory"/>
     <filter class="solr.ArabicStemFilterFactory"/>
   </analyzer>
 </fieldType>
ArabicStemmer Stemmer for Arabic.

Stemming is done in-place for efficiency, operating on a termbuffer.

Stemming is defined as:

  • Removal of attached definite article, conjunction, and prepositions.
  • Stemming of common suffixes.
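
A light stemmer of this kind can be sketched as affix stripping on a character buffer. The example below is a simplified illustration, not the library's ArabicStemmer source: the class name ArabicLightStemSketch, the affix lists, and the minimum remaining length of two characters are all assumptions chosen for the example.

```csharp
using System;

// Hypothetical sketch class; not part of Lucene.Net.
public static class ArabicLightStemSketch
{
    // Illustrative prefixes: definite article, optionally preceded by a
    // conjunction or preposition. Longer forms come before the bare waw.
    private static readonly string[] Prefixes =
    {
        "\u0648\u0627\u0644", // waw + alef + lam
        "\u0628\u0627\u0644", // beh + alef + lam
        "\u0643\u0627\u0644", // kaf + alef + lam
        "\u0641\u0627\u0644", // feh + alef + lam
        "\u0627\u0644",       // alef + lam (definite article)
        "\u0644\u0644",       // lam + lam
        "\u0648",             // waw (conjunction)
    };

    // Illustrative common suffixes.
    private static readonly string[] Suffixes =
    {
        "\u0647\u0627", // heh + alef
        "\u0627\u0646", // alef + noon
        "\u0627\u062A", // alef + teh
        "\u0648\u0646", // waw + noon
        "\u064A\u0646", // yeh + noon
        "\u064A\u0647", // yeh + heh
        "\u064A\u0629", // yeh + teh marbuta
        "\u0647",       // heh
        "\u0629",       // teh marbuta
        "\u064A",       // yeh
    };

    // Assumed guard: never strip an affix if fewer than this many characters would remain.
    private const int MinStemLength = 2;

    // Stems buffer[0..length) in place and returns the new logical length.
    public static int Stem(char[] buffer, int length)
    {
        // Strip at most one prefix by shifting the remainder to the front.
        foreach (string prefix in Prefixes)
        {
            if (StartsWith(buffer, length, prefix) && length - prefix.Length >= MinStemLength)
            {
                Array.Copy(buffer, prefix.Length, buffer, 0, length - prefix.Length);
                length -= prefix.Length;
                break;
            }
        }

        // Try each suffix once, in order; stripping only shortens the logical length.
        foreach (string suffix in Suffixes)
        {
            if (EndsWith(buffer, length, suffix) && length - suffix.Length >= MinStemLength)
            {
                length -= suffix.Length;
            }
        }
        return length;
    }

    // Convenience wrapper for strings.
    public static string Stem(string term)
    {
        char[] buffer = term.ToCharArray();
        int newLength = Stem(buffer, buffer.Length);
        return new string(buffer, 0, newLength);
    }

    private static bool StartsWith(char[] buffer, int length, string prefix)
    {
        if (prefix.Length > length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (buffer[i] != prefix[i]) return false;
        }
        return true;
    }

    private static bool EndsWith(char[] buffer, int length, string suffix)
    {
        if (suffix.Length > length) return false;
        int offset = length - suffix.Length;
        for (int i = 0; i < suffix.Length; i++)
        {
            if (buffer[offset + i] != suffix[i]) return false;
        }
        return true;
    }
}
```

The length guard is what keeps the stemmer "light": very short words are left alone rather than reduced to a single letter.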

TestArabicAnalyzer Test the Arabic Analyzer
TestArabicLetterTokenizer
TestArabicNormalizationFilter Test the Arabic Normalization Filter
TestArabicNormalizationFilter.AnalyzerAnonymousInnerClassHelper
TestArabicStemFilter Test the Arabic Stem Filter
TestArabicStemFilter.AnalyzerAnonymousInnerClassHelper