C# (CSharp) Lucene.Net.Analysis.Ckb Namespace

Classes

Name Description
SoraniAnalyzer Analyzer for Sorani Kurdish.
SoraniAnalyzer.DefaultSetHolder Atomically loads DEFAULT_STOP_SET lazily, the first time the outer class accesses the static set.
SoraniNormalizationFilter A TokenFilter that applies SoraniNormalizer to normalize the orthography.
SoraniNormalizationFilterFactory Factory for SoraniNormalizationFilter.
 <fieldType name="text_ckbnormal" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SoraniNormalizationFilterFactory"/>
   </analyzer>
 </fieldType>
SoraniNormalizer Normalizes the Unicode representation of Sorani text.

Normalization consists of:

  • Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
  • Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
  • Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
  • Alternate (joining) form of 'h' (06BE) is converted to 0647
  • Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
  • Harakat, tatweel, and formatting characters such as directional controls are removed.

SoraniStemmer Light stemmer for Sorani.
TestSoraniAnalyzer Tests the Sorani analyzer.
TestSoraniNormalizationFilter Tests normalization for Sorani (this is more critical than stemming...)
TestSoraniNormalizationFilter.AnalyzerAnonymousInnerClassHelper
TestSoraniStemFilter Test the Sorani Stemmer.
TestSoraniStemFilter.AnalyzerAnonymousInnerClassHelper
TestSoraniStemFilterFactory Simple tests to ensure the Sorani stem factory is working.
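
The normalization rules listed under SoraniNormalizer can be illustrated with a small standalone function. This is a minimal sketch written only from that list, not the Lucene.Net implementation: the class name SoraniNormalizationSketch and the method NormalizeToken are hypothetical, the input is assumed to be a single token (one word, no whitespace), harakat are assumed to be the range 064B to 0652, and the code point for Arabic yeh is assumed to be 064A.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical illustration of the rules listed for SoraniNormalizer.
// It does not use or reproduce the Lucene.Net classes.
public static class SoraniNormalizationSketch
{
    // Normalizes one token (a single word without whitespace).
    public static string NormalizeToken(string token)
    {
        var sb = new StringBuilder(token.Length);
        for (int i = 0; i < token.Length; i++)
        {
            char c = token[i];
            switch (c)
            {
                case '\u064A': // ARABIC YEH (assumed code point)
                case '\u0649': // ALEF MAKSURA (dotless yeh)
                    sb.Append('\u06CC'); // FARSI YEH
                    break;
                case '\u0643': // KAF
                    sb.Append('\u06A9'); // KEHEH
                    break;
                case '\u200C': // ZWNJ: a preceding HEH becomes AE, the ZWNJ is dropped
                    if (sb.Length > 0 && sb[sb.Length - 1] == '\u0647')
                    {
                        sb[sb.Length - 1] = '\u06D5';
                    }
                    break;
                case '\u0647': // HEH: only the word-final form becomes AE
                    sb.Append(i == token.Length - 1 ? '\u06D5' : '\u0647');
                    break;
                case '\u0629': // TEH MARBUTA
                    sb.Append('\u06D5'); // AE
                    break;
                case '\u06BE': // HEH DOACHASHMEE (joining form of 'h')
                    sb.Append('\u0647'); // HEH
                    break;
                case '\u0631': // REH: only the word-initial form becomes RREH
                    sb.Append(i == 0 ? '\u0695' : '\u0631');
                    break;
                case '\u0692': // REH WITH SMALL V
                    sb.Append('\u0695'); // REH WITH SMALL V BELOW
                    break;
                case '\u0640': // TATWEEL is removed
                    break;
                default:
                    // Harakat (assumed here to be 064B..0652) are removed.
                    if (c >= '\u064B' && c <= '\u0652') break;
                    // Formatting characters such as directional controls are removed.
                    if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format) break;
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

For example, SoraniNormalizationSketch.NormalizeToken("\u0643") returns "\u06A9" (KAF becomes KEHEH). In a real pipeline the same effect would normally come from placing SoraniNormalizationFilter after the tokenizer, as in the field type shown for SoraniNormalizationFilterFactory.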