C# (CSharp) Lucene.Net.Analysis.Pt Namespace

Classes

Name Description
PortugueseAnalyzer Analyzer for Portuguese.

You must specify the required Version compatibility when creating PortugueseAnalyzer:

  • As of 3.6, PortugueseLightStemFilter is used for less aggressive stemming.

PortugueseAnalyzer.DefaultSetHolder Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final set the first time.
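
The lazy, once-only loading described for DefaultSetHolder can be illustrated with a nested holder class. This is a minimal sketch of the general idiom, not the Lucene.Net source: StopWordsSketch, LoadCount, and the five placeholder words are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an analyzer class; not part of Lucene.Net.
public static class StopWordsSketch
{
    // Counts loader invocations so the once-only behavior can be observed.
    public static int LoadCount;

    // The first access to this property touches the nested holder, which
    // triggers its type initializer. The CLR runs that initializer exactly
    // once and guards it against concurrent callers.
    public static ISet<string> DefaultStopSet => DefaultSetHolder.DEFAULT_STOP_SET;

    private static class DefaultSetHolder
    {
        internal static readonly ISet<string> DEFAULT_STOP_SET = Load();

        // An explicit static constructor removes the beforefieldinit flag, so
        // initialization is deferred until the holder is first accessed.
        static DefaultSetHolder() { }

        private static ISet<string> Load()
        {
            LoadCount++;
            // Placeholder words (an assumption); a real analyzer would read
            // its stop word resource here.
            return new HashSet<string>(StringComparer.Ordinal) { "de", "a", "o", "que", "e" };
        }
    }
}
```

Reading StopWordsSketch.DefaultStopSet any number of times returns the same set instance, and the loader runs only on the first read.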
PortugueseMinimalStemFilter A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.

To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.
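
The keyword-protection advice above can be sketched as an analysis chain in which SetKeywordMarkerFilter runs before the stem filter. This sketch assumes Lucene.Net 4.8 with the Lucene.Net.Analysis.Common package referenced; constructor signatures may differ in other versions. KeywordProtectionSketch, the field name "body", and the choice of PortugueseStemFilter over the minimal variant are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Pt;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helper for illustration; assumes the Lucene.Net 4.8 API.
public static class KeywordProtectionSketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Returns the analyzed terms; tokens listed in protectedTerms are not stemmed.
    public static List<string> Analyze(string text, params string[] protectedTerms)
    {
        // Case-insensitive set of terms to mark as keywords.
        var keywords = new CharArraySet(MatchVersion, protectedTerms, true);

        using (Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            Tokenizer source = new StandardTokenizer(MatchVersion, reader);
            TokenStream result = new LowerCaseFilter(MatchVersion, source);
            // The marker must sit before the stem filter in the chain.
            result = new SetKeywordMarkerFilter(result, keywords);
            result = new PortugueseStemFilter(result);
            return new TokenStreamComponents(source, result);
        }))
        {
            var terms = new List<string>();
            using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
            {
                ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
                stream.Reset();
                while (stream.IncrementToken())
                {
                    terms.Add(term.ToString());
                }
                stream.End();
            }
            return terms;
        }
    }
}
```

The tokenizer and lowercase steps mirror the Solr field types shown in this listing; only the marker filter is added in front of the stemmer.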

PortugueseMinimalStemFilterFactory Factory for PortugueseMinimalStemFilter.
 <fieldType name="text_ptminstem" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PortugueseMinimalStemFilterFactory"/>
   </analyzer>
 </fieldType>
PortugueseMinimalStemmer Minimal Stemmer for Portuguese

This follows the "RSLP-S" algorithm presented in "A study on the Use of Stemming for Monolingual Ad-Hoc Portuguese Information Retrieval" (Orengo et al.), which is just the plural reduction step of the RSLP algorithm from "A Stemming Algorithm for the Portuguese Language" (Orengo et al.).

PortugueseStemFilter A TokenFilter that applies PortugueseStemmer to stem Portuguese words.

To prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.

PortugueseStemFilterFactory Factory for PortugueseStemFilter.
 <fieldType name="text_ptstem" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PortugueseStemFilterFactory"/>
   </analyzer>
 </fieldType>
PortugueseStemmer Portuguese stemmer implementing the RSLP (Removedor de Sufixos da Lingua Portuguesa) algorithm. This is sometimes also referred to as the Orengo stemmer.
RSLPStemmerBase Base class for stemmers that use a set of RSLP-like stemming steps.

RSLP (Removedor de Sufixos da Lingua Portuguesa) is an algorithm originally designed for stemming the Portuguese language, described in the paper "A Stemming Algorithm for the Portuguese Language" (Orengo et al.).

Since then, a plural-only modification (RSLP-S) and a modification for the Galician language have been implemented. This class parses a configuration file that describes Steps, where each Step contains a set of Rules.

The general rule format is:

{ "suffix", N, "replacement", { "exception1", "exception2", ...}}
where:
  • suffix is the suffix to be removed (such as "inho").
  • N is the min stem size, where stem is defined as the candidate stem after removing the suffix (but before appending the replacement!).
  • replacement is an optional string to append after removing the suffix. This can be the empty string.
  • exceptions is an optional list of exceptions, patterns that should not be stemmed. These patterns can be specified as whole word or suffix (ends-with) patterns, depending upon the exceptions format flag in the step header.
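
The rule format above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the RSLPStemmerBase implementation: RuleSketch and its members are hypothetical names, and the exception-matching mode, which the real format takes from the step header, is passed directly to the rule to keep the example self-contained.

```csharp
using System;
using System.Linq;

// Hypothetical model of one rule: { "suffix", N, "replacement", { exceptions } }.
public sealed class RuleSketch
{
    public string Suffix { get; }
    public int MinStemSize { get; }
    public string Replacement { get; }
    public bool WholeWordExceptions { get; }
    public string[] Exceptions { get; }

    public RuleSketch(string suffix, int minStemSize, string replacement,
                      bool wholeWordExceptions, params string[] exceptions)
    {
        Suffix = suffix;
        MinStemSize = minStemSize;
        Replacement = replacement;
        WholeWordExceptions = wholeWordExceptions;
        Exceptions = exceptions;
    }

    public bool Matches(string word)
    {
        // The word must end with the suffix.
        if (!word.EndsWith(Suffix, StringComparison.Ordinal)) return false;

        // The candidate stem is measured before the replacement is appended.
        if (word.Length - Suffix.Length < MinStemSize) return false;

        // Exceptions are compared as whole words or as ends-with patterns.
        bool isException = WholeWordExceptions
            ? Exceptions.Any(e => e == word)
            : Exceptions.Any(e => word.EndsWith(e, StringComparison.Ordinal));
        return !isException;
    }

    // Strips the suffix and appends the replacement when the rule matches.
    public string Apply(string word) =>
        Matches(word)
            ? word.Substring(0, word.Length - Suffix.Length) + Replacement
            : word;
}
```

With an illustrative rule new RuleSketch("inho", 3, "", true, "caminho", "cominho"), Apply("livrinho") yields "livr", "caminho" is left alone as an exception, and "pinho" is left alone because its candidate stem "p" is shorter than 3 characters.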

A step is an ordered list of rules, with a structure in this format:

{ "name", N, B, { "cond1", "cond2", ... } ... rules ... };
where:
  • name is a name for the step (such as "Plural").
  • N is the min word size. Words shorter than this length bypass the step completely, as an optimization. Note: N can be zero; in that case, this implementation automatically calculates the appropriate value from the underlying rules.
  • B is a "boolean" flag specifying how exceptions in the rules are matched. A value of 1 indicates whole-word pattern matching; a value of 0 indicates that exceptions are actually suffixes and should be matched with ends-with.
  • conds is an optional list of conditions to enter the step at all. If the list is non-empty, a word must end with one of these conditions or it will bypass the step completely, as an optimization.
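
The step structure above can be sketched as follows. SketchRule and SketchStep are hypothetical types, not the Lucene.Net classes, and the sketch assumes that the first matching rule in the ordered list wins, that every step has at least one rule, and that a zero N is resolved as the smallest "suffix length + min stem size" across the rules.

```csharp
using System;
using System.Linq;

// Hypothetical rule holder: { "suffix", N, "replacement", { exceptions } }.
public sealed class SketchRule
{
    public readonly string Suffix;
    public readonly int MinStemSize;
    public readonly string Replacement;
    public readonly string[] Exceptions;

    public SketchRule(string suffix, int minStemSize, string replacement,
                      params string[] exceptions)
    {
        Suffix = suffix;
        MinStemSize = minStemSize;
        Replacement = replacement;
        Exceptions = exceptions;
    }
}

// Hypothetical step: { "name", N, B, { conds } ... rules ... }.
public sealed class SketchStep
{
    private readonly bool wholeWordExceptions;
    private readonly string[] conds;
    private readonly SketchRule[] rules;

    public string Name { get; }
    public int MinWordSize { get; }

    public SketchStep(string name, int minWordSize, bool wholeWordExceptions,
                      string[] conds, params SketchRule[] rules)
    {
        Name = name;
        this.wholeWordExceptions = wholeWordExceptions;
        this.conds = conds;
        this.rules = rules;
        // N == 0: derive the shortest word that any rule could match.
        MinWordSize = minWordSize > 0
            ? minWordSize
            : rules.Min(r => r.Suffix.Length + r.MinStemSize);
    }

    public string Apply(string word)
    {
        // Short words bypass the step entirely.
        if (word.Length < MinWordSize) return word;

        // With a non-empty cond list, the word must end with one of the conds.
        if (conds.Length > 0 &&
            !conds.Any(c => word.EndsWith(c, StringComparison.Ordinal))) return word;

        // Rules are tried in order; the first match rewrites the word.
        foreach (SketchRule rule in rules)
        {
            if (Matches(rule, word))
                return word.Substring(0, word.Length - rule.Suffix.Length) + rule.Replacement;
        }
        return word;
    }

    private bool Matches(SketchRule rule, string word)
    {
        if (!word.EndsWith(rule.Suffix, StringComparison.Ordinal)) return false;
        if (word.Length - rule.Suffix.Length < rule.MinStemSize) return false;
        bool isException = wholeWordExceptions
            ? rule.Exceptions.Any(e => e == word)
            : rule.Exceptions.Any(e => word.EndsWith(e, StringComparison.Ordinal));
        return !isException;
    }
}
```

As an illustration, a step built with new SketchStep("Plural", 0, true, new[] { "s" }, new SketchRule("ns", 1, "m"), new SketchRule("s", 2, "", "mais", "cais")) computes a min word size of 3, maps "casas" to "casa" and "bons" to "bom", and leaves "mais" (whole-word exception) and "casa" (fails the cond check) unchanged. The rule values here are chosen for the example and are not quoted from the shipped configuration file.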

RSLPStemmerBase.Rule A basic rule, with no exceptions.
RSLPStemmerBase.RuleWithSetExceptions A rule with a set of whole-word exceptions.
RSLPStemmerBase.RuleWithSuffixExceptions A rule with a set of exceptional suffixes.
RSLPStemmerBase.Step A step containing a list of rules.
TestPortugueseLightStemFilter Simple tests for PortugueseLightStemFilter
TestPortugueseLightStemFilter.AnalyzerAnonymousInnerClassHelper
TestPortugueseLightStemFilter.AnalyzerAnonymousInnerClassHelper2
TestPortugueseLightStemFilter.AnalyzerAnonymousInnerClassHelper3
TestPortugueseMinimalStemFilter Simple tests for PortugueseMinimalStemFilter
TestPortugueseMinimalStemFilter.AnalyzerAnonymousInnerClassHelper
TestPortugueseMinimalStemFilter.AnalyzerAnonymousInnerClassHelper2
TestPortugueseMinimalStemFilter.AnalyzerAnonymousInnerClassHelper3
TestPortugueseMinimalStemFilterFactory Simple tests to ensure the Portuguese Minimal stem factory is working.
TestPortugueseStemFilter Simple tests for PortugueseStemFilter
TestPortugueseStemFilter.AnalyzerAnonymousInnerClassHelper
TestPortugueseStemFilter.AnalyzerAnonymousInnerClassHelper2
TestPortugueseStemFilter.AnalyzerAnonymousInnerClassHelper3
TestPortugueseStemFilterFactory Simple tests to ensure the Portuguese stem factory is working.