Name | Description |
---|---|
ASCIIFoldingFilter | This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. Characters from a number of other Unicode blocks are converted, but only those characters that have a reasonable ASCII alternative. |
ASCIIFoldingFilterFactory | Factory for ASCIIFoldingFilter. <fieldType name="text_ascii" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/> </analyzer> </fieldType> |
CapitalizationFilter | A filter to apply normal capitalization rules to Tokens. It will make the first letter capital and the rest lower case. This filter is particularly useful to build nice looking facet parameters. This filter is not appropriate if you intend to use a prefix query. |
CapitalizationFilterFactory | Factory for CapitalizationFilter. The factory takes several optional parameters, as in this example: <fieldType name="text_cptlztn" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.CapitalizationFilterFactory" onlyFirstWord="true" keep="java solr lucene" keepIgnoreCase="false" okPrefix="McK McD McA"/> </analyzer> </fieldType> @since solr 1.3 |
CodepointCountFilter | Removes words that are too long or too short from the stream. Note: Length is calculated as the number of Unicode codepoints. |
CodepointCountFilterFactory | Factory for CodepointCountFilter. <fieldType name="text_lngth" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.CodepointCountFilterFactory" min="0" max="1" /> </analyzer> </fieldType> |
EmptyTokenStream | An always exhausted token stream. |
HyphenatedWordsFilterFactory | Factory for HyphenatedWordsFilter. <fieldType name="text_hyphn" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.HyphenatedWordsFilterFactory"/> </analyzer> </fieldType> |
InjectablePrefixAwareTokenFilter | |
KeepWordFilter | A TokenFilter that only keeps tokens with text contained in the required words. This filter behaves like the inverse of StopFilter. @since solr 1.3 |
KeepWordFilterFactory | Factory for KeepWordFilter. <fieldType name="text_keepword" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="false"/> </analyzer> </fieldType> |
KeywordMarkerFilterFactory | Factory for KeywordMarkerFilter. <fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.KeywordMarkerFilterFactory" protected="protectedkeyword.txt" pattern="^.+er$" ignoreCase="false"/> </analyzer> </fieldType> |
KeywordRepeatFilterFactory | Factory for KeywordRepeatFilter. Because KeywordRepeatFilter emits two tokens for every input token, any token that is not transformed later in the analysis chain will end up in the document twice. Therefore, consider adding RemoveDuplicatesTokenFilterFactory later in the analysis chain (see the Java sketch after this table). |
LengthFilter | Removes words that are too long or too short from the stream. Note: Length is calculated as the number of UTF-16 code units. |
LengthFilterFactory | Factory for LengthFilter. <fieldType name="text_lngth" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LengthFilterFactory" min="0" max="1" /> </analyzer> </fieldType> |
LimitTokenCountFilter | This TokenFilter limits the number of tokens while indexing. It is a replacement for the maximum field length setting inside org.apache.lucene.index.IndexWriter. By default, this filter ignores any tokens in the wrapped TokenStream once the limit has been reached, which can result in reset() being called prior to incrementToken() returning false. For most TokenStream implementations this should be acceptable, and faster than consuming the full stream. If you are wrapping a TokenStream which requires that the full stream of tokens be exhausted in order to function properly, use the consumeAllTokens option of the LimitTokenCountFilter(TokenStream, int, boolean) constructor (see the Java sketch after this table). |
LimitTokenPositionFilter | This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit. By default, this filter ignores any tokens in the wrapped TokenStream once the limit has been exceeded, which can result in reset() being called prior to incrementToken() returning false. For most TokenStream implementations this should be acceptable, and faster than consuming the full stream. If you are wrapping a TokenStream which requires that the full stream of tokens be exhausted in order to function properly, use the consumeAllTokens option of the LimitTokenPositionFilter(TokenStream, int, boolean) constructor. |
LimitTokenPositionFilterFactory | Factory for LimitTokenPositionFilter. <fieldType name="text_limit_pos" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LimitTokenPositionFilterFactory" maxTokenPosition="3" consumeAllTokens="false" /> </analyzer> </fieldType> The consumeAllTokens property is optional and defaults to false. See LimitTokenPositionFilter for an explanation of its use. |
Lucene47WordDelimiterFilter | |
Lucene47WordDelimiterFilter.WordDelimiterConcatenation | A WDF concatenated 'run' |
PatternAnalyzer | |
PatternAnalyzer.FastStringReader | |
PatternAnalyzer.FastStringTokenizer | |
PatternAnalyzer.RegexTokenizer | |
PatternAnalyzerTest | Verifies the behavior of PatternAnalyzer. |
PerFieldAnalyzerWrapper | This analyzer is used to facilitate scenarios where different fields require different analysis techniques. Use the Map argument of the PerFieldAnalyzerWrapper(Analyzer, java.util.Map) constructor to add non-default analyzers for specific fields. For example, StandardAnalyzer can be used for all fields except "firstname" and "lastname", for which KeywordAnalyzer is used (see the Java sketch after this table). A PerFieldAnalyzerWrapper can be used like any other analyzer, for both indexing and query parsing. |
PrefixAndSuffixAwareTokenFilter | Links two PrefixAwareTokenFilter instances. NOTE: This filter might not behave correctly if used with custom Attributes, i.e. Attributes other than the ones located in org.apache.lucene.analysis.tokenattributes. |
PrefixAndSuffixAwareTokenFilter.PrefixAwareTokenFilterAnonymousInnerClassHelper | |
PrefixAndSuffixAwareTokenFilter.PrefixAwareTokenFilterAnonymousInnerClassHelper2 | |
PrefixAwareTokenFilter | Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token. The default implementation adds last prefix token end offset to the suffix token start and end offsets. NOTE: This filter might not behave correctly if used with custom Attributes, i.e. Attributes other than the ones located in Lucene.Net.Analysis.TokenAttributes. |
RemoveDuplicatesTokenFilter | A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream. |
RemoveDuplicatesTokenFilterFactory | Factory for RemoveDuplicatesTokenFilter. <fieldType name="text_rmdup" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType> |
ScandinavianFoldingFilter | This filter folds the Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminates against the use of double vowels aa, ae, ao, oe and oo, leaving just the first one. It is a semantically more destructive solution than ScandinavianNormalizationFilter, but it can in addition help with matching raksmorgas as räksmörgås. blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas Background: Swedish åäö are in fact the same letters as Norwegian and Danish åæø and are thus interchangeable between these languages. They are, however, folded differently when people type them on a keyboard lacking these characters. In that situation almost all Swedish people use a, a, o instead of å, ä, ö. Norwegians and Danes, on the other hand, usually type aa, ae and oe instead of å, æ and ø. Some do, however, use a, a, o, oo, ao and sometimes permutations of everything above. This filter solves that mismatch problem, but it might also cause new ones. |
ScandinavianFoldingFilterFactory | Factory for ScandinavianFoldingFilter. <fieldType name="text_scandfold" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ScandinavianFoldingFilterFactory"/> </analyzer> </fieldType> |
ScandinavianNormalizationFilter | This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ. It is a semantically less destructive solution than ScandinavianFoldingFilter, most useful when a person with a Norwegian or Danish keyboard queries a Swedish index and vice versa. This filter does not perform the common Swedish folds of å and ä to a, nor ö to o. blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej but not blabarsyltetoj räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas but not raksmorgas |
ScandinavianNormalizationFilterFactory | Factory for org.apache.lucene.analysis.miscellaneous.ScandinavianNormalizationFilter. <fieldType name="text_scandnorm" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ScandinavianNormalizationFilterFactory"/> </analyzer> </fieldType> |
SetKeywordMarkerFilter | Marks terms as keywords via the KeywordAttribute. Each token contained in the provided set is marked as a keyword by setting KeywordAttribute#setKeyword(boolean) to true. |
SingleTokenTokenStream | A TokenStream containing a single token. |
StemmerOverrideFilter | Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming (see the Java sketch after this table). |
StemmerOverrideFilter.Builder | This builder builds an FST for the StemmerOverrideFilter |
StemmerOverrideFilter.StemmerOverrideMap | A read-only 4-byte FST-backed map that allows fast case-insensitive key-value lookups for StemmerOverrideFilter. |
StemmerOverrideFilterFactory | Factory for StemmerOverrideFilter. <fieldType name="text_dicstem" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StemmerOverrideFilterFactory" dictionary="dictionary.txt" ignoreCase="false"/> </analyzer> </fieldType> |
TestASCIIFoldingFilter | |
TestASCIIFoldingFilter.AnalyzerAnonymousInnerClassHelper | |
TestASCIIFoldingFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestCapitalizationFilter | Tests CapitalizationFilter |
TestCapitalizationFilter.AnalyzerAnonymousInnerClassHelper | |
TestCapitalizationFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestCapitalizationFilterFactory | |
TestCodepointCountFilter | |
TestCodepointCountFilter.AnalyzerAnonymousInnerClassHelper | |
TestCodepointCountFilterFactory | |
TestEmptyTokenStream | |
TestHyphenatedWordsFilter | HyphenatedWordsFilter test |
TestHyphenatedWordsFilter.AnalyzerAnonymousInnerClassHelper | |
TestHyphenatedWordsFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestKeepFilterFactory | |
TestKeepWordFilter | Test KeepWordFilter |
TestKeepWordFilter.AnalyzerAnonymousInnerClassHelper | |
TestKeywordMarkerFilter | Testcase for KeywordMarkerFilter |
TestKeywordMarkerFilter.LowerCaseFilterMock | |
TestLengthFilter | |
TestLengthFilter.AnalyzerAnonymousInnerClassHelper | |
TestLengthFilterFactory | |
TestLimitTokenCountAnalyzer_ | |
TestLimitTokenCountFilter | |
TestLimitTokenCountFilterFactory | |
TestLimitTokenPositionFilter | |
TestLimitTokenPositionFilter.AnalyzerAnonymousInnerClassHelper | |
TestLimitTokenPositionFilterFactory | |
TestLucene47WordDelimiterFilter | |
TestLucene47WordDelimiterFilter.AnalyzerAnonymousInnerClassHelper | |
TestLucene47WordDelimiterFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestLucene47WordDelimiterFilter.AnalyzerAnonymousInnerClassHelper3 | |
TestLucene47WordDelimiterFilter.AnalyzerAnonymousInnerClassHelper4 | |
TestLucene47WordDelimiterFilter.AnalyzerAnonymousInnerClassHelper5 | |
TestLucene47WordDelimiterFilter.LargePosIncTokenFilter | |
TestPerFieldAnalyzerWrapper | |
TestPerFieldAnalyzerWrapper.AnalyzerAnonymousInnerClassHelper | |
TestPrefixAndSuffixAwareTokenFilter | |
TestPrefixAwareTokenFilter | |
TestRemoveDuplicatesTokenFilter | |
TestRemoveDuplicatesTokenFilter.AnalyzerAnonymousInnerClassHelper | |
TestRemoveDuplicatesTokenFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestRemoveDuplicatesTokenFilter.TokenStreamAnonymousInnerClassHelper | |
TestRemoveDuplicatesTokenFilterFactory | Simple tests to ensure this factory is working |
TestScandinavianFoldingFilterFactory | |
TestScandinavianNormalizationFilter | |
TestScandinavianNormalizationFilter.AnalyzerAnonymousInnerClassHelper | |
TestScandinavianNormalizationFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestScandinavianNormalizationFilterFactory | |
TestStemmerOverrideFilter | |
TestStemmerOverrideFilterFactory | Simple tests to ensure the stemmer override filter factory is working. |
TestTrimFilter | |
TestTrimFilter.AnalyzerAnonymousInnerClassHelper | |
TestTrimFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestTrimFilter.AnalyzerAnonymousInnerClassHelper3 | |
TestTrimFilter.IterTokenStream | |
TestTrimFilterFactory | Simple tests to ensure this factory is working |
TestTruncateTokenFilterFactory | Simple tests to ensure the simple truncation filter factory is working. |
TestWordDelimiterFilter | New WordDelimiterFilter tests... most of the tests are in ConvertedLegacyTest. TODO: should explicitly test things like protWords and not rely on the factory tests in Solr. |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper | |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper2 | |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper3 | |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper4 | |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper5 | |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper6 | |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper7 | |
TestWordDelimiterFilter.AnalyzerAnonymousInnerClassHelper8 | |
TestWordDelimiterFilter.LargePosIncTokenFilter | |
TrimFilter | Trims leading and trailing whitespace from Tokens in the stream. As of Lucene 4.4, this filter does not support updateOffsets=true anymore as it can lead to broken token streams. |
TrimFilterFactory | Factory for TrimFilter. <fieldType name="text_trm" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.NGramTokenizerFactory"/> <filter class="solr.TrimFilterFactory" /> </analyzer> </fieldType> |
TruncateTokenFilter | A token filter for truncating terms to a specific length. Fixed prefix truncation, as a stemming method, produces good results for the Turkish language. It is reported that F5, using the first 5 characters, produced the best results in information retrieval on Turkish texts. |
WordDelimiterFilter | Splits words into subwords and performs optional transformations on subword groups. Words are split into subwords on intra-word delimiters (by default, all non-alphanumeric characters), on case transitions and on letter-number transitions; leading and trailing intra-word delimiters on each subword are ignored, and trailing English possessives ("'s") can be removed. |
WordDelimiterFilter.OffsetSorter | |
WordDelimiterFilter.WordDelimiterConcatenation | A WDF concatenated 'run' |
WordDelimiterFilterFactory | Factory for WordDelimiterFilter. <fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" protected="protectedword.txt" preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="1" catenateWords="0" catenateNumbers="0" catenateAll="0" generateWordParts="1" generateNumberParts="1" stemEnglishPossessive="1" types="wdfftypes.txt" /> </analyzer> </fieldType> |
WordDelimiterIterator | A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterFilter rules. @lucene.internal |
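
The sketches below are hedged illustrations written against the Java Lucene API rather than code taken from this listing; they assume a Lucene 5+ style API in which analyzers and tokenizers no longer take Version or Reader constructor arguments. This first one shows the chain described for KeywordRepeatFilterFactory: repeat every token, stem one copy, then remove duplicates so tokens the stemmer did not change are not indexed twice.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilter;
import org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter;

public class KeywordRepeatAnalyzer {
    public static Analyzer create() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new WhitespaceTokenizer();
                // Emit each token twice: one copy marked as a keyword, one left for stemming.
                TokenStream stream = new KeywordRepeatFilter(source);
                // The stemmer skips keyword-marked tokens, so original and stemmed forms coexist.
                stream = new PorterStemFilter(stream);
                // If stemming did not change the token, drop the duplicate at the same position.
                stream = new RemoveDuplicatesTokenFilter(stream);
                return new TokenStreamComponents(source, stream);
            }
        };
    }
}
```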
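A minimal sketch of the LimitTokenCountFilter trade-off discussed above: with consumeAllTokens=false the wrapped stream is abandoned as soon as the limit is reached, which is faster but unsuitable for streams that must be fully exhausted. The limit of 10000 is an arbitrary illustrative value.

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilter;

public class LimitTokenCountExample {
    public static TokenStream limitForIndexing(TokenStream input) {
        // Keep at most 10000 tokens; stop reading the wrapped stream once the limit is hit.
        // Pass true as the third argument if the wrapped stream must be consumed to the end.
        return new LimitTokenCountFilter(input, 10000, false);
    }
}
```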
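A sketch of the PerFieldAnalyzerWrapper example referred to above: all fields are analyzed with StandardAnalyzer except "firstname" and "lastname", which use KeywordAnalyzer.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class PerFieldExample {
    public static Analyzer create() {
        // Non-default analyzers for specific fields; every other field falls back to StandardAnalyzer.
        Map<String, Analyzer> perField = new HashMap<>();
        perField.put("firstname", new KeywordAnalyzer());
        perField.put("lastname", new KeywordAnalyzer());
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
    }
}
```

The resulting wrapper can then be handed to indexing or query parsing code like any other Analyzer.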
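A sketch of the dictionary-based override referred to in the StemmerOverrideFilter rows, using the Builder and StemmerOverrideMap described above; the mappings "booked"/"books" -> "book" are made-up illustrative entries.

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter;
import org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.StemmerOverrideMap;

public class StemmerOverrideExample {
    public static TokenStream withOverrides(TokenStream input) throws IOException {
        // Build the read-only FST-backed map: surface form -> forced stem (case-insensitive here).
        StemmerOverrideFilter.Builder builder = new StemmerOverrideFilter.Builder(true);
        builder.add("booked", "book");
        builder.add("books", "book");
        StemmerOverrideMap overrides = builder.build();
        // Matching tokens receive the dictionary stem and are marked as keywords,
        // so any KeywordAttribute-aware stemmer placed later leaves them untouched.
        return new StemmerOverrideFilter(input, overrides);
    }
}
```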