C# Class org.apache.lucene.analysis.miscellaneous.PatternAnalyzer

Inheritance: Analyzer

Public Properties

Property Type Description
DEFAULT_ANALYZER PatternAnalyzer
EXTENDED_ANALYZER PatternAnalyzer
NON_WORD_PATTERN Pattern
WHITESPACE_PATTERN Pattern

Public Methods

Method Description
Equals ( object other ) : bool

Indicates whether some other object is "equal to" this one.

GetHashCode ( ) : int

Returns a hash code value for the object.

PatternAnalyzer ( System.Version matchVersion, Pattern pattern, bool toLowerCase, CharArraySet stopWords ) : System

Constructs a new instance with the given parameters.

createComponents ( string fieldName, Reader reader ) : TokenStreamComponents

Creates a token stream that tokenizes all the text in the given Reader. This implementation forwards to tokenStream(String, Reader, String) and is less efficient than calling tokenStream(String, Reader, String) directly.

createComponents ( string fieldName, Reader reader, string text ) : TokenStreamComponents

Creates a token stream that tokenizes the given string into token terms (aka words).

Private Methods

Method Description
ToString ( Reader input ) : string

Reads until end-of-stream, returns all characters read, and then closes the stream.

eq ( object o1, object o2 ) : bool

Tests equality where o1 and/or o2 can be null.

eqPattern ( Pattern p1, Pattern p2 ) : bool

Tests two patterns for equality; assumes p1 and p2 are not null.
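
The two equality helpers above can be illustrated with a minimal sketch. It is written in Java because the descriptions on this page reference Java semantics; the class name `EqualityHelpers` is hypothetical, and comparing patterns by source expression and flags is an assumption about what "equal" means here, not a statement of the actual implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; method names mirror the private methods listed above.
final class EqualityHelpers {

    // Null-safe equality: true when both arguments are the same reference
    // (including both null), otherwise defers to o1.equals(o2) if o1 is non-null.
    static boolean eq(Object o1, Object o2) {
        return o1 == o2 || (o1 != null && o1.equals(o2));
    }

    // Assumes p1 and p2 are non-null. java.util.regex.Pattern does not override
    // equals(), so this sketch compares the flags and the source expression.
    static boolean eqPattern(Pattern p1, Pattern p2) {
        return p1 == p2
            || (p1.flags() == p2.flags() && p1.pattern().equals(p2.pattern()));
    }

    public static void main(String[] args) {
        System.out.println(eq(null, null));
        System.out.println(eq("a", null));
        System.out.println(eqPattern(Pattern.compile("\\s+"), Pattern.compile("\\s+")));
    }
}
```

Two separately compiled patterns with the same expression and flags compare as equal under this sketch, whereas `Pattern.equals` would only report reference identity.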

Method Details

Equals() public method

Indicates whether some other object is "equal to" this one.
public Equals ( object other ) : bool
other object: the reference object with which to compare.
Returns bool

GetHashCode() public method

Returns a hash code value for the object.
public GetHashCode ( ) : int
Returns int

PatternAnalyzer() public method

Constructs a new instance with the given parameters.
public PatternAnalyzer ( System.Version matchVersion, Pattern pattern, bool toLowerCase, CharArraySet stopWords ) : System
matchVersion System.Version: currently does nothing
pattern Pattern: a regular expression delimiting tokens
toLowerCase bool: if true, returns tokens after applying String.toLowerCase()
stopWords CharArraySet: if non-null, ignores all tokens that are contained in the given stop set (after previously having applied toLowerCase() if applicable). For example, a set loaded as in WordlistLoader.getWordSet(new File("samples/fulltext/stopwords.txt")) or built from other stop word lists.
Returns System
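
How the three behavioral parameters interact can be shown with a small stand-alone sketch. It does not use Lucene classes: `PatternTokenizeSketch` and `tokenize` are hypothetical names, a plain `Set<String>` stands in for `CharArraySet`, and lower-casing with `Locale.ROOT` is an assumption made to keep the example locale-independent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for the analyzer's behavior; not the Lucene implementation.
final class PatternTokenizeSketch {

    static List<String> tokenize(String text, Pattern delimiter,
                                 boolean toLowerCase, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String piece : delimiter.split(text)) {
            // A delimiter match at the very start of the input yields an empty first piece.
            if (piece.isEmpty()) {
                continue;
            }
            // Lower-casing happens before the stop word lookup, as the parameter notes say.
            String term = toLowerCase ? piece.toLowerCase(Locale.ROOT) : piece;
            if (stopWords != null && stopWords.contains(term)) {
                continue;
            }
            tokens.add(term);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        // Prints [quick, brown, fox]
        System.out.println(tokenize("The quick, brown Fox", Pattern.compile("\\W+"), true, stop));
    }
}
```

Because the stop set is consulted after lower-casing, a lower-case stop list also removes capitalized occurrences such as "The" when toLowerCase is true.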

createComponents() public method

Creates a token stream that tokenizes all the text in the given Reader. This implementation forwards to tokenStream(String, Reader, String) and is less efficient than calling tokenStream(String, Reader, String) directly.
public createComponents ( string fieldName, Reader reader ) : TokenStreamComponents
fieldName string: the name of the field to tokenize (currently ignored).
reader Reader: the reader delivering the text
Returns TokenStreamComponents
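
The forwarding described above, where the Reader-based overload drains the stream into a string and then hands it to the string-based overload, might look roughly like the following sketch. The class and method names are hypothetical, the whitespace split is only a placeholder for real tokenization, and wrapping `IOException` in `UncheckedIOException` is a simplification for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a Reader overload that forwards to a String overload.
final class ReaderForwardingSketch {

    // Reads until end-of-stream, returns everything read, and closes the reader.
    static String readAll(Reader input) {
        try (Reader in = input) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // String-based overload; a whitespace split stands in for the real tokenization.
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Reader-based overload: buffers the whole text first, then delegates.
    // The extra copy is why this path is the less efficient one.
    static List<String> tokens(Reader reader) {
        return tokens(readAll(reader));
    }

    public static void main(String[] args) {
        System.out.println(tokens(new StringReader("a b  c")));
    }
}
```

When the text is already available as a string, calling the string-based overload avoids the intermediate buffering step.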

createComponents() public method

Creates a token stream that tokenizes the given string into token terms (aka words).
public createComponents ( string fieldName, Reader reader, string text ) : TokenStreamComponents
fieldName string: the name of the field to tokenize (currently ignored).
reader Reader: reader (e.g. a charfilter) of the original text; can be null.
text string: the string to tokenize
Returns TokenStreamComponents

Property Details

DEFAULT_ANALYZER public static property

A lower-casing word analyzer with English stop words (can be shared freely across threads without harm); global per class loader.
public static org.apache.lucene.analysis.miscellaneous.PatternAnalyzer DEFAULT_ANALYZER
Returns PatternAnalyzer

EXTENDED_ANALYZER public static property

A lower-casing word analyzer with extended English stop words (can be shared freely across threads without harm); global per class loader. The stop words are borrowed from http://thomas.loc.gov/home/stopwords.html; see also http://thomas.loc.gov/home/all.about.inquery.html
public static org.apache.lucene.analysis.miscellaneous.PatternAnalyzer EXTENDED_ANALYZER
Returns PatternAnalyzer

NON_WORD_PATTERN public static property

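"\\W+"; divides text at runs of non-word characters (roughly NOT Character.isLetter(c); note that in Java regular expressions \W also treats digits and the underscore as word characters, so those do not split tokens)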
public static Pattern NON_WORD_PATTERN
Returns Pattern

WHITESPACE_PATTERN public static property

"\\s+"; divides text at runs of whitespace (Character.isWhitespace(c))
public static Pattern WHITESPACE_PATTERN
Returns Pattern
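
To make the difference between the two delimiter patterns concrete, the following sketch splits the same sentence with each. It uses java.util.regex directly with the two expressions quoted above; the class name and sample text are illustrative only, and a C# regex engine may classify some characters differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative comparison of the two delimiter expressions; not part of the documented class.
final class DelimiterPatternsSketch {

    static final Pattern NON_WORD = Pattern.compile("\\W+");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static List<String> split(Pattern delimiter, String text) {
        return Arrays.asList(delimiter.split(text));
    }

    public static void main(String[] args) {
        String text = "Don't panic, it's 42_ok";
        // Apostrophes and commas are non-word characters, so they split tokens:
        // [Don, t, panic, it, s, 42_ok]
        System.out.println(split(NON_WORD, text));
        // Only whitespace splits, so punctuation stays attached:
        // [Don't, panic,, it's, 42_ok]
        System.out.println(split(WHITESPACE, text));
    }
}
```

The whitespace pattern keeps punctuation inside tokens, which matters when a stop word list contains bare words such as "panic" that would not match "panic,".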