C# Class org.apache.lucene.analysis.miscellaneous.PatternAnalyzer

Inheritance: Analyzer

Project: paulirwin/lucene.net

Public Properties

Property	Type	Description
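DEFAULT_ANALYZER	PatternAnalyzer	A lower-casing word analyzer with English stop words.
EXTENDED_ANALYZER	PatternAnalyzer	A lower-casing word analyzer with extended English stop words.
NON_WORD_PATTERN	Pattern	`"\\W+"`; divides text at non-letters.
WHITESPACE_PATTERN	Pattern	`"\\s+"`; divides text at whitespace.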

Public Methods

Method	Description
Equals ( object other ) : bool	Indicates whether some other object is "equal to" this one.
GetHashCode ( ) : int	Returns a hash code value for the object.
PatternAnalyzer ( System.Version matchVersion, Pattern pattern, bool toLowerCase, CharArraySet stopWords )	Constructs a new instance with the given parameters.
createComponents ( string fieldName, Reader reader ) : TokenStreamComponents	Creates a token stream that tokenizes all the text in the given Reader. This implementation forwards to `tokenStream(String, Reader, String)` and is less efficient than calling it directly.
createComponents ( string fieldName, Reader reader, string text ) : TokenStreamComponents	Creates a token stream that tokenizes the given string into token terms (aka words).

Private Methods

Method	Description
ToString ( Reader input ) : string	Reads until end-of-stream, returns all characters read, and finally closes the stream.
eq ( object o1, object o2 ) : bool	Tests equality where o1 and/or o2 can be null.
eqPattern ( Pattern p1, Pattern p2 ) : bool	Tests pattern equality; assumes p1 and p2 are not null.
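
The two equality helpers in the table above can be illustrated with a minimal Java sketch. The class name `EqualitySketch` is hypothetical, and comparing patterns by source string plus flags is an assumption made for illustration (java.util.regex.Pattern does not override equals); this is not the library's source.

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins for the private equality helpers; not the library source.
final class EqualitySketch {

    // Null-tolerant equality: true when both are null, or when o1 is
    // non-null and o1.equals(o2) holds.
    static boolean eq(Object o1, Object o2) {
        return o1 == o2 || (o1 != null && o1.equals(o2));
    }

    // Assumes p1 and p2 are not null. Assumption: two patterns count as
    // equal when their source strings and their flags match.
    static boolean eqPattern(Pattern p1, Pattern p2) {
        return p1 == p2
            || (p1.flags() == p2.flags() && p1.pattern().equals(p2.pattern()));
    }
}
```

Under this assumption, two separately compiled `"\\s+"` patterns compare equal, while the same source string compiled with different flags does not.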

Method Details

Equals() public method

Indicates whether some other object is "equal to" this one.

public Equals ( object other ) : bool
other	object	the reference object with which to compare.
return	bool

GetHashCode() public method

Returns a hash code value for the object.

public GetHashCode ( ) : int
return	int

PatternAnalyzer() public method

Constructs a new instance with the given parameters.

public PatternAnalyzer ( System.Version matchVersion, Pattern pattern, bool toLowerCase, CharArraySet stopWords )
matchVersion	System.Version	currently does nothing
pattern	Pattern	a regular expression delimiting tokens
toLowerCase	bool	if `true` returns tokens after applying String.toLowerCase()
stopWords	CharArraySet	if non-null, ignores all tokens that are contained in the given stop set (after previously having applied toLowerCase() if applicable). For example, a set loaded as in `WordlistLoader.getWordSet(new File("samples/fulltext/stopwords.txt"))`, or another stop-word list.
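
How the pattern, toLowerCase and stopWords parameters interact can be sketched in plain Java with java.util.regex. This is only an illustration of the described behavior (split at the delimiter pattern, optionally lower-case, then drop stop words). The class `PatternTokenizeSketch`, its `tokenize` method, and the use of a `Set<String>` in place of CharArraySet are assumptions, not the analyzer's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration of the constructor parameters; not the library source.
final class PatternTokenizeSketch {

    static List<String> tokenize(String text, Pattern pattern,
                                 boolean toLowerCase, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String piece : pattern.split(text)) {
            // A delimiter at the very start of the text yields an empty first piece.
            if (piece.isEmpty()) {
                continue;
            }
            // Locale.ROOT is a choice made for this sketch to keep results
            // independent of the default locale.
            String token = toLowerCase ? piece.toLowerCase(Locale.ROOT) : piece;
            // Stop words are checked after lower-casing, as the parameter
            // description states; a null set disables filtering.
            if (stopWords != null && stopWords.contains(token)) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```

For instance, tokenizing "The quick, brown fox" with the pattern `"\\W+"`, lower-casing enabled, and a stop set containing only "the" leaves the tokens quick, brown and fox.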

createComponents() public method

Creates a token stream that tokenizes all the text in the given Reader. This implementation forwards to tokenStream(String, Reader, String) and is less efficient than calling it directly.

public createComponents ( string fieldName, Reader reader ) : TokenStreamComponents
fieldName	string	the name of the field to tokenize (currently ignored).
reader	Reader	the reader delivering the text
return	TokenStreamComponents
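
One plausible reading of the forwarding described above is that the Reader-based overload first buffers the whole input into a string and then hands it to the string-based overload, which would explain the extra cost. The Java sketch below illustrates that reading only. The class `ForwardingSketch`, both `tokenize` overloads, the `readAll` helper, and the whitespace split used as a placeholder tokenizer are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of an overload that forwards to a string-based one.
final class ForwardingSketch {

    // String-based overload: does the actual work. A whitespace split stands
    // in for real tokenization; fieldName and reader are ignored here.
    static List<String> tokenize(String fieldName, Reader reader, String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Reader-based overload: reads everything into memory, then forwards.
    static List<String> tokenize(String fieldName, Reader reader) {
        return tokenize(fieldName, reader, readAll(reader));
    }

    // Reads until end-of-stream, returns all characters read, then closes the reader.
    static String readAll(Reader input) {
        try (Reader in = input) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller that already holds the text as a string can skip the buffering step by calling the three-argument overload directly.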

createComponents() public method

Creates a token stream that tokenizes the given string into token terms (aka words).

public createComponents ( string fieldName, Reader reader, string text ) : TokenStreamComponents
fieldName	string	the name of the field to tokenize (currently ignored).
reader	Reader	reader (e.g. charfilter) of the original text. can be null.
text	string	the string to tokenize
return	TokenStreamComponents

Property Details

DEFAULT_ANALYZER public static property

A lower-casing word analyzer with English stop words (can be shared freely across threads without harm); global per class loader.

public static PatternAnalyzer DEFAULT_ANALYZER
return	PatternAnalyzer

EXTENDED_ANALYZER public static property

A lower-casing word analyzer with extended English stop words (can be shared freely across threads without harm); global per class loader. The stop words are borrowed from http://thomas.loc.gov/home/stopwords.html; see also http://thomas.loc.gov/home/all.about.inquery.html

public static PatternAnalyzer EXTENDED_ANALYZER
return	PatternAnalyzer

NON_WORD_PATTERN public static property

`"\\W+"`; divides text at non-letters (NOT Character.isLetter(c)).

public static Pattern NON_WORD_PATTERN
return	Pattern

WHITESPACE_PATTERN public static property

`"\\s+"`; divides text at whitespace (Character.isWhitespace(c)).

public static Pattern WHITESPACE_PATTERN
return	Pattern
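
The difference between the two delimiter patterns is easiest to see on a sample string. The Java sketch below uses plain java.util.regex, where `\W` treats ASCII letters, digits and the underscore as word characters by default; the analyzer's own handling of NON_WORD_PATTERN may differ from this, as its description refers to Character.isLetter. The class and field names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical comparison of the two delimiter patterns using plain regex splitting.
final class PatternCompareSketch {

    static final Pattern NON_WORD = Pattern.compile("\\W+");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String[] split(Pattern pattern, String text) {
        return pattern.split(text);
    }

    public static void main(String[] args) {
        String text = "don't stop-believing, ok";
        // prints [don, t, stop, believing, ok]
        System.out.println(Arrays.toString(split(NON_WORD, text)));
        // prints [don't, stop-believing,, ok]
        System.out.println(Arrays.toString(split(WHITESPACE, text)));
    }
}
```

Splitting at non-word characters breaks tokens at apostrophes, hyphens and punctuation, whereas splitting at whitespace keeps punctuation attached to the neighboring token.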