C# Class org.apache.lucene.analysis.miscellaneous.PatternAnalyzer

Inheritance: Analyzer
Project: paulirwin/lucene.net

Public Properties

Property Type Description
DEFAULT_ANALYZER PatternAnalyzer
EXTENDED_ANALYZER PatternAnalyzer
NON_WORD_PATTERN Pattern
WHITESPACE_PATTERN Pattern

Public Methods

Method Description
Equals ( object other ) : bool

Indicates whether some other object is "equal to" this one.

GetHashCode ( ) : int

Returns a hash code value for the object.

PatternAnalyzer ( System.Version matchVersion, Pattern pattern, bool toLowerCase, CharArraySet stopWords )

Constructs a new instance with the given parameters.

createComponents ( string fieldName, Reader reader ) : TokenStreamComponents

Creates a token stream that tokenizes all the text in the given Reader. This implementation forwards to tokenStream(String, Reader, String) and is less efficient than calling that method directly.

createComponents ( string fieldName, Reader reader, string text ) : TokenStreamComponents

Creates a token stream that tokenizes the given string into token terms (aka words).

Private Methods

Method Description
ToString ( Reader input ) : string

Reads until end-of-stream, returns all characters read, and finally closes the stream.

eq ( object o1, object o2 ) : bool

Null-safe equality: o1 and/or o2 may be null.

eqPattern ( Pattern p1, Pattern p2 ) : bool

Pattern equality; assumes p1 and p2 are not null.

Method Details

Equals() public method

Indicates whether some other object is "equal to" this one.
public Equals ( object other ) : bool
other object The reference object with which to compare.
return bool

GetHashCode() public method

Returns a hash code value for the object.
public GetHashCode ( ) : int
return int
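
The relationship between Equals, GetHashCode and the private helpers eq and eqPattern can be illustrated with a minimal sketch. Everything here is an assumption for illustration: AnalyzerSettingsSketch is a hypothetical class, System.Text.RegularExpressions.Regex stands in for Pattern, a plain ISet<string> stands in for CharArraySet, and comparing patterns by pattern string plus options is one plausible reading of "pattern equality", not the confirmed implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the analyzer's configuration; not the real PatternAnalyzer.
public sealed class AnalyzerSettingsSketch
{
    private readonly Regex pattern;            // assumed non-null
    private readonly bool toLowerCase;
    private readonly ISet<string> stopWords;   // may be null

    public AnalyzerSettingsSketch(Regex pattern, bool toLowerCase, ISet<string> stopWords)
    {
        if (pattern == null) throw new ArgumentNullException("pattern");
        this.pattern = pattern;
        this.toLowerCase = toLowerCase;
        this.stopWords = stopWords;
    }

    // Assumes p1 and p2 are not null. Two patterns are treated as equal when
    // their pattern text and their options match (an assumption of this sketch).
    private static bool EqPattern(Regex p1, Regex p2)
    {
        return p1.Options == p2.Options && p1.ToString() == p2.ToString();
    }

    // Null-safe set comparison: both null, or both non-null with the same elements.
    private static bool EqStopWords(ISet<string> s1, ISet<string> s2)
    {
        return ReferenceEquals(s1, s2) || (s1 != null && s2 != null && s1.SetEquals(s2));
    }

    public override bool Equals(object other)
    {
        if (ReferenceEquals(this, other)) return true;
        var that = other as AnalyzerSettingsSketch;
        if (that == null) return false;
        return toLowerCase == that.toLowerCase
            && EqPattern(pattern, that.pattern)
            && EqStopWords(stopWords, that.stopWords);
    }

    // Uses only fields that Equals also compares, so equal objects share a hash code.
    // The stop set is left out on purpose; that is allowed, it only weakens the hash.
    public override int GetHashCode()
    {
        unchecked
        {
            int h = 17;
            h = 31 * h + pattern.ToString().GetHashCode();
            h = 31 * h + (int)pattern.Options;
            h = 31 * h + (toLowerCase ? 1 : 0);
            return h;
        }
    }
}
```

With this sketch, two instances built from separate but identical Regex objects and stop sets with the same elements compare as equal and return the same hash code, which is what hash-based collections rely on.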

PatternAnalyzer() public method

Constructs a new instance with the given parameters.
public PatternAnalyzer ( System.Version matchVersion, Pattern pattern, bool toLowerCase, CharArraySet stopWords )
matchVersion System.Version Currently does nothing.
pattern Pattern A regular expression delimiting tokens.
toLowerCase bool If true, returns tokens after applying String.toLowerCase().
stopWords CharArraySet If non-null, ignores all tokens that are contained in the given stop set (after applying toLowerCase(), if applicable). A stop set can be loaded, for example, as in WordlistLoader.getWordSet(new File("samples/fulltext/stopwords.txt")), or taken from other stop word lists.
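
How the three behavioural parameters interact can be sketched without any Lucene types. The following is a minimal illustration under stated assumptions, not the analyzer's implementation: PatternTokenizingSketch and Tokenize are hypothetical names, Regex stands in for Pattern, and ISet<string> stands in for CharArraySet.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; it only mirrors the documented meaning of the
// pattern, toLowerCase and stopWords parameters.
public static class PatternTokenizingSketch
{
    public static List<string> Tokenize(
        string text, Regex pattern, bool toLowerCase, ISet<string> stopWords)
    {
        var tokens = new List<string>();
        // The pattern matches delimiters, so tokens are the pieces between matches.
        // (Assumes the pattern has no capture groups; Regex.Split would include them.)
        foreach (string piece in pattern.Split(text))
        {
            // A delimiter at the start or end of the text yields an empty piece.
            if (piece.Length == 0) continue;
            string term = toLowerCase ? piece.ToLowerInvariant() : piece;
            // Stop words are checked after lower-casing, as the parameter description states.
            if (stopWords != null && stopWords.Contains(term)) continue;
            tokens.Add(term);
        }
        return tokens;
    }
}
```

For example, Tokenize("The quick, brown Fox!", new Regex(@"\W+"), true, new HashSet<string> { "the" }) yields quick, brown, fox: "The" is lower-cased first and only then matched against the stop set.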

createComponents() public method

Creates a token stream that tokenizes all the text in the given Reader. This implementation forwards to tokenStream(String, Reader, String) and is less efficient than calling that method directly.
public createComponents ( string fieldName, Reader reader ) : TokenStreamComponents
fieldName string The name of the field to tokenize (currently ignored).
reader Reader The reader delivering the text.
return TokenStreamComponents
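
The forwarding described above, combined with the private ToString(Reader) helper, can be sketched as follows. This is an illustration under assumptions: ReaderForwardingSketch and its members are hypothetical names, TextReader stands in for Reader, and the string-based tokenizer is passed in as a delegate. The sketch also suggests a likely reason for the efficiency note: the whole input has to be buffered into a string before tokenizing can start.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; not part of PatternAnalyzer.
public static class ReaderForwardingSketch
{
    // Reads until end-of-stream, returns everything read, and always closes the reader.
    public static string ReadAllAndClose(TextReader input)
    {
        try
        {
            return input.ReadToEnd();
        }
        finally
        {
            input.Dispose();
        }
    }

    // Reader-based entry point: buffer the full text, then forward to the
    // string-based tokenizer supplied by the caller.
    public static IList<string> Tokenize(TextReader reader, Func<string, IList<string>> tokenizeText)
    {
        string text = ReadAllAndClose(reader);
        return tokenizeText(text);
    }
}
```

A caller that already holds the text as a string can skip the buffering step by invoking the string-based tokenizer directly.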

createComponents() public method

Creates a token stream that tokenizes the given string into token terms (aka words).
public createComponents ( string fieldName, Reader reader, string text ) : TokenStreamComponents
fieldName string The name of the field to tokenize (currently ignored).
reader Reader Reader (e.g. a charfilter) of the original text. Can be null.
text string The string to tokenize.
return TokenStreamComponents

Property Details

DEFAULT_ANALYZER public static property

A lower-casing word analyzer with English stop words (can be shared freely across threads without harm); global per class loader.
public static org.apache.lucene.analysis.miscellaneous.PatternAnalyzer DEFAULT_ANALYZER
return PatternAnalyzer

EXTENDED_ANALYZER public static property

A lower-casing word analyzer with extended English stop words (can be shared freely across threads without harm); global per class loader. The stop words are borrowed from http://thomas.loc.gov/home/stopwords.html; see also http://thomas.loc.gov/home/all.about.inquery.html
public static org.apache.lucene.analysis.miscellaneous.PatternAnalyzer EXTENDED_ANALYZER
return PatternAnalyzer

NON_WORD_PATTERN public static property

"\\W+"; divides text at non-letters (NOT Character.isLetter(c)).
public static Pattern NON_WORD_PATTERN
return Pattern

WHITESPACE_PATTERN public static property

"\\s+"; divides text at whitespace (Character.isWhitespace(c)).
public static Pattern WHITESPACE_PATTERN
return Pattern
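
The practical difference between the two delimiter patterns is easiest to see side by side. The sketch below is illustrative only: PatternComparisonSketch, NonWord, Whitespace and SplitOn are hypothetical names, and .NET Regex objects built from the same pattern strings stand in for the Pattern constants. Note that in .NET regular expressions \W also treats digits and the underscore as word characters, so those stay inside tokens.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the two fields mirror the documented pattern strings.
public static class PatternComparisonSketch
{
    public static readonly Regex NonWord = new Regex(@"\W+");     // stand-in for NON_WORD_PATTERN
    public static readonly Regex Whitespace = new Regex(@"\s+");  // stand-in for WHITESPACE_PATTERN

    // Splits the text on the delimiter and drops the empty pieces that a
    // leading or trailing delimiter produces.
    public static string[] SplitOn(Regex delimiter, string text)
    {
        return Array.FindAll(delimiter.Split(text), piece => piece.Length > 0);
    }
}
```

For the input "don't stop-me now", SplitOn with NonWord gives don, t, stop, me, now, because the apostrophe and the hyphen are non-word characters. SplitOn with Whitespace gives don't, stop-me, now, keeping punctuation inside the tokens.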