C# Class Lucene.Net.Analysis.Util.WordlistLoader

Loader for text files that represent a list of stopwords.
Project: paulirwin/lucene.net

Public Methods

Method Description
GetWordSet ( TextReader reader, CharArraySet result ) : CharArraySet

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, System.Version matchVersion ) : CharArraySet

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, string comment, CharArraySet result ) : CharArraySet

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, string comment, System.Version matchVersion ) : CharArraySet

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

getLines ( InputStream stream, Charset charset ) : IList

Reads the given stream using the given character encoding and returns the (non-comment) lines containing data.

A comment line is any line that starts with the character "#".

getSnowballWordSet ( Reader reader, CharArraySet result ) : CharArraySet

Reads stopwords from a stopword list in Snowball format.

The snowball format is the following:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

getSnowballWordSet ( Reader reader, System.Version matchVersion ) : CharArraySet

Reads stopwords from a stopword list in Snowball format.

The snowball format is the following:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

getStemDict ( Reader reader, CharArrayMap result ) : CharArrayMap

Reads a stem dictionary. Each line contains:

word\tstem
(i.e. two tab-separated words)

Private Methods

Method Description
WordlistLoader ( ) : System

Private constructor; the class exposes only static methods and is not meant to be instantiated.

getBufferedReader ( Reader reader ) : BufferedReader

Method Details

GetWordSet() public static method

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, CharArraySet result ) : CharArraySet
reader TextReader Reader containing the wordlist
result CharArraySet the CharArraySet to fill with the reader's words
return CharArraySet
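
The line-by-line behavior described above can be illustrated without Lucene.Net. The following is a minimal sketch, not the library's implementation: `WordSetSketch` and `ReadWordSet` are hypothetical names, and a plain `ISet<string>` stands in for CharArraySet.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the documented behavior; not part of Lucene.Net.
public static class WordSetSketch
{
    // Adds every line of the reader to the set, with leading and
    // trailing whitespace removed. Blank lines end up as empty entries.
    public static ISet<string> ReadWordSet(TextReader reader, ISet<string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            result.Add(line.Trim());
        }
        return result;
    }
}
```

With a default `HashSet<string>`, lookups are case-sensitive, so an entry "The" would not match a token that a lowercasing filter has already turned into "the". This is the reason the word list should be lowercase when the analyzer lowercases its tokens.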

GetWordSet() public static method

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, System.Version matchVersion ) : CharArraySet
reader TextReader Reader containing the wordlist
matchVersion System.Version the Lucene version
return CharArraySet

GetWordSet() public static method

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, string comment, CharArraySet result ) : CharArraySet
reader TextReader Reader containing the wordlist
comment string The string representing a comment.
result CharArraySet the CharArraySet to fill with the reader's words
return CharArraySet
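
As an illustration of comment filtering, here is a minimal sketch that does not use Lucene.Net. It assumes that a "comment line" is a line that begins with the comment string, which the description above does not spell out. `CommentedWordSetSketch` and `ReadWordSet` are hypothetical names, and `ISet<string>` stands in for CharArraySet.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the documented behavior; not part of Lucene.Net.
public static class CommentedWordSetSketch
{
    // Adds every line that does not start with the comment string
    // (assumption) to the set, trimming surrounding whitespace.
    public static ISet<string> ReadWordSet(TextReader reader, string comment, ISet<string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(comment, StringComparison.Ordinal))
            {
                continue; // comment line: skip it
            }
            result.Add(line.Trim());
        }
        return result;
    }
}
```

In this sketch the comment check runs before trimming, so a line with whitespace in front of the comment string is treated as a word, not as a comment.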

GetWordSet() public static method

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, string comment, System.Version matchVersion ) : CharArraySet
reader TextReader Reader containing the wordlist
comment string The string representing a comment.
matchVersion System.Version the Lucene version
return CharArraySet

getLines() public static method

Reads the given stream using the given character encoding and returns the (non-comment) lines containing data.

A comment line is any line that starts with the character "#".

Throws IOException if there is a low-level I/O error.
public static getLines ( InputStream stream, Charset charset ) : IList
stream InputStream
charset Charset
return IList
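
A rough C# equivalent of this behavior might look like the sketch below. It is not the library's code: `LinesSketch` and `ReadLines` are hypothetical names, `Stream` and `Encoding` stand in for InputStream and Charset, and the sketch assumes that "lines containing data" means non-blank lines returned with surrounding whitespace trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for the documented behavior; not part of Lucene.Net.
public static class LinesSketch
{
    // Decodes the stream with the given encoding and returns the
    // lines that are neither "#" comments nor blank (assumption).
    public static IList<string> ReadLines(Stream stream, Encoding encoding)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("#", StringComparison.Ordinal))
                {
                    continue; // comment line
                }
                line = line.Trim();
                if (line.Length == 0)
                {
                    continue; // blank line carries no data
                }
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Disposing the `StreamReader` also closes the underlying stream, so the caller should not reuse the stream after the call.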

getSnowballWordSet() public static method

Reads stopwords from a stopword list in Snowball format.

The snowball format is the following:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

public static getSnowballWordSet ( Reader reader, CharArraySet result ) : CharArraySet
reader Reader Reader containing a Snowball stopword list
result CharArraySet the CharArraySet to fill with the reader's words
return CharArraySet
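
The three Snowball format rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `SnowballSketch` and `ReadSnowballWordSet` are hypothetical names, and `ISet<string>` stands in for CharArraySet.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the documented behavior; not part of Lucene.Net.
public static class SnowballSketch
{
    // Parses a Snowball-style stopword list: everything after '|' on a
    // line is a comment, and the remainder may hold several words.
    public static ISet<string> ReadSnowballWordSet(TextReader reader, ISet<string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int comment = line.IndexOf('|');
            if (comment >= 0)
            {
                line = line.Substring(0, comment); // drop the trailing comment
            }
            // A null separator array splits on any whitespace.
            string[] words = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                result.Add(word);
            }
        }
        return result;
    }
}
```

A line that consists only of a comment yields no words, so it adds nothing to the set.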

getSnowballWordSet() public static method

Reads stopwords from a stopword list in Snowball format.

The snowball format is the following:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

public static getSnowballWordSet ( Reader reader, System.Version matchVersion ) : CharArraySet
reader Reader Reader containing a Snowball stopword list
matchVersion System.Version the Lucene version
return CharArraySet

getStemDict() public static method

Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)
Throws IOException if there is a low-level I/O error.
public static getStemDict ( Reader reader, CharArrayMap result ) : CharArrayMap
reader Reader
result CharArrayMap
return CharArrayMap
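
The word\tstem line format can be parsed as in the sketch below. It is a minimal illustration, not the library's code: `StemDictSketch` and `ReadStemDict` are hypothetical names, `IDictionary<string, string>` stands in for CharArrayMap, and throwing a FormatException on a line without a tab is this sketch's own choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the documented behavior; not part of Lucene.Net.
public static class StemDictSketch
{
    // Reads "word<TAB>stem" lines into the dictionary, keyed by word.
    public static IDictionary<string, string> ReadStemDict(TextReader reader, IDictionary<string, string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Split at the first tab only, so the stem keeps any later tabs.
            string[] wordStem = line.Split(new[] { '\t' }, 2);
            if (wordStem.Length != 2)
            {
                // Assumption: malformed lines are reported, not skipped.
                throw new FormatException("Expected 'word<TAB>stem' but got: " + line);
            }
            result[wordStem[0]] = wordStem[1];
        }
        return result;
    }
}
```

If the same word appears on several lines, the indexer assignment keeps the last stem seen.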