C# Class Lucene.Net.Analysis.Util.WordlistLoader

Loader for text files that represent a list of stopwords.

Public Methods

Method Description
GetWordSet ( TextReader reader, CharArraySet result ) : CharArraySet

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, System.Version matchVersion ) : CharArraySet

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, string comment, CharArraySet result ) : CharArraySet

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, string comment, System.Version matchVersion ) : CharArraySet

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

getLines ( InputStream stream, Charset charset ) : IList

Reads the given stream using the given character encoding and returns the lines that contain data, omitting comment lines.

A comment line is any line that starts with the character "#".

getSnowballWordSet ( Reader reader, CharArraySet result ) : CharArraySet

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

getSnowballWordSet ( Reader reader, System.Version matchVersion ) : CharArraySet

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

getStemDict ( Reader reader, CharArrayMap result ) : CharArrayMap

Reads a stem dictionary. Each line contains two tab-separated words, a word and its stem:

word\tstem

Private Methods

Method Description
WordlistLoader ( )

Private constructor. The class exposes only static members and is not meant to be instantiated.

getBufferedReader ( Reader reader ) : BufferedReader

Method Details

GetWordSet() public static method

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, CharArraySet result ) : CharArraySet
reader TextReader Reader containing the wordlist
result CharArraySet the CharArraySet to fill with the reader's words
Returns CharArraySet the given result set, with the reader's words added
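The behavior described above can be sketched as a plain C# top-level program (C# 9 or later). This is an illustration only and not Lucene.Net's implementation. ReadWordSet is a hypothetical helper, and a HashSet<string> stands in for CharArraySet. The helper reads one word per line, trims surrounding whitespace, adds each line to the caller's set, and returns that same set.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper sketching GetWordSet(TextReader, CharArraySet).
// HashSet<string> stands in for CharArraySet; this is not the library code.
static HashSet<string> ReadWordSet(TextReader reader, HashSet<string> result)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // One word per line; leading and trailing whitespace is dropped.
        result.Add(line.Trim());
    }
    return result; // the same set the caller passed in, now filled
}

var words = ReadWordSet(new StringReader("  the\na  \nof\n"), new HashSet<string>());
Console.WriteLine(words.Count); // 3
```

Because the helper returns the caller's set, several word lists can be merged into one set by calling it repeatedly with the same target.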

GetWordSet() public static method

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, System.Version matchVersion ) : CharArraySet
reader TextReader Reader containing the wordlist
matchVersion System.Version the Lucene version
Returns CharArraySet a new CharArraySet with the reader's words
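The following minimal sketch shows why the lowercase requirement matters. It assumes, as the note above implies, that the set compares words case-sensitively. ReadNewWordSet is a hypothetical helper that mirrors this overload by creating a new set instead of filling one supplied by the caller. A HashSet<string> with its default ordinal comparer stands in for CharArraySet.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: builds a fresh, case-sensitive set (stand-in for CharArraySet).
static HashSet<string> ReadNewWordSet(TextReader reader)
{
    var result = new HashSet<string>(); // default comparer is ordinal, case-sensitive
    string? line;
    while ((line = reader.ReadLine()) != null)
        result.Add(line.Trim());
    return result;
}

var stopWords = ReadNewWordSet(new StringReader("The\nand\n"));

// A LowerCaseFilter would emit "the", which does not match the entry "The".
string token = "The".ToLowerInvariant();
Console.WriteLine($"'{token}' is a stopword: {stopWords.Contains(token)}");
// prints: 'the' is a stopword: False
```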

GetWordSet() public static method

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, string comment, CharArraySet result ) : CharArraySet
reader TextReader Reader containing the wordlist
comment string The string representing a comment.
result CharArraySet the CharArraySet to fill with the reader's words
Returns CharArraySet the given result set, with the reader's non-comment words added
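Here is a sketch of the comment-skipping overload. It assumes that a comment line is any line whose raw, untrimmed text starts with the comment string. ReadWordSet is a hypothetical helper, and a HashSet<string> stands in for CharArraySet.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper sketching GetWordSet(TextReader, string, CharArraySet).
// HashSet<string> stands in for CharArraySet.
static HashSet<string> ReadWordSet(TextReader reader, string comment, HashSet<string> result)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // Assumption: the raw line is tested, so an indented "  #x" is not a comment.
        if (line.StartsWith(comment, StringComparison.Ordinal))
            continue;
        result.Add(line.Trim());
    }
    return result;
}

string input = "#English stopwords\nthe\n#articles\na\n";
var words = ReadWordSet(new StringReader(input), "#", new HashSet<string>());
Console.WriteLine(words.Count); // 2
```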

GetWordSet() public static method

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, string comment, System.Version matchVersion ) : CharArraySet
reader TextReader Reader containing the wordlist
comment string The string representing a comment.
matchVersion System.Version the Lucene version
Returns CharArraySet a new CharArraySet with the reader's non-comment words

getLines() public static method

Reads the given stream using the given character encoding and returns the lines that contain data, omitting comment lines.

A comment line is any line that starts with the character "#".

Throws an I/O exception if a low-level I/O error occurs.
public static getLines ( InputStream stream, Charset charset ) : IList
stream InputStream
charset Charset
Returns IList the non-comment lines read from the stream
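The next sketch approximates getLines with .NET types. It assumes Stream and Encoding in place of the Java-style InputStream and Charset shown in the signature. ReadDataLines is a hypothetical helper. Trimming each line and dropping blank lines are assumptions that go beyond the description above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper approximating getLines(InputStream, Charset) with .NET types.
static IList<string> ReadDataLines(Stream stream, Encoding encoding)
{
    var lines = new List<string>();
    using var reader = new StreamReader(stream, encoding); // decodes with the given encoding
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.StartsWith("#", StringComparison.Ordinal))
            continue; // comment line
        line = line.Trim();      // assumption: surrounding whitespace is removed
        if (line.Length == 0)
            continue;            // assumption: blank lines carry no data
        lines.Add(line);
    }
    return lines;
}

byte[] bytes = Encoding.UTF8.GetBytes("# header\nfoo\n\n  bar  \n");
IList<string> result = ReadDataLines(new MemoryStream(bytes), Encoding.UTF8);
Console.WriteLine(string.Join("|", result)); // foo|bar
```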

getSnowballWordSet() public static method

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

public static getSnowballWordSet ( Reader reader, CharArraySet result ) : CharArraySet
reader Reader Reader containing a Snowball stopword list
result CharArraySet the CharArraySet to fill with the reader's words
Returns CharArraySet the given result set, with the reader's words added
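The three Snowball format rules can be sketched as follows. ReadSnowballWordSet is a hypothetical helper, and a HashSet<string> stands in for CharArraySet. The sketch cuts each line at the first "|" and then splits what remains on whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper sketching the Snowball stopword format.
// HashSet<string> stands in for CharArraySet.
static HashSet<string> ReadSnowballWordSet(TextReader reader, HashSet<string> result)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // Everything from the first '|' onward is a comment.
        int comment = line.IndexOf('|');
        if (comment >= 0)
            line = line.Substring(0, comment);

        // An empty separator array makes Split break on any whitespace.
        foreach (var word in line.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries))
            result.Add(word);
    }
    return result;
}

string list = "| English stop words\nthe a an | articles\nof\n";
var words = ReadSnowballWordSet(new StringReader(list), new HashSet<string>());
Console.WriteLine(words.Count); // 4
```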

getSnowballWordSet() public static method

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

public static getSnowballWordSet ( Reader reader, System.Version matchVersion ) : CharArraySet
reader Reader Reader containing a Snowball stopword list
matchVersion System.Version the Lucene version
Returns CharArraySet a new CharArraySet with the reader's words

getStemDict() public static method

Reads a stem dictionary. Each line contains two tab-separated words, a word and its stem:
word\tstem
Throws an I/O exception if a low-level I/O error occurs.
public static getStemDict ( Reader reader, CharArrayMap result ) : CharArrayMap
reader Reader
result CharArrayMap
Returns CharArrayMap the given result map, filled with the word/stem pairs
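Here is a sketch of parsing the word\tstem format, with a Dictionary<string, string> standing in for CharArrayMap. ReadStemDict is a hypothetical helper that splits each line on the first tab only. Skipping lines that contain no tab is an assumption, because the description above does not say how malformed lines are handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper sketching getStemDict(Reader, CharArrayMap).
// Dictionary<string, string> stands in for CharArrayMap.
static Dictionary<string, string> ReadStemDict(TextReader reader, Dictionary<string, string> result)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // Split on the first tab only: "word\tstem".
        string[] parts = line.Split(new[] { '\t' }, 2);
        if (parts.Length < 2)
            continue; // assumption: lines without a tab are ignored
        result[parts[0]] = parts[1];
    }
    return result;
}

var dict = ReadStemDict(new StringReader("running\trun\nmice\tmouse\n"), new Dictionary<string, string>());
Console.WriteLine(dict["mice"]); // mouse
```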