C# class Lucene.Net.Analysis.Util.WordlistLoader

Loader for text files that represent a list of stopwords.
Project: paulirwin/lucene.net

Public Methods

Method / Description
GetWordSet ( TextReader reader, CharArraySet result ) : CharArraySet

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, System.Version matchVersion ) : CharArraySet

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, string comment, CharArraySet result ) : CharArraySet

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

GetWordSet ( TextReader reader, string comment, System.Version matchVersion ) : CharArraySet

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

getLines ( InputStream stream, Charset charset ) : IList

Reads the given stream using the given character encoding and returns the non-comment lines that contain data.

A comment line is any line that starts with the character "#".

getSnowballWordSet ( Reader reader, CharArraySet result ) : CharArraySet

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

getSnowballWordSet ( Reader reader, System.Version matchVersion ) : CharArraySet

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

getStemDict ( Reader reader, CharArrayMap result ) : CharArrayMap

Reads a stem dictionary. Each line contains:

word\tstem
(that is, two words separated by a tab)

Private Methods

Method / Description
WordlistLoader ( )

Private constructor; the class is not meant to be instantiated.

getBufferedReader ( Reader reader ) : BufferedReader

Method Details

GetWordSet() public static method

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, CharArraySet result ) : CharArraySet
reader TextReader Reader containing the wordlist
result CharArraySet the CharArraySet to fill with the reader's words
Returns CharArraySet: the given set, filled with the reader's words
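As a rough illustration of the behavior described above, here is a minimal sketch in plain C# rather than a call into Lucene.Net. It assumes an ISet<string> as a stand-in for CharArraySet, and the class and method names (PlainWordlistSketch.ReadWordSet) are hypothetical. Each line is trimmed and added, and the caller's set is returned.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of the GetWordSet(TextReader, CharArraySet) behavior.
// ISet<string> stands in for Lucene.Net's CharArraySet (an assumption).
public static class PlainWordlistSketch
{
    public static ISet<string> ReadWordSet(TextReader reader, ISet<string> result)
    {
        string line;
        // TextReader.ReadLine returns null once the input is exhausted.
        while ((line = reader.ReadLine()) != null)
        {
            // One word per line; leading and trailing whitespace is dropped.
            result.Add(line.Trim());
        }
        // The caller's set is returned, now filled with the words.
        return result;
    }
}
```

For example, reading "the\n  a \nof\n" into an empty set yields the three entries the, a and of.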

GetWordSet() public static method

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, System.Version matchVersion ) : CharArraySet
reader TextReader Reader containing the wordlist
matchVersion System.Version the Lucene version
Returns CharArraySet: a new set containing the reader's words
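The matchVersion overloads allocate the result set themselves instead of filling one supplied by the caller. The sketch below imitates that with a new case-sensitive HashSet<string>; treating the created set as case-sensitive is an assumption, the version argument is left out, and NewSetWordlistSketch is a hypothetical name. It also shows why the lowercase note matters: an entry written as "The" will not match the token "the" emitted by an analyzer that lowercases its input.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical analogue of GetWordSet(TextReader, matchVersion): the method
// allocates its own set. The version argument is omitted in this sketch.
public static class NewSetWordlistSketch
{
    public static ISet<string> ReadWordSet(TextReader reader)
    {
        // Ordinal comparison makes lookups case-sensitive, standing in for a
        // CharArraySet created without case folding (assumed here).
        var result = new HashSet<string>(StringComparer.Ordinal);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            result.Add(line.Trim());
        }
        return result;
    }
}
```

With the input "The\nand\n", Contains("and") returns true while Contains("the") returns false, because the entry was stored as "The".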

GetWordSet() public static method

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, string comment, CharArraySet result ) : CharArraySet
reader TextReader Reader containing the wordlist
comment string The string representing a comment.
result CharArraySet the CharArraySet to fill with the reader's words
Returns CharArraySet: the given set, filled with the reader's words
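A minimal sketch of comment handling, again in plain C# with ISet<string> standing in for CharArraySet and a hypothetical CommentedWordlistSketch class. Lines that begin with the comment string are skipped; in this sketch the prefix is tested on the raw line before trimming, which is an assumption about how indented comments should be treated.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical analogue of GetWordSet(TextReader, string comment, CharArraySet).
public static class CommentedWordlistSketch
{
    public static ISet<string> ReadWordSet(TextReader reader, string comment, ISet<string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Skip lines that begin with the comment marker; the prefix is
            // checked on the untrimmed line.
            if (line.StartsWith(comment, StringComparison.Ordinal))
            {
                continue;
            }
            // Every other line holds one word, with surrounding whitespace removed.
            result.Add(line.Trim());
        }
        return result;
    }
}
```

With comment set to "//", the input "// English stopwords\nthe\nof\n" produces a set containing only the and of.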

GetWordSet() public static method

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
public static GetWordSet ( TextReader reader, string comment, System.Version matchVersion ) : CharArraySet
reader TextReader Reader containing the wordlist
comment string The string representing a comment.
matchVersion System.Version the Lucene version
Returns CharArraySet: a new set containing the reader's words

getLines() public static method

Reads the given stream using the given character encoding and returns the non-comment lines that contain data.

A comment line is any line that starts with the character "#".

Throws an IOException if there is a low-level I/O error.
public static getLines ( InputStream stream, Charset charset ) : IList
stream InputStream
charset Charset
Returns IList: the non-comment lines that contain data
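The following sketch shows one way to get the same effect with .NET types: Stream and Encoding replace the Java-style InputStream and Charset, and LinesSketch.ReadLines is a hypothetical name. Lines starting with "#" are dropped; trimming each line and skipping blank ones are assumptions of this sketch, based on the description's "lines that contain data". Disposing the StreamReader also closes the underlying stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical analogue of getLines(InputStream, Charset) using .NET types.
public static class LinesSketch
{
    public static IList<string> ReadLines(Stream stream, Encoding encoding)
    {
        var lines = new List<string>();
        // StreamReader decodes the bytes with the given encoding; disposing it
        // also disposes the stream.
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("#", StringComparison.Ordinal))
                {
                    continue; // comment line
                }
                string trimmed = line.Trim();
                if (trimmed.Length == 0)
                {
                    continue; // blank line carries no data (assumption)
                }
                lines.Add(trimmed);
            }
        }
        return lines;
    }
}
```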

getSnowballWordSet() public static method

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

public static getSnowballWordSet ( Reader reader, CharArraySet result ) : CharArraySet
reader Reader Reader containing a Snowball stopword list
result CharArraySet the CharArraySet to fill with the reader's words
Returns CharArraySet: the given set, filled with the reader's words
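A minimal sketch of a Snowball-format parser in plain C#, assuming ISet<string> in place of CharArraySet (SnowballWordlistSketch is a hypothetical name). Text from the first "|" onward is discarded, and whatever remains is split on whitespace, so one line may contribute several words or none.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical analogue of getSnowballWordSet(Reader, CharArraySet).
public static class SnowballWordlistSketch
{
    public static ISet<string> ReadSnowballWordSet(TextReader reader, ISet<string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Everything from the first '|' to the end of the line is a comment.
            int bar = line.IndexOf('|');
            if (bar >= 0)
            {
                line = line.Substring(0, bar);
            }
            // A null separator array makes Split break on any whitespace;
            // RemoveEmptyEntries drops the gaps between repeated spaces.
            foreach (string word in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                result.Add(word);
            }
        }
        return result;
    }
}
```

For instance, the lines "| An English stop word list", "the a  an | articles" and "of" yield the, a, an and of; the comment-only first line contributes nothing.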

getSnowballWordSet() public static method

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

  • Lines may contain multiple words separated by whitespace.
  • The comment character is the vertical line (|).
  • Lines may contain trailing comments.

public static getSnowballWordSet ( Reader reader, System.Version matchVersion ) : CharArraySet
reader Reader Reader containing a Snowball stopword list
matchVersion System.Version the Lucene version
Returns CharArraySet: a new set containing the reader's words

getStemDict() public static method

Reads a stem dictionary. Each line contains:
word\tstem
(that is, two words separated by a tab)
Throws an IOException if there is a low-level I/O error.
public static getStemDict ( Reader reader, CharArrayMap result ) : CharArrayMap
reader Reader
result CharArrayMap
Returns CharArrayMap: the given map, filled with the dictionary's entries
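A sketch of reading the word\tstem format into an IDictionary<string, string>, which stands in for CharArrayMap here; StemDictSketch.ReadStemDict is a hypothetical name. Each line is split on its first tab only. How the real method treats a line without a tab is not stated above, so skipping such lines is an assumption of this sketch, as is letting a later entry overwrite an earlier one for the same word.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical analogue of getStemDict(Reader, CharArrayMap).
public static class StemDictSketch
{
    public static IDictionary<string, string> ReadStemDict(TextReader reader, IDictionary<string, string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Split on the first tab only: "word<TAB>stem".
            string[] parts = line.Split(new[] { '\t' }, 2);
            if (parts.Length < 2)
            {
                continue; // no tab: skipped in this sketch (assumption)
            }
            // Indexer assignment lets a later entry replace an earlier one.
            result[parts[0]] = parts[1];
        }
        return result;
    }
}
```

Given "running\trun\ncats\tcat\n", the dictionary maps running to run and cats to cat.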