C# Class org.apache.lucene.analysis.wikipedia.WikipediaTokenizer

Extension of StandardTokenizer that is aware of Wikipedia syntax. It is based on the Wikipedia tutorial available at http://en.wikipedia.org/wiki/Wikipedia:Tutorial, but it may not be complete.

@lucene.experimental

Inheritance: Tokenizer
Project: paulirwin/lucene.net

Public Properties

Property Type Description
TOKEN_TYPES string[]

Public Methods

Method Description
WikipediaTokenizer ( AttributeFactory factory, Reader input, int tokenOutput, HashSet untokenizedTypes )

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner. Uses the given org.apache.lucene.util.AttributeSource.AttributeFactory.

WikipediaTokenizer ( Reader input )

Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.

WikipediaTokenizer ( Reader input, int tokenOutput, HashSet untokenizedTypes )

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner.

close ( ) : void
end ( ) : void
incrementToken ( ) : bool
reset ( ) : void
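
The four public methods above are normally called in a fixed order by whatever consumes the tokenizer: reset, then incrementToken until it returns false, then end, then close. The sketch below illustrates that order only. ITokenStreamLike, WhitespaceStream, CurrentTerm and TokenConsumer are hypothetical stand-ins written for this example; they are not Lucene.Net types, and the real tokenizer exposes its term text through attributes rather than a property.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that mirrors the four method names listed above.
// It is NOT the Lucene.Net type; CurrentTerm is an assumption for the demo.
public interface ITokenStreamLike
{
    void reset();
    bool incrementToken();
    void end();
    void close();
    string CurrentTerm { get; }
}

// Toy stream that splits on spaces, used only to exercise the loop.
public sealed class WhitespaceStream : ITokenStreamLike
{
    private readonly string text;
    private string[] parts = new string[0];
    private int index;

    public WhitespaceStream(string text) { this.text = text; }

    public string CurrentTerm { get; private set; } = "";

    public void reset()
    {
        parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        index = 0;
    }

    public bool incrementToken()
    {
        if (index >= parts.Length) return false;
        CurrentTerm = parts[index++];
        return true;
    }

    public void end() { /* a real stream would record end-of-input state here */ }

    public void close() { parts = new string[0]; }
}

public static class TokenConsumer
{
    // Drains a stream in the order reset -> incrementToken* -> end -> close.
    public static List<string> ReadAll(ITokenStreamLike stream)
    {
        var terms = new List<string>();
        try
        {
            stream.reset();
            while (stream.incrementToken())
            {
                terms.Add(stream.CurrentTerm);
            }
            stream.end();
        }
        finally
        {
            stream.close(); // always release the input, even on failure
        }
        return terms;
    }
}
```

Putting close in a finally block is a design choice of this sketch so the input is released even if tokenization throws.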

Private Methods

Method Description
collapseAndSaveTokens ( int tokenType, string type ) : void
collapseTokens ( int tokenType ) : void
init ( int tokenOutput, HashSet untokenizedTypes ) : void
setupSavedToken ( int positionInc, string type ) : void
setupToken ( ) : void

Method Details

WikipediaTokenizer() public method

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner. Uses the given org.apache.lucene.util.AttributeSource.AttributeFactory.
public WikipediaTokenizer ( AttributeFactory factory, Reader input, int tokenOutput, HashSet untokenizedTypes )
factory AttributeFactory
input Reader The input
tokenOutput int One of the tokenizer's output-mode constants
untokenizedTypes HashSet

WikipediaTokenizer() public method

Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.
public WikipediaTokenizer ( Reader input )
input Reader The input reader

WikipediaTokenizer() public method

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner.
public WikipediaTokenizer ( Reader input, int tokenOutput, HashSet untokenizedTypes )
input Reader The input
tokenOutput int One of the tokenizer's output-mode constants
untokenizedTypes HashSet
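
The tokenOutput and untokenizedTypes parameters work together: the mode only matters for tokens whose type is in the untokenized set. The sketch below shows one plausible reading of that interaction. The constant names and values (TOKENS_ONLY, UNTOKENIZED_ONLY, BOTH), the TokenAction enum and the Decide helper are assumptions made for illustration; this page does not list them, so check the actual source before relying on them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of what the tokenizer might do with one scanned token.
public enum TokenAction { EmitAsIs, EmitCollapsedOnly, EmitCollapsedThenParts }

public static class OutputModeSketch
{
    // Assumed names and values; they do not appear on this page.
    public const int TOKENS_ONLY = 0;
    public const int UNTOKENIZED_ONLY = 1;
    public const int BOTH = 2;

    // Decides how a token of the given type is handled under the given mode.
    public static TokenAction Decide(int tokenOutput, ISet<string> untokenizedTypes, string type)
    {
        // Types outside the set are always emitted as ordinary tokens.
        if (tokenOutput == TOKENS_ONLY || !untokenizedTypes.Contains(type))
            return TokenAction.EmitAsIs;
        if (tokenOutput == UNTOKENIZED_ONLY)
            return TokenAction.EmitCollapsedOnly;
        if (tokenOutput == BOTH)
            return TokenAction.EmitCollapsedThenParts;
        throw new ArgumentException("Unknown tokenOutput: " + tokenOutput, nameof(tokenOutput));
    }
}
```

For example, with an untokenized set containing only a category type, a category token under the assumed BOTH mode would be emitted once as a single collapsed token and again as its individual parts, while every other token type passes through unchanged.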

close() public method

public close ( ) : void
return void

end() public method

public end ( ) : void
return void

incrementToken() public method

public incrementToken ( ) : bool
return bool

reset() public method

public reset ( ) : void
return void

Property Details

TOKEN_TYPES public static property

String token types that correspond to token type int constants
public static string[] TOKEN_TYPES
return string[]
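
Since TOKEN_TYPES maps integer token type constants to their string names, a lookup is an array index. The sketch below shows a bounds-checked lookup. The three array entries, the *_ID constants and the NameOf helper are illustrative placeholders, not the actual contents of TOKEN_TYPES, which this page does not show.

```csharp
using System;

public static class TokenTypeLookup
{
    // Illustrative entries only; the real array's contents are not shown on this page.
    public static readonly string[] TOKEN_TYPES = { "<ALPHANUM>", "il", "c" };

    // Hypothetical integer constants that index into the array above.
    public const int ALPHANUM_ID = 0;
    public const int INTERNAL_LINK_ID = 1;
    public const int CATEGORY_ID = 2;

    // Returns the string name for an integer token type, rejecting bad indexes.
    public static string NameOf(int tokenType)
    {
        if (tokenType < 0 || tokenType >= TOKEN_TYPES.Length)
            throw new ArgumentOutOfRangeException(nameof(tokenType));
        return TOKEN_TYPES[tokenType];
    }
}
```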