C# Class Lucene.Net.Analysis.Wikipedia.WikipediaTokenizer

Extension of StandardTokenizer that is aware of Wikipedia syntax. It is based on the Wikipedia tutorial available at http://en.wikipedia.org/wiki/Wikipedia:Tutorial, but it may not cover all Wikipedia markup.

@lucene.experimental

Inheritance: Tokenizer
Project: apache/lucenenet

Public Properties

Property Type Description
TOKEN_TYPES string[] String token types that correspond to token type int constants

Public Methods

Method Description
Dispose ( ) : void
End ( ) : void
IncrementToken ( ) : bool
Reset ( ) : void
WikipediaTokenizer ( AttributeFactory factory, TextReader input, int tokenOutput, IEnumerable untokenizedTypes )

Creates a new instance of the WikipediaTokenizer. Attaches the input to the newly created JFlex scanner. Uses the given AttributeSource.AttributeFactory.

WikipediaTokenizer ( TextReader input )

Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.

WikipediaTokenizer ( TextReader input, int tokenOutput, IEnumerable untokenizedTypes )

Creates a new instance of the WikipediaTokenizer. Attaches the input to the newly created JFlex scanner.

Private Methods

Method Description
Init ( int tokenOutput, IEnumerable untokenizedTypes ) : void
collapseAndSaveTokens ( int tokenType, string type ) : void
collapseTokens ( int tokenType ) : void
setupSavedToken ( int positionInc, string type ) : void
setupToken ( ) : void

Method Details

Dispose() public method

public Dispose ( ) : void
return void

End() public method

public End ( ) : void
return void

IncrementToken() public method

public IncrementToken ( ) : bool
return bool

Reset() public method

public Reset ( ) : void
return void

WikipediaTokenizer() public method

Creates a new instance of the WikipediaTokenizer. Attaches the input to the newly created JFlex scanner. Uses the given AttributeSource.AttributeFactory.
public WikipediaTokenizer ( AttributeFactory factory, TextReader input, int tokenOutput, IEnumerable untokenizedTypes )
factory AttributeFactory
input TextReader The input
tokenOutput int One of , ,
untokenizedTypes IEnumerable

WikipediaTokenizer() public method

Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.
public WikipediaTokenizer ( TextReader input )
input TextReader The input reader
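
The single-argument constructor combines with Reset(), IncrementToken(), End() and Dispose() in the usual TokenStream consumption loop. The following is a minimal sketch, not the project's own sample. It assumes the Lucene.Net 4.8 package layout (namespaces Lucene.Net.Analysis.Wikipedia and Lucene.Net.Analysis.TokenAttributes) and the ICharTermAttribute and ITypeAttribute interfaces from that release. The class and method names (WikipediaTokenizerBasics, Tokenize) are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Wikipedia;

public static class WikipediaTokenizerBasics
{
    // Hypothetical helper: returns (term, type) pairs for a wiki-markup string.
    public static List<KeyValuePair<string, string>> Tokenize(string wikiText)
    {
        var tokens = new List<KeyValuePair<string, string>>();

        // Dispose() is called by the using block and closes the reader.
        using (var tokenizer = new WikipediaTokenizer(new StringReader(wikiText)))
        {
            // Attributes are views onto the current token's state.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();                    // must precede the first IncrementToken()
            while (tokenizer.IncrementToken())    // false once the input is exhausted
            {
                tokens.Add(new KeyValuePair<string, string>(term.ToString(), type.Type));
            }
            tokenizer.End();                      // finalizes end-of-stream state
        }

        return tokens;
    }
}
```

Each pair holds the token text and its type string, so a caller can tell plain words apart from tokens that came from wiki markup such as links or categories.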

WikipediaTokenizer() public method

Creates a new instance of the WikipediaTokenizer. Attaches the input to the newly created JFlex scanner.
public WikipediaTokenizer ( TextReader input, int tokenOutput, IEnumerable untokenizedTypes )
input TextReader The input
tokenOutput int One of , ,
untokenizedTypes IEnumerable
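
The tokenOutput and untokenizedTypes parameters control whether markup spans of the given types are emitted as single, untokenized tokens. The sketch below assumes constants that this page does not list but that exist on WikipediaTokenizer in Lucene.Net 4.8: the int constant BOTH and the string constant CATEGORY. A HashSet<string> is passed because it satisfies either an IEnumerable<string> or an ICollection<string> parameter. The names WikipediaTokenizerModes and TokenizeKeepingCategories are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Wikipedia;

public static class WikipediaTokenizerModes
{
    // Hypothetical helper: tokenizes wiki text while also emitting each
    // category span as one untokenized token.
    public static List<KeyValuePair<string, string>> TokenizeKeepingCategories(string wikiText)
    {
        // Assumption: CATEGORY and BOTH are public constants on WikipediaTokenizer.
        var untokenizedTypes = new HashSet<string> { WikipediaTokenizer.CATEGORY };
        var tokens = new List<KeyValuePair<string, string>>();

        using (var tokenizer = new WikipediaTokenizer(
            new StringReader(wikiText), WikipediaTokenizer.BOTH, untokenizedTypes))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            ITypeAttribute type = tokenizer.AddAttribute<ITypeAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                tokens.Add(new KeyValuePair<string, string>(term.ToString(), type.Type));
            }
            tokenizer.End();
        }

        return tokens;
    }
}
```

Restricting untokenizedTypes to the types a query actually needs keeps the token stream small, since every listed type can contribute an extra collapsed token in this mode.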

Property Details

TOKEN_TYPES public static property

String token types that correspond to token type int constants
public static string[] TOKEN_TYPES
return string[]
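
Because TOKEN_TYPES maps token type int constants to their string names, indexing the array by a type id gives the corresponding type string. A minimal sketch, assuming the array index is the int constant (as the description above suggests). The names WikipediaTokenTypes and ById are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Wikipedia;

public static class WikipediaTokenTypes
{
    // Hypothetical helper: builds an id-to-name lookup from TOKEN_TYPES.
    public static Dictionary<int, string> ById()
    {
        string[] types = WikipediaTokenizer.TOKEN_TYPES;
        var lookup = new Dictionary<int, string>(types.Length);
        for (int id = 0; id < types.Length; id++)
        {
            lookup[id] = types[id];   // assumption: array index equals the int constant
        }
        return lookup;
    }
}
```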