C# Class org.apache.lucene.analysis.wikipedia.WikipediaTokenizer

Extension of StandardTokenizer that is aware of Wikipedia syntax. It is based on the Wikipedia tutorial available at http://en.wikipedia.org/wiki/Wikipedia:Tutorial, but it may not cover all Wikipedia syntax.

@lucene.experimental

Inheritance: Tokenizer
Project: paulirwin/lucene.net

Public Properties

Property Type Description
TOKEN_TYPES string[]

Public Methods

Method Description
WikipediaTokenizer ( AttributeFactory factory, Reader input, int tokenOutput, HashSet untokenizedTypes )

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner. Uses the given org.apache.lucene.util.AttributeSource.AttributeFactory.

WikipediaTokenizer ( Reader input )

Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.

WikipediaTokenizer ( Reader input, int tokenOutput, HashSet untokenizedTypes )

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner.

close ( ) : void
end ( ) : void
incrementToken ( ) : bool
reset ( ) : void

Private Methods

Method Description
collapseAndSaveTokens ( int tokenType, string type ) : void
collapseTokens ( int tokenType ) : void
init ( int tokenOutput, HashSet untokenizedTypes ) : void
setupSavedToken ( int positionInc, string type ) : void
setupToken ( ) : void

Method Details

WikipediaTokenizer() public method

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner. Uses the given org.apache.lucene.util.AttributeSource.AttributeFactory.
public WikipediaTokenizer ( AttributeFactory factory, Reader input, int tokenOutput, HashSet untokenizedTypes )
factory AttributeFactory
input Reader The input
tokenOutput int One of , ,
untokenizedTypes HashSet

WikipediaTokenizer() public method

Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.
public WikipediaTokenizer ( Reader input )
input Reader The input Reader

WikipediaTokenizer() public method

Creates a new instance of the org.apache.lucene.analysis.wikipedia.WikipediaTokenizer. Attaches the input to the newly created JFlex scanner.
public WikipediaTokenizer ( Reader input, int tokenOutput, HashSet untokenizedTypes )
input Reader The input
tokenOutput int One of , ,
untokenizedTypes HashSet

close() public method

public close ( ) : void
Returns void

end() public method

public end ( ) : void
Returns void

incrementToken() public method

public incrementToken ( ) : bool
Returns bool

reset() public method

public reset ( ) : void
Returns void
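
The four public methods above carry no descriptions, so the following is a minimal sketch of the order in which they are usually called, assuming this class follows the general Lucene TokenStream contract: reset() once, incrementToken() until it returns false, then end() and close(). StubTokenizer, its Current property, and Demo.Consume are hypothetical stand-ins written for this illustration; they are not part of the library, and the stub simply splits on spaces instead of using a JFlex scanner.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in exposing the same four method names as the class
// documented above. It splits on spaces; it does NOT parse Wikipedia syntax.
class StubTokenizer
{
    private readonly string[] words;
    private int index = -1; // -1 means reset() has not been called yet

    // Hypothetical accessor for the current token text (assumption, not library API).
    public string Current { get; private set; }

    public StubTokenizer(string text)
    {
        words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Prepares the tokenizer for a fresh pass over the input.
    public void reset() { index = 0; }

    // Advances to the next token; returns false when the input is exhausted.
    public bool incrementToken()
    {
        if (index < 0) throw new InvalidOperationException("call reset() first");
        if (index >= words.Length) return false;
        Current = words[index++];
        return true;
    }

    // End-of-stream bookkeeping; nothing to do in this stub.
    public void end() { }

    // Releases resources; the stub just returns to its initial state.
    public void close() { index = -1; }
}

static class Demo
{
    // Assumed call order: reset, incrementToken loop, end, then close.
    public static List<string> Consume(StubTokenizer tokenizer)
    {
        var tokens = new List<string>();
        tokenizer.reset();
        try
        {
            while (tokenizer.incrementToken())
            {
                tokens.Add(tokenizer.Current);
            }
            tokenizer.end();
        }
        finally
        {
            tokenizer.close(); // always close, even if tokenization throws
        }
        return tokens;
    }

    static void Main()
    {
        foreach (var token in Consume(new StubTokenizer("alpha beta gamma")))
        {
            Console.WriteLine(token);
        }
    }
}
```

Placing close() in a finally block is a design choice of this sketch so that the underlying reader would be released even when incrementToken() fails partway through.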

Property Details

TOKEN_TYPES public static property

String token types that correspond to the token type int constants.
public static string[] TOKEN_TYPES
Returns string[]
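
As a rough illustration of how an int constant can index into such a string array, the sketch below uses a stand-in array. The constant names (TYPE_A, TYPE_B), the array contents, and the NameOf helper are all hypothetical; the page does not list the real constants or their string values.

```csharp
using System;

static class TokenTypeLookup
{
    // Hypothetical int constants; the real constants are not listed on this page.
    public const int TYPE_A = 0;
    public const int TYPE_B = 1;

    // Hypothetical stand-in for TOKEN_TYPES: position i holds the name for constant i.
    public static readonly string[] TOKEN_TYPES = { "<TYPE_A>", "<TYPE_B>" };

    // Maps an int token type to its string name, rejecting out-of-range values.
    public static string NameOf(int tokenType)
    {
        if (tokenType < 0 || tokenType >= TOKEN_TYPES.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(tokenType));
        }
        return TOKEN_TYPES[tokenType];
    }

    static void Main()
    {
        Console.WriteLine(NameOf(TYPE_B));
    }
}
```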