C# Class Lucene.Net.Analysis.Analyzer

An Analyzer represents a policy for extracting terms that are indexed from text. The Analyzer builds TokenStreams, which break down text into tokens.

A typical Analyzer implementation will first build a Tokenizer. The Tokenizer will break down the stream of characters from the System.IO.TextReader into raw Tokens. One or more TokenFilters may then be applied to the output of the Tokenizer.
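The pattern above can be sketched as a subclass that wires a Tokenizer to a TokenFilter chain inside CreateComponents. This is only a sketch: it assumes Lucene.Net 4.8-style types (StandardTokenizer, LowerCaseFilter, LuceneVersion), whose names and namespaces may differ in other versions, and the class name is hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;       // LowerCaseFilter (assumed location)
using Lucene.Net.Analysis.Standard;   // StandardTokenizer (assumed location)
using Lucene.Net.Util;                // LuceneVersion

// Hypothetical analyzer: standard tokenization followed by lowercasing.
public class LowercaseStandardAnalyzer : Analyzer
{
    public override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // The Tokenizer breaks the character stream into raw tokens...
        Tokenizer source = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
        // ...and one or more TokenFilters post-process its output.
        TokenStream filtered = new LowerCaseFilter(LuceneVersion.LUCENE_48, source);
        return new TokenStreamComponents(source, filtered);
    }
}
```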

Inheritance: IDisposable

Public Properties

Property Type Description
GLOBAL_REUSE_STRATEGY ReuseStrategy
PER_FIELD_REUSE_STRATEGY ReuseStrategy

Public Methods

Method Description
Analyzer ( ) : Lucene.Net.Util

Create a new Analyzer, reusing the same set of components per-thread across calls to TokenStream(string, TextReader).

Analyzer ( ReuseStrategy reuseStrategy ) : Lucene.Net.Util

Expert: create a new Analyzer with a custom ReuseStrategy.

NOTE: if you just want to reuse on a per-field basis, it's easier to use a subclass of AnalyzerWrapper such as PerFieldAnalyzerWrapper instead.

CreateComponents ( string fieldName, TextReader reader ) : TokenStreamComponents

Creates a new TokenStreamComponents instance for this analyzer.

Dispose ( ) : void

Frees persistent resources used by this Analyzer.

GetOffsetGap ( string fieldName ) : int

Just like GetPositionIncrementGap(string), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.

GetPositionIncrementGap ( string fieldName ) : int

Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default position increment gap is 0. With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including across IndexableField instances, are in successive positions, allowing exact PhraseQuery matches, for instance, across IndexableField instance boundaries.

InitReader ( string fieldName, TextReader reader ) : TextReader

Override this if you want to add a CharFilter chain.

The default implementation returns reader unchanged.

TokenStream ( string fieldName, TextReader reader ) : TokenStream

Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.

This method uses CreateComponents(string, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through TokenStreamComponents.SetReader(TextReader).

NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Lucene.Net.Analysis package documentation for some examples demonstrating this.

Method Details

Analyzer() public method

Create a new Analyzer, reusing the same set of components per-thread across calls to TokenStream(string, TextReader).
public Analyzer ( ) : Lucene.Net.Util
Return Lucene.Net.Util

Analyzer() public method

Expert: create a new Analyzer with a custom ReuseStrategy.

NOTE: if you just want to reuse on a per-field basis, it's easier to use a subclass of AnalyzerWrapper such as PerFieldAnalyzerWrapper instead.

public Analyzer ( ReuseStrategy reuseStrategy ) : Lucene.Net.Util
reuseStrategy ReuseStrategy
Return Lucene.Net.Util

CreateComponents() public abstract method

Creates a new TokenStreamComponents instance for this analyzer.
public abstract CreateComponents ( string fieldName, TextReader reader ) : TokenStreamComponents
fieldName string the name of the fields content passed to the sink as a reader
reader System.IO.TextReader the reader passed to the constructor
Return TokenStreamComponents

Dispose() public method

Frees persistent resources used by this Analyzer.
public Dispose ( ) : void
Return void

GetOffsetGap() public method

Just like GetPositionIncrementGap(string), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.
public GetOffsetGap ( string fieldName ) : int
fieldName string the field just indexed
Return int

GetPositionIncrementGap() public method

Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default position increment gap is 0. With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including across IndexableField instances, are in successive positions, allowing exact PhraseQuery matches, for instance, across IndexableField instance boundaries.
public GetPositionIncrementGap ( string fieldName ) : int
fieldName string IndexableField name being indexed.
Return int
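A hedged sketch of overriding this gap so that phrase queries cannot match across separate IndexableField instances of the same field. The gap value 100 is arbitrary, the class name is hypothetical, and WhitespaceTokenizer plus LuceneVersion are assumed to be available in their 4.8-style locations.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;   // WhitespaceTokenizer (assumed location)
using Lucene.Net.Util;            // LuceneVersion

// Hypothetical analyzer that inserts a large position gap between
// IndexableField instances sharing a field name.
public class GappedAnalyzer : Analyzer
{
    public override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        return new TokenStreamComponents(new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader));
    }

    public override int GetPositionIncrementGap(string fieldName)
    {
        // Default is 0, which keeps positions contiguous across instances;
        // 100 (arbitrary) blocks exact PhraseQuery matches across boundaries.
        return 100;
    }
}
```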

InitReader() public method

Override this if you want to add a CharFilter chain.

The default implementation returns reader unchanged.

public InitReader ( string fieldName, TextReader reader ) : TextReader
fieldName string IndexableField name being indexed
reader System.IO.TextReader original Reader
Return System.IO.TextReader
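As a sketch, an override like the following (placed inside an Analyzer subclass) inserts a CharFilter before tokenization; HTMLStripCharFilter and its namespace are assumed from 4.8-style analysis packages and may differ in this build.

```csharp
using System.IO;
using Lucene.Net.Analysis.CharFilters;   // HTMLStripCharFilter (assumed location)

// Inside a hypothetical Analyzer subclass: strip HTML markup from the
// character stream before the Tokenizer ever sees it.
public override TextReader InitReader(string fieldName, TextReader reader)
{
    return new HTMLStripCharFilter(reader);
}
```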

TokenStream() public method

Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.

This method uses CreateComponents(string, TextReader) to obtain an instance of TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through TokenStreamComponents.SetReader(TextReader).

NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Lucene.Net.Analysis package documentation for some examples demonstrating this.

Throws an exception if the Analyzer is closed, or if an I/O error occurs (which may rarely happen for strings).
public TokenStream ( string fieldName, TextReader reader ) : TokenStream
fieldName string the name of the field the created TokenStream is used for
reader System.IO.TextReader
Return TokenStream
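The consumer workflow referenced in the NOTE looks roughly like this. It is a fragment, not a complete program: `analyzer` stands for any Analyzer instance, "body" is a hypothetical field name, and ICharTermAttribute with its namespace is assumed from 4.8-style builds.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;   // ICharTermAttribute (assumed location)

// Consume the stream: Reset, IncrementToken until exhausted, End, Dispose.
using (TokenStream stream = analyzer.TokenStream("body", new StringReader("Some example text")))
{
    ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
    stream.Reset();                          // required before the first IncrementToken()
    while (stream.IncrementToken())
    {
        Console.WriteLine(term.ToString());  // one token per successful call
    }
    stream.End();                            // perform end-of-stream operations
}                                            // disposing releases the stream
```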

Property Details

GLOBAL_REUSE_STRATEGY public static property

A predefined ReuseStrategy that reuses the same components for every field.
public static ReuseStrategy GLOBAL_REUSE_STRATEGY
Return ReuseStrategy

PER_FIELD_REUSE_STRATEGY public static property

A predefined ReuseStrategy that reuses components per-field by maintaining a map of TokenStreamComponents per field name.
public static ReuseStrategy PER_FIELD_REUSE_STRATEGY
Return ReuseStrategy
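To opt into per-field reuse, this strategy is passed to the expert constructor described above. A sketch under the same assumptions as earlier examples (hypothetical class name; WhitespaceTokenizer and LuceneVersion in their 4.8-style locations):

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;   // WhitespaceTokenizer (assumed location)
using Lucene.Net.Util;            // LuceneVersion

// Hypothetical analyzer that caches one TokenStreamComponents per field name.
public class PerFieldCachedAnalyzer : Analyzer
{
    public PerFieldCachedAnalyzer()
        : base(Analyzer.PER_FIELD_REUSE_STRATEGY)   // per-field component reuse
    {
    }

    public override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        return new TokenStreamComponents(new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader));
    }
}
```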