C# Class Similarity.Net.SimilarityQueries

Simple similarity measures.
Project: synhershko/lucene.net

Public Methods

Method Description
FormSimilarQuery ( System.String body, Lucene.Net.Analysis.Analyzer a, System.String field, ISet<string> stop ) : Query

Simple similarity query generator. Takes every unique word in the body and forms a Boolean query in which all words are optional. Use the resulting query to search your IndexSearcher for similar documents. The only caveat is that the first hit returned should be your source document, which you will need to ignore.

So, if you have a code fragment like this:
Query q = SimilarityQueries.FormSimilarQuery("I use Lucene to search fast. Fast searchers are good", new StandardAnalyzer(Version.LUCENE_30), "contents", null);

The query returned, in string form, will be '(i use lucene to search fast searchers are good)'.

The philosophy behind this method is "two documents are similar if they share lots of words". Note that behind the scenes, Lucene's scoring algorithm will tend to give two documents a higher similarity score if they share more uncommon words.

This method is fail-safe: if a long 'body' is passed in and BooleanQuery.Add (used internally) throws BooleanQuery.TooManyClauses, the query built up to that point is returned.

Private Methods

Method Description
SimilarityQueries ( ) (private constructor)

Method Details

FormSimilarQuery() public static method

Simple similarity query generator. Takes every unique word in the body and forms a Boolean query in which all words are optional. Use the resulting query to search your IndexSearcher for similar documents. The only caveat is that the first hit returned should be your source document, which you will need to ignore.

So, if you have a code fragment like this:
Query q = SimilarityQueries.FormSimilarQuery("I use Lucene to search fast. Fast searchers are good", new StandardAnalyzer(Version.LUCENE_30), "contents", null);

The query returned, in string form, will be '(i use lucene to search fast searchers are good)'.

The philosophy behind this method is "two documents are similar if they share lots of words". Note that behind the scenes, Lucene's scoring algorithm will tend to give two documents a higher similarity score if they share more uncommon words.

This method is fail-safe: if a long 'body' is passed in and BooleanQuery.Add (used internally) throws BooleanQuery.TooManyClauses, the query built up to that point is returned.
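The behavior described above (unique words, optional stop words, and a clause limit that truncates rather than fails) can be sketched without any Lucene.Net dependency. This is a simplified model, not the library's implementation: the class SimilarQuerySketch, the method UniqueTerms, and the maxClauses parameter are hypothetical names, and a plain regular expression stands in for the Analyzer. The default of 1024 mirrors Lucene's default maximum clause count.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, Lucene-free model of how the similarity query's terms are chosen.
public static class SimilarQuerySketch
{
    // Returns the unique, lower-cased words of 'body' in first-seen order.
    // Stop words are skipped; collection stops once 'maxClauses' terms are kept,
    // which models returning the partial query instead of throwing.
    public static List<string> UniqueTerms(string body, ISet<string> stop, int maxClauses = 1024)
    {
        var seen = new HashSet<string>();
        var terms = new List<string>();

        // Assumption: a crude tokenizer (runs of letters/digits) replaces the Analyzer.
        foreach (Match m in Regex.Matches(body, @"[\p{L}\p{Nd}]+"))
        {
            string word = m.Value.ToLowerInvariant();
            if (stop != null && stop.Contains(word)) continue; // ignore stop words
            if (!seen.Add(word)) continue;                     // ignore repeated words
            if (terms.Count >= maxClauses) break;              // fail-safe: keep what we have
            terms.Add(word);                                   // one optional clause per word
        }
        return terms;
    }
}

public static class Program
{
    public static void Main()
    {
        var terms = SimilarQuerySketch.UniqueTerms(
            "I use Lucene to search fast. Fast searchers are good", null);
        // prints "i use lucene to search fast searchers are good"
        Console.WriteLine(string.Join(" ", terms));
    }
}
```

Because each word becomes an optional clause, a document matching only some of the words can still be returned; ranking among the matches is left to the scoring described above.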

public static FormSimilarQuery ( System.String body, Lucene.Net.Analysis.Analyzer a, System.String field, ISet<string> stop ) : Query
body System.String The body of the document you want to find similar documents to.
a Lucene.Net.Analysis.Analyzer The analyzer to use to parse the body.
field System.String The field you want to search on, probably something like "contents" or "body".
stop ISet<string> Optional set of stop words to ignore.
Returns Lucene.Net.Search.Query