C# Class Lucene.Net.Queries.CommonTermsQuery

A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the terms added via Add(Term): low-frequency terms are added to a required boolean clause and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. Scores produced by this query differ slightly from those of a plain BooleanQuery scorer, mainly because Similarity.Coord(int, int) sees a different number of leaf queries in the required boolean clause. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where applicable, this query can improve query execution times significantly.

CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so slow queries can be prevented even across domains without specialized stopword files.

Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e. all high-frequency terms must match for a document to match.
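
The split described above can be illustrated without Lucene.Net at all. The following is a minimal sketch, not the library's implementation: the class CommonTermsSketch, the method Classify, and the sample document frequencies are hypothetical, and the cutoff is assumed to be a fraction of the total document count.

```csharp
using System;
using System.Collections.Generic;

public static class CommonTermsSketch
{
    // Splits terms into low- and high-frequency groups.
    // docFreq maps each term to the number of documents that contain it,
    // maxDoc is the total number of documents, and maxTermFrequency is
    // assumed here to be a fraction in [0, 1).
    public static (List<string> Low, List<string> High) Classify(
        IDictionary<string, int> docFreq, int maxDoc, float maxTermFrequency)
    {
        var low = new List<string>();
        var high = new List<string>();
        int cutoff = (int)Math.Ceiling(maxTermFrequency * maxDoc);
        foreach (var entry in docFreq)
        {
            if (entry.Value > cutoff) high.Add(entry.Key);
            else low.Add(entry.Key);
        }
        return (low, high);
    }

    public static void Main()
    {
        var docFreq = new Dictionary<string, int>
        {
            ["the"] = 950,
            ["lucene"] = 12,
            ["query"] = 40,
        };
        var (low, high) = Classify(docFreq, maxDoc: 1000, maxTermFrequency: 0.1f);

        if (low.Count == 0)
        {
            // Mirrors the note above: with no low-frequency terms,
            // every high-frequency term has to match.
            Console.WriteLine("conjunction of: " + string.Join(", ", high));
        }
        else
        {
            Console.WriteLine("required (low-frequency): " + string.Join(", ", low));
            Console.WriteLine("optional (high-frequency): " + string.Join(", ", high));
        }
    }
}
```

With these sample values, "the" appears in 95% of documents and lands in the optional group, while "lucene" and "query" stay in the required group.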

Inheritance: Lucene.Net.Search.Query
Project: paulirwin/lucene.net

Protected Properties

Property Type Description
disableCoord bool
highFreqBoost float
highFreqOccur Lucene.Net.Search.BooleanClause.Occur
lowFreqBoost float
lowFreqOccur Lucene.Net.Search.BooleanClause.Occur
maxTermFrequency float
terms IList

Public Methods

Method Description
Add ( Lucene.Net.Index.Term term ) : void

Adds a term to the CommonTermsQuery

CollectTermContext ( IndexReader reader, IList leaves, TermContext contextArray, Lucene.Net.Index.Term queryTerms ) : void
CommonTermsQuery ( BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency ) : System

Creates a new CommonTermsQuery

CommonTermsQuery ( BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency, bool disableCoord ) : System

Creates a new CommonTermsQuery

Equals ( object obj ) : bool
ExtractTerms ( ISet terms ) : void
GetHashCode ( ) : int
Rewrite ( IndexReader reader ) : Query
ToString ( string field ) : string

Protected Methods

Method Description
BuildQuery ( int maxDoc, TermContext contextArray, Lucene.Net.Index.Term queryTerms ) : Query
CalcHighFreqMinimumNumberShouldMatch ( int numOptional ) : int
CalcLowFreqMinimumNumberShouldMatch ( int numOptional ) : int
NewTermQuery ( Lucene.Net.Index.Term term, TermContext context ) : Query

Builds a new TermQuery instance.

This is intended for subclasses that wish to customize the generated queries.

Private Methods

Method Description
MinNrShouldMatch ( float minNrShouldMatch, int numOptional ) : int

Method Details

Add() public method

Adds a term to the CommonTermsQuery
public Add ( Lucene.Net.Index.Term term ) : void
term Lucene.Net.Index.Term the term to add
return void

BuildQuery() protected method

protected BuildQuery ( int maxDoc, TermContext contextArray, Lucene.Net.Index.Term queryTerms ) : Query
maxDoc int
contextArray Lucene.Net.Index.TermContext
queryTerms Lucene.Net.Index.Term
return Lucene.Net.Search.Query

CalcHighFreqMinimumNumberShouldMatch() protected method

protected CalcHighFreqMinimumNumberShouldMatch ( int numOptional ) : int
numOptional int
return int

CalcLowFreqMinimumNumberShouldMatch() protected method

protected CalcLowFreqMinimumNumberShouldMatch ( int numOptional ) : int
numOptional int
return int

CollectTermContext() public method

public CollectTermContext ( IndexReader reader, IList leaves, TermContext contextArray, Lucene.Net.Index.Term queryTerms ) : void
reader Lucene.Net.Index.IndexReader
leaves IList
contextArray Lucene.Net.Index.TermContext
queryTerms Lucene.Net.Index.Term
return void

CommonTermsQuery() public method

Creates a new CommonTermsQuery
Throws an exception if MUST_NOT is passed as lowFreqOccur or highFreqOccur.
public CommonTermsQuery ( BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency ) : System
highFreqOccur Lucene.Net.Search.BooleanClause.Occur used for high-frequency terms
lowFreqOccur Lucene.Net.Search.BooleanClause.Occur used for low-frequency terms
maxTermFrequency float a value in [0..1) (or an absolute number >= 1) representing the maximum document frequency a term may have and still be considered a low-frequency term.
return System

CommonTermsQuery() public method

Creates a new CommonTermsQuery
Throws an exception if MUST_NOT is passed as lowFreqOccur or highFreqOccur.
public CommonTermsQuery ( BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency, bool disableCoord ) : System
highFreqOccur Lucene.Net.Search.BooleanClause.Occur used for high-frequency terms
lowFreqOccur Lucene.Net.Search.BooleanClause.Occur used for low-frequency terms
maxTermFrequency float a value in [0..1) (or an absolute number >= 1) representing the maximum document frequency a term may have and still be considered a low-frequency term.
disableCoord bool disables Similarity.Coord(int, int) in scoring for the low / high frequency sub-queries
return System
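
The two readings of maxTermFrequency (a fraction of the index below 1, an absolute document count at 1 or above) can be illustrated with a small standalone check. This is a sketch under stated assumptions, not the class's actual code: ThresholdSketch and IsHighFrequency are hypothetical names, and the exact rounding and comparison are assumptions.

```csharp
using System;

public static class ThresholdSketch
{
    // Returns true when a term with the given document frequency would be
    // treated as high-frequency under the threshold convention above.
    public static bool IsHighFrequency(int docFreq, int maxDoc, float maxTermFrequency)
    {
        if (maxTermFrequency >= 1f)
        {
            // Absolute threshold: compare directly against a document count.
            return docFreq > maxTermFrequency;
        }
        // Relative threshold: a fraction of all documents in the index
        // (rounding up is an assumption of this sketch).
        return docFreq > (int)Math.Ceiling(maxTermFrequency * maxDoc);
    }

    public static void Main()
    {
        // Relative: 150 documents exceeds 10% of a 1000-document index.
        Console.WriteLine(IsHighFrequency(docFreq: 150, maxDoc: 1000, maxTermFrequency: 0.1f)); // True
        // Absolute: 150 documents does not exceed a cutoff of 200 documents.
        Console.WriteLine(IsHighFrequency(docFreq: 150, maxDoc: 1000, maxTermFrequency: 200f)); // False
    }
}
```

A relative threshold adapts as the index grows, whereas an absolute one keeps the cutoff fixed regardless of index size.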

Equals() public method

public Equals ( object obj ) : bool
obj object
return bool

ExtractTerms() public method

public ExtractTerms ( ISet terms ) : void
terms ISet
return void

GetHashCode() public method

public GetHashCode ( ) : int
return int

NewTermQuery() protected method

Builds a new TermQuery instance.

This is intended for subclasses that wish to customize the generated queries.

protected NewTermQuery ( Lucene.Net.Index.Term term, TermContext context ) : Query
term Lucene.Net.Index.Term term
context Lucene.Net.Index.TermContext the TermContext to be used to create the low level term query. Can be null.
return Lucene.Net.Search.Query

Rewrite() public method

public Rewrite ( IndexReader reader ) : Query
reader Lucene.Net.Index.IndexReader
return Lucene.Net.Search.Query

ToString() public method

public ToString ( string field ) : string
field string
return string

Property Details

disableCoord protected property

protected bool disableCoord
return bool

highFreqBoost protected property

protected float highFreqBoost
return float

highFreqOccur protected property

protected Lucene.Net.Search.BooleanClause.Occur highFreqOccur
return Lucene.Net.Search.BooleanClause.Occur

lowFreqBoost protected property

protected float lowFreqBoost
return float

lowFreqOccur protected property

protected Lucene.Net.Search.BooleanClause.Occur lowFreqOccur
return Lucene.Net.Search.BooleanClause.Occur

maxTermFrequency protected property

protected float maxTermFrequency
return float

terms protected property

protected IList terms
return IList