C# Class Lucene.Net.Util.NumericUtils

This is a helper class that generates prefix-encoded representations of numerical values and supplies converters that represent float/double values as sortable integers/longs.

To execute range queries quickly in Apache Lucene, a range is recursively divided into multiple intervals for searching: the center of the range is searched with only the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
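The splitting can be illustrated with a small standalone sketch. It is modeled on the loop used by Java Lucene 4.x as best recalled, not taken from the Lucene.Net source, and the names RangeSplitSketch and Split are hypothetical. Instead of calling a range builder, it simply returns (min, max, shift) tuples, where shift is the number of low bits dropped for that sub-range.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; not part of Lucene.Net.
public static class RangeSplitSketch
{
    // Splits [minBound, maxBound] into sub-ranges, each tagged with the shift
    // (number of low bits dropped) at which it can be matched.
    public static List<(long Min, long Max, int Shift)> Split(
        int valSize, int precisionStep, long minBound, long maxBound)
    {
        if (precisionStep < 1)
            throw new ArgumentException("precisionStep must be >= 1");

        var ranges = new List<(long Min, long Max, int Shift)>();
        if (minBound > maxBound) return ranges;

        for (int shift = 0; ; shift += precisionStep)
        {
            // Bounds of the inner part that can move to the next lower precision.
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            bool hasLower = (minBound & mask) != 0L;
            bool hasUpper = (maxBound & mask) != mask;
            long nextMin = unchecked((hasLower ? minBound + diff : minBound) & ~mask);
            long nextMax = unchecked((hasUpper ? maxBound - diff : maxBound) & ~mask);
            bool lowerWrapped = nextMin < minBound;
            bool upperWrapped = nextMax > maxBound;

            if (shift + precisionStep >= valSize || nextMin > nextMax
                || lowerWrapped || upperWrapped)
            {
                // Lowest usable precision reached: emit the rest as one range.
                ranges.Add((minBound, maxBound, shift));
                break;
            }

            // Edges are matched at the current (higher) precision.
            if (hasLower) ranges.Add((minBound, minBound | mask, shift));
            if (hasUpper) ranges.Add((maxBound & ~mask, maxBound, shift));

            minBound = nextMin;
            maxBound = nextMax;
        }
        return ranges;
    }
}
```

For example, Split(32, 4, 3, 82) yields (3, 15, 0), (80, 82, 0) and (16, 64, 4): the two edges are matched at full precision (16 terms), while the middle part 16..79 is covered by only 4 terms at shift 4.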

This class generates the terms needed for this. First, the numerical integer values are converted to bytes: the 32-bit or 64-bit integer is made unsigned and its bits are written out as ASCII chars carrying 7 bits each. The resulting byte[] sorts in the same order as the original integer value (even under UTF-8 sort order). Each value is also prefixed, in its first char, with the shift value (the number of bits removed) used during encoding.
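The encoding can be sketched for 32-bit values as follows. This is a minimal standalone illustration, not the Lucene.Net implementation: it works on plain byte arrays rather than BytesRef, the names PrefixCodeSketch, Encode, Decode and Compare are hypothetical, and the marker 0x60 added to the shift in the first byte is an assumption recalled from Java Lucene 4.x.

```csharp
using System;

// Hypothetical illustration; not the Lucene.Net implementation.
public static class PrefixCodeSketch
{
    // Assumed marker added to the shift in the first byte.
    private const int ShiftStartInt = 0x60;

    public static byte[] Encode(int val, int shift)
    {
        if (shift < 0 || shift > 31)
            throw new ArgumentOutOfRangeException(nameof(shift), "shift must be 0..31");

        int nChars = (31 - shift) / 7 + 1;   // 7 payload bits per byte
        var bytes = new byte[nChars + 1];    // one extra byte for the shift prefix
        bytes[0] = (byte)(ShiftStartInt + shift);

        // Flip the sign bit so that unsigned order equals signed order.
        uint sortableBits = unchecked((uint)val) ^ 0x80000000u;
        sortableBits >>= shift;              // drop the low "shift" bits
        for (int i = nChars; i > 0; i--)
        {
            bytes[i] = (byte)(sortableBits & 0x7f);
            sortableBits >>= 7;
        }
        return bytes;
    }

    public static int Decode(byte[] bytes)
    {
        int shift = bytes[0] - ShiftStartInt;
        if (shift < 0 || shift > 31)
            throw new FormatException("Invalid shift prefix");

        uint sortableBits = 0;
        for (int i = 1; i < bytes.Length; i++)
            sortableBits = (sortableBits << 7) | (uint)bytes[i];

        // Dropped low bits come back as zero.
        return unchecked((int)((sortableBits << shift) ^ 0x80000000u));
    }

    // Plain unsigned byte-wise comparison of two encoded values.
    public static int Compare(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
            if (a[i] != b[i]) return a[i] - b[i];
        return a.Length - b.Length;
    }
}
```

With shift 0 the encoding round-trips exactly, and comparing the encoded bytes gives the same order as comparing the ints. With a larger shift the low bits are lost, so Decode(Encode(1234, 8)) gives 1024.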

To index floating point numbers as well, this class supplies two methods that convert them to integer values by changing their bit layout: DoubleToSortableLong and FloatToSortableInt. No precision is lost when converting floating point numbers to integers and back (the integer form just is not meaningful as a number in its own right). Other data types such as dates can easily be converted to longs or ints (e.g. date to long: java.util.Date#getTime).
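The bit-layout conversion for doubles can be sketched as below. This is a standalone illustration under stated assumptions, not the Lucene.Net source: the names SortableDoubleSketch, ToSortableLong and FromSortableLong are hypothetical, and the explicit NaN handling is an assumption added to mimic Java's Double.doubleToLongBits, which collapses every NaN to one canonical bit pattern.

```csharp
using System;

// Hypothetical illustration; not the Lucene.Net implementation.
public static class SortableDoubleSketch
{
    public static long ToSortableLong(double val)
    {
        // Use one canonical NaN bit pattern so that every NaN
        // ends up above positive infinity.
        long bits = double.IsNaN(val)
            ? 0x7ff8000000000000L
            : BitConverter.DoubleToInt64Bits(val);

        // Negative doubles have the sign bit set and sort "backwards" as raw
        // bits; flipping the lower 63 bits reverses their order.
        if (bits < 0) bits ^= 0x7fffffffffffffffL;
        return bits;
    }

    public static double FromSortableLong(long val)
    {
        // The same flip undoes itself.
        if (val < 0) val ^= 0x7fffffffffffffffL;
        return BitConverter.Int64BitsToDouble(val);
    }
}
```

Comparing the resulting longs orders the values the same way as the doubles themselves, for example -1.5 before 0.0 before 2.25, with NaN last, and FromSortableLong restores the original value without loss.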

For ease of use, the indexing side of the trie algorithm is implemented in NumericTokenStream, which can index int, long, float, and double values. For querying, NumericRangeQuery and NumericRangeFilter implement the query side for the same data types.

This class can also be used to generate lexicographically sortable (according to BytesRef#getUTF8SortedAsUTF16Comparator()) representations of numeric data types for other uses (e.g. sorting).

This API is internal to Lucene. Available since 2.9; the API changed in a non-backwards-compatible way in 4.0.

Source project: paulirwin/lucene.net

Public Methods

Method Description
DoubleToSortableLong ( double val ) : long

Converts a double value to a sortable signed long. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and flipping some bits so that the results can be compared as longs. Precision is not reduced, and the value can easily be used as a long. The sort order (including NaN) follows Java's Double#compareTo: NaN is greater than positive infinity.

FilterPrefixCodedInts ( TermsEnum termsEnum ) : TermsEnum

Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.

FilterPrefixCodedLongs ( TermsEnum termsEnum ) : TermsEnum

Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.

FloatToSortableInt ( float val ) : int

Converts a float value to a sortable signed int. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and flipping some bits so that the results can be compared as ints. Precision is not reduced, and the value can easily be used as an int. The sort order (including NaN) follows Java's Float#compareTo: NaN is greater than positive infinity.

GetPrefixCodedIntShift ( BytesRef val ) : int

Returns the shift value from a prefix-encoded int.

GetPrefixCodedLongShift ( BytesRef val ) : int

Returns the shift value from a prefix-encoded long.

IntToPrefixCoded ( int val, int shift, BytesRef bytes ) : void

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

IntToPrefixCodedBytes ( int val, int shift, BytesRef bytes ) : void

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

LongToPrefixCoded ( long val, int shift, BytesRef bytes ) : void

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

LongToPrefixCodedBytes ( long val, int shift, BytesRef bytes ) : void

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

PrefixCodedToInt ( BytesRef val ) : int

Returns an int from prefix-coded bytes. The rightmost bits will be zero for lower-precision codes. This method can be used to decode a term's value.

PrefixCodedToLong ( BytesRef val ) : long

Returns a long from prefix-coded bytes. The rightmost bits will be zero for lower-precision codes. This method can be used to decode a term's value.

SortableIntToFloat ( int val ) : float

Converts a sortable int back to a float.

SortableLongToDouble ( long val ) : double

Converts a sortable long back to a double.

SplitIntRange ( IntRangeBuilder builder, int precisionStep, int minBound, int maxBound ) : void

Splits an int range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its IntRangeBuilder.AddRange(BytesRef, BytesRef) method.

This method is used by NumericRangeQuery.

SplitLongRange ( LongRangeBuilder builder, int precisionStep, long minBound, long maxBound ) : void

Splits a long range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its LongRangeBuilder.AddRange(BytesRef, BytesRef) method.

This method is used by NumericRangeQuery.

Private Methods

Method Description
AddRange ( object builder, int valSize, long minBound, long maxBound, int shift ) : void

Helper that delegates to the correct range builder.

NumericUtils ( ) : Lucene.Net.Documents

SplitRange ( object builder, int valSize, int precisionStep, long minBound, long maxBound ) : void

This helper does the splitting for both 32-bit and 64-bit values.

Method Details

DoubleToSortableLong() public static method

Converts a double value to a sortable signed long. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and flipping some bits so that the results can be compared as longs. Precision is not reduced, and the value can easily be used as a long. The sort order (including NaN) follows Java's Double#compareTo: NaN is greater than positive infinity.
public static DoubleToSortableLong ( double val ) : long
val double
return long

FilterPrefixCodedInts() public static method

Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.
public static FilterPrefixCodedInts ( TermsEnum termsEnum ) : TermsEnum
termsEnum TermsEnum the terms enum to filter
return TermsEnum

FilterPrefixCodedLongs() public static method

Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.
public static FilterPrefixCodedLongs ( TermsEnum termsEnum ) : TermsEnum
termsEnum TermsEnum the terms enum to filter
return TermsEnum

FloatToSortableInt() public static method

Converts a float value to a sortable signed int. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and flipping some bits so that the results can be compared as ints. Precision is not reduced, and the value can easily be used as an int. The sort order (including NaN) follows Java's Float#compareTo: NaN is greater than positive infinity.
public static FloatToSortableInt ( float val ) : int
val float
return int

GetPrefixCodedIntShift() public static method

Returns the shift value from a prefix-encoded int.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static GetPrefixCodedIntShift ( BytesRef val ) : int
val BytesRef
return int

GetPrefixCodedLongShift() public static method

Returns the shift value from a prefix-encoded long.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static GetPrefixCodedLongShift ( BytesRef val ) : int
val BytesRef
return int

IntToPrefixCoded() public static method

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static IntToPrefixCoded ( int val, int shift, BytesRef bytes ) : void
val int the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
return void

IntToPrefixCodedBytes() public static method

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static IntToPrefixCodedBytes ( int val, int shift, BytesRef bytes ) : void
val int the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
return void

LongToPrefixCoded() public static method

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static LongToPrefixCoded ( long val, int shift, BytesRef bytes ) : void
val long the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
return void

LongToPrefixCodedBytes() public static method

Stores the prefix-coded bits in bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static LongToPrefixCodedBytes ( long val, int shift, BytesRef bytes ) : void
val long the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
return void

PrefixCodedToInt() public static method

Returns an int from prefix-coded bytes. The rightmost bits will be zero for lower-precision codes. This method can be used to decode a term's value.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static PrefixCodedToInt ( BytesRef val ) : int
val BytesRef
return int

PrefixCodedToLong() public static method

Returns a long from prefix-coded bytes. The rightmost bits will be zero for lower-precision codes. This method can be used to decode a term's value.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static PrefixCodedToLong ( BytesRef val ) : long
val BytesRef
return long

SortableIntToFloat() public static method

Converts a sortable int back to a float.
public static SortableIntToFloat ( int val ) : float
val int
return float

SortableLongToDouble() public static method

Converts a sortable long back to a double.
public static SortableLongToDouble ( long val ) : double
val long
return double

SplitIntRange() public static method

Splits an int range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its IntRangeBuilder.AddRange(BytesRef, BytesRef) method.

This method is used by NumericRangeQuery.

public static SplitIntRange ( IntRangeBuilder builder, int precisionStep, int minBound, int maxBound ) : void
builder IntRangeBuilder
precisionStep int
minBound int
maxBound int
return void

SplitLongRange() public static method

Splits a long range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its LongRangeBuilder.AddRange(BytesRef, BytesRef) method.

This method is used by NumericRangeQuery.

public static SplitLongRange ( LongRangeBuilder builder, int precisionStep, long minBound, long maxBound ) : void
builder LongRangeBuilder
precisionStep int
minBound long
maxBound long
return void