C# Class Lucene.Net.Util.NumericUtils

This is a helper class that generates prefix-encoded representations of numerical values and supplies converters to represent float/double values as sortable integers/longs.

To execute range queries quickly in Apache Lucene, a range is divided recursively into multiple intervals for searching: the center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This dramatically reduces the number of terms that must be visited.
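
The recursive division described above can be sketched in standalone C#. This is only an illustration of the idea, not the library's code: the class `RangeSplitSketch`, its `Split` method, and the tuple it returns are hypothetical names introduced here, and the loop is modeled on the splitting logic as it is commonly described for Lucene's trie ranges.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only. RangeSplitSketch is a hypothetical name and is
// not part of Lucene.Net.
public static class RangeSplitSketch
{
    // Splits [minBound, maxBound] into sub-ranges. Each sub-range carries the
    // shift (number of low bits ignored) at which it can be matched.
    public static List<(long Min, long Max, int Shift)> Split(
        int valSize, int precisionStep, long minBound, long maxBound)
    {
        if (precisionStep < 1)
            throw new ArgumentException("precisionStep must be >= 1");

        var result = new List<(long Min, long Max, int Shift)>();
        if (minBound > maxBound) return result;

        unchecked // overflow must wrap so that the "wrapped" checks below work
        {
            for (int shift = 0; ; shift += precisionStep)
            {
                long diff = 1L << (shift + precisionStep);
                long mask = ((1L << precisionStep) - 1L) << shift;

                // Does the bound have a ragged edge at this precision level?
                bool hasLower = (minBound & mask) != 0L;
                bool hasUpper = (maxBound & mask) != mask;

                long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
                long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
                bool lowerWrapped = nextMin < minBound;
                bool upperWrapped = nextMax > maxBound;

                if (shift + precisionStep >= valSize || nextMin > nextMax
                    || lowerWrapped || upperWrapped)
                {
                    // Lowest usable precision reached: emit what is left and stop.
                    Add(result, minBound, maxBound, shift);
                    break;
                }

                // Emit the exact boundary pieces, then continue with the center.
                if (hasLower) Add(result, minBound, minBound | mask, shift);
                if (hasUpper) Add(result, maxBound & ~mask, maxBound, shift);

                minBound = nextMin;
                maxBound = nextMax;
            }
        }
        return result;
    }

    private static void Add(
        List<(long Min, long Max, int Shift)> result, long min, long max, int shift)
    {
        // Set the low bits that the shift discards, so Max is an inclusive bound.
        max |= (1L << shift) - 1L;
        result.Add((min, max, shift));
    }
}
```

With a precision step of 4, `RangeSplitSketch.Split(32, 4, 0x13, 0x5A)` produces the boundary ranges 0x13..0x1F and 0x50..0x5A at shift 0, and the center range 0x20..0x4F at shift 4. The center covers 48 values but only three distinct prefixes at that precision.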

This class generates the terms needed to achieve this. First, the numerical integer values are converted to bytes: integer values (32 bit or 64 bit) are made unsigned and their bits are converted to ASCII chars, 7 bits each. The resulting byte[] sorts like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) with the shift value (the number of bits removed) used during encoding.
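
The encoding steps above can be sketched for 32-bit values as follows. This is a standalone illustration, not the library's implementation: `PrefixCodingSketch` and its methods are hypothetical names, the sketch returns a plain `byte[]` instead of filling a BytesRef, and the prefix base 0x60 is an assumption modeled on Lucene's `SHIFT_START_INT` constant.

```csharp
using System;

// Illustrative sketch only. PrefixCodingSketch is a hypothetical name and is
// not part of Lucene.Net.
public static class PrefixCodingSketch
{
    // Assumed base value for the shift prefix byte of 32-bit terms.
    private const byte ShiftStartInt = 0x60;

    public static byte[] EncodeInt(int val, int shift)
    {
        if (shift < 0 || shift > 31)
            throw new ArgumentOutOfRangeException(nameof(shift), "shift must be 0..31");

        int nChars = (31 - shift) / 7 + 1;       // 7 payload bits per byte
        var bytes = new byte[nChars + 1];        // one extra byte for the shift
        bytes[0] = (byte)(ShiftStartInt + shift);

        // Flip the sign bit so that unsigned order equals signed order,
        // then drop the 'shift' lowest bits.
        uint sortableBits = unchecked((uint)val) ^ 0x80000000u;
        sortableBits >>= shift;

        // Fill from the end so the most significant bits come first.
        for (int i = nChars; i > 0; i--)
        {
            bytes[i] = (byte)(sortableBits & 0x7f);
            sortableBits >>= 7;
        }
        return bytes;
    }

    public static int DecodeInt(byte[] bytes)
    {
        int shift = bytes[0] - ShiftStartInt;
        if (shift < 0 || shift > 31)
            throw new FormatException("invalid shift prefix");

        uint sortableBits = 0;
        for (int i = 1; i < bytes.Length; i++)
        {
            if (bytes[i] > 0x7f)
                throw new FormatException("invalid payload byte");
            sortableBits = (sortableBits << 7) | (uint)bytes[i];
        }
        // The bits removed by the shift come back as zeros.
        return unchecked((int)((sortableBits << shift) ^ 0x80000000u));
    }

    // Unsigned lexicographic comparison, the order in which terms sort.
    public static int CompareBytes(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
        }
        return a.Length.CompareTo(b.Length);
    }
}
```

In this sketch, encoding 5 with shift 0 gives the bytes 0x60 0x08 0x00 0x00 0x00 0x05, and encoding 0x1234 with shift 8 decodes back to 0x1200 because the eight lowest bits were discarded. Because every payload byte stays below 0x80, the byte order is the same whether the term is compared as raw bytes or as UTF-8 text.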

To also index floating point numbers, this class supplies two methods that convert them to integer values by changing their bit layout: DoubleToSortableLong and FloatToSortableInt. Converting floating point numbers to integers and back loses no precision (although the integer form is not meaningful as a number). Other data types such as dates can easily be converted to longs or ints (e.g. date to long: java.util.Date#getTime in Java, or DateTime.Ticks in .NET).
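
The bit-layout change can be illustrated with a standalone sketch. `SortableBitsSketch` and its method names are hypothetical and chosen for this example; the approach shown (flip the non-sign bits of negative values) is the usual way to make IEEE 754 bit patterns compare like signed integers. Mapping NaN to a single positive bit pattern is an assumption added here so that NaN sorts above positive infinity, as the method descriptions on this page require.

```csharp
using System;

// Illustrative sketch only. SortableBitsSketch is a hypothetical name and is
// not part of Lucene.Net.
public static class SortableBitsSketch
{
    public static long DoubleToSortable(double val)
    {
        // Assumption: collapse every NaN to one positive pattern so that it
        // sorts above positive infinity.
        long bits = double.IsNaN(val)
            ? 0x7ff8000000000000L
            : BitConverter.DoubleToInt64Bits(val);

        // For negative values a larger magnitude must sort lower, so flip
        // all bits except the sign bit.
        if (bits < 0) bits ^= 0x7fffffffffffffffL;
        return bits;
    }

    public static double SortableToDouble(long val)
    {
        // The same XOR undoes the transformation.
        if (val < 0) val ^= 0x7fffffffffffffffL;
        return BitConverter.Int64BitsToDouble(val);
    }

    public static int FloatToSortable(float val)
    {
        int bits = float.IsNaN(val)
            ? 0x7fc00000
            : BitConverter.ToInt32(BitConverter.GetBytes(val), 0);
        if (bits < 0) bits ^= 0x7fffffff;
        return bits;
    }

    public static float SortableToFloat(int val)
    {
        if (val < 0) val ^= 0x7fffffff;
        return BitConverter.ToSingle(BitConverter.GetBytes(val), 0);
    }
}
```

Since the transformation is a reversible XOR on the raw bits, converting a value to its sortable form and back returns the original value exactly, which is why no precision is lost.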

For easy usage, the trie algorithm is implemented for indexing inside NumericTokenStream that can index int, long, float, and double. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.

This class can also be used to generate lexicographically sortable (according to BytesRef#getUTF8SortedAsUTF16Comparator()) representations of numeric data types for other uses (e.g. sorting). It is marked as internal (@lucene.internal). Available since 2.9; the API changed in a non-backwards-compatible way in 4.0.

Project: paulirwin/lucene.net

Public Methods

Method	Description
DoubleToSortableLong ( double val ) : long	Converts a `double` value to a sortable signed `long`. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and swapping some bits so that the result can be compared as a long. The precision is not reduced, and the value can easily be used as a long. The sort order (including Double#NaN) is defined by Double#compareTo; `NaN` is greater than positive infinity.
FilterPrefixCodedInts ( TermsEnum termsEnum ) : TermsEnum	Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of `0`.
FilterPrefixCodedLongs ( TermsEnum termsEnum ) : TermsEnum	Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of `0`.
FloatToSortableInt ( float val ) : int	Converts a `float` value to a sortable signed `int`. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and swapping some bits so that the result can be compared as an int. The precision is not reduced, and the value can easily be used as an int. The sort order (including Float#NaN) is defined by Float#compareTo; `NaN` is greater than positive infinity.
GetPrefixCodedIntShift ( BytesRef val ) : int	Returns the shift value from a prefix encoded `int`.
GetPrefixCodedLongShift ( BytesRef val ) : int	Returns the shift value from a prefix encoded `long`.
IntToPrefixCoded ( int val, int shift, BytesRef bytes ) : void	Writes the prefix coded bits into `bytes` after reducing the precision by `shift` bits. This method is used by NumericTokenStream. After encoding, `bytes.offset` will always be 0.
IntToPrefixCodedBytes ( int val, int shift, BytesRef bytes ) : void	Writes the prefix coded bits into `bytes` after reducing the precision by `shift` bits. This method is used by NumericTokenStream. After encoding, `bytes.offset` will always be 0.
LongToPrefixCoded ( long val, int shift, BytesRef bytes ) : void	Writes the prefix coded bits into `bytes` after reducing the precision by `shift` bits. This method is used by NumericTokenStream. After encoding, `bytes.offset` will always be 0.
LongToPrefixCodedBytes ( long val, int shift, BytesRef bytes ) : void	Writes the prefix coded bits into `bytes` after reducing the precision by `shift` bits. This method is used by NumericTokenStream. After encoding, `bytes.offset` will always be 0.
PrefixCodedToInt ( BytesRef val ) : int	Returns an int from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
PrefixCodedToLong ( BytesRef val ) : long	Returns a long from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
SortableIntToFloat ( int val ) : float	Converts a sortable `int` back to a `float`.
SortableLongToDouble ( long val ) : double	Converts a sortable `long` back to a `double`.
SplitIntRange ( IntRangeBuilder builder, int precisionStep, int minBound, int maxBound ) : void	Splits an int range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its IntRangeBuilder#addRange(BytesRef,BytesRef) method. This method is used by NumericRangeQuery.
SplitLongRange ( LongRangeBuilder builder, int precisionStep, long minBound, long maxBound ) : void	Splits a long range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its LongRangeBuilder#addRange(BytesRef,BytesRef) method. This method is used by NumericRangeQuery.

Private Methods

Method	Description
AddRange ( object builder, int valSize, long minBound, long maxBound, int shift ) : void	Helper that delegates to the correct range builder.
NumericUtils ( )	Private constructor; the class exposes only static members.
SplitRange ( object builder, int valSize, int precisionStep, long minBound, long maxBound ) : void	This helper does the splitting for both 32-bit and 64-bit ranges.

Method Details

DoubleToSortableLong() public static method

Converts a double value to a sortable signed long. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and swapping some bits so that the result can be compared as a long. The precision is not reduced, and the value can easily be used as a long. The sort order (including Double#NaN) is defined by Double#compareTo; NaN is greater than positive infinity.

public static DoubleToSortableLong ( double val ) : long
val	double
return	long

FilterPrefixCodedInts() public static method

Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.

public static FilterPrefixCodedInts ( TermsEnum termsEnum ) : TermsEnum
termsEnum	TermsEnum	the terms enum to filter
return	TermsEnum

FilterPrefixCodedLongs() public static method

Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.

public static FilterPrefixCodedLongs ( TermsEnum termsEnum ) : TermsEnum
termsEnum	TermsEnum	the terms enum to filter
return	TermsEnum

FloatToSortableInt() public static method

Converts a float value to a sortable signed int. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and swapping some bits so that the result can be compared as an int. The precision is not reduced, and the value can easily be used as an int. The sort order (including Float#NaN) is defined by Float#compareTo; NaN is greater than positive infinity.

public static FloatToSortableInt ( float val ) : int
val	float
return	int

GetPrefixCodedIntShift() public static method

Returns the shift value from a prefix encoded int.

Throws an exception if the supplied BytesRef is not correctly prefix encoded.

public static GetPrefixCodedIntShift ( BytesRef val ) : int
val	BytesRef
return	int

GetPrefixCodedLongShift() public static method

Returns the shift value from a prefix encoded long.

Throws an exception if the supplied BytesRef is not correctly prefix encoded.

public static GetPrefixCodedLongShift ( BytesRef val ) : int
val	BytesRef
return	int

IntToPrefixCoded() public static method

Writes the prefix coded bits into bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

public static IntToPrefixCoded ( int val, int shift, BytesRef bytes ) : void
val	int	the numeric value
shift	int	how many bits to strip from the right
bytes	BytesRef	will contain the encoded value
return	void

IntToPrefixCodedBytes() public static method

Writes the prefix coded bits into bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

public static IntToPrefixCodedBytes ( int val, int shift, BytesRef bytes ) : void
val	int	the numeric value
shift	int	how many bits to strip from the right
bytes	BytesRef	will contain the encoded value
return	void

LongToPrefixCoded() public static method

Writes the prefix coded bits into bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

public static LongToPrefixCoded ( long val, int shift, BytesRef bytes ) : void
val	long	the numeric value
shift	int	how many bits to strip from the right
bytes	BytesRef	will contain the encoded value
return	void

LongToPrefixCodedBytes() public static method

Writes the prefix coded bits into bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

public static LongToPrefixCodedBytes ( long val, int shift, BytesRef bytes ) : void
val	long	the numeric value
shift	int	how many bits to strip from the right
bytes	BytesRef	will contain the encoded value
return	void

PrefixCodedToInt() public static method

Returns an int from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

Throws an exception if the supplied BytesRef is not correctly prefix encoded.

public static PrefixCodedToInt ( BytesRef val ) : int
val	BytesRef
return	int

PrefixCodedToLong() public static method

Returns a long from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

Throws an exception if the supplied BytesRef is not correctly prefix encoded.

public static PrefixCodedToLong ( BytesRef val ) : long
val	BytesRef
return	long

SortableIntToFloat() public static method

Converts a sortable int back to a float.

public static SortableIntToFloat ( int val ) : float
val	int
return	float

SortableLongToDouble() public static method

Converts a sortable long back to a double.

public static SortableLongToDouble ( long val ) : double
val	long
return	double

SplitIntRange() public static method

Splits an int range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its IntRangeBuilder#addRange(BytesRef,BytesRef) method.

This method is used by NumericRangeQuery.

public static SplitIntRange ( IntRangeBuilder builder, int precisionStep, int minBound, int maxBound ) : void
builder	IntRangeBuilder
precisionStep	int
minBound	int
maxBound	int
return	void

SplitLongRange() public static method

Splits a long range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its LongRangeBuilder#addRange(BytesRef,BytesRef) method.

This method is used by NumericRangeQuery.

public static SplitLongRange ( LongRangeBuilder builder, int precisionStep, long minBound, long maxBound ) : void
builder	LongRangeBuilder
precisionStep	int
minBound	long
maxBound	long
return	void