C# Class Lucene.Net.Util.NumericUtils

This is a helper class that generates prefix-encoded representations of numerical values and supplies converters to represent float/double values as sortable integers/longs.

To execute range queries quickly in Apache Lucene, a range is divided recursively into multiple intervals for searching: the center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
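
The recursive splitting described above can be sketched as follows. This is a standalone Java illustration, not the Lucene.Net source: the class name RangeSplitSketch, the split method, and the {min, max, shift} triples are hypothetical stand-ins for the range builder callbacks, and the bit arithmetic is an assumption about how such a split can be done.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of trie range splitting; not the Lucene.Net implementation. */
final class RangeSplitSketch {

    /** Returns {min, max, shift} triples that together cover [minBound, maxBound]. */
    static List<long[]> split(int valSize, int precisionStep, long minBound, long maxBound) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<long[]> out = new ArrayList<>();
        if (minBound > maxBound) {
            return out; // empty range, nothing to search
        }
        for (int shift = 0; ; shift += precisionStep) {
            // Bits that belong to the current precision level.
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (minBound & mask) != 0L;
            boolean hasUpper = (maxBound & mask) != mask;
            // Bounds of the center part, aligned to the next (coarser) level.
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            boolean wrapped = nextMin < minBound || nextMax > maxBound;
            if (shift + precisionStep >= valSize || nextMin > nextMax || wrapped) {
                // No coarser level is usable: emit what is left and stop.
                add(out, minBound, maxBound, shift);
                break;
            }
            // The edges are matched at the current (finer) precision.
            if (hasLower) add(out, minBound, minBound | mask, shift);
            if (hasUpper) add(out, maxBound & ~mask, maxBound, shift);
            minBound = nextMin;
            maxBound = nextMax;
        }
        return out;
    }

    private static void add(List<long[]> out, long min, long max, int shift) {
        // Fill the bits below `shift` so the upper bound covers its whole bucket.
        out.add(new long[] { min, max | ((1L << shift) - 1L), shift });
    }

    public static void main(String[] args) {
        for (long[] r : split(32, 4, 3, 100)) {
            System.out.println("[" + r[0] + ", " + r[1] + "] shift=" + r[2]);
        }
    }
}
```

With these inputs the sketch yields three sub-ranges: [3, 15] and [96, 100] at shift 0 for the edges, and [16, 95] at shift 4 for the center.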

This class generates the terms needed to achieve this. First, the numerical integer values are converted to bytes: integer values (32 bit or 64 bit) are made unsigned and their bits are converted to ASCII chars holding 7 bits each. The resulting byte[] is sortable like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) by the shift value (the number of bits removed) used during encoding.
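
A minimal Java sketch of this encoding for 32-bit values is shown below, to make the byte layout concrete. It is a standalone illustration under stated assumptions, not the library code: the class PrefixCodeSketch, its method names, and the marker constant SHIFT_START_INT = 0x60 are assumptions, and a plain byte[] stands in for BytesRef.

```java
/** Hypothetical sketch of 7-bit prefix coding for ints; not the Lucene.Net implementation. */
final class PrefixCodeSketch {

    // Assumed marker: the first byte is this constant plus the shift.
    static final int SHIFT_START_INT = 0x60;

    static byte[] intToPrefixCoded(int val, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be 0..31");
        }
        int nChars = (31 - shift) / 7 + 1;      // 7-bit groups needed for the remaining bits
        byte[] out = new byte[nChars + 1];      // one extra byte for the shift prefix
        out[0] = (byte) (SHIFT_START_INT + shift);
        // Flipping the sign bit makes signed order equal to unsigned order.
        int sortableBits = (val ^ 0x80000000) >>> shift;
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f); // 7 bits per byte keeps every byte ASCII
            sortableBits >>>= 7;
        }
        return out;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] neg = intToPrefixCoded(-5, 0);
        byte[] pos = intToPrefixCoded(7, 0);
        System.out.println("encoded length: " + neg.length);
        System.out.println("-5 sorts before 7: " + (compareBytes(neg, pos) < 0));
    }
}
```

Because every data byte stays below 0x80, comparing the encoded arrays byte by byte gives the same order as comparing the original signed values at the same shift.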

To also index floating point numbers, this class supplies two methods that convert them to integer values by changing their bit layout: DoubleToSortableLong and FloatToSortableInt. Converting floating point numbers to integers and back loses no precision (the integer form is just not usable as a numeric value). Other data types such as dates can easily be converted to longs or ints (e.g. date to long: java.util.Date#getTime).
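
The bit-layout conversion can be illustrated with a short standalone Java sketch. It assumes the common technique of flipping all non-sign bits of negative values so that a signed integer comparison matches the floating point order; the class name SortableFloatSketch is hypothetical and the code is not taken from Lucene.Net.

```java
/** Hypothetical sketch of sortable float/double conversion; not the Lucene.Net implementation. */
final class SortableFloatSketch {

    static long doubleToSortableLong(double val) {
        long bits = Double.doubleToLongBits(val);
        // Negative doubles sort in reverse as raw bits, so flip everything but the sign bit.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    static double sortableLongToDouble(long val) {
        // The flip is its own inverse.
        if (val < 0) {
            val ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(val);
    }

    static int floatToSortableInt(float val) {
        int bits = Float.floatToIntBits(val);
        if (bits < 0) {
            bits ^= 0x7fffffff;
        }
        return bits;
    }

    static float sortableIntToFloat(int val) {
        if (val < 0) {
            val ^= 0x7fffffff;
        }
        return Float.intBitsToFloat(val);
    }

    public static void main(String[] args) {
        double[] samples = { Double.NaN, 1.5, -0.0, Double.NEGATIVE_INFINITY, 0.0, -2.5 };
        long[] keys = new long[samples.length];
        for (int i = 0; i < samples.length; i++) {
            keys[i] = doubleToSortableLong(samples[i]);
        }
        java.util.Arrays.sort(keys); // plain signed sort of the converted values
        for (long k : keys) {
            System.out.println(sortableLongToDouble(k));
        }
    }
}
```

Sorting the converted longs and converting them back lists the samples in ascending numeric order with NaN last, and the round trip restores each value exactly.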

For ease of use, the indexing side of the trie algorithm is implemented in NumericTokenStream, which can index int, long, float, and double. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.

This class can also be used to generate lexicographically sortable (according to BytesRef#getUTF8SortedAsUTF16Comparator()) representations of numeric data types for other uses (e.g. sorting). It is marked as a Lucene internal API (@lucene.internal), available since 2.9; the API changed in a non-backwards-compatible way in 4.0.


Public Methods

Method Description
DoubleToSortableLong ( double val ) : long

Converts a double value to a sortable signed long. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and then swapping some bits so that the result can be compared as a long. This does not reduce the precision, and the value can easily be used as a long. The sort order (including Double#NaN) is defined by Double#compareTo; NaN is greater than positive infinity.

FilterPrefixCodedInts ( TermsEnum termsEnum ) : TermsEnum

Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.

FilterPrefixCodedLongs ( TermsEnum termsEnum ) : TermsEnum

Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.

FloatToSortableInt ( float val ) : int

Converts a float value to a sortable signed int. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and then swapping some bits so that the result can be compared as an int. This does not reduce the precision, and the value can easily be used as an int. The sort order (including Float#NaN) is defined by Float#compareTo; NaN is greater than positive infinity.

GetPrefixCodedIntShift ( BytesRef val ) : int

Returns the shift value from a prefix encoded int.

GetPrefixCodedLongShift ( BytesRef val ) : int

Returns the shift value from a prefix encoded long.

IntToPrefixCoded ( int val, int shift, BytesRef bytes ) : void

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

IntToPrefixCodedBytes ( int val, int shift, BytesRef bytes ) : void

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

LongToPrefixCoded ( long val, int shift, BytesRef bytes ) : void

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

LongToPrefixCodedBytes ( long val, int shift, BytesRef bytes ) : void

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.

PrefixCodedToInt ( BytesRef val ) : int

Returns an int from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

PrefixCodedToLong ( BytesRef val ) : long

Returns a long from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.

SortableIntToFloat ( int val ) : float

Converts a sortable int back to a float.

SortableLongToDouble ( long val ) : double

Converts a sortable long back to a double.

SplitIntRange ( IntRangeBuilder builder, int precisionStep, int minBound, int maxBound ) : void

Splits an int range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its IntRangeBuilder#addRange(BytesRef,BytesRef) method.

This method is used by NumericRangeQuery.

SplitLongRange ( LongRangeBuilder builder, int precisionStep, long minBound, long maxBound ) : void

Splits a long range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its LongRangeBuilder#addRange(BytesRef,BytesRef) method.

This method is used by NumericRangeQuery.

Private Methods

Method Description
AddRange ( object builder, int valSize, long minBound, long maxBound, int shift ) : void

Helper that delegates to the correct range builder.

NumericUtils ( ) : Lucene.Net.Documents

SplitRange ( object builder, int valSize, int precisionStep, long minBound, long maxBound ) : void

This helper does the splitting for both 32 and 64 bit.

Method Details

DoubleToSortableLong() public static method

Converts a double value to a sortable signed long. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and then swapping some bits so that the result can be compared as a long. This does not reduce the precision, and the value can easily be used as a long. The sort order (including Double#NaN) is defined by Double#compareTo; NaN is greater than positive infinity.
public static DoubleToSortableLong ( double val ) : long
val double
Returns long

FilterPrefixCodedInts() public static method

Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.
public static FilterPrefixCodedInts ( TermsEnum termsEnum ) : TermsEnum
termsEnum TermsEnum the terms enum to filter
Returns TermsEnum

FilterPrefixCodedLongs() public static method

Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.
public static FilterPrefixCodedLongs ( TermsEnum termsEnum ) : TermsEnum
termsEnum TermsEnum the terms enum to filter
Returns TermsEnum

FloatToSortableInt() public static method

Converts a float value to a sortable signed int. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and then swapping some bits so that the result can be compared as an int. This does not reduce the precision, and the value can easily be used as an int. The sort order (including Float#NaN) is defined by Float#compareTo; NaN is greater than positive infinity.
public static FloatToSortableInt ( float val ) : int
val float
Returns int

GetPrefixCodedIntShift() public static method

Returns the shift value from a prefix encoded int.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static GetPrefixCodedIntShift ( BytesRef val ) : int
val BytesRef
Returns int

GetPrefixCodedLongShift() public static method

Returns the shift value from a prefix encoded long.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static GetPrefixCodedLongShift ( BytesRef val ) : int
val BytesRef
Returns int

IntToPrefixCoded() public static method

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static IntToPrefixCoded ( int val, int shift, BytesRef bytes ) : void
val int the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
Returns void

IntToPrefixCodedBytes() public static method

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static IntToPrefixCodedBytes ( int val, int shift, BytesRef bytes ) : void
val int the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
Returns void

LongToPrefixCoded() public static method

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static LongToPrefixCoded ( long val, int shift, BytesRef bytes ) : void
val long the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
Returns void

LongToPrefixCodedBytes() public static method

Encodes val as prefix coded bits after reducing the precision by shift bits; the result is written to bytes. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
public static LongToPrefixCodedBytes ( long val, int shift, BytesRef bytes ) : void
val long the numeric value
shift int how many bits to strip from the right
bytes BytesRef will contain the encoded value
Returns void

PrefixCodedToInt() public static method

Returns an int from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static PrefixCodedToInt ( BytesRef val ) : int
val BytesRef
Returns int

PrefixCodedToLong() public static method

Returns a long from prefix coded bytes. The rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
Throws an exception if the supplied BytesRef is not correctly prefix encoded.
public static PrefixCodedToLong ( BytesRef val ) : long
val BytesRef
Returns long
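
To show why the rightmost bits come back as zero for lower precision codes, here is a standalone Java sketch of a 64-bit encode/decode round trip. It is an illustration under assumptions, not the library code: the class PrefixDecodeSketch, its method names, the marker constant SHIFT_START_LONG = 0x20, and the use of NumberFormatException are all hypothetical, and a plain byte[] stands in for BytesRef.

```java
/** Hypothetical sketch of decoding prefix-coded longs; not the Lucene.Net implementation. */
final class PrefixDecodeSketch {

    // Assumed marker: the first byte is this constant plus the shift.
    static final int SHIFT_START_LONG = 0x20;

    static byte[] encode(long val, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be 0..63");
        }
        int nChars = (63 - shift) / 7 + 1;
        byte[] out = new byte[nChars + 1];
        out[0] = (byte) (SHIFT_START_LONG + shift);
        long sortableBits = (val ^ Long.MIN_VALUE) >>> shift; // drop `shift` low bits
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return out;
    }

    /** Reads the shift back from the prefix byte. */
    static int shiftOf(byte[] coded) {
        int shift = coded[0] - SHIFT_START_LONG;
        if (shift < 0 || shift > 63) {
            throw new NumberFormatException("Invalid shift value: " + shift);
        }
        return shift;
    }

    static long decode(byte[] coded) {
        long sortableBits = 0L;
        for (int i = 1; i < coded.length; i++) {
            if (coded[i] < 0) {
                throw new NumberFormatException("Invalid prefix coded byte at index " + i);
            }
            sortableBits = (sortableBits << 7) | coded[i];
        }
        // Shifting back leaves zeros where the dropped low bits used to be.
        return (sortableBits << shiftOf(coded)) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long value = 1234567890123L;
        System.out.println(decode(encode(value, 0)) == value);
        System.out.println(decode(encode(value, 12)) == (value & ~0xFFFL));
    }
}
```

At shift 0 the round trip is exact; at shift 12 the decoded value equals the original with its 12 lowest bits cleared.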

SortableIntToFloat() public static method

Converts a sortable int back to a float.
public static SortableIntToFloat ( int val ) : float
val int
Returns float

SortableLongToDouble() public static method

Converts a sortable long back to a double.
public static SortableLongToDouble ( long val ) : double
val long
Returns double

SplitIntRange() public static method

Splits an int range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its IntRangeBuilder#addRange(BytesRef,BytesRef) method.

This method is used by NumericRangeQuery.

public static SplitIntRange ( IntRangeBuilder builder, int precisionStep, int minBound, int maxBound ) : void
builder IntRangeBuilder
precisionStep int
minBound int
maxBound int
Returns void

SplitLongRange() public static method

Splits a long range recursively. You may implement a builder that adds clauses to a Lucene.Net.Search.BooleanQuery for each call to its LongRangeBuilder#addRange(BytesRef,BytesRef) method.

This method is used by NumericRangeQuery.

public static SplitLongRange ( LongRangeBuilder builder, int precisionStep, long minBound, long maxBound ) : void
builder LongRangeBuilder
precisionStep int
minBound long
maxBound long
Returns void