C# Class Lucene.Net.Util.UnicodeUtil

Class to encode Java's UTF16 char[] into UTF8 byte[] without always allocating a new byte[], as String.getBytes("UTF-8") does.

WARNING: This API is new and experimental and may suddenly change.

Public Properties

Property	Type	Description
BIG_TERM	BytesRef

Public Methods

Method	Description
CodePointCount ( BytesRef utf8 ) : int	Returns the number of code points in this UTF8 sequence. This method assumes valid UTF8 input. It does not perform full UTF8 validation: it checks only the first byte of each codepoint (for multi-byte sequences, any bytes after the head are skipped).
NewString ( int[] codePoints, int offset, int count ) : string	Cover JDK 1.5 API. Create a String from an array of codePoints.
ToCharArray ( int[] codePoints, int offset, int count ) : char[]	Generates a char array that represents the provided input code points.
ToHexString ( string s ) : string
UTF16toUTF8 ( CharsRef source, int offset, int length, BytesRef result ) : void	Encode characters from a char[] source, starting at offset for length chars. After encoding, result.offset will always be 0.
UTF16toUTF8 ( char s, int offset, int length, BytesRef result ) : void	Encode characters from this String, starting at offset for length characters. After encoding, result.offset will always be 0.
UTF8toUTF16 ( BytesRef bytesRef, CharsRef chars ) : void	Utility method for UTF8toUTF16(byte[], int, int, CharsRef).
UTF8toUTF16 ( byte[] utf8, int offset, int length, CharsRef chars ) : void	Interprets the given byte array as UTF-8 and converts to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 codepoint. NOTE: Full characters are read, even if this reads past the length passed (and can result in an ArrayOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
UTF8toUTF32 ( BytesRef utf8, IntsRef utf32 ) : void	This method assumes valid UTF8 input. It does not perform full UTF8 validation: it checks only the first byte of each codepoint (for multi-byte sequences, any bytes after the head are skipped).
ValidUTF16String ( char s ) : bool
ValidUTF16String ( char s, int size ) : bool

Private Methods

Method	Description
UnicodeUtil ( ) : System

Method Details

CodePointCount() public static method

Returns the number of code points in this UTF8 sequence.

This method assumes valid UTF8 input. It does not perform full UTF8 validation: it checks only the first byte of each codepoint (for multi-byte sequences, any bytes after the head are skipped).

Throws an exception if an invalid codepoint header byte occurs or the content is prematurely truncated.

public static CodePointCount ( BytesRef utf8 ) : int
utf8	BytesRef
Returns	int
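
The lead-byte-only counting described above can be illustrated with a minimal, self-contained sketch. It does not call Lucene.Net: the class name `CodePointCountSketch`, the plain `byte[]` parameters (instead of `BytesRef`) and the choice of `ArgumentException` are assumptions made for illustration, not the library's actual code.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class CodePointCountSketch
{
    // Counts code points by looking only at the lead byte of each sequence.
    // Continuation bytes are skipped without being validated.
    public static int Count(byte[] bytes, int offset, int length)
    {
        int pos = offset;
        int limit = offset + length;
        int count = 0;
        while (pos < limit)
        {
            int b = bytes[pos];
            if (b < 0x80) pos += 1;                 // 0xxxxxxx: single byte
            else if ((b & 0xE0) == 0xC0) pos += 2;  // 110xxxxx: 2-byte sequence
            else if ((b & 0xF0) == 0xE0) pos += 3;  // 1110xxxx: 3-byte sequence
            else if ((b & 0xF8) == 0xF0) pos += 4;  // 11110xxx: 4-byte sequence
            else throw new ArgumentException("Invalid UTF-8 lead byte at index " + pos);
            count++;
        }
        // The last sequence claimed more bytes than the input provides.
        if (pos > limit) throw new ArgumentException("Truncated UTF-8 sequence");
        return count;
    }
}
```

Because only lead bytes are inspected, malformed continuation bytes go unnoticed; that is the trade-off the description refers to.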

NewString() public static method

Cover JDK 1.5 API. Create a String from an array of codePoints.

Throws an exception if an invalid code point is encountered, or if the offset or count is out of bounds.

public static NewString ( int[] codePoints, int offset, int count ) : string
codePoints	int[]	The code array
offset	int	The start of the text in the code point array
count	int	The number of code points
Returns	string

ToCharArray() public static method

Generates a char array that represents the provided input code points.

public static ToCharArray ( int[] codePoints, int offset, int count ) : char[]
codePoints	int[]	The code array
offset	int	The start of the text in the code point array
count	int	The number of code points
Returns	char[]
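
As a rough illustration of what converting code points to UTF-16 chars involves (for both NewString and ToCharArray), here is a standalone sketch. The class `CodePointSketch`, its method name and the exception types are hypothetical; the real methods may differ in how they validate input.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Lucene.Net.
public static class CodePointSketch
{
    public static char[] ToChars(int[] codePoints, int offset, int count)
    {
        if (offset < 0 || count < 0 || offset > codePoints.Length - count)
            throw new ArgumentOutOfRangeException(nameof(offset));

        var sb = new StringBuilder(count);
        for (int i = offset; i < offset + count; i++)
        {
            int cp = codePoints[i];
            if (cp < 0 || cp > 0x10FFFF)
                throw new ArgumentException("Invalid code point at index " + i);

            if (cp < 0x10000)
            {
                sb.Append((char)cp);                        // BMP: one UTF-16 unit
            }
            else
            {
                cp -= 0x10000;                              // supplementary: surrogate pair
                sb.Append((char)(0xD800 + (cp >> 10)));     // high surrogate
                sb.Append((char)(0xDC00 + (cp & 0x3FF)));   // low surrogate
            }
        }
        return sb.ToString().ToCharArray();
    }
}
```

A string can then be built with `new string(CodePointSketch.ToChars(codePoints, offset, count))`.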

ToHexString() public static method

public static ToHexString ( string s ) : string
s	string
Returns	string

UTF16toUTF8() public static method

Encode characters from a char[] source, starting at offset for length chars. After encoding, result.offset will always be 0.

public static UTF16toUTF8 ( CharsRef source, int offset, int length, BytesRef result ) : void
source	CharsRef
offset	int
length	int
result	BytesRef
Returns	void

UTF16toUTF8() public static method

Encode characters from this String, starting at offset for length characters. After encoding, result.offset will always be 0.

public static UTF16toUTF8 ( char s, int offset, int length, BytesRef result ) : void
s	char
offset	int
length	int
result	BytesRef
Returns	void
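
The point of encoding into a caller-supplied result is buffer reuse. The sketch below shows that idea without depending on Lucene.Net: `Utf8Buffer` is a hypothetical stand-in for `BytesRef`, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not something the page states.

```csharp
using System;

// Hypothetical stand-in for BytesRef: a reusable byte buffer plus offset/length.
public sealed class Utf8Buffer
{
    public byte[] Bytes = new byte[16];
    public int Offset;
    public int Length;
}

// Hypothetical helper, not part of Lucene.Net.
public static class Utf16ToUtf8Sketch
{
    public static void Encode(char[] source, int offset, int length, Utf8Buffer result)
    {
        // Worst case is 3 bytes per UTF-16 unit (a surrogate pair is 4 bytes for 2 units).
        int maxLen = length * 3;
        if (result.Bytes.Length < maxLen) result.Bytes = new byte[maxLen]; // grow only when needed
        byte[] o = result.Bytes;
        int upto = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++)
        {
            int code = source[i];
            if (code < 0x80)
            {
                o[upto++] = (byte)code;
            }
            else if (code < 0x800)
            {
                o[upto++] = (byte)(0xC0 | (code >> 6));
                o[upto++] = (byte)(0x80 | (code & 0x3F));
            }
            else if (code < 0xD800 || code > 0xDFFF)
            {
                o[upto++] = (byte)(0xE0 | (code >> 12));
                o[upto++] = (byte)(0x80 | ((code >> 6) & 0x3F));
                o[upto++] = (byte)(0x80 | (code & 0x3F));
            }
            else if (code < 0xDC00 && i + 1 < end && source[i + 1] >= 0xDC00 && source[i + 1] <= 0xDFFF)
            {
                // Valid surrogate pair: combine into one supplementary code point.
                int cp = 0x10000 + ((code - 0xD800) << 10) + (source[i + 1] - 0xDC00);
                i++;
                o[upto++] = (byte)(0xF0 | (cp >> 18));
                o[upto++] = (byte)(0x80 | ((cp >> 12) & 0x3F));
                o[upto++] = (byte)(0x80 | ((cp >> 6) & 0x3F));
                o[upto++] = (byte)(0x80 | (cp & 0x3F));
            }
            else
            {
                // Unpaired surrogate: emit U+FFFD (assumption of this sketch).
                o[upto++] = 0xEF;
                o[upto++] = 0xBF;
                o[upto++] = 0xBD;
            }
        }
        result.Offset = 0;   // mirrors "result.offset will always be 0"
        result.Length = upto;
    }
}
```

Calling `Encode` repeatedly with the same `Utf8Buffer` allocates only when a longer input forces the buffer to grow.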

UTF8toUTF16() public static method

Utility method for UTF8toUTF16(byte[], int, int, CharsRef).

public static UTF8toUTF16 ( BytesRef bytesRef, CharsRef chars ) : void
bytesRef	BytesRef
chars	CharsRef
Returns	void

UTF8toUTF16() public static method

Interprets the given byte array as UTF-8 and converts to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 codepoint.

NOTE: Full characters are read, even if this reads past the length passed (and can result in an ArrayOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.

public static UTF8toUTF16 ( byte[] utf8, int offset, int length, CharsRef chars ) : void
utf8	byte[]
offset	int
length	int
chars	CharsRef
Returns	void
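
To make the worst-case sizing and the lack of validation concrete, here is a standalone decoding sketch. It is not the Lucene.Net code: the class name and the `ref char[]` parameter (standing in for `CharsRef`) are assumptions. Like the method described above, it trusts the lead byte and reads whole characters, so malformed or truncated input can index past the end of the array.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class Utf8ToUtf16Sketch
{
    // Decodes UTF-8 into 'chars', growing it if needed, and returns the
    // number of UTF-16 units written. No validation is performed.
    public static int Decode(byte[] utf8, int offset, int length, ref char[] chars)
    {
        // Worst case: every byte becomes one UTF-16 unit (pure ASCII input).
        if (chars.Length < length) chars = new char[length];
        int outPos = 0;
        int pos = offset;
        int limit = offset + length;
        while (pos < limit)
        {
            int b = utf8[pos++];
            if (b < 0x80)
            {
                chars[outPos++] = (char)b;
            }
            else if (b < 0xE0)
            {
                chars[outPos++] = (char)(((b & 0x1F) << 6) | (utf8[pos++] & 0x3F));
            }
            else if (b < 0xF0)
            {
                chars[outPos++] = (char)(((b & 0x0F) << 12)
                    | ((utf8[pos] & 0x3F) << 6)
                    | (utf8[pos + 1] & 0x3F));
                pos += 2;
            }
            else
            {
                int cp = ((b & 0x07) << 18)
                    | ((utf8[pos] & 0x3F) << 12)
                    | ((utf8[pos + 1] & 0x3F) << 6)
                    | (utf8[pos + 2] & 0x3F);
                pos += 3;
                cp -= 0x10000;
                chars[outPos++] = (char)(0xD800 + (cp >> 10));    // high surrogate
                chars[outPos++] = (char)(0xDC00 + (cp & 0x3FF));  // low surrogate
            }
        }
        return outPos;
    }
}
```

A 4-byte sequence yields two chars, so the output never needs more slots than the input has bytes.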

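UTF8toUTF32() public static method

This method assumes valid UTF8 input. It does not perform full UTF8 validation: it checks only the first byte of each codepoint (for multi-byte sequences, any bytes after the head are skipped).

Throws an exception if an invalid codepoint header byte occurs or the content is prematurely truncated.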

public static UTF8toUTF32 ( BytesRef utf8, IntsRef utf32 ) : void
utf8	BytesRef
utf32	IntsRef
Returns	void
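
A compact sketch of decoding UTF-8 straight to code points, with the two failure cases mentioned above, might look like the following. The class name, the `ref int[]` parameter (standing in for `IntsRef`) and the use of `ArgumentException` are assumptions for illustration only.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class Utf8ToUtf32Sketch
{
    // Decodes UTF-8 into code points and returns how many were written.
    // Only the lead byte is checked; continuation bytes are trusted.
    public static int Decode(byte[] utf8, int offset, int length, ref int[] utf32)
    {
        // At most one code point per input byte.
        if (utf32.Length < length) utf32 = new int[length];
        int pos = offset;
        int limit = offset + length;
        int count = 0;
        while (pos < limit)
        {
            int b = utf8[pos++];
            int cp;
            int extra; // number of continuation bytes announced by the lead byte
            if (b < 0x80) { cp = b; extra = 0; }
            else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
            else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
            else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
            else throw new ArgumentException("Invalid UTF-8 lead byte");

            if (pos + extra > limit) throw new ArgumentException("Truncated UTF-8 sequence");

            for (int i = 0; i < extra; i++)
            {
                cp = (cp << 6) | (utf8[pos++] & 0x3F); // append 6 payload bits
            }
            utf32[count++] = cp;
        }
        return count;
    }
}
```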

ValidUTF16String() public static method

public static ValidUTF16String ( char s ) : bool
s	char
Returns	bool

ValidUTF16String() public static method

public static ValidUTF16String ( char s, int size ) : bool
s	char
size	int
Returns	bool
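
The page gives no description for ValidUTF16String. Judging only by the name, it presumably checks that surrogates are correctly paired; the sketch below shows such a check under that assumption. The class and method names are hypothetical, and the actual method may apply different rules.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class Utf16ValiditySketch
{
    // Returns true if the first 'size' chars contain no unpaired surrogates.
    public static bool IsValid(char[] s, int size)
    {
        for (int i = 0; i < size; i++)
        {
            char ch = s[i];
            if (char.IsHighSurrogate(ch))
            {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 < size && char.IsLowSurrogate(s[i + 1])) i++;
                else return false;
            }
            else if (char.IsLowSurrogate(ch))
            {
                // A low surrogate without a preceding high surrogate.
                return false;
            }
        }
        return true;
    }
}
```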

Property Details

BIG_TERM public static property

A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms (e.g. collation keys) one would normally encounter, and definitely bigger than any UTF-8 terms.

WARNING: This is not a valid UTF8 term.

public static BytesRef BIG_TERM
Returns	BytesRef
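
Why a run of 0xff bytes is bigger than any UTF-8 term can be seen with an unsigned byte-wise comparison: a well-formed UTF-8 sequence never contains the byte 0xFF, so the comparison is decided at the first byte. The sketch below assumes terms are ordered by unsigned bytes; the class name and the length of 10 bytes are illustrative assumptions, since the page only says "a number of 0xff bytes".

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class BigTermSketch
{
    // Builds a term of 'n' bytes, all 0xFF.
    public static byte[] MakeBigTerm(int n)
    {
        byte[] term = new byte[n];
        for (int i = 0; i < n; i++) term[i] = 0xFF;
        return term;
    }

    // Lexicographic comparison treating bytes as unsigned values.
    // C# 'byte' is already unsigned, so plain subtraction works.
    public static int CompareUnsigned(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            int diff = a[i] - b[i];
            if (diff != 0) return diff;
        }
        return a.Length - b.Length;
    }
}
```

For example, `CompareUnsigned(MakeBigTerm(10), Encoding.UTF8.GetBytes(anyNonEmptyString))` is positive because the largest lead byte in well-formed UTF-8 is 0xF4.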