C# Class Lucene.Net.Util.UnicodeUtil

Class to encode Java's UTF-16 char[] into UTF-8 byte[] without always allocating a new byte[], as String.getBytes("UTF-8") does.

WARNING: This API is new and experimental and may change suddenly.


Public properties

Property Type Description
BIG_TERM BytesRef

Public methods

Method Description
CodePointCount ( BytesRef utf8 ) : int

Returns the number of code points in this UTF8 sequence.

This method assumes valid UTF-8 input. It does not perform full UTF-8 validation; it checks only the first byte of each code point (for multi-byte sequences, any bytes after the head are skipped).

NewString ( int[] codePoints, int offset, int count ) : string

Covers the JDK 1.5 API. Creates a string from an array of code points.

ToCharArray ( int[] codePoints, int offset, int count ) : char[]

Generates a char array that represents the provided input code points.

ToHexString ( string s ) : string
UTF16toUTF8 ( CharsRef source, int offset, int length, BytesRef result ) : void

Encode characters from a char[] source, starting at offset for length chars. After encoding, result.offset will always be 0.

UTF16toUTF8 ( char[] s, int offset, int length, BytesRef result ) : void

Encode characters from this String, starting at offset for length characters. After encoding, result.offset will always be 0.

UTF8toUTF16 ( BytesRef bytesRef, CharsRef chars ) : void

Convenience overload of UTF8toUTF16(byte[], int, int, CharsRef) that converts the bytes referenced by bytesRef.

UTF8toUTF16 ( byte[] utf8, int offset, int length, CharsRef chars ) : void

Interprets the given byte array as UTF-8 and converts it to UTF-16. The CharsRef is grown if it does not provide enough space to hold the worst case of each byte becoming a UTF-16 code unit.

NOTE: Full characters are read, even if this reads past the length passed (which can result in an IndexOutOfRangeException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.

UTF8toUTF32 ( BytesRef utf8, IntsRef utf32 ) : void

This method assumes valid UTF-8 input. It does not perform full UTF-8 validation; it checks only the first byte of each code point (for multi-byte sequences, any bytes after the head are skipped).

ValidUTF16String ( char[] s ) : bool
ValidUTF16String ( char[] s, int size ) : bool

Private methods

Method Description
UnicodeUtil ( ) (private constructor)

Method details

CodePointCount() public static method

Returns the number of code points in this UTF-8 sequence.

This method assumes valid UTF-8 input. It does not perform full UTF-8 validation; it checks only the first byte of each code point (for multi-byte sequences, any bytes after the head are skipped).

Throws an exception if an invalid code point header byte occurs or the content is prematurely truncated.
public static CodePointCount ( BytesRef utf8 ) : int
utf8 BytesRef
Returns int
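
The head-byte counting described above can be sketched in plain C# without Lucene.Net. Utf8CodePointCounter and CountCodePoints are hypothetical names, and the sketch works on a raw byte[] range instead of a BytesRef; it illustrates the technique and is not the library's source. Like the documented method, it trusts the trailing bytes and only inspects each lead byte.

```csharp
using System;

// Hypothetical helper (not Lucene.Net's source): counts UTF-8 code points by
// looking only at the lead byte of each sequence and skipping its trailing bytes
// without checking them.
public static class Utf8CodePointCounter
{
    public static int CountCodePoints(byte[] utf8, int offset, int length)
    {
        int pos = offset;
        int limit = offset + length;
        int count = 0;
        while (pos < limit)
        {
            int lead = utf8[pos];
            if (lead < 0x80) pos += 1;        // 0xxxxxxx: single byte (ASCII)
            else if (lead < 0xC0)             // 10xxxxxx: a continuation byte cannot start a sequence
                throw new ArgumentException("Invalid UTF-8 lead byte at index " + pos);
            else if (lead < 0xE0) pos += 2;   // 110xxxxx: two-byte sequence
            else if (lead < 0xF0) pos += 3;   // 1110xxxx: three-byte sequence
            else if (lead < 0xF8) pos += 4;   // 11110xxx: four-byte sequence
            else
                throw new ArgumentException("Invalid UTF-8 lead byte at index " + pos);
            count++;
        }
        // Skipping past the limit means the last sequence was cut short.
        if (pos > limit)
            throw new ArgumentException("Truncated UTF-8 sequence");
        return count;
    }
}
```

For the string "a\u00E9\u20AC\U0001F600" (10 UTF-8 bytes), the sketch counts 4 code points, since the sequences are 1, 2, 3 and 4 bytes long.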

NewString() public static method

Covers the JDK 1.5 API. Creates a string from an array of code points.
Throws an exception if an invalid code point is encountered, or if offset or count is out of bounds.
public static NewString ( int[] codePoints, int offset, int count ) : string
codePoints int[] The code point array
offset int The start of the text in the code point array
count int The number of code points
Returns string

ToCharArray() public static method

Generates a char array that represents the provided input code points.
public static ToCharArray ( int[] codePoints, int offset, int count ) : char[]
codePoints int[] The code point array
offset int The start of the text in the code point array
count int The number of code points
Returns char[]
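
NewString and ToCharArray both turn code points into UTF-16, which means splitting every code point above U+FFFF into a surrogate pair. A minimal sketch in plain C#, assuming values outside 0..0x10FFFF are the "invalid code points"; CodePointConverter and its methods are hypothetical names, not Lucene.Net's implementation.

```csharp
using System;

// Hypothetical helper (not Lucene.Net's source): converts Unicode code points to
// UTF-16, emitting a surrogate pair for every code point above U+FFFF.
public static class CodePointConverter
{
    public static char[] ToCharArray(int[] codePoints, int offset, int count)
    {
        if (offset < 0 || count < 0 || offset > codePoints.Length - count)
            throw new ArgumentOutOfRangeException(nameof(offset), "offset or count is out of bounds");

        // First pass: validate and compute the exact output size.
        int size = 0;
        for (int i = offset; i < offset + count; i++)
        {
            int cp = codePoints[i];
            if (cp < 0 || cp > 0x10FFFF)
                throw new ArgumentException("Invalid code point: " + cp);
            size += cp >= 0x10000 ? 2 : 1;
        }

        // Second pass: write BMP code points directly, split the rest into surrogates.
        var chars = new char[size];
        int w = 0;
        for (int i = offset; i < offset + count; i++)
        {
            int cp = codePoints[i];
            if (cp < 0x10000)
            {
                chars[w++] = (char)cp;
            }
            else
            {
                int v = cp - 0x10000;
                chars[w++] = (char)(0xD800 + (v >> 10));   // high (lead) surrogate
                chars[w++] = (char)(0xDC00 + (v & 0x3FF)); // low (trail) surrogate
            }
        }
        return chars;
    }

    public static string NewString(int[] codePoints, int offset, int count)
        => new string(ToCharArray(codePoints, offset, count));
}
```

The two-pass design allocates the output exactly once; for example, U+1F600 becomes the pair \uD83D \uDE00.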

ToHexString() public static method

public static ToHexString ( string s ) : string
s string
Returns string

UTF16toUTF8() public static method

Encode characters from a char[] source, starting at offset for length chars. After encoding, result.offset will always be 0.
public static UTF16toUTF8 ( CharsRef source, int offset, int length, BytesRef result ) : void
source CharsRef
offset int
length int
result BytesRef
Returns void

UTF16toUTF8() public static method

Encode characters from this String, starting at offset for length characters. After encoding, result.offset will always be 0.
public static UTF16toUTF8 ( char[] s, int offset, int length, BytesRef result ) : void
s char[]
offset int
length int
result BytesRef
Returns void
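
The point of the UTF16toUTF8 overloads is to encode into a caller-owned buffer that is reused across calls, unlike String.getBytes("UTF-8"). The sketch below shows that pattern in plain C#. ByteSlice is a hypothetical stand-in for BytesRef, Utf16ToUtf8Encoder is a hypothetical name, and replacing unpaired surrogates with U+FFFD is this sketch's choice; it is not presented as Lucene.Net's code.

```csharp
using System;

// Hypothetical stand-in for Lucene.Net's BytesRef: a growable byte array plus offset/length.
public sealed class ByteSlice
{
    public byte[] Bytes = new byte[0];
    public int Offset;
    public int Length;
}

public static class Utf16ToUtf8Encoder
{
    public static void Encode(char[] source, int offset, int length, ByteSlice result)
    {
        int end = offset + length;
        // Worst case is 3 bytes per UTF-16 char (a surrogate pair: 2 chars -> 4 bytes).
        int maxLen = length * 3;
        if (result.Bytes.Length < maxLen)
            result.Bytes = new byte[maxLen];   // grow only when needed; otherwise reuse
        byte[] outBuf = result.Bytes;
        int upto = 0;
        for (int i = offset; i < end; i++)
        {
            int code = source[i];
            if (code < 0x80)
            {
                outBuf[upto++] = (byte)code;
            }
            else if (code < 0x800)
            {
                outBuf[upto++] = (byte)(0xC0 | (code >> 6));
                outBuf[upto++] = (byte)(0x80 | (code & 0x3F));
            }
            else if (code >= 0xD800 && code <= 0xDBFF && i + 1 < end
                     && source[i + 1] >= 0xDC00 && source[i + 1] <= 0xDFFF)
            {
                // Valid surrogate pair: combine into one supplementary code point.
                int cp = ((code - 0xD800) << 10) + (source[++i] - 0xDC00) + 0x10000;
                outBuf[upto++] = (byte)(0xF0 | (cp >> 18));
                outBuf[upto++] = (byte)(0x80 | ((cp >> 12) & 0x3F));
                outBuf[upto++] = (byte)(0x80 | ((cp >> 6) & 0x3F));
                outBuf[upto++] = (byte)(0x80 | (cp & 0x3F));
            }
            else if (code >= 0xD800 && code <= 0xDFFF)
            {
                // Unpaired surrogate: substitute U+FFFD (EF BF BD).
                outBuf[upto++] = 0xEF; outBuf[upto++] = 0xBF; outBuf[upto++] = 0xBD;
            }
            else
            {
                outBuf[upto++] = (byte)(0xE0 | (code >> 12));
                outBuf[upto++] = (byte)(0x80 | ((code >> 6) & 0x3F));
                outBuf[upto++] = (byte)(0x80 | (code & 0x3F));
            }
        }
        result.Offset = 0;   // mirrors the documented "result.offset will always be 0"
        result.Length = upto;
    }
}
```

For well-formed input the bytes match System.Text.Encoding.UTF8.GetBytes, and a second, shorter call reuses the same array instead of allocating.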

UTF8toUTF16() public static method

Convenience overload of UTF8toUTF16(byte[], int, int, CharsRef) that converts the bytes referenced by bytesRef.
public static UTF8toUTF16 ( BytesRef bytesRef, CharsRef chars ) : void
bytesRef BytesRef
chars CharsRef
Returns void

UTF8toUTF16() public static method

Interprets the given byte array as UTF-8 and converts it to UTF-16. The CharsRef is grown if it does not provide enough space to hold the worst case of each byte becoming a UTF-16 code unit.

NOTE: Full characters are read, even if this reads past the length passed (which can result in an IndexOutOfRangeException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.

public static UTF8toUTF16 ( byte[] utf8, int offset, int length, CharsRef chars ) : void
utf8 byte[]
offset int
length int
chars CharsRef
Returns void
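
The decoding rules above (grow the output to one code unit per input byte, read whole characters without validation) can be sketched as follows. CharSlice is a hypothetical stand-in for CharsRef and Utf8ToUtf16Decoder is a hypothetical name; this illustrates the described behavior and is not the library's source.

```csharp
using System;

// Hypothetical stand-in for Lucene.Net's CharsRef: a growable char buffer.
public sealed class CharSlice
{
    public char[] Chars = new char[0];
    public int Offset;
    public int Length;
}

// Hypothetical decoder (not Lucene.Net's source). Like the documented method it
// assumes valid UTF-8: continuation bytes are not checked, and a truncated final
// sequence may read past offset + length.
public static class Utf8ToUtf16Decoder
{
    public static void Decode(byte[] utf8, int offset, int length, CharSlice chars)
    {
        // Worst case: every byte becomes one UTF-16 code unit.
        if (chars.Chars.Length < length)
            chars.Chars = new char[length];
        char[] outBuf = chars.Chars;
        int w = 0;
        int limit = offset + length;
        while (offset < limit)
        {
            int b = utf8[offset++];
            if (b < 0xC0)
            {
                outBuf[w++] = (char)b;   // single byte (0x80..0xBF would be invalid here)
            }
            else if (b < 0xE0)
            {
                outBuf[w++] = (char)(((b & 0x1F) << 6) | (utf8[offset++] & 0x3F));
            }
            else if (b < 0xF0)
            {
                outBuf[w++] = (char)(((b & 0x0F) << 12) | ((utf8[offset] & 0x3F) << 6) | (utf8[offset + 1] & 0x3F));
                offset += 2;
            }
            else
            {
                int cp = ((b & 0x07) << 18) | ((utf8[offset] & 0x3F) << 12)
                       | ((utf8[offset + 1] & 0x3F) << 6) | (utf8[offset + 2] & 0x3F);
                offset += 3;
                if (cp < 0x10000)
                {
                    outBuf[w++] = (char)cp;
                }
                else
                {
                    int v = cp - 0x10000;   // split into a surrogate pair
                    outBuf[w++] = (char)(0xD800 + (v >> 10));
                    outBuf[w++] = (char)(0xDC00 + (v & 0x3FF));
                }
            }
        }
        chars.Offset = 0;
        chars.Length = w;
    }
}
```

The one-code-unit-per-byte bound is safe for valid input because a 4-byte sequence yields only two UTF-16 code units.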

UTF8toUTF32() public static method

This method assumes valid UTF-8 input. It does not perform full UTF-8 validation; it checks only the first byte of each code point (for multi-byte sequences, any bytes after the head are skipped).

Throws an exception if an invalid code point header byte occurs or the content is prematurely truncated.
public static UTF8toUTF32 ( BytesRef utf8, IntsRef utf32 ) : void
utf8 BytesRef
utf32 IntsRef
Returns void
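
UTF-8 to UTF-32 decoding follows the same head-byte rule: the lead byte gives both the sequence length and the high payload bits, and each trailing byte adds 6 more bits. A sketch under the same assumptions, with IntSlice as a hypothetical stand-in for IntsRef and Utf8ToUtf32Decoder as a hypothetical name (not the library's source):

```csharp
using System;

// Hypothetical stand-in for Lucene.Net's IntsRef: a growable int buffer.
public sealed class IntSlice
{
    public int[] Ints = new int[0];
    public int Offset;
    public int Length;
}

// Hypothetical decoder (not Lucene.Net's source): only the lead byte is checked;
// trailing bytes contribute their low 6 bits and are not validated.
public static class Utf8ToUtf32Decoder
{
    public static void Decode(byte[] utf8, int offset, int length, IntSlice utf32)
    {
        // Worst case: one code point per byte.
        if (utf32.Ints.Length < length)
            utf32.Ints = new int[length];
        int[] ints = utf32.Ints;
        int count = 0;
        int pos = offset;
        int end = offset + length;
        while (pos < end)
        {
            int lead = utf8[pos++];
            int v, trailing;
            if (lead < 0x80) { ints[count++] = lead; continue; }
            else if (lead >= 0xC0 && lead < 0xE0) { v = lead & 0x1F; trailing = 1; }
            else if (lead >= 0xE0 && lead < 0xF0) { v = lead & 0x0F; trailing = 2; }
            else if (lead >= 0xF0 && lead < 0xF8) { v = lead & 0x07; trailing = 3; }
            else throw new ArgumentException("Invalid UTF-8 lead byte at index " + (pos - 1));

            if (pos + trailing > end)
                throw new ArgumentException("Truncated UTF-8 sequence");
            for (int k = 0; k < trailing; k++)
                v = (v << 6) | (utf8[pos++] & 0x3F);
            ints[count++] = v;
        }
        utf32.Offset = 0;
        utf32.Length = count;
    }
}
```

Decoding the UTF-8 bytes of "a\u00E9\u20AC\U0001F600" yields the four code points 0x61, 0xE9, 0x20AC and 0x1F600.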

ValidUTF16String() public static method

public static ValidUTF16String ( char[] s ) : bool
s char[]
Returns bool

ValidUTF16String() public static method

public static ValidUTF16String ( char[] s, int size ) : bool
s char[]
size int
Returns bool

Property details

BIG_TERM public static property

A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms (e.g. collation keys) one would normally encounter, and definitely bigger than any UTF-8 term.

WARNING: this is not a valid UTF-8 term.

public static BytesRef BIG_TERM
Type BytesRef
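
Why a run of 0xff bytes is "definitely bigger than any UTF-8 term" can be checked directly, assuming terms are compared lexicographically as unsigned bytes: no well-formed UTF-8 sequence starts with a byte above 0xF4, so BIG_TERM wins at the first byte. BigTermDemo, its members, and the 10-byte length are hypothetical choices for this sketch, not Lucene.Net's definition.

```csharp
using System;

public static class BigTermDemo
{
    // Hypothetical stand-in for BIG_TERM: a run of 0xFF bytes. The length (10) is
    // an arbitrary choice for this sketch, not taken from Lucene.Net.
    public static readonly byte[] BigTerm =
        { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };

    // Lexicographic comparison of unsigned bytes; a proper prefix sorts first.
    // C# byte is unsigned, so a[i] - b[i] compares 0..255 values directly.
    public static int CompareUnsigned(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            int diff = a[i] - b[i];
            if (diff != 0) return diff;
        }
        return a.Length - b.Length;
    }

    // 0xFF never occurs in well-formed UTF-8 (lead bytes stop at 0xF4,
    // continuation bytes at 0xBF), which is why BIG_TERM is not valid UTF-8.
    public static bool ContainsFF(byte[] bytes) => Array.IndexOf(bytes, (byte)0xFF) >= 0;
}
```

Even the UTF-8 encoding of U+10FFFF (F4 8F BF BF), the largest code point, compares smaller than the 0xFF run.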