C# Class UniHax.Fuzzer

The Fuzzer has cases for some of the oddball manifestations of Unicode that can trip up software, including:
- non-character, reserved, and private use area code points
- special-meaning characters such as the BOM and RLO
- ill-formed byte sequences
- a half-surrogate code point

Open project: cweb/unicode-hax

Public Properties

Property	Type	Description
u0390	string
u1D160	string
u1F82	string
uBOM	string
uBoldEight	string
uDAAD	string
uDEAD	string
uFB2C	string
uFDFA	string
uFullwidthSolidus	string
uIdnaSs	string
uMVS	string
uNotACharacter	string
uPrivate	string
uRLO	string
uReservedCodePoint	string
uUnassigned	string
uWordJoiner	string

Public Methods

Method	Description
GetBom ( ) : string
GetCharacterBytes ( string encoding, string character ) : byte[]	Gets the byte representation of the given character in the requested encoding.
GetCharacterBytesMalformed ( string encoding, string character ) : byte[]	Produces an ill-formed byte sequence by removing the last byte of the character's representation in the encoding you specify.
OutOfRangeCodePointAsUtf32BE ( ) : byte[]	Returns a UTF-32 byte encoding for the illegal code point value U+1FFFFF. Note that Unicode 6.0 supports only up to U+10FFFF. The source gives %F4%8F%BF%BE as the UTF-8 percent encoding for something out of range, although that sequence decodes to U+10FFFE, a noncharacter that is still within range.

Method Details

GetBom() public method

public GetBom ( ) : string
return	string

GetCharacterBytes() public method

Gets the byte representation of the given character in the requested encoding.

public GetCharacterBytes ( string encoding, string character ) : byte[]
encoding	string	The encoding you want a byte representation in. Specify utf-8, utf-16le, or utf-16be.
character	string	A single character sent as a string.
return	byte[]
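
As an illustration of what "a byte representation in the requested encoding" means, here is a minimal sketch built on the standard `System.Text` encoders. It is not the library's implementation: `EncodeCharacter` and the labels it accepts are hypothetical, and it assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax): maps an encoding label to a
// standard .NET encoder and returns the bytes for the given text, without a BOM.
static byte[] EncodeCharacter(string encoding, string character)
{
    Encoding enc = encoding.ToLowerInvariant() switch
    {
        "utf-8"    => new UTF8Encoding(encoderShouldEmitUTF8Identifier: false),
        "utf-16le" => new UnicodeEncoding(bigEndian: false, byteOrderMark: false),
        "utf-16be" => new UnicodeEncoding(bigEndian: true, byteOrderMark: false),
        _ => throw new ArgumentException("Unsupported encoding label", nameof(encoding)),
    };
    return enc.GetBytes(character);
}

string wordJoiner = "\u2060"; // WORD JOINER, one of the code points listed on this page

Console.WriteLine(BitConverter.ToString(EncodeCharacter("utf-8", wordJoiner)));    // E2-81-A0
Console.WriteLine(BitConverter.ToString(EncodeCharacter("utf-16le", wordJoiner))); // 60-20
Console.WriteLine(BitConverter.ToString(EncodeCharacter("utf-16be", wordJoiner))); // 20-60
```

The UTF-8 bytes match the percent encoding %E2%81%A0 documented for uWordJoiner.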

GetCharacterBytesMalformed() public method

Produces an ill-formed byte sequence by removing the last byte of the character's representation in the encoding you specify.

public GetCharacterBytesMalformed ( string encoding, string character ) : byte[]
encoding	string	The encoding you want a byte representation in. Specify utf-8, utf-16le, or utf-16be.
character	string	A single character sent as a string.
return	byte[]
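
The truncation idea can be sketched with standard .NET types. This is an assumption about the general technique, not the library's code; `Truncate` is a hypothetical helper. The sketch also shows why such input is a useful test case: a lenient decoder silently substitutes U+FFFD, while a strict one rejects the bytes.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax): drops the final byte of a
// well-formed sequence, leaving a truncated, ill-formed one.
static byte[] Truncate(byte[] wellFormed)
{
    if (wellFormed.Length == 0) return wellFormed;
    byte[] result = new byte[wellFormed.Length - 1];
    Array.Copy(wellFormed, result, result.Length);
    return result;
}

byte[] full = Encoding.UTF8.GetBytes("\u2060"); // E2 81 A0
byte[] malformed = Truncate(full);               // E2 81

// The default UTF-8 decoder is lenient and substitutes U+FFFD.
string lenient = Encoding.UTF8.GetString(malformed);
Console.WriteLine(lenient.Contains("\uFFFD")); // True

// A strict decoder (throwOnInvalidBytes: true) refuses the truncated input.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool rejected = false;
try { strict.GetString(malformed); }
catch (DecoderFallbackException) { rejected = true; }
Console.WriteLine(rejected); // True
```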

OutOfRangeCodePointAsUtf32BE() public method

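Returns a UTF-32 byte encoding for the illegal code point value U+1FFFFF. Note that Unicode 6.0 supports only up to U+10FFFF. The source gives %F4%8F%BF%BE as the UTF-8 percent encoding for something out of range, although that sequence decodes to U+10FFFE, a noncharacter that is still within range.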

public OutOfRangeCodePointAsUtf32BE ( ) : byte[]
return	byte[]
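
A rough sketch of what a big-endian UTF-32 encoding of U+1FFFFF looks like, and how a strict .NET decoder reacts to it. `ToUtf32BE` is a hypothetical helper written for this example; the byte values the library actually returns are not confirmed by this page.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax): writes any 32-bit value as four
// big-endian bytes, with no check that it is a valid Unicode scalar value.
static byte[] ToUtf32BE(uint value) => new[]
{
    (byte)(value >> 24), (byte)(value >> 16), (byte)(value >> 8), (byte)value,
};

byte[] outOfRange = ToUtf32BE(0x1FFFFF);
Console.WriteLine(BitConverter.ToString(outOfRange)); // 00-1F-FF-FF

// A strict UTF-32 decoder refuses values above U+10FFFF.
var strictUtf32BE = new UTF32Encoding(bigEndian: true, byteOrderMark: false, throwOnInvalidCharacters: true);
bool rejected = false;
try { strictUtf32BE.GetString(outOfRange); }
catch (ArgumentException) { rejected = true; } // DecoderFallbackException derives from ArgumentException
Console.WriteLine(rejected); // True
```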

Property Details

u0390 public static property

U+0390 expands by 3x (UTF-8) under NFD. UTF-8 percent encoding is %CE%90.

public static string u0390
return	string

u1D160 public static property

U+1D160 expands by 3x (UTF-8) under NFC. UTF-8 percent encoding is %F0%9D%85%A0.

public static string u1D160
return	string

u1F82 public static property

U+1F82 expands by 4x (UTF-16) under NFD. UTF-8 percent encoding is %E1%BE%82.

public static string u1F82
return	string

uBOM public static property

The Byte Order Mark U+FEFF is a special character that signals the byte order (endianness) of text data. UTF-8 percent encoding is %EF%BB%BF.

public static string uBOM
return	string
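
To make the byte-order role concrete, the following sketch uses only standard `System.Text` encoders (nothing from UniHax) to show how U+FEFF serializes in each encoding. A reader can tell UTF-16LE from UTF-16BE by which of the two byte orders appears at the start of the data.

```csharp
using System;
using System.Text;

string bom = "\uFEFF";

byte[] utf8Preamble = Encoding.UTF8.GetPreamble();           // the BOM as UTF-8
byte[] utf16Le = Encoding.Unicode.GetBytes(bom);             // little-endian UTF-16
byte[] utf16Be = Encoding.BigEndianUnicode.GetBytes(bom);    // big-endian UTF-16

Console.WriteLine(BitConverter.ToString(utf8Preamble)); // EF-BB-BF
Console.WriteLine(BitConverter.ToString(utf16Le));      // FF-FE
Console.WriteLine(BitConverter.ToString(utf16Be));      // FE-FF
```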

uBoldEight public static property

A code point with a numeric mapping and value: U+1D7D6 MATHEMATICAL BOLD DIGIT EIGHT. UTF-8 percent encoding is %F0%9D%9F%96.

public static string uBoldEight
return	string

uDAAD public static property

An illegal (unpaired) high half-surrogate, U+DAAD. UTF-8 percent encoding is %ed%aa%ad.

public static string uDAAD
return	string

uDEAD public static property

An illegal (unpaired) low half-surrogate, U+DEAD. UTF-8 percent encoding is %ed%ba%ad.

public static string uDEAD
return	string
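
The sketch below illustrates why a lone surrogate is a useful fuzz case. It relies only on standard .NET behavior and one hypothetical helper, `NaiveThreeByteUtf8`, which applies the three-byte UTF-8 bit pattern without checking for surrogates. That naive pattern reproduces the documented %ed%ba%ad bytes, which a strict decoder treats as ill-formed.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax): applies the 3-byte UTF-8 bit
// layout to a UTF-16 code unit without rejecting surrogates.
static byte[] NaiveThreeByteUtf8(char c) => new[]
{
    (byte)(0xE0 | (c >> 12)),
    (byte)(0x80 | ((c >> 6) & 0x3F)),
    (byte)(0x80 | (c & 0x3F)),
};

string lone = "\uDEAD"; // an unpaired low surrogate is legal in a .NET string

// The default encoder replaces the lone surrogate with U+FFFD.
byte[] replaced = Encoding.UTF8.GetBytes(lone);
Console.WriteLine(BitConverter.ToString(replaced)); // EF-BF-BD

// A strict encoder refuses to encode it.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool encoderRejected = false;
try { strict.GetBytes(lone); }
catch (EncoderFallbackException) { encoderRejected = true; }

// The naive byte pattern matches the documented percent encoding ...
byte[] naive = NaiveThreeByteUtf8(lone[0]);
Console.WriteLine(BitConverter.ToString(naive)); // ED-BA-AD

// ... and a strict decoder rejects those bytes as ill-formed UTF-8.
bool decoderRejected = false;
try { strict.GetString(naive); }
catch (DecoderFallbackException) { decoderRejected = true; }
Console.WriteLine($"{encoderRejected} {decoderRejected}"); // True True
```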

uFB2C public static property

U+FB2C expands by 3x (UTF-16) under NFC. UTF-8 percent encoding is %EF%AC%AC.

public static string uFB2C
return	string

uFDFA public static property

U+FDFA expands by 11x (UTF-8) and 18x (UTF-16) under NFKC/NFKD. UTF-8 percent encoding is %EF%B7%BA.

public static string uFDFA
return	string
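
The expansion factors can be checked with the standard `string.Normalize` method. This is a small sketch for the property's code point U+FDFA, independent of the library, and it assumes a runtime with full Unicode normalization data available.

```csharp
using System;
using System.Text;

string ligature = "\uFDFA";
string expanded = ligature.Normalize(NormalizationForm.FormKC);

int utf8Before = Encoding.UTF8.GetByteCount(ligature);
int utf8After = Encoding.UTF8.GetByteCount(expanded);
int utf16Before = ligature.Length; // string.Length counts UTF-16 code units
int utf16After = expanded.Length;

Console.WriteLine($"UTF-8:  {utf8Before} -> {utf8After} bytes");        // UTF-8:  3 -> 33 bytes
Console.WriteLine($"UTF-16: {utf16Before} -> {utf16After} code units"); // UTF-16: 1 -> 18 code units
```

Code that sizes a buffer from the length of the input before normalizing is the kind of software this test case is meant to exercise.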

uFullwidthSolidus public static property

U+FF0F FULLWIDTH SOLIDUS should normalize to / in a hostname. UTF-8 percent encoding is %EF%BC%8F.

public static string uFullwidthSolidus
return	string
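
A short sketch of the risk, using only `string.Normalize` from the standard library: a check for "/" that runs before compatibility normalization misses the fullwidth form, while the normalized string does contain an ASCII slash. The host string here is a made-up example.

```csharp
using System;
using System.Text;

string host = "example.com\uFF0Fevil.example"; // hypothetical input containing U+FF0F
string normalized = host.Normalize(NormalizationForm.FormKC);

bool slashBefore = host.Contains("/");
bool slashAfter = normalized.Contains("/");

Console.WriteLine(slashBefore); // False
Console.WriteLine(slashAfter);  // True
Console.WriteLine(normalized);  // example.com/evil.example
```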

uIdnaSs public static property

IDNA2003/2008 deviant: U+00DF maps to "ss" during IDNA2003's mapping phase, which differs from its IDNA2008 handling. See http://www.unicode.org/reports/tr46/ for details. UTF-8 percent encoding is %C3%9F.

public static string uIdnaSs
return	string
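
One way to observe the deviation is to run a name containing U+00DF through the standard `System.Globalization.IdnMapping` class. Which result you get depends on the runtime and platform, so the sketch below prints the value rather than assuming one; the domain is a made-up example.

```csharp
using System;
using System.Globalization;

var idn = new IdnMapping();
string ascii = idn.GetAscii("fa\u00DF.de");

// IDNA2003-style mapping folds U+00DF to "ss" and yields "fass.de";
// IDNA2008-style processing keeps it and yields "xn--fa-hia.de".
Console.WriteLine(ascii);
```

Two systems that disagree on this mapping can resolve the same typed name to different hosts, which is why the character is worth including in test input.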

uMVS public static property

Mongolian Vowel Separator U+180E is invisible and, as of Unicode 6.0, has the whitespace property (Unicode 6.3 later removed it). UTF-8 percent encoding is %E1%A0%8E.

public static string uMVS
return	string

uNotACharacter public static property

The code point U+FFFF is guaranteed not to be a Unicode character at all. UTF-8 percent encoding is %ef%bf%bf.

public static string uNotACharacter
return	string

uPrivate public static property

A Private Use Area code point, U+F8FF, which Apple happens to use for its logo. UTF-8 percent encoding is %EF%A3%BF.

public static string uPrivate
return	string

uRLO public static property

The Right-to-Left Override U+202E is a special-meaning character that forces the text following it to be displayed right to left. UTF-8 percent encoding is %E2%80%AE.

public static string uRLO
return	string
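
A minimal sketch of why RLO matters to software that displays or validates strings. The file name is a made-up example, and `ContainsFormatCharacter` is a hypothetical check written for this illustration, not something UniHax provides.

```csharp
using System;
using System.Globalization;

// Hypothetical check (not part of UniHax): flags any character in the
// Format (Cf) general category, which includes U+202E and U+2060.
static bool ContainsFormatCharacter(string s)
{
    foreach (char c in s)
    {
        if (char.GetUnicodeCategory(c) == UnicodeCategory.Format) return true;
    }
    return false;
}

// Stored order ends in ".exe"; a bidi-aware display may show the part after
// U+202E reversed, so the name can appear to end in ".pdf".
string spoofed = "invoice\u202Efdp.exe";

bool endsWithExe = spoofed.EndsWith(".exe", StringComparison.Ordinal);
bool hasFormatChar = ContainsFormatCharacter(spoofed);

Console.WriteLine(endsWithExe);   // True
Console.WriteLine(hasFormatChar); // True
```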

uReservedCodePoint public static property

A reserved code point, U+FEFE. UTF-8 percent encoding is %ef%bb%be.

public static string uReservedCodePoint
return	string

uUnassigned public static property

An unassigned code point, U+0FED. UTF-8 percent encoding is %e0%bf%ad.

public static string uUnassigned
return	string

uWordJoiner public static property

Word Joiner U+2060 is an invisible zero-width character. UTF-8 percent encoding is %E2%81%A0.

public static string uWordJoiner
return	string