C# Class UniHax.Fuzzer

The Fuzzer has cases for some of the oddball manifestations of Unicode that can trip up software, including:
- non-character, reserved, and private use area code points
- special meaning characters such as the BOM and RLO
- ill-formed byte sequences
- a half-surrogate code point
Open project: cweb/unicode-hax

Public Properties

Property Type Description
u0390 string
u1D160 string
u1F82 string
uBOM string
uBoldEight string
uDAAD string
uDEAD string
uFB2C string
uFDFA string
uFullwidthSolidus string
uIdnaSs string
uMVS string
uNotACharacter string
uPrivate string
uRLO string
uReservedCodePoint string
uUnassigned string
uWordJoiner string

Public Methods

Method Description
GetBom ( ) : string
GetCharacterBytes ( string encoding, string character ) : byte[]

Gets the requested byte representation of the current Unicode character codepoint

GetCharacterBytesMalformed ( string encoding, string character ) : byte[]

Malforms the bytes by removing the last byte from whichever encoding you specify.

OutOfRangeCodePointAsUtf32BE ( ) : byte[]

Returns a UTF-32 byte encoding for the illegal code point value U+1FFFFF. Note that Unicode 6.0 supports only up to U+10FFFF. The UTF-8 percent encoding given for something out of range is %F4%8F%BF%BE; these bytes decode to U+10FFFE, a noncharacter just inside the valid range.

Method Details

GetBom() public method

public GetBom ( ) : string
return string

GetCharacterBytes() public method

Gets the requested byte representation of the current Unicode character codepoint
public GetCharacterBytes ( string encoding, string character ) : byte[]
encoding string The encoding you want a byte representation in. Specify utf-8, utf-16le, or utf16-be
character string A single character sent as a string.
return byte[]
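
For illustration, a byte representation like the one described above can be sketched with the standard System.Text encodings. This is a minimal sketch, not the UniHax implementation: the class and method names are hypothetical, and accepting both utf-16be and utf16-be is an assumption made because the parameter description uses the latter spelling.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax): maps the encoding names from the
// parameter description onto the standard .NET encodings.
public static class EncodingSketch
{
    public static byte[] Encode(string encoding, string character)
    {
        switch (encoding.ToLowerInvariant())
        {
            case "utf-8":
                // No byte order mark is emitted by GetBytes.
                return new UTF8Encoding(false).GetBytes(character);
            case "utf-16le":
                return new UnicodeEncoding(false, false).GetBytes(character);
            case "utf-16be":
            case "utf16-be": // assumption: tolerate the spelling used in the docs
                return new UnicodeEncoding(true, false).GetBytes(character);
            default:
                throw new ArgumentException("Unsupported encoding: " + encoding);
        }
    }
}
```

Calling EncodingSketch.Encode("utf-8", "\u0390") should give the two bytes CE 90, which matches the percent encoding listed for u0390.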

GetCharacterBytesMalformed() public method

Malforms the bytes by removing the last byte from whichever encoding you specify.
public GetCharacterBytesMalformed ( string encoding, string character ) : byte[]
encoding string The encoding you want a byte representation in. Specify utf-8, utf-16le, or utf16-be
character string A single character sent as a string.
return byte[]
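
The truncation described above can be sketched as follows. The helper names are hypothetical and this is not the library's own code; it only shows the idea of dropping the final byte so that the sequence ends mid-character, and what a lenient decoder then does with it.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax).
public static class MalformSketch
{
    // Returns a copy of the input without its final byte.
    public static byte[] DropLastByte(byte[] wellFormed)
    {
        if (wellFormed.Length == 0) return wellFormed;
        byte[] truncated = new byte[wellFormed.Length - 1];
        Array.Copy(wellFormed, truncated, truncated.Length);
        return truncated;
    }

    // Decodes with the default replacement fallback, so ill-formed input
    // yields U+FFFD rather than an exception.
    public static string DecodeUtf8Leniently(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For example, truncating the UTF-8 bytes of U+202E (E2 80 AE) leaves E2 80, which a lenient decoder turns into replacement characters. Software that handles such input differently (dropping it silently, or consuming the next byte) is what this kind of test case is meant to expose.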

OutOfRangeCodePointAsUtf32BE() public method

Returns a UTF-32 byte encoding for the illegal code point value U+1FFFFF. Note that Unicode 6.0 supports only up to U+10FFFF. The UTF-8 percent encoding given for something out of range is %F4%8F%BF%BE; these bytes decode to U+10FFFE, a noncharacter just inside the valid range.
public OutOfRangeCodePointAsUtf32BE ( ) : byte[]
return byte[]
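
As a rough illustration of what a big-endian UTF-32 encoding of an out-of-range value looks like, the sketch below writes any 32-bit value as four bytes without validating it. The class and method names are hypothetical and this is not the library's implementation.

```csharp
using System;

// Hypothetical helper (not part of UniHax).
public static class Utf32Sketch
{
    // Writes the value as four big-endian bytes, with no validity check.
    public static byte[] ToUtf32BE(uint codePoint)
    {
        return new byte[]
        {
            (byte)((codePoint >> 24) & 0xFF),
            (byte)((codePoint >> 16) & 0xFF),
            (byte)((codePoint >> 8) & 0xFF),
            (byte)(codePoint & 0xFF)
        };
    }

    // Unicode code points end at U+10FFFF.
    public static bool IsInUnicodeRange(uint codePoint)
    {
        return codePoint <= 0x10FFFF;
    }
}
```

Utf32Sketch.ToUtf32BE(0x1FFFFF) yields the bytes 00 1F FF FF. The bytes are built by hand because the standard conversion routines refuse such values; char.ConvertFromUtf32, for instance, throws ArgumentOutOfRangeException for anything above U+10FFFF.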

Property Details

u0390 public static property

U+0390 expands by 3x (UTF-8) under NFD. UTF-8 percent encoding is %CE%90
public static string u0390
return string
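
The expansion factors quoted for this and the other normalization test cases can be checked with string.Normalize. The following is a minimal sketch with hypothetical helper names; it assumes the runtime has full normalization data available.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax).
public static class ExpansionSketch
{
    // Ratio of UTF-8 byte counts after and before normalization.
    public static double Utf8Factor(string s, NormalizationForm form)
    {
        return (double)Encoding.UTF8.GetByteCount(s.Normalize(form))
             / Encoding.UTF8.GetByteCount(s);
    }

    // Ratio of UTF-16 code unit counts after and before normalization.
    public static double Utf16Factor(string s, NormalizationForm form)
    {
        return (double)s.Normalize(form).Length / s.Length;
    }
}
```

ExpansionSketch.Utf8Factor("\u0390", NormalizationForm.FormD) should come out as 3, since the two-byte character decomposes into three two-byte code points. The same helpers apply to the other expanding properties, using the normalization form and encoding named in each description.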

u1D160 public static property

U+1D160 expands by 3x (UTF-8) under NFC. UTF-8 percent encoding is %F0%9D%85%A0
public static string u1D160
return string

u1F82 public static property

U+1F82 expands by 4x (UTF-16) under NFD. UTF-8 percent encoding is %E1%BE%82
public static string u1F82
return string

uBOM public static property

The Byte Order Mark U+FEFF is a special character that signals the byte order (endianness) of text data. UTF-8 percent encoding is %EF%BB%BF
public static string uBOM
return string
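
A consumer that wants to notice a leading BOM in UTF-8 input can compare against the preamble reported by the standard encoding. This is a small sketch with a hypothetical helper name, not code from the library.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax).
public static class BomSketch
{
    // True when the data begins with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        if (data.Length < bom.Length) return false;
        for (int i = 0; i < bom.Length; i++)
        {
            if (data[i] != bom[i]) return false;
        }
        return true;
    }
}
```

The check only covers the start of the data. A U+FEFF that appears in the middle of a string is a different test case, since some software strips it and some passes it through.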

uBoldEight public static property

Code point with a numerical mapping and value: U+1D7D6 MATHEMATICAL BOLD DIGIT EIGHT. UTF-8 percent encoding is %F0%9D%9F%96
public static string uBoldEight
return string

uDAAD public static property

An illegal high half-surrogate, U+DAAD. UTF-8 percent encoding is %ed%aa%ad
public static string uDAAD
return string

uDEAD public static property

An illegal low half-surrogate, U+DEAD. UTF-8 percent encoding is %ed%ba%ad
public static string uDEAD
return string
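
A .NET string can hold an unpaired surrogate such as U+DAAD or U+DEAD, so receiving code may want to detect one before encoding. The scan below is a minimal sketch with a hypothetical helper name.

```csharp
using System;

// Hypothetical helper (not part of UniHax).
public static class SurrogateSketch
{
    // True when the string contains a high surrogate not followed by a low
    // surrogate, or a low surrogate not preceded by a high surrogate.
    public static bool HasLoneSurrogate(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]))
            {
                if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                {
                    i++; // skip the well-formed pair
                    continue;
                }
                return true;
            }
            if (char.IsLowSurrogate(s[i])) return true;
        }
        return false;
    }
}
```

Note that the default Encoding.UTF8 does not emit the %ed%aa%ad or %ed%ba%ad byte sequences listed above. It replaces a lone surrogate with U+FFFD (EF BF BD), so those ill-formed sequences have to be supplied as raw bytes.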

uFB2C public static property

U+FB2C expands by 3x (UTF-16) under NFC. UTF-8 percent encoding is %EF%AC%AC
public static string uFB2C
return string

uFDFA public static property

U+FDFA expands by 11x (UTF-8) and 18x (UTF-16) under NFKC/NFKD. UTF-8 percent encoding is %EF%B7%BA
public static string uFDFA
return string

uFullwidthSolidus public static property

U+FF0F FULLWIDTH SOLIDUS should normalize to / in a hostname. UTF-8 percent encoding is %EF%BC%8F
public static string uFullwidthSolidus
return string

uIdnaSs public static property

IDNA2003/2008 deviant: U+00DF normalizes to "ss" during IDNA2003's mapping phase, which differs from its IDNA2008 mapping. See http://www.unicode.org/reports/tr46/ for details. UTF-8 percent encoding is %C3%9F
public static string uIdnaSs
return string

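uMVS public static property

Mongolian Vowel Separator U+180E is invisible and has the whitespace property. (Unicode 6.3 later reclassified it as a format character without the White_Space property, so behavior depends on the Unicode version in use.) UTF-8 percent encoding is %E1%A0%8E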
public static string uMVS
return string

uNotACharacter public static property

The code point U+FFFF is guaranteed not to be a Unicode character at all. UTF-8 percent encoding is %ef%bf%bf
public static string uNotACharacter
return string
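
U+FFFF is one of 66 noncharacters: the range U+FDD0 to U+FDEF plus the last two code points of every plane. A check for them can be sketched as below; the helper name is hypothetical.

```csharp
using System;

// Hypothetical helper (not part of UniHax).
public static class NoncharacterSketch
{
    public static bool IsNoncharacter(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF) return false;
        if (codePoint >= 0xFDD0 && codePoint <= 0xFDEF) return true;
        // The low 16 bits are FFFE or FFFF in every plane.
        return (codePoint & 0xFFFE) == 0xFFFE;
    }
}
```

By this test U+FFFF and U+10FFFE are noncharacters, while U+FEFE is not: it is merely unassigned.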

uPrivate public static property

A Private Use Area code point U+F8FF which Apple happens to use for its logo. UTF-8 percent encoding is %EF%A3%BF
public static string uPrivate
return string

uRLO public static property

The Right-to-Left Override U+202E has special meaning: it re-orders the display of text for right-to-left reading. UTF-8 percent encoding is %E2%80%AE
public static string uRLO
return string

uReservedCodePoint public static property

A reserved code point, U+FEFE. UTF-8 percent encoding is %ef%bb%be
public static string uReservedCodePoint
return string

uUnassigned public static property

An unassigned code point, U+0FED. UTF-8 percent encoding is %e0%bf%ad
public static string uUnassigned
return string

uWordJoiner public static property

Word Joiner U+2060 is an invisible zero-width character. UTF-8 percent encoding is %E2%81%A0
public static string uWordJoiner
return string
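
The UTF-8 percent encodings quoted throughout the property descriptions can be reproduced with a few lines of code. The helper below is a hypothetical sketch, not part of the library; it encodes every byte, including ASCII, which is more aggressive than a typical URL encoder.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of UniHax).
public static class PercentSketch
{
    // Percent-encodes every UTF-8 byte of the input in uppercase hex.
    public static string PercentEncodeUtf8(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

PercentSketch.PercentEncodeUtf8("\u2060") gives %E2%81%A0, matching the value listed for uWordJoiner. The half-surrogate properties are the exception, because the standard encoder substitutes U+FFFD for an unpaired surrogate instead of producing the ill-formed bytes.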