C# Class CsQuery.HtmlParser.HtmlData

Reference data about HTML tags and attributes; methods to test tokens for certain properties; and the tokenizer.

Public Properties

Property Type Description
Debug bool
NumberChars HashSet
Units HashSet

Public Methods

Method Description
AttributeEncode ( string text, bool alwaysQuote, out string quoteChar ) : string

HtmlEncode a string, except for double-quotes, so it can be enclosed in single-quotes.

ChildrenAllowed ( string nodeName ) : bool

Test whether this element can have children.

ChildrenAllowed ( ushort tokenId ) : bool

Test whether this element may have children.

HasValueProperty ( string nodeName ) : bool

Test if a node type has a VALUE property.

HasValueProperty ( ushort nodeNameToken ) : bool

Test if a node type has a VALUE property.

HtmlChildrenNotAllowed ( string nodeName ) : bool

Test whether this element type disallows HTML children. Some of these types may allow text, but not HTML.

HtmlChildrenNotAllowed ( ushort nodeId ) : bool

Test whether this element type disallows HTML children. Some of these types may allow text, but not HTML.

IsBlock ( string nodeName ) : bool

Test whether the node is a block-type element.

IsBlock ( ushort tokenId ) : bool

Test whether the node is a block-type element.

IsBoolean ( string propertyName ) : bool

Test whether the attribute is a boolean type.

IsBoolean ( ushort tokenId ) : bool

Test whether the attribute is a boolean type.

IsCaseInsensitiveValues ( string attributeName ) : bool

Test whether an attribute has case-insensitive values (for selection purposes).

IsCaseInsensitiveValues ( ushort attributeToken ) : bool

Test whether an attribute has case-insensitive values (for selection purposes).

IsFormInputControl ( string nodeName ) : bool

Test if the node name is a form input control.

IsFormInputControl ( ushort nodeNameToken ) : bool

Test if the node name is a form input control.

SpecialTagAction ( string tag, string newTag, bool isDocument = true ) : ushort

For testing only; production code never uses this overload.

SpecialTagAction ( ushort parentTagId, ushort newTagId ) : ushort

Return the type of action that should be performed given a tag and a new tag found as a child of that tag.

Some tags have inner HTML but are often not closed properly. There are two possible situations. Some tags cannot contain a nested instance of themselves, so any recurrence of such a tag implies that the previous one is closed. Other closing tags are simply optional but are not repeater tags (e.g. body, html); these should be handled automatically by the logic that bubbles any closing tag up to its parent if it doesn't match the current tag. The exception is <head>, which technically does not require a closing tag, but we would not expect to find another closing tag. Complete list of optional closing tags: HTML, HEAD, BODY, P, DT, DD, LI, OPTION, THEAD, TH, TBODY, TR, TD, TFOOT, COLGROUP. The body and html tags will be closed automatically at the end of parsing, so their closing tags are also not required.

SpecialTagActionForDocument ( ushort parentTagId, ushort newTagId ) : ushort

Determine a course of action given a new tag and its parent when the content is treated as a document. Returns 1 to close, 0 to do nothing, or the ID of a tag to generate.

TokenName ( ushort tokenId ) : string

Return a token name for an ID.

Tokenize ( string name ) : ushort

Return a token for a name.

TokenizeCaseSensitive ( string name ) : ushort

Return a token for a name, adding to the index if it doesn't exist. When indexing tags and attributes, TokenID(tokenName) should be used.

Private Methods

Method Description
HtmlData ( ) : System
PopulateTokenHashset ( IEnumerable tokens ) : HashSet
TokenizeImpl ( string tokenName ) : ushort

Return a token ID for a name, adding to the index if it doesn't exist. When indexing tags and attributes, ignoreCase should be used.

Touch ( ) : void
setBit ( IEnumerable tokens, TokenProperties bit ) : void

For each value in "tokens" (ignoring case) sets the specified bit in the reference table.

setBit ( IEnumerable tokens, TokenProperties bit ) : void

For each value in "tokens" sets the specified bit in the reference table.

setBit ( ushort token, TokenProperties bit ) : void

Set the specified bit in the reference table for "token".
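The setBit overloads describe a reference table in which each token ID carries a set of property bits, which the public IsBlock/IsBoolean style tests can then read. The following is a minimal sketch of that idea, not CsQuery's code: the enum TokenPropertiesSketch, its flag values, the class TokenTableSketch and the 256-entry table size are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flag set; CsQuery's real TokenProperties values are not shown here.
[Flags]
public enum TokenPropertiesSketch : ushort
{
    None = 0,
    BlockElement = 1 << 0,
    BooleanProperty = 1 << 1,
    ChildrenNotAllowed = 1 << 2,
}

// Hypothetical reference table indexed directly by token ID.
public static class TokenTableSketch
{
    // Assumed size; tokens at or above this ID cannot be flagged in this sketch.
    private static readonly TokenPropertiesSketch[] Table = new TokenPropertiesSketch[256];

    // Set one bit for a single token.
    public static void SetBit(ushort token, TokenPropertiesSketch bit) => Table[token] |= bit;

    // Set the same bit for every token in a sequence.
    public static void SetBit(IEnumerable<ushort> tokens, TokenPropertiesSketch bit)
    {
        foreach (ushort token in tokens)
        {
            SetBit(token, bit);
        }
    }

    public static bool IsBlock(ushort tokenId) => Has(tokenId, TokenPropertiesSketch.BlockElement);

    public static bool IsBoolean(ushort tokenId) => Has(tokenId, TokenPropertiesSketch.BooleanProperty);

    // Tokens outside the table are treated as having no properties.
    private static bool Has(ushort tokenId, TokenPropertiesSketch bit) =>
        tokenId < Table.Length && (Table[tokenId] & bit) != 0;
}
```

In this sketch, because token IDs are small integers, each property test is a single array read and a bitwise AND rather than a string lookup.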

Method Details

AttributeEncode() public static method

HtmlEncode a string, except for double-quotes, so it can be enclosed in single-quotes.
public static AttributeEncode ( string text, bool alwaysQuote, out string quoteChar ) : string
text string The text to encode.
alwaysQuote bool When true, the attribute value will be quoted even if quotes are not required by the value.
quoteChar string [out] The quote character.
Returns string
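A minimal sketch of the quote-selection idea described above, assuming that the encoder prefers double quotes, switches to single quotes when the value contains a double quote (leaving the double quotes unencoded), and omits quotes when alwaysQuote is false and nothing in the value requires them. AttributeEncodeSketch and its MustQuote character set are hypothetical; this is not CsQuery's implementation.

```csharp
using System;
using System.Text;

// Hypothetical sketch; not CsQuery's actual AttributeEncode.
public static class AttributeEncodeSketch
{
    // Assumed set of characters that force an attribute value to be quoted.
    private static readonly char[] MustQuote =
        { ' ', '\t', '\n', '\f', '\r', '"', '\'', '=', '<', '>', '`' };

    public static string AttributeEncode(string text, bool alwaysQuote, out string quoteChar)
    {
        // Use single quotes when the value contains double quotes,
        // so the double quotes can stay as-is.
        quoteChar = text.IndexOf('"') >= 0 ? "'" : "\"";

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '&':
                    sb.Append("&amp;");
                    break;
                // Only the character matching the chosen quote needs encoding.
                case '\'' when quoteChar == "'":
                    sb.Append("&#39;");
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }

        // Drop the quotes entirely when they are optional and not needed.
        if (!alwaysQuote && text.Length > 0 && text.IndexOfAny(MustQuote) < 0)
        {
            quoteChar = "";
        }

        return sb.ToString();
    }
}
```

For example, encoding `say "hi"` returns the value unchanged with quoteChar set to `'`, so the caller can write `title='say "hi"'`.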

ChildrenAllowed() public static method

Test whether this element can have children.
public static ChildrenAllowed ( string nodeName ) : bool
nodeName string The node name to test.
Returns bool

ChildrenAllowed() public static method

Test whether this element may have children.
public static ChildrenAllowed ( ushort tokenId ) : bool
tokenId ushort The token ID.
Returns bool

HasValueProperty() public static method

Test if a node type has a VALUE property.
public static HasValueProperty ( string nodeName ) : bool
nodeName string The node name to test.
Returns bool

HasValueProperty() public static method

Test if a node type has a VALUE property.
public static HasValueProperty ( ushort nodeNameToken ) : bool
nodeNameToken ushort Token ID of the node name.
Returns bool

HtmlChildrenNotAllowed() public static method

Test whether this element type disallows HTML children. Some of these types may allow text, but not HTML.
public static HtmlChildrenNotAllowed ( string nodeName ) : bool
nodeName string The node name to test.
Returns bool

HtmlChildrenNotAllowed() public static method

Test whether this element type disallows HTML children. Some of these types may allow text, but not HTML.
public static HtmlChildrenNotAllowed ( ushort nodeId ) : bool
nodeId ushort The token ID.
Returns bool
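The distinction between ChildrenAllowed and HtmlChildrenNotAllowed can be sketched with two element sets: elements that never have children, and elements that hold text but no HTML. The lists below follow the WHATWG HTML specification (void elements, and raw-text or escapable raw-text elements); they are an assumption about what such tables contain, and ChildRulesSketch is a hypothetical class, not CsQuery's data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: element lists follow the WHATWG HTML specification and
// are assumptions, not the tables CsQuery actually uses.
public static class ChildRulesSketch
{
    // Void elements: they can never have children.
    private static readonly HashSet<string> VoidElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"
        };

    // Raw-text and escapable raw-text elements: text content, but no HTML.
    private static readonly HashSet<string> TextOnlyElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "script", "style", "textarea", "title"
        };

    public static bool ChildrenAllowed(string nodeName) =>
        !VoidElements.Contains(nodeName);

    public static bool HtmlChildrenNotAllowed(string nodeName) =>
        VoidElements.Contains(nodeName) || TextOnlyElements.Contains(nodeName);
}
```

With these sets, `script` may have children (its text) yet still disallows HTML children, which matches the note that some of these types allow text but not HTML.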

IsBlock() public static method

Test whether the node is a block-type element.
public static IsBlock ( string nodeName ) : bool
nodeName string The node name to test.
Returns bool

IsBlock() public static method

Test whether the node is a block-type element.
public static IsBlock ( ushort tokenId ) : bool
tokenId ushort The token ID of the node.
Returns bool

IsBoolean() public static method

Test whether the attribute is a boolean type.
public static IsBoolean ( string propertyName ) : bool
propertyName string The attribute or property name.
Returns bool

IsBoolean() public static method

Test whether the attribute is a boolean type.
public static IsBoolean ( ushort tokenId ) : bool
tokenId ushort The token ID.
Returns bool
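One place a boolean-attribute test matters is output: in HTML, a boolean attribute's presence alone means true, so a renderer can write just its name. The sketch below assumes a short list of well-known HTML boolean attributes; BooleanAttributeSketch and RenderAttribute are hypothetical names, and the value is assumed to be already encoded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the attribute list is a partial, assumed set of
// HTML boolean attributes, not CsQuery's table.
public static class BooleanAttributeSketch
{
    private static readonly HashSet<string> BooleanAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "checked", "disabled", "hidden", "multiple", "readonly", "required", "selected"
        };

    public static bool IsBoolean(string propertyName) => BooleanAttributes.Contains(propertyName);

    // Writes name="value", or only the name for a boolean attribute that is present.
    // The value is assumed to be encoded already.
    public static string RenderAttribute(string name, string value) =>
        IsBoolean(name) ? name : name + "=\"" + value + "\"";
}
```

Note that under these rules `disabled="false"` still means disabled, which is why the sketch ignores the value of a boolean attribute.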

IsCaseInsensitiveValues() public static method

Test whether an attribute has case-insensitive values (for selection purposes).
public static IsCaseInsensitiveValues ( string attributeName ) : bool
attributeName string Name of the attribute.
Returns bool

IsCaseInsensitiveValues() public static method

Test whether an attribute has case-insensitive values (for selection purposes).
public static IsCaseInsensitiveValues ( ushort attributeToken ) : bool
attributeToken ushort Token ID of the attribute.
Returns bool
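"For selection purposes" means that an attribute selector such as `[type=TEXT]` should match `type="text"`, while `[id=Main]` should not match `id="main"`. A minimal sketch, assuming a small subset of the attributes that the HTML specification treats as case-insensitive for selectors; AttributeSelectorSketch and AttributeEquals are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a partial, assumed list of attributes whose values are
// compared case-insensitively when matching selectors.
public static class AttributeSelectorSketch
{
    private static readonly HashSet<string> CaseInsensitiveValues =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "type", "lang", "dir", "method", "rel", "checked", "disabled"
        };

    public static bool IsCaseInsensitiveValues(string attributeName) =>
        CaseInsensitiveValues.Contains(attributeName);

    // Evaluates [attributeName=expected] against an element's actual value.
    // OrdinalIgnoreCase approximates the spec's ASCII case-insensitive match.
    public static bool AttributeEquals(string attributeName, string actual, string expected)
    {
        var comparison = IsCaseInsensitiveValues(attributeName)
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;
        return string.Equals(actual, expected, comparison);
    }
}
```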

IsFormInputControl() public static method

Test if the node name is a form input control.
public static IsFormInputControl ( string nodeName ) : bool
nodeName string The node name to test.
Returns bool

IsFormInputControl() public static method

Test if the node name is a form input control.
public static IsFormInputControl ( ushort nodeNameToken ) : bool
nodeNameToken ushort The node name token.
Returns bool
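The HasValueProperty and IsFormInputControl checks listed above are both membership tests on element names, but over different sets: many elements expose a `value` property without being form input controls (for example `li` or `option`). The sketch below assumes the element lists from the HTML specification; FormElementSketch is hypothetical and does not reflect CsQuery's exact tables.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; both element lists are assumptions based on the HTML
// specification, not CsQuery's own tables.
public static class FormElementSketch
{
    // Elements that expose a "value" property.
    private static readonly HashSet<string> ValueElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "button", "data", "input", "li", "meter", "option", "output",
            "param", "progress", "select", "textarea"
        };

    // Elements treated as form input controls in this sketch.
    private static readonly HashSet<string> InputControls =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "input", "select", "textarea", "button"
        };

    public static bool HasValueProperty(string nodeName) => ValueElements.Contains(nodeName);

    public static bool IsFormInputControl(string nodeName) => InputControls.Contains(nodeName);
}
```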

SpecialTagAction() public static method

For testing only; production code never uses this overload.
public static SpecialTagAction ( string tag, string newTag, bool isDocument = true ) : ushort
tag string The parent tag name.
newTag string The new child tag name.
isDocument bool Whether to treat the content as a document.
Returns ushort

SpecialTagAction() public static method

Return the type of action that should be performed given a tag and a new tag found as a child of that tag.
Some tags have inner HTML but are often not closed properly. There are two possible situations. Some tags cannot contain a nested instance of themselves, so any recurrence of such a tag implies that the previous one is closed. Other closing tags are simply optional but are not repeater tags (e.g. body, html); these should be handled automatically by the logic that bubbles any closing tag up to its parent if it doesn't match the current tag. The exception is <head>, which technically does not require a closing tag, but we would not expect to find another closing tag. Complete list of optional closing tags: HTML, HEAD, BODY, P, DT, DD, LI, OPTION, THEAD, TH, TBODY, TR, TD, TFOOT, COLGROUP. The body and html tags will be closed automatically at the end of parsing, so their closing tags are also not required.
public static SpecialTagAction ( ushort parentTagId, ushort newTagId ) : ushort
parentTagId ushort The parent tag's token.
newTagId ushort The new child tag's token.
Returns ushort
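The "recurrence implies close" rule above can be sketched as a lookup over groups of tags that cannot nest inside each other: when the open tag and the new tag fall in the same group, the open tag is closed. This sketch takes string names for readability; the class OptionalCloseSketch, the 0/1 constants and the exact groups are assumptions, not CsQuery's implementation.

```csharp
using System;

// Hypothetical sketch; constant names, return values and tag groups are assumptions.
public static class OptionalCloseSketch
{
    public const ushort ActionNothing = 0;
    public const ushort ActionClose = 1;

    // A new tag from the same group as the open tag implicitly closes it.
    private static readonly string[][] NonNestingGroups =
    {
        new[] { "p" },
        new[] { "li" },
        new[] { "option" },
        new[] { "dt", "dd" },
        new[] { "tr" },
        new[] { "td", "th" },
        new[] { "thead", "tbody", "tfoot" },
    };

    public static ushort SpecialTagAction(string parentTag, string newTag)
    {
        string parent = parentTag.ToLowerInvariant();
        string child = newTag.ToLowerInvariant();

        foreach (string[] group in NonNestingGroups)
        {
            if (Array.IndexOf(group, parent) >= 0 && Array.IndexOf(group, child) >= 0)
            {
                return ActionClose;
            }
        }

        return ActionNothing;
    }
}
```

With these groups, `<li>one<li>two` closes the first `li` before opening the second, while `li` inside `ul` leaves the parent open.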

SpecialTagActionForDocument() public static method

Determine a course of action given a new tag and its parent when the content is treated as a document. Returns 1 to close, 0 to do nothing, or the ID of a tag to generate.
public static SpecialTagActionForDocument ( ushort parentTagId, ushort newTagId ) : ushort
parentTagId ushort The parent tag ID.
newTagId ushort The new tag ID found.
Returns ushort
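Because the return value is either an action code (0 or 1) or the ID of a tag to generate, real tag IDs must never be 0 or 1. The sketch below illustrates the three outcomes with a few rules borrowed from HTML document parsing (an implied `tbody` for a `tr` directly inside `table`, an implied `body` for body content directly inside `html`). DocumentActionSketch, its token IDs and its rules are all hypothetical; CsQuery's actual rules are not shown in this reference.

```csharp
using System;

// Hypothetical sketch of a document-mode action lookup. Token IDs, rules and
// constant names are assumptions for illustration only.
public static class DocumentActionSketch
{
    public const ushort ActionNothing = 0;
    public const ushort ActionClose = 1;

    // Assumed token IDs, kept above 1 so they cannot collide with the action codes.
    public const ushort Html = 10, Head = 11, Body = 12, Table = 13, Tbody = 14, Tr = 15, Div = 16;

    public static ushort SpecialTagActionForDocument(ushort parentTagId, ushort newTagId)
    {
        // <tr> directly inside <table>: generate the implied <tbody>.
        if (parentTagId == Table && newTagId == Tr) return Tbody;

        // Body content directly inside <html>: generate the implied <body>.
        if (parentTagId == Html && (newTagId == Div || newTagId == Table)) return Body;

        // <body> found while <head> is still open: close the head.
        if (parentTagId == Head && newTagId == Body) return ActionClose;

        return ActionNothing;
    }
}
```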

TokenName() public static method

Return a token name for an ID.
public static TokenName ( ushort tokenId ) : string
tokenId ushort The token ID.
Returns string

Tokenize() public static method

Return a token for a name.
public static Tokenize ( string name ) : ushort
name string The name to tokenize.
Returns ushort

TokenizeCaseSensitive() public static method

Return a token for a name, adding to the index if it doesn't exist. When indexing tags and attributes, TokenID(tokenName) should be used.
public static TokenizeCaseSensitive ( string name ) : ushort
name string The name to tokenize.
Returns ushort
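Tokenize, TokenizeCaseSensitive and TokenName together describe a two-way index between names and ushort IDs that grows as new names are seen. A minimal sketch, assuming that the case-insensitive variant lower-cases names before indexing and that IDs 0 and 1 are reserved; TokenIndexSketch is hypothetical and, unlike a production parser, is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a name <-> ushort token index; not CsQuery's code.
// Not thread-safe.
public static class TokenIndexSketch
{
    private static readonly Dictionary<string, ushort> Ids =
        new Dictionary<string, ushort>(StringComparer.Ordinal);

    // IDs 0 and 1 are reserved (assumption) so that real tokens never collide
    // with the 0/1 action codes returned by the SpecialTagAction methods.
    private static readonly List<string> Names = new List<string> { "", "" };

    // Case-insensitive: tag and attribute names are normalized to lower case.
    public static ushort Tokenize(string name) =>
        TokenizeCaseSensitive(name.ToLowerInvariant());

    // Case-sensitive: return the existing ID, or assign the next free one.
    public static ushort TokenizeCaseSensitive(string name)
    {
        if (Ids.TryGetValue(name, out ushort id))
        {
            return id;
        }

        if (Names.Count > ushort.MaxValue)
        {
            throw new InvalidOperationException("Token space exhausted.");
        }

        id = (ushort)Names.Count;
        Names.Add(name);
        Ids[name] = id;
        return id;
    }

    // Reverse lookup from ID to name.
    public static string TokenName(ushort tokenId) => Names[tokenId];
}
```

In this sketch `Tokenize("DIV")` and `Tokenize("div")` return the same ID, while `TokenizeCaseSensitive("DIV")` creates a separate entry.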

Property Details

Debug public static property

Indicates whether the library has been compiled in debug mode. When true, DOM index paths are stored internally in an extended, human-readable format.
public static bool Debug
Type bool

NumberChars public static property

Characters that can appear in a CSS number.
public static HashSet NumberChars
Type HashSet

Units public static property

The allowable unit strings in a CSS style.
public static HashSet Units
Type HashSet
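NumberChars and Units suggest a simple way to validate a CSS length such as `12.5px`: consume the leading number characters, then check the remainder against the unit set. The sketch below is hypothetical; its character set and unit list are assumptions and are not the actual contents of HtmlData.NumberChars or HtmlData.Units.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch: the character and unit sets are assumptions, not the
// contents of HtmlData.NumberChars / HtmlData.Units.
public static class CssLengthSketch
{
    private static readonly HashSet<char> NumberChars = new HashSet<char>("-+.0123456789");

    private static readonly HashSet<string> Units =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "%", "cm", "em", "ex", "in", "mm", "pc", "pt", "px"
        };

    // Splits a value such as "12.5px" into its number and unit.
    // Returns false if the unit is unknown or the number does not parse.
    public static bool TryParseLength(string value, out double number, out string unit)
    {
        number = 0;
        string s = value.Trim();

        // Consume the leading run of number characters.
        int i = 0;
        while (i < s.Length && NumberChars.Contains(s[i]))
        {
            i++;
        }

        // Whatever remains must be empty (unitless) or a known unit.
        unit = s.Substring(i);
        if (unit.Length > 0 && !Units.Contains(unit))
        {
            return false;
        }

        return double.TryParse(s.Substring(0, i), NumberStyles.Float,
            CultureInfo.InvariantCulture, out number);
    }
}
```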