C# Class CsQuery.HtmlParser.HtmlData

Reference data about HTML tags and attributes; methods to test tokens for certain properties; and the tokenizer.

Public Properties

Property Type Description
Debug bool
NumberChars HashSet
Units HashSet

Public Methods

Method Description
AttributeEncode ( string text, bool alwaysQuote, out string quoteChar ) : string

HtmlEncode a string, except for double-quotes, so it can be enclosed in single-quotes.

ChildrenAllowed ( string nodeName ) : bool

Test whether this element can have children.

ChildrenAllowed ( ushort tokenId ) : bool

Test whether this element may have children.

HasValueProperty ( string nodeName ) : bool

Test if a node type has a VALUE property.

HasValueProperty ( ushort nodeNameToken ) : bool

Test if a node type has a VALUE property.

HtmlChildrenNotAllowed ( string nodeName ) : bool

This type does not allow HTML children. Some of these types may allow text but not HTML.

HtmlChildrenNotAllowed ( ushort nodeId ) : bool

This type does not allow HTML children. Some of these types may allow text but not HTML.

IsBlock ( string nodeName ) : bool

Test whether the node is a block-type element.

IsBlock ( ushort tokenId ) : bool

Test whether the node is a block-type element.

IsBoolean ( string propertyName ) : bool

Test whether the attribute is a boolean type.

IsBoolean ( ushort tokenId ) : bool

Test whether the attribute is a boolean type.

IsCaseInsensitiveValues ( string attributeName ) : bool

Test whether an attribute has case-insensitive values (for selection purposes).

IsCaseInsensitiveValues ( ushort attributeToken ) : bool

Test whether an attribute has case-insensitive values (for selection purposes).

IsFormInputControl ( string nodeName ) : bool

Test if the node name is a form input control.

IsFormInputControl ( ushort nodeNameToken ) : bool

Test if the node name is a form input control.

SpecialTagAction ( string tag, string newTag, bool isDocument = true ) : ushort

For testing only - the production code never uses this version.

SpecialTagAction ( ushort parentTagId, ushort newTagId ) : ushort

Return the type of action that should be performed given a tag, and a new tag found as a child of that tag.

Some tags have inner HTML but are often not closed properly. There are two possible situations. A tag may not have a nested instance of itself, and therefore any recurrence of that tag implies the previous one is closed. Other tag closings are simply optional, but are not repeater tags (e.g. body, html). These should be handled automatically by the logic that bubbles any closing tag to its parent if it doesn't match the current tag. The exception is <head>, which technically does not require a close, but we would not expect to find another close tag.

Complete list of optional closing tags: HTML, HEAD, BODY, P, DT, DD, LI, OPTION, THEAD, TH, TBODY, TR, TD, TFOOT, COLGROUP. The body and html tags will be closed automatically at the end of parsing and are also not required.

SpecialTagActionForDocument ( ushort parentTagId, ushort newTagId ) : ushort

Determine a course of action given a new tag, its parent, and whether or not to treat this as a document. Return 1 to close, 0 to do nothing, or an ID to generate.

TokenName ( ushort tokenId ) : string

Return a token name for an ID.

Tokenize ( string name ) : ushort

Return a token for a name.

TokenizeCaseSensitive ( string name ) : ushort

Return a token for a name, adding to the index if it doesn't exist. When indexing tags and attributes, TokenID(tokenName) should be used.

Private Methods

Method Description
HtmlData ( ) : System
PopulateTokenHashset ( IEnumerable tokens ) : HashSet
TokenizeImpl ( string tokenName ) : ushort

Return a token ID for a name, adding to the index if it doesn't exist. When indexing tags and attributes, ignoreCase should be used.

Touch ( ) : void
setBit ( IEnumerable tokens, TokenProperties bit ) : void

For each value in "tokens" (ignoring case) sets the specified bit in the reference table.

setBit ( IEnumerable tokens, TokenProperties bit ) : void

For each value in "tokens" sets the specified bit in the reference table.

setBit ( ushort token, TokenProperties bit ) : void

Set the specified bit in the reference table for "token".
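The setBit overloads suggest a reference table that holds one bit field per token ID, which tests such as IsBlock can then read with a single mask. The following is a minimal sketch of that idea, not CsQuery's actual code: the names `TokenPropertiesSketch`, `TokenFlagsSketch`, and `HasBit`, the flag values, and the tiny built-in tokenizer are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flag set; the members of the real TokenProperties enum are not shown in this page.
[Flags]
public enum TokenPropertiesSketch : ushort
{
    None = 0,
    BlockElement = 1,
    BooleanProperty = 2,
    ChildrenNotAllowed = 4
}

public static class TokenFlagsSketch
{
    // Name -> token ID map (assumption: names are lower-cased before lookup).
    private static readonly Dictionary<string, ushort> Ids = new Dictionary<string, ushort>();

    // One bit field per possible token ID; the array index is the token ID.
    private static readonly ushort[] Table = new ushort[ushort.MaxValue + 1];

    private static ushort Tokenize(string name)
    {
        string key = name.ToLowerInvariant();
        ushort id;
        if (!Ids.TryGetValue(key, out id))
        {
            id = (ushort)(Ids.Count + 1); // IDs start at 1 in this sketch
            Ids[key] = id;
        }
        return id;
    }

    // Set the bit for a single token ID.
    public static void SetBit(ushort token, TokenPropertiesSketch bit)
    {
        Table[token] |= (ushort)bit;
    }

    // Set the bit for each name, ignoring case.
    public static void SetBit(IEnumerable<string> tokens, TokenPropertiesSketch bit)
    {
        foreach (string name in tokens)
        {
            SetBit(Tokenize(name), bit);
        }
    }

    public static bool HasBit(ushort token, TokenPropertiesSketch bit)
    {
        return (Table[token] & (ushort)bit) != 0;
    }

    // A property test then reduces to a table lookup and a mask.
    public static bool IsBlock(string nodeName)
    {
        return HasBit(Tokenize(nodeName), TokenPropertiesSketch.BlockElement);
    }
}
```

With this layout, populating the table once at startup (for example `TokenFlagsSketch.SetBit(new[] { "div", "p" }, TokenPropertiesSketch.BlockElement)`) makes every later test a constant-time operation.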

Method Details

AttributeEncode() public static method

HtmlEncode a string, except for double-quotes, so it can be enclosed in single-quotes.
public static AttributeEncode ( string text, bool alwaysQuote, out string quoteChar ) : string
text string - The text to encode.
alwaysQuote bool - When true, the attribute value will be quoted even if quotes are not required by the value.
quoteChar string - [out] The quote character.
Returns string
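To make the role of alwaysQuote and the out parameter concrete, here is a rough sketch of one way such an encoder could choose a quote character. It is an assumption-laden illustration rather than the library's implementation: the class `AttributeEncodeSketch`, the rule for when quotes are required, and the handling of empty values are all hypothetical.

```csharp
using System;
using System.Text;

public static class AttributeEncodeSketch
{
    // Characters that (in this sketch) force the value to be quoted.
    private static readonly char[] UnsafeUnquoted = { ' ', '\t', '\r', '\n', '=', '<', '>', '`' };

    public static string Encode(string text, bool alwaysQuote, out string quoteChar)
    {
        if (string.IsNullOrEmpty(text))
        {
            // Assumption: an empty value is always written as "".
            quoteChar = "\"";
            return "";
        }

        bool hasDouble = text.IndexOf('"') >= 0;
        bool hasSingle = text.IndexOf('\'') >= 0;
        bool needsQuotes = alwaysQuote || hasDouble || hasSingle
            || text.IndexOfAny(UnsafeUnquoted) >= 0;

        // Prefer double quotes; use single quotes when the value contains
        // double quotes but no single quotes, so the double quotes can stay literal.
        if (!needsQuotes) quoteChar = "";
        else if (hasDouble && !hasSingle) quoteChar = "'";
        else quoteChar = "\"";

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '&': sb.Append("&amp;"); break;
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '"':
                    // Escape only when the value will sit inside double quotes.
                    sb.Append(quoteChar == "\"" ? "&quot;" : "\"");
                    break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

A caller would then write the attribute as `name + "=" + quoteChar + encoded + quoteChar`, which is why the quote character is returned separately from the encoded text.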

ChildrenAllowed() public static method

Test whether this element can have children.
public static ChildrenAllowed ( string nodeName ) : bool
nodeName string - The node name to test.
Returns bool

ChildrenAllowed() public static method

Test whether this element may have children.
public static ChildrenAllowed ( ushort tokenId ) : bool
tokenId ushort - The token ID.
Returns bool

HasValueProperty() public static method

Test if a node type has a VALUE property.
public static HasValueProperty ( string nodeName ) : bool
nodeName string - The node name.
Returns bool

HasValueProperty() public static method

Test if a node type has a VALUE property.
public static HasValueProperty ( ushort nodeNameToken ) : bool
nodeNameToken ushort - Token ID of the node name.
Returns bool

HtmlChildrenNotAllowed() public static method

This type does not allow HTML children. Some of these types may allow text but not HTML.
public static HtmlChildrenNotAllowed ( string nodeName ) : bool
nodeName string - The node name to test.
Returns bool

HtmlChildrenNotAllowed() public static method

This type does not allow HTML children. Some of these types may allow text but not HTML.
public static HtmlChildrenNotAllowed ( ushort nodeId ) : bool
nodeId ushort - The token ID.
Returns bool

IsBlock() public static method

Test whether the node is a block-type element.
public static IsBlock ( string nodeName ) : bool
nodeName string - The node name to test.
Returns bool

IsBlock() public static method

Test whether the node is a block-type element.
public static IsBlock ( ushort tokenId ) : bool
tokenId ushort - The token ID of the node.
Returns bool

IsBoolean() public static method

Test whether the attribute is a boolean type.
public static IsBoolean ( string propertyName ) : bool
propertyName string - The attribute or property name.
Returns bool

IsBoolean() public static method

Test whether the attribute is a boolean type.
public static IsBoolean ( ushort tokenId ) : bool
tokenId ushort - The token ID.
Returns bool

IsCaseInsensitiveValues() public static method

Test whether an attribute has case-insensitive values (for selection purposes).
public static IsCaseInsensitiveValues ( string attributeName ) : bool
attributeName string - Name of the attribute.
Returns bool

IsCaseInsensitiveValues() public static method

Test whether an attribute has case-insensitive values (for selection purposes).
public static IsCaseInsensitiveValues ( ushort attributeToken ) : bool
attributeToken ushort - Token ID of the attribute.
Returns bool

IsFormInputControl() public static method

Test if the node name is a form input control.
public static IsFormInputControl ( string nodeName ) : bool
nodeName string - The node name to test.
Returns bool

IsFormInputControl() public static method

Test if the node name is a form input control.
public static IsFormInputControl ( ushort nodeNameToken ) : bool
nodeNameToken ushort - The node name token.
Returns bool

SpecialTagAction() public static method

For testing only - the production code never uses this version.
public static SpecialTagAction ( string tag, string newTag, bool isDocument = true ) : ushort
tag string
newTag string
isDocument bool
Returns ushort

SpecialTagAction() public static method

Return the type of action that should be performed given a tag, and a new tag found as a child of that tag.
Some tags have inner HTML but are often not closed properly. There are two possible situations. A tag may not have a nested instance of itself, and therefore any recurrence of that tag implies the previous one is closed. Other tag closings are simply optional, but are not repeater tags (e.g. body, html). These should be handled automatically by the logic that bubbles any closing tag to its parent if it doesn't match the current tag. The exception is <head>, which technically does not require a close, but we would not expect to find another close tag.
Complete list of optional closing tags: HTML, HEAD, BODY, P, DT, DD, LI, OPTION, THEAD, TH, TBODY, TR, TD, TFOOT, COLGROUP. The body and html tags will be closed automatically at the end of parsing and are also not required.
public static SpecialTagAction ( ushort parentTagId, ushort newTagId ) : ushort
parentTagId ushort - The parent tag's token.
newTagId ushort - The new child tag's token.
Returns ushort
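The first situation described above (a newly opened tag implying that its would-be parent has ended) can be sketched as a small lookup. This is only an illustration under stated assumptions: it works on tag names rather than token IDs, the `ClosedBy` table is a deliberately incomplete subset chosen for the example, and the 0/1 return values follow the convention given for SpecialTagActionForDocument. The class and member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AutoCloseSketch
{
    public const ushort ActionNothing = 0; // leave the parent open
    public const ushort ActionClose = 1;   // close the parent before adding the new tag

    // For each parent whose close tag is optional, the new tags that imply
    // the parent has ended. Illustrative subset, not the library's table.
    private static readonly Dictionary<string, HashSet<string>> ClosedBy =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase)
        {
            { "li", Set("li") },
            { "dt", Set("dt", "dd") },
            { "dd", Set("dt", "dd") },
            { "option", Set("option") },
            { "tr", Set("tr", "tbody", "tfoot") },
            { "td", Set("td", "th", "tr", "tbody", "tfoot") },
            { "th", Set("td", "th", "tr", "tbody", "tfoot") },
            { "p", Set("p", "div", "ul", "ol", "table", "h1", "h2", "h3") }
        };

    private static HashSet<string> Set(params string[] names)
    {
        return new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    }

    // Decide what to do when newTag is found as a child of parentTag.
    public static ushort SpecialTagAction(string parentTag, string newTag)
    {
        HashSet<string> closers;
        bool implicitClose = ClosedBy.TryGetValue(parentTag, out closers)
            && closers.Contains(newTag);
        return implicitClose ? ActionClose : ActionNothing;
    }
}
```

For example, `AutoCloseSketch.SpecialTagAction("li", "li")` yields the close action, so markup such as `<ul><li>a<li>b</ul>` produces two sibling items instead of a nested one.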

SpecialTagActionForDocument() public static method

Determine a course of action given a new tag, its parent, and whether or not to treat this as a document. Return 1 to close, 0 to do nothing, or an ID to generate.
public static SpecialTagActionForDocument ( ushort parentTagId, ushort newTagId ) : ushort
parentTagId ushort - The parent tag ID.
newTagId ushort - The new tag ID found.
Returns ushort

TokenName() public static method

Return a token name for an ID.
public static TokenName ( ushort tokenId ) : string
tokenId ushort - The token ID.
Returns string

Tokenize() public static method

Return a token for a name.
public static Tokenize ( string name ) : ushort
name string - The name to tokenize.
Returns ushort

TokenizeCaseSensitive() public static method

Return a token for a name, adding to the index if it doesn't exist. When indexing tags and attributes, TokenID(tokenName) should be used.
public static TokenizeCaseSensitive ( string name ) : ushort
name string - The name to tokenize.
Returns ushort
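The three token methods together describe a two-way index between names and ushort IDs that grows on demand. A minimal sketch of such an index follows; it is not the library's code. The class name `TokenIndexSketch`, the choice of 0 as a "no token" value, and the assumption that Tokenize lower-cases its input before delegating to the case-sensitive path are all illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class TokenIndexSketch
{
    // name -> ID, compared exactly (case-sensitive).
    private static readonly Dictionary<string, ushort> Ids =
        new Dictionary<string, ushort>(StringComparer.Ordinal);

    // ID -> name; the name for ID n is stored at index n - 1.
    private static readonly List<string> Names = new List<string>();

    // Case-insensitive entry point (assumption: normalize to lower case).
    public static ushort Tokenize(string name)
    {
        if (string.IsNullOrEmpty(name)) return 0; // assumption: 0 means "no token"
        return TokenizeCaseSensitive(name.ToLowerInvariant());
    }

    // Look the name up as-is, adding it to the index if it is new.
    public static ushort TokenizeCaseSensitive(string name)
    {
        if (string.IsNullOrEmpty(name)) return 0;

        ushort id;
        if (!Ids.TryGetValue(name, out id))
        {
            if (Names.Count >= ushort.MaxValue)
            {
                throw new InvalidOperationException("Token index is full.");
            }
            Names.Add(name);
            id = (ushort)Names.Count; // IDs start at 1
            Ids[name] = id;
        }
        return id;
    }

    // Reverse lookup; unknown IDs map to an empty string in this sketch.
    public static string TokenName(ushort id)
    {
        return id == 0 || id > Names.Count ? "" : Names[id - 1];
    }
}
```

Under these assumptions `Tokenize("DIV")` and `Tokenize("div")` return the same ID, while `TokenizeCaseSensitive("viewBox")` and `TokenizeCaseSensitive("viewbox")` return different ones, which is the distinction the two method names point to.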

Property Details

Debug public static property

Indicates whether this has been compiled in debug mode. When true, DOM index paths will be stored internally in extended human-readable format.
public static bool Debug
Returns bool

NumberChars public static property

Things that can be in a CSS number.
public static HashSet NumberChars
Returns HashSet

Units public static property

The units that are allowable unit strings in a CSS style.
public static HashSet Units
Returns HashSet
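One plausible use of two sets like NumberChars and Units is splitting a CSS value such as `12.5px` into its numeric part and its unit. The sketch below illustrates that idea only; the contents of both sets, the element types (`char` and `string`), and the class `CssValueSketch` are assumptions, since this page does not list what the real sets contain.

```csharp
using System;
using System.Collections.Generic;

public static class CssValueSketch
{
    // Assumed contents; the real HtmlData.NumberChars may differ.
    private static readonly HashSet<char> NumberChars = new HashSet<char>("-+0123456789.");

    // Assumed contents; the real HtmlData.Units may differ.
    private static readonly HashSet<string> Units =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "%", "cm", "mm", "in", "px", "pc", "pt", "em", "ex", "rem", "vh", "vw"
        };

    // Split "12.5px" into number "12.5" and unit "px".
    // Returns false when there is no leading number or the unit is unknown.
    public static bool TrySplit(string value, out string number, out string unit)
    {
        number = "";
        unit = "";
        if (string.IsNullOrWhiteSpace(value)) return false;

        string s = value.Trim();
        int i = 0;
        while (i < s.Length && NumberChars.Contains(s[i])) i++;
        if (i == 0) return false; // no numeric prefix

        string rest = s.Substring(i);
        if (rest.Length > 0 && !Units.Contains(rest)) return false;

        number = s.Substring(0, i);
        unit = rest; // empty for a unitless number
        return true;
    }
}
```

Note that the sketch only checks which characters appear in the numeric prefix; it does not validate that the prefix is a well-formed number (for example, `1.2.3px` would pass the character test).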