C# Class NCrawler.HtmlProcessor.ContentCrawlerRules

Class for filtering content. For example, you might want to exclude links contained in a particular part of an HTML page from being followed, exclude part of the textual content, or replace content based on a regex.
Open project: esbencarlsen/NCrawler

Protected Methods

Method Description
ContentCrawlerRules ( )

Initializes a new instance of the ContentCrawlerRules class.

ContentCrawlerRules ( Dictionary<string, string> filterTextRules, Dictionary<string, string> filterLinksRules )

Initializes a new instance of the ContentCrawlerRules class.

StripLinks ( string content ) : string

StripText ( string content ) : string

Substitute ( string original, CrawlStep crawlStep ) : string

Private Methods

Method Description
StripByRules ( Dictionary<string, string> rules, string content ) : string

Strips everything between the start marker and the end marker. The start marker is the Key in the Dictionary; the end marker is the Value.
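The marker-based stripping described above can be sketched as follows. This is a minimal illustration, not NCrawler's actual implementation: the class name MarkerStripper is hypothetical, and the sketch assumes that both dictionary type arguments are strings, that the markers themselves are removed along with the text between them, and that a start marker with no matching end marker is left untouched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone helper; NCrawler's private StripByRules may differ.
public static class MarkerStripper
{
    public static string StripByRules(Dictionary<string, string> rules, string content)
    {
        if (rules == null || string.IsNullOrEmpty(content))
        {
            return content;
        }

        foreach (KeyValuePair<string, string> rule in rules)
        {
            string startMarker = rule.Key;   // Key = start marker
            string endMarker = rule.Value;   // Value = end marker
            if (string.IsNullOrEmpty(startMarker) || string.IsNullOrEmpty(endMarker))
            {
                continue;
            }

            int start = content.IndexOf(startMarker, StringComparison.Ordinal);
            while (start >= 0)
            {
                int end = content.IndexOf(endMarker, start + startMarker.Length, StringComparison.Ordinal);
                if (end < 0)
                {
                    break; // Assumption: an unmatched start marker is left as-is.
                }

                // Remove the start marker, the enclosed text, and the end marker.
                content = content.Remove(start, end + endMarker.Length - start);
                start = content.IndexOf(startMarker, start, StringComparison.Ordinal);
            }
        }

        return content;
    }
}
```

With a rule mapping "[" to "]", calling MarkerStripper.StripByRules on "a[b]c[d]e" yields "ace". Under the same assumptions, StripText and StripLinks would presumably apply this routine with the filterTextRules and filterLinksRules dictionaries respectively.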

Method Details

ContentCrawlerRules() protected method

Initializes a new instance of the ContentCrawlerRules class.
protected ContentCrawlerRules ( )

ContentCrawlerRules() protected method

Initializes a new instance of the ContentCrawlerRules class.
protected ContentCrawlerRules ( Dictionary<string, string> filterTextRules, Dictionary<string, string> filterLinksRules )
filterTextRules Dictionary<string, string> The filter text rules.
filterLinksRules Dictionary<string, string> The filter links rules.

StripLinks() protected method

protected StripLinks ( string content ) : string
content string The content.
return string

StripText() protected method

protected StripText ( string content ) : string
content string The content.
return string

Substitute() protected method

protected Substitute ( string original, CrawlStep crawlStep ) : string
original string
crawlStep CrawlStep
return string