C# Class NBoilerpipe.Filters.English.NumWordsRulesClassifier

Classifies NBoilerpipe.Document.TextBlock instances as content or non-content. The classification uses rules derived with the C4.8 decision-tree learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010). The rules rely mainly on two per-block features: the number of words in the block and the block's link density.
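Both features are simple counts. The sketch below illustrates them under two assumptions: words are found by whitespace tokenization, and link density is the share of a block's words that lie inside anchor (link) text. The `BlockFeatures` type and its `FromText` helper are hypothetical. NBoilerpipe's own TextBlock computes these values during parsing and may tokenize differently.

```csharp
using System;

// Hypothetical feature holder for illustration; not an NBoilerpipe type.
public sealed class BlockFeatures
{
    public int NumWords { get; }
    public int NumWordsInAnchorText { get; }

    public BlockFeatures(int numWords, int numWordsInAnchorText)
    {
        NumWords = numWords;
        NumWordsInAnchorText = numWordsInAnchorText;
    }

    // Assumed definition: share of the block's words that appear inside links.
    // An empty block gets a density of 0 to avoid dividing by zero.
    public double LinkDensity =>
        NumWords == 0 ? 0.0 : (double)NumWordsInAnchorText / NumWords;

    // Assumes whitespace tokenization of the block text and of the
    // concatenated anchor text it contains.
    public static BlockFeatures FromText(string text, string anchorText) =>
        new BlockFeatures(CountWords(text), CountWords(anchorText));

    // A null separator array makes String.Split split on whitespace.
    private static int CountWords(string s) =>
        s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

For instance, `BlockFeatures.FromText("Read more about our privacy policy", "privacy policy")` yields 6 words, 2 of them linked, so the link density is 1/3.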
Inheritance: BoilerpipeFilter
Project: oganix/NBoilerpipe

Public Properties

Property Type Description
INSTANCE NumWordsRulesClassifier The shared singleton instance.

Public Methods

Method Description
GetInstance ( ) : NumWordsRulesClassifier

Returns the singleton instance of NumWordsRulesClassifier.

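Process ( NBoilerpipe.Document.TextDocument doc ) : bool

Classifies each text block in doc as content or non-content; returns true if the document was changed.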

Protected Methods

Method Description
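Classify ( TextBlock prev, TextBlock curr, TextBlock next ) : bool

Classifies curr using its own features and those of its neighbors prev and next; returns true if the classification of curr changed.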

Method Details

Classify() protected method

protected Classify ( TextBlock prev, TextBlock curr, TextBlock next ) : bool
prev NBoilerpipe.Document.TextBlock: the block preceding curr
curr NBoilerpipe.Document.TextBlock: the block to classify
next NBoilerpipe.Document.TextBlock: the block following curr
return bool
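Classify looks only at curr and its immediate neighbors, which keeps the rule set shallow. The sketch below shows what such a decision tree can look like. The thresholds are assumed to mirror the upstream Java boilerpipe NumWordsRulesClassifier and have not been checked against the NBoilerpipe source. `Block` is a hypothetical stand-in for TextBlock. Its `SetIsContent` reports whether the flag changed, which is assumed to be what the bool return value conveys.

```csharp
// Hypothetical stand-in for NBoilerpipe.Document.TextBlock.
public sealed class Block
{
    public int NumWords { get; }
    public double LinkDensity { get; }
    public bool IsContent { get; private set; }

    public Block(int numWords, double linkDensity)
    {
        NumWords = numWords;
        LinkDensity = linkDensity;
    }

    // Sets the content flag and reports whether it actually changed.
    public bool SetIsContent(bool value)
    {
        if (value == IsContent) return false;
        IsContent = value;
        return true;
    }
}

public static class NumWordsRulesSketch
{
    // Shallow decision tree over words per block and link density.
    // Thresholds are ASSUMED to match upstream Java boilerpipe; verify before use.
    public static bool Classify(Block prev, Block curr, Block next)
    {
        bool isContent;
        if (curr.LinkDensity <= 0.333333)
        {
            if (prev.LinkDensity <= 0.555556)
            {
                // Short block followed by a short block: content only if
                // the preceding block is not tiny.
                if (curr.NumWords <= 16 && next.NumWords <= 15)
                    isContent = prev.NumWords > 4;
                else
                    isContent = true;
            }
            else
            {
                // Link-heavy predecessor: require more text here or after.
                isContent = curr.NumWords > 40 || next.NumWords > 17;
            }
        }
        else
        {
            // Mostly links: treat as boilerplate.
            isContent = false;
        }
        return curr.SetIsContent(isContent);
    }
}
```

Under these assumed thresholds, a block whose link density exceeds about one third is rejected outright, and a block of 16 words or fewer that is followed by a block of 15 words or fewer is kept only if the preceding block has more than four words.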

GetInstance() public static method

Returns the singleton instance of NumWordsRulesClassifier.
public static GetInstance ( ) : NumWordsRulesClassifier
return NumWordsRulesClassifier
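GetInstance and the static INSTANCE property hand out one shared classifier, so callers need not construct their own. Below is a minimal sketch of that pattern. It uses a hypothetical `RulesFilter` class rather than the NBoilerpipe source, which may, for example, leave its constructor public.

```csharp
// Hypothetical filter illustrating a static singleton accessor.
public sealed class RulesFilter
{
    // Initialized once when the type is first used; get-only so it cannot be replaced.
    public static RulesFilter INSTANCE { get; } = new RulesFilter();

    // Private constructor in this sketch prevents callers from creating more instances.
    private RulesFilter() { }

    public static RulesFilter GetInstance() => INSTANCE;
}
```

Sharing a single instance is safe as long as the filter keeps no per-document state, with all inputs arriving through method arguments.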

Process() public method

public Process ( NBoilerpipe.Document.TextDocument doc ) : bool
doc NBoilerpipe.Document.TextDocument
return bool
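Process walks the document's text blocks in order and calls Classify on each one together with its neighbors. The sketch below makes two assumptions, following the Java boilerpipe original: an empty block stands in for the missing neighbor of the first and last blocks, and the result is true when at least one block changed. `Block`, `Block.Empty` and `ProcessSketch` are hypothetical names. The rule is passed in as a delegate so that the traversal can be shown on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a text block with a mutable content flag.
public sealed class Block
{
    // Hypothetical empty sentinel used where a block has no predecessor or successor.
    public static readonly Block Empty = new Block(0);

    public int NumWords { get; }
    public bool IsContent { get; private set; }

    public Block(int numWords) { NumWords = numWords; }

    // Returns true only if the flag actually changed.
    public bool SetIsContent(bool value)
    {
        if (value == IsContent) return false;
        IsContent = value;
        return true;
    }
}

public static class ProcessSketch
{
    // Runs `classify` over each (prev, curr, next) window and reports whether
    // any block changed. Missing neighbors are replaced by Block.Empty.
    public static bool Process(IReadOnlyList<Block> blocks, Func<Block, Block, Block, bool> classify)
    {
        bool hasChanges = false;
        for (int i = 0; i < blocks.Count; i++)
        {
            Block prev = i > 0 ? blocks[i - 1] : Block.Empty;
            Block next = i + 1 < blocks.Count ? blocks[i + 1] : Block.Empty;
            // |= evaluates both sides, so every block is classified.
            hasChanges |= classify(prev, blocks[i], next);
        }
        return hasChanges;
    }
}
```

Calling Process twice with the same rule returns false the second time, since no flag changes. A pipeline can use that result to tell whether a filter altered the document.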

Property Details

INSTANCE public static property

public static NBoilerpipe.Filters.English.NumWordsRulesClassifier INSTANCE
return NumWordsRulesClassifier