C# Class NBoilerpipe.Filters.Heuristics.DocumentTitleMatchClassifier

Marks NBoilerpipe.Document.TextBlock instances that contain parts of the HTML <TITLE> tag, using heuristics that are quite specific to the news domain.
Inheritance: BoilerpipeFilter
Project: oganix/NBoilerpipe
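The page does not spell out the news-domain heuristics. One plausible reading is that news page titles often wrap the headline with the site name ("Headline - Site" or "Site - Headline"), so the headline alone is a better match for a text block than the full <TITLE>. The minimal sketch below shows that idea with two regular-expression rewrites. The helper names StripSiteSuffix and StripSitePrefix are hypothetical and not part of the NBoilerpipe API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: drops a trailing " - Site Name" segment
// (the segment after the last " - " must contain no hyphen).
static string StripSiteSuffix(string title) =>
    Regex.Replace(title, @" - [^\-]+$", "");

// Hypothetical helper: drops a leading "Site Name - " segment
// (the segment before the first " - " must contain no hyphen).
static string StripSitePrefix(string title) =>
    Regex.Replace(title, @"^[^\-]+ - ", "");

Console.WriteLine(StripSiteSuffix("storm hits coast - example news")); // prints "storm hits coast"
Console.WriteLine(StripSitePrefix("example news - storm hits coast")); // prints "storm hits coast"
```

Both rewrites return the title unchanged when it has no " - " separator, so they can safely be applied to every title.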

Public Methods

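Method    Description
DocumentTitleMatchClassifier ( string title )    Creates the filter and derives a set of candidate title strings from title.
GetPotentialTitles ( ) : ICollection    Returns the candidate title strings derived from the title.
Process ( NBoilerpipe.Document.TextDocument doc ) : bool    Labels the text blocks of doc that match a candidate title and reports whether doc was changed.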

Private Methods

Method    Description
GetLongestPart ( string title, string pattern ) : string    Splits title on the regular expression pattern and returns the longest part.
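GetLongestPart is private and has no details on this page. The following is a sketch of what a helper with this signature might do, assuming that "longest" means the part with the most words, with ties broken by character length, and that null signals "the pattern did not split the title". It is not the NBoilerpipe source.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Sketch of a GetLongestPart-style helper (assumed behavior, not the library code):
// split the title on a separator pattern and return the part with the most words,
// or null if the pattern does not split the title at all.
static string? GetLongestPart(string title, string pattern)
{
    string[] parts = Regex.Split(title, pattern);
    if (parts.Length == 1)
        return null; // no separator found, so there is nothing to choose between

    string longest = "";
    int longestWords = 0;
    foreach (string part in parts)
    {
        int words = part.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;
        // Prefer more words; break ties by character length.
        if (words > longestWords || (words == longestWords && part.Length > longest.Length))
        {
            longestWords = words;
            longest = part;
        }
    }
    string result = longest.Trim();
    return result.Length == 0 ? null : result;
}

Console.WriteLine(GetLongestPart("example news | world | storm hits the coast", @"[ ]*[|][ ]*"));
// prints "storm hits the coast"
```

Counting words rather than characters favors a multi-word headline over a short but long-worded section name such as "entertainment".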

Method Details

DocumentTitleMatchClassifier() public constructor

public DocumentTitleMatchClassifier ( string title )
title string    The document title, typically the text of the HTML <TITLE> element.
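NBoilerpipe is a .NET port of the Java boilerpipe library, so the constructor plausibly normalizes the title and precomputes a set of candidate titles once, leaving Process with only set lookups. The sketch below shows that shape under stated assumptions: the normalization steps (non-breaking space to space, apostrophes removed, trimmed, lower-cased), the helper names Normalize and BuildPotentialTitles, and the three separator patterns are all illustrative choices, not the library's actual list.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed normalization, applied to the title (and later to block text).
static string Normalize(string s) =>
    s.Replace('\u00A0', ' ').Replace("'", "").Trim().ToLowerInvariant();

// Hypothetical candidate builder in the spirit of the constructor.
// Returns null when there is no usable title.
static HashSet<string>? BuildPotentialTitles(string? title)
{
    if (title == null) return null;
    string t = Normalize(title);
    if (t.Length == 0) return null;

    var candidates = new HashSet<string> { t }; // the full title is always a candidate

    // Illustrative separators: "|" or "»", a spaced hyphen, and a colon.
    string[] separators = { @"\s*[|»]\s*", @"\s+-\s+", @"\s*:\s*" };
    foreach (string sep in separators)
    {
        string[] parts = Regex.Split(t, sep);
        if (parts.Length < 2) continue; // separator not present in this title
        // Keep the part with the most words; OrderByDescending is stable, so ties keep the first.
        string best = parts
            .OrderByDescending(p => p.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length)
            .First()
            .Trim();
        if (best.Length > 0) candidates.Add(best);
    }
    return candidates;
}

var potential = BuildPotentialTitles("Storm Hits Coast - Example News");
Console.WriteLine(string.Join(" / ", potential!));
```

The hyphen pattern requires surrounding whitespace so that hyphenated words such as "co-op" are not split.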

GetPotentialTitles() public method

public GetPotentialTitles ( ) : ICollection
return ICollection

Process() public method

public Process ( NBoilerpipe.Document.TextDocument doc ) : bool
doc NBoilerpipe.Document.TextDocument
return bool    true if at least one text block was labeled, that is, if doc was changed.
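The matching step of Process can be sketched as follows. The example assumes that each block's text is normalized the same way as the title, that a second lookup is tried after removing punctuation such as "?", "!", ".", "-" and ":", and that the filter stops at the first matching block. To stay self-contained it uses plain strings in place of TextBlock objects and a set of indices in place of labels. MarkTitleBlocks and titleIndices are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same assumed normalization as used for the title.
static string Normalize(string s) =>
    s.Replace('\u00A0', ' ').Replace("'", "").Trim().ToLowerInvariant();

// Sketch of Process: blockTexts stands in for the TextBlocks of a TextDocument,
// and titleIndices collects the blocks that would receive a title label.
// Returns true if any block was marked (that is, the document changed).
static bool MarkTitleBlocks(IList<string> blockTexts, ISet<string>? potentialTitles, ISet<int> titleIndices)
{
    if (potentialTitles == null) return false; // no usable title: nothing to do

    for (int i = 0; i < blockTexts.Count; i++)
    {
        string text = Normalize(blockTexts[i]);
        // Second chance: remove every run of ? ! . - : anywhere in the text.
        string stripped = Regex.Replace(text, @"[?!.\-:]+", "").Trim();
        if (potentialTitles.Contains(text) || potentialTitles.Contains(stripped))
        {
            titleIndices.Add(i);
            return true; // assumption: stop at the first matching block
        }
    }
    return false;
}

var titles = new HashSet<string> { "storm hits coast - example news", "storm hits coast" };
var blocks = new List<string> { "Home | World | Sports", "Storm hits coast!", "Residents were evacuated on Monday." };
var marked = new HashSet<int>();
bool changed = MarkTitleBlocks(blocks, titles, marked);
Console.WriteLine($"{changed} {string.Join(",", marked)}"); // prints "True 1"
```

Because the candidates are precomputed as a set, each block costs one or two hash lookups, which keeps the filter cheap even on long documents.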