| Name | Description |
| --- | --- |
| AddPrecedingLabelsFilter | Adds the labels of the preceding block to the current block, optionally adding a prefix. |
| ArticleMetadataFilter | |
| ContentFusion | |
| DocumentTitleMatchClassifier | Marks NBoilerpipe.Document.TextBlock instances that contain parts of the HTML <TITLE> tag, using heuristics that are quite specific to the news domain. |
| KeepLargestBlockFilter | Keeps only the largest NBoilerpipe.Document.TextBlock (by number of words). If several blocks have the same number of words, the first one is chosen. All discarded blocks are marked "not content" and flagged as NBoilerpipe.Labels.DefaultLabels.MIGHT_BE_CONTENT. Note that, by default, only TextBlocks marked as "content" are taken into consideration. |
| LabelFusion | Fuses adjacent blocks if their labels are equal. |
| SimpleBlockFusionProcessor | Merges two consecutive blocks if their text densities are equal. |
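
The behavior described for KeepLargestBlockFilter and LabelFusion can be modeled with a small standalone sketch. It is written in Python purely for illustration and does not use the NBoilerpipe API: the `Block` class, the `keep_largest_block` and `fuse_equal_labels` functions, and the string constant standing in for `DefaultLabels.MIGHT_BE_CONTENT` are all hypothetical names. It also assumes that "discarded" means every block other than the one kept, and that fused text is joined with a newline.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for NBoilerpipe.Labels.DefaultLabels.MIGHT_BE_CONTENT.
MIGHT_BE_CONTENT = "MIGHT_BE_CONTENT"


@dataclass
class Block:
    """Hypothetical model of a text block; not the NBoilerpipe TextBlock type."""
    text: str
    is_content: bool = True
    labels: set = field(default_factory=set)

    @property
    def num_words(self) -> int:
        return len(self.text.split())


def keep_largest_block(blocks):
    """Keep only the content block with the most words; the first one wins ties."""
    candidates = [b for b in blocks if b.is_content]
    if not candidates:
        return blocks
    # max() returns the first maximal element, which matches the tie rule.
    largest = max(candidates, key=lambda b: b.num_words)
    for b in blocks:
        if b is not largest:
            # Assumption: every block except the kept one counts as discarded.
            b.is_content = False
            b.labels.add(MIGHT_BE_CONTENT)
    return blocks


def fuse_equal_labels(blocks):
    """Fuse each block into its predecessor when both carry the same label set."""
    fused = []
    for b in blocks:
        if fused and fused[-1].labels == b.labels:
            prev = fused[-1]
            # Assumption: fused text is joined with a newline.
            prev.text = prev.text + "\n" + b.text
            prev.is_content = prev.is_content or b.is_content
        else:
            # Copy the block so the input list is left unchanged.
            fused.append(Block(b.text, b.is_content, set(b.labels)))
    return fused
```

With two four-word content blocks in the input, `keep_largest_block` keeps the earlier one and demotes the rest, while `fuse_equal_labels` collapses a run of blocks that share the same label set into a single block.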