Name | Description |
BoilerplateBlockFilter | Removes NBoilerpipe.Document.TextBlocks that have explicitly been marked as "not content". |
InvertedFilter | Inverts the "isContent" flag of every NBoilerpipe.Document.TextBlock. |
LabelToContentFilter | Marks all blocks that contain a given label as "content". |
MinClauseWordsFilter | Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5). |
MinWordsFilter | Keeps only those content blocks that contain at least k words. |
SplitParagraphBlocksFilter | Splits TextBlocks at paragraph boundaries. |
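
The descriptions above are terse, so here is a minimal, language-neutral sketch of what each filter might do to a list of blocks. It is written in Python and is not the NBoilerpipe API: the `TextBlock` class, its fields, and all function names are hypothetical stand-ins. It also assumes that "keeps" means clearing the content flag on blocks that fail the test (rather than deleting them), that clauses are delimited by common punctuation, and that paragraphs are separated by line breaks.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TextBlock:
    """Hypothetical stand-in for NBoilerpipe.Document.TextBlock."""
    text: str
    is_content: bool = False
    labels: set = field(default_factory=set)


def num_words(text):
    # Whitespace-separated tokens; the real library may count differently.
    return len(text.split())


def boilerplate_block_filter(blocks):
    # Drop every block that is not flagged as content.
    return [b for b in blocks if b.is_content]


def inverted_filter(blocks):
    # Flip the content flag on every block.
    for b in blocks:
        b.is_content = not b.is_content
    return blocks


def label_to_content_filter(blocks, label):
    # Flag every block carrying the given label as content.
    for b in blocks:
        if label in b.labels:
            b.is_content = True
    return blocks


def min_words_filter(blocks, k):
    # Assumption: content blocks with fewer than k words lose the flag.
    for b in blocks:
        if b.is_content and num_words(b.text) < k:
            b.is_content = False
    return blocks


def min_clause_words_filter(blocks, k=5):
    # Assumption: clauses are the pieces between , . : ; ! ? characters.
    for b in blocks:
        if not b.is_content:
            continue
        clauses = re.split(r"[,.:;!?]+", b.text)
        if not any(num_words(c) >= k for c in clauses):
            b.is_content = False
    return blocks


def split_paragraph_blocks_filter(blocks):
    # Assumption: a paragraph boundary is one or more line breaks.
    out = []
    for b in blocks:
        for para in re.split(r"[\r\n]+", b.text):
            if para.strip():
                out.append(TextBlock(para.strip(), b.is_content, set(b.labels)))
    return out


blocks = [
    TextBlock("Home, About, Contact", is_content=True),
    TextBlock("The quick brown fox jumps over the lazy dog.", is_content=True),
    TextBlock("Copyright notice", is_content=False, labels={"footer"}),
]
min_clause_words_filter(blocks)          # navigation line has no 5-word clause
kept = boilerplate_block_filter(blocks)  # remove everything not marked content
print([b.text for b in kept])
```

In this sketch the navigation line is demoted because none of its comma-separated clauses reaches five words, and the final removal step leaves only the full sentence. Chaining a flag-changing filter with `boilerplate_block_filter` in this way is one plausible reading of how the filters in the table are meant to be combined.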