| Name | Description |
| --- | --- |
| BodyContentExtractor | Gets the inner HTML of the body. |
| CommentsDivCleaner | |
| CommentsRemover | Removes comments ('<!-- ... -->' and '<![ ... ]>') from an HTML source. |
| CorrectAttributesCleaner | Corrects attributes whose values are missing ' or ". |
| CorrectTagsClosingCleaner | Corrects the <img> and <br> tags generated by Word. |
| DoctypeRemover | Removes the doctype declaration from the given HTML code. |
| EmptyParagraphsCleaner | Replaces empty paragraphs with line breaks ('<br/>'). |
| HeadSectionRemover | Removes the head section from an HTML source. |
| ListCharsCleaner | Replaces some characters used by MS Word for bullet lists with 'o' characters. |
| LocalToWebHTMLCleaner | Main cleaner for LocalToWeb. |
| NbspBetweenTagsRemover | Removes the non-breaking spaces ('&nbsp;') between tags. |
| NbspReplacer | Removes the non-breaking spaces ('&nbsp;') between tags. |
| OfficeNameSpacesTagsRemover | Removes the tags that are in the Office namespaces. |
| TidyHTMLCleaner | Uses Tidy.Net to clean an HTML source. |
| WebToLocalHTMLCleaner | Main HTML cleaner for WebToLocal. |
| XmlNamespaceDefinitionsReplacer | Replaces the opening html tag with a given one. |
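
To illustrate how several of the cleaners listed above might be chained, here is a minimal sketch in Python. It is not the library's implementation (which is .NET code). Every function name below is hypothetical, and the regular expressions are simplified assumptions about what each cleaner does, based only on the one-line descriptions in the table.

```python
import re

# Hypothetical stand-ins for a few of the cleaners in the table.
# Regex-based and deliberately simplistic; a real cleaner would
# likely need an HTML parser to handle edge cases.

def remove_doctype(html: str) -> str:
    """Rough analogue of DoctypeRemover."""
    return re.sub(r"<!DOCTYPE[^>]*>", "", html, flags=re.IGNORECASE)

def remove_comments(html: str) -> str:
    """Rough analogue of CommentsRemover: '<!-- ... -->' and '<![ ... ]>'."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"<!\[.*?\]>", "", html, flags=re.DOTALL)

def remove_head(html: str) -> str:
    """Rough analogue of HeadSectionRemover."""
    return re.sub(r"<head\b.*?</head\s*>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)

def extract_body(html: str) -> str:
    """Rough analogue of BodyContentExtractor; returns input if no body."""
    match = re.search(r"<body\b[^>]*>(.*?)</body\s*>", html,
                      flags=re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else html

def remove_nbsp_between_tags(html: str) -> str:
    """Rough analogue of NbspBetweenTagsRemover."""
    return re.sub(r">(?:\s*&nbsp;\s*)+<", "><", html)

def replace_empty_paragraphs(html: str) -> str:
    """Rough analogue of EmptyParagraphsCleaner."""
    return re.sub(r"<p\b[^>]*>\s*</p\s*>", "<br/>", html,
                  flags=re.IGNORECASE)

# The order is an assumption: strip document-level markup first,
# then tidy what remains of the body.
PIPELINE = [
    remove_doctype,
    remove_comments,
    remove_head,
    extract_body,
    remove_nbsp_between_tags,
    replace_empty_paragraphs,
]

def clean(html: str) -> str:
    """Run every step in PIPELINE in order and trim the result."""
    for step in PIPELINE:
        html = step(html)
    return html.strip()

sample = (
    "<!DOCTYPE html><html><head><title>t</title></head>"
    "<body><!-- note --><p>Hi</p>&nbsp;<p> </p></body></html>"
)
print(clean(sample))  # <p>Hi</p><br/>
```

Keeping each step as a small function that takes and returns a string mirrors the one-class-per-task layout of the table, and presumably resembles how the two main cleaners (LocalToWebHTMLCleaner and WebToLocalHTMLCleaner) combine the individual ones, although the source does not confirm this.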