C# Class WikipediaAvsAnTrieExtractor.WhitespaceNormalizer

Project: EamonNerbonne/a-vs-an

Public Methods

Method: Normalize(string text) : string
Description: Collapses runs of spaces and tabs into a single space, trims leading and trailing whitespace on every line, and collapses runs of empty lines into a single empty line (a Wikipedia paragraph break).

Method Details

Normalize() public static method

Normalizes a string so that each run of consecutive spaces and tabs is replaced by a single space, and any leading or trailing whitespace on each line is trimmed. Each sequence of empty lines is replaced by a single empty line, and at most one line break is kept after the last non-whitespace character.

Implementation limitation: a single whitespace character before the first character of the first line is not removed.

This implementation is somewhat unusual, but it avoids a regex-based approach, which proved surprisingly slow because matching runs of consecutive empty lines (which may themselves contain whitespace) causes heavy backtracking. The aim is to remove superfluous whitespace, meaning whitespace that leads or trails any line, and superfluous empty lines, while still permitting a single empty line, which marks a Wikipedia paragraph break.

Details: carriage returns are not treated as whitespace (Wikipedia text does not contain them), and the output can, unusually, begin with a single paragraph break before the text.
public static string Normalize(string text)

Parameter: text (string), the text to normalize.
Returns: string, the normalized text.
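The rules above can be sketched as a single left-to-right pass that tracks pending blanks and pending line breaks instead of using a regular expression. This is a minimal sketch, not the project's implementation: the class name `WhitespaceNormalizerSketch` is hypothetical, only space and tab are assumed to count as collapsible whitespace, and `'\r'` is passed through as an ordinary character, as the description states. One deliberate difference: the sketch removes all leading blanks on the first line rather than keeping one.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for WhitespaceNormalizer.Normalize; not the project's code.
public static class WhitespaceNormalizerSketch
{
    // Assumption: only space and tab are collapsible whitespace here.
    // '\r' is deliberately treated as an ordinary character.
    static bool IsBlank(char c) => c == ' ' || c == '\t';

    public static string Normalize(string text)
    {
        var sb = new StringBuilder(text.Length);
        bool lineHasContent = false; // an ordinary char has been emitted on the current line
        bool pendingSpace = false;   // blanks seen after content on the current line
        int pendingNewlines = 0;     // line breaks seen since the last ordinary char

        foreach (char c in text)
        {
            if (IsBlank(c))
            {
                // Leading blanks on a line are dropped; inner runs become one pending space.
                if (lineHasContent) pendingSpace = true;
            }
            else if (c == '\n')
            {
                // Trailing blanks before a line break are discarded.
                pendingSpace = false;
                lineHasContent = false;
                pendingNewlines++;
            }
            else
            {
                if (pendingNewlines > 0)
                    // A run of empty (or blank-only) lines collapses to one empty line.
                    sb.Append('\n', Math.Min(pendingNewlines, 2));
                else if (pendingSpace)
                    sb.Append(' ');

                sb.Append(c);
                lineHasContent = true;
                pendingSpace = false;
                pendingNewlines = 0;
            }
        }

        // At most one line break survives after the last ordinary character.
        if (pendingNewlines > 0 && sb.Length > 0)
            sb.Append('\n');

        return sb.ToString();
    }
}
```

For example, `WhitespaceNormalizerSketch.Normalize("Foo  bar \n\n\n\t baz\n\n")` returns `"Foo bar\n\nbaz\n"`. Because pending blanks and line breaks are only written out when the next ordinary character arrives, no lookahead or backtracking is needed, so the pass stays linear in the input length. As in the documented behavior, leading line breaks collapse to at most one paragraph break rather than being removed.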