C# Class WikipediaAvsAnTrieExtractor.WhitespaceNormalizer

Public methods

Method Description
Normalize ( string text ) : string

Normalizes a string so that runs of consecutive spaces and tabs are replaced by a single space, and any leading or trailing whitespace on each line is trimmed. Sequences of empty lines are collapsed into a single empty line. After the last non-whitespace character, at most one line break is permitted. Implementation limitation: a single whitespace character before the first character of the first line is not removed. The implementation is somewhat unusual because the equivalent regex implementation is surprisingly slow: matching consecutive empty lines (which may themselves contain whitespace) causes heavy backtracking. The purpose is essentially to remove superfluous spaces (those that lead or trail any line) and superfluous empty lines, while still permitting a single empty line, since that represents a Wikipedia paragraph break. Details: carriage returns are not treated as whitespace (Wikipedia text does not contain them), and it is possible, though unusual, for a single paragraph break to appear before the text.

Method details

Normalize() public static method

public static Normalize ( string text ) : string
text string
Returns string
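The normalization rules described for Normalize can be illustrated with a minimal single-pass sketch. It is not the library's actual implementation: the class name `WhitespaceNormalizerSketch` is hypothetical, and for simplicity the sketch drops all leading whitespace and leading blank lines, so it does not reproduce the documented quirks that a single leading whitespace character may survive or that a paragraph break may precede the text. Like the documented method, it treats only spaces and tabs as horizontal whitespace and leaves carriage returns untouched. Because it scans the input once with a small amount of state, it sidesteps the regex backtracking problem mentioned in the description.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the documented rules; NOT the library's own code.
// Unlike the documented method, leading whitespace and leading blank lines
// are always dropped here.
public static class WhitespaceNormalizerSketch
{
    public static string Normalize(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var sb = new StringBuilder(text.Length);
        bool seenContent = false;  // has any non-whitespace character been emitted?
        bool pendingSpace = false; // spaces/tabs seen since the last content char on this line
        int pendingNewlines = 0;   // '\n' characters seen since the last content char

        foreach (char c in text)
        {
            if (c == ' ' || c == '\t')
            {
                pendingSpace = true;
            }
            else if (c == '\n')
            {
                // Whitespace directly before a line break is trailing: drop it.
                pendingSpace = false;
                pendingNewlines++;
            }
            else
            {
                // Every other character, including '\r', counts as content.
                if (seenContent)
                {
                    if (pendingNewlines >= 2)
                        sb.Append("\n\n"); // blank lines collapse to one paragraph break
                    else if (pendingNewlines == 1)
                        sb.Append('\n');   // plain line break; leading spaces are dropped
                    else if (pendingSpace)
                        sb.Append(' ');    // a run of spaces/tabs becomes one space
                }
                sb.Append(c);
                seenContent = true;
                pendingSpace = false;
                pendingNewlines = 0;
            }
        }

        // At most one line break is kept after the last content character.
        if (seenContent && pendingNewlines > 0)
            sb.Append('\n');

        return sb.ToString();
    }
}
```

For example, `WhitespaceNormalizerSketch.Normalize("a \n \n\t\n\nb ")` returns `"a\n\nb"`: the whitespace-only lines collapse into one paragraph break and the trailing space is trimmed.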