C# Class RTools.Util.StreamTokenizer

A StreamTokenizer similar to Java's. This breaks an input stream (coming from a TextReader) into Tokens based on various settings. The settings are stored in the TokenizerSettings property, which is a StreamTokenizerSettings instance.

This is configurable: you can modify the TokenizerSettings.CharTypes[] array to specify which characters belong to which type, and change other settings such as whether to look for comments.

WARNING: This is not internationalized. This treats all characters beyond the 7-bit ASCII range (decimal 127) as Word characters.
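To make the warning concrete, here is a minimal, self-contained sketch of a table-driven character classifier. It is not the RTools implementation: the names CharKind, CharTypeTableSketch, Classify and SetKind are hypothetical, and the table size of 128 is an assumption based on the 7-bit ASCII range mentioned above. It only illustrates the idea of a per-character type table in which anything past the table is treated as a Word character.

```csharp
using System;

// Hypothetical character kinds, for illustration only (not the RTools CharTypes values).
public enum CharKind : byte
{
    Other = 0,
    Whitespace = 1,
    Word = 2,
    Digit = 3,
}

// Hypothetical stand-in for a character type table; not the RTools API.
public static class CharTypeTableSketch
{
    // Assumption: one entry per 7-bit ASCII code point (0..127).
    public const int TableSize = 128;

    private static readonly CharKind[] Table = BuildTable();

    private static CharKind[] BuildTable()
    {
        var table = new CharKind[TableSize];
        for (int c = 0; c < TableSize; c++)
        {
            char ch = (char)c;
            if (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n')
                table[c] = CharKind.Whitespace;
            else if (ch >= '0' && ch <= '9')
                table[c] = CharKind.Digit;
            else if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == '_')
                table[c] = CharKind.Word;
            // Everything else keeps the default, CharKind.Other.
        }
        return table;
    }

    // Characters outside the table fall back to Word, mirroring the warning above.
    public static CharKind Classify(char c)
    {
        return c < TableSize ? Table[c] : CharKind.Word;
    }

    // Reconfigure a single table entry, analogous to editing a char type array.
    public static void SetKind(char c, CharKind kind)
    {
        if (c >= TableSize)
            throw new ArgumentOutOfRangeException(nameof(c), "Only 7-bit ASCII characters have table entries.");
        Table[c] = kind;
    }
}
```

With a table like this, an accented or Cyrillic letter is classified as Word without any Unicode-aware check, which is why non-ASCII punctuation or whitespace would end up glued into word tokens.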

There are two main ways to use this: 1) parse the entire stream at once and get a List of Tokens (see the Tokenize* methods), and 2) call NextToken() successively. The tokenizer reads from a TextReader, which you can set directly, and it also provides convenience methods for parsing files and strings. It returns an Eof token when the end of the input is reached.

Here's an example of the NextToken() style of use:

    StreamTokenizer tokenizer = new StreamTokenizer();
    tokenizer.GrabWhitespace = true;
    tokenizer.Verbosity = VerbosityLevel.Debug; // just for debugging
    tokenizer.TextReader = File.OpenText(fileName);
    Token token;
    while (tokenizer.NextToken(out token))
        log.Info("Token = '{0}'", token);

Here's an example of the Tokenize... style of use:

    StreamTokenizer tokenizer = new StreamTokenizer("some string");
    List<Token> tokens = new List<Token>();
    if (!tokenizer.Tokenize(tokens))
    {
        // error handling
    }
    foreach (Token t in tokens)
        Console.WriteLine("t = {0}", t);

Comment delimiters are hardcoded (// and /*) and are not affected by the char type table.

This sets line numbers in the tokens it produces. Normally, a token's line number is the line on which the token starts. There is one known caveat: when the GrabWhitespace setting is true and a whitespace token contains a newline, that token's line number is set to the following line rather than the line on which the token started.


Public properties

Property Type Description
NChars int This is the number of characters in the character table.

Public methods

Method Description
Display ( ) : void

Display the state of this object.

Display ( string prefix ) : void

Display the state of this object, with a per-line prefix.

NextToken ( Token &token ) : bool

Get the next token. The last token will be an EofToken unless there's an unterminated quote or unterminated block comment and Settings.DoUntermCheck is true, in which case this throws an exception of type StreamTokenizerUntermException or sub-class.

SpeedTest ( ) : bool

Speed test. This tests the speed of the parse.

StreamTokenizer ( ) : System

Default constructor.

StreamTokenizer ( TextReader sr ) : System

Construct and set this object's TextReader to the one specified.

StreamTokenizer ( string str ) : System

Construct and set a string to tokenize.

TestSelf ( ) : bool

Simple self test. See StreamTokenizerTestCase for full tests.

Tokenize ( List tokens ) : bool

Parse the rest of the stream and put all the tokens in the input List. This resets the line number to 1.

TokenizeFile ( string fileName ) : RTools.Util.Token[]

Tokenize a file completely and return the tokens in a Token[].

TokenizeFile ( string fileName, List tokens ) : bool

Parse all tokens from the specified file, put them into the input List.

TokenizeReader ( TextReader tr, List tokens ) : bool

Parse all tokens from the specified TextReader, put them into the input List.

TokenizeStream ( Stream s, List tokens ) : bool

Parse all tokens from the specified Stream, put them into the input List.

TokenizeString ( string str, List tokens ) : bool

Parse all tokens from the specified string, put them into the input List.

Protected methods

Method Description
SpeedTestParse ( StreamTokenizer tokenizer, Stream stream ) : double

Use the supplied tokenizer to tokenize the specified stream and time it.

Private methods

Method Description
GetNextChar ( ) : int

Read the next character from the stream, or from backString if we backed up.

GrabInt ( CharBuffer sb, bool allowPlus, char &thisChar ) : bool

Starting from current stream location, scan forward over an int. Determine whether it's an integer or not. If so, push the integer characters to the specified CharBuffer. If not, put them in backString (essentially leave the stream as it was) and return false.

If it was an int, the stream is left 1 character after the end of the int, and that character is output in the thisChar parameter.

The formats for integers are: 1, +1, and -1

The + and - signs are included in the output buffer.
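The scan-and-back-up behavior described for GrabInt can be illustrated with a small, self-contained sketch. This is not the RTools source: the class name IntScannerSketch is hypothetical, a StringBuilder stands in for CharBuffer, and thisChar is an int (with -1 meaning end of input) rather than a char. It shows one way to accept 1, +1 and -1 style integers and to hand every consumed character back when the input turns out not to be an integer.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch of integer scanning with a back-up buffer; not the RTools implementation.
public sealed class IntScannerSketch
{
    private readonly TextReader reader;

    // Characters handed back to the scanner; they are re-read before the reader is consulted.
    private readonly StringBuilder backString = new StringBuilder();

    public IntScannerSketch(TextReader reader)
    {
        this.reader = reader;
    }

    // Read from the back-up buffer first, then from the reader. Returns -1 at end of input.
    public int GetNextChar()
    {
        if (backString.Length > 0)
        {
            char c = backString[0];
            backString.Remove(0, 1);
            return c;
        }
        return reader.Read();
    }

    // Scan an optional sign followed by one or more digits.
    // On success, the sign and digits are appended to sb and thisChar holds the
    // character that follows the integer (or -1 at end of input).
    // On failure, everything consumed is pushed back and false is returned.
    public bool GrabInt(StringBuilder sb, bool allowPlus, out int thisChar)
    {
        var scanned = new StringBuilder();
        int c = GetNextChar();

        if (c == '-' || (allowPlus && c == '+'))
        {
            scanned.Append((char)c);
            c = GetNextChar();
        }

        bool sawDigit = false;
        while (c >= '0' && c <= '9')
        {
            sawDigit = true;
            scanned.Append((char)c);
            c = GetNextChar();
        }

        if (!sawDigit)
        {
            // Not an integer: restore the consumed characters in their original order.
            if (c != -1) scanned.Append((char)c);
            backString.Insert(0, scanned.ToString());
            thisChar = -1;
            return false;
        }

        sb.Append(scanned.ToString());
        thisChar = c;
        return true;
    }
}
```

For example, scanning "-42x" with this sketch appends "-42" to the buffer and reports 'x' as the following character, while scanning "+7" with allowPlus set to false fails and leaves '+' as the next character to be read.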

Initialize ( ) : void

Utility function, things common to constructors.

InitializeStream ( ) : void

Clear the stream settings.

PickNextState ( byte ctype, int c ) : NextTokenState

Pick the next state given just a single character. This is used at the start of a new token.

PickNextState ( byte ctype, int c, NextTokenState excludeState ) : NextTokenState

Pick the next state given just a single character. This is used at the start of a new token.

Method details

Display() public method

Display the state of this object.
public Display ( ) : void
Returns void

Display() public method

Display the state of this object, with a per-line prefix.
public Display ( string prefix ) : void
prefix string The per-line prefix.
Returns void

NextToken() public method

Get the next token. The last token will be an EofToken unless there's an unterminated quote or unterminated block comment and Settings.DoUntermCheck is true, in which case this throws an exception of type StreamTokenizerUntermException or sub-class.
public NextToken ( Token &token ) : bool
token Token The output token.
Returns bool

SpeedTest() public static method

Speed test. This tests the speed of the parse.
public static SpeedTest ( ) : bool
Returns bool

SpeedTestParse() protected static method

Use the supplied tokenizer to tokenize the specified stream and time it.
protected static SpeedTestParse ( StreamTokenizer tokenizer, Stream stream ) : double
tokenizer StreamTokenizer
stream Stream
Returns double

StreamTokenizer() public method

Default constructor.
public StreamTokenizer ( ) : System
Returns System

StreamTokenizer() public method

Construct and set this object's TextReader to the one specified.
public StreamTokenizer ( TextReader sr ) : System
sr TextReader The TextReader to read from.
Returns System

StreamTokenizer() public method

Construct and set a string to tokenize.
public StreamTokenizer ( string str ) : System
str string The string to tokenize.
Returns System

TestSelf() public static method

Simple self test. See StreamTokenizerTestCase for full tests.
public static TestSelf ( ) : bool
Returns bool

Tokenize() public method

Parse the rest of the stream and put all the tokens in the input List. This resets the line number to 1.
public Tokenize ( List tokens ) : bool
tokens List The List to append to.
Returns bool

TokenizeFile() public method

Tokenize a file completely and return the tokens in a Token[].
public TokenizeFile ( string fileName ) : RTools.Util.Token[]
fileName string The file to tokenize.
Returns RTools.Util.Token[]

TokenizeFile() public method

Parse all tokens from the specified file and put them into the input List.
public TokenizeFile ( string fileName, List tokens ) : bool
fileName string The file to read.
tokens List The List to put tokens in.
Returns bool

TokenizeReader() public method

Parse all tokens from the specified TextReader and put them into the input List.
public TokenizeReader ( TextReader tr, List tokens ) : bool
tr TextReader The TextReader to read from.
tokens List The List to append to.
Returns bool

TokenizeStream() public method

Parse all tokens from the specified Stream and put them into the input List.
public TokenizeStream ( Stream s, List tokens ) : bool
s Stream
tokens List The List to put tokens in.
Returns bool

TokenizeString() public method

Parse all tokens from the specified string and put them into the input List.
public TokenizeString ( string str, List tokens ) : bool
str string
tokens List The List to put tokens in.
Returns bool

Property details

NChars public static property

This is the number of characters in the character table.
public static int NChars
Returns int