C# Class RTools.Util.StreamTokenizer

A stream tokenizer similar to Java's StreamTokenizer. It breaks an input stream (coming from a TextReader) into Tokens based on various settings. The settings are stored in the TokenizerSettings property, which is a StreamTokenizerSettings instance.

The tokenizer is configurable: you can modify the TokenizerSettings.CharTypes[] array to specify the type of each character, and you can change other settings, such as whether to look for comments.

WARNING: This class is not internationalized. It treats all characters beyond the 7-bit ASCII range (above decimal 127) as Word characters.
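The idea of a per-character type table with a Word fallback for non-ASCII input can be sketched as follows. This is a minimal illustration, not the RTools implementation: CharKind, CharTableSketch, BuildDefaultTable and Classify are hypothetical names, and the default assignments are assumptions chosen for the example.

```csharp
using System;

// Hypothetical character categories for this sketch only.
public enum CharKind : byte
{
    Other = 0,
    Word,
    Whitespace,
    Digit,
    Quote
}

public static class CharTableSketch
{
    // One entry per 7-bit ASCII code point (0..127).
    public const int TableSize = 128;

    // Build a table with some plausible defaults (assumed, not taken from RTools).
    public static CharKind[] BuildDefaultTable()
    {
        var table = new CharKind[TableSize];
        for (int c = 'a'; c <= 'z'; c++) table[c] = CharKind.Word;
        for (int c = 'A'; c <= 'Z'; c++) table[c] = CharKind.Word;
        for (int c = '0'; c <= '9'; c++) table[c] = CharKind.Digit;
        table[' '] = CharKind.Whitespace;
        table['\t'] = CharKind.Whitespace;
        table['\r'] = CharKind.Whitespace;
        table['\n'] = CharKind.Whitespace;
        table['"'] = CharKind.Quote;
        table['\''] = CharKind.Quote;
        return table;
    }

    // Look up a character; anything outside the table counts as a Word character.
    public static CharKind Classify(CharKind[] table, char c)
    {
        return c < table.Length ? table[c] : CharKind.Word;
    }
}
```

Reconfiguring the tokenizer then amounts to overwriting table entries, for example marking '_' as Word. Because characters above 127 never consult the table, they cannot be reclassified in this scheme, which is the limitation the warning describes.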

There are two main ways to use this class: 1) parse the entire stream at once and get a List of Tokens (see the Tokenize* methods), or 2) call NextToken() repeatedly. The tokenizer reads from a TextReader, which you can set directly; it also provides convenience methods for parsing files and strings. It returns an Eof token when the end of the input is reached.

Here's an example of the NextToken() style of use:

    StreamTokenizer tokenizer = new StreamTokenizer();
    tokenizer.GrabWhitespace = true;
    tokenizer.Verbosity = VerbosityLevel.Debug; // just for debugging
    tokenizer.TextReader = File.OpenText(fileName);
    Token token;
    while (tokenizer.NextToken(out token))
        log.Info("Token = '{0}'", token);

Here's an example of the Tokenize... style of use:

    StreamTokenizer tokenizer = new StreamTokenizer("some string");
    List<Token> tokens = new List<Token>();
    if (!tokenizer.Tokenize(tokens))
    {
        // error handling
    }
    foreach (Token t in tokens)
        Console.WriteLine("t = {0}", t);

Comment delimiters (// and /*) are hardcoded and are not affected by the character type table.

The tokenizer sets line numbers in the tokens it produces. A token's line number is normally the line on which the token starts. There is one known caveat: when the GrabWhitespace setting is true and a whitespace token contains a newline, that token's line number is set to the following line rather than the line on which the token started.

Project: PaulMineau/AIMA.Net

Public Properties

Property Type Description
NChars int The number of characters in the character table.

Public Methods

Method Description
Display ( ) : void

Display the state of this object.

Display ( string prefix ) : void

Display the state of this object, with a per-line prefix.

NextToken ( Token &token ) : bool

Get the next token. The last token will be an EofToken unless there's an unterminated quote or unterminated block comment and Settings.DoUntermCheck is true, in which case this throws an exception of type StreamTokenizerUntermException or sub-class.

SpeedTest ( ) : bool

Speed test. This tests the speed of the parse.

StreamTokenizer ( )

Default constructor.

StreamTokenizer ( TextReader sr )

Construct and set this object's TextReader to the one specified.

StreamTokenizer ( string str )

Construct and set a string to tokenize.

TestSelf ( ) : bool

Simple self test. See StreamTokenizerTestCase for full tests.

Tokenize ( List tokens ) : bool

Parse the rest of the stream and put all the tokens in the input List. This resets the line number to 1.

TokenizeFile ( string fileName ) : RTools.Util.Token[]

Tokenize a file completely and return the tokens in a Token[].

TokenizeFile ( string fileName, List tokens ) : bool

Parse all tokens from the specified file, put them into the input List.

TokenizeReader ( TextReader tr, List tokens ) : bool

Parse all tokens from the specified TextReader, put them into the input List.

TokenizeStream ( Stream s, List tokens ) : bool

Parse all tokens from the specified Stream, put them into the input List.

TokenizeString ( string str, List tokens ) : bool

Parse all tokens from the specified string, put them into the input List.

Protected Methods

Method Description
SpeedTestParse ( StreamTokenizer tokenizer, Stream stream ) : double

Use the supplied tokenizer to tokenize the specified stream and time it.

Private Methods

Method Description
GetNextChar ( ) : int

Read the next character from the stream, or from backString if we backed up.

GrabInt ( CharBuffer sb, bool allowPlus, char &thisChar ) : bool

Starting from current stream location, scan forward over an int. Determine whether it's an integer or not. If so, push the integer characters to the specified CharBuffer. If not, put them in backString (essentially leave the stream as it was) and return false.

If it was an int, the stream is left 1 character after the end of the int, and that character is output in the thisChar parameter.

The formats for integers are: 1, +1, and -1

The + and - signs are included in the output buffer.
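The scan-or-back-out behavior described for GrabInt can be sketched roughly as below. This is an assumption-laden illustration rather than the actual private method: GrabIntSketch and TryGrabInt are hypothetical names, a StringBuilder stands in for CharBuffer, and the consumed characters are handed back through an out parameter instead of the tokenizer's internal backString.

```csharp
using System;
using System.IO;
using System.Text;

public static class GrabIntSketch
{
    // Hypothetical helper, not the RTools implementation.
    // On success: appends the integer text (including any sign) to 'output',
    // sets 'next' to the character read just after the integer (-1 at end of
    // input), and returns true.
    // On failure: leaves 'output' untouched, returns every consumed character
    // in 'pushedBack' so the caller can replay it, and returns false.
    public static bool TryGrabInt(TextReader reader, StringBuilder output,
        bool allowPlus, out int next, out string pushedBack)
    {
        var scanned = new StringBuilder();
        int c = reader.Read();

        // Optional leading sign: '-' always, '+' only when allowed.
        if (c == '-' || (allowPlus && c == '+'))
        {
            scanned.Append((char)c);
            c = reader.Read();
        }

        // Consume the run of decimal digits.
        int digits = 0;
        while (c >= '0' && c <= '9')
        {
            scanned.Append((char)c);
            digits++;
            c = reader.Read();
        }

        if (digits == 0)
        {
            // Not an integer: hand back everything that was read.
            if (c != -1) scanned.Append((char)c);
            pushedBack = scanned.ToString();
            next = -1;
            return false;
        }

        output.Append(scanned.ToString());
        pushedBack = string.Empty;
        next = c;
        return true;
    }
}
```

Returning the lookahead character on success mirrors the description above, where the stream is left one character past the end of the integer.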

Initialize ( ) : void

Utility function, things common to constructors.

InitializeStream ( ) : void

Clear the stream settings.

PickNextState ( byte ctype, int c ) : NextTokenState

Pick the next state given just a single character. This is used at the start of a new token.

PickNextState ( byte ctype, int c, NextTokenState excludeState ) : NextTokenState

Pick the next state given just a single character. This is used at the start of a new token.

Method Details

Display() public method

Display the state of this object.
public Display ( ) : void
return void

Display() public method

Display the state of this object, with a per-line prefix.
public Display ( string prefix ) : void
prefix string The per-line prefix.
return void

NextToken() public method

Get the next token. The last token will be an EofToken unless there's an unterminated quote or unterminated block comment and Settings.DoUntermCheck is true, in which case this throws an exception of type StreamTokenizerUntermException or sub-class.
public NextToken ( Token &token ) : bool
token Token The output token.
return bool

SpeedTest() public static method

Speed test. This tests the speed of the parse.
public static SpeedTest ( ) : bool
return bool

SpeedTestParse() protected static method

Use the supplied tokenizer to tokenize the specified stream and time it.
protected static SpeedTestParse ( StreamTokenizer tokenizer, Stream stream ) : double
tokenizer StreamTokenizer
stream Stream
return double

StreamTokenizer() public constructor

Default constructor.
public StreamTokenizer ( )

StreamTokenizer() public constructor

Construct and set this object's TextReader to the one specified.
public StreamTokenizer ( TextReader sr )
sr TextReader The TextReader to read from.

StreamTokenizer() public constructor

Construct and set a string to tokenize.
public StreamTokenizer ( string str )
str string The string to tokenize.

TestSelf() public static method

Simple self test. See StreamTokenizerTestCase for full tests.
public static TestSelf ( ) : bool
return bool

Tokenize() public method

Parse the rest of the stream and put all the tokens in the input List. This resets the line number to 1.
public Tokenize ( List tokens ) : bool
tokens List The List to append to.
return bool

TokenizeFile() public method

Tokenize a file completely and return the tokens in a Token[].
public TokenizeFile ( string fileName ) : RTools.Util.Token[]
fileName string The file to tokenize.
return RTools.Util.Token[]

TokenizeFile() public method

Parse all tokens from the specified file, put them into the input List.
public TokenizeFile ( string fileName, List tokens ) : bool
fileName string The file to read.
tokens List The List to put tokens in.
return bool

TokenizeReader() public method

Parse all tokens from the specified TextReader, put them into the input List.
public TokenizeReader ( TextReader tr, List tokens ) : bool
tr TextReader The TextReader to read from.
tokens List The List to append to.
return bool

TokenizeStream() public method

Parse all tokens from the specified Stream, put them into the input List.
public TokenizeStream ( Stream s, List tokens ) : bool
s Stream
tokens List The List to put tokens in.
return bool

TokenizeString() public method

Parse all tokens from the specified string, put them into the input List.
public TokenizeString ( string str, List tokens ) : bool
str string
tokens List The List to put tokens in.
return bool

Property Details

NChars public static property

This is the number of characters in the character table.
public static int NChars
return int