C# (CSharp) Lucene.Net.Util.Automaton Namespace

Classes

Name Description
Automaton Finite-state automaton with regular expression operations.

Class invariants:

  • An automaton is either represented explicitly (with State and Transition objects) or with a singleton string (see #getSingleton() and #expandSingleton()) in case the automaton is known to accept exactly one string. (Implicitly, all states and transitions of an automaton are reachable from its initial state.)
  • Automata are always reduced (see #reduce()) and have no transitions to dead states (see #removeDeadTransitions()).
  • If an automaton is nondeterministic, then #isDeterministic() returns false (but the converse is not required).
  • Automata provided as input to operations are generally assumed to be disjoint.

If the states or transitions are manipulated manually, the #restoreInvariant() and #setDeterministic(boolean) methods should be used afterwards to restore representation invariants that are assumed by the built-in automata operations.

Note: this class has internal mutable state and is not thread-safe. Callers that share the same Automaton across multiple threads are responsible for any necessary synchronization. For multithreaded matching, a RunAutomaton is generally recommended instead: it is immutable, thread-safe, and much faster.

@lucene.experimental
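
The thread-safety note above can be illustrated with a minimal, standalone sketch of an immutable, table-driven matcher. It is written in Java to match the Java-style member references used in these descriptions. The class TableMatcher, its table layout, and the abPlus() helper are hypothetical names introduced for illustration only; this is not the library's RunAutomaton, just an example of why a precomputed, read-only transition table can be shared across threads without synchronization.

```java
/** Hypothetical illustration only; not the library's RunAutomaton. */
final class TableMatcher {
    private final String alphabet;   // symbol at index i labels column i
    private final int[][] next;      // next[state][column], -1 means no transition
    private final boolean[] accept;  // accept[state] is true for accepting states

    TableMatcher(String alphabet, int[][] next, boolean[] accept) {
        // Defensive copies: nothing outside can mutate the tables afterwards.
        this.alphabet = alphabet;
        this.next = new int[next.length][];
        for (int i = 0; i < next.length; i++) {
            this.next[i] = next[i].clone();
        }
        this.accept = accept.clone();
    }

    /** Runs the input through the table; only local variables change. */
    boolean run(String input) {
        int state = 0; // state 0 is the initial state in this sketch
        for (int i = 0; i < input.length(); i++) {
            int column = alphabet.indexOf(input.charAt(i));
            if (column < 0) return false;   // symbol outside the alphabet
            state = next[state][column];
            if (state < 0) return false;    // no transition: reject
        }
        return accept[state];
    }

    /** A three-state deterministic automaton for the language (ab)+. */
    static TableMatcher abPlus() {
        return new TableMatcher("ab",
            new int[][] { {1, -1}, {-1, 2}, {1, -1} },
            new boolean[] { false, false, true });
    }

    public static void main(String[] args) throws InterruptedException {
        TableMatcher shared = abPlus();
        // Both threads read the same instance; no locking is needed.
        Thread t1 = new Thread(() -> System.out.println("abab: " + shared.run("abab")));
        Thread t2 = new Thread(() -> System.out.println("aba: " + shared.run("aba")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Because run() touches only its own local variables and the tables never change after construction, any number of threads can call it concurrently. A mutable automaton whose states and transitions may be rewritten by operations offers no such guarantee.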
AutomatonTestUtil Utilities for testing automata.

Can generate random regular expressions and automata, and also provides a number of very basic, unoptimized implementations (*slow) for testing.

AutomatonTestUtil.RandomAcceptedStrings Lets you retrieve random strings accepted by an Automaton.

Once created, call #getRandomAcceptedString(Random) to get a new string (as UTF-32 code points).
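
Since the strings are returned as UTF-32 code points rather than as text, a caller usually needs one conversion step before printing or comparing them. The sketch below is in Java (matching the Java-style member references here) and assumes the result arrives as an int array of code points, which this page does not state explicitly. The class name CodePoints and the sample data are hypothetical.

```java
/** Hypothetical helper; assumes code points arrive as an int[]. */
final class CodePoints {
    /** Converts UTF-32 code points to a (UTF-16) Java string. */
    static String toUtf16(int[] codePoints) {
        // String(int[], int, int) is the standard code point constructor.
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (outside the Basic Multilingual Plane), 'b'
        int[] sample = { 'a', 0x1F600, 'b' };
        String s = toUtf16(sample);
        System.out.println(s.length());                      // 4 UTF-16 units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
    }
}
```

The two counts differ because a code point above U+FFFF occupies two UTF-16 units, which is why tests that compare lengths should count code points rather than chars.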

AutomatonTestUtil.RandomAcceptedStrings.ArrivingTransition
MinimizationOperations Operations for minimizing automata. @lucene.experimental
MinimizationOperations.IntPair
MinimizationOperations.StateList
MinimizationOperations.StateListNode
RegExp Regular Expression extension to Automaton.

Regular expressions are built from the following abstract syntax:

regexp       ::= unionexp
               |
unionexp     ::= interexp | unionexp                         (union)
               | interexp
interexp     ::= concatexp & interexp                        (intersection) [OPTIONAL]
               | concatexp
concatexp    ::= repeatexp concatexp                         (concatenation)
               | repeatexp
repeatexp    ::= repeatexp ?                                 (zero or one occurrence)
               | repeatexp *                                 (zero or more occurrences)
               | repeatexp +                                 (one or more occurrences)
               | repeatexp {n}                               (n occurrences)
               | repeatexp {n,}                              (n or more occurrences)
               | repeatexp {n,m}                             (n to m occurrences, including both)
               | complexp
complexp     ::= ~ complexp                                  (complement) [OPTIONAL]
               | charclassexp
charclassexp ::= [ charclasses ]                             (character class)
               | [^ charclasses ]                            (negated character class)
               | simpleexp
charclasses  ::= charclass charclasses
               | charclass
charclass    ::= charexp - charexp                           (character range, including end-points)
               | charexp
simpleexp    ::= charexp
               | .                                           (any single character)
               | #                                           (the empty language) [OPTIONAL]
               | @                                           (any string) [OPTIONAL]
               | " <Unicode string without double-quotes> "  (a string)
               | ( )                                         (the empty string)
               | ( unionexp )                                (precedence override)
               | < <identifier> >                            (named automaton) [OPTIONAL]
               | <n-m>                                       (numerical interval) [OPTIONAL]
charexp      ::= <Unicode character>                         (a single non-reserved character)
               | \ <Unicode character>                       (a single character)

The productions marked [OPTIONAL] are only allowed if enabled by the syntax flags passed to the RegExp constructor. The reserved characters used in the (enabled) syntax must be escaped with a backslash (\) or enclosed in double quotes ("..."). (In contrast to other regexp syntaxes, this is also required inside character classes.) Be aware that the dash (-) has a special meaning in charclass expressions. An identifier is a string that contains neither a right angle bracket (>) nor a dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points; if n and m have the same number of digits, conforming strings must have exactly that length (that is, they are padded with leading 0's). @lucene.experimental
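
The numerical interval rule is easy to misread, so here is a minimal standalone sketch of the membership check it describes, in Java to match the Java-style member references on this page. It does not use the RegExp class. The name IntervalSketch.conforms is hypothetical, and the behaviour when n and m have different digit counts (any number of leading zeros accepted) is an assumption, since the text above only specifies the equal-length case.

```java
import java.math.BigInteger;

/** Hypothetical sketch of the <n-m> rule; not the library's implementation. */
final class IntervalSketch {
    /**
     * Returns true if s is a decimal string within the interval <n-m>.
     * n and m are passed as written in the expression, so their digit
     * counts (including leading zeros) are preserved.
     */
    static boolean conforms(String s, String n, String m) {
        // Only non-empty strings of ASCII decimal digits can conform.
        if (s.isEmpty() || !s.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return false;
        }
        // Documented rule: equal digit counts fix the required length.
        if (n.length() == m.length() && s.length() != n.length()) {
            return false;
        }
        // ASSUMPTION: with different digit counts, length is unconstrained.
        BigInteger value = new BigInteger(s);
        return new BigInteger(n).compareTo(value) <= 0
            && value.compareTo(new BigInteger(m)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(conforms("07", "01", "10")); // true: padded to 2 digits
        System.out.println(conforms("7", "01", "10"));  // false: wrong length
        System.out.println(conforms("7", "1", "10"));   // true: length not fixed
    }
}
```

In other words, under this reading <01-10> accepts 01 through 10 but not 1, while <1-10> accepts 1 through 10 without requiring padding.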

SpecialOperations Special automata operations. @lucene.experimental
TestDeterminism
TestMinimize
TestUTF32ToUTF8