Name |
Description |
Automaton |
Finite-state automaton with regular expression operations.
Class invariants:
- An automaton is either represented explicitly (with State and Transition objects) or with a singleton string (see #getSingleton() and #expandSingleton()) in case the automaton is known to accept exactly one string. (Implicitly, all states and transitions of an automaton are reachable from its initial state.)
- Automata are always reduced (see #reduce()) and have no transitions to dead states (see #removeDeadTransitions()).
- If an automaton is nondeterministic, then #isDeterministic() returns false (but the converse is not required).
- Automata provided as input to operations are generally assumed to be disjoint.
If the states or transitions are manipulated manually, the #restoreInvariant() and #setDeterministic(boolean) methods should be used afterwards to restore representation invariants that are assumed by the built-in automata operations.
Note: this class has internal mutable state and is not thread safe. The caller is responsible for any necessary synchronization when the same Automaton is used from multiple threads. For multithreaded matching, a RunAutomaton is generally recommended instead: it is immutable, thread safe, and much faster.
@lucene.experimental |
AutomatonTestUtil |
Utilities for testing automata. Generates random regular expressions and automata, and provides a number of very basic, unoptimized implementations (*slow) for testing. |
AutomatonTestUtil.RandomAcceptedStrings |
Lets you retrieve random strings accepted by an Automaton. Once created, call #getRandomAcceptedString(Random) to get a new string (in UTF-32 codepoints). |
AutomatonTestUtil.RandomAcceptedStrings.ArrivingTransition |
|
MinimizationOperations |
Operations for minimizing automata. @lucene.experimental |
MinimizationOperations.IntPair |
|
MinimizationOperations.StateList |
|
MinimizationOperations.StateListNode |
|
RegExp |
Regular Expression extension to Automaton. Regular expressions are built from the following abstract syntax:
    regexp        ::=  unionexp
                    |
    unionexp      ::=  interexp | unionexp          (union)
                    |  interexp
    interexp      ::=  concatexp & interexp         (intersection)                           [OPTIONAL]
                    |  concatexp
    concatexp     ::=  repeatexp concatexp          (concatenation)
                    |  repeatexp
    repeatexp     ::=  repeatexp ?                  (zero or one occurrence)
                    |  repeatexp *                  (zero or more occurrences)
                    |  repeatexp +                  (one or more occurrences)
                    |  repeatexp {n}                (n occurrences)
                    |  repeatexp {n,}               (n or more occurrences)
                    |  repeatexp {n,m}              (n to m occurrences, including both)
                    |  complexp
    complexp      ::=  ~ complexp                   (complement)                             [OPTIONAL]
                    |  charclassexp
    charclassexp  ::=  [ charclasses ]              (character class)
                    |  [^ charclasses ]             (negated character class)
                    |  simpleexp
    charclasses   ::=  charclass charclasses
                    |  charclass
    charclass     ::=  charexp - charexp            (character range, including end-points)
                    |  charexp
    simpleexp     ::=  charexp
                    |  .                            (any single character)
                    |  #                            (the empty language)                     [OPTIONAL]
                    |  @                            (any string)                             [OPTIONAL]
                    |  " <Unicode string without double-quotes> "   (a string)
                    |  ( )                          (the empty string)
                    |  ( unionexp )                 (precedence override)
                    |  < <identifier> >             (named automaton)                        [OPTIONAL]
                    |  <n-m>                        (numerical interval)                     [OPTIONAL]
    charexp       ::=  <Unicode character>          (a single non-reserved character)
                    |  \ <Unicode character>        (a single character)
The productions marked [OPTIONAL] are only allowed if specified by the syntax flags passed to the RegExp constructor. The reserved characters used in the (enabled) syntax must be escaped with backslash (\) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is required also in character classes.)
Be aware that dash (-) has a special meaning in charclass expressions. An identifier is a string not containing right angle bracket (>) or dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points, and if n and m have the same number of digits, then the conforming strings must have that length (i.e. prefixed by 0's). @lucene.experimental |
SpecialOperations |
Special automata operations. @lucene.experimental |
TestDeterminism |
|
TestMinimize |
|
TestUTF32ToUTF8 |
|
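Two rules from the RegExp entry above lend themselves to a small illustration: reserved characters must be escaped with a backslash, and a numerical interval <n-m> whose bounds have the same number of digits only matches strings of that length. The following is a minimal, self-contained Java sketch of both rules. It does not use or reproduce the Lucene implementation. The class RegExpSyntaxSketch and its methods escapeLiteral and inInterval are hypothetical names introduced here. Escaping every character other than ASCII letters and digits is a conservative assumption (the actual reserved set depends on the enabled syntax flags), and accepting any in-range digit string when n and m differ in length is likewise an assumption.

```java
// Hypothetical illustration only; not part of Lucene and not its implementation.
final class RegExpSyntaxSketch {

    // Builds a pattern for a literal string by backslash-escaping every
    // character that is not an ASCII letter or digit. This relies on the
    // grammar rule "\ <Unicode character>" denoting that single character.
    // Assumption: over-escaping is harmless; the real reserved set is smaller.
    static String escapeLiteral(String literal) {
        StringBuilder sb = new StringBuilder();
        literal.codePoints().forEach(cp -> {
            boolean plain = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (!plain) {
                sb.append('\\');
            }
            sb.appendCodePoint(cp);
        });
        return sb.toString();
    }

    // Checks whether s conforms to the numerical interval <n-m>, where n and m
    // are non-negative decimal integers given as digit strings with n <= m.
    // Both end points are included. If n and m have the same number of digits,
    // s must have exactly that length (that is, be zero-padded).
    // Assumption: if the lengths differ, any digit string in range is accepted.
    static boolean inInterval(String s, String n, String m) {
        if (s.isEmpty() || !s.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return false;
        }
        if (n.length() == m.length() && s.length() != n.length()) {
            return false;
        }
        java.math.BigInteger value = new java.math.BigInteger(s);
        return value.compareTo(new java.math.BigInteger(n)) >= 0
                && value.compareTo(new java.math.BigInteger(m)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(escapeLiteral("a-b.c"));
        System.out.println(inInterval("007", "001", "100"));
        System.out.println(inInterval("7", "001", "100"));
    }
}
```

In this sketch, escapeLiteral("a-b.c") yields a\-b\.c, and inInterval("7", "001", "100") is false because both bounds have three digits while the candidate has one, whereas inInterval("007", "001", "100") is true.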