Property | Type | Description
---|---|---
buffer | char[] |
offset | int |

Method | Description
---|---
End ( ) : void |
IncrementToken ( ) : bool |
Reset ( ) : void |

Method | Description
---|---
IncrementWord ( ) : bool | Returns true if another word is available.
IsSafeEnd ( char ch ) : bool | Returns true if ch marks an unambiguous break position for sentence tokenization.
SegmentingTokenizerBase ( AttributeFactory factory, | Constructs a new SegmentingTokenizerBase, also supplying the AttributeFactory.
SegmentingTokenizerBase ( | Constructs a new SegmentingTokenizerBase, using the provided BreakIterator for sentence segmentation. Never share a BreakIterator across different TokenStreams; always pass a newly created or cloned one to this constructor.
SetNextSentence ( int sentenceStart, int sentenceEnd ) : void | Provides the next input sentence for analysis.

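
The IsSafeEnd description above leaves open which characters count as unambiguous breaks. A minimal sketch follows, assuming that hard line and paragraph terminators are the safe ends, which is how the upstream Lucene Java class is understood to behave. The class name `SafeEndSketch` is hypothetical and not part of the documented API.

```csharp
public static class SafeEndSketch
{
    // Hypothetical helper (assumption, not the library's code): returns true
    // for characters that always terminate a line or paragraph, so a sentence
    // cannot continue across them.
    public static bool IsSafeEnd(char ch)
    {
        switch (ch)
        {
            case '\u000D': // carriage return
            case '\u000A': // line feed
            case '\u0085': // next line (NEL)
            case '\u2028': // line separator
            case '\u2029': // paragraph separator
                return true;
            default:
                return false;
        }
    }
}
```

A subclass could override IsSafeEnd with a different predicate if its input uses other hard delimiters; the set above is only one plausible choice.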
Method | Description
---|---
FindSafeEnd ( ) : int | Returns the last unambiguous break position in the text.
IncrementSentence ( ) : bool | Returns true if there is a token from the buffer, or false if it is exhausted.
Read ( | Behaves like commons-io's readFully, but handles a nonzero offset correctly.
Refill ( ) : void | Refills the buffer, accumulating the offset and setting usableLength to the last unambiguous break position.

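
The Read and FindSafeEnd descriptions above can be illustrated with a short sketch. It assumes a read loop that keeps calling the reader until the requested count is filled or input ends, and a backward scan for the last safe character. The names `BufferSketch`, `ReadFully`, and the `isSafeEnd` delegate are hypothetical, and returning -1 when no safe end exists is an assumption rather than documented behavior.

```csharp
using System;
using System.IO;

public static class BufferSketch
{
    // Reads until `length` chars are stored or the reader is exhausted.
    // Returns the number of chars actually read.
    public static int ReadFully(TextReader input, char[] buffer, int offset, int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        int remaining = length;
        while (remaining > 0)
        {
            int location = length - remaining;
            // The write position always includes the caller's offset, which
            // is the detail that matters when offset != 0.
            int count = input.Read(buffer, offset + location, remaining);
            if (count <= 0) break; // TextReader.Read returns 0 at end of input
            remaining -= count;
        }
        return length - remaining;
    }

    // Scans backwards over the first `length` chars and returns the index
    // just past the last safe-end char, or -1 if none exists (assumption).
    public static int FindSafeEnd(char[] buffer, int length, Func<char, bool> isSafeEnd)
    {
        for (int i = length - 1; i >= 0; i--)
        {
            if (isSafeEnd(buffer[i])) return i + 1;
        }
        return -1;
    }
}
```

Under these assumptions, a refill step would call `ReadFully` to top up the buffer after any leftover text, then use `FindSafeEnd` to decide how much of the buffer can be handed to the sentence iterator without splitting a sentence.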
protected SegmentingTokenizerBase ( AttributeFactory factory,

Name | Type | Description
---|---|---
factory | AttributeFactory |
reader | |
iterator | BreakIterator |
return | ICU4NET |

protected SegmentingTokenizerBase (

Name | Type | Description
---|---|---
reader | |
iterator | BreakIterator |
return | ICU4NET |

protected abstract SetNextSentence ( int sentenceStart, int sentenceEnd ) : void

Name | Type | Description
---|---|---
sentenceStart | int |
sentenceEnd | int |
return | void |