cc.mallet.extract
Interface Tokenization
- All Superinterfaces:
- Sequence
- All Known Implementing Classes:
- StringTokenization
public interface Tokenization
- extends Sequence
Method Summary
java.lang.Object getDocument()
- Returns the document of which this is a tokenization.
Span getSpan(int i)
Span subspan(int start, int end)
- Returns a span formed by concatenating the spans from start to end.
getDocument
java.lang.Object getDocument()
- Returns the document of which this is a tokenization.
getSpan
Span getSpan(int i)
subspan
Span subspan(int start, int end)
- Returns a span formed by concatenating the spans from start to end.
In more detail:
- The start of the new span will be the start index of getSpan(start).
- The end of the new span will be the start index of getSpan(end).
- Unless start == end, the new span will completely include getSpan(start).
- The new span will never intersect getSpan(end).
- If start == end, then the new span contains no text.
- Parameters:
start
- The index of the first token in the new span (inclusive).
This is an index of a token, *not* an index into the document.
end
- The index of the token at which the new span stops (exclusive).
This is an index of a token, *not* an index into the document.
- Returns:
- A span into this tokenization's document
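The rules listed for subspan can be illustrated with a small, self-contained sketch. It does not use the MALLET classes: SubspanSketch, TokenSpan, tokenize and subspanText are hypothetical names introduced only for this illustration, and the whitespace tokenizer is an assumption. The sketch follows the wording of the list literally (the result ends at the start index of the token at end), and it assumes that an index equal to the number of tokens maps to the end of the document; an actual implementation such as StringTokenization may behave differently in those details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; these are NOT the MALLET classes.
final class SubspanSketch {

    // A token's half-open character range [start, end) within the document.
    static final class TokenSpan {
        final int start;
        final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumed tokenizer: splits on space characters and records character offsets.
    static List<TokenSpan> tokenize(String document) {
        List<TokenSpan> spans = new ArrayList<>();
        int i = 0;
        while (i < document.length()) {
            if (document.charAt(i) == ' ') {
                i++;
                continue;
            }
            int j = i;
            while (j < document.length() && document.charAt(j) != ' ') {
                j++;
            }
            spans.add(new TokenSpan(i, j));
            i = j;
        }
        return spans;
    }

    // Start offset of the token at tokenIndex. Assumption: an index equal to
    // the token count maps to the end of the document.
    static int startOffset(String document, List<TokenSpan> spans, int tokenIndex) {
        return tokenIndex == spans.size() ? document.length() : spans.get(tokenIndex).start;
    }

    // Text covered by tokens start (inclusive) to end (exclusive). Both
    // arguments are token indices, not character offsets into the document.
    static String subspanText(String document, List<TokenSpan> spans, int start, int end) {
        int from = startOffset(document, spans, start);
        int to = startOffset(document, spans, end);
        return document.substring(from, to);
    }

    public static void main(String[] args) {
        String document = "the quick brown fox";
        List<TokenSpan> spans = tokenize(document);
        // Brackets make leading and trailing whitespace visible in the output.
        System.out.println("[" + subspanText(document, spans, 1, 3) + "]");
        System.out.println("[" + subspanText(document, spans, 2, 2) + "]");
    }
}
```

Under these assumptions, subspanText(document, spans, 1, 3) covers all of token 1 and stops where token 3 begins, so it never overlaps token 3, while start == end gives an empty string.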