Class StringTokenization

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractList<E>
          extended by java.util.ArrayList<Token>
              extended by cc.mallet.types.TokenSequence
                  extended by cc.mallet.extract.StringTokenization
All Implemented Interfaces:
Tokenization, Sequence, java.lang.Cloneable, java.lang.Iterable<Token>, java.util.Collection<Token>, java.util.List<Token>, java.util.RandomAccess

public class StringTokenization
extends TokenSequence
implements Tokenization

See Also:
Serialized Form

Field Summary
Fields inherited from class java.util.AbstractList
Constructor Summary
StringTokenization(java.lang.CharSequence seq)
          Creates an empty StringTokenization.
StringTokenization(java.lang.CharSequence string, CharSequenceLexer lexer)
          Creates a tokenization of the given string.
Method Summary
 java.lang.Object getDocument()
          Returns the document of which this is a tokenization.
 Span getSpan(int i)
 Span subspan(int firstToken, int lastToken)
          Returns a span formed by concatenating the spans of the tokens from firstToken (inclusive) to lastToken (exclusive).
Methods inherited from class cc.mallet.types.TokenSequence
add, addAll, getNumericProperty, getProperties, getProperty, hasProperty, removeLast, setNumericProperty, setProperty, toFeatureSequence, toFeatureVector, toString, toStringShort
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, remove, removeRange, set, size, toArray, toArray, trimToSize
Methods inherited from class java.util.AbstractList
equals, hashCode, iterator, listIterator, listIterator, subList
Methods inherited from class java.util.AbstractCollection
containsAll, removeAll, retainAll
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface cc.mallet.types.Sequence
get, size
Methods inherited from interface java.util.List
containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll, subList

Constructor Detail


public StringTokenization(java.lang.CharSequence seq)
Creates an empty StringTokenization.


public StringTokenization(java.lang.CharSequence string,
                          CharSequenceLexer lexer)
Creates a tokenization of the given string. Tokens are added from all the matches of the given lexer.
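
As a rough illustration of what "tokens are added from all the matches of the given lexer" means, the sketch below uses java.util.regex in place of a CharSequenceLexer and records every match as a token together with its character offsets. The names RegexTokenizationSketch, TokenSpan, and tokenize are hypothetical and are not part of MALLET; the real constructor may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not MALLET code: a regex stands in for the lexer.
class RegexTokenizationSketch {

    /** One token plus the half-open character range [start, end) it covers. */
    static final class TokenSpan {
        final String text;
        final int start;
        final int end;

        TokenSpan(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Adds one TokenSpan per pattern match, in document order. */
    static List<TokenSpan> tokenize(CharSequence document, Pattern lexerPattern) {
        List<TokenSpan> tokens = new ArrayList<>();
        Matcher matcher = lexerPattern.matcher(document);
        while (matcher.find()) {
            tokens.add(new TokenSpan(matcher.group(), matcher.start(), matcher.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String document = "Tokens are added from matches.";
        // Assumption for this example: a token is a run of word characters.
        for (TokenSpan t : tokenize(document, Pattern.compile("\\w+"))) {
            System.out.println(t.text + " [" + t.start + ", " + t.end + ")");
        }
    }
}
```

Text that the pattern does not match (here, whitespace and punctuation) produces no token, so the token list can be shorter than the document while each token still points back into it.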

Method Detail


public Span subspan(int firstToken,
                    int lastToken)
Description copied from interface: Tokenization
Returns a span formed by concatenating the spans of the tokens from firstToken (inclusive) to lastToken (exclusive).

Specified by:
subspan in interface Tokenization
Parameters:
firstToken - The index of the first token in the new span (inclusive). This is an index of a token, *not* an index into the document.
lastToken - The index of the token immediately after the last token in the new span (exclusive). This is an index of a token, *not* an index into the document.
Returns:
A span into this tokenization's document
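
The distinction between token indices and document offsets can be sketched as follows. This is a minimal illustration, not MALLET's implementation: SubspanSketch and its array-based representation are hypothetical, and it assumes that the resulting span starts where token firstToken starts and ends where token lastToken - 1 ends. Rejecting an empty range is also a choice made only for this example.

```java
// Hypothetical sketch, not MALLET code.
class SubspanSketch {

    /**
     * starts[i] and ends[i] hold the character offsets [start, end) of token i.
     * Returns {startChar, endChar} covering tokens firstToken .. lastToken - 1.
     */
    static int[] subspan(int[] starts, int[] ends, int firstToken, int lastToken) {
        if (firstToken < 0 || lastToken > starts.length || firstToken >= lastToken) {
            throw new IndexOutOfBoundsException(
                "invalid token range [" + firstToken + ", " + lastToken + ")");
        }
        // Token indices go in, document (character) offsets come out.
        return new int[] { starts[firstToken], ends[lastToken - 1] };
    }

    public static void main(String[] args) {
        String document = "the quick brown fox";
        int[] starts = { 0, 4, 10, 16 };
        int[] ends = { 3, 9, 15, 19 };

        // Tokens 1 and 2 ("quick", "brown"); token 3 ("fox") is excluded.
        int[] span = subspan(starts, ends, 1, 3);
        System.out.println(document.substring(span[0], span[1])); // prints "quick brown"
    }
}
```

Note that the whitespace between the two tokens is part of the result, because the span is a contiguous range of the document rather than a list of tokens.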


public Span getSpan(int i)
Specified by:
getSpan in interface Tokenization


public java.lang.Object getDocument()
Description copied from interface: Tokenization
Returns the document of which this is a tokenization.

Specified by:
getDocument in interface Tokenization