cc.mallet.extract
Class StringTokenization

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractList<E>
          extended by java.util.ArrayList<Token>
              extended by cc.mallet.types.TokenSequence
                  extended by cc.mallet.extract.StringTokenization
All Implemented Interfaces:
Tokenization, Sequence, java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Token>, java.util.Collection<Token>, java.util.List<Token>, java.util.RandomAccess

public class StringTokenization
extends TokenSequence
implements Tokenization

See Also:
Serialized Form

Field Summary
 
Fields inherited from class java.util.AbstractList
modCount
 
Constructor Summary
StringTokenization(java.lang.CharSequence seq)
          Creates an empty StringTokenization of the given character sequence.
StringTokenization(java.lang.CharSequence string, CharSequenceLexer lexer)
          Creates a tokenization of the given string.
 
Method Summary
 java.lang.Object getDocument()
          Returns the document of which this is a tokenization.
 Span getSpan(int i)
          Returns the span of the token at index i.
 Span subspan(int firstToken, int lastToken)
          Returns a span formed by concatenating the spans of the tokens from firstToken (inclusive) to lastToken (exclusive).
 
Methods inherited from class cc.mallet.types.TokenSequence
add, addAll, getNumericProperty, getProperties, getProperty, hasProperty, removeLast, setNumericProperty, setProperty, toFeatureSequence, toFeatureVector, toString, toStringShort
 
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, remove, removeRange, set, size, toArray, toArray, trimToSize
 
Methods inherited from class java.util.AbstractList
equals, hashCode, iterator, listIterator, listIterator, subList
 
Methods inherited from class java.util.AbstractCollection
containsAll, removeAll, retainAll
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface cc.mallet.types.Sequence
get, size
 
Methods inherited from interface java.util.List
containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll, subList
 

Constructor Detail

StringTokenization

public StringTokenization(java.lang.CharSequence seq)
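Creates an empty StringTokenization (one with no tokens) of the given character sequence. The sequence becomes the tokenization's document.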


StringTokenization

public StringTokenization(java.lang.CharSequence string,
                          CharSequenceLexer lexer)
Creates a tokenization of the given string. Tokens are added from all the matches of the given lexer.
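The lexer-based constructor can be pictured with the following minimal, self-contained sketch. It does not use MALLET: it assumes the lexer behaves like a regular-expression matcher whose successive matches become tokens, and it calls java.util.regex directly instead of CharSequenceLexer. The names RegexTokenizationSketch and CharSpan, and the letters-only token pattern, are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizationSketch {
    // Hypothetical token span: half-open character offsets [start, end) into the
    // document. This is NOT MALLET's Span or StringSpan class.
    static final class CharSpan {
        final int start;
        final int end;
        CharSpan(int start, int end) { this.start = start; this.end = end; }
    }

    // Every match of the pattern becomes one token, in document order; text
    // between matches (spaces, punctuation the pattern skips) is not a token.
    static List<CharSpan> tokenize(CharSequence document, Pattern tokenPattern) {
        List<CharSpan> spans = new ArrayList<>();
        Matcher m = tokenPattern.matcher(document);
        while (m.find()) {
            spans.add(new CharSpan(m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String doc = "Dr. Smith arrived";
        // Assumed token pattern: runs of ASCII letters.
        List<CharSpan> spans = tokenize(doc, Pattern.compile("\\p{Alpha}+"));
        for (CharSpan s : spans) {
            // prints "Dr [0, 2)", "Smith [4, 9)", "arrived [10, 17)"
            System.out.println(doc.substring(s.start, s.end) + " [" + s.start + ", " + s.end + ")");
        }
    }
}
```

Because each token records only character offsets into the original string, the document itself is never copied or altered; characters the pattern skips (here the period and the spaces) simply fall between tokens.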

Method Detail

subspan

public Span subspan(int firstToken,
                    int lastToken)
Description copied from interface: Tokenization
Returns a span formed by concatenating the spans of the tokens from firstToken (inclusive) to lastToken (exclusive).

Specified by:
subspan in interface Tokenization
Parameters:
firstToken - The index of the first token in the new span (inclusive). This is an index of a token, *not* an index into the document.
lastToken - The index of the token immediately after the new span (exclusive). This is an index of a token, *not* an index into the document.
Returns:
A span into this tokenization's document
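The index conventions above (firstToken inclusive, lastToken exclusive, both counted in tokens rather than characters) can be illustrated with a small model that does not depend on MALLET. It assumes each token is stored as a pair of half-open character offsets. The names SubspanSketch and Offsets, and the choice to reject out-of-range indices with an exception, are hypothetical and may differ from what StringTokenization actually does.

```java
import java.util.Arrays;
import java.util.List;

class SubspanSketch {
    // Hypothetical token span: half-open character offsets [start, end).
    static final class Offsets {
        final int start;
        final int end;
        Offsets(int start, int end) { this.start = start; this.end = end; }
    }

    // Models subspan(firstToken, lastToken): the result runs from the start of
    // token firstToken to the end of token lastToken - 1, so any characters
    // lying between those tokens are included in the span.
    static Offsets subspan(List<Offsets> tokens, int firstToken, int lastToken) {
        // Assumed bounds policy for this sketch only.
        if (firstToken < 0 || lastToken > tokens.size() || firstToken >= lastToken) {
            throw new IndexOutOfBoundsException("need 0 <= firstToken < lastToken <= size()");
        }
        return new Offsets(tokens.get(firstToken).start, tokens.get(lastToken - 1).end);
    }

    public static void main(String[] args) {
        String doc = "Dr. Smith arrived";
        List<Offsets> tokens = Arrays.asList(
                new Offsets(0, 2), new Offsets(4, 9), new Offsets(10, 17));
        Offsets s = subspan(tokens, 0, 2); // tokens 0 and 1
        System.out.println(doc.substring(s.start, s.end)); // prints "Dr. Smith"
    }
}
```

The result covers everything between the two tokens, including the period and space in "Dr. Smith", which is why subspan returns a span into the document rather than a list of tokens.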

getSpan

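public Span getSpan(int i)
Returns the span of the token at index i. This is an index of a token, *not* an index into the document.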
Specified by:
getSpan in interface Tokenization
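The distinction between a token index and a character offset is easiest to see in a small model. The sketch below does not use MALLET; it assumes each token in the list carries half-open character offsets into the document and shows how the text of token i is recovered from that document. GetSpanSketch, TokenSpan and tokenText are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class GetSpanSketch {
    // Hypothetical token span: half-open character offsets [start, end).
    static final class TokenSpan {
        final int start;
        final int end;
        TokenSpan(int start, int end) { this.start = start; this.end = end; }
    }

    // Models getSpan(i): i counts tokens, not characters.
    // List.get throws IndexOutOfBoundsException for an invalid index.
    static TokenSpan getSpan(List<TokenSpan> tokens, int i) {
        return tokens.get(i);
    }

    // The span holds only offsets, so the token's text comes from the document.
    static CharSequence tokenText(CharSequence document, List<TokenSpan> tokens, int i) {
        TokenSpan s = getSpan(tokens, i);
        return document.subSequence(s.start, s.end);
    }

    public static void main(String[] args) {
        String doc = "Dr. Smith arrived";
        List<TokenSpan> tokens = Arrays.asList(
                new TokenSpan(0, 2), new TokenSpan(4, 9), new TokenSpan(10, 17));
        System.out.println(tokenText(doc, tokens, 1)); // prints "Smith"
    }
}
```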

getDocument

public java.lang.Object getDocument()
Description copied from interface: Tokenization
Returns the document of which this is a tokenization.

Specified by:
getDocument in interface Tokenization