cc.mallet.pipe
Class SimpleTokenizer
java.lang.Object
cc.mallet.pipe.Pipe
cc.mallet.pipe.SimpleTokenizer
- All Implemented Interfaces:
- AlphabetCarrying, java.io.Serializable
public class SimpleTokenizer
- extends Pipe
A simple Unicode tokenizer that accepts sequences of letters as tokens.
- See Also:
- Serialized Form
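The class description can be illustrated with a small standalone sketch. It is not MALLET's source: the class name LetterTokenizerSketch and the methods tokenize and flush are hypothetical. It assumes that a "letter" is any code point for which Character.isLetter returns true, that every other character ends the current token, and that tokens found in the stoplist are dropped without any case folding. The actual SimpleTokenizer may classify characters differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone illustration only; this is NOT the MALLET SimpleTokenizer source.
final class LetterTokenizerSketch {

    // Splits text into maximal runs of Unicode letters and drops stoplisted tokens.
    static List<String> tokenize(CharSequence text, Set<String> stoplist) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Work on code points so characters outside the BMP are handled whole.
            int codePoint = Character.codePointAt(text, i);
            if (Character.isLetter(codePoint)) {
                current.appendCodePoint(codePoint);
            } else {
                // Assumption: any non-letter ends the current token.
                flush(current, stoplist, tokens);
            }
            i += Character.charCount(codePoint);
        }
        flush(current, stoplist, tokens);
        return tokens;
    }

    // Emits the pending token, unless it is empty or stoplisted, then resets the buffer.
    private static void flush(StringBuilder current, Set<String> stoplist, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!stoplist.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        Set<String> stoplist = new HashSet<>();
        stoplist.add("the");
        // "the" is stoplisted and "2" is not a letter, so three tokens remain.
        System.out.println(tokenize("the caf\u00e9 has 2 cats", stoplist));
    }
}
```

In this sketch digits and punctuation act purely as separators, so "x1y" yields the two tokens "x" and "y".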
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
USE_EMPTY_STOPLIST
public static final int USE_EMPTY_STOPLIST
- See Also:
- Constant Field Values
USE_DEFAULT_ENGLISH_STOPLIST
public static final int USE_DEFAULT_ENGLISH_STOPLIST
- See Also:
- Constant Field Values
stoplist
protected java.util.HashSet<java.lang.String> stoplist
SimpleTokenizer
public SimpleTokenizer(int languageFlag)
SimpleTokenizer
public SimpleTokenizer(java.io.File stopfile)
SimpleTokenizer
public SimpleTokenizer(java.util.HashSet<java.lang.String> stoplist)
deepClone
public SimpleTokenizer deepClone()
stop
public void stop(java.lang.String word)
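Neither stop nor deepClone carries a description here, so the following is only a sketch of how a stoplist-backed class could behave. The class StoplistSketch and its isStopped method are hypothetical. The sketch assumes that stop(word) adds the word to the stoplist and that deepClone gives the copy its own independent set; neither assumption is confirmed by this page.

```java
import java.util.HashSet;

// Hypothetical stand-in for stoplist handling; NOT MALLET code.
final class StoplistSketch {
    private final HashSet<String> stoplist;

    StoplistSketch(HashSet<String> stoplist) {
        this.stoplist = stoplist;
    }

    // Assumed behaviour of stop(word): add the word to the stoplist.
    void stop(String word) {
        stoplist.add(word);
    }

    // Helper for the demo only; the documented class has no such method.
    boolean isStopped(String word) {
        return stoplist.contains(word);
    }

    // Assumed behaviour of deepClone(): the copy owns a separate set.
    StoplistSketch deepClone() {
        return new StoplistSketch(new HashSet<>(stoplist));
    }

    public static void main(String[] args) {
        StoplistSketch original = new StoplistSketch(new HashSet<>());
        original.stop("the");
        StoplistSketch copy = original.deepClone();
        copy.stop("and");
        System.out.println(original.isStopped("the")); // true
        System.out.println(original.isStopped("and")); // false
        System.out.println(copy.isStopped("and"));     // true
    }
}
```

Copying the set in the clone means that words stopped on one instance do not leak into the other, which matters if several pipes are built from a shared starting stoplist.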
pipe
public Instance pipe(Instance instance)
- Description copied from class: Pipe
- This method should really be 'protected', but it remains public for historical reasons.
- Overrides:
- pipe in class Pipe