cc.mallet.pipe
Class SimpleTokenizer

java.lang.Object
  extended by cc.mallet.pipe.Pipe
      extended by cc.mallet.pipe.SimpleTokenizer
All Implemented Interfaces:
AlphabetCarrying, java.io.Serializable

public class SimpleTokenizer
extends Pipe

A simple Unicode tokenizer that accepts sequences of letters as tokens.
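
The idea of treating runs of letters as tokens and dropping stoplisted words can be sketched in plain Java as follows. This is an illustration only, not MALLET's implementation: the class LetterTokenizerSketch and its tokenize method are hypothetical names, the use of Character.isLetter to decide what counts as a letter is an assumption, and the sketch does no case folding because this page does not say whether SimpleTokenizer does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not MALLET's SimpleTokenizer.
class LetterTokenizerSketch {
    // Words to drop from the output, similar in spirit to the stoplist field.
    private final Set<String> stoplist = new HashSet<>();

    // Adds a word to the stoplist, mirroring the role of stop(String).
    void stop(String word) {
        stoplist.add(word);
    }

    // Splits text into maximal runs of Unicode letters and drops stoplisted runs.
    // Assumption: "letter" means Character.isLetter(codePoint) is true.
    List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Read a full code point so supplementary characters stay intact.
            int codePoint = Character.codePointAt(text, i);
            if (Character.isLetter(codePoint)) {
                current.appendCodePoint(codePoint);
            } else {
                flush(current, tokens); // any non-letter ends the current run
            }
            i += Character.charCount(codePoint);
        }
        flush(current, tokens); // emit a run that reaches the end of the text
        return tokens;
    }

    // Emits the buffered run as a token unless it is empty or stoplisted.
    private void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!stoplist.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }
}
```

With "the" added through stop, tokenizing "the cat, the dog42x" in this sketch yields cat, dog and x: punctuation and digits only separate tokens, and the match against the stoplist is exact and case-sensitive.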

See Also:
Serialized Form

Field Summary
protected  java.util.HashSet<java.lang.String> stoplist
           
static int USE_DEFAULT_ENGLISH_STOPLIST
           
static int USE_EMPTY_STOPLIST
           
 
Constructor Summary
SimpleTokenizer(java.io.File stopfile)
           
SimpleTokenizer(java.util.HashSet<java.lang.String> stoplist)
           
SimpleTokenizer(int languageFlag)
           
 
Method Summary
 SimpleTokenizer deepClone()
           
 Instance pipe(Instance instance)
          This method is public only for historical reasons; by design it should be 'protected'.
 void stop(java.lang.String word)
           
 
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

USE_EMPTY_STOPLIST

public static final int USE_EMPTY_STOPLIST
See Also:
Constant Field Values

USE_DEFAULT_ENGLISH_STOPLIST

public static final int USE_DEFAULT_ENGLISH_STOPLIST
See Also:
Constant Field Values

stoplist

protected java.util.HashSet<java.lang.String> stoplist

Constructor Detail

SimpleTokenizer

public SimpleTokenizer(int languageFlag)

SimpleTokenizer

public SimpleTokenizer(java.io.File stopfile)

SimpleTokenizer

public SimpleTokenizer(java.util.HashSet<java.lang.String> stoplist)

Method Detail

deepClone

public SimpleTokenizer deepClone()

stop

public void stop(java.lang.String word)

pipe

public Instance pipe(Instance instance)
Description copied from class: Pipe
This method is public only for historical reasons; by design it should be 'protected'.

Overrides:
pipe in class Pipe