cc.mallet.pipe
Class AddClassifierTokenPredictions

java.lang.Object
  extended by cc.mallet.pipe.Pipe
      extended by cc.mallet.pipe.AddClassifierTokenPredictions
All Implemented Interfaces:
AlphabetCarrying, java.io.Serializable

public class AddClassifierTokenPredictions
extends Pipe
implements java.io.Serializable

This pipe uses a Classifier to label each token independently (i.e., under a zeroth-order Markov assumption), then adds the predictions as features to each token. It assumes the input Instance's data is a FeatureVectorSequence in which each element is an AugmentableFeatureVector. Example usage:

                1) Create and serialize a featurePipe that converts raw input to FeatureVectorSequences
                2) Pipe input data through featurePipe, train TokenClassifiers via cross validation, then serialize the classifiers
                3) Pipe input data through featurePipe and this pipe (using the saved classifiers), and train a Transducer 
                4) Serialize the trained Transducer 
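
The cross validation in step 2 can be pictured with the plain-Java sketch below. It does not use MALLET: OutOfFoldSketch, trainMajority and outOfFoldPredictions are hypothetical names, and the "classifier" is just a majority-label vote. It also assumes that the purpose of cross validation here is to label each training token with a classifier that never saw that token during training; the page above does not state this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of out-of-fold labeling; this is not MALLET code.
class OutOfFoldSketch {

    // "Trains" a trivial classifier that always predicts the most frequent label.
    // Ties go to the label that reached the top count first; an empty list yields null.
    static String trainMajority(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String label : labels) {
            int count = counts.merge(label, 1, Integer::sum);
            if (best == null || count > counts.get(best)) {
                best = label;
            }
        }
        return best;
    }

    // Token i belongs to fold (i % numFolds). Its prediction comes from a
    // classifier trained only on tokens from the other folds.
    static List<String> outOfFoldPredictions(List<String> labels, int numFolds) {
        List<String> predictions = new ArrayList<>();
        for (int i = 0; i < labels.size(); i++) {
            int heldOutFold = i % numFolds;
            List<String> training = new ArrayList<>();
            for (int j = 0; j < labels.size(); j++) {
                if (j % numFolds != heldOutFold) {
                    training.add(labels.get(j));
                }
            }
            predictions.add(trainMajority(training));
        }
        return predictions;
    }
}
```

In a stacked setup like the one described above, predictions produced this way on training data tend to resemble the predictions the Transducer will later see on unseen data, which is the usual motivation for holding tokens out.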
 

Author:
ghuang
See Also:
Serialized Form

Nested Class Summary
static class AddClassifierTokenPredictions.TokenClassifiers
          This inner class represents the trained token classifiers.
 
Constructor Summary
AddClassifierTokenPredictions(AddClassifierTokenPredictions.TokenClassifiers tokenClassifiers, int[] predRanks2add, boolean binary, InstanceList testList)
           
AddClassifierTokenPredictions(InstanceList trainList)
           
AddClassifierTokenPredictions(InstanceList trainList, InstanceList testList)
           
 
Method Summary
static InstanceList convert(InstanceList ilist, Noop alphabetsPipe)
          Converts each instance containing a FeatureVectorSequence to multiple instances, each containing an AugmentableFeatureVector as data.
static InstanceList convert(Instance inst, Noop alphabetsPipe)
           
 Alphabet getDataAlphabet()
           
 boolean getInProduction()
           
 Instance pipe(Instance carrier)
          Add the token classifier's predictions as features to the instance.
 void setInProduction(boolean inProduction)
           
static void setInProduction(Pipe p, boolean value)
           
 
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AddClassifierTokenPredictions

public AddClassifierTokenPredictions(InstanceList trainList)

AddClassifierTokenPredictions

public AddClassifierTokenPredictions(InstanceList trainList,
                                     InstanceList testList)

AddClassifierTokenPredictions

public AddClassifierTokenPredictions(AddClassifierTokenPredictions.TokenClassifiers tokenClassifiers,
                                     int[] predRanks2add,
                                     boolean binary,
                                     InstanceList testList)
Method Detail

setInProduction

public void setInProduction(boolean inProduction)

getInProduction

public boolean getInProduction()

setInProduction

public static void setInProduction(Pipe p,
                                   boolean value)

getDataAlphabet

public Alphabet getDataAlphabet()
Overrides:
getDataAlphabet in class Pipe

pipe

public Instance pipe(Instance carrier)
Adds the token classifier's predictions as features to the instance. This method assumes the input instance contains a FeatureVectorSequence as data.

Overrides:
pipe in class Pipe
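
As a rough picture of what "adding predictions as features" means, the plain-Java sketch below labels each token from its own features only and appends the label as an extra feature. It does not use MALLET: TokenPredictionSketch, addPredictions and the "PRED=" feature-name prefix are all hypothetical, and a Map<String, Double> stands in for a token's feature vector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of per-token prediction features; this is not MALLET code.
class TokenPredictionSketch {

    // Returns a new sequence in which every token carries one extra feature
    // naming the label the classifier predicted for it.
    static List<Map<String, Double>> addPredictions(
            List<Map<String, Double>> sequence,
            Function<Map<String, Double>, String> classifier) {
        List<Map<String, Double>> augmented = new ArrayList<>();
        for (Map<String, Double> token : sequence) {
            // Zeroth-order: the label depends only on this token's own features,
            // not on neighbouring tokens or their labels.
            String label = classifier.apply(token);
            Map<String, Double> copy = new LinkedHashMap<>(token);
            copy.put("PRED=" + label, 1.0); // assumed feature-naming scheme
            augmented.add(copy);
        }
        return augmented;
    }
}
```

The sketch copies each token rather than modifying it in place; whether the real pipe mutates the existing feature vectors is not stated on this page.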

convert

public static InstanceList convert(InstanceList ilist,
                                   Noop alphabetsPipe)
Converts each instance containing a FeatureVectorSequence to multiple instances, each containing an AugmentableFeatureVector as data.

Parameters:
ilist - Instances with FeatureVectorSequence as data field
alphabetsPipe - a Noop pipe containing the data and target alphabets for the resulting InstanceList
Returns:
an InstanceList where each Instance contains one Token's AugmentableFeatureVector as data
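
The flattening described above can be sketched in plain Java as follows. It does not use MALLET: FlattenSketch, TokenInstance and flatten are hypothetical names, a Map<String, Double> stands in for an AugmentableFeatureVector, and the sketch assumes that each token keeps its own label when it becomes a separate instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sequence-to-token flattening; this is not MALLET code.
class FlattenSketch {

    // One per-token instance: the token's features plus its label.
    static final class TokenInstance {
        final Map<String, Double> features;
        final String label;

        TokenInstance(Map<String, Double> features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Turns a list of sequences into a flat list with one instance per token,
    // preserving sequence order and token order within each sequence.
    static List<TokenInstance> flatten(
            List<List<Map<String, Double>>> sequences,
            List<List<String>> labels) {
        if (sequences.size() != labels.size()) {
            throw new IllegalArgumentException("one label list is required per sequence");
        }
        List<TokenInstance> flat = new ArrayList<>();
        for (int s = 0; s < sequences.size(); s++) {
            List<Map<String, Double>> tokens = sequences.get(s);
            List<String> tokenLabels = labels.get(s);
            if (tokens.size() != tokenLabels.size()) {
                throw new IllegalArgumentException("sequence " + s + " has mismatched lengths");
            }
            for (int t = 0; t < tokens.size(); t++) {
                flat.add(new TokenInstance(tokens.get(t), tokenLabels.get(t)));
            }
        }
        return flat;
    }
}
```

A flat list of this shape is what an ordinary (non-sequence) classifier trainer expects, which is presumably why the conversion exists.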

convert

public static InstanceList convert(Instance inst,
                                   Noop alphabetsPipe)
Parameters:
inst - input instance, with FeatureVectorSequence as data.
alphabetsPipe - a Noop pipe containing the data and target alphabets for the resulting InstanceList and AugmentableFeatureVectors
Returns:
list of instances, each with one AugmentableFeatureVector as data