cc.mallet.pipe
Class SimpleTaggerSentence2TokenSequence

java.lang.Object
  extended by cc.mallet.pipe.Pipe
      extended by cc.mallet.pipe.SimpleTaggerSentence2TokenSequence
All Implemented Interfaces:
AlphabetCarrying, java.io.Serializable
Direct Known Subclasses:
SimpleTaggerSentence2StringTokenization

public class SimpleTaggerSentence2TokenSequence
extends Pipe

Converts an external encoding of a sequence of elements with binary features to a TokenSequence. If target processing is on (training or labeled test data), it extracts element labels from the external encoding to create a target LabelSequence. Two external encodings are supported:

  1. A String containing lines of whitespace-separated tokens.
  2. A String[][].

Both represent rows of tokens. When target processing is on, the last token in each row is the label of the sequence element represented by that row. All other tokens in the row, or all tokens in the row when target processing is off, are the names of features that are on for the sequence element described by the row.
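
The row convention above can be sketched in plain Java. This is an illustrative sketch only, not the MALLET source: the class RowEncodingSketch and its methods featuresOf and labelOf are hypothetical names, and rejecting an empty row when target processing is on is an assumption made here.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the row convention; not part of MALLET.
class RowEncodingSketch {

    // Feature names of one row: every token, minus the trailing label
    // when target processing is on.
    static String[] featuresOf(String[] row, boolean targetProcessing) {
        if (targetProcessing && row.length == 0) {
            // Assumption: a row without any token cannot carry a label.
            throw new IllegalArgumentException("row has no label token");
        }
        int nFeatures = targetProcessing ? row.length - 1 : row.length;
        return Arrays.copyOfRange(row, 0, nFeatures);
    }

    // Label of one row, or null when target processing is off.
    static String labelOf(String[] row, boolean targetProcessing) {
        if (!targetProcessing) {
            return null;
        }
        if (row.length == 0) {
            throw new IllegalArgumentException("row has no label token");
        }
        return row[row.length - 1];
    }
}
```

For a row such as {"CAPITALIZED", "SUFFIX=ing", "B-NP"}, featuresOf(row, true) yields the first two tokens and labelOf(row, true) yields "B-NP"; with target processing off, all three tokens are treated as feature names.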

See Also:
Serialized Form

Field Summary
protected  boolean setTokensAsFeatures
           
 
Constructor Summary
SimpleTaggerSentence2TokenSequence()
          Creates a new SimpleTaggerSentence2TokenSequence instance.
SimpleTaggerSentence2TokenSequence(boolean inc)
          Creates a new SimpleTaggerSentence2TokenSequence instance which includes tokens as features iff the supplied argument is true.
 
Method Summary
protected  java.lang.String makeText(java.lang.String[] in)
          Returns the first String in the array or "" if the array has length 0.
protected  java.lang.String[][] parseSentence(java.lang.String sentence)
          Parses a string representing a sequence of rows of tokens into an array of arrays of tokens.
 Instance pipe(Instance carrier)
          Takes an instance with data of type String or String[][] and creates an Instance of type TokenSequence.
 
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

setTokensAsFeatures

protected boolean setTokensAsFeatures
Constructor Detail

SimpleTaggerSentence2TokenSequence

public SimpleTaggerSentence2TokenSequence()
Creates a new SimpleTaggerSentence2TokenSequence instance. By default we include tokens as features.


SimpleTaggerSentence2TokenSequence

public SimpleTaggerSentence2TokenSequence(boolean inc)
Creates a new SimpleTaggerSentence2TokenSequence instance which includes tokens as features iff the supplied argument is true.

Method Detail

parseSentence

protected java.lang.String[][] parseSentence(java.lang.String sentence)
Parses a string representing a sequence of rows of tokens into an array of arrays of tokens.

Parameters:
sentence - a String
Returns:
the corresponding array of arrays of tokens.
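
A minimal sketch of this kind of parsing, written without MALLET. The class name ParseSentenceSketch and the method parse are hypothetical, and the exact delimiters are assumptions: this version splits rows on line breaks and tokens on runs of whitespace, which may differ in detail from what parseSentence actually does.

```java
// Hypothetical stand-in for the parsing step; not the MALLET implementation.
class ParseSentenceSketch {

    // Splits the input into lines, then each line into whitespace-separated tokens.
    static String[][] parse(String sentence) {
        String[] lines = sentence.split("\\r?\\n");
        String[][] rows = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            // Assumption: a blank line becomes a row with no tokens.
            rows[i] = line.isEmpty() ? new String[0] : line.split("\\s+");
        }
        return rows;
    }
}
```

Under these assumptions, parse("a b\nc d e") produces the same structure as the String[][] literal {{"a", "b"}, {"c", "d", "e"}}, so the two external encodings describe the same rows.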

makeText

protected java.lang.String makeText(java.lang.String[] in)
Returns the first String in the array or "" if the array has length 0.


pipe

public Instance pipe(Instance carrier)
Takes an instance with data of type String or String[][] and creates an Instance of type TokenSequence. Each Token in the sequence gets the text of its line and one feature of value 1 for each "Feature" in the line. For example, if the String[][] is {{a,b},{c,d,e}} (and target processing is off) then the text would be "a b" for the first token and "c d e" for the second. Also, the features "a" and "b" would be set for the first token and "c", "d" and "e" for the second. When target processing is on, the last element in the String[] for the current token is taken as the target (label) rather than as a feature, so in the previous example "b" would be the label of the first sequence element.

Overrides:
pipe in class Pipe
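
The {{a,b},{c,d,e}} example can be traced with a small stand-alone sketch. It does not call MALLET: PipeExampleSketch, Element, and convert are hypothetical names, a plain map stands in for a Token's feature values, and token text is left out. It only mirrors the feature and label assignment described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the feature/label assignment; not the MALLET classes.
class PipeExampleSketch {

    // One sequence element: binary features (name -> 1.0) and an optional label.
    static final class Element {
        final Map<String, Double> features = new LinkedHashMap<>();
        String label; // stays null when target processing is off
    }

    static List<Element> convert(String[][] rows, boolean targetProcessing) {
        List<Element> sequence = new ArrayList<>();
        for (String[] row : rows) {
            if (targetProcessing && row.length == 0) {
                // Assumption: a row must contain at least the label token.
                throw new IllegalArgumentException("row has no label token");
            }
            Element element = new Element();
            int nFeatures = targetProcessing ? row.length - 1 : row.length;
            for (int f = 0; f < nFeatures; f++) {
                element.features.put(row[f], 1.0); // each named feature is "on"
            }
            if (targetProcessing) {
                element.label = row[nFeatures]; // last token is the label
            }
            sequence.add(element);
        }
        return sequence;
    }
}
```

With target processing off, the first element carries features "a" and "b" and the second carries "c", "d" and "e". With target processing on, the first element carries only "a" with label "b", and the second carries "c" and "d" with label "e".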