Class TokenSequenceMatchDataAndTarget

java.lang.Object
  extended by cc.mallet.pipe.Pipe
      extended by cc.mallet.pipe.TokenSequenceMatchDataAndTarget
All Implemented Interfaces:
java.io.Serializable, AlphabetCarrying

public class TokenSequenceMatchDataAndTarget
extends Pipe

Runs a regular expression over the text of each token, replaces the token text with the substring matched by one capture group (dataGroup), and creates a target TokenSequence from the text matched by another capture group (targetGroup).

For example, suppose you have a data file containing one token per line, with the token's label on the same line. First build a TokenSequence in which the Token.getText() of each token is the full text of one line, then run this pipe to separate the target information from the data information. To process the following,

         BACKGROUND Then
         PERSON Mr.
         PERSON Smith
         BACKGROUND said
use new TokenSequenceMatchDataAndTarget(Pattern.compile("([A-Z]+) (.*)"), 2, 1). Group 2 (the word) becomes the token text, and group 1 (the upper-case label) becomes the target.
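
The group selection in this example can be tried with java.util.regex alone. The sketch below is not MALLET code: the class MatchDataAndTargetSketch and its split helper are hypothetical names introduced for illustration. It uses full-line matching and throws on a non-matching line, both of which are assumptions rather than documented behavior of this pipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; it does not use or reproduce MALLET classes.
class MatchDataAndTargetSketch {

    // Returns one {data, target} pair per input line, taken from the given capture groups.
    static List<String[]> split(Pattern regex, int dataGroup, int targetGroup, List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = regex.matcher(line);
            // Assumption: the whole line must match; the real pipe may handle mismatches differently.
            if (!m.matches()) {
                throw new IllegalArgumentException("No match for line: " + line);
            }
            pairs.add(new String[] { m.group(dataGroup), m.group(targetGroup) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        Pattern regex = Pattern.compile("([A-Z]+) (.*)");
        List<String> lines = Arrays.asList(
                "BACKGROUND Then", "PERSON Mr.", "PERSON Smith", "BACKGROUND said");
        // dataGroup = 2 (the word), targetGroup = 1 (the label), as in the example above.
        for (String[] pair : split(regex, 2, 1, lines)) {
            System.out.println("data=" + pair[0] + " target=" + pair[1]);
        }
    }
}
```

Running main prints one line per token, starting with data=Then target=BACKGROUND. Swapping the two group arguments would swap the roles of word and label.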

Author:
Andrew McCallum
See Also:
Serialized Form

Constructor Summary
TokenSequenceMatchDataAndTarget(java.util.regex.Pattern regex, int dataGroup, int targetGroup)
TokenSequenceMatchDataAndTarget(java.lang.String regex, int dataGroup, int targetGroup)
Method Summary
 Instance pipe(Instance carrier)
          Really this should be 'protected', but isn't for historical reasons.
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public TokenSequenceMatchDataAndTarget(java.util.regex.Pattern regex,
                                       int dataGroup,
                                       int targetGroup)


public TokenSequenceMatchDataAndTarget(java.lang.String regex,
                                       int dataGroup,
                                       int targetGroup)
Method Detail


public Instance pipe(Instance carrier)
Description copied from class: Pipe
Really this should be 'protected', but isn't for historical reasons.

Overrides:
pipe in class Pipe