cc.mallet.pipe
Class TokenSequenceMatchDataAndTarget
java.lang.Object
cc.mallet.pipe.Pipe
cc.mallet.pipe.TokenSequenceMatchDataAndTarget
- All Implemented Interfaces:
- AlphabetCarrying, java.io.Serializable
public class TokenSequenceMatchDataAndTarget
- extends Pipe
- implements java.io.Serializable
Run a regular expression over the text of each token; replace the
text with the substring matching one regex group; create a target
TokenSequence from the text matching another regex group.
For example, suppose you have a data file with one line per token, and
the label appears on the same line as the token. You can first build a
TokenSequence in which each line's text is the Token.getText() of one
token, then run this pipe to separate the target information from the
data information. To process input such as the following,
BACKGROUND Then
PERSON Mr.
PERSON Smith
BACKGROUND said
...
use new TokenSequenceMatchDataAndTarget(Pattern.compile("([A-Z]+) (.*)"), 2, 1).
Here group 2 (the word) becomes the token text (data), and group 1 (the label) becomes the target.
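To make the effect on the example concrete, here is a minimal stand-alone sketch that applies the same regular expression to plain strings. It does not use MALLET: the class name `MatchDataAndTargetSketch`, the `split` helper, and the `Result` holder are hypothetical. Dropping tokens that fail to match is also an assumption, since this page does not say what the pipe does with them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch: plain strings stand in for MALLET's
// Token/TokenSequence classes. This is not the library's implementation.
final class MatchDataAndTargetSketch {

    // Holds the rewritten data texts and the extracted target texts.
    static final class Result {
        final List<String> data = new ArrayList<>();
        final List<String> target = new ArrayList<>();
    }

    // For each token text, keep the dataGroup substring as the new data text
    // and collect the targetGroup substring into a parallel target list.
    // Assumption: tokens that do not match are dropped from both lists.
    static Result split(List<String> tokens, Pattern regex,
                        int dataGroup, int targetGroup) {
        Result result = new Result();
        for (String text : tokens) {
            Matcher m = regex.matcher(text);
            if (m.matches()) {
                result.data.add(m.group(dataGroup));
                result.target.add(m.group(targetGroup));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("BACKGROUND Then", "PERSON Mr.",
                                     "PERSON Smith", "BACKGROUND said");
        // Same arguments as the example: data from group 2, target from group 1.
        Result r = split(lines, Pattern.compile("([A-Z]+) (.*)"), 2, 1);
        System.out.println(r.data);   // [Then, Mr., Smith, said]
        System.out.println(r.target); // [BACKGROUND, PERSON, PERSON, BACKGROUND]
    }
}
```

Keeping the data and target lists the same length mirrors the idea that each token gets exactly one label.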
- Author:
- Andrew McCallum mccallum@cs.umass.edu
- See Also:
- Serialized Form
Method Summary
- Instance pipe(Instance carrier): Really this should be 'protected', but isn't for historical reasons.
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TokenSequenceMatchDataAndTarget
public TokenSequenceMatchDataAndTarget(java.util.regex.Pattern regex,
int dataGroup,
int targetGroup)
TokenSequenceMatchDataAndTarget
public TokenSequenceMatchDataAndTarget(java.lang.String regex,
int dataGroup,
int targetGroup)
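Both constructors take `dataGroup` and `targetGroup` as capturing-group numbers. Assuming these follow standard `java.util.regex` numbering (group 0 is the whole match; groups 1 and up are counted by their opening parenthesis), the sketch below shows which text each number selects for the example pattern. The `GroupNumberingDemo` class and its `groupOf` helper are hypothetical illustrations, not part of MALLET.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing what a group number selects in a full match.
// Not part of MALLET.
final class GroupNumberingDemo {

    // Returns the text captured by the given group when the whole text matches.
    // Matcher.group(int) itself throws IndexOutOfBoundsException for a
    // group number the pattern does not define.
    static String groupOf(Pattern regex, String text, int group) {
        Matcher m = regex.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + text);
        }
        return m.group(group);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("([A-Z]+) (.*)");
        System.out.println(p.matcher("").groupCount());  // 2
        System.out.println(groupOf(p, "PERSON Mr.", 0)); // PERSON Mr.
        System.out.println(groupOf(p, "PERSON Mr.", 1)); // PERSON (targetGroup in the example)
        System.out.println(groupOf(p, "PERSON Mr.", 2)); // Mr. (dataGroup in the example)
    }
}
```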
pipe
public Instance pipe(Instance carrier)
- Description copied from class: Pipe
- Really this should be 'protected', but isn't for historical reasons.
- Overrides: pipe in class Pipe
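The class description says the regex is run over the text of each token, but it does not say whether the whole token must match. The hypothetical sketch below, using only `java.util.regex`, contrasts `Matcher.matches()` (the entire text must match) with `Matcher.find()` (any substring may match) on a malformed line. Which of the two `pipe` actually uses is an assumption to check against the MALLET source; the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch contrasting whole-token matching with substring search
// for the example regex. Not MALLET code.
final class MatchVersusFind {

    // True only if the entire token text matches the pattern.
    static boolean wholeTokenMatches(Pattern regex, String text) {
        return regex.matcher(text).matches();
    }

    // Returns the target group of the first matching substring, or null if none.
    static String findTarget(Pattern regex, String text, int targetGroup) {
        Matcher m = regex.matcher(text);
        return m.find() ? m.group(targetGroup) : null;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("([A-Z]+) (.*)");
        String odd = "xPERSON Mr.";
        System.out.println(wholeTokenMatches(p, odd)); // false
        System.out.println(findTarget(p, odd, 1));     // PERSON
    }
}
```

With whole-token matching, the stray lowercase prefix causes the line to be rejected; with substring search, the label is still extracted and the leading `x` is silently ignored.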