cc.mallet.pipe
Class SGML2TokenSequence
java.lang.Object
cc.mallet.pipe.Pipe
cc.mallet.pipe.SGML2TokenSequence
- All Implemented Interfaces:
- AlphabetCarrying, java.io.Serializable
public class SGML2TokenSequence
- extends Pipe
- implements java.io.Serializable
Converts a string containing simple SGML tags into a data TokenSequence of words,
paired with a target TokenSequence containing the SGML tag in effect for each word.
It does not handle nested SGML tags, nor does it handle malformed SGML gracefully.
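The pairing described above can be illustrated with a small plain-Java sketch. It is not MALLET's implementation and does not use the MALLET API: the class name SgmlPairingSketch, the method pair, the tag regex, and the background tag "O" are all assumptions chosen for illustration. It only shows the idea that each word is emitted together with the tag in effect at that point, and that words outside any tag receive the background tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a plain-Java sketch of the word/tag pairing idea.
// The names here are hypothetical and are not part of cc.mallet.pipe.
final class SgmlPairingSketch {

    // Matches an opening tag, a closing tag, or a "word"
    // (a run of characters that are neither whitespace nor '<').
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)>|[^\\s<]+");

    // Returns {word, tag} pairs; words outside any tag get backgroundTag.
    static List<String[]> pair(String text, String backgroundTag) {
        List<String[]> pairs = new ArrayList<>();
        String current = backgroundTag;
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(2) != null) {
                // A tag: "<x>" puts tag x in effect; "</x>" falls back to
                // the background tag. Nesting is deliberately not tracked.
                current = m.group(1).isEmpty() ? m.group(2) : backgroundTag;
            } else {
                // A word: record it with the tag currently in effect.
                pairs.add(new String[] { m.group(), current });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "Call <person>Andrew McCallum</person> today";
        for (String[] p : pair(text, "O")) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

For the sample input in main, the sketch pairs "Call" and "today" with the background tag "O", and "Andrew" and "McCallum" with "person". Because nesting is not tracked, any closing tag simply returns to the background tag, which mirrors the limitation stated above.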
- Author:
- Andrew McCallum mccallum@cs.umass.edu
- See Also:
- Serialized Form
Method Summary
- static void main(java.lang.String[] args)
- Instance pipe(Instance carrier)
  Really this should be 'protected', but isn't for historical reasons.
Methods inherited from class cc.mallet.pipe.Pipe
- alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SGML2TokenSequence
public SGML2TokenSequence(CharSequenceLexer lexer, java.lang.String backgroundTag, boolean saveSource)
SGML2TokenSequence
public SGML2TokenSequence(CharSequenceLexer lexer, java.lang.String backgroundTag)
SGML2TokenSequence
public SGML2TokenSequence(java.lang.String regex, java.lang.String backgroundTag)
SGML2TokenSequence
public SGML2TokenSequence()
pipe
public Instance pipe(Instance carrier)
- Description copied from class: Pipe
- Really this should be 'protected', but isn't for historical reasons.
- Overrides:
- pipe in class Pipe
main
public static void main(java.lang.String[] args)