TokenSequenceParseFeatureString (Mallet 2 API)

cc.mallet.pipe
Class TokenSequenceParseFeatureString

java.lang.Object
  cc.mallet.pipe.Pipe
      cc.mallet.pipe.TokenSequenceParseFeatureString

All Implemented Interfaces: AlphabetCarrying, java.io.Serializable

public class TokenSequenceParseFeatureString
extends Pipe
implements java.io.Serializable

Converts the text of each token (the field Token.text) into a space-delimited list of strings and adds each string to the token as a feature. If realValued is true, the position in the list is used as the feature name and the string is parsed as a double value. Otherwise, the feature name is the string itself and the value is 1.0.

Feature names and values can also be specified explicitly, e.g. featureName1=featureValue1 featureName2=featureValue2 ... The name/value separator (here '=') can be specified.

If your data consists of feature/value pairs (e.g. height=10.7 width=3.6 length=1.7), use new TokenSequenceParseFeatureString(true, true, "="). This format is typically used for sparse data, in which most features are equal to 0 in any given instance.

If your data consists only of values, and the position of each value determines which feature it belongs to (e.g. 10.7 3.6 1.7), use new TokenSequenceParseFeatureString(true). This format is typically used for data with a small number of features, most of which are non-zero most of the time.

If your data is in the form of named binary indicator variables (e.g. yellow quacks has_webbed_feet), use new TokenSequenceParseFeatureString(false). Each string is interpreted as the name of a feature whose value is 1.0.
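
The three input formats described above can be illustrated with a small plain-Java sketch. This is not Mallet's implementation and uses no Mallet classes: the class FeatureStringSketch and its parse method are hypothetical names introduced here only for illustration. Using the position index as the feature name, and rejecting a field that lacks the separator, are assumptions of this sketch rather than documented behaviour of the pipe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a stand-alone model of the three documented input
// formats. It is NOT Mallet code and does not reproduce Mallet's internals.
final class FeatureStringSketch {

    // Hypothetical helper. The flags mirror the constructor arguments
    // (realValued, specifyFeatureNames, nameValueSeparator) named in the text.
    static Map<String, Double> parse(String text, boolean realValued,
                                     boolean specifyFeatureNames, String separator) {
        Map<String, Double> features = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return features;
        }
        String[] fields = trimmed.split("\\s+"); // space-delimited list
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i];
            if (!realValued) {
                // Binary indicator: the string is the feature name, value 1.0.
                // (Assumption: specifyFeatureNames is ignored in this mode.)
                features.put(field, 1.0);
            } else if (specifyFeatureNames) {
                // name<separator>value pair, e.g. height=10.7
                int cut = field.indexOf(separator);
                if (cut < 0) {
                    // Assumption of this sketch: a missing separator is an error.
                    throw new IllegalArgumentException("No separator in: " + field);
                }
                String name = field.substring(0, cut);
                String value = field.substring(cut + separator.length());
                features.put(name, Double.parseDouble(value));
            } else {
                // Positional value: the index stands in for the feature name.
                features.put(String.valueOf(i), Double.parseDouble(field));
            }
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(parse("height=10.7 width=3.6 length=1.7", true, true, "="));
        System.out.println(parse("10.7 3.6 1.7", true, false, "="));
        System.out.println(parse("yellow quacks has_webbed_feet", false, false, "="));
    }
}
```

The name/value form stores only the features that are listed, which is why it suits sparse data. The positional form needs every value to be present in a fixed order.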

Author: Aron Culotta culotta@cs.umass.edu
See Also: Serialized Form

Constructor Summary
`TokenSequenceParseFeatureString(boolean _realValued)`
`TokenSequenceParseFeatureString(boolean _realValued, boolean _specifyFeatureNames)`
`TokenSequenceParseFeatureString(boolean _realValued, boolean _specifyFeatureNames, java.lang.String _nameValueSeparator)`

Method Summary
`Instance`	`pipe(Instance carrier)` Really this should be 'protected', but isn't for historical reasons.

Methods inherited from class cc.mallet.pipe.Pipe
`alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail