FeatureDocFreqPipe (Mallet 2 API)

cc.mallet.pipe
Class FeatureDocFreqPipe

java.lang.Object
  cc.mallet.pipe.Pipe
      cc.mallet.pipe.FeatureDocFreqPipe

All Implemented Interfaces: AlphabetCarrying, java.io.Serializable

public class FeatureDocFreqPipe
extends Pipe

Pruning low-count features can be a good way to save memory and computation. Doing so with Vectors2Vectors, however, requires you to write out the unpruned instance list, read it back into memory, collect statistics, create new instances, and then write everything back out.

This class supports a simpler method that makes two passes over the data: one to collect statistics and create an augmented "stop list", and a second to actually create instances.
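The two-pass idea can be illustrated with a minimal, self-contained sketch in plain Java. It does not use Mallet and is not this class's implementation: the class `DocFreqPruningSketch` and its methods are hypothetical names chosen for illustration. It also assumes that the cutoff is a fraction of documents and that words occurring in more than that fraction are added to the stop list, which this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of two-pass pruning; not part of Mallet.
final class DocFreqPruningSketch {

    // Pass 1: count, for each word, how many documents contain it.
    static Map<String, Integer> countDocFrequencies(List<List<String>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Deduplicate so each word is counted once per document.
            for (String word : new HashSet<>(doc)) {
                docFreq.merge(word, 1, Integer::sum);
            }
        }
        return docFreq;
    }

    // Build the augmented stop list. ASSUMPTION: a word is pruned when the
    // fraction of documents containing it exceeds docFrequencyCutoff.
    static Set<String> buildStoplist(Map<String, Integer> docFreq,
                                     int numDocs,
                                     double docFrequencyCutoff) {
        Set<String> stoplist = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if ((double) entry.getValue() / numDocs > docFrequencyCutoff) {
                stoplist.add(entry.getKey());
            }
        }
        return stoplist;
    }

    // Pass 2: re-read the documents and drop every word on the stop list.
    static List<List<String>> prune(List<List<String>> docs, Set<String> stoplist) {
        List<List<String>> pruned = new ArrayList<>();
        for (List<String> doc : docs) {
            List<String> kept = new ArrayList<>();
            for (String word : doc) {
                if (!stoplist.contains(word)) {
                    kept.add(word);
                }
            }
            pruned.add(kept);
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("the", "cat", "sat"),
            Arrays.asList("the", "dog", "ran"),
            Arrays.asList("the", "cat", "ran"));

        Map<String, Integer> docFreq = countDocFrequencies(docs);
        Set<String> stoplist = buildStoplist(docFreq, docs.size(), 0.9);

        System.out.println(stoplist);             // prints [the]
        System.out.println(prune(docs, stoplist)); // prints [[cat, sat], [dog, ran], [cat, ran]]
    }
}
```

In this sketch the first pass keeps only per-word counts in memory, so the unpruned instances never need to be written out and read back.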

See Also: Serialized Form

Constructor Summary
`FeatureDocFreqPipe()`
`FeatureDocFreqPipe(Alphabet dataAlphabet, Alphabet targetAlphabet)`

Method Summary
`void`	`addPrunedWordsToStoplist(SimpleTokenizer tokenizer, double docFrequencyCutoff)` Add all pruned words to the internal stoplist of a SimpleTokenizer.
`Instance`	`pipe(Instance instance)` Really this should be 'protected', but isn't for historical reasons.

Methods inherited from class cc.mallet.pipe.Pipe
`alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail