java.lang.Object
  cc.mallet.pipe.Pipe
    cc.mallet.pipe.FeatureCountPipe
public class FeatureCountPipe extends Pipe
Pruning low-count features can be a good way to save memory and computation. However, pruning with Vectors2Vectors requires you to write out the unpruned instance list, read it back into memory, collect statistics, create new instances, and then write everything back out.
This class supports a simpler method that makes two passes over the data: one to collect statistics and create an augmented "stop list", and a second to actually create instances.
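The two-pass idea can be illustrated without MALLET. The following is a minimal sketch in plain Java, not the FeatureCountPipe implementation: the class name TwoPassPruningSketch and its helper methods are hypothetical, documents are assumed to be already tokenized into lists of strings, and counting total occurrences (rather than, say, document frequency) is an assumption. The first pass counts each word and puts every word below the cutoff on a stop list; the second pass rebuilds each document while skipping stop-listed words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of two-pass pruning; not part of MALLET.
public class TwoPassPruningSketch {

    // Pass 1, step 1: count the total number of occurrences of each word.
    static Map<String, Integer> countWords(List<List<String>> documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> document : documents) {
            for (String word : document) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pass 1, step 2: words occurring fewer than minimumCount times go on the stop list.
    // Words whose count is exactly minimumCount are kept ("at or above").
    static Set<String> buildStopList(Map<String, Integer> counts, int minimumCount) {
        Set<String> stopList = new HashSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() < minimumCount) {
                stopList.add(entry.getKey());
            }
        }
        return stopList;
    }

    // Pass 2: rebuild each document, dropping stop-listed words.
    static List<List<String>> prune(List<List<String>> documents, Set<String> stopList) {
        List<List<String>> pruned = new ArrayList<>();
        for (List<String> document : documents) {
            List<String> kept = new ArrayList<>();
            for (String word : document) {
                if (!stopList.contains(word)) {
                    kept.add(word);
                }
            }
            pruned.add(kept);
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<List<String>> documents = List.of(
            List.of("the", "cat", "sat"),
            List.of("the", "dog", "sat"),
            List.of("the", "bird"));
        Set<String> stopList = buildStopList(countWords(documents), 2);
        // Prints [[the, sat], [the, sat], [the]]
        System.out.println(prune(documents, stopList));
    }
}
```

Because the stop list is built before any instance is created, the unpruned instances never need to be written to disk or held in memory.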
Constructor Summary |
---|
FeatureCountPipe() |
FeatureCountPipe(Alphabet dataAlphabet, Alphabet targetAlphabet) |
Method Summary | | |
---|---|---|
void | addPrunedWordsToStoplist(SimpleTokenizer tokenizer, int minimumCount) | Add all pruned words to the internal stoplist of a SimpleTokenizer. |
Alphabet | getPrunedAlphabet(int minimumCount) | Returns a new alphabet that contains only features at or above the specified limit. |
Instance | pipe(Instance instance) | Really this should be 'protected', but isn't for historical reasons. |
void | writeCommonWords(java.io.File commonFile, int totalWords) | List the most common words, for addition to a stop file. |
void | writePrunedWords(java.io.File prunedFile, int minimumCount) | Writes a list of features that do not occur at or above the specified cutoff to the pruned file, one per line. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public FeatureCountPipe()
public FeatureCountPipe(Alphabet dataAlphabet, Alphabet targetAlphabet)
Method Detail |
---|
public Instance pipe(Instance instance)
Overrides: pipe in class Pipe
public Alphabet getPrunedAlphabet(int minimumCount)
public void writePrunedWords(java.io.File prunedFile, int minimumCount) throws java.io.IOException
Throws: java.io.IOException
public void addPrunedWordsToStoplist(SimpleTokenizer tokenizer, int minimumCount)
public void writeCommonWords(java.io.File commonFile, int totalWords) throws java.io.IOException
Throws: java.io.IOException
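Listing the most common words amounts to sorting words by count and keeping the first totalWords entries. The sketch below shows that selection in plain Java. It is an illustration under stated assumptions rather than MALLET's code: the class CommonWordsSketch and the method mostCommon are hypothetical, counts are supplied as a map, ties are broken alphabetically (an arbitrary choice made for determinism), and the result is returned as a list instead of being written to a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of picking the most frequent words; not part of MALLET.
public class CommonWordsSketch {

    // Returns up to totalWords words, most frequent first; ties are ordered alphabetically.
    static List<String> mostCommon(Map<String, Integer> counts, int totalWords) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        int limit = Math.min(totalWords, entries.size());
        for (int i = 0; i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("the", 3, "sat", 2, "cat", 1, "dog", 1);
        // Prints [the, sat]
        System.out.println(mostCommon(counts, 2));
    }
}
```

The returned words could then be appended to a stop file, one per line, in the same way pruned words are written.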