cc.mallet.topics.tui
Class DMRLoader
java.lang.Object
cc.mallet.topics.tui.DMRLoader
- All Implemented Interfaces:
- java.io.Serializable
public class DMRLoader
- extends java.lang.Object
- implements java.io.Serializable
This class loads data into the format used by MALLET's
Dirichlet-multinomial regression (DMR) topic model. DMR topic models
learn topic assignments conditioned on observed features.
The input consists of two files, one for text and
the other for features. The "text" file contains one document
per line; this class tokenizes the text and removes stopwords.
The "features" file contains whitespace-delimited features in this format:
blue heavy width=12.08
Features without explicit values ("blue" and "heavy" in the example) are set to 1.0.
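The feature-line format described above can be illustrated with a small parser. This is a minimal sketch using only the JDK, not MALLET's own parsing code; the class name `FeatureLineSketch` and the method `parseFeatures` are hypothetical names chosen for this example. It applies the two rules stated above: tokens are whitespace-delimited, and a token without an explicit `=value` gets the value 1.0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: not part of MALLET. Names here are hypothetical.
class FeatureLineSketch {

    // Parse one whitespace-delimited features line into name -> value pairs.
    // A token of the form name=value uses the given value; a bare token gets 1.0.
    static Map<String, Double> parseFeatures(String line) {
        Map<String, Double> features = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return features; // no features on an empty line
        }
        for (String token : trimmed.split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq < 0) {
                // Bare feature such as "blue": implicit value 1.0
                features.put(token, 1.0);
            } else {
                // Valued feature such as "width=12.08"
                String name = token.substring(0, eq);
                double value = Double.parseDouble(token.substring(eq + 1));
                features.put(name, value);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // prints {blue=1.0, heavy=1.0, width=12.08}
        System.out.println(parseFeatures("blue heavy width=12.08"));
    }
}
```

The sketch keeps insertion order with a `LinkedHashMap` so the printed result mirrors the input line. How MALLET itself stores these values is not described on this page.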
- See Also:
- Serialized Form
Method Summary
void load(java.io.File wordsFile, java.io.File featuresFile, java.io.File instancesFile)
static void main(java.lang.String[] args)
static java.io.BufferedReader openReader(java.io.File file)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DMRLoader
public DMRLoader()
openReader
public static java.io.BufferedReader openReader(java.io.File file)
throws java.io.IOException
- Throws:
java.io.IOException
load
public void load(java.io.File wordsFile,
java.io.File featuresFile,
java.io.File instancesFile)
throws java.io.IOException,
java.io.FileNotFoundException
- Throws:
java.io.IOException
java.io.FileNotFoundException
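This page documents only the signature of load. Going by the class description, the first two arguments are the "text" file and the "features" file. The sketch below prepares such a pair of inputs with plain JDK calls. The class name `DmrInputSketch`, the method `writeInputs`, the file names, and the sample contents are all made up for illustration. The assumption that line i of the features file describes line i of the text file is inferred from the description, not stated in it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Illustrative only: prepares input files; it does not call MALLET.
class DmrInputSketch {

    // Write the document texts and their feature lines to two files,
    // refusing to proceed if the two lists differ in length
    // (assumption: one feature line per document).
    static void writeInputs(Path wordsFile, Path featuresFile,
                            List<String> texts, List<String> features) throws IOException {
        if (texts.size() != features.size()) {
            throw new IllegalArgumentException("need exactly one feature line per document");
        }
        Files.write(wordsFile, texts, StandardCharsets.UTF_8);
        Files.write(featuresFile, features, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dmr-sketch");
        Path words = dir.resolve("texts.txt");
        Path feats = dir.resolve("features.txt");
        writeInputs(words, feats,
                Arrays.asList("the blue whale is a heavy animal", "a small red bird"),
                Arrays.asList("blue heavy width=12.08", "red width=0.2"));
        // prints "2 documents, 2 feature lines"
        System.out.println(Files.readAllLines(words).size() + " documents, "
                + Files.readAllLines(feats).size() + " feature lines");
    }
}
```

With MALLET on the classpath, the two files could then be passed to the method documented here, for example `new DMRLoader().load(words.toFile(), feats.toFile(), new java.io.File("instances.mallet"))`. The third argument is presumably the output file for the loaded instances, as its parameter name suggests; the output file name used here is hypothetical.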
main
public static void main(java.lang.String[] args)
throws java.io.FileNotFoundException,
java.io.IOException
- Throws:
java.io.FileNotFoundException
java.io.IOException