cc.mallet.topics.tui
Class DMRLoader

java.lang.Object
  extended by cc.mallet.topics.tui.DMRLoader
All Implemented Interfaces:
java.io.Serializable

public class DMRLoader
extends java.lang.Object
implements java.io.Serializable

This class loads data into the format used by the MALLET Dirichlet-multinomial regression (DMR) topic model. DMR topic models learn topic assignments conditioned on observed features.

The input consists of two files: one for text and one for features. The "text" file contains one document per line. This class tokenizes the text and removes stopwords.

The "features" file contains whitespace-delimited features in this format:

    blue heavy width=12.08

Features without explicit values ("blue" and "heavy" in the example) are set to 1.0.
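
The feature-line convention described above can be illustrated with a small, self-contained sketch. This is not MALLET's parsing code: the class name FeatureLineSketch and its parse method are hypothetical, and the sketch assumes that every explicit value is a valid decimal number and that a later duplicate name overwrites an earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; FeatureLineSketch is a hypothetical name,
// not part of MALLET.
final class FeatureLineSketch {

    // Parse one line of a "features" file into name -> value pairs,
    // keeping the order in which the features appear.
    static Map<String, Double> parse(String line) {
        Map<String, Double> features = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return features; // no features on this line
        }
        for (String token : trimmed.split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq < 0) {
                // Bare feature name: the value defaults to 1.0.
                features.put(token, 1.0);
            } else {
                // name=value form; assumes the value parses as a double.
                features.put(token.substring(0, eq),
                             Double.parseDouble(token.substring(eq + 1)));
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // prints {blue=1.0, heavy=1.0, width=12.08}
        System.out.println(parse("blue heavy width=12.08"));
    }
}
```

Splitting on runs of whitespace rather than single spaces keeps the sketch tolerant of tabs and repeated spaces between features.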

See Also:
Serialized Form

Constructor Summary
DMRLoader()
           
 
Method Summary
 void load(java.io.File wordsFile, java.io.File featuresFile, java.io.File instancesFile)
           
static void main(java.lang.String[] args)
           
static java.io.BufferedReader openReader(java.io.File file)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DMRLoader

public DMRLoader()

Method Detail

openReader

public static java.io.BufferedReader openReader(java.io.File file)
                                         throws java.io.IOException
Throws:
java.io.IOException

load

public void load(java.io.File wordsFile,
                 java.io.File featuresFile,
                 java.io.File instancesFile)
          throws java.io.IOException,
                 java.io.FileNotFoundException
Throws:
java.io.IOException
java.io.FileNotFoundException
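
A caller might prepare the two input files before invoking load, as in the following sketch. It rests on assumptions that this page does not confirm: that line N of the features file describes the document on line N of the text file, and that instancesFile is where the loaded instances are written. The class DmrInputSketch, its sameLineCount helper, and the file names are hypothetical. The load call is shown only as a comment so that the sketch runs without MALLET on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Illustrative sketch only; DmrInputSketch is a hypothetical name,
// not part of MALLET.
final class DmrInputSketch {

    // Returns true when both files contain the same number of lines.
    // Assumption: features are matched to documents by line position.
    static boolean sameLineCount(Path wordsFile, Path featuresFile) throws IOException {
        int words = Files.readAllLines(wordsFile, StandardCharsets.UTF_8).size();
        int features = Files.readAllLines(featuresFile, StandardCharsets.UTF_8).size();
        return words == features;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dmr-sketch");
        Path words = dir.resolve("texts.txt");
        Path features = dir.resolve("features.txt");

        // One document per line in the text file.
        Files.write(words,
                Arrays.asList("the sky was clear today", "a large crate arrived"),
                StandardCharsets.UTF_8);
        // One line of whitespace-delimited features per document.
        Files.write(features,
                Arrays.asList("blue", "heavy width=12.08"),
                StandardCharsets.UTF_8);

        if (!sameLineCount(words, features)) {
            throw new IllegalStateException("text and features files differ in length");
        }

        // With MALLET on the classpath, the files could then be passed on:
        // new cc.mallet.topics.tui.DMRLoader().load(
        //         words.toFile(), features.toFile(),
        //         dir.resolve("dmr.instances").toFile());
    }
}
```

Checking the line counts up front is a design choice of this sketch, not documented behavior of DMRLoader; it catches misaligned inputs before any loading begins.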

main

public static void main(java.lang.String[] args)
                 throws java.io.FileNotFoundException,
                        java.io.IOException
Throws:
java.io.FileNotFoundException
java.io.IOException