cc.mallet.topics
Class SimpleLDA
java.lang.Object
cc.mallet.topics.SimpleLDA
- All Implemented Interfaces:
- java.io.Serializable
public class SimpleLDA
- extends java.lang.Object
- implements java.io.Serializable
A simple implementation of Latent Dirichlet Allocation using Gibbs sampling.
This code is slower than the regular Mallet LDA implementation, but provides a
better starting place for understanding how sampling works and for
building new topic models.
- Author:
- David Mimno, Andrew McCallum
- See Also:
- Serialized Form
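To make "how sampling works" concrete, here is a minimal collapsed Gibbs sampler in plain Java. It is an illustrative sketch, not this class's source. It borrows the field names listed below (typeTopicCounts, tokensPerTopic, oneDocTopicCounts, alpha, beta, betaSum), but the class name TinyGibbsLda, the int[][] document representation, and the assumptions that alpha = alphaSum / numTopics and betaSum = beta * numTypes are hypothetical choices made for the example.

```java
import java.util.Random;

/** Illustrative collapsed Gibbs sampler for LDA (hypothetical, not MALLET code). */
final class TinyGibbsLda {
    final int numTopics;
    final int numTypes;
    final double alpha;              // assumed: alphaSum / numTopics
    final double beta;
    final double betaSum;            // assumed: beta * numTypes
    final int[][] typeTopicCounts;   // indexed [type][topic]
    final int[] tokensPerTopic;      // indexed [topic]
    final Random random;

    TinyGibbsLda(int numTopics, int numTypes, double alphaSum, double beta, long seed) {
        this.numTopics = numTopics;
        this.numTypes = numTypes;
        this.alpha = alphaSum / numTopics;
        this.beta = beta;
        this.betaSum = beta * numTypes;
        this.typeTopicCounts = new int[numTypes][numTopics];
        this.tokensPerTopic = new int[numTopics];
        this.random = new Random(seed);
    }

    /** Assigns a uniformly random topic to every token and records the counts. */
    int[][] initialize(int[][] docs) {
        int[][] topics = new int[docs.length][];
        for (int d = 0; d < docs.length; d++) {
            topics[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int t = random.nextInt(numTopics);
                topics[d][i] = t;
                typeTopicCounts[docs[d][i]][t]++;
                tokensPerTopic[t]++;
            }
        }
        return topics;
    }

    /** One Gibbs sweep over a single document; topic assignments are updated in place. */
    void sampleTopicsForOneDoc(int[] tokens, int[] topics) {
        int[] oneDocTopicCounts = new int[numTopics];
        for (int t : topics) {
            oneDocTopicCounts[t]++;
        }
        double[] scores = new double[numTopics];
        for (int i = 0; i < tokens.length; i++) {
            int type = tokens[i];
            int oldTopic = topics[i];
            // Remove the current token from every count before resampling it.
            oneDocTopicCounts[oldTopic]--;
            typeTopicCounts[type][oldTopic]--;
            tokensPerTopic[oldTopic]--;

            // Unnormalized conditional probability of each topic for this token.
            double sum = 0.0;
            for (int t = 0; t < numTopics; t++) {
                scores[t] = (alpha + oneDocTopicCounts[t])
                        * (beta + typeTopicCounts[type][t])
                        / (betaSum + tokensPerTopic[t]);
                sum += scores[t];
            }

            // Draw a topic in proportion to its score.
            double u = random.nextDouble() * sum;
            int newTopic = 0;
            while (newTopic < numTopics - 1 && u >= scores[newTopic]) {
                u -= scores[newTopic];
                newTopic++;
            }

            // Put the token back under its new topic.
            topics[i] = newTopic;
            oneDocTopicCounts[newTopic]++;
            typeTopicCounts[type][newTopic]++;
            tokensPerTopic[newTopic]++;
        }
    }

    /** Runs the given number of sweeps over all documents. */
    void sample(int[][] docs, int[][] topics, int iterations) {
        for (int it = 0; it < iterations; it++) {
            for (int d = 0; d < docs.length; d++) {
                sampleTopicsForOneDoc(docs[d], topics[d]);
            }
        }
    }
}
```

In this sketch each document is an array of type indices. Call initialize once, then sample with the desired number of iterations. The counts stay consistent throughout: every tokensPerTopic entry equals the column sum of typeTopicCounts for that topic.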
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
data
protected java.util.ArrayList<TopicAssignment> data
alphabet
protected Alphabet alphabet
topicAlphabet
protected LabelAlphabet topicAlphabet
numTopics
protected int numTopics
numTypes
protected int numTypes
alpha
protected double alpha
alphaSum
protected double alphaSum
beta
protected double beta
betaSum
protected double betaSum
DEFAULT_BETA
public static final double DEFAULT_BETA
- See Also:
- Constant Field Values
oneDocTopicCounts
protected int[] oneDocTopicCounts
typeTopicCounts
protected int[][] typeTopicCounts
tokensPerTopic
protected int[] tokensPerTopic
showTopicsInterval
public int showTopicsInterval
wordsPerTopic
public int wordsPerTopic
random
protected Randoms random
formatter
protected java.text.NumberFormat formatter
printLogLikelihood
protected boolean printLogLikelihood
SimpleLDA
public SimpleLDA(int numberOfTopics)
SimpleLDA
public SimpleLDA(int numberOfTopics,
double alphaSum,
double beta)
SimpleLDA
public SimpleLDA(int numberOfTopics,
double alphaSum,
double beta,
Randoms random)
SimpleLDA
public SimpleLDA(LabelAlphabet topicAlphabet,
double alphaSum,
double beta,
Randoms random)
getAlphabet
public Alphabet getAlphabet()
getTopicAlphabet
public LabelAlphabet getTopicAlphabet()
getNumTopics
public int getNumTopics()
getData
public java.util.ArrayList<TopicAssignment> getData()
setTopicDisplay
public void setTopicDisplay(int interval,
int n)
setRandomSeed
public void setRandomSeed(int seed)
getTypeTopicCounts
public int[][] getTypeTopicCounts()
getTopicTotals
public int[] getTopicTotals()
addInstances
public void addInstances(InstanceList training)
sample
public void sample(int iterations)
throws java.io.IOException
- Throws:
java.io.IOException
sampleTopicsForOneDoc
protected void sampleTopicsForOneDoc(FeatureSequence tokenSequence,
FeatureSequence topicSequence)
modelLogLikelihood
public double modelLogLikelihood()
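The signature does not say which quantity is returned. For a collapsed Gibbs sampler with symmetric priors, a natural candidate is the joint log probability of the words and their topic assignments, given here as an assumption rather than a statement of what this method computes. In the formula, K stands for numTopics, V for numTypes, N_d for the length of document d, n_{dt} for the number of tokens in document d assigned to topic t, n_{wt} for typeTopicCounts, and n_t for tokensPerTopic. Under the same assumption, K\alpha corresponds to alphaSum and V\beta to betaSum.

```latex
\log p(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)
  = \sum_{d} \Bigl[ \log\Gamma(K\alpha) - \log\Gamma(K\alpha + N_d)
      + \sum_{t=1}^{K} \bigl( \log\Gamma(\alpha + n_{dt}) - \log\Gamma(\alpha) \bigr) \Bigr]
  + \sum_{t=1}^{K} \Bigl[ \log\Gamma(V\beta) - \log\Gamma(V\beta + n_{t})
      + \sum_{w=1}^{V} \bigl( \log\Gamma(\beta + n_{wt}) - \log\Gamma(\beta) \bigr) \Bigr]
```

The first sum is a Dirichlet-multinomial term for the topics in each document, and the second is a Dirichlet-multinomial term for the words in each topic.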
topWords
public java.lang.String topWords(int numWords)
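As a rough illustration of what a top-words report involves, the following sketch ranks vocabulary entries per topic by their counts. It is an assumption about the general technique, not this method's implementation. The class TopWordsSketch, the String[] vocabulary argument, the tie-breaking rule, and the choice to skip zero-count words are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: ranks words per topic from a [type][topic] count matrix. */
final class TopWordsSketch {
    static List<List<String>> topWords(int[][] typeTopicCounts, String[] vocabulary,
                                       int numTopics, int numWords) {
        List<List<String>> result = new ArrayList<>();
        for (int topic = 0; topic < numTopics; topic++) {
            final int t = topic;
            // Collect the types that occur at least once in this topic.
            List<Integer> types = new ArrayList<>();
            for (int type = 0; type < typeTopicCounts.length; type++) {
                if (typeTopicCounts[type][t] > 0) {
                    types.add(type);
                }
            }
            // Highest count first; ties broken by lower type index.
            types.sort((a, b) -> {
                int byCount = Integer.compare(typeTopicCounts[b][t], typeTopicCounts[a][t]);
                return byCount != 0 ? byCount : Integer.compare(a, b);
            });
            List<String> words = new ArrayList<>();
            for (int i = 0; i < Math.min(numWords, types.size()); i++) {
                words.add(vocabulary[types.get(i)]);
            }
            result.add(words);
        }
        return result;
    }
}
```

The returned list has one entry per topic, each holding at most numWords words.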
printDocumentTopics
public void printDocumentTopics(java.io.File file,
double threshold,
int max)
throws java.io.IOException
- Parameters:
- file - The filename to print to
- threshold - Only print topics with proportion greater than this number
- max - Print no more than this many topics
- Throws:
java.io.IOException
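The interaction of threshold and max can be sketched as follows. This is an illustration of the parameter semantics described above, not the method's source. The class DocTopicFilterSketch is hypothetical, and computing each proportion as a raw count divided by document length is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks the topics that would be reported for one document. */
final class DocTopicFilterSketch {
    /**
     * Returns topic indices in descending order of proportion, keeping only those
     * with proportion strictly greater than threshold, and at most max of them.
     */
    static List<Integer> selectTopics(int[] topicCounts, double threshold, int max) {
        int total = 0;
        for (int c : topicCounts) {
            total += c;
        }
        List<Integer> selected = new ArrayList<>();
        if (total == 0) {
            return selected; // empty document: nothing to report
        }
        List<Integer> order = new ArrayList<>();
        for (int t = 0; t < topicCounts.length; t++) {
            order.add(t);
        }
        // Largest count first; ties broken by lower topic index.
        order.sort((a, b) -> {
            int byCount = Integer.compare(topicCounts[b], topicCounts[a]);
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        for (int t : order) {
            double proportion = (double) topicCounts[t] / total;
            // Topics are sorted, so the first failure ends the scan.
            if (selected.size() >= max || proportion <= threshold) {
                break;
            }
            selected.add(t);
        }
        return selected;
    }
}
```

With counts {6, 1, 3, 0}, a threshold of 0.05, and a max of 2, the sketch selects topics 0 and 2. Raising max to 10 adds topic 1, and topic 3 is never selected because its proportion is zero.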
printState
public void printState(java.io.File f)
throws java.io.IOException
- Throws:
java.io.IOException
printState
public void printState(java.io.PrintStream out)
write
public void write(java.io.File f)
main
public static void main(java.lang.String[] args)
throws java.io.IOException
- Throws:
java.io.IOException