cc.mallet.topics
Class TopicInferencer

java.lang.Object
  extended by cc.mallet.topics.TopicInferencer
All Implemented Interfaces:
java.io.Serializable

public class TopicInferencer
extends java.lang.Object
implements java.io.Serializable

See Also:
Serialized Form

Field Summary
protected  double[] alpha
           
protected  double beta
           
protected  double betaSum
           
protected  int numTopics
           
protected  int numTypes
           
protected  Randoms random
           
protected  int[] tokensPerTopic
           
protected  int topicBits
           
protected  int topicMask
           
protected  int[][] typeTopicCounts
           
 
Constructor Summary
TopicInferencer(int[][] typeTopicCounts, int[] tokensPerTopic, Alphabet alphabet, double[] alpha, double beta, double betaSum)
           
 
Method Summary
 double[] getSampledDistribution(Instance instance, int numIterations, int thinning, int burnIn)
          Use Gibbs sampling to infer a topic distribution.
static TopicInferencer read(java.io.File f)
           
 void setRandomSeed(int seed)
           
 void writeInferredDistributions(InstanceList instances, java.io.File distributionsFile, int numIterations, int thinning, int burnIn, double threshold, int max)
          Infer topics for the provided instances and write distributions to the provided file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

numTopics

protected int numTopics

topicMask

protected int topicMask

topicBits

protected int topicBits

numTypes

protected int numTypes

alpha

protected double[] alpha

beta

protected double beta

betaSum

protected double betaSum

typeTopicCounts

protected int[][] typeTopicCounts

tokensPerTopic

protected int[] tokensPerTopic

random

protected Randoms random
Constructor Detail

TopicInferencer

public TopicInferencer(int[][] typeTopicCounts,
                       int[] tokensPerTopic,
                       Alphabet alphabet,
                       double[] alpha,
                       double beta,
                       double betaSum)
Method Detail

setRandomSeed

public void setRandomSeed(int seed)

getSampledDistribution

public double[] getSampledDistribution(Instance instance,
                                       int numIterations,
                                       int thinning,
                                       int burnIn)
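Use Gibbs sampling to infer a topic distribution for a single instance. Each token is initialized to its most probable topic (or one of them, if several are tied). With zero iterations, the method returns exactly the distribution given by this initial assignment.

This method does not adjust the type-topic counts during sampling: P(w|t) is clamped to the values the inferencer was constructed with.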
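
The sampling described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not MALLET's implementation: it uses dense arrays rather than whatever internal representation the topicMask and topicBits fields support, it takes a plain array of type indices instead of an Instance, and it assumes a schedule in which one sample is saved every thinning iterations once burnIn iterations have passed. The names ClampedGibbsSketch, wordWeight, and infer are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; not part of MALLET.
final class ClampedGibbsSketch {

    // Weight proportional to P(w|t). It depends only on the trained counts,
    // so it stays fixed (clamped) for the whole inference run.
    static double wordWeight(int[][] typeTopicCounts, int[] tokensPerTopic,
                             double beta, double betaSum, int type, int topic) {
        return (beta + typeTopicCounts[type][topic]) / (betaSum + tokensPerTopic[topic]);
    }

    // tokens holds one type index per token. thinning must be at least 1.
    static double[] infer(int[] tokens, int[][] typeTopicCounts, int[] tokensPerTopic,
                          double[] alpha, double beta, double betaSum,
                          int numIterations, int thinning, int burnIn, long seed) {
        int numTopics = alpha.length;
        Random random = new Random(seed);
        int[] topics = new int[tokens.length];
        int[] localCounts = new int[numTopics];
        double[][] weights = new double[tokens.length][numTopics];

        // Precompute the clamped weights and start each token at a most probable topic.
        for (int i = 0; i < tokens.length; i++) {
            int best = 0;
            for (int t = 0; t < numTopics; t++) {
                weights[i][t] = wordWeight(typeTopicCounts, tokensPerTopic,
                                           beta, betaSum, tokens[i], t);
                if (weights[i][t] > weights[i][best]) {
                    best = t;
                }
            }
            topics[i] = best;
            localCounts[best]++;
        }

        double[] result = new double[numTopics];
        double[] scores = new double[numTopics];
        int saved = 0;
        for (int iter = 1; iter <= numIterations; iter++) {
            for (int i = 0; i < tokens.length; i++) {
                // Remove the token from the document counts, then resample its topic.
                localCounts[topics[i]]--;
                double sum = 0.0;
                for (int t = 0; t < numTopics; t++) {
                    scores[t] = (alpha[t] + localCounts[t]) * weights[i][t];
                    sum += scores[t];
                }
                double u = random.nextDouble() * sum;
                int newTopic = 0;
                while (newTopic < numTopics - 1 && u >= scores[newTopic]) {
                    u -= scores[newTopic];
                    newTopic++;
                }
                topics[i] = newTopic;
                localCounts[newTopic]++;
            }
            // Assumed schedule: save every `thinning` iterations after burn-in.
            if (iter > burnIn && (iter - burnIn) % thinning == 0) {
                for (int t = 0; t < numTopics; t++) {
                    result[t] += alpha[t] + localCounts[t];
                }
                saved++;
            }
        }

        // With no saved samples (for example zero iterations), use the current state.
        if (saved == 0) {
            for (int t = 0; t < numTopics; t++) {
                result[t] = alpha[t] + localCounts[t];
            }
        }
        double total = 0.0;
        for (double v : result) {
            total += v;
        }
        for (int t = 0; t < numTopics; t++) {
            result[t] /= total;
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] typeTopicCounts = {{10, 0}, {0, 10}}; // two word types, two topics
        int[] tokensPerTopic = {10, 10};
        double[] alpha = {0.1, 0.1};
        double[] dist = infer(new int[] {0, 0, 1}, typeTopicCounts, tokensPerTopic,
                              alpha, 0.01, 0.02, 100, 10, 10, 42L);
        System.out.println(Arrays.toString(dist));
    }
}
```

Because the word weights are computed once and never updated, only the per-document topic counts change between iterations, which is what keeps inference on a new document from altering the trained model.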


writeInferredDistributions

public void writeInferredDistributions(InstanceList instances,
                                       java.io.File distributionsFile,
                                       int numIterations,
                                       int thinning,
                                       int burnIn,
                                       double threshold,
                                       int max)
                                throws java.io.IOException
Infer topics for the provided instances and write distributions to the provided file.

Parameters:
instances - The instances to infer topic distributions for
distributionsFile - The file to write the inferred distributions to
numIterations - The total number of iterations of sampling per document
thinning - The number of iterations between saved samples
burnIn - The number of iterations before the first saved sample
threshold - The minimum proportion of a given topic that will be written
max - The maximum number of topics to report per document
Throws:
java.io.IOException
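
The interaction of the threshold and max parameters can be sketched as follows. This is only an illustration of one plausible reading of the parameter descriptions above: it assumes topics are reported in descending order of proportion, stopping at the first topic below threshold or once max topics have been reported. The class TopicReportSketch and the method selectTopics are hypothetical and not part of MALLET, and the actual output format of the file is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of MALLET.
final class TopicReportSketch {

    // Returns the topic indices that would be reported for one document,
    // under the assumed "sort descending, then cut off" behavior.
    static List<Integer> selectTopics(double[] distribution, double threshold, int max) {
        Integer[] order = new Integer[distribution.length];
        for (int t = 0; t < order.length; t++) {
            order[t] = t;
        }
        // Sort topic indices by descending proportion.
        Arrays.sort(order, (a, b) -> Double.compare(distribution[b], distribution[a]));

        List<Integer> selected = new ArrayList<>();
        for (int t : order) {
            if (selected.size() >= max || distribution[t] < threshold) {
                break;
            }
            selected.add(t);
        }
        return selected;
    }

    public static void main(String[] args) {
        double[] distribution = {0.05, 0.6, 0.25, 0.1};
        // Report at most two topics, each with proportion of at least 0.1.
        System.out.println(selectTopics(distribution, 0.1, 2));
    }
}
```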

read

public static TopicInferencer read(java.io.File f)
                            throws java.lang.Exception
Throws:
java.lang.Exception