cc.mallet.classify
Class NaiveBayesTrainer

java.lang.Object
  extended by cc.mallet.classify.ClassifierTrainer<NaiveBayes>
      extended by cc.mallet.classify.NaiveBayesTrainer
All Implemented Interfaces:
Boostable, ClassifierTrainer.ByIncrements<NaiveBayes>, ClassifierTrainer.ByInstanceIncrements<NaiveBayes>, AlphabetCarrying, java.io.Serializable

public class NaiveBayesTrainer
extends ClassifierTrainer<NaiveBayes>
implements ClassifierTrainer.ByInstanceIncrements<NaiveBayes>, Boostable, AlphabetCarrying, java.io.Serializable

Class used to generate a NaiveBayes classifier from a set of training data. In a Bayes classifier, p(Classification|Data) = p(Data|Classification) p(Classification) / p(Data)

To compute the likelihood:
p(Data|Classification) = p(d1, d2, ..., dn | Classification)
Naive Bayes makes the assumption that all of the data are conditionally independent given the Classification:
p(d1, d2, ..., dn | Classification) = p(d1|Classification) p(d2|Classification) ... p(dn|Classification)
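
The factorization above can be illustrated with a minimal, self-contained sketch. It is not MALLET code: the class name NaiveBayesPosteriorSketch, the array layout, and the numbers in main are assumptions made for illustration. The sketch sums log probabilities instead of multiplying raw probabilities, and normalizes over the labels in place of dividing by p(Data).

```java
import java.util.Arrays;

// Illustrative sketch only; hypothetical names, not part of MALLET.
final class NaiveBayesPosteriorSketch {

    /**
     * Returns p(label | data) for every label.
     * prior[c] is p(c); featureGivenLabel[c][d] is p(d | c);
     * observed holds the indices of the observed data items d1..dn.
     */
    static double[] posterior(double[] prior, double[][] featureGivenLabel, int[] observed) {
        double[] logScore = new double[prior.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < prior.length; c++) {
            // log p(c) + sum_i log p(d_i | c): the product of the
            // conditionally independent terms becomes a sum of logs.
            logScore[c] = Math.log(prior[c]);
            for (int d : observed) {
                logScore[c] += Math.log(featureGivenLabel[c][d]);
            }
            max = Math.max(max, logScore[c]);
        }
        // Normalizing over the labels plays the role of dividing by p(Data).
        double[] post = new double[prior.length];
        double sum = 0.0;
        for (int c = 0; c < prior.length; c++) {
            post[c] = Math.exp(logScore[c] - max); // subtract max for numerical stability
            sum += post[c];
        }
        for (int c = 0; c < prior.length; c++) {
            post[c] /= sum;
        }
        return post;
    }

    public static void main(String[] args) {
        double[] prior = {0.5, 0.5};
        double[][] featureGivenLabel = {{0.8, 0.2}, {0.3, 0.7}};
        int[] observed = {0, 0, 1};
        System.out.println(Arrays.toString(posterior(prior, featureGivenLabel, observed)));
    }
}
```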

As with other classifiers in Mallet, NaiveBayes is implemented as two classes: a trainer and a classifier. The NaiveBayesTrainer produces estimates of the various p(dn|Classification) and constructs a NaiveBayes classifier with those estimates.

A call to train() or trainIncremental() produces a NaiveBayes classifier that can be used to classify instances. A call to trainIncremental() does not throw away the internal state of the trainer; subsequent calls to trainIncremental() train by extending the previous training set.
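
The difference between batch and incremental training described above can be sketched with a toy stand-in. The class IncrementalCountSketch is hypothetical and only counts labels; it is not how MALLET stores its estimates. It is meant to show the contract: the batch method starts from scratch on every call, while the incremental method keeps accumulating state across calls.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a trainer; it only counts labels and is not MALLET code.
final class IncrementalCountSketch {
    private final Map<String, Integer> labelCounts = new HashMap<>();

    /** Batch training: ignores earlier calls and discards its state before returning. */
    Map<String, Integer> train(List<String> labels) {
        labelCounts.clear();
        for (String label : labels) {
            labelCounts.merge(label, 1, Integer::sum);
        }
        Map<String, Integer> snapshot = new HashMap<>(labelCounts);
        labelCounts.clear();
        return snapshot;
    }

    /** Incremental training: adds to the counts gathered by earlier incremental calls. */
    Map<String, Integer> trainIncremental(List<String> labels) {
        for (String label : labels) {
            labelCounts.merge(label, 1, Integer::sum);
        }
        return new HashMap<>(labelCounts);
    }

    public static void main(String[] args) {
        IncrementalCountSketch trainer = new IncrementalCountSketch();
        trainer.trainIncremental(List.of("spam", "ham"));
        // The second call sees the counts from the first call as well.
        System.out.println(trainer.trainIncremental(List.of("spam")));
    }
}
```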

A NaiveBayesTrainer can be persisted using serialization.
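
Because the class implements java.io.Serializable, standard Java object serialization should apply. The following is a minimal sketch of a round trip through a byte array using only the JDK. The helper class SerializationSketch and its method names are assumptions made for illustration, and the demo uses an ArrayList as a stand-in for a trainer so that the block runs without MALLET on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; works for any Serializable object.
final class SerializationSketch {

    /** Serializes the object into a byte array. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back an object previously written by toBytes. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in object; a trainer instance would be passed the same way.
        ArrayList<String> original = new ArrayList<>(List.of("a", "b"));
        Object copy = fromBytes(toBytes(original));
        System.out.println(original.equals(copy));
    }
}
```

To persist to disk instead, the ByteArrayOutputStream could be swapped for a FileOutputStream.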

Author:
Andrew McCallum mccallum@cs.umass.edu
See Also:
NaiveBayes, Serialized Form

Nested Class Summary
static class NaiveBayesTrainer.Factory
           
 
Nested classes/interfaces inherited from class cc.mallet.classify.ClassifierTrainer
ClassifierTrainer.ByActiveLearning<C extends Classifier>, ClassifierTrainer.ByIncrements<C extends Classifier>, ClassifierTrainer.ByInstanceIncrements<C extends Classifier>, ClassifierTrainer.ByOptimization<C extends Classifier>
 
Field Summary
 
Fields inherited from class cc.mallet.classify.ClassifierTrainer
finishedTraining, validationSet
 
Constructor Summary
NaiveBayesTrainer()
           
NaiveBayesTrainer(NaiveBayes initialClassifier)
           
NaiveBayesTrainer(Pipe instancePipe)
           
 
Method Summary
 boolean alphabetsMatch(AlphabetCarrying object)
           
 Alphabet getAlphabet()
           
 Alphabet[] getAlphabets()
           
 NaiveBayes getClassifier()
           
 double getDocLengthNormalization()
           
 Multinomial.Estimator getFeatureMultinomialEstimator()
          Get the Multinomial.Estimator instance used to specify the type of estimator for features.
 Multinomial.Estimator getPriorMultinomialEstimator()
          Get the Multinomial.Estimator instance used to specify the type of estimator for priors.
 NaiveBayesTrainer setDocLengthNormalization(double d)
           
 NaiveBayesTrainer setFeatureMultinomialEstimator(Multinomial.Estimator me)
          Set the Multinomial.Estimator used for features.
 NaiveBayesTrainer setPriorMultinomialEstimator(Multinomial.Estimator me)
          Set the Multinomial.Estimator used for priors.
 java.lang.String toString()
           
 NaiveBayes train(InstanceList trainingList)
          Create a NaiveBayes classifier from a set of training data.
 NaiveBayes trainIncremental(Instance instance)
           
 NaiveBayes trainIncremental(InstanceList trainingInstancesToAdd)
          Create a NaiveBayes classifier from a set of training data and the previous state of the trainer.
 
Methods inherited from class cc.mallet.classify.ClassifierTrainer
getValidationInstances, isFinishedTraining, setValidationInstances
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

NaiveBayesTrainer

public NaiveBayesTrainer(NaiveBayes initialClassifier)

NaiveBayesTrainer

public NaiveBayesTrainer(Pipe instancePipe)

NaiveBayesTrainer

public NaiveBayesTrainer()
Method Detail

getClassifier

public NaiveBayes getClassifier()
Specified by:
getClassifier in class ClassifierTrainer<NaiveBayes>

setDocLengthNormalization

public NaiveBayesTrainer setDocLengthNormalization(double d)

getDocLengthNormalization

public double getDocLengthNormalization()

getFeatureMultinomialEstimator

public Multinomial.Estimator getFeatureMultinomialEstimator()
Get the Multinomial.Estimator instance used to specify the type of estimator for features.

Returns:
estimator to be cloned on the next call to train() or the first call to trainIncremental()

setFeatureMultinomialEstimator

public NaiveBayesTrainer setFeatureMultinomialEstimator(Multinomial.Estimator me)
Set the Multinomial.Estimator used for features. The Multinomial.Estimator is cloned internally, and the clone maintains the counts used to generate probability estimates the next time train() or an initial trainIncremental() is run. Defaults to a Multinomial.LaplaceEstimator().

Parameters:
me - estimator to be cloned on the next call to train() or the first call to trainIncremental()
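
For intuition about the Laplace default mentioned above, here is a minimal sketch of add-one smoothing, the technique the name conventionally refers to. It is an assumption that this matches the spirit of Multinomial.LaplaceEstimator; the class LaplaceSmoothingSketch and its method are hypothetical and do not reproduce MALLET's implementation.

```java
// Illustrative sketch of add-one (Laplace) smoothing; hypothetical names, not MALLET code.
final class LaplaceSmoothingSketch {

    /**
     * Turns raw counts into probabilities with one pseudo-count per outcome:
     * p(i) = (count_i + 1) / (total + numberOfOutcomes).
     */
    static double[] estimate(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            p[i] = (counts[i] + 1.0) / (total + counts.length);
        }
        return p;
    }

    public static void main(String[] args) {
        // The second outcome was never observed but still receives nonzero probability.
        double[] p = estimate(new int[] {3, 0, 1});
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

The point of the pseudo-count is that a feature never seen with a given label does not force the whole product of per-feature probabilities to zero.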

getPriorMultinomialEstimator

public Multinomial.Estimator getPriorMultinomialEstimator()
Get the Multinomial.Estimator instance used to specify the type of estimator for priors.

Returns:
estimator to be cloned on the next call to train() or the first call to trainIncremental()

setPriorMultinomialEstimator

public NaiveBayesTrainer setPriorMultinomialEstimator(Multinomial.Estimator me)
Set the Multinomial.Estimator used for priors. The Multinomial.Estimator is cloned internally, and the clone maintains the counts used to generate probability estimates the next time train() or an initial trainIncremental() is run. Defaults to a Multinomial.LaplaceEstimator().

Parameters:
me - estimator to be cloned on the next call to train() or the first call to trainIncremental()

train

public NaiveBayes train(InstanceList trainingList)
Create a NaiveBayes classifier from a set of training data. The trainer uses counts of each feature in an instance's feature vector to provide an estimate of p(Labeling|feature). The internal state of the trainer is discarded (by a call to reset()) when train() returns, so each call to train() is completely independent of any other.

Specified by:
train in class ClassifierTrainer<NaiveBayes>
Parameters:
trainingList - The InstanceList used to train the classifier. Within each instance, the data slot is an instance of FeatureVector and the target slot is an instance of Labeling.
Returns:
The NaiveBayes classifier as trained on the trainingList

trainIncremental

public NaiveBayes trainIncremental(InstanceList trainingInstancesToAdd)
Create a NaiveBayes classifier from a set of training data and the previous state of the trainer. Subsequent calls to trainIncremental() add to the state of the trainer. An incremental training session should consist only of calls to trainIncremental() and have no calls to train().

Specified by:
trainIncremental in interface ClassifierTrainer.ByIncrements<NaiveBayes>
Parameters:
trainingInstancesToAdd - The InstanceList used to train the classifier. Within each instance, the data slot is an instance of FeatureVector and the target slot is an instance of Labeling.
Returns:
The NaiveBayes classifier as trained on trainingInstancesToAdd and the InstanceLists previously passed to trainIncremental()

trainIncremental

public NaiveBayes trainIncremental(Instance instance)
Specified by:
trainIncremental in interface ClassifierTrainer.ByInstanceIncrements<NaiveBayes>

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

alphabetsMatch

public boolean alphabetsMatch(AlphabetCarrying object)

getAlphabet

public Alphabet getAlphabet()
Specified by:
getAlphabet in interface AlphabetCarrying

getAlphabets

public Alphabet[] getAlphabets()
Specified by:
getAlphabets in interface AlphabetCarrying