cc.mallet.fst
Class HMM

java.lang.Object
  extended by cc.mallet.fst.Transducer
      extended by cc.mallet.fst.HMM
All Implemented Interfaces:
java.io.Serializable

public class HMM
extends Transducer
implements java.io.Serializable

A Hidden Markov Model.

See Also:
Serialized Form

Nested Class Summary
 class HMM.Incrementor
           
static class HMM.State
           
protected static class HMM.TransitionIterator
           
 class HMM.WeightedIncrementor
           
 
Field Summary
 
Fields inherited from class cc.mallet.fst.Transducer
CERTAIN_WEIGHT, IMPOSSIBLE_WEIGHT, inputPipe, outputPipe
 
Constructor Summary
HMM(Alphabet inputAlphabet, Alphabet outputAlphabet)
           
HMM(Pipe inputPipe, Pipe outputPipe)
           
 
Method Summary
 void addFullyConnectedStates(java.lang.String[] stateNames)
          Add a group of states that are fully connected with each other, with all parameters equal to zero, and with each outgoing arc labeled with the name of its destination state.
 void addFullyConnectedStatesForBiLabels()
           
 void addFullyConnectedStatesForLabels()
           
 void addFullyConnectedStatesForThreeQuarterLabels(InstanceList trainingSet)
           
 void addFullyConnectedStatesForTriLabels()
           
 java.lang.String addOrderNStates(InstanceList trainingSet, int[] orders, boolean[] defaults, java.lang.String start, java.util.regex.Pattern forbidden, java.util.regex.Pattern allowed, boolean fullyConnected)
          Assumes that the HMM's output alphabet contains Strings.
 void addSelfTransitioningStateForAllLabels(java.lang.String name)
           
 void addState(java.lang.String name, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames)
           
 void addState(java.lang.String name, java.lang.String[] destinationNames)
          Add a state with all parameters equal to zero, and with each outgoing arc labeled with the name of its destination state.
 void addStatesForBiLabelsConnectedAsIn(InstanceList trainingSet)
          Add states to create a second-order Markov model on labels, adding only those transitions that occur in the given trainingSet.
 void addStatesForHalfLabelsConnectedAsIn(InstanceList trainingSet)
          Add as many states as there are labels, but don't create separate weights for each source-destination pair of states.
 void addStatesForLabelsConnectedAsIn(InstanceList trainingSet)
          Add states to create a first-order Markov model on labels, adding only those transitions that occur in the given trainingSet.
 void addStatesForThreeQuarterLabelsConnectedAsIn(InstanceList trainingSet)
          Add as many states as there are labels, but don't create separate observational-test weights for each source-destination pair of states; instead, have all the incoming transitions to a state share the same observational-feature-test weights.
 void estimate()
           
 Alphabet getInputAlphabet()
           
 Alphabet getOutputAlphabet()
           
 Transducer.State getState(int index)
           
 HMM.State getState(java.lang.String name)
           
 void initEmissions(java.util.Random random, double noise)
           
 java.util.Iterator initialStateIterator()
           
 void initTransitions(java.util.Random random, double noise)
          Initializes the initial-state and transition probabilities separately from the emissions.
 boolean isTrainable()
           
 int numStates()
           
 void print()
           
 void reset()
          Deprecated. 
 boolean train(InstanceList ilist)
          Trains an HMM without validation and evaluation.
 boolean train(InstanceList ilist, InstanceList validation, InstanceList testing)
          Trains an HMM with evaluator set to null.
 boolean train(InstanceList ilist, InstanceList validation, InstanceList testing, TransducerEvaluator eval)
           
 void write(java.io.File f)
           
 
Methods inherited from class cc.mallet.fst.Transducer
averageTokenAccuracy, canIterateAllTransitions, generatePath, getInputPipe, getMaxLatticeFactory, getOutputPipe, getSumLatticeFactory, isGenerative, label, less_efficient_sumLogProb, no_longer_needed_sumNegLogProb, setMaxLatticeFactory, setSumLatticeFactory, stateIndexOfString, sumLogProb, transduce, transduce
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HMM

public HMM(Pipe inputPipe,
           Pipe outputPipe)

HMM

public HMM(Alphabet inputAlphabet,
           Alphabet outputAlphabet)

Method Detail

getInputAlphabet

public Alphabet getInputAlphabet()

getOutputAlphabet

public Alphabet getOutputAlphabet()

print

public void print()
Overrides:
print in class Transducer

addState

public void addState(java.lang.String name,
                     double initialWeight,
                     double finalWeight,
                     java.lang.String[] destinationNames,
                     java.lang.String[] labelNames)

addState

public void addState(java.lang.String name,
                     java.lang.String[] destinationNames)
Add a state with all parameters equal to zero, and with each outgoing arc labeled with the name of its destination state.


addFullyConnectedStates

public void addFullyConnectedStates(java.lang.String[] stateNames)
Add a group of states that are fully connected with each other, with all parameters equal to zero, and with each outgoing arc labeled with the name of its destination state.
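
The arc structure described above can be sketched in plain Java without MALLET. This is only an illustration under stated assumptions: the class FullyConnectedSketch and its arcs method are hypothetical and not part of the library, and the sketch assumes that "fully connected" includes self-transitions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not part of MALLET. */
final class FullyConnectedSketch {

    /**
     * Enumerates the arcs of a fully connected state group.
     * Each arc is returned as {source, destination, label}, where the
     * label is the destination state's name, as the description states.
     * Assumption: every state also has an arc to itself.
     */
    static List<String[]> arcs(String[] stateNames) {
        List<String[]> arcs = new ArrayList<>();
        for (String source : stateNames) {
            for (String destination : stateNames) {
                // The label on the arc equals the destination state name.
                arcs.add(new String[] {source, destination, destination});
            }
        }
        return arcs;
    }
}
```

For n state names the sketch yields n * n arcs, so the number of transition parameters grows quadratically with the number of states.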


addFullyConnectedStatesForLabels

public void addFullyConnectedStatesForLabels()

addStatesForLabelsConnectedAsIn

public void addStatesForLabelsConnectedAsIn(InstanceList trainingSet)
Add states to create a first-order Markov model on labels, adding only those transitions that occur in the given trainingSet.


addStatesForHalfLabelsConnectedAsIn

public void addStatesForHalfLabelsConnectedAsIn(InstanceList trainingSet)
Add as many states as there are labels, but don't create separate weights for each source-destination pair of states. Instead have all the incoming transitions to a state share the same weights.


addStatesForThreeQuarterLabelsConnectedAsIn

public void addStatesForThreeQuarterLabelsConnectedAsIn(InstanceList trainingSet)
Add as many states as there are labels, but don't create separate observational-test weights for each source-destination pair of states; instead, have all the incoming transitions to a state share the same observational-feature-test weights. However, do create a separate default feature for each transition (which acts as an HMM-style transition probability).


addFullyConnectedStatesForThreeQuarterLabels

public void addFullyConnectedStatesForThreeQuarterLabels(InstanceList trainingSet)

addFullyConnectedStatesForBiLabels

public void addFullyConnectedStatesForBiLabels()

addStatesForBiLabelsConnectedAsIn

public void addStatesForBiLabelsConnectedAsIn(InstanceList trainingSet)
Add states to create a second-order Markov model on labels, adding only those transitions that occur in the given trainingSet.


addFullyConnectedStatesForTriLabels

public void addFullyConnectedStatesForTriLabels()

addSelfTransitioningStateForAllLabels

public void addSelfTransitioningStateForAllLabels(java.lang.String name)

addOrderNStates

public java.lang.String addOrderNStates(InstanceList trainingSet,
                                        int[] orders,
                                        boolean[] defaults,
                                        java.lang.String start,
                                        java.util.regex.Pattern forbidden,
                                        java.util.regex.Pattern allowed,
                                        boolean fullyConnected)
Assumes that the HMM's output alphabet contains Strings. Creates an order-n HMM whose input predicates and output labels are given by trainingSet, and whose order, connectivity, and weights are given by the remaining arguments.

Parameters:
trainingSet - the training instances
orders - An array of increasing non-negative numbers giving the orders of the features for this HMM. The largest number n is the Markov order of the HMM. States are n-tuples of output labels. Each of the other numbers k in orders represents a weight set shared by all destination states whose last (most recent) k labels agree. If orders is null, an order-0 HMM is built.
defaults - If non-null, it must be the same length as orders, with true positions indicating that the weight set for the corresponding order contains only the weight for a default feature; otherwise, the weight set has weights for all features built from input predicates.
start - The label that represents the context of the start of a sequence. It may also be used for sequence labels.
forbidden - If non-null, specifies which pairs of successive labels are not allowed, both for constructing order-n states and for transitions. A label pair (u,v) is not allowed if u + "," + v matches forbidden.
allowed - If non-null, specifies which pairs of successive labels are allowed, both for constructing order-n states and for transitions. A label pair (u,v) is allowed only if u + "," + v matches allowed.
fullyConnected - Whether to include all allowed transitions, even those not occurring in trainingSet.
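
The forbidden and allowed rules above can be read as a simple predicate on label pairs. The following is a minimal sketch of that reading, not the library's implementation: LabelPairFilter and isAllowed are hypothetical names, and the sketch assumes that "matches" means a match of the whole string u + "," + v, as with java.util.regex.Matcher.matches().

```java
import java.util.regex.Pattern;

/** Hypothetical illustration, not part of MALLET. */
final class LabelPairFilter {

    /**
     * Returns true if the successive label pair (u, v) passes both filters.
     * A null pattern places no restriction.
     * Assumption: patterns must match the entire string u + "," + v.
     */
    static boolean isAllowed(String u, String v, Pattern forbidden, Pattern allowed) {
        String pair = u + "," + v;
        if (forbidden != null && forbidden.matcher(pair).matches()) {
            return false; // explicitly forbidden pair
        }
        if (allowed != null && !allowed.matcher(pair).matches()) {
            return false; // an allowed pattern is given and the pair is not in it
        }
        return true;
    }
}
```

For example, with BIO-style labels, a forbidden pattern such as Pattern.compile("O,I-.*") would reject the pair ("O", "I-PER") while leaving ("B-PER", "I-PER") permitted. The label names here are illustrative only.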

getState

public HMM.State getState(java.lang.String name)

numStates

public int numStates()
Specified by:
numStates in class Transducer

getState

public Transducer.State getState(int index)
Specified by:
getState in class Transducer

initialStateIterator

public java.util.Iterator initialStateIterator()
Specified by:
initialStateIterator in class Transducer

isTrainable

public boolean isTrainable()

reset

@Deprecated
public void reset()
Deprecated. 


initTransitions

public void initTransitions(java.util.Random random,
                            double noise)
Initializes the initial-state and transition probabilities separately from the emissions. All probabilities are proportional to (1+Uniform[0,1])^noise.

Parameters:
random - Random object (if null use uniform distribution)
noise - Noise exponent to use. If zero, then uniform distribution.
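
The stated rule, probabilities proportional to (1+Uniform[0,1])^noise, can be sketched as follows. This is an illustration of the formula only, not MALLET's code: NoisyInit and noisyDistribution are hypothetical names, and treating a null Random as a zero draw (which gives the uniform distribution) is an assumption based on the parameter description.

```java
import java.util.Random;

/** Hypothetical illustration, not part of MALLET. */
final class NoisyInit {

    /**
     * Returns a probability vector of length n whose entries are
     * proportional to (1 + u)^noise, with u drawn uniformly from [0, 1).
     * Assumption: a null Random yields u = 0 for every entry, so the
     * result is uniform. A noise of zero also yields a uniform result.
     */
    static double[] noisyDistribution(int n, Random random, double noise) {
        double[] p = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double u = (random == null) ? 0.0 : random.nextDouble();
            p[i] = Math.pow(1.0 + u, noise);
            sum += p[i];
        }
        // Normalize so the entries sum to one.
        for (int i = 0; i < n; i++) {
            p[i] /= sum;
        }
        return p;
    }
}
```

Because each unnormalized weight lies in [1, 2^noise) for a non-negative noise, the ratio between the largest and smallest probability is bounded by 2^noise, so larger exponents allow more strongly skewed starting points.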

initEmissions

public void initEmissions(java.util.Random random,
                          double noise)

estimate

public void estimate()

train

public boolean train(InstanceList ilist)
Trains an HMM without validation and evaluation.


train

public boolean train(InstanceList ilist,
                     InstanceList validation,
                     InstanceList testing)
Trains an HMM with evaluator set to null.


train

public boolean train(InstanceList ilist,
                     InstanceList validation,
                     InstanceList testing,
                     TransducerEvaluator eval)

write

public void write(java.io.File f)