9/22/11

MALLET 2.0.7 Release Notes

* Fixed a bug in the Generalized Expectation (GE) implementation for
MaxEnt models. The old code could give low accuracy when using a small
number of constraints. For more information, see the note at the top of
http://mallet.cs.umass.edu/ge-classification.php

* Fixed a bug in SVMLight2Vectors that could result in different
Alphabets when importing multiple files at once.

* Fixed a bug in SVMLight2Classify that allowed previously unobserved
features to be added to the data Alphabet, possibly resulting in
mismatched Classifier and InstanceList Alphabets.

* Fixed bugs in the search direction computation in ConjugateGradient.

* Added support for cross-validation in Vectors2Classify (in addition to
random subsamples of the data set).

* Added support for importing SVMLight data with Alphabets for which
growth is stopped.

* Added new options to Optimizers: it is now possible to set the
convergence tolerance for GradientAscent, and set the LineOptimizer for
LimitedMemoryBFGS, among others.

* The GE implementation for MaxEnt models is more efficient, supports
multiple types of constraints, and can be extended with new
constraints. More information: http://mallet.cs.umass.edu/ge-classification.php

* The GE implementation for CRFs is much more efficient (O(L^2), where L
is the number of labels, rather than O(L^3) or O(L^4)), supports
multiple types of constraints, and can be extended with new
constraints. CRFs can now also be trained with GE from the command
line. See: http://mallet.cs.umass.edu/semi-sup-fst.php

* Added preliminary support for Posterior Regularization (PR) training
of both MaxEnt models and CRFs. See 
http://mallet.cs.umass.edu/ge-classification.php and
http://mallet.cs.umass.edu/semi-sup-fst.php

* Modified RankedFeatureVector to improve efficiency (from David North).

* New topic model wrapper class: cc.mallet.topics.tui.TopicTrainer
 This class simplifies training a topic model by focusing solely on
standard LDA. Using the same interface for LDA, PAM, hLDA and other
models made the command line options unnecessarily complicated and led
to confusion over which options are available for which models.

 We expect this interface to replace the current interface for the
"train-topics" command in future versions. For this version, you can
access the new trainer with this command:

   bin/mallet run cc.mallet.topics.tui.TopicTrainer --input ...

* Added topic diagnostics output in XML format. From the new
TopicTrainer, use the --diagnostics-file [filename] command line argument.

* Added the ability to restore models from gzipped "state" files. From
the new TopicTrainer, use the --input-state [filename] argument. Note
that you can edit this file manually: any token whose topic is set to -1
will be resampled immediately upon loading.

* The format for the "doc-topics" output file now prints the "Name" field
rather than the "Source" field.

* Fixed bugs in the likelihood calculation.

* Made GRMM compatible with MALLET 2.0, so GRMM should now work with
this version of MALLET.

* Made the implementations of piecewise training and piecewise
pseudolikelihood publicly available.

* Bug fix to GRMM TableFactor (from John Pate).
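The two SVMLight notes (SVMLight2Classify no longer adding unobserved features to the data Alphabet, and import support for Alphabets whose growth is stopped) both concern what should happen to features the training data never contained. The following is a minimal, self-contained sketch of that idea, assuming that once growth is stopped an unseen feature is simply dropped instead of receiving a new index, so the indices stay aligned with those the Classifier was trained on. ToyAlphabet and its methods are hypothetical names for illustration only; this is not MALLET's cc.mallet.types.Alphabet API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class for illustration only; NOT MALLET's cc.mallet.types.Alphabet.
public class ToyAlphabet {
    private final Map<String, Integer> index = new HashMap<>();
    private boolean growthStopped = false;

    /** Returns the feature's index; adds it only if growth is not stopped, else returns -1. */
    public int lookupIndex(String feature) {
        Integer i = index.get(feature);
        if (i != null) return i;
        if (growthStopped) return -1;          // unseen feature after stopGrowth(): do not add it
        int next = index.size();
        index.put(feature, next);
        return next;
    }

    /** After this call the alphabet never assigns new indices. */
    public void stopGrowth() { growthStopped = true; }

    public int size() { return index.size(); }

    /** Maps feature names to indices, dropping features the alphabet does not know. */
    public List<Integer> toIndices(List<String> features) {
        List<Integer> out = new ArrayList<>();
        for (String f : features) {
            int i = lookupIndex(f);
            if (i >= 0) out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        ToyAlphabet alphabet = new ToyAlphabet();
        // "Training" import: the alphabet grows freely.
        System.out.println(alphabet.toIndices(List.of("price", "cheap", "offer"))); // [0, 1, 2]
        alphabet.stopGrowth();
        // "Classification" import: "unseen" is dropped instead of becoming index 3.
        System.out.println(alphabet.toIndices(List.of("cheap", "unseen", "offer"))); // [1, 2]
        System.out.println(alphabet.size()); // 3
    }
}
```

Dropping unknown features, rather than growing the alphabet, keeps the feature space of the test data identical to the one the classifier's weights were learned over.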
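The Vectors2Classify note distinguishes cross-validation from random subsamples of the data set. As a rough, self-contained sketch of that difference (not Vectors2Classify's actual code, and with hypothetical method names): a random subsample draws one shuffled train/test split, while k-fold cross-validation partitions the shuffled instances into k folds so that every instance is used for testing exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of the two evaluation schemes; not MALLET code.
public class SplitSketch {
    /** One random subsample: returns [train, test] index lists, holding out testFraction. */
    public static List<List<Integer>> randomSplit(int n, double testFraction, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        int testSize = (int) Math.round(n * testFraction);
        List<List<Integer>> split = new ArrayList<>();
        split.add(new ArrayList<>(idx.subList(testSize, n))); // train
        split.add(new ArrayList<>(idx.subList(0, testSize))); // test
        return split;
    }

    /** k-fold cross-validation: partitions shuffled indices so each lands in exactly one fold. */
    public static List<List<Integer>> kFolds(int n, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(idx.get(i));
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = kFolds(10, 3, 42L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is the test set; all other folds together form the training set.
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < folds.size(); g++) if (g != f) train.addAll(folds.get(g));
            System.out.println("fold " + f + ": train=" + train.size() + " test=" + folds.get(f).size());
        }
        // prints: fold 0: train=6 test=4 / fold 1: train=7 test=3 / fold 2: train=7 test=3
    }
}
```

Repeated random subsamples may test some instances several times and others never; cross-validation guarantees full, non-overlapping coverage of the data.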
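The --input-state note says a gzipped state file can be edited by hand, and that tokens whose topic is -1 are resampled on loading. The following is a minimal sketch of such an edit, which resets every occurrence of chosen words to -1. It ASSUMES a layout in which lines starting with "#" are header or hyperparameter lines and every other line is whitespace-separated with the word type in the second-to-last column and the topic in the last column; check this against an actual state file before relying on it. The ResetTopics class is a hypothetical helper, not part of MALLET.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; the state-file column layout below is an assumption.
public class ResetTopics {
    /** Sets the topic (assumed last column) to -1 when the word (assumed second-to-last) is in words. */
    public static String resetLine(String line, Set<String> words) {
        if (line.startsWith("#") || line.trim().isEmpty()) return line; // header / hyperparameter lines
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 2) return line;
        if (!words.contains(cols[cols.length - 2])) return line;
        cols[cols.length - 1] = "-1"; // -1 marks the token for resampling when the state is loaded
        return String.join(" ", cols);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: ResetTopics in-state.gz out-state.gz word [word ...]");
            return;
        }
        Set<String> words = new HashSet<>(Arrays.asList(Arrays.copyOfRange(args, 2, args.length)));
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(new FileInputStream(args[0])), StandardCharsets.UTF_8));
             Writer out = new BufferedWriter(new OutputStreamWriter(
                     new GZIPOutputStream(new FileOutputStream(args[1])), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(resetLine(line, words));
                out.write('\n');
            }
        }
    }
}
```

The rewritten file could then be passed to the new TopicTrainer with --input-state, so that only the reset tokens start from fresh topic assignments.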