9/22/11 MALLET 2.0.7 Release Notes

* Fixed a bug in the Generalized Expectation (GE) implementation for MaxEnt models. The old code could give low accuracy when using a small number of constraints. See the note at the top of this page for more information: http://mallet.cs.umass.edu/ge-classification.php
* Fixed a bug in SVMLight2Vectors that could result in different Alphabets when importing multiple files at once.
* Fixed a bug in SVMLight2Classify that allowed previously unobserved features to be added to the data Alphabet, possibly resulting in mismatching Classifier and InstanceList Alphabets.
* Fixed bugs in the search direction computation in ConjugateGradient.
* Added support for cross-validation in Vectors2Classify (in addition to random subsamples of the data set).
* Added support for importing SVMLight data with Alphabets for which growth is stopped.
* Added new options to Optimizers: it is now possible to set the convergence tolerance for GradientAscent, and set the LineOptimizer for LimitedMemoryBFGS, among others.
* The GE implementation for MaxEnt models is more efficient, has support for multiple types of constraints, and support for implementing new constraints. More information: http://mallet.cs.umass.edu/ge-classification.php
* The GE implementation for CRFs is much more efficient (O(L^2), where L is the number of labels, rather than O(L^3) or O(L^4)), has support for multiple types of constraints, and support for implementing new constraints. There is also now support for training CRFs with GE from the command line. See: http://mallet.cs.umass.edu/semi-sup-fst.php
* Added preliminary support for Posterior Regularization (PR) training of both MaxEnt models and CRFs. See http://mallet.cs.umass.edu/ge-classification.php and http://mallet.cs.umass.edu/semi-sup-fst.php
* Modified RankedFeatureVector to improve efficiency (from David North).
* New topic model wrapper class: cc.mallet.topics.tui.TopicTrainer. This class simplifies training a topic model by focusing solely on standard LDA. Using the same interface for LDA, PAM, hLDA and other models made the command line options unnecessarily complicated and led to confusion over which options are available for which models. We expect this interface to replace the current interface for the "train-topics" command in future versions. For this version, you can access the new trainer with this command:
    bin/mallet run cc.mallet.topics.tui.TopicTrainer --input ...
* Topic diagnostics XML. From the new TopicTrainer, use the --diagnostics-file [filename] command line argument.
* Ability to restore models from gzipped "state" files. From the new TopicTrainer, use the --input-state [filename] argument. Note that you can manually edit this file. Any token with topic set to -1 will be immediately resampled upon loading.
* The format for the "doc-topics" output file now prints the "Name" field rather than the "Source" field.
* Bug fixes in likelihood calculation.
* Made GRMM compatible with MALLET 2.0. GRMM should now work with this version of MALLET.
* Made implementations of piecewise training and piecewise pseudolikelihood available publicly.
* Bug fix to GRMM TableFactor (from John Pate).
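
The note above about manually editing the gzipped "state" file, where a topic of -1 forces a token to be resampled on load, can be sketched with a small script. This is only an illustration, not part of MALLET: the function name reset_topics is hypothetical, and the file layout is an assumption (header lines starting with "#", and whitespace-separated token lines with the word type second-to-last and the topic last). Check the layout of your own state file before relying on it.

```python
import gzip


def reset_topics(in_path, out_path, words):
    """Copy a gzipped state file, setting the topic of selected tokens to -1.

    Assumed layout (verify against your own file): header lines start
    with '#'; every other line is whitespace-separated, with the word
    type in the second-to-last field and the topic in the last field.
    Returns the number of token lines that were changed.
    """
    changed = 0
    with gzip.open(in_path, "rt", encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            fields = line.split()
            # Pass header lines and anything too short to parse through as-is.
            if line.startswith("#") or len(fields) < 2:
                dst.write(line)
                continue
            if fields[-2] in words:
                fields[-1] = "-1"  # mark this token for resampling on load
                changed += 1
                dst.write(" ".join(fields) + "\n")
            else:
                dst.write(line)
    return changed
```

The rewritten file could then be given to the new TopicTrainer through the --input-state [filename] argument described above.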