MAchine Learning for LanguagE Toolkit

Optimization

MALLET includes methods for numerically optimizing functions. The primary use for numerical optimization in machine learning is to find parameters that maximize a log-likelihood function given observed data. The code, however, is very general and can be used for arbitrary problems.
The cc.mallet.optimize package centers around two interfaces, Optimizable and Optimizer. Optimizable has several sub-interfaces that define different classes of methods for optimization. Optimizable classes are stateful; they must store the current values of all parameters. The most commonly used optimizable sub-interface is Optimizable.ByGradientValue. Classes that implement this interface must provide methods for calculating the value of a specific function at the current parameter settings and for calculating the gradient of the function, with respect to the current parameter values. Note that this interface does not require the Hessian matrix of second derivatives, so the interface is appropriate for numerical methods that approximate the Hessian.
The following is an example of a class implementing Optimizable.ByGradientValue. This class implements a very simple quadratic function in two variables.
public class OptimizerExample implements Optimizable.ByGradientValue {

    // Optimizables encapsulate all state variables, 
    //  so a single Optimizer object can be used to optimize 
    //  several functions.

    double[] parameters;

    public OptimizerExample(double x, double y) {
        parameters = new double[2];
        parameters[0] = x;
        parameters[1] = y;
    }

    public double getValue() {

        double x = parameters[0];
        double y = parameters[1];

        return -3*x*x - 4*y*y + 2*x - 4*y + 18;

    }

    public void getValueGradient(double[] gradient) {

        gradient[0] = -6 * parameters[0] + 2;
        gradient[1] = -8 * parameters[1] - 4;

    }

    // The following get/set methods satisfy the Optimizable interface

    public int getNumParameters() { return 2; }
    public double getParameter(int i) { return parameters[i]; }
    public void getParameters(double[] buffer) {
        buffer[0] = parameters[0];
        buffer[1] = parameters[1];
    }

    public void setParameter(int i, double r) {
        parameters[i] = r;
    }
    public void setParameters(double[] newParameters) {
        parameters[0] = newParameters[0];
        parameters[1] = newParameters[1];
    }
}
Now that we have a class representing an optimizable function, we can pass it to an optimizer that takes this sub-interface. The ByGradientValue sub-interface is compatible with the Limited Memory BFGS optimizer, a quasi-Newton method that does not require computation of a Hessian matrix.
In this example, we create an optimizable object, pass it to a new optimizer, and optimize the parameters.
        OptimizerExample optimizable = new OptimizerExample(0, 0);
        Optimizer optimizer = new LimitedMemoryBFGS(optimizable);

        boolean converged = false;

        try {
            converged = optimizer.optimize();
        } catch (IllegalArgumentException e) {
            // This exception may be thrown if L-BFGS
            //  cannot step in the current direction.
            // This condition does not necessarily mean that
            //  the optimizer has failed, but it doesn't want
            //  to claim to have succeeded...        
        }

        System.out.println(optimizable.getParameter(0) + ", " +
                           optimizable.getParameter(1));
As expected, the result is close to 0.33 and -0.5.