cc.mallet.cluster
Class KMeans

java.lang.Object
  extended by cc.mallet.cluster.Clusterer
      extended by cc.mallet.cluster.KMeans
All Implemented Interfaces:
java.io.Serializable

public class KMeans
extends Clusterer

K-means clusterer. Clusters the points into k clusters by minimizing the total intra-cluster variance. It uses a given Metric to measure the distance between Instances, each of which should hold a SparseVector in its data field.
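
The idea of minimizing total intra-cluster variance can be illustrated with a minimal, self-contained sketch. This is not MALLET's implementation: the class name KMeansSketch, the dense double[] points, the caller-supplied starting means, and the squared Euclidean distance (standing in for a Metric) are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only. It does not use MALLET's Instance, SparseVector,
// or Metric types; every name here is hypothetical.
class KMeansSketch {

    // Squared Euclidean distance, standing in for a Metric.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the mean closest to the given point.
    static int nearest(double[] point, double[][] means) {
        int best = 0;
        for (int c = 1; c < means.length; c++) {
            if (squaredDistance(point, means[c]) < squaredDistance(point, means[best])) {
                best = c;
            }
        }
        return best;
    }

    // Alternates an assignment step and a mean-update step until no label
    // changes. Returns the labels; the means array is updated in place.
    static int[] cluster(double[][] points, double[][] means, int maxIterations) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);
        int dim = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], means);
                if (c != labels[i]) {
                    labels[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break;
            }
            double[][] sums = new double[means.length][dim];
            int[] counts = new int[means.length];
            for (int i = 0; i < points.length; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[labels[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < means.length; c++) {
                // In this sketch an empty cluster simply keeps its previous mean.
                for (int j = 0; counts[c] > 0 && j < dim; j++) {
                    means[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return labels;
    }

    // The objective being minimized: the sum of squared distances from each
    // point to the mean of its assigned cluster.
    static double totalVariance(double[][] points, double[][] means, int[] labels) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            total += squaredDistance(points[i], means[labels[i]]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {10, 10}, {10, 11} };
        double[][] means = { {0, 0}, {10, 10} };
        int[] labels = cluster(points, means, 10);
        System.out.println(Arrays.toString(labels)); // prints [0, 0, 1, 1]
    }
}
```

Each iteration can only lower or keep the value returned by totalVariance, which is why the loop stops once the labels no longer change.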

See Also:
Serialized Form

Field Summary
static int EMPTY_DROP
          Drop an empty cluster.
static int EMPTY_ERROR
          Treat an empty cluster as an error condition.
static int EMPTY_SINGLE
          Place the single instance furthest from the previous cluster mean in the empty cluster.
 
Constructor Summary
KMeans(Pipe instancePipe, int numClusters, Metric metric)
          Construct a KMeans object
KMeans(Pipe instancePipe, int numClusters, Metric metric, int emptyAction)
          Construct a KMeans object
 
Method Summary
 Clustering cluster(InstanceList instances)
          Cluster instances
 java.util.ArrayList<SparseVector> getClusterMeans()
          Return the ArrayList of cluster means after a run of the algorithm.
 
Methods inherited from class cc.mallet.cluster.Clusterer
getPipe
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EMPTY_ERROR

public static final int EMPTY_ERROR
Treat an empty cluster as an error condition.

See Also:
Constant Field Values

EMPTY_DROP

public static final int EMPTY_DROP
Drop an empty cluster.

See Also:
Constant Field Values

EMPTY_SINGLE

public static final int EMPTY_SINGLE
Place the single instance furthest from the previous cluster mean in the empty cluster.

See Also:
Constant Field Values
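
The three empty-cluster policies can be contrasted with a small stand-alone sketch. It is an interpretation of the field descriptions, not MALLET's code: the EmptyAction enum stands in for the int constants (whose numeric values are not assumed here), and the class name, the handleEmpty method, and the dense double[] points are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names are hypothetical and none come from MALLET.
class EmptyClusterSketch {

    // Stand-in for EMPTY_ERROR, EMPTY_DROP, and EMPTY_SINGLE.
    enum EmptyAction { ERROR, DROP, SINGLE }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static int count(int[] labels, int cluster) {
        int n = 0;
        for (int label : labels) {
            if (label == cluster) {
                n++;
            }
        }
        return n;
    }

    // Resolves empty clusters after an assignment step. labels[i] is the
    // cluster index of points[i]; labels and means are modified in place.
    static void handleEmpty(double[][] points, int[] labels,
                            List<double[]> means, EmptyAction action) {
        for (int c = 0; c < means.size(); c++) {
            if (count(labels, c) > 0) {
                continue;
            }
            switch (action) {
                case ERROR:
                    throw new IllegalStateException("cluster " + c + " is empty");
                case DROP:
                    // Remove the cluster and shift higher labels down by one.
                    means.remove(c);
                    for (int i = 0; i < labels.length; i++) {
                        if (labels[i] > c) {
                            labels[i]--;
                        }
                    }
                    c--; // re-check the cluster that moved into this slot
                    break;
                case SINGLE:
                    // Find the point furthest from its own cluster mean and
                    // make it the sole member of the empty cluster.
                    int far = 0;
                    for (int i = 1; i < points.length; i++) {
                        if (squaredDistance(points[i], means.get(labels[i]))
                                > squaredDistance(points[far], means.get(labels[far]))) {
                            far = i;
                        }
                    }
                    labels[far] = c;
                    means.set(c, points[far].clone());
                    break;
            }
        }
    }
}
```

In this sketch the cluster that loses a point under SINGLE keeps its old mean until the next update step recomputes it.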
Constructor Detail

KMeans

public KMeans(Pipe instancePipe,
              int numClusters,
              Metric metric,
              int emptyAction)
Construct a KMeans object

Parameters:
instancePipe - Pipe for the instances being clustered
numClusters - Number of clusters to use
metric - Metric object to measure instance distances
emptyAction - What to do when an empty cluster occurs: one of EMPTY_ERROR, EMPTY_DROP, or EMPTY_SINGLE

KMeans

public KMeans(Pipe instancePipe,
              int numClusters,
              Metric metric)
Construct a KMeans object

Parameters:
instancePipe - Pipe for the instances being clustered
numClusters - Number of clusters to use
metric - Metric object to measure instance distances

With this constructor, an empty cluster is treated as an error condition (see EMPTY_ERROR).

Method Detail

cluster

public Clustering cluster(InstanceList instances)
Cluster instances

Specified by:
cluster in class Clusterer
Parameters:
instances - List of instances to cluster

getClusterMeans

public java.util.ArrayList<SparseVector> getClusterMeans()
Return the ArrayList of cluster means after a run of the algorithm.

Returns:
An ArrayList of SparseVectors, one for each cluster mean.