cc.mallet.cluster
Class GreedyAgglomerative

java.lang.Object
  extended by cc.mallet.cluster.Clusterer
      extended by cc.mallet.cluster.KBestClusterer
          extended by cc.mallet.cluster.HillClimbingClusterer
              extended by cc.mallet.cluster.GreedyAgglomerative
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
GreedyAgglomerativeByDensity

public class GreedyAgglomerative
extends HillClimbingClusterer

Greedily merges Instances until convergence. Candidate merges are scored using a NeighborEvaluator.

Since:
1.0
Version:
1.0
Author:
"Aron Culotta"
See Also:
HillClimbingClusterer, Serialized Form

Field Summary
protected  boolean converged
          True if clustering should stop.
protected  PairwiseMatrix scoreCache
          Cache for calls to NeighborEvaluator.
protected  double stoppingThreshold
          Converged when merge score is below this value.
 
Fields inherited from class cc.mallet.cluster.HillClimbingClusterer
evaluator
 
Constructor Summary
GreedyAgglomerative(Pipe instancePipe, NeighborEvaluator evaluator, double stoppingThreshold)
           
 
Method Summary
 boolean converged(Clustering clustering)
           
protected  double getScore(Clustering clustering, int i, int j)
           
 Clustering improveClustering(Clustering clustering)
          For each pair of clusters, calculate the score of the Neighbor that would result from merging the two clusters.
 Clustering initializeClustering(InstanceList instances)
           
 void reset()
          Reset convergence to false so a new round of clustering can begin.
 java.lang.String toString()
           
protected  void updateScoreMatrix(Clustering clustering, int i, int j)
          Resets the cached scores of clusters that have been merged.
 
Methods inherited from class cc.mallet.cluster.HillClimbingClusterer
cluster, cluster, clusterKBest, clusterKBest, getEvaluator
 
Methods inherited from class cc.mallet.cluster.Clusterer
getPipe
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

stoppingThreshold

protected double stoppingThreshold
Converged when merge score is below this value.


converged

protected boolean converged
True if clustering should stop.


scoreCache

protected PairwiseMatrix scoreCache
Cache for calls to NeighborEvaluator. In some experiments, caching reduced running time by nearly half.
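
The caching idea can be illustrated with a minimal, self-contained sketch. It does not use MALLET's PairwiseMatrix: the names PairScorer and ScoreCacheSketch, the NaN "not yet computed" sentinel, and the call counter are all assumptions made for illustration, not part of this class.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an expensive merge scorer (not a MALLET type). */
interface PairScorer {
    double score(int i, int j);
}

/** Illustrative symmetric cache of pairwise merge scores; NaN marks "not yet computed". */
final class ScoreCacheSketch {
    private final double[][] cache;
    private final PairScorer scorer;
    int evaluatorCalls = 0; // counts how often the expensive scorer actually ran

    ScoreCacheSketch(int numClusters, PairScorer scorer) {
        this.scorer = scorer;
        this.cache = new double[numClusters][numClusters];
        for (double[] row : cache) {
            Arrays.fill(row, Double.NaN);
        }
    }

    /** Returns the cached score for the pair (i, j), computing and storing it on first use. */
    double getScore(int i, int j) {
        // Store each unordered pair once, in the upper triangle.
        int lo = Math.min(i, j);
        int hi = Math.max(i, j);
        if (Double.isNaN(cache[lo][hi])) {
            cache[lo][hi] = scorer.score(lo, hi);
            evaluatorCalls++;
        }
        return cache[lo][hi];
    }
}
```

In this sketch, asking for the pair (0, 2) and then (2, 0) invokes the scorer only once, which is the kind of saving the field description refers to.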

Constructor Detail

GreedyAgglomerative

public GreedyAgglomerative(Pipe instancePipe,
                           NeighborEvaluator evaluator,
                           double stoppingThreshold)
Parameters:
instancePipe - Pipe for each underlying Instance.
evaluator - To score potential merges.
stoppingThreshold - Clustering converges when the evaluator score is below this value.
Method Detail

initializeClustering

public Clustering initializeClustering(InstanceList instances)
Specified by:
initializeClustering in class HillClimbingClusterer
Parameters:
instances - The Instances to cluster.
Returns:
A singleton clustering (each Instance in its own cluster).

converged

public boolean converged(Clustering clustering)
Specified by:
converged in class HillClimbingClusterer
Returns:
True if clustering is complete.

reset

public void reset()
Reset convergence to false so a new round of clustering can begin.

Specified by:
reset in class HillClimbingClusterer
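
The interplay of stoppingThreshold, converged and reset can be sketched in isolation. This is a toy model, not MALLET code: the class ConvergenceFlagSketch, its observeBestScore method, and the array of scripted per-step scores are hypothetical, and the assumption that the flag is set as soon as the best merge score falls below the threshold is taken from the field descriptions above.

```java
/** Illustrative convergence flag; all names are hypothetical, not MALLET's. */
final class ConvergenceFlagSketch {
    private final double stoppingThreshold;
    private boolean converged = false;

    ConvergenceFlagSketch(double stoppingThreshold) {
        this.stoppingThreshold = stoppingThreshold;
    }

    /** Sets the flag once the best available merge score drops below the threshold. */
    void observeBestScore(double bestScore) {
        if (bestScore < stoppingThreshold) {
            converged = true;
        }
    }

    boolean converged() {
        return converged;
    }

    /** Clears the flag so a new round of clustering can begin. */
    void reset() {
        converged = false;
    }

    /** Replays scripted best-merge scores and returns how many merges were accepted. */
    int run(double[] bestScorePerStep) {
        reset(); // without this, a second run would stop before its first merge
        int merges = 0;
        for (double score : bestScorePerStep) {
            observeBestScore(score);
            if (converged()) {
                break;
            }
            merges++;
        }
        return merges;
    }
}
```

With a threshold of 0.5 and scripted scores 0.9, 0.7, 0.3, 0.8, the sketch accepts two merges and then stops. The flag stays set until reset is called, which is why a clusterer that is reused across runs needs the reset step.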

improveClustering

public Clustering improveClustering(Clustering clustering)
For each pair of clusters, calculate the score of the Neighbor that would result from merging the two clusters. Choose the merge that obtains the highest score. If no merge improves the score, return the original Clustering.

Specified by:
improveClustering in class HillClimbingClusterer
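Parameters:
clustering - The current Clustering.
Returns:
The Clustering after the highest-scoring merge, or the original Clustering if no merge improves the score.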
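
One greedy merge step of this kind can be sketched without MALLET. Everything below is an assumption made for illustration: clusters are plain lists of doubles rather than a Clustering, MergeScorer stands in for a NeighborEvaluator, negMeanGap is an arbitrary example scorer, and "no merge improves the score" is modeled as "the best score is below the stopping threshold".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scorer for the clustering that would result from merging clusters i and j. */
interface MergeScorer {
    double score(List<List<Double>> clusters, int i, int j);
}

final class GreedyMergeStepSketch {
    /** Scores every pair, applies the highest-scoring merge, or returns the input unchanged. */
    static List<List<Double>> improve(List<List<Double>> clusters,
                                      MergeScorer scorer,
                                      double stoppingThreshold) {
        double bestScore = Double.NEGATIVE_INFINITY;
        int bestI = -1;
        int bestJ = -1;
        for (int i = 0; i < clusters.size(); i++) {
            for (int j = i + 1; j < clusters.size(); j++) {
                double s = scorer.score(clusters, i, j);
                if (s > bestScore) {
                    bestScore = s;
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        if (bestI < 0 || bestScore < stoppingThreshold) {
            return clusters; // no acceptable merge: hand back the original object
        }
        // Build a new clustering in which cluster bestJ is folded into cluster bestI.
        List<List<Double>> merged = new ArrayList<>();
        for (int k = 0; k < clusters.size(); k++) {
            if (k == bestJ) {
                continue;
            }
            List<Double> copy = new ArrayList<>(clusters.get(k));
            if (k == bestI) {
                copy.addAll(clusters.get(bestJ));
            }
            merged.add(copy);
        }
        return merged;
    }

    /** Example scorer (an assumption): closer cluster means give a higher score. */
    static double negMeanGap(List<List<Double>> clusters, int i, int j) {
        return -Math.abs(mean(clusters.get(i)) - mean(clusters.get(j)));
    }

    private static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }
}
```

Calling improve with GreedyMergeStepSketch::negMeanGap on the singleton clusters 1.0, 1.2 and 5.0 and a threshold of -1.0 merges the first two. A second call returns its input unchanged, because the remaining pair scores below the threshold.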

getScore

protected double getScore(Clustering clustering,
                          int i,
                          int j)
Parameters:
clustering - The current Clustering.
i - Index of the first cluster.
j - Index of the second cluster.
Returns:
The score for merging these two clusters.

updateScoreMatrix

protected void updateScoreMatrix(Clustering clustering,
                                 int i,
                                 int j)
Resets the cached scores of clusters that have been merged.

Parameters:
clustering - The current Clustering.
i - Index of the first merged cluster.
j - Index of the second merged cluster.
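
Why cached values need resetting after a merge can be shown with a small sketch. It is not the MALLET implementation: the class CacheInvalidationSketch, the plain square double[][] cache, and the use of NaN to mean "recompute on next lookup" are assumptions made for illustration.

```java
/** Illustrative invalidation of cached pair scores after clusters i and j are merged. */
final class CacheInvalidationSketch {
    /** Marks every cached score in a square matrix that involves cluster i or cluster j as unknown (NaN). */
    static void invalidate(double[][] cache, int i, int j) {
        for (int k = 0; k < cache.length; k++) {
            // Any pair touching i or j was scored against the pre-merge clusters.
            cache[i][k] = Double.NaN;
            cache[k][i] = Double.NaN;
            cache[j][k] = Double.NaN;
            cache[k][j] = Double.NaN;
        }
    }
}
```

Scores between clusters untouched by the merge stay in the cache, so only the pairs involving the merged clusters have to be rescored on the next step.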

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object