cc.mallet.types
Class GainRatio

java.lang.Object
  extended by cc.mallet.types.SparseVector
      extended by cc.mallet.types.FeatureVector
          extended by cc.mallet.types.RankedFeatureVector
              extended by cc.mallet.types.GainRatio
All Implemented Interfaces:
AlphabetCarrying, ConstantMatrix, Vector, java.io.Serializable

public class GainRatio
extends RankedFeatureVector

A list of features, each paired with its threshold, sorted in descending order of gain ratio: the ratio of (1) the information gained by splitting the instances on the feature at its associated threshold value to (2) the split information.

The calculations do not take instance weights into account.

To create an instance of GainRatio from an InstanceList, one must do the following:

InstanceList ilist = ...
...
GainRatio gr = GainRatio.createGainRatio(ilist);
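The ratio described above can be illustrated without MALLET. The following is a minimal sketch in plain Java, not the library's implementation: the class name GainRatioSketch and its methods are hypothetical, it handles only a binary split, and, like the class documented here, it ignores instance weights.

```java
// Hypothetical helper, not part of MALLET: computes the gain ratio of one
// binary split from per-label instance counts on each side of a threshold.
final class GainRatioSketch {
    static final double LOG2 = Math.log(2);

    /** Shannon entropy, in bits, of a vector of counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log(0) is treated as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / LOG2;
        }
        return h;
    }

    /**
     * Gain ratio = information gain / split information.
     * left[k] and right[k] are the numbers of instances with label k whose
     * feature value falls below and above the threshold, respectively.
     */
    static double gainRatio(int[] left, int[] right) {
        int numLabels = left.length;
        int[] all = new int[numLabels];
        int nLeft = 0;
        int nRight = 0;
        for (int k = 0; k < numLabels; k++) {
            all[k] = left[k] + right[k];
            nLeft += left[k];
            nRight += right[k];
        }
        if (nLeft == 0 || nRight == 0) return 0.0; // degenerate split: nothing is separated
        int n = nLeft + nRight;
        double pLeft = (double) nLeft / n;
        double pRight = (double) nRight / n;
        // (1) information gained by the split
        double gain = entropy(all) - pLeft * entropy(left) - pRight * entropy(right);
        // (2) split information: entropy of the partition sizes themselves
        double splitInfo = entropy(new int[] {nLeft, nRight});
        return gain / splitInfo;
    }

    public static void main(String[] args) {
        // A split that separates two labels perfectly, and one that separates nothing.
        System.out.println(gainRatio(new int[] {4, 0}, new int[] {0, 4}));
        System.out.println(gainRatio(new int[] {2, 2}, new int[] {2, 2}));
    }
}
```

Dividing by the split information penalizes splits that partition the data very unevenly, which is why the ranking uses the ratio rather than the raw information gain.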

J. R. Quinlan, "Improved Use of Continuous Attributes in C4.5", ftp://ftp.cs.cmu.edu/project/jair/volume4/quinlan96a.ps

Author:
Gary Huang ghuang@cs.umass.edu
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class cc.mallet.types.RankedFeatureVector
RankedFeatureVector.Factory, RankedFeatureVector.PerLabelFactory
 
Field Summary
static double log2
           
 
Fields inherited from class cc.mallet.types.SparseVector
hasInfinite, indices, values
 
Constructor Summary
protected GainRatio(Alphabet dataAlphabet, double[] gainRatios, double[] splitPoints, double baseEntropy, LabelVector baseLabelDistribution, int numSplitPointsForBestFeature, int minNumInsts)
           
 
Method Summary
protected static java.lang.Object[] calcGainRatios(InstanceList ilist, int[] instIndices, int minNumInsts)
          Calculates gain ratios for all (feature, split point) pairs and returns an array of the results.
static GainRatio createGainRatio(InstanceList ilist)
          Constructs a GainRatio object.
static GainRatio createGainRatio(InstanceList ilist, int[] instIndices, int minNumInsts)
          Constructs a GainRatio object.
 double getBaseEntropy()
           
 LabelVector getBaseLabelDistribution()
           
 double getMaxValuedThreshold()
           
 int getNumSplitPointsForBestFeature()
           
 double getThresholdAtRank(int rank)
           
static int[] sortInstances(InstanceList ilist, int[] instIndices, int featureIndex)
           
 
Methods inherited from class cc.mallet.types.RankedFeatureVector
getIndexAtRank, getMaxValue, getMaxValuedIndex, getMaxValuedIndexIn, getMaxValuedObject, getMaxValuedObjectIn, getMaxValueIn, getObjectAtRank, getRank, getRank, getValueAtRank, printByRank, printByRank, printLowerK, printTopK, set, setRankOrder, setRankOrder, setRankOrder, setReverseRankOrder
 
Methods inherited from class cc.mallet.types.FeatureVector
alphabetsMatch, cloneMatrix, cloneMatrixZeroed, contains, getAlphabet, getAlphabets, getObjectIndices, location, newFeatureVector, toSimpFile, toString, toString, value
 
Methods inherited from class cc.mallet.types.SparseVector
absNorm, addTo, addTo, arrayCopyFrom, arrayCopyFrom, arrayCopyInto, dotProduct, dotProduct, dotProduct, dotProduct, extendedDotProduct, extendedDotProduct, getDimensions, getIndices, getNumDimensions, getValues, incrementValue, indexAtLocation, infinityNorm, isBinary, isInfinite, isNaN, isNaNOrInfinite, location, makeBinary, makeNonBinary, map, numLocations, oneNorm, plusEqualsSparse, plusEqualsSparse, print, removeDuplicates, setAll, setValue, setValueAtLocation, singleIndex, singleSize, singleToIndices, singleValue, sortIndices, timesEquals, timesEqualsSparse, timesEqualsSparse, timesEqualsSparseZero, twoNorm, value, value, valueAtLocation, vectorAdd
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

log2

public static final double log2
Constructor Detail

GainRatio

protected GainRatio(Alphabet dataAlphabet,
                    double[] gainRatios,
                    double[] splitPoints,
                    double baseEntropy,
                    LabelVector baseLabelDistribution,
                    int numSplitPointsForBestFeature,
                    int minNumInsts)
Method Detail

calcGainRatios

protected static java.lang.Object[] calcGainRatios(InstanceList ilist,
                                                   int[] instIndices,
                                                   int minNumInsts)
Calculates gain ratios for all (feature, split point) pairs and returns an array of:
   1.  gain ratios (each element is the max gain ratio of a feature for those split points with at least average gain)
   2.  the optimal split point for each feature
   3.  the overall entropy
   4.  the overall label distribution of the given instances
   5.  the number of split points of the split feature
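The selection rule in item 1 (take the maximum gain ratio, but only among split points whose information gain is at least average) can be sketched as follows. This is an illustration under stated assumptions rather than MALLET's code: the class BestSplitSketch, the method bestSplit, and the parallel-array layout are all hypothetical.

```java
// Hypothetical helper, not part of MALLET: picks one feature's best split point.
final class BestSplitSketch {
    /**
     * gains[i] and gainRatios[i] describe the i-th candidate split point of a
     * single feature. Returns the index of the candidate with the highest gain
     * ratio among those whose gain is at least the average gain, or -1 if
     * there are no candidates.
     */
    static int bestSplit(double[] gains, double[] gainRatios) {
        if (gains.length == 0) return -1;
        double sum = 0.0;
        for (double g : gains) sum += g;
        double avgGain = sum / gains.length;
        int best = -1;
        for (int i = 0; i < gains.length; i++) {
            // Small tolerance so rounding in the average cannot exclude every candidate.
            if (gains[i] < avgGain - 1e-12) continue;
            if (best == -1 || gainRatios[i] > gainRatios[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] gains = {0.1, 0.5, 0.6};
        double[] ratios = {0.9, 0.4, 0.3};
        // Candidate 0 has the highest ratio but below-average gain, so it is skipped.
        System.out.println(bestSplit(gains, ratios));
    }
}
```

Filtering on average gain first keeps a split with negligible gain and tiny split information from winning on its ratio alone.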
   


sortInstances

public static int[] sortInstances(InstanceList ilist,
                                  int[] instIndices,
                                  int featureIndex)
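
No description is given for sortInstances. Assuming from its signature that it orders the given instance indices by their value for one feature (a prerequisite for scanning candidate split points), a minimal plain-Java sketch of that idea follows. The class SortByFeatureSketch and the dense double[][] layout are hypothetical stand-ins for an InstanceList, and the ascending order is likewise an assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not part of MALLET: orders instance indices by one feature.
final class SortByFeatureSketch {
    /**
     * data[i][f] is the value of feature f for instance i. Returns a new array
     * holding the entries of instIndices in ascending order of
     * data[index][featureIndex]. The input array is left unchanged.
     */
    static int[] sortByFeature(double[][] data, int[] instIndices, int featureIndex) {
        Integer[] boxed = new Integer[instIndices.length];
        for (int i = 0; i < instIndices.length; i++) boxed[i] = instIndices[i];
        // Arrays.sort on objects is stable, so ties keep their original order.
        Arrays.sort(boxed, (a, b) -> Double.compare(data[a][featureIndex], data[b][featureIndex]));
        int[] sorted = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) sorted[i] = boxed[i];
        return sorted;
    }

    public static void main(String[] args) {
        double[][] data = {{3.0}, {1.0}, {2.0}};
        System.out.println(Arrays.toString(sortByFeature(data, new int[] {0, 1, 2}, 0)));
    }
}
```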

createGainRatio

public static GainRatio createGainRatio(InstanceList ilist)
Constructs a GainRatio object.


createGainRatio

public static GainRatio createGainRatio(InstanceList ilist,
                                        int[] instIndices,
                                        int minNumInsts)
Constructs a GainRatio object.


getMaxValuedThreshold

public double getMaxValuedThreshold()
Returns:
the threshold of the (feature, threshold) pair with the maximum gain ratio

getThresholdAtRank

public double getThresholdAtRank(int rank)
Returns:
the threshold of the (feature, threshold) pair with the given rank

getBaseEntropy

public double getBaseEntropy()

getBaseLabelDistribution

public LabelVector getBaseLabelDistribution()

getNumSplitPointsForBestFeature

public int getNumSplitPointsForBestFeature()