cc.mallet.types
Class GainRatio
java.lang.Object
cc.mallet.types.SparseVector
cc.mallet.types.FeatureVector
cc.mallet.types.RankedFeatureVector
cc.mallet.types.GainRatio
- All Implemented Interfaces:
- AlphabetCarrying, ConstantMatrix, Vector, java.io.Serializable
public class GainRatio
- extends RankedFeatureVector
List of features along with their thresholds sorted in descending order of
the ratio of (1) information gained by splitting instances on the
feature at its associated threshold value, to (2) the split information.
The calculations ignore instance weights.
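As an illustration of the ratio described above, the following is a minimal, self-contained sketch in plain Java; it is not MALLET's implementation. The class name GainRatioSketch, its method names, and the convention that values less than or equal to the threshold go to the lower branch are assumptions made for this example. Each instance counts equally, matching the note that instance weights are not used.

```java
import java.util.Arrays;

public class GainRatioSketch {
    static final double LOG2 = Math.log(2);

    // Entropy, in bits, of a histogram of label counts.
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log(0) is treated as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / LOG2;
        }
        return h;
    }

    // Gain ratio of a binary split of (values[i], labels[i]) pairs.
    // Assumption: values <= threshold go to the lower branch.
    static double gainRatio(double[] values, int[] labels, int numLabels, double threshold) {
        int[] all = new int[numLabels];
        int[] lo = new int[numLabels];
        int[] hi = new int[numLabels];
        int nLo = 0;
        for (int i = 0; i < values.length; i++) {
            all[labels[i]]++;
            if (values[i] <= threshold) {
                lo[labels[i]]++;
                nLo++;
            } else {
                hi[labels[i]]++;
            }
        }
        int n = values.length;
        int nHi = n - nLo;
        if (nLo == 0 || nHi == 0) return 0.0; // degenerate split: nothing is separated

        // (1) information gained by the split
        double gain = entropy(all)
                - ((double) nLo / n) * entropy(lo)
                - ((double) nHi / n) * entropy(hi);
        // (2) split information: entropy of the branch sizes
        double splitInfo = entropy(new int[] { nLo, nHi });
        return gain / splitInfo;
    }

    public static void main(String[] args) {
        double[] values = { 1.0, 2.0, 3.0, 4.0 };
        int[] labels = { 0, 0, 1, 1 };
        System.out.println(gainRatio(values, labels, 2, 2.5));
    }
}
```

In this toy data set, a threshold of 2.5 separates the two labels perfectly, so it scores higher than a threshold of 1.5, which leaves one branch mixed.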
To create an instance of GainRatio from an InstanceList, one must do the following:
InstanceList ilist = ...
...
GainRatio gr = GainRatio.createGainRatio(ilist);
J. R. Quinlan
"Improved Use of Continuous Attributes in C4.5"
ftp://ftp.cs.cmu.edu/project/jair/volume4/quinlan96a.ps
- Author:
- Gary Huang ghuang@cs.umass.edu
- See Also:
- Serialized Form
Field Summary
static double log2
Constructor Summary
protected GainRatio(Alphabet dataAlphabet,
double[] gainRatios,
double[] splitPoints,
double baseEntropy,
LabelVector baseLabelDistribution,
int numSplitPointsForBestFeature,
int minNumInsts)
Methods inherited from class cc.mallet.types.RankedFeatureVector
getIndexAtRank, getMaxValue, getMaxValuedIndex, getMaxValuedIndexIn, getMaxValuedObject, getMaxValuedObjectIn, getMaxValueIn, getObjectAtRank, getRank, getRank, getValueAtRank, printByRank, printByRank, printLowerK, printTopK, set, setRankOrder, setRankOrder, setRankOrder, setReverseRankOrder
Methods inherited from class cc.mallet.types.FeatureVector
alphabetsMatch, cloneMatrix, cloneMatrixZeroed, contains, getAlphabet, getAlphabets, getObjectIndices, location, newFeatureVector, toSimpFile, toString, toString, value
Methods inherited from class cc.mallet.types.SparseVector
absNorm, addTo, addTo, arrayCopyFrom, arrayCopyFrom, arrayCopyInto, dotProduct, dotProduct, dotProduct, dotProduct, extendedDotProduct, extendedDotProduct, getDimensions, getIndices, getNumDimensions, getValues, incrementValue, indexAtLocation, infinityNorm, isBinary, isInfinite, isNaN, isNaNOrInfinite, location, makeBinary, makeNonBinary, map, numLocations, oneNorm, plusEqualsSparse, plusEqualsSparse, print, removeDuplicates, setAll, setValue, setValueAtLocation, singleIndex, singleSize, singleToIndices, singleValue, sortIndices, timesEquals, timesEqualsSparse, timesEqualsSparse, timesEqualsSparseZero, twoNorm, value, value, valueAtLocation, vectorAdd
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
log2
public static final double log2
GainRatio
protected GainRatio(Alphabet dataAlphabet,
double[] gainRatios,
double[] splitPoints,
double baseEntropy,
LabelVector baseLabelDistribution,
int numSplitPointsForBestFeature,
int minNumInsts)
calcGainRatios
protected static java.lang.Object[] calcGainRatios(InstanceList ilist,
int[] instIndices,
int minNumInsts)
- Calculates gain ratios for all (feature, split point) pairs
and returns an array of:
1. gain ratios (each element is the max gain ratio of a feature
for those split points with at least average gain)
2. the optimal split point for each feature
3. the overall entropy
4. the overall label distribution of the given instances
5. the number of split points of the split feature.
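The selection rule in item 1 (take the maximum gain ratio among split points with at least average gain) can be sketched as follows. This is a hedged illustration in plain Java, not the body of calcGainRatios: the class BestSplitSketch and the method bestSplit are hypothetical, and the sketch assumes the average is taken over the candidate split points passed in, which the description above does not specify.

```java
public class BestSplitSketch {
    // Returns the index of the candidate split point with the highest gain
    // ratio among those whose information gain is at least the average gain
    // of all candidates, or -1 if there are no candidates.
    // Assumption: gains[i] and gainRatios[i] describe the same split point.
    static int bestSplit(double[] gains, double[] gainRatios) {
        if (gains.length == 0) return -1;

        double sum = 0.0;
        for (double g : gains) sum += g;
        double avgGain = sum / gains.length;

        int best = -1;
        for (int i = 0; i < gains.length; i++) {
            // Small tolerance so rounding in the average does not exclude
            // candidates whose gain equals the average.
            if (gains[i] < avgGain - 1e-12) continue;
            if (best < 0 || gainRatios[i] > gainRatios[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] gains = { 0.1, 0.4, 0.5 };
        double[] gainRatios = { 0.9, 0.3, 0.6 };
        // Candidate 0 has the largest ratio but below-average gain, so it is skipped.
        System.out.println(bestSplit(gains, gainRatios));
    }
}
```

Filtering on gain first guards against split points whose ratio is large only because their split information is very small.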
sortInstances
public static int[] sortInstances(InstanceList ilist,
int[] instIndices,
int featureIndex)
createGainRatio
public static GainRatio createGainRatio(InstanceList ilist)
- Constructs a GainRatio object.
createGainRatio
public static GainRatio createGainRatio(InstanceList ilist,
int[] instIndices,
int minNumInsts)
- Constructs a GainRatio object.
getMaxValuedThreshold
public double getMaxValuedThreshold()
- Returns:
- the threshold of the (feature, threshold)
pair with the maximum gain ratio
getThresholdAtRank
public double getThresholdAtRank(int rank)
- Returns:
- the threshold of the (feature, threshold)
pair with the given rank
getBaseEntropy
public double getBaseEntropy()
getBaseLabelDistribution
public LabelVector getBaseLabelDistribution()
getNumSplitPointsForBestFeature
public int getNumSplitPointsForBestFeature()