java.lang.Object
  java.util.AbstractCollection<E>
    java.util.AbstractList<E>
      java.util.ArrayList<Instance>
        cc.mallet.types.InstanceList
public class InstanceList
A list of machine learning instances, typically used for training or testing of a machine learning algorithm.
All of the instances in the list will have been passed through the
same Pipe, and thus must also share the same data and target Alphabets.
InstanceList keeps a reference to the pipe and the two alphabets.
The most common way of adding instances to an InstanceList is through
the addThruPipe(Iterator<Instance>) method. Such iterators are a way of mapping general
data sources into instances suitable for processing through a pipe.
As each Instance is pulled from the iterator, the InstanceList
copies the instance and runs the copy through its pipe (with resultant
destructive modifications) before saving the modified instance on its list.
This is the usual way in which instances are transformed by pipes.
InstanceList also contains methods for randomly generating lists of feature vectors, for splitting lists into non-overlapping subsets (useful for test/train splits), and for creating cross-validation iterators.
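
The add-through-pipe flow described above can be illustrated without MALLET on the classpath. The following is a minimal stand-in, not the library's implementation: the class `PipedListSketch` and its members are hypothetical names, and a `UnaryOperator<String>` plays the role of a Pipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the described behavior; this is not MALLET code.
public class PipedListSketch {
    private final UnaryOperator<String> pipe;          // plays the role of a Pipe
    private final List<String> items = new ArrayList<>();

    public PipedListSketch(UnaryOperator<String> pipe) {
        this.pipe = pipe;
    }

    // Pull each raw item from the iterator, transform it, then store the result.
    public void addThruPipe(Iterator<String> source) {
        while (source.hasNext()) {
            items.add(pipe.apply(source.next()));
        }
    }

    // Append an item as-is, without applying the pipe.
    public void add(String alreadyProcessed) {
        items.add(alreadyProcessed);
    }

    public List<String> items() {
        return items;
    }

    public static void main(String[] args) {
        PipedListSketch list = new PipedListSketch(s -> s.trim().toLowerCase());
        list.addThruPipe(Arrays.asList("  Hello ", "WORLD").iterator());
        list.add("Raw");
        System.out.println(list.items());
    }
}
```

The sketch keeps the two entry points separate on purpose: items arriving through `addThruPipe` are transformed, while `add` stores its argument untouched, mirroring the distinction drawn in the method summary.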
See Also: Instance, Pipe, Serialized Form

| Nested Class Summary | |
|---|---|
| class | InstanceList.CrossValidationIterator: CrossValidationIterator allows iterating over pairs of InstanceList, where each pair is split into training/testing based on nfolds. |

| Field Summary | |
|---|---|
| static java.lang.String | TARGET_PROPERTY |

| Fields inherited from class java.util.AbstractList |
|---|
| modCount |

| Constructor Summary | |
|---|---|
| InstanceList() | Deprecated. |
| InstanceList(Alphabet dataAlphabet, Alphabet targetAlphabet) | Construct an InstanceList with initial capacity of 10, with a Noop default pipe. |
| InstanceList(Pipe pipe) | Construct an InstanceList with initial capacity of 10, with given default pipe. |
| InstanceList(Pipe pipe, int capacity) | Construct an InstanceList having given capacity, with given default pipe. |
| InstanceList(Randoms r, Alphabet vocab, java.lang.String[] classNames, int meanInstancesPerLabel) | |
| InstanceList(Randoms r, Dirichlet classCentroidDistribution, double classCentroidAverageAlphaMean, double classCentroidAverageAlphaVariance, double featureVectorSizePoissonLambda, double classInstanceCountPoissonLambda, java.lang.String[] classNames) | Creates a list consisting of randomly-generated FeatureVectors. |
| InstanceList(Randoms r, int vocabSize, int numClasses) | |

| Method Summary | |
|---|---|
| boolean | add(Instance instance): Appends the instance to this list without passing the instance through the InstanceList's pipe. |
| boolean | add(Instance instance, double instanceWeight): Appends the instance to this list without passing it through this InstanceList's pipe, assigning it the specified weight. |
| void | add(int index, Instance element) |
| boolean | add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source): Deprecated. Use trainingset.add (new Instance(data,target,name,source)) instead. |
| boolean | add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source, double instanceWeight): Deprecated. Use trainingset.addThruPipe (new Instance(data,target,name,source)) instead. |
| boolean | addAll(java.util.Collection<? extends Instance> instances) |
| boolean | addAll(int index, java.util.Collection<? extends Instance> c) |
| void | addThruPipe(Instance inst): Adds the input instance to this list, after passing it through the InstanceList's pipe. |
| void | addThruPipe(java.util.Iterator<Instance> ii): Adds to this list every instance generated by the iterator, passing each one through this InstanceList's pipe. |
| void | clear() |
| java.lang.Object | clone() |
| InstanceList | cloneEmpty() |
| protected InstanceList | cloneEmptyInto(InstanceList ret) |
| InstanceList.CrossValidationIterator | crossValidationIterator(int nfolds) |
| InstanceList.CrossValidationIterator | crossValidationIterator(int nfolds, int seed) |
| Alphabet | getAlphabet() |
| Alphabet[] | getAlphabets() |
| Alphabet | getDataAlphabet(): Returns the Alphabet mapping features of the data to integers. |
| java.lang.Class | getDataClass(): Returns the Java Class of the 'data' field of Instances in this list. |
| FeatureSelection | getFeatureSelection() |
| double | getInstanceWeight(Instance instance) |
| double | getInstanceWeight(int index) |
| FeatureSelection[] | getPerLabelFeatureSelection() |
| Pipe | getPipe(): Returns the pipe through which each added Instance is passed, which may be null. |
| Alphabet | getTargetAlphabet(): Returns the Alphabet mapping target output labels to integers. |
| java.lang.Class | getTargetClass(): Returns the Java Class of the 'target' field of Instances in this list. |
| void | hideSomeLabels(java.util.BitSet bs) |
| void | hideSomeLabels(double proportionToHide, Randoms r) |
| static InstanceList | load(java.io.File file): Constructs a new InstanceList, deserialized from file. |
| double | noisify(double ratio): Deprecated. |
| boolean | remove(Instance instance) |
| Instance | remove(int index) |
| void | removeSources(): Sets the "source" field to null in all instances. |
| void | removeTargets(): Sets the "target" field to null in all instances. |
| InstanceList | sampleWithInstanceWeights(java.util.Random r): Deprecated. |
| InstanceList | sampleWithReplacement(java.util.Random r, int numSamples) |
| InstanceList | sampleWithWeights(java.util.Random r, double[] weights): Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the given weights. |
| void | save(java.io.File file): Saves this InstanceList to file. |
| Instance | set(int index, Instance instance) |
| void | setFeatureSelection(FeatureSelection selectedFeatures) |
| void | setInstance(int index, Instance instance): Replaces the Instance at position index with a new one. |
| void | setInstanceWeight(Instance instance, double weight) |
| void | setInstanceWeight(int index, double weight) |
| void | setPerLabelFeatureSelection(FeatureSelection[] selectedFeatures) |
| void | setPipe(Pipe p): Change the default Pipe associated with InstanceList. |
| InstanceList | shallowClone() |
| void | shuffle(java.util.Random r) |
| InstanceList[] | split(double[] proportions) |
| InstanceList[] | split(java.util.Random r, double[] proportions): Shuffles the elements of this list among several smaller lists. |
| InstanceList[] | splitInOrder(double[] proportions): Chops this list into several sequential sublists. |
| InstanceList[] | splitInOrder(int[] counts) |
| InstanceList[] | splitInTwoByModulo(int m): Returns a pair of new lists such that the first list in the pair contains every mth element of this list, starting with the first. |
| InstanceList | subList(double proportion) |
| InstanceList | subList(int start, int end) |
| LabelVector | targetLabelDistribution() |
| void | unhideAllLabels() |

| Methods inherited from class java.util.ArrayList |
|---|
| contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, size, toArray, toArray, trimToSize |

| Methods inherited from class java.util.AbstractList |
|---|
| equals, hashCode, iterator, listIterator, listIterator |

| Methods inherited from class java.util.AbstractCollection |
|---|
| containsAll, removeAll, retainAll, toString |

| Methods inherited from class java.lang.Object |
|---|
| finalize, getClass, notify, notifyAll, wait, wait, wait |

| Methods inherited from interface java.lang.Iterable |
|---|
| iterator |

| Methods inherited from interface java.util.List |
|---|
| containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll |

| Field Detail |
|---|
public static final java.lang.String TARGET_PROPERTY
| Constructor Detail |
|---|
public InstanceList(Pipe pipe,
int capacity)
Construct an InstanceList having given capacity, with given default pipe.
Parameters:
pipe - The default pipe used to process instances added via the addThruPipe methods.
capacity - The initial capacity of the list; will grow further as necessary.

public InstanceList(Pipe pipe)
Construct an InstanceList with initial capacity of 10, with given default pipe.
Parameters:
pipe - The default pipe used to process instances added via the addThruPipe methods.

public InstanceList(Alphabet dataAlphabet,
Alphabet targetAlphabet)
Construct an InstanceList with initial capacity of 10, with a Noop default pipe. An example use is the creation of a
random InstanceList using Dirichlets and
Multinomials.
Parameters:
dataAlphabet - The vocabulary for added instances' data fields
targetAlphabet - The vocabulary for added instances' targets

@Deprecated
public InstanceList()
Deprecated.

public InstanceList(Randoms r,
Dirichlet classCentroidDistribution,
double classCentroidAverageAlphaMean,
double classCentroidAverageAlphaVariance,
double featureVectorSizePoissonLambda,
double classInstanceCountPoissonLambda,
java.lang.String[] classNames)
Creates a list consisting of randomly-generated FeatureVectors.

public InstanceList(Randoms r,
Alphabet vocab,
java.lang.String[] classNames,
int meanInstancesPerLabel)

public InstanceList(Randoms r,
int vocabSize,
int numClasses)

| Method Detail |
|---|
public InstanceList shallowClone()

public java.lang.Object clone()
Overrides:
clone in class java.util.ArrayList<Instance>

public InstanceList subList(int start,
int end)
Specified by:
subList in interface java.util.List<Instance>
Overrides:
subList in class java.util.AbstractList<Instance>

public InstanceList subList(double proportion)

public void addThruPipe(java.util.Iterator<Instance> ii)
Adds to this list every instance generated by the iterator, passing each one through this InstanceList's pipe.

public void addThruPipe(Instance inst)
Adds the input instance to this list, after passing it through the InstanceList's pipe.
If several instances are to be added then accumulate them in a List<Instance> and use addThruPipe(Iterator<Instance>) instead.

@Deprecated
public boolean add(java.lang.Object data,
java.lang.Object target,
java.lang.Object name,
java.lang.Object source,
double instanceWeight)
Deprecated. Use trainingset.addThruPipe (new Instance(data,target,name,source)) instead.
Returns:
true

@Deprecated
public boolean add(java.lang.Object data,
java.lang.Object target,
java.lang.Object name,
java.lang.Object source)
Deprecated. Use trainingset.add (new Instance(data,target,name,source)) instead.
Returns:
true

public boolean add(Instance instance)
Appends the instance to this list without passing the instance through the InstanceList's pipe.
Specified by:
add in interface java.util.Collection<Instance>
add in interface java.util.List<Instance>
Overrides:
add in class java.util.ArrayList<Instance>
Returns:
true

public boolean add(Instance instance,
double instanceWeight)
Appends the instance to this list without passing it through this InstanceList's pipe, assigning it the specified weight.
Returns:
true

public Instance set(int index,
Instance instance)
Specified by:
set in interface java.util.List<Instance>
Overrides:
set in class java.util.ArrayList<Instance>

public void add(int index,
Instance element)
Specified by:
add in interface java.util.List<Instance>
Overrides:
add in class java.util.ArrayList<Instance>

public Instance remove(int index)
Specified by:
remove in interface java.util.List<Instance>
Overrides:
remove in class java.util.ArrayList<Instance>

public boolean remove(Instance instance)

public boolean addAll(java.util.Collection<? extends Instance> instances)
Specified by:
addAll in interface java.util.Collection<Instance>
addAll in interface java.util.List<Instance>
Overrides:
addAll in class java.util.ArrayList<Instance>

public boolean addAll(int index,
java.util.Collection<? extends Instance> c)
Specified by:
addAll in interface java.util.List<Instance>
Overrides:
addAll in class java.util.ArrayList<Instance>

public void clear()
Specified by:
clear in interface java.util.Collection<Instance>
clear in interface java.util.List<Instance>
Overrides:
clear in class java.util.ArrayList<Instance>

@Deprecated
public double noisify(double ratio)
Deprecated.

public InstanceList cloneEmpty()

protected InstanceList cloneEmptyInto(InstanceList ret)

public void shuffle(java.util.Random r)

public InstanceList[] split(java.util.Random r,
double[] proportions)
Shuffles the elements of this list among several smaller lists.
Parameters:
proportions - A list of numbers (not necessarily summing to 1) which,
when normalized, correspond to the proportion of elements in each returned
sublist. This method (and all the split methods) does not transfer the Instance
weights to the resulting InstanceLists.
r - The source of randomness to use in shuffling.
Returns:
One InstanceList for each element of proportions

public InstanceList[] split(double[] proportions)

public InstanceList[] splitInOrder(double[] proportions)
Chops this list into several sequential sublists.
Parameters:
proportions - A list of numbers corresponding to the proportion of
elements in each returned sublist. If not already normalized to sum to 1.0, it will be normalized here.
Returns:
One InstanceList for each element of proportions

public InstanceList[] splitInOrder(int[] counts)

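
The normalization of proportions described for the split methods can be sketched with plain Java lists. This is only an illustration under stated assumptions: `SplitInOrderSketch` is a hypothetical class, and rounding each cumulative boundary to the nearest index is an assumed policy that may differ from MALLET's own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; MALLET's own rounding of sublist boundaries may differ.
public class SplitInOrderSketch {
    public static <T> List<List<T>> splitInOrder(List<T> items, double[] proportions) {
        // Sum the raw proportions so they can be normalized to total 1.0.
        double total = 0.0;
        for (double p : proportions) {
            total += p;
        }
        List<List<T>> parts = new ArrayList<>();
        double cumulative = 0.0;
        int start = 0;
        for (double p : proportions) {
            cumulative += p / total;
            // Assumed policy: round the cumulative boundary to the nearest index.
            int end = (int) Math.round(cumulative * items.size());
            end = Math.min(Math.max(end, start), items.size());
            parts.add(new ArrayList<>(items.subList(start, end)));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        // 8 and 2 are not normalized; they behave like 0.8 and 0.2.
        System.out.println(splitInOrder(data, new double[] {8, 2}));
    }
}
```

A shuffled split in the style of split(Random, double[]) could be modeled by shuffling a copy of the list first and then applying the same sequential cut.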
public InstanceList[] splitInTwoByModulo(int m)
Returns a pair of new lists such that the first list in the pair contains every mth element of this list, starting with the first.
The second list contains all remaining elements.

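
A modulo split of this kind can be sketched over an ordinary list. `ModuloSplitSketch` is a hypothetical class written for illustration, assuming that "starting with the first" means index 0 belongs to the first list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a modulo split; not MALLET code.
public class ModuloSplitSketch {
    // Returns two lists: elements at indices 0, m, 2m, ... and all the others.
    public static <T> List<List<T>> splitInTwoByModulo(List<T> items, int m) {
        List<T> everyMth = new ArrayList<>();
        List<T> rest = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i % m == 0) {
                everyMth.add(items.get(i));
            } else {
                rest.add(items.get(i));
            }
        }
        List<List<T>> pair = new ArrayList<>();
        pair.add(everyMth);
        pair.add(rest);
        return pair;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(splitInTwoByModulo(data, 3));
    }
}
```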
public InstanceList sampleWithReplacement(java.util.Random r,
int numSamples)

@Deprecated
public InstanceList sampleWithInstanceWeights(java.util.Random r)
Deprecated.
Returns an InstanceList of the same size, where the instances come from the
random sampling (with replacement) of this list using the instance weights.
The new instances all have their weights set to one.

public InstanceList sampleWithWeights(java.util.Random r,
double[] weights)
Returns an InstanceList of the same size, where the instances come from the
random sampling (with replacement) of this list using the given weights.
The length of the weight array must be the same as the length of this list.
The new instances all have their weights set to one.

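
Weighted sampling with replacement, as described for sampleWithWeights, can be sketched as follows. `WeightedSampleSketch` is a hypothetical class; the cumulative-weight scan used here is one common technique and is not claimed to be how MALLET implements the method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; MALLET's own sampling procedure may differ.
public class WeightedSampleSketch {
    public static <T> List<T> sampleWithWeights(List<T> items, double[] weights, Random r) {
        if (weights.length != items.size()) {
            throw new IllegalArgumentException("weights must match list length");
        }
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        // Draw as many samples as there are items, so the result has the same size.
        List<T> sample = new ArrayList<>(items.size());
        for (int n = 0; n < items.size(); n++) {
            double target = r.nextDouble() * total;   // uniform in [0, total)
            double running = 0.0;
            int chosen = items.size() - 1;            // fallback against rounding error
            for (int i = 0; i < weights.length; i++) {
                running += weights[i];
                if (target < running) {
                    chosen = i;
                    break;
                }
            }
            sample.add(items.get(chosen));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(sampleWithWeights(data, new double[] {1, 5, 1}, new Random(0)));
    }
}
```

Because items are drawn with replacement, the same element can appear several times in the result, and an element with zero weight is never selected by the scan.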
public java.lang.Class getDataClass()
Returns the Java Class of the 'data' field of Instances in this list.

public java.lang.Class getTargetClass()
Returns the Java Class of the 'target' field of Instances in this list.

public void setInstance(int index,
Instance instance)
Replaces the Instance at position index
with a new one.

public double getInstanceWeight(Instance instance)

public double getInstanceWeight(int index)

public void setInstanceWeight(int index,
double weight)

public void setInstanceWeight(Instance instance,
double weight)

public void setFeatureSelection(FeatureSelection selectedFeatures)

public FeatureSelection getFeatureSelection()

public void setPerLabelFeatureSelection(FeatureSelection[] selectedFeatures)

public FeatureSelection[] getPerLabelFeatureSelection()

public void removeTargets()
Sets the "target" field to null in all instances. This makes unlabeled data.

public void removeSources()
Sets the "source" field to null in all instances. This will often save memory when
the raw data had been placed in that field.

public static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file. If the
string value of file is "-", then deserialize from System.in.

public void save(java.io.File file)
Saves this InstanceList to file.
If the string value of file is "-", then
serialize to System.out.

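
The save/load pair relies on Java object serialization. A minimal round-trip sketch using a plain ArrayList<String> in place of an InstanceList is shown below; `SaveLoadSketch` is a hypothetical class, and the "-" convention for System.in and System.out is not modeled.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical illustration of a serialization round trip; not MALLET code.
public class SaveLoadSketch {
    // Serialize the list to the given file.
    public static void save(ArrayList<String> list, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(list);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize a list previously written by save.
    @SuppressWarnings("unchecked")
    public static ArrayList<String> load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ArrayList<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("instances", ".ser");
        file.deleteOnExit();
        save(new ArrayList<>(Arrays.asList("doc1", "doc2")), file);
        System.out.println(load(file));
    }
}
```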
public Pipe getPipe()
Returns the pipe through which each added Instance is passed,
which may be null.

public void setPipe(Pipe p)
Change the default Pipe associated with InstanceList.

public Alphabet getDataAlphabet()
Returns the Alphabet mapping features of the data to
integers.

public Alphabet getTargetAlphabet()
Returns the Alphabet mapping target output labels to
integers.

public Alphabet getAlphabet()
Specified by:
getAlphabet in interface AlphabetCarrying

public Alphabet[] getAlphabets()
Specified by:
getAlphabets in interface AlphabetCarrying

public LabelVector targetLabelDistribution()

public InstanceList.CrossValidationIterator crossValidationIterator(int nfolds,
int seed)

public InstanceList.CrossValidationIterator crossValidationIterator(int nfolds)

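
The idea of iterating over training/testing pairs across nfolds can be sketched with plain lists. `CrossValidationSketch` is a hypothetical class, and assigning items to folds by shuffled index modulo nfolds is an assumption made for illustration, not a description of how CrossValidationIterator partitions its data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of n-fold train/test pairs; not MALLET code.
public class CrossValidationSketch {
    // Returns nfolds pairs; pair.get(0) is training data, pair.get(1) the held-out fold.
    public static <T> List<List<List<T>>> folds(List<T> items, int nfolds, int seed) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<List<T>>> pairs = new ArrayList<>();
        for (int f = 0; f < nfolds; f++) {
            List<T> train = new ArrayList<>();
            List<T> test = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                // Assumed policy: item i belongs to fold (i mod nfolds).
                if (i % nfolds == f) {
                    test.add(shuffled.get(i));
                } else {
                    train.add(shuffled.get(i));
                }
            }
            List<List<T>> pair = new ArrayList<>();
            pair.add(train);
            pair.add(test);
            pairs.add(pair);
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        for (List<List<Integer>> pair : folds(data, 5, 1)) {
            System.out.println("train=" + pair.get(0) + " test=" + pair.get(1));
        }
    }
}
```

Each item appears in exactly one held-out fold, so every item is used for testing once and for training in the remaining folds.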
public void hideSomeLabels(double proportionToHide,
Randoms r)
public void hideSomeLabels(java.util.BitSet bs)
public void unhideAllLabels()