java.lang.Object
extended by java.util.AbstractCollection<E>
extended by java.util.AbstractList<E>
extended by java.util.ArrayList<Instance>
extended by cc.mallet.types.InstanceList
public class InstanceList
extends java.util.ArrayList<Instance>
A list of machine learning instances, typically used for training or testing a machine learning algorithm.
All of the instances in the list will have been passed through the same Pipe, and thus must also share the same data and target Alphabets. InstanceList keeps a reference to the pipe and the two alphabets.
The most common way of adding instances to an InstanceList is through the addThruPipe(java.util.Iterator<Instance>) method. Such iterators are a way of mapping general data sources into instances suitable for processing through a pipe. As each Instance is pulled from the iterator, the InstanceList copies the instance and runs the copy through its pipe (with resulting destructive modifications) before saving the modified instance in its list. This is the usual way in which pipes transform instances.
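The difference between add, which stores an instance unchanged, and addThruPipe, which transforms it first, can be sketched without MALLET itself. In the stand-alone Java below, the nested Item class stands in for Instance and a java.util.function.UnaryOperator stands in for Pipe; PipeSketch, Item, and PipedList are hypothetical names, and the sketch models only the transform-then-store behavior, not MALLET's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

class PipeSketch {
    // Hypothetical stand-in for a MALLET Instance: immutable data plus a target label.
    static final class Item {
        final Object data;
        final String target;

        Item(Object data, String target) {
            this.data = data;
            this.target = target;
        }
    }

    // Hypothetical stand-in for InstanceList: an ArrayList that owns a default "pipe".
    static final class PipedList extends ArrayList<Item> {
        private final UnaryOperator<Item> pipe;

        PipedList(UnaryOperator<Item> pipe) {
            this.pipe = pipe;
        }

        // Analogue of addThruPipe(Instance): transform first, then store the result.
        void addThruPipe(Item raw) {
            add(pipe.apply(raw));
        }

        // Analogue of addThruPipe(Iterator<Instance>): transform every item the iterator yields.
        void addThruPipe(Iterator<Item> items) {
            while (items.hasNext()) {
                addThruPipe(items.next());
            }
        }
    }

    public static void main(String[] args) {
        // A toy "pipe": lower-case the text and split it into whitespace-separated tokens.
        UnaryOperator<Item> tokenize =
            in -> new Item(((String) in.data).toLowerCase().split("\\s+"), in.target);

        PipedList list = new PipedList(tokenize);
        list.addThruPipe(List.of(new Item("Hello World", "greeting")).iterator());
        list.add(new Item("Left As Is", "raw")); // plain add(): the pipe is not applied

        String[] tokens = (String[]) list.get(0).data;
        System.out.println(tokens.length + " " + tokens[0]); // prints "2 hello"
        System.out.println(list.get(1).data);                 // prints "Left As Is"
    }
}
```

Because Item is immutable and the operator returns a new object, the caller's original is left untouched, which loosely mirrors the copy-then-transform behavior described for InstanceList.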
InstanceList also contains methods for randomly generating lists of feature vectors, splitting lists into non-overlapping subsets (useful for test/train splits), and iterating over cross-validation folds.
See Also: Instance, Pipe, Serialized Form
Nested Class Summary | |
---|---|
class | InstanceList.CrossValidationIterator: CrossValidationIterator allows iterating over pairs of InstanceLists, where each pair is split into training/testing sets based on nfolds. |
Field Summary | |
---|---|
static java.lang.String | TARGET_PROPERTY |
Fields inherited from class java.util.AbstractList |
---|
modCount |
Constructor Summary | |
---|---|
InstanceList() | Deprecated. |
InstanceList(Alphabet dataAlphabet, Alphabet targetAlphabet) | Constructs an InstanceList with an initial capacity of 10 and a Noop default pipe. |
InstanceList(Pipe pipe) | Constructs an InstanceList with an initial capacity of 10 and the given default pipe. |
InstanceList(Pipe pipe, int capacity) | Constructs an InstanceList with the given capacity and default pipe. |
InstanceList(Randoms r, Alphabet vocab, java.lang.String[] classNames, int meanInstancesPerLabel) | |
InstanceList(Randoms r, Dirichlet classCentroidDistribution, double classCentroidAverageAlphaMean, double classCentroidAverageAlphaVariance, double featureVectorSizePoissonLambda, double classInstanceCountPoissonLambda, java.lang.String[] classNames) | Creates a list consisting of randomly generated FeatureVectors. |
InstanceList(Randoms r, int vocabSize, int numClasses) | |
Method Summary | |
---|---|
boolean | add(Instance instance): Appends the instance to this list without passing the instance through the InstanceList's pipe. |
boolean | add(Instance instance, double instanceWeight): Appends the instance to this list without passing it through this InstanceList's pipe, assigning it the specified weight. |
void | add(int index, Instance element) |
boolean | add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source): Deprecated. Use trainingset.add(new Instance(data, target, name, source)) instead. |
boolean | add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source, double instanceWeight): Deprecated. Use trainingset.addThruPipe(new Instance(data, target, name, source)) instead. |
boolean | addAll(java.util.Collection<? extends Instance> instances) |
boolean | addAll(int index, java.util.Collection<? extends Instance> c) |
void | addThruPipe(Instance inst): Adds the input instance to this list, after passing it through the InstanceList's pipe. |
void | addThruPipe(java.util.Iterator<Instance> ii): Adds to this list every instance generated by the iterator, passing each one through this InstanceList's pipe. |
void | clear() |
java.lang.Object | clone() |
InstanceList | cloneEmpty() |
protected InstanceList | cloneEmptyInto(InstanceList ret) |
InstanceList.CrossValidationIterator | crossValidationIterator(int nfolds) |
InstanceList.CrossValidationIterator | crossValidationIterator(int nfolds, int seed) |
Alphabet | getAlphabet() |
Alphabet[] | getAlphabets() |
Alphabet | getDataAlphabet(): Returns the Alphabet mapping features of the data to integers. |
java.lang.Class | getDataClass(): Returns the Java Class of the 'data' field of Instances in this list. |
FeatureSelection | getFeatureSelection() |
double | getInstanceWeight(Instance instance) |
double | getInstanceWeight(int index) |
FeatureSelection[] | getPerLabelFeatureSelection() |
Pipe | getPipe(): Returns the pipe through which each added Instance is passed, which may be null. |
Alphabet | getTargetAlphabet(): Returns the Alphabet mapping target output labels to integers. |
java.lang.Class | getTargetClass(): Returns the Java Class of the 'target' field of Instances in this list. |
void | hideSomeLabels(java.util.BitSet bs) |
void | hideSomeLabels(double proportionToHide, Randoms r) |
static InstanceList | load(java.io.File file): Constructs a new InstanceList, deserialized from file. |
double | noisify(double ratio): Deprecated. |
boolean | remove(Instance instance) |
Instance | remove(int index) |
void | removeSources(): Sets the "source" field to null in all instances. |
void | removeTargets(): Sets the "target" field to null in all instances. |
InstanceList | sampleWithInstanceWeights(java.util.Random r): Deprecated. |
InstanceList | sampleWithReplacement(java.util.Random r, int numSamples) |
InstanceList | sampleWithWeights(java.util.Random r, double[] weights): Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the given weights. |
void | save(java.io.File file): Saves this InstanceList to file. |
Instance | set(int index, Instance instance) |
void | setFeatureSelection(FeatureSelection selectedFeatures) |
void | setInstance(int index, Instance instance): Replaces the Instance at position index with a new one. |
void | setInstanceWeight(Instance instance, double weight) |
void | setInstanceWeight(int index, double weight) |
void | setPerLabelFeatureSelection(FeatureSelection[] selectedFeatures) |
void | setPipe(Pipe p): Changes the default Pipe associated with this InstanceList. |
InstanceList | shallowClone() |
void | shuffle(java.util.Random r) |
InstanceList[] | split(double[] proportions) |
InstanceList[] | split(java.util.Random r, double[] proportions): Shuffles the elements of this list among several smaller lists. |
InstanceList[] | splitInOrder(double[] proportions): Chops this list into several sequential sublists. |
InstanceList[] | splitInOrder(int[] counts) |
InstanceList[] | splitInTwoByModulo(int m): Returns a pair of new lists such that the first list in the pair contains every m-th element of this list, starting with the first. |
InstanceList | subList(double proportion) |
InstanceList | subList(int start, int end) |
LabelVector | targetLabelDistribution() |
void | unhideAllLabels() |
Methods inherited from class java.util.ArrayList |
---|
contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, removeRange, size, toArray, toArray, trimToSize |
Methods inherited from class java.util.AbstractList |
---|
equals, hashCode, iterator, listIterator, listIterator |
Methods inherited from class java.util.AbstractCollection |
---|
containsAll, removeAll, retainAll, toString |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface java.lang.Iterable |
---|
iterator |
Methods inherited from interface java.util.List |
---|
containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll |
Field Detail |
---|
public static final java.lang.String TARGET_PROPERTY
Constructor Detail |
---|
public InstanceList(Pipe pipe, int capacity)
Constructs an InstanceList with the given capacity and default pipe.
Parameters:
pipe - The default pipe used to process instances added via the addThruPipe methods.
capacity - The initial capacity of the list; it will grow further as necessary.
public InstanceList(Pipe pipe)
Constructs an InstanceList with an initial capacity of 10 and the given default pipe.
Parameters:
pipe - The default pipe used to process instances added via the addThruPipe methods.
public InstanceList(Alphabet dataAlphabet, Alphabet targetAlphabet)
Constructs an InstanceList with an initial capacity of 10 and a Noop default pipe. Useful when the InstanceList has no real processing pipe, for example when creating a random InstanceList using Dirichlets and Multinomials.
Parameters:
dataAlphabet - The vocabulary for added instances' data fields
targetAlphabet - The vocabulary for added instances' targets
@Deprecated public InstanceList()
public InstanceList(Randoms r, Dirichlet classCentroidDistribution, double classCentroidAverageAlphaMean, double classCentroidAverageAlphaVariance, double featureVectorSizePoissonLambda, double classInstanceCountPoissonLambda, java.lang.String[] classNames)
Creates a list consisting of randomly generated FeatureVectors.
public InstanceList(Randoms r, Alphabet vocab, java.lang.String[] classNames, int meanInstancesPerLabel)
public InstanceList(Randoms r, int vocabSize, int numClasses)
Method Detail |
---|
public InstanceList shallowClone()
public java.lang.Object clone()
Overrides: clone in class java.util.ArrayList<Instance>
public InstanceList subList(int start, int end)
Specified by: subList in interface java.util.List<Instance>
Overrides: subList in class java.util.AbstractList<Instance>
public InstanceList subList(double proportion)
public void addThruPipe(java.util.Iterator<Instance> ii)
public void addThruPipe(Instance inst)
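Adds the input instance to this list, after passing it through the InstanceList's pipe. If several instances are to be added, accumulate them in a List<Instance> and pass its iterator to addThruPipe(java.util.Iterator<Instance>) instead.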
@Deprecated public boolean add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source, double instanceWeight)
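Deprecated. Use trainingset.addThruPipe(new Instance(data, target, name, source)) instead.
Returns: true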
@Deprecated public boolean add(java.lang.Object data, java.lang.Object target, java.lang.Object name, java.lang.Object source)
Deprecated. Use trainingset.add(new Instance(data, target, name, source)) instead.
Returns: true
public boolean add(Instance instance)
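Appends the instance to this list without passing the instance through the InstanceList's pipe.
Specified by: add in interface java.util.Collection<Instance>
Specified by: add in interface java.util.List<Instance>
Overrides: add in class java.util.ArrayList<Instance>
Returns: true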
public boolean add(Instance instance, double instanceWeight)
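Appends the instance to this list without passing it through this InstanceList's pipe, assigning it the specified weight.
Returns: true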
public Instance set(int index, Instance instance)
Specified by: set in interface java.util.List<Instance>
Overrides: set in class java.util.ArrayList<Instance>
public void add(int index, Instance element)
Specified by: add in interface java.util.List<Instance>
Overrides: add in class java.util.ArrayList<Instance>
public Instance remove(int index)
Specified by: remove in interface java.util.List<Instance>
Overrides: remove in class java.util.ArrayList<Instance>
public boolean remove(Instance instance)
public boolean addAll(java.util.Collection<? extends Instance> instances)
Specified by: addAll in interface java.util.Collection<Instance>
Specified by: addAll in interface java.util.List<Instance>
Overrides: addAll in class java.util.ArrayList<Instance>
public boolean addAll(int index, java.util.Collection<? extends Instance> c)
Specified by: addAll in interface java.util.List<Instance>
Overrides: addAll in class java.util.ArrayList<Instance>
public void clear()
Specified by: clear in interface java.util.Collection<Instance>
Specified by: clear in interface java.util.List<Instance>
Overrides: clear in class java.util.ArrayList<Instance>
@Deprecated public double noisify(double ratio)
public InstanceList cloneEmpty()
protected InstanceList cloneEmptyInto(InstanceList ret)
public void shuffle(java.util.Random r)
public InstanceList[] split(java.util.Random r, double[] proportions)
Shuffles the elements of this list among several smaller lists.
Parameters:
proportions - A list of numbers (not necessarily summing to 1) which, when normalized, correspond to the proportion of elements in each returned sublist. This method, like all the split methods, does not transfer the Instance weights to the resulting InstanceLists.
r - The source of randomness to use in shuffling.
Returns: one InstanceList for each element of proportions
public InstanceList[] split(double[] proportions)
public InstanceList[] splitInOrder(double[] proportions)
Chops this list into several sequential sublists.
Parameters:
proportions - A list of numbers corresponding to the proportion of elements in each returned sublist. If they do not already sum to 1.0, they are normalized here.
Returns: one InstanceList for each element of proportions
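Taken together, split(java.util.Random, double[]) and splitInOrder(double[]) amount to normalizing the proportions, optionally shuffling, and then cutting the list at cumulative boundaries. The sketch below illustrates that reading on a plain java.util.List; SplitSketch is a hypothetical class, and its rounding rule (round cumulative boundaries down, give the remainder to the last sublist) is an assumption that may differ from MALLET's own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SplitSketch {
    // Cuts `items` into consecutive sublists sized by the normalized proportions.
    // Assumed rounding rule: each cumulative boundary is rounded down, and the last
    // sublist takes whatever remains, so every element lands in exactly one sublist.
    static <T> List<List<T>> splitInOrder(List<T> items, double[] proportions) {
        double total = 0;
        for (double p : proportions) {
            total += p;
        }
        List<List<T>> parts = new ArrayList<>();
        double cumulative = 0;
        int start = 0;
        for (int i = 0; i < proportions.length; i++) {
            cumulative += proportions[i] / total; // normalize on the fly
            int end = (i == proportions.length - 1)
                ? items.size()
                : Math.min(items.size(), (int) Math.floor(cumulative * items.size()));
            parts.add(new ArrayList<>(items.subList(start, end)));
            start = end;
        }
        return parts;
    }

    // Shuffled variant: shuffle a copy with the caller's Random, then cut it in order.
    static <T> List<List<T>> split(Random r, List<T> items, double[] proportions) {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, r);
        return splitInOrder(copy, proportions);
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        // {3, 1} does not sum to 1; it is treated as {0.75, 0.25}.
        System.out.println(splitInOrder(data, new double[] {3, 1})); // prints "[[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]"
        // The same seed gives the same partition, which keeps test/train splits reproducible.
        System.out.println(split(new Random(42), data, new double[] {0.8, 0.2}));
    }
}
```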
public InstanceList[] splitInOrder(int[] counts)
public InstanceList[] splitInTwoByModulo(int m)
Returns a pair of new lists such that the first list in the pair contains every m-th element of this list, starting with the first. The second list contains all remaining elements.
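Read literally, "every m-th element, starting with the first" means the elements at indices 0, m, 2m, and so on. Unlike split, this partition involves no randomness. A minimal sketch over a plain java.util.List, using the hypothetical class ModuloSplitSketch rather than MALLET's code:

```java
import java.util.ArrayList;
import java.util.List;

class ModuloSplitSketch {
    // Element i goes to the first list when i % m == 0 (indices 0, m, 2m, ...),
    // i.e. every m-th element starting with the first; all others go to the second list.
    static <T> List<List<T>> splitInTwoByModulo(List<T> items, int m) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        List<T> every = new ArrayList<>();
        List<T> rest = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            (i % m == 0 ? every : rest).add(items.get(i));
        }
        List<List<T>> pair = new ArrayList<>();
        pair.add(every);
        pair.add(rest);
        return pair;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e", "f", "g");
        System.out.println(splitInTwoByModulo(items, 3)); // prints "[[a, d, g], [b, c, e, f]]"
    }
}
```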
public InstanceList sampleWithReplacement(java.util.Random r, int numSamples)
@Deprecated public InstanceList sampleWithInstanceWeights(java.util.Random r)
Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the instance weights. The new instances all have their weights set to one.
public InstanceList sampleWithWeights(java.util.Random r, double[] weights)
Returns an InstanceList of the same size, where the instances come from the random sampling (with replacement) of this list using the given weights. The length of the weight array must be the same as the length of this list. The new instances all have their weights set to one.
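Sampling with replacement by weight, as described for sampleWithWeights and sampleWithInstanceWeights, can be done with a cumulative-sum table and a binary search. The sketch below illustrates that technique on a plain java.util.List and is not MALLET's code: WeightedSampleSketch is hypothetical, it assumes non-negative weights with a positive sum, and because plain list elements carry no weights it does not model resetting the weights to one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class WeightedSampleSketch {
    // Draws items.size() elements with replacement; element i is picked with
    // probability weights[i] / sum(weights).
    static <T> List<T> sampleWithWeights(Random r, List<T> items, double[] weights) {
        if (weights.length != items.size()) {
            throw new IllegalArgumentException("need exactly one weight per element");
        }
        // cumulative[i] = weights[0] + ... + weights[i]
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        List<T> sample = new ArrayList<>(items.size());
        for (int n = 0; n < items.size(); n++) {
            double u = r.nextDouble() * total; // uniform point in [0, total)
            // Binary search for the first index whose cumulative weight exceeds u;
            // zero-weight elements never satisfy this, so they are never drawn.
            int lo = 0;
            int hi = cumulative.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            sample.add(items.get(lo));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c");
        // "b" has weight 0 and is never drawn; "c" is about three times as likely as "a".
        System.out.println(sampleWithWeights(new Random(7), items, new double[] {1, 0, 3}));
    }
}
```

Building the table costs O(n) once, and each draw is then O(log n), so a full same-size resample is O(n log n).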
public java.lang.Class getDataClass()
public java.lang.Class getTargetClass()
public void setInstance(int index, Instance instance)
Replaces the Instance at position index with a new one.
public double getInstanceWeight(Instance instance)
public double getInstanceWeight(int index)
public void setInstanceWeight(int index, double weight)
public void setInstanceWeight(Instance instance, double weight)
public void setFeatureSelection(FeatureSelection selectedFeatures)
public FeatureSelection getFeatureSelection()
public void setPerLabelFeatureSelection(FeatureSelection[] selectedFeatures)
public FeatureSelection[] getPerLabelFeatureSelection()
public void removeTargets()
Sets the "target" field to null in all instances. This turns the list into unlabeled data.
public void removeSources()
Sets the "source" field to null in all instances. This often saves memory when the raw data has been placed in that field.
public static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file. If the string value of file is "-", then deserialize from System.in.
public void save(java.io.File file)
Saves this InstanceList to file. If the string value of file is "-", then serialize to System.out.
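The "-" convention of load and save can be sketched with standard Java object serialization. This assumes, without confirmation from this page beyond its "Serialized Form" link, that InstanceList is written with ObjectOutputStream and read with ObjectInputStream; DashFileSketch and its save/load helpers are hypothetical and accept any Serializable object.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class DashFileSketch {
    // A path whose string value is "-" selects the standard streams.
    static boolean isDash(File file) {
        return "-".equals(file.toString());
    }

    // Serializes obj to file, or to System.out when the path is "-".
    static void save(Serializable obj, File file) throws IOException {
        OutputStream raw = isDash(file) ? System.out : new FileOutputStream(file);
        ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(raw));
        try {
            out.writeObject(obj);
            out.flush();
        } finally {
            if (!isDash(file)) {
                out.close(); // closes the file; System.out is left open
            }
        }
    }

    // Deserializes one object from file, or from System.in when the path is "-".
    static Object load(File file) throws IOException, ClassNotFoundException {
        InputStream raw = isDash(file) ? System.in : new FileInputStream(file);
        ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(raw));
        try {
            return in.readObject();
        } finally {
            if (!isDash(file)) {
                in.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("instances", ".ser");
        tmp.deleteOnExit();
        ArrayList<String> docs = new ArrayList<>(List.of("doc1", "doc2"));
        save(docs, tmp);
        System.out.println(load(tmp)); // prints "[doc1, doc2]"
    }
}
```

Leaving the standard streams open lets a caller chain several tools in a shell pipeline, which is the usual reason for a "-" convention.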
public Pipe getPipe()
Returns the pipe through which each added Instance is passed, which may be null.
public void setPipe(Pipe p)
public Alphabet getDataAlphabet()
Returns the Alphabet mapping features of the data to integers.
public Alphabet getTargetAlphabet()
Returns the Alphabet mapping target output labels to integers.
public Alphabet getAlphabet()
Specified by: getAlphabet in interface AlphabetCarrying
public Alphabet[] getAlphabets()
Specified by: getAlphabets in interface AlphabetCarrying
public LabelVector targetLabelDistribution()
public InstanceList.CrossValidationIterator crossValidationIterator(int nfolds, int seed)
public InstanceList.CrossValidationIterator crossValidationIterator(int nfolds)
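This page says only that CrossValidationIterator yields pairs split into training and testing sets based on nfolds. Under the common k-fold assumption (shuffle once with the seed, cut the list into nfolds folds, and let each fold serve once as the test set while the remaining folds form the training set), the scheme looks roughly like the following. CrossValidationSketch and Fold are hypothetical, and the round-robin fold assignment is a choice made here for illustration, not necessarily MALLET's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class CrossValidationSketch {
    // One train/test pair.
    static final class Fold<T> {
        final List<T> train;
        final List<T> test;

        Fold(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    // k-fold cross-validation: each fold is the test set once; the other folds form its training set.
    static <T> List<Fold<T>> folds(List<T> items, int nfolds, long seed) {
        if (nfolds < 2 || nfolds > items.size()) {
            throw new IllegalArgumentException("need 2 <= nfolds <= number of items");
        }
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed)); // same seed, same folds
        List<List<T>> parts = new ArrayList<>();
        for (int k = 0; k < nfolds; k++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            parts.get(i % nfolds).add(shuffled.get(i)); // round-robin: fold sizes differ by at most 1
        }
        List<Fold<T>> result = new ArrayList<>();
        for (int k = 0; k < nfolds; k++) {
            List<T> train = new ArrayList<>();
            for (int j = 0; j < nfolds; j++) {
                if (j != k) {
                    train.addAll(parts.get(j));
                }
            }
            result.add(new Fold<>(train, parts.get(k)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        for (Fold<Integer> f : folds(data, 5, 0L)) {
            System.out.println("train=" + f.train.size() + " test=" + f.test.size()); // prints "train=8 test=2" each time
        }
    }
}
```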
public void hideSomeLabels(double proportionToHide, Randoms r)
public void hideSomeLabels(java.util.BitSet bs)
public void unhideAllLabels()