cc.mallet.types
Class PagedInstanceList
java.lang.Object
java.util.AbstractCollection<E>
java.util.AbstractList<E>
java.util.ArrayList<Instance>
cc.mallet.types.InstanceList
cc.mallet.types.PagedInstanceList
- All Implemented Interfaces:
- AlphabetCarrying, java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Instance>, java.util.Collection<Instance>, java.util.List<Instance>, java.util.RandomAccess
public class PagedInstanceList
- extends InstanceList
An InstanceList which avoids OutOfMemoryErrors by saving Instances
to disk when there is not enough memory to create a new
Instance. It implements a fixed-size paging scheme, where each page
on disk stores instancesPerPage Instances. So, while
the number of Instances per page is constant, the size in bytes of
each page may vary. Using this class instead of InstanceList means
the number of Instances you can store is essentially limited only
by disk size (and patience).
The paging scheme is optimized for the most frequent case of
looping through the InstanceList from index 0 to n. With
p = instancesPerPage, instances 0 -> p-1 are stored together on page
1, instances p -> 2p-1 are on page 2, ... etc. This
way, instances adjacent in the list will usually
be on the same page.
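The mapping from list index to page can be sketched as plain integer arithmetic. This is a minimal illustration, not the class's actual code: the names PageIndexSketch, pageOf and offsetInPage are hypothetical, and pages are numbered from 0 here for convenience.

```java
// Hypothetical helper; assumes page = index / instancesPerPage, numbered from 0.
class PageIndexSketch {
    /** Page that holds the instance at the given list index. */
    static int pageOf(int index, int instancesPerPage) {
        return index / instancesPerPage;
    }

    /** Position of that instance within its page. */
    static int offsetInPage(int index, int instancesPerPage) {
        return index % instancesPerPage;
    }

    public static void main(String[] args) {
        int instancesPerPage = 100;
        for (int index : new int[] {0, 99, 100, 250}) {
            System.out.println("index " + index
                    + " -> page " + pageOf(index, instancesPerPage)
                    + ", offset " + offsetInPage(index, instancesPerPage));
        }
    }
}
```

Under this layout a sequential loop touches each page once, whereas random access may force a page swap on nearly every call.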
- Author:
- Aron Culotta culotta@cs.umass.edu
- See Also:
InstanceList, Serialized Form
Fields inherited from class java.util.AbstractList |
modCount |
Constructor Summary |
PagedInstanceList(Pipe pipe,
int numPages,
int instancesPerPage)
|
PagedInstanceList(Pipe pipe,
int numPages,
int instancesPerPage,
java.io.File swapDir)
Creates a PagedInstanceList where "instancesPerPage" instances
are swapped to disk in directory "swapDir" if the amount of free
system memory drops below "minFreeMemory" bytes |
Methods inherited from class cc.mallet.types.InstanceList |
add, add, add, add, addAll, addAll, addThruPipe, addThruPipe, clone, cloneEmptyInto, crossValidationIterator, crossValidationIterator, getAlphabet, getAlphabets, getDataAlphabet, getDataClass, getFeatureSelection, getInstanceWeight, getInstanceWeight, getPerLabelFeatureSelection, getPipe, getTargetAlphabet, getTargetClass, hideSomeLabels, hideSomeLabels, noisify, remove, remove, removeSources, removeTargets, sampleWithInstanceWeights, sampleWithReplacement, sampleWithWeights, save, setFeatureSelection, setInstance, setInstanceWeight, setInstanceWeight, setPerLabelFeatureSelection, setPipe, shuffle, split, splitInOrder, splitInOrder, splitInTwoByModulo, subList, subList, targetLabelDistribution, unhideAllLabels |
Methods inherited from class java.util.ArrayList |
contains, ensureCapacity, indexOf, isEmpty, lastIndexOf, remove, removeRange, toArray, toArray, trimToSize |
Methods inherited from class java.util.AbstractList |
equals, hashCode, iterator, listIterator, listIterator |
Methods inherited from class java.util.AbstractCollection |
containsAll, removeAll, retainAll, toString |
Methods inherited from class java.lang.Object |
finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface java.lang.Iterable |
iterator |
Methods inherited from interface java.util.List |
containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll |
PagedInstanceList
public PagedInstanceList(Pipe pipe,
int numPages,
int instancesPerPage,
java.io.File swapDir)
- Creates a PagedInstanceList where "instancesPerPage" instances
are swapped to disk in directory "swapDir" if the amount of free
system memory drops below "minFreeMemory" bytes
- Parameters:
pipe - instance pipe
numPages - number of pages to keep in memory
instancesPerPage - number of Instances to store in each page
swapDir - where the pages on disk live.
PagedInstanceList
public PagedInstanceList(Pipe pipe,
int numPages,
int instancesPerPage)
split
public InstanceList[] split(java.util.Random r,
double[] proportions)
- Shuffles the elements of this list among several smaller
lists. Overrides InstanceList.split to add instances in original
order, to prevent thrashing.
- Overrides:
split
in class InstanceList
- Parameters:
proportions - A list of numbers (not necessarily summing to 1) which,
when normalized, correspond to the proportion of elements in each returned
sublist.
r - The source of randomness to use in shuffling.
- Returns:
- one InstanceList for each element of proportions
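One way to split randomly while still reading the source list in its original order is to shuffle the indices rather than the elements, record which sublist each index is assigned to, and then copy in a single sequential pass. The sketch below works on a plain java.util.List; the class OrderedSplitSketch and its split method are hypothetical and are not claimed to match the implementation in this class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of an order-preserving random split.
class OrderedSplitSketch {
    static <T> List<List<T>> split(List<T> items, Random r, double[] proportions) {
        int n = items.size();
        double total = 0;
        for (double p : proportions) total += p;
        if (total <= 0) throw new IllegalArgumentException("proportions must sum to > 0");

        // bounds[k] = number of shuffled slots that belong to sublists 0..k.
        long[] bounds = new long[proportions.length];
        double running = 0;
        for (int k = 0; k < proportions.length; k++) {
            running += proportions[k];
            bounds[k] = Math.round(running / total * n);
        }

        // Shuffle the indices, not the items themselves.
        List<Integer> shuffled = new ArrayList<>();
        for (int i = 0; i < n; i++) shuffled.add(i);
        Collections.shuffle(shuffled, r);

        // assignment[i] = sublist that receives the item at original index i.
        int[] assignment = new int[n];
        int k = 0;
        for (int slot = 0; slot < n; slot++) {
            while (slot >= bounds[k]) k++;
            assignment[shuffled.get(slot)] = k;
        }

        // One sequential pass over the source, so a paged source is read in order.
        List<List<T>> result = new ArrayList<>();
        for (int j = 0; j < proportions.length; j++) result.add(new ArrayList<>());
        for (int i = 0; i < n; i++) result.get(assignment[i]).add(items.get(i));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) items.add(i);
        System.out.println(split(items, new Random(42), new double[] {3, 1, 1}));
    }
}
```

Membership of each sublist is random, but the elements inside each sublist keep their relative order from the source list.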
add
public boolean add(Instance instance)
- Appends the instance to this list. Note that since memory for
the Instance has already been allocated, no check is made to
catch OutOfMemoryError.
- Specified by:
add
in interface java.util.Collection<Instance>
- Specified by:
add
in interface java.util.List<Instance>
- Overrides:
add
in class InstanceList
- Returns:
- true if successful
get
public Instance get(int index)
- Returns the Instance at the specified index. If
this Instance is not in memory, swaps a block of instances back
into memory.
- Specified by:
get
in interface java.util.List<Instance>
- Overrides:
get
in class java.util.ArrayList<Instance>
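The swap-on-access behavior described for get can be illustrated with a small disk-backed list built only on the Java standard library. This is a simplified sketch under stated assumptions: it stores Strings instead of Instances, keeps a single page in memory, and writes pages with Java serialization. The class PagedListSketch, its inTempDir factory and the page file naming are hypothetical; only the counter names mirror getSwapIns and getSwapOuts above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;

// Hypothetical stand-in for a paged list; not the MALLET implementation.
class PagedListSketch {
    private final int instancesPerPage;
    private final File swapDir;
    private ArrayList<String> current = new ArrayList<>(); // the single page held in memory
    private int currentPage = 0;
    private int size = 0;
    private int swapIns = 0;
    private int swapOuts = 0;

    PagedListSketch(int instancesPerPage, File swapDir) {
        this.instancesPerPage = instancesPerPage;
        this.swapDir = swapDir;
    }

    /** Convenience factory that pages into a fresh temporary directory. */
    static PagedListSketch inTempDir(int instancesPerPage) {
        try {
            return new PagedListSketch(instancesPerPage,
                    Files.createTempDirectory("paged-sketch").toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void add(String item) {
        ensureLoaded(size / instancesPerPage);
        current.add(item);
        size++;
    }

    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        ensureLoaded(index / instancesPerPage);
        return current.get(index % instancesPerPage);
    }

    int size() { return size; }
    int getSwapIns() { return swapIns; }
    int getSwapOuts() { return swapOuts; }

    private File pageFile(int page) {
        return new File(swapDir, "page-" + page + ".ser");
    }

    /** Makes the requested page the in-memory page, swapping the old one out first. */
    @SuppressWarnings("unchecked")
    private void ensureLoaded(int page) {
        if (page == currentPage) return;
        try {
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new FileOutputStream(pageFile(currentPage)))) {
                out.writeObject(current);
            }
            swapOuts++;
            File target = pageFile(page);
            if (target.exists()) {
                try (ObjectInputStream in = new ObjectInputStream(
                        new FileInputStream(target))) {
                    current = (ArrayList<String>) in.readObject();
                }
                swapIns++;
            } else {
                current = new ArrayList<>(); // first touch of a new page
            }
            currentPage = page;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PagedListSketch list = inTempDir(3);
        for (int i = 0; i < 10; i++) list.add("item" + i);
        System.out.println(list.get(0) + " " + list.get(9));
        System.out.println("swap-ins: " + list.getSwapIns()
                + ", swap-outs: " + list.getSwapOuts());
    }
}
```

Jumping between the first and last index forces a swap on each call, which is the access pattern the sequential layout is meant to avoid.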
set
public Instance set(int index,
Instance instance)
- Replaces the Instance at position index with a new one. Note that this is the only
sanctioned way of changing an Instance.
- Specified by:
set
in interface java.util.List<Instance>
- Overrides:
set
in class InstanceList
getCollectGarbage
public boolean getCollectGarbage()
setCollectGarbage
public void setCollectGarbage(boolean b)
shallowClone
public InstanceList shallowClone()
- Overrides:
shallowClone
in class InstanceList
cloneEmpty
public InstanceList cloneEmpty()
- Overrides:
cloneEmpty
in class InstanceList
clear
public void clear()
- Specified by:
clear
in interface java.util.Collection<Instance>
- Specified by:
clear
in interface java.util.List<Instance>
- Overrides:
clear
in class InstanceList
getSwapIns
public int getSwapIns()
getSwapInTime
public long getSwapInTime()
getSwapOuts
public int getSwapOuts()
getSwapOutTime
public long getSwapOutTime()
size
public int size()
- Specified by:
size
in interface java.util.Collection<Instance>
- Specified by:
size
in interface java.util.List<Instance>
- Overrides:
size
in class java.util.ArrayList<Instance>
load
public static InstanceList load(java.io.File file)
- Constructs a new InstanceList, deserialized from file. If the string value of file
is "-", then deserializes from System.in.
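The "-" convention for reading from standard input can be sketched with standard Java serialization. The example below is an assumption-laden illustration, not the library's code: LoadSketch, open, load and save are hypothetical names, and the payload is an ordinary ArrayList rather than an InstanceList.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical illustration of "-" meaning standard input.
class LoadSketch {
    /** Picks the byte source: standard input for "-", otherwise the named file. */
    static InputStream open(File file) throws IOException {
        if ("-".equals(file.toString())) return System.in;
        return new BufferedInputStream(new FileInputStream(file));
    }

    /** Deserializes one object from the chosen source. */
    static Object load(File file) {
        // Note: closing the stream here also closes System.in when file is "-".
        try (ObjectInputStream in = new ObjectInputStream(open(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a value to a temporary file so that load has something to read. */
    static File save(Serializable value) {
        try {
            File file = File.createTempFile("load-sketch", ".ser");
            file.deleteOnExit();
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new FileOutputStream(file))) {
                out.writeObject(value);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> data = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(load(save(data)));
    }
}
```

With this convention a caller can pipe a serialized list into a program by passing new File("-") instead of a real path.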