Class PagedInstanceList

  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractList<E>
          extended by java.util.ArrayList<Instance>
              extended by cc.mallet.types.InstanceList
                  extended by cc.mallet.types.PagedInstanceList
All Implemented Interfaces:
AlphabetCarrying,, java.lang.Cloneable, java.lang.Iterable<Instance>, java.util.Collection<Instance>, java.util.List<Instance>, java.util.RandomAccess

public class PagedInstanceList
extends InstanceList

An InstanceList that avoids OutOfMemoryErrors by saving Instances to disk when there is not enough memory to create a new Instance. It implements a fixed-size paging scheme in which each page on disk stores instancesPerPage Instances. So, while the number of Instances per page is constant, the size in bytes of each page may vary. Using this class instead of InstanceList means the number of Instances you can store is limited essentially only by disk size (and patience). The paging scheme is optimized for the most frequent case of looping through the InstanceList from index 0 to n. Instances are assigned to pages in index order: instances 0 through instancesPerPage-1 are stored together on page 1, instances instancesPerPage through 2*instancesPerPage-1 on page 2, and so on. This way, Instances adjacent in the list will usually be on the same page.
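
The index arithmetic behind a fixed-size paging scheme of this kind can be sketched as follows. This is a minimal, self-contained illustration and not MALLET's code: the class PageMath and its methods are hypothetical names, and pages are numbered from 0 here, whereas the description above counts from page 1.

```java
// Hypothetical helper, not part of MALLET: maps a list index to a page.
class PageMath {
    /** Zero-based number of the page holding the instance at index. */
    static int pageOf(int index, int instancesPerPage) {
        return index / instancesPerPage;
    }

    /** Position of the instance inside its page. */
    static int offsetInPage(int index, int instancesPerPage) {
        return index % instancesPerPage;
    }

    /** Pages needed to hold n instances; the last page may be partly full. */
    static int pageCount(int n, int instancesPerPage) {
        return (n + instancesPerPage - 1) / instancesPerPage;
    }

    public static void main(String[] args) {
        int instancesPerPage = 100;
        for (int index : new int[] {0, 99, 100, 250}) {
            System.out.println("index " + index
                    + " -> page " + pageOf(index, instancesPerPage)
                    + ", offset " + offsetInPage(index, instancesPerPage));
        }
    }
}
```

Because consecutive indices share a page until a multiple of instancesPerPage is crossed, a loop from index 0 to n touches each page once, which is the access pattern the class is optimized for.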

Author:
Aron Culotta
See Also:
InstanceList, Serialized Form

Nested Class Summary
Nested classes/interfaces inherited from class cc.mallet.types.InstanceList
Field Summary
Fields inherited from class cc.mallet.types.InstanceList
Fields inherited from class java.util.AbstractList
Constructor Summary
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage)
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage, swapDir)
          Creates a PagedInstanceList that keeps numPages pages of instancesPerPage Instances each in memory and swaps pages to disk in directory swapDir.
Method Summary
 boolean add(Instance instance)
          Appends the instance to this list.
 void clear()
 InstanceList cloneEmpty()
 Instance get(int index)
          Returns the Instance at the specified index.
 boolean getCollectGarbage()
 int getSwapIns()
 long getSwapInTime()
 int getSwapOuts()
 long getSwapOutTime()
static InstanceList load( file)
          Constructs a new InstanceList, deserialized from file.
 Instance set(int index, Instance instance)
          Replaces the Instance at position index with a new one.
 void setCollectGarbage(boolean b)
 InstanceList shallowClone()
 int size()
 InstanceList[] split(java.util.Random r, double[] proportions)
          Shuffles the elements of this list among several smaller lists.
Methods inherited from class cc.mallet.types.InstanceList
add, add, add, add, addAll, addAll, addThruPipe, addThruPipe, clone, cloneEmptyInto, crossValidationIterator, crossValidationIterator, getAlphabet, getAlphabets, getDataAlphabet, getDataClass, getFeatureSelection, getInstanceWeight, getInstanceWeight, getPerLabelFeatureSelection, getPipe, getTargetAlphabet, getTargetClass, hideSomeLabels, hideSomeLabels, noisify, remove, remove, removeSources, removeTargets, sampleWithInstanceWeights, sampleWithReplacement, sampleWithWeights, save, setFeatureSelection, setInstance, setInstanceWeight, setInstanceWeight, setPerLabelFeatureSelection, setPipe, shuffle, split, splitInOrder, splitInOrder, splitInTwoByModulo, subList, subList, targetLabelDistribution, unhideAllLabels
Methods inherited from class java.util.ArrayList
contains, ensureCapacity, indexOf, isEmpty, lastIndexOf, remove, removeRange, toArray, toArray, trimToSize
Methods inherited from class java.util.AbstractList
equals, hashCode, iterator, listIterator, listIterator
Methods inherited from class java.util.AbstractCollection
containsAll, removeAll, retainAll, toString
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
Methods inherited from interface java.util.List
containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll

Constructor Detail


public PagedInstanceList(Pipe pipe,
                         int numPages,
                         int instancesPerPage,
                          swapDir)
Creates a PagedInstanceList that keeps numPages pages of instancesPerPage Instances each in memory and swaps pages to disk in directory swapDir.

Parameters:
pipe - instance pipe
numPages - number of pages to keep in memory
instancesPerPage - number of Instances to store in each page
swapDir - where the pages on disk live.


public PagedInstanceList(Pipe pipe,
                         int numPages,
                         int instancesPerPage)
Method Detail


public InstanceList[] split(java.util.Random r,
                            double[] proportions)
Shuffles the elements of this list among several smaller lists. Overrides InstanceList.split to add instances in original order, to prevent thrashing.

Overrides:
split in class InstanceList
Parameters:
proportions - A list of numbers (not necessarily summing to 1) which, when normalized, correspond to the proportion of elements in each returned sublist.
r - The source of randomness to use in shuffling.
Returns:
one InstanceList for each element of proportions
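
One way to combine a random split with insertion in original order can be sketched as below. This is an illustration under stated assumptions, not MALLET's implementation: OrderedSplitSketch is a hypothetical class, it works on plain integer indices rather than Instances, and it assumes a non-empty array of non-negative proportions. It normalizes the proportions, randomly assigns each index to a sublist, and then fills the sublists in one ascending pass so that a paged list would be read sequentially.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not MALLET code: random split that fills sublists in index order.
class OrderedSplitSketch {
    static List<List<Integer>> split(int n, Random r, double[] proportions) {
        // Normalize the proportions into cumulative end boundaries over n items.
        double sum = 0.0;
        for (double p : proportions) sum += p;
        int[] ends = new int[proportions.length];
        double running = 0.0;
        for (int i = 0; i < proportions.length; i++) {
            running += proportions[i] / sum;
            ends[i] = (int) Math.round(running * n);
        }
        ends[proportions.length - 1] = n; // guard against rounding error

        // Shuffle the indices; the shuffled position decides the sublist.
        List<Integer> perm = new ArrayList<>();
        for (int i = 0; i < n; i++) perm.add(i);
        Collections.shuffle(perm, r);
        int[] assignment = new int[n];
        int part = 0;
        for (int pos = 0; pos < n; pos++) {
            while (pos >= ends[part]) part++;
            assignment[perm.get(pos)] = part;
        }

        // Fill the sublists in ascending index order (sequential reads).
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < proportions.length; i++) parts.add(new ArrayList<>());
        for (int i = 0; i < n; i++) parts.get(assignment[i]).add(i);
        return parts;
    }

    public static void main(String[] args) {
        // Proportions need not sum to 1; {3, 1} means 75% and 25%.
        System.out.println(split(8, new Random(), new double[] {3.0, 1.0}));
    }
}
```

Membership in each sublist is random, but every sublist lists its indices in ascending order, so the source list is traversed from front to back only once.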


public boolean add(Instance instance)
Appends the instance to this list. Note that since memory for the Instance has already been allocated, no check is made to catch OutOfMemoryError.

Specified by:
add in interface java.util.Collection<Instance>
Specified by:
add in interface java.util.List<Instance>
Overrides:
add in class InstanceList
Returns:
true if successful


public Instance get(int index)
Returns the Instance at the specified index. If this Instance is not in memory, the block of Instances containing it is swapped back into memory.

Specified by:
get in interface java.util.List<Instance>
Overrides:
get in class java.util.ArrayList<Instance>
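
The swap-on-miss behaviour of get can be illustrated with a toy model. Everything in it is an assumption made for illustration and none of it is taken from MALLET's source: PagedListSketch is a hypothetical class, it stores Strings instead of Instances, a HashMap stands in for the pages on disk, and the least-recently-used eviction policy is a guess, since this page does not say which page is swapped out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not MALLET code: at most numPages pages stay "in memory".
class PagedListSketch {
    private final int numPages;
    private final int instancesPerPage;
    // Stand-in for swapDir: pages that are currently swapped out.
    private final Map<Integer, List<String>> disk = new HashMap<>();
    // Access-ordered map, so iteration starts at the least recently used page.
    private final LinkedHashMap<Integer, List<String>> resident =
            new LinkedHashMap<>(16, 0.75f, true);
    private int size = 0;
    private int swapIns = 0;
    private int swapOuts = 0;

    PagedListSketch(int numPages, int instancesPerPage) {
        this.numPages = numPages;
        this.instancesPerPage = instancesPerPage;
    }

    // Returns the requested page, swapping it in (and another out) if needed.
    private List<String> page(int pageId) {
        List<String> p = resident.get(pageId);
        if (p != null) return p;
        if (resident.size() >= numPages) {
            // Assumed policy: evict the least recently used resident page.
            Integer victim = resident.keySet().iterator().next();
            disk.put(victim, resident.remove(victim));
            swapOuts++;
        }
        p = disk.remove(pageId);
        if (p == null) {
            p = new ArrayList<>(); // brand-new page, nothing to read back
        } else {
            swapIns++;
        }
        resident.put(pageId, p);
        return p;
    }

    boolean add(String item) {
        page(size / instancesPerPage).add(item);
        size++;
        return true;
    }

    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return page(index / instancesPerPage).get(index % instancesPerPage);
    }

    int size() { return size; }
    int getSwapIns() { return swapIns; }
    int getSwapOuts() { return swapOuts; }

    public static void main(String[] args) {
        PagedListSketch list = new PagedListSketch(2, 2);
        for (int i = 0; i < 6; i++) list.add("i" + i);
        for (int i = 0; i < list.size(); i++) list.get(i);
        System.out.println("swap-ins: " + list.getSwapIns()
                + ", swap-outs: " + list.getSwapOuts());
    }
}
```

In this model a sequential pass swaps each non-resident page in at most once, while random access over a list much larger than numPages * instancesPerPage can trigger a swap on almost every call. Counters such as getSwapIns and getSwapOuts make that difference measurable.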


public Instance set(int index,
                    Instance instance)
Replaces the Instance at position index with a new one. Note that this is the only sanctioned way of changing an Instance.

Specified by:
set in interface java.util.List<Instance>
Overrides:
set in class InstanceList


public boolean getCollectGarbage()


public void setCollectGarbage(boolean b)


public InstanceList shallowClone()
Overrides:
shallowClone in class InstanceList


public InstanceList cloneEmpty()
Overrides:
cloneEmpty in class InstanceList


public void clear()
Specified by:
clear in interface java.util.Collection<Instance>
Specified by:
clear in interface java.util.List<Instance>
Overrides:
clear in class InstanceList


public int getSwapIns()


public long getSwapInTime()


public int getSwapOuts()


public long getSwapOutTime()


public int size()
Specified by:
size in interface java.util.Collection<Instance>
Specified by:
size in interface java.util.List<Instance>
Overrides:
size in class java.util.ArrayList<Instance>


public static InstanceList load( file)
Constructs a new InstanceList, deserialized from file. If the string value of file is "-", then deserialize from