cc.mallet.types
Class PagedInstanceList

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractList<E>
          extended by java.util.ArrayList<Instance>
              extended by cc.mallet.types.InstanceList
                  extended by cc.mallet.types.PagedInstanceList
All Implemented Interfaces:
AlphabetCarrying, java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Instance>, java.util.Collection<Instance>, java.util.List<Instance>, java.util.RandomAccess

public class PagedInstanceList
extends InstanceList

An InstanceList that avoids OutOfMemoryErrors by saving Instances to disk when there is not enough memory to create a new Instance. It implements a fixed-size paging scheme, where each page on disk stores instancesPerPage Instances. So, while the number of Instances per page is constant, the size in bytes of each page may vary. Using this class instead of InstanceList means the number of Instances you can store is essentially limited only by disk size (and patience). The paging scheme is optimized for the most frequent case of looping through the InstanceList from index 0 to n. Instances are assigned to pages in index order: with p = instancesPerPage, instances 0 to p-1 are stored together on the first page, instances p to 2p-1 on the second page, and so on. This way, Instances adjacent in the list will usually be on the same page.
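
The effect of the paging scheme on access patterns can be illustrated with a small stand-alone sketch. It does not use MALLET: the class PagingSketch, its methods, and the least-recently-used eviction rule are assumptions made for illustration only, and the actual eviction policy of PagedInstanceList may differ. The sketch maps an index to a page by integer division and counts how many times a page has to be loaded when at most numPages pages are resident.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration class; not part of MALLET.
final class PagingSketch {

    /** Page that holds the given index when every page stores instancesPerPage entries. */
    static int pageOf(int index, int instancesPerPage) {
        return index / instancesPerPage;
    }

    /**
     * Counts page loads (including the first load of each page) for a sequence
     * of index accesses, keeping at most numPages pages resident.
     * Assumption: the least recently used page is evicted.
     */
    static int countSwapIns(int[] accesses, int numPages, int instancesPerPage) {
        Deque<Integer> resident = new ArrayDeque<>(); // least recently used page first
        int swapIns = 0;
        for (int index : accesses) {
            Integer page = pageOf(index, instancesPerPage);
            boolean hit = resident.remove(page);      // true if the page was resident
            if (!hit) {
                swapIns++;
                if (resident.size() == numPages) {
                    resident.removeFirst();           // evict the least recently used page
                }
            }
            resident.addLast(page);                   // mark as most recently used
        }
        return swapIns;
    }

    public static void main(String[] args) {
        int[] sequential = new int[100];
        for (int i = 0; i < sequential.length; i++) {
            sequential[i] = i;
        }
        System.out.println("sequential loads: " + countSwapIns(sequential, 2, 10));

        int[] alternating = {0, 50, 0, 50};
        System.out.println("alternating loads: " + countSwapIns(alternating, 1, 10));
    }
}
```

Under these assumptions a sequential pass loads each page once, while alternating between distant indices with a single resident page forces a load on every access, which is the thrashing the class description warns about.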

Author:
Aron Culotta culotta@cs.umass.edu
See Also:
InstanceList, Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class cc.mallet.types.InstanceList
InstanceList.CrossValidationIterator
 
Field Summary
 
Fields inherited from class cc.mallet.types.InstanceList
TARGET_PROPERTY
 
Fields inherited from class java.util.AbstractList
modCount
 
Constructor Summary
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage)
           
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage, java.io.File swapDir)
          Creates a PagedInstanceList that keeps up to "numPages" pages in memory, each holding "instancesPerPage" instances, and swaps the remaining pages to disk in directory "swapDir".
 
Method Summary
 boolean add(Instance instance)
          Appends the instance to this list.
 void clear()
           
 InstanceList cloneEmpty()
           
 Instance get(int index)
          Returns the Instance at the specified index.
 boolean getCollectGarbage()
           
 int getSwapIns()
           
 long getSwapInTime()
           
 int getSwapOuts()
           
 long getSwapOutTime()
           
static InstanceList load(java.io.File file)
          Constructs a new InstanceList, deserialized from file.
 Instance set(int index, Instance instance)
          Replaces the Instance at position index with a new one.
 void setCollectGarbage(boolean b)
           
 InstanceList shallowClone()
           
 int size()
           
 InstanceList[] split(java.util.Random r, double[] proportions)
          Shuffles the elements of this list among several smaller lists.
 
Methods inherited from class cc.mallet.types.InstanceList
add, add, add, add, addAll, addAll, addThruPipe, addThruPipe, clone, cloneEmptyInto, crossValidationIterator, crossValidationIterator, getAlphabet, getAlphabets, getDataAlphabet, getDataClass, getFeatureSelection, getInstanceWeight, getInstanceWeight, getPerLabelFeatureSelection, getPipe, getTargetAlphabet, getTargetClass, hideSomeLabels, hideSomeLabels, noisify, remove, remove, removeSources, removeTargets, sampleWithInstanceWeights, sampleWithReplacement, sampleWithWeights, save, setFeatureSelection, setInstance, setInstanceWeight, setInstanceWeight, setPerLabelFeatureSelection, setPipe, shuffle, split, splitInOrder, splitInOrder, splitInTwoByModulo, subList, subList, targetLabelDistribution, unhideAllLabels
 
Methods inherited from class java.util.ArrayList
contains, ensureCapacity, indexOf, isEmpty, lastIndexOf, remove, removeRange, toArray, toArray, trimToSize
 
Methods inherited from class java.util.AbstractList
equals, hashCode, iterator, listIterator, listIterator
 
Methods inherited from class java.util.AbstractCollection
containsAll, removeAll, retainAll, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.lang.Iterable
iterator
 
Methods inherited from interface java.util.List
containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll
 

Constructor Detail

PagedInstanceList

public PagedInstanceList(Pipe pipe,
                         int numPages,
                         int instancesPerPage,
                         java.io.File swapDir)
Creates a PagedInstanceList that keeps up to "numPages" pages in memory, each holding "instancesPerPage" instances, and swaps the remaining pages to disk in directory "swapDir".

Parameters:
pipe - instance pipe
numPages - number of pages to keep in memory
instancesPerPage - number of Instances to store in each page
swapDir - where the pages on disk live.

PagedInstanceList

public PagedInstanceList(Pipe pipe,
                         int numPages,
                         int instancesPerPage)

Method Detail

split

public InstanceList[] split(java.util.Random r,
                            double[] proportions)
Shuffles the elements of this list among several smaller lists. Overrides InstanceList.split to add instances in original order, to prevent thrashing.

Overrides:
split in class InstanceList
Parameters:
proportions - A list of numbers (not necessarily summing to 1) which, when normalized, correspond to the proportion of elements in each returned sublist.
r - The source of randomness to use in shuffling.
Returns:
one InstanceList for each element of proportions
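
The idea of shuffling the assignment while still reading the list in its original order can be sketched without MALLET. The class OrderedSplitSketch and its method assign are hypothetical names, and the rounding of the cut points is an assumption; the sketch only shows the general technique: normalize the proportions into cumulative cut points, use a shuffled permutation to decide which sublist each index belongs to, then walk the indices from 0 to n-1 so that a paged list would be read sequentially.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of MALLET.
final class OrderedSplitSketch {

    /** Returns, for each index 0..n-1, the id of the sublist it is assigned to. */
    static int[] assign(int n, double[] proportions, Random r) {
        // Normalize the proportions into cumulative cut points over n elements.
        double total = 0;
        for (double p : proportions) {
            total += p;
        }
        long[] cut = new long[proportions.length];
        double running = 0;
        for (int k = 0; k < proportions.length; k++) {
            running += proportions[k];
            cut[k] = Math.round(running / total * n);
        }
        cut[cut.length - 1] = n; // guard against rounding error in the last cut

        // Shuffle the indices; position in the shuffled order decides the sublist.
        List<Integer> shuffled = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            shuffled.add(i);
        }
        Collections.shuffle(shuffled, r);

        int[] part = new int[n];
        int k = 0;
        for (int pos = 0; pos < n; pos++) {
            while (pos >= cut[k]) {
                k++;
            }
            part[shuffled.get(pos)] = k;
        }
        return part;
    }

    public static void main(String[] args) {
        int[] part = assign(10, new double[] {1, 1}, new Random(0));
        List<List<Integer>> sublists = new ArrayList<>();
        sublists.add(new ArrayList<>());
        sublists.add(new ArrayList<>());
        // Walk the indices in original order, so a paged list is read sequentially.
        for (int i = 0; i < part.length; i++) {
            sublists.get(part[i]).add(i);
        }
        System.out.println(sublists);
    }
}
```

Because the final loop visits indices in increasing order, each sublist receives its elements in original order even though membership was decided at random.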

add

public boolean add(Instance instance)
Appends the instance to this list. Note that since memory for the Instance has already been allocated, no check is made to catch OutOfMemoryError.

Specified by:
add in interface java.util.Collection<Instance>
Specified by:
add in interface java.util.List<Instance>
Overrides:
add in class InstanceList
Returns:
true if successful

get

public Instance get(int index)
Returns the Instance at the specified index. If this Instance is not in memory, the block of instances containing it is swapped back into memory.

Specified by:
get in interface java.util.List<Instance>
Overrides:
get in class java.util.ArrayList<Instance>

set

public Instance set(int index,
                    Instance instance)
Replaces the Instance at position index with a new one. Note that this is the only sanctioned way of changing an Instance.

Specified by:
set in interface java.util.List<Instance>
Overrides:
set in class InstanceList

getCollectGarbage

public boolean getCollectGarbage()

setCollectGarbage

public void setCollectGarbage(boolean b)

shallowClone

public InstanceList shallowClone()
Overrides:
shallowClone in class InstanceList

cloneEmpty

public InstanceList cloneEmpty()
Overrides:
cloneEmpty in class InstanceList

clear

public void clear()
Specified by:
clear in interface java.util.Collection<Instance>
Specified by:
clear in interface java.util.List<Instance>
Overrides:
clear in class InstanceList

getSwapIns

public int getSwapIns()

getSwapInTime

public long getSwapInTime()

getSwapOuts

public int getSwapOuts()

getSwapOutTime

public long getSwapOutTime()

size

public int size()
Specified by:
size in interface java.util.Collection<Instance>
Specified by:
size in interface java.util.List<Instance>
Overrides:
size in class java.util.ArrayList<Instance>

load

public static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file. If the string value of file is "-", then deserialize from System.in.
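
The "-" convention can be sketched with plain Java serialization. This is not the MALLET implementation: LoadSketch, open, load and roundTrip are hypothetical names, and the sketch assumes only that the file holds a single serialized object.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of MALLET.
final class LoadSketch {

    /** Returns System.in when the file's string value is "-", otherwise a stream over the file. */
    static InputStream open(File file) {
        if ("-".equals(file.toString())) {
            return System.in;
        }
        try {
            return new BufferedInputStream(new FileInputStream(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes one object from the file, or from System.in for "-". */
    static Object load(File file) {
        // Note: try-with-resources also closes System.in when "-" is used.
        try (ObjectInputStream in = new ObjectInputStream(open(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes the value to a temporary file and loads it back. */
    static Object roundTrip(Serializable value) {
        try {
            File tmp = File.createTempFile("load-sketch", ".ser");
            tmp.deleteOnExit();
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(tmp))) {
                out.writeObject(value);
            }
            return load(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

Treating "-" as standard input lets a serialized list be piped in from another process rather than read from a named file.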