cc.mallet.types
Class Alphabet

java.lang.Object
  extended by cc.mallet.types.Alphabet
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
LabelAlphabet

public class Alphabet
extends java.lang.Object
implements java.io.Serializable

A mapping between integers and objects where the mapping in each direction is efficient. Integers are assigned consecutively, starting at zero, as objects are added to the Alphabet. Objects cannot be deleted from the Alphabet, so integers are never reused.
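A minimal sketch of the two-way mapping described above. It assumes a list for index-to-object lookup and a hash map for object-to-index lookup. BiMapAlphabet is a hypothetical illustration, not MALLET's implementation. It has no removal method, so an index, once assigned, always denotes the same object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an Alphabet-style two-way mapping (not MALLET code).
final class BiMapAlphabet {
    private final List<Object> entries = new ArrayList<>();        // index -> object
    private final Map<Object, Integer> indices = new HashMap<>();  // object -> index

    // Returns the entry's index, assigning the next consecutive integer if it is new.
    int lookupIndex(Object entry) {
        Integer idx = indices.get(entry);
        if (idx == null) {
            idx = entries.size();   // 0, 1, 2, ... in insertion order
            indices.put(entry, idx);
            entries.add(entry);
        }
        return idx;
    }

    // Reverse lookup is a positional read from the list.
    Object lookupObject(int index) {
        return entries.get(index);
    }

    int size() {
        return entries.size();
    }
    // Deliberately no remove method: indices are never reused.
}
```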

The most common use of an alphabet is as a dictionary of feature names associated with a FeatureVector in an Instance. In a simple document-classification setting, each distinct word in the document collection would be a separate entry in the Alphabet, with its own integer index. FeatureVectors rely on the integer side of the mapping to efficiently represent the subset of the Alphabet present in the FeatureVector.
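To illustrate the document-classification use, the following hypothetical sketch builds a shared word dictionary and represents each document by the sorted, distinct indices of the words it contains. FeatureDictionary is a much simpler stand-in for MALLET's Alphabet and FeatureVector. Whitespace tokenization is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical word dictionary plus sparse document representation (not MALLET code).
final class FeatureDictionary {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> indices = new HashMap<>();

    int lookupIndex(String word) {
        Integer idx = indices.get(word);
        if (idx == null) {
            idx = entries.size();   // new word gets the next consecutive index
            indices.put(word, idx);
            entries.add(word);
        }
        return idx;
    }

    String lookupObject(int index) {
        return entries.get(index);
    }

    // Distinct indices of the words in a whitespace-tokenized document, in ascending order.
    int[] sparseIndices(String document) {
        TreeSet<Integer> present = new TreeSet<>();
        for (String word : document.trim().split("\\s+")) {
            present.add(lookupIndex(word));
        }
        int[] out = new int[present.size()];
        int i = 0;
        for (int idx : present) {
            out[i++] = idx;
        }
        return out;
    }
}
```

Because the dictionary is shared, a word seen in an earlier document keeps its index in later ones. Only the integers need to be stored per document.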

See Also:
FeatureVector, Instance, Pipe, Serialized Form

Constructor Summary
Alphabet()
           
Alphabet(java.lang.Class entryClass)
           
Alphabet(int capacity)
           
Alphabet(int capacity, java.lang.Class entryClass)
           
Alphabet(java.lang.Object[] entries)
           
 
Method Summary
static boolean alphabetsMatch(AlphabetCarrying object1, AlphabetCarrying object2)
          Convenience method that can often implement alphabetsMatch in classes that implement the AlphabetCarrying interface.
 java.lang.Object clone()
           
 boolean contains(java.lang.Object entry)
           
 void dump()
           
 void dump(java.io.PrintStream out)
           
 void dump(java.io.PrintWriter out)
           
 java.lang.Class entryClass()
           
 java.rmi.dgc.VMID getInstanceId()
           
 boolean growthStopped()
           
 java.util.Iterator iterator()
           
 int lookupIndex(java.lang.Object entry)
           
 int lookupIndex(java.lang.Object entry, boolean addIfNotPresent)
          Returns the index of entry; returns -1 if entry isn't present and is not added.
 int[] lookupIndices(java.lang.Object[] objects, boolean addIfNotPresent)
           
 java.lang.Object lookupObject(int index)
           
 java.lang.Object[] lookupObjects(int[] indices)
           
 java.lang.Object[] lookupObjects(int[] indices, java.lang.Object[] buf)
          Returns an array of the objects corresponding to the given indices.
 java.lang.Object readResolve()
          This gets called after readObject; it lets the object decide whether to return itself or a previously read-in version.
 void setInstanceId(java.rmi.dgc.VMID id)
           
 int size()
           
 void startGrowth()
           
 void stopGrowth()
           
 java.lang.Object[] toArray()
           
 java.lang.Object[] toArray(java.lang.Object[] in)
          Returns an array containing all the entries in the Alphabet.
 java.lang.String toString()
          Return String representation of all Alphabet entries, each separated by a newline.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Alphabet

public Alphabet(int capacity,
                java.lang.Class entryClass)

Alphabet

public Alphabet(java.lang.Class entryClass)

Alphabet

public Alphabet(int capacity)

Alphabet

public Alphabet()

Alphabet

public Alphabet(java.lang.Object[] entries)
Method Detail

clone

public java.lang.Object clone()
Overrides:
clone in class java.lang.Object

lookupIndex

public int lookupIndex(java.lang.Object entry,
                       boolean addIfNotPresent)
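Returns the index of entry. If entry isn't present, it is added only when addIfNotPresent is true; if it is not added, -1 is returned.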
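A sketch of the addIfNotPresent contract. It assumes that the one-argument lookupIndex(entry) behaves like lookupIndex(entry, true) and that contains(entry) reports presence without adding anything. IndexLookup is a hypothetical class, not MALLET code, and it ignores growth control.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lookupIndex(entry, addIfNotPresent); not MALLET code.
final class IndexLookup {
    private final List<Object> entries = new ArrayList<>();
    private final Map<Object, Integer> indices = new HashMap<>();

    int lookupIndex(Object entry, boolean addIfNotPresent) {
        Integer idx = indices.get(entry);
        if (idx != null) {
            return idx;                 // present: existing index, regardless of the flag
        }
        if (!addIfNotPresent) {
            return -1;                  // absent and not to be added
        }
        int next = entries.size();      // absent: assign the next consecutive index
        indices.put(entry, next);
        entries.add(entry);
        return next;
    }

    // Assumption: the one-argument overload adds missing entries.
    int lookupIndex(Object entry) {
        return lookupIndex(entry, true);
    }

    boolean contains(Object entry) {
        return indices.containsKey(entry);
    }
}
```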


lookupIndex

public int lookupIndex(java.lang.Object entry)

lookupObject

public java.lang.Object lookupObject(int index)

toArray

public java.lang.Object[] toArray()

toArray

public java.lang.Object[] toArray(java.lang.Object[] in)
Returns an array containing all the entries in the Alphabet. The runtime type of the returned array is the runtime type of in. If in is large enough to hold everything in the alphabet, then it is used; otherwise a new array of the same runtime type is allocated. The returned array is such that for all entries obj, ret[lookupIndex(obj)] = obj.
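The toArray(in) contract can be sketched with java.lang.reflect.Array.newInstance, which allocates an array with a given component type. ToArraySketch is hypothetical. The documentation does not say whether unused slots of an oversized in are cleared, so this sketch leaves them untouched.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of toArray(Object[] in); not MALLET code.
final class ToArraySketch {
    private final List<Object> entries = new ArrayList<>();
    private final Map<Object, Integer> indices = new HashMap<>();

    int lookupIndex(Object entry) {
        return indices.computeIfAbsent(entry, e -> {
            entries.add(e);
            return entries.size() - 1;   // next consecutive index
        });
    }

    @SuppressWarnings("unchecked")
    <T> T[] toArray(T[] in) {
        int n = entries.size();
        // Reuse 'in' if it is large enough; otherwise allocate an array of the same component type.
        T[] ret = (in.length >= n)
                ? in
                : (T[]) Array.newInstance(in.getClass().getComponentType(), n);
        for (int i = 0; i < n; i++) {
            ret[i] = (T) entries.get(i);  // slot i holds the entry whose index is i
        }
        return ret;
    }
}
```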


iterator

public java.util.Iterator iterator()

lookupObjects

public java.lang.Object[] lookupObjects(int[] indices)

lookupObjects

public java.lang.Object[] lookupObjects(int[] indices,
                                        java.lang.Object[] buf)
Returns an array of the objects corresponding to the given indices, stored in buf.

Parameters:
indices - An array of indices to look up
buf - An array to store the returned objects in.
Returns:
An array of values from this Alphabet. The runtime type of the array is the same as that of buf.
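A sketch of lookupObjects(indices, buf). It assumes buf is at least as long as indices and is filled in place and returned, which is what gives the result the runtime type of buf. LookupObjectsSketch and its constructor are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of lookupObjects(int[], Object[]); not MALLET code.
final class LookupObjectsSketch {
    private final List<Object> entries;

    // Entries receive indices 0, 1, 2, ... in the order given.
    LookupObjectsSketch(Object... initial) {
        entries = new ArrayList<>(Arrays.asList(initial));
    }

    // Assumes buf.length >= indices.length; buf[i] receives the entry at indices[i].
    Object[] lookupObjects(int[] indices, Object[] buf) {
        for (int i = 0; i < indices.length; i++) {
            buf[i] = entries.get(indices[i]);  // ArrayStoreException if the entry doesn't fit buf's type
        }
        return buf;  // same array object, so its runtime type is that of buf
    }
}
```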

lookupIndices

public int[] lookupIndices(java.lang.Object[] objects,
                           boolean addIfNotPresent)

contains

public boolean contains(java.lang.Object entry)

size

public int size()

stopGrowth

public void stopGrowth()

startGrowth

public void startGrowth()

growthStopped

public boolean growthStopped()
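A plausible use of the stopGrowth/startGrowth pair is freezing a vocabulary built on training data, so that unseen test items are not assigned new indices. The following hypothetical sketch assumes that, while growth is stopped, looking up an unknown entry returns -1 instead of adding it. FrozenVocabulary is not MALLET code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of growth control; not MALLET code.
final class FrozenVocabulary {
    private final Map<String, Integer> indices = new HashMap<>();
    private boolean frozen = false;

    int lookupIndex(String word) {
        Integer idx = indices.get(word);
        if (idx != null) {
            return idx;              // known entries are always found
        }
        if (frozen) {
            return -1;               // assumption: unknown entries are rejected, not added
        }
        int next = indices.size();   // next consecutive index
        indices.put(word, next);
        return next;
    }

    void stopGrowth()       { frozen = true; }
    void startGrowth()      { frozen = false; }
    boolean growthStopped() { return frozen; }
    int size()              { return indices.size(); }
}
```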

entryClass

public java.lang.Class entryClass()

toString

public java.lang.String toString()
Return String representation of all Alphabet entries, each separated by a newline.

Overrides:
toString in class java.lang.Object

dump

public void dump()

dump

public void dump(java.io.PrintStream out)

dump

public void dump(java.io.PrintWriter out)

alphabetsMatch

public static boolean alphabetsMatch(AlphabetCarrying object1,
                                     AlphabetCarrying object2)
Convenience method that can often implement alphabetsMatch in classes that implement the AlphabetCarrying interface.
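A sketch of what such a convenience method might check. It assumes each carrier exposes its alphabets as an array. Two carriers match when the arrays have the same length and each corresponding pair is the same object, both null, or equal. AlphabetCarrier is a hypothetical stand-in for AlphabetCarrying, and plain Objects stand in for Alphabets.

```java
import java.util.Objects;

// Hypothetical stand-in for AlphabetCarrying: anything that exposes its alphabets.
interface AlphabetCarrier {
    Object[] getAlphabets();
}

final class AlphabetMatching {
    // True when both carriers hold the same number of alphabets and each pair
    // is identical, both null, or equal according to equals().
    static boolean alphabetsMatch(AlphabetCarrier object1, AlphabetCarrier object2) {
        Object[] a1 = object1.getAlphabets();
        Object[] a2 = object2.getAlphabets();
        if (a1.length != a2.length) {
            return false;
        }
        for (int i = 0; i < a1.length; i++) {
            if (!Objects.equals(a1[i], a2[i])) {  // covers identity and null pairs
                return false;
            }
        }
        return true;
    }
}
```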


getInstanceId

public java.rmi.dgc.VMID getInstanceId()

setInstanceId

public void setInstanceId(java.rmi.dgc.VMID id)

readResolve

public java.lang.Object readResolve()
                             throws java.io.ObjectStreamException
This gets called after readObject; it lets the object decide whether to return itself or a previously read-in version. A HashMap of instance IDs is used to determine whether this object has already been read in.
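The instance-ID scheme can be sketched with standard Java serialization. Each object carries a java.rmi.dgc.VMID, and readResolve consults a static map so that repeated deserializations of the same original yield one shared instance. SharedVocab and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.rmi.dgc.VMID;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of instance-ID-based readResolve; not MALLET code.
final class SharedVocab implements Serializable {
    private static final long serialVersionUID = 1L;

    // Objects already deserialized in this JVM, keyed by instance ID (static, so never serialized).
    private static final Map<VMID, SharedVocab> DESERIALIZED = new HashMap<>();

    // Assigned once at construction and written to the stream with the object.
    private final VMID instanceId = new VMID();

    // Called by ObjectInputStream after the object's fields have been read.
    public Object readResolve() throws ObjectStreamException {
        synchronized (DESERIALIZED) {
            SharedVocab previous = DESERIALIZED.get(instanceId);
            if (previous != null) {
                return previous;                  // reuse the copy read in earlier
            }
            DESERIALIZED.put(instanceId, this);   // first copy with this ID: remember it
            return this;
        }
    }

    // Serializes and deserializes v, returning the resolved object.
    static SharedVocab roundTrip(SharedVocab v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(v);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SharedVocab) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The first deserialized copy is a new object. Every later copy with the same ID resolves to it. The map holds strong references, so in this sketch resolved objects are never released.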

Returns:
this Alphabet, or the previously read-in Alphabet with the same instance ID
Throws:
java.io.ObjectStreamException