cc.mallet.pipe (Mallet 2 API)

Package cc.mallet.pipe

Classes for processing arbitrary data into instances.

Class Summary
AddClassifierTokenPredictions	This pipe uses a Classifier to label each token (i.e., using a zeroth-order Markov assumption), then adds the predictions as features to each token.
AddClassifierTokenPredictions.TokenClassifiers	This inner class represents the trained token classifiers.
Array2FeatureVector	Converts a Java array of numerical types to a FeatureVector, where the Alphabet is the data array index wrapped in an Integer object.
AugmentableFeatureVectorAddConjunctions	Add specified conjunctions to each instance.
AugmentableFeatureVectorLogScale	Given an AugmentableFeatureVector, set those values greater than or equal to 1 to log(value)+1.
BranchingPipe	Deprecated.
CharSequence2CharNGrams	Transform a character sequence into a token sequence of character n-grams.
CharSequence2TokenSequence	Pipe that tokenizes a character sequence.
CharSequenceArray2TokenSequence	Transform an array of character sequences into a token sequence.
CharSequenceLowercase	Replace the data string with a lowercased version.
CharSequenceRemoveHTML	This pipe removes HTML from a CharSequence.
CharSequenceRemoveUUEncodedBlocks
CharSequenceReplace	Given a string, repeatedly look for matches of the regex, and replace the entire match with the given replacement string.
CharSubsequence	Given a string, return only the portion of the string inside a regex parenthesized group.
Classification2ConfidencePredictingFeatureVector	Pipe features from the underlying classifier to the confidence-prediction instance list.
Csv2Array	Converts a string of comma separated values to an array.
Csv2FeatureVector	Converts a string of the form `feature_1:val_1 feature_2:val_2 ...` into a FeatureVector.
Directory2FileIterator	Convert a File object representing a directory into a FileIterator which iterates over files in the directory matching a pattern and which extracts a label from each file path to become the target field of the instance.
FeatureCountPipe	Pruning low-count features can be a good way to save memory and computation.
FeatureDocFreqPipe	Pruning low-count features can be a good way to save memory and computation.
FeatureSequence2AugmentableFeatureVector	Convert the data field from a feature sequence to an augmentable feature vector.
FeatureSequence2FeatureVector	Convert the data field from a feature sequence to a feature vector.
FeatureSequenceConvolution
FeatureValueString2FeatureVector
FeatureVectorConjunctions	Include in the FeatureVector conjunctions of all its features.
FeatureVectorSequence2FeatureVectors	Given instances with a FeatureVectorSequence in the data field, break up the sequence into the individual FeatureVectors, producing one FeatureVector per Instance.
Filename2CharSequence	Given a filename contained in a string, read the contents of the file into a CharSequence.
FilterEmptyFeatureVectors
Input2CharSequence	Pipe that can read from various kinds of text sources (URI, File, or Reader) into a CharSequence.
InstanceListTrimFeaturesByCount	Unimplemented.
LineGroupString2TokenSequence
MakeAmpersandXMLFriendly	Convert `&` to `&amp;` in the tokens of a token sequence.
Noop	A pipe that does nothing to the instance fields but which has side effects on the dictionary.
Pipe	The abstract superclass of all Pipes, which transform one data type to another.
PipeUtils	Created: Aug 28, 2005
PrintInput	Print the data field of each instance.
PrintInputAndTarget	Print the data and target fields of each instance.
PrintTokenSequenceFeatures	Print properties of the token sequence in the data field, together with the corresponding value of any token in a token sequence, or feature in a feature sequence, in the target field.
SaveDataInSource	Set the source field of each instance to its data field.
SelectiveSGML2TokenSequence	Similar to `SGML2TokenSequence`, except that only the tags listed in `allowedTags` are converted to `Label`s.
SerialPipes	Convert an instance through a sequence of pipes.
SGML2TokenSequence	Converts a string containing simple SGML tags into a data TokenSequence of words, paired with a target TokenSequence containing the SGML tags in effect for each word.
SimpleTaggerSentence2StringTokenization	This extends `SimpleTaggerSentence2TokenSequence` to produce `StringTokenization`s for use with the extract package.
SimpleTaggerSentence2TokenSequence	Converts an external encoding of a sequence of elements with binary features to a `TokenSequence`.
SimpleTokenizer	A simple Unicode tokenizer that accepts sequences of letters as tokens.
SourceLocation2TokenSequence	Read from a File or BufferedReader in the data field and produce a TokenSequence.
StringAddNewLineDelimiter	Pipe that adds special text between lines to explicitly represent line breaks.
StringList2FeatureSequence	Convert a list of strings into a feature sequence.
SvmLight2FeatureVectorAndLabel	This Pipe converts a line in SVMLight format to a Mallet instance with FeatureVector data and Label target.
Target2FeatureSequence	Convert a token sequence in the target field into a feature sequence in the target field.
Target2Label	Convert object in the target field into a label in the target field.
Target2LabelSequence	Convert a token sequence in the target field into a label sequence in the target field.
TargetRememberLastLabel	For each position in the target, remember the last non-background label.
TargetStringToFeatures
Token2FeatureVector	Convert the property list on a token into a feature vector.
TokenSequence2FeatureSequence	Convert the token sequence in the data field of each instance to a feature sequence.
TokenSequence2FeatureSequenceWithBigrams	Convert the token sequence in the data field of each instance to a feature sequence that preserves bigram information.
TokenSequence2FeatureVectorSequence	Convert the token sequence in the data field of each instance to a feature vector sequence.
TokenSequence2TokenInstances
TokenSequenceLowercase	Convert the text in each token in the token sequence in the data field to lower case.
TokenSequenceMatchDataAndTarget	Run a regular expression over the text of each token; replace the text with the substring matching one regex group; create a target TokenSequence from the text matching another regex group.
TokenSequenceNGrams	Convert the token sequence in the data field to a token sequence of ngrams.
TokenSequenceParseFeatureString	Convert the string in the `Token.text` field of each token to a list of Strings (space-delimited).
TokenSequenceRemoveNonAlpha	Remove tokens that contain non-alphabetic characters.
TokenSequenceRemoveStopwords	Remove tokens from the token sequence in the data field whose text is in the stopword list.
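
The n-gram pipes (`CharSequence2CharNGrams`, `TokenSequenceNGrams`) produce runs of n adjacent items from their input. A minimal sketch of the token-level case follows. It is plain Java rather than Mallet, and the class `NGramSketch`, its `ngrams` method, and the `_` joiner are hypothetical choices, not necessarily what `TokenSequenceNGrams` emits.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper: collects every run of n adjacent tokens for each
    // requested size n, joining the tokens of one gram with "_".
    public static List<String> ngrams(List<String> tokens, int... sizes) {
        List<String> out = new ArrayList<>();
        for (int n : sizes) {
            if (n <= 0) {
                throw new IllegalArgumentException("n-gram size must be positive: " + n);
            }
            // Slide a window of width n across the tokens.
            for (int start = 0; start + n <= tokens.size(); start++) {
                out.add(String.join("_", tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        System.out.println(ngrams(tokens, 1, 2));
    }
}
```

With sizes `1, 2` the call returns the unigrams followed by the bigrams: `[the, quick, brown, fox, the_quick, quick_brown, brown_fox]`.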
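
`Csv2FeatureVector` reads strings of the form `feature_1:val_1 feature_2:val_2 ...`, and `AugmentableFeatureVectorLogScale` maps values greater than or equal to 1 to log(value)+1. The sketch below imitates both steps on a plain `Map` so it runs without Mallet. `FeatureStringSketch`, `parse`, and `logScale` are hypothetical names. Whitespace splitting, "last occurrence wins" for repeated names, and the natural logarithm are assumptions, not confirmed Mallet behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FeatureStringSketch {
    // Hypothetical parser for "name:value name:value ..." strings.
    // Pairs are split on runs of whitespace; the LAST colon separates the
    // name from the value, so names may themselves contain colons.
    // A repeated name overwrites its earlier value (an assumption).
    public static Map<String, Double> parse(String line) {
        Map<String, Double> features = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return features;
        }
        for (String pair : trimmed.split("\\s+")) {
            int colon = pair.lastIndexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                throw new IllegalArgumentException("Malformed feature: " + pair);
            }
            String name = pair.substring(0, colon);
            double value = Double.parseDouble(pair.substring(colon + 1));
            features.put(name, value);
        }
        return features;
    }

    // Rescale in place: values >= 1 become log(value) + 1 (natural log assumed);
    // values below 1 are left unchanged.
    public static void logScale(Map<String, Double> features) {
        features.replaceAll((name, v) -> v >= 1.0 ? Math.log(v) + 1.0 : v);
    }

    public static void main(String[] args) {
        Map<String, Double> fv = parse("apple:1 banana:0.5 cherry:3");
        logScale(fv);
        System.out.println(fv);
    }
}
```

The rescaling leaves 1 fixed (log 1 + 1 = 1) and never increases a value, since log(v) + 1 ≤ v for v ≥ 1, so large counts are compressed while small fractional values pass through untouched.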

Exception Summary
PipeException

Package cc.mallet.pipe Description

Classes for processing arbitrary data into instances. Every class in this directory should be a subclass of Pipe; other classes should go in base.pipe.util.
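
Every pipe here transforms an instance, and `SerialPipes` converts an instance through a sequence of pipes, feeding each one's output to the next. The sketch below mimics that design in plain Java so it runs without the Mallet jar. `PipeSketch` and its nested `Instance`, `Pipe`, and `Serial` types, as well as the three lambda pipes, are hypothetical simplifications, not Mallet classes. In Mallet itself `Pipe` is an abstract superclass, and its `Instance` carries more than the data and target fields kept here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PipeSketch {
    // Simplified, hypothetical stand-in for an instance: only data and target.
    public static final class Instance {
        public Object data;
        public Object target;

        public Instance(Object data, Object target) {
            this.data = data;
            this.target = target;
        }
    }

    // Hypothetical pipe contract: transform one instance and return it.
    public interface Pipe {
        Instance pipe(Instance carrier);
    }

    // Runs pipes in order, feeding each one's output to the next,
    // in the spirit of SerialPipes.
    public static final class Serial implements Pipe {
        private final List<Pipe> pipes;

        public Serial(Pipe... pipes) {
            this.pipes = List.of(pipes);
        }

        @Override
        public Instance pipe(Instance carrier) {
            for (Pipe p : pipes) {
                carrier = p.pipe(carrier);
            }
            return carrier;
        }
    }

    // Text -> lowercased String (cf. CharSequenceLowercase).
    public static final Pipe LOWERCASE = inst -> {
        inst.data = inst.data.toString().toLowerCase(Locale.ROOT);
        return inst;
    };

    // String -> List<String> of letter runs (cf. CharSequence2TokenSequence).
    public static final Pipe TOKENIZE = inst -> {
        List<String> tokens = new ArrayList<>();
        for (String t : inst.data.toString().split("[^\\p{L}]+")) {
            if (!t.isEmpty()) { // a leading separator yields one empty string
                tokens.add(t);
            }
        }
        inst.data = tokens;
        return inst;
    };

    // Illustrative stopword set, not Mallet's built-in list.
    public static final Set<String> STOPWORDS = Set.of("a", "of", "the");

    // Drop stopword tokens (cf. TokenSequenceRemoveStopwords).
    public static final Pipe REMOVE_STOPWORDS = inst -> {
        List<String> kept = new ArrayList<>();
        for (Object token : (List<?>) inst.data) {
            String t = (String) token;
            if (!STOPWORDS.contains(t)) {
                kept.add(t);
            }
        }
        inst.data = kept;
        return inst;
    };

    public static void main(String[] args) {
        Pipe pipeline = new Serial(LOWERCASE, TOKENIZE, REMOVE_STOPWORDS);
        Instance out = pipeline.pipe(new Instance("The Cat of the Hat!", "label"));
        System.out.println(out.data); // prints [cat, hat]
    }
}
```

Each stage mutates the carrier's data field and returns the same object, which keeps chaining cheap; because `Serial` is itself a `Pipe`, pipelines can also be nested inside other pipelines.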