
SimpleTagger example


SimpleTagger is a command-line interface to the MALLET Conditional Random Field (CRF) class. Here we present an extremely simple example showing how to use SimpleTagger to label a sequence of text. For a more general introduction, see this tutorial on conditional random fields (http://www.cs.umass.edu/~casutton/publications/crf-tutorial.pdf) by Sutton and McCallum (2006).

Your input file should be in the following format:

        Bill CAPITALIZED noun
        slept non-noun
        here LOWERCASE STOPWORD non-noun

That is, each line represents one token, and has the format:

 feature1 feature2 ... featuren label
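The format above is simple enough to generate from a script. The following is a minimal Python sketch, not part of MALLET: `format_token` and `format_sequence` are hypothetical helper names, and the sketch assumes features and labels contain no whitespace.

```python
def format_token(features, label=None):
    """Return one input line: the token's features, then its label (if any)."""
    fields = list(features)
    if label is not None:
        fields.append(label)
    return " ".join(fields)


def format_sequence(tokens):
    """Return one line per token; tokens is a list of (features, label) pairs."""
    return "\n".join(format_token(f, l) for f, l in tokens) + "\n"


# The three-token training example from this page.
training = [
    (["Bill", "CAPITALIZED"], "noun"),
    (["slept"], "non-noun"),
    (["here", "LOWERCASE", "STOPWORD"], "non-noun"),
]
print(format_sequence(training), end="")
```

Passing `label=None` gives a line with features only, which is the shape of a file that is to be labelled rather than used for training.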

Then you can train a CRF using SimpleTagger like this (on one line):


hough@gobur:~/tagger-test$ java -cp 
 "/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar"
 edu.umass.cs.mallet.base.fst.SimpleTagger
  --train true --model-file nouncrf  sample

This assumes that MALLET has been installed and built in /home/hough/mallet. Note that the classpath specifies both the MALLET build directory (/home/hough/mallet/class) and the necessary MALLET jar file (/home/hough/mallet/lib/mallet-deps.jar). The --train true option specifies that we are training, and --model-file nouncrf specifies the file the trained CRF should be written to.

This produces a trained CRF in the file "nouncrf".
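Because the command has to be given on one line, it can be convenient to assemble it in a script. The sketch below is one possible way to do this in Python; `build_simpletagger_command` is a hypothetical helper, and it assumes the same directory layout as the example (a `class` directory and `lib/mallet-deps.jar` under the MALLET directory). It uses only the options shown on this page.

```python
import os


def build_simpletagger_command(mallet_home, model_file, data_file, train=False):
    """Build the java argument list for SimpleTagger (does not run it)."""
    # Classpath: the MALLET build directory plus the dependencies jar.
    classpath = os.pathsep.join([
        os.path.join(mallet_home, "class"),
        os.path.join(mallet_home, "lib", "mallet-deps.jar"),
    ])
    cmd = ["java", "-cp", classpath,
           "edu.umass.cs.mallet.base.fst.SimpleTagger"]
    if train:
        cmd += ["--train", "true"]
    cmd += ["--model-file", model_file, data_file]
    return cmd


train_cmd = build_simpletagger_command(
    "/home/hough/mallet", "nouncrf", "sample", train=True)
print(" ".join(train_cmd))
```

The resulting list can be passed to `subprocess.run(train_cmd, check=True)` to launch the training run; calling the helper with `train=False` gives the corresponding command for labelling with an existing model file.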

If we have a file "stest" that we would like labelled:


        CAPITAL Al
        slept
        here

we can do this with the CRF in file nouncrf by typing:


hough@gobur:~/tagger-test$ java -cp
"/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar"
 edu.umass.cs.mallet.base.fst.SimpleTagger
--model-file nouncrf  stest

which produces the following output:


Number of predicates: 5
noun CAPITAL Al
non-noun  slept
non-noun  here
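To use the predictions in another program, the output can be read back into label/feature pairs. This is a minimal Python sketch; `parse_tagger_output` is a hypothetical helper, and it assumes, based only on the sample output above, that each result line holds the predicted label followed by the token's features and that the "Number of predicates" line is the only other line to skip.

```python
def parse_tagger_output(text):
    """Return a list of (label, features) pairs, one per labelled token."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and the predicate-count status line.
        if not line.strip() or line.startswith("Number of predicates"):
            continue
        label, *features = line.split()
        rows.append((label, features))
    return rows


sample_output = """Number of predicates: 5
noun CAPITAL Al
non-noun  slept
non-noun  here
"""
for label, features in parse_tagger_output(sample_output):
    print(label, features)
```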

A list of all the options available with SimpleTagger can be obtained by specifying the --help option:


hough@gobur:~/tagger-test$ java -cp
"/home/hough/mallet/class:/home/hough/mallet/lib/mallet-deps.jar"
 edu.umass.cs.mallet.base.fst.SimpleTagger
--help

Retrieved from "http://mallet.cs.umass.edu/index.php/SimpleTagger_example"

This page was last modified 20:08, 9 Feb 2006.

