java.lang.Object
  cc.mallet.extract.HierarchicalTokenizationFilter
public class HierarchicalTokenizationFilter
extends java.lang.Object
implements TokenizationFilter
Tokenization filter that creates nested spans from a hierarchical labeling of the data. The labels should be of the form `LBL1[|LBLk]*`. For example, the tokens `w1 w2 w3 w4 w5 w6 w7` labeled `A`, `A|B`, `A|B|C`, `A|B|C`, `A|B`, `A`, `A` will result in LabeledSpans like `<A>w1 <B>w2 <C>w3 w4</C> w5</B> w6 w7</A>`.

Also, labels of the form `B-field` force a new instance of the field to begin, even if it is already active. Prefixes of `I-` are ignored, so you can use BIO labeling.

Created: Nov 12, 2004
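The nesting rule above can be illustrated with a minimal, self-contained sketch that uses plain strings instead of MALLET's `LabelAlphabet`, `Tokenization` and `Sequence` types. It is not the MALLET implementation: the class `NestedSpanSketch` and its method `toBracketed` are hypothetical, and it assumes that each `|`-separated component may carry its own `B-` or `I-` prefix. Spans shared with the previous token stay open; everything below the first mismatch (or the first `B-` component) is closed and reopened.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedSpanSketch {

    /** Hypothetical helper: renders nested spans built from hierarchical labels such as "A|B|C". */
    public static String toBracketed(String[] labels, String[] tokens) {
        List<String> open = new ArrayList<>(); // stack of open span labels, outermost first
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String[] parts = labels[i].split("\\|");
            boolean[] begin = new boolean[parts.length];
            for (int j = 0; j < parts.length; j++) {
                if (parts[j].startsWith("B-")) {        // assumption: B- forces a fresh span at this level
                    parts[j] = parts[j].substring(2);
                    begin[j] = true;
                } else if (parts[j].startsWith("I-")) { // I- prefixes are simply dropped
                    parts[j] = parts[j].substring(2);
                }
            }
            // Count the outer levels that continue unchanged from the previous token.
            int keep = 0;
            while (keep < open.size() && keep < parts.length
                    && open.get(keep).equals(parts[keep]) && !begin[keep]) {
                keep++;
            }
            // Close spans that ended with the previous token, innermost first.
            while (open.size() > keep) {
                out.append("</").append(open.remove(open.size() - 1)).append('>');
            }
            if (i > 0) {
                out.append(' ');
            }
            // Open spans that start at this token, outermost first.
            for (int j = keep; j < parts.length; j++) {
                out.append('<').append(parts[j]).append('>');
                open.add(parts[j]);
            }
            out.append(tokens[i]);
        }
        // Close whatever is still open after the last token.
        while (!open.isEmpty()) {
            out.append("</").append(open.remove(open.size() - 1)).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] labels = {"A", "A|B", "A|B|C", "A|B|C", "A|B", "A", "A"};
        String[] tokens = {"w1", "w2", "w3", "w4", "w5", "w6", "w7"};
        // prints <A>w1 <B>w2 <C>w3 w4</C> w5</B> w6 w7</A>
        System.out.println(toBracketed(labels, tokens));
    }
}
```

With this sketch, the labels `A`, `B-A` on tokens `x y` yield `<A>x</A> <A>y</A>`: the `B-` prefix splits what would otherwise be one `A` span into two.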
| Constructor Summary | |
|---|---|
| `HierarchicalTokenizationFilter()` | |
| `HierarchicalTokenizationFilter(java.util.regex.Pattern ignorePattern)` | |
| Method Summary | |
|---|---|
| `LabeledSpans` | `constructLabeledSpans(LabelAlphabet dict, java.lang.Object document, Label backgroundTag, Tokenization input, Sequence seq)` Converts the sequence of labels into a set of labeled spans. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|

`public HierarchicalTokenizationFilter()`

`public HierarchicalTokenizationFilter(java.util.regex.Pattern ignorePattern)`
| Method Detail |
|---|

`public LabeledSpans constructLabeledSpans(LabelAlphabet dict, java.lang.Object document, Label backgroundTag, Tokenization input, Sequence seq)`

Specified by: `constructLabeledSpans` in interface `TokenizationFilter`
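To make "a set of labeled spans" concrete, here is a hedged sketch that returns token offsets instead of MALLET objects. It does not call MALLET: `SpanOffsetsSketch`, its nested `Span` class and `toSpans` are hypothetical stand-ins, and treating tokens whose whole label equals the background tag (here `"O"`) as lying outside every span is an assumption about how `backgroundTag` is used. The nesting and `B-`/`I-` handling follow the class description.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanOffsetsSketch {

    /** Hypothetical stand-in for a labeled span over token positions [start, end). */
    static final class Span {
        final String label;
        final int start;
        int end = -1; // set when the span is closed

        Span(String label, int start) {
            this.label = label;
            this.start = start;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    /** Converts per-token hierarchical labels into spans; assumed: background tokens are in no span. */
    public static List<Span> toSpans(String[] labels, String background) {
        List<Span> spans = new ArrayList<>(); // in opening order: outer spans precede inner ones
        List<Span> open = new ArrayList<>();  // stack of open spans, outermost first
        for (int i = 0; i <= labels.length; i++) {
            // A virtual background token past the end closes every span still open.
            String label = (i == labels.length) ? background : labels[i];
            String[] parts = label.equals(background) ? new String[0] : label.split("\\|");
            boolean[] begin = new boolean[parts.length];
            for (int j = 0; j < parts.length; j++) {
                begin[j] = parts[j].startsWith("B-");
                if (begin[j] || parts[j].startsWith("I-")) {
                    parts[j] = parts[j].substring(2); // strip the BIO prefix
                }
            }
            // Outer levels that continue from the previous token stay open.
            int keep = 0;
            while (keep < open.size() && keep < parts.length
                    && open.get(keep).label.equals(parts[keep]) && !begin[keep]) {
                keep++;
            }
            // Close the remaining spans, innermost first; the end offset is exclusive.
            while (open.size() > keep) {
                Span closed = open.remove(open.size() - 1);
                closed.end = i;
            }
            // Open new spans starting at token i, outermost first.
            for (int j = keep; j < parts.length; j++) {
                Span s = new Span(parts[j], i);
                spans.add(s);
                open.add(s);
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] labels = {"O", "PER", "PER|NAME", "PER|NAME", "O", "B-PER", "PER"};
        // prints [PER[1,4), NAME[2,4), PER[5,7)]
        System.out.println(toSpans(labels, "O"));
    }
}
```

Recording each span when it opens, rather than when it closes, keeps outer spans ahead of the spans nested inside them in the returned list.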