Monday, June 1, 2009

opennlp.ccg.lexicon.MorphItem

This class parses the entry elements in morph.xml files:

entry @word @pos (@stem) (@class) (@coart) (@macros) (@excluded)
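To make the schema concrete, here is a minimal sketch that reads the attributes of a single entry element, treating word and pos as required and the bracketed ones as optional. MorphItem itself reads a JDOM element (hence e.getAttributeValue in the snippets below). This sketch instead uses the JDK's org.w3c.dom so that it is self-contained. The class name EntryAttributes and the sample entry values are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class EntryAttributes {
    // Attribute names from the entry schema: word and pos are required, the rest optional.
    static final String[] REQUIRED = {"word", "pos"};
    static final String[] OPTIONAL = {"stem", "class", "coart", "macros", "excluded"};

    // Parses one <entry> element and returns its attributes in schema order,
    // throwing if a required attribute is missing.
    static Map<String, String> readEntry(String xml) throws Exception {
        Element e = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String name : REQUIRED) {
            if (!e.hasAttribute(name)) {
                throw new IllegalArgumentException("entry is missing @" + name);
            }
            attrs.put(name, e.getAttribute(name));
        }
        for (String name : OPTIONAL) {
            // DOM's getAttribute returns "" for absent attributes, so check presence first.
            if (e.hasAttribute(name)) {
                attrs.put(name, e.getAttribute(name));
            }
        }
        return attrs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entry word=\"dogs\" pos=\"N\" stem=\"dog\" macros=\"@pl\"/>";
        System.out.println(readEntry(xml)); // {word=dogs, pos=N, stem=dog, macros=@pl}
    }
}
```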

The class has the following fields:

private Word surfaceWord;
private Word word;
private Word coartIndexingWord;
private String[] macros;
private String[] excluded;
private boolean coart = false;

The constructor is quite complicated. The values of 'macros' and 'excluded' are straightforward: the attribute values are simply split into string tokens (macro names and excluded strings, respectively). The value of 'surfaceWord' is obtained by passing the word attribute, together with the coart flag, to the lexicon tokeniser's parseToken method, in a way that is not obvious from the call itself:

surfaceWord = Grammar.theGrammar.lexicon.tokenizer.parseToken(e.getAttributeValue("word"),coart);
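The token splitting described above for 'macros' and 'excluded' can be sketched as follows. This is a minimal illustration that assumes the tokens are whitespace-separated. The helper splitTokens is hypothetical, and returning an empty array for an absent attribute is a choice made for this sketch, not necessarily what MorphItem does.

```java
import java.util.Arrays;

public class AttributeTokens {
    // Hypothetical helper: splits a whitespace-separated attribute value into tokens.
    // Assumption: an absent (null) or blank value yields an empty array.
    static String[] splitTokens(String value) {
        if (value == null || value.isBlank()) {
            return new String[0];
        }
        return value.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] macros = splitTokens("@pl  @3rd");
        String[] excluded = splitTokens(null);
        System.out.println(Arrays.toString(macros)); // [@pl, @3rd]
        System.out.println(excluded.length);         // 0
    }
}
```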

The value of 'word' is derived as follows:

word = Word.createFullWord(surfaceWord, 
                           e.getAttributeValue("stem"), 
                           e.getAttributeValue("pos"), 
                           null, 
                           e.getAttributeValue("class"));
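As a rough model of what the createFullWord call is doing, the sketch below uses a hypothetical, much simplified SimpleWord record in place of OpenCCG's Word class. It keeps the surface form from the tokenised word and fills in the stem, part of speech and class. The fourth argument, passed as null in the call above, is modelled here as a supertag slot. That interpretation is an assumption, as is the rule that a missing stem falls back to the surface form.

```java
public class FullWordSketch {
    // Hypothetical, simplified stand-in for OpenCCG's Word; not the real class.
    record SimpleWord(String form, String stem, String pos, String supertag, String semClass) {

        // A surface-only word, as a tokeniser might produce: only the form is set.
        static SimpleWord surface(String form) {
            return new SimpleWord(form, null, null, null, null);
        }

        // Mirrors the argument order of the createFullWord call above:
        // (surface word, stem, pos, null, class).
        static SimpleWord createFullWord(SimpleWord w, String stem, String pos,
                                         String supertag, String semClass) {
            // Assumption: an absent stem falls back to the surface form.
            String s = (stem != null) ? stem : w.form();
            return new SimpleWord(w.form(), s, pos, supertag, semClass);
        }
    }

    public static void main(String[] args) {
        SimpleWord surfaceWord = SimpleWord.surface("dogs");
        SimpleWord word = SimpleWord.createFullWord(surfaceWord, "dog", "N", null, null);
        System.out.println(word.form() + " " + word.stem() + " " + word.pos()); // dogs dog N
    }
}
```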
