Wednesday, July 29, 2009

Lisa Michaud (2008): King Alfred ...

Lisa N. Michaud (2008): 'King Alfred: a translation environment for learners of Anglo-Saxon English' (Proceedings of 3rd ACL Workshop on Innovative Use of NLP for Building Educational Applications, 19-26, link).

MOTIVATION: Learning a dead language is a much simpler task than learning a living language - the student is only learning to read/translate, rather than to listen AND speak AND read AND write. Thus, the standard approaches to teaching living and dead languages differ - the former focuses on developing communicative/conversational fluency and the latter on syntactic accuracy. From this perspective, the still somewhat primitive state of the art in language technology appears better suited to learners of dead languages.

SUMMARY: King Alfred is an online tutoring system for developing Anglo-Saxon sentence translation skills. Students have access to: (a) a morpho-syntactic scratchpad for each sentence to be translated, complete with hints and corrections; (b) a complete glossary of Anglo-Saxon word forms; and (c) a complete set of statistics about their interactions with the scratchpad.

DETAILS: The UI has three independent 'tabs': (a) the workspace; (b) the glossary; and (c) user statistics. The workspace is the core of the system:

  1. The user is presented with an Anglo-Saxon sentence to translate, along with a text field to enter his/her translation (and a morpho-syntactic scratchpad - see below).
  2. The user works out the translation (in consultation with the morpho-syntactic scratchpad and the glossary tab), enters it into the text field, and then presses the 'submit' button.
  3. The system presents a screen containing: (a) the user's translation; (b) the instructor's model translation; (c) a 'rate your translation' bar for the user to select very poor/poor/good/very good/excellent; and (d) some tips about what morphosyntactic features the user needs to focus on (derived from his interactions with the scratchpad).
  4. The user clicks the 'submit and continue' button to go to the next sentence.

The morphosyntactic scratchpad allows the user to associate morphosyntactic features with individual words. In other words, the user can guess the part-of-speech (verb, noun, adjective etc.) as well as POS-dependent grammatical features (tense, person, number etc.). The system will tell the user when he makes an incorrect guess, tell him the correct answer on request, or give him a hint.
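The guess-checking step of the scratchpad can be illustrated with a minimal sketch. It assumes that each word in a sentence carries a gold-standard annotation (a dict from feature name to value) and that a guess is compared case-insensitively against it. The names `check_guess`, `reveal` and `Outcome` are hypothetical, and hints are not modelled because the paper summary does not say what they contain.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"


# Hypothetical gold annotation for "cyning" ('king') when it is the subject of
# the sentence: the POS plus POS-dependent features.
CYNING_AS_SUBJECT = {"pos": "noun", "case": "nominative", "number": "singular"}


def check_guess(gold: dict[str, str], feature: str, guess: str) -> Outcome:
    """Compare a student's guess for one feature of one word with the gold value."""
    if feature not in gold:
        # e.g. guessing a tense for a noun: the feature does not apply to this word
        raise KeyError(f"feature {feature!r} does not apply to this word")
    if guess.strip().lower() == gold[feature].lower():
        return Outcome.CORRECT
    return Outcome.INCORRECT


def reveal(gold: dict[str, str], feature: str) -> str:
    """Return the correct value when the student asks for the answer."""
    return gold[feature]
```

For example, `check_guess(CYNING_AS_SUBJECT, "case", "accusative")` returns `Outcome.INCORRECT`, after which `reveal(CYNING_AS_SUBJECT, "case")` gives `"nominative"`.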

The glossary is an alphabetised list of WORD-FORMS (i.e. not lexemes), associated with morphosyntactic information. This information is of two types: (a) lexeme-based, e.g. POS, translation, class (strong/weak), declension (1st, 2nd, etc.); and (b) form-based, e.g. number, person, tense, mood.

User statistics are derived from a complete record of the user's interactions with the system, in particular the scratchpad, across all sessions (hence the need for individual user logins). Feedback (either high-detail or low-detail) is given about the user's strengths and weaknesses in terms of particular parts-of-speech and morpho-syntactic features. In the future, this information will ideally be used to 'tailor' the order in which sentences are presented to the user.
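The strengths/weaknesses feedback could be derived from the interaction log roughly as follows. This is a minimal sketch under stated assumptions: each logged scratchpad interaction records the user, the session, the word's part of speech, the feature guessed and whether the guess was correct. The `Interaction` record, the per-(POS, feature) accuracy measure and the 0.7 weakness threshold are all hypothetical choices, not the system's documented method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    user: str
    session: int
    pos: str       # part of speech of the word being annotated
    feature: str   # feature guessed, e.g. "tense", "case"
    correct: bool  # whether the student's guess was right


def feature_accuracy(log: list[Interaction], user: str) -> dict[tuple[str, str], float]:
    """Fraction of correct guesses per (POS, feature), pooled across all sessions."""
    totals: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])  # [correct, attempts]
    for rec in log:
        if rec.user != user:
            continue
        counts = totals[(rec.pos, rec.feature)]
        counts[1] += 1
        if rec.correct:
            counts[0] += 1
    return {key: right / attempts for key, (right, attempts) in totals.items()}


def weaknesses(log: list[Interaction], user: str, threshold: float = 0.7) -> list[tuple[str, str]]:
    """(POS, feature) pairs whose accuracy falls below a hypothetical threshold."""
    accuracy = feature_accuracy(log, user)
    return sorted(key for key, value in accuracy.items() if value < threshold)
```

Pooling across sessions is why the system needs individual logins: without a stable user identity, the log could not be attributed to one learner.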

IMPLEMENTATION: For reasons of efficient storage, the sentences and glossary are organised into a fairly complicated database structure:

The ROOTS glossary is a list of all the 'lexemes' in the glossary:

roots
  > root+ @id @orthography @pos @definition
      > feature* @name @value

The WORDS glossary is a list of all the 'word-forms' in the glossary, cross-referenced to the ROOTS glossary:

words
  > word+ @id @orthography @root-id
      > feature* @name @value

The sentences corpus is a list of all the sentences to be translated, cross-referenced to the WORDS glossary:

sentences
  > sentence+ @translation
      > word+ @word-id @translation
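The three structures above can be mirrored in code to show how the cross-references combine. The sketch below assumes in-memory dicts keyed by id in place of the real database, and a hypothetical `glossary_entry` helper that merges a root's lexeme-based features with a word-form's form-based features into one glossary line. Note that Python's default string ordering is only an approximation of Anglo-Saxon alphabetical order (letters such as æ, þ and ð would need a custom collation).

```python
from dataclasses import dataclass, field


@dataclass
class Root:            # ROOTS glossary: one entry per lexeme
    id: str
    orthography: str
    pos: str
    definition: str
    features: dict[str, str] = field(default_factory=dict)  # lexeme-based, e.g. class


@dataclass
class Word:            # WORDS glossary: one entry per word-form
    id: str
    orthography: str
    root_id: str       # cross-reference into the ROOTS glossary
    features: dict[str, str] = field(default_factory=dict)  # form-based, e.g. case, number


@dataclass
class SentenceWord:
    word_id: str       # cross-reference into the WORDS glossary
    translation: str


@dataclass
class Sentence:
    translation: str
    words: list[SentenceWord]


def glossary_entry(word_id: str, words: dict[str, Word], roots: dict[str, Root]) -> dict[str, str]:
    """Merge a word-form with its root into one glossary line."""
    word = words[word_id]
    root = roots[word.root_id]
    entry = {"form": word.orthography, "root": root.orthography,
             "pos": root.pos, "definition": root.definition}
    entry.update(root.features)  # stored once per lexeme
    entry.update(word.features)  # stored once per word-form
    return entry


def glossary(words: dict[str, Word], roots: dict[str, Root]) -> list[dict[str, str]]:
    """Alphabetised list of word-forms (not lexemes)."""
    ordered = sorted(words, key=lambda wid: words[wid].orthography)
    return [glossary_entry(wid, words, roots) for wid in ordered]


def resolve(sentence: Sentence, words: dict[str, Word], roots: dict[str, Root]) -> list[dict[str, str]]:
    """Look up the full morphosyntactic information for every word in a sentence."""
    return [glossary_entry(sw.word_id, words, roots) for sw in sentence.words]


roots = {"r1": Root("r1", "cyning", "noun", "king", {"gender": "masculine", "class": "strong"})}
words = {
    "w2": Word("w2", "cyninges", "r1", {"case": "genitive", "number": "singular"}),
    "w1": Word("w1", "cyning", "r1", {"case": "nominative", "number": "singular"}),
}
```

Splitting lexeme-level from form-level features means that information such as gender and class is stored once per root rather than repeated for every inflected form, which is presumably the storage saving the authors have in mind.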

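Our approach, which does not rely on any kind of automatic morphological analyser/stemmer, offers pedagogical accuracy: every root and word-form is entered together with its features, so students are never shown the mistakes an automatic analyser might make.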

INSTRUCTOR INTERFACE: There is an online interface to allow the instructor to add sentences to be translated (and hence word-forms and roots to the glossary), without needing to manipulate the database directly.

AUTOMATIC TRANSLATION EVALUATION: We intend to adapt the n-gram-based BLEU metric for translation accuracy to give users automatic feedback about the quality of their translations. Our main modification to BLEU will be to ensure that serious errors are penalised more heavily than trivial errors, and in particular that errors of morpho-syntactic parsing (e.g. mistaking a past-tense form for a present-tense form) are treated most seriously of all.
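For reference, unmodified sentence-level BLEU (Papineni et al., 2002) can be sketched as below: the geometric mean of clipped n-gram precisions for n = 1..4, multiplied by a brevity penalty. This sketch assumes a single reference translation (e.g. the instructor's model translation) and whitespace tokenisation, and applies no smoothing, so any n-gram order with zero matches gives a score of 0. The planned error-weighting is not specified in the paper, so it is not attempted here.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Standard BLEU against one reference, uniform weights, no smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)
```

For instance, a candidate "the king was in the" against the reference "the king was in the hall" matches at every n-gram order, so its score is just the brevity penalty, exp(1 - 6/5) ≈ 0.82. This also shows why plain BLEU is a poor fit for the authors' goal: it treats every missing or wrong token equally, whereas they want a tense error to cost more than a dropped article.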

CRITIQUE: Ignores the potential for group interaction, i.e. only one user at a time can interact with the tutor. No attempt at evaluation. The section on automatic translation evaluation is purely speculative. NO NLP TECHNOLOGIES ARE USED IN THE SYSTEM.

The King Alfred homepage and documentation; Lisa Michaud's homepage; Michael Drout's homepage.
