Sunday, May 31, 2009

A TTS system for Early Modern English

Among the public at large, there is little understanding of the fact that languages change constantly, with each generation speaking a slightly different language from the last. For English, this misconception is fuelled by the fact that we can read texts written 400 years ago with little difficulty: written English has remained more or less unchanged over the centuries, masking the vast changes in the spoken language over the same period. This situation also explains the massive degree of irregularity in modern English orthography.

The idea here is to build a simple, interactive text-to-speech system for Early Modern English, which will read out English input sentences as they would have been pronounced by a contemporary of Shakespeare. The system will be based, as far as possible, on simple letter-to-sound rules (i.e. with the lexicon kept as small as possible). This project could be seen as the first step in a programme to use language technology to develop awareness of the changes that have taken place in English since the Anglo-Saxon invasion, for example by building a machine translation system between Modern English and Old English (with Old English TTS for output).
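
As a rough sketch of the intended architecture (a small exception lexicon consulted first, then longest-match letter-to-sound rules), something like the following Python could serve as a starting point. The rule table, the lexicon entry and the phoneme symbols are all placeholders invented for illustration; they are not attested Early Modern English pronunciations.

```python
# Hypothetical sketch: the rules and the lexicon entry are placeholders,
# not attested Early Modern English phonology.

# Exception lexicon, consulted first and kept as small as possible.
LEXICON = {"one": ["o:", "n"]}

# Grapheme -> phoneme rules; longer graphemes are tried before shorter ones.
RULES = {"kn": ["k", "n"], "gh": ["x"], "ee": ["e:"], "sh": ["S"]}


def letter_to_sound(word):
    """Return a list of phoneme symbols for a single word."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes = []
    max_len = max(len(g) for g in RULES)
    i = 0
    while i < len(word):
        for size in range(max_len, 0, -1):
            grapheme = word[i:i + size]
            if grapheme in RULES:
                phonemes.extend(RULES[grapheme])
                i += size
                break
        else:
            # No rule matched: pass the letter through unchanged.
            phonemes.append(word[i])
            i += 1
    return phonemes
```

A real system would replace the placeholder table with rules drawn from the historical phonology literature and hand the resulting phoneme string to a synthesiser.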

Friday, May 29, 2009

opennlp.ccg.TextCCG

This is the top-level class for the OpenCCG parser. It contains the main method. Here is the important stuff (assuming a default grammar file and no input parameters, and ignoring special input commands when running the tccg loop):

// Load the grammar and build a parser and a realizer over it.
String grammarFile = "grammar.xml";
URL grammarURL = new File(grammarFile).toURL();
Grammar grammar = new Grammar(grammarURL);
Parser parser = new Parser(grammar);
Realizer realizer = new Realizer(grammar);
LineReader lineReader = ...
// The tccg loop: read a line, parse it, collect the results.
while (true) {
    String input = lineReader.readLine("tccg> ");
    try {
        parser.parse(input.trim());
        List parses = parser.getResult();
        ...
    }
    catch (ParseException pe) {
        System.out.println(pe);
    }
}

See also opennlp.ccg.grammar.Grammar.

Deliverables

Each project should have a clear list of 'deliverables', each one specifying a tangible output. Each deliverable must be carefully qualified/quantified (e.g. with reference to industry standards), so that it is clear whether it has been delivered or not.

Formulating a list of deliverables presents another valuable opportunity for stakeholder interaction: (a) formulate an initial list of deliverables; (b) circulate this around the stakeholders; (c) ask them to indicate for each proposed deliverable whether they require it or not; (d) ask them if there are any deliverables they want which are not on the list. This is a good way of establishing consensus on the 'scope' of the project.

               stakeholders
deliverables   1    2    3    4    ...
A              Y    N    Y    Y    ...
B              N    Y    N    Y    ...
C              N    Y    Y    N    ...
...            ...  ...  ...  ...  ...

The list of deliverables should be finalised during the planning and analysis phase of the project (i.e. before the decision whether or not to go ahead with the project has been taken), and the finished list forms part of the Project Initiation Document.

Goal statement

Every project should have a clear and concise 'goal statement', setting out the high-level objectives of the project in no more than 30 words. The goal statement should be written during the project analysis and planning stage, with input from as many stakeholders as is practical, and forms part of the Project Initiation Document. The goal statement should answer the following questions:

  • Who is going to do the project?
  • What are they going to do?
  • Why are they going to do it?
  • Where are they going to do it?
  • When are they going to do it?

One question that should not be answered in the goal statement is: How are they going to do it?

The best way to go about formulating a goal statement is as follows: (a) get the key stakeholders together to brainstorm; (b) come up with a goal statement that includes everyone's priorities for the project; (c) make them cut it down to 30 words. In short, the process of formulating a goal statement is at least as valuable as the finished goal statement itself, being a vital opportunity for team-building.

The goal statement is the 'vision' for the project.

Definition of a project

A project is a temporary endeavour resulting in a unique product or service.

The key words here are 'temporary' and 'unique'.

Tuesday, May 26, 2009

Locatives as events

A place PP like 'in Paris' denotes a kind of thing/event - the set of: (a) things which are located in Paris; and (b) events which take place in Paris. In other words, a place function like IN converts a thing, PARIS, into an event, IN(PARIS). The denotation of 'church in Paris' is the intersection of the denotata of 'church' and 'in Paris'. The denotation of 'Kim slept in Paris' is the intersection of the denotata of 'Kim slept' and 'in Paris'.

1. 'Kim slept' - λe.sleep(e,Kim)

2. 'in Paris' - λPλf.Pf & at(f,in(Paris))

3. 'Kim slept in Paris' - λf.sleep(f,Kim) & at(f,in(Paris))
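
The three steps above can be checked against a toy model. The following Python sketch transcribes the lambda terms directly; the event inventory and the helper names (EVENTS, in_, denotation) are invented for illustration and are not part of the analysis itself.

```python
# A toy model: each event records its kind, its participant and its location.
# The events themselves are invented purely for illustration.
EVENTS = [
    {"id": "e1", "kind": "sleep", "agent": "Kim", "place": ("in", "Paris")},
    {"id": "e2", "kind": "sleep", "agent": "Kim", "place": ("in", "Oslo")},
    {"id": "e3", "kind": "sleep", "agent": "Lee", "place": ("in", "Paris")},
]


def sleep(e, x):
    return e["kind"] == "sleep" and e["agent"] == x


def in_(thing):
    # The place function IN: maps a thing to a region.
    return ("in", thing)


def at(e, region):
    return e["place"] == region


# 1. 'Kim slept' - λe.sleep(e,Kim)
kim_slept = lambda e: sleep(e, "Kim")

# 2. 'in Paris' - λPλf.Pf & at(f,in(Paris))
in_paris = lambda P: lambda f: P(f) and at(f, in_("Paris"))

# 3. 'Kim slept in Paris' - λf.sleep(f,Kim) & at(f,in(Paris))
kim_slept_in_paris = in_paris(kim_slept)


def denotation(pred):
    """The set of events in the toy model that satisfy pred."""
    return {e["id"] for e in EVENTS if pred(e)}
```

In this model denotation(kim_slept_in_paris) is the intersection of the events where Kim slept and the events located in Paris, which is the intersective behaviour described above.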

The same goes for path PPs like 'into Paris' - this denotes the set of motion events which occur along a trajectory which originates outwith and terminates within Paris:

1. 'Kim is driving' - λe.drive(e,Kim)

2. 'into Paris' - λPλf.Pf & to(f,in(Paris))

3. 'Kim is driving into Paris' - λf.drive(f,Kim) & to(f,in(Paris))

But I'm not sure how to handle an argument PP, as in 'Kim put the book into the box'. What about: λe.put(e,Kim,book) & to(e,in(box)), where put(e,x,y) entails go(e,y).

Monday, May 25, 2009

Jorgensen and Lonning 1

Section 1 ('Introduction') of Fredrik Jorgensen and Jan Tore Lonning (2009): A Minimal Recursion Semantic Analysis of Locatives (CL 35(2):229-270)

1. 'Kim slept [in Paris]' - this is a static locative, which locates the whole event.

2. 'Kim is driving [into Paris]' - this is a directional locative, which specifies the trajectory of the motion.

3. 'The mouse ran [under the table]' - this locative is ambiguous between static (cf. 'The mouse ran around [under the table]') and directional (cf. 'The mouse ran under the table and stayed there' or 'The mouse ran under the table into a hole in the wall').

4. 'A mouse appeared [from under the table]' - this locative is directional, but specifies the source of the trajectory, rather than the goal or some midpoint.

5. 'Kim put the book [on/onto the table]' - the locative can be either (intrinsically) static or directional, but is always interpreted as a goal (thanks to the verb meaning).

6. 'A child ran [down] [under the bridge]' - here we have two directional locatives, one intransitive, one transitive. How can their semantics be composed?

Friday, May 22, 2009

XPath 1.0 node tests

Previously, I discussed axes. Here I turn to node tests. Here is the syntax:

  • every XML name is a node test
  • * is a node test
  • node() and text() are node tests

There is a satisfaction relation between node tests and nodes in a document:

  • for all XML names φ, D,n := φ iff. n is an element or attribute node labelled φ
  • D,n := * iff. n is an element or attribute node
  • for all n, D,n := node()
  • D,n := text() iff. n is a text node

The simplest kind of 'location step' has the form α::τ where α is an axis and τ is a node test. In this case: [[α::τ]]D,n = [[α]]D,n ∩ {m | m∈D, D,m:=τ}.
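
The satisfaction relation and the simple location step can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not an XPath implementation: the Node class, the three-axis AXES table and the sample document are all hypothetical, and results are returned as lists in document order rather than as sets.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    kind: str                                       # 'root', 'element', 'attribute' or 'text'
    name: str = ""                                  # label, for elements and attributes
    children: list = field(default_factory=list)    # non-attribute children
    attributes: list = field(default_factory=list)  # attribute children


def satisfies(n, test):
    """D,n := test, for the kinds of node test listed above."""
    if test == "node()":
        return True
    if test == "text()":
        return n.kind == "text"
    if test == "*":
        return n.kind in ("element", "attribute")
    # Otherwise the test is an XML name.
    return n.kind in ("element", "attribute") and n.name == test


# Only three axes are modelled here, enough to exercise the node tests.
AXES = {
    "self": lambda n: [n],
    "child": lambda n: list(n.children),
    "attribute": lambda n: list(n.attributes),
}


def location_step(axis, test, n):
    """[[axis::test]]D,n: the nodes on the axis from n that satisfy the test."""
    return [m for m in AXES[axis](n) if satisfies(m, test)]


# Hypothetical document: <a id="1"><b/>hello<c/></a>
b, c = Node("element", "b"), Node("element", "c")
hello = Node("text")
a = Node("element", "a", [b, hello, c], [Node("attribute", "id")])
```

With this document, child::* from a selects the two element children, child::text() selects only the text node, and the name test id matches on the attribute axis but not on the child axis.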

Playmobil jungle sets

Skeleton cave (3040) - 1999-2002

Jungle ruin (3015) - 1998-2002

Natives (3089) - 1999-2002

Adventure set jungle (3097) - 1999-2002

Wednesday, May 20, 2009

Tribal chief

I have two tribal chiefs, from the Special theme (Jungle) 4564 (1999). Here is the inventory.

One of them is missing his bow, anklets and bracelets. Neither is boxed.

Green Man of the Wood

I have this Playmobil figure with dark brown skin tone and a kind of elfin print clothing effect. I had it down as part of the Jungle theme, but then looked it up in the Klicky database. Here it is: the Green Man of the Wood.

It comes from the Magic Tree set 3897 (1997-2000) from the Magic theme. Inventory here.

I'm missing the gnarled staff, floppy yellow hat, orange beard, and green cloak.

XPath 1.0 nodes and axes

XPath 1.0 partitions the nodes in an XML document into 4 types (ignoring comments, processing instructions and namespaces):

  • root - a document has exactly one of these; no parent; exactly one 'element' child
  • element - parent must be 'element' or the 'root'; zero or more children of types 'attribute', 'element' or 'text'
  • text - parent must be 'element'; no children
  • attribute - parent must be 'element'; no children

XPath specifies a vocabulary of 12 'axes' - ancestor, descendant, preceding, following, preceding-sibling, following-sibling, self, ancestor-or-self, descendant-or-self, parent, child and attribute. There is a denotation function from axes and (context) nodes in a document to sets of nodes in the same document, defined as follows:

  • [[self]]D,n = {n}
  • [[child]]D,n = the set of non-attribute nodes in D which n immediately dominates
  • [[attribute]]D,n = the set of attribute nodes in D which n immediately dominates
  • [[descendant]]D,n = M ∪ ⋃m∈M [[descendant]]D,m, where M = [[child]]D,n
  • [[descendant-or-self]]D,n = [[descendant]]D,n ∪ [[self]]D,n
  • [[parent]]D,n = {m} where m is the parent of n, or ∅ if n is the root
  • [[ancestor]]D,n = M ∪ ⋃m∈M [[ancestor]]D,m, where M = [[parent]]D,n
  • [[ancestor-or-self]]D,n = [[ancestor]]D,n ∪ [[self]]D,n
  • [[preceding]]D,n = the set of non-attribute nodes in D which precede n (i.e. finish before n starts)
  • [[following]]D,n = the set of non-attribute nodes in D which follow n (i.e. that start after n finishes)
  • [[preceding-sibling]]D,n = [[preceding]]D,n ∩ {n' | ∃ m∈[[parent]]D,n, n'∈[[child]]D,m}
  • [[following-sibling]]D,n = [[following]]D,n ∩ {n' | ∃ m∈[[parent]]D,n, n'∈[[child]]D,m}
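
The axis definitions above translate almost line for line into code. The Python below is a sketch under assumptions rather than a conforming XPath engine: the Node class and the sample document are hypothetical, results are lists in document order rather than sets, parent returns the empty list for the root (which has no parent), and preceding and following assume that the context node is not an attribute.

```python
class Node:
    def __init__(self, kind, name="", children=(), attributes=()):
        self.kind, self.name = kind, name
        self.children, self.attributes = list(children), list(attributes)
        self.parent = None
        for m in self.children + self.attributes:
            m.parent = self


def self_(n): return [n]
def child(n): return list(n.children)
def attribute(n): return list(n.attributes)
def parent(n): return [n.parent] if n.parent is not None else []


def descendant(n):
    out = []
    for m in child(n):
        out.append(m)
        out.extend(descendant(m))
    return out


def ancestor(n):
    out = []
    for m in parent(n):
        out.append(m)
        out.extend(ancestor(m))
    return out


def descendant_or_self(n): return self_(n) + descendant(n)
def ancestor_or_self(n): return self_(n) + ancestor(n)


def document_order(n):
    """All non-attribute nodes of n's document, in document order."""
    r = n
    while r.parent is not None:
        r = r.parent
    return [r] + descendant(r)


def preceding(n):
    # Nodes that finish before n starts: earlier in document order, not ancestors.
    order = document_order(n)
    return [m for m in order[:order.index(n)] if m not in ancestor(n)]


def following(n):
    # Nodes that start after n finishes: later in document order, not descendants.
    order = document_order(n)
    return [m for m in order[order.index(n) + 1:] if m not in descendant(n)]


def siblings(n):
    return [m for p in parent(n) for m in child(p)]


def preceding_sibling(n): return [m for m in preceding(n) if m in siblings(n)]
def following_sibling(n): return [m for m in following(n) if m in siblings(n)]


# Hypothetical document: <a><b><d/></b>text<c id="1"/></a>
d = Node("element", "d")
b = Node("element", "b", [d])
t = Node("text")
cid = Node("attribute", "id")
c = Node("element", "c", attributes=[cid])
a = Node("element", "a", [b, t, c])
root = Node("root", children=[a])
```

For example, preceding(c) contains b, d and the text node but not a or the root, since ancestors of c have not finished before c starts.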

You can read the XPath 1.0 Recommendation here.

Mitochondrial DNA

My genetic inheritance consists of two molecules: (a) my nuclear DNA molecule; and (b) my mitochondrial DNA molecule. There is one (?) copy of the former in the nucleus of every cell in my body, and there is at least one copy of the latter in every mitochondrion (i.e. energy generator) in every cell in my body. My nuclear DNA molecule is unique to me (derived from some combination of my mother's and my father's), but my mitochondrial DNA molecule is the same as my mother's, modulo random mutation.

My mitochondrial DNA molecule is double-stranded and circular, consisting of between 15,000 and 17,000 'base pairs', encoding 37 genes.

Tuesday, May 19, 2009

Human population estimates over time

Estimated global human population, in millions:

Year         Population
70,000 BC    < 1
10,000 BC    1
9000 BC      3
8000 BC      5
7000 BC      7
6000 BC      10
5000 BC      15
4000 BC      20
3000 BC      25
2000 BC      35
1000 BC      50
AD 1         200
AD 1000      310
AD 2000      6,070

According to the Toba catastrophe theory, a huge volcanic eruption in Sumatra around 70,000 BC caused an ice age which all but wiped out humans outwith Africa. The human population dropped to between 5,000 and 10,000, and the process of recolonising Eurasia had to begin again.

Running tccg

To do parsing and generation with OpenCCG, you generally run the tccg script from within the directory containing the main grammar files. This is equivalent to the following command (the classpath must be a single argument, with no spaces after the colons):
$ java -Xmx128m \
       -classpath ../../lib/openccg.jar:../../lib/trove.jar:../../lib/jdom.jar:../../lib/jline.jar:. \
       opennlp.ccg.TextCCG
The tccg command has a number of optional arguments, which are passed straight on to the TextCCG class's main method as input parameters: (a) you can get help using tccg -h; (b) you can set a grammar file to read from, e.g. tccg grammar.xml; (c) you can set a file for 'exporting preferences' to, with tccg -exportprefs blah; or (d) you can set a file for 'importing preferences' from, with tccg -importprefs blah.