Computational Linguistics & Phonetics, Fachrichtung 4.7 Allgemeine Linguistik, Universität des Saarlandes

Computational Psycholinguistics - Tutorials

Part 1: Symbolic Models of Human Parsing

COGENT: A Graphical Environment for Cognitive Modelling

COGENT under Linux

Better late than never... We now have a version of COGENT (version 2.4 using GTK2) which appears to be running under the department's new version of Linux. You should be able to use it on the lab machines and on login2 (login is still the old Ubuntu). It can be started in a shell by typing
$ /proj/contrib/cogent/bin/cogent &
at the prompt (the $ sign represents the prompt and should not be typed).

The first time you start it, you will have to set the various paths, which will be saved in .cogent_2_4_rc (not .cogentrc). You will also have to disable optimisation in the preferences. If you are interested in using it, please ask me for the appropriate registration key (the key for COGENT 2.3 will not work).

Tutorial 1: Introduction to COGENT

  • Slides
  • Worksheet (new)
    • 7.11.2011
      Please send me your model up to section 3.4 as soon as possible (today!).
      I will then provide you with an archive from which you can continue with the new tutorial, starting at section 3.4, "Setting up (the process) Press Key". For an explanation of how to create COGENT archives, see the FAQ.
    • 8.11.2011
      Please try to finish up to section 4.2 for tomorrow's tutorial. It looks long on paper, but the steps are very detailed. It's important that you understand list processing in COGENT before we can go on.
    • 9.11.2011
      Due date for Tutorial 1: 13.11.2011
      Submit archive of model 3 + analysis of output in spreadsheet or other format of your own choosing.

Tutorial 2: Top-Down

  • "Pen & paper" exercise to be handed in the lecture on 14.11.2011:
    • Review the top-down parsing algorithm in the lecture slides or in Crocker, 1999.
    • Draw the SEARCH tree (not the parse tree) for "John chases the black cat" using the following grammar:
      s → np vp
      np → pn
      np → det n
      np → det adj n
      vp → vt np
    • Note that nodes in the tree describe the state of the system (the parser); they should therefore indicate both the remaining input and the content of the stack. In addition, you should mark each branch as either a "rewrite" (R) step or a "lookup" (L) step.
  • Unification exercise (optional; there are instructions at the bottom of the sheet on how to check your answers)
  • Worksheet
    Due dates for the models:
    • The "parallel" model (up to section 6 included): Sat 19/11
    • The model with rule selection (up to section 7.3 included): Mon 21/11
    • Final results including output: Wed 23/11
  • The n-ary grammar
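
The search the exercise asks you to draw can also be sketched as a small program. The following Python sketch uses the worksheet grammar above; the lexical categories (john = pn, chases = vt, the = det, black = adj, cat = n) are assumptions inferred from the example sentence, not given in the worksheet. Each element of the returned trace corresponds to one branch of the search tree: "R" for a rewrite step, "L" for a lookup step.

```python
# Sketch of the top-down search over the worksheet grammar.
# A state is (remaining input, stack of symbols still to be expanded);
# "R" pops a non-terminal and pushes one rule's right-hand side,
# "L" pops a lexical category that matches the next input word.

GRAMMAR = {
    "s":  [["np", "vp"]],
    "np": [["pn"], ["det", "n"], ["det", "adj", "n"]],
    "vp": [["vt", "np"]],
}

# Assumed lexicon (not part of the worksheet).
LEXICON = {"john": "pn", "chases": "vt", "the": "det",
           "black": "adj", "cat": "n"}

def parse(words, stack=("s",), steps=()):
    """Depth-first search; returns the first successful R/L trace, or None."""
    if not stack:
        return list(steps) if not words else None
    top, rest = stack[0], stack[1:]
    if top in GRAMMAR:                                  # rewrite (R)
        for rhs in GRAMMAR[top]:
            trace = parse(words, tuple(rhs) + rest,
                          steps + (("R", top, tuple(rhs)),))
            if trace is not None:
                return trace
        return None
    if words and LEXICON.get(words[0]) == top:          # lookup (L)
        return parse(words[1:], rest, steps + (("L", words[0], top),))
    return None                                         # dead end

trace = parse(["john", "chases", "the", "black", "cat"])
for step in trace:
    print(step)
```

Note that the returned trace contains only the successful path; dead ends (e.g. trying np → det n on "the black cat", where "black" fails to match n) are explored and discarded by the recursion, but they do appear in the search tree you are asked to draw.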

Tutorial 3: Backtracking

  • Backtracking rules (to be added to Parse) (new)
  • Stimuli: In order for backtracking to work correctly, it is necessary to detect the end of a sentence. You should therefore add eos to the end of each of your stimuli: John loves Mary will be represented as [john, loves, mary, eos] (eos stands for "end-of-sentence"; although '.' is correct in Prolog, it does not work in COGENT).
  • Instead of system_end(trial) in the first rule of Present Stimulus, you may need to use the condition "the current cycle is 1".
  • 3.12.2011: I have spotted two minor errors in backtracking.cog and uploaded a new version. The errors should not, however, have affected your work: they were a left-over from debugging in the "pop" rule (send backtrack to ...) and a missing system_quiescent in the "parsing_failed" rule.
  • An explanation of the condition definition reprocess/2
  • Frequent errors:
    • All your buffers and processes should be initialized at the beginning of each trial, except for the List of Stimuli. Access is always "Random", except for the List of Stimuli, for which it should be set to "FIFO". All buffers should be "Grounded", meaning we do not allow unbound variables. Be very careful if you set a process's "Initialise" property to anything other than "Trial" because you want to examine the messages: make sure you reset it immediately afterwards, as this can have nasty side-effects (nasty = difficult to debug).
    • After backtracking, while you reprocess words, you should make sure your model does not "read" any new words from the display: a basic assumption of self-paced reading experiments is that participants only press the space bar once the current word has been fully integrated. This requires a minor change to your Input/Output rules (hint: you need to add a condition to each).
  • It may be helpful when debugging to set the properties of some buffers so that duplicates are allowed, although this is not absolutely necessary.
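
To see how the pieces above fit together (eos-terminated stimuli, choice points, reprocessing old words instead of reading new ones), here is an illustrative Python sketch. It is not the COGENT model: the grammar is the toy grammar from the top-down tutorial, the lexicon is assumed, and the reprocessed counter merely stands in for the bookkeeping that reprocess/2 does in the model.

```python
# Illustrative sketch of backtracking with explicit choice points.
# A choice point saves the remaining rule alternatives, the parser
# stack, and the input position; after a "pop", the words read since
# that position are reprocessed from memory, not read from the display.

GRAMMAR = {
    "s":  [["np", "vp"]],
    "np": [["pn"], ["det", "n"], ["det", "adj", "n"]],
    "vp": [["vt", "np"]],
}
# Assumed lexicon, not part of the COGENT model.
LEXICON = {"john": "pn", "loves": "vt", "mary": "pn",
           "the": "det", "black": "adj", "cat": "n"}

def parse(words):
    words = words + ["eos"]            # stimuli end in eos, as above
    stack, pos = ["s"], 0              # top of stack = last element
    choices, reprocessed = [], 0
    while True:
        if not stack and words[pos] == "eos":
            return True, reprocessed   # input consumed exactly at eos
        if stack:
            top = stack[-1]
            if top in GRAMMAR:         # expand: try the first rule,
                alts = GRAMMAR[top]    # save the rest as a choice point
                choices.append((alts[1:], stack[:-1], pos))
                stack = stack[:-1] + list(reversed(alts[0]))
                continue
            if LEXICON.get(words[pos]) == top:
                stack, pos = stack[:-1], pos + 1   # match one word
                continue
        # dead end: discard exhausted choice points, then backtrack
        while choices and not choices[-1][0]:
            choices.pop()
        if not choices:
            return False, reprocessed
        alts, cstack, cpos = choices.pop()
        reprocessed += pos - cpos      # words re-read after the pop
        choices.append((alts[1:], cstack, cpos))
        stack, pos = cstack + list(reversed(alts[0])), cpos
```

For [the, black, cat, loves, mary], the parser commits to np → det n, fails on "black", pops back to the start of the noun phrase and reprocesses one word ("the") on its way through np → det adj n.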

Tutorial 4: Shift-Reduce

  • Worksheet (new)
  • Debugging tip: All our models have the same basic architecture. Therefore, both in this model as well as the next, you may encounter some of the problems we had when building the top-down parser. If you struggle, it may be useful to have a look again at that tutorial, especially at section 6 and the beginning of section 7 (the introductory paragraphs before 7.1).
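
While working through the worksheet, it may also help to see the shift/reduce cycle itself in a few lines of ordinary code. This Python sketch reuses the toy grammar from the top-down tutorial, with the same assumed lexicon; the greedy reduce-whenever-possible control strategy happens to succeed on this grammar, but it is not a general strategy (deciding between shifting and reducing is exactly where the interesting modelling decisions lie).

```python
# Sketch of a shift-reduce recogniser over the toy grammar from the
# top-down tutorial (lexical categories are assumed for illustration).

GRAMMAR = [
    ("s",  ("np", "vp")),
    ("np", ("pn",)),
    ("np", ("det", "n")),
    ("np", ("det", "adj", "n")),
    ("vp", ("vt", "np")),
]
LEXICON = {"john": "pn", "chases": "vt", "the": "det",
           "black": "adj", "cat": "n"}

def parse(words):
    """Greedy control: reduce whenever some rule matches, else shift."""
    stack, rest, trace = [], list(words), []
    while True:
        for lhs, rhs in GRAMMAR:               # try a reduce first
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]      # replace rhs by lhs
                trace.append(("reduce", lhs))
                break
        else:
            if not rest:
                break                          # nothing left to do
            word = rest.pop(0)                 # shift the next word,
            stack.append(LEXICON[word])        # tagged with its category
            trace.append(("shift", word))
    return stack == ["s"], trace
```

The trace makes the stack discipline visible: "john chases the black cat" yields five shifts and four reduces (np, np, vp, s), with the reduce to the object np delayed until all three of det, adj, n are on the stack.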

Tutorial 5: Left-Corner

Tutorial 6: Linking Hypotheses & Cognitive Plausibility

  • Assignment
  • Archive containing potential stimuli, lexicon and grammar to start from
  • Additional instructions:
    • You do not need to run complete "experiments" to complete the assignment.
    • However, you should add the rule-scoring mechanism implemented in the LC parser to the top-down and shift-reduce parsers. The scores for the rules should be chosen so that memory load is generally minimized. Without this, you will not get the expected results.

Part 2: Connectionism

Note: In order to "get credit" for these tutorials and be allowed to sit the exam, you are expected to hand in the answers to the questions in the worksheet before the discussion of the solution. Normally, it will be less work to print out the tutorial and write your answers by hand, although PDFs and/or spreadsheet solutions are also permitted.

Tlearn under Windows

Being older software, Tlearn has problems with (1) long paths and (2) paths containing spaces. In the lab there is a copy of the executable under C:\Tlearn. This directory is local and everyone can write to it, so it is likely to get messy. It is suggested that you create a subdirectory named after your own initials (I would use "gp", for example) to work in, and subdivide it further into T1, T2, T3 for each tutorial. Any old data under C:\Tlearn (e.g. data from last year) can be deleted. If you accidentally delete the executable itself, you can restore it by downloading it from the internet. Finally, since everyone can write to that directory, make sure you save all your data before you wrap up, as it might be deleted by someone else.

Tutorial 1: Introduction and Learning

Tutorial 2: Learning the English Past Tense

Tutorial 3: Using SRNs to Process Sequences