Course Description
Introduces finite-state automata as both a theoretical framework and a practical tool for phonetics, (morpho)phonology, and (certain aspects of) syntax. Students will learn to use software toolkits for linguistic analysis and to work with natural-language corpora.
Additional Requirements for Graduate Students:
Graduate students are expected to complete an original final
project that integrates finite-state methods with their own
research interests. In addition, at least twice during the
semester, graduate students will introduce selected topics in
the context of small-group discussions in class. One of these
will be a topic in which the graduate student has previous
preparation. The other will be a “stretch,” involving a
previously unfamiliar area. The grading rubric relates these
experiences to the sorts of teaching graduate students will
likely do if they pursue an academic career in linguistics.
Athena Title
Finite State Linguistics
Undergraduate Prerequisite
LING 3060 or LING(ENGL) 3150 or LING 3150W
Graduate Prerequisite
Permission of department
Grading System
A - F (Traditional)
Course Objectives
The objective of the course is to introduce students to finite-state methods for analyzing linguistic data. Students will learn how to find and count attestations, analyze alternations, and compute morpho-syntactic analyses automatically using software toolkits. Through graded weekly problem sets, students will come to see the wide range of linguistic analyses that can be subsumed under this highly general mathematical model of computation. In the second half of the course, these analyses will become probabilistic in nature, which offers students an entrée into the natural language processing literature. Students will also complete a final project. For undergraduate students, the project may be assigned by the instructor and may involve the translation of an existing analysis from the literature into a fully specified finite-state machine. Graduate students are expected to propose a more original final project that integrates finite-state methods with their own research interests.
Topical Outline
Following is a sample topical outline for the course. Specific topics, their order of presentation, and assigned readings may be changed as needed.
1. Searching morphologically annotated corpora with regular expressions (REs).
2. The connection between REs and automata.
   Partee et al. 1993. Mathematical methods in linguistics. Chapter 17.
3. Intro to transducers via syntactic chunking.
4. Combining transducers and the Xerox calculus.
5. Finite-state phonology.
6. Pronouncing dictionaries and a peek at Automated Speech Recognition.
   Coleman. 2005. Introducing speech and language processing. Chapter 6.
   Jurafsky and Martin. 2008. Speech and language processing. Chapter 31.
7. Weighted automata and the AT&T toolkit.
8. Files, awk, and the Unix way.
   Kernighan and Pike. https://en.wikipedia.org/wiki/The_Unix_Programming_Environment
9. Probability.
   Krenn and Samuelsson. 1997. http://www.ofai.at/~brigitte.krenn/n_edu.html
10. Markov models.
   http://www.opengrm.org
Other resources used in the course:
Allauzen, Jansche, and Riley. 2009. OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language. http://openfst.cs.nyu.edu/twiki/bin/view/FST/FstHltTutorial
Beesley and Karttunen. 2003. Finite-state morphology. https://web.stanford.edu/~laurik/fsmbook/home.html
Syllabus