Sao Carlos STIL 2009
September 8-11, 2009
São Carlos, Brazil



The 7th Brazilian Symposium in Information and Human Language Technology
 

Tutorials

We are pleased to announce the following tutorials for STIL 2009 and the co-located events:
  • Robust Wide-Coverage Parsing: Evaluations, Representations, Issues, and Applications (Sept 8, 16:30h)
          Prof. Dr. Ted Briscoe (University of Cambridge, UK)

  • Fast and Practical Corpus Processing using Standard Linux Tools (Sept 9, 17:30h)
          Dr. Caroline Gasperin (USP/ICMC, Brazil)


About the Tutorials

Ted Briscoe
Robust Wide-Coverage Parsing: Evaluations, Representations, Issues, and Applications


Prof. Dr. Ted Briscoe (University of Cambridge, UK)

Description: In this tutorial, I'll first define the parsing task and discuss evaluation schemes. I'll then address some of the issues in parser design: optimal representation of syntactic information, statistical vs. heuristic parse ranking, efficiency vs. accuracy, degree of lexicalization, etc. I'll evaluate the strengths and weaknesses of different approaches, considering RASP, XLE, Enju, the C&C parser, PTB (reranking) parsers, and greedy/efficient shift-reduce dependency parsers. Finally, I'll describe some (rare) experiments we have recently undertaken to rigorously quantify the contribution of parsing to various text classification tasks, such as topic categorization, spam detection, information extraction, and language proficiency assessment.

Duration: The tutorial will last 2 hours, with a 10-minute break in the middle.

Pre-requisites: I'll assume an introductory-course-level understanding of computational linguistics and of probability theory.

Notes: I'll make my slides and bibliography available after the tutorial.

About the Speaker: Ted Briscoe has a Linguistics degree (1980) from the University of Lancaster, UK, and an MSc (1981) and a PhD (1984) from the University of Cambridge, UK. He is a Professor at the Computer Laboratory, University of Cambridge, and his research interests include evolutionary linguistics and statistical language processing.

He has published over 70 research articles, edited three books, and been Principal/Co-Investigator or Coordinator of fourteen EU and UK funded projects since 1985. He is joint editor of Computer Speech and Language and on the editorial board of Natural Language Engineering.


Caroline Gasperin

Fast and Practical Corpus Processing using Standard Linux* Tools

Dr. Caroline Gasperin (USP/ICMC, Brazil)
Description: This two-hour course will give an introduction to some Linux commands for processing both plain text and annotated files. The tools that will be presented include:

  • grep - for searching specific text passages or corpus annotations,
  • sed - for replacing strings or annotations,
  • awk - for filtering a corpus in different ways, and
  • uniq - for merging identical elements in the corpus.
The course will be divided into two parts: the commands will be presented in the first hour and the second hour will consist of a hands-on laboratory practice.
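
As a rough illustration of the kind of processing these four tools support, the sketch below runs each of them on a tiny made-up corpus. It is not taken from the course materials: the one-token-per-line "word, tab, tag" format, the tag names, and all variable names are assumptions chosen only for this example.

```shell
#!/bin/sh
# Hypothetical sample corpus: one token per line, "word<TAB>tag" (assumed format).
corpus=$(mktemp)
printf 'the\tDET\ncat\tNOUN\nsat\tVERB\non\tADP\nthe\tDET\nmat\tNOUN\n' > "$corpus"

# grep: search for an annotation; -c counts the lines whose tag is NOUN.
noun_count=$(grep -c 'NOUN$' "$corpus")

# sed: replace an annotation; here the tag ADP is renamed to PREP.
retagged=$(sed 's/ADP$/PREP/' "$corpus")

# awk: filter the corpus; keep only the word column of tokens tagged DET.
determiners=$(awk -F'\t' '$2 == "DET" { print $1 }' "$corpus")

# uniq: merge identical lines. It only merges adjacent duplicates, so the
# tags are sorted first; -c prefixes each tag with its number of occurrences.
tag_counts=$(awk -F'\t' '{ print $2 }' "$corpus" | sort | uniq -c | awk '{ print $2 "=" $1 }')

echo "nouns: $noun_count"
echo "$tag_counts"

rm -f "$corpus"
```

Sorting before uniq is the step most often forgotten: without it, the two DET lines in this sample would not be merged, because they are not adjacent.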

Duration: 2 hours.

Pre-requisites: none.

Notes:
  • There is a limited number of places in this course due to the size of the laboratories.
  • *These tools can also be used on Windows.
About the Speaker: Caroline Gasperin has BSc and MSc degrees in Computer Science from the Pontifical Catholic University of Rio Grande do Sul, Brazil, and a PhD degree from the University of Cambridge, UK. She is currently a post-doctoral researcher at USP/ICMC, Brazil, working on the PorSimples project in text simplification for Brazilian Portuguese. Her main research interests include corpus-based techniques for NLP, anaphora resolution and information extraction.