The Tuscan Word Centre
Course 03.01: May 19th-22nd 2003
A four-day intensive course at the Tuscan Word Centre
How to use Text Corpora in Language Work
Topic Leaders:
Prof. Geoffrey Leech, University of Lancaster
Prof. Elena Tognini Bonelli, University of Lecce and TWC
Mr Martin Wynne, Oxford Text Archive
Dr Pernilla Danielsson, University of Birmingham
Prof. John Sinclair, Tuscan Word Centre
Course Description:
TWC runs such a course every year; it is open to all students, researchers and workers in the language industries, and it is relevant to all who are interested in the present state of corpus work and the potential for the future.
The course is designed for people who study languages either academically or professionally, and who see the need to understand the value of text corpora in their work. It will address a variety of general topics related to the building, handling and use of corpora, with particular attention given to the way queries to a corpus may be framed, and the results refined and interpreted. The focus is on the relationship between recurrent patterns of language in use and the means by which languages create meaning. People working on the use and development of corpora will find a forum to discuss key issues, while those concerned with translation, lexicography, language teaching and other professions that feature corpora will find that all the topics are of direct relevance to their work. The event will be of special interest to people dedicated to Natural Language Processing because of the great current interest in applying NLP techniques of analysis to corpora.
As well as a series of presentations by the Topic Leaders, there are frequent "hands-on" sessions where participants are given opportunities, either individually or in small groups, to familiarise themselves with the practical side of corpus exploration and to work through a series of tasks designed to give experience in a range of common activities in corpus linguistics.
The linguistic information in corpora is of fundamental importance, and much of it is unobtainable from any other source. But since the computer is used as a tool for storage and access, it is necessary to appreciate the way in which corpora are designed and built, and to become familiar with the ways in which information can be gathered and put to use. In particular it is necessary to evaluate the results of corpus searches as steps in linguistic investigations.
Course Content:
Each half-day session consists of the presentation of a topic area, followed by a "hands-on" session where participants have an opportunity to try out some of the practical suggestions that will be made. There will be time for interventions from participants with special interests in the topic area.
Topics to be covered - Provisional List:
Developing Linguistic Corpora: a Guide to Good Practice
Most of the topic leaders are involved in the compilation of a reference work with the above title, to be published by the UK Arts and Humanities Data Service, and also by Oxbow Books.
Use of corpora in syntax studies
What types of syntactical problems invite a corpus approach?
Diachronic changes in recent English.
Syntax studies and annotated corpora.
Software for corpus access and analysis
Corpora and queries
Tools and annotation
Word-lists and concordances
Phrase building
Parallel corpora and paraphrase
Lexical Issues
The lexical item
The differentiation of meaning
Lexicogrammar
Phraseology