The Tuscan Word Centre
Course 03.02: May 26th-29th 2003 inclusive
A four-day Intensive Course at The Tuscan Word Centre
(focus on multilingual work)
Topic Leaders:
Prof. Wolfgang Teubert, University of Birmingham
Prof. Yang Hui-Zhong, Jiao Tong University, Shanghai
Dr. Pernilla Danielsson, University of Birmingham
Prof.ssa Elena Tognini Bonelli, University of Lecce
Prof. John Sinclair, The Tuscan Word Centre
Guest Lecturer: Prof. Alan Partington, University of Camerino
Course Description:
This course is designed for people who study languages either academically or professionally and who see the need to understand the value of text corpora in their work. The course will address a variety of general topics related to the building, handling and use of corpora, with particular attention to the way queries to a corpus may be framed and the results refined and interpreted. It is suitable for translators, lexicographers and language teachers, as well as for those working directly with corpora.
As well as a series of presentations by the lecturers, there are frequent "hands-on" sessions where participants are given opportunities, either individually or in small groups, to familiarise themselves with the practical side of corpus exploration and to work through a series of tasks designed to give experience in a range of common activities in corpus linguistics.
The linguistic information in corpora is of fundamental importance and unobtainable from any other source. However, since the computer is the tool used for storage and access, it is necessary to appreciate the way in which corpora are designed and built, and to become familiar with the ways in which information can be gathered from them and put to use. In particular, it is necessary to evaluate the results of corpus searches as steps in linguistic investigations.
Topics to be covered: provisional list
(a) General Issues
Functionally-complete Units of Meaning
A Contextual Theory of Meaning
Corpus-based and Corpus-driven Linguistics
The Lexical Item
The Differentiation of Meaning
Lexicogrammar
Phraseology
The issue of representativeness
Types of computer corpora
corpora of text samples vs. full texts
general vs. specialized corpora
static vs. monitor corpora
corpora of modern vs. historical texts
raw text vs. analysed text
learner corpora
corpus vs. introspection and elicitation
(b) Corpus Management and Use
Text and Corpus Typology
Internal and External Criteria
Query languages, query formulation
Developing search techniques
Elaborating hypotheses
Refining and evaluating results
Textual Integrity
Tokenisation
Tags and Headers
Statistical Measures
Span and Gravity
Text-oriented Programming
(c) Multilingual Corpus Work
Multilingual Resources
Cross-language Units of Meaning
The TRACTOR archive: its origins and aims
The Plato Corpus
Alignment of parallel texts
Multilingual Processing
Translating Dictionaries
Hands-on access to TRACTOR