The Tuscan Word Centre

 

Course 03.02: May 26th-29th 2003 inclusive

 

A four-day Intensive Course at The Tuscan Word Centre

 

How to use Corpora in Language Work

(focus on multilingual work)

 

Topic Leaders:

 

Prof. Wolfgang Teubert, University of Birmingham

 

Prof. Yang Hui-Zhong, Jiao Tong University, Shanghai

 

Dr. Pernilla Danielsson, University of Birmingham

 

Prof.ssa Elena Tognini Bonelli, University of Lecce

 

Prof. John Sinclair, The Tuscan Word Centre

 

 

Guest Lecturer: Prof. Alan Partington, University of Camerino

Course Description:

This course is designed for people who study languages either academically or professionally, and who see the need to understand the value of text corpora in their work. The course will address a variety of general topics related to the building, handling and use of corpora, with particular attention given to the way queries to a corpus may be framed, and the results refined and interpreted. It is suitable for translators, lexicographers and language teachers, as well as those working directly with corpora.

As well as a series of presentations by the lecturers, there are frequent "hands-on" sessions where participants are given opportunities, either individually or in small groups, to familiarise themselves with the practical side of corpus exploration and to work through a series of tasks designed to give experience in a range of common activities in corpus linguistics.

The linguistic information in corpora is of fundamental importance, and unobtainable from any other source, but since the computer is used as a tool for storage and access, it is necessary to appreciate the way in which corpora are designed and built, and to become familiar with the ways in which information can be gathered and put to use. In particular it is necessary to evaluate the results of corpus searches as steps in linguistic investigations.

Topics to be covered: provisional list


(a) General Issues
Functionally-complete Units of Meaning
A Contextual Theory of Meaning
Corpus-based and Corpus-driven Linguistics
The Lexical Item
The Differentiation of Meaning
Lexicogrammar
Phraseology
The issue of representativeness

Types of computer corpora
    corpora of text samples vs. full texts
    general vs. specialized corpora
    static vs. monitor corpora
    corpora of modern vs. historical texts
    raw text vs. analysed text
    learner corpora
    corpus vs. introspection and elicitation

(b) Corpus Management and Use
Text and Corpus Typology
Internal and External Criteria

Query languages, query formulation
    Developing search techniques
    Elaborating hypotheses
    Refining and evaluating results

Textual Integrity
Tokenisation
Tags and Headers
Statistical Measures
Span and Gravity
Text-oriented Programming

(c) Multilingual Corpus Work
Multilingual Resources
Cross-language Units of Meaning
The TRACTOR archive; its origins and aims
The Plato Corpus
Alignment of parallel texts
Multilingual Processing
Translating Dictionaries
Hands-on access to TRACTOR

