Corpas na Gaeilge Labhartha
Researchers: Elaine Uí Dhonnchadha, Alessio Frenda.
with Brian Vaughan (Recording), Daniel Jettka (XSLT)
Funding: Foras na Gaeilge (GaLa 2011) (Comhrá 2012)
Background
The aim of this project is to create a comprehensive corpus of spoken Irish.
This corpus will provide valuable material for linguistic research, the teaching of Irish and language technology such as automatic speech recognition.
It will be a diachronic* corpus: we are collecting the earliest audio material available to us (going back seventy years or more) and making new contemporary recordings, both in the Speech Communication Laboratory and in the Gaeltacht regions around the country. We will endeavour to balance the corpus with regard to speaker dialect, gender and age.
Creating a corpus of spoken language requires transcribing audio or video recordings (e.g. conversations, interviews and speeches). We use specialised transcription software that formats the transcripts as XML and time-aligns each transcript with its audio or video recording.
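To illustrate what a time-aligned XML transcript makes possible, here is a minimal Python sketch that reads timestamped segments from a small sample. The element and attribute names (Turn, Sync, speaker, time) are modelled on Transcriber-style files as an assumption, and the sample text and the function name read_segments are invented for illustration; check the names against the project's actual transcript files before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample transcript; the structure is an assumption, not project data.
SAMPLE = """<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="5.0">
      <Turn speaker="spk1" startTime="0" endTime="2.5">
        <Sync time="0"/>
        Dia duit.
        <Sync time="1.2"/>
        Conas atá tú?
      </Turn>
      <Turn speaker="spk2" startTime="2.5" endTime="5.0">
        <Sync time="2.5"/>
        Tá mé go maith.
      </Turn>
    </Section>
  </Episode>
</Trans>"""


def read_segments(xml_text):
    """Return (speaker, start_time_in_seconds, text) for each timestamped segment."""
    root = ET.fromstring(xml_text)
    segments = []
    for turn in root.iter("Turn"):
        speaker = turn.get("speaker")
        for sync in turn.iter("Sync"):
            # The transcribed words follow the empty Sync element, so they
            # are stored in its "tail" rather than in its "text".
            text = (sync.tail or "").strip()
            if text:
                segments.append((speaker, float(sync.get("time")), text))
    return segments


for speaker, start, text in read_segments(SAMPLE):
    print(f"{start:6.2f}  {speaker}: {text}")
```

Because every segment carries a start time, the transcript text can be linked back to the matching point in the audio or video recording.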
Transcription Guidelines for Irish
These guidelines were designed by Alessio Frenda, Elaine Uí Dhonnchadha and Pauline Welby (CNRS, France). If you have any queries or suggestions regarding the guidelines, please e-mail us at uidhonne@tcd.ie.
Some aspects of speech do not need to be recorded in the transcription because they can be generated automatically at a later stage, e.g. the length of pauses. Dialectal pronunciation is likewise not represented in the orthographic transcription, apart from a list of pre-defined exceptions (details below), because it can be represented more accurately in a separate phonetic transcription. A large percentage of dialectal pronunciations can be generated automatically from the standard orthography using dialect-specific letter-to-sound rules.
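As a rough illustration of why pause lengths need not be transcribed by hand, the Python sketch below derives them from the start and end times of time-aligned segments. The (start, end) tuple layout, the function name pause_lengths and the min_pause threshold are assumptions made for this example; they are not part of the guidelines.

```python
def pause_lengths(segments, min_pause=0.0):
    """Return (pause_start, duration) for each silent gap between segments.

    segments: list of (start, end) times in seconds, sorted by start time.
    min_pause: gaps of this length or shorter are ignored (assumed threshold).
    """
    pauses = []
    # Pair each segment with the one that follows it.
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        gap = next_start - prev_end
        if gap > min_pause:
            pauses.append((prev_end, round(gap, 3)))
    return pauses


# Hypothetical timings, in seconds.
segments = [(0.0, 1.5), (2.0, 3.25), (3.25, 4.0), (5.5, 6.0)]
print(pause_lengths(segments))  # [(1.5, 0.5), (4.0, 1.5)]
```

Raising min_pause (for example to 0.2) would filter out very short gaps that are unlikely to be perceived as pauses.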
- Treoirlínte Tras-scríofa agus Transcriber [Transcription Guidelines and Transcriber] (updated)
- Botúin Coitianta [Common Mistakes] (new)
Further information regarding standardised spelling of various aspects of spoken language may be found in the following lists:
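- Giorrúcháin Focal [Word Abbreviations]
- Nathanna Cumarsáid [Communicative Expressions]
- Sosanna Líonta [Filled Pauses] (new)
- Liosta Focail Canúnach [List of Dialectal Words]
- Liosta Foirmeacha Táite de Bhriathra [List of Synthetic Verb Forms] (updated)
- An Fhoirm Choibhneasta [The Relative Form] (new)
- Malairt Focail ins an Foclóir Gaeilge-Béarla [Word Variants in the Foclóir Gaeilge-Béarla] (Ó Dónaill, 1977)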
Transcription Software
Download the Transcriber software from the following web page and install it on your computer: http://sourceforge.net/projects/trans/files/transcriber/1.5.1/ [Do not use TranscriberAG]
* A diachronic corpus makes it possible to study how a language changes over time.