National Corpus of Contemporary Welsh

A major new project to record the Welsh language and explore the ways in which it is used is underway.

The Corpws Cenedlaethol Cymraeg Cyfoes – National Corpus of Contemporary Welsh (CorCenCC) project aims to develop the first ever large-scale collection of Welsh words representing the full range of language used by people in everyday life. It was officially launched on 28th February 2017 at the Pierhead Building in Cardiff.

The launch event, attended by Alun Davies AM, Minister for Lifelong Learning and Welsh Language, gave guests the chance to find out more about the project. CorCenCC is a collaboration between Cardiff, Swansea, Lancaster and Bangor universities, and is breaking new ground in creating a large-scale, open access corpus of contemporary Welsh.

Backed by high-profile ambassadors - poet Damian Walford-Davies, musician and presenter Cerys Matthews, broadcaster Nia Parry and international rugby referee Nigel Owens - CorCenCC is community-driven and uses mobile and digital technologies to enable public collaboration.

A new data collection app, which enables Welsh speakers from all walks of life to contribute to the project, was demonstrated at the event. CorCenCC partners and ambassadors also shared their impressions of how the resource will affect their research and the Welsh language community more widely.

The research team aim for the corpus to contain 10 million words of Welsh language, providing concrete evidence about modern Welsh language use for academic researchers, teachers, language learners, dictionary makers, translators, and anyone interested in the way Welsh is used across different speakers and genres.

Dr Dawn Knight, project lead from Cardiff University’s School of English, Communication and Philosophy, said: “What we aim to achieve is the development of the first large-scale living and evolving corpus, representing the Welsh language across communication types and informed by real, current users of the language. We will be engaging with the public in a number of ways, and using new technologies to do so, including the CorCenCC crowdsourcing app. The use of crowdsourced corpus data is relatively unheard of, and represents a new direction to complement more traditional language collection methods.”

Steve Morris of Swansea University added: “This is a project about the past, present and future use of the Welsh language and will inform us about variation and change in real language use, such as regional differences or use of mutations over time. By putting speakers themselves in charge of their contributions to the corpus, they can be sure that the recordings they share will be the most natural and accurate representation possible of their everyday Welsh.”

CorCenCC is funded by the Economic and Social Research Council and the Arts and Humanities Research Council. The project also involves Welsh Government; National Assembly for Wales; The National Library of Wales; WJEC-CBAC; Welsh for Adults; S4C; BBC; y Lolfa; and the Dictionary of the Welsh Language. Additional funding for the launch was received from the British Council; the School of English, Communication and Philosophy (ENCAP), Cardiff University; and the Research Institute for Arts and Humanities (RIAH), Swansea University.