

CLC - Corpus Linguistics Centre

The Corpus Linguistics Centre of the Università degli Studi Internazionali di Roma – UNINT brings together academics from the Faculty of Interpreting and Translation who are active in corpus-based linguistic research and who specialise in analysing written and spoken texts, interpreted and translated speeches, L2 interactions, and variation in language learning.

Research advances driven by technologies for linguistic analysis have immediate applications in teaching across several fields: interpreting, translation, language and terminology studies, and applied linguistics (sociolinguistics, contact linguistics, textual linguistics and pragmatics).

Academic Director

Prof. Laura Mori (Full Professor, Linguistics, Faculty of Interpreting and Translation, UNINT)


The Corpus Linguistics Centre supports research activities undertaken by its members and promotes scientific dialogue within the Faculty of Interpreting and Translation, together with the creation of new synergies at an international level.

The Corpus Linguistics Centre aims to share members’ academic findings and to promote the exchange of knowledge, with particular regard to the development of applied research protocols, the use of software for corpus-based linguistic analysis, and the development of linguistic and terminology databases.

The Corpus Linguistics Centre serves as a forum for academic staff, graduates and graduating students, and maintains constant contact with the wider academic community in order to improve both research and teaching methods.

Fields of Research

The Centre carries out research in the following fields:

  • Corpus-based sociolinguistics (specialised and sectorial variation);
  • Corpus-based acquisition linguistics (L2 variation);
  • Corpus-based discourse and communication analysis;
  • Translation studies based on parallel and/or comparable corpora;
  • Translation memories and other databases;
  • Corpus-based terminology and phraseology.

Members of the Corpus Linguistics Centre

Many of the members’ works on their corpus-based linguistic research have been published.

The Corpus Linguistics Centre currently comprises the following tenured and guest lecturers from the Faculty of Interpreting and Translation:

  • Lorenzo Blini
  • Michael Sherman Boyd
  • Stefania Cerrito
  • Manuela Frontera
  • Laura Mori
  • Lucilla Pizzoli
  • Fabio Proia
  • Annalisa Sandrelli

External members from Italy and elsewhere in Europe who contribute to UNINT projects:

  • Lucja Biel (University of Warsaw)
  • Giuditta Caliendo (Université de Lille)
  • Paolo Canavese (Université de Genève)
  • Sandro Caruana (University of Malta)
  • Sara Castagnoli (Università di Macerata)
  • Chiara Degano (Università degli Studi di Roma Tre)
  • Gatis Dilans (Ventspils University College)
  • Vittorio Ganfi (Università degli Studi di Modena e Reggio Emilia)
  • Claudio Fantinuoli (University of Mainz/Germersheim)
  • Annarita Felici (Université de Genève)
  • Adriano Ferraresi (Università di Bologna)
  • Rudy Loock (Université de Lille-Charles de Gaulle)
  • Mikhail Mikhailov (University of Tampere)
  • Aino Piehl (Institute for the Languages of Finland)
  • Valentina Piunno (Università degli Studi di Roma Tre)
  • Federica Politanò (Université Paris-Est Créteil (UPEC), CÉDITEC: Centre d’étude des discours, images, textes, écrits, communications)
  • Benedikt Szmrecsanyi (Katholieke Universiteit Leuven)
  • Vilelmini Sosoni (Ionian University, Corfu)
  • Rubén González Vallejo (Università degli Studi di Macerata)

Centre Activity

The Corpus Linguistics Centre organises several research seminars every year (Corpus Linguistics Workshops), which are listed below.

The seminars offer in-depth analysis of the topics listed below, present mixed analytical approaches (qualitative and quantitative), and demonstrate the use of software for describing linguistic variation in synchronic and diachronic corpora and databases.

Seminars are given by internal and external members of the Centre as well as guest speakers, and are open to anyone interested in the potential of corpora for linguistic research.

The seminars aim to encourage reflection and exchange within the academic community, and offer an opportunity for growth to students completing a Master’s Degree who wish to use technological resources to analyse linguistic data from sociolinguistic, pragmatic, translation and terminological perspectives.


15 December 2021 - Institutional plurilingualism, linguistic clarity and exogenous influences: Swiss legislative Italian in the light of corpus analysis

20 May 2021 - Research perspectives on the use of corpora for the Observatory of Italianisms in the World (Osservatorio degli italianismi nel mondo, OIM)

25 March 2021 - The lexicon of Italian textbooks for foreign learners: data from the LAICO corpus

21 May 2019 - "Speaking clearly in medical communication": a seminar on the corpus-based analysis of medical language with a view to its simplification

4 April 2019 - Presentation of the volume "Gender in Legislative Languages. From EU to National Law in English, French, German, Italian and Spanish"

1 March 2019 - Presentation of the volume "Observing Eurolects. Corpus analysis of linguistic variation in EU law" (2018, John Benjamins), edited by Laura Mori

18 October 2018 - Collocations, lexical combinations, complex lexemes: a study of EU institutional corpora from a translation perspective

14 June 2018 - Using corpora for the synchronic and diachronic study of word combinations in Italian and Spanish

22 May 2018 - Data-driven analysis: examples from the coding of the DISDIR corpus for a study in transcultural pragmatics

16 April 2018 - English legal language between theory and practice: a corpus-based approach

UNINT Research Projects 

In recent years, UNINT has funded the following research projects in support of the Centre’s objectives:

Eurolect Observatory Project - First phase: 2013/2016 | Second phase: 2017/2020

The UNINT Research Group Eurolect Observatory, coordinated by Laura Mori, was created within the Faculty of Interpreting and Translation (FIT) and has been funded by the UNINT Fund for Research. 

The first phase of the 11-language research programme (Finnish, French, Greek, English, Italian, Latvian, Maltese, Dutch, Polish, Spanish and German) was conducted by academics from universities and research institutes in Italy and elsewhere in Europe (Institute for the Languages of Finland, the Ionian University, Ghent University, Paris Diderot University, University of Nantes, University of Malta, University of Warsaw, Rome ‘Tor Vergata’ University, Rome – UNINT University, University of Tampere and Ventspils University College).

The Research Group analysed corpora of EU directives and of the national legislation transposing them, drafted in the 11 languages listed above, with the aim of:

  • Establishing whether there is variation between EU and national legislative texts (eurolects);
  • Describing the linguistic differences between eurolects and the corresponding national legal varieties, in order to identify linguistic and translational variation in functional elements;
  • Providing stakeholders with background information (language services of the EU institutions and of national, regional and independent chambers);
  • Identifying ways to improve the quality of legal drafting at national and international level.


GEROM Project - GEROM is the outcome of an Italian-German project undertaken by the Fachbereich Translations-, Sprach- und Kulturwissenschaft (FTSK) of Johannes Gutenberg-Universität Mainz/Germersheim and the Faculty of Interpreting and Translation (FIT) of the Università degli Studi Internazionali di Roma – UNINT. The project was carried out in two phases between 2013 and 2015. In the first phase, the project coordinators divided all students and graduands into two working groups to assess the contribution of terminography to Italian-German communication. In the second phase, a round table was organised to encourage debate among language professionals, researchers and academics specialising in terminology and corpus-based linguistics.


Dub-Talk Project (2013/2015) and TV-Talk Project (2016/2018) - The main purpose of these projects is to analyse the features of spoken television texts and their translation through dubbing, the most common method used in Italy. The collaboration between the Università di Pisa (Department of Philology, Literature and Linguistics) and the Università degli Studi Internazionali di Roma – UNINT began in 2013 and aims to extend the Dub-Talk project by creating a shared parallel corpus of transcriptions of original and dubbed dialogue from American and British TV series and films.

UNINT Corpora, Databases and Portals

UNINT provides the following corpora, databases and portals, the outcome of group research projects carried out with the collaboration of FIT graduands:

Eurolect Observatory Multilingual Corpus (Eurolect Observatory Project): the corpus covers 11 languages, eight of which are taught at UNINT.

sub-corpus A: 660 directives in eight languages (1/1/1999 – 31/12/2008) covering all EU policies;

sub-corpus B: national measures transposing the above-mentioned directives in France, Germany, England, Italy, Malta, Belgium, Spain and Poland. (Prof. Laura Mori).


Dub-Talk Corpus (Dub-Talk Project): parallel corpus with transcriptions of dubbed TV series and films. (Prof. Annalisa Sandrelli).


TV-Talk Corpus (TV-Talk Project): EN > IT and IT > EN corpus covering several genres and translation modes, including voice-over (simil sync) and subtitling. (Prof. Annalisa Sandrelli).


GEROM - Italian-German Terminology Portal (GEROM Project): a bilingual, dynamic lexical and textual platform supporting the analysis and translation of texts on the politics, society and culture of German- and Italian-speaking countries. (Prof. Fabio Proia).

Corpora and Databases created by UNINT members

FOOTiE (Football in Europe): multilingual corpus of simultaneously interpreted football press conferences (Prof. Annalisa Sandrelli).

ArabIt: a collection of recordings of Arabic-speaking learners of L2 Italian, comprising free interviews and guided dialogues elicited with pictures (a descriptive task and a lexical-referential task), together with a database of phonetic and acoustic parameters and segmentation information. (Prof. Laura Mori).

