Corpus linguistics research

Explore our work in corpus linguistics

In corpus linguistics, we look at the way language is used in different regions, genres and situations.

Our research is based on huge datasets of natural language, often many billions of words. We ask questions about words and language choices, such as what leads journalists to refer to men and women differently in tabloid newspapers and what the implications are, and what the word 'we' tells us about how online groups come together.

Through our work, we're investigating pressing issues and helping to solve problems. For example, our work on online bullying analyses the root causes of the abuse and its impact.

We are also looking at the language that's used when making business transactions. By researching this language, we can develop teaching materials to help organisations conduct these transactions more effectively.

The outputs from our research are frequently published in leading academic journals, such as Corpora and the International Journal of Management.

Our research covers the following topics

  • Corpus linguistics

    Studying language in collections of real-world text, the sets of rules that govern it, and how it relates to other languages
  • Corpus-assisted discourse studies

    Using collections of text to analyse written or vocal use of language, including writing, conversation and communication
  • Corpus pragmatics

    Working at the interface between corpus linguistics and pragmatics, we're covering topics such as turn-taking and politeness
  • Corpus stylistics

    Investigating the methods, techniques and tools of corpus linguistics, used in the study of literary style
  • Lexical priming

    A theory of language, based on how certain words are used in different combinations and patterns in the real world, and how this differs from traditional
  • Lexical selection

    Understanding how chunks of language seem to get selected and how this is similar to biological evolution
  • Metaphor analysis

    Understanding how people use metaphor to conceptualise society, such as relationships, crises, organisational change and grief

Our members

Dr Glenn Hadikin

Senior Lecturer

School of Education, Languages and Linguistics

Faculty of Humanities and Social Sciences

PhD Supervisor

Dr Mario Saraceni

Associate Professor in English Language and Linguistics

School of Education, Languages and Linguistics

Faculty of Humanities and Social Sciences

PhD Supervisor

Mr John Williams

Senior Lecturer

School of Education, Languages and Linguistics

Faculty of Humanities and Social Sciences

PhD Supervisor

Media ready expert

Dr Alessia Tranchese

Senior Lecturer

School of Education, Languages and Linguistics

Faculty of Humanities and Social Sciences

PhD Supervisor

Methods and facilities

Corpus linguistics can be combined with other methods. Once the data set, or corpus, is built, we read concordance lines, run collocation and keyword analyses, and move between quantitative and qualitative techniques. We also have site licences for Sketch Engine and Lexis Nexis, which can be used to build corpora. Staff also have experience of developing bespoke tools for tasks such as scraping online texts and converting files. We have also developed the world’s largest corpus of online discussions about citizen science, with over 10 million words.
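To give a feel for the three techniques named above, here is a minimal Python sketch of concordance lines, collocate counts and a keyness score. It is an illustration under stated assumptions, not the group's own tooling: the function names, the simple tokeniser, the window sizes and the sample sentence are all hypothetical, and log-likelihood is used as one common keyness statistic, since the page does not say which measure is applied in practice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node, width=3):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, node, span=2):
    """Count the words that occur within `span` tokens of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts


def keyness(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of a word in a study corpus against a reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero frequency contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical sample text, invented for illustration only.
sample = "We share our data online. We think we can learn from each other, and we share results."
tokens = tokenize(sample)

for line in concordance(tokens, "we"):
    print(line)
print(collocates(tokens, "we").most_common(3))
print(keyness(freq_study=20, size_study=100, freq_ref=5, size_ref=100))
```

Reading the printed concordance lines is the qualitative step; the collocate counts and the keyness score are the quantitative ones. On a real corpus the same ideas are available, at scale, in tools such as Sketch Engine and AntConc.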


Students and staff at the University of Portsmouth are offered free access to the following resources.

  • Sketch Engine (free access through the university server). Through Sketch Engine you can access corpora in approximately 35 different languages, including some parallel corpora and corpora of academic English. Staff and postgraduate researchers may request an individual user account from John Williams, which will allow them to upload their own corpora.
  • Mark Davies's corpora (open access):
    • COCA (Corpus of Contemporary American English)
    • COHA (Corpus of Historical American English)
    • TIME (TIME Magazine Corpus of American English)
    • BNC (BYU interface to the British National Corpus)
    • Corpus do Português
    • Corpus del Español
  • Michigan Corpus of Academic Spoken English (MICASE) – Another very useful resource for those interested in EAP.
  • WebCorp – An interface that lets you analyse the web using corpus linguistic tools.

  • AntConc – Free concordance program for Windows, Macintosh OS X, and Linux. Runs on plain-text files only and is quite user-friendly.
  • XAIRA – Open source software package which supports indexing and analysis of large XML textual resources. This is a more powerful tool for concordancing and collocate analysis but only runs on XML texts.
  • BootCaT – Free software for creating web corpora. Very easy to use.
  • UAM CorpusTool – A free environment for annotation (and interrogation) of text corpora. Runs under Windows and Mac OS X.

  • International Journal of Corpus Linguistics
  • Corpora
  • Corpus Linguistics and Linguistic Theory

  • Corpus Linguistics Conference – This is the archive for the six conferences. Full papers are available for many of the presentations.
  • Proceedings of The International Symposium on Using Corpora in Contrastive and Translation Studies 2010. Edited by R. Xiao. Full papers are available for several of the presentations.

External Audio

Life Solved Podcast - The Language of Violence with Dr Alessia Tranchese

Discover our areas of expertise

Corpus linguistics is one of our six areas of expertise within our Linguistics research area. Explore the others below.


Translation studies

We're exploring how texts are translated and the practices around translation, including professional training, the use of technologies, and non-professional translation communities.

Discourse analysis

We're researching how ideas, concepts and people are represented through language, and exploring how language is used in real-life contexts.

Professional communication

Our research in professional communication explores how spoken and written language is used in workplaces to develop relationships and achieve institutional objectives.


Sociolinguistics

Through our work in sociolinguistics, we're studying the ways in which language can affect, and is affected by, social phenomena.

Teaching English to speakers of other languages (TESOL)

We're focusing on the learning and teaching of English as a second or foreign language, in primary, secondary and adult learning contexts.

Interested in a PhD in Languages and Linguistics?

Browse our postgraduate research degrees – including PhDs and MPhils – at our Languages and Linguistics postgraduate research degrees page.