Corpus linguistics research

Explore our work in corpus linguistics, 1 of our areas of expertise in Linguistics

In corpus linguistics, we look at the way language is used in different regions, genres and situations.

Our research is based on huge datasets of natural language – often many billions of words. We ask questions about words and language choices, such as what leads journalists to refer to men and women differently in tabloid newspapers and what the implications are, and what the word 'we' tells us about how online groups come together.

Through our work, we're investigating pressing issues and helping to solve problems. For example, our work on online bullying analyses the root causes of the abuse and its impact.

We are also looking at the language that's used when making business transactions. By researching this language, we can develop teaching materials to help organisations conduct these transactions more effectively.

The outputs from our research are frequently published in leading academic journals, such as Corpora and the International Journal of Management.

Our research covers the following topics


  • Corpus linguistics

    Studying language through collections of real-world text, including the sets of rules that govern it and how it relates to other languages.
  • Corpus-assisted discourse studies

    Using collections of text to analyse written or spoken use of language, including writing, conversation and communication.
  • Corpus pragmatics

    Working at the interface between corpus linguistics and pragmatics, we're covering topics such as turn-taking and politeness.
  • Corpus stylistics

    Investigating the methods, techniques and tools of corpus linguistics as used in the study of literary style.
  • Lexical priming

    A theory of language, based on how certain words are used in different combinations and patterns in the real world, and how this differs from traditional
  • Lexical selection

    Understanding how chunks of language seem to be selected, and how this is similar to biological evolution.
  • Metaphor analysis

    Understanding how people use metaphor to conceptualise society, such as relationships, crises, organisational change and grief.

Our members

Dr Glenn Hadikin

  • Job Title: Senior Lecturer, English Language and Linguistics
  • Department: School of Languages and Applied Linguistics
  • Faculty: Faculty of Humanities and Social Sciences
  • PhD Supervisor

Dr Mario Saraceni

  • Job Title: Reader, English and Linguistics
  • Department: School of Languages and Applied Linguistics
  • Faculty: Faculty of Humanities and Social Sciences
  • PhD Supervisor

Media ready expert

Mr John Williams

  • Job Title: Senior Lecturer in English Language & Linguistics
  • Department: School of Languages and Applied Linguistics
  • Faculty: Faculty of Humanities and Social Sciences
  • PhD Supervisor

Methods and facilities

Different methods can be combined with corpus linguistics. Once the data set – or corpus – is built, we read concordance lines, run collocation and keyword analyses, and move between quantitative and qualitative techniques. We have site licences for Sketch Engine and Lexis Nexis, which can be used to build corpora. Staff also have experience of developing bespoke tools for tasks such as scraping online texts and converting files. We have also developed the world’s largest corpus of online discussions about citizen science, with over 10 million words.
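
As a rough illustration of what reading concordance lines and running a collocation analysis involve, here is a minimal Python sketch. It is not the tooling our staff use: the tokenizer, the function names, the window size and the sample sentence are all assumptions made for the example, and dedicated tools such as Sketch Engine or AntConc do far more.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, span=3):
    """Return keyword-in-context lines: up to `span` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, node, span=3):
    """Count every word that occurs within `span` tokens of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample text, invented for this example.
sample = "We think we can help. We hope we can learn together."
tokens = tokenize(sample)

for line in concordance(tokens, "we", span=2):
    print(line)

print(collocates(tokens, "we", span=2).most_common(3))
```

The concordance lines support close, qualitative reading of each use of the word, while the collocate counts give a quantitative summary of the same contexts. A real study would normally replace the raw counts with an association measure and compare against a reference corpus.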


Students and staff at the University of Portsmouth are offered free access to the following resources.

  • Sketch Engine (free access through the university server). Through Sketch Engine you can access corpora in approximately 35 different languages, including some parallel corpora and corpora of academic English. Staff and postgraduate researchers may request an individual user account from John Williams, which will allow them to upload their own corpora.
  • Mark Davies's corpora (open access):
    • COCA (Corpus of Contemporary American English)
    • COHA (Corpus of Historical American English)
    • TIME (TIME Magazine Corpus of American English)
    • BNC (BYU interface to the British National Corpus)
    • Corpus do Português
    • Corpus del Español
  • Michigan Corpus of Academic Spoken English (MICASE) – A very useful resource for those interested in English for Academic Purposes (EAP).
  • WebCorp – An interface that lets you analyse the web using corpus linguistic tools.

  • AntConc – Free concordance program for Windows, Macintosh OS X, and Linux. It runs on plain-text files and is quite user-friendly.
  • XAIRA – Open source software package which supports indexing and analysis of large XML textual resources. This is a more powerful tool for concordancing and collocate analysis but only runs on XML texts.
  • BootCaT – Free software for creating web corpora. Very easy to use.
  • UAM CorpusTool – A free environment for annotation (and interrogation) of text corpora. Runs under Windows and Mac OS X.
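
Because some of these tools expect plain-text files while others expect XML, moving a corpus between them often involves a small conversion step. The Python sketch below shows one possible way to strip XML markup down to plain text using the standard library. The function name and the element names in the sample (`text`, `p`, `hi`) are hypothetical, and the sketch assumes well-formed XML in which inline tags never split a word.

```python
import xml.etree.ElementTree as ET

def xml_to_plain_text(xml_string):
    """Strip all markup and return the text content with whitespace normalised."""
    root = ET.fromstring(xml_string)
    # itertext() yields every text fragment in document order.
    text = " ".join(root.itertext())
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(text.split())

# Hypothetical sample document, invented for this example.
sample_xml = (
    "<text><p>Corpora are <hi>collections</hi> of text.</p>"
    "<p>They can be large.</p></text>"
)
print(xml_to_plain_text(sample_xml))
```

The resulting string can be saved as a .txt file and opened in a plain-text concordancer. Any annotation carried by the tags is lost in the process, so it is worth keeping the original XML alongside the converted file.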

  • International Journal of Corpus Linguistics
  • Corpora
  • Corpus Linguistics and Linguistic Theory

  • Corpus Linguistics Conference – This is the archive for the six conferences. Full papers are available for many of the presentations.
  • Proceedings of The International Symposium on Using Corpora in Contrastive and Translation Studies 2010. Edited by R. Xiao. Full papers are available for several of the presentations.

Discover our areas of expertise

Corpus linguistics is 1 of our 6 areas of expertise within our Linguistics research area. Explore the others below.

Research groups

We're researching what happens as speakers and writers cross the boundaries of language systems or transgress the rules within them.

Interested in a PhD in Languages and Linguistics?

Browse our postgraduate research degrees – including PhDs and MPhils – at our Languages and Linguistics postgraduate research degrees page.