Modelling language evolution with automated speech analysis and machine learning

Funding

Funded (UK/EU and international students)

Project code

SMAP5960521

Department

School of Mathematics and Physics

Start dates

October 2021

Application deadline

4 May 2021

Applications are invited for a fully funded three-year PhD starting in October 2021.

The PhD will be based in the School of Mathematics and Physics and will be supervised by Dr Michal Gnacik, Dr James Burridge and Dr Bert Vaux (University of Cambridge).

Candidates applying for this project may be eligible to compete for one of a small number of bursaries. These cover tuition fees at the UK rate for three years and a stipend in line with the UKRI rate (£15,609 for 2021/22). Bursary recipients will also receive £1,500 per annum for project costs and consumables.

The work on this project could involve:

Application of machine learning techniques, such as boosted trees and neural networks, to deconstruct speech
Stochastic (Markovian and non-Markovian) modelling of language acquisition and evolution
Automated collection of speech data via web apps built with Flask

Languages are complex structures built from simple units of sound. They exhibit patterns on many time scales, from rules for arranging sounds within words to principles of sentence construction and timing. Each person uses a different set of sounds and patterns, and because these are learned by copying, our speech can reveal a great deal about us: where we grew up, our education, our ethnicity, how we wish to be seen, and even our physical and mental attributes.

In the past, we learned how languages work through painstaking data collection: analysing voices, writing, and the workings of the vocal tract. This is now changing. Speech recording and automatic speech recognition, long based on Hidden Markov Models and now increasingly on deep neural networks, are ubiquitous. However, these powerful methods are only beginning to be applied to understanding how the components of speech differ between people, how they evolve over time, and what people’s voices reveal about them. Automatic speech deconstruction opens up the possibility of understanding human language in unprecedented detail and at large scale. It will also help reveal what “black-box” algorithms can learn about us.

In this project you will develop machine learning methods to deconstruct speech into units and to analyse both these units and the sentences they form. You will learn how people use sound to communicate, how to analyse audio signals, and how to train machines to recognise, deconstruct and measure them. You will build mathematical models of language change, drawing on models of social behaviour and networks, to make predictions about the future. The project brings together two important research themes at Portsmouth: “Future and emerging technologies” and “Democratic Citizenship”.

The cross-disciplinary team have a track record of novel work in mathematical language models, linguistic theory and large-scale data collection. The PhD will equip you with valuable skills in data science, modelling, machine learning, automatic speech processing and linguistics.

Entry requirements

You'll need a good first degree from an internationally recognised university (minimum upper second class or equivalent, depending on your chosen course) or a Master’s degree in an appropriate subject. In exceptional cases, we may consider equivalent professional experience and/or qualifications. You will also need English language proficiency at a minimum of IELTS band 6.5, with no component score below 6.0.

You should have an interest in and aptitude for programming and for statistical and probabilistic modelling.

How to apply

We’d encourage you to contact Dr Michal Gnacik or Dr James Burridge to discuss your interest before you apply, quoting the project code.

When you are ready to apply, you can use our online application form. Make sure you submit a personal statement, proof of your degrees and grades, details of two referees, proof of your English language proficiency and an up-to-date CV. Our ‘How to Apply’ page offers further guidance on the PhD application process.

If you want to be considered for this funded PhD opportunity you must quote project code SMAP5960521 when applying.