Healthy Data: improving data integrity for health information

  • Application end date: 11th February 2018
  • Funding Availability: Funded PhD project (EU/UK/International)
  • Department: School of Computing
  • PhD Supervisors: Prof. Adrian Hopgood & Dr Philip Scott

Project code: CCTS4190218

Project description

This project concerns the use of artificial intelligence (AI) to correct errors in individual medical records. There has been a great deal of interest in the application of data analytics (“big data”) to such records. With millions of healthcare records potentially available, there is clear interest in spotting patterns between symptoms, diagnoses, and treatments. However, there is plenty of evidence that a large proportion of individual medical records contain errors, rendering the data analytics prone to “garbage in, garbage out”.

NHS Digital have investigated whether data quality failures could be detected in national data returns, using the diagnosis of dementia as an example. The data show that of the 317,000 patients diagnosed with dementia between April 2010 and March 2015, only 51% had a recorded diagnosis of dementia when admitted to hospital during 2014/15. Clearly the dementia had not gone away, so the records were flawed. A separate study of emergency admissions data by Dr Tom Hughes of the John Radcliffe Hospital found that 40% of patients had no diagnosis at all and that, of those who did, nearly half had a diagnosis that was meaningless, vague or merely a symptom.

In this context, big data approaches are important but insufficient for extracting useful information. The data need to be checked for inconsistencies and repaired. This is a challenging problem for an AI system. While some data records may be impossible, such as the ‘cured’ dementia, many others will lie on a spectrum from unusual to implausible. Despite the scale of the challenge, there is recent work to draw upon in correcting both free-text data (primarily for spelling) and structured data records. The data records for this project will come from research-community sources such as the MIMIC-III database, managed by MIT.
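As a minimal illustration of the "impossible record" case, the Python sketch below flags any admission that omits an irreversible diagnosis already recorded for the same patient at an earlier date, which is the pattern behind the ‘cured’ dementia example. The `Admission` record layout, the `IRREVERSIBLE` set and the function name are hypothetical choices for this sketch; they are not taken from MIMIC-III or from any NHS Digital data return.

```python
from dataclasses import dataclass, field

# Hypothetical: diagnoses treated as irreversible for the purposes of this sketch.
IRREVERSIBLE = {"dementia"}


@dataclass
class Admission:
    patient_id: str
    year: int
    diagnoses: set = field(default_factory=set)


def flag_missing_chronic(admissions):
    """Return (patient_id, year, diagnosis) for each admission that omits an
    irreversible diagnosis recorded in an earlier admission of the same patient."""
    flags = []
    history = {}  # patient_id -> irreversible diagnoses seen so far
    for adm in sorted(admissions, key=lambda a: (a.patient_id, a.year)):
        seen = history.setdefault(adm.patient_id, set())
        # Anything previously recorded as irreversible but now absent is suspect.
        for dx in sorted(seen - adm.diagnoses):
            flags.append((adm.patient_id, adm.year, dx))
        seen |= adm.diagnoses & IRREVERSIBLE
    return flags


records = [
    Admission("p1", 2012, {"dementia"}),
    Admission("p1", 2014, {"hip fracture"}),  # dementia has "gone away"
    Admission("p2", 2014, {"dementia"}),
]
print(flag_missing_chronic(records))
```

A hard rule like this only catches the clearly impossible end of the spectrum; unusual-but-possible records need the graded approaches discussed next.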

It is proposed that a multi-stage (and possibly multi-agent) approach is adopted to repair the free text, repair the structured data, and finally cross-check between the two forms of data. Data-driven and knowledge-driven approaches will be explored. In a data-driven approach, any records that show a unique pattern among a dataset of millions would be considered suspect. In a knowledge-driven approach, fuzzy rules might propose likely combinations of symptoms and diagnoses, based on medical knowledge, and recognise any improbable combinations in the data records.
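The two approaches can be sketched side by side in Python, under loudly stated assumptions. The data-driven check treats a record as suspect if its exact (symptoms, diagnosis) pattern occurs no more than `max_count` times in the dataset. The knowledge-driven check encodes one illustrative fuzzy rule, "IF age is old AND memory score is low THEN dementia is plausible", using ramp membership functions and `min` as the fuzzy AND. The field names (`symptoms`, `diagnosis`, `age`, `memory_score`), the membership breakpoints and the 0.2 threshold are hypothetical and carry no clinical authority.

```python
from collections import Counter


def rare_patterns(records, max_count=1):
    """Data-driven: indices of records whose (symptoms, diagnosis) pattern is rare."""
    def key(r):
        return (frozenset(r["symptoms"]), r["diagnosis"])
    counts = Counter(key(r) for r in records)
    return [i for i, r in enumerate(records) if counts[key(r)] <= max_count]


def ramp_up(x, lo, hi):
    """Fuzzy membership rising linearly from 0 at lo to 1 at hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def ramp_down(x, lo, hi):
    """Fuzzy membership falling linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def dementia_plausibility(age, memory_score):
    """Knowledge-driven: firing strength of the illustrative rule
    IF age is old AND memory score is low THEN dementia is plausible."""
    old = ramp_up(age, 50, 80)                   # hypothetical breakpoints
    low_memory = ramp_down(memory_score, 14, 24)  # hypothetical breakpoints
    return min(old, low_memory)                   # min acts as fuzzy AND


def implausible_dementia(records, threshold=0.2):
    """Indices of records coded as dementia whose rule plausibility is low."""
    return [
        i for i, r in enumerate(records)
        if r["diagnosis"] == "dementia"
        and dementia_plausibility(r["age"], r["memory_score"]) < threshold
    ]


sample = [
    {"symptoms": {"memory loss", "confusion"}, "diagnosis": "dementia", "age": 84, "memory_score": 12},
    {"symptoms": {"memory loss", "confusion"}, "diagnosis": "dementia", "age": 79, "memory_score": 15},
    {"symptoms": {"chest pain"}, "diagnosis": "dementia", "age": 31, "memory_score": 29},
]
print(rare_patterns(sample), implausible_dementia(sample))
```

On three records a "unique" pattern means little; the data-driven test only becomes informative over a dataset of millions, whereas the fuzzy rule gives a graded score that can place a record on the spectrum from unusual to implausible.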

Supervisor profiles

Prof. Adrian Hopgood

Dr Philip Scott

Admissions criteria

You’ll need a good first degree from an internationally recognised university (minimum second class or equivalent, depending on the chosen course) or a Master’s degree in an appropriate subject. Exceptionally, equivalent professional experience and/or qualifications will be considered. You will also need English language proficiency at a minimum of IELTS band 6.5, with no component score below 6.0.

Informal enquiries are encouraged and can be made to Prof. Adrian Hopgood on 02392 842946 or Dr Philip Scott on 02392 846378.

For administrative and admissions enquiries please contact

How to Apply

We welcome applications from highly motivated prospective students who are committed to developing outstanding research outcomes. You can apply online at You are required to create an account, which gives you the flexibility to save the form, log out and return to it at any time convenient to you.

A link to the online application form and comprehensive guidance notes can be found at

Applications should include:

- Full CV including personal details, qualifications, educational history and, where applicable, any employment or other experience relevant to the application

- Contact details for two referees able to comment on your academic performance

- Research proposal of 1,000 words outlining the main features of a research design you would propose to meet the stated objectives, identifying the challenges this project might present and discussing how the work will build on or challenge existing research in the above field.

- Proof of English language proficiency (for EU/international students)

When applying, please quote project code: CCTS4190218

Interview date: TBC

Start date: October 2018.

Funding notes

UK/EU students - The fully funded, full-time, three-year studentship provides a stipend of £14,553 per annum, in line with that offered by Research Councils UK.

International students - Applicants for this project are eligible to be considered for the Portsmouth Global PhD scholarships.

Research at the School of Computing

Discover more about our research areas on our webpages.

Visit us

Visit us at a Postgraduate Information Day to discover more about the research programmes we offer. Book your place at