A Flexible Framework for Unlocking Potential in Big Data Pipelines through Machine Learning and Visualisation
Self-funded PhD students only
School of Creative Technologies
Applications accepted all year round
The PhD will be based in the School of Creative Technologies and will be supervised by Dr Mel Krokos, Dr Jiacheng Tan and Professor Kazuya Koyama. The project is linked to our participation in a flagship EU Horizon 2020 project and involves working within a large international network of prestigious research collaborators and institutions.
The work on this project will broadly consist of:
- Enabling the processing of big data from emerging instrumentation infrastructures, using data from astronomy as demonstrators;
- Facilitating remote visualisation and analysis of big data volumes through client-server models exploiting HPC facilities;
- Supporting the analysis of big data pipelines with machine learning algorithms; and finally
- Identifying synergies for the repurposing and application of the resulting framework to other big data domains.
Big data pipelines are increasingly common across scientific domains. In astronomy, for example, a new generation of technologically advanced telescopes, particularly at radio wavelengths, is expected to deliver enormously large, rich and highly complex data volumes at rates of the order of 10 GB per second.
Such enormous data volumes place extremely challenging demands on traditional approaches to data management and analysis. Because of its sheer size, for example, the data will have to be stored in dedicated computing facilities with the highest performance capacity. It will also have to be processed close to where it is physically stored, using appropriate high-performance computing (HPC) resources. To accommodate this, analysis and imaging software tools will have to be adapted at various levels, if not completely redesigned, so that they run efficiently and with satisfactory performance at scale.
The project vision is to pioneer a first-of-its-kind framework for a new generation of high-capacity software tools that support fully automated big data pipelines through real-time machine learning and creative visualisation. The aim is to enable real-time interaction with big data processing pipelines in emerging instruments. Development will be informed by work on data pipelines from our participation in a flagship EU Horizon 2020 project. The framework will build on earlier research carried out with an international network of renowned academic, research and industrial institutions in Germany, Australia and the UK, representing a comprehensive range of big data sources and technology stakeholders. The work will be highly multidisciplinary, with potential opportunities for student work placements at these institutions.
The proposed framework will be flexible, in the sense that it can be repurposed and applied to other domains. It will also be rich enough to support tools that extract relevant information and unlock the intrinsic value of the data at the different stages of time-evolving pipelines. The aim is to realise effective software stacks for high-performance computing systems that work in a fully integrated, automated way with advanced creative visualisation tools (possibly extended to virtual reality environments). These visualisation tools will display results to end users and support interaction that feeds information back to the pipeline, so that the results can be improved, e.g. by focusing on specific features or by discarding potentially corrupted data chunks.
Both the pipeline and the framework software will run on remote HPC facilities at the data centres where the data is stored. The work will involve gaining an understanding of the basics of data capture and of data processing pipelines from cutting-edge, emerging instrumentation. The first implementation of the framework will involve designing and developing tools to analyse the data at one stage of the pipeline, together with the associated visualisation of and interaction with the data. The tools will integrate appropriate feedback-generation mechanisms to inform improvements to the pipeline. They will also need to be extensible and scalable to future generations of data volumes and observational instruments. The results will be validated by a professional user community within our network of stakeholders, who will also set the requirements and priorities for the expected functionality. This will require continuous interaction with all members of our highly multidisciplinary international research team, who may offer paid visits and/or student internships.
Fees and funding
Funding availability: Self-funded PhD students only.
PhD full-time and part-time courses are eligible for the UK Government Doctoral Loan (UK and EU students only).
2020/2021 fees (applicable to October 2020 and February 2021 starts)
Home/EU/CI full-time students: £4,407 p/a*
Home/EU/CI part-time students: £2,204 p/a*
International full-time students: £15,100 p/a*
International part-time students: £7,550 p/a*
*All fees are subject to annual increase
Entry requirements
You will need a good first degree from an internationally recognised university (minimum upper second class or equivalent, depending on your chosen course) or a Master’s degree in an appropriate area. In exceptional cases, we may consider equivalent professional experience and/or qualifications. The minimum English language requirement is IELTS band 6.5, with no component score below 6.0.
Ideally, you should have a degree in one of the following disciplines, together with a strong coding background: computer games/animation, creative computing, digital media technology, information technology, virtual and augmented reality or engineering. Experience of HPC methodologies, visualisation and deep learning algorithms would be advantageous. Travel to our overseas partners may be required at some point during the project, e.g. to take up an internship. The successful candidate must therefore be able and willing to travel internationally once the current pandemic restrictions are lifted and it is safe to travel.
How to apply
We’d encourage you to contact Dr Mel Krokos (email@example.com) to discuss your interest before you apply, quoting the project code CCTS4510920 and the project title.
When you are ready to apply, you can use our online application form. Make sure you submit a personal statement, proof of your degrees and grades, details of two referees, proof of your English language proficiency and an up-to-date CV. Our ‘How to Apply’ page offers further guidance on the PhD application process.
Please note, to be considered for this PhD opportunity you must quote project code CCTS4510920 when applying.