What this PhD programme offers

HDR UK is funding six PhDs with leading UK universities and research organisations. This brand new programme offers the chance to carry out a doctoral research project at the leading edge of health data science.

Each project has been selected for its scientific excellence, importance and originality and will help deliver the aims and objectives of HDR UK’s Big Data for Complex Diseases (BDCD) Driver Programme.

Graduates will be well placed for a career at the forefront of health data research.

The research projects

  • Characterising a high-risk Type 2 Diabetes phenotype using big data – the role of comorbidity and ethnicity: University of Bristol
  • Advancing primary prevention strategies for complex diseases: University of Cambridge
  • Expediting the diagnosis of cancer and other diseases where early diagnosis can improve clinical outcomes in patients presenting to a GP with new symptoms: UCL
  • Repurposing and enriching cardiovascular risk prediction model to identify people at risk of cancer: UCL
  • BLOod Test Trend for cancEr Detection (BLOTTED): an observational and prediction model development study using English primary care electronic health records data: University of Oxford
  • Modelling the impact of diagnostic pathways in cancer and cardiovascular disease: University of Swansea.

See below for details of each project and to apply.

Who the PhD projects are for

The programme is for enthusiastic, talented students who want to use data-driven research to develop and shape the UK’s response to the most complex health challenges of our times.

  • For eligibility details see the information about each PhD project.
  • HDR UK is committed to promoting equality and inclusion. Click here to find out more.
  • We aim to accommodate specific needs and personal circumstances. Please make us aware of individual circumstances when applying or contact us directly at learn@hdruk.ac.uk. Please note the HDR UK PhD Student Privacy Notice.

If you have questions or require adjustments to the application process, please contact learn@hdruk.ac.uk.

Outcomes and benefits

This is a chance to earn a PhD from a leading university by conducting a research project that will make a direct contribution to improving the health and care of patients.

Three or four years of funding will be available from HDR UK depending on the project. A three-year research project will be carried out at your home university and will be linked to the aims and objectives of HDR UK’s Big Data for Complex Diseases (BDCD) Driver Programme.

Students will receive excellent training and career development opportunities in an environment that nurtures team science.

Studentships can be hosted by a single research organisation, or part of a studentship can be hosted by a second research organisation.

Each BDCD studentship award will comprise a three-year stipend and research costs of a maximum of £75,000 per studentship, awarded to the host organisation(s). This is based on the UK Research and Innovation (UKRI) minimum rate (Appendix B).

Benefits include:

  • Tax-free stipends with annual increases based on UKRI advertised rates
  • Fully-paid tuition fees (and college fees where required) – international fee waivers are not available.
  • Research costs of up to £5,000 a year
  • Expenses and travel costs for conferences and events of £300 a year.


Applications are open until 14:00 on 15 June 2023. Application forms are available underneath the information about each of the projects.

The provisional timeline is for interviews to take place in the week commencing 19 June, with offers being made from the week commencing 26 June onwards. Studentships will generally begin in October 2023 or January 2024.

BDCD Driver Programme aims and objectives

For a wide range of complex diseases, deriving intelligence from nationwide, multisource, linked health-relevant data has the potential to yield crucial insights that accelerate and enhance opportunities for innovation in disease detection, diagnosis, treatment, improved care, better outcomes and more rational health policy.

Cancer and cardiovascular diseases (CVD) are the two commonest causes of morbidity and mortality in the UK and globally, with incidence, morbidity and mortality increasing over the last several decades as the world’s population has aged.

Slowing these global trends requires approaches that recognise and exploit the power of whole, large population-scale health-relevant data to catalyse health data science and its translation.

We also need to break down traditionally siloed disease- and expertise-specific domains, rising to the challenge of jointly addressing cancer, CVD, other complex diseases, their inter-relationships and their sequelae.

Crucially, we need to use the intelligence gained to translate into real benefit for citizens and patients and influence national and international policy and best practice.


The PhD Projects

  • Key details

    • Hosted by: University of Bristol
    • Lead supervisor: Dr Rachel Denholm, Bristol Medical School
    • Duration: Four years
    • Stipend: UKRI stipend rates apply
    • No international fee waivers
    • Start date: October 2023

    Project summary

    Recent research indicates that about a third of people initially diagnosed with Type 2 Diabetes (T2D) have very high levels of blood glucose at diagnosis (severe hyperglycaemia, characterised by high HbA1c – a measure of average blood glucose in the preceding three months); this appears particularly common in people of African/African Caribbean ethnicity. However, it is unclear whether this subtype is indicative of inequalities in early diagnosis of T2D (including whether certain patients are less likely to seek treatment), the impact of existing illnesses, or a very aggressive form of T2D.

    This project will use national data from general practices and hospitals to identify a group of people with T2D, characterised by glycaemic status at diagnosis, and investigate the role of patients’ medical history, such as existing health conditions, and demographic characteristics, like ethnicity, in explaining differences in HbA1c levels at diagnosis. We will also investigate the long-term consequences of high HbA1c levels at diagnosis on the risk of cardiovascular conditions and cancer, compared to those with lower levels, and the role of pre-existing health conditions and ethnicity.

    Findings from the study will help identify whether the severely hyperglycaemic form of T2D is a distinct, severe form of the disease, a consequence of inequalities in early diagnosis and treatment, or the effect of the complex inter-relationships with other long-term conditions.

    Suitability and eligibility

    This project would suit students with a degree in a biomedical, mathematics, data science or related quantitative discipline (with some familiarity with statistical analysis) and an interest in the application of statistical methods in cardiometabolic research. Some post-graduate research experience would be an advantage.

    Experience in the analysis of routine health data, the use of statistical software such as R, or a programming language such as Python would be an advantage.

    Click here for further details and to access the application form.

    • If you have any questions contact r.denholm@bristol.ac.uk
  • Key details

    • Hosted by: University of Cambridge
    • Lead supervisor: Angela Wood, Department of Public Health and Primary Care
    • Duration: Three years
    • Stipend: UKRI stipend rates apply
    • No international fee waivers
    • Start date: Flexible for right candidate

    Project summary

    It is far better for people and cheaper for health services to prevent or delay development of complex diseases, such as cardiovascular disease and cancer, than to treat patients after they become unwell. Knowing about the future risk of complex diseases can help people and the health professionals caring for them make decisions about how risk can best be managed through lifestyle changes and/or treatments.

    At present, a wide array of tools are used to predict a person’s future risk of single diseases based on particular risk factors (e.g. age, blood pressure, body mass index) measured at a single point in time. Two opportunities are being missed. First, because these tools target single diseases, the opportunity to identify people who are at risk of multiple, often interrelated, complex diseases is being missed. Second, decisions based on measures at a single time point do not take into account important fluctuations and changes in risk over time.

    There is an urgent need to combine risk prediction tools and decision making for multiple complex diseases, and to extend these tools for monitoring of risk over a patient’s lifetime.

    The overarching goal of this research is to enhance the prevention of multiple complex diseases by developing and evaluating risk prediction tools that monitor a person’s risk of multiple diseases over time and to translate the risk prediction tools into primary prevention strategies.

    Eligibility and suitability

    Applicants need a good first degree and/or master’s degree or equivalent experience in one of the following subjects: statistics, epidemiology, biostatistics, mathematics, data science, informatics.

    They will need to be committed to open-source, reproducible research and able to work accurately, with attention to detail. Excellent written and verbal communication skills and organisational and time management skills are also essential.

    Strong data manipulation and analysis experience/skills would be beneficial including:

    • scripting skills in writing re-usable code in at least one programming language (e.g. SQL, Python/PySpark)
    • skills in at least one statistical software package (e.g. R, Stata)
    • experience of analysing person-level data, ideally from electronic health records.

    Click here for further details and to access the application form.

  • Key details

    • Hosted by: UCL
    • Lead supervisor: Georgios Lyratzopoulos, Department of Behavioural Science and Health, Institute of Epidemiology and Healthcare (IEHC), UCL
    • Duration: Three years (four years part time)
    • Stipend: UKRI stipend rates apply
    • No international fee waivers
    • Start date: Preferred start October 2023

    Project summary

    Improving medical diagnosis is a priority for the NHS and health systems across the world. However, the diagnosis of cancer and several other diseases where early diagnosis can improve clinical outcomes is often delayed.

    Meticulous analysis of electronic health records data can provide information to support doctors in deciding when to order tests or refer a patient to a specialist, but such data remain under-explored. The proposed PhD will systematically examine the predictive value for consequential diseases of one or more non-specific symptoms (e.g. loss of weight, abdominal pain, or abdominal bloating). Incorporating information from additional presenting features (e.g. common blood test results, symptom combinations, or comorbidity) will also be explored.

    The supervisory team includes healthcare epidemiologists, data scientists, statisticians, data managers and academic GPs. Rich linked data (~2.7M records) are available to the student at the outset to support the project. Example prior work includes: phenotyping electronic health records at scale (PMID:31329239); assessing risk of different diseases (PMID:34339405); examining the predictive values of symptom combinations (PMID:36702593); and formally identifying diseases with greatest earlier diagnosis potential (doi.org/10.3399/BJGP.2023.0044).

    The project will improve the assessment of risk of underlying consequential illness in primary care, thereby improving clinical outcomes and patient experience and helping to update national clinical practice guidelines.

    Eligibility and suitability

    This project requires applicants with high academic potential, with affinity for coding and analysis of complex data, fluency in statistical analysis software and relational databases, and the ability to be trained in statistical techniques. Excellence in written communication is essential. Interest is welcome from candidates who are:

    • working as data analysts (e.g. post-master’s) seeking to study for a PhD
    • or completing a master’s degree in a quantitative field (e.g. data science, statistics)
    • or completing undergraduate studies in a quantitative discipline and have an exceptional academic record.

    Research experience in analysing large longitudinal health datasets such as administrative data, electronic health records and disease registries would be an advantage as would:

    • experience of study design and planning of analysis for successfully completed/published research projects
    • relevant analytical work experience in academic research departments, or Public Health England/NHS, or relevant sectors of the health industry.

    Click here for further details and to access the application form.

  • Key details

    • Hosted by: UCL
    • Lead supervisor: A Floriaan Schmidt, Institute of Cardiovascular Science, University College London
    • Duration: Three years
    • Stipend: UKRI stipend rates apply
    • No international fee waivers
    • Start date: Flexible for the right candidate

    Project summary

    Risk-stratified management of cardiovascular disease (CVD), where people without established disease receive preventative interventions and monitoring based on their 10-year predicted risk, has been highly successful in ensuring healthcare resources are allocated to those most likely to benefit.

    Through the development of tumour-specific medicines, cancer has been at the forefront of personalised medicine. Preventative strategies for cancer have, however, focussed on costly one-size-fits-all screening programmes, irrespective of the individual cancer risk. While there have been attempts to target screening and prevention strategies at high-risk individuals, such as using prostate-specific antigen, there is a general sparsity of risk-based approaches for early detection and prevention of cancers in clinical practice.

    Despite distinct pathways of disease development, and depending on the type of cancer (e.g. lung cancer, colorectal cancer), CVD risk factors also contribute to the development of cancer. For example, smoking and alcohol use are important risk factors for both diseases, as are sedentary lifestyle, poor diet and environmental factors such as air pollution. Given the interrelation between risk factors for CVD and risk factors for cancer, as well as the high clinical uptake of risk prediction tools for CVD, we wish to explore to what extent established models for CVD prediction can be repurposed and enriched to consider the onset of cancer.

    Eligibility and suitability

    Applicants will need:

    • A background in statistics/epidemiology/computer science or related disciplines.
    • An interest in developing their knowledge of statistics and/or machine learning.
    • The ability to work collaboratively with a team of clinical and statistical experts.

    Experience would also be valuable in:

    • working computationally with tabular data, and healthcare data in particular
    • programming languages such as Python or R
    • working with genomic, metabolomic, or proteomic data.

    Click here for further details and to access the application form.

  • Key details

    • Hosted by: University of Oxford
    • Lead supervisor: Brian Nicholson, Nuffield Department of Primary Care Health Sciences
    • Duration: Three years
    • Stipend: UKRI stipend rates apply
    • No international fee waivers
    • Start date: Flexible for the right candidate

    Project summary

    The PhD student will lead research exploring trends in blood tests for early detection of cancer and assess how the presence of CVD influences trends and cancer risk. Analyses will be conducted using linked national electronic health record data. The student will be responsible for developing and validating prediction models, with a particular focus on methods to utilise longitudinal data. The student will be a member of the multidisciplinary Cancer Research, Medical Statistics, and Cardiovascular Groups at the University of Oxford and Epidemiology of Cancer Healthcare Outcomes Group at University College.

    Eligibility and suitability

    The successful applicant will need:

    • to be educated to degree level (minimum 2:1 at undergraduate level) or equivalent experience
    • experience in conducting/supporting medical research
    • an understanding of basic/fundamental statistics/data science
    • to be motivated and interested in cancer/CVD research, primary care, and linked electronic health data
    • the ability to communicate research findings to a multidisciplinary group
    • the ability to plan, implement and deliver programmes of work
    • high proficiency in the English language.

    It will be an advantage to have:

    • experience in analysing longitudinal data from primary care
    • experience in working in a cancer/CVD-related position
    • familiarity with analytic software, such as Stata, R, or Python
    • experience in developing/validating risk prediction models.

    Click here for further details and to access the application form.

  • Key details

    • Hosted by: University of Swansea
    • Lead supervisor: Dr Rhiannon Owen, Swansea University Medical School
    • Duration: Three years
    • Stipend: UKRI stipend rates apply
    • No international fee waivers
    • Start date: Flexible for the right candidate

    Project summary

    Many patients who are diagnosed with cancer and/or cardiovascular disease (CVD) receive their diagnosis in Accident and Emergency (A&E). These patients tend to have more severe disease than those who receive a diagnosis from their general practice (GP).

    This PhD project will develop mathematical models to predict what the benefits of receiving an earlier diagnosis (via their GP) would have been for both patients and the NHS. With the frequent co-existence of cancer and CVD, this presents an important opportunity to improve population health, and reduce potential health inequalities, by improving the diagnostic pathway of both diseases.

    The project will specifically explore the epidemiology of cancer and/or CVD diagnoses, especially with respect to where, and for whom, such diagnoses are made, using population-scale data including 2.9 million individuals in the Secure Anonymised Information Linkage (SAIL) Databank Wales Multimorbidity e-Cohort.

    The project will use statistical modelling and machine learning techniques to predict the impact of in-hospital diagnoses for cancer and/or CVD on patient-relevant and NHS outcomes, especially with regard to life expectancy and quality of life. In doing so, it will provide a benchmark against which existing diagnostic/pathway initiatives can be evaluated, identify potential inequalities, and predict the impact of new areas of system development to improve patient outcomes.

    Eligibility and suitability

    Applicants will need an MSc in Statistics/Biostatistics or Epidemiology/Health Data Science (with a strong analytical component) plus programming and data analysis skills/experience in R and/or Python.

    Experience of analysing large-scale linked electronic health record data and knowledge of Bayesian methods would be an advantage.

    Click here for further details and to access the application form.

    • If you have any questions contact R.K.Owen@Swansea.ac.uk or Keith.Abrams@warwick.ac.uk

PhD Programme team

The core team for the programme includes:

  • Director: Prof Cathie Sudlow (HDR UK)
  • Director: Prof Mark Lawler (Queen’s University Belfast, HDR UK Northern Ireland)
  • Dr Rhiannon Owen, Associate Professor of Statistics, Population Data Science, Swansea University
  • Prof Julian Halcox, Professor of Cardiology & Health Data Science, Population Data Science, Swansea University
  • Prof Georgios Lyratzopoulos, Professor of Cancer Epidemiology (principal supervisor, healthcare epidemiology), UCL
  • Prof Spiros Denaxas, Professor of Biomedical Informatics (computational phenotyping), UCL
  • Dr Matthew Barclay, Senior Research Fellow / Cancer Research UK-ACED Fellow, Statistician (methodological lead), UCL
  • Ms Becky White, Data Science Research Fellow (data management), UCL
  • Dr Meena Rafiq, Clinical Fellow / Academic General Practitioner (clinical input into phenotyping and outcome selection/interpretation), UCL
  • Prof Angela Wood, Professor of Health Data Science, University of Cambridge
  • Dr Brian D. Nicholson (GP and NIHR Clinical Lecturer), University of Oxford
  • Prof Eva Morris (Prof of Health Data Epidemiology), University of Oxford
  • Dr Pradeep S. Virdee (Medical Statistician), University of Oxford
  • Emanuele Di Angelantonio (Professor of Clinical Epidemiology), University of Cambridge
  • Antonis Antoniou (Professor of Cancer Risk Prediction), University of Cambridge
  • Dr Rachel Denholm, University of Bristol
  • Dr Sophie Eastwood, UCL
  • Prof Jonathan Sterne, University of Bristol
  • Prof Nish Chaturvedi, UCL
  • Prof Kate Tilling, University of Bristol

The wider leadership team consists of subject matter experts from across the UK.