Genetics has transformed our understanding of how variation in DNA can influence the risk of developing conditions such as cancer and heart disease. Studies that combine this genetic information with other blood-based factors – including proteins, metabolites and lipids – and with health records have the potential to provide more direct insight into disease aetiology and prediction. A key challenge so far, however, has been accessing this information at sufficient scale.

The development of a National Multi-omics Consortium aims to address this challenge by bringing together information on participants from multiple studies to enhance scientific power, breadth, and robustness. The Consortium, led by HDR UK Researcher Dr Adam Butterworth, will bring together existing and unique data assets, maximising their value within an open and collaborative national team.

Challenge

The UK has strength in population cohorts and in generating a wide range of molecular measurements about participants in these cohorts. Cohort studies investigate the interactions of predisposing genetic, environmental, social and lifestyle factors, often evaluating multiple hypotheses, but are typically analysed in isolation from other studies. To maximise the value of population cohorts and speed up innovation, there is a need for robust cross-generational and cross-cohort research.

One of HDR UK’s research priorities is to understand the causes of disease at a deeper biological level, to enable better prediction of the onset and progression of ill-health and to tailor medicines for sub-types of disease. The Consortium will bring together multiple cohorts and use advanced data methods to interrogate the complex data layers, going from genomics through multi-omics to health and disease outcomes.

Solution

The Consortium will initially involve nine longitudinal UK population cohorts within the HDR UK network, comprising over 750,000 participants. Combining these studies will considerably improve the statistical power, molecular breadth, and generalisability compared to analyses of individual cohorts. 

The nine cohorts are:

  • AIRWAVE Health Monitoring dataset comes from a study of 53,000 participants investigating whether Airwave use has any long-term health impacts on police personnel. Lead institution: Imperial College London
  • COMPARE Study is determining the best method for measuring iron levels in blood donors. Lead institution: University of Cambridge
  • EPIC-Norfolk is a study of over 25,000 people living in Norfolk, looking to provide data-based evidence for health policies to prevent or delay disease onset and maintain health and independence in older people. Lead institution: MRC Epidemiology
  • The Fenland Study investigates the interaction between environmental and genetic factors in determining obesity, type 2 diabetes, and related metabolic disorders. Lead institution: MRC Epidemiology
  • Generation Scotland is a resource of human biological samples and data which are available for medical research. Lead institution: University of Edinburgh
  • GoDARTS identified 18,000 patients with diabetes within the wider Tayside region, through electronic record linkage, in order to improve health care over and above that which was practical through existing general practice lists alone. Lead institution: University of Dundee
  • INTERVAL Study has enrolled 50,000 blood donors across England to compare the effects of donating blood at different time intervals. Lead institution: University of Cambridge
  • UCLEB Consortium of 40,000 participants works to explain the precise relationship between the genes and biological changes that precede cardiovascular disease. Lead institution: University College London
  • UK Biobank is a national and international open access resource following the health and wellbeing of 500,000 volunteer participants, which aims to improve the prevention, diagnosis and treatment of a wide range of illnesses. 

Discover more about these cohorts in the Multi-omics Consortium Collection on the Health Data Research Innovation Gateway.

Chosen because of their range of ages, ethnicities, and socioeconomic backgrounds – as well as their linkages with NHS databases – the cohorts complement each other, allowing for exploration of a greater range of disease and health outcomes.

The project, which is initially funded for two years, aims to test the feasibility of integrating the extensive molecular measurements across this initial set of cohorts, before expanding to include other population cohorts. One key limitation that the Consortium hopes to overcome is that a single study may include only a small number of people who have developed a given disease. Pooling these cases across cohorts will substantially increase the ability to identify associated risk factors.
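
As a rough illustration of why pooling cases helps, the sketch below uses the standard large-sample formula for the standard error of a log odds ratio from a 2x2 table. All counts are hypothetical and are not taken from any of the cohorts; the assumption that nine cohorts share an identical makeup is made purely to keep the arithmetic transparent.

```python
import math

def log_odds_ratio_se(exposed_cases, unexposed_cases,
                      exposed_controls, unexposed_controls):
    """Large-sample standard error of a log odds ratio from a 2x2 table."""
    return math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )

# Hypothetical single cohort: 20,000 participants, only 40 disease cases.
single_cohort = (10, 30, 4990, 14970)

# Hypothetical pooled dataset: nine cohorts with the same makeup (an assumption).
pooled_cohorts = tuple(9 * count for count in single_cohort)

se_single = log_odds_ratio_se(*single_cohort)
se_pooled = log_odds_ratio_se(*pooled_cohorts)

# With nine times the data, the standard error shrinks by a factor of three.
print(f"single cohort SE: {se_single:.3f}")
print(f"pooled SE:        {se_pooled:.3f}")
```

The small cell counts dominate the standard error, which is why a handful of cases in one study gives an imprecise estimate and why adding cases from other cohorts tightens it.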

Dr Adam Butterworth, HDR UK Cambridge, said:

“The aim is to create a platform that will allow researchers to interrogate the multi-omic data from a broad discovery point of view, addressing a wide range of disease areas. One option is to wait for large studies to accrue multi-omic data and mature, but a more efficient step is to bring this dense molecular data in existing cohorts together now. By accelerating this process we hope to more rapidly advance science by putting in place infrastructure and methods to allow the research community to efficiently analyse data across longitudinal population studies.”

Process

In a project designed in collaboration with patients, the public and scientists, the team will:

  • identify optimal IT solutions to make datasets easily analysable by the scientific community
  • work out how to harmonise data across studies with different measurement technologies
  • develop new statistical approaches to combine these complex data types
  • report initial findings on exemplar diseases to demonstrate the value of this integrative approach.
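
One simple way to think about harmonising measurements made with different technologies is to standardise each cohort's values onto a common, unitless scale. The sketch below does this with within-cohort z-scores. It is an illustration only, not the Consortium's method: the cohort names and values are hypothetical, and the approach assumes the underlying distributions are comparable between cohorts.

```python
from statistics import mean, stdev

def standardise(values):
    """Convert raw measurements to z-scores (mean 0, standard deviation 1)."""
    centre = mean(values)
    spread = stdev(values)
    return [(value - centre) / spread for value in values]

# Hypothetical protein levels from two cohorts using different platforms,
# reported in different, non-comparable units.
cohort_a = [1.2, 1.5, 1.1, 1.8, 1.4]
cohort_b = [310.0, 420.0, 290.0, 505.0, 375.0]

harmonised = {
    "cohort_a": standardise(cohort_a),
    "cohort_b": standardise(cohort_b),
}

# After standardisation, both cohorts sit on the same scale.
for name, scores in harmonised.items():
    print(name, [round(score, 2) for score in scores])
```

Standardising within each cohort removes platform-specific units and offsets, at the cost of discarding any real difference in average levels between cohorts.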

There are two major components to the project. The first is to develop the IT infrastructure, by scoping the landscape and identifying existing systems that can both support analysis of vast genomic and molecular data and provide the high security needed to store NHS data. If no existing system is suitable, the Consortium may require the development of new, bespoke academic or commercial solutions. An important consideration is balancing the immediate needs of the research community with the long-term ambitions of HDR UK, taking into account ethical and governance considerations, so that the infrastructure is secure and has longevity.

The second component is to identify and implement the statistical methods needed to ensure that, when cohorts are combined, the many layers of information, from genetics through metabolomics to health records, can be maximally exploited.
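
A widely used baseline for combining results across cohorts, when individual-level data cannot be merged directly, is fixed-effect inverse-variance meta-analysis. The sketch below shows that baseline only; it is not presented as the approach the Consortium will adopt, and the effect estimates and standard errors are hypothetical.

```python
import math

def inverse_variance_meta(estimates, standard_errors):
    """Fixed-effect meta-analysis: weight each estimate by 1 / SE^2."""
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se

# Hypothetical per-cohort effect estimates and their standard errors.
cohort_estimates = [0.10, 0.14, 0.08]
cohort_ses = [0.05, 0.08, 0.04]

pooled_estimate, pooled_se = inverse_variance_meta(cohort_estimates, cohort_ses)
print(f"pooled estimate: {pooled_estimate:.3f} (SE {pooled_se:.3f})")
```

More precise cohorts receive more weight, and the pooled standard error is smaller than that of any single cohort.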

Dr Butterworth said:

“Science is often driven by the development of technology, such as assays, to measure particular domains of molecular information, but biology doesn’t work this way. By combining these cohorts, we will bring together many biological layers of information, but specific biodata may be missing for one cohort. We intend to develop methods to integrate information across different layers and leverage the information we have to drive scientific breakthroughs.”

The team will also benefit from a collaboration with the Cambridge Service for Data Driven Discovery (CSD3) – a national data intensive science cloud for converged simulation, AI and analytics. CSD3 will support the secure analysis of linked electronic health records and multi-omic datasets to transform disease classification from an approach based on symptoms and pathology to one founded in the molecular causes of disease.

Impact

The long-term vision is to be the catalyst for a better understanding of biological pathways and disease through integrative analysis at scale. Overlaying the ‘expressed genome’ onto population cohorts, with genomic information linked to high-resolution electronic health records, will allow researchers to go beyond correlations and identify causal risk factors, knowledge of which can help to prevent or treat disease.

The Consortium will reveal insights into biology and disease by integrating information at scale, including measures of the genes switched on in a person, levels of proteins in blood samples, and data that reflect both a person’s genes and their lifestyle throughout life. These layers of biodata will be linked to health outcomes using electronic health records to help develop better, more targeted treatments for disease.

Collaboration opportunities

The project aims to provide a nucleus to attract additional relevant cohorts nationally and, eventually, internationally. The first stage is to bring together nine UK cohorts and provide the proof of concept, infrastructure and methodology needed to show that the approach can be effective. The next step is to engage with other UK population cohorts that have genetic and molecular information, and the potential to link with health records, to advance large-scale multi-omics.

Partners: University of Cambridge, MRC Epidemiology, University of Dundee, University of Edinburgh, Imperial College London, University College London, Swansea University, EMBL European Bioinformatics Institute, NIHR Cambridge Biomedical Research Centre, Alan Turing Institute, and University of Manchester.

Contact: Dr Adam Butterworth