Overview

UK scientists recognised early in the pandemic that viral genomic sequencing would be a powerful tool for monitoring the evolution and spread of SARS-CoV-2, the virus that causes COVID-19. COG-UK established a decentralised network of sequencing facilities, primarily at academic institutions, but integrating and analysing the data required a centralised processing environment. CLIMB-COVID was developed to fulfil this need, providing CPU and storage, an entirely bespoke database and an intricate pipeline of software tools to enable the dataset to be analysed. CLIMB-COVID plays an invaluable role in surveillance of SARS-CoV-2 and has potentially shown how we can respond better to future major disease outbreaks.

Challenge

Whole-genome sequencing for pathogen surveillance was well established in previous epidemics, but those datasets were typically measured in hundreds or thousands of genomes. No system was available that could scale to millions of pathogen sequences during a growing pandemic.

Solution

In March 2020 the scientists behind COG-UK (the COVID-19 Genomics UK Consortium, which works with HDR UK and is part of the UK Health Data Research Alliance) realised that a decentralised UK-wide sequencing platform could be created. This major open science initiative would see samples and sample metadata from hospitals and test centres processed regionally (using the sequencing capacities of universities and other institutions). The results would then be securely stored and analysed on CLIMB (the MRC-funded Cloud Infrastructure for Microbial Bioinformatics) at the Universities of Birmingham and Cardiff, with additional computing resources provided by the University of Birmingham’s BEAR infrastructure.

In response to the challenges posed by the fast-developing pandemic, a group including Dr Samuel Nicholls, Senior Research Fellow in Sequencing Bioinformatics at the University of Birmingham, working with Radoslaw Poplawski, CLIMB’s Cloud Computing Manager, built the first instance of CLIMB-COVID over a single weekend in March 2020.

Days later, 250 genomes had been uploaded, more than from any country other than China at that time. The figure has now passed one million, around a quarter of the world total.

This work has involved numerous research groups building software tools to enable analysis of viral genomes. Notable among them are the Centre for Genomic Pathogen Surveillance, which developed the Microreact tool to visualise the outputs from COG-UK, and the University of Edinburgh, which developed the PANGO lineage assignment tool and phylogenetic analysis pipelines.

Impact and outcomes

CLIMB-COVID has been praised worldwide and is described by HPC Wire as “one of the most meaningful anti-COVID tools in advanced computing”. Its development (see Genome Biology) was fast and its impact widespread, feeding vital information to SAGE, government and health authorities about the prevalence and trends of the disease, including the presence of the Alpha and Delta variants.

Dr Nicholls says: “CLIMB-COVID offers an unprecedented insight into how the virus is moving. And we’re able to identify variants, and how they’re moving through a population. It really drives our ability to help investigate outbreaks as they happen. So if Delta starts to get displaced we’ll be among the first to see it.”

Dr Nicholls and colleagues are working closely with the UK Health Security Agency (UKHSA) to ensure that CLIMB-COVID data is put to full use. CLIMB-COVID could prove invaluable to COVID-19 vaccine and booster programmes, and genomic epidemiology might now become central to future pandemic planning.

Patient and Public Involvement and Engagement

Data has been fed back to the public and CLIMB-COVID results are available on the COG-UK website.

Impact Committee

HDR UK’s Impact Committee selected this paper as an example of research excellence with demonstrated impact on public health.

Contact

Dr Nicholls: sam@samnicholls.net