HDR UK today announces that 1000 phenotypes have been uploaded to its Phenotype Library, making it a major resource to help researchers answer important questions using UK health data.

When patients interact with the healthcare system, digital information on their symptoms, diagnoses, laboratory test results, and prescriptions is collected as an Electronic Health Record.

Electronic Health Records are a valuable resource for researchers looking for ways to improve health and care, but the data they hold is often recorded with varying detail and quality. As a result, researchers spend time creating complex computer programs – known as phenotype algorithms – to fix and analyse information from Electronic Health Records. These phenotype algorithms, or phenotypes, are tailored to identify and analyse specific diseases in particular datasets.

The HDR UK Phenotype Library hosts over 1000 phenotypes to aid health data research

Led by Professor Spiros Denaxas (Professor of Biomedical Informatics at UCL) and Professor Emily Jefferson (Director of the Health Informatics Centre at the University of Dundee), the Phenotype Library is the first national platform to share these crucial tools between researchers, helping them avoid duplication of effort and improve the reproducibility of their work. 

Emily Jefferson, co-lead of the Phenotype Library, said: “The HDR UK Phenotype Library has reached a momentous milestone with 1000 phenotyping algorithms and metadata now available for use by the research community to drive research. 

“Over the last few years, the Library has grown to provide a powerful platform for researchers. Now, the ability to access curated, data-driven definitions for 1000 common and rare health conditions gives researchers the opportunity to save time and improve the quality of their research on an all-new scale, to ultimately bring major benefits to patients.” 

The Library has already enabled significant research developments, such as identifying 10,486 previously unattributed deaths from COVID-19, recognising that people with cardiovascular diseases are at higher risk of death from COVID-19, and speeding up recruitment to clinical trials to help bring new treatments and healthcare interventions to patients faster.

Martin Chapman, Lecturer in Health Informatics at King’s College London and a user of the Library, said: “We used the Phenotype Library to explore whether phenotype algorithms could be used to quickly and digitally identify potential participants for clinical trials based on their Electronic Health Records, replacing traditional methods that recruit individuals as and when they interact with the healthcare system. 

“We found phenotypes could indeed address this critical challenge in recruiting for clinical trials, and be integrated completely with existing clinical trial software, making the Library an invaluable source of tools and resources for clinical trial researchers across the UK.” 

Daniel Thayer, Senior Data Scientist at Swansea University’s SAIL Databank and Development Lead of the Phenotype Library, said: “Reaching 1000 phenotypes is a nice round number – but even more than the raw volume of content, the breadth of what the Library holds shows our progress in serving as a resource to support the health data research community.

“We hold content from numerous contributing organisations, defined against 34 different datasets and representing the contributions of hundreds of researchers. We look forward to further expanding this to support even more important work with health data. We have now provided access to researchers to be able to upload and publish their own content, which we hope will be the next step in that process.”

Visit the Phenotype Library