Phenotyping algorithms are tools that enable researchers to extract information such as diagnoses, drug prescriptions and laboratory test values from the complex and often messy data generated during routine interactions with the healthcare system. Health Data Research UK (HDR UK) have developed a national platform for disseminating citable algorithms (including validation information) and tools, which improves research reproducibility.


A primary use of data from electronic health records is to identify disease status, onset and progression, which is done by creating phenotyping algorithms. Creating and evaluating these algorithms is challenging, however, because the data are collected for different purposes, vary in quality and often require significant harmonisation. While considerable effort goes into developing phenotyping algorithms, there is no consistent methodology for creating and evaluating them and no centralised repository for depositing and sharing them.


In a project funded by HDR UK, researchers have developed specialised algorithms that enable phenotypes to be defined from electronic health records. These algorithms identify and extract information from electronic health records using clinical ontology codes, the building blocks of how information is recorded in healthcare (for example, ICD-10 and SNOMED-CT). The algorithms are freely and openly accessible via the HDR UK Phenotype Library, a platform for the creation, storage, dissemination, re-use, evaluation and citation of curated algorithms and metadata.
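As a rough illustration of how a code-based phenotype can work, the sketch below matches coded record entries against a small code list and reports the earliest matching date for each patient. It is a minimal example under stated assumptions, not the Library's own implementation or API: the `CodedEvent` structure, the `first_matching_event` function and the three-code list are hypothetical, and a curated phenotype would hold a much longer, validated code list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedEvent:
    """One coded entry in a patient's record (hypothetical structure)."""
    patient_id: str
    coding_system: str  # e.g. "ICD-10" or "SNOMED-CT"
    code: str           # e.g. "E11.9"
    date: str           # ISO 8601 date, so string comparison orders correctly


# Hypothetical phenotype definition: coding system -> set of codes.
# Illustrative only; not a curated or validated code list.
TYPE_2_DIABETES = {
    "ICD-10": {"E11", "E11.9"},
    "SNOMED-CT": {"44054006"},
}


def first_matching_event(events, phenotype):
    """Return the earliest date of a matching code for each patient."""
    onset = {}
    for ev in events:
        # An event matches if its code is listed under its own coding system.
        if ev.code in phenotype.get(ev.coding_system, set()):
            if ev.patient_id not in onset or ev.date < onset[ev.patient_id]:
                onset[ev.patient_id] = ev.date
    return onset


# Made-up example records.
events = [
    CodedEvent("p1", "ICD-10", "E11.9", "2019-03-02"),
    CodedEvent("p1", "SNOMED-CT", "44054006", "2018-11-20"),
    CodedEvent("p2", "ICD-10", "I10", "2020-01-15"),
]

onset_dates = first_matching_event(events, TYPE_2_DIABETES)
print(onset_dates)  # {'p1': '2018-11-20'}
```

Keeping the code list separate from the matching logic is what allows a definition to be stored, shared and cited independently of any one dataset.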

Using these specialised algorithms, researchers and clinicians can maximise the value of patient data contained in electronic health records and use the data to answer clinically meaningful questions that improve human health and healthcare.

Impact and Outcomes

The Phenotype Library is the world’s largest national standards-driven library of citable phenotyping algorithms, metadata and tools for defining human disease, lifestyle risk factors and biomarkers in electronic health records research.

It provides a step change for the UK electronic health record community by bringing together clinicians, researchers and patients to co-create phenotype definitions within an easy-to-use environment that the research community can enhance and build on.

Its user-centred design provides the basis for an open-source API infrastructure to support research and innovation to improve the health and care of patients. The Library now curates over 100,000 clinical ontology terms into 773 phenotypes (half with computational phenotypes), integrating phenotypes and collections from numerous contributing organisations across the UK. These span disease areas including critical disease (PIONEER), cancer (DATACAN) and heart disease (the British Heart Foundation (BHF) Data Science Centre), among others.

Phenotypes are defined against 24 different research datasets and 16 coding systems, with more being added frequently. New contributions are welcome from anyone working in the field.

The Library is interoperable with other tools and resources, making it part of a broader ecosystem driving the next generation of health research methods. It is currently integrated with the metadata catalogue of the Health Data Research Innovation Gateway (the ‘Gateway’), as well as Phenoflow, a tool enabling workflow-based computable phenotype definitions.

The information and tools contained in the Library support faster, higher-quality and more transparent research. They help researchers maximise the value of the data contained in electronic health records and answer important questions that can improve health and healthcare.

Team Science

Creating the Phenotype Library involved highly collaborative teams distributed across 10 UK universities, each bringing unique domain expertise. Its development was informed by engagement with the user community, including patients and the public.

Click here to access the Phenotype Library 

Click here for the API