Being able to link addresses across systems is a valuable resource for health data science. However, addresses are often not standardised, despite a government push towards standardisation. Researchers at the Clinical Effectiveness Group (CEG) at Queen Mary University of London, supported by HDR UK and Endeavour Health, have developed an algorithm to accurately match addresses from health records to a reference database, providing a powerful tool for future work.
Health data researchers are increasingly linking data from different sources to draw new insights from the connections. This includes location information to allow, for instance, linking health records to environmental data. To do this, the non-standardised addresses in health records need to be linked to reference databases that contain unique property reference numbers (UPRNs) and coordinates.
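The linkage step described above can be illustrated with a minimal sketch. It is not the ASSIGN algorithm: it assumes a simple approach of normalising both the health-record address and the reference address, then looking for an exact match. The abbreviation table, function names, addresses and UPRN values are all hypothetical.

```python
import re

# Hypothetical abbreviation table for illustration; not taken from ASSIGN.
# Note that a naive rule such as "st" -> "street" would mis-handle "St" (Saint).
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue", "apt": "flat"}


def normalise(address: str) -> str:
    """Lower-case, replace punctuation with spaces, expand abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def build_index(reference):
    """Map each normalised reference address to its UPRN."""
    return {normalise(addr): uprn for uprn, addr in reference}


def match(address: str, index: dict):
    """Return the UPRN for an address, or None if there is no exact match."""
    return index.get(normalise(address))


# Made-up reference rows: (UPRN, address). These are not real UPRNs.
reference = [
    ("100000000001", "Flat 2, 10 Example Road, London E1 1AA"),
    ("100000000002", "12 Sample Street, London E2 2BB"),
]
index = build_index(reference)

matched = match("flat 2 10 example rd. london e1 1aa", index)
unmatched = match("99 Nowhere Lane, London", index)
```

In this sketch the first lookup succeeds only because normalisation removes the differences in case, punctuation and abbreviation. Real patient-recorded addresses vary in many more ways, which is why a purpose-built and tested algorithm is needed.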
Address-matching algorithms are available in the UK, but they are either not transparent or have not been tested against patient-recorded addresses in electronic health records. Linking places to people is a key part of the Government’s policy to connect databases together and improve lives.
Researchers supported by HDR UK and Endeavour Health Charity have developed a new algorithm specifically designed for working with electronic health records to match the registered addresses to UPRNs. The algorithm, called ASSIGN, was tested using gold-standard datasets from London and Wales, each containing over 9,000 addresses. The team then applied the ASSIGN algorithm to the recorded addresses of a sample of 1,700,000 adults registered with all general practices in northeast London.
The aim was to transparently carry out quality assurance and examine potential biases in matching, using multivariable analyses to estimate the likelihood of a match by demographic, registration and organisational variables.
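As a simplified illustration of examining bias in matching, the sketch below computes the match rate per group and a crude (unadjusted) odds ratio between two groups. This is a swapped-in, simpler technique than the multivariable analysis used in the study, which adjusts for several variables at once. The field names, group labels and counts are invented for the example.

```python
from collections import Counter


def match_rates(records, key):
    """Proportion of records with a UPRN match, per value of `key`."""
    totals, matched = Counter(), Counter()
    for record in records:
        totals[record[key]] += 1
        matched[record[key]] += record["matched"]  # True counts as 1
    return {group: matched[group] / totals[group] for group in totals}


def crude_odds_ratio(records, key, exposed, reference):
    """Unadjusted odds ratio of a match: `exposed` group vs `reference` group.

    Raises ZeroDivisionError if either group has no unmatched records.
    """
    def odds(group):
        hits = sum(1 for r in records if r[key] == group and r["matched"])
        misses = sum(1 for r in records if r[key] == group and not r["matched"])
        return hits / misses

    return odds(exposed) / odds(reference)


# Invented data: 10 people who moved recently, 10 who did not.
records = (
    [{"group": "moved", "matched": True}] * 8
    + [{"group": "moved", "matched": False}] * 2
    + [{"group": "stable", "matched": True}] * 9
    + [{"group": "stable", "matched": False}] * 1
)

rates = match_rates(records, "group")
odds_ratio = crude_odds_ratio(records, "group", "moved", "stable")
```

An odds ratio below 1 here would indicate that the "moved" group is less likely to be matched than the "stable" group, before any adjustment for other variables.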
Impact and outcomes
The researchers found that ASSIGN had a match rate of at least 99.5% in the gold-standard datasets and 98.6% in the northeast London study population. The 1.4% without a UPRN match were more likely to have changed registered address in the last 12 months, to be from a Chinese ethnic background, or to be registered with a GP using the SystmOne clinical record system. People who had been registered with their GP for more than 6.5 years were more likely to have a match than those who had registered more recently.
The paper describing ASSIGN was published in the International Journal of Population Data Science, and the algorithm code is available as open source for others to use freely. The work done to identify the accuracy and biases of the address-matching algorithm will support the use of UPRNs in electronic health records and potentially in other sectors. It builds on the pioneering work of Welsh HDR UK colleagues, who more than a decade ago proposed the use of Residential Anonymised Linkage Fields to enable evaluation of the wider determinants of health. ASSIGN is to be implemented in Trusted Research Environments in both Wales and Scotland, providing a uniform way of assigning UPRNs in collaborative work that involves addresses from more than one nation.
The impact committee thought that this was impressive research. The incredible accuracy of the algorithm and the thoroughness of validation in a large population mean that ASSIGN is likely to have a significant impact on research and, ultimately, patients.