Many artificial intelligence algorithms are developed using large, publicly accessible datasets. But we know very little about what data is out there, where it comes from, and the people, settings and health conditions it represents. The risk is that new AI health technologies will be based on unrepresentative datasets and will therefore only work for some people in some contexts, leaving countless others behind – an issue known as ‘health data poverty’.
Eye health is one of the leading areas of digital innovation. Through global searches and analysis, this project is mapping which publicly available eye imaging datasets exist and reviewing the extent to which they represent the diversity and the needs of the world population. By understanding and assessing the information being used by AI algorithms, we can identify gaps – such as a lack of basic, clinically important information about the people represented (like age, sex and ethnicity) or disparities in who or which conditions are represented. We can then work to understand why this is the case – and look for possible solutions. For example, if data isn’t publicly available, why not? Are there barriers to collecting it, and how can we overcome them? Or are there particular challenges in making more representative data visible, accessible and usable?
Only 1 of the 98 eye health datasets we identified and assessed came from sub-Saharan Africa – most came from populations in Asia, North America and Europe – and none came from Australia and New Zealand. This means the people and diseases in these datasets represent only a small part of the global population, and others are left ‘off the map’.
The impact and outcomes
The project has already uncovered major gaps in the publicly available data on eye health and highlighted a concerning lack of data on certain conditions, certain parts of the world and certain population groups.
The ambition is to extend these reviews into other health disciplines to understand the scale of the problem, and to make sure that new AI health technologies are based on representative datasets, so that everyone benefits from AI innovations and decision-making for better health and care.
and Pearse Keane.