Open Science and Open Code
We are helping to accelerate science through open source and open code, enabling vital discoveries that save lives.
We have brought together over 150 repositories of open standards, data and source code, tackling some of the most important challenges in wrangling multi-modal data and generating replicable insights.
Our work is founded on the FAIR principles and open reproducible science. We believe that all research and development activities can maximise their limited resources, and ultimately their impact, through open practices – open standards, open data, open access, open source and open innovation.
Open source is often misunderstood as the sole mechanism to promote open science. We humbly believe that open source is part of a wider open ecosystem approach that encourages open practice in multiple ways – fostering transparency and increasing reuse and collaboration in an open and vibrant knowledge commons.
We adopt, extend and contribute to existing open standards to enable better sharing of data within the ecosystem. Our main dataset schemata library extends the W3C DCAT, schema.org and DUO standards & ontologies. We also contribute to international community standards via the ICODA Workbench development and the GA4GH.
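As a rough illustration of what a machine-readable dataset description built on such standards can look like, the sketch below assembles a minimal schema.org `Dataset` record in JSON-LD form. It uses only core schema.org properties (`name`, `description`, `license`, `keywords`). The function name, the choice of fields and every value are assumptions made for illustration, and this is not HDR UK's actual dataset schema.

```python
import json


def build_dataset_record(title, description, licence_url, keywords):
    """Return a minimal schema.org Dataset description as a JSON-LD style dict.

    Hypothetical helper for illustration only; not part of any HDR UK library.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "license": licence_url,
        "keywords": list(keywords),
    }


# Placeholder values: this does not describe a real dataset.
record = build_dataset_record(
    title="Example cohort dataset",
    description="Placeholder description of an illustrative dataset.",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["example", "health data"],
)

# Serialise to JSON so the record can be shared in machine-readable form.
print(json.dumps(record, indent=2))
```

A fuller schema would add further properties and extend the record with DCAT and DUO terms. The point here is only that a shared vocabulary lets different tools read the same description.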
Where possible, we make open data collected by HDR UK available to be freely accessed, analysed and shared in machine-readable form. Examples of this are our GitHub Datasets and GitHub Papers repositories. Many other examples can be found on our Innovation Gateway and in our open source collection on GitHub.
Find out about the types of health data that we use
We use and extensively contribute to open source by default, under an open permissive license. Our development principles favour an open-first, reuse-based approach: we build on existing open source investments and, where that is not possible, seek out new open-source developments. Our open-source investments and contributions are highlighted here.
We actively seek to build, foster, contribute to and co-create new research ideas, concepts and designs with a design-thinking approach by 1) involving users and the public from the beginning, 2) understanding the existing ecosystem by learning about people, networks, cultures, politics and infrastructures, and ensuring we do not duplicate efforts or violate norms, 3) designing for scale and sustainability by collaborating with stakeholders and the ecosystem, and 4) being collaborative by seeking out partnership opportunities to ensure the collective success of the community. HDR UK’s Innovation Gateway is a true representation of the open innovation culture we wish to foster in the health data research ecosystem.
We are strongly committed to information sharing and transparency, as stated in our Open Access Policy and Attribution Policy, which set out that all research outputs resulting from HDR UK affiliated or acknowledged funding must be broadly and widely disseminated. This includes preprints or journal articles, algorithms or methodologies, reports, case studies and blog posts. We have also blogged about this before here, and below are some of the highlights of our recent research outputs:
Open software offers
Our open software offers advances in:
- Understanding how to tackle heart disease and stroke in the British Heart Foundation Data Science Centre.
Find the code in this GitHub repository
- Defining hundreds of the most common diseases affecting human health: working with text and disease terminologies, we are tackling the challenges of different coding systems in primary and secondary care, and the need for researchers to access validation studies.
Find the code in this GitHub repository
- Converting what doctors write in clinical records into research insights which inform policy: medical notes and text hold vital information, and differ from the words we read every day, so we need new NLP approaches. Read more in this case study highlighting how this code has been used.
Find the code itself in this GitHub repository
- Building the Innovation Gateway, providing a common entry point to discover and enquire about access to UK health datasets for research and innovation. It provides detailed information about the datasets, which are held by members of the UK Health Data Research Alliance, such as a description, size of the population, and the legal basis for access.
RADAR-base mHealth platform
RADAR-base is an award-winning open source mHealth platform that unlocks the potential of wearable and mobile devices for health data research and individual care. It allows passive and active...
CogStack information retrieval and extraction platform
CogStack supports open source healthcare analytics within the NHS, in line with HDR UK’s mission to enable software use whenever possible. The platform implements best-of-breed enterprise...
CALIBER is an exemplar of HDR UK’s commitment to open source. The comprehensive, open-access resource led by Spiros Denaxas provides the research community with information, tools and...
MEDCAT open software is transforming the way people are tackling the challenge of working with unstructured data in health records.
We know that vital information on disease severity and treatment is contained in the notes that doctors keep. One marker of this software's usefulness is its daily download numbers (see Figure from PyPI). HDR UK catalysed the development of MEDCAT through initiation of the National Text Analytics Resource (Richard Dobson and colleagues).
Research insights from its use have informed policy during the pandemic, such as refining the UK’s National Early Warning System to better predict which patients admitted to hospital are most at risk of dying or needing intensive care treatment over the ensuing 14 days.
We believe the HDR UK community has established a strong open practice on which to foster a community of open innovation.
Now we are curating the whole into an exciting, living library as a public good. Over the next few months, we will:
- Provide a narrative Library Catalogue of software which explains to patients, clinicians and other stakeholders what the software does: along the pathway from raw data to reliable insight. We envisage ‘human readable’ narratives to complement vital dialogue among software engineers. This is one of the deliverables of the National Phenomics Resource.
- Identify and showcase the best software based on the three principles identified by the Software Sustainability Institute: originality, significance and rigour through our new Impact Committee, launching end of April 2021 – find out more about the criteria we will base our selection on.
- Build on our work with our research software engineers, digital research technologists, academics and others to recognise and reward software development in the context of career paths through our training portfolio.
- Unveil the next stage of our strategy to train over 10,000 data scientists by the beginning of 2023, which will include the launch of our new Virtual Learning Environment – a Gateway to Education – for health data scientists, technologists and innovators, in Spring 2021.
- Contribute back to existing open source platforms and investments to improve reuse and foster a community.
- Help to make our software more discoverable, not just in individual repositories but across cognate groups of standards, papers, code and other research outputs.