Throughout my six-week internship in the Genetic Epidemiology group at the University of Leicester, I was given the opportunity to work on an exciting project about asthma in individuals from the UK. Asthma is a common, long-term disease of the airways in the lungs. About 5-10% of people with asthma have a severe form of the disease that is not helped by currently available treatments. The aim of my project was to identify individuals with severe asthma in the UK Biobank cohort, to enable further studies on the genetic causes of severe asthma. One question I wanted to answer was how many severe asthma patients there are in the whole UK Biobank cohort and, more specifically, how this number relates to ancestry. I was also interested in analysing how the different health care records link together to define severe asthma, and what conclusions I could draw from them.

I analysed self-reported medications and hospital inpatient records to identify severe asthma in all UK Biobank participants. A previous study had already been conducted, but it was limited to participants of European ancestry and covered only a third of the cohort with genetic data. Using R within the Linux environment, I analysed these health care records and identified 15,600 severe asthmatics in a diverse population of over 487,400 UK Biobank participants. The percentage with severe asthma was highest among participants of South Asian descent.

During my internship, I analysed UK Biobank health data through the High Performance Computing (HPC) service ALICE, which was based on Linux. Keeping patient data safe is crucial, so access was restricted to staff members who had been given permission. Patient records were also protected in many other ways, such as anonymisation of the data.

I am grateful for the many skills and processes I have learnt throughout my internship. One particular skill I developed was using R within the Linux environment. Although I was familiar with RStudio, I had not used Linux before this internship, so learning it was a great skill to gain. Outside of my internship project, I attended a Linux workshop, which was very insightful and helped me throughout my work.

Additionally, I was able to develop my organisational skills, as I worked on two projects simultaneously. Aside from defining severe asthma, I also teamed up with seven other interns on the ‘Technical Team Challenge’, a group project set by HDR UK to analyse a data set of our choice. Working on several tasks concurrently is something I was able to master during my time as an intern.

Attending weekly Genetic Epidemiology group team meetings and journal clubs also enabled me to further develop my communication skills. I learnt what working as a health data scientist truly entails. During these meetings I was able to absorb a lot of information about different health data projects, which further stimulated my interest in health data science. This allowed me to understand and appreciate the work carried out to help improve the health of individuals.  

The research I have undertaken will benefit asthma patients by identifying individuals with severe asthma in a large health research cohort. This information will be used in studies to identify genetic risk factors for severe asthma, and could potentially enable the development of more effective therapies and treatments for patients with severe asthma.