Overview:

This three-day webinar on Docker is split into three sessions that look at using containers (Docker) to reproducibly interrogate multiple biobanks and other data enclaves. The target audience is end users in biotech, including those in pharma, diagnostics and data roles.

The first session on Monday 13 November 2023 will look at using Docker containers reproducibly in multiple computing environments. Reproducibly aggregating and comparing across biobanks requires reproducible software environments such as Docker containers. In this webinar, we’ll review basic Docker concepts and show examples of using existing Docker containers in both high-performance computing (HPC) and cloud computing environments. We’ll learn tricks for using Docker image files effectively and specifying them within existing workflows.

The second session on Thursday 16 November 2023 will look more closely at building and modifying containers for reproducible research. Building your own Docker containers for reproducible research involves modifying and rebuilding existing containers. In this webinar, we will cover the process of building your own container images using Dockerfiles. Once your image is built, it can be shared with others through container image libraries such as Docker Hub.

The third session on Tuesday 21 November 2023 will be a roundtable discussion on phenotype mining and aggregation. Panellists in this session will discuss practical and technical aspects of summary data aggregation after analysis in distinct data enclaves. We will be joined by Hernando Sanches from The Hyve, Dr Tiffany J. Callahan from IBM Research, Deepak Unni from the Swiss Institute of Bioinformatics (SIB) and DNAnexus’ Ben Busby, who will act as moderator.

Discussion prompts will include:

  • Why should people care about data harmonization when they are integrating phenotypic datasets?
  • What are ontologies and why are they important?
  • What ontologies do you use in your work?
  • How do you use those ontologies?
  • Why are large public datasets important to you?
    • UKB and similar governmentally supported biobanks
    • UKB, NCBI, EBI, etc.
    • Health insurance data
  • What new datasets are you excited about?
  • What emerging models are you excited about?

Book your free place via the links below:

Register here for Session 1: 13 November 16:00-17:00 GMT

Register here for Session 2: 16 November 16:00-17:00 GMT

Register here for Session 3: 21 November 16:00-17:00 GMT

Timetable

  • Time & Date:

    13 November 2023, 16:00-17:00 GMT

    Learning Objectives:

    1. Explain basic Docker concepts, including containers, images, and Docker Hub
    2. Utilise Docker containers in a high-performance computing environment
    3. Utilise Docker containers in an existing workflow within a cloud computing environment such as the UK Biobank Research Analysis Platform
    • Docker presentation (the basics and using Docker Hub)
    • Case study: using Docker on the UK Biobank Research Analysis Platform (UKB RAP)
    • Using Docker in a variety of environments

    Register here for Session 1: 13 November 16:00-17:00 GMT

  • Time & Date:

    16 November 2023, 16:00-17:00 GMT

    Learning Objectives:

    1. Review basic interactions with Docker Hub to pull and push container images
    2. Modify and extend existing containers by editing and building from Dockerfiles
    3. Share containers and workflows across Biobanks for reproducible analysis
    • Review: Docker Hub
    • Modification of existing containers
      • Including base containers
    • Sharing containers with colleagues and other scientists
    • Scientific results aggregation

    Register here for Session 2: 16 November 16:00-17:00 GMT

  • Time & Date:

    21 November 2023, 16:00-17:00 GMT

    Roundtable discussion on phenotype mining and aggregation. Panellists in this session will discuss practical and technical aspects of summary data aggregation after analysis in distinct data enclaves. Speakers are TBC but a range of organisations will be represented. Discussion prompts will include:

    • Why should people care about data harmonization when they are integrating phenotypic datasets?
    • What are ontologies and why are they important?
    • What ontologies do you use in your work?
    • How do you use those ontologies?
    • Why are large public datasets important to you?
    • What new datasets are you excited about?
    • What emerging models are you excited about?

    Register here for Session 3: 21 November 16:00-17:00 GMT

Interested in accessing more training from HDR UK? Sign up to HDR UK Futures to access our growing online curriculum.

HDR UK Futures