Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.
Dr Ben Skinner obtained a PhD in Genetics from the University of Kent in 2009, and then performed postdoctoral research at the University of Cambridge on structural and evolutionary genomics – how genomes and the chromosomes they contain change and rearrange over time. In 2019 he joined the University of Essex as a Lecturer in the School of Life Sciences. His research group works on development of image analysis methods, and genome structure and evolution. He teaches computational analysis and programming to undergraduate and postgraduate students.
Dr Dave Clark is a Lecturer in Ecoinformatics within the School of Life Sciences at the University of Essex. He obtained his PhD in Microbiology in 2017, working on microbial community ecology using high-throughput DNA sequencing and data-synthesis approaches. He then held post-doctoral and research fellowship roles within the Institute of Analytics and Data Sciences before progressing to his current position. Dr Clark’s research draws on bioinformatic, statistical, and geographical analyses to answer novel questions on global microbial community ecology. Dr Clark has extensive experience teaching programming skills in R to students of all levels in a variety of disciplines, including practitioners in industry.
Course Content
In the era of misinformation and fake news, engaging people with data in a clear and interpretable way is becoming an essential skill for data scientists in all fields. Whilst there are many software packages available to produce data graphics and conduct statistical analyses, the R programming language is one of the most flexible and feature-rich toolsets available. The purpose of this course is to equip participants with the knowledge to use these tools effectively to communicate concepts and data analyses in a transparent, reproducible, and engaging manner to any audience. In essence, we hope to transform participants into data storytellers by the end of the course.
We will do this by addressing the following five topics:
- Functions, control flow, and automation – How can we use tools in R to make our analyses more efficient and robust, and then communicate and share the code we use by making our own functions?
- Advanced data wrangling with tidyverse & data.table – Using pre-existing packages and frameworks in R, we will learn how to fully explore, clean, and transform our data in a manner that is efficient, scalable to very large datasets, and optimised for speed.
- Data visualisation and graphic design – We will think about the key principles that go into creating effective data visualisations, and how we can build graphics using ggplot2 (and other packages) to communicate the ‘story’ of our data to different types of audiences and in different contexts.
- Reproducible research & version control – How can we maximise the reproducibility of our analyses? We will learn about the tools available to create fully reproducible outputs of many kinds (reports, presentations, web pages etc., using markdown and knitr), and how we can document and share our code and analyses to maximise their impact and keep track of our versions (git).
- Interactive dashboards with Shiny – Creating interactive dashboards to communicate concepts from data can be an effective route to engage stakeholders and non-technical users with your analyses. Here, we will learn how such dashboards can be created using the Shiny package within R, fully integrating all of the key skills and concepts covered in the earlier course material.
Course Objectives:
By the end of the course, using R, participants should be able to:
- Automate code using loops and control flow structures
- Construct and document functions
- Construct data wrangling pipelines using both tidyverse and data.table
- Compare different data wrangling pipelines via benchmarking
- Construct and format different types of data visualisations using ggplot2
- Extend and format plots for specific contexts using ggplot2 extension packages (e.g. gganimate)
- Create reproducible reports using markdown and knitr
- Create different types of outputs using markdown and knitr, including presentations, books, and websites
- Version control and share code via Git / GitHub
- Create interactive data dashboards with Shiny
- Dynamically update data visualisations by downloading data from the web within R
Course Prerequisites:
This course assumes that attendees are beginner to intermediate R users. This means that attendees should have some experience using R, including being able to import and work with data.frames, install and load packages, use basic functions, and make simple plots using base R. Basic statistical knowledge is assumed, including knowledge of summary statistics (means, medians, variance etc.), linear regression, and percentages.
Required reading:
Assuming attendees have prior experience using R, we do not specify any obligatory pre-reading. However, we suggest the following freely available texts as optional reading to supplement the course material.
- R for Data Science
- Advanced R
- ggplot2: Elegant Graphics for Data Analysis
- R Markdown: The Definitive Guide
Background knowledge
Computer background
R = elementary