Roger Beecham is a Lecturer in Geographic Data Science at the University of Leeds; previously he was at the giCentre, City, University of London. His research and teaching demonstrate how new, passively-collected datasets can be repurposed for social science research. This work spans several disciplinary areas: spatial data analysis, information visualization, transport planning, political geography and crime science. A current focus, demonstrated by his recent award-winning work in graphical inference, is how new data and new disciplines such as ‘Data Science’ are reshaping statistical model building, and the role of visualization in supporting this activity.
In modern data analysis, graphics and computational statistics are increasingly used together to explore and identify complex patterns in data and to make and communicate claims under uncertainty. This course will go beyond traditional ideas of charts, graphs, maps (and also statistics!) to equip you with the critical analysis, design and technical skills to analyse and communicate with social science datasets.
The course emphasises real-world applications. You will work both with new, large-scale behavioural datasets and with more traditional administrative datasets located within various social science domains: Political Science, Crime Science, Urban and Transport Planning. As well as learning how to use graphics and statistics to explore patterns in these data, you will learn, implementing recent ideas from data journalism, how to communicate research findings – how to tell stories with data.
• Data and Visualization Fundamentals — tidy data, visual variables and grammar of graphics
• Exploratory visual data analysis — using graphics to explore social outcomes and processes
• Visualization for model building — using graphics with models to evaluate social-spatial processes and outcomes
• Visualization applications — using non-standard data graphics to explore how social processes distribute and interact over space and time
• Communicating with social science datasets — uncertainty visualization, data-driven storytelling, graphical integrity and the reproducibility agenda
The course will consist of short, focussed lectures, followed by more involved practical sessions. All data analysis activities – data collection and processing, visualization design and statistical procedures – will be carried out using the R statistical programming environment.
This is a ‘hands-on’ course that will equip you with the technical and critical-reasoning skills to explore, analyse and communicate with datasets using modern data analysis approaches.
You will collect and work efficiently with large, complex and multivariate social science datasets (tens of millions of records). You will learn how to process and analyse these datasets using high-level and reproducible programming routines, and to write code that creates sophisticated data graphics using leading software libraries and frameworks: the ggplot2 library for declarative visualization design, and the tidyverse and related packages for functional data-processing routines.
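To give a flavour of this style of working, the sketch below pairs a functional tidyverse pipeline with a declarative ggplot2 specification. The dataset (`trips_df`) and its columns are invented for illustration only; they are not course materials.

```r
library(tidyverse)  # loads dplyr, ggplot2 and related packages

# Illustrative data frame standing in for a large behavioural dataset.
trips_df <- tibble(
  area  = rep(c("Hackney", "Camden", "Leeds"), each = 4),
  hour  = rep(c(6, 10, 14, 18), times = 3),
  trips = c(120, 340, 290, 410, 80, 210, 250, 330, 60, 150, 190, 240)
)

# Functional, pipe-based processing: aggregate trips by hour of day.
hourly_totals <- trips_df |>
  group_by(hour) |>
  summarise(total_trips = sum(trips), .groups = "drop")

# Declarative visualization design: map data variables to visual channels.
ggplot(hourly_totals, aes(x = hour, y = total_trips)) +
  geom_col() +
  labs(x = "Hour of day", y = "Total trips",
       title = "Trips by hour (illustrative data)")
```

The pipeline describes *what* summary is wanted rather than *how* to loop over rows, and the plot call describes the mapping from data to graphic rather than drawing commands — the declarative approach the course builds on.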
By the end of the course, you will be able to:
1. Describe, process and combine social-spatial datasets from a range of sources
2. Design non-standard statistical graphics that expose multivariate structure in social-spatial data and be able to critique data graphics using established principles in information visualization
3. Apply modern statistical techniques for analysing, representing and communicating data and model uncertainty
The course will help build technical skills, confidence and creativity when working with data as you develop as a quantitative social scientist.
You should have some existing awareness of general statistical concepts and particularly an understanding of data types. Prior familiarity with the R statistical programming environment is also beneficial.
Before starting the course, you should have some experience in:
• Describing datasets and variables according to type – interval, ratio, ordinal, nominal
• Applying measures of dispersion and central tendency when exploring data – mean, mode, median, standard deviation, percentiles
• Working with estimates of effect size – ratios and proportions, correlation coefficients, z-scores
• Implementing statistical tests to support evaluation of effect sizes – t-tests, chi-square tests, etc.
• Statistical model building (elementary) – running standard linear regression models, interpreting model coefficients, analysing residuals
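In R, the prerequisite skills above amount to being comfortable with code of the following kind. The variables here are simulated purely for illustration; nothing about them comes from the course itself.

```r
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)  # a ratio-scaled explanatory variable
y <- 2 * x + rnorm(200, sd = 5)      # an outcome linearly related to x

# Central tendency and dispersion.
mean(x); median(x); sd(x)
quantile(x, probs = c(0.25, 0.75))

# Effect sizes: correlation coefficient and z-scores.
cor(x, y)
z <- (x - mean(x)) / sd(x)

# A statistical test: compare y across a two-level split of x.
t.test(y ~ factor(x > 50))

# Elementary model building: fit, interpret coefficients, inspect residuals.
model <- lm(y ~ x)
summary(model)$coefficients
plot(residuals(model))
```

If fitting `lm()`, reading off the slope and intercept, and scanning a residual plot feel routine, you meet the expected starting level.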
Representative Background Reading
Wood, J., Badawood, D., Dykes, J. & Slingsby, A. (2011). BallotMaps: Detecting name bias in alphabetically ordered ballot papers. IEEE Transactions on Visualization and Computer Graphics, 17(12), pp. 2384–2391.
Healy, K. (2018). Data Visualization: A Practical Introduction. Princeton, NJ: Princeton University Press. http://socviz.co/
Wickham, H. & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O’Reilly Media. http://r4ds.had.co.nz/
Standards Required for Course
Descriptive Statistics = moderate
OLS = moderate
Maximum Likelihood = elementary
R = moderate