Roger Beecham is a Lecturer in Geographic Data Science at the University of Leeds; previously he was at giCentre, City, University of London. His research and teaching demonstrate how new, passively-collected datasets can be repurposed for social science research. This spans several disciplinary areas: spatial data analysis, information visualization, transport planning, political geography and crime science. A current focus, as demonstrated by his recent award-winning work in graphical inference, is how new data and new disciplines such as ‘Data Science’ are reshaping statistical model building, and the role of visualization in supporting this activity.

Course Content
In modern data analysis, graphics and computational statistics are increasingly used together to explore and identify complex patterns in data and to make and communicate claims under uncertainty. This course will go beyond traditional ideas of charts, graphs, maps (and also statistics!) to equip you with the critical analysis, design and technical skills to analyse and communicate with social science datasets.

The course emphasises real-world applications. You will work with both new, large-scale behavioural datasets and more traditional administrative datasets from various social science domains: Political Science, Crime Science, and Urban and Transport Planning. As well as learning how to use graphics and statistics to explore patterns in these data, you will draw on recent ideas from data journalism to learn how to communicate research findings: how to tell stories with data.

Thematic topics:

• Data and Visualization Fundamentals — tidy data, visual variables and grammar of graphics
• Exploratory visual data analysis — using graphics to explore social outcomes and processes
• Visualization for model building – using graphics with models to evaluate social-spatial processes and outcomes
• Visualization applications — using non-standard data graphics to explore how social processes distribute and interact over space and time
• Communicating with social science datasets — uncertainty visualization, data-driven storytelling, graphical integrity and the reproducibility agenda

The course will consist of short, focussed lectures, followed by more involved practical sessions. All data analysis activities – data collection and processing, visualization design and statistical procedures – will be carried out using the R statistical programming environment.

Course Objectives
This is a ‘hands-on’ course that will equip you with the technical and critical-reasoning skills to explore, analyse and communicate with datasets using modern data analysis approaches.

You will collect and work efficiently with large, complex and multivariate social science datasets (tens of millions of records). You will learn how to process and analyse these datasets using high-level, reproducible programming routines, and to write code that creates sophisticated data graphics using leading software libraries and frameworks: the ggplot2 library for declarative visualization design, and functional programming routines using tidyverse and related packages.
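As a rough illustration of this style of working, the sketch below reshapes a small dataset into tidy form with tidyverse functions and then describes a plot declaratively with ggplot2. The USArrests dataset (bundled with R) and all object names are assumptions chosen purely for illustration; they are not course materials, and the course datasets are far larger.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# USArrests ships with R: arrests per 100,000 residents for 50 US states (1973),
# plus UrbanPop, the percentage of each state's population living in urban areas.
# Move the state names from row names into an ordinary column.
arrests <- data.frame(state = rownames(USArrests), USArrests, row.names = NULL)

# Tidy form: one row per state-offence pair instead of one column per offence.
arrests_tidy <- arrests %>%
  pivot_longer(
    cols = c(Murder, Assault, Rape),
    names_to = "offence",
    values_to = "rate"
  )

# Declarative plot specification: map variables to visual channels,
# choose a mark (points), and split into small multiples by offence.
arrest_plot <- ggplot(arrests_tidy, aes(x = UrbanPop, y = rate)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~offence, scales = "free_y") +
  labs(x = "Urban population (%)", y = "Arrests per 100,000 residents")

# Run print(arrest_plot) to draw the figure.
```

Because the data are in tidy form, changing the design (for example, colouring by offence rather than faceting) only requires editing the plot specification, not reshaping the data again.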

By the end of the course, you will be able to:

1. Describe, process and combine social-spatial datasets from a range of sources

2. Design non-standard statistical graphics that expose multivariate structure in social-spatial data, and critique data graphics using established principles in information visualization

3. Apply modern statistical techniques for analysing, representing and communicating data and model uncertainty

The course will help build technical skills, confidence and creativity when working with data as you develop as quantitative social scientists.

Course Prerequisites
You should have some existing awareness of general statistical concepts and particularly an understanding of data types. Prior familiarity with the R statistical programming environment is also beneficial.

Before starting the course, you should have some experience in:

• Describing datasets and variables according to type: interval, ratio, ordinal, nominal
• Applying measures of dispersion and central tendency when exploring data: mean, mode, median, standard deviation, percentiles
• Working with estimates of effect size: ratios and proportions, correlation coefficients, z-scores
• Implementing statistical tests to support evaluation of effect sizes: t-tests, chi-square tests, etc.
• Statistical model building (elementary): running standard linear regression models, interpreting model coefficients, analysing residuals
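If you would like to check your familiarity with these prerequisites, the following base R sketch touches each item in turn. It is only an informal self-check under stated assumptions: the USArrests dataset (bundled with R), the derived columns more_urban and high_murder, and the choice of model are all hypothetical examples rather than part of the course.

```r
# USArrests ships with R: arrests per 100,000 residents by US state (1973).
# Add two illustrative (hypothetical) grouping variables split at the median.
arrests <- transform(
  USArrests,
  more_urban  = UrbanPop > median(UrbanPop),
  high_murder = Murder > median(Murder)
)

# Central tendency and dispersion.
murder_centre <- c(mean = mean(arrests$Murder), median = median(arrests$Murder))
murder_spread <- c(sd = sd(arrests$Murder), quantile(arrests$Murder, c(0.25, 0.75)))

# Effect sizes: z-scores and a correlation coefficient.
murder_z <- (arrests$Murder - mean(arrests$Murder)) / sd(arrests$Murder)
murder_assault_r <- cor(arrests$Murder, arrests$Assault)

# Statistical tests: a two-sample t-test and a chi-square test of independence.
tt <- t.test(Murder ~ more_urban, data = arrests)
chi <- chisq.test(table(arrests$more_urban, arrests$high_murder))

# Elementary model building: linear regression, coefficients and residuals.
fit <- lm(Murder ~ Assault + UrbanPop, data = arrests)
fit_coefs <- coef(fit)
fit_resid <- resid(fit)

# Inspect the model with summary(fit) and plot(fitted(fit), fit_resid).
```

If each step here reads as familiar, you have the statistical background the course assumes.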

 

Representative Background Reading

Wood, J., Badawood, D., Dykes, J. & Slingsby, A. (2011). BallotMaps: Detecting name bias in alphabetically ordered ballot papers. IEEE Transactions on Visualization and Computer Graphics, 17(12), pp. 2384-2391.

 

Required texts

Kieran Healy, Data Visualization: A Practical Introduction (Princeton: Princeton University Press, 2018), http://socviz.co/.

Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (Sebastopol, California: O’Reilly Media, 2017), http://r4ds.had.co.nz/.

Standards Required for Course

Statistics Background
Descriptive Statistics = moderate
OLS = moderate
Maximum Likelihood = elementary

Computer Background
R = moderate

 

Day 1  

Lecture : Introduction and Foundations — Exploring Society through Visualization and Modelling

Practical : Download and Configure — R, RStudio and GitHub

Reading :

* Arribas-Bel, D. and Reades, J. (2018) “Geography and Computers: Past, Present, and Future,” Geography Compass 12, 10: e12403, doi:10.1111/gec3.12403

* Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 1, 2, 6, 8.

Day 2

Lecture : Fundamentals — Tidy data, visual variables and grammar of graphics

Practical : Tidyverse and ggplot2

Reading :

* Healy, K. (2018) Data Visualization: A Practical Introduction, Chapter 1.

* Munzner, T. (2015) Visualization Analysis and Design, Chapters 1, 2 and 5.

* Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 3-12.

Day 3

Lecture : Grammar of graphics and tidy data to explore socio-spatial outcomes

Practical : Exploring voting outcomes

Reading :

* Healy, K. (2018) Data Visualization: A Practical Introduction, Chapters 3, 4, 7.

* Gamio, L. and Keating, D. (2016) How Trump redrew the electoral map, from sea to shining sea, The Washington Post

Day 4

Lecture : Grammar of graphics and tidy data for exploratory model building

Practical : Explaining voting outcomes

Reading :

* Beecham, R., Slingsby, A., and Brunsdon, C. (2018). Locally-varying explanations behind the United Kingdom’s vote to leave the European Union. Journal of Spatial Information Science, 16:117–136.

* Beecham, R., Williams, N., and Comber, L. (forthcoming). Regionally-structured explanations behind area-level populism: an update to recent ecological analyses, PLOS ONE.

* Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 22-25.

Day 5

Lecture : Grammar of graphics and tidy data for model evaluation

Practical : Presenting models of voting outcomes

Reading :

* Healy, K. (2018) Data Visualization: A Practical Introduction, Chapters 6, 7.

* Loy, A., Hofmann, H. and Cook, D. (2017) Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners, Journal of Computational and Graphical Statistics, 26(3):478-492

* Beecham, R., Dykes, J., Meulemans, W., Slingsby, A., Turkay, C., and Wood, J. (2017). Map line-ups: effects of spatial structure on graphical inference. IEEE Transactions on Visualization and Computer Graphics, 23(1):391–400

Day 6

Lecture : Applications part 1 – Collecting and wrangling multivariate social-science data

Practical : Signals from noise : statistical process control charts for road safety monitoring

Reading :

* Lovelace, R., Morgan, M., Hama, L. and Padgham, M. (2019) “Stats19: A Package for Working with Open Road Crash Data”, Journal of Open Source Software, 4(33): 1181.

* Aylin, P., Best, N., Bottle, A. and Marshall, C. (2003) “Following Shipman: a pilot system for monitoring mortality rates in primary care”, The Lancet, vol. 362, no. 9382, pp. 485–491.

* Lovelace, R., Nowosad, J. and Muenchow, J. (2019) Geocomputation with R, Chapters 1-6.

Day 7

Lecture : Applications part 2 – Collecting and wrangling social-spatial network data

Practical : Exploring and visualizing flows

Reading :

* Lovelace, R., Nowosad, J. and Muenchow, J. (2019) Geocomputation with R, Chapter 12.

* Wood, J., Slingsby, A. and Dykes, J. (2011). Visualizing the dynamics of London’s bicycle hire scheme. Cartographica, 46(4), pp. 239-251.

* Beecham, R. and Slingsby, A. (2019). Characterising labour market self-containment in London with geographically arranged small multiples. Environment and Planning A: Economy and Space.

* Beecham, R. and Wood, J. (2014). Exploring gendered cycling behaviours within a large-scale behavioural data-set. Transportation Planning and Technology, 37(1):83–97.

Day 8

Lecture : Communicating with data part 1 —  Uncertainty visualization

Practical : TBC

Reading :

* Correll, M. and Heer, J. (2017) Surprise! Bayesian Weighting for De-Biasing Thematic Maps, IEEE Transactions on Visualization and Computer Graphics, 23(1): 651-660

* Dragicevic, P., Jansen, Y., Sarma, A., Kay, M. and Chevalier, F. (2019) Increasing the Transparency of Research Papers with Explorable Multiverse Analyses. CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK.

* Kale, A., Nguyen, F., Kay, M. and Hullman, J. (2018) Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data, IEEE Transactions on Visualization and Computer Graphics, 25(1):892-902.

* Kay, M. (2018) Tidy data and Bayesian Analysis, OpenVis2018.

* Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 26-30.

Day 9

Lecture : Communicating with data part 2 — Data-driven storytelling

Practical : TBC

Reading :

* Healy, K. (2018) Data Visualization: A Practical Introduction, Chapter 8.

* Wood, J. (2015) Visualizing personal progress in participatory sports cycling events. IEEE Computer Graphics and Applications, 35(4), 73-81.

* Riche, N.H., Hurter, C., Diakopoulos, N. and Carpendale, S. (2018) Data-Driven Storytelling, CRC Press.

Day 10

Lecture : Consolidation — reproducible workflows and showcasing your work

Practical : TBC