Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on-campus and online students will be taught simultaneously.
Roger Beecham is Associate Professor in Visual Data Science at University of Leeds; previously he was at giCentre, City, University of London. His research and teaching demonstrate how new, passively-collected datasets can be repurposed for social science research. This spans several disciplinary areas: spatial data analysis, information visualization, transport planning, political geography and crime science. A key focus, as demonstrated by his work in graphical inference, is how new data and new disciplines such as ‘Data Science’ are reshaping statistical model building, and the role of visualization in supporting this activity.
Course content
Increasingly in modern data analysis, graphics and computational statistics are used together to explore and identify complex patterns in data and to make and communicate claims under uncertainty. This course will go beyond traditional ideas of charts, graphs, maps (and also statistics!) to equip you with the critical analysis, design and technical skills to analyse and communicate with data.
The course emphasises real-world applications. You will work with both “new”, behavioural datasets and more traditional, administrative datasets located within various social science domains: Political Science, Health Sciences and Urban and Transport Planning. As well as learning how to apply graphics and statistics to explore patterns in data, you will learn how to communicate research findings – how to tell stories with data – drawing on and implementing recent ideas from data journalism.
Thematic topics:
- Data and Visualization Fundamentals — tidy data, visual variables and grammar of graphics.
- Visualization for exploratory data analysis — using graphics to explore social outcomes and processes.
- Visualization applications — using non-standard data graphics to explore how social processes distribute and interact over space and time.
- Visualization for model building – using graphics with models to evaluate social-spatial processes and outcomes.
- Communicating with social science datasets — uncertainty visualization, data-driven storytelling, graphical integrity and the reproducibility agenda.
The course will consist of sessions which blend theory and practical activities as we work together to explore structure in social science data. All data analysis activities – data collection and processing, visualization design and statistical procedures – will be carried out using the R statistical programming environment.
Course objectives
This is a ‘hands-on’ course that will equip you with the technical and critical-reasoning skills to explore, analyse and communicate with datasets using modern approaches.
You will be collecting and working efficiently with large (tens of millions of records), complex and multivariate social science datasets. You will then program sophisticated data graphics and apply statistical computing procedures using established software libraries and frameworks: the ggplot2 library for declarative visualization design, and functional programming approaches using tidyverse and related packages.
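As a rough illustration of the style of work described above, the following minimal sketch summarises a small table with dplyr and then declares a graphic with ggplot2. The `trips` data frame, its column names and its values are hypothetical and invented for this example; they are not course data, and the course datasets are far larger.

```r
library(dplyr)
library(ggplot2)

# Hypothetical bikeshare trips (invented values, for illustration only).
trips <- data.frame(
  user_type = c("member", "member", "casual", "casual", "member", "casual"),
  hour      = c(8, 17, 11, 14, 8, 11),
  duration  = c(12, 15, 34, 41, 10, 28)  # trip duration in minutes
)

# Pipe-based summary: mean trip duration and trip count by user type and hour.
trip_summary <- trips %>%
  group_by(user_type, hour) %>%
  summarise(
    mean_duration = mean(duration),
    n_trips = n(),
    .groups = "drop"
  )

# Declarative graphic: map variables to visual channels, then add geometries.
p <- ggplot(trip_summary, aes(x = hour, y = mean_duration, colour = user_type)) +
  geom_line() +
  geom_point(aes(size = n_trips)) +
  labs(
    x = "Hour of day",
    y = "Mean trip duration (minutes)",
    colour = "User type",
    size = "Trips"
  )

print(p)
```

The graphic is specified by stating which variable maps to which visual channel (position, colour, size) rather than by drawing marks step by step, which is what "declarative" refers to here.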
By the end of the course, you will be able to:
- Describe, process and combine social-spatial datasets from a range of sources.
- Design non-standard statistical graphics that expose multivariate structure in social-spatial data and be able to critique data graphics using established principles in information visualization.
- Apply modern statistical techniques for analysing, representing and communicating data and model uncertainty.
The course helps you to build technical skills, confidence and creativity in applying modern computational approaches to social science datasets.
Course prerequisites
Students should have some existing awareness of general statistical concepts and particularly an understanding of data types. Some familiarity with the R statistical programming environment is also beneficial.
Before starting the course, students should have some experience in:
- Categorising datasets and variables according to type – Interval, Ratio, Ordinal, Nominal.
- Applying measures of dispersion and central tendency when exploring data – Mean, mode, median, standard deviation, percentiles.
- Working with estimates of effect size: Ratios and proportions, correlation coefficients, z-scores.
- Implementing statistical tests to support evaluation of effect size: T-tests, chi-square tests.
- Statistical model building (elementary): Running standard linear regression models, interpreting model coefficients.
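As an informal self-check against the list above, the short base R sketch below touches several of these ideas: summary statistics, z-scores, a t-test and a simple linear regression. The `cyclists` and `drivers` vectors are hypothetical values invented for illustration, and the sketch is not a formal entry requirement. If each step is broadly familiar, the prerequisites are likely met.

```r
# Hypothetical commute distances (km) for two groups of respondents.
cyclists <- c(3.1, 4.5, 2.8, 5.2, 3.9, 4.1)
drivers  <- c(8.4, 12.1, 9.7, 7.5, 11.3, 10.2)

# Central tendency and dispersion.
mean(cyclists)
median(cyclists)
sd(cyclists)
quantile(cyclists, probs = c(0.25, 0.75))

# z-scores: distance of each value from the mean, in standard deviations.
z <- (cyclists - mean(cyclists)) / sd(cyclists)

# Welch two-sample t-test comparing the group means.
tt <- t.test(cyclists, drivers)
tt$p.value

# Simple linear regression with a two-level categorical predictor.
dat <- data.frame(
  distance = c(cyclists, drivers),
  group = rep(c("cyclist", "driver"), each = 6)
)
fit <- lm(distance ~ group, data = dat)

# The 'groupdriver' coefficient is the difference in group means.
coef(fit)
```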
Background Knowledge
Statistics:
OLS = elementary
Maximum Likelihood = elementary
Computer Background:
R = moderate
Maths:
Linear Regression = elementary
Day 1 – Introduction: Computational Methods for Social Data Science
Practical: Download and Configure — R, RStudio (and GitHub)
Reading
- Arribas-Bel, D. and Reades, J. (2018) “Geography and Computers: Past, Present, and Future,” Geography Compass 12, 10: e12403.
- Brunsdon, C. and Comber, A. (2021) “Opening Practice: Supporting Reproducibility and Critical Spatial Data Science,” Journal of Geographical Systems, 23:477-496.
- Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 1, 2, 6, 8.
Day 2 – Data Fundamentals: Describe, Wrangle, Tidy
Practical: Download, rearrange and summarise large bikeshare datasets
Reading
- Padgham, M. and Ellison, R. (2017) “bikedata,” Journal of Open Source Software, 2(20).
- Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 3-12.
- Wickham, H. (2014) “Tidy Data,” Journal of Statistical Software, 59(10):1-23.
Day 3 – Visualization Fundamentals: Codify, Map, Evaluate
Practical: Using visualization fundamentals to expose patterns in election datasets
Reading
- Healy, K. (2018) Data Visualization: A Practical Introduction. Princeton: Princeton University Press. Chapters 3, 4, 7.
- Munzner, T. (2014) Visualization Analysis and Design. AK Peters Visualization Series.
- Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapter 3.
Day 4 – Exploratory Data Analysis: Using Colour and Layout for Comparison
Practical: Exploring what, when and how questions in road safety datasets
Reading
- Beecham, R. and Lovelace, R. (2022) “A framework for inserting visually-supported inferences into geographical analysis workflow: application to road crash analysis.” Geographical Analysis.
- Correll, M. and Heer, J. (2017) “Surprise! Bayesian Weighting for de-Biasing Thematic Maps.” IEEE Transactions on Visualization & Computer Graphics, 23(1):651-660.
- Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapter 7.
- Wood, J., Badawood, D., Dykes, J. and Slingsby, A. (2011) “BallotMaps: Detecting Name Bias in Alphabetically Ordered Ballot Papers,” IEEE Transactions on Visualization and Computer Graphics, 17(12):2384–91.
Day 5 – Exploratory Spatial Networks: Containment and Connection
Practical: Mapping flows of workers
Reading
- Lovelace, R., Nowosad, J. and Muenchow, J. (2019) Geocomputation with R, Chapter 12.
- Wickham, H., Navarro, D. and Pedersen, T. L. (2020) ggplot2: Elegant Graphics for Data Analysis, Springer. Chapter 7.
- Wood, J., Slingsby, A. and Dykes, J. (2011) “Visualizing the Dynamics of London’s Bicycle-Hire Scheme,” Cartographica: The International Journal for Geographic Information and Geovisualization, 4:239–251.
Day 6 – Model Building 1: Expose, Estimate, Evaluate
Practical: Re-visiting the 2016 elections
Reading
- Beecham, R., Williams, N. and Comber, L. (2020) “Regionally-structured explanations behind area-level populism: An update to recent ecological analyses,” PLOS One 15(3):e0229974.
- Healy, K. (2018) Data Visualization: A Practical Introduction, Chapters 6, 7.
- Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 22-25.
Day 7 – Model Building 2: Expose, Estimate, Evaluate
Practical: Re-visiting the 2016 elections
Reading
- Beecham, R., J. Dykes, W. Meulemans, A. Slingsby, C. Turkay, and J. Wood. (2017) “Map Line-Ups: Effects of Spatial Structure on Graphical Inference.” IEEE Transactions on Visualization & Computer Graphics 23 (1): 391–400.
- Loy, A., Hofmann, H. and Cook, D. (2017) “Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners,” Journal of Computational and Graphical Statistics, 26(3):478-492.
- Wolf, L. J., Anselin, L., Arribas-Bel, D. and Mobley, L. R. (2021) “On Spatial and Platial Dependence: Examining Shrinkage in Spatially Dependent Multilevel Models.” Annals of the American Association of Geographers.
Day 8 – Uncertainty Analysis: Quantifying and Understanding Risk
Practical: Quantifying road safety risk
Reading
- Padilla, L., Kay, M. and Hullman, J. (2021) Uncertainty visualization. In Wiley StatsRef: Statistics Reference Online (eds. N. Balakrishnan, T. Colton, B. Everitt, W. Piegorsch, F. Ruggeri and J. L. Teugels). Wiley.
- Wilke, C. (2019) Fundamentals of Data Visualization, Sebastopol, California: O’Reilly Media. Chapter 16.
Day 9 – Data Storytelling: Communicating Social Data Science Findings
Practical: Communicating accelerating and decelerating Covid-19 case trajectories
Reading
- Beecham, R., J. Dykes, L. Hama, and N. Lomax. (2021) “On the use of ‘glyphmaps’ for analysing Covid-19 reported cases.” ISPRS International Journal of Geo-Information 10(4).
- Riche, N. H., Hurter, C., Diakopoulos, N. and Carpendale, S. (2018) Data-Driven Storytelling, Boca Raton, Chapters 5, 7, 9.
- Roth, R. (2020) “Cartographic Design as Visual Storytelling: Synthesis and Review of Map-Based Narratives, Genres, and Tropes.” The Cartographic Journal.
- Spiegelhalter, D. (2019) The Art of Statistics, Pelican.
Day 10 – Sharing your Social Data Science Research
Practical: Organising and publishing your projects with GitHub and RMarkdown
Reading
- Bryan, J. (2020) Happy Git and GitHub for the R User.
- Brunsdon, C. and Comber, A. (2021) “Opening Practice: Supporting Reproducibility and Critical Spatial Data Science,” Journal of Geographical Systems, 23:477-496.
- Wood, J., A. Kachkaev, and J. Dykes. (2018) “Design Exposition with Literate Visualization.” IEEE Transactions on Visualization and Computer Graphics, 25(1):759–68.