
Roger Beecham is a Lecturer in Geographic Data Science at the University of Leeds; previously he was at the giCentre, City, University of London. His research and teaching demonstrate how new, passively-collected datasets can be repurposed for social science research. This work spans several disciplinary areas: spatial data analysis, information visualization, transport planning, political geography and crime science. A current focus, as demonstrated by his recent work in graphical inference, is how new data and new disciplines such as ‘Data Science’ are reshaping statistical model building, and the role of visualization in supporting this activity.
 

Course Content
Increasingly in modern data analysis, graphics and computational statistics are used together to explore and identify complex patterns in data and to make and communicate claims under uncertainty. This course will go beyond traditional ideas of charts, graphs, maps (and also statistics!) to equip you with the critical analysis, design and technical skills to analyse and communicate with data.

The course emphasises real-world applications. You will work with both “new”, behavioural datasets and more traditional, administrative datasets located within various social science domains: Political Science, Health Sciences, and Urban and Transport Planning. As well as learning how to apply graphics and statistics to explore patterns in data, you will learn how to communicate research findings (how to tell stories with data), drawing on and implementing recent ideas from data journalism.
Thematic topics:
• Data and Visualization Fundamentals: tidy data, visual variables and grammar of graphics
• Exploratory visual data analysis: using graphics to explore social outcomes and processes
• Visualization for model building: using graphics with models to evaluate social-spatial processes and outcomes
• Visualization applications: using non-standard data graphics to explore how social processes distribute and interact over space and time
• Communicating with social science datasets: uncertainty visualization, data-driven storytelling, graphical integrity and the reproducibility agenda
The course will consist of sessions that blend theory and practical activities as we explore structure in social science data together. All data analysis activities (data collection and processing, visualization design and statistical procedures) will be carried out using the R statistical programming environment.

Course Objectives

This is a ‘hands-on’ course that will equip you with the technical and critical-reasoning skills to explore, analyse and communicate with datasets using modern approaches.
You will be collecting and working efficiently with large (tens of millions of records), complex and multivariate social science datasets. You will then program sophisticated data graphics and apply statistical computing procedures using established software libraries and frameworks: the ggplot2 library for declarative visualization design, and functional programming approaches using the tidyverse and related packages.
By the end of the course, you will be able to:
1. Describe, process and combine social-spatial datasets from a range of sources
2. Design non-standard statistical graphics that expose multivariate structure in social-spatial data, and critique data graphics using established principles in information visualization
3. Apply modern statistical techniques for analysing, representing and communicating data and model uncertainty
The course helps you to build technical skills, confidence and creativity in applying modern computational approaches to social science datasets.

Course Prerequisites

Students should have some existing awareness of general statistical concepts and, in particular, an understanding of data types. Some familiarity with the R statistical programming environment is also beneficial.
Before starting the course, students should have some experience in:
• Categorising datasets and variables according to type: interval, ratio, ordinal, nominal
• Applying measures of dispersion and central tendency when exploring data: mean, mode, median, standard deviation, percentiles
• Working with estimates of effect size: ratios and proportions, correlation coefficients, z-scores
• Implementing statistical tests to support evaluation of effect size: t-tests, chi-square tests
• Statistical model building (elementary): running standard linear regression models, interpreting model coefficients
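As a quick self-check on the descriptive measures listed above, the following is a minimal sketch of central tendency, dispersion and z-scores. It makes two assumptions that are not part of the course materials: it uses Python's standard library purely for illustration (the course itself works in R), and the `durations` values are hypothetical trip durations invented for the example.

```python
import statistics


def summarise(values):
    """Return basic measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Sample standard deviation (divides by n - 1).
        "sd": statistics.stdev(values),
    }


def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return [(value - centre) / spread for value in values]


# Hypothetical trip durations in minutes (illustrative only, not course data).
durations = [4, 6, 6, 8, 11]
summary = summarise(durations)
scores = z_scores(durations)
```

If you can predict roughly what `summary` and `scores` contain before running the code, and explain why the z-scores sum to zero, you are likely comfortable with the first three prerequisites.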

Representative Background Reading

Wood, J., Badawood, D., Dykes, J. & Slingsby, A. (2011). BallotMaps: Detecting name bias in alphabetically ordered ballot papers. IEEE Transactions on Visualization and Computer Graphics, 17(12), pp. 2384-2391.

Required texts

Kieran Healy, Data Visualization: A Practical Introduction (Princeton: Princeton University Press, 2018), http://socviz.co/.

Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (Sebastopol, California: O’Reilly Media, 2017), http://r4ds.had.co.nz/.

Standards Required for Course

Statistics Background
Descriptive Statistics = moderate
OLS = moderate
Maximum Likelihood = elementary

Computer Background
R = moderate

Day 1 –
Introduction: Why, What and How of Social Data Science
Practical: Download and configure R, RStudio (and GitHub)
Reading
• Arribas-Bel, D. and Reades, J. (2018) “Geography and Computers: Past, Present, and Future,” Geography Compass 12, 10: e12403
• Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 1, 2, 6, 8
 
Day 2 –
Data Fundamentals: Describe, Wrangle, Tidy
Practical: Download, rearrange and summarise large bikeshare datasets
Reading
• Healy, K. (2018) Data Visualization: A Practical Introduction, Chapter 1
• Padgham, M. and Ellison, R. (2017) bikedata, Journal of Open Source Software, 2(20)
• Wickham, H. and Grolemund, G. (2017) R for Data Science, Chapters 3-12
• Wickham, H. (2014) Tidy Data, Journal of Statistical Software, 59(10)
 
Day 3 –
Visualization Fundamentals: Codify, Map, Evaluate
Practical: Using visualization fundamentals to expose patterns in election datasets
Reading
• Healy, K. (2018) Data Visualization: A Practical Introduction, Chapters 3, 4, 7
• Roth, R. (2017) Visual Variables in The International Encyclopedia of Geography: People, The Earth, Environment, and Technology. Wiley
• Tufte, E. (2001) The Visual Display of Quantitative Information, Graphics Press
 
Day 4 –
Visualization for Exploratory Data Analysis: Colour and Layout
Practical: Exploring what, when and how questions in road safety datasets
Reading
• Healy, K. (2018) Data Visualization: A Practical Introduction, Chapters 3, 4, 7
• Lovelace, R., Morgan, M., Hama, L. and Padgham, M. (2019) stats19: A Package for Working with Open Road Crash Data, Journal of Open Source Software, 4(33): 1181
• Tufte, E. (2001) The Visual Display of Quantitative Information, Graphics Press
• Rost, L-C. (2018) Your friendly guide to colors in data visualization. Datawrapper blog
 
Day 5 –
Visualization for Exploratory Geospatial Data Analysis: Containment and Connection
Practical: Mapping flows of workers
Reading
• Munzner, T. (2015) Chapter 9: Arrange Networks and Trees, pp.200-217 in Visualization Analysis and Design, CRC Press
• Lovelace, R., Nowosad, J. and Muenchow, J. (2019) Geocomputation with R, Chapter 12
• Wood, J., Dykes, J. and Slingsby, A. (2010) Visualization of Origins, Destinations and Flows with OD Maps, The Cartographic Journal, pp. 117-129
 
Day 6 –
Model Building 1: Expose, Estimate, Evaluate
Practical: Re-visiting the 2016 election
Reading
• Healy, K. (2018) Data Visualization: A Practical Introduction, Chapters 6, 7
• Loy, A., Hofmann, H. and Cook, D. (2017) Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners, Journal of Computational and Graphical Statistics, 26(3):478-492
 
Day 7 –
Model Building 2: Expose, Estimate, Evaluate
Practical: Re-visiting the 2016 election
Reading
• Healy, K. (2018) Data Visualization: A Practical Introduction, Chapters 6, 7
• Loy, A., Hofmann, H. and Cook, D. (2017) Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners, Journal of Computational and Graphical Statistics, 26(3):478-492
 
Day 8 –
Uncertainty Analysis: Quantifying and Understanding Risk
Practical: Quantifying road safety risk
Reading
• Padilla, L., Kay, M. and Hullman, J. (in press) Uncertainty Visualization. To appear in Handbook of Computational Statistics and Data Science
• Spiegelhalter, D. (2019) The Art of Statistics, Pelican, Chapters 12, 13
 
Day 9 –
Data Storytelling: Communicating Social Data Science Findings
Practical: Is Cycling in London getting more or less dangerous?
Reading
• Riche, N. H., Hurter, C., Diakopoulos, N. and Carpendale, S., Data-Driven Storytelling, Boca Raton, Chapters 5, 7, 9
• Spiegelhalter, D. (2019) The Art of Statistics, Pelican.
 
Day 10 –
Sharing your Social Data Science Research
Practical: Organising and publishing your projects with GitHub and RMarkdown
Reading
• Bryan, J. (2020) Happy Git and GitHub for the R User
• Brunsdon, C. and Comber, L. (2020) Opening practice: supporting reproducibility and critical spatial data science, Journal of Geographical Systems