Note: This course, offered by the UK Data Service, follows a different structure from other courses at ESS-SSDA. It runs in a one-week, full-day format, so it will not be possible to enroll concurrently in two-week courses in session one.

Louise Corti is an Associate Director at the UK Data Archive and heads up the Data Services teams. Her research activities focus on standards and technologies for reviewing, curating and reusing digital social science data, particularly using open source infrastructures and tools. She is an author of the Sage Publications book Managing and Sharing Research Data: A Guide to Good Practice, and of many chapters and articles on qualitative data sharing. Louise teaches regularly and set up the summer school on Encounters with Big Data in 2016, which has run four times.

Simon Parker is the Data Liaison Manager for the Cancer Intelligence team at Cancer Research UK. He has overseen the development of infrastructure to support the safe use of sensitive research data for cancer researchers and produced a long-term research data strategy for the Charity. He has co-authored the Handbook on Statistical Disclosure Control and written associated training materials. He previously worked at the UK Data Service with a focus on the Secure Lab, and has taught on a previous summer school on preparing and using big data in the social sciences in the UK.

 

 


This week-long course, run by the UK Data Service, introduces key concepts and discussions around using big data in the social sciences. It introduces approaches to, and open source tools for, exploring and analysing new and novel forms of data. It looks at the challenges of reproducibility in social science and covers best practices in transparency for data creation, manipulation and analysis. The course, aimed at researchers, statisticians and data analysts, covers aspects of data evaluation (ethical, legal and practical), extraction, exploration, basic analysis and visualisation of data from the web, using Spark R and various R packages. In addition to the hands-on lab sessions, participants spend a full day on group projects, applying what they have learned to real data challenges. The course focuses mostly on numeric data and does not cover text, social media or audio sources in any detail.

This course is introductory, but students will be expected to have experience using quantitative research data in the social sciences. This includes a good understanding of statistical methodology and concepts such as standard error and standard deviation, and competence in writing commands in a statistical computing environment such as Stata, R or SPSS.

  Introducing big data research
  Manipulating and analysing data using Spark
  Manipulating data using Hive
  Tools and techniques for dealing with external data
  Transparency agenda and GitHub
  Creating interactive maps in R with Leaflet
  ODBC in Excel and R

Prerequisites
OLS – elementary
R – elementary

Monday 20 July:  Introducing big data research

10.00

Welcome

 

Introduction

Presentation: Introduction to the summer school course

Discussion: Participants’ background and expectations. All

Presentation: Big data services in practice: demo of Smart Energy Research Lab (SERL)

 

Coffee break

 

Big data, social science and social surveys

Presentation: National statistics: Big data instead of social surveys?

Exercise: National statistics experiment: discuss and investigate non-traditional data sources

13:30-14:15

Lunch

 

Big data: ethics and risk

Presentation: Ethics and rights in big data: risk, harm, governance, IPR, and 5 safes

Exercise: Sharing data debate

Case study: Researching the Dark Web. Christian Kemp or similar

 

Coffee break

 

Demo: Introduction to documenting code: Jupyter Notebooks and R Markdown

Exercise: Keeping track of your work. Introducing Jupyter Notebooks and R Markdown

17:45

Close

 

 

Tuesday 21 July: Obtaining, assessing and exploring big data using Spark R and R

10.00

 

 

Obtaining and managing big data

Demo and exercise: Introduction to R and Spark

Demo and exercise: Overview of data wrangling with R and Spark, including linking and merging data sources

 

Coffee break

 

Tools and techniques for getting and converting data from external sources

Presentation, demo and exercise: Querying APIs

Presentation, demo and exercise: Handling JSON formats

Presentation, demo and exercise: SQL queries and using Open Database Connectivity (ODBC)

13:30-14:15

Lunch

 

Making big data research ready

Presentation: Assessing and dealing with dirty data

Exercise: Assessing, dealing with and cleaning dirty data, using QAMyData and R packages

 

 

 

Assessing disclosure risk: data and published outputs

Presentation: Assessing disclosure risk in microdata

17:45

Close

 

Wednesday 22 July:  Assessing, manipulating, exploring and analysing big data

10.00

 

 

Assessing disclosure risk in microdata

Exercise: Assessing disclosure risk in data using sdcMicro

 

Coffee break

 

Exploring data with Spark and R

Demo and exercise: Basic data visualisation and modelling with R and Spark

13:30-14:15

Lunch

 

Exploring data with Spark and R

Demo and exercises: Analysis using R packages and applying R Markdown

 

Coffee break

 

Maps: Creating maps in R with Leaflet

Demo and exercise: Creating interactive maps in R with Leaflet

17:45

Close

 

 

Thursday 23 July:  Publishing big data and group projects

10.00

 

 

Publishing and being transparent with big data

Presentation: Being transparent in science, publishing data and code

Exercise: Reproducible code

 

Coffee break

 

Publishing and being transparent with big data

Demo and exercise: Create your own GitHub account and repository

13:30-14:15

Lunch

 

Group projects

Introducing your group projects. Louise Corti

Exercise: Brainstorm and formulate group projects

 

Coffee break

 

Group projects (supported by tutors and assistants)

17:45

Close

 

 

Friday 24 July: Group projects

10.00

Morning coffee

 

Group projects (supported by tutors and assistants)

 

Coffee break

 

Group projects (supported by tutors and assistants)

13:30-14:15

Lunch

 

 

 

Coffee break

 

Project presentations and prize ceremony

17:00

Close