Note: This course, offered by the UK Data Service, follows a different structure from other courses at ESS-SSDA. It runs in a one-week, full-day format, so it will not be possible to enroll concurrently in two-week courses in session one.
Louise Corti is an Associate Director at the UK Data Archive and heads up the Data Services teams. Her research activities focus on standards and technologies for reviewing, curating and reusing digital social science data, particularly using open source infrastructures and tools. She is an author of the Sage Publications book Managing and Sharing Research Data: A Guide to Good Practice, and of many chapters and articles on qualitative data sharing. Louise teaches regularly and set up the summer school on Encounters with Big Data in 2016, which has run four times.
Simon Parker is the Data Liaison Manager for the Cancer Intelligence team at Cancer Research UK. He has overseen the development of infrastructure to support the safe use of sensitive research data by cancer researchers and produced a long-term research data strategy for the charity. He has co-authored the Handbook on Statistical Disclosure Control and written associated training materials. He previously worked at the UK Data Service with a focus on the Secure Lab, and has taught on a previous summer school on preparing for and using big data in the social sciences in the UK.
This week-long course, run by the UK Data Service, introduces key concepts and discussions around using big data in the social sciences. It introduces approaches to, and open source tools for, exploring and analysing new and novel forms of data. It looks at the challenges of reproducibility in social science and covers best practices in transparency for data creation, manipulation and analysis. The course, aimed at researchers, statisticians and data analysts, covers aspects of data evaluation (ethical, legal and practical), extraction, exploration, basic analysis and visualisation of data from the web, using Spark R and various R packages. In addition to the hands-on lab sessions, participants spend a full day on group projects, applying what they have learned to real data challenges. The course focuses mostly on numeric data and does not cover text, social media or audio sources in any detail.
This course is introductory, but students will be expected to have experience using quantitative research data in the social sciences. This includes a good understanding of statistical methodology and of concepts such as standard error and standard deviation, as well as competence in writing commands in a statistical computing environment such as Stata, R or SPSS.
|Introducing big data research|
|Manipulating and analysing data using Spark|
|Manipulating data using Hive|
|Tools and techniques for dealing with external data|
|Transparency agenda and GitHub|
|Creating interactive maps in R with Leaflet|
|ODBC in Excel and R|
OLS – elementary
R – elementary