Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.

Akitaka Matsuo is a postdoctoral fellow in the Institute for Analytics and Data Science at the University of Essex. His research interests lie in data science and politics, in particular the statistical methodology for scaling survey responses and legislative behavior, and natural language processing of political texts (e.g., social media posts, open-ended survey answers, and parliamentary speeches).

Course description

This course introduces Python to social science students who already have some experience with quantitative analysis. Knowledge of Python has become indispensable for social scientists interested in data science, because the statistical languages traditionally used in the social sciences are not always well suited to tasks that arise frequently in social science research, such as collecting data from the web or processing large volumes of text.

Through lectures and hands-on labs, the course first introduces core Python concepts such as object types, functions, and control flow. It then moves on to data analytics tasks for which Python is particularly handy, such as setting up a stable environment for acquiring data from the Internet, managing and summarizing text data, and building machine learning models.


Course objectives

By the end of the course, students will have a basic understanding of the Python programming language and of how to work with data in it. Students will also acquire the skills to use Python in situations where it can help advance their research projects, including:

  1. Building datasets from the web through web scraping and API access
  2. Handling textual data in Python
  3. Estimating and evaluating machine learning models

In addition, because the course is taught in a cloud computing environment (Google Colab), students will learn how to work in such environments.


Required texts

Automate the Boring Stuff with Python [ABSP], Al Sweigart, No Starch Press

Available online at: https://automatetheboringstuff.com/ 

Python Data Science Handbook [PDSH], Jake VanderPlas, O’Reilly

Available online at: https://jakevdp.github.io/PythonDataScienceHandbook/ 

Web Scraping with Python, 2e: Collecting More Data from the Modern Web [WSP], Ryan Mitchell, O’Reilly (this book will be provided by ESS)


Course Prerequisites

Students are expected to have some experience conducting quantitative analysis in another statistical language, especially R, and to understand basic programming concepts such as functions and control flow. Knowledge of statistical models, especially categorical dependent variable models, will be helpful for understanding some of the material. An elementary understanding of machine learning is a plus.


Background Knowledge

Software

Intermediate knowledge of either R or Stata is required.

Maths

Calculus = elementary

Linear Regression = elementary

Statistics

OLS = elementary

Maximum Likelihood = elementary

Categorical Data Analysis = elementary


Day 1: Introduction to the Python language

Basics

         Object types

Jupyter notebook (via Google Colab)

Control flow (loops, etc.)

         List comprehension

Functions

         Lambda functions

Reading

         PDSH, Ch 1

        ABSP, Ch 1-5
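As a rough illustration of the Day 1 topics, the following sketch can be pasted into a single Colab notebook cell. It shows a few built-in object types, a loop with a condition, the equivalent list comprehension, a named function, and a lambda used as a sort key. All data (speech lengths, speaker names) are hypothetical, and the 150 words-per-minute default is an assumption chosen only for illustration.

```python
# Day 1 sketch: object types, control flow, list comprehension, functions, lambda.
# All data are hypothetical, for illustration only.

n_speeches = 4                       # int
mean_length = 136.25                 # float
speaker = "Smith"                    # str
lengths = [120, 45, 300, 80]         # list of ints
types = [type(x).__name__ for x in (n_speeches, mean_length, speaker, lengths)]

# A for-loop with an if-statement: keep speeches longer than 100 words
long_speeches = []
for n in lengths:
    if n > 100:
        long_speeches.append(n)

# The same filter written as a list comprehension
long_speeches_lc = [n for n in lengths if n > 100]

# A named function with a default argument (150 wpm is an assumed rate)
def minutes(words, wpm=150):
    """Approximate speaking time in minutes at `wpm` words per minute."""
    return words / wpm

# A lambda used as a sort key: order (speaker, length) pairs by length
pairs = [("A", 120), ("B", 45), ("C", 300)]
by_length = sorted(pairs, key=lambda pair: pair[1])
```

The loop and the list comprehension produce the same list; the comprehension is usually preferred in Python for simple filters like this one.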


Day 2: NumPy and pandas

NumPy

pandas

         Data wrangling

         File I/O

Reading

         PDSH, Ch 2-3
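A minimal sketch of the Day 2 topics, assuming NumPy and pandas are installed (both are available in Google Colab by default). It shows vectorized arithmetic on a NumPy array and basic pandas data wrangling (filtering, adding a column, grouped summaries) on a small hypothetical data frame. For file I/O it round-trips the data through an in-memory CSV string so that no file is written; with a real file you would pass a path to `to_csv` and `read_csv` instead.

```python
import io

import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on an array, with no explicit loop
votes = np.array([120, 95, 300, 45])
shares = votes / votes.sum()          # each element divided by the total

# pandas: a small hypothetical data frame of speeches
df = pd.DataFrame({
    "party": ["Lab", "Con", "Lab", "Con"],
    "words": [120, 95, 300, 45],
})

# Data wrangling: filter rows, add a derived column, summarize by group
long_df = df[df["words"] > 100]       # rows with more than 100 words
df["minutes"] = df["words"] / 150     # assumed rate of 150 words per minute
summary = df.groupby("party")["words"].mean()

# File I/O: write to CSV text and read it back (in memory, no file on disk)
csv_text = df.to_csv(index=False)
df_again = pd.read_csv(io.StringIO(csv_text))
```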


Day 3: Handling text with Python

Basic string operations in Python

Regular expressions

         re, pandas

Introduction to spaCy

Reading

        ABSP, Ch 5-6
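The sketch below illustrates basic string operations and regular expressions with the standard-library `re` module, applied to a hypothetical social media post. The patterns for hashtags and mentions are simplified assumptions (they match word characters only) rather than a complete definition of either.

```python
import re

# A hypothetical social media post, used only for illustration
post = "Great debate tonight! #PMQs @SpeakerOffice   Turnout matters. #vote"

# Basic string operations
lowered = post.lower()
tokens = post.split()                        # split on any run of whitespace

# Regular expressions: capture the word after '#' or '@'
hashtags = re.findall(r"#(\w+)", post)
mentions = re.findall(r"@(\w+)", post)

# Collapse runs of whitespace into a single space
collapsed = re.sub(r"\s+", " ", post)
```

The same patterns can be applied to a whole column of texts with pandas string methods such as `Series.str.findall`.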


Day 4: Getting data from the web with Python

HTML/XML and JSON

Web scraping

APIs

Reading

       WSP, Ch 1-3, 12
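Both API access and web scraping end with parsing the text a server returns. As a dependency-free sketch, the code below parses a hardcoded JSON string (standing in for an API response) with `json` and extracts links from a hardcoded HTML snippet with the standard library's `html.parser`. No network request is made, and the response contents are hypothetical; in practice the text would be fetched first, and dedicated libraries such as BeautifulSoup are commonly used for HTML parsing instead.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response body (JSON), hardcoded instead of fetched
api_response = '{"results": [{"name": "Alice", "votes": 120}, {"name": "Bob", "votes": 95}]}'
data = json.loads(api_response)              # JSON text -> Python dicts and lists
names = [r["name"] for r in data["results"]]

# Hypothetical HTML page, hardcoded instead of scraped
html_page = """
<html><body>
  <a href="/speech/1">Speech 1</a>
  <p>Some text</p>
  <a href="/speech/2">Speech 2</a>
</body></html>
"""

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkCollector()
parser.feed(html_page)
```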


Day 5: Machine Learning with Python

Machine learning for classification problems

scikit-learn

Reading

      PDSH, Ch 5
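A minimal sketch of the scikit-learn workflow for a classification problem, assuming scikit-learn is installed (it is available in Google Colab by default). It uses simulated data from `make_classification` rather than real survey data, holds out 25% of the observations as a test set, fits a logistic regression (the familiar categorical dependent variable model, used here as a classifier), and evaluates out-of-sample accuracy. The choice of model and split size is illustrative, not the course's prescribed setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Simulated binary classification data (500 observations, 5 features)
X, y = make_classification(n_samples=500, n_features=5, random_state=42)

# Hold out 25% of observations to evaluate out-of-sample performance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Estimate the model on the training set only
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
```

Evaluating on held-out data, rather than on the data used for fitting, is the main practical difference from how such models are typically assessed in inferential social science work.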