Ines Levin is an assistant professor in the Department of Political Science at the University of California, Irvine. Her research focuses on quantitative research methods with substantive applications in the areas of elections and public opinion. She received her Ph.D. from the California Institute of Technology in 2012. In 2011-12 she was a Max Weber Fellow at the European University Institute, and between 2012 and 2016 she was an assistant professor in the Department of Political Science at the University of Georgia.

Course Content

In an era of ever-larger data sets and unprecedented computational resources, machine learning techniques are becoming mainstream tools for social scientists. These tools are now being used for a variety of purposes, including classifying political text and audio-visual materials; learning about elite and mass opinion from social media activity; and detecting voter fraud. The goal of this course is to introduce students to several machine learning techniques, starting with unsupervised clustering algorithms, then turning to supervised methods, including Bayes classifiers and regression trees, and lastly covering more advanced ensemble approaches.

To facilitate interpretation and comparison of the different techniques, the implementation of each method will be illustrated via applications to election forensics – an emerging area of scholarship that focuses on using statistical techniques to detect anomalies in election returns. The course will emphasize active learning and will have a strong applied component.

Course Objectives

By the end of the course, participants should be able to implement all the machine learning techniques covered in class using the R software environment. They should also be able to explain the logic underlying the different procedures, interpret results, and draw conclusions from their analyses.

Course Prerequisites

Before taking the course, participants should have knowledge of basic statistics (e.g. have taken an “Introduction to Probability and Statistics” course) and be able to prepare and explore data using the R statistical environment (e.g. load and explore data, make two-way tables, create basic plots, and calculate descriptive statistics). They should also have a solid grounding in regression analysis (e.g. understand the Gauss-Markov assumptions, be able to interpret OLS regression results and carry out regression diagnostics, and be able to analyze categorical data using logistic regression).

Representative Background Reading

Varian, Hal R. 2014. “Big Data: New Tricks for Econometrics.” Journal of Economic Perspectives 28(2): 3-27.

Levin, Ines, Julia Pomares, and R. Michael Alvarez. 2016. “Using Machine Learning Algorithms to Detect Election Fraud.” In R.M. Alvarez (Ed.), Computational Social Science: Discovery and Prediction. Cambridge University Press.

Required Texts

James, Gareth, Daniela Witten, and Trevor Hastie. 2013. An Introduction to Statistical Learning: With Applications in R. New York: Springer. (We will use the corrected 7th printing. Book’s website: