Ines Levin is an assistant professor in the Department of Political Science at the University of California, Irvine. Her research focuses on quantitative research methods with substantive applications in the areas of elections and public opinion. She received her Ph.D. from the California Institute of Technology in 2012. In 2011-12 she was a Max Weber Fellow at the European University Institute, and between 2012 and 2016 she was an assistant professor in the Department of Political Science at the University of Georgia.
In an era of ever larger data sets and unprecedented computational resources, machine learning techniques are becoming mainstream tools for social scientists. These tools are now being used for a variety of purposes, including classifying political text and audio-visual materials, learning about elite and mass opinion from social media activity, and detecting voter fraud. The goal of this course is to introduce students to several machine learning techniques, starting with unsupervised clustering algorithms, then turning to supervised methods, including Bayes classifiers and regression trees, and lastly covering more advanced ensemble approaches.
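As a brief illustration of the first family of techniques mentioned above (unsupervised clustering), the following R sketch applies k-means to the built-in iris measurements; the data set and settings are chosen purely for illustration and are not part of the course materials:

```r
# Illustrative only: k-means clustering, an unsupervised method,
# applied to the four numeric measurements in R's built-in iris data.
set.seed(123)  # for reproducible cluster assignments
km <- kmeans(iris[, 1:4], centers = 3, nstart = 20)

# Cross-tabulate the discovered clusters against the species labels,
# which the algorithm never saw
table(km$cluster, iris$Species)
```

Because k-means never sees the species labels, the cross-tabulation shows how well an unsupervised method can recover a known grouping.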
To facilitate interpretation and comparison of the different techniques, the implementation of each method will be illustrated via applications to election forensics – an emerging area of scholarship that focuses on using statistical techniques to detect anomalies in election returns. The course will emphasize active learning and will have a strong applied component.
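One simple example of the kind of anomaly detection used in election forensics is a second-digit Benford test, which compares the observed second-digit frequencies of vote counts against the frequencies implied by Benford's law. The sketch below uses simulated precinct-level counts, not real election returns:

```r
# Illustrative only: second-digit Benford test on simulated vote counts.
set.seed(1)
votes <- rpois(500, lambda = 350)  # hypothetical precinct-level counts

# Expected second-digit probabilities under Benford's law
benford2 <- sapply(0:9, function(d)
  sum(log10(1 + 1 / (10 * (1:9) + d))))

# Observed second digits (counts with at least two digits)
second_digit <- function(x) as.integer(substr(as.character(x), 2, 2))
obs <- second_digit(votes[votes >= 10])

# Chi-squared comparison of observed vs. expected frequencies
chisq.test(table(factor(obs, levels = 0:9)), p = benford2)
```

A large test statistic would flag a departure from the Benford expectation, one of several signals examined in the election forensics literature.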
By the end of the course, participants should be able to implement all the machine learning techniques covered in class using the R software environment. They should also be able to explain the logic underlying the different procedures, interpret results, and draw conclusions from their analyses.
Before taking the course, participants should have knowledge of basic statistics (e.g., have taken an “Introduction to Probability and Statistics” course) and be able to prepare and explore data using the R statistical environment (e.g., load and explore data, make two-way tables, create basic plots, and calculate descriptive statistics). They should also have a solid grounding in regression analysis (e.g., understand the Gauss-Markov assumptions, interpret OLS regression results, carry out regression diagnostics, and implement analyses of categorical data using logistic regression).
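For concreteness, the expected R skills amount to being comfortable with operations like the following (shown here on R's built-in mtcars data rather than any course data set):

```r
# Load and explore a data frame (any data read with read.csv() would do)
data(mtcars)
str(mtcars)
summary(mtcars$mpg)  # descriptive statistics

# Two-way table: transmission type by number of cylinders
table(mtcars$am, mtcars$cyl)

# Basic plot
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")

# OLS regression and a quick look at the results
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# Logistic regression for a binary outcome
logit_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(logit_fit)
```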
Representative Background Reading
Varian, Hal R. 2014. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives 28(2): 3-27.
Levin, Ines, Julia Pomares, and R. Michael Alvarez. 2016. “Using Machine Learning Algorithms to Detect Election Fraud.” In R.M. Alvarez (Ed.), Computational Social Science: Discovery and Prediction. Cambridge University Press.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. New York: Springer. (We will use the corrected 7th printing. Book’s website: http://www-bcf.usc.edu/~gareth/ISL/.)
Background knowledge required
Statistics: OLS = m; Maximum Likelihood = e
1. Statistical Learning and Review of Regression
James, Witten, Hastie, and Tibshirani. Ch 2 and 3
Wallach, Hannah. 2014. Big data, machine learning, and the social sciences. NIPS 2014 workshop keynote.
Levin, Ines, Julia Pomares, and R. Michael Alvarez. 2016. Using machine learning algorithms to detect election fraud. In R.M. Alvarez (ed.), Computational Social Science: Discovery and Prediction. Cambridge University Press.
2. Classification
James, Witten, Hastie, and Tibshirani. Ch 4
Cantú, Francisco, and Sebastián M. Saiegh. 2011. Fraudulent democracy? An analysis of Argentina’s Infamous Decade using supervised machine learning. Political Analysis 19(4): 409-433.
3. Resampling Methods
James, Witten, Hastie, and Tibshirani. Ch 5
Anderson, Michael L., and Jeremy Magruder. 2017. Split-sample strategies for avoiding false discoveries. NBER Working Paper No. 23544. National Bureau of Economic Research.
4. Linear Model Selection and Regularization
James, Witten, Hastie, and Tibshirani. Ch 6
Ratkovic, Marc and Dustin Tingley. 2017. Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1): 1-40.
Imai, Kosuke, and Aaron Strauss. 2011. Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign. Political Analysis 19(1): 1-19.
6. Tree-Based Methods: Single-Tree Approaches
Levin, Ines. Learning about the influence of spatial and temporal proximity using regression trees. Working paper.
7. Tree-Based Methods: Multiple-Tree Approaches
James, Witten, Hastie, and Tibshirani. Ch 8 (8.2, 8.3.3, 8.3.4)
Montgomery, Jacob M., Santiago Olivella, Joshua D. Potter, and Brian F. Crisp. 2015. An informed forensics approach to detecting vote irregularities. Political Analysis 23(4): 488-505.
8. Support Vector Machines and Neural Networks
James, Witten, Hastie, and Tibshirani. Ch 9
D’Orazio, Vito, Steven T. Landis, Glenn Palmer, and Philip Schrodt. 2014. Separating the wheat from the chaff: Applications of automated document classification using support vector machines. Political Analysis 22(2): 224-242.
9. Unsupervised Learning: Principal Component Analysis
James, Witten, Hastie, and Tibshirani. Ch 10 (10.1, 10.2, 10.4)
Bakker, Ryan, Catherine de Vries, Erica Edwards, Liesbet Hooghe, Seth Jolly, Gary Marks, Jonathon Polk, Jan Rovny, Marco Steenbergen, and Milada Vachudova. 2015. Measuring party positions in Europe: The Chapel Hill Expert Survey Trend File, 1999-2010. Party Politics 21(1): 143-152.
Pan, Jennifer, and Yiqing Xu. 2018. China’s ideological spectrum. Journal of Politics 80(1): 254-273.
Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder‐Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand. 2014. Structural topic models for open‐ended survey responses. American Journal of Political Science 58(4): 1064-1082.