Please note: This course will be taught online only. In-person study is not available for this course.
Reto Wüest is a postdoctoral researcher in the Department of Comparative Politics at the University of Bergen. He holds a PhD from the University of Geneva. Reto’s methodological research focuses on measurement and on leveraging machine learning techniques to improve the prediction accuracy of multilevel regression and post-stratification models. His substantive research focuses on political representation and legislative politics in Europe and the US. Reto has taught machine learning courses at the Essex Summer School in Social Science Data Analysis, the Barcelona Summer School in Survey Methodology, the University of Geneva, and the University of Bergen. His work has been published in the Journal of Politics, the European Journal of Political Research, and West European Politics, among others.
Machine learning refers to the automated detection of meaningful patterns in data. Machine learning techniques are behind many of the technologies we use in our daily lives, and they have also become an important part of the social scientist’s toolkit, especially as more and more social science data become available in electronic form. A common feature of all applications of machine learning is the use of computer algorithms that can “learn” and adapt. The goal of this course is to introduce participants to a variety of widely used supervised and unsupervised machine learning methods. After discussing the fundamental concepts of machine learning, the course continues with supervised learning methods (e.g., regularization, tree boosting, support vector machines, and neural networks), then turns to unsupervised learning methods (e.g., principal components analysis and clustering methods), and lastly covers flexible model averaging techniques (e.g., Bayesian model averaging).
Participants will gain a solid understanding of a number of widely used and powerful machine learning methods. By the end of the course, they should be able to apply all the methods covered in class using the R statistical computing environment. They should also be able to explain the logic underlying the different methods, to understand the respective strengths and weaknesses of these methods, and to interpret their results.
Participants should have a solid understanding of probability theory and regression analysis. They should also be familiar with basic programming in R.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R. 2nd ed. New York: Springer. Available at https://www.statlearning.com/
Background knowledge/skills required:
Maximum Likelihood: Elementary
Linear Regression: Moderate