Marco Steenbergen studied political science at the University of Amsterdam and Stony Brook University (USA). He is a specialist in political psychology and social research methods. He is currently professor of political methodology at the University of Zürich. He previously held positions at Carnegie Mellon University, the University of North Carolina at Chapel Hill, and the Universität Bern.

Course Content
This course will introduce participants to a fascinating field of statistics. We will see how we can rely on statistical models to gain a deep understanding from data. This often involves finding optimal predictions and classifications. Machine Learning (also known as Statistical Learning) is developing quickly and is being applied in fields such as business analytics, political science, and sociology.

Course Objectives
This course aims to provide an introduction to the data science approach to the quantitative analysis of data using the methods of statistical learning, an approach blending classical statistical methods with recent advances in computational and machine learning. The course will cover the main analytical methods from this field with hands-on applications using example datasets. This will allow students to gain experience with and confidence in using the methods we cover.

Students will learn how to apply a number of tools and models for supervised and unsupervised learning. After a short probability refresher, students will learn how to evaluate various methods based on cross-validation. We will then see how we can create optimal prediction models. Creating a good prediction model requires choosing a suitable set of explanatory variables. To this end, we will rely on subset selection and on shrinkage methods such as lasso and ridge regression. Classification is another prominent topic, and we will use decision trees and random forests to solve such problems. Finally, in terms of data reduction, we will rely on principal component analysis. All these tools provide the foundation for students to then solve real-world problems, potentially by combining these various approaches. The focus of this class is on giving students sufficient practical training so that they can fruitfully apply these methods in their own work.

Course Prerequisites
Students are expected to have a solid understanding of linear regression models and preferably know binary models. Some prior exposure to statistical software is beneficial but not required. The course will also provide a short introduction to RStudio at the beginning. More important than prior training will be a willingness to engage with the topics of the class.

1. Introduction
Course introduction: What is machine learning? How can model accuracy be assessed? We will cover the basic concepts of machine learning and include a short refresher, with a special focus on how linear regression is used in statistical learning.

2. Logit models
Logit models are used when the outcome variable is binary. We will use logit models to solve classification problems.
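
As a rough illustration of how a logit model can serve as a classifier, the sketch below fits an intercept and one slope by gradient ascent on the log-likelihood and then classifies observations whose predicted probability is at least 0.5. It is written in Python purely for illustration (the course itself works in RStudio); the toy data, the function names, the learning rate, and the 0.5 threshold are all assumptions, not course material.

```python
import math

def sigmoid(z):
    """Logistic function: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, lr=0.1, steps=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-likelihood with respect to b0 and b1
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)) / n
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

def classify(x, b0, b1, threshold=0.5):
    """Predict class 1 when the fitted probability reaches the threshold."""
    return 1 if sigmoid(b0 + b1 * x) >= threshold else 0

# Toy data (hypothetical): larger x makes outcome 1 more likely; the classes overlap
logit_xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
logit_ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logit(logit_xs, logit_ys)
predictions = [classify(x, b0, b1) for x in logit_xs]
accuracy = sum(p == y for p, y in zip(predictions, logit_ys)) / len(logit_ys)
print(round(b0, 2), round(b1, 2), accuracy)
```

Note that the accuracy printed here is computed on the same data used for fitting, so it tends to overstate how well the model would classify new observations.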

3. Resampling
How good are our prediction models? Cross-validation provides a principled approach to assessing model quality, because it evaluates predictions on data that were not used to fit the model. We will focus specifically on k-fold cross-validation.
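
The idea behind k-fold cross-validation can be sketched in a few lines: split the data into k folds, fit the model on k-1 of them, measure the error on the held-out fold, and average over all folds. The example below is a minimal Python sketch (the course itself works in RStudio); the simulated data, the helper names, and the choice of a one-predictor linear regression are assumptions made for illustration.

```python
import random

def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of roughly equal size
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        # Error is measured only on observations the model has not seen
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k

# Simulated data (hypothetical): y = 2 + 3x + noise with standard deviation 1
cv_rng = random.Random(42)
cv_xs = [cv_rng.uniform(0, 10) for _ in range(100)]
cv_ys = [2 + 3 * x + cv_rng.gauss(0, 1) for x in cv_xs]

cv_mse = k_fold_mse(cv_xs, cv_ys, k=5)
print(round(cv_mse, 3))
```

Because the noise in the simulated data has variance 1, a cross-validated mean squared error in the neighbourhood of 1 indicates that the linear model captures essentially all of the predictable structure.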

4. Model selection I
Model selection 1: When facing many potential explanatory factors, we need to choose the “right” combination. This session covers best subset selection, which allows us to identify the best-performing set of predictors.

5. Model selection II
Model selection 2: Beyond best subset selection, we have additional tools at our disposal. Lasso regression provides a flexible approach to dealing with many potential explanatory variables.
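
What makes the lasso useful for variable selection is that its penalty can set coefficients exactly to zero. The following Python sketch shows one common way to compute lasso estimates, coordinate descent with soft-thresholding, on simulated data in which only the first of three predictors matters. It is an illustration under stated assumptions rather than the course's implementation: the function names, the penalty value lam=0.5, and the data are hypothetical, and the model is fitted without an intercept because the simulated variables have mean zero by construction.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimise (1/(2n)) * RSS + lam * sum(|b_j|) for a model without intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Covariance of column j with the partial residual
            # (the residual with column j's own contribution added back)
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            residual = [r - v * delta for v, r in zip(col, residual)]
            beta[j] = new_bj
    return beta

# Simulated data (hypothetical): only the first predictor affects y
lasso_rng = random.Random(1)
X = [[lasso_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3 * row[0] + lasso_rng.gauss(0, 1) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
print([round(b, 2) for b in beta])
```

In this setup the coefficient on the first predictor is shrunk below its true value of 3, while the two irrelevant predictors are dropped entirely. In practice the penalty would be chosen by cross-validation rather than fixed in advance.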

6. Polynomial models
Moving beyond linearity: Polynomial functions add a great deal of flexibility, and we will see how we can combine these models with the various techniques from the preceding days.

7. Regression trees
Prediction can also be based on non-standard models such as regression trees. We will see how to use regression trees and how to combine the predictions of several regression trees to improve predictive accuracy.

8. Support Vector Machine
A classic classification method that can be extremely helpful in a variety of settings. We will see how it differs from Linear Discriminant Analysis.

9. Data reduction
Unsupervised learning: A prominent tool among unsupervised learning techniques is principal component analysis. It is a straightforward way to reduce the dimensionality of the data by replacing many correlated variables with a few uncorrelated components.
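
To make the idea of dimension reduction concrete, the Python sketch below computes the first principal component of two strongly correlated variables and the share of total variance it captures. It uses power iteration on the sample covariance matrix, which is only one of several ways to obtain principal components; the simulated data and all function names are assumptions introduced for illustration.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows (a list of equal-length lists)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(cov, iterations=500):
    """Power iteration: leading eigenvector and eigenvalue of a covariance matrix.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue

# Simulated data (hypothetical): two variables driven by one common factor t
pca_rng = random.Random(7)
rows = []
for _ in range(200):
    t = pca_rng.gauss(0, 2)
    rows.append([t + pca_rng.gauss(0, 0.3), t + pca_rng.gauss(0, 0.3)])

cov = covariance_matrix(rows)
direction, variance = first_principal_component(cov)
share = variance / (cov[0][0] + cov[1][1])  # first component's share of total variance
print([round(c, 2) for c in direction], round(share, 3))
```

Since both variables are driven by the same underlying factor, the first component weights them almost equally and accounts for nearly all of the variance, so the two columns could be summarised by a single score with little loss of information.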

10. Final project
“Bringing it all together”: We will spend our last day working through an applied example and seeing how to combine the various elements successfully. Participants are encouraged to bring their own projects.