Marco Steenbergen is a professor of political methodology at the University of Zurich. He previously taught at Carnegie Mellon University, the University of North Carolina at Chapel Hill, and the University of Bern. An award-winning author, Steenbergen focuses his methodological research on measurement, choice models, multilevel analysis, and machine learning. He has an interest in the automatic coding of political texts and images. His substantive research has focused on political behavior in the Americas and in Europe.

Course content
In this course, we discuss methods of supervised machine learning, which apply when the outcomes are known. After covering the general principles of machine learning, the course turns to specific classes of algorithms: nearest neighbours, rule-based algorithms, probabilistic learning, regression, classification and regression trees, support vector machines, neural networks, and ensemble learning. Special emphasis is placed on predictive feature selection and on (local) interpretation, a topic that is attracting a great deal of attention. All algorithms are implemented in R, and various methods of assessing predictive performance are discussed in detail.

Course Objectives
Participants will gain a solid understanding of machine learning. They will learn how various algorithms are used in decision processes in government and industry. Because they learn how the algorithms function, they will also be able to identify potential weaknesses. The focus on interpretation helps participants understand how algorithms derive their predictions. The course also prepares participants for advanced courses in deep learning or algorithmic design. Finally, participants should be able to use machine learning to advance two goals: first, to generate data, for example by screening documents for relevance to a research project; and second, to generate new theoretical insights about complex interactions.

Course Prerequisites
A solid understanding of descriptive statistics and probability theory is a great advantage, as is a basic understanding of regression analysis. Most importantly, students should be familiar with R: this course is not an introduction to R and assumes competence in basic R programming.

Required text
We shall be using the following textbook: Lantz, Brett. 2019. Machine Learning with R. 3rd ed. Birmingham: Packt. This book will be provided by the Summer School as part of your course pack on arrival.

Background knowledge required
Statistics
OLS = Moderate
Maximum Likelihood = Elementary

Computer Background
R = strong

If you have questions about the course, I shall be available at the Zest Fresh on weekdays from 9 to 10. The course is meant to be an intensive learning experience. In addition to the daily readings, there will be daily homework assignments to practice concepts and algorithms. To take full advantage of the course, it is essential that you complete both the readings and the assignments. Expect to spend an average of four hours each day in addition to the time spent in class.

Day 1 — Learning, Performance, and Cross-Validation Goals and challenges of machine learning; overview of different learning methods; overview of predictive performance metrics for supervised learning; the bias-variance tradeoff and the problem of overfitting; the logic of cross-validation; methods of cross-validation and bootstrapping.

• Literature: Kelleher, Mac Namee & D’Arcy, Chapters 1 & 8.

• Packages: caret.
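
To give a flavour of the cross-validation workflow, here is a minimal sketch using caret. The built-in iris data, the k-nearest-neighbour learner, and the 10-fold scheme are illustrative assumptions, not the course's actual exercises.

    library(caret)

    set.seed(123)
    # Define a 10-fold cross-validation scheme
    ctrl <- trainControl(method = "cv", number = 10)

    # Train a k-nearest-neighbour classifier on iris;
    # caret tunes k over a small grid using cross-validation
    fit <- train(Species ~ ., data = iris,
                 method = "knn",
                 trControl = ctrl,
                 tuneLength = 5)

    print(fit)  # cross-validated accuracy and kappa per candidate k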

Days 2 & 3 — Error-based Learning I Regression analysis; backward and forward selection; regularization through lasso, ridge, and elastic net methods; oracle property; logistic regression; linear discriminant analysis.

• Literature: Kelleher, Mac Namee & D’Arcy, Chapters 2 & 7.

• Packages: glmnet; MASS.
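
As an illustration of regularized regression with glmnet, the sketch below fits a lasso with cross-validated lambda; the simulated data and the choice alpha = 1 are assumptions for demonstration (alpha = 0 would give the ridge, intermediate values the elastic net).

    library(glmnet)

    set.seed(123)
    # Simulated data: 100 observations, 20 features, 2 of which matter
    x <- matrix(rnorm(100 * 20), nrow = 100)
    y <- x[, 1] - 2 * x[, 2] + rnorm(100)

    # alpha = 1 is the lasso; cv.glmnet picks lambda by 10-fold CV
    cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

    cv_fit$lambda.min               # lambda minimising CV error
    coef(cv_fit, s = "lambda.min")  # sparse coefficients at that lambda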

Day 4 — Error-based Learning II Support vectors; support vector machines; hard and soft margins; budgets; kernels.

• Literature: Smola, Alex J., and Bernhard Schölkopf. 2004. A Tutorial on Support Vector Regression. Statistics and Computing 14(3): 199-222.

• Packages: e1071; kernlab.
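
A minimal sketch of a support vector machine with e1071 follows; the iris data, the radial kernel, and the tuning grid are illustrative assumptions. The cost parameter implements the soft-margin budget.

    library(e1071)

    set.seed(123)
    # Radial-kernel SVM; cost controls the soft-margin budget
    fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)

    # Search over cost and gamma by cross-validation
    tuned <- tune(svm, Species ~ ., data = iris,
                  ranges = list(cost = c(0.1, 1, 10),
                                gamma = c(0.01, 0.1, 1)))
    summary(tuned)  # best parameters and their CV error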

Day 5 — Information-based Learning Entropy models; information gain; automatic interaction detection; classification and regression trees; greedy algorithms; pruning.

• Literature: Kelleher, Mac Namee & D’Arcy, Chapter 4.

• Packages: CHAID; rpart; rpart.plot.
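
The sketch below grows and prunes a classification tree with rpart; the iris data and the complexity-parameter values are illustrative assumptions.

    library(rpart)
    library(rpart.plot)

    set.seed(123)
    # Grow a classification tree; cp is the cost-complexity parameter
    fit <- rpart(Species ~ ., data = iris, method = "class",
                 control = rpart.control(cp = 0.01))

    # Prune at the cp value minimising cross-validated error
    best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
    pruned  <- prune(fit, cp = best_cp)

    rpart.plot(pruned)  # plot the pruned tree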

Day 6 — Similarity-based Learning Distance metrics; Voronoi tessellation; nearest neighbour classification and regression; kernel densities; efficient memory search.

• Literature: Kelleher, Mac Namee & D’Arcy, Chapter 5.

• Packages: kknn; class.
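
As a sketch of weighted nearest-neighbour classification with kknn: the train/test split of iris, k = 7, and the triangular kernel are assumptions for demonstration.

    library(kknn)

    set.seed(123)
    idx   <- sample(nrow(iris), 100)
    train <- iris[idx, ]
    test  <- iris[-idx, ]

    # distance = 2 gives the Euclidean (Minkowski) metric;
    # the kernel weights neighbours by their distance
    fit <- kknn(Species ~ ., train = train, test = test,
                k = 7, distance = 2, kernel = "triangular")

    table(predicted = fitted(fit), actual = test$Species)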

Day 7 — Rule-based Learning Deriving association rules from data; the One Rule (1R) algorithm; the Apriori algorithm; the Eclat algorithm.

• Literature: Liu, Han, Alexander Gegov, and Michaela Cocea. 2016. Rule Based Systems for Big Data: A Machine Learning Approach. Heidelberg: Springer. Chapters 3-5.

• Packages: arules; OneR.
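
A minimal sketch of Apriori rule mining with arules follows; the Groceries data ships with the package, and the support and confidence thresholds are illustrative assumptions.

    library(arules)

    data(Groceries)  # a transactions object bundled with arules

    # Mine rules with Apriori under minimum support and confidence
    rules <- apriori(Groceries,
                     parameter = list(supp = 0.01, conf = 0.5))

    # Show the five rules with the highest lift
    inspect(head(sort(rules, by = "lift"), 5))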

Day 8 — Probability-based Learning Bayes’ theorem; maximum a posteriori (MAP) hypothesis; naive Bayes classifier; Laplacian smoothing; Gaussian Bayes; decision boundaries and discriminant functions; regression problems; Bayesian networks.

• Literature: Kelleher, Mac Namee & D’Arcy, Chapter 6.

• Packages: klaR.
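
A sketch of naive Bayes classification with klaR; the iris split and fL = 1 are illustrative assumptions (with purely numeric features, klaR fits Gaussian densities, while fL applies Laplace smoothing to any categorical ones).

    library(klaR)

    set.seed(123)
    idx   <- sample(nrow(iris), 100)
    train <- iris[idx, ]
    test  <- iris[-idx, ]

    # Gaussian naive Bayes on numeric features; fL sets Laplace
    # smoothing for categorical features, if present
    fit  <- NaiveBayes(Species ~ ., data = train, fL = 1)
    pred <- predict(fit, newdata = test)

    table(predicted = pred$class, actual = test$Species)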

Day 9 — Ensemble Learning Ensembles; bagging; boosting; stacking; super learners.

• Literature: Dietterich, Thomas G. 2000. Ensemble Methods in Machine Learning. In Multiple Classifier Systems. MCS 2000. Lecture Notes in Computer Science, Vol. 1857. Heidelberg: Springer.

• Packages: adabag; fastAdaboost; ipred; randomForest; SuperLearner.
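
To illustrate one ensemble method, the sketch below fits a random forest (bagged, decorrelated trees) with randomForest; the iris data, ntree = 500, and mtry = 2 are illustrative assumptions.

    library(randomForest)

    set.seed(123)
    # Bagging with decorrelated trees: ntree sets the ensemble size,
    # mtry the number of features tried at each split
    fit <- randomForest(Species ~ ., data = iris,
                        ntree = 500, mtry = 2, importance = TRUE)

    print(fit)       # out-of-bag error estimate
    importance(fit)  # variable importance measures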

Day 10 — Overflow and Project Extra time should we need it; time to work on group projects that require selecting a learning algorithm for a specific data set.