Marco Steenbergen studied political science at the University of Amsterdam and Stony Brook University (USA). He is a specialist in political psychology and social research methods. He is currently professor of political methodology at the University of Zürich. He previously held positions at Carnegie Mellon University, the University of North Carolina at Chapel Hill, and the Universität Bern.
Course content
Machine learning refers to the use of computer algorithms to learn from data for the purpose of making predictions. It is a burgeoning field, with a close cousin in statistical learning. It is becoming ever more important in academia, government, industry, and any other domain focused on prediction. The goal of this course is to introduce you to the core principles of machine learning and to expose you to a variety of useful algorithms. Intuition plays a more important role than mathematical development. There is a heavy emphasis on practical examples, which shall be programmed using R.
The course focuses on supervised learning, in which known outcomes are used to define predictive performance metrics that we seek to optimise. Both classification (is X an instance of S?) and regression (what is X’s score?) problems are covered. After outlining the general approach of machine learning, with a special emphasis on cross-validation, students will learn about algorithms that are error-based (e.g., regression and support vector machines), information-based (e.g., classification and regression trees), similarity-based (e.g., nearest neighbour algorithms), rule-based (e.g., one-rule), probability-based (e.g., naive Bayes classifiers), and ensemble-based (e.g., random forests).
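To make the distinction between the two problem types concrete, the following minimal sketch computes one common performance metric for each: accuracy for a classification task and root mean squared error (RMSE) for a regression task. The data are hypothetical toy values invented for illustration, and the choice of these two metrics is an assumption; the course covers a wider range.

```r
# Classification: hypothetical true and predicted labels
actual_class    <- factor(c("yes", "no", "yes", "yes", "no", "no", "yes", "no"),
                          levels = c("no", "yes"))
predicted_class <- factor(c("yes", "no", "no", "yes", "no", "yes", "yes", "no"),
                          levels = c("no", "yes"))

# Confusion matrix: rows are predictions, columns are true labels
confusion <- table(predicted = predicted_class, actual = actual_class)

# Accuracy: share of cases on the diagonal (correct predictions)
accuracy <- sum(diag(confusion)) / sum(confusion)

# Regression: hypothetical true and predicted scores
actual_score    <- c(3.0, 5.0, 2.5, 7.0)
predicted_score <- c(2.5, 5.0, 4.0, 8.0)

# RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((actual_score - predicted_score)^2))

print(confusion)
print(accuracy)
print(rmse)
```

In both cases the metric is computed by comparing predictions with known outcomes, which is what makes the learning problem supervised.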
Course Objectives
1. To give an understanding of the purposes and challenges of machine learning.
2. To discuss various predictive performance metrics for classification and regression tasks.
3. To introduce methods of cross-validation and their purpose.
4. To demonstrate the use of ensemble, error-based, information-based, probability-based, rule-based, and similarity-based algorithms.
5. To provide hands-on experience implementing the various methods in R.
Course Prerequisites
The course requires a basic understanding of probability theory and descriptive statistics. Prior familiarity with regression analysis and maximum likelihood is useful but not essential. On the other hand, students are expected to be conversant with R. Specifically, they should be able to read in and manipulate data sets (using, for example, haven and dplyr), to install and call libraries, and to write basic functions. This course is explicitly not an introduction to R.
Required text
We shall be using the following textbook: Lantz, Brett. 2019. Machine Learning with R. 3rd ed. Birmingham: Packt. This book will be provided by the Summer School as part of your course pack on arrival.
If you have questions about the course, I shall be available at the Zest Fresh on weekdays from 9-10. The course is meant to be an intensive learning experience. In addition to daily readings, there will be daily homework assignments to practice concepts and algorithms. To take full advantage of the course, it is essential that you complete both the readings and the assignments. Expect to spend an average of 4 hours each day in addition to the time spent in class.
Day 1 — Learning, Performance, and Cross-Validation
Goals and challenges of machine learning; overview of different learning methods; overview of predictive performance metrics for supervised learning; the bias-variance tradeoff and the problem of overfitting; the logic of cross-validation; methods of cross-validation and bootstrapping.
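The logic of cross-validation can be sketched in a few lines of base R. The example below is only an illustration under stated assumptions: the data are simulated, the candidate models are polynomial regressions of degree 1, 3, and 10, and the names `toy`, `fold_id`, and `cv_rmse` are hypothetical helpers, not part of the course materials. Each model is fitted on four folds and evaluated on the held-out fifth, so the reported error reflects performance on data the model has not seen.

```r
set.seed(42)

# Simulated data: a nonlinear signal plus noise (assumption for illustration)
n   <- 100
x   <- runif(n, -2, 2)
y   <- sin(x) + rnorm(n, sd = 0.3)
toy <- data.frame(x = x, y = y)

# Randomly assign each observation to one of k folds of equal size
k       <- 5
fold_id <- sample(rep(1:k, length.out = n))

# Cross-validated RMSE for a polynomial regression of a given degree
cv_rmse <- function(degree) {
  fold_rmse <- sapply(1:k, function(fold) {
    train <- toy[fold_id != fold, ]   # fit on k - 1 folds
    test  <- toy[fold_id == fold, ]   # evaluate on the held-out fold
    fit   <- lm(y ~ poly(x, degree), data = train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test$y - pred)^2))
  })
  mean(fold_rmse)                     # average error across the k folds
}

degrees <- c(1, 3, 10)
results <- sapply(degrees, cv_rmse)
names(results) <- paste0("degree_", degrees)
print(round(results, 3))
```

Comparing the cross-validated errors across degrees is one way to see the bias-variance tradeoff: a model that is too rigid tends to underfit, while a very flexible one risks fitting noise in the training folds.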
Days 2 & 3 — Error-based Learning I
Regression analysis; backward and forward selection; regularization through lasso, ridge, and elastic net methods; oracle property; logistic regression; linear discriminant analysis.
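As a small illustration of regularization, the sketch below implements ridge regression through its closed-form solution, (X'X + lambda I)^(-1) X'y, in base R. It rests on several assumptions: the data are simulated, the predictors are standardised and the outcome centred so that no intercept is needed, and `ridge_coef` is a hypothetical helper written for this example. In practice a dedicated package such as glmnet would typically be used, and the lasso and elastic net have no comparable closed form.

```r
set.seed(1)

# Simulated data with three standardised predictors (assumption for illustration)
n <- 50
X <- scale(matrix(rnorm(n * 3), ncol = 3))
y <- drop(X %*% c(2, 0, -1) + rnorm(n))
y <- y - mean(y)                       # centre the outcome; no intercept needed

# Closed-form ridge estimator: solve (X'X + lambda * I) b = X'y
ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

ols   <- ridge_coef(X, y, lambda = 0)  # lambda = 0 reproduces least squares
ridge <- ridge_coef(X, y, lambda = 25) # a positive penalty shrinks coefficients

print(round(rbind(ols = ols, ridge = ridge), 3))
```

The ridge coefficients are pulled toward zero relative to the least-squares fit. The penalty value of 25 is arbitrary here; choosing it well is usually done by cross-validation.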