Marco Steenbergen studied political science at the University of Amsterdam and Stony Brook University (USA). He is a specialist in political psychology and social research methods. He is currently professor of political methodology at the University of Zürich. He previously held positions at Carnegie Mellon University, the University of North Carolina at Chapel Hill, and the Universität Bern.
Machine learning refers to the use of computer algorithms to learn from data for the purpose of making predictions. It is a burgeoning field, with a close cousin in statistical learning. It is becoming ever more important in academia, government, and industry, and in any other domain focused on prediction.
The goal of this course is to introduce you to the core principles of machine learning and to expose you to a variety of useful algorithms. Intuition plays a more important role than mathematical development. There is a heavy emphasis on practical examples, which shall be programmed using R.
The course focuses on supervised learning, in which known outcomes are used to define predictive performance metrics, which we seek to optimise. Both classification (is X an instance of S?) and regression (what is X’s score?) problems are covered. After outlining the general approach of machine learning, with a special emphasis on cross-validation, students will learn about algorithms that are error-based (e.g., regression and support vector machines), information-based (e.g., classification and regression trees), similarity-based (e.g., nearest neighbour algorithms), rule-based (e.g., one-rule), probability-based (e.g., naive Bayes classifiers), and ensemble-based (e.g., random forests).
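To make the idea of cross-validation concrete, here is a minimal sketch in base R. The built-in mtcars data, the linear regression of mpg on wt and hp, and the choice of five folds are all assumptions made for illustration; they are not taken from the course materials.

```r
# k-fold cross-validation of a linear regression, using base R only.
# Illustrative assumptions: mtcars data, mpg ~ wt + hp, k = 5.
set.seed(42)                       # make the fold assignment reproducible
k <- 5
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of near-equal size.
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, predict the held-out fold,
# and record the root mean squared error (RMSE) on the held-out rows.
rmse_per_fold <- sapply(seq_len(k), function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))
})

# The cross-validated estimate is the average over the k held-out folds.
cv_rmse <- mean(rmse_per_fold)
print(cv_rmse)
```

Each observation is used for testing exactly once and never appears in the training data of the model that predicts it, which is why the averaged error is a fairer guide to out-of-sample performance than the error on the training data.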
1. To give an understanding of the purposes and challenges of machine learning.
2. To discuss various predictive performance metrics for classification and regression tasks.
3. To introduce methods of cross-validation and their purpose.
4. To demonstrate the use of ensemble, error-based, information-based, probability-based, rule-based, and similarity-based algorithms.
5. To provide hands-on experience implementing the various methods in R.
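As a rough illustration of the second objective, the sketch below computes one common metric for a classification task (accuracy) and one for a regression task (root mean squared error). The helper functions accuracy() and rmse() and the small vectors are hypothetical, written here for illustration only; the course may use other metrics or package functions.

```r
# Hypothetical helpers for two common predictive performance metrics.
accuracy <- function(actual, predicted) mean(actual == predicted)
rmse     <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Classification: compare predicted and actual class labels (made-up data).
actual_class <- c("yes", "no", "yes", "yes", "no")
pred_class   <- c("yes", "no", "no",  "yes", "no")
acc <- accuracy(actual_class, pred_class)   # 4 of 5 correct, i.e. 0.8

# A confusion matrix shows which kinds of errors were made.
conf_mat <- table(actual = actual_class, predicted = pred_class)

# Regression: compare predicted and actual numeric scores (made-up data).
actual_score <- c(3, 5, 2, 8)
pred_score   <- c(2, 5, 4, 8)
err <- rmse(actual_score, pred_score)       # sqrt(1.25), about 1.118

print(acc)
print(conf_mat)
print(err)
```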
The course requires a basic understanding of probability theory and descriptive statistics. Prior familiarity with regression analysis and maximum likelihood is useful but not essential. Students are, however, expected to be conversant with R. Specifically, they should be able to read in and manipulate data sets (using, for example, haven and dplyr), to install and call libraries, and to write basic functions. This course is explicitly not an introduction to R.
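As an informal self-check of the expected level of R, you should be comfortable reading and writing code along the following lines. This is only a sketch in base R using the built-in mtcars data; the function standardise() is a hypothetical example, and the same steps could equally be done with dplyr.

```r
# Installing and calling a library (installation is run once, so it is
# shown commented out here):
# install.packages("dplyr")
# library(dplyr)

# Writing a basic function: standardise a numeric vector to mean 0, sd 1.
standardise <- function(x) (x - mean(x)) / sd(x)

# Manipulating a data set: keep some rows, add a column, summarise by group.
cars <- mtcars[mtcars$cyl %in% c(4, 8), ]     # keep 4- and 8-cylinder cars
cars$mpg_z <- standardise(cars$mpg)           # add a standardised column
mean_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

print(mean_mpg)
```

If each line above is easy to follow, your R skills are likely sufficient for the practical parts of the course.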
We shall be using the following textbook: Lantz, Brett. 2015. Machine Learning with R. 2nd ed. Birmingham: Packt. This book will be provided by the Summer School as part of your course pack on arrival.
Additionally, a number of articles and book chapters will be assigned. These are noted in the detailed course plan and will be made available in a course pack.
In terms of software, we shall be using R. To complete the homework assignments on your own computer, you should install the latest version of R. It is recommended that you also install the latest version of RStudio. You will also need various libraries. Those are noted in the detailed course plan.