Marco Steenbergen studied political science at the University of Amsterdam and Stony Brook University (USA). He is a specialist in political psychology and social research methods. He is currently professor of political methodology at the University of Zürich. He previously held positions at Carnegie Mellon University, the University of North Carolina at Chapel Hill, and the Universität Bern.

Course description: Machine learning refers to the use of computer algorithms to learn from data for the purpose of making predictions. It is a burgeoning field, with a close cousin in statistical learning, and it is becoming ever more important in academia, government, industry, and any other domain focused on prediction.
The goal of this course is to introduce you to the core principles of machine learning and to expose you to a variety of useful algorithms. Intuition plays a more important role than mathematical development. There is a heavy emphasis on practical examples, which we shall program in R.

The course focuses on supervised learning, whereby known outcomes are used to define predictive performance metrics, which we seek to optimise. Both classification (is X an instance of S?) and regression (what is X’s score?) problems are covered. After outlining the general approach of machine learning, with a special emphasis on cross-validation, students will learn about algorithms that are error-based (e.g., regression and support vector machines), information-based (e.g., classification and regression trees), similarity-based (e.g., nearest neighbour algorithms), rule-based (e.g., one-rule), probability-based (e.g., naive Bayes classifiers), and ensemble-based (e.g., random forests).

Course Objectives:
1. To give an understanding of the purposes and challenges of machine learning.
2. To discuss various predictive performance metrics for classification and regression tasks.
3. To introduce methods of cross-validation and their purpose.
4. To demonstrate the use of ensemble, error-based, information-based, probability-based, rule-based, and similarity-based algorithms.
5. To provide hands-on experience implementing the various methods in R.

Prerequisites
The course requires a basic understanding of probability theory and descriptive statistics. Prior familiarity with regression analysis and maximum likelihood is useful but not essential. Students are, however, expected to be conversant with R. Specifically, they should be able to read in and manipulate data sets (using, for example, haven and dplyr), to install and call libraries, and to write basic functions; a minimal sketch of these baseline skills follows below. This course is explicitly not an introduction to R.
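
As a rough benchmark, you should be comfortable with a workflow like the following minimal sketch; the file name survey.sav and the variables age and income are hypothetical.

    library(haven)   # read data stored in SPSS, Stata, or SAS formats
    library(dplyr)   # data manipulation

    survey <- read_sav("survey.sav")    # hypothetical SPSS file

    # Filter, transform, and summarise the data
    by_group <- survey %>%
      filter(!is.na(income)) %>%
      mutate(age_group = if_else(age < 35, "younger", "older")) %>%
      group_by(age_group) %>%
      summarise(mean_income = mean(income))

    # Write a basic function: standardise a numeric vector
    standardise <- function(x) {
      (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    }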

Required text:
We shall be using the following textbook: Kelleher, John D., Brian Mac Namee, and Aoife D’Arcy. 2015. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. Cambridge, MA: MIT Press. (Circa £70.)

Additionally, a number of articles and book chapters will be assigned. These are noted in the detailed course plan and will be made available in a course pack.
In terms of software, we shall be using R. To complete the homework assignments on your own computer, you should install the latest version of R. It is recommended that you also install the latest version of RStudio. You will also need various libraries; those are noted in the detailed course plan, and one way to install them all at once is sketched below.
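
The following minimal sketch installs the packages listed in the course plan; package availability on CRAN can change over time, and CHAID in particular is hosted on R-Forge rather than CRAN.

    # Install the CRAN packages used in the course
    install.packages(c("caret", "glmnet", "MASS", "e1071", "kernlab",
                       "rpart", "rpart.plot", "kknn", "arules", "OneR",
                       "klaR", "adabag", "fastAdaboost", "ipred",
                       "randomForest", "SuperLearner"))

    # CHAID lives on R-Forge, not CRAN
    install.packages("CHAID", repos = "http://R-Forge.R-project.org")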

If you have questions about the course, I shall be available at Zest Fresh on weekdays from 9 to 10.

The course is meant to be an intensive learning experience. In addition to daily readings, there will be daily homework assignments to practise concepts and algorithms. To take full advantage of the course, it is essential that you complete both the readings and the assignments. Expect to spend an average of four hours each day on top of the time spent in class.

Day 1 — Learning, Performance, and Cross-Validation
Goals and challenges of machine learning; overview of different learning methods; overview of predictive performance metrics for supervised learning; the bias-variance tradeoff and the problem of overfitting; the logic of cross-validation; methods of cross-validation and bootstrapping.
• Literature: Kelleher, Mac Namee & D’Arcy, Chapters 1 & 8.
• Packages: caret (sketched below).
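
As a preview, here is a minimal sketch of 10-fold cross-validation with caret, using the built-in iris data; the learner (k-nearest neighbours) and the tuning settings are illustrative only.

    library(caret)

    set.seed(42)                                      # reproducible folds
    ctrl <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation

    fit <- train(Species ~ ., data = iris,
                 method = "knn",                      # any caret learner works here
                 trControl = ctrl,
                 tuneLength = 5)                      # evaluate 5 candidate values of k

    fit$results                                       # cross-validated accuracy per k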

Days 2 & 3 — Error-based Learning I
Regression analysis; backward and forward selection; regularization through lasso, ridge, and elastic net methods; oracle property; logistic regression; linear discriminant analysis.
• Literature: Kelleher, Mac Namee & D’Arcy, Chapters 2 & 7.
• Packages: glmnet (sketched below); MASS.
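
A minimal sketch of regularised regression with glmnet, using simulated data; alpha = 1 yields the lasso, alpha = 0 the ridge, and values in between an elastic net.

    library(glmnet)

    set.seed(42)
    x <- matrix(rnorm(100 * 20), nrow = 100)   # 20 simulated predictors
    y <- x[, 1] - 2 * x[, 2] + rnorm(100)      # only two predictors matter

    cv_fit <- cv.glmnet(x, y, alpha = 1)       # lasso with lambda chosen by CV
    coef(cv_fit, s = "lambda.min")             # sparse coefficient vector

    enet <- glmnet(x, y, alpha = 0.5)          # an elastic-net compromise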

Day 4 — Error-based Learning II
Support vectors; support vector machines; hard and soft margins; budgets; kernels.
• Literature: Smola, Alex J., and Bernhard Schölkopf. 2004. A Tutorial on Support Vector Regression. Statistics and Computing 14(3): 199-222.
• Packages: e1071 (sketched below); kernlab.
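
A minimal sketch of a soft-margin support vector machine with e1071, using the built-in iris data; the cost parameter governs the budget for margin violations.

    library(e1071)

    fit <- svm(Species ~ ., data = iris,
               kernel = "radial",               # try "linear" or "polynomial" too
               cost = 1)                        # smaller cost = softer margin

    # In-sample confusion matrix (illustrative; use cross-validation in practice)
    table(predicted = predict(fit, iris), actual = iris$Species)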

Day 5 — Information-based Learning
Entropy models; information gain; automatic interaction detection; classification and regression trees; greedy algorithms; pruning.
• Literature: Kelleher, Mac Namee & D’Arcy, Chapter 4.
• Packages: CHAID; rpart (sketched below); rpart.plot.
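
A minimal sketch of a classification tree with rpart, grown greedily on the built-in iris data and then pruned back via the complexity parameter; the cp value here is illustrative.

    library(rpart)
    library(rpart.plot)

    set.seed(42)
    tree <- rpart(Species ~ ., data = iris, method = "class")
    printcp(tree)                      # cross-validated error by tree size
    pruned <- prune(tree, cp = 0.05)   # prune with an illustrative cp value
    rpart.plot(pruned)                 # visualise the pruned tree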

Day 6 — Similarity-based Learning
Distance metrics; Voronoi tessellation; nearest neighbour classification and regression; kernel densities; efficient memory search.
• Literature: Kelleher, Mac Namee & D’Arcy, Chapter 5.
• Packages: kknn (sketched below); class (provides the knn function).
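
A minimal sketch of weighted nearest-neighbour classification with kknn, using a simple train/test split of the built-in iris data; k and the kernel are illustrative choices.

    library(kknn)

    set.seed(42)
    idx   <- sample(nrow(iris), 100)   # simple train/test split
    train <- iris[idx, ]
    test  <- iris[-idx, ]

    fit <- kknn(Species ~ ., train = train, test = test,
                k = 7, kernel = "triangular")   # distance-weighted neighbours
    table(predicted = fitted(fit), actual = test$Species)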

Day 7 — Rule-based Learning
Deriving association rules from data; the one-rule algorithm; the Apriori algorithm; the Eclat algorithm.
• Literature: Liu, Han, Alexander Gegov, and Michaela Cocea. 2016. Rule Based Systems for Big Data: A Machine Learning Approach. Heidelberg: Springer. Chapters 3-5.
• Packages: arules (sketched below); OneR.
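
A minimal sketch of Apriori rule mining with arules, using the Groceries transaction data shipped with the package; the support and confidence thresholds are illustrative.

    library(arules)

    data(Groceries)                       # built-in market-basket transactions
    rules <- apriori(Groceries,
                     parameter = list(supp = 0.01, conf = 0.5))
    inspect(head(sort(rules, by = "lift"), 3))   # top three rules by lift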

Day 8 — Probability-based Learning
Bayes’ theorem; maximum a posteriori (MAP) hypothesis; naive Bayes classifier; Laplacian smoothing; Gaussian Bayes; decision boundaries and discriminant functions; regression problems; Bayesian networks.
• Literature: Kelleher, Mac Namee & D’Arcy, Chapter 6.
• Packages: klaR (sketched below).
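
A minimal sketch of a naive Bayes classifier with klaR, using the built-in iris data; fL sets the Laplacian smoothing factor, which matters only for categorical predictors (a no-op with iris's numeric features, but shown for completeness).

    library(klaR)

    fit <- NaiveBayes(Species ~ ., data = iris, fL = 1)   # fL = 1: Laplacian smoothing
    preds <- predict(fit, iris)      # returns class labels and posterior probabilities
    table(predicted = preds$class, actual = iris$Species)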

Day 9 — Ensemble Learning
Ensembles; bagging; boosting; stacking; super learners.
• Literature: Dietterich, Thomas G. 2000. Ensemble Methods in Machine Learning. In Multiple Classifier Systems. MCS 2000. Lecture Notes in Computer Science, Vol. 1857. Heidelberg: Springer.
• Packages: adabag; fastAdaboost; ipred; randomForest (sketched below); SuperLearner.
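
A minimal sketch of a bagged tree ensemble with randomForest, using the built-in iris data; the ntree and mtry settings are illustrative.

    library(randomForest)

    set.seed(42)
    fit <- randomForest(Species ~ ., data = iris,
                        ntree = 500,        # number of bootstrapped trees
                        mtry = 2,           # predictors considered at each split
                        importance = TRUE)
    fit                                     # out-of-bag (OOB) error estimate
    importance(fit)                         # variable importance measures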

Day 10 — Overflow and Project
Extra time, should we need it, and time to work on group projects that require selecting a learning algorithm for a specific data set.