Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on-campus and online students will be taught simultaneously.
Marco Steenbergen has been a professor of political methodology at the University of Zurich since 2011. Prior to that, he held appointments at the University of Bern, the University of North Carolina at Chapel Hill, and Carnegie Mellon University. Marco’s research covers methodology, as well as political psychology. He has published several books and articles in these areas. His current research focuses on electoral consideration sets, cleavages and identities, and new forms of political participation.
Course content
Curious about machine learning but not sure where to start? This course is designed for social scientists with no prior experience in machine learning who want to make sense of messy, complex data—and actually enjoy doing it.
We’ll introduce you to data science through hands-on learning with real social science applications. Using tidymodels in R, you’ll build and interpret powerful models that can help you:
- Predict outcomes (like who’s likely to vote or support a policy)
- Impute missing data in surveys or large datasets
- Discover patterns in open-ended responses or opinion data
- Explore how machine learning can support causal questions
Our focus is on methods that are easy to understand and widely used: decision trees, random forests, boosting, and simple neural networks. You’ll learn how they work, when to use them, and how to explain them—all without needing a math degree.
Whether you’re a grad student, policy researcher, or early-career scholar, this course will help you bring machine learning into your toolkit and use it to answer real-world questions.
Course Objectives
Machine learning is of ever greater importance in the social sciences, both inside and outside of academia. The ultimate goal of this course is to make you conversant with the most important techniques and ideas of machine learning. This means that you have a good overview of the field and its relevance for social scientific research. It also means that you have sufficient background knowledge to allow you to study further. This is important because a two-week course can only scratch the surface of machine learning, which evolves quickly. Being conversant with machine learning also means that you understand how to implement these methods, which we shall do in R. Note that the examples will be relatively small, with an eye on minimizing computation time. Where necessary, we shall discuss how to engage in big data analysis.
Course Prerequisites
This is an introductory course, meaning that prior familiarity with machine learning is not expected. It is useful if you have used the linear regression model before, as it is a starting point for much of the course. A basic knowledge of probability theory is indispensable, as is a working understanding of R. In R, you should know: (1) how to access various data sources; (2) the basic objects of the language; (3) basic operations; (4) how to compute descriptive statistics and create graphs; and (5) the basics of tidyverse.
Required text – (this text will be provided by ESS):
Max Kuhn and Julia Silge, Tidy Modeling with R: A Framework for Modeling in the Tidyverse. ISBN: 978-1492096481.
Background texts
For a cursory introduction to many of the topics, you might consult: Lantz, Brett. 2019. Machine Learning with R: Expert Techniques for Predictive Modeling. 3rd edition. Packt Publishing.
For an introduction to statistical concepts and R, you might want to consult Learning Statistics with R.
Background knowledge required
Maths
Calculus = moderate
Linear Regression = moderate
Statistics
OLS = elementary
Maximum Likelihood = elementary
Computer Background
R = moderate
To participate in this course, students must bring their own laptops.