Moritz Marbach is a postdoctoral researcher at the Immigration Policy Lab at ETH Zurich, the European branch of the Immigration Policy Lab at Stanford University. His research focuses on political methodology and the role of migration in politics. Marbach received his Ph.D. from the University of Mannheim in 2016. In recent years, he has taught various courses in political methodology and international relations.

Course Content: Building on students’ knowledge of simple linear regression, this course introduces more advanced statistical techniques. We will focus on the following topics: 1) Subclassification, matching, and propensity score weighting as fundamental strategies to reduce confounding; 2) Common interpretations of the OLS estimator; 3) Robust inference with cluster-robust standard errors as well as bootstrap estimators; 4) Classical exposition of various biases and introduction to causal graphs to select control variables; 5) Flexible functional form specification using splines, interactions, and regularization; 6) Instrumental variable estimation with two-stage least squares and the Heckman estimator for selected samples; 7) Two-way fixed effects and the difference-in-differences estimator with panel data; 8) Introduction to tree-based methods for regression and classification as well as random forests. The course combines a theoretical introduction to each topic with lab sessions in which students perform replications of published work in Economics and Political Science using STATA. Students are encouraged to bring their own data sets and present their research projects and empirical analyses throughout the course.

Course Objectives: Students will develop a deeper understanding of the statistical problems that arise in applied research. The course will give participants the skills to i) Make sensible choices about which data to collect; ii) Select an appropriate estimator for their estimands and the data at hand; iii) Precisely state the assumptions on which estimates are based; and iv) Conduct sensible, robust inferences and interpret their estimates accurately. More generally, students will be able to decide which variables they have to control for (and which they do not) and develop an understanding of what it means when regression results are distorted by selection bias, measurement error, or omitted variable bias.

Course Prerequisites: This course is targeted at social and political scientists with a strong interest in applied empirical research and data analysis. The course is designed for students who already have training in statistics, including a good understanding of simple linear regression and statistical tests, as well as basic knowledge of matrix algebra, calculus, and probability theory. Participants must be familiar with STATA and its command structure and be able to write their own do-files. R users are also welcome.

Core Reading:
– Angrist, J. D. and J.-S. Pischke (2009). Mostly Harmless Econometrics. An Empiricist’s Companion. Princeton: Princeton University Press.

Representative Background Reading:
– Agresti, A. and B. Finlay (2009). Statistical Methods for the Social Sciences. 4th ed. Upper Saddle River: Pearson. Chapter 1-9.

Course Structure:

Day 1 Cause and Effect
Day 2 Subclassification, Weighting and Matching
Day 3 Perspectives on OLS
Day 4 Robust Inference with OLS
Day 5 Causal Graphs
Day 6 Functional Form Specification
Day 7 Instrumental Variables
Day 8 DID and Panel Data
Day 9 Tree-based Methods
Day 10 Review and Student Presentations

Essential Course Readings:

Day 1
– Angrist and Pischke (2009), Chapter 1-2.
– Holland (1986).
– Agresti and Finlay (2009), Chapter 5-6.

Day 2
– Cochran (1968).
– Rosenbaum (2009), Chapter 7, 8.1-8.3, 9.

Day 3
– Angrist and Pischke (2009), Chapter 3.
– Wooldridge (2010), Chapter 2.

Day 4
– Wooldridge (2009), Chapter 5, 8.
– Angrist and Pischke (2009), Chapter 8.

Day 5
– Morgan and Winship (2007), Chapter 3.
– Elwert and Winship (2014).

Day 6
– Keele (2008), Chapter 2.1-2.2, 3.1-3.3.
– Hastie et al. (2009), Chapter 3, 5.

Day 7
– Angrist and Pischke (2009), Chapter 4.
– Wooldridge (2010), Chapter 5.

Day 8
– Angrist and Pischke (2009), Chapter 5.
– Wooldridge (2010), Chapter 10.

Day 9
– Siroky (2009).
– Hastie et al. (2009), Chapter 9, 15.

Day 10
– Freedman (1991).


References:

Agresti, Alan, and Barbara Finlay. 2009. Statistical Methods for the Social Sciences. 4th ed. Upper Saddle River: Pearson.

Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton: Princeton University Press.

Cochran, W. G. 1968. “The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies.” Biometrics 24 (2): 295–313.

Elwert, Felix, and Christopher Winship. 2014. “Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable.” Annual Review of Sociology 40: 31–53.

Freedman, David A. 1991. “Statistical Models and Shoe Leather.” Sociological Methodology 21: 291–313.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Heidelberg: Springer.

Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60.

Keele, Luke. 2008. Semiparametric Regression for the Social Sciences. Chichester: Wiley.

Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge: Cambridge University Press.

Rosenbaum, Paul R. 2009. Design of Observational Studies. 2nd ed. New York: Springer.

Siroky, David S. 2009. “Navigating Random Forests and Related Advances in Algorithmic Modeling.” Statistics Surveys 3: 147–163.

Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge: MIT Press.