3B Machine Learning for Estimating Treatment Effects from Observational Data

Please note: This course will be taught in hybrid mode. Hybrid delivery will include synchronous live sessions in which on-campus and online students are taught simultaneously.

Annalivia Polselli is a Postdoctoral Fellow in the Institute for Analytics and Data Science (IADS) at the University of Essex, and is affiliated with the Research Centre on Micro-Social Change at the Institute for Social and Economic Research (ISER). She received her PhD in Economics from the University of Essex in 2022. Her research interests include econometric methods for panel data models, causal machine learning, and labour and gender economics. Her current research focuses on double machine learning for panel data models.

Damian Machlanski is a Senior Research Officer at the Research Centre on Micro-Social Change (MiSoC), Institute for Social and Economic Research (ISER), University of Essex. He is also a PhD student in Computer Science at the university. He works at the intersection of machine learning and causality, with a particular interest in methods for heterogeneous treatment effect estimation, causal structure learning, their sensitivity to hyperparameter selection, and the problem of performance evaluation.

Paul Clarke is Professor of Social Statistics in the Institute for Social and Economic Research (ISER) at the University of Essex. He is also a co-investigator on the ESRC Research Centre on Micro-Social Change (MiSoC) and on Understanding Society: The UK Household Longitudinal Study. He is a social statistician with expertise in methods for inference from incomplete data, causal inference, longitudinal data analysis and, latterly, the use of machine learning in statistical analysis. He has worked with collaborators from economics, computer science, survey methods, psychology, social epidemiology and infectious disease epidemiology.

Course Content

This course offers a comprehensive discussion of new and established machine learning techniques for prediction and for causal effect estimation (ATE, ATT, CATE, ITE) with observational data. It covers relevant machine learning techniques from the basics (e.g., lasso, decision trees) to more advanced methods (e.g., random forests, boosted trees, causal forests, meta-learners, neural networks, double machine learning), and shows how they can be harnessed for effective causal estimation. Best practices in the field will be followed throughout, so the course will also cover how to evaluate fitted models and how to select among different modelling options.

The course will combine lecture-based theory with hands-on practical sessions in the statistical software R. The practical sessions will apply the methods presented in the lectures to well-established data sets or purpose-built simulated data.

The main goal of the course is to equip participants with the latest machine learning techniques so that they can conduct data visualisation and causal analysis independently. By the end of the course, participants will understand the challenges that come with observational data and know how to address them through good practice to obtain robust causal estimates.

Prior knowledge of causal estimation and machine learning is not necessary, as these topics will be revised at the beginning of the course. However, a thorough understanding of statistical modelling is essential to fully appreciate the course content.

Course Objectives

By the end of the course, students will:

Know the basic principles of causal inference and machine learning.
Be aware of the advantages as well as the challenges that come with observational data.
Understand the role of modelling in causal inference.
Be comfortable with using various machine learning techniques to estimate causal effects.
Know how to interpret the resulting estimates through visualisation and evaluation metrics.
Be familiar with powerful machine learning methods, including neural networks and generative models, and their use in treatment effect estimation.
Have in-depth knowledge of state-of-the-art causal estimators, such as double machine learning.
Be confident in applying new skills in practical settings.

Course Prerequisites

Working knowledge of R (e.g., data management and visualisation)
Basics of statistical modelling (OLS, lasso)
Basics of probability and calculus

Required reading (these will be supplied by ESS)

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (Second edition). New York: Springer. ISBN: 978-1-0716-1417-4
Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. John Wiley & Sons. ISBN: 978-1-119-18684-7

Supplementary reading

Cerulli, G. (2023). Fundamentals of Supervised Machine Learning: With Applications in Python, R, and Stata. Springer Nature.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.
Cunningham, S. (2021). Causal inference: The mixtape. Yale University Press.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer.
Künzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning. Proceedings of the National Academy of Sciences, 116(10), 4156–4165.
Machlanski, D., Samothrakis, S., & Clarke, P. (2023). Hyperparameter Tuning and Model Evaluation in Causal Effect Estimation (arXiv:2303.01412). arXiv.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.
Yao, L., Chu, Z., Li, S., Li, Y., Gao, J., & Zhang, A. (2021). A survey on causal inference. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(5), 1–46.

Background knowledge required

Maths:

Calculus – elementary

Linear Regression – elementary

Statistics:

OLS – moderate

Computer background:

R – moderate

Course Outline

Day 1 (AP, DM, PC)

- Welcome (course structure, meet your instructors, assessment instructions)
- Introduction to causality
- Correlation vs. causation
- Causal inference from observational data
- Potential outcomes
  - Assumptions
  - Target causal (treatment) parameters (ATE, ATT, CATE, policy)
- Practical session
  - Google Colab, GitHub
  - R for reading and visualising data
  - Description of the data sets used in the course

Day 2 (DM, AP, PC)

- Introduction to machine learning (ML)
- Supervised ML (regression and classification)
- Good practices: data splitting, cross-validation, model evaluation
- Practical in R (learning and prediction, evaluation and metrics, cross-validation)

Day 3 (DM, AP, PC)

- Traditional causal estimators (ATE)
- Propensity scores (inverse probability weighting and doubly robust methods)
- Lasso and simple trees
- Metrics (ATE bias)
- Practical in R (simple implementations of IPW/DR, learning with lasso/trees, performance evaluation)

Day 4 (DM, AP, PC)

- ML for individualised estimates (CATE)
- ATE vs. CATE vs. ITE
- Meta-learners and more complex base learners (e.g., random forests, boosted trees)
- Inspecting predicted CATEs: visualisation and confidence intervals
- Metrics (PEHE: precision in estimation of heterogeneous effects)
- Practical in R (simple implementations and external packages, learning and prediction, performance evaluation)

Day 5 (DM, AP, PC)

- Hyperparameter optimisation (importance and pitfalls)
- Goodness-of-fit vs. CATE accuracy
- More advanced metrics (R-loss, plug-in metrics)
- Hyperparameter tuning vs. model selection vs. ensembles
- Practical in R (previous exercises revisited with tuning and advanced metrics)

Day 6 (DM, AP, PC)

- Neural networks
  - Simple architectures as base learners (MLP)
  - Advanced standalone estimators
- Generative modelling
  - Auto-encoding and adversarial networks
  - Tree-based approaches
  - Data imputation and augmentation
- Practical in R (using implementations in TensorFlow/Keras, running models on GPUs)

Day 7 (AP, DM, PC)

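- Basics of DML:
  - Overfitting and regularisation bias
  - Neyman orthogonality
  - Sample splitting
  - Cross-fitting
  - Score functions
  - Target and nuisance parameters
- Overview of structural causal models: partially linear regression model (PLRM), partially linear IV model (PLIVM), interactive regression model (IRM), interactive IV regression model (IIVRM)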

Day 8 (AP, DM, PC)

- Identification and estimation with PLRM and PLIVM
- Practical in R with examples

Day 9 (AP, DM, PC)

- Identification and estimation with IRM and IIVRM
- Practical in R with examples

Day 10 (AP, DM, PC)

- Hyperparameter tuning in DML
- Ensembles in DML
- Practical in R with examples
- Recap of the course and further learning suggestions (e.g., DML for DiD)