Please note: This course will be delivered in person at the Colchester campus only. Online study is not available for this course.

Tobias Böhmelt is a Professor in the Department of Government at the University of Essex (UK). His main research and teaching interests are civil-military relations, conflict resolution and management, environmental politics, international cooperation, migration, party politics, and spatial analysis.

Course Content

This course offers an application-oriented introduction to maximum likelihood (ML) based models for categorical, discrete choice, and count data. We begin with the basics of ML estimation and a discussion of the theoretical foundations of categorical, discrete choice, and count-data models. We then explore logistic and probit regression models and learn how to apply them in the statistical software package Stata. Afterwards, we cover interpretation and hypothesis testing for these kinds of models. Against this background, we will consider more complicated estimation strategies, including ordered logit and probit regression models, multinomial logits, count models, and discrete duration models. The course concludes with an overview of advanced models for time-series cross-section (TSCS) categorical, discrete choice, and count data.

Course Objectives

After this course, participants will be able to understand models for categorical, discrete choice, and count data most commonly used in the social sciences, and properly apply and interpret these models in their own work.

Course Prerequisites

Participants are assumed to have a basic knowledge of multiple linear regression. Some familiarity with linear algebra and experience estimating regression models with statistical software might also be helpful, but neither is essential to success in the course.

Representative Background Readings

Gujarati, Damodar N., and Dawn C. Porter. 2009. Essentials of Econometrics. Fourth Edition. New York: Irwin/McGraw-Hill.

Kohler, Ulrich, and Frauke Kreuter. 2012. Data Analysis Using Stata. Third Edition. College Station, TX: Stata Press.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.

Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. Third Edition. College Station, TX: Stata Press.

Required Reading – this text will be provided by ESS:

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Second Edition. Cambridge: Cambridge University Press.

Background knowledge required

Statistics

OLS = elementary

Maths

Linear Regression = elementary

Course Outline

During the course, we will be using Stata (www.stata.com) as our statistical package. You can also find more information about Stata at https://stats.idre.ucla.edu/stata. There are various books on Stata that you might find helpful. For instance:
Kohler, Ulrich, and Frauke Kreuter. 2012. Data Analysis Using Stata. Third Edition. College Station, TX: Stata Press.
Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. Third Edition. College Station, TX: Stata Press.

The following reading list provides an overview of the “technical” literature behind each session. For the applications (and the corresponding “template”), we use other articles that are stored in the Dropbox folder. You may want to read each article’s empirical parts before the respective session, but we will go through the relevant parts in the lab as well.

Day 1: Introduction

Content

Preliminaries, introduction to Stata, replication, do-files, log-files, terminology, theories of discrete choice as well as models for categorical and count data, and the linear probability model.

Readings

King, Gary. 1995. Replication, Replication. PS: Political Science and Politics 28: 443-499.

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. Making the Most of Statistical Analyses: Improving Interpretation and Presentation. American Journal of Political Science 44: 341-355.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Chapters 1, 2 (except Section 2.6), and Section 3.1.

Nagler, Jonathan. 1995. Coding Style and Good Computing Practices. The Political Methodologist 6: 2-8.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press, Chapters 1, 2.

Day 2: Maximum Likelihood Estimation

Content

Maximizing log-likelihood functions, hypothesis testing and goodness of fit, regression via maximum likelihood estimation, manual application of maximum likelihood estimation to regression.
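As a rough illustration of what "maximizing a log-likelihood function" involves, the following Python sketch fits a logit model with an intercept and one covariate by Newton-Raphson. It is not part of the course materials (the labs use Stata); the toy data, the function names, and the fixed number of iterations are assumptions made only for this example.

```python
import math

def logit_loglik(beta, xs, ys):
    """Bernoulli log-likelihood of a logit model with intercept beta[0] and slope beta[1]."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        # y*log(p) + (1-y)*log(1-p) simplifies to y*eta - log(1 + exp(eta))
        total += y * eta - math.log1p(math.exp(eta))
    return total

def fit_logit(xs, ys, iterations=25):
    """Maximize the log-likelihood by Newton-Raphson, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # negative Hessian, i.e. X'WX
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta_new = beta + (X'WX)^(-1) * score, using the 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: the outcomes overlap across x, so the MLE is finite
xs = [-2, -1, -1, 0, 0, 1, 1, 2]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit(xs, ys)
print(b0_hat, b1_hat, logit_loglik((b0_hat, b1_hat), xs, ys))
```

A fixed iteration count keeps the sketch short; production routines stop on a convergence criterion instead and guard against perfectly separated data, where no finite maximum exists.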

Readings

Gould, William, Jeffrey Pitblado, and William Sribney. 2003. Maximum Likelihood Estimation with Stata. Second Edition. College Station, TX: Stata Press. Chapters 2, 3.

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice Hall. Chapter 16, Appendix E.1-E.4.

King, Gary. 2001. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Ann Arbor, MI: University of Michigan Press. Chapters 1-4.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Sections 2.6, 3.5, 3.6, and Chapter 4.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press, Chapter 8.

Days 3 and 4: Binary Dependent Variables – Logit, Probit, Scobit, Heteroskedastic Probit, Rare Events Logit

Content

Deriving the logit and probit models; identification assumptions; alternative distributional assumptions such as scobit and heteroskedastic probit; goodness-of-fit measures such as pseudo-R2 and percent correctly predicted; interpretation and quantities of interest such as predicted probabilities, first differences, marginal effects, and confidence intervals; hypothesis testing such as the Wald test and the likelihood ratio test; interaction terms; and rare-events data.
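To make the quantities of interest more concrete, here is a small Python sketch (an illustration outside the Stata labs) that computes a predicted probability, a first difference, and a marginal effect for a logit model with one covariate. The coefficient values and function names are hypothetical and chosen only for this example.

```python
import math

def logistic(eta):
    """Logistic CDF, the link behind the logit model."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_probability(b0, b1, x):
    """Pr(y = 1 | x) under a logit model with intercept b0 and slope b1."""
    return logistic(b0 + b1 * x)

def first_difference(b0, b1, x_from, x_to):
    """Change in Pr(y = 1) when x moves from x_from to x_to."""
    return predicted_probability(b0, b1, x_to) - predicted_probability(b0, b1, x_from)

def marginal_effect(b0, b1, x):
    """Derivative of Pr(y = 1 | x) with respect to x: b1 * p * (1 - p)."""
    p = predicted_probability(b0, b1, x)
    return b1 * p * (1.0 - p)

# Hypothetical coefficients, not estimates from any course data set
B0, B1 = -1.0, 0.5

print(predicted_probability(B0, B1, 2.0))
print(first_difference(B0, B1, 0.0, 2.0))
print(marginal_effect(B0, B1, 2.0))
```

Because the marginal effect depends on p, it changes with the value of x at which it is evaluated, which is why a logit coefficient cannot be read as a constant effect on the probability.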

Readings

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice Hall. Sections 23.1-23.4.

Herron, Michael. 1999. Post-Estimation Uncertainty in Limited Dependent Variable Models. Political Analysis 8: 83-98.

King, Gary, and Langche Zeng. 2001a. Logistic Regression in Rare Events Data. Political Analysis 9: 137-163.

King, Gary, and Langche Zeng. 2001b. Explaining Rare Events in International Relations. International Organization 55: 693-715.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Sections 3.2, 3.3, 3.4, 3.7, 3.8.

Nagler, Jonathan. 1994. Scobit: An Alternative Estimator to Logit and Probit. American Journal of Political Science 38: 230-255.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press, Chapters 3, 4, 5 (except Section 5.5).

Days 5 and 6: Multichotomous Dependent Variables

Content

A general framework for models for categorical data and discrete choice, multinomial and conditional logit models, identification assumptions, estimation and interpretation, random taste variation and independence of irrelevant alternatives, nested logit and multinomial probit models, random coefficient models and mixed logit models, simulated maximum likelihood.
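The independence of irrelevant alternatives (IIA) property can be seen in a few lines of Python. The sketch below is only an illustration, not course material: it computes multinomial logit choice probabilities from systematic utilities and shows that the odds between two alternatives do not change when a third alternative is removed. The utilities and the party labels are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities.

    utilities maps each alternative to its systematic utility V_j;
    Pr(j) = exp(V_j) / sum_k exp(V_k).
    """
    largest = max(utilities.values())  # subtract the maximum for numerical stability
    exponentiated = {alt: math.exp(v - largest) for alt, v in utilities.items()}
    total = sum(exponentiated.values())
    return {alt: e / total for alt, e in exponentiated.items()}

# Hypothetical systematic utilities for three parties
utilities = {"party_a": 1.0, "party_b": 0.5, "party_c": -0.2}

full = mnl_probabilities(utilities)
reduced = mnl_probabilities({k: v for k, v in utilities.items() if k != "party_c"})

# Under IIA the odds of A versus B are the same in both choice sets
ratio_full = full["party_a"] / full["party_b"]
ratio_reduced = reduced["party_a"] / reduced["party_b"]
print(ratio_full, ratio_reduced)
```

Nested logit, multinomial probit, and mixed logit models relax exactly this restriction by allowing correlated or random components of utility.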

Readings

Alvarez, R. Michael, and Jonathan Nagler. 1998. When Politics and Models Collide: Estimating Models of Multi-Party Elections. American Journal of Political Science 42: 55-96.

Glasgow, Garrett. 2001. Mixed Logit Models for Multiparty Elections. Political Analysis 9: 116-136.

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice Hall. Chapter 17, Section 23.11.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Chapter 6.

Martin, Lanny W., and Randolph T. Stevenson. 2001. Government Formation in Parliamentary Democracies. American Journal of Political Science 45: 33-50.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press, Chapters 6, 9, 10.

Day 7: Ordered Dependent Variable Models

Content

Ordered logit and probit models; identification; the parallel regression assumption; generalized logit; continuation ratio model; cut-points; interpretation and quantities of interest such as predicted probabilities, first differences, marginal effects, and confidence intervals; and the heteroskedastic ordered probit.
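The role of the cut-points can be sketched in Python as follows. This is a minimal illustration rather than course code: it assumes the common parameterization Pr(y <= j) = F(tau_j - eta) with a logistic F, and the linear predictor and cut-point values are hypothetical.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probabilities(eta, cutpoints):
    """Category probabilities for an ordered logit model.

    eta is the linear predictor x'beta (no intercept) and cutpoints is an
    increasing list tau_1 < ... < tau_{J-1}. The cumulative probability is
    Pr(y <= j) = F(tau_j - eta), so each category probability is the
    difference between adjacent cumulative probabilities.
    """
    cumulative = [logistic_cdf(tau - eta) for tau in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities

# Hypothetical values: two cut-points imply three ordered categories
probs = ordered_logit_probabilities(eta=0.0, cutpoints=[-1.0, 1.0])
print(probs)
```

The same coefficient vector enters every cumulative probability here; only the cut-points differ across categories, which is the parallel regression assumption in this parameterization.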

Readings

Gelpi, Christopher. 1997. Crime and Punishment: The Role of Norms in Crisis Bargaining. American Political Science Review 91: 339-360.

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice Hall. Section 23.10.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Chapter 5.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press, Chapter 7.

Day 8: Count Models

Content

Poisson model: estimation and interpretation; exposure; underdispersion, overdispersion, and mean-variance equality; negative binomial models, continuous parameter binomial models, and generalized event count models; censored and truncated data; and zero-inflated count models.
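The Poisson model's mean-variance equality suggests a simple descriptive check before any modeling: compare the sample variance of the counts with their sample mean. The Python sketch below is an illustration only and not part of the Stata labs; the counts are made up, and the ratio ignores covariates, so it is a crude first look rather than a formal test for overdispersion.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    A value near 1 is in line with the Poisson assumption, a value well
    above 1 points to overdispersion, and a value well below 1 points to
    underdispersion.
    """
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical event counts with a few large values
counts = [0, 0, 1, 1, 2, 3, 8, 9]

print(statistics.mean(counts), statistics.variance(counts), dispersion_ratio(counts))
```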

Readings

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice Hall. Sections 25.1-25.5.

King, Gary. 1989a. Variance Specification in Event Count Models: From Restrictive Assumptions to a Generalized Estimator. American Journal of Political Science 33: 762-784.

King, Gary. 1989b. Event Count Models for International Relations: Generalizations and Applications. International Studies Quarterly 33: 123-147.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications, Chapter 8.

Zorn, Christopher. 1998. An Analytic and Empirical Examination of Zero-Inflated and Hurdle Poisson Specifications. Sociological Methods and Research 26: 368-400.

Day 9: Models for Repeated Observations: Panel and Time-Series Cross-Section Categorical Data

Content

Connection to time-series cross-section models with binary dependent variables, time dependence including robust clustered standard errors, temporal dummies, and cubic splines, ongoing events and second spells, time-varying covariates, and Markov transition models.
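Several approaches to time dependence in binary TSCS data start from a counter of the periods elapsed since a unit's last event, which can then enter the model as temporal dummies, splines, or a cubic polynomial. The Python sketch below shows one way to build such a counter and its polynomial terms for a single unit. It is an illustration under assumptions: the counter starts at 0 and restarts at 0 in the period after an event, which is only one of several conventions in use, and the function names and the event sequence are hypothetical.

```python
def time_since_last_event(events):
    """Periods elapsed since the last event (or since the start of the series).

    events is a chronologically ordered list of 0/1 outcomes for one unit.
    Assumed convention: the counter is 0 in the first period and in the
    period immediately following an event.
    """
    counter = 0
    durations = []
    for y in events:
        durations.append(counter)
        counter = 0 if y == 1 else counter + 1
    return durations

def cubic_time_terms(events):
    """Return (t, t^2, t^3) for each period, for use as regressors."""
    return [(t, t ** 2, t ** 3) for t in time_since_last_event(events)]

# Hypothetical event history for one unit
events = [0, 0, 1, 0, 0, 0, 1, 0]

print(time_since_last_event(events))
print(cubic_time_terms(events))
```

In a panel, the counter would be computed separately within each unit before the polynomial terms are added to the logit or probit specification.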

Readings

Alt, James E., Gary King, and Curtis S. Signorino. 2000. Aggregation among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data. Political Analysis 9: 21-44.

Beck, Nathaniel, Jonathan N. Katz, and Richard Tucker. 1998. Taking Time Seriously: Time-Series-Cross-Section Analysis with a Binary Dependent Variable. American Journal of Political Science 42: 1260-1288.

Box-Steffensmeier, Janet M., and Bradford S. Jones. 2004. Event History Modeling: A Guide for Social Scientists. Cambridge: Cambridge University Press. Chapters 5, 7.

Box-Steffensmeier, Janet M., and Christopher Zorn. 2002. Duration Models for Repeated Events. Journal of Politics 64: 1069-1094.

Carter, David B., and Curtis Signorino. 2010. Back to the Future: Modeling Time Dependence in Binary Data. Political Analysis 18: 271-292.

Oneal, John, and Bruce Russett. 1999. Assessing the Liberal Peace with Alternative Specifications: Trade Still Reduces Conflict. Journal of Peace Research 36: 423-442.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press, Section 5.5.

Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press. Sections 15.1-15.7, 15.9, 15.10.

Zorn, Christopher. 2000. Modeling Duration Dependence. Political Analysis 8: 367-380.