Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on campus and online students will be taught simultaneously.

Tobias Böhmelt is a Professor in the Department of Government at the University of Essex (UK). His main research and teaching interests are civil-military relations, conflict resolution and management, environmental politics, international cooperation, migration, party politics, and spatial analysis.

Course Content

This course offers an application-oriented introduction to maximum likelihood (ML) based models for categorical, discrete choice, and count data. We begin with the basics of ML estimation and a discussion of the theoretical foundations of categorical, discrete choice, and count-data models. We then focus on logistic and probit regression models and learn how to apply them in the statistical software package Stata. Afterwards, we cover interpretation and hypothesis testing for these kinds of models. Against this background, we will consider more complicated estimation strategies, including ordered logit and probit regression models, multinomial logits, count models, and discrete duration models. The course concludes with an overview of advanced techniques for modeling time-series cross-section (TSCS) categorical, discrete choice, and count data.

Course Objectives

After this course, participants will be able to understand models for categorical, discrete choice, and count data most commonly used in the social sciences, and properly apply and interpret these models in their own work.

Course Prerequisites

Participants are assumed to have a basic knowledge of multiple linear regression. Some familiarity with linear algebra and experience estimating regression models with statistical software are helpful but not essential to success in the course.

Representative Background Readings

Gujarati, Damodar N., and Dawn C. Porter. 2009. Essentials of Econometrics. Fourth Edition. New York: Irwin/McGraw-Hill.

Kohler, Ulrich, and Frauke Kreuter. 2012. Data Analysis Using Stata. Third Edition. College Station, TX: Stata Press.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.

Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. Third Edition. College Station, TX: Stata Press.

Required Reading – this text will be provided by ESS:

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Second Edition. Cambridge: Cambridge University Press.

Background knowledge required

Statistics

OLS = elementary

Computer Background

Course Outline

During the course, we will be using Stata (www.stata.com) as our statistical package. You can
also find more information about Stata at https://stats.idre.ucla.edu/stata. There are various
books on Stata that you might find helpful. For instance:

Kohler, Ulrich, and Frauke Kreuter. 2012. Data Analysis Using Stata. Third Edition. College
Station, TX: Stata Press.

Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables
Using Stata. Third Edition. College Station, TX: Stata Press.

The following reading list then provides an overview of the “technical” literature behind each
session. For the applications (and the corresponding “template”), we use other articles that are
stored in the Dropbox folder. You may want to read each article’s empirical parts before the
respective session, but we will go through the relevant parts in the lab as well.

1. Day: Introduction

Content

Preliminaries, introduction to Stata, replication, do-files, log-files, terminology, theories of
discrete choice as well as models for categorical and count data, and the linear probability model.
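
As a small illustration of why the linear probability model serves as a starting point rather than an end point, the following Python sketch fits ordinary least squares to a binary outcome. The data are hypothetical toy values, not course materials, and the course itself works in Stata. The sketch shows that fitted "probabilities" can fall outside the unit interval.

```python
# Hypothetical toy data: a binary outcome y and a single regressor x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]
n = len(x)

# Closed-form OLS estimates for the model y = intercept + slope * x + error.
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Fitted values are interpreted as probabilities in the linear probability model.
fitted = [intercept + slope * xi for xi in x]

# Collect fitted values that are not valid probabilities.
out_of_range = [p for p in fitted if p < 0.0 or p > 1.0]
print(fitted)
print(out_of_range)
```

In this toy example the smallest and largest values of x produce fitted values below 0 and above 1, which is one motivation for the logit and probit models covered later.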

Readings

King, Gary. 1995. Replication, Replication. PS: Political Science and Politics 28: 443-499.

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. Making the Most of Statistical Analyses:
Improving Interpretation and Presentation. American Journal of Political Science 44: 341-355.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications, Chapters 1, 2 (except Section 2.6), and Section 3.1.

Nagler, Jonathan. 1995. Coding Style and Good Computing Practices. The Political
Methodologist 6: 2-8.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge
University Press, Chapters 1, 2.

2. Day: Maximum Likelihood Estimation

Content

Maximizing log-likelihood functions, hypothesis testing and goodness of fit, regression via
maximum likelihood estimation, manual application of maximum likelihood estimation to
regression.
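
The idea of regression via maximum likelihood can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are hypothetical, the model is a one-regressor linear regression with normal errors, and the function name log_likelihood is chosen here for exposition. The sketch evaluates the log-likelihood at the closed-form maximizers (the OLS coefficients and the sum of squared residuals divided by n) and at a perturbed value.

```python
import math

# Hypothetical toy data (not from the course materials).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
n = len(x)

def log_likelihood(a, b, sigma2):
    """Log-likelihood of y = a + b*x + e with e ~ N(0, sigma2)."""
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)

# Closed-form maximizers: OLS coefficients, and SSR / n for the error variance.
x_bar = sum(x) / n
y_bar = sum(y) / n
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar
sigma2_hat = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y)) / n

# The log-likelihood at the ML estimates is at least as high as anywhere else.
ll_max = log_likelihood(a_hat, b_hat, sigma2_hat)
ll_perturbed = log_likelihood(a_hat, b_hat + 0.1, sigma2_hat)
print(ll_max, ll_perturbed)
```

For models without closed-form solutions, such as logit and probit, the same log-likelihood logic applies, but the maximum is typically found by numerical optimization.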

Readings

Gould, William, Jeffrey Pitblado, and William Sribney. 2003. Maximum Likelihood Estimation
with Stata. Second Edition. College Station, TX: Stata Press. Chapters 2, 3.

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice
Hall. Chapter 16, Appendix E.1-E.4.

King, Gary. 2001. Unifying Political Methodology: The Likelihood Theory of Statistical
Inference. Ann Arbor, MI: University of Michigan Press. Chapters 1-4.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications, Sections 2.6, 3.5, 3.6, and Chapter 4.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge
University Press, Chapter 8.

3. and 4. Day: Binary Dependent Variables – Logit, Probit, Scobit, Heteroskedastic Probit,
Rare Events Logit

Content

Deriving the logit and probit models; identification assumptions; alternative distributional
assumptions such as scobit and heteroskedastic probit; goodness-of-fit measures such as
pseudo-R² and percent correctly predicted; interpretation and quantities of interest such as
predicted probabilities, first differences, marginal effects, and confidence intervals; hypothesis
testing such as the Wald test and the likelihood ratio test; interaction terms; and rare-events data.
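
The quantities of interest named above can be illustrated with a short Python sketch for a logit model with one regressor. The coefficients b0 and b1 are hypothetical values chosen for exposition, not estimates from any course dataset, and the function names are assumptions of this sketch.

```python
import math

def logistic(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit coefficients: intercept and slope.
b0, b1 = -1.0, 0.5

def predicted_probability(x):
    """Pr(y = 1 | x) under the logit model."""
    return logistic(b0 + b1 * x)

def marginal_effect(x):
    """Derivative of Pr(y = 1 | x) with respect to x: b1 * p * (1 - p)."""
    p = predicted_probability(x)
    return b1 * p * (1.0 - p)

# First difference: change in the predicted probability when x moves from 1 to 2.
p_low = predicted_probability(1.0)
p_high = predicted_probability(2.0)
first_difference = p_high - p_low

print(p_low, p_high, first_difference, marginal_effect(2.0))
```

Unlike in linear regression, both the first difference and the marginal effect depend on the value of x at which they are evaluated, which is why these quantities are reported at chosen covariate values.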

Readings

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice
Hall. Sections 23.1-23.4.

Herron, Michael. 1999. Post-Estimation Uncertainty in Limited Dependent Variable Models.
Political Analysis 8: 83-98.

King, Gary, and Langche Zeng. 2001a. Logistic Regression in Rare Events Data. Political
Analysis 9: 137-163.

King, Gary, and Langche Zeng. 2001b. Explaining Rare Events in International Relations.
International Organization 55: 693-715.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications, Sections 3.2, 3.3, 3.4, 3.7, 3.8.

Nagler, Jonathan. 1994. Scobit: An Alternative Estimator to Logit and Probit. American Journal
of Political Science 38: 230-255.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge
University Press, Chapters 3, 4, 5 (except Section 5.5).

5. and 6. Day: Multichotomous Dependent Variables

Content

A general framework for models for categorical data and discrete choice, multinomial and
conditional logit models, identification assumptions, estimation and interpretation, random taste
variation and independence of irrelevant alternatives, nested logit and multinomial probit models,
random coefficient models and mixed logit models, simulated maximum likelihood.
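
The independence of irrelevant alternatives (IIA) property of the multinomial logit can be made concrete with a small Python sketch. The utilities below are hypothetical systematic utilities for three alternatives labeled A, B, and C, and the function name mnl_probabilities is an assumption of this sketch.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    # Subtract the maximum utility before exponentiating for numerical stability.
    u_max = max(utilities.values())
    exp_u = {alt: math.exp(u - u_max) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}

# Hypothetical systematic utilities for three alternatives.
utilities = {"A": 1.0, "B": 0.5, "C": 0.0}

p_full = mnl_probabilities(utilities)
# Remove alternative C from the choice set and recompute.
p_reduced = mnl_probabilities({alt: u for alt, u in utilities.items() if alt != "C"})

# Under IIA, the odds of A versus B do not depend on whether C is available.
ratio_full = p_full["A"] / p_full["B"]
ratio_reduced = p_reduced["A"] / p_reduced["B"]
print(ratio_full, ratio_reduced)
```

Both ratios equal exp(1.0 - 0.5), regardless of the third alternative. Nested logit, multinomial probit, and mixed logit models relax this restriction.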

Readings

Alvarez, R. Michael, and Jonathan Nagler. 1998. When Politics and Models Collide: Estimating
Models of Multi-Party Elections. American Journal of Political Science 42: 55-96.

Glasgow, Garrett. 2001. Mixed Logit Models for Multiparty Elections. Political Analysis 9:
116-136.

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice
Hall. Chapter 17, Section 23.11.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications, Chapter 6.

Martin, Lanny W., and Randolph T. Stevenson. 2001. Government Formation in Parliamentary
Democracies. American Journal of Political Science 45: 33-50.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge
University Press, Chapters 6, 9, 10.

7. Day: Ordered Dependent Variable Models

Content

Ordered logit and probit models; identification; the parallel regression assumption; generalized
logit; the continuation ratio model; cut-points; interpretation and quantities of interest such as
predicted probabilities, first differences, marginal effects, and confidence intervals; and the
heteroskedastic ordered probit.
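
How cut-points translate a linear predictor into category probabilities can be sketched in Python for the ordered logit. The linear predictor value and the two cut-points below are hypothetical, and the function name ordered_logit_probabilities is an assumption of this sketch.

```python
import math

def logistic(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probabilities(xb, cutpoints):
    """Category probabilities Pr(y = j) = F(tau_j - xb) - F(tau_{j-1} - xb).

    cutpoints must be sorted in increasing order; J - 1 cut-points
    yield J ordered categories.
    """
    cumulative = [logistic(tau - xb) for tau in cutpoints]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]

# Hypothetical linear predictor and cut-points for a three-category outcome.
probs = ordered_logit_probabilities(xb=0.0, cutpoints=[-1.0, 1.0])
print(probs)

# A larger linear predictor shifts probability mass toward the highest category.
probs_shifted = ordered_logit_probabilities(xb=1.0, cutpoints=[-1.0, 1.0])
print(probs_shifted)
```

Because a single coefficient vector shifts all cumulative probabilities at once, this setup embodies the parallel regression assumption.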

Readings

Gelpi, Christopher. 1997. Crime and Punishment: The Role of Norms in Crisis Bargaining.
American Political Science Review 91: 339-360.

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice
Hall. Section 23.10.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications, Chapter 5.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge
University Press, Chapter 7.

8. Day: Count Models

Content

Poisson model: estimation and interpretation, exposure, underdispersion, overdispersion, and
mean-variance equality; negative binomial models, continuous parameter binomial models, and
generalized event count models; censored and truncated data; and zero-inflated count models.
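
The mean-variance equality of the Poisson model, and what overdispersion looks like in data, can be illustrated with a short Python sketch. The counts are hypothetical toy values, the model is intercept-only, and the variance-to-mean ratio is used here only as a rough descriptive check, not as a formal test.

```python
import math

# Hypothetical event counts (not from the course materials).
counts = [0, 1, 0, 2, 7, 0, 1, 12, 0, 3]
n = len(counts)

def poisson_log_likelihood(lam):
    """Log-likelihood of an intercept-only Poisson model with rate lam."""
    return sum(c * math.log(lam) - lam - math.lgamma(c + 1) for c in counts)

# The ML estimate of the Poisson rate in the intercept-only model is the sample mean.
lambda_hat = sum(counts) / n

# Under the Poisson assumption the variance equals the mean; a ratio well
# above 1 points to overdispersion, a ratio below 1 to underdispersion.
sample_variance = sum((c - lambda_hat) ** 2 for c in counts) / (n - 1)
dispersion_ratio = sample_variance / lambda_hat

print(lambda_hat, sample_variance, dispersion_ratio)
print(poisson_log_likelihood(lambda_hat))
```

With these toy counts the variance is several times the mean, the kind of pattern that motivates negative binomial and zero-inflated specifications.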

Readings

King, Gary. 1989a. Variance Specification in Event Count Models: From Restrictive Assumptions
to a Generalized Estimator. American Journal of Political Science 33: 762-784.

King, Gary. 1989b. Event Count Models for International Relations: Generalizations and
Applications. International Studies Quarterly 33: 123-147.

Greene, William H. 2008. Econometric Analysis. Sixth Edition. Upper Saddle River, NJ: Prentice
Hall. Sections 25.1-25.5.

Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications, Chapter 8.

Zorn, Christopher. 1998. An Analytic and Empirical Examination of Zero-Inflated and Hurdle
Poisson Specifications. Sociological Methods and Research 26: 368-400.

9. and 10. Day: Models for Repeated Observations: Panel and Time-Series Cross-Section Categorical Data

Content

Connection to time-series cross-section models with binary dependent variables; time dependence, including robust clustered standard errors, temporal dummies, and cubic splines; ongoing events and second spells; time-varying covariates; and Markov transition models.
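
Modeling time dependence in binary TSCS data typically starts from a counter of the time elapsed since the last event within each unit. The Python sketch below builds such a counter for one hypothetical unit and adds the squared and cubed terms of a cubic polynomial in time, in the spirit of Carter and Signorino (2010). The event sequence, the function name, and the counting convention (the counter resets to zero in the period after an event) are assumptions of this sketch; other conventions exist.

```python
def time_since_event(events):
    """Number of periods since the last event, for one unit's binary series.

    The counter is 0 in the first period and in the period following an event.
    """
    counter = []
    elapsed = 0
    for event in events:
        counter.append(elapsed)
        # Reset after an event; otherwise one more period has elapsed.
        elapsed = 0 if event == 1 else elapsed + 1
    return counter

# Hypothetical binary event series for a single unit, ordered by time.
events = [0, 0, 1, 0, 0, 0, 1, 0]
t = time_since_event(events)

# Cubic polynomial terms to include as regressors in a logit or probit model.
time_terms = [(ti, ti ** 2, ti ** 3) for ti in t]

print(t)
print(time_terms)
```

In a full TSCS dataset the counter would be computed separately for every unit, and temporal dummies or cubic splines could be built from the same counter.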

Readings

Alt, James E., Gary King, and Curtis S. Signorino. 2000. Aggregation among Binary, Count, and
Duration Models: Estimating the Same Quantities from Different Levels of Data. Political
Analysis 9: 21-44.

Beck, Nathaniel, Jonathan N. Katz, and Richard Tucker. 1998. Taking Time Seriously: Time-
Series-Cross-Section Analysis with a Binary Dependent Variable. American Journal of Political
Science 42: 1260-1288.

Box-Steffensmeier, Janet M., and Bradford S. Jones. 2004. Event History Modeling: A Guide for
Social Scientists. Cambridge: Cambridge University Press. Chapters 5, 7.

Box-Steffensmeier, Janet M., and Christopher Zorn. 2002. Duration Models for Repeated Events.
Journal of Politics 64: 1069-1094.

Carter, David B., and Curtis Signorino. 2010. Back to the Future: Modeling Time Dependence in
Binary Data. Political Analysis 18: 271-292.

Oneal, John, and Bruce Russett. 1999. Assessing the Liberal Peace with Alternative
Specifications: Trade Still Reduces Conflict. Journal of Peace Research 36: 423-442.

Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge
University Press, Section 5.5.

Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data.
Cambridge, MA: MIT Press. Sections 15.1-15.7, 15.9, 15.10.

Zorn, Christopher. 2000. Modeling Duration Dependence. Political Analysis 8: 367-380.