Patrick Shea is an assistant professor in the Department of Political Science at the University of Houston. His research interests are international relations, the political economy of conflict, and statistical inference. His research has appeared in the Journal of Conflict Resolution, International Studies Quarterly, Economics & Politics, and Statistics, Politics and Policy, among other journals. He received his PhD from Rutgers University.

Course Content
This course introduces a class of statistical models called generalized linear models (GLMs). GLMs are an incredibly flexible family of models that extend regression to a broader class of outcome variables than OLS can handle. We will consider how to perform regression analysis with the following types of outcome variables: continuous, count, dichotomous, categorical, duration, and more. In addition, we will analyze mixture models, which combine two or more model processes, such as zero-inflated count models, tobit models, and selection models. GLMs are widely used across the social sciences to gain empirical traction on all sorts of questions. We will focus on how to apply these models to a range of data, assess the models, and interpret and present the results. The biggest payoff from this course will likely come from the substantive work you can do by applying generalized linear models to social science questions, work that you cannot properly do with a simple linear model.

Course Objectives
The main goal of this course is to help you make progress toward becoming a well-informed user and consumer of generalized linear models. We will focus on (1) appropriately translating research questions into statistical models for non-linear problems, (2) estimating GLM parameters using the maximum likelihood principle, and (3) interpreting results and identifying limitations of non-linear regression models. The first aspect of the course focuses on understanding the unified theoretical basis for using GLMs. Emphasis will be placed on building from standard linear models, extending the linear model to GLMs, and going beyond GLMs. The second aspect of the course focuses on using the statistical package R to estimate GLMs. R is a powerful and capable statistical computing tool. And it’s free! The skills attained in this course are the foundation for any social science data science toolkit.

Course Prerequisites
The background required for the course is a good introduction to probability and statistical inference, some experience with linear regression, and some knowledge of a statistical software program like R or Stata.

Representative Background Reading
Gailmard, Sean. 2014. Statistical Modeling and Inference for Social Science. Cambridge University Press.

Core Reading
Faraway, Julian J. 2016. Extending the Linear Model with R: Generalized Linear, Mixed Effects, and Nonparametric Regression Models. 2nd Edition. Chapman & Hall/CRC.

Background Knowledge Required
OLS = moderate
Maximum Likelihood = elementary

Computer Background
Stata = elementary
R = elementary

• Each class begins with a short lecture on the class material (approximately 60-75 minutes).
• After each lecture, I will provide an overview of some GLM/MLE applications in R (approximately 60 minutes).
• The remaining portion of class will be devoted to hands-on learning with real data examples.
• The course schedule below lists the lecture topic for each class day and the relevant required readings (which will be provided).

1. Introduction: Uncertainty and Data-Generating Processes (DGPs)
– Moore and Siegel. 2013. A Mathematics Course for Political & Social Research. Princeton University Press. Chapters 9-11.
– Gailmard, Sean. 2014. Statistical Modeling and Inference for Social Science. Cambridge University Press. Chapter 3.

2. OLS, Inference, and Hypothesis Testing
– Faraway, Chapter 1.
– Crawley, Statistics: An Introduction Using R, Chapter 7.

3. Introduction to GLM and MLE
– Faraway, Appendix A.
– Dobson, An Introduction to Generalized Linear Models, Chapter 3 (Exponential Family and Generalized Linear Models).

4. Models for Dichotomous Outcomes
– Long, Regression Models, Chapter 3.

5. Binary Response Models: Advanced Topics
– Esarey, Justin and Andrew Pierce. 2012. “Assessing Fit Quality and Testing for Misspecification in Binary-Dependent Variable Models.” Political Analysis 20(4):480-500.
– Greenhill, Brian, Michael D. Ward, and Audrey Sacks. 2011. “The Separation Plot: A New Visual Method for Evaluating the Fit of Binary Models.” American Journal of Political Science 55(4):991-1002.

6. Models for Count Outcomes
– Rodríguez, Germán. Chapter 4: Count Models.
– “Regression Models for Count Data in R.”

7. Models for Ordered and Unordered Categorical Data
– Rodríguez, Germán. Chapter 6: Multinomial Outcomes.
– Dow, Jay K. and James W. Endersby. “Multinomial Probit and Logit: A Comparison of Choice Models for Voting Research.”

8. Event Duration Models
– Teachman, Jay D. and Mark D. Hayward. 1993. “Interpreting Hazard Rate Models.” Sociological Methods and Research 21(3):340-371.

9. Mixture Models, Censored Data, and Selection Models
– Sigelman, Lee and Langche Zeng. 1999. “Analyzing Censored and Sample-Selected Data with Tobit and Heckit Models.” Political Analysis 8(2):167-182.
– Dubin, Jeffrey A. and Douglas Rivers. 1989. “Selection Bias in Linear Regression, Logit and Probit Models.” Sociological Methods and Research 18(2-3):360-390.

10. Panel Data Analysis in GLM and Wrap Up
– Faraway, Chapter 6.