Jonathan Kropko is an assistant professor of political science at the University of Virginia, where he teaches the graduate methods sequence in mathematics, regression, generalized linear models, time series and panel data, and measurement. He is the author of Mathematics for Social Scientists (2015, Sage), and his work has been published in Political Analysis, Biometrika, and the British Journal of Political Science.

Course Content
Students will learn about statistical models for binary, ordinal, unordered categorical, count, and duration dependent variables. They will learn about the statistical theory that is used to build these models, how to run them in Stata or in R, and how to produce results that are elegant and understandable to a wide audience.

Course Objectives
In this course, students will learn how to design statistical models with purpose and creativity and to adapt them for their own particular theories and data. Generalized linear models (GLMs) combine probability models for particular variable types with the classic linear regression. If a researcher chooses a probability function that accurately reflects the distribution of the outcome variable, and designs a linear model that expresses the hypotheses implied by the theory, then the researcher can combine these models in a GLM that is perfectly tailored to fit the data and test the theory.

Course Prerequisites
Participants should be familiar with linear regression models and able to interpret a coefficient table accurately. Participants should also know the basics of programming in Stata or in R. A solid background in mathematics, especially logarithms, summations, and derivatives, is useful but not strictly required.

Eric C. C. Chang, Miriam A. Golden and Seth J. Hill. 2010. “Legislative Malfeasance and Political Accountability.” World Politics. 62(2): 177-220.

Required texts
Gary King. 1998. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Ann Arbor, MI: University of Michigan Press. This book will be provided on arrival to the Summer School as part of the course material.

Background knowledge required
Statistics
OLS = s
Maximum Likelihood = e

Computer Background
Stata = s
R = e

e = elementary, m = moderate, s = strong

In this course, you will learn how to design statistical models with purpose and creativity and to adapt them for your own particular theories and data. Generalized linear models (GLMs) combine probability models for particular variable types with the classic linear regression. If you choose a probability function that accurately reflects the distribution of your outcome variable, and you design a linear model that expresses the hypotheses implied by your theory, then you can combine these models in a GLM that is perfectly tailored to fit your data and test your theory.
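As a rough illustration of the idea in the paragraph above, a GLM can be written as a probability model for the outcome joined to a linear model through a link function. The notation below is an assumption made for this sketch and is not taken from the course materials: f is the chosen probability function, g the link function, x_i the covariates for observation i, and beta the coefficients.

```latex
% Generic GLM: a probability model plus a linear model, joined by a link g
Y_i \sim f(y_i \mid \theta_i), \qquad g(\theta_i) = x_i^{\top}\beta

% Binary outcome (logit)
Y_i \sim \mathrm{Bernoulli}(\pi_i), \qquad \log\frac{\pi_i}{1-\pi_i} = x_i^{\top}\beta

% Count outcome (Poisson)
Y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \log \lambda_i = x_i^{\top}\beta
```

In both examples the right-hand side is the same linear model; only the probability function and the link change with the type of outcome variable.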

Week 1: Foundations of GLMs. This week covers the math and statistical theory behind all GLMs. This part of the class will proceed mostly with pencil and paper. That may seem strange for an applied data analysis course, but we will not use statistical software until the second week. The first week, although theoretical, will do a great deal to help you understand the logic and construction of all the examples of GLMs that we cover in detail in the second part of the course.

• Day 1: Probability functions
• Day 2: Mathematics review
• Day 3: Generalized linear models
• Day 4: Maximum likelihood estimation
• Day 5: How to report interesting, understandable results

Week 2: Examples of GLMs. We will discuss models for binary, ordinal, unordered categorical, count, and survival-time dependent variables and how to run them in Stata and in R. This course differs from more traditional courses in that we will emphasize that models like logit, Poisson, and the Cox proportional hazards model are examples of the same GLM methodology, rather than separate methods. We cannot cover every possible model in this course, and we will not attempt to do so. But we will emphasize the mechanics of GLM construction and estimation so that you will be able to apply this knowledge to new GLMs you encounter and create in your own research.

• Day 6: Binary models
• Day 7: Ordinal models
• Day 8: Multinomial choice models
• Day 9: Count models
• Day 10: Survival models
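
The Week 2 point that logit and Poisson are examples of the same methodology can be sketched in a few lines of code. The example below is a minimal illustration in plain Python rather than the Stata or R used in the course, and everything in it is an assumption made for the sketch: the function names (fit_glm, logistic), the toy data, and the choice of Newton-Raphson with canonical links. It shows a single estimation routine that fits either model depending only on which inverse link and variance function are passed in.

```python
import math


def fit_glm(x, y, inverse_link, variance, iterations=50):
    """Fit y ~ b0 + b1 * x by Newton-Raphson for a canonical-link GLM.

    With a canonical link, the score is sum((y_i - mu_i) * x_i) and the
    information matrix is sum(variance(mu_i) * x_i * x_i'), so only the
    two functions passed in differ between models.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        mu = [inverse_link(b0 + b1 * xi) for xi in x]
        w = [variance(m) for m in mu]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix (negative Hessian): 2x2 and symmetric.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def logistic(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical toy data, invented purely for illustration.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y_binary = [0, 0, 1, 0, 1, 1, 0, 1]
y_count = [1, 0, 2, 1, 3, 2, 4, 3]

# Same routine, different probability model.
logit_fit = fit_glm(x, y_binary, logistic, lambda mu: mu * (1.0 - mu))
poisson_fit = fit_glm(x, y_count, math.exp, lambda mu: mu)

print("logit   (b0, b1):", logit_fit)
print("poisson (b0, b1):", poisson_fit)
```

The sketch is limited to one covariate so that the 2x2 system can be solved by hand. In practice the same models would be fit with built-in routines such as glm in R or Stata.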