Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on-campus and online students will be taught simultaneously.

Dr. Robert W. Walker is Associate Professor of Quantitative Methods in the Atkinson Graduate School of Management at Willamette University (2012-). He earned a Ph.D. in political science from the University of Rochester in 2005 and has previously held teaching positions at Dartmouth College, Rice University, Texas A&M University, and Washington University in Saint Louis. His current research develops and applies semi-Markov processes to time-series, cross-section data in international relations and international/comparative political economy. He teaches courses in quantitative methods/applied statistics and microeconomic strategy and previously taught four iterations in the U.S. National Science Foundation-funded Empirical Implications of Theoretical Models sequence at Washington University in Saint Louis.

Course Content

This course is designed for students who already have training in basic statistics and knowledge of linear regression analysis. The course deals with problems arising from dynamics and from combining the time and space dimensions in statistical data analysis. In particular, we will work with aggregated time-series data first and then with aggregated time-series cross-sectional data, e.g. geographic/administrative units observed over time. This data structure has the advantage of allowing for testing highly general theories with a wide scope, but it renders data analysis more complicated because one has to consider the time-series aspects (dynamics) and cross-sectional aspects (spatial correlation/unit heterogeneity) at the same time. The course confronts the problems arising from this complex data structure and also provides techniques to control and account for specific complications.

We begin with an overview that presents some key review and background with a focus on linear models. From that foundation, we complete our one-week course in time series analysis, focusing on univariate time series models, intervention analysis, stationarity, dynamic linear models, structural models, Vector Autoregression, cointegration, and generalized ARCH models. The second week begins by discussing characteristics and types of pooled data and the underlying assumptions of basic statistical models for panel data before turning to complex error structures, different kinds of heterogeneity (e.g. unit and slope), dynamic specification issues (lag structures), missing data, spatial heterogeneity and dependency, and time-invariant and rarely changing variables in panel data analysis with correlated unit-specific effects, among others. Furthermore, we will look at different data generating processes and adequate estimation procedures for, e.g., binary choice and limited dependent variable models. The course combines a more theoretical introduction with practical analysis of diverse data sets using Stata and R. Students are encouraged to bring their own data sets, and we are happy to schedule time for discussing unique time series and/or panel problems you may have.

Course Objectives

The course requires solid knowledge of inferential statistics and linear algebra and is designed to further develop the understanding of statistical problems arising from the complex structure of pooled data. The course mostly deals with questions of specification and model choice and is therefore a practical course which should enable students to link their empirical models more closely to their theoretical arguments and make model choices that are adequate for the data structure at hand. The course materials are designed to help participants solve their own estimation problems and increase the reliability and efficiency of their statistical results. The course is targeted at social scientists and business academics with average or better statistical skills and a strong interest in applied empirical research and data analysis.

Course Prerequisites

The course benefits from skills and knowledge in inferential statistics, including a basic understanding of maximum likelihood and generalized linear estimation methods. Participants should also have a basic understanding of matrix algebra and calculus, though the main focus of the course is applied, and need a basic familiarity with Stata and/or R for the applications. Both have considerable and overlapping capabilities for the analysis of two-dimensional data.
The course is designed to build on a good working knowledge of cross-section multiple regression models. This includes knowledge of the underlying assumptions of basic linear models and of the implications of violating them (e.g. heteroskedasticity, autocorrelation). Participants should be able to interpret regression coefficients, standard errors and significance tests and have a mastery of related concepts in statistical inference.

This course has two general foci: (1) to prepare students with an understanding of the unique challenges posed by longitudinal/panel data and (2) to provide students with tools to implement extant models from statistics and econometrics or develop their own when extant models prove inappropriate. Though lectures will cover key material and derivations, we will work through examples and new problems in a collaborative fashion. The classroom is but a small fraction of the course; you will learn by doing problem sets, readings, replications, or programming in statistical computing languages.

The first week of the course follows the general framework for time series analysis set forth by Time Series Analysis for the Social Sciences. We begin with a review of regression topics before turning to stationarity and dynamic models for single and then multiple time series. Building on these, we examine the dominant approaches to time series modelling, structural models and VAR, before cointegration and the class of ARCH models. The second week sets about the translation to time series models with multiple distinct units. We first extend basic linear models to the host of pathologies that arise from data that vary along multiple dimensions with models of dynamics and heterogeneity. Our final discussion of standard panel data models will focus on causal interpretation of panel data models (difference-in-differences and the like). Most of this course will focus on conventional estimators for panel data; we will only briefly extend the course topics to models of discrete Markov chains and state-space transition models, limited dependent variables, and other data types. An overview of much of this is covered in Beck and Katz (N.d.).

The course will rely on a book for the time series parts – this text will be provided by ESS:

Janet M. Box-Steffensmeier, John R. Freeman, Matthew P. Hitt, and Jon C.W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. Cambridge University Press.

and a series of articles and single chapters from the following texts.

Baltagi, Badi H. 2008. Econometric Analysis of Panel Data. Wiley & Wiley Interscience.

Wooldridge, Jeffrey. 2001. Econometric Analysis of Cross-Sectional and Panel Data. MIT Press.

Hsiao, Cheng. 2002. Analysis of Panel Data. Cambridge University Press.

Arellano, Manuel. 2001. Panel Data Econometrics. Oxford University Press.

A. Colin Cameron and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge University Press.

Enders, Walter. 1995. Applied Econometric Time Series. Wiley & Wiley Interscience.

Statistical Software and Computation

All of the models covered in this class can be estimated using standard software packages; we will focus on R and Stata. In any case, you should take care to understand the underlying math and mechanics.

We will dedicate some time to discussing relevant data sources that you have uncovered and brainstorm (collectively) appropriate methods that we have or will have discussed for addressing relevant theoretical claims given the data.

Background knowledge required

Statistics

Maximum Likelihood = elementary

OLS = moderate

Computer Background

Stata = elementary

R = elementary

(Elementary knowledge at one of the two is sufficient.)

Maths

Linear Regression = moderate

Week 1

 

08.Aug.2022

Regression Overview, Introduction to Pooling/Time Series:

Box-Steffensmeier et al. (2014, ch. 1) and Hsiao (N.d.)

(optional: Berk and Freedman (2003)).

Key Issue: T = B + W

Lab: Summarizing and describing 2-D data.
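
One way to read the key issue T = B + W, assuming T, B, and W stand for total, between-unit, and within-unit variation (the syllabus does not spell the symbols out): the sum of squared deviations from the grand mean splits exactly into a part due to differences among unit means and a part due to movement of each unit around its own mean. The sketch below checks this on a small, made-up unbalanced panel; the data and the function name `decompose` are hypothetical and purely illustrative.

```python
# Hypothetical toy panel: unit label -> observations over time (made-up numbers).
panel = {
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 6.0, 8.0],
    "C": [5.0, 5.0, 5.0, 9.0],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def decompose(panel):
    """Return (total, between, within) sums of squares for unit -> series."""
    all_obs = [x for series in panel.values() for x in series]
    grand = mean(all_obs)
    # Total: every observation against the grand mean.
    total = sum((x - grand) ** 2 for x in all_obs)
    # Between: each unit mean against the grand mean, weighted by unit length.
    between = sum(len(s) * (mean(s) - grand) ** 2 for s in panel.values())
    # Within: every observation against its own unit mean.
    within = sum((x - mean(s)) ** 2 for s in panel.values() for x in s)
    return total, between, within


total, between, within = decompose(panel)
print(f"T = {total:.3f}, B = {between:.3f}, W = {within:.3f}")
```

The identity holds for unbalanced panels as long as the grand mean is taken over all observations and each unit's between contribution is weighted by its number of time points.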

 

09.Aug.2022

ARMA, ARIMA, and Stationarity

Box-Steffensmeier et al. (2014, chs. 2 and 5) (encouraged: Appendix: Difference Equations).

Key Issues: Stationarity Testing and ARMA processes, filtering

Lab: Unit root tests and identifying ARMA/ARIMA with Interventions

 

10.Aug.2022

Dynamic Time Series Models

Box-Steffensmeier et al. (2014, chs. 3 and 4).

Key Issues: Structures and VARs

Lab: Dynamic linear models and VAR estimation; interpretation of dynamic models

 

11.Aug.2022

Cointegration

Box-Steffensmeier et al. (2014, ch. 6); Keele and Linn (2008) and the controversy.

Key Issues: Equilibrium and Equilibration

Lab: Indian and Pakistani Arms using Error Correction Models

 

12.Aug.2022

ARCH and Advances in Time Series

Box-Steffensmeier et al. (2014, ch. 7).

Key Issues: Time Varying Parameters

Lab: Blair’s War: ARCH models of consensus and dissensus

 

Week 2

 

15.Aug.2022

Unit Heterogeneity:

Bell, Fairbrother and Jones (2019); Hsiao, ch. 6; Beck and Katz (N.d.); ref. Plumper and Troeger (2019)

Rec. Mundlak (1978); Hausman (1978); Beck and Katz (2007*).

Key Issue: What models do we compare and how?

Lab: Estimate, compare and interpret FE, RE, and HLM

 

16.Aug.2022

Exploring Missing Data and Missingness

Honaker and King (2010) and Horton and Kleinman (2007)

Key Issue: Missing Data are nasty but 2-D gives leverage.

Lab: Imputation and Combination

 

17.Aug.2022

To Generic Data

Baltagi, ch. 11 and Beck et al. (N.d.)

Rec. (Dirty Pool controversy**)

Beck, Katz and Tucker (1998); Carter and Signorino (2010).

Key Issue: Odd things happen with limited outcomes.

Lab: Fixed effects logits, Grouped Duration, and Markov Processes

 

18.Aug.2022

Dynamic Panel Data Estimators (With a little IV)

Cameron and Trivedi, ch. 22; Arellano, Appendix; Plumper and Troeger (2007); Wawro (2002)

Key Issue: Valid Instruments and Instrumentation in two dimensions.

Lab: Estimating DPDs and FEVD.

 

19.Aug.2022

Summary Topics and New Directions: TWFE and Causal Models

Wilson and Butler (2007); Plumper, Troeger and Manow (2005)

Esarey and Menger (N.d.)

Key Issue: Work backward from substance.

Lab: PCSE and Specification Issues

*: Optional: Troeger (N.d.) and Whitten and Williams (2012)

**: Skim the International Organization debate including Green, Kim and Yoon (2001), Oneal and Russett (2001), Beck and Katz (2001), and King (2001) and the follow-up work best summarised by Cook, Hays and Franzese (2020) and Beiser-McGrath (2020).

 

References

 

Beck, Nathaniel, David Epstein, Simon Jackman and Sharyn O’Halloran. N.d. “Alternative Models of Dynamics in Binary Time-Series-Cross-Section Models: The Example of State Failure.” Paper presented at the 2001 Annual Meeting of the Society for Political Methodology, Emory University (Draft: July 12, 2002).

URL: http://www.nyu.edu/gsas/dept/politics/faculty/beck/emory.pdf

 

Beck, Nathaniel and Jonathan N. Katz. 2001. “Throwing out the Baby with the Bath Water: A Comment on Green, Kim, and Yoon.” International Organization 55(2):487–495.

URL: http://www.jstor.org/stable/3078640

 

Beck, Nathaniel, Jonathan N. Katz and Richard Tucker. 1998. “Taking time seriously: Time-series-cross-section analysis with a binary dependent variable.” American Journal of Political Science 42(4):1260–1288.

URL: http://www.jstor.org/stable/2991857

 

Beck, Nathaniel L. and Jonathan Katz. N.d. “Modeling Dynamics in Time-Series–Cross-Section Political Economy Data.” California Institute of Technology Social Science Working Paper 1304 (June 2009).

 

Beck, Nathaniel L. and Jonathan N. Katz. 1995. “What to Do (and Not to Do) with Time-Series-Cross-Section Data in Comparative Politics.” American Political Science Review 89(3):634–647.

URL: http://www.jstor.org/stable/2082979

 

Beck, Nathaniel L. and Jonathan N. Katz. 2007. “Random Coefficient Models for Time-Series-Cross-Section Data: Monte Carlo Experiments.” Political Analysis 15(2):182–95.

URL: http://pan.oxfordjournals.org/cgi/reprint/15/2/182

 

Beiser-McGrath, Liam F. 2020. “Separation and Rare Events.” Political Science Research and Methods pp. 1–10.

 

Bell, Andrew, Malcolm Fairbrother and Kelvyn Jones. 2019. “Fixed and random effects models: making an informed choice.” Quality & Quantity 53(2):1051–1074.

URL: https://doi.org/10.1007/s11135-018-0802-x

 

Berk, R. A. and D. A. Freedman. 2003. Statistical Assumptions as Empirical Commitments. In Law, Punishment, and Social Control: Essays in Honor of Sheldon Messinger, ed. T. G. Blomberg and S. Cohen. Second ed. Aldine de Gruyter chapter 10, pp. 235–54.

URL: http://stat-www.berkeley.edu/~census/berk2.pdf

 

Box-Steffensmeier, Janet M., John R. Freeman, Matthew P. Hitt and Jon C.W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. Cambridge, UK: Cambridge University Press.

 

Carter, David B. and Curtis S. Signorino. 2010. “Back to the Future: Modeling Time Dependence in Binary Data.” Political Analysis 18(3):271–292.

URL: http://pan.oxfordjournals.org/content/18/3/271.abstract

 

Cook, Scott J., Jude C. Hays and Robert J. Franzese. 2020. “Fixed effects in rare events data: a penalized maximum likelihood solution.” Political Science Research and Methods 8(1):92–105.

 

Esarey, Justin and Andrew Menger. N.d. “Practical and Effective Approaches to Dealing with Clustered Data.” version: January 31, 2017.

 

Green, Donald P., Soo Yeon Kim and David H. Yoon. 2001. “Dirty Pool.” International Organization 55(2):441–68.

URL: http://www.jstor.org/stable/3078638

 

Hausman, J. A. 1978. “Specification Tests in Econometrics.” Econometrica 46(6):1251–71.

URL: http://www.jstor.org/stable/1913827

 

Honaker, James and Gary King. 2010. “What to do About Missing Values in Time Series Cross-Section Data.” American Journal of Political Science 54:561–581.

URL: http://gking.harvard.edu/files/abs/pr-abs.shtml

 

Horton, Nicholas J. and Ken P. Kleinman. 2007. “Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models.” The American Statistician 61(1):79–90.

 

Hsiao, C. N.d. “Why Panel Data?” Institute for Economic Policy Research 05.33.

URL: http://ideas.repec.org/p/scp/wpaper/05-33.html

 

Keele, Luke and Suzanna Linn. 2008. “Taking Time Seriously.” American Journal of Political Science 52(1):184–200.

URL: https://onlinelibrary.wiley.com/doi/10.1111/j.1540-5907.2007.00307.x

 

King, Gary. 2001. “Proper Nouns and Methodological Propriety: Pooling Dyads in International Relations Data.” International Organization 55(2):497–507.

URL: http://www.jstor.org/stable/3078641

 

Mundlak, Yair. 1978. “On the Pooling of Time Series and Cross Section Data.” Econometrica 46(1):69–85.

URL: http://www.jstor.org/stable/1913646

 

Oneal, John R. and Bruce Russett. 2001. “Clear and Clean: The Fixed Effects of the Liberal Peace.” International Organization 55(2):469–485.

URL: http://www.jstor.org/stable/3078639

 

Plumper, Thomas and Vera Troeger. 2007. “Efficient Estimation of Time-Invariant and Rarely Changing Variables in Finite Sample Panel Analyses with Unit Fixed Effects.” Political Analysis 15(2):124–139.

URL: http://pan.oxfordjournals.org/cgi/reprint/15/2/124

 

Plumper, Thomas, Vera Troeger and Philip Manow. 2005. “Panel data analysis in comparative politics: Linking method to theory.” European Journal of Political Research 44:327–54.

URL: http://www.essex.ac.uk/ecpr/events/generalconference/marburg/papers/6/2/troeger.pdf

 

Plumper, Thomas and Vera E. Troeger. 2019. “Not so Harmless After All: The Fixed-Effects Model.” Political Analysis 27(1):21–45.

 

Troeger, Vera. N.d. “Problematic Choices.” Paper presented at the Annual Meetings of the American Political Science Association, Toronto, ON.

 

Wawro, Gregory. 2002. “Estimating Dynamic Panel Data Models in Political Science.” Political Analysis 10(1):25–48.

URL: http://pan.oxfordjournals.org/cgi/reprint/10/1/25

 

Whitten, Guy B. and Laron D. Williams. 2012. “But Wait, There’s More: Maximizing Substantive Inferences from TSCS Models.” Journal of Politics 74(3):685–93.

 

Wilson, Sven E. and Daniel M. Butler. 2007. “A Lot More to Do: The Sensitivity of Time-Series Cross-Section Analyses to Simple Alternative Specifications.” Political Analysis 15(2):101–23.

URL: http://pan.oxfordjournals.org/cgi/reprint/15/2/101