2W Advanced Methods for Time Series and Panel Data

Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.

Robert W. Walker, Ph.D., is Associate Professor of Quantitative Methods in the Atkinson Graduate School of Management at Willamette University. Though his Ph.D. is in political science, Professor Walker taught statistics and research methods, in addition to courses in political economy, to both undergraduate and graduate students at Dartmouth College, Texas A&M University, Washington University in Saint Louis, and Rice University prior to his arrival at Atkinson. He was a regular instructor in the National Science Foundation’s Empirical Implications of Theoretical Models summer program at Washington University in Saint Louis and has regularly taught courses in the analysis of longitudinal data at the Essex Summer School in Social Science Data Analysis in the United Kingdom.

His joint work received the Warren Miller Prize for the best published paper in 2009 in Political Analysis, the journal of the Society for Political Methodology and the most cited journal in Political Science during the most recent evaluation period. His published work spans international political economy, political methodology, and the political economy of state and municipal bond markets.

Course Content

This course reflects a years-long collaboration between the legendary Professor Harold D. Clarke, Ph.D., of the University of Texas at Dallas and me; it begins by thinking about the core issues of data collected over time. In 2019, we merged our separate courses to reflect the crucial role of time series analysis in the analysis of data that vary over time and space. The course is dedicated to Harold’s memory.

This course is designed for students who already have training in basic statistics and knowledge of linear regression analysis. The course deals with problems arising from dynamics and from combining the time and space dimensions in statistical data analysis. In particular, we will work first with aggregated time-series data and then with aggregated time-series cross-sectional data, e.g., geographic or administrative units observed over time. This data structure allows testing highly general theories with a wide scope, but it complicates data analysis because one has to consider the time-series aspects (dynamics) and the cross-sectional aspects (spatial correlation and unit heterogeneity) at the same time. The course confronts the problems arising from this complex data structure and provides techniques to control and account for specific complications.

We begin with an overview that presents key review and background material with a focus on linear models. From that foundation, we complete our one-week course in time series analysis, focusing on univariate time series models, intervention analysis, stationarity, dynamic linear models, structural models, vector autoregression (VAR), cointegration, and generalized ARCH models. The second week begins by discussing the characteristics and types of pooled data and the underlying assumptions of basic statistical models for panel data. We then turn to complex error structures, different kinds of heterogeneity (e.g., unit and slope), dynamic specification issues (lag structures), missing data, spatial heterogeneity and dependence, and time-invariant and rarely changing variables in panel data analysis with correlated unit-specific effects, among others. Furthermore, we will look at different data-generating processes and suitable estimation procedures for, e.g., binary choice and limited dependent variable models. The course combines a more theoretical introduction with practical analysis of diverse data sets using Stata and R. Students are encouraged to bring their own data sets, and we are happy to schedule time to discuss unique time series and/or panel problems you may have.

Course Objectives

The course requires solid knowledge of inferential statistics and linear algebra and is designed to further develop the understanding of statistical problems arising from the complex structure of pooled data. The course mostly deals with questions of specification and model choice and is therefore a practical course that should enable students to link their empirical models more closely to their theoretical arguments and to make model choices that are appropriate for the data structure at hand. The course materials are designed to help participants solve their own estimation problems and increase the reliability and efficiency of their statistical results. The course is targeted at social scientists and business academics with average or better statistical skills and a strong interest in applied empirical research and data analysis.

Course Prerequisites

The course benefits from skills and knowledge in inferential statistics, including a basic understanding of maximum likelihood and generalized linear estimation methods. Participants should also have a basic understanding of matrix algebra and calculus, though the main focus of the course is applied. Finally, participants need a basic familiarity with Stata and/or R for the applications. Both have considerable and overlapping capabilities for the analysis of two-dimensional data.

The course is designed to build on a good working knowledge of cross-section multiple regression models. This includes knowledge of the underlying assumptions of basic linear models and of the implications of violating them (e.g., heteroskedasticity, autocorrelation). Participants should be able to interpret regression coefficients, standard errors, and significance tests and have a mastery of related concepts in statistical inference.

This course has two general foci: (1) to prepare students with an understanding of the unique challenges posed by longitudinal/panel data and (2) to provide students with tools to implement extant models from statistics and econometrics or develop their own when extant models prove inappropriate. Though lectures will cover key material and derivations, we will work through examples and new problems in a collaborative fashion. The classroom is but a small fraction of the course; you will learn by doing problem sets, readings, replications, or programming in statistical computing languages.

The first week of the course follows the general framework for time series analysis set forth in Time Series Analysis for the Social Sciences (Box-Steffensmeier et al. 2014). We begin with a review of regression topics before turning to stationarity and dynamic models for single and then multiple time series. Building on these, we examine the dominant approaches to time series modelling, structural models and VAR, before turning to cointegration and the class of ARCH models. The second week sets about the translation to time series models with multiple distinct units. We first extend basic linear models to the host of pathologies that arise from data that vary along multiple dimensions, with models of dynamics and heterogeneity. Our final discussion of standard panel data models will focus on the causal interpretation of panel data models (difference-in-differences and the like). Most of this course will focus on conventional estimators for panel data; we will only briefly extend the course topics to models of discrete Markov chains and state-space transition models, limited dependent variables, and other data types. An overview of much of this is covered in Beck and Katz (N.d.).

The course will rely on a book for the time series parts – this text will be provided by ESS:

Box-Steffensmeier, Janet M., John R. Freeman, Matthew P. Hitt, and Jon C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. Cambridge University Press.

And a series of articles and single chapters from the following texts:

Baltagi, Badi H. 2008. Econometric Analysis of Panel Data. Wiley & Wiley Interscience.

Wooldridge, Jeffrey. 2001. Econometric Analysis of Cross-Sectional and Panel Data. MIT Press.

Hsiao, Cheng. 2002. Analysis of Panel Data. Cambridge University Press.

Arellano, Manuel. 2001. Panel Data Econometrics. Oxford University Press.

Cameron, A. Colin and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge University Press.

Enders, Walter. 1995. Applied Econometric Time Series. Wiley & Wiley Interscience.

Statistical Software and Computation

All of the models covered in this class can be estimated using standard software packages; we will focus on R and Stata. In any case, you should take care to understand the underlying math and mechanics.

Homework: The course moves fast. Working through texts is crucial and your homework is centred around a replication exercise of the text. We want to walk through how we implement/apply the techniques in the language of your choosing with syntax that allows you to construct the solutions you will need. In the second week, there are more formal replication exercises for the specific papers of interest.

We will dedicate some time to discussing relevant data sources that you have uncovered and brainstorm (collectively) appropriate methods that we have or will have discussed for addressing relevant theoretical claims given the data.

I like feedback, please share it. Always.

Background knowledge required

Statistics

OLS = elementary

Computer Background

Stata = elementary

R = elementary

(Elementary knowledge at one of the two is sufficient.)

Maths

Linear Regression = elementary

Week 1

Regression Overview, Introduction to Pooling/Time Series:
Box-Steffensmeier et al. (2014, ch. 1) and Hsiao (N.d.)
(optional: Berk and Freedman (2003)).
Key Issue: T = B + W (total variation = between-unit variation + within-unit variation)
Lab: Summarizing and describing 2-D data and basic computing issues.
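
The key issue above, T = B + W, can be illustrated with a minimal sketch in Python (standard library only). The toy panel, the unit labels, and the function name `decompose` are hypothetical and chosen only for illustration; the labs themselves use Stata and R. The sketch splits the total sum of squares of a variable into a between-unit part (unit means around the grand mean) and a within-unit part (observations around their own unit mean).

```python
# Hypothetical toy panel: unit label -> observations over time (made-up numbers).
panel = {
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 6.0, 8.0],
    "C": [10.0, 10.0, 13.0],
}


def decompose(panel):
    """Return (total, between, within) sums of squares for a panel."""
    values = [y for ys in panel.values() for y in ys]
    grand_mean = sum(values) / len(values)

    # Total: every observation around the grand mean.
    total = sum((y - grand_mean) ** 2 for y in values)

    between = 0.0
    within = 0.0
    for ys in panel.values():
        unit_mean = sum(ys) / len(ys)
        # Between: unit mean around the grand mean, once per observation.
        between += len(ys) * (unit_mean - grand_mean) ** 2
        # Within: observations around their own unit mean.
        within += sum((y - unit_mean) ** 2 for y in ys)
    return total, between, within


total, between, within = decompose(panel)
print(f"T = {total:.3f}, B = {between:.3f}, W = {within:.3f}")
```

Up to floating-point rounding, the total equals the sum of the between and within components. Which of the two components a given estimator exploits is a recurring theme in the second week.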

ARMA, ARIMA, and Stationarity
Box-Steffensmeier et al. (2014, chs. 2 and 5) (encouraged: Appendix: Difference Equations).
Key Issues: Stationarity Testing and ARMA processes, filtering
Lab: Unit root tests and identifying ARMA/ARIMA with Interventions
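
As a rough illustration of why stationarity matters, the following Python sketch (standard library only) simulates a stationary AR(1) process and a random walk, then estimates the autoregressive coefficient by regressing each series on its own lag. The function names, the seed, the sample size, and the coefficient value 0.5 are assumptions made for this example. It is not a unit root test: formal tests such as those covered in the lab rely on non-standard critical values that this sketch does not compute.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def ols_ar1(y):
    """Estimate phi by regressing y_t on y_{t-1} without an intercept."""
    num = sum(cur * lag for cur, lag in zip(y[1:], y[:-1]))
    den = sum(lag * lag for lag in y[:-1])
    return num / den


def difference(y):
    """First difference: y_t - y_{t-1}."""
    return [cur - lag for lag, cur in zip(y[:-1], y[1:])]


stationary = simulate_ar1(phi=0.5, n=5000)   # |phi| < 1: shocks die out
random_walk = simulate_ar1(phi=1.0, n=5000)  # phi = 1: shocks persist forever

phi_stationary = ols_ar1(stationary)
phi_walk = ols_ar1(random_walk)
phi_diff = ols_ar1(difference(random_walk))  # differencing removes the unit root

print(phi_stationary, phi_walk, phi_diff)
```

With a sample this long, the first estimate should land near 0.5, the second near 1, and the third near 0, because the first difference of a random walk is white noise.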

Dynamic Time Series Models
Box-Steffensmeier et al. (2014, chs. 3 and 4).
Key Issues: Structures and VARs
Lab: Dynamic linear models and VAR estimation; interpretation of dynamic models

Cointegration
Box-Steffensmeier et al. (2014, ch. 6); Keele and Linn (2008) and the controversy.
Key Issues: Equilibrium and Equilibration
Lab: Indian and Pakistani Arms using Error Correction Models

11th Aug 2023

ARCH, Advances in Time Series, and Introductory Panel Data
Box-Steffensmeier et al. (2014, ch. 7) and Pickup and Kellstedt (2022).
Key Issues: Time Varying Parameters
Lab: ARCH models of consensus and dissensus

Week 2

Unit Heterogeneity and Space:
Bell, Fairbrother and Jones (2019); Hsiao, ch. 6; Beck and Katz (N.d.);
ref. Plümper and Troeger (2019); Cook, Hays and Franzese (2022)
Rec. Mundlak (1978); Hausman (1978); Beck and Katz (2007)†.
Key Issue: What models do we compare and how?
Lab: Estimate, compare and interpret FE, RE, and HLM
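
To make the comparison concrete, here is a minimal Python sketch (standard library only) of the within, or fixed effects, transformation set against pooled OLS. The data are noise-free and invented for illustration: each unit has its own intercept that is correlated with its level of x, and the true slope is assumed to be 2. The function names are hypothetical, and random effects and hierarchical models are not shown.

```python
# Hypothetical noise-free panel: y_it = alpha_i + 2 * x_it.
# Unit intercepts (alpha_i) rise with the unit's average x, so they are
# correlated with the regressor.
TRUE_SLOPE = 2.0
units = {
    "A": {"alpha": 0.0, "x": [1.0, 2.0, 3.0]},
    "B": {"alpha": 5.0, "x": [2.0, 3.0, 4.0]},
    "C": {"alpha": 10.0, "x": [3.0, 4.0, 5.0]},
}
for u in units.values():
    u["y"] = [u["alpha"] + TRUE_SLOPE * x for x in u["x"]]


def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """Bivariate OLS slope of ys on xs (with an intercept)."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


# Pooled OLS ignores the unit intercepts entirely.
all_x = [x for u in units.values() for x in u["x"]]
all_y = [y for u in units.values() for y in u["y"]]
pooled_slope = slope(all_x, all_y)

# Within transformation: subtract each unit's own mean from x and y,
# which removes anything constant within a unit, including alpha_i.
dm_x, dm_y = [], []
for u in units.values():
    xbar, ybar = mean(u["x"]), mean(u["y"])
    dm_x += [x - xbar for x in u["x"]]
    dm_y += [y - ybar for y in u["y"]]
within_slope = slope(dm_x, dm_y)

print(pooled_slope, within_slope)
```

In this constructed example the pooled slope overstates the true effect because it absorbs the between-unit differences in the intercepts, while the within slope recovers the assumed value of 2. The cost, taken up later in the week, is that demeaning also discards all time-invariant information.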

Exploring Missing Data and Missingness
Honaker and King (2010) and Horton and Kleinman (2007)
Key Issue: Missing Data are nasty but 2-D gives leverage.
Lab: Imputation and Combination

To Generic Data
Baltagi, ch. 11 and Beck et al. (N.d.)
Rec. (Dirty Pool controversy‡)
Beck, Katz and Tucker (1998); Carter and Signorino (2010).
Key Issue: Odd things happen with limited outcomes.
Lab: Fixed effects logits, Grouped Duration, and Markov Processes

Dynamic Panel Data Estimators (With a little IV)
Cameron and Trivedi, ch. 22; Arellano, Appendix; Plümper and Troeger (2007); Wawro (2002)
Key Issue: Valid Instruments and Instrumentation in two dimensions.
Lab: Estimating DPDs and FEVD.

New Directions: TWFE and Causal Models and Other Topics
Wilson and Butler (2007); ?; Plümper, Troeger and Manow (2005)
Esarey and Menger (2019) ?
Key Issue: Work backward from substance.
Lab: PCSE and Specification Issues

†: Optional: Troeger (N.d.) and Whitten and Williams (2012)
‡: Skim the International Organization debate, including Green, Kim and Yoon (2001), Oneal and Russett (2001), Beck and Katz (2001), and King (2001), and the follow-up work best summarised by Cook, Hays and Franzese (2020) and Beiser-McGrath (2020).

References
Beck, Nathaniel, David Epstein, Simon Jackman and Sharyn O’Halloran. N.d. “Alternative Models of Dynamics in Binary Time-Series-Cross-Section Models: The Example of State Failure.” Paper presented at the 2001 Annual Meeting of the Society for Political Methodology, Emory University (Draft: July 12, 2002). URL: http://www.nyu.edu/gsas/dept/politics/faculty/beck/emory.pdf

Beck, Nathaniel and Jonathan N. Katz. 2001. “Throwing out the Baby with the Bath Water: A Comment on Green, Kim, and Yoon.” International Organization 55(2):487–495. URL: http://www.jstor.org/stable/3078640

Beck, Nathaniel, Jonathan N. Katz and Richard Tucker. 1998. “Taking time seriously: Time-series-cross section analysis with a binary dependent variable.” American Journal of Political Science 42(4):1260–1288. URL: http://www.jstor.org/stable/2991857

Beck, Nathaniel L. and Jonathan Katz. N.d. “Modeling Dynamics in Time-Series–Cross-Section Political Economy Data.” California Institute of Technology Social Science Working Paper 1304 (June 2009).

Beck, Nathaniel L. and Jonathan N. Katz. 2007. “Random Coefficient Models for Time-Series-Cross-Section Data: Monte Carlo Experiments.” Political Analysis 15(2):182–95. URL: http://pan.oxfordjournals.org/cgi/reprint/15/2/182

Beiser-McGrath, Liam F. 2020. “Separation and Rare Events.” Political Science Research and Methods pp. 1–10.

Bell, Andrew, Malcolm Fairbrother and Kelvyn Jones. 2019. “Fixed and random effects models: making an informed choice.” Quality & Quantity 53(2):1051–1074. URL: https://doi.org/10.1007/s11135-018-0802-x

Berk, R. A. and D. A. Freedman. 2003. Statistical Assumptions as Empirical Commitments. In Law, Punishment, and Social Control: Essays in Honor of Sheldon Messinger, ed. T. G. Blomberg and S. Cohen. Second ed. Aldine de Gruyter chapter 10, pp. 235–54. URL: http://stat-www.berkeley.edu/~census/berk2.pdf

Box-Steffensmeier, Janet M., John R. Freeman, Matthew P. Hitt and Jon C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. Cambridge, UK: Cambridge University Press.

Callaway, Brantly and Pedro H. C. Sant’Anna. 2021. “Difference-in-Differences with multiple time periods.” Journal of Econometrics 225(2):200–230. Themed Issue: Treatment Effect 1. URL: https://www.sciencedirect.com/science/article/pii/S0304407620303948

Carter, David B. and Curtis S. Signorino. 2010. “Back to the Future: Modeling Time Dependence in Binary Data.” Political Analysis 18(3):271–292. URL: http://pan.oxfordjournals.org/content/18/3/271.abstract

Cook, Scott J., Jude C. Hays and Robert J. Franzese. 2020. “Fixed effects in rare events data: a penalized maximum likelihood solution.” Political Science Research and Methods 8(1):92–105.

Cook, Scott J., Jude C. Hays and Robert J. Franzese. 2022. “STADL Up! The Spatiotemporal Autoregressive Distributed Lag Model for TSCS Data Analysis.” American Political Science Review pp. 1–21.

Esarey, Justin and Andrew Menger. 2019. “Practical and Effective Approaches to Dealing With Clustered Data.” Political Science Research and Methods 7(3):541–559.

Green, Donald P., Soo Yeon Kim and David H. Yoon. 2001. “Dirty Pool.” International Organization 55(2):441–68. URL: http://www.jstor.org/stable/3078638

Hausman, J. A. 1978. “Specification Tests in Econometrics.” Econometrica 46(6):1251–71. URL: http://www.jstor.org/stable/1913827

Honaker, James and Gary King. 2010. “What to do About Missing Values in Time Series Cross-Section Data.” American Journal of Political Science 54:561–581. URL: http://gking.harvard.edu/files/abs/pr-abs.shtml

Horton, Nicholas J. and Ken P. Kleinman. 2007. “Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models.” The American Statistician 61(1):79–90.

Hsiao, C. N.d. “Why Panel Data?” Institute for Economic Policy Research 05.33. URL: http://ideas.repec.org/p/scp/wpaper/05-33.html

Keele, Luke and Suzanna Linn. 2008. “Taking Time Seriously.” American Journal of Political Science 52(1):184–200. URL: https://onlinelibrary.wiley.com/doi/10.1111/j.1540-5907.2007.00307.x

King, Gary. 2001. “Proper Nouns and Methodological Propriety: Pooling Dyads in International Relations Data.” International Organization 55(2):497–507. URL: http://www.jstor.org/stable/3078641

Mundlak, Yair. 1978. “On the Pooling of Time Series and Cross Section Data.” Econometrica 46(1):69–85. URL: http://www.jstor.org/stable/1913646

Oneal, John R. and Bruce Russett. 2001. “Clear and Clean: The Fixed Effects of the Liberal Peace.” International Organization 55(2):469–485. URL: http://www.jstor.org/stable/3078639

Pickup, Mark and Paul M. Kellstedt. 2022. “Balance as a Pre-Estimation Test for Time Series Analysis.” Political Analysis pp. 1–10.

Plümper, Thomas and Vera E. Troeger. 2019. “Not so Harmless After All: The Fixed-Effects Model.” Political Analysis 27(1):21–45.

Plümper, Thomas and Vera Troeger. 2007. “Efficient Estimation of Time-Invariant and Rarely Changing Variables in Finite Sample Panel Analyses with Unit Fixed Effects.” Political Analysis 15(2):124–139. URL: http://pan.oxfordjournals.org/cgi/reprint/15/2/124

Plümper, Thomas, Vera Troeger and Philip Manow. 2005. “Panel data analysis in comparative politics: Linking method to theory.” European Journal of Political Research 44:327–54. URL: http://www.essex.ac.uk/ecpr/events/generalconference/marburg/papers/6/2/troeger.pdf

Troeger, Vera. N.d. “Problematic Choices.” Paper presented at the Annual Meetings of the American Political Science Association, Toronto, ON.

Wawro, Gregory. 2002. “Estimating Dynamic Panel Data Models in Political Science.” Political Analysis 10(1):25–48. URL: http://pan.oxfordjournals.org/cgi/reprint/10/1/25

Whitten, Guy B. and Laron D. Williams. 2012. “But Wait, There’s More: Maximizing Substantive Inferences from TSCS Models.” Journal of Politics 74(3):685–93.

Wilson, Sven E. and Daniel M. Butler. 2007. “A Lot More to Do: The Sensitivity of Time-Series Cross-Section Analyses to Simple Alternative Specifications.” Political Analysis 15(2):101–23. URL: http://pan.oxfordjournals.org/cgi/reprint/15/2/101