Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on-campus and online students will be taught simultaneously.
Robert W. Walker is Associate Professor of Quantitative Methods in the Atkinson Graduate School of Management at Willamette University, where he teaches statistics and data science. He earned a Ph.D. in political science from the University of Rochester in 2005 and has previously held teaching positions at Dartmouth College, Rice University, Texas A&M University, and Washington University in Saint Louis. He researches models as treatments for observational data, distributed lag variants of the between-within model, and semi-Markov processes for time-series cross-section data. He previously taught four iterations in the U.S. National Science Foundation-funded Empirical Implications of Theoretical Models sequence at Washington University in Saint Louis and has been an honorary instructor in panel data and time-series/panel data [joint with Harold D. Clarke from 2019 to 2021] at the Essex Summer School since 2010. His work with Curt Signorino and Muhammet Bas was awarded the Miller Prize for the best article in Political Analysis in 2009.
Course Content
This course reflects a years-long collaboration between the legendary Professor Harold D. Clarke, Ph.D., of the University of Texas at Dallas and me. It begins by thinking about the core issues of data collected over time. In 2019, we merged our separate courses to reflect the crucial role of time series analysis in the analysis of data that vary over time and space. The course is dedicated to Harold’s memory.
This course is designed for students who already have training in basic statistics and knowledge of linear regression analysis. The course deals with problems arising from dynamics and from combining the time and space dimensions in statistical data analysis. In particular, we will work with aggregated time-series data first and then with aggregated time-series cross-sectional data, e.g., geographic/administrative units observed over time. This data structure has the advantage of allowing for tests of highly general theories with a wide scope, but it renders data analysis more complicated because one has to consider the time-series aspects (dynamics) and cross-sectional aspects (spatial correlation/unit heterogeneity) at the same time. The course confronts the problems arising from this complex data structure and provides techniques to control and account for specific complications.
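As a minimal sketch of why unit heterogeneity complicates the analysis, the simulation below compares a pooled regression slope with a within (unit-demeaned) slope when the unit effects are correlated with the regressor. It is an illustration under stated assumptions, not course material: Python with NumPy is used only for convenience (the course applications use Stata and R), and the function names simulate_panel, pooled_slope and within_slope, along with every parameter value, are hypothetical.

```python
import numpy as np


def simulate_panel(n_units=200, n_periods=20, beta=2.0, seed=0):
    """Simulate y_it = alpha_i + beta * x_it + e_it, with x_it correlated with alpha_i."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_units)                 # unit-specific effects
    unit = np.repeat(np.arange(n_units), n_periods)  # unit index for each row
    n_obs = n_units * n_periods
    x = alpha[unit] + rng.normal(size=n_obs)         # regressor depends on alpha_i
    y = alpha[unit] + beta * x + rng.normal(size=n_obs)
    return unit, x, y


def pooled_slope(x, y):
    """OLS slope that ignores the unit structure (an intercept is included)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


def within_slope(unit, x, y):
    """OLS slope after subtracting each unit's own mean from x and y."""
    counts = np.bincount(unit)
    x_dm = x - (np.bincount(unit, weights=x) / counts)[unit]
    y_dm = y - (np.bincount(unit, weights=y) / counts)[unit]
    return float(x_dm @ y_dm / (x_dm @ x_dm))


if __name__ == "__main__":
    unit, x, y = simulate_panel()
    print("pooled slope:", round(pooled_slope(x, y), 3))
    print("within slope:", round(within_slope(unit, x, y), 3))
```

In this assumed design the pooled slope absorbs part of the unit effects and so overstates the true coefficient of 2, while the within slope removes anything that is constant within a unit. The same demeaning also removes any time-invariant regressor, which is one reason such variables need separate treatment.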
We begin with an overview that presents key review and background material with a focus on linear models. From that foundation, we complete our one-week course in time series analysis, focusing on univariate time series models, intervention analysis, stationarity, dynamic linear models, structural models, vector autoregression, cointegration, and generalized ARCH models. The second week begins by discussing characteristics and types of pooled data and the underlying assumptions of basic statistical models for panel data. We then turn to complex error structures, different kinds of heterogeneity (e.g., unit and slope), dynamic specification issues (lag structures), missing data, spatial heterogeneity and dependency, and time-invariant and rarely changing variables in panel data analysis with correlated unit-specific effects, among others. Furthermore, we will look at different data-generating processes and adequate estimation procedures for, e.g., binary choice and limited dependent variable models. The course combines a more theoretical introduction with practical analysis of diverse data sets using Stata and R. Students are encouraged to bring their own data sets, and we are happy to schedule time for discussing unique time series and/or panel problems they may have.
Link to last year's course – ESSSSDA 2022 3K: Dynamics and Heterogeneity – ESSSSDA22-3K
Course Objectives
The course requires solid knowledge of inferential statistics and linear algebra and is designed to further develop the understanding of statistical problems arising from the complex structure of pooled data. The course mostly deals with questions of specification and model choice and is therefore a practical course, which should enable students to link their empirical models more closely to their theoretical arguments and to make model choices that are adequate for the data structure at hand. The course materials are designed to help participants solve their own estimation problems and increase the reliability and efficiency of their statistical results. The course is targeted at social scientists and business academics with average or better statistical skills and a strong interest in applied empirical research and data analysis.
Course Prerequisites
The course benefits from skills and knowledge in inferential statistics, including a basic understanding of maximum likelihood and generalized linear estimation methods. Participants should also have a basic understanding of matrix algebra and calculus, though the main focus of the course is applied. Finally, participants need a basic familiarity with Stata and/or R for the applications. Both have considerable and overlapping capabilities for the analysis of two-dimensional data.
The course is designed to build on a good working knowledge of cross-section multiple regression models. This includes knowledge of the underlying assumptions of basic linear models and the implications of violating these assumptions (heteroskedasticity, autocorrelation). Participants should be able to interpret regression coefficients, standard errors, and significance tests and have a mastery of related concepts in statistical inference.
This course has two general foci: (1) to prepare students with an understanding of the unique challenges posed by longitudinal/panel data and (2) to provide students with tools to implement extant models from statistics and econometrics or develop their own when extant models prove inappropriate. Though lectures will cover key material and derivations, we will work through examples and new problems in a collaborative fashion. The classroom is but a small fraction of the course; you will learn by doing problem sets, readings, replications, or programming in statistical computing languages.
The first week of the course follows the general framework for time series analysis set forth by Time Series Analysis for the Social Sciences. We begin with a review of regression topics before turning to stationarity and dynamic models for single and then multiple time series. Extending the aforementioned, we examine the dominant approaches to time series modelling, structural models and VAR, before cointegration and the class of ARCH models. The second week sets about the translation to time series models with multiple distinct units. We first extend basic linear models to the host of pathologies that arise from data that vary along multiple dimensions with models of dynamics and heterogeneity. Our final discussion of standard panel data models will focus on causal interpretation of panel data models (difference-in-differences and the like). Most of this course will focus on conventional estimators for panel data; we will only briefly extend the course topics to models of discrete Markov chains and state-space transition models, limited dependent variables, and other data types. An overview of much of this is covered in Beck and Katz (N.d.).
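Because stationarity anchors the first week, the following minimal sketch contrasts a stationary first-order autoregression with a random walk. It is an illustration only, not taken from the course text: Python with NumPy is used for convenience (the course itself works in Stata and R), and the function names simulate_ar1 and ar1_slope, the sample size, and the coefficient values are all hypothetical choices.

```python
import numpy as np


def simulate_ar1(phi, n_obs=5000, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal shocks and y_0 = 0."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(size=n_obs)
    y = np.zeros(n_obs)
    for t in range(1, n_obs):
        y[t] = phi * y[t - 1] + shocks[t]
    return y


def ar1_slope(y):
    """Least-squares slope from regressing y_t on y_{t-1} without an intercept."""
    lagged, current = y[:-1], y[1:]
    return float(lagged @ current / (lagged @ lagged))


if __name__ == "__main__":
    stationary = simulate_ar1(phi=0.5)   # shocks die out geometrically
    random_walk = simulate_ar1(phi=1.0)  # shocks persist indefinitely
    print("estimated phi, stationary series:", round(ar1_slope(stationary), 3))
    print("estimated phi, random walk:", round(ar1_slope(random_walk), 3))
    print("sample variance, stationary series:", round(float(stationary.var()), 2))
    print("sample variance, random walk:", round(float(random_walk.var()), 2))
```

With a coefficient below one in absolute value the series keeps returning to its mean and its variance settles down; with a coefficient of one the variance grows with the length of the series, which is why conventional regression inference cannot be applied to such data without further steps such as differencing or cointegration analysis.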
The course will rely on a book for the time series parts – this text will be provided by ESS:
Box-Steffensmeier, Janet M., John R. Freeman, Matthew P. Hitt, and Jon C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. Cambridge University Press.
And a series of articles and single chapters from the following texts:
Baltagi, Badi H. 2008. Econometric Analysis of Panel Data. Wiley & Wiley Interscience.
Wooldridge, Jeffrey. 2001. Econometric Analysis of Cross Section and Panel Data. MIT Press.
Hsiao, Cheng. 2002. Analysis of Panel Data. Cambridge University Press.
Arellano, Manuel. 2001. Panel Data Econometrics. Oxford University Press.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge University Press.
Enders, Walter. 1995. Applied Econometric Time Series. Wiley & Wiley Interscience.
Statistical Software and Computation
All of the models covered in this class can be estimated using standard software packages; we will focus on R and Stata. In any case, you should take care to understand the underlying math and mechanics.
Homework: The course moves fast. Working through texts is crucial and your homework is centred around a replication exercise of the text. We want to walk through how we implement/apply the techniques in the language of your choosing with syntax that allows you to construct the solutions you will need. In the second week, there are more formal replication exercises for the specific papers of interest.
We will dedicate some time to discussing relevant data sources that you have uncovered and brainstorm (collectively) appropriate methods that we have or will have discussed for addressing relevant theoretical claims given the data.
I like feedback; please share it. Always.
Background knowledge required
Statistics
Maximum Likelihood = elementary
OLS = moderate
Computer Background
Stata = elementary
R = elementary
(Elementary knowledge at one of the two is sufficient.)
Maths
Linear Regression = elementary