Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions during which on-campus and online students are taught simultaneously.
Robert W. Walker, Ph.D., is Associate Professor of Quantitative Methods in the Atkinson Graduate School of Management at Willamette University. Though his Ph.D. is in political science, Professor Walker taught statistics and research methods, in addition to courses in political economy, to both undergraduate and graduate students at Dartmouth College, Texas A&M University, Washington University in Saint Louis, and Rice University before arriving at Atkinson. He was a regular instructor in the National Science Foundation’s Empirical Implications of Theoretical Models summer program at Washington University in Saint Louis and has regularly taught courses in the analysis of longitudinal data at the Essex Summer School in Social Science Data Analysis in the United Kingdom.
His joint work received the Warren Miller Prize for the best published paper in 2009 in Political Analysis, the journal of the Society for Political Methodology and the most cited journal in Political Science during the most recent evaluation period. His published work spans international political economy, political methodology, and the political economy of state and municipal bond markets.
Course Content
This course reflects a years-long collaboration between the legendary Professor Harold D. Clarke, Ph.D., of the University of Texas at Dallas and me. It begins by thinking about the core issues of data collected over time. In 2019, we merged our separate courses to reflect the crucial role of time series analysis in the analysis of data that vary over time and space. The course is dedicated to Harold’s memory.
This course is designed for students who already have training in basic statistics and knowledge of linear regression analysis. The course deals with problems arising from dynamics and from combining the time and space dimensions in statistical data analysis. In particular, we will work first with aggregated time-series data and then with aggregated time-series cross-sectional data (e.g., geographic or administrative units observed over time). This data structure has the advantage of allowing tests of highly general theories with a wide scope, but it makes data analysis more complicated because one has to consider the time-series aspects (dynamics) and the cross-sectional aspects (spatial correlation and unit heterogeneity) at the same time. The course confronts the problems arising from this complex data structure and provides techniques to control and account for specific complications.
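As an illustrative sketch only (the notation here is an assumption for exposition, not taken from the course materials), a simple linear model for such data might be written with a unit-specific intercept and serially correlated errors:

```latex
y_{it} = \alpha_i + \mathbf{x}_{it}^{\prime}\boldsymbol{\beta} + \varepsilon_{it},
\qquad
\varepsilon_{it} = \rho\,\varepsilon_{i,t-1} + u_{it},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T
```

In this sketch, $\alpha_i$ stands in for unit heterogeneity, $\rho$ for serial dependence within a unit over time, and any correlation between $u_{it}$ and $u_{jt}$ for $i \neq j$ for spatial or cross-sectional dependence.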
We begin with an overview that presents key review and background material with a focus on linear models. From that foundation, the first week covers time series analysis, focusing on univariate time series models, intervention analysis, stationarity, dynamic linear models, structural models, vector autoregression, cointegration, and generalized ARCH models. The second week begins by discussing the characteristics and types of pooled data and the underlying assumptions of basic statistical models for panel data. It then turns to complex error structures, different kinds of heterogeneity (e.g., unit and slope), dynamic specification issues (lag structures), missing data, spatial heterogeneity and dependency, and time-invariant and rarely changing variables in panel data analysis with correlated unit-specific effects, among others. Furthermore, we will look at different data generating processes and appropriate estimation procedures for, e.g., binary choice and limited dependent variable models. The course combines a more theoretical introduction with practical analysis of diverse data sets using Stata and R. Students are encouraged to bring their own data sets, and we are happy to schedule time to discuss unique time series and/or panel problems you may have.
Course Objectives
The course requires solid knowledge of inferential statistics and linear algebra and is designed to further develop the understanding of statistical problems arising from the complex structure of pooled data. The course mostly deals with questions of specification and model choice. It is therefore a practical course that should enable students to link their empirical models more closely to their theoretical arguments and to make model choices that are appropriate for the data structure at hand. The course materials are designed to help participants solve their own estimation problems and increase the reliability and efficiency of their statistical results. The course is targeted at social scientists and business academics with average or better statistical skills and a strong interest in applied empirical research and data analysis.
Course Prerequisites
The course benefits from skills and knowledge in inferential statistics, including a basic understanding of maximum likelihood and generalized linear estimation methods. Participants should also have a basic understanding of matrix algebra and calculus, though the main focus of the course is applied. Finally, participants need a basic familiarity with Stata and/or R for the applications. Both have considerable and overlapping capabilities for the analysis of two-dimensional data.
The course is designed to build on a good working knowledge of cross-sectional multiple regression models. This includes knowledge of the underlying assumptions of basic linear models and the implications of violating these assumptions (e.g., heteroskedasticity, autocorrelation). Participants should be able to interpret regression coefficients, standard errors, and significance tests and should have a mastery of related concepts in statistical inference.
This course has two general foci: (1) to prepare students with an understanding of the unique challenges posed by longitudinal/panel data and (2) to provide students with tools to implement extant models from statistics and econometrics or develop their own when extant models prove inappropriate. Though lectures will cover key material and derivations, we will work through examples and new problems in a collaborative fashion. The classroom is but a small fraction of the course; you will learn by doing problem sets, readings, replications, or programming in statistical computing languages.
The first week of the course follows the general framework for time series analysis set forth by Time Series Analysis for the Social Sciences. We begin with a review of regression topics before turning to stationarity and dynamic models for single and then multiple time series. Building on these, we examine the dominant approaches to time series modelling, structural models and VAR, before turning to cointegration and the class of ARCH models. The second week sets about the translation to time series models with multiple distinct units. We first extend basic linear models to the host of pathologies that arise from data that vary along multiple dimensions, with models of dynamics and heterogeneity. Our final discussion of standard panel data models will focus on causal interpretation of panel data models (difference-in-differences and the like). Most of this course will focus on conventional estimators for panel data; we will only briefly extend the course topics to models of discrete Markov chains and state-space transition models, limited dependent variables, and other data types. An overview of much of this is covered in Beck and Katz (N.d.).
The course will rely on a book for the time series parts – this text will be provided by ESS:
Box-Steffensmeier, Janet M., John R. Freeman, Matthew P. Hitt, and Jon C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. Cambridge University Press.
And a series of articles and single chapters from the following texts:
Baltagi, Badi H. 2008. Econometric Analysis of Panel Data. Wiley & Wiley Interscience.
Wooldridge, Jeffrey. 2001. Econometric Analysis of Cross Section and Panel Data. MIT Press.
Hsiao, Cheng. 2002. Analysis of Panel Data. Cambridge University Press.
Arellano, Manuel. 2001. Panel Data Econometrics. Oxford University Press.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge University Press.
Enders, Walter. 1995. Applied Econometric Time Series. Wiley & Wiley Interscience.
Statistical Software and Computation
All of the models covered in this class can be estimated using standard software packages; we will focus on R and Stata. Whichever package you use, you should take care to understand the underlying math and mechanics.
Homework: The course moves fast. Working through the texts is crucial, and your homework in the first week is centred on replicating examples from the text. We want to walk through how to implement and apply the techniques in the language of your choosing, with syntax that allows you to construct the solutions you will need. In the second week, there are more formal replication exercises for the specific papers of interest.
We will dedicate some time to discussing relevant data sources that you have uncovered and brainstorm (collectively) appropriate methods that we have or will have discussed for addressing relevant theoretical claims given the data.
I like feedback; please share it. Always.
Background knowledge required
Statistics
OLS = elementary
Computer Background
Stata = elementary
R = elementary
(Elementary knowledge at one of the two is sufficient.)
Maths
Linear Regression = elementary