Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.

Robert W. Walker is Associate Professor of Quantitative Methods in the Atkinson Graduate School of Management at Willamette University, where he teaches statistics and data science. He earned a Ph.D. in political science from the University of Rochester in 2005 and has previously held teaching positions at Dartmouth College, Rice University, Texas A&M University, and Washington University in Saint Louis. He researches models as treatments for observational data, distributed lag variants of the between-within model, and semi-Markov processes for time-series, cross-section data. He previously taught four iterations in the U.S. National Science Foundation-funded Empirical Implications of Theoretical Models sequence at Washington University in Saint Louis and has been an honorary instructor in panel data and time-series/panel data [joint with Harold D. Clarke from 2019 to 2021] at the Essex Summer School since 2010. His work with Curt Signorino and Muhammet Bas was awarded the Miller Prize for the best article in Political Analysis in 2009.

Course Description:

This course provides an introduction to applied statistics with a focus on management applications. The course begins with numerical and graphical summary methods for data, e.g., means, standard deviations, and percentiles, before introducing probability and probability distributions. We next turn to inference in the form of confidence intervals and hypothesis tests, where we encounter the key probability distributions (normal, t, chi-square, F) deployed in the core statistical methods that we examine: cross-tabulation, t-tests, analysis of variance, correlation, and bivariate and multivariate regression. For computation, we will rely on R and, in particular, a graphical user interface for R [radiant] that contains many of the most useful tools for constructing reports and presentations with RMarkdown.

Course Objectives:

Participants will become familiar with a wide range of statistical methods for analysing observational data that are common to both academics and professionals in management and in the social and behavioral sciences. Participants will also gain some familiarity with R’s basic structure and a closer familiarity with tools, built on RMarkdown, for constructing reports that combine numerical output and graphics through the graphical user interface for R known as radiant.

Course Prerequisites:

This is an introductory course and participants are not required or assumed to have anything more than basic mathematics. Supplemental materials will provide students with additional tools for a more thorough exploration of the R language for statistical computing and the use of markdown for document construction.

Representative Background Reading/Preparation:

A working R installation will help participants hit the ground running. Installation details and a tutorial will be made available in advance of the first meeting, along with information about the supplemental materials that will help students prepare for a compressed course.

David Diez, Mine Çetinkaya-Rundel, and Christopher Barr. 2019. OpenIntro Statistics: Fourth Edition. ISBN: 1943450072. (this will be provided by ESS)

Course Outline:

Day 1: Introducing Data and Summarizing Data [Chapters 1 and 2]

What are variables?  How are data collected?  Numerical summary [central tendency and spread for symmetric and asymmetric distributions] and basic graphical summary.  Scaled measures [z-scoring].

Application: Importing data, summarizing data numerically and graphically using R and radiant.

Day 2: Probability and Probability Distributions [Chapters 3 and 4]

Formally, what are random variables? Categorical variables via tables and probability/proportion in tables. Defining marginal, conditional, and joint probability; Bayes' rule; and some key probability distributions [normal, binomial, Poisson, chi-square].

Application: Contingency tables and probability calculations with assumed distributions, with a focus on customer churn.

Day 3: Foundations of Inference [Chapters 5 and 6]

What are point estimates and what is sampling variability?  Defining confidence intervals and hypothesis tests.  How do we perform the two aforementioned types of inference with categorical data?

Application: Chi-square tests of independence; confidence intervals and hypothesis tests for single proportions and differences in proportions extending the example of customer churn.

Day 4: Inference for Quantitative Variables [Chapters 7 and 8]

Student’s t-distribution, distribution of the sample mean, F and variance ratios.  Inferring differences of means in independent and paired samples, covariance and correlation among quantitative variables, analysis of variance, and basic regression.

Application: One- and two-sample t-tests, correlation tests, ANOVA, and estimating and criticizing regression with examples from cost accounting [estimating fixed and variable costs].

Day 5: Regression Modelling [Chapters 8 and 9]

Interpreting multiple regression and the meaning of all else equal. What is influence and how do outliers affect regression estimates? What are diagnostic tools for assessing regression fit and inference, including variance inflation factors? Can we validate linear model assumptions? How do we arrive at a best regression model? Variable and functional form selection in multiple regression. How do we deploy multiple regression for prediction? How do we assess the appropriateness of inference in regression?

Application: Implementing and assessing multiple regression, basic methods of model selection, and information criteria in the predictive modelling of house prices.