Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.

Rabia Malik is Lecturer/Assistant Professor in the Department of Government at the University of Essex, which she joined in July 2020. Before this, she received her Ph.D. in Political Science from the University of Rochester in 2016, was a Post-Doctoral Associate at New York University Abu Dhabi (2016-2019) and spent a year at the Lahore University of Management Sciences (LUMS). Her research uses both observational and experimental data to study questions related to distributive politics and development, political accountability, clientelism, and gender, particularly in South Asia. She has also taught classes on quantitative methods, authoritarianism, and South Asian politics. Rabia’s research has appeared in The Journal of Politics, The British Journal of Political Science, Comparative Political Studies, and Legislative Studies Quarterly.

 

Course description and goals:
This course introduces participants to the analysis of quantitative data in the free, open-source software R. R is a highly versatile software environment suitable for both introductory and advanced quantitative social science and data analysis. The course gives participants a near-complete foundation for using R in all commonly encountered social science data analysis tasks.

The course will explore the following topics:

  • Introduction to the R language and software architecture
  • Integrating R code and document production (R Markdown)
  • Workflow, reproducibility, and version control in R
  • Data import and data management, including working with “messy” datasets
  • Descriptive statistics
  • Data visualization with base-R and advanced R packages
  • Common techniques for statistical inference, including correlations, linear regressions, and logistic regressions
  • Diagnostics for linear regression assumptions and violations
  • Interpretation of non-linear relationships in OLS
  • R packages for advanced statistical methods, including survey and field experiments, conjoint experiments, and regression discontinuity designs

 

Course Objectives:
Upon successful completion of the course, participants will be able to use R for most commonly encountered tasks in social science data analysis, including all of the topics listed above. The course is suitable for researchers at the beginning of their quantitative training as well as those with an advanced background in quantitative social science who wish to acquire a new, free, open-source, and highly versatile set of tools. The course covers applications ranging from classic statistical methods (such as regression) to newer tools (such as conjoint experiments and regression discontinuity designs). Participants will also learn to integrate data analysis and document creation (via R Markdown). A workflow for reproducible data analysis is also a core element of the course.

Course Prerequisites:
Participants are advised to have a background in introductory statistics or to be concurrently enrolled in an introductory statistics course. Prior exposure to statistical techniques up to linear regression (at a fundamental level) is helpful but not required. No background in R or computer programming is required. The course introduces R from a beginner’s perspective. At the same time, participants with experience in other tools (e.g., SPSS, Stata, or SAS) will find the course structure helpful for transferring their skills to R.

Representative Background Reading:
Since this is an introductory course, participants are not required to do any prior reading.

The following list can be used as reference readings by interested participants:

  • Dalpiaz, David (2019). Applied Statistics with R. Online Resource.
  • Huntington-Klein, Nick (2021). The Effect: An Introduction to Research Design and Causality. Online Resource.
  • Kellstedt, Paul and Guy Whitten (2018). The Fundamentals of Political Science Research. Cambridge University Press.
  • Imai, Kosuke (2018). Quantitative Social Science: An Introduction. Princeton: Princeton University Press.
  • Salkind, Neil J. (2017). Statistics for People Who (Think They) Hate Statistics. Sage Publications.
  • Wickham, Hadley and Garrett Grolemund (2017). R for Data Science. Online Resource.

Further readings mentioned in the course schedule, including those marked as optional, will be made available to participants during the course.

Required text (will be provided by ESS):
Agresti, Alan. (2018). Statistical Methods for the Social Sciences (Fifth Edition). Pearson.

Background knowledge required:

Maths:

Calculus – Elementary

Linear Regression – Elementary*

Statistics:

OLS – Elementary*

* It is helpful if students have some background on these topics but not a requirement.

Software and Preparation:
Participants will be asked to install R and RStudio on their personal laptops during the first course meeting. We will go over how to use these programs on the first day of the course, using a detailed tutorial with step-by-step instructions. We will also have time to catch up on installation problems on the first day. Therefore, participants do not need to do anything in this regard beforehand.

Day 1:

Introduction to R and quantitative methods

  • Variables and their types
  • Why use R? What for?
  • Basics of R: installation of R and RStudio; object-oriented programming; basic operations; creating vectors and variables; setting up your first project in R
  • Readings: Agresti Chapter 1 (p13-20), Chapter 2.1-2.3 (p23-33); Imai Chapter 1.3 (p10-16 only)

 

Day 2:

Describing and managing data in R

  • Describing and summarizing variables: range; mean; median; mode; standard deviation; variance.
  • Histograms and density plots
  • Importing and managing datasets
  • Manipulating data
  • Introduction to R packages and ggplot2
  • Readings: Agresti Chapter 3.1-3.3 (p41-58)

 

Day 3:

Probability and Distributions

  • Basic rules of probability
  • Randomness (random variables)
  • Distributions: discrete and continuous
  • Basics of R Markdown
  • Readings: Skim Agresti Chapter 4 (pp79-107)

 

Day 4:

Bivariate associations and Large Samples

  • Large sample theorems (CLT)
  • Sampling distributions
  • (Good) estimators and their properties
  • Relationships between two variables: independent and dependent variables; t-tests; correlations
  • Examining and visualizing the relationship between two variables in R: correlations, t-tests, scatterplots
  • Readings: Skim Agresti Chapter 6 (p151-183); Agresti Chapter 7.1-7.4 (p191-205)

 

Day 5:

Regressions and Interpretation

  • Hypotheses
  • Linear regressions (OLS)
  • Interpreting regression coefficients
  • Confidence intervals and standard errors
  • Running regressions in R
  • Exporting R output to Word/LaTeX through stargazer
  • Readings: Agresti Chapter 9.1-9.5 (p259-284), Chapter 10.1-10.2 (p299-305)
  • Assignment given today: your first complete research project in R

 

Day 6:

OLS diagnostics and non-linear relationships in R

  • Assignment debrief and feedback
  • Violation of OLS assumptions
  • Outliers and influential observations
  • Interaction terms: purpose, interpretation, visualization
  • Quadratic terms: purpose, interpretation, visualization
  • Readings: Agresti Chapter 9.6 (p294-289), Chapter 10.3-10.4 (p306-313), Chapter 14.5 (p451-456)
  • Reference reading (for R): Dalpiaz Chapter 13 (p209-242) and Chapter 11.2 (p164-171)

 

Day 7:

Binary dependent variables and advanced data management

  • Logistic regressions and interpretation (in R)
  • Downloading, cleaning and merging large datasets
  • Changing variable types, creating new variables
  • Writing for-loops and nested for-loops
  • Readings: Agresti Chapter 15.1-15.3 (p471-484)

 

Day 8:

Causal inference and treatment effects

  • Introduction to the causal framework
  • Average treatment effects (ATE)
  • Overview of different types of vignette experiments: survey vs lab vs field
  • ATE and heterogeneous treatment effects in R
  • Readings: Skim Huntington-Klein Chapters 6, 7 and 10
  • Optional sample experiment readings: Will be provided by instructor

 

Day 9:

Conjoint experiments and forced-choice experiments

  • Designing and implementing conjoint experiments
  • Data collection and management
  • Balance tests and diagnostics in R
  • Calculating marginal means and average marginal component effects (AMCE) in R
  • Visualizing AMCEs and marginal means in R
  • Readings: Druckman, James N. and Donald P. Green (eds.) 2021. Advances in Experimental Political Science. Cambridge University Press. Chapter 2.
  • Optional sample conjoint experiment readings: Will be provided by instructor

 

Day 10:

Regression Discontinuity Designs (RDD) and course wrap-up

  • Theoretical introduction to RDDs: purpose; when to use them; assumptions; interpretation; limitations
  • RDDs in R: preparing data, testing assumptions, running the RDD, plotting the discontinuity, robustness checks
  • Readings: Huntington-Klein Chapter 20
  • Optional reading: Malik, Rabia. 2021. “(A)Political Constituency Development Funds: Evidence from Pakistan.” British Journal of Political Science. 51(3): p963-980.