Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.

Rabia Malik is a Lecturer/Assistant Professor in the Department of Government at the University of Essex, which she joined in July 2020. Before this, she received her Ph.D. in Political Science from the University of Rochester in 2016, was a Post-Doctoral Associate at New York University Abu Dhabi (2016-2019) and spent a year at the Lahore University of Management Sciences (LUMS). Her research uses both observational and experimental data to study questions related to distributive politics and development, political accountability, clientelism, and gender, particularly in South Asia. She has also taught classes on quantitative methods, authoritarianism, and South Asian politics. Rabia’s research has appeared in The Journal of Politics, The British Journal of Political Science, Comparative Political Studies and Legislative Studies Quarterly.

Course description and goals:

This course introduces participants to the analysis of quantitative data, both theoretically and empirically. It covers introductory quantitative methods alongside an introduction to R, a free, open-source software environment that is highly versatile and suitable for both introductory and advanced quantitative social science and data analysis. The course gives participants a solid foundation in introductory quantitative analysis and in using R for the most commonly encountered tasks in social science data analysis. Its primary goal is to equip participants with the tools they need to conduct their own data analysis independently. To this end, the course covers relevant descriptive and statistical techniques with a strong focus on applied skills and data visualization, all introduced using R.

The course will cover various topics, both theoretically and in R, including the following:

– Introduction to R and programming in R
– Descriptive statistics
– Data visualization with advanced R packages
– Data import and management, including working with “messy” datasets
– Variable types and creating new variables in R
– Data uncertainty
– Correlations and causal relationships
– Linear regression: assumptions and interpretation
– Non-linear relationships in OLS
– Binary dependent variables

Course Objectives:

Upon successful completion of the course, participants will be able to read and understand quantitative social science texts, download and manage messy datasets, describe and visualize various types of data, conduct their own statistical analyses, and use R for most commonly encountered tasks in social science data analysis. Participants will leave with knowledge of a range of introductory statistical techniques and the ability to implement them in their own research. They will also be comfortable using R for descriptive and statistical analyses and for managing complex datasets, and they will learn to integrate data analysis and document creation using R Markdown. The course is most suitable for researchers at the beginning of their quantitative training, though those with an existing background in quantitative social science who wish to acquire a new, free, open-source, and highly versatile set of tools (in R) will also benefit.

Course Prerequisites:

Participants are advised to have a basic background in introductory statistics or to be concurrently enrolled in an introductory statistics course. However, prior exposure to statistical techniques up to (fundamental) linear regression is not required, as the course also covers the necessary background concepts. No background in R or computer programming is required or expected; the course introduces R from a beginner’s perspective.

Representative Background Reading:

Since this is an introductory course, participants are not required to do any prior reading.

Required text (will be provided by ESS):

Agresti, Alan (2018). Statistical Methods for the Social Sciences (Fifth Edition). Pearson.

Background knowledge required:

Maths:

Calculus – Elementary

Linear Regression – Elementary*

Statistics:

OLS – Elementary*

* Some background in these topics is helpful but not required.

Students are required to bring their own laptops to the course.

The following list can be used as reference readings by interested participants:

– Dalpiaz, David (2019). Applied Statistics with R. Online Resource.
– Huntington-Klein, Nick (2021). The Effect: An Introduction to Research Design and Causality. Online Resource.
– Kellstedt, Paul and Guy Whitten (2018). The Fundamentals of Political Science Research. Cambridge University Press.
– Imai, Kosuke (2018). Quantitative Social Science: An Introduction. Princeton University Press.
– Salkind, Neil J. (2017). Statistics for People Who (Think They) Hate Statistics. Sage Publications.
– Wickham, Hadley and Garrett Grolemund (2017). R for Data Science. Online Resource.

Further readings mentioned below in the course schedule, including those marked as optional, will be made available to participants during the course.

Software and Preparation:

Participants will be asked to install R and RStudio on their personal laptops during the first course meeting. On the first day, we will go over how to use these programs using a detailed tutorial with step-by-step instructions, and there will also be time to resolve any installation problems. Participants therefore do not need to do anything in this regard beforehand.

Course Schedule:

Day 1:

Introduction to R and quantitative methods; introduction to data description
– Variables and their types
– Summarizing variables: mean; median; mode
– Why use R? What for?
– Basics of R: installation of R and RStudio; object-oriented programming; basic operations; creating vectors and variables
– Summary statistics in R
– Readings: Agresti Chapter 1 (pp. 13-20); Chapter 2.1-2.3 (pp. 23-33)

Day 2:

Describing and visualizing data; introduction to probability
– Describing and summarizing variables: range; standard deviation; variance
– Histograms and density plots
– Describing and visualizing data in R
– Introduction to R packages and ggplot2
– (Time permitting): introduction to the basics of probability
– Readings: Agresti Chapter 3.1-3.3 (pp. 41-58); Chapter 4.1 (pp. 79-81)

Day 3:

Probability and distributions
– Expected value
– Distributions: discrete and continuous
– Sampling distributions
– Importing datasets into R
– Basics of R Markdown
– Readings: Skim Agresti Chapter 4 (pp. 79-107)

Day 4:

Data uncertainty and continuous probability distributions
– Central Limit Theorem
– Confidence intervals
– Single-variable significance tests in R
– The ifelse() function and nested ifelse() statements in R
– Managing datasets and changing variable types in R
– t-tests in R
– Readings: Skim Agresti Chapter 5 (pp. 115-143) and Chapter 6 (pp. 151-183)

Day 5:

Significance tests and introduction to two variables
– Independent and dependent variables
– Two-variable t-tests; difference-in-means
– Visualizing the relationship between two variables in R: correlations, scatterplots
– Choosing datasets and research questions for independent projects in R
– Readings: Agresti Chapter 7.1-7.4 (pp. 191-205)

Day 6:

Introduction to OLS
– Good theories; four causal hurdles
– Bivariate linear regressions (OLS)
– Interpreting regression coefficients
– Control variables and multiple regression
– Running regressions in R
– Exporting R output to Word/LaTeX with the stargazer package
– Starting on independent projects in R
– Readings: Agresti Chapter 9.1-9.5 (pp. 259-284); Chapter 10.1-10.2 (pp. 299-305)

Day 7:

OLS diagnostics and violations
– Goodness of fit measures
– Model comparisons
– Violation of OLS assumptions
– Outliers and influential observations
– Continuing independent projects in R
– Readings: Agresti Chapter 9.6 (pp. 284-289); Chapter 10.3-10.4 (pp. 306-313); Chapter 11.1-11.3 (pp. 319-337)

Day 8:

Data non-linearities and binary dependent variables
– Interaction terms: purpose, interpretation, visualization
– Quadratic terms: purpose, interpretation, visualization
– (Time permitting): logistic regression and interpretation (in R)
– Implementing these in R
– Completion of independent projects in R
– Readings: Agresti Chapter 11.4 (pp. 337-341); Chapter 14.5 (pp. 451-456); Chapter 15.1-15.3 (pp. 471-484)

Day 9:

Advanced data management; course wrap-up
– Downloading, cleaning and merging large datasets
– Writing for-loops and nested for-loops
– Presentation and discussion of independent projects in R