Rochelle Terman is an Assistant Professor of Political Science at the University of Chicago. Her research examines international norms, gender, and human rights using a mix of quantitative, qualitative, and computational methods. She teaches computational social science in a variety of capacities.
Note: this class has been moved to the afternoon
This course teaches students to acquire, process, and analyze data from the Internet using the R statistical programming language. The first portion of the class introduces tools to clean, transform, and wrangle data using `tidyverse` packages. We will also review key programming concepts and techniques to make the best use of R. In the second portion of the course, students will learn how to collect internet data in a variety of forms, including application programming interfaces (APIs) and scraping the open web. The third portion of the class focuses on analyzing the data we’ve collected, introducing the basics of text analysis and visualization.
This course is geared towards social scientists who are interested in extracting, processing, and analyzing data from the internet. By the end of the course, participants will:
1. Understand basic legal and ethical issues surrounding web scraping.
2. Collect data via RESTful APIs:
a. Master key principles and concepts of RESTful APIs.
b. Use plug-n-play R packages for popular APIs such as Twitter, Google Translate, and others.
c. Write a custom API query to extract data from RESTful APIs, such as the New York Times Article API.
3. Collect data via web scraping:
a. Understand how HTML & CSS work to display a website.
b. Inspect a website using Google Chrome Developer Tools and SelectorGadget to understand its underlying structure and identify elements.
c. Write a program that scrapes multiple webpages using R.
4. Clean, transform, and wrangle data using `tidyverse` packages.
5. Be introduced to the main methods and techniques involved in modern computational text analysis.
Participants must have basic computer skills and be familiar with their computer’s file system (e.g., file paths). We will assume students have basic knowledge of R and RStudio. Participants with no prior experience with R are encouraged to complete this brief tutorial (requiring 2-3 hours) to learn the basics of R before the course.
Representative Background Reading
Justin Grimmer and Brandon Stewart. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis 21(3), 2013.
Background knowledge required
R = elementary