Dr Iulia Cioroianu is a Prize Fellow at the Institute for Policy Research (IPR) at the University of Bath. She holds a Ph.D. in political science from New York University and an M.A. in political science from Central European University. Before joining the IPR, she was a research fellow in the Q-Step Centre for quantitative social sciences at the University of Exeter, and a pre-doctoral fellow in the LSE Department of Methodology.

Iulia is a social data scientist who studies the effects of social media and online information exposure on political competition and polarization, using natural language processing, quantitative text analysis, machine learning and survey experiments. Her work has been published in Electoral Studies, Social Networks and AAAI conference proceedings, and has been featured in NCRM podcasts and research methods videos. She received an IBM Faculty Award as well as an ESRC IAA Innovation Fellowship, and is currently working with the IBM Centre for Advanced Studies in Amsterdam on the Understanding News Bias (UNBias) project. The project develops algorithms for measuring topic-specific ideological positions in news articles, and a web browser extension that reveals these positions to users and offers them the opportunity to read other articles on the same topic that may present a different ideological perspective.

Course Content
Our world is increasingly being recorded as digital text, capturing human knowledge and interactions to an unprecedented level and providing a rich source of data for researchers across different academic disciplines and subjects. Consequently, computational text analysis methods and tools are becoming increasingly popular and starting to make their way into the core research methods curriculum.

This course is designed to provide social science researchers with an entry point to computational text analysis. Participants will gain hands-on experience designing and implementing a quantitative text analysis research project and will learn to discuss, evaluate and interpret the results. Each class consists of a 2-hour lecture followed by a 1.5-hour lab in which participants apply the methods covered in the lecture.

We will start with an overview of computational text analysis methods and discuss examples of their application across multiple disciplines and research fields. We will then survey the main ways in which text data can be acquired and present several major online text data sources.

The first steps in a text analysis research project (inputting, importing, manipulating and storing text data in different formats, as well as cleaning and processing it) often prove to be the most challenging for beginners. After addressing this initial set of issues, we will study: the main ways in which text data can be turned into numbers; descriptive methods such as frequency tables and word clouds; automated dictionary methods (such as those developed to extract different emotions from text); text comparison methods (which are often used to study the diffusion and evolution of laws, policies and ideas); and text scaling methods (such as those used by political scientists to map the positions of political actors in the ideological space).
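As a rough illustration of three of the ideas listed above (turning text into numbers, a dictionary method, and text comparison), here is a minimal sketch. The course labs use R; plain Python is used here only to keep the example self-contained. The two toy documents, the `positive_words` list and all function names are made up for illustration and are not course materials.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(text):
    """Turn a document into numbers: a word -> frequency table."""
    return Counter(tokenize(text))


def dictionary_score(counts, dictionary):
    """Dictionary method: share of tokens that appear in a word list."""
    total = sum(counts.values())
    hits = sum(n for word, n in counts.items() if word in dictionary)
    return hits / total if total else 0.0


def cosine_similarity(a, b):
    """Text comparison: cosine of the angle between two count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents and a hypothetical positive-emotion word list.
docs = {
    "bill_a": "The new policy is a great success and a happy outcome",
    "bill_b": "The new policy is a failure and a sad outcome",
}
positive_words = {"great", "success", "happy"}

counts = {name: term_counts(text) for name, text in docs.items()}
for name, c in counts.items():
    print(name, "positive share:", round(dictionary_score(c, positive_words), 3))
print("similarity:", round(cosine_similarity(counts["bill_a"], counts["bill_b"]), 3))
```

Real projects would normally add steps this sketch omits, such as stop-word removal, stemming and term weighting.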

Finally, the course provides an introduction to machine learning applied to text data: supervised classification (routinely used in multiple disciplines to label large volumes of text documents based on a small subset of coded data) and unsupervised learning methods (as a very light introduction to topic modelling).
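To make the supervised classification idea concrete, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing on four hand-labelled sentences and uses it to label new text. Naive Bayes is chosen here only as a simple, widely known example; the course description does not say which classifiers are covered. The training sentences, labels and function names are hypothetical, and Python is used instead of the course's R only to keep the example dependency-free.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately minimal)."""
    return text.lower().split()


def train_naive_bayes(labelled_docs):
    """Count documents per class and words per class from coded data."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    class_doc_counts, word_counts, vocabulary = model
    n_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_doc_counts.items():
        score = math.log(n / n_docs)  # log prior
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# A tiny hand-coded training set (illustrative only).
training = [
    ("taxes should be lower for families", "economy"),
    ("the budget cuts raise unemployment", "economy"),
    ("hospitals need more nurses and doctors", "health"),
    ("vaccines protect patients in hospitals", "health"),
]
model = train_naive_bayes(training)

print(predict(model, "more doctors for hospitals"))
print(predict(model, "budget cuts and taxes"))
```

In practice the coded subset would be far larger, and performance would be checked on held-out documents before labelling the full corpus.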

Course Objectives
At the end of the course, participants will have an understanding of the current quantitative text analysis research landscape, the ways in which computational text analysis can be applied to their area of interest and the main data sources, tools and methods available for further exploration. Participants will also gain hands-on experience designing and implementing a quantitative text analysis research project in R and will be able to discuss and interpret the results and acknowledge the limitations of the methods used.

Course Prerequisites
Familiarity with basic research design and statistical analysis is expected, and exposure to the R computing environment before the course is strongly encouraged.

Representative Background Reading
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297.

Required texts

Silge, J., & Robinson, D. (2018). Text Mining with R: A Tidy Approach. O’Reilly Media. Available online at https://www.tidytextmining.com

Background knowledge required
Statistics
OLS = elementary
Maximum Likelihood = elementary

Computer Background
R = elementary