This course is now full, and we are operating a waiting list. Please complete an application form if you would like to be added to the waiting list.
Please note: This course will be taught online only. In person study is not available for this course.
Douglas Rice is Associate Professor of Political Science and Legal Studies at the University of Massachusetts Amherst, where he is also a faculty affiliate of the Computational Social Science Institute and the Data Analytics and Computational Social Science (DACSS) graduate program. His research examines judicial policymaking in American politics, with a particular interest in the power of courts in the American policymaking context, and the implications of according policymaking power to judicial institutions in a democratic political system. His work has appeared in The Journal of Politics, Political Research Quarterly, The Journal of Law, Economics, and Organization, Political Science Research & Methods, The Journal of Law and Courts, American Politics Research, and other journals, as well as in a book, Lighting the Way: Federal Courts, Civil Rights, and Public Policy, published with University of Virginia Press.
With the recent explosion in the availability of digitized text and expanded access to computing power, social scientists are increasingly leveraging advanced computational tools for the analysis of text as data. In this course, students will explore the application of many advanced approaches for text-as-data research in the social sciences.
The course will begin with an overview of text-as-data research for social scientists, orienting students to the general area and contextualizing the advanced approaches we will explore in the class. Then, we will begin to extend our text-as-data work beyond the “bag of words” to models that better represent the richness of text.
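As a small illustration of why moving beyond the “bag of words” matters, the sketch below shows two sentences with opposite meanings that receive identical bag-of-words representations. The example sentences and the `bag_of_words` helper are made up for this illustration and are not taken from the course materials.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens.

    Word order is discarded: only the token counts survive.
    """
    return Counter(text.lower().split())


# Two hypothetical sentences with the same words but opposite meanings.
a = "the court overruled the agency"
b = "the agency overruled the court"

# Compare the two count-based representations.
same_bag = bag_of_words(a) == bag_of_words(b)
print(same_bag)
```

Because the representation keeps only counts, any model built on it cannot distinguish who overruled whom in these two sentences.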
Next, the course will turn to embedding-based representations of texts and the underlying distributional theory. We will begin with static embedding models like word2vec and GloVe, and will discuss the benefits and utility of embedding-based representations for social science research.
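The distributional idea behind these models, that words appearing in similar contexts tend to have similar meanings, can be sketched with simple co-occurrence counts. This is a toy illustration only: it is not word2vec or GloVe, and the corpus, window size, and function names are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, invented for this sketch.
toy_corpus = [
    "the judge wrote the opinion",
    "the justice wrote the opinion",
    "the senator passed the bill",
]

vecs = cooccurrence_vectors(toy_corpus)
sim_judge_justice = cosine(vecs["judge"], vecs["justice"])
sim_judge_bill = cosine(vecs["judge"], vecs["bill"])
print(sim_judge_justice > sim_judge_bill)
```

In this toy corpus, "judge" and "justice" occur in the same contexts and so end up with more similar vectors than "judge" and "bill". Static embedding models learn dense vectors from far larger corpora, but they rest on the same intuition.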
We will then further our work on embeddings by transitioning to contextual embeddings. To inform our understanding of pretrained contextual embedding models like ELMo and BERT, we will explore neural networks and deep learning in NLP, and will learn how to develop and deploy our own deep learning models. In doing so, we will cover feedforward neural networks, recurrent neural networks, and transformers. Then, we will explore transfer learning, or how to leverage pretrained models for application in our own specific domains.
Finally, we will explore an area of increasing interest at the confluence of NLP and social science research: causal inference with text. In this section, we will explore how and where text is being used as part of causal research designs, with a focus on efforts to leverage embedding-based representations in those designs.
Most days will be split into roughly 2 hours of lecture and 1.5 hours of computing tutorials. Where possible, computing examples will be demonstrated in both Python and R. Students should be aware that some modern NLP models are extremely computationally intensive, requiring GPUs and/or hours or days for realistic examples to run to completion. In these cases, tutorials will be limited to “toy examples” or will be demonstrated only partially live in class.
Students will gain an understanding of important concepts and tools at the leading edge of text-as-data research and how they can be applied in the social sciences. In so doing, the course will equip students to be knowledgeable consumers of advanced text-as-data research and provide them with the tools to design and carry out more advanced text-based analyses in their own work.
Participants are assumed to have completed a course in text-as-data / quantitative text analysis covering basic text processing, supervised learning (e.g., classification), and unsupervised learning (e.g., scaling, topic modeling), such as 1B. Some facility with Python and/or R is assumed.
Calculus = Elementary
Linear Regression = Elementary
OLS = Elementary
Maximum Likelihood = Elementary
R = Moderate
Python = Moderate