Chris Fariss is an Assistant Professor in the Department of Political Science at the University of Michigan. Prior to beginning this appointment, he was the Jeffrey L. Hyde and Sharon D. Hyde and Political Science Board of Visitors Early Career Professor in Political Science in the Department of Political Science at Penn State University. In June 2013, he graduated with a Ph.D. in political science from the University of California, San Diego. He also studied at the University of North Texas, where he graduated with an M.S. in political science (2007), a B.F.A. in drawing and painting (2005), and a B.A. in political science (2005).

His core research focuses on the politics and measurement of human rights, discrimination, violence, and repression. Chris uses computational methods to understand why governments around the world torture, maim, and kill individuals within their jurisdiction and the processes monitors use to observe and document these abuses.

Other projects cover a broad array of themes but share a focus on computationally intensive methods and research design. These methodological tools, essential for analyzing data at massive scale, open up new insights into the micro-foundations of state repression and the politics of measurement. Below you will find links to his publications, working papers, teaching material, a Dataverse archive where you can access replication data, and links to human rights data generated from several measurement projects.

Course Content
This course focuses on research design and on the new computational tools used to explore and understand social data. The fundamentals of research design are the same throughout the social sciences; however, the topical focus of this class is on computationally intensive data-generating processes and the research designs used to understand and manipulate such data at scale. By massive or large scale, I mean one of three things: there are many subjects/connections/units/rows in the data (e.g., social network data like the kind available from Facebook or Twitter); there are many variables/items/columns in the data (e.g., text data with many thousands of columns that represent the words in the document corpus); or the selected analytical tool is a computationally complex algorithm (e.g., a Bayesian simulation for modelling a latent variable or a random forest model for exploratory data analysis). Some combination of these three issues is also possible. The course will provide students with the tools to design observational studies of, and experimental interventions into, large and unstructured social media datasets at increasingly massive scales and at different degrees of computational complexity.
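As a rough illustration of the second sense of scale (many columns), the sketch below builds a tiny document-term matrix in which every distinct word becomes a column. It is written in Python for brevity, although the course labs use R. The three-sentence corpus and the function name `document_term_matrix` are hypothetical and are not part of the course datasets.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration only);
# a real corpus would contain many thousands of documents.
documents = [
    "the state represses dissent",
    "monitors document abuses by the state",
    "dissent grows when abuses grow",
]


def document_term_matrix(docs):
    """Return (vocabulary, matrix) with one row per document and one column per word."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # The column set is every distinct word seen anywhere in the corpus.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for words that do not occur in this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


vocabulary, matrix = document_term_matrix(documents)
print(len(matrix), "rows (documents) x", len(vocabulary), "columns (distinct words)")
```

Even this toy corpus has more columns than rows, and the column count keeps growing with every new distinct word, which is why text data quickly reaches many thousands of columns.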

Course Objectives
Students will learn how to design studies that take advantage of the wealth of information contained in new massive-scale online datasets, such as data available from Facebook and Twitter and the many newly digitized document corpora now available online. The focus of the course is on designing studies in such a way as to maximize the validity of inferences obtained from these complex datasets.

Course Prerequisites

Students should have some familiarity with concepts from research design and statistics. Exposure to these concepts generally occurs during the first-year course sequence of a typical PhD program in political science. Students should also have at least some exposure to the R computing environment; the more familiarity with R, the better.

Required Reading Material
1. Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press. This book will be provided as part of the course material.

2. Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics.

Background knowledge required
Statistics
OLS = m
Maximum Likelihood = m

Computer Background
R = m

e = elementary, m = moderate, s = strong

Course Details
• I will begin each class day with a short lecture on the class material (approximately 45-60 minutes).
• After each lecture, students will discuss one or two articles as they relate to the lecture (approximately 30-45 minutes).
• On the first day of class, I will introduce students to two large-scale datasets. Students will use these data for applied examples over the 10 days of the course.
• The remaining portion of class (approximately 1.5-2 hours) will be devoted to hands-on learning with R, simulated data, and the large-scale datasets provided by the instructor. Days 7 and 9 will consist entirely of in-class lab work.
• The course schedule below gives the lecture topic for each class day, along with citations for the discussion readings and textbook chapters for the lab portions of the class.

Day 1: Overview of Supervised Learning

Day 2: Linear Methods for Regression

Day 3: Linear Methods for Classification

Day 4: Model Assessment and Selection

Day 5: Additive Models, Trees, and Related Methods

Day 6: Random Forests

Day 7: Random Forests

Day 8: Neural Networks

Day 9: Neural Networks

Day 10: Ethical Responsibilities for the Social Data Scientist

Syllabus Acknowledgments

This syllabus is based on several courses that I have taken and designed over the last several years. Some of the material is based on the Research Design (PL SC 501) course that I developed at Pennsylvania State University when I began teaching there in the fall of 2013, which itself is based on a similar course developed by David Lake and Mathew McCubbins at the University of California, San Diego. It is also based on material that I developed for a graduate measurement theory class (PL SC 597) and an undergraduate Social Data Analysis and Design class (SO DA 308), both at Pennsylvania State University. Elements of the syllabus and other materials created for this class are also based in part on the Bayesian Statistics class offered by Seth Hill at the University of California, San Diego and the Measurement class offered by Keith Poole at UCSD and now at the University of Georgia.