2C Introduction to Programming in Python for Social Scientists

Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on-campus and online students will be taught simultaneously.

Christopher Hare is an Associate Professor in Political Science at the University of California, Davis. His research focuses on ideology, polarization, and the application of measurement models and supervised machine learning methods to the study of voter behaviour. His work has been published in the Journal of Politics, the American Journal of Political Science, Political Analysis, and the British Journal of Political Science. He is also co-author of the book Analyzing Spatial Models of Choice and Judgment (second edition).

Course description

This course introduces the Python programming language with an emphasis on applications in data science for the social sciences. Students will learn how to work effectively within the Python environment to collect, organize, and analyze a wide array of data types, including text and numerical data. Topics include object manipulation, web scraping, data visualization, elementary statistical analysis, text analysis, and an introduction to machine learning methods such as random forests. The course emphasizes practical implementation using key libraries including NumPy, Pandas, Matplotlib, and Scikit-learn. Designed for students with limited or no programming background, the course assumes only basic familiarity with statistical concepts. Instruction focuses on the application of Python to real-world research tasks rather than abstract computer science principles. By the end of the course, students will be able to execute data collection and analysis workflows, visualize results, and apply computational techniques to substantive social science questions.

Course prerequisites

While experience with an object-oriented programming language (e.g., R) is valuable, this course is designed to be introductory, and programming experience is neither required nor assumed. We will approach Python from a social science perspective rather than a computer science one. An understanding of basic probability and statistics and experience with applied regression are helpful but not necessary. Proficiency in calculus or other mathematics, while helpful, is not required.

Texts (all freely available online)

Sweigart, Al. Automate the Boring Stuff with Python: Practical Programming for Total Beginners. San Francisco: No Starch Press, 2025. https://automatetheboringstuff.com/
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor. An Introduction to Statistical Learning with Applications in Python. New York: Springer, 2023. https://www.statlearning.com/

Course outline

Day 1: Installing Python and getting started in the Python environment
Day 2: Variables, data types, and basic operations; introducing the NumPy library
Day 3: Data wrangling; introducing the Pandas library
Day 4: Web scraping and working with APIs (part I)
Day 5: Web scraping and working with APIs (part II)
Day 6: Basic regression and classification models; introducing the Scikit-learn library
Day 7: Text-as-data (part I)
Day 8: Text-as-data (part II)
Day 9: Data visualization; introducing the Matplotlib library
Day 10: Machine learning methods