Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.
Christopher Hare is Assistant Professor of Political Science at the University of California, Davis. His research focuses on ideology, polarization, and the application of measurement models and supervised machine learning methods to the study of voter behaviour. His work has been published in the American Journal of Political Science, Political Analysis, and the British Journal of Political Science. He is also co-author of the book Analyzing Spatial Models of Choice and Judgment (second edition).
Python is a powerful object-oriented, general-purpose programming language for collecting, organizing, and analyzing a wide array of data types. Alongside the R statistical computing platform, it provides one of the best tools available for social scientists looking to employ cutting-edge data science in their own research. This course covers the basics of the Python programming environment with a focus on data science applications: object manipulation, web scraping, data visualization, elementary statistical methods, text analysis, and introductory machine learning tools.
This course is designed so that participants become comfortable working in the general Python environment while simultaneously learning about specific applications of Python to common social science research projects and tasks. Students will learn how to work with various data types and structures (including text), collect original data using web scraping/API tools, write and execute functions, produce data visualizations, and run popular machine learning algorithms (such as random forests) in Python. In doing so, students will gain familiarity with Python’s NumPy, Pandas, Matplotlib, and Scikit-Learn libraries.
While experience with an object-oriented programming language (e.g., R) is valuable, this course is designed to be introductory and programming experience is neither required nor assumed. We will focus on Python from a social science—rather than a computer science—perspective. Hence, students should have some prior experience with basic statistical concepts and methods such as linear regression. Proficiency in calculus or other mathematics, while helpful, is not required.
While there are no required texts for this course, I highly recommend any or all of the following three books from O’Reilly (especially the first):
1.) VanderPlas, Jake. 2022. Python Data Science Handbook: Essential Tools for Working with Data, 2nd edition. O’Reilly Media. ISBN: 9781098121228 (will be provided by ESS)
2.) McKinney, Wes. 2022. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd edition. O’Reilly Media. ISBN: 9781098104030
3.) Bengfort, Benjamin, Rebecca Bilbro, and Tony Ojeda. 2018. Applied Text Analysis with Python. O’Reilly Media. ISBN: 9781491963043 (will be provided by ESS)
In the course schedule, I list corresponding chapters from each of the above texts as optional readings.
I also highly recommend the following text for students looking to learn about more advanced machine learning methods in Python.
4.) Géron, Aurélien. 2022. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 3rd edition. O’Reilly Media. ISBN: 9781098125974
Expected prior knowledge: linear regression (elementary); OLS (elementary).