Please note: This course will be taught in hybrid mode. Hybrid delivery will include synchronous live sessions during which on-campus and online students are taught simultaneously.

Christopher Hare is Assistant Professor of Political Science at the University of California, Davis. His research focuses on ideology, polarization, and the application of measurement models and supervised machine learning methods to the study of voter behaviour. His work has been published in the American Journal of Political Science, Political Analysis, and the British Journal of Political Science. He is also co-author of the book Analyzing Spatial Models of Choice and Judgment (second edition).

Course content

Python is a powerful, general-purpose, object-oriented programming language for collecting, organizing, and analyzing a wide array of data types. Alongside the R statistical computing platform, it is one of the best tools available for social scientists looking to apply cutting-edge data science methods in their own research. This course covers the basics of the Python programming environment with a focus on data science applications: object manipulation, web scraping, data visualization, elementary statistical methods, text analysis, and introductory machine learning tools.

Course objectives

This course is designed so that participants become comfortable working in the general Python environment while simultaneously learning about specific applications of Python to common social science research projects and tasks. Students will learn how to work with various data types and structures (including text), collect original data using web scraping and API tools, write and execute functions, produce data visualizations, and run popular machine learning algorithms (such as random forests) in Python. In doing so, students will become familiar with Python’s NumPy, Pandas, Matplotlib, and Scikit-learn libraries.

Course Prerequisites

While experience with an object-oriented programming language (e.g., R) is valuable, this course is introductory, and programming experience is neither required nor assumed. Because we will approach Python from a social science rather than a computer science perspective, students should have some prior experience with basic statistical concepts and methods such as linear regression. Proficiency in calculus or other mathematics is helpful but not required.

Recommended texts

While there are no required texts for this course, I highly recommend any or all of the following three books from O’Reilly (especially the first):

1.) VanderPlas, Jake. 2022. Python Data Science Handbook: Essential Tools for Working with Data, 2nd edition. O’Reilly Media. ISBN: 9781098121228 (will be provided by ESS)
2.) McKinney, Wes. 2022. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd edition. O’Reilly Media. ISBN: 9781098104030
3.) Bengfort, Benjamin, Rebecca Bilbro, and Tony Ojeda. 2018. Applied Text Analysis with Python. O’Reilly Media. ISBN: 9781491963043 (will be provided by ESS)
In the course schedule, I list corresponding chapters from each of the above texts as optional readings, abbreviated as PDSH (VanderPlas), PfDA (McKinney), and ATA (Bengfort et al.).

I also highly recommend the following text (abbreviated HOML in the course schedule) for students looking to learn more advanced machine learning methods in Python:
4.) Géron, Aurélien. 2022. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 3rd edition. O’Reilly Media. ISBN: 9781098125974

Background Knowledge

Maths

Linear Regression = elementary

Statistics

OLS = elementary

Course schedule

Day 1:

Installing Python and getting started in the Python environment
PDSH, Part I; PfDA, Chapters 1-2

Day 2:

Variables, data types, and basic operations; introducing the NumPy library
PDSH, Part II; PfDA, Chapters 3-4

Day 3:

Data wrangling; introducing the Pandas library
PDSH, Part III; PfDA, Chapters 5-8, 10

Day 4:

Web scraping and working with APIs
PfDA, Chapter 6

Day 5:

Basic regression and classification models; introducing the Scikit-learn library
PDSH, Part V; HOML, Chapters 1-4

Day 6:

Text-as-data (part I)
ATA, Chapters 1-3

Day 7:

Text-as-data (part II)
ATA, Chapters 4-6

Day 8:

Data visualization; introducing the Matplotlib library
PDSH, Part IV; PfDA, Chapter 9

Day 9:

Machine learning methods (part I)
PDSH, Part V; HOML, Chapters 6-7

Day 10:

Machine learning methods (part II)
PDSH, Part V; PfDA, Chapter 13; HOML, Chapter 10; ATA, Chapters 7-8