Please note: This course will be taught online only. In-person study is not available for this course.

 

Raymond Hicks has been working with Columbia University’s History Lab since 2017. While at History Lab, he helped build pipelines to run topic modeling and Named Entity Recognition on more than 4 million declassified government documents. His work at Columbia bridges data science and historical research, focusing on improving access to and analysis of government archives through natural language processing and machine learning methods.

Before starting at Columbia, he worked as the Statistical Programmer for the Niehaus Center for Globalization and Governance at Princeton University. His research interests include monetary policy, trade policy, and statistics, and his work has appeared in the Journal of Politics, International Organization, and the British Journal of Political Science, among other journals.

Relevant publications:
“New evidence and new methods for analyzing the Iranian revolution as an intelligence failure” (with Matthew Connelly, Robert Jervis, and Arthur Spirling), 2021, Intelligence and National Security, 36(6): 781-806.

“Diplomatic documents data for international relations: the Freedom of Information Archive Database” (with Matthew Connelly, Robert Jervis, Arthur Spirling, and Clara Suong), 2021, Conflict Management and Peace Science, 38(6): 762-781.

Course Content

Python has become one of the most widely used programming languages in the social sciences, combining an intuitive syntax with powerful libraries for data analysis, visualization, and computational modeling. This course introduces participants to the fundamentals of Python programming through a social science lens, emphasizing hands-on practice and real-world research applications.

The course begins with the essentials of programming in Python: working with variables, data types, and control structures such as loops and conditionals. Participants will learn to organize code into functions and scripts, structure projects efficiently, and work within Jupyter Notebooks, a tool widely used in research for combining code, data, and narrative text.
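As a rough illustration of the kind of code these first sessions build toward (not drawn from the actual course materials), the sketch below combines a variable, a list of dictionaries, a loop, a conditional, and a function. The survey records and the `turnout_rate` function name are invented for this example.

```python
# Hypothetical survey responses, invented purely for illustration
responses = [
    {"respondent": 1, "age": 34, "voted": True},
    {"respondent": 2, "age": 19, "voted": False},
    {"respondent": 3, "age": 52, "voted": True},
    {"respondent": 4, "age": 27, "voted": False},
]


def turnout_rate(records):
    """Return the share of records whose 'voted' field is True."""
    if not records:  # conditional: avoid dividing by zero on empty input
        return 0.0
    voters = 0
    for record in records:  # loop over each respondent
        if record["voted"]:
            voters += 1
    return voters / len(records)


rate = turnout_rate(responses)
print(f"Turnout: {rate:.0%}")
```

Wrapping the calculation in a function means the same logic can be reused on any list of records with a `voted` field, which is the habit the later sessions on modular code aim to develop.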

Building on these basics, the course introduces key Python libraries used in empirical social science research, including pandas for data management, matplotlib and seaborn for visualization, and NumPy for numerical analysis. Students will learn how to import and manipulate data from CSV and Excel files, summarize variables, clean datasets, and create informative graphics to communicate results.
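A minimal sketch of such a workflow, assuming pandas is installed, might look like the following. The data, column names, and values are made up for illustration, and the CSV is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Invented CSV text standing in for a real file; one growth value is missing
csv_text = """country,year,gdp_growth
A,2020,-2.1
A,2021,3.4
B,2020,
B,2021,1.8
"""

# Import: read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows where the variable of interest is missing
clean = df.dropna(subset=["gdp_growth"])

# Summarize: average growth per country
mean_growth = clean.groupby("country")["gdp_growth"].mean()
print(mean_growth)
```

With a real dataset, the `io.StringIO(csv_text)` argument would be replaced by a file path, and `pd.read_excel` plays the same role for Excel files.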

Throughout the course, participants will also be introduced to best practices for reproducible research, including version control with Git, documenting workflows, and using virtual environments to manage dependencies. Discussion of real-world examples from political science, economics, sociology, and public policy helps connect programming skills to substantive research questions.

Course Objectives

By the end of the course, participants will have built a solid foundation in using Python and the confidence to extend their skills to advanced applications such as text analysis, network analysis, and machine learning. They will learn to apply the fundamentals of Python programming, to work with core data structures including lists, dictionaries, and data frames, and to read, clean, and manipulate tabular data using pandas. They will develop the ability to create clear, effective data visualizations with matplotlib and seaborn, write modular and reusable code, and document their analytical workflows for transparency and reproducibility. Throughout the course, participants will apply programming concepts directly to social science research questions, building a conceptual and practical base for further exploration of computational methods in the social sciences.

Course Prerequisites

No prior programming experience is required. The course is designed for participants with backgrounds in the social sciences who wish to develop or strengthen their computational skills.

Before the course, participants may find it helpful to explore short, beginner-friendly Python tutorials such as:

  • w3schools Python Tutorial: https://www.w3schools.com/python
  • Python for Everybody (Dr. Charles Severance): https://www.py4e.com
  • DataCamp: Introduction to Python (interactive exercises)

Participants should install Anaconda (which includes Python, Jupyter Notebook, and key libraries) prior to the first session. Detailed setup instructions will be provided.

Introduction to Python

 

Day 1: Reproducible workflows

  • Introduction to Python
  • IDEs
  • Virtual environments
  • GitHub

 

Day 2: Working with Python

  • Python basics
  • Loops, conditionals, list manipulation
  • String manipulation

 

Day 3: Dataframes and text analysis

  • Introduction to pandas
    • Importing data and converting from Python objects
    • Selecting and filtering columns and rows
  • Data cleaning
  • Summarizing and merging
  • Introduction to text analysis

 

Day 4: Graphing and visualization

  • Matplotlib
  • Plotly

 

Day 5: Functions, modularization, and projects

  • Writing functions
    • Modularity
  • Project structuring