3B Generative AI for Social Science Research

Please note: This course will be taught online only.

Maximilian Weber is a Postdoctoral Researcher in the Department of Education at the University of Mainz. In Summer 2025, he will serve as interim Professor of Sociology at Goethe University Frankfurt. His research focuses on the intersection of education, technology, and society, with a particular interest in text-as-data methods for social science research. He worked on the rollama R package, which facilitates the use of generative models in research applications.

Course Content

Generative AI (GenAI) is enabling new ways to analyze and generate human-like text and images. This transformation is driven by recent breakthroughs in large language models (LLMs) and their demonstrated capabilities in processing natural language. These advances are reshaping social science research by introducing new methodological approaches and expanding analytical possibilities. The course provides a practical introduction to working with generative AI models, from basic interactions to advanced applications in social science research.

The course will explore both proprietary models (like ChatGPT and GPT-4) and open-weight alternatives (like Llama), with a particular focus on open models. While we will discuss commercial models to understand the broader landscape, we will primarily work with open-weight models as they offer researchers greater control, transparency, and customization possibilities for research applications.

The first week covers foundational concepts of LLMs, including their architecture, capabilities, and limitations, along with hands-on experience in prompt engineering. This establishes the technical groundwork while ensuring students can effectively interact with these models for research purposes. The first half of the course also serves as an introduction to practical applications such as text summarization and advanced reasoning techniques.

The second half of the course moves into more advanced applications, focusing on model fine-tuning. Students will learn how to adapt models for specific research tasks and scale their applications to handle large datasets. At the end of the course, students will apply these technologies to social science research questions in practical projects, working with datasets such as parliamentary speeches and open-ended survey responses.

Course Objectives

Generative AI has emerged as a tool for social science research, offering new possibilities for analyzing complex textual and multimodal sources. These technologies enable researchers to process and analyze data in ways that were previously impractical or impossible. However, effectively leveraging these tools requires understanding both their capabilities and limitations. How can researchers appropriately apply GenAI to their research questions? What are the best practices for prompt engineering and model fine-tuning? How can we ensure the reliability and validity of AI-assisted analyses? This course addresses these questions while providing hands-on experience with current GenAI technologies, with particular emphasis on open models.

Course Prerequisites

Students should have basic programming experience in Python. Familiarity with fundamental data science concepts and basic statistical methods is helpful. While no prior experience with GenAI is required, students should be comfortable learning new technical concepts. Those wishing to prepare for the course might benefit from reviewing basic Python programming through resources like “Python for Data Science” or completing an introductory machine learning course in Python. Experience with natural language processing (NLP) concepts is helpful but not required.

For the duration of the course, a paid subscription to Google Colab (or a similar service) is highly recommended to fully engage with the course materials and exercises.

Representative Background Reading

Bail, C. A. (2024). Can Generative AI improve social science? Proceedings of the National Academy of Sciences, 121(21), e2314021121. https://doi.org/10.1073/pnas.2314021121

Papers with interesting or creative use cases of GenAI will be distributed closer to the course.

Course Outline

Day 1: Introduction to GenAI

Theme: Introduction to GenAI and Large Language Models
Concepts: Overview of LLMs; Understanding transformers, embeddings, and training data
Hands-on: Exploring pre-trained models (OpenAI, Llama); Simple text generation and first-hand interaction

Day 2: Prompt Engineering

Theme: Effective prompting techniques
Concepts: Basics of prompting including zero-shot, one-shot, few-shot learning; Effective prompt construction
Hands-on: Prompt crafting for answers to open-ended survey questions
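
As a rough illustration of the difference between zero-shot and few-shot prompting, the sketch below assembles a classification prompt for open-ended survey answers. It is not taken from the course materials: the function name `build_prompt`, the category labels, and the example answers are all hypothetical, and the resulting string would still need to be sent to whichever model is in use.

```python
from typing import Sequence, Tuple


def build_prompt(
    answer: str,
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a classification prompt for one survey answer.

    With no examples this is a zero-shot prompt; with one labeled example
    it is one-shot; with several it is few-shot. (Hypothetical helper.)
    """
    lines = [
        "Classify the survey answer into exactly one of these categories: "
        + ", ".join(labels)
        + ".",
        "",
    ]
    # Each labeled example is shown in the same layout as the final query.
    for text, label in examples:
        lines += [f"Answer: {text}", f"Category: {label}", ""]
    # The prompt ends with an open "Category:" slot for the model to fill.
    lines += [f"Answer: {answer}", "Category:"]
    return "\n".join(lines)


# Illustrative (made-up) labels and answers.
labels = ["economy", "housing"]
zero_shot = build_prompt("Rents are too high.", labels)
few_shot = build_prompt(
    "Rents are too high.",
    labels,
    examples=[
        ("Prices keep rising.", "economy"),
        ("I cannot find a flat.", "housing"),
    ],
)
```

Keeping the examples in the same "Answer / Category" layout as the final query is one common design choice; other layouts may work as well or better for a given model.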

Day 3: Text Summarization

Theme: Leveraging LLMs for text summarization
Concepts: LLM capabilities for summarization
Hands-on: Summarizing policy documents and newspaper articles

Day 4: Reasoning with LLMs

Theme: Advanced reasoning techniques
Concepts: Techniques for improving LLM reasoning including chain-of-thought prompting, self-reflection prompts, and reasoning models
Hands-on: Applying reasoning to policy documents and newspaper articles
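
Chain-of-thought prompting can be sketched in a few lines: the prompt asks the model to reason step by step and then mark its conclusion so that it can be extracted automatically. The code below is a minimal sketch under stated assumptions; the helper names `cot_prompt` and `extract_final_answer`, the "Final answer:" marker, and the sample response are hypothetical and not part of the course materials.

```python
from typing import Optional

MARKER = "final answer:"  # assumed marker; any unambiguous tag would do


def cot_prompt(question: str, document: str) -> str:
    """Wrap a question about a document in a chain-of-thought instruction."""
    return (
        f"Document:\n{document}\n\n"
        f"Question: {question}\n"
        "Think through the question step by step. "
        "Then give your conclusion on a final line that starts with "
        "'Final answer:'."
    )


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' line, or None if absent."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.lower().startswith(MARKER):
            return stripped[len(MARKER):].strip()
    return None


# A made-up model response, used only to show the parsing step.
sample_response = (
    "Step 1: The text mentions a tax cut.\n"
    "Step 2: Tax cuts are a matter of fiscal policy.\n"
    "Final answer: fiscal policy"
)
topic = extract_final_answer(sample_response)
```

Separating the reasoning from a clearly marked final line makes it easier to code large numbers of documents while still keeping the intermediate steps available for inspection.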

Day 5: Multimodal Models

Theme: Understanding and applying multimodal capabilities
Concepts: Introduction to multimodal models; Image-to-text applications (and text-to-image)
Hands-on: Describing images

Day 6: Agents and Automation

Theme: Working with LLM-powered agents
Concepts: Introduction to agents (e.g., LangChain, AutoGPT); Automating workflows using LLM-powered agents
Hands-on: Workshop on building simple pipelines for collecting, summarizing, and analyzing data

Day 7: Introduction to Fine-Tuning

Theme: Model customization through fine-tuning
Concepts: Understanding the need for fine-tuning; Fine-tuning methods
Hands-on: Fine-tuning experiments

Day 8 (Part 1): Building an LLM from Scratch

Theme: Understanding LLM construction
Concepts: Steps for building LLMs from scratch
Hands-on: Presentation of an example

Day 8 (Part 2): Scaling Generative AI

Theme: Large-scale AI implementation
Concepts: Use of cloud resources
Hands-on: Presentation of an example

Day 9: Custom Use Cases

Theme: Applied research applications
Concepts: Use of models for specific research questions
Hands-on: Team projects focusing on applying (fine-tuned) models to analyze parliamentary speeches or open-ended survey questions

Day 10: Final Presentations

Theme: Project presentations and course conclusion
Activity: Group presentations on use case applications
Discussion: Feedback, lessons learned, and open discussions on the future of generative AI in social science