Gina Yannitell Reinhardt is a Reader/Senior Associate Professor in the Government Department at the University of Essex. She joined the University in 2015 from Texas A&M University, where she taught policy analysis and quantitative methods for 10 years in the Bush School of Government and Public Service. She studies disaster resilience and international development, and founded two academic networks: the Global South Academic Network, to help build research capabilities in developing countries, and the Disaster and Emergency Research Network, to encourage academic work on resilience. She consults widely as a program evaluator for public sector agencies.

Course Content

The purpose of this course is to provide Stata programming tools for managing and working with large databases. The course is designed for new and intermediate Stata users who want to acquire advanced skills in data management and programming in Stata. The course focuses on skills relevant to social science data analysis.

Course Objectives

Those completing this course should be able to:

1. Perform database management and estimation tasks using Stata.

2. Understand and use Stata programming routines and user-contributed .ado files.

3. Interpret Stata output.

4. Program new commands in Stata, from simple procedural commands to more complex estimation commands.

5. Install and use packages, and produce graphics, tables, charts, and other output for use in other programs, such as word processors.

Course Prerequisites

This course is designed for new and intermediate Stata users, or those familiar with other statistical software such as SPSS, who have a basic understanding of econometric analysis. Although participants’ familiarity with Stata may be introductory, participants are expected to be familiar with fundamental statistical concepts and vocabulary such as OLS, heteroskedasticity, and t-tests.

Representative Background Reading
You are expected to have knowledge of descriptive statistics, statistical significance, correlation, and sampling and estimation. An introductory statistics book from your field will cover these issues. One example would be:

Sirkin, R. Mark. Statistics for the social sciences. Sage Publications, 2005.

Required texts
The following text will be provided by the Summer School as part of your course material and used throughout the course:

Baum, Christopher F. An introduction to Stata programming. Vol. 2. College Station: Stata Press, 2009.

Background knowledge required
OLS = strong
Maximum Likelihood = elementary

Computer Background
Stata = elementary


Classes will meet for ten sessions. Each session will include demonstrations and exercises. The exercises are based on practical examples that may or may not be particularly meaningful to your specific research interests. I therefore encourage you to raise examples based on your own data and research needs. I am flexible and happy to move at the pace you desire. If there is anything specific you wish to know, or material for which you would like greater detail, I will do my best to accommodate these requests. 

Computer Software 

Computer-based exercises will feature prominently in the course, especially in the lab sessions. The use of all software tools will be explained in the sessions, including how to download plug-ins and add-ons. We will be working in Stata. It is recommended that you purchase and install a perpetual license for the most recent version of Stata SE on your personal laptop. If you would like to consider a different version, see options here. It will be possible to conduct all work for this class on University lab computers. Please note that if you do purchase your own Stata license for your laptop, it is expected that Stata will be purchased and installed on your laptop prior to the beginning of class. 

Recommended Texts and Potentially Helpful Resources 

• Cox, Nicholas J. and H. Joseph Newton. One Hundred Nineteen Stata Tips, 3rd edition. ISBN 978-1-59718-143-3.
• Long, J. Scott. The Workflow of Data Analysis Using Stata. College Station: Stata Press, 2009.
• Getting Started with Stata for Windows [or Mac, or Unix]. (freely available online)
• Acock, Alan C. A Gentle Introduction to Stata, 4th edition, 2014.
• The UCLA Stata guide (online)
• Baum, Christopher F. An Introduction to Modern Econometrics Using Stata. College Station: Stata Press, 2006.

Detailed Course Schedule

 Day 1: Getting Started with Stata 

• Course goals and logistics; topics overview
• Working with Stata: the Stata environment
• Help files, online PDF documentation
• Data import and description
• Do-files

Required Reading:
• Baum
  o Chapter 1 (entire)
  o Chapter 2 (entire)

Recommended Reading:
• Long, Chapter 3 and Appendix A
• Getting Started with Stata for Windows/Mac/Unix

Lab session:
• Exercise 1: Getting started with Stata and do-files

Day 2: Database Manipulation and Basic Do-File Programming 

• Describe, summarize, tabulate
• Cleaning data (missing values, changing variable type, naming and labelling variables)
• Logical expressions
• Generating new variables, dropping variables and observations
• Dummy variables
• Recoding, sorting, and combining datasets
• Beginning macros

Required Reading:
• Baum
  o Chapter 3 (entire)
  o Chapter 5, Sections 5.6-5.9

Recommended Reading:
• Long, Chapters 5-6
• Acock, Chapter 3

Lab session:
• Exercise 2: Preparing your data

Day 3: Basic Statistical Routines, Regression and Post-Regression Analysis 

• Mean, standard deviation, median, mode, correlation
• T-tests
• Cross-tabulation, Chi-squared test
• OLS, Logit
• Saving results

Required Reading:
• Baum, Chapter 4 (entire)

Recommended Reading:
• Acock, Chapters 5-7, 8, 10-11

Lab session:
• Exercise 3: Basic Analysis and Description

Day 4: Advanced Estimation Methods and Commands 

• Time Series (tsset; date and time; time series operators)
• Panel Data (wide v. long; reshape; xtset and xtdes)
• ‘by’ and ‘collapse’ commands
• 2SLS
• Begin presenting results

Required Reading:
• Baum
  o Chapter 5, Sections 5.1-5.5
  o Chapters 6 and 8 (entire)

Recommended Reading:
• Baum 2006, Chapters 7-10

Lab session:
• Exercise 4: Time, Panels, and Nested Models

Day 5: Presenting Results 

• Tables
• Regression results
• Graphics (scatter, line, CDF, histogram, combining graphs)

Required Reading:
• Baum
  o Chapter 2, Sections 2.4-2.6
  o Chapter 3, Section 3.5

Recommended Reading:
• Long, Section 7.7
• Acock, Chapters 5-6
• Mitchell, Michael N. A Visual Guide to Stata Graphics, 3rd edition, 2012.

Lab session:
• Exercise 5: Generating Tables and Graphs

Day 6: Programming Basics: Prefixes, Loops, and Lists 

• Comments
• More with Macros
• Loops (foreach; forvalues)
• ‘if’ condition
• Combining loops and macros

Required Reading:
• Baum, Chapter 7 (entire)

Recommended Reading:
• Long, Chapters 4, 7
• Baum 2006, Appendix B

Lab session:
• Exercise 6: Looping Exercise

Day 7: Extended Do-File Programming 

  • Generating datasets
  • Exporting variable contents
  • Creating flexible output files
  • Automating standard-format tables and graphs
  • Marginal Effects

Required Reading:
  • Baum, Chapters 9-10

Lab session:
  • Exercise 7: Generating and Plotting Marginal Effects


Days 8-9: Writing Stata Programs (ado-files) 

  • Creating/defining a program
  • Implementing program options
  • Documenting and Certifying programs
  • Macro shift (where number of loops is variable)
  • Naming and debugging programs
  • Comments and long lines
  • Arguments
  • Programs with return values and other options
  • Help files and publishing programs

Required Reading:
  • Baum, Chapters 11-12

Lab session:
  • Exercise 8: Welcome to the ado!
  • Exercise 9: More work with ado-files


Day 10: Specialised Models 

  • Structural Equation Models
  • Stochastic Frontier Models
  • Multiple-stage Models
  • Other specialised models as nominated by students