1E Applied Social Statistics using Stata

Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on-campus and online students will be taught simultaneously.

Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and applied research on processes of social stratification and social inequality.

Summary

Become fluent in using important tools of applied social statistics including: Multilevel models – Categorical outcomes models – Marginal effects – Panel models for longitudinal data – Selection models

This course helps participants to widen their knowledge and experience of applied social statistics. The programme introduces topics that are related to using regression modelling approaches in relatively sophisticated ways. It features accessible introductions to the theory and procedures involved in implementing selected tools of analysis, with lots of examples that use real world social survey (or survey-like) datasets.

The programme covers:

Multilevel models
Marginal effects
Categorical outcomes models
Sampling design and weighting adjustments
Panel models for longitudinal data
Missing data
Comparing fixed and random effects models
Estimating and representing uncertainty
Selection models
Causal interpretations of model results
Measurement models
Learning from simulated data
Workflow and documentation considerations

Course prerequisites:

The course can be thought of as an accessible introduction to selected advanced issues. Concepts, basic algebraic formulae, software training, and extension issues and debates will all be introduced in ways that focus on the social science contribution of the method.

It is expected that participants will have had some previous training in social statistics, for example on popular descriptive analytical techniques (e.g. chi-square tests; correlation values) and well-known types of regression models (e.g. multiple regression, logistic regression). Teaching sessions include some recap content, but concentrate on introducing the specialist topics listed above.

The course is best suited to participants with some previous experience in using Stata code or ‘syntax’. Course materials include some introductory resources, but students without any background in using Stata syntax should expect to put in extra effort near the start of the course in order to make good use of the lab exercises.

Background reading

Specific background study prior to attending the module is not required.

During the course, texts by Long & Freese and by Rabe-Hesketh & Skrondal are used regularly and made available to participants, and numerous other readings are recommended for further study.

Long, J.S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, Third Edition. College Station, Tx: Stata Press [ISBN 9781597181112] (will be provided by ESS)
Rabe-Hesketh, S., & Skrondal, A. (2022). Multilevel and Longitudinal Modeling Using Stata (Volume 1), Fourth Edition. College Station, Tx: Stata Press [ISBN 9781597181365] (will be provided by ESS)

Before the course begins, participants might benefit from revising any text on basic statistical methods in the social sciences that has coverage of descriptive statistics and multiple regression models. We recommend:

Kohler, U., & Kreuter, F. (2012). Data Analysis Using Stata, Third Edition. College Station, Tx: Stata Press.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass. (Chapters 1-7)

Software

All topics are elaborated upon with multiple illustrative examples using Stata. We focus on Stata since it is well-equipped to support the topics and datasets being addressed. Introductory materials are available to participants with limited Stata experience whilst students completing the course can expect to develop relatively advanced Stata programming skills. Selected materials are also made available using SPSS and R, mainly for the purposes of comparative assessment of different software tools. Also of relevance:

Whilst practical lab sessions are centred on exercises that use Stata, other lecture and study materials are normally software-independent. Nevertheless, most statistical outputs within lectures will have been generated via Stata, and lecture contents occasionally address issues that are specific to Stata.
Some participants are likely to be fluent in using Stata already. Extensive prior experience with Stata is not required, but it would be difficult to make good use of course materials without some previous exposure to using Stata ‘syntax’ code.

Background knowledge required:

Mathematics:
Calculus = Elementary
Linear Regression = Moderate

Statistics:
OLS = Moderate
Maximum Likelihood = Elementary

Computer Background:
Stata = Elementary

The module’s teaching approach links theoretical introductions (in lectures) with class-led practical exercises (in ‘lab’ sessions).

Lecture-based introductions concentrate upon understanding the principles behind a particular approach and the practical impact of using it. Algebraic expositions are generally kept to a minimum, with the focus instead on what an approach is conceived to contribute, and how the outputs from an approach can be interpreted. In this style the course lectures can be thought of as providing an accessible introduction to relatively advanced or intermediate issues.

The lab sessions concentrate upon providing illustrative examples using Stata. Participants can expect to develop their Stata programming skills and leave the course readily able to adapt the illustrative examples to their own application areas.

A typical study day involves around two hours of lecture sessions, which are designed cumulatively to introduce, explain and interrogate the topics, ultimately to a relatively advanced level. This is followed by around 1.5 hours of lab exercises in which participants are given illustrated guides to implementing techniques using software and to interpreting the results, and are encouraged to adapt the examples to their own research needs and datasets. Outwith scheduled class times (lectures and labs), further study materials, including optional homework tasks, are available when desired, and options for follow-up queries such as drop-in sessions are offered.

Lectures (L) / Computer practicals (P)

Day 1

Foundations in applied social statistics (i)

L1a: Why statistical models can help us undertake social science research

L1b: Course arrangements and overview

P1: Using Stata for social science data analysis

Day 2

Foundations in applied social statistics (ii)

L2a: Getting to grips with complex datasets and the extension issues they can raise

L2b: Tricks of the trade in working with statistical models

P2: Exploring and summarising complex data; key elements of statistical modelling

Day 3

Introducing and understanding multilevel models (i)

L3a: Understanding and interpreting the two-level random intercepts model

L3b: The two-level random slopes model

P3: Two-level random effects model specifications and interpretations

Day 4

Introducing and understanding models for categorical outcomes

L4a: Understanding and implementing non-linear outcome models

L4b: Using marginal effects constructively

P4: Implementing and interpreting models for non-linear outcomes

Day 5

Introducing and understanding multilevel models (ii)

L5a: Multilevel models for categorical outcomes

L5b: Random effects models with three and more levels and with cross-classified and multiple membership designs

P5: Multilevel models for non-linear outcomes and with complex data structures

Day 6

Panel models for longitudinal data

L6a: Varieties of panel models and longitudinal data analysis strategies

L6b: Comparing fixed and random effects models

P6: Data and models for longitudinal panel datasets

Day 7

Using models to study multiprocess systems

L7a: Selection models

L7b: Measurement models

P7: Introductory examples of selected multiprocess models including selection models and SEMs

Day 8

Models and analyses that focus on causal interpretations

L8a: Reflecting on descriptive and causal analytical strategies

L8b: Selected models designed to assess causal effects

P8: Illustrating techniques for causal modelling in Stata

Day 9

Focus on research applications in statistical modelling

L9a: Class plenary: Participants’ projects that (may) use statistical models

L9b: Option: Case study on statistical models for cross-national datasets

L9c: Option: Review/Questions/Selected recap topics

P9: Applied research – extension topics

Day 10

Reflections and next steps

L10a: Trends and prospects in using statistical models in the social sciences

L10b: Making progress in applied research with complex quantitative data

P10: Lab review/recap opportunity