Please note: This course will be taught in hybrid mode. Hybrid delivery of courses will include synchronous live sessions during which on campus and online students will be taught simultaneously.

Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph – Social Inequalities and Occupational Stratification – that analyses data on social interaction patterns and social inequalities, and an introductory textbook – What is… Quantitative Longitudinal Data Analysis – that focusses upon the secondary analysis of longitudinal survey datasets.

Key objectives

The course seeks to provide participants with a fluent understanding of selected issues in applied social statistics, particularly related to using statistical models productively. It also seeks to ensure participants develop confidence to implement the same techniques using Stata.

Participants should learn

  • how relevant statistical models are formulated and interpreted
  • the relative attractions and limitations of different model strategies
  • practical skills in handling and analysing complex social data using Stata

Summary

Become fluent in using important tools of applied social statistics including: Multilevel models – Categorical outcomes models – Marginal effects – Panel models for longitudinal data – Selection models

This course is for participants with some previous training in social statistics but who are keen to widen their knowledge and expertise. Across the programme we explore selected statistical analytical topics and debates, using lectures and lab exercises, with emphasis on applications using real world microdata (e.g. from large-scale social surveys or survey-like sources).

All of our topics are in some way based on regression modelling approaches, but typically with adaptations or extensions tailored to different specialist requirements. The topics that we cover are all supported by fairly well-established software procedures, yet equally they are not routinely deployed by non-specialists.

A full list of the module’s topics:

Analytical techniques:

  • Multilevel models
  • Categorical outcomes models
  • Panel models for longitudinal data
  • Comparing fixed and random effects models
  • Selection models
  • Measurement models

Debates on procedures and outputs:

  • Marginal effects
  • Sampling design and weighting adjustments
  • Missing data
  • Estimating and representing uncertainty
  • Causal interpretations of model results
  • Learning from simulated data
  • Workflow and documentation considerations

 

Course Prerequisites:

The course can be thought of as an accessible introduction to selected advanced issues. Concepts, basic algebraic formulae, software training, and extension issues and debates will all be introduced in ways that focus on the social science contribution of the method.

It is expected that participants will have had some previous training in social statistics, for example on popular descriptive analytical techniques (e.g. chi-square tests; correlation values) and well-known types of regression models (e.g. multiple regression, logistic regression). Teaching sessions include some recap content, but concentrate on introducing the specialist topics listed above.

The course is best suited to participants with some previous experience in using Stata code or ‘syntax’. Course materials include some introductory resources, but students without any background in using Stata syntax should expect to put in extra effort near the start of the course in order to make good use of the lab exercises.

Background reading

Specific background study prior to attending the module is not required.

  • Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, Third Edition. College Station, Tx: Stata Press [ISBN: 9781597181112] (will be provided by ESS)
  • Rabe-Hesketh, S., & Skrondal, A. (2022). Multilevel and Longitudinal Modeling Using Stata (Volume 1), Fourth Edition. College Station, Tx: Stata Press [ISBN: 9781597181365] (will be provided by ESS)

Before the course begins, participants might benefit from revising any text on basic statistical methods in the social sciences that has coverage of descriptive statistics and multiple regression models. Many alternative sources could be used for this purpose, but as examples we recommend:

  • Kohler, U., & Kreuter, F. (2012). Data Analysis Using Stata, 3rd edition. College Station, Tx: Stata Press.
  • Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass. (Chapters 1-7)

Any participants keen to prepare further might benefit from reading introductory-level materials on any of the topics that are covered by the module. Selected recommendations of texts that deliberately take quite an introductory approach include:

  • Gayle, V., & Lambert, P. S. (2018). What is Quantitative Longitudinal Data Analysis? London: Bloomsbury.
  • Longhi, S., & Nandi, A. (2015). A Practical Guide to Using Panel Data. London: Sage.
  • Luke, D. A. (2004). Multilevel Modeling. Sage Quantitative Applications in the Social Sciences, Volume 143. London: Sage.
  • Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.

During the summer school course, numerous readings are recommended for further study during or after the teaching programme.

Software

All topics are elaborated upon with multiple illustrative examples using Stata. We focus on Stata since it is well-equipped to support the topics and datasets being addressed. Introductory materials are available to participants with limited Stata experience whilst students completing the course can expect to develop relatively advanced Stata programming skills. Selected materials are also made available using SPSS and R, mainly for the purposes of comparative assessment of different software tools. Also of relevance:

  • Whilst practical lab sessions are centred on exercises that use Stata, other lecture and study materials are normally software-independent. Nevertheless most statistical outputs within lectures will have been generated via Stata, and lecture contents occasionally address issues that are specific to Stata.
  • Some participants are likely to be fluent in using Stata already, but extensive prior experience with Stata is not required, since introductory materials are available when needed. However, previous exposure to syntax programming in at least one statistical software package will be beneficial, since the software examples in the course use ‘syntax’ modes of operation. The course materials include a note on software that discusses and illustrates ways of using syntax effectively.

 

Background knowledge required:

Mathematics:
Calculus = Elementary
Linear Regression = Moderate

Statistics:
OLS = Moderate
Maximum Likelihood = Elementary

Computer Background:
Stata = Elementary

The module’s teaching approach links theoretical introductions (in lectures) with class-led practical exercises (in ‘lab’ sessions).

Lecture-based introductions concentrate upon understanding the principles behind a particular approach and the practical impact of using it. Algebraic expositions are generally kept to a minimum, with the focus instead on what an approach is intended to contribute, and how its outputs can be interpreted. In this style the course lectures can be thought of as providing an accessible introduction to intermediate or relatively advanced issues.

The lab sessions concentrate upon providing illustrative examples using Stata. Participants can expect to develop their Stata programming skills and leave the course readily able to adapt the illustrative examples to their own application areas.

A typical study day involves around two hours of lecture sessions which are designed cumulatively to introduce, explain and interrogate the topics, ultimately to a relatively advanced level. This is followed by around 1.5 hours of lab exercises in which participants are given illustrated guides to implementing techniques using software and to interpreting the results, as well as being encouraged to adapt the examples to their own research needs and datasets. Outwith scheduled class times (lectures and labs), further study materials including optional homework tasks are available when desired, and options for follow-up queries such as drop-in sessions are offered.

Lectures (L) / Computer practicals (P)

Day 1                   

Foundations in applied social statistics (i)        

L1a:       Why statistical models can help us undertake social science research

L1b:       Course arrangements and overview

P1:         Using Stata for social science data analysis

 

Day 2                   

Foundations in applied social statistics (ii)

L2a:       Getting to grips with complex datasets and the extension issues they can raise

L2b:       Tricks of the trade in working with statistical models  

P2:         Exploring and summarising complex data; key elements of statistical modelling

 

Day 3                   

Introducing and understanding multilevel models (i)   

L3a:       Understanding and interpreting the two-level random intercepts model

L3b:       The two-level random slopes model

P3:         Two-level random effects model specifications and interpretations

 

Day 4   

Introducing and understanding models for categorical outcomes

L4a:       Understanding and implementing non-linear outcome models

L4b:       Using marginal effects constructively

P4:        Implementing and interpreting models for non-linear outcomes

 

Day 5   

Introducing and understanding multilevel models (ii)

L5a:       Multilevel models for categorical outcomes

L5b:       Random effects models with three and more levels and with cross-classified and multiple membership designs 

P5:        Multilevel models for non-linear outcomes and with complex data structures

 

Day 6                   

Panel models for longitudinal data

L6a:       Varieties of panel models and longitudinal data analysis strategies

L6b:       Comparing fixed and random effects models

P6:         Data and models for longitudinal panel datasets 

 

Day 7                   

Using models to study multiprocess systems

L7a:       Selection models

L7b:       Measurement models  

P7:         Introductory examples of selected multiprocess models including selection models and SEMs

 

Day 8                   

Models and analyses that focus on causal interpretations

L8a:       Reflecting on descriptive and causal analytical strategies

L8b:       Selected models designed to assess causal effects

P8:         Illustrating techniques for causal modelling in Stata

 

Day 9                 

Focus on research applications in statistical modelling

L9a:       Class plenary: Participants’ projects that (may) use statistical models

L9b:       Option: Case study on statistical models for cross-national datasets

L9c:       Option: Review/Questions/Selected recap topics

P9:         Applied research – extension topics

 

Day 10                 

Reflections and next steps        

L10a:    Trends and prospects in using statistical models in the social sciences

L10b:    Making progress in applied research with complex quantitative data

P10:      Lab review/recap opportunity