1E Multilevel Statistical Models for the Social Sciences using Stata

Please note: This course will be delivered in person at the Colchester campus. Online study is not available for this course.

Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph – Social Inequalities and Occupational Stratification – that analyses data on social interaction patterns and social inequalities, and an introductory textbook – What is… Quantitative Longitudinal Data Analysis – that focusses upon the secondary analysis of longitudinal survey datasets.

Course Content:

This module takes a wide-ranging approach to studying multilevel statistical models in the social sciences. Materials will cover:

– how to understand, implement and interpret multilevel statistical models

– how to assess and compare different ways of taking account of complex and multilevel data within a statistical analysis

– how to work with complex social science data as part of the process of developing a useful statistical analysis

– how to use Stata for the preparation and analysis of complex social science datasets*

*(some course materials are also available in R, SPSS and MLwiN)

A ‘multilevel’ data structure is one where the units of analysis can be located within wider hierarchical groups or ‘clusters’, and a multilevel statistical model is one that features an adjustment designed to recognise that structure appropriately within a model-based analysis. In the social sciences, there are a great many scenarios where data have some sort of multilevel structure, and hence where a multilevel statistical model might be a helpful approach. Examples include educational datasets where students might be clustered into classes or institutions, and longitudinal panel datasets where multiple responses can be thought of as clustered within the survey respondents.
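As a rough illustration of why clustering matters, the sketch below simulates students nested within schools and estimates how much of the variation in an outcome lies between schools (the intraclass correlation). It is written in Python rather than Stata purely to stay self-contained. The function names, the outcome scale and all parameter values are hypothetical choices for this example, not course materials.

```python
import random
from statistics import mean


def simulate_scores(n_schools=50, n_students=20, sd_school=2.0,
                    sd_student=4.0, seed=1):
    """Return {school_id: [scores]} for students clustered within schools.

    All values are illustrative assumptions: each school gets its own
    random effect, shared by every student in that school.
    """
    rng = random.Random(seed)
    data = {}
    for school in range(n_schools):
        school_effect = rng.gauss(0.0, sd_school)
        data[school] = [
            50.0 + school_effect + rng.gauss(0.0, sd_student)
            for _ in range(n_students)
        ]
    return data


def anova_icc(data):
    """One-way ANOVA estimate of the intraclass correlation.

    Assumes balanced clusters (the same number of students per school).
    """
    groups = list(data.values())
    k = len(groups)        # number of clusters
    n = len(groups[0])     # students per cluster
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Mean square between clusters and mean square within clusters
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


icc = anova_icc(simulate_scores())
print(f"Estimated share of variance between schools: {icc:.2f}")
```

With the default values the simulated between-school share of variance is 2² / (2² + 4²) = 0.2, so the estimate should land somewhere near that figure. Setting `sd_school=0.0` removes the clustering, and the estimate should then fall close to zero.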

Description of course activities

Students taking course 1E explore the challenges of applying multilevel statistical models to social science data. Daily teaching sessions comprise lectures followed by lab exercises that implement examples that were described in lectures. The teaching style tries to present statistical materials in an accessible manner that assumes only introductory previous knowledge. Lab sessions contain a wealth of illustrative examples, some of which provide opportunities to develop understanding of quite challenging research issues.

The course begins with selected foundational content relevant to using multilevel statistical models in the social sciences. We discuss how statistical models work in general terms, and how we implement them in social research, including best practice in using statistical software to run models. Topics include the appropriate specification and interpretation of regression and statistical models – for instance, on building models, interpreting their parameters, and assessing how well they fit the data. The course also reviews features of social science data that are often described as ‘complex’. A multilevel data structure is one example of complexity, but there are others too that often arise in social research data, and the best analyses are able to compare and contrast different options in different scenarios.

The course then turns to multilevel models with random effects, a popular and widely used tool for analysing ‘clustered’ or ‘hierarchical’ datasets. These models let us analyse such data in a way that appropriately takes account of the clustering. In doing so, multilevel models are also linked to valuable conceptual distinctions such as the interplay between micro- and macro-level influences. The course materials introduce and contextualise multilevel models, with practical training in running them and comparing them to other models. Within week 1 of the course, we spend time on ‘2-level’ multilevel models with ‘random intercepts’ and ‘random slopes’, and on approaches that can be used for linear and categorical outcome measures.
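For orientation, the two-level models mentioned above are commonly written along the following lines. This is a generic textbook-style sketch; the notation (individual i in cluster j, one explanatory variable x) is an assumption made here and may differ from the notation used in the course materials.

```latex
% Two-level random intercepts model: each cluster j has its own intercept shift u_{0j}
\[
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \sigma^2_{u0}), \quad e_{ij} \sim N(0, \sigma^2_{e})
\]

% Two-level random slopes model: the effect of x also varies across clusters
\[
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^2_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}
\right)
\]
```

In the first model only the intercept differs between clusters. In the second, both the intercept and the slope on x differ, and the two cluster-level terms are allowed to covary.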

As the course moves into week 2, we turn to a selection of important adjustments and extension issues that often arise when social scientists work with multilevel research data. Many examples continue to use random effects models, but with adaptations that are designed to account for more complex cluster structures (e.g. ‘three-level’, ‘four-level’, and ‘cross-classified’ data structures), more specialist analytical scenarios (e.g. longitudinal analyses), and/or other data complexities such as missing data or sampling weights.

The course also has coverage of data preparation work that is designed to enhance complex data resources. Examples here include making choices over the operationalisation of variables in a statistical modelling analysis, and ways of linking information between different data sources. Such practical topics are sometimes neglected in research training, but they feature within course 1E because they can be very valuable aspects of adapting to and properly exploiting research data.

Not all our analytical examples use random effects models. Many studies make an appropriate choice to analyse multilevel data with models that don’t use random effects but include some other adjustment or extension (e.g. ‘fixed effects’ for clusters, and ‘robust standard errors’), and our materials discuss these options as well as selected other modelling strategies such as structural equation models. Indeed, data analysis in the social sciences often involves comparing or choosing between several different plausible options, and the last materials in the module focus on weighing up and critically evaluating the relative benefits of multilevel statistical models in comparison to other strategies of analysis that might be considered. We hope participants will leave the course not just with an awareness of what multilevel statistical models are and how they can be formulated, but also with an appreciation of whether or not they are likely to make a useful difference in a given analytical scenario.

Course Objectives:

The course seeks to provide participants with a strong understanding of how multilevel statistical models can be applied in the social sciences, and of when they are most likely to be helpful.

Participants should learn how relevant statistical models are formulated and interpreted, the relative attractions and limitations of different model strategies, and practical skills in fluently handling and analysing complex data using Stata.

Multilevel models are widely used in the social sciences, so there are good reasons to learn in detail about both their theory and their practical implementation. Further course materials explore several other important but under-utilised options in the specification of statistical models and in making good use of complex datasets. Training in these areas should give course participants the confidence to compare the strengths and limitations of different plausible models, and equip them with valued practical skills in using software to work with data and run statistical models.

Course Prerequisites:

This is an introductory course in multilevel statistical models. Concepts, basic algebraic formulae, and extension issues that are associated with using multilevel statistical models will all be introduced in ways that should not require substantial background knowledge.

It is expected that participants will have had some previous training in social statistics – for example, the course is best suited to participants who are fluent in popular descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests; correlation values), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions begin by recapping features of the most common regression models, but move quite rapidly to relatively more complex aspects of statistical models and related extension topics. Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting standard regression outputs.

The course is also best suited to participants with at least some previous experience in using statistical software through code or ‘syntax’. The course features lab sessions which primarily give examples in using Stata for tasks of both data preparation and analysis. Course materials include some introductory documentation to help with using software, so the course should still be accessible to people who have little previous software experience. However, students without any background in using syntax should be prepared to put in extra effort near the start of the course in order to make good use of the lab exercises.

The course also features optional materials in R, SPSS and MLwiN. These do not have as extensive a range of examples as the Stata materials, but they do provide coverage of the majority of lecture topics. Ordinarily the course should be taken by participants who already use, or are planning to use, Stata. However, in certain circumstances the combination of materials might still be productive for a researcher who is unlikely to work with Stata (prospective students are welcome to contact the course leader in advance to discuss whether the course is likely to be suitable in this regard).

Required texts (these will be provided by ESS)

Hox, J., M. Moerbeek, and R. van de Schoot. 2017. Multilevel Analysis: Techniques and Applications (Third Edition). London: Routledge.

Rabe-Hesketh, S., and A. Skrondal. 2022. Multilevel and Longitudinal Modeling Using Stata, Fourth edition (two-volume set). College Station, Tx: Stata Press.

Background knowledge required:

Mathematics:
Calculus = Elementary
Linear Regression = Moderate

Statistics:
OLS = Elementary
Maximum likelihood = Elementary

Computer Background:
Stata = Elementary

Experience in using syntax code to run commands in at least one of Stata, SPSS or R is advantageous. It is not essential, as introductory information will be available; however, lab materials do involve using syntax code in at least one of those languages. Coverage of Stata examples is more substantial than coverage of SPSS and R.

The modules 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E tries to take a more introductory approach with regard to how statistical models are specified and how multilevel models link with other types of statistical model, and it includes some wider topics about strategies for dealing with complex data aside from using multilevel models; 2E goes a little further on statistical details and estimation strategies for multilevel models, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN and R, with some additional illustration of Stata. Some students choose to take both courses – if doing so, there will be some reiteration of some content, but there are plenty of detailed materials in both courses that point in different directions.

Lectures (L) / Computer practicals (P)

Day 1 Tuesday 9 July

Foundations in using multilevel statistical models for the social sciences (i)

L1a: Why multilevel and other statistical models can help us undertake social science

L1b: Course arrangements and overview

P1: Using statistical software for social science data analysis

Day 2 Wednesday 10 July

Foundations in using multilevel statistical models for the social sciences (ii)

L2a: Comparing descriptive and model-based ways of analysing social science data

L2b: Getting to grips with multilevel and complex datasets

P2: Exploring and summarising complex data; key elements of statistical modelling

Day 3 Thursday 11 July

Classical approaches to modelling multilevel data (i)

L3a: Adapting statistical models for multilevel and complex data

L3b: Understanding and interpreting the two-level random intercepts model

P3: Two-level random intercepts model specifications and interpretations

Day 4 Friday 12 July

Classical approaches to modelling multilevel data (ii)

L4: The two-level random slopes model

L5: Two-level models for non-linear outcomes

P4: Models for random intercepts and slopes

P5: Models for non-linear outcomes

Day 5 Monday 15 July

Varieties of multilevel and other statistical models (i)

L6a: Recap and refresher on week 1 topics

L6b: Alternative popular models for complex and multilevel data

P6: Alternative ways to treat and assess clustered data and hierarchical effects; introductory examples of other special purpose models such as SEMs and multiprocess models; ways of comparing measures and datasets

Day 6 Tuesday 16 July

Varieties of multilevel and other statistical models (ii)

L7a: Random effects models with three and more levels and with cross-classified and multiple membership designs

L7b: Case study: Using statistical models for cross-national comparisons

P7: Data and models with complex clustering

Day 7 Wednesday 17 July

Varieties of multilevel and other statistical models (iii)

L8: Popular statistical models for longitudinal data analysis

P8: Examples in analysing longitudinal data

Day 8 Thursday 18 July

Research applications in statistical modelling

L9a: Class plenary: Participants’ projects that (may) use multilevel statistical models

L9b: More on multilevel models in applied research, with further attention to sampling weights, missing data, model specification and post-estimation diagnostics

L9c: Option: Review/Questions/Selected recap topics

P9: Applied research – extension topics

Day 9 Friday 19 July

Reflections and next steps

L10a: Trends and prospects in using multilevel statistical models in the social sciences

L10b: Making progress in applied research with complex and multilevel data

P10: Lab review/recap opportunity

Software

This course is primarily delivered using Stata (see e.g. www.stata.com/support).

Practical lab sessions are centred on exercises that use Stata. Other lecture and study materials are normally software-independent; however, most statistical outputs within lectures will have been generated via Stata, and lecture contents occasionally address issues that are specific to Stata.

Some participants are likely to be fluent in using Stata already, but extensive prior experience with Stata is not necessarily required, since introductory materials are available when needed. However, previous exposure to syntax programming in at least one statistical software package will be beneficial, since the software examples in the course use ‘syntax’ modes of operation. A note on software is prepared for the course that discusses and illustrates ways of using syntax effectively – participants are asked to read through this note in the opening days of the course, and should be prepared to spend additional study time in improving their software skills at this point if relevant.

In spite of its title, the course also features optional illustrative materials that use R, SPSS and MLwiN. It is instructive to compare and contrast how different packages can be used to similar ends. In some instances, the course may be suitable for students who normally undertake work in R, SPSS or MLwiN, in which case the supplementary materials provide partial coverage of lecture topics.

Study materials and information on readings

Course materials that are provided to participants include a study handout, linked lecture slides, lab syntax files, and supplementary study materials (e.g. optional homework exercise sheets). The study handout and lecture slides overwhelmingly use the same materials, although the handout features some expansions on topics that are curtailed within lectures.

The text by Hox et al. (2017) and volume 1 of Rabe-Hesketh and Skrondal (2022) are supplied to participants as e-Books and are regularly cited and used within teaching sessions:

Hox, J., Moerbeek, M., van de Schoot, R. (2017). Multilevel Analysis, 3rd Edition. London: Routledge. [ISBN: 9781138121362]

Rabe-Hesketh, S., & Skrondal, A. (2022). Multilevel and Longitudinal Modeling Using Stata, Fourth Edition. College Station, Tx: Stata Press. [ISBN: 9781597181365; alternatively ISBN: 9781597180405, single-volume 2nd ed.; 9781597181082, 2-volume set, 3rd ed.]

During the course delivery, other specialist texts are often cited and/or discussed within teaching sessions (for instance, academic journal articles that focus on methodology and/or research application areas).

[Bibliographic references to cited texts will be provided within course materials].

Alternative readings that incorporate some introductory content suitable for pre-course reading on multilevel models, and that have been popular with previous cohorts of students both before and during the course, include:

Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.

Centre for Multilevel Modelling (2019) Multilevel Modelling Course. LEMMA VLE, University of Bristol, Centre for Multilevel Modelling. Accessed at https://www.cmm.bris.ac.uk/lemma/

Heck, R. H., Thomas, S. L., & Tabata, L. N. (2013). Multilevel and Longitudinal Modeling with IBM SPSS, Second Edition. London: Routledge.

Luke, D. A. (2020). Multilevel Modeling, 2nd Edition. London: Sage.

Plewis, I. (1994). Longitudinal Multilevel Models. In A. Dale & R. B. Davies (Eds.), Analysing Social and Political Change: A casebook of methods. London: Sage.

Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.

Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.
