Please note: This course will be delivered in person at the Colchester campus only. Online study is not available for this course.

Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph – Social Inequalities and Occupational Stratification – that analyses data on social interaction patterns and social inequalities, and an introductory textbook – What is… Quantitative Longitudinal Data Analysis – that focusses upon the secondary analysis of longitudinal survey datasets.

Course Content:

Statistical models are important tools for analysing quantitative datasets. In the social sciences, it is also common to refine or adjust models, beyond their standard formulations, in order to take account of the complexities of ‘real life’ social data. Participants in course 1E will learn about statistical models in the social sciences and about certain popular strategies for using models to analyse complex or multilevel data. Students will learn:

– how to specify, formulate and interpret common types of statistical model

– how to understand, implement and interpret multilevel models

– how to assess and compare other ways of taking account of complex and multilevel data within a modelling framework

– how to enhance complex data such as by merging variables or datasets and analysing them with appropriate statistical models

Description of course activities

Students taking course 1E explore the challenges of applying statistical models to multilevel and complex data. Daily teaching sessions comprise lectures, followed by lab exercises that put the lecture material into practice. The teaching style aims to present statistical materials in an accessible manner that assumes only introductory previous knowledge. Lab sessions contain a wealth of illustrative examples, some of which provide opportunities to develop understanding of quite challenging research issues.

The course begins by reviewing how statistical models work in general terms, and how we implement them in social research. Materials cover general aspects of the appropriate specification and interpretation of regression and statistical models – for instance, on building models, interpreting their parameters, and assessing how well they fit the data.

The course then describes features of social science data that are often described as ‘complex’. We stress how these are normal features of social research data, and that there are relevant adjustments that we can make to a statistical analysis in response.

The course then examines a selection of important strategies that are relevant when analysing multilevel and complex data. The specific topics that are explored are chosen because they arise frequently in research, and the strategies that we apply to them can easily impact the results of our analyses.

The first concerns the use of multilevel models with random effects (about half of the course materials). Multilevel models are a popular and widely used tool for analysing ‘clustered’ or ‘hierarchical’ datasets. Social science data often features the clustering of individual cases within larger units of analysis – for example in household surveys, when there may be several individual responses clustered within the same household. Multilevel models with random effects can allow us to use statistical models to analyse such data in a way which appropriately takes account of that clustering. In doing so, multilevel models are also linked to valuable conceptual distinctions such as the interplay between micro- and macro-level influences. The course materials introduce and contextualise multilevel models, with practical training in running them and comparing them to other models. We spend time on models with ‘random intercepts’ and ‘random slopes’, with linear and categorical outcome measures, and we explore models that can be used with increasingly complex cluster structures (e.g. ‘three-level’, ‘four-level’, and ‘cross-classified’ data structures).
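As a compact illustration of the ‘random intercepts’ and ‘random slopes’ specifications mentioned above, the two-level linear models can be sketched as follows. The notation is an assumption made for this sketch rather than something taken from the course materials: i indexes individuals, j indexes clusters (e.g. households), x is a single explanatory variable, and the u terms are the cluster-level random effects.

```latex
% Two-level random intercepts model: each cluster j has its own intercept shift u_{0j}
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \sigma^2_{u0}), \quad e_{ij} \sim N(0, \sigma^2_{e})

% Two-level random slopes model: the effect of x also varies across clusters via u_{1j}
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}

% Intraclass correlation in the random intercepts model:
% the share of residual variance that lies between clusters
\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}}
```

In the random slopes model, the two cluster-level terms are usually assumed to be jointly normal and are allowed to covary.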

Further content then introduces other important adjustments that social scientists often make in response to multilevel and complex data. It is also possible to analyse multilevel data with models that don’t use random effects but include some other adjustment or extension (e.g. ‘fixed effects’ for clusters, and ‘robust standard errors’). Also relevant are adjustments to models in response to complex sampling and missing data, such as by using sampling weights within a model-based analysis. Beyond model specification, we often adapt to complex or multilevel data through data preparation work that is designed to enhance complex data resources. Examples here include making choices over the operationalisation of variables in a statistical modelling analysis, and ways of linking information between different data sources and organising data files that aren’t neatly rectangular. Such practical topics are sometimes neglected in research training, but they feature within course 1E because they can be very valuable aspects of adapting to and properly exploiting complex and multilevel data.
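The data preparation tasks described above can be sketched in a few lines of Python. This is a minimal illustration only, not the course’s own lab material (which uses Stata, SPSS, R and MLwiN): the records, the variable names (`pid`, `hid`, `income`, `weight`, `region`) and the helper functions are all hypothetical. It links household-level information onto individual-level records and then compares an unweighted mean with a mean that uses sampling weights.

```python
# Hypothetical individual-level records: one row per survey respondent.
individuals = [
    {"pid": 1, "hid": 10, "income": 20000.0, "weight": 1.5},
    {"pid": 2, "hid": 10, "income": 30000.0, "weight": 0.5},
    {"pid": 3, "hid": 11, "income": 40000.0, "weight": 1.0},
    {"pid": 4, "hid": 12, "income": 25000.0, "weight": 1.0},
]

# Hypothetical household-level file, keyed by household identifier.
# Household 12 is deliberately absent to show an unmatched case.
households = {
    10: {"region": "North"},
    11: {"region": "South"},
}


def link_households(people, hh_lookup):
    """Attach household-level variables to each individual (many-to-one link)."""
    linked = []
    for person in people:
        row = dict(person)  # copy, so the input records are not modified
        household = hh_lookup.get(person["hid"])
        row["matched"] = household is not None
        row["region"] = household["region"] if household else None
        linked.append(row)
    return linked


def weighted_mean(rows, value_key, weight_key):
    """Mean of value_key in which each case counts in proportion to its weight."""
    total_weight = sum(r[weight_key] for r in rows)
    return sum(r[value_key] * r[weight_key] for r in rows) / total_weight


linked = link_households(individuals, households)
n_matched = sum(1 for r in linked if r["matched"])
unweighted = sum(r["income"] for r in linked) / len(linked)
weighted = weighted_mean(linked, "income", "weight")

print(f"matched {n_matched} of {len(linked)} individuals")
print(f"unweighted mean income: {unweighted:.1f}")
print(f"weighted mean income:   {weighted:.1f}")
```

Flagging unmatched cases explicitly, rather than dropping them silently, makes it easier to check the quality of a linkage before any model is run.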

Course Objectives:

The course seeks to provide participants with a strong understanding of how statistical models can be applied in the social sciences when data is complex and/or multilevel in its nature.

Participants should learn:

  • how relevant statistical models are formulated and interpreted
  • the relative attractions and limitations of different model strategies
  • practical skills in fluently handling and analysing complex data using one or more relevant software packages

There are a number of benefits to learning how to understand and to implement statistical models for complex and multilevel data. Multilevel models are widely used in the social sciences, so there are many good reasons to learn in detail about their theory and their practical implementation. Further course materials explore several other important but under-utilised options in the specification of statistical models and in making good use of complex datasets. Training in these areas should provide course participants with the confidence to compare the strengths and limitations of different plausible models, and equip them with valued practical skills in using software to work with data and run statistical models.

Course Prerequisites:

This is an introductory course, designed for people who have little or no previous experience in applying models to multilevel or complex data. It is expected that participants will have had some previous training in social statistics – for example, the course is best suited to participants who are fluent in popular descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests; correlation values), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions will take basic versions of these regression models as a starting point, and build onwards to multilevel models and other related extension topics in statistical modelling. Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting regression outputs.

The course is also best suited to participants with at least some previous experience in using statistical software packages for social science data analysis. The course features lab materials available in several packages (Stata, SPSS, R and MLwiN, with Stata used most often). The lab materials also make use of several different social science datasets. Previous exposure to the ‘syntax’ languages of at least one of these packages will be an advantage. Course materials include some introductory documentation to help with using software, and for this reason the course should still be accessible to people who have little previous experience. However, students without any background in programming software using syntax should expect that extra effort will probably be needed near the start of the course in order to make good use of the lab exercises.

Required texts

This text will be provided by ESS: Hox, J., Moerbeek, M. and van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, Third edition. London: Routledge.

For Stata users, we also recommend accessing or purchasing the following:
Rabe-Hesketh, S. and Skrondal, A. (2008/2012). Multilevel and Longitudinal Modeling Using Stata, Second Edition/Third Edition (2 volume set). College Station, Tx: Stata Press.

Background knowledge required:
Calculus = Elementary. Some background knowledge is required.
Linear Regression = Elementary. Some background knowledge is required.

OLS = Elementary. Some background knowledge is required.

Computer Background:
Stata = Elementary. Some background knowledge is required.

The modules 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E tries to take a more introductory approach with regard to how statistical models are specified and how multilevel models link with other types of statistical model, and it includes some wider topics about strategies for dealing with complex data aside from using multilevel models; 2E goes a little further on statistical details and estimation strategies for multilevel models, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN and R, with some additional illustration of Stata. Some students choose to take both courses – if doing so, there will be some reiteration of some content, but there are plenty of detailed materials in both courses that point in different directions.

Lectures (L) / Computer practicals (P)

Day 1: Monday 11 July
Introduction to statistical modelling and to multilevel and complex data (i)
L1a: Why statistical models can help us to study multilevel and complex data
L1b: Course arrangements and overview
P1: Using statistical software for social science data analysis

Day 2: Tuesday 12 July
Introduction to statistical modelling and to multilevel and complex data (ii)
L2a: Comparing descriptive and model-based ways of analysing social science data
L2b: Getting to grips with multilevel and complex datasets
P2: Exploring and summarising complex data; key elements of statistical modelling

Day 3: Wednesday 13 July
Linear outcomes models for multilevel data (i)
L3a: Important ways of adapting statistical models for multilevel and complex data
L3b: Understanding and interpreting the two-level random intercepts model
P3: Two-level random intercepts model specifications and interpretations

Day 4: Thursday 14 July
Linear outcomes models for multilevel data (ii)
L4a: The two-level random slopes model
L4b: Interpreting random intercepts and slopes
P4: Models for random intercepts and slopes

Day 5: Friday 15 July
Multilevel models for binary and other non-linear outcomes
L5a: Multilevel models for binary outcomes
L5b: Multilevel models for multinomial, ordered and count outcomes
P5: Non-linear outcomes in multilevel models

Day 6: Monday 18 July
Varieties of models for complex and multilevel data (i)
L6a: Recap and refresher on week 1 topics
L6b: Alternative important models for complex and multilevel data
L6c: Critical perspectives on random effects models
P6: Alternative ways to treat and assess clustered data and hierarchical effects; ways of using complex variables; introductory examples of SEMs, multiprocess models, and models for causal analysis

Day 7: Tuesday 19 July
Varieties of models for complex and multilevel data (ii)
L7a: Random effects models with three and more levels and with cross-classified and multiple membership designs
L7b: Case study: Using statistical models for cross-national comparisons
P7: Data and models with complex clustering

Day 8: Wednesday 20 July
Varieties of models for complex and multilevel data (iii)
L8a: Approaches to dealing with sampling weights and missing data
L8b: Popular statistical models for longitudinal data analysis
P8: Examples of using sampling weights; strategies for taking account of missing data; examples in analysing longitudinal data

Day 9: Thursday 21 July
Research applications in statistical modelling
L9a: Class plenary: Participants’ projects that use multilevel and/or complex data
L9b: Option: Review/Questions/Selected recap topics
P9: Applied research applications – extension topics

Day 10: Friday 22 July
Reflections and next steps
L10a: Trends and prospects in using statistical models for multilevel and complex data
L10b: Making progress in applied research with complex and multilevel data
P10: Lab review/recap opportunity
R: Hox et al. (2017: c10-16); Browne et al. (2019)

Software used (suggested introductory online information):

Stata
MLwiN
R

Stata will be used most frequently. Many, but not all, exercises will also be available in SPSS, R and/or MLwiN. Prior knowledge of these packages is not assumed but previous exposure to syntax programming in at least one of them will be beneficial, since the software examples in the course use ‘syntax’ modes of operation. A note on software is prepared for the course that discusses and illustrates how to use the packages involved (supplied as an appendix to the coursepack). Participants are asked to read through this note in the opening days of the course, and should be prepared to spend additional study time in improving their software skills at this point if relevant.

Recommended References:

The text by Hox et al. (2017) will be supplied with the course-pack.

The text by Rabe-Hesketh and Skrondal (2008/2012) is also recommended for additional purchase (for Stata users) if possible.

Hox, J., Moerbeek, M., van de Schoot, R. (2017). Multilevel Analysis, 3rd Edition. London: Routledge. [ISBN: 9781138121362]
Rabe-Hesketh, S., & Skrondal, A. (2008/2012). Multilevel and Longitudinal Modeling Using Stata, Second Edition/Third Edition. College Station, Tx: Stata Press [ISBN: 9781597180405, single-volume 2nd ed; 9781597181082, 2-volume set, 3rd ed]
Both Hox et al. (2017) and Rabe-Hesketh and Skrondal (2008/2012) focus upon multilevel modelling through ‘random effects’ models, but course 1E also encompasses other topics in statistical modelling for complex and multilevel data. Accordingly we will often point you to other references and recommended readings aside from these texts, with references presented throughout the course.
• Some popular alternative readings that also focus on random effects multilevel models are:
Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2013). Multilevel and Longitudinal Modeling with IBM SPSS, Second Edition. London: Routledge.
Luke, D. A. (2020). Multilevel Modeling, 2nd Edition. London: Sage.
Plewis, I. (1994). Longitudinal Multilevel Models. In A. Dale & R. B. Davies (Eds.), Analysing Social and Political Change : A casebook of methods. London: Sage.
Plewis, I. (1998). Multilevel Models. Social Research Update, 23.
Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.

• The texts below are popular readings on other relevant aspects of statistical modelling:
Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.
Allison, P. D. (2009). Fixed Effects Regression Models. London: Sage.
DiPrete, T. A., & Forristal, J. D. (1994). Multilevel Models – Methods and Substance. Annual Review of Sociology, 20, 331-357.
Long, J.S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, Tx: Stata Press. (Chpts 2-4).
Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Berkley, Ca: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.

• These texts are cited within the website’s ‘course outline’:
Bell, A., Fairbrother, M., & Jones, K. (2019a). Fixed and random effects models: making an informed choice. Quality & Quantity, 53(2), 1051-1074.
Bell, A., Holman, D., & Jones, K. (2019b). Using Shrinkage in Multilevel Models to Understand Intersectionality: A Simulation Study and a Guide for Best Practice. Methodology, 15(2), 88-96.
Browne, W. J., Charlton, C. M. J., Michaelides, D. T., Parker, R. M. A., Cameron, B., Szmaragd, C., . . . Moreau, L. (2019). A Beginner’s Guide to Stat-JR’s TREE Interface version 1.0.7. Bristol: Centre for Multilevel Modelling, University of Bristol & Electronics and Computer Science, University of Southampton.
DiPrete, T. A., & Forristal, J. D. (1994). Multilevel Models – Methods and Substance. Annual Review of Sociology, 20, 331-357.
Hox, J., Moerbeek, M., van de Schoot, R. (2017). Multilevel Analysis, 3rd Edition. London: Routledge.

• There are several online depositories of materials covering statistical modelling for complex and multilevel data, in particular see the LEMMA course at: