Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies of processes of social stratification and social inequality. His recent publications include a research monograph, Social Inequalities and Occupational Stratification, which analyses data on social interaction patterns and social inequalities, and an introductory textbook, What is… Quantitative Longitudinal Data Analysis, which focuses on the secondary analysis of longitudinal survey datasets.

Course Content

Social science data often feature the ‘clustering’ or ‘hierarchical nesting’ of individual cases within larger units of analysis: for example, in a household survey several individual respondents may be clustered within the same household. Multilevel models are analytical tools designed for such scenarios. We often use them to undertake regression-style analyses that take account of the clustering (ignoring it would amount to a specification error), and we also use them when we particularly want to analyse and describe patterns related to a hierarchical data structure. Multilevel models are an important, widely used tool in social statistics, and they are potentially relevant to almost any study that uses complex social data.
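As an illustrative sketch only (the notation is an assumption, following common textbook conventions rather than the course materials themselves), a two-level random intercepts model for a continuous outcome y_ij of individual i in household j, with a single explanatory variable x_ij, can be written as below. The last line is the variance partition coefficient ρ, the share of the residual variance that lies between households rather than between individuals within households.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2), \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}.
\end{aligned}
```

An ordinary single-level regression is the special case σ_u² = 0, which treats every response as independent. When members of the same household genuinely resemble one another (ρ > 0), the standard errors such a model reports for β_1 and other coefficients can be misleading.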

This course introduces multilevel modelling as a special case of statistical modelling in social research. The first materials cover what multilevel models are and the situations in which they are most useful. Introductory materials also explore more general aspects of the appropriate specification and interpretation of regression and other statistical models, for instance building models, interpreting their parameters, and assessing how well they fit the data. A second group of materials introduces some of the most commonly used multilevel models, such as the ‘random intercepts’ and ‘random slopes’ models, and the model formulations used when outcome variables are not continuous (for example binary, ordered or count outcomes). These materials also make many connections to more general themes in using statistical models. Lastly, we provide an accessible overview of a number of special scenarios and more advanced topics that often become relevant when using multilevel and other statistical models to study social processes: for instance, dealing with more complex hierarchical data structures, adapting models to focus on particular types of social process, and weighing the strengths and weaknesses of different options for specifying and estimating statistical models. The daily teaching sessions comprise lectures followed by lab exercises that implement examples of the techniques described in the lectures.

Course Objectives

The course seeks to provide participants with a strong understanding of how multilevel models are used in the social sciences, and of how to use multilevel models in their own research. Through lectures and study exercises, participants should learn how multilevel models are formulated in statistical terms, how they are related to other types of statistical model, and their relative attractions and limitations as a strategy of statistical modelling in research applications. Through the lab programme, participants should also develop skills and fluency in handling data with clustered and hierarchical features and in specifying multilevel models in relevant software packages.

The course’s lab programme features daily sessions with command files that illustrate handling data and specifying multilevel models in several software packages. Most lab examples use Stata, since it offers a wide range of options both for handling complex data and for specifying relevant statistical models. Many examples are also given in SPSS, R and MLwiN (a specialist package designed explicitly for estimating multilevel models). Worked examples will be available in these packages using several different, often large-scale, social survey datasets. The variety of examples is designed to give participants important operational skills that are not widely taught.

There are a number of benefits to learning how to understand and to implement multilevel models through this course. Firstly, multilevel models are important devices for exploring the character of clustered or hierarchical structures within a dataset (for example, to compare the scale of pupil-level and class-level influences in an educational study which features pupils clustered within classes). Secondly, multilevel models often provide important controls for hierarchical structural features within data (that is, when a pattern of clustering is not substantively important, but does need to be controlled for). Finally, a thorough introduction and review of the practical implementation of multilevel models also serves as an effective means of understanding the implementation and interpretation of statistical models in the social sciences more generally.
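The comparison of pupil-level and class-level influences mentioned in the paragraph above is often summarised by the intraclass correlation: the share of outcome variance that lies between clusters (classes) rather than within them (between pupils). The sketch below is a simple method-of-moments (one-way ANOVA) estimate, assuming equal cluster sizes; the function name and the scores are hypothetical, and this is not how Stata, R or MLwiN fit multilevel models, since dedicated software estimates the variance components with likelihood-based or related methods.

```python
from statistics import mean


def anova_icc(groups):
    """One-way ANOVA (method-of-moments) estimate of the intraclass correlation.

    groups: list of lists; each inner list holds the outcome scores of the
    units (e.g. pupils) in one cluster (e.g. class). This sketch assumes
    every cluster has the same number of units.
    """
    k = len(groups[0])  # units per cluster
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal cluster sizes of at least 2")
    n_groups = len(groups)
    grand = mean(x for g in groups for x in g)

    # Between-cluster mean square: spread of cluster means around the grand mean
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ms_between = ss_between / (n_groups - 1)

    # Within-cluster mean square: spread of units around their own cluster mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / (n_groups * (k - 1))

    # Variance components: E[MS_between] = sigma2_e + k * sigma2_u, E[MS_within] = sigma2_e
    sigma2_u = max((ms_between - ms_within) / k, 0.0)  # cluster-level variance, truncated at 0
    sigma2_e = ms_within                                # unit-level variance
    return sigma2_u / (sigma2_u + sigma2_e)


# Hypothetical test scores for three classes of three pupils each
classes = [[60, 62, 64], [70, 72, 74], [50, 52, 54]]
print(round(anova_icc(classes), 3))  # prints 0.961
```

Here most of the variation lies between classes, so the estimate is close to 1; if every class had the same mean score, the cluster-level variance would be estimated as 0 and so would the intraclass correlation.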

 

Course Prerequisites

This is an introductory course, designed for people who have little or no previous experience in applying multilevel models. It is expected, however, that participants will have had some previous training in social statistics. The course is best suited to participants who are fluent in common descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests, correlation coefficients), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions take basic versions of these regression models as a starting point and build on them towards multilevel models and other related extension topics in statistical modelling. Most participants are likely to benefit from preparatory study or revision of materials that cover generating and interpreting regression output.

The course is also best suited to participants with at least some previous experience of using statistical software packages for social science data analysis. The lab materials span several packages (Stata, SPSS, R and MLwiN, with Stata used most often). We encourage participants to explore all of these packages at some stage in the course, though most participants focus mainly on one of them. The lab materials also make use of several different social science datasets. Previous exposure to the ‘syntax’ language of at least one of these packages will be an advantage, since the practical materials involve programming in these languages. The course should still be accessible to people with limited previous experience of software, since background materials on the packages will be made available, but participants with no background in syntax programming should be prepared to put in extra effort during the opening days of the course in order to follow the lab exercises. A note on software prepared for the course discusses and illustrates how to use the packages involved (supplied as an appendix to the course pack). Participants are asked to read through this note in the opening days of the course, and may need to spend additional study time improving their software skills at this point if relevant.

Suggested introductory online information:

  • Stata (https://www.stata.com/support/; http://tutorials.iq.harvard.edu/Stata/StataIntro/StataIntro.html)
  • SPSS (http://www.spss-tutorials.com/)
  • MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/)
  • R (https://www.statmethods.net/r-tutorial/index.html)

The course includes daily illustrative lab exercises using these software packages, but please be aware that not every example is available in every package, and that Stata is used more than the other packages.

Representative Background Reading
1) Background on modelling social science data
Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.
Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, TX: Stata Press. (See chpts 2-4).
Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Thousand Oaks, CA: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass.
2) Introductions to multilevel models
Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.
Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.
Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge. (See chapter 9).
3) Illustrative research articles which use multilevel modelling
Andersen, R., Yang, M., & Heath, A. F. (2006). Class Politics and Political Context in Britain, 1964-1997: Have Voters Become More Individualised? European Sociological Review, 22(2), 215-228.
Jen, M. H., Jones, K., & Johnston, R. J. (2009). Compositional and contextual approaches to the study of health behaviour and outcomes: Using multi-level modelling to evaluate Wilkinson’s income inequality hypothesis. Health and Place, 15, 198-203.
Maas, I., & Zijdeman, R. L. (2010). Beyond the local marriage market: The influence of modernization on geographical heterogamy. Demographic Research, 23(33), 933-962.
Rasbash, J., Leckie, G., Pillinger, R., & Jenkins, J. (2010). Children’s educational progress: Partitioning family, school and area effects. Journal of the Royal Statistical Society Series A, 173, 657-682.
Verbakel, E. (2013). Leisure values of Europeans from 46 countries. European Sociological Review, 29(3), 669-682.
Required texts:

The following text will be provided by the Summer School as part of your course material and used throughout the course:
Hox, J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, Third Edition. London: Routledge.
For Stata users, we also recommend accessing or purchasing the following:
Rabe-Hesketh, S., & Skrondal, A. (2008/2012). Multilevel and Longitudinal Modeling Using Stata, Second Edition/Third Edition (the Third Edition is a 2-volume set). College Station, TX: Stata Press.

Background knowledge required
Statistics
OLS = moderate
Maximum Likelihood = elementary

Computer Background
Stata = elementary
R = elementary

The modules 1E and 2E overlap in several areas of their coverage. Both courses introduce core aspects of multilevel models and cover selected extension topics associated with more advanced specifications. 1E takes a more introductory approach to how statistical models are specified and how multilevel models link with other types of statistical model; 2E goes a little further into statistical details and estimation strategies, and grounds its methodological examples in detailed discussions of research applications. Both courses feature software examples, but 1E is weighted towards Stata, with lighter coverage of R, SPSS and MLwiN, while 2E makes most use of MLwiN and R, with some additional illustration in Stata. Some students choose to take both courses; if you do, some content will be repeated, but both courses contain plenty of detailed material that points in different directions.

Course outline

Lectures (L) / Computer practicals (P) / Readings (R)

 

Day 1   Monday 12 July         

Introduction to multilevel modelling (i)     

L1a:      The idea of multilevel modelling

L1b:      Course arrangements and overview

P1:       Getting started with multilevel software

R:         Hox et al. (2017: preface & c1)

 

Day 2   Tuesday 13 July        

Introduction to multilevel modelling (ii)

L2a:      The formulae, assumptions and interpretations of statistical and multilevel models

L2b:      Multilevel data structures and examples

P2:       Exploring and summarising multilevel data and key elements of statistical modelling

R:         Hox et al. (2017: sections 2.1 & 4.2)

 

Day 3   Wednesday 14 July   

Multilevel applications with linear outcomes (i): Two-level random intercepts models      

L3a:      The two-level random intercepts model

L3b:      Interpreting random intercepts and their residuals

P3:       Two-level random intercepts model examples

R:         Hox et al. (2017: sections 2.2, 3.1 & 3.4)

 

Day 4   Thursday 15 July       

Multilevel applications with linear outcomes (ii): Random intercepts and slopes

L4a:      The two-level random slopes model

L4b:      Interpreting random intercepts and slopes

P4:       Models for random intercepts and slopes

R:         Hox et al. (2017: sections 4.1, 2.4 & 4.3)

 

Day 5   Friday 16 July

Multilevel applications for binary and other categorical outcomes      

L5a:      Multilevel models for binary outcomes

L5b:      Multinomial, ordered and count outcomes

P5:       Categorical outcomes in multilevel models

R:         Hox et al. (2017: c6, c7)

                                                                  

 

Day 6   Monday 19 July         

Multilevel techniques in context

L6a:      Cross-national comparisons and multilevel models

L6b:      Do we always need multilevel models?

P6:       Alternative statistical treatments for clustered data; testing and comparing hierarchical effects

R:         DiPrete and Forristal (1994); Bell et al. (2019b)

 

Day 7   Tuesday 20 July        

Multilevel models with more than two levels

L7a:      Hierarchical effects at three and more levels

L7b:      Cross-classified and multiple membership designs

P7:       Data and models with complex clustering

R:         Hox et al. (2017: section 2.3; c9)

 

Day 8   Wednesday 21 July   

Special cases of multilevel modelling

L8a:      Multilevel models and structural equation models

L8b:      Multilevel models for longitudinal data analysis

P8:       Examples in analysing longitudinal data; Using SEMs as multilevel models

R:         Hox et al. (2017: c5; c10; c14)

 

Day 9   Thursday 22 July       

Multilevel modelling applications

L9a:      Class plenary: Selected hierarchical data and models

L9b:      Option: Review/Questions/Selected recap topics

P9:       Practical applications – extension topics

R:         Bell et al. (2019a)

 

Day 10 Friday 23 July

Review session         

L10a:     Next steps in advanced multilevel analysis

L10b:     The contribution of multilevel modelling

P10:      Multiprocess models and models for causal analysis; Lab review/recap opportunity

R:         Hox et al. (2017: c10-16); Browne et al. (2019)

 

 

Software used (suggested introductory online information)

  • Stata (https://www.stata.com/support/)
  • SPSS (http://www.spss-tutorials.com/)
  • MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/)
  • R (https://www.statmethods.net/r-tutorial/index.html)

Stata will be used most frequently. Many, but not all, exercises will also be available in SPSS, R and/or MLwiN. Prior knowledge of these packages is not assumed, but previous exposure to syntax programming in at least one of them will be beneficial, since the software examples in the course use ‘syntax’ modes of operation. A note on software prepared for the course discusses and illustrates how to use the packages involved (supplied as an appendix to the course pack). Participants are asked to read through this note in the opening days of the course, and may need to spend additional study time improving their software skills at this point if relevant.

 

Recommended References

 

  • The text by Hox et al. (2017) will be supplied with the course pack. The text by Rabe-Hesketh and Skrondal (2008/2012) is recommended for additional purchase (for Stata users) if possible.
  • There are several other texts with a wide range of materials on multilevel modelling that could be useful before and during the module – some of the most popular options are listed below. Additional and follow-up reading recommendations will be made during the course.

Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.

Heck, R. H., Thomas, S. L., & Tabata, L. N. (2013). Multilevel and Longitudinal Modeling with IBM SPSS, Second Edition. London: Routledge.

Hox, J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, 3rd Edition. London: Routledge. [ISBN: 9781138121362]

Luke, D. A. (2020). Multilevel Modeling, 2nd Edition. London: Sage.

Rabe-Hesketh, S., & Skrondal, A. (2008/2012). Multilevel and Longitudinal Modeling Using Stata, Second Edition/Third Edition. College Station, TX: Stata Press [ISBN: 9781597180405, single-volume 2nd ed; 9781597181082, 2-volume set, 3rd ed]

Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.

 

  • The texts listed below are good alternative options for preparatory reading. They are either short introductions to multilevel modelling, or useful statements on statistical modelling in general. It is desirable but not essential to read one or more of these before the course.

Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.

DiPrete, T. A., & Forristal, J. D. (1994). Multilevel Models – Methods and Substance. Annual Review of Sociology, 20, 331-357.

Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, TX: Stata Press. (Chpts 2-4).

Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Thousand Oaks, CA: Sage.

Plewis, I. (1994). Longitudinal Multilevel Models. In A. Dale & R. B. Davies (Eds.), Analysing Social and Political Change: A casebook of methods. London: Sage.

Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.

Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass.

 

  • These texts are cited in the ‘course outline’ above:

Bell, A., Fairbrother, M., & Jones, K. (2019a). Fixed and random effects models: making an informed choice. Quality & Quantity, 53(2), 1051-1074.

Bell, A., Holman, D., & Jones, K. (2019b). Using Shrinkage in Multilevel Models to Understand Intersectionality: A Simulation Study and a Guide for Best Practice. Methodology, 15(2), 88-96.

Browne, W. J., Charlton, C. M. J., Michaelides, D. T., Parker, R. M. A., Cameron, B., Szmaragd, C., . . . Moreau, L. (2019). A Beginner’s Guide to Stat-JR’s TREE Interface version 1.0.7. Bristol: Centre for Multilevel Modelling, University of Bristol & Electronics and Computer Science, University of Southampton.

DiPrete, T. A., & Forristal, J. D. (1994). Multilevel Models – Methods and Substance. Annual Review of Sociology, 20, 331-357.

Hox, J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, 3rd Edition. London: Routledge.