Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph – Social Inequalities and Occupational Stratification – that analyses data on social interaction patterns and social inequalities, and an introductory textbook – What is… Quantitative Longitudinal Data Analysis – that focusses upon the secondary analysis of longitudinal survey datasets.

Course Content
Social science data often features the ‘clustering’ or ‘hierarchical nesting’ of individual cases within larger units of analysis – for example in household surveys, when there may be several individual responses clustered within the same household. Multilevel models are analytical tools that are designed for such scenarios – we often use them in order to undertake regression-style analyses which take account of the clustering (when we would otherwise have made a specification error if we ignored it), and we also use multilevel models when we particularly want to analyse and explore patterns related to the clustering itself. Multilevel models are an important, widely used tool in social statistics, and they are of potential relevance to almost any study that uses complex social data.
The course introduces multilevel modelling as a special case of statistical modelling in social research. Many of the course materials cover topics that are concerned with appropriately specifying and interpreting statistical models in general – for instance, interpreting parameters, and assessing model diagnostics and assumptions. When social scientists refer to multilevel models, they normally mean a specification that is also known as the ‘random effects’ model. The course features many materials on random effects models, but it also includes points of comparison with other modelling devices that can be used for clustered or nested data.
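As a brief illustration of what a 'random effects' specification means, a two-level random intercepts model can be sketched as below. The notation is a common textbook convention assumed here for illustration, not taken from the course materials: y is the outcome for case i in cluster j, x is a single explanatory variable, u is the cluster-level random effect and e is the case-level residual.

```latex
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \\
  u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2), \\
  \rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}.
\end{align}
```

Here the last line gives the intra-cluster correlation, the share of the residual variance that lies between clusters. An ordinary regression corresponds to the special case in which the cluster-level variance is zero.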
The daily teaching sessions comprise lectures, followed by lab exercises that implement examples of things that were described in lectures. The course coverage begins with the concepts and statistical formulations of multilevel models, and statistical models in general. The core materials address the specification of statistical models for linear and categorical outcome variables in the context of a multilevel data structure, with attention to the appropriate estimation and interpretation of the relevant model parameters. Extension materials address selected relevant specialist issues – research applications for longitudinal datasets; undertaking cross-national comparisons; working with complex clustering structures that involve multiple ‘higher levels’; and extending models when multiple indicator variables are available, when multiple linked processes might be analysed in relation to each other, and when there is an explicit desire to understand causal relationships. Many materials try to address the practical application of multilevel models, including training on the operationalisation of measures and the organisation of complex datasets, and on the specification and estimation of models with relevant software.

Course Objectives
The course seeks to provide participants with a solid grounding in the application of multilevel models. This involves combining a strong understanding of how multilevel models are formulated in statistical terms (and their relationship to other types of statistical model), with a fluency in handling data with clustered and hierarchical features and an ability to specify multilevel models in popular statistical analysis packages. The course seeks to convey both the attractions and limitations of a multilevel modelling approach as a strategy of statistical modelling.
The course will feature daily lab sessions with command files which illustrate handling data and specifying multilevel models in several software packages. Most often, lab examples use the Stata package, since that software features a wide range of options both for handling complex data, and for specifying relevant statistical models. Selected examples are also given in other packages, including SPSS, R, and MLwiN (a specialist package designed explicitly for estimating multilevel models). Worked examples will be available in these packages using several different, often large-scale, social survey datasets. The variety of examples is designed to provide participants with important operational skills that are not widely taught.
There are a number of benefits to studying the practical application of multilevel models. Firstly, multilevel models are important devices for exploring the character of clustered or hierarchical structures within a dataset (for example, to compare the scale of pupil-level and class-level influences in an educational study which features pupils clustered within classes). Secondly, they are often used simply to control for hierarchical structural features within data (that is, when a pattern of clustering is not substantively important, but does need to be controlled for). Finally, a thorough introduction and review of the practical implementation of multilevel models also serves as an effective means of understanding the implementation and interpretation of statistical models in the social sciences more generally.
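The first benefit above, comparing the scale of pupil-level and class-level influences, can be illustrated with a minimal sketch. It is not drawn from the course lab materials, which mainly use Stata: it assumes Python, simulated data, a balanced design (the same number of pupils in every class) and a simple ANOVA-style estimator rather than the likelihood-based estimation used by multilevel software. The function names `simulate_classes` and `variance_partition` are hypothetical.

```python
import random
from statistics import mean


def simulate_classes(n_classes, pupils_per_class, sd_class, sd_pupil, seed=0):
    """Simulate scores for pupils nested in classes (illustrative data only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_classes):
        class_effect = rng.gauss(0.0, sd_class)  # shared by every pupil in the class
        data.append([50.0 + class_effect + rng.gauss(0.0, sd_pupil)
                     for _ in range(pupils_per_class)])
    return data


def variance_partition(groups):
    """ANOVA (method-of-moments) estimates for a balanced two-level design.

    Returns (between-group variance, within-group variance, intra-class correlation).
    Assumes every group has the same number of cases (at least two).
    """
    n = len(groups[0])   # cases per group
    j = len(groups)      # number of groups
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # valid as the overall mean because the design is balanced
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (j - 1)
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means)
                    for y in g) / (j * (n - 1))
    var_between = max(0.0, (ms_between - ms_within) / n)  # truncate negative estimates at zero
    var_within = ms_within
    total = var_between + var_within
    icc = var_between / total if total > 0 else 0.0
    return var_between, var_within, icc


if __name__ == "__main__":
    scores = simulate_classes(n_classes=200, pupils_per_class=20,
                              sd_class=1.0, sd_pupil=2.0, seed=42)
    between, within, icc = variance_partition(scores)
    print(f"class-level variance: {between:.2f}")
    print(f"pupil-level variance: {within:.2f}")
    print(f"intra-class correlation: {icc:.2f}")
```

With the simulation settings shown, the true class-level share of variance is 1 / (1 + 4) = 0.2, so the estimated intra-class correlation should fall near that value. A multilevel model estimates the same two variance components while also allowing explanatory variables and unbalanced data.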

Course Prerequisites
This is an introductory course, designed for people who have little or no previous experience in applying multilevel models. It is expected, however, that participants will have had some previous training in social statistics – for example, the course is best suited to participants who are fluent in popular descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests; correlation values), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions will take these sorts of regression models as a starting point, and build onwards to multilevel models and other related extension topics in statistical modelling. Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting regression outputs (e.g. Allison 1999; Tarling 2009), and all participants are encouraged to read at least one paper, chapter or book from the list below of ‘representative background reading’ prior to attending.
The course is also best suited to participants with at least some previous experience in using statistical software packages for social science data analysis. The course features lab materials spanning several packages (Stata, SPSS, R and MLwiN, with Stata used most often), and the lab materials also make use of several different social science datasets. Previous exposure to the ‘syntax’ languages of at least one of these packages will be an advantage, since the practical materials involve programming in these languages. The course should still be accessible to people who have little previous experience in this area, since background materials on the software packages will be made available, but students without some background in programming software through syntax should be prepared for the extra effort that will probably be required during the opening days of the course in order to follow the lab exercises. A note on software is prepared for the course that discusses and illustrates how to use the packages involved (supplied as an appendix to the coursepack). Participants are asked to read through this note in the opening days of the course, and they may need to spend additional study time improving their software skills at this point if relevant.
Software used (and suggested introductory online information):
  • Stata (https://www.stata.com/support/ ; http://tutorials.iq.harvard.edu/Stata/StataIntro/StataIntro.html)
  • SPSS (http://www.spss-tutorials.com/)
  • MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/)
  • R (https://www.statmethods.net/r-tutorial/index.html)
The course includes daily illustrative lab exercises using these software packages, but please be aware that not every example is available in every package, and Stata is used much more than the other packages.

Representative Background Reading
1) Background on modelling social science data
Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.
Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, Tx: Stata Press. (See chapters 2-4).
Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Berkley, Ca: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
2) Introductions to multilevel models
Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.
Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.
Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge (c9).
3) Illustrative research articles which use multilevel modelling
Andersen, R., Yang, M., & Heath, A. F. (2006). Class Politics and Political Context in Britain, 1964-1997: Have Voters Become More Individualised? European Sociological Review, 22(2), 215-228.
Jen, M. H., Jones, K., & Johnston, R. J. (2009). Compositional and contextual approaches to the study of health behaviour and outcomes: Using multi-level modelling to evaluate Wilkinson’s income inequality hypothesis. Health and Place, 15, 198-203.
Maas, I., & Zijdeman, R. L. (2010). Beyond the local marriage market: The influence of modernization on geographical heterogamy. Demographic Research, 23(33), 933-962.
Rasbash, J., Leckie, G., Pillinger, R., & Jenkins, J. (2010). Children’s educational progress: Partitioning family, school and area effects. Journal of the Royal Statistical Society Series A, 173, 657-682.
Verbakel, E. (2013). Leisure values of Europeans from 46 countries. European Sociological Review, 29(3), 669-682.
Required texts:

The following text will be provided by the Summer School as part of your course material and used throughout the course:
Hox, J., Moerbeek, M. and van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, Third edition. London: Routledge.
For Stata users, we also recommend accessing or purchasing the following:
Rabe-Hesketh, S. and Skrondal, A. (2008/2012). Multilevel and Longitudinal Modeling Using Stata, Second Edition/Third Edition (2 volume set). College Station, Tx: Stata Press.

Background knowledge required
Statistics
OLS = moderate
Maximum Likelihood = elementary

Computer Background
Stata = elementary
R = elementary

The modules 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E tries to take a more introductory approach with regard to how statistical models are specified and how multilevel models link with other types of statistical model; 2E goes a little further on statistical details and estimation strategies, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN, with some illustration of Stata and R. Some students choose to take both courses – if doing so, there will be some reiteration of content, but there are plenty of detailed materials in both courses that point in different directions.

Lectures (L) / Computer practicals (P) / Readings (R)

Day 1  Monday 13 July         

Introduction to multilevel modelling (i)     

L1a:      The idea of multilevel modelling

L1b:      Course arrangements and overview

P1:       Getting started with multilevel software

R:         Hox et al. (2017: preface & c1)

Day 2  Tuesday 14 July        

Introduction to multilevel modelling (ii)

L2a:      The formulae, assumptions and interpretations of statistical and multilevel models

L2b:      Multilevel data structures and research examples

P2:       Exploring and summarising multilevel data and key elements of statistical modelling

R:         Hox et al. (2017: sections 2.1 & 4.2)

Day 3  Wednesday 15 July   

Multilevel applications with linear outcomes (i): Two-level random intercepts models

L3a:      The two-level random intercepts model  

L3b:      Interpreting random intercepts and their residuals

P3:       Two-level random intercepts model examples

R:         Hox et al. (2017: sections 2.2, 3.1 & 3.4)

Day 4  Thursday 16 July

Multilevel applications with linear outcomes (ii): Random intercepts and slopes      

L4a:      The two-level random slopes model

L4b:      Interpreting random intercepts and slopes

P4:       Models for random intercepts and slopes

R:         Hox et al. (2017: sections 4.1, 2.4 & 4.3)

Day 5  Friday 17 July

Multilevel applications for binary and other categorical outcomes      

L5a:      Multilevel models for binary outcomes

L5b:      Multinomial, ordered and count outcomes

P5:       Categorical outcomes in multilevel models

R:         Hox et al. (2017: c6, c7)

Day 6  Monday 20 July         

Multilevel techniques in context

L6a:      Cross-national comparisons and multilevel models

L6b:      Do we always need multilevel models?

P6:       Alternative statistical treatments for clustered data; testing and comparing hierarchical effects

R:         DiPrete and Forristal (1994); Bell et al. (2019b)

Day 7  Tuesday 21 July        

Multilevel models with more than two levels

L7a:      Hierarchical effects at three and more levels

L7b:      Cross-classified and multiple membership designs

P7:       Data and models with complex clustering 

R:         Hox et al. (2017: c2.3; c9)

Day 8  Wednesday 22 July

Special cases of multilevel modelling

L8a:      The relationship between multilevel models and structural equation models

L8b:      Special examples of multilevel models for longitudinal data analysis

P8:       Examples in analysing longitudinal data; Using SEMs as multilevel models

R:         Hox et al. (2017: c5; c10; c14)

Day 9  Thursday 23 July       

Multilevel modelling applications

L9a:      Class plenary: Selected hierarchical data and models

L9b:      Option: Review/Questions/Selected recap topics

P9:       Practical applications – extension topics

R:         Bell et al. (2019a)

Day 10  Friday 24 July

Review session         

L10a:     Next steps in advanced multilevel analysis

L10b:     The contribution of multilevel modelling

P10:      Multiprocess models and models for causal analysis; Lab review/recap opportunity

R:         Hox et al. (2017: c10-16); Browne et al. (2019)

Software used

Stata will be used most frequently. Many, but not all, exercises will also be available in SPSS, R and/or MLwiN. Prior knowledge of the above packages is not assumed but previous exposure to syntax programming in at least one of them will be beneficial, since the software examples in the course use ‘syntax’ modes of operation. A note on software is prepared for the course that discusses and illustrates how to use the packages involved (supplied as an appendix to the coursepack). Participants are asked to read through this note in the opening days of the course, and may need to spend additional study time in improving their software skills at this point if relevant. 

Recommended References

  • The text by Hox et al. (2017) will be supplied with the coursepack. The text by Rabe-Hesketh and Skrondal (2008/2012) is recommended for additional purchase (for Stata users) if possible.
  • There are several other texts with a wide range of materials on multilevel modelling that could be useful before and during the module – some of the most popular options are listed below. Additional and follow-up reading recommendations will be made during the course.

Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.

Heck, R. H., Thomas, S. L., & Tabata, L. N. (2013). Multilevel and Longitudinal Modeling with IBM SPSS, Second Edition. London: Routledge.

Hox, J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, 3rd Edition. London: Routledge. [ISBN: 9781138121362]

Luke, D. A. (2004). Multilevel Modeling. London: Sage.

Rabe-Hesketh, S., & Skrondal, A. (2008/2012). Multilevel and Longitudinal Modeling Using Stata, Second Edition/Third Edition. College Station, Tx: Stata Press [ISBN: 9781597180405, single-volume 2nd ed; 9781597181082, 2-volume set, 3rd ed]

Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.

  • The texts listed below are good alternative options for preparatory reading. They are either short introductions to multilevel modelling, or useful statements on statistical modelling in general. It is desirable but not essential to read one or more of these before the course.

Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.

DiPrete, T. A., & Forristal, J. D. (1994). Multilevel Models – Methods and Substance. Annual Review of Sociology, 20, 331-357.

Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, Tx: Stata Press. (Chapters 2-4).

Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Berkley, Ca: Sage.

Plewis, I. (1994). Longitudinal Multilevel Models. In A. Dale & R. B. Davies (Eds.), Analysing Social and Political Change: A casebook of methods. London: Sage.

Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.

Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.

  • These texts are cited in the ‘course outline’ above:

Bell, A., Fairbrother, M., & Jones, K. (2019a). Fixed and random effects models: making an informed choice. Quality & Quantity, 53(2), 1051-1074.

Bell, A., Holman, D., & Jones, K. (2019b). Using Shrinkage in Multilevel Models to Understand Intersectionality: A Simulation Study and a Guide for Best Practice. Methodology, 15(2), 88-96.

Browne, W. J., Charlton, C. M. J., Michaelides, D. T., Parker, R. M. A., Cameron, B., Szmaragd, C., . . . Moreau, L. (2019). A Beginner’s Guide to Stat-JR’s TREE Interface version 1.0.7. Bristol: Centre for Multilevel Modelling, University of Bristol & Electronics and Computer Science, University of Southampton.

DiPrete, T. A., & Forristal, J. D. (1994). Multilevel Models – Methods and Substance. Annual Review of Sociology, 20, 331-357.

Hox, J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, 3rd Edition. London: Routledge.