Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph – Social Inequalities and Occupational Stratification – that analyses data on social interaction patterns and social inequalities, and an introductory textbook – What is… Quantitative Longitudinal Data Analysis – that focusses upon the secondary analysis of longitudinal survey datasets.

**Course Content**

Social science data often features the ‘clustering’ or ‘hierarchical nesting’ of individual cases within larger units of analysis – for example in household surveys, where several individual responses may be clustered within the same household. Multilevel models are analytical tools designed for such scenarios. We often use them to undertake regression-style analyses that take account of the clustering (ignoring it would amount to a specification error), and we also use them when we particularly want to analyse and explore patterns related to the clustering itself. Multilevel models are an important, widely used tool in social statistics, and they are potentially relevant to almost any study that uses complex social data.

The course introduces multilevel modelling as a special case of statistical modelling in social research. Many of the course materials cover topics that are concerned with appropriately specifying and interpreting statistical models in general – for instance, interpreting parameters, and assessing model diagnostics and assumptions. When social scientists refer to multilevel models, they normally mean a specification that is also known as the ‘random effects’ model. The course features many materials on random effects models, but it also includes points of comparison with other modelling devices that can be used for clustered or nested data.
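As a minimal sketch of what the ‘random effects’ specification looks like, the simplest case is a two-level random-intercept model. The notation below is an assumption chosen for illustration (it is not taken from the course materials): i indexes individuals, j indexes clusters, x is a single explanatory variable, u is the cluster-level random effect and e is the individual-level residual.

```latex
% Random-intercept ('random effects') model for individual i in cluster j
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim N(0, \sigma_e^2)
```

Setting the cluster-level variance \sigma_u^2 to zero recovers an ordinary single-level regression, which is one way of seeing how the multilevel model extends more familiar regression models.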

The daily teaching sessions comprise lectures, followed by lab exercises that implement examples of the material covered in the lectures. The course coverage begins with the concepts and statistical formulations of multilevel models, and statistical models in general. The core materials address the specification of statistical models for linear and categorical outcome variables in the context of a multilevel data structure, with attention to the appropriate estimation and interpretation of the relevant model parameters. Extension materials address selected relevant specialist issues – research applications for longitudinal datasets; undertaking cross-national comparisons; working with complex clustering structures that involve multiple ‘higher levels’; and extending models when multiple indicator variables are available, when multiple linked processes might be analysed in relation to each other, and when there is an explicit desire to understand causal relationships. Many materials try to address the practical application of multilevel models, including training on the operationalisation of measures and the organisation of complex datasets, and on the specification and estimation of models with relevant software.

**Course Objectives**

The course seeks to provide participants with a solid grounding in the application of multilevel models. This involves combining a strong understanding of how multilevel models are formulated in statistical terms (and their relationship to other types of statistical model), with a fluency in handling data with clustered and hierarchical features and an ability to specify multilevel models in popular statistical analysis packages. The course seeks to convey both the attractions and limitations of a multilevel modelling approach as a strategy of statistical modelling.

The course will feature daily lab sessions with command files which illustrate handling data and specifying multilevel models in several software packages. Most often, lab examples use the Stata package, since that software features a wide range of options both for handling complex data and for specifying relevant statistical models. Selected examples are also given in other packages, including SPSS, R, and MLwiN (a specialist package designed explicitly for estimating multilevel models). Worked examples will be available in these packages using several different, often large-scale, social survey datasets. The variety of examples is designed to provide participants with important operational skills which are not widely taught.

There are a number of benefits to studying the practical application of multilevel models. Firstly, multilevel models are important devices for exploring the character of clustered or hierarchical structures within a dataset (for example, to compare the scale of pupil-level and class-level influences in an educational study which features pupils clustered within classes). Secondly, they are often used simply to control for hierarchical structural features within data (that is, when a pattern of clustering is not substantively important, but does need to be controlled for). Finally, a thorough introduction and review of the practical implementation of multilevel models also serves as an effective means of understanding the implementation and interpretation of statistical models in the social sciences more generally.
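To illustrate the first point, comparing the scale of pupil-level and class-level influences, the following is a rough sketch in Python rather than in one of the course packages. It simulates pupils clustered within classes and splits the outcome variance into a class-level and a pupil-level component, reporting the share attributable to classes (often called the intraclass correlation). Everything here is an assumption made for illustration: the function names, the simulated scores, the chosen variances, and the use of a simple one-way ANOVA (method-of-moments) estimator for balanced data. Multilevel software would normally estimate these components by maximum likelihood or related methods instead.

```python
import random
import statistics


def simulate_classes(n_classes, pupils_per_class, var_class, var_pupil, seed=0):
    """Simulate test scores for pupils nested in classes (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_classes):
        # One random effect shared by every pupil in the same class.
        u_j = rng.gauss(0.0, var_class ** 0.5)
        scores = [50.0 + u_j + rng.gauss(0.0, var_pupil ** 0.5)
                  for _ in range(pupils_per_class)]
        data.append(scores)
    return data


def variance_components(data):
    """Method-of-moments variance components for balanced clusters."""
    n_classes = len(data)
    n = len(data[0])  # pupils per class (assumed equal in every class)
    class_means = [statistics.fmean(scores) for scores in data]
    grand_mean = statistics.fmean(class_means)
    # Mean square between classes and mean square within classes.
    ms_between = n * sum((m - grand_mean) ** 2 for m in class_means) / (n_classes - 1)
    ms_within = sum((y - m) ** 2
                    for scores, m in zip(data, class_means)
                    for y in scores) / (n_classes * (n - 1))
    # Class-level variance, truncated at zero if the estimate is negative.
    var_class = max((ms_between - ms_within) / n, 0.0)
    return var_class, ms_within


# Simulated 'truth': class variance 1, pupil variance 3, so a class share of 0.25.
data = simulate_classes(n_classes=500, pupils_per_class=20,
                        var_class=1.0, var_pupil=3.0)
var_class, var_pupil = variance_components(data)
icc = var_class / (var_class + var_pupil)
print(f"class-level variance: {var_class:.2f}")
print(f"pupil-level variance: {var_pupil:.2f}")
print(f"share of variance at class level: {icc:.2f}")
```

With these simulated settings the estimated class-level share should land close to the value of 0.25 built into the simulation. In real data the share is unknown, and estimating it is one of the substantive questions a multilevel analysis can answer.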

**Course Prerequisites**

This is an introductory course, designed for people who have little or no previous experience in applying multilevel models. It is expected, however, that participants will have had some previous training in social statistics – for example, the course is best suited to participants who are fluent in popular descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests; correlation values), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions will take these sorts of regression models as a starting point, and build onwards to multilevel models and other related extension topics in statistical modelling. Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting regression outputs (e.g. Allison 1999; Tarling 2009), and all participants are encouraged to read at least one paper, chapter or book from the list below of ‘representative background reading’ prior to attending.

The course is also best suited to participants with at least some previous experience in using statistical software packages for social science data analysis. The course features lab materials spanning several packages (Stata, SPSS, R and MLwiN, with Stata used most often), and the lab materials also make use of several different social science datasets. Previous exposure to the ‘syntax’ languages of at least one of these packages will be an advantage, since the practical materials involve programming in these languages. The course should still be accessible to people who have little previous experience in this area, since background materials on the software packages will be made available, but students without some background in syntax-based programming should be prepared to put in extra effort during the opening days of the course in order to follow the lab exercises. A note on software, supplied as an appendix to the coursepack, discusses and illustrates how to use the packages involved. Participants are asked to read through this note in the opening days of the course, and they may need to spend additional study time improving their software skills at this point.

Software used (and suggested introductory online information):

§ Stata (https://www.stata.com/support/ ; http://tutorials.iq.harvard.edu/Stata/StataIntro/StataIntro.html)

§ SPSS (http://www.spss-tutorials.com/)

§ MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/)

§ R (https://www.statmethods.net/r-tutorial/index.html)

The course includes daily illustrative lab exercises using these software packages, but please be aware that not every example is available in every package, and Stata is used much more than the other packages.

**Representative Background Reading**

1) Background on modelling social science data

Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.

Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, Tx: Stata Press. (See chapters 2-4).

Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Berkley, Ca: Sage.

Tarling, R. (2009) Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.

2) Introductions to multilevel models

Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.

Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.

Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.

Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.

Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge (Chapter 9).

3) Illustrative research articles which use multilevel modelling

Andersen, R., Yang, M., & Heath, A. F. (2006). Class Politics and Political Context in Britain, 1964-1997: Have Voters Become More Individualised? European Sociological Review, 22(2), 215-228.

Jen, M. H., Jones, K., & Johnston, R. J. (2009). Compositional and contextual approaches to the study of health behaviour and outcomes: Using multi-level modelling to evaluate Wilkinson’s income inequality hypothesis. Health and Place, 15, 198-203.

Maas, I., & Zijdeman, R. L. (2010). Beyond the local marriage market: The influence of modernization on geographical heterogamy. Demographic Research, 23(33), 933-962.

Rasbash, J., Leckie, G., Pillinger, R., & Jenkins, J. (2010). Children’s educational progress: Partitioning family, school and area effects. Journal of the Royal Statistical Society Series A, 173, 657-682.

Verbakel, E. (2013). Leisure values of Europeans from 46 countries. European Sociological Review, 29(3), 669-682.

Required texts:

The following text will be provided by the Summer School as part of your course material and used throughout the course:

Hox, J., Moerbeek, M. and van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, Third edition. London: Routledge.

For Stata users, we also recommend accessing or purchasing the following:

Rabe-Hesketh, S. and Skrondal, A. (2008/12) Multilevel and Longitudinal Modelling Using Stata, Second Edition/3rd Edition (2 volume set). College Station, Tx: Stata Press.

**Background knowledge required**

*Statistics*

OLS = moderate

Maximum Likelihood = elementary

*Computer Background*

Stata = elementary

R = elementary

Modules 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E takes a more introductory approach to how statistical models are specified and how multilevel models link with other types of statistical model; 2E goes a little further on statistical details and estimation strategies, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples, but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN, with some illustration of Stata and R. Some students choose to take both courses; doing so involves some repetition of content, but there are plenty of detailed materials in both courses that point in different directions.