Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph Social Inequalities and Occupational Stratification that analyses data on social interaction patterns and social inequalities, and an introductory textbook What is… Quantitative Longitudinal Data Analysis that focusses upon the secondary analysis of longitudinal survey datasets.
Social science data often features the ‘clustering’ or ‘hierarchical nesting’ of individual cases within larger units of analysis – for example in household surveys, when there may be several individual responses clustered within the same household. Multilevel models are analytical tools that are designed for such scenarios – we often use them in order to undertake regression-style analyses which take account of the clustering (when we would otherwise have made a specification error if we ignored it), and we also use multilevel models when we particularly want to analyse and describe patterns that are related to a hierarchical data structure. Multilevel models are an important, widely used tool in social statistics, and they are of potential relevance to almost any study that uses complex social data.
This course introduces multilevel modelling as a special case of statistical modelling in social research. The first materials cover what multilevel models are and the situations in which they can be most useful. Introductory materials also explore more general aspects of the appropriate specification and interpretation of regression and statistical models – for instance, on building models, interpreting their parameters, and assessing how well they fit the data. A second group of materials spends time introducing some of the most commonly used examples of multilevel models, such as the ‘random intercepts’ and ‘random slopes’ multilevel models, and the statistical model formulations that can be used when outcome variables are not linear. These materials also make many connections to more general themes in using statistical models. Lastly, we try to provide an accessible overview of a number of special scenarios and more advanced topics that often become relevant when using multilevel models and statistical models to study social processes – for instance, dealing with more complex hierarchical data structures, adapting models in order to focus on particular types of social process, and considering the strengths and weaknesses of different options in specifying and estimating statistical models. The daily teaching sessions comprise lectures, followed by lab exercises that implement examples of things that were described in lectures.
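As a rough illustration of the two model types named above, a two-level model for an outcome y measured on individual i within cluster j, with a single explanatory variable x, might be written as follows. The notation here is an assumption made for illustration and is not taken from the course materials; the lectures may use different symbols.

```latex
% Random intercepts: each cluster j has its own intercept shift u_{0j}
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \sigma^2_{u0}),
\quad e_{ij} \sim N(0, \sigma^2_{e})

% Random slopes: the effect of x also varies across clusters through u_{1j}
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^2_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}
\right)
```

Setting the cluster-level terms to zero in either expression recovers an ordinary single-level regression, which is one way of seeing the multilevel model as an extension of the regression models that the course takes as its starting point.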
The course seeks to provide participants with a strong understanding of how multilevel models are used in the social sciences, and of how to use multilevel models in their own research. Through lectures and study exercises, participants should learn how multilevel models are formulated in statistical terms, how they are related to other types of statistical model, and their relative attractions and limitations as a strategy of statistical modelling in research applications. Through the lab programme, participants should also develop skills and fluency in handling data with clustered and hierarchical features and in specifying multilevel models in relevant software packages.
The course’s lab programme features daily sessions where command files are provided which illustrate handling data and specifying multilevel models in several software packages. Most often, lab examples use the Stata package, since that software features a wide range of options both for handling complex data and for specifying relevant statistical models. Many examples are also given in SPSS, R, and MLwiN (a specialist package, designed explicitly for estimating multilevel models). Worked examples will be available in these packages using several different, often large-scale, social survey datasets. The variety of examples is designed to provide participants with important operational skills which are not widely taught.
There are a number of benefits to learning how to understand and to implement multilevel models through this course. Firstly, multilevel models are important devices for exploring the character of clustered or hierarchical structures within a dataset (for example, to compare the scale of pupil-level and class-level influences in an educational study which features pupils clustered within classes). Secondly, multilevel models often provide important controls for hierarchical structural features within data (that is, when a pattern of clustering is not substantively important, but does need to be controlled for). Finally, a thorough introduction and review of the practical implementation of multilevel models also serves as an effective means of understanding the implementation and interpretation of statistical models in the social sciences more generally.
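For the pupils-within-classes example, one common summary of the relative scale of the two levels is the share of residual variance that lies between classes. The expression below is a minimal sketch under the assumption of a two-level random intercepts model; the symbols are introduced here for illustration only, with the class-level variance written as sigma-squared-u and the pupil-level variance as sigma-squared-e.

```latex
% Variance partition coefficient (also called the intra-class correlation
% in a random intercepts model): proportion of variance at the class level
\rho = \frac{\sigma^2_{u}}{\sigma^2_{u} + \sigma^2_{e}}
```

A value of rho close to 0 would suggest that very little of the variation is attributable to classes, whereas larger values would suggest that class-level influences are more substantial.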
This is an introductory course, designed for people who have little or no previous experience in applying multilevel models. It is expected, however, that participants will have had some previous training in social statistics – for example, the course is best suited to participants who are fluent in popular descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests; correlation values), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions will take basic versions of these regression models as a starting point, and build onwards to multilevel models and other related extension topics in statistical modelling. Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting regression outputs.
The course is also best suited to participants with at least some previous experience in using statistical software packages for social science data analysis. The course features lab materials spanning several packages (Stata, SPSS, R and MLwiN, with Stata used most often; we encourage participants to explore all of these packages at some stage in the course, though most participants do focus mainly on only one of them). The lab materials also make use of several different social science datasets. Previous exposure to the ‘syntax’ languages of at least one of these packages will be an advantage, since the practical materials involve programming in these languages. The course should still be accessible to people who have limited previous experience with software, since background materials on the software packages will be made available, but students without any background in programming with syntax should expect that extra effort will probably be required during the opening days of the course in order to follow the lab exercises. A note on software is prepared for the course that discusses and illustrates how to use the packages involved (supplied as an appendix to the course pack). Participants are asked to read through this note in the opening days of the course, and they may need to spend additional study time improving their software skills at this point if relevant.
§ Stata (https://www.stata.com/support/; http://tutorials.iq.harvard.edu/Stata/StataIntro/StataIntro.html)
§ SPSS (http://www.spss-tutorials.com/)
§ MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/)
§ R (https://www.statmethods.net/r-tutorial/index.html)
The course includes daily illustrative lab exercises using these software packages, but please be aware that not every example is available in every package, and Stata is used relatively more than the other packages.
Representative Background Reading
1) Background on modelling social science data
Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.
Long, J.S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, Tx: Stata Press. (See chpts 2-4).
Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Berkley, Ca: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
2) Introductions to multilevel models
Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.
Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.
Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.
Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge (C.9).
3) Illustrative research articles which use multilevel modelling
Andersen, R., Yang, M., & Heath, A. F. (2006). Class Politics and Political Context in Britain, 1964-1997: Have Voters Become More Individualised? European Sociological Review, 22(2), 215-228.
Jen, M. H., Jones, K., & Johnston, R. J. (2009). Compositional and contextual approaches to the study of health behaviour and outcomes: Using multi-level modelling to evaluate Wilkinson’s income inequality hypothesis. Health and Place, 15, 198-203.
Maas, I., & Zijdeman, R. L. (2010). Beyond the local marriage market: The influence of modernization on geographical heterogamy. Demographic Research, 23(33), 933-962.
Rasbash, J., Leckie, G., Pillinger, R., & Jenkins, J. (2010). Children’s educational progress: Partitioning family, school and area effects. Journal of the Royal Statistical Society Series A, 173, 657-682.
Verbakel, E. (2013). Leisure values of Europeans from 46 countries. European Sociological Review, 29(3), 669-682.
The following text will be provided by the Summer School as part of your course material and used throughout the course:
Hox, J., Moerbeek, M. and van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, Third edition. London: Routledge.
For Stata users, we also recommend accessing or purchasing the following:
Rabe-Hesketh, S. and Skrondal, A. (2008/12) Multilevel and Longitudinal Modelling Using Stata, Second Edition/3rd Edition (2 volume set). College Station, Tx: Stata Press.
Background knowledge required
OLS = moderate
Maximum Likelihood = elementary
Stata = elementary
R = elementary
The modules 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E tries to take a more introductory approach with regard to how statistical models are specified and how multilevel models link with other types of statistical model; 2E goes a little further on statistical details and estimation strategies, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN and R, with some additional illustration of Stata. Some students choose to take both courses – if doing so, they will find that some content is repeated, but there are plenty of detailed materials in both courses that point in different directions.