Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research and publications cover methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality.
Social science data often features the ‘clustering’ or ‘hierarchical nesting’ of individual cases within larger units of analysis – for example in household surveys, when there may be several individual responses clustered within the same household. Multilevel models are statistical models which provide analytical tools for dealing with data of this nature. They provide a convenient means to undertake regression analysis which takes account of, and can help to summarise, patterns of clustering. As such, multilevel models are an important tool in social statistics, and are of potential relevance to almost any study using complex social data.
This course provides an applied introduction to multilevel modelling for social science datasets. It will introduce the statistical features of multilevel models, deal with approaches to handling data which has clustered or hierarchical elements, and provide training in specifying multilevel models for linear and categorical outcome variables in a variety of survey data scenarios. When social scientists refer to multilevel models, they normally mean a specification that is also known as the ‘random effects’ model. This course concentrates upon random effects models, but also features many points of comparison with other modelling devices that can be used for clustered or nested data. The course also emphasises the practical application of multilevel models, and seeks to convey both the attractions and limitations of a multilevel modelling approach.
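As a brief illustration of the ‘random effects’ specification mentioned above, a two-level random intercept model for a linear outcome can be written as follows. The notation is an assumption made for this sketch (i indexes individuals, j indexes clusters such as households) and is not taken from the course materials, which may use different symbols.

```latex
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \\
  u_j    &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2),
\end{align}
% y_{ij}: outcome for individual i in cluster j
% x_{ij}: an explanatory variable
% u_j:    cluster-level random effect (shared by all cases in cluster j)
% e_{ij}: individual-level residual
```

Setting the cluster-level variance to zero (so that every u_j is zero) returns the conventional single-level regression model, which is one way of seeing how the multilevel model extends it.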
The course seeks to provide participants with a solid grounding in the application of multilevel models. This involves combining a strong understanding of how multilevel models are formulated in statistical terms (and their relationship to other types of statistical model), with a fluency in handling data with clustered and hierarchical features and an ability to specify multilevel models in popular statistical analysis packages.
The course will feature lab sessions with command files which illustrate handling data and specifying multilevel models in several software packages. Most often, lab examples use the Stata package, since that software features a wide range of options both for handling complex data and for specifying multilevel models. Selected examples are also given in other packages, including SPSS, the freeware R, and MLwiN (a specialist package, designed explicitly for estimating multilevel models). Worked examples will be available in these packages using several different, often large-scale, social survey datasets: this is an ambitious objective which seeks to provide participants with important operational skills which are not widely taught.
There are a number of benefits to studying the practical application of multilevel models. Firstly, multilevel models are important devices for exploring the character of clustered or hierarchical structures within a dataset (for example, to compare the scale of pupil-level and class-level influences in an educational study which features pupils clustered within classes). Secondly, they are often used simply to control for hierarchical structural features within data (that is, when a pattern of clustering is not substantively important, but does need to be controlled for). Finally, a thorough introduction and review of the practical implementation of multilevel models also serves as an effective means of refreshing understanding of the implementation and interpretation of statistical models in the social sciences more generally.
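The comparison of pupil-level and class-level influences described above is often summarised by the share of total variance that lies between classes (the intra-class correlation). The following is a minimal sketch in Python, using only the standard library and simulated data. The function names, the simulated scores and the simple one-way ANOVA (moment) estimator are all assumptions made for illustration; the course labs use Stata, SPSS, R and MLwiN, which typically estimate these quantities by maximum likelihood rather than by this shortcut.

```python
import random
import statistics


def simulate_classes(n_classes, pupils_per_class, sd_class, sd_pupil, seed=0):
    """Simulate test scores for pupils nested within classes (hypothetical data).

    Each class gets its own random effect, shared by all of its pupils;
    each pupil additionally gets an individual residual.
    """
    rng = random.Random(seed)
    classes = []
    for _ in range(n_classes):
        class_effect = rng.gauss(0.0, sd_class)
        scores = [50.0 + class_effect + rng.gauss(0.0, sd_pupil)
                  for _ in range(pupils_per_class)]
        classes.append(scores)
    return classes


def anova_icc(classes):
    """Estimate class-level variance, pupil-level variance and their ratio.

    Uses the one-way ANOVA estimator for balanced data (equal class sizes).
    Returns (class_variance, pupil_variance, intra_class_correlation).
    """
    n = len(classes[0])
    if any(len(scores) != n for scores in classes):
        raise ValueError("this sketch assumes equal class sizes")
    class_means = [statistics.fmean(scores) for scores in classes]
    # Mean square between classes and mean square within classes.
    ms_between = n * statistics.variance(class_means)
    ms_within = statistics.fmean([statistics.variance(s) for s in classes])
    # The moment estimate can be negative in small samples; truncate at zero.
    class_variance = max((ms_between - ms_within) / n, 0.0)
    icc = class_variance / (class_variance + ms_within)
    return class_variance, ms_within, icc


# Simulated truth: class variance 1, pupil variance 4, so the share is 0.2.
scores = simulate_classes(500, 30, sd_class=1.0, sd_pupil=2.0, seed=42)
class_var, pupil_var, icc = anova_icc(scores)
print(f"class variance {class_var:.2f}, pupil variance {pupil_var:.2f}, ICC {icc:.2f}")
```

In this simulated example, roughly one fifth of the variation in scores is attributable to differences between classes; a multilevel model provides the same kind of partition while also allowing explanatory variables at either level.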
This is an introductory course, but participants will benefit from having moderate levels of previous statistical training and previous experience in using statistical software (see descriptions below). Prior to attending the course, all participants are encouraged to read a research article that uses a multilevel model, and an introductory article or chapter on the methodology (suggestions given below).
The course is suitable for participants who have received statistical training at least to the level of understanding the application of conventional regression modelling approaches (e.g. multiple regression and logistic regression), and who are fluent in popular descriptive analytical techniques and the statistical tests behind them (e.g. chi-square tests; correlation values). Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting the outputs from conventional regression analyses, such as on coefficient effects and indicators of model fit (e.g. Allison 1999; Tarling 2009). The course will take conventional regression models as its starting point, and build onwards to multilevel models and other related extension topics in statistical modelling.
The course is best suited to participants with at least some previous experience in using statistical software packages for social science data analysis. The course features lab materials spanning several packages (Stata, SPSS, R and MLwiN, with Stata used most often). It also uses several different social science datasets. Previous exposure to the ‘syntax’ languages of these packages will be an advantage, since the practical materials involve programming in these languages. The course should be accessible to people who have little previous experience in this area, since background materials on the software packages will be made available. However, students without some background in programming software using syntax should expect that extra effort will probably be required during the opening days of the course in order to follow the lab exercises. A note on software has been prepared for the course that discusses and illustrates how to use the packages involved (supplied as an appendix to the coursepack). Participants are asked to read through this note in the opening days of the course, and may need to spend additional study time improving their software skills at this point.
Software used (and suggested introductory online information):
Stata (https://www.stata.com/support/ ; http://tutorials.iq.harvard.edu/Stata/StataIntro/StataIntro.html)
We stress that not every example is available in every package. Stata will be used much more than the other packages.
Representative Background Reading
1) Background on modelling social science data
Allison, P. D. (1999). Multiple Regression: A primer. London: Sage.
Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata, 3rd Edition. College Station, Tx: Stata Press. (See chapters 2-4).
Menard, S. (2001). Applied Logistic Regression Analysis, Second Edition. Thousand Oaks, Ca: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
2) Introductions to multilevel models
Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s Just Regression! New York: The Guilford Press.
Plewis, I. (1998). Multilevel Models. Social Research Update, 23, http://sru.soc.surrey.ac.uk/SRU23.html.
Robson, K., & Pevalin, D. (2016). Multilevel Modeling in Plain Language. London: Sage.
Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel Analysis: An introduction to basic and advanced multilevel modelling, 2nd Edition. London: Sage.
Tarling, R. (2009). Statistical Modelling for Social Researchers: Principles and practice. London: Routledge (Chapter 9).
3) Illustrative research articles which use multilevel modelling
Andersen, R., Yang, M., & Heath, A. F. (2006). Class Politics and Political Context in Britain, 1964-1997: Have Voters Become More Individualised? European Sociological Review, 22(2), 215-228.
Jen, M. H., Jones, K., & Johnston, R. J. (2009). Compositional and contextual approaches to the study of health behaviour and outcomes: Using multi-level modelling to evaluate Wilkinson’s income inequality hypothesis. Health and Place, 15, 198-203.
Maas, I., & Zijdeman, R. L. (2010). Beyond the local marriage market: The influence of modernization on geographical heterogamy. Demographic Research, 23(33), 933-962.
Rasbash, J., Leckie, G., Pillinger, R., & Jenkins, J. (2010). Children’s educational progress: Partitioning family, school and area effects. Journal of the Royal Statistical Society Series A, 173, 657-682.
Verbakel, E. (2013). Leisure values of Europeans from 46 countries. European Sociological Review, 29(3), 669-682.
The following text will be provided by the Summer School as part of your course material and used throughout the course:
Hox, J., Moerbeek, M. and van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, Third edition. London: Routledge.
For Stata users, we also recommend accessing or purchasing the following:
Rabe-Hesketh, S. and Skrondal, A. (2008/12). Multilevel and Longitudinal Modelling Using Stata, Second Edition/Third Edition (2 volume set). College Station, Tx: Stata Press.
Background knowledge required
OLS = m
Maximum Likelihood = e
Stata = e
R = e
e = elementary, m = moderate, s = strong