Please note: This course will be delivered in person at the Colchester campus. Online study is not available for this course.
Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph – Social Inequalities and Occupational Stratification – that analyses data on social interaction patterns and social inequalities, and an introductory textbook – What is… Quantitative Longitudinal Data Analysis – that focusses upon the secondary analysis of longitudinal survey datasets.
Course Content:
This module takes a wide-ranging approach to studying multilevel statistical models in the social sciences. Materials will cover:
– how to understand, implement and interpret multilevel statistical models
– how to assess and compare different ways of taking account of complex and multilevel data within a statistical analysis
– how to work with complex social science data as part of the process of developing a useful statistical analysis
– how to use Stata for the preparation and analysis of complex social science datasets*
*(some course materials are also available in R, SPSS and MLwiN)
A ‘multilevel’ data structure is one where the units of analysis can be located within wider hierarchical groups or ‘clusters’, and a multilevel statistical model is one that features an adjustment designed to appropriately recognise that structure within a model-based analysis. In the social sciences, there are a great many scenarios where data has some sort of multilevel structure, and hence where a multilevel statistical model might be a helpful approach. Examples include educational datasets where students might be clustered into classes or institutions, and longitudinal panel datasets where multiple responses can be thought of as clustered within survey respondents.
Description of course activities
Students taking course 1E explore the challenges of applying multilevel statistical models to social science data. Daily teaching sessions comprise lectures followed by lab exercises that implement examples that were described in lectures. The teaching style tries to present statistical materials in an accessible manner that assumes only introductory previous knowledge. Lab sessions contain a wealth of illustrative examples, some of which provide opportunities to develop understanding of quite challenging research issues.
The course begins with selected foundational content relevant to using multilevel statistical models in the social sciences. We discuss how statistical models work in general terms, and how we implement them in social research, including best practice in using statistical software to run models. Topics include the appropriate specification and interpretation of regression and statistical models – for instance, on building models, interpreting their parameters, and assessing how well they fit the data. The course also reviews features of social science data that are often described as ‘complex’. A multilevel data structure is one example of complexity, but there are others too that often arise in social research data, and the best analyses are able to compare and contrast different options in different scenarios.
The course then turns to the popular strategy of using multilevel models with random effects as a way of studying society. Random effects models are a popular and widely used tool for analysing ‘clustered’ or ‘hierarchical’ datasets. Multilevel models with random effects let us use statistical models to analyse such data in a way which appropriately takes account of that clustering. In doing so, multilevel models are also linked to valuable conceptual distinctions such as the interplay between micro- and macro-level influences. The course materials introduce and contextualise multilevel models, with practical training in running them and comparing them to other models. Within week 1 of the course, we spend time on ‘2-level’ multilevel models with ‘random intercepts’ and ‘random slopes’, and on approaches that can be used for linear and categorical outcome measures.
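As a minimal sketch of the ‘2-level’ models mentioned above, a linear model with a random intercept and a random slope can be written as follows. The notation is illustrative and assumed here rather than taken from the course materials: i indexes individuals, j indexes clusters, y is the outcome and x is a single explanatory variable.

```latex
% Two-level linear model: individual i nested within cluster j.
% All symbols are illustrative assumptions, not the course's own notation.
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij}, \\
  \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
    &\sim N\!\left(
      \begin{pmatrix} 0 \\ 0 \end{pmatrix},
      \begin{pmatrix} \sigma^2_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}
    \right), \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2_{e}).
\end{align}
% Random intercept model: the special case with u_{1j} = 0 for all j,
% so that clusters differ only in their intercept \beta_0 + u_{0j}.
```

Here the cluster-level terms u_{0j} and u_{1j} are the ‘random effects’: the first lets each cluster have its own intercept, and the second lets the effect of x vary between clusters.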
As the course moves into week 2, we turn to a selection of important adjustments and extension issues that often arise when social scientists work with multilevel research data. Many examples continue to use random effects models, but with adaptations that are designed to account for more complex cluster structures (e.g. ‘three-level’, ‘four-level’, and ‘cross-classified’ data structures), more specialist analytical scenarios (e.g. longitudinal analyses), and/or other data complexities such as missing data or sampling weights.
The course also has coverage of data preparation work that is designed to enhance complex data resources. Examples here include making choices over the operationalisation of variables in a statistical modelling analysis, and ways of linking information between different data sources. Such practical topics are sometimes neglected in research training, but they feature within course 1E because they can be very valuable aspects of adapting to and properly exploiting research data.
Not all our analytical examples use random effects models. Many studies make an appropriate choice to analyse multilevel data with models that don’t use random effects but include some other adjustment or extension (e.g. ‘fixed effects’ for clusters, and ‘robust standard errors’), and our materials discuss these options as well as selected other modelling strategies such as structural equation models. Indeed, data analysis in the social sciences often involves comparing or choosing between several different plausible options, and the last materials in the module focus on weighing up and critically evaluating the relative benefits of multilevel statistical models in comparison to other strategies of analysis that might be considered. We hope participants will leave the course not just with an awareness of what multilevel statistical models are and how they can be formulated, but also with an appreciation of whether or not they are likely to make a useful difference in a given analytical scenario.
Course Objectives:
The course seeks to provide participants with a strong understanding of how multilevel statistical models can be applied in the social sciences, and of when they are most likely to be helpful.
Participants should learn how relevant statistical models are formulated and interpreted, the relative attractions and limitations of different model strategies, and practical skills in fluently handling and analysing complex data using Stata.
There are a number of benefits to learning how to understand and to implement multilevel statistical models. Multilevel models are widely used in the social sciences, so there are many good reasons to learn in detail about their theory and their practical implementation. Further course materials explore several other important but under-utilised options in the specification of statistical models and in making good use of complex datasets. Training in these areas should give course participants the confidence to compare the strengths and limitations of different plausible models, and equip them with valued practical skills in using software to work with data and run statistical models.
Course Prerequisites:
This is an introductory course in multilevel statistical models. Concepts, basic algebraic formulae, and extension issues that are associated with using multilevel statistical models will all be introduced in ways that should not require substantial background knowledge.
It is expected that participants will have had some previous training in social statistics – for example, the course is best suited to participants who are fluent in popular descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests; correlation values), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions begin by recapping features of the most common regression models, but move quite rapidly to relatively more complex aspects of statistical models and related extension topics. Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting standard regression outputs.
The course is also best suited to participants with at least some previous experience in using statistical software through code or ‘syntax’. The course features lab sessions which primarily give examples in using Stata for tasks of both data preparation and analysis. Course materials include some introductory documentation to help with using software, and for this reason the course should still be accessible to people who have little previous software experience. However, students without any background in using syntax should be prepared to put in extra effort near the start of the course in order to make good use of the lab exercises.
The course also features optional materials in R, SPSS and MLwiN. These do not have as extensive a range of examples as the Stata materials, but they do provide coverage of the majority of lecture topics. Ordinarily the course should be taken by participants who already use, or are planning to use, Stata. However, in certain circumstances the combination of materials might still be productive for a researcher who is unlikely to work with Stata (prospective students are welcome to contact the course leader in advance to discuss whether the course is likely to be suitable in this regard).
Required texts (these will be provided by ESS)
Hox, J., M. Moerbeek, and R. van de Schoot. 2017. Multilevel Analysis: Techniques and Applications (Third Edition). London: Routledge.
Rabe-Hesketh, S., and A. Skrondal. 2022. Multilevel and Longitudinal Modeling Using Stata (Fourth Edition, two-volume set). College Station, TX: Stata Press.
Background knowledge required:
Mathematics:
Calculus = Elementary
Linear Regression = Moderate
Statistics:
OLS = Elementary
Maximum likelihood = Elementary
Computer Background:
Stata = Elementary
Experience in using syntax code to run commands in at least one of Stata, SPSS or R is advantageous. It is not essential, as introductory information will be available; however, lab materials do involve using syntax code in at least one of those languages. Coverage of Stata examples is more substantial than coverage of SPSS and R.
The modules 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E tries to take a more introductory approach with regard to how statistical models are specified and how multilevel models link with other types of statistical model, and it includes some wider topics about strategies for dealing with complex data aside from using multilevel models; 2E goes a little further on statistical details and estimation strategies for multilevel models, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples, but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN and R, with some additional illustration of Stata. Some students choose to take both courses. Those who do will find some content repeated, but both courses contain plenty of detailed materials that point in different directions.