Please note: This course will be delivered in person at the Colchester campus only. Online study is not available for this course.
Paul Lambert is a Professor of Sociology at the University of Stirling, UK, where he teaches courses on research methods and on social stratification. His research covers methodological topics in social survey data analysis and data management (with a particular interest in handling data on occupations and on ethnicity), and substantive studies into processes of social stratification and social inequality. His recent publications include a research monograph – Social Inequalities and Occupational Stratification – that analyses data on social interaction patterns and social inequalities, and an introductory textbook – What is… Quantitative Longitudinal Data Analysis – that focusses upon the secondary analysis of longitudinal survey datasets.
Statistical models are important tools for analysing quantitative datasets. In the social sciences, it is also common to refine or adjust models, beyond their standard formulations, in order to take account of the complexities of ‘real life’ social data. Participants in course 1E will learn about statistical models in the social sciences and about certain popular strategies of using models to analyse complex or multilevel data. Students will learn:
– how to specify, formulate and interpret common types of statistical model
– how to understand, implement and interpret multilevel models
– how to assess and compare other ways of taking account of complex and multilevel data within a modelling framework
– how to enhance complex data, for example by merging variables or datasets, and how to analyse them with appropriate statistical models
Description of course activities
Students taking course 1E explore the challenges of applying statistical models to multilevel and complex data. Daily teaching sessions comprise lectures followed by lab exercises that implement examples of the methods described in the lectures. The teaching style aims to present statistical materials in an accessible manner that assumes only introductory previous knowledge. Lab sessions contain a wealth of illustrative examples, some of which provide opportunities to develop understanding of quite challenging research issues.
The course begins by reviewing how statistical models work in general terms, and how we implement them in social research. Materials cover general aspects of the appropriate specification and interpretation of regression and statistical models – for instance, on building models, interpreting their parameters, and assessing how well they fit the data.
The course then describes features of social science data that are often described as ‘complex’. We stress how these are normal features of social research data, and that there are relevant adjustments that we can make to a statistical analysis in response.
The course then examines a selection of important strategies that are relevant when analysing multilevel and complex data. The specific topics that are explored are chosen because they arise frequently in research, and the strategies that we apply to them can easily impact the results of our analyses.
The first concerns the use of multilevel models with random effects (about half of the course materials). Multilevel models are a popular and widely used tool for analysing ‘clustered’ or ‘hierarchical’ datasets. Social science data often features the clustering of individual cases within larger units of analysis – for example in household surveys, when there may be several individual responses clustered within the same household. Multilevel models with random effects can allow us to use statistical models to analyse such data in a way which appropriately takes account of that clustering. In doing so, multilevel models are also linked to valuable conceptual distinctions such as the interplay between micro- and macro-level influences. The course materials introduce and contextualise multilevel models, with practical training in running them and comparing them to other models. We spend time on models with ‘random intercepts’ and ‘random slopes’, with linear and categorical outcome measures, and we explore models that can be used with increasingly complex cluster structures (e.g. ‘three-level’, ‘four-level’, and ‘cross-classified’ data structures).
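As a rough illustration of the 'random intercepts' and 'random slopes' models mentioned above, a two-level linear model for individual i in cluster j (for example, a person within a household) might be written as follows. The notation is a common textbook convention and is an assumption here rather than the course's own; the symbols x, beta, u and e are illustrative.

```latex
% Random intercepts: each cluster j has its own intercept shift u_{0j}
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \sigma_{u0}^2), \quad e_{ij} \sim N(0, \sigma_e^2)

% Random slopes: the effect of x also varies across clusters via u_{1j}
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}

% Share of residual variance at the cluster level (random intercepts model)
\rho = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \sigma_e^2}
```

In this sketch the cluster-level terms u are the 'random effects', and the ratio rho (often called the intra-cluster correlation) summarises how much of the unexplained variation lies between clusters rather than between individuals.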
Further content then introduces other important adjustments that social scientists often make in response to multilevel and complex data. It is also possible to analyse multilevel data with models that don’t use random effects but include some other adjustment or extension (e.g. ‘fixed effects’ for clusters, and ‘robust standard errors’). Also relevant are adjustments to models in response to complex sampling and missing data, such as using sampling weights within a model-based analysis. Beyond model specification, we often adapt to complex or multilevel data through data preparation work that is designed to enhance complex data resources. Examples here include making choices over the operationalisation of variables in a statistical modelling analysis, linking information between different data sources, and organising data files that aren’t neatly rectangular. Such practical topics are sometimes neglected in research training, but they feature within course 1E because they can be very valuable aspects of adapting to and properly exploiting complex and multilevel data.
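To give a flavour of the data linkage described above, the following minimal sketch attaches household-level variables to individual-level records using a shared identifier. It is written in plain Python purely for illustration (the course labs use Stata, SPSS, R and MLwiN), and all variable names and values are hypothetical.

```python
# Hypothetical individual-level records: one row per survey respondent.
individuals = [
    {"person_id": 1, "household_id": "H1", "income": 21000},
    {"person_id": 2, "household_id": "H1", "income": 34000},
    {"person_id": 3, "household_id": "H2", "income": 18000},
]

# Hypothetical household-level records, keyed by the household identifier.
households = {
    "H1": {"region": "North", "household_size": 2},
    "H2": {"region": "South", "household_size": 1},
}


def link_household_data(individuals, households):
    """Return individual records with household-level variables attached."""
    linked = []
    for person in individuals:
        # Look up the household record; use an empty dict if none is found.
        household = households.get(person["household_id"], {})
        linked.append({**person, **household})
    return linked


linked = link_household_data(individuals, households)
for row in linked:
    print(row)
```

The linked file keeps one row per individual, with household characteristics repeated for members of the same household, which is the usual layout for fitting a two-level model.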
The course seeks to provide participants with a strong understanding of how statistical models can be applied in the social sciences when data is complex and/or multilevel in its nature.
Participants should learn:
– how relevant statistical models are formulated and interpreted
– the relative attractions and limitations of different model strategies
– practical skills in fluently handling and analysing complex data using one or more relevant software packages
There are a number of benefits to learning how to understand and to implement statistical models for complex and multilevel data. Multilevel models are widely used in the social sciences, so there are many good reasons to learn in detail about their theory and their practical implementation. Further course materials explore several other important but under-utilised options in the specification of statistical models and in making good use of complex datasets. Training in these areas should give course participants the confidence to compare the strengths and limitations of different plausible models, and equip them with valued practical skills in using software to work with data and run statistical models.
This is an introductory course, designed for people who have little or no previous experience in applying models to multilevel or complex data. It is expected that participants will have had some previous training in social statistics – for example, the course is best suited to participants who are fluent in popular descriptive analytical techniques and some of the statistical tests behind them (e.g. chi-square tests; correlation values), and who have had at least some previous exposure to using regression models in the social sciences (e.g. multiple regression and/or logistic regression). Teaching sessions will take basic versions of these regression models as a starting point, and build onwards to multilevel models and other related extension topics in statistical modelling. Most participants are likely to benefit from preparatory study or revision of materials which cover generating and interpreting regression outputs.
The course is also best suited to participants with at least some previous experience in using statistical software packages for social science data analysis. The course features lab materials available in several packages (Stata, SPSS, R and MLwiN, with Stata used most often). The lab materials also make use of several different social science datasets. Previous exposure to the ‘syntax’ languages of at least one of these packages will be an advantage. Course materials include some introductory documentation to help with using software, so the course should still be accessible to people who have little previous experience. However, students without any background in programming software using syntax should expect that extra effort will probably be needed near the start of the course in order to make good use of the lab exercises.
This text will be provided by ESS: Hox, J., Moerbeek, M. and van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications, Third edition. London: Routledge.
For Stata users, we also recommend accessing or purchasing the following:
Rabe-Hesketh, S. and Skrondal, A. (2022). Multilevel and Longitudinal Modeling Using Stata, Fourth edition (2 volume set). College Station, TX: Stata Press.
Background knowledge required:
Calculus = Elementary. Some background knowledge is required.
Linear Regression = Elementary. Some background knowledge is required.
OLS = Elementary. Some background knowledge is required.
Stata = Elementary. Some background knowledge is required.
Experience in using syntax code to run commands in at least one of Stata, SPSS or R is advantageous. It is not essential, as introductory information will be available; however, lab materials do involve using syntax code in at least one of those languages. Coverage of Stata examples is more substantial than coverage of SPSS and R.
Courses 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E takes a more introductory approach to how statistical models are specified and how multilevel models link with other types of statistical model, and it includes some wider topics about strategies for dealing with complex data aside from using multilevel models; 2E goes a little further on statistical details and estimation strategies for multilevel models, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples, but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN and R, with some additional illustration of Stata. Some students choose to take both courses. Those who do will find some content repeated, but there are plenty of detailed materials in both courses that point in different directions.