Please note: This course will be taught online only. In person study is not available for this course. 

Andrew Bell is Senior Lecturer in Quantitative Social Sciences at the Sheffield Methods Institute at the University of Sheffield. Before moving to Sheffield, Andy was a lecturer at the University of Bristol, where he also completed his undergraduate degree (in Geography) and PhD (in Advanced Quantitative Methods). His current substantive research focuses on mental health from a life course perspective, but also spans a diverse range of other subject areas, including geography, political science, social epidemiology and economics. Methodologically, Andy’s interests are in the development and application of multilevel models, with work focusing on age-period-cohort analysis and fixed and random effects models.

Overview
This course is an applied introduction to multilevel modelling that aims to give you a deep understanding of the standard model. It does not presume any prior knowledge of multilevel modelling but does require you to be very familiar with multiple regression analysis.

Course Content
Populations commonly exhibit complex structure with many levels, so that patients (level 1) are assigned to clinics (level 2); while individuals (1) may ‘learn’ their health-related behaviour in the context of households (2) and local cultures (3). In many cases, the survey design reflects the population structure, so in a survey of voting intentions respondents (1) are clustered by constituencies (2). Multilevel models are currently being applied to a growing number of social science research areas, including educational and organisational research, epidemiology, voting behaviour, sociology, and geography. Data at different levels are often seen as a convenience in the design which is a nuisance in the analysis. However, by using multilevel models we can model simultaneously at several levels, gaining the potential for improved estimation, valid inference, and a better substantive understanding of the realities of social organisation.

In the first week of the course, building on standard single-level models, we develop the two-level model with continuous predictors and response. Examples include house prices varying over districts, and pupil progress varying by school. In the second week, these models are extended to cover complex variation both within and between levels, three-level models, and models with categorical predictors and response (the multilevel logit model). We end with a consideration of estimators, including maximum likelihood operationalised through iterative generalised least squares. Throughout the course, we shall use graphical examples, verbal equations, algebraic formulation, class-based model interpretation, and practical modelling using either the software package MLwiN or R, depending on the student’s preference and past experience. We use these packages because of their flexibility, their graphics capability and the possibility of estimating models via maximum likelihood and MCMC methods.
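
As a point of reference for the first week's material, the two-level random-intercept model with one continuous predictor can be sketched as follows. The notation is a common convention assumed here rather than taken from the course materials: i indexes level-1 units (for example pupils), j indexes level-2 units (for example schools), u denotes a level-2 residual and e a level-1 residual.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + e_{ij}, \\
u_{0j} &\sim N(0, \sigma^2_{u0}), \qquad e_{ij} \sim N(0, \sigma^2_{e}), \\
% share of the unexplained variance that lies between level-2 units
\rho &= \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}}
\end{align}
```

Here the ratio in the last line is the proportion of residual variation attributable to differences between level-2 units; a value near zero suggests that a single-level regression would lose little.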

Course Objectives
On completion of the course, participants will be able to recognise a multilevel structure, specify a multilevel model with complex variation at a number of levels, and fit and interpret a range of multilevel models. The course does not cover multilevel analysis of panel data, multivariate responses, or survival data, although it does provide the essential groundwork for these extensions. This course is appropriate if you are analysing a survey with complex structure, are interested in the importance of contextual questions, or need to undertake a quantitative performance review of an organisation. A distinctive feature of the course is the focus on variance functions estimated simultaneously at several levels.
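
To illustrate what a variance function looks like, consider a two-level model in which both the intercept and the slope of a predictor vary across level-2 units. The symbols below follow a common convention and are assumptions for illustration, not the course's own notation.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}, \\
% level-2 variance as a quadratic function of the predictor
\operatorname{Var}(u_{0j} + u_{1j} x_{ij}) &= \sigma^2_{u0} + 2\sigma_{u01} x_{ij} + \sigma^2_{u1} x_{ij}^2, \\
% level-1 variance, constant in this simplest case
\operatorname{Var}(e_{ij}) &= \sigma^2_{e}
\end{align}
```

In this sketch the between-group variance changes with the value of the predictor, while the level-1 variance is constant; allowing the level-1 variance to depend on predictors as well gives complex heterogeneity at level 1.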

Course Prerequisites
This is not an introductory course in statistical modelling: participants require familiarity with regression modelling and inferential statistics, especially regression intercepts and slopes, standard errors, t-ratios, residuals, and the concepts of variance and covariance. Even so, the aim is not to cover mathematical derivations and statistical theory, but to provide a conceptual framework and ‘hands-on’ experience. The course does not require prior knowledge of multilevel modelling. Students choosing to conduct practical exercises in R should have a moderate level of experience using R; no past experience of the software is required for those choosing to use MLwiN.

Remedial Reading:
Weisberg, S. 1980. Applied Linear Regression. Wiley. Chs. 1 and 2. Alternatively, participants are strongly encouraged to take the LEMMA course on regression modelling before coming to Essex: modules 1 to 3 of http://www.cmm.bristol.ac.uk/learning-training/course.shtml

Background Reading
Paterson, L., and Goldstein, H. 1992. ‘New statistical models for analyzing social structures: An introduction to multilevel models’, British Educational Research Journal, 20:190-9.

Jones, K., and Duncan, C. 1998. ‘Modelling context and heterogeneity: Applying multilevel models’. In E. Scarbrough and E. Tanenbaum (Eds.), Research Strategies in the Social Sciences. Oxford University Press. http://www.oxfordscholarship.com/view/10.1093/0198292376.001.0001/acprof-9780198292371-chapter-6

Jones, K. Multilevel models for geographical research. Freely downloadable from https://www.researchgate.net/profile/Kelvyn_Jones

Background knowledge required
Statistics

OLS = moderate

Maths

Linear Regression = moderate

Software/Programming background

R = moderate (if using R)

The modules 1E and 2E overlap in several areas of their coverage. Both courses seek to introduce core aspects of multilevel models as well as covering selected extension topics associated with more advanced specifications. 1E tries to take a more introductory approach with regard to how statistical models are specified and how multilevel models link with other types of statistical model, and it includes some wider topics about strategies for dealing with complex data aside from using multilevel models; 2E goes a little further on statistical details and estimation strategies for multilevel models, and seeks to ground its methodological examples in detailed discussions of research applications. Both courses feature software examples but 1E is weighted towards Stata examples, with lighter coverage of R, SPSS and MLwiN; 2E makes most use of MLwiN and R, with some additional illustration of Stata. Some students choose to take both courses – if doing so, there will be some reiteration of some content, but there are plenty of detailed materials in both courses that point in different directions.

What are multilevel models?
Populations commonly exhibit complex structure with many levels, so that patients (at level 1) are assigned to clinics (at level 2); pupils (1) attend schools (2); and individuals (1) may ‘learn’ their health-related behaviour in the context of households (2), in postcode sectors (3), in districts (4), in regions (5). In many cases the survey design reflects the structure of the population, so that in a survey of voting intentions the respondents (1) are clustered by constituencies (2). Longitudinal designs also give rise to multilevel structures: there could be annual, repeated measures of income (level 1) on individuals (2) in different sectors of the economy (3). Another type of repeated-measures design arises when the repetition occurs at the higher level. Thus schools (level 3) could be monitored every year (2) for the performance of their students (1). Another possibility is a ‘multivariate’ structure where a number of different but related measurements are made on individuals. For example, there may be measurements of smoking, drinking and eating (all at level 1) for individuals (2) in communities (3). Yet another possibility arises in meta-analysis, where an attempt is made to summarise quantitatively the results for subjects (level 1) nested within several studies (2).

All the examples so far have been strictly hierarchical, so that each lower-level unit nests in one, and only one, higher-level unit. It is also possible to have cross-classified structures, such as pupils (level 1) nested in neighbourhoods (2) and schools (also level 2), or respondents (1) in a survey nested within areas (2) and interviewers (2).

In traditional analysis these levels in the data are often seen as a convenience in the design which has become a nuisance in the analysis. In contrast, by using multilevel models we are able to model simultaneously at several levels, gaining the potential for improved estimation, valid inference, and a better substantive understanding. In substantive terms, by working simultaneously at the individual and contextual levels, these analytic models begin to reflect the social organisation of life. By providing estimates of both the average effect of a variable over a number of settings, and the extent to which that effect varies over settings, these models provide a means of ‘thick’ quantitative description. The complexity of the real world of people and places, both with a history, is not ignored in the pursuit of a single universal equation.
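
The idea of an average effect together with its variation over settings can be illustrated with a small simulation. The sketch below is not a multilevel estimator: it fits a separate least-squares slope in each simulated setting and then summarises those slopes, which is only a crude two-stage approximation. All numbers (50 settings, 200 units per setting, a mean slope of 2.0 and a between-setting slope SD of 0.5) are hypothetical values chosen for illustration.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single predictor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_group_slopes(n_groups=50, n_per_group=200, mean_slope=2.0,
                          slope_sd=0.5, resid_sd=1.0, seed=1):
    """Simulate settings whose true slopes vary, and estimate each slope by OLS."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_groups):
        true_slope = rng.gauss(mean_slope, slope_sd)  # setting-specific effect
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_per_group)]
        ys = [true_slope * x + rng.gauss(0.0, resid_sd) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return slopes


slopes = simulate_group_slopes()
average_effect = statistics.fmean(slopes)       # average effect over settings
between_setting_sd = statistics.stdev(slopes)   # how much the effect varies
print(f"average slope over settings: {average_effect:.2f}")
print(f"SD of slopes across settings: {between_setting_sd:.2f}")
```

Because each per-setting slope carries its own sampling error, the raw SD of the estimated slopes tends to overstate the true between-setting variation. A multilevel model estimates the average effect and the between-setting variance jointly and separates this sampling noise from genuine variation.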

DAY TOPIC

1. Hierarchies and levels; introducing multilevel structures; unit and classification diagrams; fixed and random classifications; what alternative forms of analysis are available, and why they are inferior to random-coefficient modelling

2. Contextuality and varying relationships

3. From graphs to equations; random intercepts; a worked example: houses within neighbourhoods; context and composition

4. Random-slope models; exemplification with analysis of school performance

5. Variables at higher-levels; modelling the effects of environmental quality on house prices

6. Modelling population heterogeneity: between-group variability, understanding dependency and autocorrelation, variance functions

7. Models with complex heterogeneity at level 1; between-individual variability; significance testing

8. Models with categorical predictors; complex heterogeneity at level 1 and level 2

9. Estimators: properties of shrinkage estimates, full information maximum likelihood and REML, Bayesian MCMC estimation

10. Going further – models with categorical responses, longitudinal analysis, etc; review and conclusions.
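
The shrinkage estimates listed under day 9 can be sketched for the simplest case, a random-intercept model in which the between-group and within-group variances are treated as known. The function names and all numeric values below are hypothetical and chosen only for illustration.

```python
def shrinkage_factor(n_j, var_between, var_within):
    """Weight applied to a group's raw residual; approaches 1 as n_j grows."""
    return var_between / (var_between + var_within / n_j)


def shrunken_residual(group_mean, grand_mean, n_j, var_between, var_within):
    """Estimate of a group's level-2 residual, pulled towards zero."""
    raw_residual = group_mean - grand_mean
    return shrinkage_factor(n_j, var_between, var_within) * raw_residual


# Two groups with the same raw residual (10 points above the grand mean)
# but very different sample sizes; variances are assumed values.
small = shrunken_residual(60.0, 50.0, n_j=2, var_between=4.0, var_within=16.0)
large = shrunken_residual(60.0, 50.0, n_j=200, var_between=4.0, var_within=16.0)
print(f"group with 2 units:   {small:.2f}")
print(f"group with 200 units: {large:.2f}")
```

The group with only two units is pulled strongly towards the overall mean because its raw mean is unreliable, while the group with 200 units keeps almost all of its raw residual.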