Rob Johns is a Professor in Politics in the Department of Government at the University of Essex. He was appointed at Essex in 2010, having previously worked at the University of Strathclyde in Glasgow. Rob’s research is in the fields of public opinion, political psychology and questionnaire design. He has published various books and articles based on analyses of public opinion data and is currently researching the connections between mental health and political attitudes.
This course provides an introduction to statistics for social science data analysis. We begin with key concepts – means, deviations, distributions, confidence intervals, and so on – and then move on to the core statistical methods, covering crosstabulation, t-tests, analysis of variance, correlation, and various forms of regression. There is a particular emphasis on the analysis of survey data, including survey experiments. However, this remains primarily a course in statistical analysis rather than in survey methodology (issues such as sampling or questionnaire design). The methods covered will be demonstrated using the computer package SPSS and a variety of example datasets. Students are nonetheless welcome to use other software, and they can also bring their own datasets on which to practise the different methods.
Participants will become adept in using a broad set of statistical methods for analysing survey data. These methods are used by both academic and professional researchers across many fields: political science, sociology, psychology, health sciences, sports science, marketing, and so on. Participants will also acquire a good working knowledge of SPSS, the most commonly used package among survey researchers, although support will be given for those wishing to use other software. In addition to boosting participants’ current skills, this course also serves as a springboard for the study of more advanced statistical methods – many of which are available in later sessions at the Summer School.
This is an INTRODUCTORY course. Participants are not required or assumed to have anything more than basic mathematics. There will also be a full introduction to SPSS, the computer program that we will use.
Representative Background Reading
Since this is an introductory course, participants are not required to do any prior reading. However, those a bit nervous about confronting statistics may benefit from a quick look at the gentle introduction provided by:
Salkind, Neil J., 2017. Statistics for People Who (Think They) Hate Statistics (6th edn.), Thousand Oaks, CA: Sage.
Day 1: Variables: The basics
Lecture: research questions; hypotheses; independent and dependent variables; correlation and causation; prior and intervening variables; interactions; levels of measurement
Lab: what SPSS looks like; how to navigate around it; simple tables
Day 2: Descriptive statistics and distributions
Lecture: measures of central tendency; measures of dispersion (standard deviation, variance); frequency distributions; normal, skewed and bimodal distributions; why distributions matter; the sampling distribution
Lab: obtaining measures of central tendency and dispersion; graphing distributions; entering survey data
Day 3: Probability, hypotheses and significance testing
Lecture: probability and the area under the normal curve; standardising scores; comparing normally-distributed variables; the logic of inferential statistics; standard errors; confidence intervals; hypothesis-testing; the concept of statistical significance
Lab: calculating standardised scores, standard errors and confidence intervals; testing distributions for normality
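For participants working outside SPSS, the Day 3 calculations can be sketched in a few lines of Python. This is a minimal illustration, not course material: the scores are made-up values, the variable names are hypothetical, and the interval uses the normal-curve value 1.96, which is only an approximation for a sample this small (a t-based critical value would give a wider interval).

```python
import math
import statistics

# Hypothetical data: made-up responses on a 0-10 survey item.
scores = [2, 4, 4, 5, 5, 6, 6, 7, 8, 9]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

# Standardised (z) scores: distance from the mean in standard deviation units.
z_scores = [(x - mean) / sd for x in scores]

# Standard error of the mean and an approximate 95% confidence interval.
se = sd / math.sqrt(len(scores))
ci_low = mean - 1.96 * se
ci_high = mean + 1.96 * se

print(f"mean = {mean:.2f}, sd = {sd:.2f}, se = {se:.2f}")
print(f"approximate 95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

Standardised scores always have a mean of zero, which is a quick check that the calculation has been done correctly.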
Day 4: Testing differences between means I: t-tests
Lecture: comparing means across two groups; t-tests for independent samples; t-tests for dependent samples; effect sizes
Lab: t-tests in SPSS; recoding variables
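The logic of the independent-samples t-test can be sketched by hand in Python. The example below is illustrative only: the two groups are made-up data, the function name `pooled_t` is hypothetical, and it assumes equal variances in the two groups (the pooled-variance form of the test). It returns the t statistic, its degrees of freedom and Cohen's d as an effect size; obtaining a p-value would additionally require the t distribution, which is left to statistical software.

```python
import math
import statistics

# Hypothetical data: made-up scores for two independent groups.
group_a = [4, 5, 6, 7, 8]
group_b = [2, 3, 4, 5, 6]


def pooled_t(a, b):
    """Return (t, degrees of freedom, Cohen's d), assuming equal variances."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances

    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df

    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    t = (m1 - m2) / se_diff
    d = (m1 - m2) / math.sqrt(pooled_var)  # difference in pooled-SD units
    return t, df, d


t_stat, df, cohens_d = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}, Cohen's d = {cohens_d:.3f}")
```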
Day 5: Testing differences between means II: ANOVA
Lecture: comparing means across multiple groups; one-way ANOVA; post-hoc tests; effect sizes; multi-way ANOVA; interpreting and illustrating interactions
Lab: ANOVA in SPSS; illustrating interactions
Day 6: Correlation and bivariate regression
Lecture: the idea of correlation; the Pearson correlation; correlation and prediction; scatterplots and lines of best fit; regression equations; residuals; accuracy of prediction and R²; standardising regression coefficients
Lab: correlations and regression in SPSS; scatterplots and lines of best fit; dealing with non-linear relationships
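The link between the Pearson correlation, the line of best fit and R² can be seen in a short hand calculation. The Python sketch below uses made-up x and y values and hypothetical variable names; it computes the correlation and the bivariate regression line from sums of squares, then obtains R² from the residuals to show that, in bivariate regression, it equals the squared correlation.

```python
import math

# Hypothetical data: made-up values for a predictor (x) and an outcome (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

# Pearson correlation and the least-squares line y = intercept + slope * x.
r = s_xy / math.sqrt(s_xx * s_yy)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residuals are observed minus predicted values; R-squared is the share of
# the variation in y that the line accounts for.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
r_squared = 1 - sum(e ** 2 for e in residuals) / s_yy

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"R-squared = {r_squared:.2f}")
```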
Day 7: Multiple regression
Lecture: why multivariate regression; collinearity and multicollinearity; partial coefficients; categorical and dummy variables in regression; multiple regression equations; comparing coefficients; model specification, parsimony and R²; re-specifying models
Lab: multiple regression in SPSS; recoding categorical variables; collinearity diagnostics
Day 8: Crosstabulation and measures of association
Lecture: reading crosstabs; the chi-squared statistic; measures of association; introducing layer variables; reading interactions from crosstabs
Lab: crosstabs in SPSS; cell options; chi-squared tests; obtaining measures of association
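The chi-squared statistic compares the observed cell counts in a crosstab with the counts expected if the two variables were unrelated. The Python sketch below works through a made-up 2 x 2 table; the counts and variable names are hypothetical, and Cramér's V is included as one example of a measure of association derived from chi-squared.

```python
import math

# Hypothetical crosstab: rows are two groups, columns are two response options.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared: sum over all cells of (observed - expected)^2 / expected.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Cramér's V rescales chi-squared to a 0-1 measure of association.
cramers_v = math.sqrt(chi_squared / (n * min(len(row_totals) - 1, len(col_totals) - 1)))

print(f"chi-squared = {chi_squared:.3f}, df = {df}, Cramér's V = {cramers_v:.3f}")
```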
Day 9: Logistic regression
Lecture: why avoid linear regression with dummy dependent variables; from predicted values to predicted probabilities; odds and log odds; interpreting logistic coefficients; odds ratios; model fit, pseudo-R² and improved prediction; extensions to binary logit
Lab: logistic regression in SPSS; interpreting the results; post-estimation options
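The step from logistic coefficients to odds ratios and predicted probabilities is simple arithmetic once a model has been estimated. The Python sketch below assumes a hypothetical fitted model with one predictor; the coefficient values and function names are invented for illustration and do not come from any real dataset.

```python
import math

# Hypothetical coefficients from a binary logit with a single predictor x.
intercept = -1.0
coefficient = 0.5


def log_odds(b0, b1, x):
    """Linear predictor: the log odds of the outcome at a given value of x."""
    return b0 + b1 * x


def predicted_probability(b0, b1, x):
    """Convert log odds to a probability with the logistic function."""
    return 1 / (1 + math.exp(-log_odds(b0, b1, x)))


def odds(b0, b1, x):
    """Odds of the outcome: p / (1 - p), which equals exp(log odds)."""
    return math.exp(log_odds(b0, b1, x))


# The odds ratio is the multiplicative change in the odds for a one-unit
# increase in x; it is the exponentiated coefficient.
odds_ratio = math.exp(coefficient)

for x in (0, 2, 4):
    p = predicted_probability(intercept, coefficient, x)
    print(f"x = {x}: log odds = {log_odds(intercept, coefficient, x):+.1f}, p = {p:.3f}")
print(f"odds ratio = {odds_ratio:.3f}")
```

The odds ratio is constant across values of x, whereas the change in predicted probability for a one-unit increase in x is not. This is why logistic coefficients are usually interpreted through odds ratios or through predicted probabilities at chosen values.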
Day 10: Data reduction, scaling and reliability
Lecture: why data reduction; exploratory versus confirmatory approaches; principal components analysis and factor analysis; extraction and rotation; interpreting factors; types of scaling; developing and evaluating scales; measures of reliability
Lab: PCA and factor analysis in SPSS; illustrating dimensionality; saving factor scores; obtaining measures of reliability