Jeremy Miles is a Senior Quantitative Analyst at Google. He is author (with Mark Shevlin) of Applying Regression and Correlation (Sage, 2001); author of Research Methods and Statistics (Crucial Press, 2001); author (with Phil Banyard) of Understanding and Using Statistics in Psychology (Sage, 2007); co-author (with Andy Field) of Discovering Statistics Using SAS (Sage, 2010); co-editor (with Paul Gilbert) of A Handbook of Research Methods in Clinical and Health Psychology (Oxford, 2005); and co-editor (with Brian Stucky) of Quantitative Research in Psychology [don’t even think about buying this book]. He has served as chair of the British Psychological Society Mathematics, Statistics and Computing Section, and is associate editor of the British Journal of Mathematical and Statistical Psychology, Frontiers in Measurement and Quantitative Psychology, and PLOS ONE. He also serves as statistical editor of the British Journal of Clinical Psychology and the British Journal of Health Psychology.

Course Content
The course will cover the theory and practice of regression analysis in its various forms. Regression models (broadly defined) are models which attempt to use predictors to explain a single outcome variable. This outcome variable may be continuous, ordinal, categorical or discrete counts and the predictors may be interval or categorical. The predictors may be linear, non-linear, or interactive.

Although the focus of the course is applying regression, we will start by looking at the meaning of models in statistics. We will consider the mean, correlation and regression as models, and regression to the mean. We look at describing models, and at statistical significance and confidence intervals (although we expect you to have prior knowledge of these areas, we will refresh them). In the third part we move on to develop more complex models (e.g. hierarchical regression, categorical independent variables), and consider the implications of the assumptions made in regression analysis (including the effect of their violation). We then look at extending regression in different ways: logistic regression, path analysis, interactions and Poisson regression. Throughout the module we will cover examples in Stata, and occasionally use other programs, e.g. GPower for power analysis.
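As a rough illustration of what treating the mean and regression as models can look like, here is a minimal sketch. It is written in Python rather than Stata, and the data are invented purely for illustration. The mean is treated as a one-parameter model, a simple regression adds a slope, and the two are compared by their sum of squared errors.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sse(observed, predicted):
    """Sum of squared errors between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Model 1: a single parameter (the mean) predicts every observation.
y_bar = mean(y)
sse_mean = sse(y, [y_bar] * len(y))

# Model 2: intercept and slope estimated by ordinary least squares.
x_bar = mean(x)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar
sse_reg = sse(y, [intercept + slope * xi for xi in x])

# Proportional reduction in error from adding the predictor (R squared).
r_squared = 1 - sse_reg / sse_mean

print("mean model SSE:", sse_mean)
print("regression SSE:", sse_reg)
print("slope:", slope, "intercept:", intercept, "R squared:", r_squared)
```

With these made-up numbers, the regression model leaves less squared error than the mean alone, and the proportional reduction is what R squared reports.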

Course Objectives
The course will enable participants to carry out a range of regression analyses. It is appropriate for participants who have covered some statistics, and wish to extend their knowledge to modelling more complex social science phenomena. The module provides appropriate background for people who want to go on to modules such as multilevel modelling, probit and logit analysis, or structural equation modelling.

Course Prerequisites
The course starts from the beginning – we cover the mean, standard deviation, statistical significance, etc. – but participants should treat this as a refresher, and should already have knowledge of descriptive and inferential statistics. Similarly, while we begin with simple correlation and regression, we will be thinking about these in some (possibly) new ways. We will use Stata – if you’re not familiar with Stata this is not a problem; it’s very straightforward and students pick it up within a few minutes. We shall use Excel a little at the start as well. (If students prefer, they can use R instead.)

Remedial Reading
We will expect that you have some knowledge of descriptive statistics, statistical significance, correlation, sampling and estimation, and will only cover these things briefly. Any introductory statistics book from your field will cover these issues. One example would be:

Miles, J and Banyard, P (2007). Understanding and using statistics in psychology. London: Sage.

But there are many others that you may be familiar with, which are just as good, or even better.

If you’re not familiar with Stata, a little practice would not hurt, but is not necessary, and the same goes for Excel. (Please feel free to contact me if you would like guidance on what you need to know – Jeremy.miles@gmail.com).

Representative Background Reading
Cohen, J., P. Cohen, et al. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. (3rd ed.). Erlbaum. (Very long, very thorough, best if your background is psychology).
Miles, J.N.V., and Shevlin, M. 2001. Applying Regression and Correlation. Sage. (The course closely follows this book, which makes it worth buying as a companion to the course, but also not worth buying, because the material is similar to the handouts. In addition, a second edition will be appearing soon, so perhaps you should save your money.)
Pedhazur, E. J. 1997. Multiple Regression in Behavioral Research. Harcourt Brace. (Not everyone likes the style of this book, so have a look before you buy it)
Studenmund, A. H. 2010. Using Econometrics: A Practical Guide. Addison Wesley. (This book focuses on econometrics, which has a slightly different emphasis from the one we will take; it’s also ridiculously expensive, so an older edition is fine if you can get a second-hand one.)

A couple of more gentle starters are:
Allison, P. 1999. Multiple Regression: A Primer. Pine Forge Press.
Garson, D. 2012. Multiple Regression. Statistical Associates Publishers. (This book is only available on Kindle, but it’s $5.)

Different books have different emphases, and we shall be talking about some of these issues in the classes.

This course is about doing, and understanding, regression analysis. If you have a practical understanding of regression analysis, it is possible to do regression and to understand computer output. We will use computer-based exercises, but we will go beyond simply making sense of those exercises: we will also look at some of the theory that underlies the computations.

If you only have a practical understanding of regression analysis, issues such as violations of assumptions are problematic – why does it matter that a particular assumption has been violated? What are the substantive interpretations of violations of the assumptions? What remedial actions can, and should, be taken, to improve the model?

To investigate the theory of regression, we will use standard statistical software (Stata) to analyse regression models; this takes all of the hard work away from the analysis. We will also make use of spreadsheets (MS Excel) to investigate regression models at a deeper level. Throughout the course we will focus on practical examples; however, we will occasionally rely on mathematical notation: x’s, y’s and Greek letters. You should therefore have some (not a great deal of) familiarity with algebraic notation, and should not be too frightened of an equation. We will not ask you to be able to do matrix algebra, but it might be helpful to know something about it: simply that it exists, and which operations can be carried out on matrices (multiplication of a matrix by a scalar, by a vector and by a matrix; matrix inversion; calculation of the determinant). (We will briefly cover matrix algebra as an option.)
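For anyone who has not met the matrix operations listed above, the following is a minimal sketch of each for the 2 x 2 case. It is written in plain Python rather than Stata or Excel. The matrices, the vector and the function names are all hypothetical, chosen only for illustration, and the inverse and determinant formulas shown apply only to 2 x 2 matrices.

```python
# Hypothetical 2 x 2 matrices and a vector, chosen only for illustration.
A = [[2.0, 1.0],
     [1.0, 3.0]]
B = [[1.0, 0.0],
     [4.0, 2.0]]
v = [1.0, 2.0]


def scalar_times_matrix(k, m):
    """Multiply every element of matrix m by the scalar k."""
    return [[k * element for element in row] for row in m]


def matrix_times_vector(m, vec):
    """Each output element is the dot product of one row of m with vec."""
    return [sum(e * x for e, x in zip(row, vec)) for row in m]


def matrix_times_matrix(m, n):
    """Each output element is the dot product of a row of m and a column of n."""
    columns = list(zip(*n))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in m]


def determinant_2x2(m):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix; fails if the determinant is zero."""
    det = determinant_2x2(m)
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


print(scalar_times_matrix(2.0, A))
print(matrix_times_vector(A, v))
print(matrix_times_matrix(A, B))
print(determinant_2x2(A))
print(matrix_times_matrix(A, inverse_2x2(A)))  # approximately the identity
```

Multiplying a matrix by its inverse gives (up to rounding) the identity matrix, which is the matrix counterpart of multiplying a number by its reciprocal to get 1.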

The course will be taught in a computer room, and will be a mixture of (slightly) more formal lecture, and less formal computer exercises.

There will be computer-based exercises and pen-and-paper exercises. We do not usually have enough time to discuss the pen-and-paper exercises in class, but we will try if there is a lot of enthusiasm (and if anyone actually does them). The computer-based exercises will provide an opportunity to analyse your own data, if you want to bring it. If there is demand (and enough people have done it) we will try to set aside some time for discussion of different people’s problems.

To get the most out of the course, you should have a decent knowledge of descriptive and inferential statistics, and some knowledge of ordinary algebra. It is very useful if you know the basics of Stata, including use of the generate and recode commands. Please do as much background reading as you can, and if you have any problems or questions, email me at Jeremy.miles@gmail.com.

Reading: These are three fairly substantial (and expensive) books that cover what we cover (and more). Pedhazur and Cohen et al. are probably better for those with more of a psychology/sociology background; Studenmund is better for those with more of an econometrics background.

Cohen, J., Cohen, P., West, S.G. and Aiken, L.S. (2003). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum.

Pedhazur, E.J. (1997). Multiple regression in behavioural research, 3rd Ed. New York: Harcourt Brace Jovanovich.

Studenmund, A.H. (2001). Using econometrics: a practical guide, 4th Edition. Boston: Addison-Wesley.

The course will cover the same material as: Miles, J. and Shevlin, M. (2001). Applying regression and correlation: a guide for students and researchers. London: Sage. In one sense, it would be good to read this book as a guide to the course. In another, much of the material in it will be presented in handouts, and so you might be better off reading another book.

Background: For smaller (and cheaper) books on regression which will provide some background introduction, but not cover everything in the course, have a look at:
Allison, P.D. (1999). Multiple regression. Pine Forge Press.

Hutcheson, G. and Sofroniou, N. (1998). The multivariate social scientist. London: Sage.

Lewis-Beck, M.S. (1980). Applied Regression: An Introduction. Hillsdale, NJ: Sage. (This is one of the little green books on quantitative methods.)

For smaller (and even cheaper) books that examine specific issues in regression analysis, see:

Berry, W.D. (1993). Understanding regression assumptions. (Sage University Series Quantitative Applications in the Social Sciences, no. 92). Newbury Park, CA: Sage.

Fox, J. (1991). Regression diagnostics. (Sage University Series Quantitative Applications in the Social Sciences, no. 79). Newbury Park, CA: Sage.

Menard, S. (1995). Applied logistic regression analysis. (Sage University Series Quantitative Applications in the Social Sciences, no. 34). Newbury Park, CA: Sage.

Pampel, F.C. (2000). Logistic regression: a primer. (Sage University Series Quantitative Applications in the Social Sciences, no. 132). Newbury Park, CA: Sage.

Detailed Outline
The course is divided into 15 “lessons”. Some of these take less than one day to cover, and some take more than one day.

Part 1: Theory of Regression
1. Models in statistics
2. Models with more than one parameter: regression
3. Why regression?
4. Samples to populations
5. Introducing multiple regression
6. More on multiple regression
Part 2: Application of Regression
7. Categorical predictor variables
8. Assumptions in regression analysis
9. Issues in regression analysis
10. Non-linear regression
11. Moderators (interactions) in regression
12. Mediation and path analysis
Part 3: Advanced Types of Regression
13. Logistic regression
14. Poisson regression