Applied methods in regression analysis with implementation in
R. Topics include linear regression with mathematical examination
of model assumptions and inferential procedures; multiple
regression and model building, including collinearity, variable
selection and inferential procedures; ANOVA as regression
analysis; analysis of covariance; diagnostic checking techniques;
generalized linear models, including logistic regression.
Additional Requirements for Graduate Students
Additional and/or alternative problems of a more challenging
nature will be required for graduate students on homework
assignments and exams. Typically, these problems will be of a
more theoretical nature than those required of undergraduate
students, or will require more self-study of material not
emphasized during lectures, or will require more intricate
and/or time-consuming data analysis tasks.
Athena Title
Applied Regression Analysis
Undergraduate Prerequisite
(STAT 4210 or STAT 4210H) and (MATH 2250 or MATH 2250E) and (STAT 2010 or STAT 2100H or STAT 2360-2360L)
Graduate Prerequisite
STAT 6210 or STAT 6210E or STAT 6310 or STAT 6315 or STAT 6315E or permission of department
Semester Course Offered
Offered fall, spring and summer
Grading System
A - F (Traditional)
Student Learning Outcomes
Students will describe the concepts underlying linear regression analysis, including the assumptions, limitations, and interpretation of regression coefficients.
Students will interpret the results of regression analysis correctly, including the significance of predictor variables, the goodness of fit of the model, and the practical implications of regression coefficients.
Students will understand the assumptions underlying linear regression analysis and be able to check for violations of these assumptions using diagnostic tools such as residual plots, normality tests and independence tests.
Students will develop techniques for building regression models, including variable selection methods (forward selection, backward elimination, and stepwise regression) and all-possible-subsets selection, using criteria such as R-squared, adjusted R-squared, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and cross-validation.
Students will apply regularized regression techniques such as ridge regression and the Least Absolute Shrinkage and Selection Operator (LASSO) to combat multicollinearity and overfitting.
Students will predict a binary response using logistic regression models with one or more predictors.
Topical Outline
After a brief review of the techniques and concepts associated with simple linear regression, including a mathematical examination of the model assumptions, the course provides in-depth coverage of multiple linear regression. This includes model fitting; estimation and prediction; diagnostics and model adequacy checking; residual analysis; transformations, including Box-Cox; leverage and influence measures, including Cook’s distance; polynomial regression; indicator variables; interaction terms; collinearity; variable selection based on AIC, BIC, cross-validation, and other measures; and model building. A brief introduction to generalized linear models, including logistic regression, is also provided. Time permitting, special topics will be chosen from ridge regression, survival methods for censored time-to-event data, linear mixed models, nonlinear mixed-effects models, and generalized estimating equations. All procedures will be covered in R and possibly other statistical software packages.