Course Description
The methodology of categorical data analysis and its applications. The course covers descriptive and inferential methods for contingency tables, an introduction to generalized linear models, logistic regression, multinomial response models, regression for counts, and methods for categorical data from matched pairs.
Additional Requirements for Graduate Students:
Additional and/or alternative problems of a more challenging nature will be required of graduate students on homework assignments and exams. Typically, these problems will be more theoretical than those required of undergraduate students, will require more self-study of material not emphasized during lectures, or will require more intricate and/or time-consuming data analysis tasks.
Athena Title
Categorical Data Analysis
Undergraduate Pre or Corequisite
STAT 4220 and STAT 4510/6510
Grading System
A - F (Traditional)
Course Objectives
Students will learn the types of categorical and discrete data, and why and how the analysis of these types of data differs from the analysis of continuous data. They will learn what contingency tables are and how to use them to summarize categorical data. They will learn descriptive and inferential techniques for analyzing contingency table data and how to apply them. They will learn the structure of regression models for categorical and discrete data, including logistic regression; Poisson and negative binomial log-linear regression for counts; and various multinomial response regression models. They will learn how to specify these models correctly, how to fit them, and how to interpret the results of fitted models properly to draw meaningful conclusions about real data. Students will learn the assumptions underlying the statistical models and methods taught in the course and how to assess their validity. More generally, they will learn the domains of application of the methods covered and how to choose an appropriate statistical methodology to describe and draw inferences from categorical data. Statistical software will be integrated into the course, and students will learn how to use software to implement analyses and to produce results suitable for communicating statistical information clearly in a non-technical manner. Students will develop their ability to communicate statistical information in both written and oral form.
Topical Outline
The first part of the course will introduce types and scales of categorical and discrete data; contingency tables; measures of association and summary statistics for contingency table data; probability distributions for contingency table data, including the binomial, multinomial, Poisson, and hypergeometric distributions; and inferential methods for contingency tables, including both large-sample methods and exact methods. Next, generalized linear models will be introduced, with an emphasis on the structure and scope of this model class, including an overview of the important special cases that fall within it. These special cases will then be introduced and discussed in turn; they include the logistic regression model, log-linear regression models for unbounded counts, and multinomial regression models. The proper handling of overdispersion relative to these standard model classes will be discussed. Statistical methods and models for categorical data from matched pairs and for longitudinal data will be introduced toward the end of the course. Topics will be motivated and illustrated with real data examples throughout the course, and the practical implementation of the methodology will be emphasized.
Syllabus