Exploration of advanced statistical learning techniques, covering nonlinear methods, dimension reduction, classification, clustering, ensemble methods, and model evaluation. Use of simulations for evaluating, comparing, and understanding methods. With an emphasis on practical applications, students will fit models in Python and R to solve real-world data science challenges.
Athena Title
Elements of Statistical Learn
Prerequisite
[STAT 6420 and (STAT 4365/6365 or STAT 4365E/6365E)] or permission of department
Semester Course Offered
Offered spring
Grading System
A - F (Traditional)
Student Learning Outcomes
Students will differentiate between supervised and unsupervised learning approaches and analyze the bias-variance tradeoff in model selection.
Students will assess model performance using cross-validation, performance metrics, and regularization techniques to prevent overfitting and improve generalization.
Students will apply nonlinear modeling techniques, including polynomial regression, splines, and kernel methods, to capture complex relationships in data.
Students will implement dimension reduction techniques, such as PCA and manifold learning, to improve model interpretability and efficiency.
Students will compare and evaluate classification and clustering methods, including logistic regression, SVM, k-NN, k-means, and Gaussian mixture models, for various data scenarios.
Students will construct ensemble learning models, including bagging, boosting, and stacking, to enhance predictive accuracy and model robustness.
Students will use computer simulations to assess and compare statistical methods.
Students will apply statistical learning techniques in R and Python to analyze real-world datasets and communicate findings effectively through written reports.
Topical Outline
Introduction to Statistical Learning
o Overview of supervised vs. unsupervised learning
o Bias-variance tradeoff and model complexity
o Key concepts in statistical learning theory
Monte Carlo Simulations
o Formulating the assumptions of a simulation
o Implementing simulations in computer programs
o Using simulations to assess statistical methods
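The three steps above (formulate assumptions, implement, assess) can be illustrated with a minimal sketch. It is not taken from the course materials. It assumes normally distributed data and compares the sample mean and sample median as estimators of the center; the function name simulate_estimators and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_estimators(n=30, reps=2000, true_mean=0.0, sd=1.0, seed=42):
    """Monte Carlo comparison of the sample mean and the sample median.

    Assumption (step 1): each data set is n i.i.d. draws from
    Normal(true_mean, sd). All defaults are illustrative only.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mean_sq_errors = []
    median_sq_errors = []

    # Implementation (step 2): repeatedly generate data and apply both methods.
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean_sq_errors.append((statistics.fmean(sample) - true_mean) ** 2)
        median_sq_errors.append((statistics.median(sample) - true_mean) ** 2)

    # Assessment (step 3): summarize each method by its estimated
    # mean squared error across replications.
    return {
        "mse_mean": statistics.fmean(mean_sq_errors),
        "mse_median": statistics.fmean(median_sq_errors),
    }


results = simulate_estimators()
print(results)
```

Under the normality assumption, the sample mean should show the smaller estimated mean squared error. Changing the data-generating step (for example, to a heavy-tailed distribution) is the usual way to explore how that conclusion depends on the assumptions.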
Model Assessment and Evaluation
o Cross-validation and resampling methods
o Performance metrics (AUC, F1-score, log-loss)
o Overfitting, regularization, and interpretability
Nonlinear Methods
o Polynomial regression and splines
o Generalized additive models (GAMs)
o Kernel methods and smoothing techniques
o Tree-based methods
Dimension Reduction Techniques
o Principal Component Analysis (PCA)
o Lasso
o Manifold learning (t-SNE, UMAP)
Classification Methods
o Logistic regression and discriminant analysis
o Support vector machines (SVM) and kernel methods
o k-Nearest Neighbors (k-NN)
Clustering Methods
o k-Means and hierarchical clustering
o Model-based clustering and Gaussian Mixture Models (GMM)
o Spectral clustering and latent variable models
Ensemble Methods
o Bagging and boosting
o Random forests and gradient boosting machines
o Stacking and blending strategies
Practical Applications
o Applications emphasized throughout the course
o Case studies in healthcare, finance, marketing, and other areas
o End-to-end model deployment in Python and R
o Ethical considerations and model fairness
Institutional Competencies Learning Outcomes
Analytical Thinking
The ability to reason, interpret, analyze, and solve problems from a wide array of authentic contexts.