Topics in supervised learning, unsupervised learning, dimension reduction, and feature selection. The course covers multiple methods (e.g., regression, tree-based models, and deep-learning models) and their implementation in the R computing environment. Emphasis is placed on rigorously training and testing models to achieve high reliability and accuracy.
Additional Requirements for Graduate Students: In addition to the undergraduate requirements, graduate students will complete an extra project deliverable involving the development and evaluation of a machine learning model, utilizing data they collect independently.
Athena Title
Machine Learning Bus Analytics
Undergraduate Prerequisite
MIST 4600 or MIST 4600E with a minimum grade of C
Graduate Prerequisite
MIST 4600 or MIST 4600E with a minimum grade of C
Semester Course Offered
Offered every year.
Grading System
A - F (Traditional)
Student Learning Outcomes
Students will learn fundamentals of machine learning, including cross-validation, loss functions and model fit, and feature selection.
Students will learn tradeoffs in machine learning design (bias-variance tradeoff, overfitting vs. underfitting, parameter tuning).
Students will learn supervised learning for both regression and classification problems.
Students will learn unsupervised learning and dimension reduction.
Students will learn the use of machine learning models for prescriptive applications in business.
Students will learn deployment and maintenance of machine learning models.
Students will learn supervised learning in R (linear regression, logistic regression, decision trees, support vector machines, neural networks).
Students will learn unsupervised learning in R (centroid clustering, mixture models).
Students will learn to conduct cross-validation as part of a machine learning training pipeline.
Students will learn to conduct simulations to test alternative policy choices.
Students will learn to implement a cloud-based machine learning pipeline.
Topical Outline
Module 1: Basics of Machine Learning
Introduction to loss functions, features, labels, and parameters
Basics of linear algebra, calculus, and probability
Training versus testing data and cross-validation
Bias-variance tradeoff, overfitting, and underfitting
Module 2: Regression Models
Linear regression in R
Interaction terms and dummy variables
Generalized linear models
Regularization
Module 3: Classification Models
Logistic regression
Tree-based models
Support vector machines
Neural networks
Module 4: Unsupervised Methods
Centroid methods such as K-means
Mixture models
Dimension reduction, including principal component analysis (PCA) and factor analysis
Module 5: Prescriptive Models
Basic optimization concepts, with an introduction to linear programming
Simulation in R, including Monte Carlo methods
Module 6: Deployment
Basic design principles of a machine learning pipeline
Integrating machine learning models into a cloud environment
Best practices for maintaining and updating the pipeline