Topics in supervised learning, unsupervised learning, dimension reduction, and feature selection. Course covers multiple methods (e.g., regression, tree-based models, and deep-learning models) and their implementation in the R computing environment. An emphasis is placed on rigorously training and testing models to achieve high reliability and accuracy.
Additional Requirements for Graduate Students: In addition to the undergraduate requirements, graduate students will complete an extra project deliverable in which they develop and evaluate a machine learning model using data they collect independently.
Athena Title
Machine Learning Bus Analytics
Undergraduate Prerequisite
MIST 4600 or MIST 4600E with a minimum grade of C
Graduate Prerequisite
MIST 4600 or MIST 4600E with a minimum grade of C
Semester Course Offered
Offered every year.
Grading System
A - F (Traditional)
Student Learning Outcomes
Students will understand when and how to use supervised machine learning to predict numerical and categorical outcomes. They will learn how to improve model performance using cross-validation.
Students will understand when and how to use unsupervised machine learning to identify patterns and associations in data.
Students will learn how to use machine learning to gain insights from unstructured data such as text or images. Students will understand principles of text generation and completion.
Students will learn techniques for communicating machine learning results and translating predictions into business decisions.
Topical Outline
Module 1: Basics of Machine Learning
Introduction to loss functions, features, labels, and parameters
Basics of linear algebra, calculus, and probability
Training versus testing data and cross-validation
Bias-variance tradeoff, overfitting, and underfitting
Module 2: Regression Models
Linear regression in R
Interaction terms and dummy variables
Generalized linear models
Regularization
Module 3: Classification Models
Logistic regression
Tree-based models
Support vector machines
Neural networks
Module 4: Unsupervised Methods
Centroid methods such as K-means
Mixture models
Dimension reduction, including PCA and factor analysis
Module 5: Prescriptive Models
Basic optimization concepts, with an introduction to linear programming
Simulation in R, including Monte Carlo methods
Module 6: Deployment
Basic design principles of a machine learning pipeline
Integrating machine learning models into a cloud environment
Best practices for maintaining and updating the pipeline
Institutional Competencies Learning Outcomes
Analytical Thinking
The ability to reason, interpret, analyze, and solve problems from a wide array of authentic contexts.