An introduction to advanced analytics techniques in data
science, including random forests, semi-supervised learning,
spectral analytics, randomized algorithms, and just-in-time
compilers. Also covers distributed and out-of-core processing.
Additional Requirements for Graduate Students
Each graduate student will present a recent research article to
the class and do extra project work. In the homework assignments
and exams, graduate students will be assigned additional
graduate-level questions. They will also be graded using a
stricter scale.
Athena Title
Data Science ML II
Prerequisite
CSCI 3360 or CSCI 3360E
Semester Course Offered
Offered every year.
Grading System
A - F (Traditional)
Student Learning Outcomes
Students will have a deep knowledge of sophisticated data science techniques for making sense of data across domains.
Students will learn how to process incomplete or missing data and how to analyze it with hybrid techniques such as semi-supervised learning, and will be introduced to distributed programming using Hadoop and Spark.
Students will explore just-in-time compilation, both in Python and in the new scientific computing language, Julia.
Topical Outline
Scientific programming with Python
Review of core statistics and probability
Information theory and uncertainty
Collection and integration of structured versus unstructured data