Course Description
An introduction to advanced analytics techniques in data science, including random forests, semi-supervised learning, spectral analytics, randomized algorithms, and just-in-time compilers. Distributed and out-of-core processing.
Additional Requirements for Graduate Students:
Each graduate student will present a recent research article to the class and do extra project work. In the homework assignments and exams, graduate students will be assigned additional graduate-level questions. They will also be graded using a stricter scale.
Athena Title
Data Science II
Prerequisite
CSCI 3360
Semester Course Offered
Offered every year.
Grading System
A - F (Traditional)
Course Objectives
This course provides students with deep knowledge of sophisticated data science techniques for making sense of data across domains. Students are instructed in how to process data that is incomplete or missing and how to analyze data with hybrid techniques such as semi-supervised learning, and they are introduced to distributed programming using Hadoop and Spark. Furthermore, students are given the opportunity to explore just-in-time compilation, both in Python and in the new scientific computing language, Julia. The course is appropriate both for students preparing for research in data mining and machine learning and for Bioinformatics, Science and Engineering students who want to apply data mining techniques to solve problems in their fields of study.
Topical Outline
Scientific programming with Python
Review of core statistics and probability
Information theory and uncertainty
Collection and integration of structured versus unstructured data
Decision trees and random forests
Randomized algorithms
Semi-supervised learning and label propagation
Spectral analytics
Out-of-core data processing
Just-in-time compilation and Julia
Data visualization
Introduction to distributed programming