Course ID: | CSCI 4360/6360. 4 hours. |
Course Title: | Data Science II |
Course Description: | An introduction to advanced analytics techniques in data
science, including random forests, semi-supervised learning,
spectral analytics, randomized algorithms, and just-in-time
compilers. Distributed and out-of-core processing. |
Oasis Title: | Data Science II |
Prerequisite: | CSCI 3360 |
Semester Course Offered: | Offered every year. |
Grading System: | A-F (Traditional) |
|
Course Objectives: | This course provides students with deep knowledge of
sophisticated data science techniques for making sense of data
across domains. Students are instructed in how to process data
that is incomplete or missing and how to analyze it with hybrid
techniques such as semi-supervised learning, and are introduced
to distributed programming using Hadoop and Spark. Furthermore,
students are given the opportunity to explore just-in-time
compilation, both in Python and in the new scientific computing
language, Julia. The course is appropriate both for students
preparing for research in data mining and machine learning and
for Bioinformatics, Science, and Engineering students who
want to apply data mining techniques to solve problems in their
fields of study. |
Topical Outline: | Scientific programming with Python
Review of core statistics and probability
Information theory and uncertainty
Collection and integration of structured versus unstructured data
Decision trees and random forests
Randomized algorithms
Semi-supervised learning and label propagation
Spectral analytics
Out-of-core data processing
Just-in-time compilation and Julia
Data visualization
Introduction to distributed programming |