
Data Science II


Course Description

An introduction to advanced analytics techniques in data science, including random forests, semi-supervised learning, spectral analytics, randomized algorithms, and just-in-time compilers. Distributed and out-of-core processing.

Additional Requirements for Graduate Students:
Each graduate student will present a recent research article to the class and do extra project work. In the homework assignments and exams, graduate students will be assigned additional graduate-level questions. They will also be graded using a stricter scale.


Athena Title

Data Science II


Prerequisite

CSCI 3360


Semester Course Offered

Offered every year.


Grading System

A - F (Traditional)


Course Objectives

This course provides students with deep knowledge of sophisticated data science techniques for making sense of data across domains. Students learn how to process incomplete or missing data and how to analyze data with hybrid techniques such as semi-supervised learning, and they are introduced to distributed programming using Hadoop and Spark. Students also have the opportunity to explore just-in-time compilation, both in Python and in the new scientific computing language, Julia. The course is appropriate both for students preparing for research in data mining and machine learning and for Bioinformatics, Science, and Engineering students who want to apply data mining techniques to problems in their fields of study.


Topical Outline

Scientific programming with Python
Review of core statistics and probability
Information theory and uncertainty
Collection and integration of structured versus unstructured data
Decision trees and random forests
Randomized algorithms
Semi-supervised learning and label propagation
Spectral analytics
Out-of-core data processing
Just-in-time compilation and Julia
Data visualization
Introduction to distributed programming

