UGA Bulletin

Course Description

An introduction to advanced analytics techniques in data science, including random forests, semi-supervised learning, spectral analytics, randomized algorithms, and just-in-time compilers. Distributed and out-of-core processing.

Additional Requirements for Graduate Students:
Each graduate student will present a recent research article to the class and do extra project work. In the homework assignments and exams, graduate students will be assigned additional graduate-level questions. They will also be graded using a stricter scale.

Athena Title

Data Science ML II

Prerequisite

CSCI 3360 or CSCI 3360E or INFO 3000 or INFO 3000E

Semester Course Offered

Offered every year.

Grading System

A - F (Traditional)

Student Learning Outcomes

Students will develop a deep knowledge of sophisticated data science techniques for making sense of data across domains.
Students will learn how to process data that is incomplete or missing and how to analyze it using hybrid techniques such as semi-supervised learning, and will be introduced to distributed programming using Hadoop and Spark.
Students will explore just-in-time compilation, both in Python and in the new scientific computing language, Julia.

Topical Outline

Scientific programming with Python
Review of core statistics and probability
Information theory and uncertainty
Collection and integration of structured versus unstructured data
Decision trees and random forests
Randomized algorithms
Semi-supervised learning and label propagation
Spectral analytics
Out-of-core data processing
Just-in-time compilation and Julia
Data visualization
Introduction to distributed programming

Data Science and Applied Machine Learning II