
Data Science and Applied Machine Learning II


Course Description

An introduction to advanced analytics techniques in data science, including random forests, semi-supervised learning, spectral analytics, randomized algorithms, and just-in-time compilers. Distributed and out-of-core processing.

Additional Requirements for Graduate Students:
Each graduate student will present a recent research article to the class and do extra project work. In the homework assignments and exams, graduate students will be assigned additional graduate-level questions. They will also be graded using a stricter scale.


Athena Title

Data Science ML II


Prerequisite

CSCI 3360 or CSCI 3360E


Semester Course Offered

Offered every year.


Grading System

A - F (Traditional)


Student Learning Outcomes

  • Students will have a deep knowledge of sophisticated data science techniques for making sense of data across domains.
  • Students will process incomplete or missing data, use hybrid techniques such as semi-supervised learning to analyze it, and be introduced to distributed programming using Hadoop and Spark.
  • Students will explore just-in-time compilation, both in Python and in the new scientific computing language, Julia.

Topical Outline

  • Scientific programming with Python
  • Review of core statistics and probability
  • Information theory and uncertainty
  • Collection and integration of structured versus unstructured data
  • Decision trees and random forests
  • Randomized algorithms
  • Semi-supervised learning and label propagation
  • Spectral analytics
  • Out-of-core data processing
  • Just-in-time compilation and Julia
  • Data visualization
  • Introduction to distributed programming
