Course Description
The design and implementation of data structures. Comparative analysis of algorithms and their applications to solving data science problems. Topics include recursion, lists, stacks, queues and priority queues, trees, graphs, dictionaries, decision trees, disjoint set, tensors, and data frames.
Athena Title
Data Structures Data Science
Prerequisite
CSCI 1302
Grading System
A - F (Traditional)
Course Objectives
This course surveys data structures and explores their different implementations with a focus on their manipulating algorithms. The objective is to effectively use data structures and their manipulating algorithms to design and implement efficient solutions and to develop problem-solving skills required in the data science paradigm. The course introduces approaches to algorithm design, including divide and conquer, greedy algorithms, and dynamic programming and will draw on applications from data science. Learning Outcomes: 1. Use asymptotic analysis notations to give upper bounds on time and space complexity of algorithms. 2. Design, analyze, and implement recursive functions. 3. Describe, design, and implement generic, reusable abstract data types (ADTs), including sorted list, unsorted list, tensors, stack, queue, priority queue, tree, graph, and dictionary. 4. Analyze and assess the impact of data structures and algorithm design on the performance of algorithms. 5. Understand algorithm design methods such as the greedy method, divide and conquer, and dynamic programming. 6. Choose the appropriate data structures and algorithm to solve a real world problem and to defend the selection. 7. Design, apply, and compare the complexity of principal algorithms for sorting, searching, and hashing. 8. Understand and apply graph algorithms, such as graph traversal algorithms, minimum spanning tree, topological sort, and single-source shortest path. 9. Write programs to satisfy the requirements of a real-word problem, integrating course concepts (data structures and algorithms efficiency) and object-oriented design principles.
Topical Outline
1. Algorithm Analysis [Knowledge Level: Usage] a) Explain and compare the growth of functions: Logarithmic functions, Polynomials, Linearithmic, and Exponential functions. b) Formally define the Asymptotic Complexity notations (Big O, Big theta, and Big Omega). Distinguish between case analysis methods (best case, average case, and worst case analysis). c) Measure operations as a function of problem size and compute the asymptotic complexity of simple algorithms/functions. 2. Fundamental Data Structures [Knowledge level: Assessment] a) Design, implement, analyze, and assess the different representations of the ADT List (singly-linked list, doubly- linked list, circularly-linked list) and their impact on the run-time and space complexity of the ADT operations. b) Design, implement, analyze, and assess the different representations of the Queue ADT and the Stack ADTs and their impact on the run-time and space complexity of the ADT operations. 3. Recursion: Understand, design, and implement recursive functions. [Knowledge Level: Usage] 4. Binary Search Trees (BST), and Balanced Trees. [Knowledge Level: Assessment] a) Understand, analyze, and compare BST implementation strategies (array vs. linked representations, iterative vs. recursive algorithms). b) Implement BSTs using linked structures. c) Design and analyze balanced search trees (AVL trees, red black trees or 2-3 trees, B trees). Evaluate which search tree techniques are best suited to various application scenarios. 5. Heaps and Priority Queues a) Understand the implementation of the Heap data structure and basic heap operations. [Knowledge Level: Familiarity] b) Use the heap structure as an internal traversal method to develop efficient solutions to real-world problems. [Knowledge Level: Usage] 6. Understand, Design, and implement the Priority Queue ADT using various data structures (Array List, Linked List, and Heap); evaluate the different implementations with respect to space and run-time efficiency. [Knowledge Level: Assessment] 7. Graphs and Graph Algorithms a) Understand and compare the adjacency list and the adjacency matrix implementations of the ADT Graph. [Knowledge Level: Familiarity] b) Understand graph algorithms, including depth first traversal, breadth first traversal, minimum spanning trees, topological sort, and single-source and all-pairs shortest path. Be able to use graph algorithms to solve problems. [Knowledge Level: Usage] 8. Algorithm design strategies a) For each of the strategies (greedy, divide and conquer, and dynamic programming), identify a practical example to which it would apply. [Knowledge Level: Familiarity] b) Use a divide and conquer algorithm to solve an appropriate problem. [Knowledge Level: Usage] 9. Hash Tables [Knowledge level: Usage] a) Design and implement two different versions of the hash table interface: Open addressing and chaining. b) Analyze and assess the complexity of basic hash table operations (insert, delete, and search) 10. Implement and assess sorting algorithms, including quadratic sorts, linearithmic sorts, and non-comparison linear sorts.[Knowledge Level: Assessment] 11. Discuss factors other than computational efficiency that influence the choice of algorithms, such as programming time, maintainability, and the use of application-specific patterns in the input data. [Knowledge Level: Familiarity] 12. Understand string-matching algorithms. [Knowledge Level: Usage] 13. Understand and use advanced data structures for data science, including tensors, disjoint sets, data frames, and decision trees. Be able to evaluate and use existing data science libraries that implement these data structures. [Knowledge Level: Usage]
Syllabus