Course Description
Application of discrete algorithms to computational problems in molecular biology. Topics are drawn from such areas as classical sequence comparison, multiple sequence alignment, DNA sequence assembly, DNA physical mapping, genome rearrangement, evolutionary tree construction, and protein folding. Background in molecular biology is not required.
Additional Requirements for Graduate Students:
Extra projects will be assigned to graduate students.
Athena Title
ALG FOR COMP BIO
Prerequisite
CSCI 4470/6470 or permission of department
Semester Course Offered
Not offered on a regular basis.
Grading System
A - F (Traditional)
Course Objectives
The objective of this course is to expose students to current research on discrete algorithms in computational biology. The focus is on the application of string and graph algorithms and combinatorial optimization to problems in sequence comparison, multiple alignment, physical mapping, sequence assembly, and genome rearrangement. The emphasis throughout is on problem formulation, algorithm design, and implementation.
Topical Outline
1. Introduction Brief overview of molecular biology Survey of problem areas in computational molecular biology 2. Sequence comparison Basic sequence alignment. Edit distance, reduction to sequence alignment, and dynamic programming. Equivalence of distance minimization and similarity maximization. Local similarity. Linear gap costs. Algorithms for the 0-1 metric, and longest common subsequences. Advanced sequence alignment. Linear-space alignment. Convex gap costs. Suboptimal alignments. Four-Russians technique. Incremental sequence alignment. Parallel sequence alignment. Sparse dynamic programming. Subquadratic database searching. Approximate pattern matching. Matching of strings, repeats, regular-expressions, and context-free grammars in the presence of error. 3. Multiple sequence alignment The general multiple alignment problem. Exact and approximation algorithms for the sum of pairs, fixed-tree, and maximum-trace objective functions. Finding an ordered series of motifs. 4. DNA sequencing Shortest common superstring. Approximation with respect to compression and length. Relation to learning theory and Occam algorithms. Shotgun sequence assembly. Approximation with respect to compression and length. 5. DNA mapping Physical mapping with non-unique and unique probes. Clone-probe, clone-clone, and restriction-fragment data. 6. Genome rearrangements Inversions. Signed and unsigned variations, exact algorithms and approximation. Transpositions. Approximation. Translocations. Survey of models. Exact solution and approximation. Mixed inversion, translocation, fission, and fusion.
Syllabus