Course Description
An introduction to foundational programming techniques, data organization, and computational analysis approaches commonly used across the biomedical and information sciences. This course uses open-source lessons and curricula developed by The Carpentries (https://carpentries.org/). Students will gain an understanding of the command-line interface and how it can be used to manipulate files and perform data analysis tasks on their computers.
Athena Title
Software Carpentries
Equivalent Courses
Not open to students with credit in BINF 7960E
Non-Traditional Format
Credit hours and lecture hours per week will vary by semester, depending on the course length and format. The course may be run as a traditional semester-long course or as an intensive, immersive course (e.g., a weeklong course during the summer).
Semester Course Offered
Offered fall, spring, and summer
Grading System
S/U (Satisfactory/Unsatisfactory)
Student Learning Outcomes
- Students will be able to explain how to use command-line computing interfaces and demonstrate the use of common Unix commands and shell scripts.
- Students will be able to compare and contrast the benefits and drawbacks of command-line versus graphical interfaces.
- Students will understand the basic principles of automated version control systems (Git) and explain how to apply these systems to common computational tasks.
- Students will develop a working knowledge of Python programming fundamentals and common programming applications of this language.
- Students will develop a working knowledge of R programming and the use of the RStudio graphical interface, with an emphasis on using R for data cleaning, organization, and visualization.
- Students will understand how specific data science tools and workflows are commonly applied across diverse scientific disciplines such as Ecology, Genomics, Social Science, and Library Science.
Topical Outline
- Topic 1: The Unix Shell
Introducing the Shell
Navigating and Working with Files and Directories
Pipes and Filters
Loops
Shell Scripts
Finding Files on the Command Line
- Topic 2: Version Control with Git
Introduction to Automated Version Control Systems
Setting up Git and Creating a Repository
Tracking Changes and Exploring History
Ignoring Files and Customizing Tracking
Collaboration and Using Remotes in GitHub
Identifying and Correcting Conflicts
Open Science, Licensing, Citation, and Hosting
Integrating Git and RStudio
- Topic 3: Programming with Python
Python Fundamentals
Lists, Loops, and Conditionals
Analyzing Data from Multiple Files
Creating Functions
Errors and Exceptions
Debugging and Defensive Programming
Plotting and Visualizing Data Using Python
Best Practices for Code Formatting and Commenting
- Topic 4: Programming with R and RStudio
Fundamentals of R and RStudio
Data Types and Data Structures
Importing and Organizing Common Types of Data Files (CSV, TSV)
Creating Functions and Using Statements
Loops and the Call Stack
Categorical Data and Factors
Dynamic Reports with knitr
Data Frame Manipulation with dplyr and tidyr
Visualizing Data Using Base R and Common Commands
Creating Publication-Quality Graphics with ggplot2
Best Practices for Writing R Code
Making Packages in R
- Topic 5: Introduction to Data Analysis Workflows in Ecology, Genomics, Geospatial Data, Social Sciences, and Library Sciences