
Data Science and Statistical Programming Applied to Agriculture


Course Description

Students will be introduced to data analysis workflows in agriculture built on data science principles. Workflows include the analysis of designed and observational data (analysis of variance, regression, machine learning). Tasks will be performed using data science tools for reproducibility, such as version control, R, open data, code automation, and interactive dashboards.


Athena Title

Data Sci Stat Prog App to Ag


Pre or Corequisite

STAT 6315 or STAT 6315E


Semester Course Offered

Offered spring


Grading System

A - F (Traditional)


Course Objectives

The general course objective is to provide students with hands-on, applied experience in analyzing agricultural data using modern reproducible tools. This involves:

- Learning and applying analytical workflows that involve importing data, processing, analyzing, assessing model fit, extracting model information (means and pairwise comparisons, regression coefficients), and producing publication-ready figures for different analyses, including ANOVAs and regression.
- Conducting analysis of variance workflows for the most commonly used agricultural designed studies (completely randomized design, randomized complete block design, split-plot design).
- Conducting linear and non-linear regression workflows.
- Learning and applying machine learning concepts (bias-variance trade-off, data split, hyperparameter optimization, predictive metrics) and algorithms to agricultural observational data (soils, weather, yield).
- Doing all of the above while learning and using data science tools for reproducibility, such as version control, statistical programming, APIs to publicly available data sets, task automation, and online interactive dashboards.


Topical Outline

1. Intro to R and RStudio (R script, R Markdown, Quarto, RStudio Projects)
2. Version control with git and GitHub
3. R APIs to publicly available data (USDA NASS, weather, soil)
4. Data wrangling with dplyr, tidyr, pipe operator
5. Data visualization with ggplot2, gganimate
6. Experimental concepts of experimental unit, randomization, and replication
7. Experimental and treatment designs and ANOVAs (model fit, assumption checking, inference, plot)
   a. Completely randomized design (CRD)
   b. Randomized complete block design (RCBD)
   c. Split-plot
8. Fixed vs. random effects
9. Automating repetitive tasks through iteration with purrr
10. Linear regression
11. Non-linear regression
12. Regression for finding optimum
13. Machine learning concepts
   a. Bias-variance trade-off
   b. Data split
   c. Hyperparameter optimization
   d. Predictive assessment
14. Machine learning models
   a. K-means (unsupervised)
   b. Conditional inference tree/Random forest (supervised, regression, and classification)
15. Dashboards
   a. Creating a simple dashboard with Shiny
   b. Publishing a dashboard online

