5/21/2019

Why document your code?

  • Reproducibility
  • FAIR data principles: findable, accessible, interopable, reusable
  • Make it usable for other scientists – including future you!

See papers listed at the Riffomonas Tutorial for further reading on reproducibility.

Ten simple rules for documenting scientific software

Excerpted from Lee et al. 2018:

  • Write comments as you code.
  • Include a README file with basic information.
  • Version control your documentation.
  • Use automated documentation tools.
  • Write error messages that provide solutions or point to your documentation.

What not to document

  • “Document Design and Purpose, Not Mechanics” - Wilson et al. 2014

  • In other words, “Why, not how”.

  • Don’t do this:

    i <- i + 1  # increment `i` by 1
  • Try to write code that is self-documenting.

    • Use descriptive variable names. If you have to write a comment to describe your variable, you probably need to rename it. Examples:
      • counter instead of i
      • patient_metadata instead of df
      • data_cleaned instead of df2

What to document

  • No code is fully self-documenting.
  • You should document:
    • packages
    • datasets
    • functions
    • classes
    • any tricky lines of code

How to document

The bare minimum:

Include a comment at the top of your R script to briefly describe what it does at a high level.

# Generate plots from mothur sensspec files for comparing clustering algorithms.
library(ggplot2)

How (specifics for R)

Best practices:

  • For any project with R scripts:
    • Write comments alongside your code with roxygen2 syntax for man/ files.
    • Make your project a package with usethis & devtools.
  • For packages you release into the wild:
    • Write a vignette with R Markdown.
    • Create a website with pkgdown.

roxygen2 syntax

Document functions in R/*.R files

#' Add together two numbers.
#' 
#' @param x A number.
#' @param y A number.
#' @return The sum of \code{x} and \code{y}.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
   x + y
}

roxygen2 syntax

Document datasets in R/data.R

#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds.
#'
#' @format A data frame with 53940 rows and 10 variables:
#' \describe{
#'   \item{price}{price, in US dollars}
#'   \item{carat}{weight of the diamond, in carats}
#'   ...
#' }
#' @source \url{http://www.diamondse.info/}
"diamonds"

roxygen2 syntax

Document the package in R/package_name.R

#' foo: A package for computating the notorious bar statistic.
#'
#' The foo package provides three categories of important functions:
#' foo, bar and baz.
#' 
#' @section Foo functions:
#' The foo functions ...
#'
#' @docType package
#' @name foo
NULL

Activity

Let’s document code from the Riffomonas minimalR tutorial!

  1. The package: minimalR. Edit R/minimalR.R.
  2. The dataset: baxter_metadata. Edit R/data.R (raw data in inst/extdata/, processed data in data/)
  3. Functions: Edit R/baxter.R
    • get_metadata
    • get_bmi
    • get_bmi_category
    • is_obese

Activity

  1. Clone this repo

    git clone https://github.com/SchlossLab/documenting-R

    or if you previously cloned it, pull new commits:

    cd path/to/documenting-R/ ; git pull
  2. Checkout a new branch

    git checkout -b descriptive-branch-name
  3. After modifying your part, commit your changes

    git add . ; git commit -m "descriptive commit message"

Activity Wrap-up

  1. Push your changes

    git push -u origin descriptive-branch-name
  2. Open a pull request on GitHub to merge your branch into master. Mention your issue number(s) & assign me to the PR.

R Packages

Setup an R package

library(usethis)
library(devtools)
create_package(file.path(getwd()))

Specify dependencies

use_package("dplyr")
use_package("readxl")

Compile documentation

devtools::document()

Additional Reading