June 25, 2019

## Doing Hydrogeology in R

*Posted by larryohanlon*

**By Sam Zipper**

Using programming languages to interact with, analyze, and visualize data is an increasingly important skill for hydrogeologists. Coding-based science makes it easier to process and visualize large amounts of data and increases the reproducibility of your work, both for yourself and for others.

There are many programming languages out there; anecdotally, the most commonly used languages in the hydrogeology community are Python, MATLAB, and R. Kevin previously wrote a post highlighting Python’s role in the hydrogeology toolbox, in particular the excellent FloPy package for creating and interacting with MODFLOW models.

In this post, we’ll focus on R and explore some of its tools for hydrogeology. R uses ‘packages’, which are collections of functions related to a similar task. There are thousands of R packages; recently, two colleagues and I put together a ‘Hydrology Task View’ which lists and describes a large number of water-related packages. We found that water-related R packages can be broadly categorized into data retrieval, data analysis, and modelling applications. Though packages related to surface water and meteorological data constitute the bulk of the task view, there are many groundwater-relevant packages for each step of a typical workflow.

Here, I’ll focus on some of the packages I use most frequently.

**Data Retrieval:**

Instead of downloading data as a CSV file and reading it into R, you can use one of the many packages that interface directly with online water data portals. For instance, dataRetrieval and waterData connect to the US Geological Survey water information service, tidyhydat to the Canadian streamflow monitoring network, and rnrfa to the UK National River Flow Archive.

**Data Analysis:**

Many common data analysis tasks are already implemented in R packages. hydroTSM and zoo are excellent for working with time series data, and lfstat calculates various low-flow statistics. The EcoHydRology package contains an automated digital filter for baseflow separation from streamflow data.
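
To give a feel for what a baseflow-separation filter does, here is a minimal base-R sketch of a one-parameter recursive digital filter (the Lyne-Hollick form), run as a single forward pass. This is not the EcoHydRology code. The function name `separate_baseflow`, the default `alpha = 0.925`, the choice to start with zero quickflow, and the flow values are all assumptions made for illustration.

```r
# Sketch of a one-parameter recursive digital filter (Lyne-Hollick form).
# Hypothetical helper, not taken from any package.
separate_baseflow <- function(q, alpha = 0.925) {
  n <- length(q)
  quick <- numeric(n)  # assumption: no quickflow on the first time step
  for (i in seq_len(n)[-1]) {
    # High-frequency (quickflow) component of the hydrograph
    quick[i] <- alpha * quick[i - 1] + (1 + alpha) / 2 * (q[i] - q[i - 1])
    # Keep quickflow between zero and total streamflow
    quick[i] <- min(max(quick[i], 0), q[i])
  }
  q - quick  # baseflow is whatever is left
}

# Made-up daily streamflow with a single storm peak (not real data)
q <- c(10, 10, 30, 60, 40, 25, 18, 14, 12, 11)
baseflow <- separate_baseflow(q)
print(data.frame(streamflow = q, baseflow = round(baseflow, 2)))
```

Package implementations typically add refinements such as multiple forward and backward passes, so for real work a maintained package is the better choice than this sketch.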

**Modelling:**

While R does not have an interface to MODFLOW, there are many other models that can be run within R. The boussinesq package, unsurprisingly, contains functions to solve the 1D Boussinesq equation, and the kwb.hantush package models groundwater mounding beneath an infiltration basin. The first and only package I’ve ever made, streamDepletr, contains analytical models for estimating streamflow depletion due to groundwater pumping. To evaluate your model, check out the hydroGOF package, which calculates many common goodness-of-fit metrics.
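
As a rough illustration of the kind of calculation these packages wrap, the base-R sketch below computes the Glover and Balmer (1954) analytical solution for streamflow depletion, plus the Nash-Sutcliffe efficiency as one common goodness-of-fit metric. It is not the streamDepletr or hydroGOF code. The function names `glover_fraction` and `nse` and all of the aquifer and well values are assumptions for illustration only.

```r
# Complementary error function, built from base R's normal distribution
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)

# Glover and Balmer (1954): fraction of the pumping rate that comes from
# the stream after pumping for time t.
#   t  = time since pumping started [d]
#   d  = distance from well to stream [m]
#   S  = storage coefficient (specific yield) [-]
#   Tr = transmissivity [m^2/d]
glover_fraction <- function(t, d, S, Tr) {
  erfc(sqrt(S * d^2 / (4 * Tr * t)))
}

# Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than
# predicting the mean of the observations
nse <- function(sim, obs) {
  1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)
}

# Hypothetical well 500 m from a stream (all values are made up)
t_days <- c(1, 10, 100, 1000)
frac <- glover_fraction(t = t_days, d = 500, S = 0.1, Tr = 100)
print(data.frame(t_days = t_days, depletion_fraction = signif(frac, 3)))

# Made-up observations, used only to show how the metric behaves
obs <- c(1.2, 2.3, 3.1, 4.8)
print(nse(sim = obs, obs = obs))                             # perfect match
print(nse(sim = rep(mean(obs), length(obs)), obs = obs))     # mean-only "model"
```

The depletion fraction climbs toward 1 the longer the well pumps, which is the behaviour these analytical models are designed to capture.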

**How do I get and learn R?**

R is an open-source software program, available here. RStudio is a user-friendly interface for working with R. RStudio has also compiled a number of tutorials to help you get started!

**Other Useful Resources**

Louise Slater and many co-authors currently have a paper under discussion about ‘Using R in Hydrology’ which has many excellent resources.

While not hydrogeology-specific, there are many packages for generic data analysis and visualization that will be of use to hydrogeologists. In particular, the Tidyverse includes packages such as dplyr and ggplot2 for reading, tidying, and visualizing data.
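
A small sketch of what that looks like in practice, assuming dplyr and ggplot2 are installed: the code summarises a year of daily depth-to-water readings into monthly means and plots them. The data are synthetic (generated with `rnorm`, not measured), and the object and column names are made up for this example.

```r
library(dplyr)
library(ggplot2)

# Synthetic daily depth-to-water record for one year (not real data)
set.seed(1)
gw <- data.frame(
  date = seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
)
day_of_year <- as.numeric(format(gw$date, "%j"))
gw$depth_m <- 5 + sin(2 * pi * day_of_year / 365) + rnorm(nrow(gw), sd = 0.1)

# Tidy: collapse the daily values to one mean per month
monthly <- gw %>%
  mutate(month = as.integer(format(date, "%m"))) %>%
  group_by(month) %>%
  summarise(mean_depth_m = mean(depth_m))

# Visualize: reversed y-axis so a deeper water table plots lower
p <- ggplot(monthly, aes(x = month, y = mean_depth_m)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:12) +
  scale_y_reverse() +
  labs(x = "Month", y = "Mean depth to water (m)")
print(p)
```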

Claus Wilke’s Fundamentals of Data Visualization book (free online) was written entirely within R and shows examples of the many ways that R can be used to make beautiful graphs.

*Sam Zipper (@ZipperSam) is a Postdoctoral Fellow at the University of Victoria and soon-to-be research scientist with the Kansas Geological Survey at the University of Kansas.*

Nice to see R getting some much-deserved attention in the field of hydrogeology. Hopefully this isn’t a shameless plug, but for those folks interested in using R for reproducible model building, please check out the following USGS report: https://doi.org/10.3133/sir20165080. The report describes a MODFLOW model for the Wood River Valley aquifer system, south-central Idaho. The collection of datasets, source code, and processing instructions used to construct and analyze the model was distributed in the R package wrv, available at https://github.com/USGS-R/wrv.

If you’re not aware, the R package ‘reticulate’ allows for strong interoperability with Python. So you could call FloPy routines from within R if you wanted to.