(02) SNIH dataset

Ezequiel Toum

2023-04-12

library(hydrotoolbox)

Servicio Nacional de Información Hídrica (SNIH) dataset

Without a doubt, the SNIH holds the most extensive hydro-meteorological database for the Argentine Republic, both spatially and temporally. It contains records from stations ranging from La Quiaca to Tierra del Fuego (the northernmost and southernmost places, respectively), and some of its series date back to the beginning of the last century.

Reading individual files

The website lets you download the variables measured at each station one at a time. hydrotoolbox can read these files (.xlsx format) automatically with the read_snih() function, which returns a data.frame with the data from the original file. Note that this function automatically fills the gaps between records with NA_real_. The following lines show how to apply it to the daily mean streamflow series recorded at the Guido station (Mendoza province).

# set path to file
path_file <- system.file('extdata', 'snih_qd_guido.xlsx', package = 'hydrotoolbox')

# read daily mean streamflow with default column name
guido_qd <- read_snih(path = path_file, by = 'day') 

head(guido_qd)

# now we use the function with column name
rm(guido_qd)
guido_qd <- read_snih(path = path_file,  by = 'day', 
                      out_name = 'qd(m3/s)') 

head(guido_qd)

# plot the series
plot(x = guido_qd[ , 1], y = guido_qd[ , 2], type = 'l', 
     main = 'Daily mean streamflow at Guido (Mendoza basin)', 
     xlab = 'Date', ylab = 'Q(m3/s)', col = 'dodgerblue', lwd = 1,
     ylim = c(0, 200))
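To make the gap-filling idea concrete, here is a minimal base R sketch of what filling gaps with NA_real_ means for a daily series. It is an illustration under my own assumptions (toy data, a left join on a regular time axis), not the code that read_snih() runs internally.

```r
# toy daily series with two missing days (2020-01-03 and 2020-01-04);
# the values are made up for illustration
obs <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05")),
  q    = c(10.2, 11.0, 9.7)
)

# regular daily time axis spanning the whole record
full_axis <- data.frame(
  date = seq(from = min(obs$date), to = max(obs$date), by = "day")
)

# a left join on the regular axis leaves NA where no record exists
filled <- merge(full_axis, obs, by = "date", all.x = TRUE)

filled
```

The result has one row per day, so later steps (plots, moving averages) can rely on a regular time step.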

Although read_snih() is very useful, loading, ordering and modifying these tables becomes complicated as the number of variables to analyze grows. The solution that hydrotoolbox offers is to work with the objects and methods that the package provides. In the following sections I show how to use them.

Using classes and methods to build a meteorological station

As I mention in the design principles of this package (vignette('package_overview', package = 'hydrotoolbox')), all the data recorded at a station should be stored in a single object. For this reason, you must first create the object (the hydro-meteorological station) and then use hm_build_generic(), a method that automatically loads into the object all the variables that the real-world station records.

# in this path you will find the raw example data 
path <- system.file('extdata', package = 'hydrotoolbox')

list.files(path)

# we load the daily streamflow series into a single object
# (hydromet_station class)
guido <- 
  hm_create() %>% # create the met-station
  hm_build_generic(path = path,
                   file_name = c('snih_qd_guido.xlsx'),
                   slot_name = c('qd'),
                   FUN = read_excel, 
                   by = c('day'),
                   sheet = 1L
                   ) 

# we can explore the data-set inside it by using hm_show
guido %>% hm_show()

# you can also rename the column names
guido <- 
  guido %>% 
  hm_name(slot_name = 'qd',
        col_name = 'q(m3/s)')

guido %>% hm_show(slot_name = 'qd')
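The design principle of keeping every variable of a station in one object can be sketched with a toy S4 class. The class name toy_station and its slots are hypothetical and only illustrate the idea; they are not the definition of the hydromet_station class.

```r
library(methods)

# hypothetical toy class: one slot per variable, each holding a data.frame
setClass(
  "toy_station",
  representation(id = "character", qd = "data.frame", tair = "data.frame")
)

# fill only the streamflow slot (made-up values)
station <- new(
  "toy_station",
  id = "guido",
  qd = data.frame(date = as.Date("2020-01-01") + 0:2,
                  q    = c(10.2, 11.0, 9.7))
)

# every variable lives under the same object, addressed by slot name
slotNames(station)
head(station@qd)
```

Because each table is addressed by a slot name, a method only needs the object and the slot name to find the data, which is the pattern the slot_name argument follows throughout this vignette.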

Data visualization

Plots are among the most useful tools for analyzing hydrological series and summarizing results. In this section I show how to use hm_plot(), a method that draws static and dynamic (interactive) time series plots through intuitive, easy-to-apply arguments. Internally, hm_plot() uses part of the functionality of the ggplot2 and plotly packages.

# we ask hydrotoolbox to show all the variables 
# with data in our station
guido %>% hm_show()

# if we want to analyze the daily mean streamflow records
guido %>%
  hm_plot(slot_name = 'qd',
          col_name = list('q(m3/s)'),
          interactive = TRUE,
          line_color = 'dodgerblue', 
          x_lab = 'Date', y_lab = 'Q(m3/s)' )
# show only the discharge for the hydrological year 2016/2017,
# e.g. for a publication figure
guido %>%
  hm_plot(slot_name = 'qd',
          col_name = list('q(m3/s)'),
          interactive = FALSE,
          line_color = 'dodgerblue', 
          x_lab = 'Date', y_lab = 'Q(m3/s)', 
          from = '2016-07-01', to = '2017-06-30', 
          legend_lab = 'Guido station',
          title_lab = 'Daily mean discharge' )
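The from and to arguments restrict the plot to a time window. As a rough base R equivalent, and assuming that both bounds are inclusive, the same window can be cut from a table by hand. The series below is synthetic and only serves to show the subsetting step.

```r
# synthetic daily series (values are made up for illustration only)
dates <- seq(from = as.Date("2016-01-01"), to = as.Date("2017-12-31"),
             by = "day")
toy_q <- data.frame(
  date = dates,
  q    = 20 + 10 * sin(2 * pi * seq_along(dates) / 365)
)

# keep only the hydrological year 2016/2017 (inclusive bounds assumed)
from <- as.Date("2016-07-01")
to   <- as.Date("2017-06-30")
window_q <- toy_q[toy_q$date >= from & toy_q$date <= to, ]

range(window_q$date)
nrow(window_q)
```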

Access to met-station information

In this section I show how to use the hm_show(), hm_report() and hm_get() methods. They provide quantitative information about the data and extract the data.frames from the hydromet_station object.

# the show method gives an overview of the stored variables
guido %>%
  hm_show()

# or maybe we want to specify the slots
guido %>%
  hm_show(slot_name = c('id', 'qd', 'tair') )
# suppose we want the basic statistics of our data
# and the number of missing values
guido %>%
  hm_report(slot_name = 'qd')
# now we extract the table
guido %>%
  hm_get(slot_name = 'qd') %>%
  head()
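Once a table has been extracted, the same kind of questions (basic statistics, number of missing values) can be answered with base R. The sketch below uses a made-up table and is only meant to show what such a summary involves; it does not reproduce the output format of hm_report().

```r
# toy table with two missing records (values are made up)
toy_q <- data.frame(
  date = as.Date("2020-01-01") + 0:5,
  q    = c(10.2, NA, 9.7, 12.4, NA, 11.1)
)

# basic statistics of the numeric column (summary() also counts the NA's)
summary(toy_q$q)

# number and share of missing records
n_missing     <- sum(is.na(toy_q$q))
share_missing <- mean(is.na(toy_q$q))

n_missing
share_missing
```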

Data transformation

As I mention in the package design principles, it must be possible to store modifications in the same file, so that multiple versions of the data are avoided. In this section we will see some examples with the hm_mutate() and hm_melt() methods.

# apply a moving average window to the streamflow records
guido %>%
  hm_mutate(slot_name = 'qd',
            FUN = mov_avg, k = 10,
            pos = 'c', out_name = 'mov_avg') %>% # see ?mov_avg()
  hm_plot(slot_name = 'qd',
         col_name = list(c('q(m3/s)', 'mov_avg') ),
         interactive = TRUE,
         line_color = c('dodgerblue', 'red3'),
         y_lab = 'Q(m3/s)',
         legend_lab = c('obs', 'mov_avg')  )

NOTE: hm_mutate() can also be combined with the dplyr package function mutate().
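For readers unfamiliar with the technique, a centered moving average can be sketched in base R with stats::filter(). I use an odd window (k = 3) so that the window is symmetric around each value; how mov_avg() treats an even window such as k = 10, or the series ends, is not covered by this sketch.

```r
# toy series (made-up values)
x <- c(1, 2, 3, 4, 5)
k <- 3

# equal weights and sides = 2 give a centered moving average;
# positions without a full window are returned as NA
ma <- as.numeric(stats::filter(x, filter = rep(1 / k, k), sides = 2))

ma
```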

# let's say we want to combine snow water equivalent from Toscas (dgi)
# with daily streamflow from Guido (snih)

# first we build the Toscas station
# dgi file
toscas <- 
  hm_create() %>%
  hm_build_generic(path = path,
                   file_name = 'dgi_toscas.xlsx',
                   slot_name = c('swe', 'tmax',
                                 'tmin', 'tmean',
                                 'rh', 'patm'),
                   by = 'day', 
                   FUN = read_dgi, 
                   sheet = 1L:6L ) 

# now we melt the required data in a new object
hm_create(class_name = 'compact') %>%
     hm_melt(melt = c('toscas', 'guido'),
             slot_name = list(toscas = 'swe', guido = 'qd'),
             col_name = 'all',
             out_name = c('swe(mm)', 'qd(m3/s)')
             ) %>%
       hm_plot(slot_name = 'compact',
               col_name = list( c('swe(mm)', 'qd(m3/s)') ),
               interactive = TRUE,
               legend_lab = c('swe-Toscas', 'qd-Guido'),
               line_color = c('dodgerblue', 'red'),
               y_lab = c('q(m3/s)', 'swe(mm)'),
               dual_yaxis = c('right', 'left')
                )
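Conceptually, melting puts series from different stations side by side on a common time axis. A minimal base R sketch of that idea is a join by date. I use a full join here as an assumption; whether hm_melt() treats non-overlapping periods in the same way is not something this example demonstrates.

```r
# two toy series with partially overlapping periods (made-up values)
swe <- data.frame(date = as.Date("2020-01-01") + 0:3,
                  swe  = c(100, 102, 101, 99))
qd  <- data.frame(date = as.Date("2020-01-03") + 0:3,
                  qd   = c(10.2, 11.0, 9.7, 12.4))

# full join by date: days present in only one table get NA in the other column
merged <- merge(swe, qd, by = "date", all = TRUE)

merged
```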

Quality flags and non-numeric columns

Since version 1.1.0 of the package, the hydromet_station and hydromet_compact objects support non-numeric columns. This makes it possible to add different kinds of metadata to the tables.

# we are going to add some quality-flags to the data
library(tibble)

my_station <- hm_create(class_name = "station")

my_tb <-
  tibble(
    date = seq.POSIXt(from = ISOdate(2022, 1, 1, 0, 0, 0),
                      to = ISOdate(2022, 1, 1, 23, 0, 0),
                      by = "hour"),
    random_var = runif(n = 24, min = 0, max = 10),
    unit = "my_units",
    quality_flag = c(rep("good", 20), rep("bad", 4))
  )

my_station <-
  my_station %>%
  hm_set(unvar = my_tb)

my_station %>% hm_show()
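One typical use of such a flag column is to screen the numeric values before an analysis. The base R sketch below works on a plain data.frame with the same layout as the table above; it is a hypothetical follow-up step, not a hydrotoolbox method.

```r
set.seed(1)  # only to make the made-up values reproducible

toy <- data.frame(
  date = seq(from = ISOdate(2022, 1, 1, 0, 0, 0), by = "hour",
             length.out = 24),
  random_var   = runif(n = 24, min = 0, max = 10),
  quality_flag = c(rep("good", 20), rep("bad", 4))
)

# how many records carry each flag
table(toy$quality_flag)

# mask the values flagged as bad while keeping the regular time axis
clean <- toy
clean$random_var[clean$quality_flag == "bad"] <- NA_real_

tail(clean)
```

Setting the flagged values to NA_real_ instead of dropping the rows keeps the hourly time step intact.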