% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/external.bold.analyze.diversity.R
\name{bold.analyze.diversity}
\alias{bold.analyze.diversity}
\title{Create a biodiversity profile of the retrieved data}
\usage{
bold.analyze.diversity(
  bold_df,
  taxon_rank,
  taxon_name = NULL,
  site_type = c("locations", "grids"),
  location_type = NULL,
  gridsize = NULL,
  presence_absence = FALSE,
  diversity_profile = c("richness", "preston", "shannon", "beta", "all"),
  beta_index = NULL
)
}
\arguments{
\item{bold_df}{A data frame obtained from \code{\link[=bold.fetch]{bold.fetch()}}.}

\item{taxon_rank}{A single character string specifying the taxonomic hierarchical rank. Must be specified (there is no default).}

\item{taxon_name}{A character vector of one or more taxonomic names at the rank given by \code{taxon_rank}. Default value is NULL, in which case all data for \code{taxon_rank} are used.}

\item{site_type}{A character string specifying one of two broad categories of \code{sites} (\code{locations} or \code{grids}). Must be specified.}

\item{location_type}{A single character string specifying the geographic field over which the community matrix is created when \code{locations} is selected as the \code{site_type}. Default value is NULL.}

\item{gridsize}{A numeric value giving the grid size, in square meters, when \code{grids} is selected as the \code{site_type}. Default value is NULL.}

\item{presence_absence}{A logical value specifying whether the generated matrix should be converted into a presence-absence matrix. Default value is FALSE.}

\item{diversity_profile}{A character string specifying the type of diversity profile (\code{"richness"}, \code{"preston"}, \code{"shannon"}, \code{"beta"} or \code{"all"}). Must be specified.}

\item{beta_index}{A character string specifying the beta diversity index (\code{"jaccard"} or \code{"sorensen"}), to be supplied when \code{diversity_profile} is \code{beta} or \code{all}. Default value is NULL.}
}
\value{
An 'output' list containing results based on the profile selected.

Common to all profiles:
\itemize{
\item comm.matrix = site X species-like community matrix required for the biodiversity results
}

Additionally, if \code{site_type}=\code{grids}:
\itemize{
\item grids.data = grid data (an \code{sf} object)
\item grid.map = a map of the grids
}

Based on the type of diversity profile:
\itemize{
\item \code{richness}: richness = a richness profile matrix
\item \code{shannon}: Shannon_div = Shannon diversity values for the given sites/grids, computed from the community matrix
\item \code{preston}: preston.res = numerical output of the Preston analysis; preston.plot = a ggplot2 visualization of the Preston plot
\item \code{beta}: total.beta = total beta diversity; replace = replacement component; richnessd = richness difference component (each stored as a distance matrix)
\item \code{all}: all of the above results
}
}
\description{
This function creates a biodiversity profile of data downloaded using \code{\link[=bold.fetch]{bold.fetch()}}.
}
\details{
\code{bold.analyze.diversity} estimates richness, Shannon diversity and beta diversity from BIN counts or presence-absence data. Internally, the function converts the downloaded BCDM data into a community matrix (site X species), which is also returned as part of the output. \code{taxon_rank} refers to a specific taxonomic rank (e.g. class, order, family, or even BINs) and \code{taxon_name} to one or more names of organisms at that rank. \code{taxon_rank} cannot be NULL; if \code{taxon_name} = \code{NULL}, all data for the specified \code{taxon_rank} are used.

When \code{site_type}=\code{locations}, a \code{location_type} must also be specified to avoid errors; it can be any geographic field (e.g. country.ocean, province.state; for more information, see the help for the \code{bold.fields.info()} function). When \code{site_type}=\code{grids}, grids are generated from the BIN occurrence coordinates (latitude, longitude), with the grid size set by the user in square meters through the \code{gridsize} argument. In this case, the Coordinate Reference System (CRS) of the data is converted to a Mollweide projection so that distance-based grids can be specified correctly (Gott III et al. 2007). Each grid is assigned a cell id, with the lowest number given to the lowest latitudinal point in the dataset.

Rows lacking latitude and longitude values (NULL values) are removed when \code{site_type}=\code{grids}, because grids rely on bounding boxes, which require coordinates. Conversely, such rows are retained when \code{site_type}=\code{locations}. This filtering can affect richness values and other analyses, since all records for the selected \code{taxon_rank} that contain \code{location} information but lack latitude and longitude are excluded when \code{site_type}=\code{grids}. The same dataset can therefore yield different results depending on the chosen \code{site_type}.

The community matrix generated from the sites/grids is then used to create richness profiles using \code{BAT::alpha.accum()}, and Preston and Shannon diversity analyses using \code{vegan::prestondistr()} and \code{vegan::diversity()}, respectively. \code{BAT::alpha.accum()} currently offers several richness estimators, including Observed diversity (Obs); Singletons (S1); Doubletons (S2); Uniques (Q1); Duplicates (Q2); Jackknife1 abundance (Jack1ab); Jackknife1 incidence (Jack1in); Jackknife2 abundance (Jack2ab); Jackknife2 incidence (Jack2in); Chao1 and Chao2. The results depend on the input data (true abundances vs. counts vs. incidences), so users should interpret them with care. Preston plots are generated in \code{ggplot2} from the \code{prestondistr} results, with cyan bars for observed species (or the equivalent taxonomic group) and orange dots for expected counts. Beta diversity is calculated using the \code{BAT::beta()} function, which partitions total beta diversity into replacement and richness-difference components following Podani & Schmera (2011) and Carvalho et al. (2012). These results are stored as distance matrices in the output.
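
As a hedged illustration, assuming presence-absence data and the Jaccard family of indices (\code{BAT::beta()} also offers abundance-based versions, which are not shown here), the Shannon index and the beta diversity partition can be written as follows. Here \eqn{p_i} is the proportion of the counts at a site that belong to taxon \eqn{i}, \eqn{a} is the number of taxa shared by two sites, and \eqn{b} and \eqn{c} are the numbers of taxa present only in the first or only in the second site. The natural logarithm is used, which is the \code{vegan::diversity()} default:
\deqn{H' = -\sum_{i} p_i \ln p_i}{H' = -sum(p_i * ln(p_i))}
\deqn{\beta_{total} = \frac{b + c}{a + b + c}}{beta_total = (b + c)/(a + b + c)}
\deqn{\beta_{repl} = \frac{2 \min(b, c)}{a + b + c}}{beta_repl = 2 min(b, c)/(a + b + c)}
\deqn{\beta_{rich} = \frac{|b - c|}{a + b + c}}{beta_rich = |b - c|/(a + b + c)}
For the Sorensen family, each denominator \eqn{a + b + c} is replaced by \eqn{2a + b + c}. In both cases the total equals the sum of the replacement and richness-difference components, since \eqn{2 \min(b, c) + |b - c| = b + c}.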

\emph{Note on the community matrix}: Each cell in this matrix contains the count (or abundance) of specimens whose sequences have an assigned BIN at a given site (a location or a grid cell, depending on \code{site_type}). These counts can be generated at any taxonomic hierarchical level, for one or multiple taxa, including BINs (\code{bin_uri}). The \code{presence_absence} argument converts these counts (or abundances) to 1s and 0s.

\emph{Important Note}: Results, including counts, depend on the \code{taxon_rank} argument.
}
\examples{
\dontrun{
# Search for ids
comm.mat.data <- bold.public.search(taxonomy = list("Poecilia"))

# Fetch the data using the ids.
# 1. An API key must be obtained from BOLD support before using the `bold.fetch()` function.
# 2. Use the `bold.apikey()` function to set the API key in the global environment.

bold.apikey('apikey')

BCDMdata <- bold.fetch(get_by = "processid",
                       identifiers = comm.mat.data$processid)

# Remove rows which have no species data
BCDMdata <- BCDMdata[!is.na(BCDMdata$species) & BCDMdata$species != "", ]

#1. Analyze richness data
res.rich <- bold.analyze.diversity(bold_df=BCDMdata,
                                   taxon_rank = "species",
                                   site_type = "locations",
                                   location_type = 'country.ocean',
                                   diversity_profile = "richness")

# Community matrix (BCDM data converted to community matrix)
res.rich$comm.matrix

# richness results
res.rich$richness

#2. Shannon diversity (based on grids)
res.shannon <- bold.analyze.diversity(bold_df=BCDMdata,
                                      taxon_rank = "species",
                                      site_type = "grids",
                                      gridsize = 1000000,
                                      diversity_profile = "shannon")

# Shannon diversity results
res.shannon$shannon_div

# Grid data (sf)
res.shannon$grids.data

# grid map
res.shannon$grid.map

#3. Preston plots and results
pres.res <- bold.analyze.diversity(bold_df=BCDMdata,
                                   taxon_rank = "species",
                                   site_type = "locations",
                                   location_type = 'country.ocean',
                                   diversity_profile = "preston")

# Preston plot
pres.res$preston.plot

# Preston plot data
pres.res$preston.res

#4. beta diversity
beta.res <- bold.analyze.diversity(bold_df=BCDMdata,
                                   taxon_rank = "species",
                                   site_type = "locations",
                                   location_type = 'country.ocean',
                                   diversity_profile = "beta",
                                   beta_index = "jaccard")

#Total diversity
beta.res$total.beta

#Replacement
beta.res$replace

#Richness difference
beta.res$richnessd

#5. All profiles
all.diversity.res <- bold.analyze.diversity(bold_df=BCDMdata,
                                            taxon_rank = "species",
                                            site_type = "locations",
                                            location_type = 'country.ocean',
                                            diversity_profile = "all",
                                            beta_index = "jaccard")
#Explore all results
all.diversity.res
}
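
# Illustrative sketch only, not a call to bold.analyze.diversity(): a small
# hypothetical site X species count matrix (toy data, not BOLD records) used
# to show by hand what the presence_absence conversion, the Shannon index and
# the Jaccard-family beta partition compute. All object names are hypothetical.
toy.mat <- matrix(c(5, 3, 0, 2,
                    1, 0, 4, 0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("site1", "site2"),
                                  c("sp1", "sp2", "sp3", "sp4")))

# presence_absence = TRUE converts counts to 1s and 0s
toy.pa <- (toy.mat > 0) * 1

# Shannon index per site with the natural logarithm; zero counts are dropped
shannon.manual <- apply(toy.mat, 1, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

# Should agree with vegan::diversity(), whose default base is exp(1)
if (requireNamespace("vegan", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(as.numeric(shannon.manual),
                             as.numeric(vegan::diversity(toy.mat, index = "shannon")))))
}

# Jaccard-family partition between site1 and site2 (Carvalho et al. 2012)
n.shared <- sum(toy.pa[1, ] == 1 & toy.pa[2, ] == 1)  # a: taxa in both sites
n.only1  <- sum(toy.pa[1, ] == 1 & toy.pa[2, ] == 0)  # b: taxa only in site1
n.only2  <- sum(toy.pa[1, ] == 0 & toy.pa[2, ] == 1)  # c: taxa only in site2
n.total  <- n.shared + n.only1 + n.only2
beta.total <- (n.only1 + n.only2) / n.total
beta.repl  <- 2 * min(n.only1, n.only2) / n.total
beta.rich  <- abs(n.only1 - n.only2) / n.total

# The total is the sum of the replacement and richness-difference components
stopifnot(isTRUE(all.equal(beta.total, beta.repl + beta.rich)))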

}
\references{
Carvalho, J.C., Cardoso, P. & Gomes, P. (2012) Determining the relative roles of species replacement and species richness differences in generating beta-diversity patterns. Global Ecology and Biogeography, 21, 760-771.

Podani, J. & Schmera, D. (2011) A new conceptual and methodological framework for exploring and explaining pattern in presence-absence data. Oikos, 120, 1625-1638.

Gott III, J.R., Mugnolo, C. & Colley, W.N. (2007) Map projections minimizing distance errors. Cartographica: The International Journal for Geographic Information and Geovisualization, 42(3), 219-234.
}
