% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/delineate_with_similarity.R
\name{delineate_with_similarity}
\alias{delineate_with_similarity}
\title{Delineate clusters from a similarity matrix}
\usage{
delineate_with_similarity(sim_matrix, threshold)
}
\arguments{
\item{sim_matrix}{An \emph{n} × \emph{n} similarity matrix, with \emph{n} the number of spectra. Column names should match the row names.}

\item{threshold}{A numeric value indicating the minimal similarity required for two spectra to be considered alike. Adjust it according to the similarity metric used.}
}
\value{
A tibble of \emph{n} rows, one for each spectrum, and 3 columns:
\itemize{
\item \code{name}: the row names of the similarity matrix, indicating the spectra names.
\item \code{membership}: integers stating the cluster number to which each spectrum belongs. It ranges from 1 to \emph{c}, the total number of clusters.
\item \code{cluster_size}: integers indicating the total number of spectra in the corresponding cluster.
}
}
\description{
From a matrix of spectra similarity (e.g., with the cosine metric,
or the Pearson product-moment correlation), infer the species clusters based on a
threshold \strong{above} (or \strong{equal to}) which spectra are considered alike.
}
\details{
The similarity matrix can be viewed as a network
in which nodes are spectra and a link exists between two spectra only if
their similarity is equal to or above the threshold.

The original idea to find the cluster members comes from a \href{https://stackoverflow.com/a/57613463}{Stack Overflow answer by the user ekstroem}. However, the
implementation here differs in two ways:
\enumerate{
\item It relies on the connected components of the network instead of the fast greedy
modularity algorithm.
\item It uses base R functions to reduce the dependencies.
}
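
The connected-components idea can be illustrated with a minimal base R sketch. This is not the package's actual implementation: the helper name \code{sketch_components} and its breadth-first traversal are assumptions made for illustration only.
\preformatted{
# Hypothetical helper (illustration only, not the package's internal code)
sketch_components <- function(sim_matrix, threshold) {
  # A link exists when the similarity is equal to or above the threshold
  adjacency <- sim_matrix >= threshold
  n <- nrow(adjacency)
  membership <- rep(NA_integer_, n)
  cluster <- 0L
  for (start in seq_len(n)) {
    # Skip spectra already assigned to a cluster
    if (!is.na(membership[start])) next
    cluster <- cluster + 1L
    queue <- start
    # Breadth-first traversal of everything reachable from 'start'
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      if (!is.na(membership[node])) next
      membership[node] <- cluster
      neighbours <- which(adjacency[node, ] & is.na(membership))
      queue <- c(queue, neighbours)
    }
  }
  membership
}
}
Each connected component receives one integer label, so two spectra end up in the same cluster whenever a chain of links connects them, even if their own pairwise similarity is below the threshold.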
}
\examples{
# Toy similarity matrix between the six example spectra of
#  three species. The cosine metric is used and a value of
#  zero indicates dissimilar spectra and a value of one
#  indicates identical spectra.
cosine_similarity <- matrix(
  c(
    1, 0.79, 0.77, 0.99, 0.98, 0.98,
    0.79, 1, 0.98, 0.79, 0.8, 0.8,
    0.77, 0.98, 1, 0.77, 0.77, 0.77,
    0.99, 0.79, 0.77, 1, 1, 0.99,
    0.98, 0.8, 0.77, 1, 1, 1,
    0.98, 0.8, 0.77, 0.99, 1, 1
  ),
  nrow = 6,
  dimnames = list(
    c(
      "species1_G2", "species2_E11", "species2_E12",
      "species3_F7", "species3_F8", "species3_F9"
    ),
    c(
      "species1_G2", "species2_E11", "species2_E12",
      "species3_F7", "species3_F8", "species3_F9"
    )
  )
)
# Delineate clusters based on a 0.92 threshold applied
#  to the similarity matrix
delineate_with_similarity(cosine_similarity, threshold = 0.92)
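# For illustration only (not part of the function's interface):
#  the links implied by the threshold can be inspected by
#  comparing the toy matrix to the same 0.92 threshold. TRUE
#  marks a pair of spectra that would be linked in the network.
cosine_similarity >= 0.92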
}
\seealso{
For similarity metrics: \href{https://rdrr.io/cran/coop/man/cosine.html}{\code{coop::tcosine}}, \link[stats:cor]{stats::cor}, \href{https://rdrr.io/cran/Hmisc/man/rcorr.html}{\code{Hmisc::rcorr}}. For using taxonomic identifications for clusters: \link{delineate_with_identification}. For further analyses: \link{set_reference_spectra}.
}
