% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clusterInfectors.R
\name{clusterInfectors}
\alias{clusterInfectors}
\title{Clusters the infectors based on their transmission probabilities}
\usage{
clusterInfectors(
  df,
  indIDVar,
  pVar,
  clustMethod = c("n", "kd", "hc_absolute", "hc_relative"),
  cutoff
)
}
\arguments{
\item{df}{The name of the dataset with transmission probabilities (column \code{pVar})
and individual IDs (columns \code{<indIDVar>.1} and \code{<indIDVar>.2}).}

\item{indIDVar}{The name (in quotes) of the individual ID columns
(data frame \code{df} must have variables called \code{<indIDVar>.1}
 and \code{<indIDVar>.2}).}

\item{pVar}{The name (in quotes) of the column with transmission probabilities.}

\item{clustMethod}{The method used to cluster the infectors; 
one of \code{"n", "kd", "hc_absolute", "hc_relative"} (see details).}

\item{cutoff}{The cutoff for clustering (see details).}
}
\value{
The original data frame (\code{df}) with a new column called \code{cluster}
which is a factor variable with value \code{1} if the infector is in the high probability cluster
or \code{2} if the infector is in the low probability cluster.
}
\description{
The function \code{clusterInfectors} uses either kernel density estimation or
hierarchical clustering to cluster the infectors for each infectee. This clustering
provides a way to separate out the few top possible infectors for each infectee
if there is such a cluster.
}
\details{
This function provides a way to find the most likely infectors for each infectee
using the clustering method specified by \code{clustMethod}, which must be one of
\code{c("n", "kd", "hc_absolute", "hc_relative")}.

If \code{clustMethod == "n"} then this function simply assigns the top n possible
infectors to the high probability cluster, where n is defined by the value of \code{cutoff}.
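
As a rough illustration of the top-n rule (a minimal base R sketch, not the package's
internal code), the snippet below ranks the possible infectors within each infectee and
flags the top \code{n}. The toy data, the tie-breaking rule, and the assumption that
\code{<indIDVar>.1} holds the infector and \code{<indIDVar>.2} the infectee are
illustrative only.

\preformatted{
# Hypothetical toy data: infectee 4 has three possible infectors, infectee 5 has two
# (assumes column .1 is the infector and column .2 is the infectee)
toy <- data.frame(individualID.1 = c(1, 2, 3, 1, 2),
                  individualID.2 = c(4, 4, 4, 5, 5),
                  pScaled = c(0.6, 0.3, 0.1, 0.2, 0.8))
n <- 1

# Rank infectors within each infectee from highest to lowest probability;
# breaking ties by row order is an assumption made for this sketch
rnk <- ave(-toy$pScaled, toy$individualID.2,
           FUN = function(p) rank(p, ties.method = "first"))

# Cluster 1 = high probability (top n), cluster 2 = low probability
toy$cluster <- factor(ifelse(rnk <= n, 1, 2), levels = c(1, 2))
}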

If \code{clustMethod == "kd"} then kernel density estimation is used to split the infectors.
The density of the probabilities for all infectors is estimated using a bandwidth defined
by the value of \code{cutoff}. If the density is made up of at least two separate curves
(separated by a region where the density drops to 0) then the infectors with probabilities
above the lowest such zero-density region are assigned to the high probability cluster.
If the density of the probabilities does not drop to 0 then all infectors are assigned to
the low probability cluster (indicating no real clustering).
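
The kernel density rule can be sketched for a single infectee as follows. This is a
hedged illustration, not the package's implementation: passing \code{cutoff} as the
\code{bw} argument of \code{stats::density}, the numerical tolerance \code{tol} used to
decide that the density has "dropped to 0", and the toy probabilities are all assumptions.

\preformatted{
# Hypothetical probabilities for the possible infectors of one infectee
p <- c(0.01, 0.02, 0.02, 0.03, 0.45, 0.50)
cutoff <- 0.01
tol <- 1e-8   # assumed threshold below which the density is treated as 0

# Gaussian kernel density estimate with the cutoff used as the smoothing width
d <- stats::density(p, bw = cutoff)

# Grid points where the estimated density is effectively 0
zero_x <- d$x[d$y < tol]

# Default: everyone in the low probability cluster (no real clustering)
cluster <- rep(2, length(p))

# If a zero-density region exists, infectors above its lowest point go to cluster 1
if (length(zero_x) > 0) cluster[p > min(zero_x)] <- 1
}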

If \code{clustMethod == "hc_absolute"} or \code{clustMethod == "hc_relative"}, then
hierarchical clustering with minimum distance (single linkage) is used to split the
possible infectors into two clusters. In effect, this method splits the infectors at the
largest gap in their probabilities.

Then, if \code{clustMethod == "hc_absolute"}, those infectees
where the gap between the two clusters is less than \code{cutoff} have all of their
possible infectors reassigned to the low probability cluster (indicating no real clustering).
If \code{clustMethod == "hc_relative"}, those infectees where the gap between the two
clusters is less than \code{cutoff} times the second largest gap in probabilities
have all of their possible infectors reassigned to the low probability cluster
(indicating no real clustering).
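
The two hierarchical clustering rules can be sketched for a single infectee with base R.
This is a minimal illustration under stated assumptions, not the package's internal code:
the toy probabilities and the cutoff values are hypothetical, and
\code{stats::hclust} with \code{method = "single"} stands in for "minimum distance"
clustering.

\preformatted{
# Hypothetical probabilities for the possible infectors of one infectee
p <- c(0.50, 0.42, 0.05, 0.03, 0.02, 0.01)

# Single-linkage clustering cut into two groups splits at the largest gap
hc <- stats::hclust(stats::dist(p), method = "single")
grp <- stats::cutree(hc, k = 2)

# Label the group with the larger mean probability as cluster 1 (high probability)
top <- as.integer(names(which.max(tapply(p, grp, mean))))
cluster <- ifelse(grp == top, 1, 2)

# Gap between the two clusters and all gaps between sorted probabilities
gap <- min(p[cluster == 1]) - max(p[cluster == 2])
gaps <- sort(diff(sort(p)), decreasing = TRUE)

# "hc_absolute" with cutoff = 0.05: keep the split only if the gap is at least 0.05
keep_abs <- gap >= 0.05

# "hc_relative" with cutoff = 2: keep the split only if the gap is at least
# twice the second largest gap
keep_rel <- gap >= 2 * gaps[2]

# When a split is not kept, all infectors fall into the low probability cluster
cluster_abs <- if (keep_abs) cluster else rep(2, length(p))
cluster_rel <- if (keep_rel) cluster else rep(2, length(p))
}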
}
\examples{

## Use the nbResults data frame included in the package which has the results
## of the nbProbabilities() function on a TB-like outbreak.

## Clustering using top n
# High probability cluster includes infectors with highest 3 probabilities
clust1 <- clusterInfectors(nbResults, indIDVar = "individualID", pVar = "pScaled",
                           clustMethod = "n", cutoff = 3)
table(clust1$cluster)

## Clustering using hierarchical clustering

# Cluster all infectees, do not force gap to be certain size
clust2 <- clusterInfectors(nbResults, indIDVar = "individualID", pVar = "pScaled",
                           clustMethod = "hc_absolute", cutoff = 0)
table(clust2$cluster)

\donttest{
# Absolute difference: gap between top and bottom clusters is more than 0.05
clust3 <- clusterInfectors(nbResults, indIDVar = "individualID", pVar = "pScaled",
                           clustMethod = "hc_absolute", cutoff = 0.05)
table(clust3$cluster)

# Relative difference: gap between top and bottom clusters is more than double any other gap
clust4 <- clusterInfectors(nbResults, indIDVar = "individualID", pVar = "pScaled",
                           clustMethod = "hc_relative", cutoff = 2)
table(clust4$cluster)

## Clustering using kernel density estimation
# Using a small bandwidth of 0.01
clust5 <- clusterInfectors(nbResults, indIDVar = "individualID", pVar = "pScaled",
                           clustMethod = "kd", cutoff = 0.01)
table(clust5$cluster)
}

}
\seealso{
\code{\link{nbProbabilities}}
}
