% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/allele_cluster.R
\name{inferAlleleClusters}
\alias{inferAlleleClusters}
\title{Infer allele similarity clusters}
\usage{
inferAlleleClusters(
  germline_set,
  locus = NULL,
  clustering_method = c("hierarchical", "leiden"),
  distance_method = c("decipher", "hamming", "lv"),
  trim_3prime_side = 318,
  mask_5prime_side = 0,
  family_threshold = 75,
  allele_cluster_threshold = 95,
  cluster_method = "complete",
  resolution = NULL,
  target_clusters = NULL,
  optimize_silhouette = TRUE,
  ncores = 1,
  aa_set = FALSE,
  quiet = FALSE
)
}
\arguments{
\item{germline_set}{A named character vector of Ig allele sequences. IMGT-gapped sequences give the best results.}

\item{locus}{The locus type. One of "IGHV", "IGKV", "IGLV", "IGHD", "IGHJ", "IGKJ", "IGLJ".
Default is NULL (auto-detected from sequence names).}

\item{clustering_method}{Clustering method. One of "hierarchical" (default) or "leiden".}

\item{distance_method}{Distance calculation method. One of "decipher" (default), "hamming", or "lv" (Levenshtein distance).}

\item{trim_3prime_side}{Position at which to trim sequences on the 3' side. Default is 318. If NULL, the full sequence length is used.}

\item{mask_5prime_side}{Number of positions to mask on the 5' side. Default is 0 (no masking).}

\item{family_threshold}{Similarity threshold (percent) for the family cluster level (hierarchical only). Default is 75.}

\item{allele_cluster_threshold}{Similarity threshold (percent) for the allele cluster level (hierarchical only). Default is 95.}

\item{cluster_method}{Hierarchical clustering linkage method. Default is "complete".}

\item{resolution}{Resolution parameter for Leiden clustering. If NULL (default), the resolution is optimized automatically.}

\item{target_clusters}{Target number of clusters used when optimizing the Leiden resolution. Default is NULL.}

\item{optimize_silhouette}{Logical. Whether to optimize the resolution using the silhouette score (Leiden only). Default is TRUE.}

\item{ncores}{Number of cores for parallel processing (Leiden only). Default is 1.}

\item{aa_set}{Logical. Whether \code{germline_set} contains amino acid sequences. Default is FALSE.}

\item{quiet}{Logical. If TRUE, messages are suppressed. Default is FALSE.}
}
\value{
An object of class \link{GermlineCluster} containing:
\itemize{
\item germlineSet: Germline set after 3' trimming and 5' masking
\item alleleClusterSet: Germline set renamed with allele similarity cluster (ASC) names
\item alleleClusterTable: data.frame of the allele similarity clusters
\item threshold: List of threshold parameters
\item hclustAlleleCluster: hclust object (hierarchical method only)
\item clusteringMethod: Method used ("hierarchical" or "leiden")
\item communityObject: Community object (Leiden method only)
\item graphObject: igraph object (Leiden method only)
\item silhouetteScore: Silhouette score (Leiden method only)
\item resolutionParameter: Resolution used (Leiden method only)
\item locus: Locus identifier
}
}
\description{
A wrapper function that infers allele similarity clusters from a germline set.
It supports hierarchical clustering (default) and Leiden community detection.
}
\details{
Pairwise distances between the allele sequences are computed, and the alleles are then clustered.
For hierarchical clustering, two similarity thresholds define the family and allele cluster levels.
For Leiden clustering, community detection identifies clusters at a given resolution.

The allele cluster names follow the scheme \code{IGHVF1-G1*01}, where:
\itemize{
\item IGH: chain
\item V: region
\item F1: family cluster number
\item G1: allele cluster number
\item 01: allele number (assigned by clustering order)
}

For V segments, the "decipher" distance method is recommended.
For D and J segments, which vary in length, "lv" (Levenshtein distance) is more appropriate.
}
\examples{
# load the initial germline set
\donttest{
data(HVGERM)

# remove alleles whose sequence starts with a gap character (".")
germline <- HVGERM[!grepl("^[.]", HVGERM)]

# Hierarchical clustering (default)
asc <- inferAlleleClusters(germline)

# Leiden community detection on the first 50 alleles
asc_leiden <- inferAlleleClusters(germline[1:50],
                                  clustering_method = "leiden",
                                  target_clusters = 10)

## plotting the clusters
plot(asc)
}
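## Illustration only (a sketch, not a function of this package): split an
## allele cluster name into the components described in Details using base R.
## Assumes the name matches the plain pattern IGHVF1-G1*01; other name forms
## (e.g. with suffixes) are not handled by this hypothetical pattern.
asc_name <- "IGHVF1-G1*01"
pattern <- "^(IG[HKL])([VDJ])F([0-9]+)-G([0-9]+)[*]([0-9]+)$"
parts <- regmatches(asc_name, regexec(pattern, asc_name))[[1]]
## drop the full match and label the captured groups
components <- parts[-1]
names(components) <- c("chain", "region", "family", "cluster", "allele")
components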
}
\seealso{
\code{\link{igDistance}}, \code{\link{igClust}}, \code{\link{plot.GermlineCluster}}
}
