% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DISTANCES-gak.R
\name{GAK}
\alias{GAK}
\alias{gak}
\title{Fast global alignment kernels}
\usage{
GAK(
  x,
  y,
  ...,
  sigma = NULL,
  window.size = NULL,
  normalize = TRUE,
  error.check = TRUE
)

gak(
  x,
  y,
  ...,
  sigma = NULL,
  window.size = NULL,
  normalize = TRUE,
  error.check = TRUE
)
}
\arguments{
\item{x, y}{Time series. A multivariate series should have time spanning the rows and variables
spanning the columns.}

\item{...}{Currently ignored.}

\item{sigma}{Parameter for the Gaussian kernel's width. See details for the interpretation of
\code{NULL}.}

\item{window.size}{Parameterization of the constraining band (\emph{T} in Cuturi (2011)). See details.}

\item{normalize}{Normalize the result by considering diagonal terms.}

\item{error.check}{Logical indicating whether the function should try to detect inconsistencies
and give more informative error messages. Also used internally to avoid repeating checks.}
}
\value{
The logarithm of the GAK if \code{normalize = FALSE}, otherwise 1 minus the normalized GAK. The value
of \code{sigma} is assigned as an attribute of the result.
}
\description{
Distance based on (triangular) global alignment kernels.
}
\details{
This function uses the Triangular Global Alignment Kernel (TGAK) described in Cuturi (2011). It
supports series of different length and multivariate series, so long as the ratio of the series'
lengths is at most 2 (and at least 0.5).

The \code{window.size} parameter is similar to the one used in DTW, so \code{NULL} signifies no constraint,
and its value should be greater than 1 if used with series of different length.

The Gaussian kernel is parameterized by \code{sigma}. Providing \code{NULL} means that the value will be
estimated by using the strategy mentioned in Cuturi (2011) with a constant of 1. This estimation
is subject to \strong{randomness}, so consider estimating the value once and re-using it (the estimate
is returned as an attribute of the result). See the examples.

For more information, refer to the package vignette and the referenced article.
}
\note{
The estimation of \code{sigma} does \emph{not} depend on \code{window.size}.

If \code{normalize} is set to \code{FALSE}, the returned value is \strong{not} a distance, but rather a similarity.
The \code{\link[proxy:dist]{proxy::dist()}} version is thus always normalized. Use \code{\link[proxy:dist]{proxy::simil()}} with \code{method} set to
"uGAK" if you want the unnormalized similarities.

A constrained unnormalized calculation (i.e. with \code{window.size > 0} and \code{normalize = FALSE}) will
return negative infinity if \code{abs(NROW(x) - NROW(y)) > window.size}. Since the function
won't perform calculations in that case, it might be faster, but if this behavior is not desired,
consider reinterpolating the time series (see \code{\link[=reinterpolate]{reinterpolate()}}) or increasing the window size.
}
\section{Proxy version}{


The version registered with \code{\link[proxy:dist]{proxy::dist()}} is custom (\code{loop = FALSE} in \link[proxy:registry]{proxy::pr_DB}).
The custom function handles multi-threaded parallelization directly with \link[RcppParallel:RcppParallel-package]{RcppParallel}.
It uses all available threads by default (see \code{\link[RcppParallel:setThreadOptions]{RcppParallel::defaultNumThreads()}}),
but this can be changed by the user with \code{\link[RcppParallel:setThreadOptions]{RcppParallel::setThreadOptions()}}.

An exception to the above is when it is called within a \code{\link[foreach:foreach]{foreach}} parallel loop \strong{made by dtwclust}.
If the parallel workers do not have the number of threads explicitly specified,
this function will default to 1 thread per worker.
See the parallelization vignette for more information: \code{browseVignettes("dtwclust")}.

It also includes symmetric optimizations to calculate only half a distance matrix when appropriate---only one list of series should be provided in \code{x}.
Starting with version 6.0.0, this optimization means that the function returns an array with the lower triangular values of the distance matrix,
similar to what \code{\link[stats:dist]{stats::dist()}} does;
see \linkS4class{DistmatLowerTriangular} for a helper to access elements as if it were a normal matrix.
If you want to avoid this optimization, call \link[proxy:dist]{proxy::dist} by giving the same list of series in both \code{x} and \code{y}.
}

\examples{

\dontrun{
data(uciCT)

set.seed(832)
GAKd <- proxy::dist(zscore(CharTraj), method = "gak",
                    pairwise = TRUE, window.size = 18L)

# Obtained estimate of sigma
sigma <- attr(GAKd, "sigma")

# Use value for clustering
tsclust(CharTraj, k = 20L,
        distance = "gak", centroid = "shape",
        trace = TRUE,
        args = tsclust_args(dist = list(sigma = sigma,
                                        window.size = 18L)))
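
# A minimal sketch (not part of the original examples) of re-using the
# estimated sigma in direct calls, so that later results do not depend on a
# new random estimate. Assumes the objects created above are still available;
# the names series, d12, d13 and full are only illustrative.
series <- zscore(CharTraj[1L:3L])
d12 <- gak(series[[1L]], series[[2L]], sigma = sigma, window.size = 18L)
d13 <- gak(series[[1L]], series[[3L]], sigma = sigma, window.size = 18L)
# The value of sigma that was used is attached to each result
attr(d12, "sigma")

# Giving the same list in both x and y should avoid the symmetric
# optimization, so a regular cross-distance matrix is expected here
full <- proxy::dist(series, series, method = "gak",
                    sigma = sigma, window.size = 18L)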
}

# Unnormalized similarities
proxy::simil(CharTraj[1L:5L], method = "ugak")

}
\references{
Cuturi, M. (2011). Fast global alignment kernels. In \emph{Proceedings of the 28th international
conference on machine learning (ICML-11)} (pp. 929-936).
}
