\name{simEval}
\alias{simEval}
\title{A function for evaluating similarity/dissimilarity matrices (simEval)}
\usage{
simEval(d, sideInf, lower.tri = FALSE, cores = 1, ...)
}
\arguments{
  \item{d}{a \code{vector} or a square symmetric
  \code{matrix} (or \code{data.frame}) of
  similarity/dissimilarity scores between samples of a
  given dataset (see \code{lower.tri}).}

  \item{sideInf}{a \code{vector} containing the side
  information corresponding to the samples in the dataset
  from which the similarity/dissimilarity matrix was
  computed. It can be either a numeric vector (continuous
  variable) or a factor (discrete variable). If it is a
  numeric \code{vector}, the root mean square of
  differences is used for assessing the similarity between
  the samples and their corresponding most similar samples
  in terms of the side information provided. If it is a
  factor, then the kappa index is used. See details.}

  \item{lower.tri}{a \code{logical} indicating whether the
  input similarities/dissimilarities are given as a
  \code{vector} of the lower triangle of the distance
  matrix (as returned e.g. by \code{base::dist}) or as a
  square symmetric \code{matrix} (or \code{data.frame})
  (default = \code{FALSE})}

  \item{cores}{number of cores used to find the nearest
  neighbours of similarity/dissimilarity scores (default =
  1). See details.}

  \item{...}{additional parameters (for internal use
  only).}
}
\value{
\code{simEval} returns a \code{list} with the following
components: \itemize{ \item{\code{eval}}{either the RMSD
(and the correlation coefficient) or the kappa index}
\item{\code{firstNN}}{a \code{data.frame} containing the
original side informative variable in the first column and
the side informative values of the corresponding nearest
neighbours in the second column} }
}
\description{
This function searches for the most similar sample of each
sample in a given data set based on a
similarity/dissimilarity matrix (e.g. a distance matrix). The
samples are compared against their corresponding most
similar samples in terms of the side information provided.
For continuous variables, the root mean square of differences
and the correlation coefficient are computed; for discrete
variables, the kappa index is calculated.
}
\details{
For the evaluation of similarity/dissimilarity matrices
this function uses side information (information about one
variable which is available for a group of samples,
Ramirez-Lopez et al., 2013). It is assumed that there is a
correlation (or at least an indirect or secondary
correlation) between this side informative variable and the
spectra. In other words, this approach is based on the
assumption that the similarity measures between the spectra
of a given group of samples should be able to reflect their
similarity also in terms of the side informative variable
(e.g. compositional similarity). If \code{sideInf} is a
numeric \code{vector} the root mean square of differences
(RMSD) is used for assessing the similarity between the
samples and their corresponding most similar samples in
terms of the side information provided. It is computed as
follows: \deqn{RMSD =
\sqrt{\frac{1}{n} \sum_{i=1}^n {(y_i - \ddot{y}_i)^2}}}
where \eqn{y_i} is the value of the side variable of the
\eqn{i}th sample, \eqn{\ddot{y}_i} is the value of the side
variable of the nearest neighbour of the \eqn{i}th sample
and \eqn{n} is the total number of observations. If
\code{sideInf} is a factor the kappa index (\eqn{\kappa})
is used instead of the RMSD. It is computed as follows:
\deqn{\kappa = \frac{p_{o}-p_{e}}{1-p_{e}}} where
\eqn{p_o} and \eqn{p_e} are two different agreement indexes
between the side information of the samples and the
side information of their corresponding nearest samples
(i.e. most similar samples): \eqn{p_o} is the
relative agreement, whereas \eqn{p_e} is the agreement
expected by chance.

Multi-threading for the computation of
dissimilarities (see \code{cores} parameter) is based on
OpenMP and hence works only on Windows and Linux.
}
\examples{
\dontrun{
require(prospectr)

data(NIRsoil)

Yr <- NIRsoil$Nt[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

# Example 1
# Compute a principal components distance
pca.d <- orthoDiss(Xr = Xr, pcSelection = list("cumvar", 0.999),
                   method = "pca",
                   local = FALSE,
                   center = TRUE, scaled = TRUE)

# The final number of pcs used for computing the distance
# matrix of objects in Xr
pca.d$n.components

# The final distance matrix
ds <- pca.d$dissimilarity

# Example 1.1
# Evaluate the distance matrix on the basis of the
# side information (Yr) associated with Xr
se <- simEval(d = ds, sideInf = Yr)

# The final evaluation results
se$eval

# The final values of the side information (Yr) and the values of
# the side information corresponding to the first nearest neighbours
# found by using the distance matrix
se$firstNN
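
# Illustrative check (a sketch, not part of simEval): assuming the
# first column of se$firstNN holds the original side information and
# the second column holds the values of the nearest neighbours (as
# described in the value section), the RMSD can be recomputed by hand.
# Using na.rm = TRUE is an assumption made here for samples with
# missing side information; the object name nn.diff is hypothetical.
nn.diff <- se$firstNN[, 1] - se$firstNN[, 2]
sqrt(mean(nn.diff^2, na.rm = TRUE))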

# Example 1.2
# Evaluate the distance matrix on the basis of two side
# information variables (Yr and Yr2) associated with Xr
Yr2 <- NIRsoil$CEC[as.logical(NIRsoil$train)]
se2 <- simEval(d = ds, sideInf = cbind(Yr, Yr2))

# The final evaluation results
se2$eval

# The final values of the side information variables and the values
# of the side information variables corresponding to the first
# nearest neighbours found by using the distance matrix
se2$firstNN

###
# Example 2
# Evaluate the distances produced by retaining different number of
# principal components (this is the same principle used in the
# optimized principal components approach ("opc"))

# first project the data
pca <- orthoProjection(Xr = Xr, method = "pca",
                       pcSelection = list("manual", 30),
                       center = TRUE, scaled = TRUE)

# standardize the scores
scores.s <- sweep(pca$scores, MARGIN = 2,
                  STATS = pca$sc.sdv, FUN = "/")
rslt <-  matrix(NA, ncol(scores.s), 3)
colnames(rslt) <- c("pcs", "rmsd", "r")
rslt[,1] <- 1:ncol(scores.s)
for (i in 1:ncol(scores.s)) {
  sc.ipcs <- scores.s[ ,1:i, drop = FALSE]
  di <- fDiss(Xr = sc.ipcs, method = "euclid",
              center = FALSE, scaled = FALSE)
  se <- simEval(d = di, sideInf = Yr)
  rslt[i,2:3] <- unlist(se$eval)
}
plot(rslt)

###
# Example 3
# Example 3.1
# Evaluate a dissimilarity matrix computed using a moving window
# correlation method
mwcd <- mcorDiss(Xr = Xr, ws = 35, center = FALSE, scaled = FALSE)
se.mw <- simEval(d = mwcd, sideInf = Yr)
se.mw$eval

# Example 3.2
# Evaluate a dissimilarity matrix computed using the correlation
# method
cd <- corDiss(Xr = Xr, center = FALSE, scaled = FALSE)
se.nc <- simEval(d = cd, sideInf = Yr)
se.nc$eval
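
###
# Example 4 (illustrative sketch, base R only)
# A minimal sketch of the evaluation principle described in the
# details section. It is not the internal code of simEval: the toy
# data and all object names below are made up for illustration, and
# the kappa index follows the standard definition of Cohen's kappa,
# which is assumed here.
set.seed(1)
X.toy <- matrix(rnorm(40), nrow = 10)   # 10 toy samples, 4 variables
y.toy <- rowSums(X.toy)                 # toy continuous side information
d.toy <- as.matrix(dist(X.toy))         # square symmetric distance matrix

# Index of the nearest neighbour of each sample (each sample is
# excluded from its own search by setting the diagonal to Inf)
diag(d.toy) <- Inf
nn <- apply(d.toy, 1, which.min)

# Continuous side information: RMSD between the samples and their
# nearest neighbours
rmsd <- sqrt(mean((y.toy - y.toy[nn])^2))
rmsd

# Discrete side information: kappa index computed from the
# agreement table between the samples and their nearest neighbours
f.toy <- factor(ifelse(y.toy > median(y.toy), "high", "low"))
tab <- table(f.toy, f.toy[nn])
p.o <- sum(diag(tab)) / sum(tab)                     # relative agreement
p.e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2 # chance agreement
kappa.index <- (p.o - p.e) / (1 - p.e)
kappa.index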
}
}
\author{
Leonardo Ramirez-Lopez
}
\references{
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A.,
Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based
learner: A new local approach for modeling soil vis-NIR
spectra of complex datasets. Geoderma 195-196, 268-279.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra
Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance
and similarity-search metrics for use with soil vis-NIR
spectra. Geoderma 199, 43-53.
}

