\name{refmean}
\alias{refmean}
\encoding{latin1}

\title{Averaging of multiple reference genes}

\description{
This function averages the expression of several reference genes before calculation of gene expression ratios by \code{\link{ratiocalc}} or \code{\link{ratiobatch}}. The method is similar to that described in Vandesompele \emph{et al.} (2002), but uses \bold{arithmetic} averaging of threshold cycles/efficiencies and \bold{not geometric} averaging of relative expression values. This is equivalent, as discussed in 'Details' and as shown in 'Examples'. An essential extension of this method is, that if replicates for the reference genes are supplied, the threshold cycles and efficiencies are subjected to error propagation prior to ratio calculation. The propagated error is then included in the calculation of the gene expression ratios, as advocated in Nordgard \emph{et al.} (2006).
}

\usage{
refmean(data, group, which.eff = c("sig", "sli", "exp", "mak", "ext"),
        type.eff = c("individual", "mean.single"), 
        which.cp = c("cpD2", "cpD1", "cpE", "cpR", "cpT", "Cy0", "ext"),
        verbose = TRUE, ...)
}

\arguments{
  \item{data}{multiple qPCR data generated by \code{\link{pcrbatch}}.}
  \item{group}{a character vector defining the replicates (if any) of control/treatment samples and reference genes/genes-of-interest. See 'Details'}
  \item{which.eff}{efficiency calculated by which method. Defaults to sigmoidal fit. Can also be a value such as 1.8, as shown in 'Examples'. See \code{\link{ratiocalc}}.}
  \item{type.eff}{using individual or averaged efficiencies for the replicates of a reference gene. See 'Details'.}
  \item{which.cp}{type of threshold cycles to be used for the analysis. Defaults to cpD2. See \code{\link{ratiocalc}}.}  
  \item{verbose}{logical. If \code{TRUE}, the steps of analysis are shown in the console window.}
  \item{...}{parameters to be supplied to \code{\link{propagate}}.}
}

\details{
As in \code{\link{ratiobatch}}, the samples are to be defined as a character vector in the style of "g1s1", "g1c1", "r1s1" and "r1c1" etc. If \code{refmean} is used as a standalone function or switched on in \code{\link{ratiobatch}} using \code{refmean = TRUE}, different reference genes per control/treatment samples are averaged when supplied either as single runs or as replicates. 

Examples (omitting genes-of-interest in control/treatment samples):\cr
2 reference genes, 2 replicates each:\cr
c("r1s1", "r1s1, "r2s1", "r2s1", "r1c1", "r1c1, "r2c1", "r2c1", ...).\cr
3 reference genes, no replicates:\cr
c("r1s1", "r2s1, "r3s1", "r1c1", "r2c1, "r3c1", ...)

Averaging of multiple reference genes is accomplished the following way:\cr
Given \eqn{i} reference genes with \eqn{j} replicates in a sample, all replicates \eqn{r_{ij}} are used for calculating mean \eqn{\mu_{r_i}} and standard deviation \eqn{\sigma_{r_i}} of the threshold cycles and efficiencies. The overall (grand) mean \eqn{\mu_r} and propagated error \eqn{\sigma_r} is calculated using \code{\link{propagate}} with first-order Taylor expansion including covariance: \eqn{\sigma_r = F_{r_i}C_{r_i}F_{r_i}^T}. Finally, a vector of length \eqn{L = n(r_{ij})} containing equidistant numbers \eqn{X = (x_1, x_2, x_3, \ldots x_L)} with mean \eqn{\mu_r} and standard deviation \eqn{\sigma_r} is generated for a new overall reference gene \eqn{r_1}. This is done using the internal function \code{qpcR:::makeStat} which calculates a shifted (\eqn{\mu_r}) and scaled (\eqn{\sigma_r}) Z-transformation on a vector \eqn{x_1 \ldots x_L}:
\deqn{Z_i = \mu_r + \frac{(x_i - \bar{X})}{\sigma_X} \cdot \sigma_r}
The new \eqn{Z_i} threshold cycle and efficiency values replace all values of \eqn{r_{ij}} in \code{data}. When using \code{\link{ratiobatch}}, this modified data is then used for the ratio calculation, again using \code{\link{propagate}} to calculate errors for ratios using the \eqn{Z_i} values as mentioned above.\cr
By using logarithmic identities (\url{https://en.wikipedia.org/wiki/Logarithmic_identities}), it can be shown that the \bold{geometric} mean can be transformed to the \bold{arithmetic} mean by logarithmation (assuming constant \eqn{E}):
\deqn{\left( \prod_{i=1}^n E^{x_i} \right)^{\frac{1}{n}} = \frac{1}{n} \cdot \log_E \left( \prod_{i=1}^n E^{x_i} \right) = \frac{1}{n} \sum_{i=1}^n x_i}
Hence, \bold{arithmetic} averaging of the threshold cycles \bold{BEFORE} ratio calculation is the same as doing \bold{geometric} averaging on relative quantities \bold{AFTER} ratio calculation. This is demonstrated in 'Examples' and also mentioned in the geNorm manual (\url{https://www.gene-quantification.com/geNorm_manual.pdf}) in Q8 (page 12).

When setting \code{type.eff = "individual"} (default), all efficiencies from replicates of a reference gene in a control/treatment sample \eqn{E(r_{ij})} are used for calculating mean \eqn{\mu_{E(r_i)}} and standard deviation \eqn{\sigma_{E(r_i)}}, the latter being used for calculating a propagated error for all reference gene efficiencies \eqn{\sigma_{E(r)}}. If \code{type.eff = "mean.single"}, all \eqn{E(r_{ij})} values from the replicates are set to the same value \eqn{\mu_{E(r_i)}}, that is, there is no variation assumed between the different \eqn{E(r_{ij})}. In this case, \eqn{\sigma_{E(r_i)} = 0}, so that no error of the replicates is propagated to \eqn{\sigma_{E(r)}}. This results in smaller overall errors of the output, but it can be debated if this is a realistic approach, hence both settings were implemented.

\code{which.eff} can be supplied with an efficiency value such as \code{1.8}, which is then used as the efficiency for all reference runs \eqn{E(r_{ij})}.
}

\value{
The same dataset \code{data} which was supplied to the function, but with modified threshold cycle/efficiency values in which the values are created per sample in a way, that they have the mean of all averaged reference genes and the same standard deviation as obtained by error propagation. See 'Details' for a more thorough explanation. Furthermore, a modified label vector \code{"NAME_mod"} is written to the global environment (if \code{"NAME"} was supplied for \code{group}) in which the different reference gene labels are aggregated, i.e. c("r1c1", "r2c1", "r3c1") will become c("r1c1", "r1c1", "r1c1"). This new label vector is also attached as an attribute to the output data and can be obtained by \code{attr(RES1, "group")}.
}

\author{
Andrej-Nikolai Spiess
}

\note{
The function checks stringently if the same number of different reference genes are used for control and treatment samples, although the number of replicates may differ.\cr
Example:\cr
GROUP <- c("r1c1", "r1c1", "r2c1", "r2c1", "r1s1", "r2s1") will work (2 reference genes in control/treatment samples), but GROUP <- c("r1c1", "r2c1", "r3c1", "r1s1", "r1s1", "r1s2", "r1s2", "r2s1", "r2s1") will not work (3 reference genes in controls, only 2 in treatment samples).
Also, when no or only one reference genes are detected, the original data is not averaged and returned unchanged.
}

\references{
Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes.\cr
Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F.\cr
\emph{Genome Biol} (2002), \bold{3}: research0034-research0034.11.\cr

Error propagation in relative real-time reverse transcription polymerase chain reaction quantification models: the balance between accuracy and precision.\cr
Nordgard O, Kvaloy JT, Farmen RK, Heikkil? R.\cr
\emph{Anal Biochem} (2006), \bold{356}: 182-193.\cr
}

\seealso{
In \code{\link{ratiobatch}}, reference gene averaging can be done automatically by setting \code{refmean = TRUE}. See there.
}

\examples{
\donttest{
## Replacing the reference gene values by
## averaged ones in the original data.
## => RES1 is new dataset.
## => GROUP1_mod in global environment is
## new labeling vector.
DAT1 <- pcrbatch(reps, fluo = 2:19, model = l5)
GROUP1 <- c("r1c1", "r1c1", "r2c1", "r2c1", "g1c1", "g1c1",
           "r1s1", "r1s1", "r1s2", "r1s2", "r2s1", "r2s1",
           "r2s2", "r2s2", "g1s1", "g1s1", "g1s2", "g1s2") 
RES1 <- refmean(DAT1, GROUP1, which.eff = "sig", which.cp = "cpD2")

## Using three reference genes without replicates
## and then 'ratiobatch'.
## This can also be called in 'ratiobatch' directly
## with parameter 'refmean = TRUE'. See there.
## In this example, already averaged dataset and 
## new labeling vector are supplied to 'ratiobatch', 
## so one has to set 'refmean = FALSE'.
DAT2 <- pcrbatch(reps, fluo = 2:9, model = l5)
GROUP2 <- c("r1c1", "r2c1", "r3c1", "g1c1", "r1s1", "r2s1", "r3s1", "g1s1" ) 
RES2 <- refmean(DAT2, GROUP2, which.eff = "sig", which.cp = "cpD2")
ratiobatch(RES2, GROUP2, refmean = FALSE)

## Comparison between 'refmean' ct-value arithmetic averaging
## and 'geNorm' relative quantities geometric averaging
## using data from the geNorm manual (2008), page 6.
## We will use HK1-HK3 as in the manual (no replicates).
## First we create a 'pcrbatch' dataset and then 
## override the ct values with those of the manual and all
## efficiencies with E = 2. Sample A is considered as control sample.
DAT3 <- pcrbatch(reps, fluo = 2:17, model = l5)
DAT3[8, -1] <- c(32.10, 27.00, 34.90, 23.00,
                 33.30, 28.40, 36.10, 24.20,
                 31.00, 27.50, 34.00, 26.35,
                 30.50, 28.20, 33.00, 25.45)
DAT3[1, -1] <- 2
GROUP3 <- c("r1c1", "r2c1", "r3c1", "g1c1",
            "r1s1", "r2s1", "r3s1", "g1s1",
            "r1s2", "r2s2", "r3s2", "g1s2",
            "r1s3", "r2s3", "r3s3", "g1s3")
RES3 <- refmean(DAT3, GROUP3, which.eff = "sig", which.cp = "cpD2")
ratiobatch(RES3, GROUP3, which.cp = "cpD2",
           which.eff = "sig", refmean = FALSE)
}
}

\keyword{nonlinear}

