% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/log_likelihood.R
\name{log_likelihood}
\alias{log_likelihood}
\title{Log-likelihood}
\usage{
log_likelihood(
  par,
  const_params = NULL,
  data = NULL,
  data_sigma_group = NULL,
  modelfun = NULL,
  dose_norm = FALSE,
  log10_trans = FALSE,
  negative = TRUE,
  force_finite = FALSE,
  max_multiplier = NULL,
  includes_preds = FALSE,
  suppress.messages = TRUE
)
}
\arguments{
\item{par}{A named list of parameters and their values that are being
optimized.}

\item{const_params}{Optional: A named list of parameters and their values that
are being held constant.}

\item{data}{A `data.frame` of data with harmonized variable names. Required:
`Time_trans`, `Dose`, `Conc`, `Detect`, `N_Subjects`, `Conc_SD`. `Conc` and
`Conc_SD` will be transformed according to `dose_norm` and `log10_trans`.}

\item{data_sigma_group}{A `factor` vector which could be a new variable in
`data`, giving the error group for each row in `data`.}

\item{modelfun}{Character or function: The name of the function that produces
concentration predictions for the model being evaluated.}

\item{dose_norm}{Logical: Whether to dose-normalize predicted and observed
concentrations before calculating likelihood.}

\item{log10_trans}{Logical: Whether to apply a [log10()] transformation to
predicted and observed concentrations before calculating likelihood.}

\item{negative}{Logical: Whether to return the *negative* log-likelihood
(i.e., the log-likelihood multiplied by negative 1). Default TRUE, to
multiply the log-likelihood by negative 1 before returning it. This option
is useful when treating the log-likelihood as an objective function to be
*minimized* by an optimization algorithm.}

\item{force_finite}{Logical: Whether to force return of a finite value (e.g.
as required by method `L-BFGS-B` in [optimx::optimx()]). Default FALSE. If
TRUE, then if the log-likelihood works out to be non-finite, then it will be
replaced with `.Machine$double.xmax`.}

\item{max_multiplier}{Numeric, but NULL by default. Determines which multiple
of the maximum concentration to use to limit possible predictions.
Note: only use this as part of the fitting function.}

\item{includes_preds}{Logical: whether `data` includes predictions.}

\item{suppress.messages}{Logical.}
}
\value{
A log-likelihood value for the data given the parameter values in
 params
}
\description{
The log-likelihood function (probability of data given model parameters).
}
\details{
The log-likelihood is formulated by assuming that residuals (transformed
model-predicted concentrations minus transformed observed concentrations) are
independent, and that groups of residuals obey zero-mean normal distributions
where each group may have a separate error variance. Error groups are defined
as unique combinations of variables in the harmonized data, by a command such
as `pk(data = ...) + stat_error_model(error_group = vars(...)`.

# Log-likelihood equations

For chemical-species combination \eqn{i} and study \eqn{j}, define the
following quantities.

\eqn{y_{ijk}} is the \eqn{k^{th}} observation of concentration (which may be
transformed, e.g. dose-normalized and/or log10-transformed), corresponding to
dose \eqn{d_{ijk}} and time \eqn{t_{ijk}}. Each observation has a
corresponding LOQ, \eqn{\textrm{LOQ}_{ijk}}.

For multiple-subject observations, \eqn{y_{ijk}} is the \eqn{k^{th}} *sample
mean* observed concentration for chemical-species combination \eqn{i} and
study \eqn{j}, corresponding to dose \eqn{d_{ijk}} and time \eqn{t_{ijk}}. It
represents the mean of \eqn{n_{ijk}} individual measurements. It has a
corresponding sample standard deviation, \eqn{s_{ijk}}. In the harmonized
data,  \eqn{s_{ijk}} is contained in variable `Conc_SD`.

\eqn{\bar{\theta}_i} represents the vector of model parameters supplied in
argument `params` for chemical-species combination \eqn{i}.

\eqn{\mu_{ijk}} is the model-predicted concentration for dose \eqn{d_{ijk}}
and time \eqn{t_{ijk}}. If \eqn{f(d, t; \bar{ \theta})} is the model function
evaluated at dose \eqn{d} and time \eqn{t}, with parameter vector
\eqn{\bar{\theta}}, then

 \deqn{\mu_{ijk} = f \left(d_{ijk}, t_{ijk};
\bar{\theta}_i \right)}

\eqn{\sigma_{ij}^2} is the study- and chemical-specific residual variance. (It
is a hyperparameter.)

## Single-subject observations above limit of quantification (detects)

This is the normal probability density function evaluated at the observed
concentration, as implemented in [stats::dnorm()].

\deqn{LL_{ijk} = \log \left( \frac{1}{\sqrt{\sigma_{ij} 2 \pi}} \exp \left[
\frac{-1}{2} \left( \frac{y_{ijk} - \mu_{ijk}}{\sigma_{ij}} \right)^2
\right]   \right) }


## Single-subject observations below limit of quantification (non-detects)

This is the normal cumulative density function evaluated at the LOQ, as
implemented in [stats::pnorm()]. It is the total probability of observing a
concentration anywhere between 0 and the LOQ.

\deqn{LL_{ijk} =  \log \left( \frac{1}{2} \left[ 1 + \textrm{erf} \left(
\frac{\textrm{LOQ}_{ijk} - \mu_{ijk}}{\sigma_{ij} \sqrt{2}} \right) \right]  \right) }

## Multiple-subject observations above limit of quantification

This is the joint log-likelihood across the multiple subjects included in one
observation, re-expressed in terms of the sample mean, sample SD, and number
of subjects for that observation. It is implemented in [dnorm_summary()].

\deqn{LL_{ijk} = n_{ijk} * \log \left[
 \frac{1}{\sigma_{ij} \sqrt{2 \pi}} +
  \frac{-1}{2 \sigma_{ij}^2}  \left(
   \left( n_{ijk} - 1 \right)
     s_{ijk}^2 + n_{ijk} \left( y_{ijk} - \mu_{ijk} \right)^2
      \right)
       \right]}

## Multiple-subject observations below limit of quantification

This case is not implemented. If sample mean concentration is reported below
LOQ, then it is unclear what individual observed concentrations are
represented, and how they were combined to produce the summary data in terms
of sample mean and sample SD. Were all individual observations below LOQ? Or
were below-LOQ observations replaced with 0, LOQ/2, etc. before sample mean
and sample SD were computed? If the sample mean is reported below LOQ, what
LOQ is reported? Did individual observations all have the same LOQ, or is an
average or median LOQ being used?  It is impossible to formulate the
log-likelihood without knowing the answers to these questions. Therefore,
multiple-subject observations below LOQ are excluded from analysis (they are
marked as excluded in [preprocess_data()]).

# Joint log-likelihood for a chemical and species

The joint log-likelihood for a chemical and species is simply the sum of
log-likelihoods across observations.

\deqn{LL_{i} = \sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} LL_{ijk}}

This is the overall probability of the observed data, given the model and
parameters.
}
\author{
Caroline Ring, Gilberto Padilla Mercado
}
