% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/project.R
\name{project}
\alias{project}
\title{Projection onto submodel(s)}
\usage{
project(
  object,
  nterms = NULL,
  solution_terms = NULL,
  refit_prj = TRUE,
  ndraws = 400,
  nclusters = NULL,
  seed = sample.int(.Machine$integer.max, 1),
  regul = 1e-04,
  ...
)
}
\arguments{
\item{object}{An object which can be used as input to \code{\link[=get_refmodel]{get_refmodel()}} (in
particular, objects of class \code{refmodel}).}

\item{nterms}{Only relevant if \code{object} is of class \code{vsel} (returned by
\code{\link[=varsel]{varsel()}} or \code{\link[=cv_varsel]{cv_varsel()}}). Ignored if \code{!is.null(solution_terms)}. Number
of terms for the submodel (the corresponding combination of predictor terms
is taken from \code{object}). If a numeric vector, then the projection is
performed for each element of this vector. If \code{NULL} (and
\code{is.null(solution_terms)}), then the value suggested by \code{\link[=suggest_size]{suggest_size()}} is
taken (with default arguments for \code{\link[=suggest_size]{suggest_size()}}, implying that this
suggested size is based on the ELPD). Note that \code{nterms} does not count the
intercept, so use \code{nterms = 0} for the intercept-only model.}

\item{solution_terms}{If not \code{NULL}, then this needs to be a character vector
of predictor terms for the submodel onto which the projection will be
performed. Argument \code{nterms} is ignored in that case. For an \code{object} which
is not of class \code{vsel}, \code{solution_terms} must not be \code{NULL}.}

\item{refit_prj}{A single logical value indicating whether to fit the
submodels anew (\code{TRUE}) or to retrieve the submodel fits from
\code{object} (\code{FALSE}). For an \code{object} which is not of class \code{vsel},
\code{refit_prj} must be \code{TRUE}. Note that \code{refit_prj = FALSE} currently
requires some caution (see GitHub issues #168 and #211).}

\item{ndraws}{Only relevant if \code{refit_prj} is \code{TRUE}. Number of posterior
draws to be projected. Ignored if \code{nclusters} is not \code{NULL} or if the
reference model is of class \code{datafit} (in which case one cluster is used).
If both \code{nclusters} and \code{ndraws} are \code{NULL}, the number of posterior
draws from the reference model is used for \code{ndraws}. See also section
"Details" below.}

\item{nclusters}{Only relevant if \code{refit_prj} is \code{TRUE}. Number of clusters
of posterior draws to be projected. Ignored if the reference model is of
class \code{datafit} (in which case one cluster is used). For the meaning of
\code{NULL}, see argument \code{ndraws}. See also section "Details" below.}

\item{seed}{Pseudorandom number generation (PRNG) seed by which the same
results can be obtained again if needed. Passed to argument \code{seed} of
\code{\link[=set.seed]{set.seed()}}, but can also be \code{NA} to not call \code{\link[=set.seed]{set.seed()}} at all. Here,
this seed is used for clustering the reference model's posterior draws (if
\code{!is.null(nclusters)}) and for drawing new group-level effects when
predicting from a multilevel submodel (except, so far, for GAMMs) while
global option \code{projpred.mlvl_pred_new} is set to \code{TRUE}. (Such a
prediction takes place when calculating output elements \code{dis} and \code{ce}.)}

\item{regul}{A number giving the amount of ridge regularization when
projecting onto (i.e., fitting) submodels which are GLMs. Usually, no
regularization is needed, but sometimes some is required to avoid numerical
problems.}

\item{...}{Arguments passed to \code{\link[=get_refmodel]{get_refmodel()}} (if \code{\link[=get_refmodel]{get_refmodel()}} is
actually used; see argument \code{object}) as well as to the divergence
minimizer (if \code{refit_prj} is \code{TRUE}).}
}
\value{
If the projection is performed onto a single submodel (i.e.,
\code{length(nterms) == 1 || !is.null(solution_terms)}), an object of class
\code{projection} which is a \code{list} containing the following elements:
\describe{
\item{\code{dis}}{Projected draws for the dispersion parameter.}
\item{\code{ce}}{The cross-entropy part of the Kullback-Leibler (KL)
divergence from the reference model to the submodel. For some families,
this is not the actual cross-entropy, but a reduced one where terms which
would cancel out when calculating the KL divergence have been dropped. For
the Gaussian family, that reduced cross-entropy is further modified,
yielding merely a proxy.}
\item{\code{weights}}{Weights for the projected draws.}
\item{\code{solution_terms}}{A character vector of the submodel's predictor
terms.}
\item{\code{submodl}}{A \code{list} containing the submodel fits (one fit per
projected draw).}
\item{\code{cl_ref}}{A numeric vector of length equal to the number of
posterior draws in the reference model, containing the cluster indices of
these draws.}
\item{\code{wdraws_ref}}{A numeric vector of length equal to the number of
posterior draws in the reference model, giving the weights of these
draws. These weights should be treated as not being normalized (i.e.,
they don't necessarily sum to \code{1}).}
\item{\code{p_type}}{A single logical value indicating whether the
reference model's posterior draws have been clustered for the projection
(\code{TRUE}) or not (\code{FALSE}).}
\item{\code{refmodel}}{The reference model object.}
}
If the projection is performed onto more than one submodel, an object of the
form described above is returned for each submodel, so the output is a
\code{list} with one such element per submodel.
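
For reference, let \eqn{p} denote the reference model's (predictive)
distribution and \eqn{q} the submodel's. The KL divergence from the reference
model to the submodel then decomposes as
\deqn{\mathrm{KL}(p \parallel q) = H(p, q) - H(p),}{KL(p || q) = H(p, q) - H(p),}
where \eqn{H(p, q) = -\mathrm{E}_p[\log q]}{H(p, q) = -E_p[log q]} is the
cross-entropy and \eqn{H(p) = -\mathrm{E}_p[\log p]}{H(p) = -E_p[log p]} is the
entropy of \eqn{p}. Since \eqn{H(p)} does not depend on the submodel,
minimizing the KL divergence over the submodel's parameters is equivalent to
minimizing the cross-entropy \eqn{H(p, q)}.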
}
\description{
Project the posterior of the reference model onto the parameter space of a
single submodel consisting of a specific combination of predictor terms or
(after variable selection) onto the parameter space of a single or multiple
submodels of specific sizes.
}
\details{
Arguments \code{ndraws} and \code{nclusters} are automatically truncated at
the number of posterior draws in the reference model (which is \code{1} for
\code{datafit}s). Using fewer draws or clusters in \code{ndraws} or \code{nclusters} than
posterior draws in the reference model may result in slightly inaccurate
projection performance. The computation time increases linearly with these
arguments.
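
The rules for \code{ndraws} and \code{nclusters} described above can be
summarized by the following sketch. It uses a hypothetical helper
\code{resolve_prj_draws()}, which is neither part of \pkg{projpred} nor its
internal code; \code{S} denotes the number of posterior draws in the reference
model and \code{is_datafit} indicates a \code{datafit} reference model:

\preformatted{
resolve_prj_draws <- function(S, ndraws = 400, nclusters = NULL,
                              is_datafit = FALSE) {
  # Hypothetical helper (not projpred code): returns whether draws or
  # clusters are projected and how many.
  if (is_datafit) {
    # a datafit has only one "draw", so one cluster is used
    return(list(type = "clusters", n = 1))
  }
  if (!is.null(nclusters)) {
    # nclusters takes precedence over ndraws; truncated at S
    return(list(type = "clusters", n = min(nclusters, S)))
  }
  if (is.null(ndraws)) {
    # both NULL: use all posterior draws of the reference model
    ndraws <- S
  }
  list(type = "draws", n = min(ndraws, S))
}
resolve_prj_draws(S = 500)                  # 400 draws
resolve_prj_draws(S = 500, nclusters = 10)  # 10 clusters
resolve_prj_draws(S = 300)                  # truncated to 300 draws
resolve_prj_draws(S = 300, ndraws = NULL)   # all 300 draws
}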

Note that if \code{\link[=project]{project()}} is applied to output from \code{\link[=cv_varsel]{cv_varsel()}}, then
\code{refit_prj = FALSE} will take the results from the \emph{full-data} search.
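
Regarding argument \code{regul}: for the Gaussian family, fitting a submodel
to the reference model's fitted values amounts to a least-squares fit. The
following generic sketch illustrates what an amount of ridge regularization
means in that setting. The helper \code{ridge_fit()} and the toy data are
hypothetical, and \pkg{projpred}'s internal implementation may differ (e.g.,
in whether the intercept is penalized); here, all coefficients, including the
intercept, are penalized:

\preformatted{
ridge_fit <- function(X, mu, regul) {
  # Hypothetical helper: minimizes sum((mu - X b)^2) + regul * sum(b^2)
  # via the normal equations (X'X + regul * I) b = X'mu
  drop(solve(crossprod(X) + regul * diag(ncol(X)), crossprod(X, mu)))
}
x1 <- seq(-1, 1, length.out = 50)
x2 <- cos(3 * x1)
X <- cbind(1, x1, x2)
mu_ref <- 0.5 + x1 - x2  # stand-in for the reference model's fitted values
b_small <- ridge_fit(X, mu_ref, regul = 1e-4)  # close to c(0.5, 1, -1)
b_large <- ridge_fit(X, mu_ref, regul = 100)   # shrunk toward zero
}

A larger \code{regul} shrinks the coefficients more strongly toward zero,
whereas a value as small as the default changes them only negligibly in this
well-conditioned toy example.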
}
\examples{
if (requireNamespace("rstanarm", quietly = TRUE)) {
  # Data:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

  # The "stanreg" fit which will be used as the reference model (with small
  # values for `chains` and `iter`, but only for technical reasons in this
  # example; this is not recommended in general):
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )

  # Variable selection (here without cross-validation and with small values
  # for `nterms_max`, `nclusters`, and `nclusters_pred`, but only for the
  # sake of speed in this example; this is not recommended in general):
  vs <- varsel(fit, nterms_max = 3, nclusters = 5, nclusters_pred = 10,
               seed = 5555)

  # Projection onto the best submodel with 2 predictor terms (with a small
  # value for `nclusters`, but only for the sake of speed in this example;
  # this is not recommended in general):
  prj_from_vs <- project(vs, nterms = 2, nclusters = 10, seed = 9182)
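
  # Inspecting some elements of the output (see section "Value"):
  prj_from_vs$solution_terms   # character vector of the submodel's terms
  prj_from_vs$p_type           # were the reference model's draws clustered?
  length(prj_from_vs$submodl)  # number of submodel fits (one per projected
                               # draw or cluster)
  head(prj_from_vs$dis)        # projected draws of the dispersion parameter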

  # Projection onto an arbitrary combination of predictor terms (with a small
  # value for `nclusters`, but only for the sake of speed in this example;
  # this is not recommended in general):
  prj <- project(fit, solution_terms = c("X1", "X3", "X5"), nclusters = 10,
                 seed = 9182)
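
  # Projection onto several submodel sizes at once (returns a `list` with one
  # `projection` object per element of `nterms`; `nterms = 0` corresponds to
  # the intercept-only model). As above, the small value for `nclusters` is
  # only for the sake of speed in this example:
  prjs <- project(vs, nterms = 0:2, nclusters = 10, seed = 9182)
  length(prjs)
  lapply(prjs, function(prj_k) prj_k$solution_terms)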
}

}
