% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/flexsurvmix.R
\name{flexsurvmix}
\alias{flexsurvmix}
\title{Flexible parametric mixture models for times to competing events}
\usage{
flexsurvmix(
  formula,
  data,
  event,
  dists,
  pformula = NULL,
  anc = NULL,
  partial_events = NULL,
  initp = NULL,
  inits = NULL,
  fixedpars = NULL,
  dfns = NULL,
  method = "direct",
  em.control = NULL,
  optim.control = NULL,
  aux = NULL,
  sr.control = survreg.control(),
  integ.opts,
  hess.control = NULL,
  ...
)
}
\arguments{
\item{formula}{Survival model formula.  The left hand side is a \code{Surv}
  object specified as in \code{\link{flexsurvreg}}.  This may define various
  kinds of censoring, as described in \code{\link{Surv}}. Any covariates on
  the right hand side of this formula will be placed on the location
  parameter for every component-specific distribution. Covariates on other
  parameters of the component-specific distributions may be supplied  using
  the \code{anc} argument.

  Alternatively, \code{formula} may be a list of formulae, with one
  component for each alternative event.  This may be used to specify
  different covariates on the location parameter for different components.

  A list of formulae may also be used to indicate that for particular
  individuals, different events may be observed in different ways, with
  different censoring mechanisms.  Each  list component specifies the data
  and censoring scheme for that mixture component.

  For example, suppose we are studying people admitted to hospital, and the
  competing states are death in hospital and discharge from hospital.  At
  time t we know that a particular individual is still alive, but we do not
  know whether they are still in hospital, or have been discharged.  In this
  case, if the individual were to die in hospital, their death time would be
  right censored at t.  If the individual will be (or has been) discharged
  before death, their discharge time is completely unknown, thus
  interval-censored on (0,Inf). Therefore,  we need to store different event
  time and status variables in the data for different alternative events.
  This is specified here as

  \code{formula = list("discharge" = Surv(t1di, t2di, type="interval2"),
  "death" = Surv(t1de, status_de))}

  where for this individual, \code{(t1di, t2di) = (0, Inf)} and \code{(t1de,
  status_de)  = (t, 0)}.}

\item{data}{Data frame containing variables mentioned in \code{formula},
\code{event} and \code{anc}.}

\item{event}{Variable in the data that specifies which of the alternative
  events is observed for which individual.  If the individual's follow-up is
  right-censored, or if the event is otherwise unknown, this variable must
  have the value \code{NA}.

  Ideally this should be a factor, since the mixture components can then be
  easily identified in the results with a name instead of a number.  If this
  is not already a factor, it is coerced to one.   Then the levels of the
  factor define the required order for the components of the list arguments
  \code{dists}, \code{anc}, \code{inits} and \code{dfns}.  Alternatively, if
  the components of the list arguments are named according to the levels of
  \code{event}, then the components can be arranged in any order.}

\item{dists}{Vector specifying the parametric distribution to use for each
component. The same distributions are supported as in
\code{\link{flexsurvreg}}.}

\item{pformula}{Formula describing covariates to include on the component
membership probabilities by multinomial logistic regression.  The first
component is treated as the baseline.}
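
As a sketch in assumed notation (the symbols are not taken from the package source), with \eqn{K} components, covariate vector \eqn{x}, and the first component as baseline, a multinomial logistic model for the membership probabilities has the form

\deqn{p_k(x) = \frac{\exp(\alpha_k + \beta_k^T x)}{\sum_{j=1}^K \exp(\alpha_j + \beta_j^T x)}, \quad \alpha_1 = 0, \ \beta_1 = 0}{p_k(x) = exp(alpha_k + beta_k' x) / sum_j exp(alpha_j + beta_j' x), with alpha_1 = 0 and beta_1 = 0}

so that each coefficient is a log odds ratio relative to the first component.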

\item{anc}{List of component-specific lists, of length equal to the number
  of components.   Each component-specific list is a list of formulae
  representing covariate effects on parameters of the distribution.

  If there are covariates for one component but not others, then a list
  containing one null formula on the location parameter should be supplied
  for the component with no covariates, e.g. \code{list(rate=~1)} if the
  location parameter is called \code{rate}.

  Covariates on the location parameter may also be supplied here instead of
  in \code{formula}.  Supplying them in \code{anc} allows some components
  but not others to have covariates on their location parameter.  If a covariate
  on the location parameter was provided in \code{formula}, and there are
  covariates on other parameters, then a null formula should be included
  for the location parameter in \code{anc}, e.g. \code{list(rate=~1)}.}
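
The following is a minimal sketch of placing a covariate on the location parameter of one component only.  The simulated data, the variable names and the covariate \code{x} are illustrative assumptions.  It also assumes that the location parameter of the Weibull distribution is called \code{scale} and that of the exponential is called \code{rate}, as in \code{\link{flexsurvreg}}.

\preformatted{
library(flexsurv)
set.seed(1)
n <- 400
x <- rbinom(n, 1, 0.5)   # hypothetical binary covariate
# Hypothetical simulated data: event type first, then the time given the event
ev <- sample(c("death", "discharge"), n, replace = TRUE, prob = c(0.3, 0.7))
time <- ifelse(ev == "death",
               rweibull(n, shape = 1.5, scale = 10 * exp(0.3 * x)),
               rexp(n, rate = 0.2))
status <- as.numeric(time < 15)   # follow-up ends at time 15
dat <- data.frame(
  t = pmin(time, 15),
  status = status,
  x = x,
  event = factor(ifelse(status == 1, ev, NA), levels = c("death", "discharge"))
)
# Covariate on the Weibull location parameter for "death" only;
# a null formula for the "discharge" component, which has no covariates
fit_anc <- flexsurvmix(Surv(t, status) ~ 1, data = dat, event = event,
                       dists = c("weibull", "exponential"),
                       anc = list(list(scale = ~ x), list(rate = ~ 1)))
fit_anc$res
}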

\item{partial_events}{List specifying the factor levels of \code{event}
  which indicate knowledge that an individual will not experience particular
  events, but may experience others.   The names of the list are the codes
  that indicate partial knowledge for some individuals.  Each list component
  is a vector, which must be a subset of \code{levels(event)} defining the
  events that a person with the corresponding event code may experience.

  For example, suppose there are three alternative events called
  \code{"disease1"}, \code{"disease2"} and \code{"disease3"}, and for some
  individuals we know that they will not experience \code{"disease2"}, but
  they may experience the other two events.  In that case we must create a
  new factor level, called, for example \code{"disease1or3"}, and set the
  value of \code{event} to be \code{"disease1or3"} for those individuals.
  Then we use the \code{partial_events} argument to tell
  \code{flexsurvmix} what the potential events are for individuals with this
  new factor level.

  \code{partial_events = list("disease1or3" = c("disease1","disease3"))}}
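
The data preparation described above might be sketched as follows.  This only builds the \code{event} factor and the \code{partial_events} list and does not fit a model.  The raw codes, including the placeholder \code{"unknown"}, are hypothetical.

\preformatted{
# Hypothetical event codes for six individuals; "unknown" marks people for
# whom disease2 has been ruled out but the actual event is not known
raw <- c("disease1", "disease2", "unknown", "disease3", NA, "unknown")
# Recode the partially known cases to a new factor level
event <- factor(ifelse(raw == "unknown", "disease1or3", raw))
partial_events <- list("disease1or3" = c("disease1", "disease3"))
# Every event listed for a partial code should be a fully observed event type
true_levels <- setdiff(levels(event), names(partial_events))
stopifnot(all(is.element(unlist(partial_events), true_levels)))
}

These two objects would then be supplied as the \code{event} and \code{partial_events} arguments, alongside the corresponding time data.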

\item{initp}{Initial values for component membership probabilities.  By
default, these are assumed to be equal for each component.}

\item{inits}{List of component-specific vectors. Each component-specific
vector contains the initial values for the parameters of the
component-specific model, as would be supplied to
\code{\link{flexsurvreg}}.   By default, a heuristic is used to obtain
initial values, which depends on the parametric distribution being used,
but is usually based on the empirical mean and/or variance of the survival
times.}

\item{fixedpars}{Indexes of parameters to fix at their initial values and
  not optimise. Arranged in the order: baseline mixing probabilities,
  covariates on mixing probabilities, time-to-event parameters by mixing
  component.  Within mixing components, time-to-event parameters are ordered
  in the same way as in \code{\link{flexsurvreg}}.

  If \code{fixedpars=TRUE} then all parameters will be fixed and the
  function simply calculates the log-likelihood at the initial values.

  Not currently supported when using the EM algorithm.}

\item{dfns}{List of lists of user-defined distribution functions, one for
each mixture component.  Each list component is specified as the
\code{dfns} argument of \code{\link{flexsurvreg}}.}

\item{method}{Method for maximising the likelihood.  Either \code{"em"} for
the EM algorithm, or \code{"direct"} for direct maximisation.}

\item{em.control}{List of settings to control EM algorithm fitting.  The
  only options currently available are

  \code{trace} set to 1 to print the parameter estimates at each iteration
  of the EM algorithm

  \code{reltol} convergence criterion.  The algorithm stops if the log
  likelihood changes by a relative amount less than \code{reltol}.  The
  default is the same as in \code{\link{optim}}, that is,
  \code{sqrt(.Machine$double.eps)}.

  \code{var.method} method to compute the covariance matrix. \code{"louis"}
  for the method of Louis (1982), or \code{"direct"} for direct numerical
  calculation of the Hessian of the log likelihood.

  \code{optim.p.control} A list that is passed as the \code{control}
  argument to  \code{optim} in the M step for the component membership
  probability parameters. The optimisation in the M step for the
  time-to-event parameters can be controlled by the \code{optim.control}
  argument to \code{flexsurvmix}.

  For example, \code{em.control = list(trace=1, reltol=1e-12)}.}
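
A minimal sketch of an EM fit with a user-supplied convergence tolerance is given below.  The simulated data, the variable names and the value of \code{reltol} are illustrative assumptions, not recommendations.

\preformatted{
library(flexsurv)
set.seed(1)
n <- 400
# Hypothetical simulated data, for illustration only
ev <- sample(c("death", "discharge"), n, replace = TRUE, prob = c(0.3, 0.7))
time <- ifelse(ev == "death",
               rweibull(n, shape = 1.5, scale = 10),
               rexp(n, rate = 0.2))
status <- as.numeric(time < 15)   # follow-up ends at time 15
dat <- data.frame(
  t = pmin(time, 15),
  status = status,
  event = factor(ifelse(status == 1, ev, NA), levels = c("death", "discharge"))
)
# Fit by the EM algorithm, stopping when the relative change in the
# log likelihood falls below reltol
fit_em <- flexsurvmix(Surv(t, status) ~ 1, data = dat, event = event,
                      dists = c("weibull", "exponential"),
                      method = "em",
                      em.control = list(reltol = 1e-06))
fit_em$res
}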

\item{optim.control}{List of options to pass as the \code{control} argument
to \code{\link{optim}},  which is used by \code{method="direct"} or in the
M step for the time-to-event parameters in \code{method="em"}.  By
default, this uses \code{fnscale=10000} and \code{ndeps=rep(1e-06,p)}
where \code{p} is the number of parameters being estimated, unless the
user specifies these options explicitly.}

\item{aux}{A named list of other arguments to pass to custom distribution
functions.  This is used, for example, by \code{\link{flexsurvspline}} to
supply the knot locations and modelling scale (e.g. hazard or odds).  This
cannot be used to fix parameters of a distribution --- use
\code{fixedpars} for that.}

\item{sr.control}{For the models which use \code{\link{survreg}} to find the
maximum likelihood estimates (Weibull, exponential, log-normal), this list
is passed as the \code{control} argument to \code{\link{survreg}}.}

\item{integ.opts}{List of named arguments to pass to
  \code{\link{integrate}}, if a custom density or hazard is provided without
  its cumulative version.  For example,

  \code{integ.opts = list(rel.tol=1e-12)}}

\item{hess.control}{List of options to control inversion of the Hessian to
  obtain a covariance matrix. Available options are \code{tol.solve}, the
  tolerance used for \code{\link{solve}} when inverting the Hessian (default
  \code{.Machine$double.eps}), and \code{tol.evalues}, the accepted
  tolerance for negative eigenvalues in the covariance matrix (default
  \code{1e-05}).

  The Hessian of the negative log-likelihood is positive definite, thus
  invertible, at the maximum likelihood estimate.  If the Hessian computed
  after optimisation convergence can't
  be inverted, this is either because the converged result is not the
  maximum likelihood (e.g. it could be a "saddle point"), or because the
  numerical methods used to obtain the Hessian were inaccurate. If you
  suspect that the Hessian was computed wrongly enough that it is not
  invertible, but not wrongly enough that the nearest valid inverse would be
  an inaccurate estimate of the covariance matrix, then these tolerance
  values can be modified (reducing \code{tol.solve} or increasing
  \code{tol.evalues}) to allow the inverse to be computed.}

\item{...}{Optional arguments to the general-purpose optimisation routine
\code{\link{optim}}.  The BFGS optimisation algorithm is the
default in \code{\link{flexsurvreg}}, but this can be changed, for example
to \code{method="Nelder-Mead"}, which can be more robust to poor initial
values.  If the optimisation fails to converge, consider normalising the
problem using, for example, \code{control=list(fnscale = 2500)}, replacing
2500 by a number of the order of magnitude of the
likelihood. If 'false' convergence is reported with a
non-positive-definite Hessian, then consider tightening the tolerance
criteria for convergence. If the optimisation takes a long time,
intermediate steps can be printed using the \code{trace} argument of the
control list. See \code{\link{optim}} for details.}
}
\value{
List of objects containing information about the fitted model.   The
  important one is \code{res}, a data frame containing the parameter
  estimates and associated information.
}
\description{
In a mixture model for competing events, an individual can experience one of
a set of different events.  We specify a model for the probability that they
will experience each event before the others, and a model for the time to
the event conditionally on that event occurring first.
}
\details{
This differs from the more usual "competing risks" models, where we specify
"cause-specific hazards" describing the time to each competing event.  This
time will not be observed for an individual if one of the competing events
happens first.  The event that happens first is defined by the minimum of
the times to the alternative events.

The \code{flexsurvmix} function fits a mixture model to data consisting of a
single time to an event for each individual, and an indicator for what type
of event occurs for that individual.   The time to event may be observed or
censored, just as in \code{\link{flexsurvreg}}, and the type of event may be
known or unknown. In a typical application, where we follow up a set of
individuals until they experience an event or a maximum follow-up time is
reached, the event type is known if the time is observed, and the event type
is unknown when follow-up ends and the time is right-censored.
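
As a sketch of the likelihood this implies, in assumed notation that is not taken from the package source, write \eqn{p_k} for the probability that event \eqn{k} occurs first, and \eqn{f_k} and \eqn{S_k} for the density and survivor function of the time to event \eqn{k} given that it occurs first.  An individual observed to experience event \eqn{k} at time \eqn{t} contributes

\deqn{p_k f_k(t)}{p_k * f_k(t)}

to the likelihood, whereas an individual who is right-censored at time \eqn{t}, with unknown event type, contributes

\deqn{\sum_{k=1}^K p_k S_k(t)}{sum_k p_k * S_k(t)}

where \eqn{K} is the number of alternative events.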

The model is fitted by maximum likelihood, either directly or by using an
expectation-maximisation (EM) algorithm, by wrapping
\code{\link{flexsurvreg}} to compute the likelihood or to implement the E
and M steps.
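
As a minimal sketch of the data layout and a basic call, the following simulates two competing events and fits the mixture model by direct maximisation.  The data-generating values, the variable names and the follow-up limit are illustrative assumptions.

\preformatted{
library(flexsurv)
set.seed(1)
n <- 400
# Hypothetical simulated data: event type first, then the time given the event
ev <- sample(c("death", "discharge"), n, replace = TRUE, prob = c(0.3, 0.7))
time <- ifelse(ev == "death",
               rweibull(n, shape = 1.5, scale = 10),
               rexp(n, rate = 0.2))
status <- as.numeric(time < 15)   # follow-up ends at time 15
dat <- data.frame(
  t = pmin(time, 15),
  status = status,
  # the event type is NA when the time is right-censored
  event = factor(ifelse(status == 1, ev, NA), levels = c("death", "discharge"))
)
# dists is given in the order of the levels of the event factor
fit <- flexsurvmix(Surv(t, status) ~ 1, data = dat, event = event,
                   dists = c("weibull", "exponential"))
fit$res
}

The \code{res} component holds the estimated component membership probabilities and the time-to-event parameters for each component.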
}
\references{
Larson, M. G., & Dinse, G. E. (1985). A mixture model for the
  regression analysis of competing risks data. Journal of the Royal
  Statistical Society: Series C (Applied Statistics), 34(3), 201-211.

  Lau, B., Cole, S. R., & Gange, S. J. (2009). Competing risk regression
  models for epidemiologic data. American Journal of Epidemiology, 170(2),
  244-256.
}
