% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/seqimpute.R
\name{seqimpute}
\alias{seqimpute}
\title{seqimpute: Imputation of missing data in longitudinal categorical data}
\usage{
seqimpute(
  data,
  var = NULL,
  np = 1,
  nf = 1,
  m = 5,
  timing = FALSE,
  frame.radius = 0,
  covariates = NULL,
  time.covariates = NULL,
  regr = "multinom",
  npt = 1,
  nfi = 1,
  ParExec = FALSE,
  ncores = NULL,
  SetRNGSeed = FALSE,
  end.impute = TRUE,
  verbose = TRUE,
  available = TRUE,
  pastDistrib = FALSE,
  futureDistrib = FALSE,
  ...
)
}
\arguments{
\item{data}{Either a data frame containing sequences of a categorical 
variable, where missing data are coded as \code{NA}, or a state sequence 
object created using the \link[TraMineR]{seqdef} function. If using a 
state sequence object, any "void" elements will also be treated as missing. 
See the \code{end.impute} argument if you wish to skip imputing values 
at the end of the sequences.}

\item{var}{A vector specifying the columns of the dataset 
that contain the trajectories. Default is \code{NULL}, meaning all columns 
are used.}

\item{np}{Number of prior states to include in the imputation model 
for internal gaps.}

\item{nf}{Number of subsequent states to include in the imputation model 
for internal gaps.}

\item{m}{Number of multiple imputations to perform (default: \code{5}).}

\item{timing}{Logical, specifies the imputation algorithm to use. 
If \code{FALSE}, the MICT algorithm is applied; if \code{TRUE}, the 
MICT-timing algorithm is used.}

\item{frame.radius}{Integer, relevant only for the MICT-timing algorithm, 
specifying the radius of the timeframe.}

\item{covariates}{List of the columns of the dataset
containing covariates to be included in the imputation model.}

\item{time.covariates}{List of the columns of the dataset
with time-varying covariates to include in the imputation model.}

\item{regr}{Character specifying the imputation method. Options include 
\code{"multinom"} for multinomial models and \code{"rf"} for random forest 
models.}

\item{npt}{Number of prior observations in the imputation model for 
terminal gaps (i.e., gaps at the end of sequences).}

\item{nfi}{Number of future observations in the imputation model for 
initial gaps (i.e., gaps at the beginning of sequences).}

\item{ParExec}{Logical, indicating whether to run multiple imputations 
in parallel. Setting to \code{TRUE} can improve computation time depending 
on available cores.}

\item{ncores}{Integer, specifying the number of cores to use for parallel 
computation. If unset, defaults to the maximum number of CPU cores minus one.}

\item{SetRNGSeed}{Integer, to set the random seed for reproducibility in 
parallel computations. Note that calling \code{set.seed()} alone does not 
ensure reproducibility in parallel mode.}

\item{end.impute}{Logical. If \code{FALSE}, missing data at the end of 
sequences will not be imputed.}

\item{verbose}{Logical, if \code{TRUE}, displays progress and warnings 
in the console. Use \code{FALSE} for silent computation.}

\item{available}{Logical, specifies whether to consider already imputed 
data in the predictive model. If \code{TRUE}, previous imputations are 
used; if \code{FALSE}, only original data are considered.}

\item{pastDistrib}{Logical, if \code{TRUE}, includes the past distribution 
as a predictor in the imputation model.}

\item{futureDistrib}{Logical, if \code{TRUE}, includes the future 
distribution as a predictor in the imputation model.}

\item{...}{Named arguments that are passed down to the imputation functions.}
}
\value{
Returns an S3 object of class \code{seqimp}.
}
\description{
The seqimpute package implements the MICT and MICT-timing 
methods. These are multiple imputation methods for longitudinal data. 
The core idea of the algorithms is to fill gaps of missing data, which are 
the typical form of missing data in a longitudinal setting, recursively from 
their edges. The prediction is based on either a multinomial or a 
random forest regression model. Covariates and time-dependent covariates 
can be included in the model. 

The MICT-timing algorithm is an extension of the MICT algorithm designed 
to address a key limitation of the latter: its assumption that position in 
the trajectory is irrelevant.
}
\details{
The imputation process is divided into several steps, depending on
the type of gap of missing data. The gaps are imputed in the following order:
\describe{
 \item{\code{Internal gap: }}{gaps that have at least \code{np} observations 
 before them and at least \code{nf} observations after them}
 
 \item{\code{Initial gap: }}{gaps situated at the very beginning 
 of a trajectory}
 
 \item{\code{Terminal gap: }}{gaps situated at the very end
 of a trajectory}
 
 \item{\code{Left-hand side specifically located gap (SLG): }}{gaps 
 that have at least \code{nf} observations after the gap, but fewer than
 \code{np} observations before it}
 
 \item{\code{Right-hand side SLG: }}{gaps 
 that have at least \code{np} observations before the gap, but fewer than
 \code{nf} observations after it}
 
 \item{\code{Both-hand side SLG: }}{gaps 
 that have fewer than \code{np} observations before the gap, and fewer than
 \code{nf} observations after it}
}


The primary difference between the MICT and MICT-timing 
algorithms lies in how they select patterns from other 
sequences for fitting the imputation model. While the MICT algorithm 
considers all similar patterns regardless of their temporal placement, 
MICT-timing restricts pattern selection to those that are temporally 
closest to the missing value. This restriction lets the imputation 
process account for temporal dynamics, which can result 
in more accurate imputed values.
}
\examples{

# Default multiple imputation of the trajectories of game addiction with the
# MICT algorithm

\dontrun{
set.seed(5)
imp1 <- seqimpute(data = gameadd, var = 1:4)


# Default multiple imputation with the MICT-timing algorithm
set.seed(3)
imp2 <- seqimpute(data = gameadd, var = 1:4, timing = TRUE)
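

# A sketch of the MICT-timing algorithm with a wider timeframe. The value
# frame.radius = 1 and the object name imp2b are assumptions chosen for
# illustration, not recommended settings
set.seed(7)
imp2b <- seqimpute(data = gameadd, var = 1:4, timing = TRUE, 
  frame.radius = 1)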


# Inclusion in the MICT-timing imputation process of the three background 
# characteristics (Gender, Age and Track), and the time-varying covariate 
# about gambling


set.seed(4)
imp3 <- seqimpute(data = gameadd, var = 1:4, timing = TRUE, 
  covariates = 5:7, time.covariates = 8:11)

  
# Parallel computation


imp4 <- seqimpute(data = gameadd, var = 1:4, covariates = 5:7, 
  time.covariates = 8:11, ParExec = TRUE, ncores = 5, SetRNGSeed = 2)
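

# A sketch of imputation with random forest models instead of the default
# multinomial models, leaving missing data at the end of sequences unimputed.
# The argument values and the object name imp5 are illustrative assumptions
set.seed(8)
imp5 <- seqimpute(data = gameadd, var = 1:4, regr = "rf", 
  end.impute = FALSE)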
}

}
\references{
HALPIN, Brendan (2012). Multiple imputation for life-course 
sequence data. Working Paper WP2012-01, Department of Sociology, 
University of Limerick. http://hdl.handle.net/10344/3639.

HALPIN, Brendan (2013). Imputing sequence data: Extensions to 
initial and terminal gaps, Stata's mi. Working Paper WP2013-01, 
Department of Sociology, 
University of Limerick. http://hdl.handle.net/10344/3620.
}
\author{
Kevin Emery <kevin.emery@unige.ch>, Andre Berchtold,  
Anthony Guinchard, and Kamyar Taher
}
