% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nbc4va_data.R
\name{internalSubAsRest}
\alias{internalSubAsRest}
\title{Substitute values in a dataframe proportionally to all other values}
\usage{
internalSubAsRest(
  dataset,
  x,
  cols = 1:ncol(dataset),
  ignore = c(NA, NaN),
  removal = FALSE
)
}
\arguments{
\item{dataset}{A dataframe with value(s) of \emph{x} in it.}

\item{x}{The target value in \emph{dataset} to replace, column by column, with values drawn from the rest of that column.}

\item{cols}{A numeric vector of column indices to consider for substitution.}

\item{ignore}{A vector of values to exclude from the rest of the values when substituting (by default \code{NA} and \code{NaN}).}

\item{removal}{Set to TRUE to remove column(s) that consist only of \emph{x} values.}
}
\value{
A dataframe or a list, depending on \emph{removal}:
\itemize{
  \item If \emph{removal} is FALSE, the \emph{dataset} with values of \emph{x} substituted by the rest of the values per column
  \item If \emph{removal} is TRUE, a list with the following:
  \itemize{
    \item $removed (numeric vector): the indices of the removed columns, that is, the columns that consist only of \emph{x} values
    \item $dataset (dataframe): the \emph{dataset} with values of \emph{x} substituted by the rest of the values per column
  }
}
}
\description{
Substitute a target value proportionally to the distribution of the rest of the values in a column, given the following conditions:
\itemize{
  \item If a column contains only the target value, the column is removed (when \emph{removal} is TRUE)
  \item If there are fewer target values than unique values among the rest of the column, each target value is
  randomly sampled from the rest of the column values with replacement
}
}
\details{
Pseudocode of algorithm:
\preformatted{
  SET dataset = table of values with columns and rows
  SET x = target value for substitution

  IF x in dataset:
    FOR EACH column y in dataset:
      SET xv = all x values in y
      SET rest = all values not equal to x in y
      IF all values in y equal x:
        REMOVE y from dataset
      ELSE IF number of unique values of rest == 1:
        MODIFY xv = the unique value of rest
      ELSE IF number of xv values < number of unique values of rest:
        SET xn = number of xv values
        MODIFY xv = random sample of rest with size xn
      ELSE:
        SET xn = number of xv values
        SET p = proportions of rest
        SET xnp = xn * p
        IF xnp has decimals:
          MODIFY xnp = round xnp such that sum(xnp) == xn via largest remainder method
        MODIFY xv = rest values with distribution of xnp
  RETURN dataset
}
}
\examples{
library(nbc4va)
data(nbc4vaDataRaw)
unclean <- nbc4vaDataRaw
clean <- nbc4va::internalSubAsRest(unclean, 99)
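
# A minimal base R sketch of the proportional branch from the pseudocode in
# Details, shown on one toy column. It is an illustration under assumptions and
# not the implementation used by nbc4va: the helper sketchLargestRemainder and
# the toy column y are hypothetical names introduced only for this example.
sketchLargestRemainder <- function(v, total) {
  fl <- floor(v)
  short <- total - sum(fl)  # units still to hand out after flooring
  if (short > 0) {
    # give one extra unit to the entries with the largest fractional parts
    idx <- order(v - fl, decreasing = TRUE)[seq_len(short)]
    fl[idx] <- fl[idx] + 1
  }
  fl
}
y <- c(1, 1, 1, 2, 2, 99, 99, 99, 99)  # toy column where 99 is the target value
rest <- y[y != 99]                     # values not equal to the target
xn <- sum(y == 99)                     # number of target values, here 4
p <- table(rest) / length(rest)        # proportions of the rest: 0.6 and 0.4
xnp <- sketchLargestRemainder(as.numeric(xn * p), xn)  # 2.4 and 1.6 become 2 and 2
y[y == 99] <- rep(as.numeric(names(p)), times = xnp)   # targets become 1, 1, 2, 2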

}
\seealso{
Other data functions: 
\code{\link{internalRoundFixedSum}()}
}
\concept{data functions}
\keyword{internal}
