% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/partition.R
\name{partition}
\alias{partition}
\title{Agglomerative partitioning}
\usage{
partition(
  .data,
  threshold,
  partitioner = part_icc(),
  tolerance = 1e-04,
  niter = NULL,
  x = "reduced_var",
  .sep = "_"
)
}
\arguments{
\item{.data}{a data.frame to partition}

\item{threshold}{the minimum proportion of information explained by a reduced
variable; \code{threshold} sets a boundary for information loss because each
reduced variable must explain at least \code{threshold} of the information,
as measured by the partitioner's metric.}

\item{partitioner}{a \code{partitioner}. See the \verb{part_*()} functions and
\code{\link[=as_partitioner]{as_partitioner()}}.}

\item{tolerance}{a small tolerance around the threshold; a reduction whose
information is within \code{tolerance} of \code{threshold} is still accepted.}

\item{niter}{the number of iterations. By default, it is calculated as 20\% of
the number of variables or 10, whichever is larger.}

\item{x}{the prefix of the new variable names}

\item{.sep}{a character vector that separates \code{x} from the number (e.g.
"reduced_var_1").}
}
\value{
a \code{partition} object
}
\description{
\code{partition()} reduces data while minimizing information loss
using an agglomerative partitioning algorithm. The partition algorithm is
fast and flexible: at every iteration, \code{partition()} uses an approach
called Direct-Measure-Reduce (see Details) to create new variables that
maintain the user-specified minimum level of information. The reduced data
are also interpretable: each original variable maps to one and only
one variable in the reduced data set.
}
\details{
\code{partition()} uses an approach called Direct-Measure-Reduce.
Directors tell the partition algorithm what to reduce, metrics tell it
whether or not there will be enough information left after the reduction,
and reducers tell it how to reduce the data. Together these are called a
partitioner. The default partitioner for \code{partition()} is \code{\link[=part_icc]{part_icc()}}:
it directs the reduction to the pair of variables with the minimum distance
between them, measures information loss through the intraclass correlation
coefficient (ICC), and reduces data using scaled row means. There are several other partitioners
available (\verb{part_*()} functions), and you can create custom partitioners
with \code{\link[=as_partitioner]{as_partitioner()}} and \code{\link[=replace_partitioner]{replace_partitioner()}}.
}
\examples{

set.seed(123)
df <- simulate_block_data(c(3, 4, 5), lower_corr = .4, upper_corr = .6, n = 100)

# don't accept reductions where information < .6
prt <- partition(df, threshold = .6)
prt

# return reduced data
partition_scores(prt)

# access mapping keys
mapping_key(prt)
unnest_mappings(prt)

# use a lower information threshold with the k-means partitioner
partition(df, threshold = .5, partitioner = part_kmeans())

# use a custom partitioner
part_icc_rowmeans <- replace_partitioner(part_icc, reduce = as_reducer(rowMeans))
partition(df, threshold = .6, partitioner = part_icc_rowmeans)
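
# a minimal sketch of the naming arguments documented above: `x` sets the
# prefix of the reduced variable names and `.sep` separates it from the
# number, so the reduced variables here should follow the pattern "block.1"
# (the prefix "block" and the object name `prt_named` are illustrative choices)
prt_named <- partition(df, threshold = .6, x = "block", .sep = ".")
names(partition_scores(prt_named))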

}
\seealso{
\code{\link[=part_icc]{part_icc()}}, \code{\link[=part_kmeans]{part_kmeans()}}, \code{\link[=part_minr2]{part_minr2()}}, \code{\link[=part_pc1]{part_pc1()}},
\code{\link[=part_stdmi]{part_stdmi()}}, \code{\link[=as_partitioner]{as_partitioner()}}, \code{\link[=replace_partitioner]{replace_partitioner()}}
}
