% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clusters.R
\name{clusterKmeans}
\alias{clusterKmeans}
\title{Automated K-Means Clustering + PCA}
\usage{
clusterKmeans(
  df,
  k = NA,
  limit = 20,
  drop_na = TRUE,
  ignore = NA,
  ohse = TRUE,
  norm = TRUE,
  comb = c(1, 2),
  seed = 123,
  quiet = TRUE
)
}
\arguments{
\item{df}{Data.frame. Data to cluster.}

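\item{k}{Integer. Number of clusters. If \code{NA} (default), only the
outputs used to choose \code{k} are returned.}

\item{limit}{Integer. Maximum number of clusters to consider when
searching for the optimal \code{k}.}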

\item{drop_na}{Boolean. Should rows containing NA values be removed?}

\item{ignore}{Character vector. Which columns should be excluded
when calculating kmeans?}

\item{ohse}{Boolean. Do you wish to automatically apply one-hot
encoding to non-numerical columns?}

\item{norm}{Boolean. Should the data be normalized?}

\item{comb}{Vector. Which two columns do you wish to plot? Select them
by name or column position.}

\item{seed}{Numeric. Seed for reproducibility}

\item{quiet}{Boolean. Keep quiet? If not, print messages.}
}
\value{
List. If no \code{k} is provided, it contains \code{nclusters} and
\code{nclusters_plot}, which help determine the optimal \code{k} based on
WSS (within-group sum of squares). If \code{k} is provided, the list also
contains:
\itemize{
  \item \code{df} data.frame with original \code{df} plus \code{cluster} column
  \item \code{clusters} integer which is the same as \code{k}
  \item \code{fit} kmeans object used to fit clusters
  \item \code{means} data.frame with means and counts for each cluster
  \item \code{correlations} plot with correlations grouped by clusters
  \item \code{PCA} list with PCA results
}
}
\description{
This function clusters a whole data.frame automatically with k-means.
The goal of k-means is to group data points into distinct,
non-overlapping subgroups. If needed, one-hot encoding is applied to
categorical values automatically. Keep in mind that the data should be
scaled or standardized before applying k-means (see \code{norm}).
Also, k-means assumes roughly spherical clusters and does not work well
when clusters have other shapes, such as elliptical ones.
}
\examples{
Sys.unsetenv("LARES_FONT") # Temporary
data("iris")
df <- subset(iris, select = c(-Species))

# Find optimal k
check_k <- clusterKmeans(df, limit = 10)
check_k$nclusters_plot
# You can also use our other functions:
# clusterOptimalK() and clusterVisualK()
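
# A minimal base-R sketch of the WSS ("elbow") idea behind choosing k.
# Assumption: it uses stats::kmeans() on scaled data purely for illustration
# and is not necessarily how clusterKmeans() computes nclusters internally.
set.seed(123)
scaled_iris <- scale(iris[, 1:4])
# Total within-cluster sum of squares for k = 1, ..., 10
wss <- sapply(1:10, function(k) kmeans(scaled_iris, centers = k, nstart = 10)$tot.withinss)
# Look for the "elbow" where adding more clusters stops reducing WSS much
plot(1:10, wss, type = "b", xlab = "k", ylab = "Total within-cluster sum of squares")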

# Run with selected k
clusters <- clusterKmeans(df, k = 3)
names(clusters)

# Cross-Correlations for each cluster
plot(clusters$correlations)

# PCA Results
plot(clusters$PCA$plotVarExp)
plot(clusters$PCA$plot_1_2)

# The ggforce package must be installed to use this auxiliary function:
# 3D interactive plot
\dontrun{clusters$PCA$plot_1_2_3}
}
\seealso{
Other Clusters: 
\code{\link{clusterOptimalK}()},
\code{\link{clusterVisualK}()}
}
\concept{Clusters}
