% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fastexplore.R
\name{fastexplore}
\alias{fastexplore}
\title{Explore and Summarize a Dataset Quickly}
\usage{
fastexplore(
  data,
  label = NULL,
  visualize = c("histogram", "boxplot", "barplot", "heatmap", "scatterplot"),
  save_results = TRUE,
  output_dir = NULL,
  sample_size = NULL,
  interactive = FALSE,
  corr_threshold = 0.9,
  auto_convert_numeric = TRUE,
  visualize_missing = TRUE,
  imputation_suggestions = FALSE,
  report_duplicate_details = TRUE,
  detect_near_duplicates = TRUE,
  auto_convert_dates = FALSE,
  feature_engineering = FALSE,
  outlier_method = c("iqr", "zscore", "dbscan", "lof"),
  run_distribution_checks = TRUE,
  normality_tests = c("shapiro"),
  pairwise_matrix = TRUE,
  max_scatter_cols = 5,
  grouped_plots = TRUE,
  use_upset_missing = TRUE
)
}
\arguments{
\item{data}{A \code{data.frame}. The dataset to analyze.}

\item{label}{A character string specifying the name of the target or label column (optional).
If provided, certain grouped plots and class imbalance checks will be performed.}

\item{visualize}{A character vector specifying which visualizations to produce.
Possible values: \code{c("histogram", "boxplot", "barplot", "heatmap", "scatterplot")}.}

\item{save_results}{Logical. If \code{TRUE}, saves plots and a rendered report (HTML) into
a timestamped \code{EDA_Results_} folder inside \code{output_dir}.}

\item{output_dir}{A character string specifying the output directory for saving results
(used only if \code{save_results = TRUE}). If \code{NULL} (the default), the current working directory is used.}

\item{sample_size}{An integer specifying a random sample size for the data to be used in
visualizations. If \code{NULL}, uses the entire dataset.}

\item{interactive}{Logical. If \code{TRUE}, attempts to produce interactive Plotly heatmaps
and other interactive elements. If required packages are not installed, falls back to static plots.}

\item{corr_threshold}{Numeric. Threshold above which correlations (in absolute value)
are flagged as high. Defaults to \code{0.9}.}

\item{auto_convert_numeric}{Logical. If \code{TRUE}, automatically converts factor/character
columns that look numeric (only digits, minus sign, or decimal point) to numeric.}

\item{visualize_missing}{Logical. If \code{TRUE}, attempts to visualize missingness patterns
(e.g., via an UpSet plot if \pkg{UpSetR} is available, or via \pkg{VIM} or \pkg{naniar}).}

\item{imputation_suggestions}{Logical. If \code{TRUE}, prints simple text suggestions for imputation strategies.}

\item{report_duplicate_details}{Logical. If \code{TRUE}, shows top duplicated rows and their frequency.}

\item{detect_near_duplicates}{Logical. Placeholder for near-duplicate (fuzzy) detection.
Currently not implemented.}

\item{auto_convert_dates}{Logical. If \code{TRUE}, attempts to detect and convert date-like
strings (\code{YYYY-MM-DD}) to \code{Date} format.}

\item{feature_engineering}{Logical. If \code{TRUE}, automatically engineers derived features
(day, month, year) from any date/time columns, and identifies potential ID columns.}

\item{outlier_method}{A character vector naming the outlier detection method to apply.
Possible values: \code{c("iqr", "zscore", "dbscan", "lof")}. If more than one value is supplied
(as in the default), only the first match is used, although the function is designed to handle multiple methods.}

\item{run_distribution_checks}{Logical. If \code{TRUE}, runs normality tests (e.g., Shapiro-Wilk)
on numeric columns.}

\item{normality_tests}{A character vector specifying which normality tests to run.
Possible values include \code{"shapiro"} or \code{"ks"} (Kolmogorov-Smirnov).
Only used if \code{run_distribution_checks = TRUE}.}

\item{pairwise_matrix}{Logical. If \code{TRUE}, produces a scatterplot matrix (using \pkg{GGally})
for numeric columns.}

\item{max_scatter_cols}{Integer. Maximum number of numeric columns to include in the pairwise matrix.}

\item{grouped_plots}{Logical. If \code{TRUE}, produces grouped histograms, violin plots,
and density plots by label (if the label is a factor).}

\item{use_upset_missing}{Logical. If \code{TRUE}, attempts to produce an UpSet plot for missing data
if \pkg{UpSetR} is available.}
}
\value{
A list, returned invisibly, containing:
\itemize{
  \item \code{data_overview} - A basic overview (head, unique values, skim summary).
  \item \code{summary_stats} - Summary statistics for numeric columns.
  \item \code{freq_tables} - Frequency tables for factor columns.
  \item \code{missing_data} - Missing data overview (count, percentage).
  \item \code{duplicated_rows} - Count of duplicated rows.
  \item \code{class_imbalance} - Class distribution if \code{label} is provided and is categorical.
  \item \code{correlation_matrix} - The correlation matrix for numeric variables.
  \item \code{zero_variance_cols} - Columns with near-zero variance.
  \item \code{potential_id_cols} - Columns with unique values in every row.
  \item \code{date_time_cols} - Columns recognized as date/time.
  \item \code{high_corr_pairs} - Pairs of variables with correlation above \code{corr_threshold}.
  \item \code{outlier_method} - The chosen method for outlier detection.
  \item \code{outlier_summary} - Outlier proportions or metrics (if computed).
}
If \code{save_results = TRUE}, the function also saves figures, a correlation heatmap,
and an R Markdown report to the specified directory as a side effect.
}
\description{
\code{fastexplore} provides a fast and comprehensive exploratory data analysis (EDA) workflow.
It automatically detects variable types, checks for missing and duplicated data,
suggests potential ID columns, and provides a variety of plots (histograms, boxplots,
scatterplots, correlation heatmaps, etc.). It also includes optional outlier detection,
normality testing, and feature engineering.
}
\details{
This function automates many steps of EDA:
\enumerate{
  \item Automatically detects numeric vs. categorical variables.
  \item Auto-converts columns that look numeric (and optionally date-like).
  \item Summarizes data structure, missingness, duplication, and potential ID columns.
  \item Computes the correlation matrix and flags highly correlated pairs.
  \item Optionally detects outliers using the IQR, Z-score, DBSCAN, or LOF method.
  \item Optionally runs normality tests on numeric columns.
  \item Saves all results and an R Markdown report if \code{save_results = TRUE}.
}
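
As a minimal sketch, using base R only and not the function's actual implementation,
the correlation flagging and IQR-based outlier steps listed above can be reproduced by hand
as follows. The toy data frame \code{df} and the helper \code{iqr_outliers()} are hypothetical
names introduced for illustration, and the 1.5 multiplier is the conventional Tukey fence,
which \code{fastexplore} may or may not use.

\preformatted{
# Toy data (hypothetical): x2 is almost a linear copy of x1; x3 has one extreme value.
df <- data.frame(x1 = 1:20)
df$x2 <- 2 * df$x1 + rep(c(-0.1, 0.1), 10)
df$x3 <- c(rep(c(-1, 0, 1), length.out = 19), 50)

# Flag variable pairs whose absolute correlation exceeds the threshold.
# upper.tri() keeps each pair once and drops the diagonal.
corr_threshold <- 0.9
cm <- cor(df, use = "pairwise.complete.obs")
idx <- which(abs(cm) > corr_threshold & upper.tri(cm), arr.ind = TRUE)
high_corr_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r = cm[idx]
)

# IQR rule: a value is an outlier if it lies more than k * IQR
# below the first quartile or above the third quartile.
iqr_outliers <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Proportion of flagged values per numeric column.
outlier_prop <- vapply(
  df,
  function(col) mean(iqr_outliers(col), na.rm = TRUE),
  numeric(1)
)
}

Here \code{high_corr_pairs} and \code{outlier_prop} are only loosely analogous to the
\code{high_corr_pairs} and \code{outlier_summary} elements of the returned list; their exact
structure in the package may differ.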
}
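\examples{
\dontrun{
# A minimal usage sketch, assuming the package that provides fastexplore()
# is attached. The built-in iris data and the argument values below are
# illustrative choices, not recommended settings. save_results = FALSE is
# used so that nothing is written to disk.
res <- fastexplore(
  data = iris,
  label = "Species",
  visualize = c("histogram", "boxplot"),
  save_results = FALSE,
  outlier_method = "iqr"
)

# The result is returned invisibly, so assign it and inspect its elements.
names(res)
res$missing_data
res$high_corr_pairs
}
}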
