##' @title Test pairwise variable independence
##' @description This is a high-level function which accepts a data
##' set, stop criteria, and split functions for continuous variables
##' and then applies a chi-square test for independence to bins
##' generated by recursively binning the ranks of continuous variables
##' or implied by the combinations of levels of categorical variables.
##' @details `DepSearch` is a wrapper function which organizes and
##' executes pairwise binning to test independence between all
##' variable pairs in `data`. While splitting logic of any sort is
##' supported for continuous margins through the use of the `catCon`
##' and `conCon` arguments, the default settings apply rRandom
##' recursive binning, which proceeds for a single pair in three
##' basic steps.
##'
##' First, the types of the two pairs are identified and rank
##' transformations are applied. If one or both are continuous, the
##' continuous variables are transformed to their ranks. Categorical,
##' logical, and ordinal variables are not transformed.
##'
##' Second, the ranks of the continuous margins are partitioned by
##' edges added at random positions recursively. For the case of dual
##' continuous variables, the edge at each recursive step is added on
##' a randomly selected margin. If one variable is not continuous,
##' then only the continuous margin is recursively split.
##'
##' Finally, the resulting partition is evaluated using a chi-square
##' test. For non-continuous variables, this is the classic
##' contingency table test. For continuous variables, expected counts
##' for each cell of the partition are determined based on the area of
##' the cell. The degrees of freedom for the case of a continuous
##' margin are motivated by the contingency table case verified by
##' empirical investigations. Alternatively, several other options
##' are provided to allow a user to select the degrees of freedom
##' approximation they prefer.
##'
##' This procedure produces a p-value for every pairwise test, placing
##' all pairwise measures on a comparable scale to each other. By
##' placing edges randomly, the method avoids any systematic bias
##' against particular patterns while still remaining powerful in the
##' detection of function and non-function dependencies of any type.
##'
##' The output of `DepSearch` is a list, the first element of
##' which is a list of lists, each of which records the details of the
##' binning of a particular pair of variables.
##' @param data `data.frame` or object coercible to a `data.frame`
##' @param stopCriteria output of `makeCriteria` providing criteria
##' used to stop binning to be passed to binning functions
##' @param catCon splitting function to apply to pairs of one
##' cateogorical and one continuous variable
##' @param conCon splitting function to apply to pairs of continuous
##' variables
##' @param ptype one of 'simple', 'conservative', 'gamma', or
##' 'fitted': the type of p-values to compute for continuous pairs
##' and pairs of mixed type. 'Conservative' assumes a chi-square
##' distribution for the statistic with highly conservative degrees
##' of freedom based on continuous uniform margins that
##' do not account for the constraints introduced by the ranks.
##' 'Simple' assumes a chi-square distribution but uses
##' contingency-table inspired degrees of freedom which can be
##' slightly anti-conservative in the case of continuous pairs but
##' work well for continuous/categorical comparisons. 'Gamma'
##' assumes a gamma distribution on the resulting statistics with
##' parameters determined by empirical investigation. 'Fitted'
##' mixes the gamma approach and the chi-squared approach these by
##' applying 'gamma' to continuous-categorical comparisons and a
##' least squares fitted version of the simple approximation to
##' continuous-continuous comparisons with parameters determined by
##' empirical study. For all categorical-categorical comparisons
##' the contingency table degrees of freedom are used in
##' a chi-square distribution.
##' @param dropPoints logical; should returned bins contain points?
##' @return A `DepSearch` object, with slots `data`, `types`,
##' `pairs`, `binnings`, `residuals`, `statistics`, `K`, `logps`, and
##' `pvalues` that stores the results of using recursive binning with
##' the specified splitting logic to test independence on a data set.
##' `data` gives the name of the data object in the global environment
##' which was split, `types` is a character vector giving the data
##' types of each pair, `pairs` is a character vector of the variable
##' names of each pair, `binnings` is a list of lists where each list
##' is the binning fir to the corresponding pair by the recursive
##' binning algorithm, `residuals` is list of numeric vectors giving
##' the residual for each bin of each pairwise binning, `statistics`
##' is a numeric vector giving the chi-squared statistic for each
##' binning, `K` is a numeric vector giving the number of bins in each
##' binning, `logps` gives the natural logarithm of the statistic's
##' p-value, and finally `pvalues` is a numeric vector of p-values
##' for `statistics` based on the specified p-value computation, which
##' defaults to 'simple'.  Internally, the
##' p-values are computed on the log scale to better distinguish
##' between strongly dependent pairs and the `pvalues` returned are
##' computed by calling `exp(logps)`. The order of all returned values
##' is by increasing `logps`.
##' @examples
##' ## load the iris data set
##' data(iris)
##' ## evaluate dependence in the iris data
##' iris_binnings <- DepSearch(iris)
##' ## plot top departure displays
##' plot(iris_binnings)
##' ## summarize reults
##' summary(iris_binnings)
##' @author Chris Salahub
DepSearch <- function(data, stopCriteria,
                      catCon = uniRIntSplit,
                      conCon = rIntSplit,
                      ptype = c('simple',
                                'conservative',
                                'gamma',
                                'fitted'),
                      dropPoints = FALSE) {
    ## argument checking
    datName <- deparse1(substitute(data))
    if (!is.data.frame(data)) stop("`data` must be a data frame")
    if (missing(stopCriteria)) {
        stopCriteria <- "depth >= 6 | n < 1 | expn <= 10 | stopped"
    }

    ## function definition
    stopFn <- function(bns) stopper(bns, stopCriteria)
    
    ## pre-processing
    vars <- names(data) # get all variable names
    types <- sapply(data, class) # get all classes
    chars <- types == "character"
    ints <- types == "integer"
    logs <- types == "logical"
    if (any(ints)) { # regularize type names
        data[ints] <- lapply(data[ints], as.numeric)
        types[ints] <- "numeric"
    }
    if (any(chars)) {
        data[chars] <- lapply(data[chars], as.factor)
        types[chars] <- "factor"
    }
    if (any(logs)) {
        data[logs] <- lapply(data[logs],
                             function(x) as.factor(as.numeric(x)))
        types[logs] <- "factor"
    }
    combs <- combn(ncol(data), 2) # get all pairs
    scndRwInds <- types[combs[2, ]] == "factor"
    scndRwFs <- combs[2, scndRwInds]
    combs[2, scndRwInds] <- combs[1, scndRwInds]
    combs[1, scndRwInds] <- scndRwFs # factors always come first
    typecomb <- apply(combs, 2,
                      function(x) paste(types[x], collapse = ":"))
    nlev <- sapply(data, function(var) length(levels(var)))

    ## get ranks
    ranks <- data
    ranks[, types=="numeric"] <- apply(data[types=="numeric"],
                                       2, rank,
                                       ties.method = "random")
    
    ## pairwise functions: random squarified splitting
    bns <- vector(mode = "list", length(typecomb))
    names(bns) <- apply(combs, 2,
                        function(x) paste(vars[x],
                                          collapse = ":"))
    facFac <- which(typecomb == "factor:factor")
    facNum <- which(typecomb == "factor:numeric")
    numNum <- which(typecomb == "numeric:numeric")
    for (ii in facFac) {
        bns[[ii]] <- catBinner(x = ranks[, combs[1, ii]],
                               y = ranks[, combs[2, ii]],
                               dropPoints = dropPoints)
    }
    for (ii in facNum) {
        bns[[ii]] <- uniBinner(x = ranks[, combs[1, ii]],
                               y = ranks[, combs[2, ii]],
                               stopper = stopFn,
                               splitter = catCon,
                               dropPoints = dropPoints)
    }
    for (ii in numNum) {
        bns[[ii]] <- binner(x = ranks[, combs[1, ii]],
                            y = ranks[, combs[2, ii]],
                            stopper = stopFn,
                            splitter = conCon,
                            dropPoints = dropPoints)
    }

    ## compute statistic values and p-values
    binStats <- lapply(bns, binChi)
    obStats <- sapply(binStats, function(x) x$stat)
    K <- sapply(binStats, function(x) x$nbins)
    ptype <- match.arg(ptype)
    pvals <- simpleDfs <- numeric(length(typecomb))
    ## compute the simple approximate dfs
    simpleDfs[facFac] <- K[facFac] - nlev[combs[1, facFac]] -
        nlev[combs[2, facFac]] + 1
    simpleDfs[facNum] <- facNumSimpleDf(K[facNum],
                                        nlev[combs[1, facNum]])
    simpleDfs[numNum] <- numNumSimpleDf(K[numNum])
    ## compute the pvalues
    pvals[facFac] <- pchisq(obStats[facFac],
                            df = K[facFac] - nlev[combs[1, facFac]] -
                                nlev[combs[2, facFac]] + 1,
                            lower.tail = FALSE, log.p = TRUE)
    if (ptype == 'simple') {
        pvals[facNum] <- pchisq(obStats[facNum],
                                df = facNumSimpleDf(K[facNum],
                                                    nlev[combs[1, facNum]]),
                                lower.tail = FALSE, log.p = TRUE)
        pvals[numNum] <- pchisq(obStats[numNum],
                                df = numNumSimpleDf(K[numNum]),
                                lower.tail = FALSE, log.p = TRUE)
    } else if (ptype == 'conservative') {
        pvals[facNum] <- pchisq(obStats[facNum],
                                df = K[facNum] - nlev[combs[1, facNum]],
                                lower.tail = FALSE, log.p = TRUE)
        pvals[numNum] <- pchisq(obStats[facNum],
                                df = K[facNum] - 1,
                                lower.tail = FALSE, log.p = TRUE)
    } else if (ptype == 'gamma') {
        pvals[facNum] <- pgamma(obStats[facNum],
                                shape = facNumGammaShape(K[facNum],
                                                         nlev[combs[1, facNum]]),
                                scale = facNumGammaScale(K[facNum],
                                                         nlev[combs[1, facNum]]),
                                lower.tail = FALSE, log.p = TRUE)
        pvals[numNum] <- pgamma(obStats[numNum],
                                shape = numNumGammaShape(K[numNum]),
                                scale = numNumGammaScale(K[numNum]),
                                lower.tail = FALSE, log.p = TRUE)
    } else if (ptype == 'fitted') {
        pvals[facNum] <- pgamma(obStats[facNum],
                                shape = facNumGammaShape(K[facNum],
                                                         nlev[combs[1, facNum]]),
                                scale = facNumGammaScale(K[facNum],
                                                         nlev[combs[1, facNum]]),
                                lower.tail = FALSE, log.p = TRUE)
        pvals[numNum] <- pchisq(obStats[numNum],
                                df = numNumFittedDf(K[numNum]),
                                lower.tail = FALSE, log.p = TRUE)
    } else {
        stop("ptype must be one of ('simple', 'conservative', 'gamma', or 'fitted')")
    }
    pord <- order(pvals)
    
    ## return everything
    result <- list(data = datName,
                   types = typecomb[pord],
                   pairs = names(bns)[pord],
                   binnings = bns[pord],
                   simpleDfs = simpleDfs[pord],
                   Ks = K[pord],
                   residuals = sapply(binStats[pord],
                                      function(x) x$residuals),
                   statistics = sapply(binStats[pord],
                                       function(x) x$stat),
                   logps = pvals[pord],
                   pvalues = exp(pvals)[pord])
    class(result) <- "DepSearch"
    result
}

##' Methods
##' @title S3 methods for `DepSearch`
##' @description The `summary` and `plot` methods outlined here
##' support the quick description of an `DepSearch` object.
##' @details For each index in `which`, this function produces a row
##' of three plots. The first plot is the raw data, the second plot
##' is the ranks of the data, and the final plot is the binning
##' contained in the `DepSearch` object.
##' @param object `DepSearch` object to summarize
##' @param x object with class `DepSearch`
##' @param ... additional arguments to pass on to the method
##' @param adjustP logical: should the p-values be adjusted for
##' multiple testing?
##' @param which indices of binnings to display from `x`, where
##' binnings are ordered by increasing p-value
##' @param border colour of borders to be drawn on the binnings
##' @param buffer relative width of empty space separating categories
##' @param dropPoints logical: should points be dropped for the plot
##' of the binnings?
##' @param colrng colour range to be passed to `residualFill` for
##' plotting
##' @param nbr number of breaks to be passed to `residualFill` for
##' plotting
##' @param pch point type passed to plot
##' @return Nothing for the plot method, while summary quietly returns
##' a summary of `DepSearch`
##' @author Chris Salahub
##' @describeIn methods Summary method for `DepSearch`
summary.DepSearch <- function(object, ..., adjustP=FALSE) {
    dat <- object$data
    nprs <- length(object$pairs)
    typtab <- table(object$types)
    if (adjustP) {
        pvals <- pmin(nprs*object$pvalues, 1)
        added <- "(p-values adjusted by the Bonferroni method)\n"
    } else {
        pvals <- object$pvalues
        added <- ""
    }
    pdecs <- quantile(pvals, probs = seq(0, 1, by = 0.1))
    sig5 <- sum(object$pvalues < 0.05)
    sig1 <- sum(object$pvalues < 0.01)
    cat(paste0("All ", nprs, " pairs in ", dat,
          " recursively binned with type distribution: \n"))
    print(typtab)
    cat("\n")
    cat(paste0(sig5, " pairs are significant at 5% and ",
               sig1, " pairs are significant at 1%\n"))
    cat(added)
    toppairs <- min(10, nprs)
    cat("\nMost significant ", toppairs, " pairs:\n")
    for (ii in 1:toppairs) {
        cat(object$pairs[ii],
            paste0(" (",
                   format(pvals[ii], digits = 2,
                          scientific = TRUE),
                   ")"),
            "\n")
    }
    invisible(list(data = dat, npair = nprs,
                   typeTable = typtab, adjusted = adjustP,
                   pDeciles = pdecs))
}
##' @describeIn methods Print method for `DepSearch`
print.DepSearch <- function(x, ...) {
    nprs <- length(x$pairs)
    cat(paste0("Recursive binning results for all ", nprs, " pairs in ",
               x$data))
}
##' @describeIn methods Plot method for `DepSearch`
plot.DepSearch <- function(x, ..., which = 1:5, border = "black",
                           buffer = 0.01, dropPoints = FALSE,
                           colrng = c("steelblue", "white", "firebrick"),
                           nbr = NA, pch = ".") {
    dat <- get(x$data)
    prs <- strsplit(x$pairs[which], split = "\\:")
    typs <- strsplit(x$types[which], split = "\\:")
    oldPar <- par(mfrow = c(length(which), 3),
                  mar = c(0.5, 1.1, 2.1, 0.1))
    #maxRes <- max(sapply(x$residuals, max))
    for (ii in seq_along(prs)) {
        x1 <- dat[, prs[[ii]][1]] # get pair
        y <- dat[, prs[[ii]][2]]
        scl <- length(x1)*buffer
        if (typs[[ii]][1] == "factor") { # jitter factors
            x1 <- as.factor(x1) # ensure type
            xtbl <- table(x1)
            xbr <- c(0, xtbl)
            xa <- cumsum(xbr[-length(xbr)])/2 + cumsum(xbr[-1])/2
            ## handle small categories thinner than scl
            xmns <- pmin(-xtbl[as.numeric(x1)]/2 + scl, 0)
            xmxs <- pmax(xtbl[as.numeric(x1)]/2 - scl, 0)
            pltx <- xa[as.numeric(x1)] +
                runif(length(x1), min = xmns, max = xmxs)
        } else {
            pltx <- x1
            xbr <- NA
        }
        if (typs[[ii]][2] == "factor") {
            y <- as.factor(y)
            ytbl <- table(y)
            ybr <- c(0, ytbl)
            ya <- cumsum(ybr[-length(ybr)])/2 + cumsum(ybr[-1])/2
            ymns <- pmin(-ytbl[as.numeric(y)]/2 + scl, 0)
            ymxs <- pmax(ytbl[as.numeric(y)]/2 - scl, 0)
            plty <- ya[as.numeric(y)] +
                runif(length(y), min = ymns, max = ymxs)
        } else {
            plty <- y
            ybr <- NA
        }
        ## create three plot areas
        plot(x = pltx, y = plty, xaxt = "n", yaxt = "n", pch = pch,
             ...)
        abline(h = cumsum(ybr), v = cumsum(xbr), lty = 2)
        mtext("Raw", side = 3, line = 0, cex = 0.6)
        plot(x = rank(pltx, ties.method = "random"),
             y = rank(plty, ties.method = "random"),
             xaxt = "n", yaxt = "n", pch = pch, ...)
        abline(h = cumsum(ybr), v = cumsum(xbr), lty = 2)
        mtext("Ranks", side = 3, line = 0, cex = 0.6)
        mtext(side = 3, line = 1, cex = 0.8,
              text = bquote("Pair:"~.(paste(prs[[ii]],
                                            collapse = "|"))*
                                ","~log[10]*p~"="~.(round(x$logps[which[ii]]/
                                                         log(10),
                                                         1))))
        if (dropPoints) thirdPch = NA else thirdPch = pch
        plotBinning(x$binnings[[which[ii]]], factor = 0.9,
                    xlab = "", ylab = "", border = border,
                    fill = importanceFill(x$binnings[[which[ii]]],
                                          colrng = colrng, nbr = nbr),
                    showYax = FALSE, showXax = FALSE, pch = thirdPch,
                    ...)
        mtext("Bins", side = 3, line = 0, cex = 0.6)
    }
    par(oldPar)
}
