% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/functions.R
\name{con_filter}
\alias{con_filter}
\title{Filter a dataframe using two-way criteria to increase connectedness}
\usage{
con_filter(data, formula, verbose = TRUE, returndropped = FALSE)
}
\arguments{
\item{data}{A dataframe}

\item{formula}{A formula with two factor names in the dataframe
that specifies the criteria for filtering,
like \code{y ~ 2 * f1 / f2}}

\item{verbose}{If TRUE, print some diagnostic information about what data
is being deleted. (Similar to the 'tidylog' package).}

\item{returndropped}{If TRUE, return the dropped rows instead of the
kept rows. Default is FALSE.}
}
\value{
The original dataframe is returned, minus rows that are filtered out.
If \code{returndropped = TRUE}, the dropped rows are returned instead.
}
\description{
Traditional filtering (subsetting) of data is typically performed via
some criteria based on the \emph{columns} of the data.

In contrast, this function performs filtering of data based on the
\emph{joint} rows and columns of a matrix-view of two factors.

Conceptually, the idea is to re-shape two or three columns of a dataframe
into a matrix, and then delete entire rows (or columns) of the matrix if
there are too many missing cells in a row (or column).

The two most useful applications of two-way filtering are to:
\enumerate{
\item Remove a factor level that has few interactions with another factor.
This is especially useful in linear models to remove rare factor
combinations.
\item Remove a factor level that has any missing interactions with another
factor. This is especially useful with biplots of a matrix to remove
rows or columns that have missing values.
}

A formula syntax is used to specify the two-way filtering criteria.

Some examples may provide the easiest understanding.

\preformatted{dat <- data.frame(state=c("NE","NE", "IA", "NE", "IA"),
                  year=c(1,2,2,3,3), value=11:15)
}

When the 'value' column is re-shaped into a matrix it looks like:

\preformatted{state/year |  1 |  2 |  3 |
        NE | 11 | 12 | 14 |
        IA |    | 13 | 15 |
}
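
The matrix view above can be reproduced in base R. This is only a sketch for illustration and is not how con_filter() is implemented; the names mat, cells_per_state and cells_per_year are hypothetical.

```r
# Base-R sketch of the matrix view (illustration only, not con_filter()).
dat <- data.frame(state = c("NE", "NE", "IA", "NE", "IA"),
                  year = c(1, 2, 2, 3, 3), value = 11:15)

# One cell per state:year combination; empty combinations become NA.
mat <- tapply(dat$value, list(state = dat$state, year = dat$year), sum)
print(mat)

# Number of filled cells in each row (state) and each column (year).
cells_per_state <- rowSums(!is.na(mat))
cells_per_year <- colSums(!is.na(mat))
```

Rows or columns with too few filled cells are the ones that two-way filtering removes.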

Drop states with too many missing combinations.
Keep only states with "at least 3 years per state":

\preformatted{con_filter(dat, ~ 3 * year / state)
NE    1    11
NE    2    12
NE    3    14
}

Keep only years with "at least 2 states per year":

\preformatted{con_filter(dat, ~ 2 * state / year)
NE    2    12
IA    2    13
NE    3    14
IA    3    15
}
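
For readers who want to see the counting logic spelled out, the two results above can be imitated in base R. This sketch assumes the count is the number of distinct levels of one factor within each level of the other; that is an assumption for illustration, not a statement about the internals of con_filter(). All object names are hypothetical.

```r
# Base-R imitation of count-based two-way filtering (illustration only).
dat <- data.frame(state = c("NE", "NE", "IA", "NE", "IA"),
                  year = c(1, 2, 2, 3, 3), value = 11:15)

# "At least 3 years per state": count distinct years within each state.
years_per_state <- tapply(dat$year, dat$state, function(x) length(unique(x)))
keep_states <- names(years_per_state)[years_per_state >= 3]
by_state <- dat[is.element(dat$state, keep_states), ]

# "At least 2 states per year": count distinct states within each year.
states_per_year <- tapply(dat$state, dat$year, function(x) length(unique(x)))
keep_years <- names(states_per_year)[states_per_year >= 2]
by_year <- dat[is.element(as.character(dat$year), keep_years), ]
```

Here by_state holds the three NE rows and by_year holds the four rows for years 2 and 3, matching the output listed above.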

If the constant number in the formula is less than 1.0, it is
interpreted as a \emph{fraction}.
Keep only states with "at least 75\% of years per state":

\preformatted{con_filter(dat, ~ .75 * year / state)
}
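
The fraction rule can be sketched the same way in base R. The sketch assumes the denominator is the total number of distinct years in the data; that choice is an assumption made for illustration and is not confirmed by this help page. The object names are hypothetical.

```r
# Base-R imitation of fraction-based filtering (illustration only).
dat <- data.frame(state = c("NE", "NE", "IA", "NE", "IA"),
                  year = c(1, 2, 2, 3, 3), value = 11:15)

# Assumed denominator: the number of distinct years in the whole data.
n_years <- length(unique(dat$year))

# Fraction of years observed within each state.
years_per_state <- tapply(dat$year, dat$state, function(x) length(unique(x)))
frac_per_state <- years_per_state / n_years

# Keep states observed in at least 0.75 of the years.
keep_states <- names(frac_per_state)[frac_per_state >= 0.75]
kept <- dat[is.element(dat$state, keep_states), ]
```

Under that assumption NE is observed in 3 of 3 years and is kept, while IA is observed in 2 of 3 years and falls below the 0.75 threshold.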

It is possible to include another factor on either side of the slash "/".
Suppose the data had another factor for political party called "party".
Keep only states with "at least 2 combinations of party:year per state":

\preformatted{con_filter(dat, ~ 2 * party:year / state)
}

If the formula contains a response variable, rows with a missing value of
the response are dropped first, then the two-way filtering is based on the
remaining factor combinations:

\preformatted{con_filter(dat, value ~ 2 * state / year)
}
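
To see why dropping missing responses first can change the outcome, consider a hypothetical variant of the data, called dat2 here, in which one value is missing. The base-R sketch below only compares the counts with and without the incomplete row; it is an illustration under that assumption and not the code used by con_filter().

```r
# Hypothetical variant of the example data with one missing response.
dat2 <- data.frame(state = c("NE", "NE", "IA", "NE", "IA"),
                   year = c(1, 2, 2, 3, 3),
                   value = c(NA, 12, 13, 14, 15))

# Helper: number of distinct years observed within each state.
count_years <- function(d) tapply(d$year, d$state, function(x) length(unique(x)))

# Counts based on factor combinations alone (no response in the formula).
years_all <- count_years(dat2)

# Counts after rows with a missing response are dropped first.
years_complete <- count_years(dat2[!is.na(dat2$value), ])
```

With the incomplete row included NE has 3 years, but after dropping it NE has only 2, so a criterion such as "at least 3 years per state" would treat NE differently in the two cases.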
}
\examples{
dat <- data.frame(
  gen = c("G3", "G4", "G1", "G2", "G3", "G4", "G5",
          "G1", "G2", "G3", "G4", "G5",
          "G1", "G2", "G3", "G4", "G5",
          "G1", "G2", "G3", "G4", "G5"),
  env = c("E1", "E1", "E1", "E1", "E1", "E1", "E1",
          "E2", "E2", "E2", "E2", "E2",
          "E3", "E3", "E3", "E3", "E3",
          "E4", "E4", "E4", "E4", "E4"),
  yield = c(65, 50, NA, NA, 65, 50, 60,
            NA, 71, 76, 80, 82,
            90, 93, 95, 102, 97,
            98, 102, 105, 130, 135))

# How many observations are there for each combination of gen*env?
with( subset(dat, !is.na(yield)) , table(gen,env) )

# Note, if there is no response variable, the two-way filtering is based
# only on the presence of the factor combinations.
dat1 <- con_filter(dat, ~ 4*env / gen)

# If there is a response variable, missing values are dropped first,
# then the two-way filtering is based on the factor combinations.

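# Thresholds of 1 or more are counts, e.g. "at least 4 env per gen".
dat1 <- con_filter(dat, yield ~ 4 * env / gen)
dat1 <- con_filter(dat, yield ~ 5 * env / gen)
dat1 <- con_filter(dat, yield ~ 6 * gen / env)
# Thresholds below 1 are fractions, e.g. "at least 0.8 of env per gen".
dat1 <- con_filter(dat, yield ~ .8 * env / gen)
dat1 <- con_filter(dat, yield ~ .8 * gen / env)
# A count again: "at least 7 env per gen".
dat1 <- con_filter(dat, yield ~ 7 * env / gen)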

}
\author{
Kevin Wright
}
