\name{uniform.select}
\alias{uniform.select}
\title{Derive a subset of a large dataset}
\usage{
  uniform.select(bigMat, keep = 0.05, rows = TRUE,
    dir = "", random = TRUE, ram.gb = 0.1)
}
\arguments{
  \item{bigMat}{a big.matrix object, or any argument
  accepted by get.big.matrix(), which includes paths to
  description files or even a standard matrix object.}

  \item{keep}{numeric, by default a proportion (decimal) of
  the original number of rows/columns to choose for the
  subset. Alternatively, if an integer > 2, this is taken
  to be the size of the desired subset, e.g., for a dataset
  with 10,000 rows where you want a subset size of 1,000
  you could set 'keep' as either 0.1 or 1000.}

  \item{dir}{directory containing the
  filebacked.big.matrix, same as dir for get.big.matrix.}

  \item{rows}{logical, whether the subset should be of the
  rows of bigMat. If rows=FALSE, the subset is chosen
  from the columns; this is equivalent to calling
  uniform.select(t(bigMat)), but avoids actually performing
  the transpose, which can save time for large matrices.}

  \item{random}{logical, whether to take a random (TRUE) or
  uniform (FALSE) selection of rows (or columns if
  rows=FALSE) to form the subset.}

  \item{ram.gb}{maximum size of the matrix in gigabytes for
  the subset; 0.1GB is the default, which should result
  in minimal processing time on a typical system.
  Increasing this increases the processing time, but also
  the representativeness of the subset chosen. Note that
  some very large matrices cannot be processed by this
  function unless this parameter is increased; broadly,
  this applies if the dimension being thinned is more than
  5\% of this memory limit (see estimate.memory() from
  NCmisc).}
}
\value{
  A set of row or column indexes (depending on the 'rows'
  parameter) of uniformly distributed (optionally
  reproducible) or randomly selected variables in the
  matrix.
}
\description{
  Either randomly or uniformly select rows or columns from
  a large dataset to form a new smaller dataset.
}
\examples{
mat <- matrix(rnorm(200*100),ncol=200)  # standard matrix
bmat <- as.big.matrix(mat)              # big.matrix
ii1 <- uniform.select(bmat,.05,rows=TRUE) # thin down to 5\% of the rows
ii2 <- uniform.select(bmat,45,rows=FALSE,random=TRUE) # thin down to 45 columns
prv(ii1,ii2)
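# Illustrative base-R sketch only: an assumption about the behaviour described
# for 'keep' and 'random' above, not the package's internal code. The names
# n, n.keep, idx.uniform and idx.random are hypothetical.
n <- 10000                                 # size of the dimension being thinned
keep <- 0.1                                # a proportion; keep = 1000 requests the same size
n.keep <- if (keep > 2) keep else round(keep * n)  # values above 2 are read as a count
idx.uniform <- unique(round(seq(1, n, length.out = n.keep)))  # evenly spaced indexes
idx.random <- sort(sample(n, n.keep))      # random indexes, drawn without replacement
length(idx.uniform); length(idx.random)    # both selections have n.keep elements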
}
\author{
  Nicholas Cooper
}
\seealso{
  \code{\link{subpc.select}}
}

