% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/distance.R
\name{kcount}
\alias{kcount}
\title{K-mer counting.}
\usage{
kcount(x, k = 5, residues = NULL, gap = "-", named = TRUE,
  compress = TRUE)
}
\arguments{
\item{x}{a matrix of aligned sequences, a list of unaligned sequences,
or a vector representing a single sequence.
Accepted modes are "character" and "raw" (the latter being applicable
for "DNAbin" and "AAbin" objects).}

\item{k}{integer representing the k-mer size. Defaults to 5.
Note that large values of k can be slow to compute and use a lot of
memory, because the number of possible k-mers (\eqn{n^k} for a residue
alphabet of size \emph{n}) grows rapidly with both k and \emph{n}.}

\item{residues}{either NULL (default; the residue alphabet is automatically
detected from the sequences), a case-sensitive character vector
specifying the residue alphabet, or one of the character strings
"RNA", "DNA", "AA" or "AMINO". Note that the default option can be slow for
large lists of character vectors, so specifying the residue alphabet is
recommended unless \code{x} is a "DNAbin" or "AAbin" object.}

\item{gap}{the character used to represent gaps in the alignment matrix
(if applicable). Ignored for \code{"DNAbin"} and \code{"AAbin"} objects.
Defaults to "-" otherwise.}

\item{named}{logical. Should the k-mers be used as column names of
the returned matrix? Defaults to TRUE.}

\item{compress}{logical indicating whether to compress AAbin sequences
using the Dayhoff(6) alphabet for k-mer sizes exceeding 4.
Defaults to TRUE to avoid memory overflow and excessive computation time.}
}
\value{
Returns a matrix of k-mer counts with one row for each sequence
  and \eqn{n^k} columns (where \emph{n} is the size of the residue
  alphabet, which is 6 for "AAbin" sequences compressed to the
  Dayhoff(6) alphabet, and \emph{k} is the k-mer size).
}
\description{
Count all k-letter words in a sequence or set of sequences
  with a sliding window of length k.
}
\details{
This function computes a vector or matrix of k-mer counts
  from a sequence or set of sequences using a sliding window of length k.
  DNA and amino acid sequences can be passed to the function either as
  a list of non-aligned sequences or a matrix of aligned sequences,
  preferably in the "DNAbin" or "AAbin" raw-byte format
  (Paradis et al. 2004; Paradis 2012; see the \code{\link[ape]{ape}} package
  documentation for more information on these S3 classes).
  Character sequences are also supported; however, ambiguity codes in
  character sequences may not be recognized or treated appropriately.
  In raw-byte sequences, by contrast, ambiguity codes are counted
  according to their underlying residue frequencies
  (e.g. the 5-mer "ACRGT" contributes 0.5 to the tally for "ACAGT"
  and 0.5 to that of "ACGGT").
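
  The sliding-window tally, including the fractional counting of
  ambiguity codes described above, can be illustrated with the
  following minimal base-R sketch. It is not the implementation used
  by \code{kcount}: the helper \code{count_kmers} is hypothetical,
  handles a single character string, and expands only the two-fold
  ambiguity code R (A or G).
  \preformatted{
  ## hypothetical helper, for illustration only: weighted sliding-window
  ## k-mer tally that expands only the ambiguity code "R" (A or G)
  count_kmers <- function(s, k) {
    res <- strsplit(s, "")[[1]]
    words <- character(0)
    weights <- numeric(0)
    for (i in seq_len(length(res) - k + 1)) {
      ## expand each position of the window into its possible residues
      opts <- lapply(res[i:(i + k - 1)],
                     function(r) if (r == "R") c("A", "G") else r)
      combos <- apply(expand.grid(opts, stringsAsFactors = FALSE), 1,
                      paste, collapse = "")
      words <- c(words, combos)
      ## each resolved k-mer receives an equal share of one count
      weights <- c(weights, rep(1 / length(combos), length(combos)))
    }
    tapply(weights, words, sum)
  }
  count_kmers("ACGTACGT", 3)  # ACG 2, CGT 2, GTA 1, TAC 1
  count_kmers("ACRGT", 5)     # ACAGT 0.5, ACGGT 0.5
  }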

  To minimize computation time when counting longer k-mers (k > 3),
  amino acid sequences in the raw "AAbin" format are automatically
  compressed using the Dayhoff-6 alphabet as detailed in Edgar (2004);
  this can be disabled by setting \code{compress = FALSE}.
  Amino acid sequences are not compressed if they are supplied
  as a list of character vectors rather than an "AAbin" object,
  in which case the k-mer length should be reduced (k < 4) to avoid
  excessive memory use and computation time.
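
  As a rough illustration of why compression helps, the sketch below
  recodes a short, made-up amino acid string into the six Dayhoff groups
  (AGPST, C, DENQ, FWY, HKR and ILMV) and compares the number of possible
  5-mers before and after recoding. The lookup vector \code{dayhoff6} and
  its group labels 1 to 6 are hypothetical and are not necessarily the
  codes used internally by \code{kcount}.
  \preformatted{
  ## hypothetical Dayhoff(6) lookup; the labels 1-6 are arbitrary
  dayhoff6 <- c(A = 1, G = 1, P = 1, S = 1, T = 1,
                C = 2,
                D = 3, E = 3, N = 3, Q = 3,
                F = 4, W = 4, Y = 4,
                H = 5, K = 5, R = 5,
                I = 6, L = 6, M = 6, V = 6)
  aa <- strsplit("MKVLAAGIVGLLLA", "")[[1]]
  compressed <- paste(dayhoff6[aa], collapse = "")
  compressed  # "65661116616661"
  ## possible 5-mers (matrix columns) without and with compression
  c(twenty_letters = 20^5, dayhoff6 = 6^5)  # 3200000 and 7776
  }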
}
\examples{
  ## compute a matrix of k-mer counts for the woodmouse
  ## data (ape package) using a k-mer size of 3
  library(ape)
  data(woodmouse)
  x <- kcount(woodmouse, k = 3)
  x
  ## 64 columns for nucleotide 3-mers AAA, AAC, ... , TTT
  ## translate to an AAbin object (vertebrate mitochondrial code)
  ## and count amino acid 2-mers
  y <- kcount(ape::trans(woodmouse, 2), k = 2)
  y
  ## 400 columns for amino acid 2-mers AA, AC, ... , YY
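  ## sketch assuming the argument behaviour described above:
  ## count 5-mers in the translated sequences; with compress = TRUE
  ## (the default) the AAbin sequences are first recoded using the
  ## Dayhoff(6) alphabet (see Details), keeping the matrix small
  z <- kcount(ape::trans(woodmouse, 2), k = 5)
  dim(z)
  ## count 2-mers in a single character sequence, specifying the
  ## residue alphabet explicitly instead of detecting it automatically
  s <- c("A", "C", "G", "T", "A", "C", "G", "T")
  kcount(s, k = 2, residues = "DNA")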
}
\references{
Edgar RC (2004) Local homology recognition and distance measures in
  linear time using compressed amino acid alphabets.
  \emph{Nucleic Acids Research}, \strong{32}, 380-385.

  Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics
  and evolution in R language. \emph{Bioinformatics}, \strong{20}, 289-290.

  Paradis E (2012) \emph{Analysis of Phylogenetics and Evolution with R}
  (Second Edition). Springer, New York.
}
\seealso{
\code{\link{kdistance}} for k-mer distance matrix computation.
}
\author{
Shaun Wilkinson
}
