% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/align_chromatograms.R
\name{align_chromatograms}
\alias{align_chromatograms}
\title{Aligning gas-chromatography peaks based on retention times}
\usage{
align_chromatograms(data, sep = "\\t", rt_col_name = NULL,
  write_output = NULL, rt_cutoff_low = NULL, rt_cutoff_high = NULL,
  reference = NULL, max_linear_shift = 0.02, max_diff_peak2mean = 0.02,
  min_diff_peak2peak = 0.08, blanks = NULL, delete_single_peak = FALSE)
}
\arguments{
\item{data}{Two input formats are supported. The first option is the \strong{path to a plain text file} with the extension ".txt" containing the gas-chromatography data. The file is expected to be formatted as follows:
The first row contains the sample names, the second row the column names of the corresponding
chromatograms. Peak data start in the third row, with the matrices of the individual
samples concatenated horizontally (see the example data). The matrix of each
sample needs to consist of the same number of columns, and at least two are required: the retention time and a measure of concentration
(e.g. peak area or height). See the \href{../doc/GCalignR_step_by_step.html}{vignette} for an
example. Alternatively, the input may be a \strong{list of data frames}. Each data frame contains
the peak data of a single individual with at least two variables, the retention time of the peak
and the area under the peak. The variables need to have the same names across all samples
(i.e. data frames), and each list element has to be named with the ID of the respective sample.
The format can be checked by running \code{\link{check_input}}.}

\item{sep}{The field separator character. The default is tab separated (\code{sep = '\\t'}).
See the "sep" argument in \code{\link[utils]{read.table}} for details.}

\item{rt_col_name}{Character string giving the name of the column that contains the retention times. The variable needs to
be numeric, with a point as the decimal separator.}

\item{write_output}{Character vector of variables to write to a text file (e.g. \code{c("RT","Area")}).
Vector elements need to correspond to column names of \code{data}. Writing output is optional.}

\item{rt_cutoff_low}{Lower threshold below which retention times are cut (e.g. 5 minutes). Default NULL.}

\item{rt_cutoff_high}{Upper threshold above which retention times are cut (e.g. 35 minutes). Default NULL.}

\item{reference}{Character string naming the sample to which all other samples are aligned by means of a
linear shift (e.g. \code{"M3"}). The name has to correspond to a sample name given
in the first line of \code{data}. Alternatively, a sample called \code{reference} can be included
in \code{data}, containing user-defined peaks (e.g. an internal standard) to align the samples to.
After the linear transformation, the \code{reference} is removed from the data.}

\item{max_linear_shift}{Maximum time by which the retention times of one chromatogram are expected to deviate
from those of another chromatogram. To correct for these systematic shifts, the algorithm may shift all peaks
within a chromatogram by the same amount of retention time to maximise the number of peaks shared with
the reference. We recommend starting with the default of 0.02 (minutes) and increasing it if necessary.}

\item{max_diff_peak2mean}{Numeric value defining the allowed deviation of the retention time of a given peak from the mean of the corresponding row (i.e. scored substance). Defaults to 0.02 (minutes). This parameter reflects the retention time range in which peaks across samples are still counted as the same 'substance',
i.e. sorted in one row.}

\item{min_diff_peak2peak}{Numeric value defining the expected minimum difference in retention times among different substances.
Rows whose retention times differ by less than this value are merged, provided that every sample contains either
one or none of the respective compounds.
This parameter is a major determinant in the classification of distinct peaks,
so it should be adjusted carefully to your needs
(e.g. the resolution of your gas-chromatography pipeline).
Large values may cause truly different substances with similar retention times to be merged if they do not
occur together in at least one individual, which can happen by chance with small sample
sizes. It is therefore recommended to set the value much lower (e.g. 0.02) when few individuals are
analysed. Defaults to 0.08 (minutes).}

\item{blanks}{Character vector of names of negative controls. Substances found in any of the blanks will be
removed from all samples, before the blanks are deleted from the aligned data.}

\item{delete_single_peak}{Logical, determining whether substances that occur in just one sample are removed or not. Default FALSE.}
}
\value{
Returns an object of class "GCalign", which is a list containing the objects listed below.
Note that the objects "heatmap_input" and "Logfile" are best inspected by calling the provided functions \emph{gc_heatmap} and \emph{print}.
\item{aligned}{Aligned gas-chromatography data subdivided into individual data frames for every variable.
Samples are represented by columns, rows specify substances. The first column of every data frame
contains the mean retention time of the respective substance (i.e. row). The aligned data
can be used for further statistical analyses in other packages and can also be written directly
to a .txt file by specifying the \code{write_output} argument of \code{align_chromatograms}.}
\item{heatmap_input}{Data frames of retention times; used internally to create heatmaps}
\item{Logfile}{Includes several lists summarizing the data; used to print diagnostics of the alignment}
\item{input_list}{List of data frames. Each data frame contains the raw data of a sample prior to aligning}
\item{input_matrix}{List of data frames. Data frames are matrices of input variables}
}
\description{
This is the core function of \code{\link{GCalignR}} to align gas-chromatography peak data.
Read through the documentation below and take a look at the vignettes for a thorough introduction.
}
\details{
The alignment of peaks is achieved by running \strong{three major algorithms}, always considering
the complete set of samples submitted to the function.
In brief: \strong{(1) Chromatograms (more correctly, their peaks) are shifted} to maximise
similarity with a reference to account for systematic shifts in retention times
caused by gas-chromatography processing. \strong{(2) Peaks of similar retention times are aligned}
in order to match similar retention times to the same substance. As the algorithm proceeds,
these clusters are continuously revised and every peak is moved to the optimal
location (i.e. substance). \strong{(3) Peaks of similar retention time are merged} if
they show smaller differences in mean retention times than expected from the achievable
resolution of the gas-chromatography or the chemistry of the compounds.
The corresponding thresholds are specified by the parameters \code{max_diff_peak2mean} and \code{min_diff_peak2peak}.
Several optional processing steps are available, ranging from the removal of peaks representing
contaminations (which requires including blanks as a control) to the removal of uninformative peaks
that are present in just one sample.
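
The idea behind step (1) can be sketched as follows. This is an illustration only and
not the implementation used by the package: the helper names \code{count_shared} and
\code{best_shift} and the arguments \code{step} and \code{tol} are hypothetical.
The sketch tries a grid of candidate shifts and keeps the one for which the most peaks
of a sample lie within \code{tol} of a reference peak.
\preformatted{
# Hypothetical helper: number of peaks in 'rt' that lie within 'tol'
# of any peak in the reference 'ref'
count_shared <- function(rt, ref, tol) {
  sum(vapply(rt, function(x) any(abs(ref - x) <= tol), logical(1)))
}

# Hypothetical helper: candidate shift in [-max_shift, max_shift] that
# maximises the number of shared peaks (the first maximum wins on ties)
best_shift <- function(rt, ref, max_shift = 0.02, step = 0.01, tol = 0.02) {
  shifts <- seq(-max_shift, max_shift, by = step)
  shared <- vapply(shifts, function(s) count_shared(rt + s, ref, tol),
                   integer(1))
  shifts[which.max(shared)]
}

# Toy data: the sample runs 0.5 time units behind the reference
best_shift(rt = c(10.5, 12.5, 15.5), ref = c(10, 12, 15),
           max_shift = 1, step = 0.5, tol = 0.1)  # -0.5
}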
}
\examples{
## Load example data set
data("peak_data")
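## Illustrative addition: the example data are assumed to follow the list
## input format, i.e. one named data frame per sample
names(peak_data)[1:3]
head(peak_data[[1]])
## Check the input format (assumes check_input() takes the data as its
## first argument)
check_input(peak_data)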
## Subset for faster processing
peak_data <- peak_data[1:3]
peak_data <- lapply(peak_data, function(x) x[1:50,])
## align data
out <- align_chromatograms(peak_data, rt_col_name = "time",
rt_cutoff_low = 10, rt_cutoff_high = 30, reference = "M2",
max_linear_shift = 0.02)
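## Illustrative addition: inspect the result using the elements listed
## under "Value" (assumes the default arguments of gc_heatmap() suffice)
print(out)              # diagnostics based on the Logfile
names(out$aligned)      # one data frame per variable
head(out$aligned[[1]])  # first column: mean retention time per substance
gc_heatmap(out)         # heatmap based on heatmap_input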

}
\author{
Martin Stoffel (martin.adam.stoffel@gmail.com) & Meinolf Ottensmann (meinolf.ottensmann@web.de)
}

