% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/diem.r
\name{diem}
\alias{diem}
\title{Diagnostic Index Expectation Maximisation}
\usage{
diem(
  files,
  ploidy = FALSE,
  markerPolarity = FALSE,
  ChosenInds,
  ChosenSites = "all",
  epsilon = 0.99999,
  verbose = FALSE,
  nCores = parallel::detectCores() - 1,
  maxIterations = 50,
  ...
)
}
\arguments{
\item{files}{A character vector with paths to files with genotypes.}

\item{ploidy}{A logical or a list of length equal to the length of \code{files}. Each
element of the list contains a numeric vector with ploidy numbers for all individuals
specified in the \code{files}.}

\item{markerPolarity}{\code{FALSE} or a list of logical vectors.}

\item{ChosenInds}{A numeric vector of indices of individuals to be included in the analysis.}

\item{ChosenSites}{Either \code{"all"} (the default) or a logical vector indicating
which sites are to be included in the analysis.}

\item{epsilon}{A numeric, specifying how much the hypothetical diagnostic markers should
contribute to the likelihood calculations. Must be in \code{[0,1)}, keeping
tolerance setting of the \code{R} session in mind.}

\item{verbose}{Logical or character with path to directory where run diagnostics will
be saved.}

\item{nCores}{A numeric, the number of cores to be used for parallelisation. Must be
\code{nCores = 1} on Windows.}

\item{maxIterations}{A numeric, the maximum number of EM iterations.}

\item{...}{additional arguments.}
}
\value{
A list including suggested marker polarities, diagnostic indices and support for all
markers, a matrix with counts of the four genomic states for all individuals, and
polarity changes for the EM iterations.
}
\description{
Estimates how to assign alleles in a genome to maximise the distinction between two
unknown groups of individuals. Using expectation maximisation (EM) in a likelihood
framework, \code{diem} provides marker polarities for importing data, the
likelihood-based diagnostic index and its support for all markers, and hybrid indices
for all individuals.
}
\details{
Given two alleles of a marker, one allele can belong to one side of a barrier
to geneflow and the other to the other side. Which allele belongs where is a non-trivial
matter. A marker state in an individual can be encoded as 0 if the individual is
homozygous for the first allele, and 2 if the individual is homozygous for the second
allele. Marker polarity determines how the marker will be imported. Marker polarity
equal to \code{FALSE} means that the marker will be imported as-is. A marker with
polarity equal to \code{TRUE} will be imported with states 0 mapped as 2 and states 2
mapped as 0, in effect switching which allele belongs to which side of a barrier to
geneflow.
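
As a minimal sketch of this mapping (an illustration only, not the internal
implementation of \code{diem}), the swap of states 0 and 2 can be written in base
\code{R}. The objects \code{states}, \code{polarity} and \code{polarised} are
hypothetical names, and the genotype values are made up.

\preformatted{
# Hypothetical states: rows are markers, columns are individuals.
states <- matrix(c(0, 2, 2,
                   2, 2, 0),
                 nrow = 2, byrow = TRUE)
# Marker 1 is imported as-is, marker 2 is flipped.
polarity <- c(FALSE, TRUE)

# Subtracting from 2 maps state 0 to 2 and state 2 to 0.
polarised <- states
polarised[polarity, ] <- 2 - states[polarity, ]
}

In this sketch the first row of \code{polarised} is unchanged and the second row
has its two homozygous states exchanged.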

When \code{markerPolarity = FALSE}, \code{diem} uses random null polarities to
initiate the EM algorithm. To fix the null polarities, \code{markerPolarity} must be
a list of length equal to the length of the \code{files} argument, where each element
in the list is a logical vector of length equal to the number of markers (rows) in
the specific file.
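
For instance, assuming two compartment files with three and five markers
(hypothetical numbers), a list of the required shape could be built as sketched
here. The names \code{nMarkers} and \code{fixedPolarity} are illustrative and not
part of the package.

\preformatted{
# Assumed numbers of markers (rows) in each of two files.
nMarkers <- c(3, 5)
# One logical vector per file; all FALSE means every marker starts as-is.
fixedPolarity <- lapply(nMarkers, function(n) rep(FALSE, n))
}

The resulting list would then be passed as \code{markerPolarity = fixedPolarity}.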

Ploidy needs to be given for each compartment and for each individual. For example,
for a dataset of three diploid mammal males consisting of an autosomal
compartment, an X chromosome
compartment and a Y chromosome compartment, the ploidy list would be
\code{ploidy = list(rep(2, 3), rep(1, 3), rep(1, 3))}. If the dataset consisted of
one male and two females,
ploidy for the sex chromosomes should be vectors reflecting that females have two X
chromosomes, but males only one, and females have no Y chromosomes:
\code{ploidy = list(rep(2, 3), c(1, 2, 2), c(1, 0, 0))}.
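
When the sex of each individual is recorded, the same list can be derived rather
than typed by hand. The following is a sketch under the assumption of a mammalian
XY system and compartments ordered as autosomes, X, Y. The vector \code{sex} and
its coding as \code{"M"} and \code{"F"} are hypothetical.

\preformatted{
# Hypothetical sex of the three individuals: one male, two females.
sex <- c("M", "F", "F")

ploidy <- list(
  rep(2, length(sex)),       # autosomes: diploid in everyone
  ifelse(sex == "M", 1, 2),  # X: one copy in males, two in females
  ifelse(sex == "M", 1, 0)   # Y: one copy in males, none in females
)
}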

When \code{verbose = TRUE}, \code{diem} will output multiple files with information
on the iterations of the EM algorithm, including tracking marker polarities and the
respective likelihood-based diagnostics. See
\code{vignette("Understanding-genome-polarisation-output-files", package = "diemr")}
for a detailed explanation of the individual output files.
}
\note{
To ensure that the genotype files, ploidies and individual selection are in a format
readable by \code{diem}, first use \link{CheckDiemFormat}.
Fix all errors, and run \code{diem} only once all checks have passed.

The working directory or a folder optionally specified in the \code{verbose}
argument must have write permissions. \code{diem} will store temporary files and
write its results files in that location.

The unit of parallelisation is the compartment file given in \code{files}.
}
\examples{
# set up input genotypes file names, ploidies and selection of individual samples
inputFile <- system.file("extdata", "data7x3.txt", package = "diemr")
ploidies <- list(c(2, 1, 2, 2, 2, 1, 2))
inds <- 1:6

# check input data
CheckDiemFormat(files = inputFile, ploidy = ploidies, ChosenInds = inds)
#  File check passed: TRUE
#  Ploidy check passed: TRUE
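
# choose the number of cores (a sketch; 'cores' is a hypothetical helper variable)
# nCores must be 1 on Windows; elsewhere leave one core free, as the default does
cores <- if (.Platform$OS.type == "windows") {
  1
} else {
  # detectCores() can return NA, so fall back to a single core in that case
  max(1, parallel::detectCores() - 1, na.rm = TRUE)
}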

# run diem
\dontrun{
# diem will write temporary files during EM iterations
# prior to running diem, set the working directory to a location with write permission
fit <- diem(files = inputFile, ChosenInds = inds, ploidy = ploidies, nCores = 1)

# run diem with fixed null polarities
fit2 <- diem(
  files = inputFile, ChosenInds = inds, ploidy = ploidies, nCores = 1,
  markerPolarity = list(c(TRUE, FALSE, TRUE))
)
}
}
\seealso{
\link{CheckDiemFormat}
}
