% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/external.bold.analyze.tree.R
\name{bold.analyze.tree}
\alias{bold.analyze.tree}
\title{Analyze and visualize the multiple sequence alignment}
\usage{
bold.analyze.tree(
  bold_df,
  dist_model,
  clus_method = c("nj", "njs"),
  save_dist_mat = FALSE,
  newick_tree_export = NULL,
  tree_plot = FALSE,
  tree_plot_type,
  ...
)
}
\arguments{
\item{bold_df}{A modified BCDM data frame obtained from \code{\link[=bold.analyze.align]{bold.analyze.align()}}.}

\item{dist_model}{A character string specifying the model to generate the distances.}

\item{clus_method}{A character string specifying the clustering algorithm: either \code{nj} (Neighbor Joining) or \code{njs} (Neighbor Joining for distance matrices containing NAs).}

\item{save_dist_mat}{A logical value specifying whether the distance matrix should be saved in the output. Default value is FALSE.}

\item{newick_tree_export}{A character string specifying the folder path where the file should be saved along with the name for the file. Default value is NULL.}

\item{tree_plot}{Logical value specifying if a neighbor joining plot should be generated. Default value is FALSE.}

\item{tree_plot_type}{A character string specifying the layout of the tree. This argument has no default value and must be supplied.}

\item{...}{Additional arguments passed to \code{ape::dist.dna}.}
}
\value{
An 'output' list containing:
\itemize{
\item dist_mat = A distance matrix based on the selected model (only if \code{save_dist_mat = TRUE}).
\item base_freq = Overall base frequencies of the \code{bold.analyze.align()} result.
\item plot = Neighbor Joining clustering visualization (only if \code{tree_plot = TRUE}).
\item data_for_plot = A phylo object used for the plot.
\item NJ/NJS tree in newick format (only if a file path is supplied to \code{newick_tree_export}).
}
}
\description{
Calculates genetic distances and performs a Neighbor Joining (NJ) tree estimation of the multiple sequence alignment output obtained from \code{bold.analyze.align()}.
}
\details{
\code{bold.analyze.tree} analyzes the multiple sequence alignment output of the \code{bold.analyze.align()} function to generate a distance matrix using the models available in \code{\link[ape:dist.dna]{ape::dist.dna()}}. The default \code{dist_model} is \code{K80} (Kimura 1980 model). Two forms of Neighbor Joining clustering are currently available (\code{\link[ape:nj]{ape::nj()}} and \code{\link[ape:njs]{ape::njs()}}). Additional arguments for calculating distances can be passed to \code{\link[ape:dist.dna]{ape::dist.dna()}} through the \code{...} argument (for example \code{gamma}, \code{pairwise.deletion} and \code{base.freq}). The function also returns the base frequencies of the data.

Setting \code{save_dist_mat = TRUE} stores the underlying distance matrix in the output; the default value is deliberately kept at FALSE to avoid potential memory issues with large data. \code{newick_tree_export} saves the tree locally in newick format. The path, including the name of the file, should be provided (e.g. 'C:/Users/xyz/Desktop/newickoutput' on Windows).

Setting \code{tree_plot = TRUE} generates a basic visualization of the Neighbor Joining (NJ) tree using the distance matrix from \code{\link[ape:dist.dna]{ape::dist.dna()}} and the \code{\link[ape:plot.phylo]{ape::plot.phylo()}} function. \code{tree_plot_type} specifies the type of tree and accepts "phylogram", "cladogram", "fan", "unrooted", "radial" or "tidy", following the \code{type} argument of \code{\link[ape:plot.phylo]{ape::plot.phylo()}}; the first letter can be used instead of the whole word.
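
The steps described above can be approximated directly with \pkg{ape}. The following is a minimal sketch and not the function's actual implementation: it assumes the \code{woodmouse} alignment shipped with \pkg{ape} as a stand-in for the \code{bold.analyze.align()} output, and the object names and the output path are illustrative only.

\preformatted{
library(ape)

# Example alignment shipped with ape (class DNAbin); it stands in for
# the alignment produced by bold.analyze.align()
data(woodmouse)

# Pairwise distances under the K80 model; pairwise.deletion is one of the
# dist.dna() arguments that can be forwarded through ...
dist_mat <- dist.dna(woodmouse, model = "K80", pairwise.deletion = TRUE)

# Neighbor Joining tree; njs() is the variant for distance matrices with NAs
nj_tree <- nj(dist_mat)

# Overall base frequencies of the alignment
base_freq <- base.freq(woodmouse)

# Basic plot; type takes the same values as tree_plot_type
plot.phylo(nj_tree, type = "phylogram")

# Write the tree in newick format to a temporary file (illustrative path)
newick_file <- file.path(tempdir(), "newickoutput.nwk")
write.tree(nj_tree, file = newick_file)
}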
}
\examples{
\dontrun{
#Download the data ids
seq.data.ids <- bold.public.search(taxonomy = list("Oreochromis tanganicae",
"Oreochromis karongae"))

# Fetch the data using the ids.
# 1. An api_key must be obtained from BOLD support before using the `bold.fetch()` function.
# 2. Use the `bold.apikey()` function to set the api key in the global env.

bold.apikey('apikey')

seq.data <- bold.fetch(get_by = "processid",
                       identifiers = seq.data.ids$processid,
                       filt_marker = "COI-5P")

# Remove rows without species name information
# (the filtered data frame is the one passed to the alignment step below)
seq.data <- seq.data[seq.data$species != "", ]

# Align the data
# Users need to install and load packages `msa` and `Biostrings`.
# For `align_method` = "Muscle", package `muscle` is required as well.

seq.align<-bold.analyze.align(bold_df=seq.data,
                              marker="COI-5P",
                              align_method="ClustalOmega",
                              cols_for_seq_names = c("species","bin_uri"))

#Analyze the data to get a tree

seq.analysis <- bold.analyze.tree(bold_df = seq.align,
                                  dist_model = "K80",
                                  clus_method = "nj",
                                  tree_plot = TRUE,
                                  tree_plot_type = "p",
                                  save_dist_mat = TRUE,
                                  pairwise.deletion = TRUE)

# Output
# A 'phylo' object of the plot
seq.analysis$data_for_plot
# A distance matrix based on the distance model selected
seq.analysis$dist_mat
# Base frequencies of the sequences
seq.analysis$base_freq
}

}
