\name{eic.pred}
\alias{eic.pred}
\title{ Internal function: calculate the score for each EIC based on prediction of match status. }
\description{ This function uses predictive models to evaluate the data features and assigns a score to every EIC, which serves as the basis for EIC selection. }
\usage{
eic.pred(eic.rec, known.mz, mass.matched = NA, to.use = 10, do.plot = FALSE, 
match.tol.ppm = 5, do.grp.reduce = TRUE, remove.bottom = 5, max.fpr = 0.3, min.tpr = 0.8)
}
\arguments{
  \item{eic.rec}{ The matrix of data features from every EIC. Each row is an EIC; each column is a data feature. }
  \item{known.mz}{ The m/z values of the known metabolic features. }
  \item{mass.matched}{ An indicator vector. "1" means the corresponding EIC has an m/z matched to known features. The default is NA, in which case the matching is done inside this function using "known.mz" and "match.tol.ppm". }
  \item{to.use}{ The maximum number of data features to use in the predictive models.  }
  \item{do.plot}{ Whether to generate diagnostic plots. }
  \item{match.tol.ppm}{ The m/z matching tolerance, in parts per million (ppm). }
  \item{do.grp.reduce}{ Whether to first reduce the data features by collapsing each group of similar features into one. }
  \item{remove.bottom}{ The number of worst-performing data features to remove before model building. The removal is based on single-predictor ROC analysis. }
  \item{max.fpr}{ The threshold for selecting unmatched EICs. Each EIC is assigned an FPR value based on the final prediction model, and unmatched EICs with an FPR smaller than this threshold are selected. If a vector is provided, only its first element is used for the selection. All FPR values are returned, so other functions can make selections at other threshold values. }
  \item{min.tpr}{ The threshold for selecting matched EICs. Each EIC is assigned a TPR value based on the final prediction model, and matched EICs with a TPR larger than this threshold are selected. If a vector is provided, only its first element is used for the selection. All TPR values are returned, so other functions can make selections at other threshold values. }
}
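\section{Sketch: m/z matching at a ppm tolerance}{
When "mass.matched" is NA, EICs are matched to "known.mz" inside this function. The following is a minimal sketch of a ppm-tolerance match, assuming that an EIC counts as matched when its m/z lies within "match.tol.ppm" parts per million of at least one known m/z. The helper "ppm.match" is hypothetical and is not the code used by eic.pred. Because the position of the m/z values within "eic.rec" is not documented here, the sketch takes them as a separate vector.
\preformatted{
## Hypothetical helper (not the eic.pred code): 1 if an EIC m/z lies
## within tol.ppm parts per million of any known m/z, else 0.
ppm.match <- function(eic.mz, known.mz, tol.ppm = 5) {
  vapply(eic.mz, function(m) {
    ## relative difference, in ppm of each known m/z
    ppm.diff <- abs(m - known.mz) / known.mz * 1e6
    as.integer(any(ppm.diff <= tol.ppm))
  }, integer(1))
}

## at m/z 200, a 5 ppm tolerance corresponds to +/- 0.001
ppm.match(c(200.0005, 200.002, 301.1), known.mz = c(200, 301.1))
## [1] 1 0 1
}
}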
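\section{Sketch: removing the worst single-predictor features}{
The "remove.bottom" argument drops the worst-performing data features using single-predictor ROC analysis. A minimal sketch follows, assuming that each feature is scored by its AUC for separating matched from unmatched EICs (computed with the Mann-Whitney rank formula) and that a feature is informative in either direction, so max(AUC, 1 - AUC) is used. The functions "auc.rank" and "drop.worst.features" are hypothetical and are not the code used by eic.pred.
\preformatted{
## Hypothetical sketch (not the eic.pred code).
## AUC of a single predictor x for labels y (1 = matched, 0 = unmatched),
## computed with the Mann-Whitney rank formula.
auc.rank <- function(x, y) {
  r <- rank(x)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## Drop the n.remove features with the lowest direction-free AUC.
drop.worst.features <- function(X, y, n.remove = 5) {
  auc <- apply(X, 2, function(col) {
    a <- auc.rank(col, y)
    max(a, 1 - a)
  })
  keep <- order(auc, decreasing = TRUE)[seq_len(max(ncol(X) - n.remove, 0))]
  ## keep the surviving columns in their original order
  X[, sort(keep), drop = FALSE]
}

set.seed(1)
y <- rep(c(1, 0), each = 50)
X <- cbind(good = y + rnorm(100, sd = 0.5),
           noise1 = rnorm(100), noise2 = rnorm(100))
colnames(drop.worst.features(X, y, n.remove = 2))
}
}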
\details{ The function first subsamples the EICs to balance the numbers of matched and unmatched EICs. It then randomly splits the data into training and testing sets. Combinations of feature-ranking methods and predictive models are fitted on the training set, and their performance is gauged on the testing set. The overall best model is selected, and each EIC receives a score based on this model. 
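
The balancing, splitting, and test-set evaluation steps can be sketched as follows. This is a minimal sketch, not the eic.pred implementation: logistic regression (glm) stands in for the several feature-ranking and model combinations that eic.pred compares, and the function "balance.split.auc" and its "train.frac" argument are hypothetical.
\preformatted{
## Hypothetical sketch (not the eic.pred code): balance, split, fit, score.
balance.split.auc <- function(X, y, train.frac = 2/3) {
  idx1 <- which(y == 1)
  idx0 <- which(y == 0)
  n <- min(length(idx1), length(idx0))
  ## subsample each class down to the size of the smaller class
  keep <- c(idx1[sample.int(length(idx1), n)],
            idx0[sample.int(length(idx0), n)])
  dat <- data.frame(y = y[keep], X[keep, , drop = FALSE])
  ## random split into training and testing sets
  train <- sample.int(nrow(dat), floor(train.frac * nrow(dat)))
  ## logistic regression as a stand-in predictive model
  fit <- glm(y ~ ., data = dat[train, ], family = binomial)
  score <- predict(fit, newdata = dat[-train, ], type = "response")
  ## test-set AUC via the Mann-Whitney rank formula
  y.test <- dat$y[-train]
  r <- rank(score)
  n1 <- sum(y.test == 1)
  n0 <- sum(y.test == 0)
  (sum(r[y.test == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
X <- cbind(f1 = rnorm(300), f2 = rnorm(300))
y <- rbinom(300, 1, plogis(2 * X[, "f1"]))
balance.split.auc(X, y)
}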

Although there is a single scoring system for all EICs, matched EICs are treated differently from unmatched ones, because we have higher confidence that they are real metabolites. The matched EICs are selected using the "min.tpr" threshold, to ensure that the majority of them enter the next step. The unmatched EICs are selected using the "max.fpr" threshold. 
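
A minimal sketch of the per-EIC FPR and TPR values follows. It assumes the usual ROC orientation, in which a higher score means a more likely match. For each EIC, the cutoff is placed at that EIC's own score: FPR is the fraction of unmatched EICs scoring at or above the cutoff, and TPR is the fraction of matched EICs scoring at or above it. The helper "cutoff.rates" and the example scores are hypothetical. Only the "max.fpr" rule for unmatched EICs is applied at the end.
\preformatted{
## Hypothetical sketch (not the eic.pred code): per-EIC FPR and TPR with the
## cutoff placed at each EIC's own score; higher score = more likely matched.
cutoff.rates <- function(score, matched) {
  s1 <- score[matched == 1]
  s0 <- score[matched == 0]
  ## fraction of unmatched / matched EICs scoring at or above each cutoff
  fpr <- vapply(score, function(s) mean(s0 >= s), numeric(1))
  tpr <- vapply(score, function(s) mean(s1 >= s), numeric(1))
  list(fpr = fpr, tpr = tpr)
}

score <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.1)
matched <- c(1, 1, 1, 0, 0, 0)
rates <- cutoff.rates(score, matched)
## unmatched EICs are kept when their FPR is below max.fpr
max.fpr <- 0.5
chosen.unmatched <- as.integer(matched == 0 & rates$fpr < max.fpr)
chosen.unmatched
## [1] 0 0 0 1 0 0
}
Because every per-EIC FPR value is kept, selecting at a different threshold only requires a new comparison against the stored values.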
}
\value{
A list with the following components is returned.
  \item{chosen}{ An indicator vector. "1" means the EIC is selected; "0" means unselected. When multiple min.tpr and/or max.fpr values are provided, this vector corresponds to the combination of the first min.tpr and the first max.fpr value. }
  \item{fpr}{ The vector of per-EIC FPR values. Each value is the FPR obtained when the score cutoff is set at that EIC's score. }
  \item{tpr}{ The vector of per-EIC TPR values. Each value is the TPR obtained when the score cutoff is set at that EIC's score. }
  \item{matched}{ An indicator vector. "1" means matched to known features. "0" means unmatched.}
  \item{pred.performance}{ The prediction performance of all models tested. }
  \item{feature.rank.method}{ The feature-ranking method used in the selected model. }
  \item{model}{ The prediction model that was selected. }
  \item{feature importance}{ The importance score of all data features generated by the feature ranking method. }
  \item{used.features}{ The names of the features used in the final model. }
  \item{final.auc}{ The AUC of the selected model. }
}
\references{Bioinformatics. 30(20): 2941-2948.}
\author{Tianwei Yu <tianwei.yu@emory.edu>}
\seealso{semi.sup.learn, eic.qual, eic.disect}
\keyword{ models }
