% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data-EOC.R
\docType{data}
\name{EOC}
\alias{EOC}
\title{A subset of the Pre-PLCO Phase II Dataset}
\format{A data frame with 278 observations on the following 6 variables.
\describe{
  \item{\code{D.full}}{a factor with 3 levels of disease status, 1, 2, 3. The levels correspond to benign disease, early stage (I and II) and late stage (III and IV).}
  \item{\code{V}}{a binary vector containing the verification status: 1 indicates a verified subject, 0 a non-verified subject.}
  \item{\code{D}}{a copy of \code{D.full} with missing values. \code{NA} values correspond to non-verified subjects.}
  \item{\code{CA125}}{a numeric vector of biomarker CA125 (used as diagnostic test).}
  \item{\code{CA153}}{a numeric vector of biomarker CA153 (used as covariate).}
  \item{\code{Age}}{a numeric vector containing the age of patients.}
}}
\source{
SPORE/EDRN/PRE-PLCO Ovarian Phase II Validation Study: \url{https://edrn.nci.nih.gov/protocols/119-spore-edrn-pre-plco-ovarian-phase-ii-validation}.
}
\usage{
EOC
}
\description{
A subset of the Pre-PLCO Phase II Dataset from the SPORE/Early Detection Research Network (EDRN)/Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Ovarian Validation Study. The data concern epithelial ovarian cancer (EOC).
}
\details{
The Pre-PLCO datasets contain some demographic variables (Age, Race, etc.) and 59 markers measured at 4 sites (Harvard, FHCRC, MD Anderson, and Pittsburgh). Some biomarkers of interest are: CA125, CA153, CA19--9, CA72--4, Kallikrein 6 (KLK6), HE4 and Chitinase (YKL40). The original data set consists of control groups and three classes of EOC: benign disease, early stage (I and II) and late stage (III and IV). This subset retains the biomarkers CA125 and CA153 (measured at the Harvard laboratories), the age of the patients, and the three classes of EOC. The verification status and a disease status with missing values are added.

The verification status \eqn{V} is generated by using the following selection process:
\deqn{ P(V = 1) = 0.05 + 0.35 I(CA125 > 0.87) + 0.25 I(CA153 > 0.3) + 0.35 I(Age > 45). }
Under this process, 63.4\% of the patients are selected to undergo disease verification.

The missing disease status \code{D} is a copy of the full disease status \code{D.full} in which the values corresponding to \eqn{V = 0} are deleted (recorded as \code{NA}).
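
The selection process and the masking of the disease status can be sketched in R as follows. This is only an illustration, not the code used to build \code{EOC}: the helper \code{p_verify}, the input values and the seed are all hypothetical. For instance, a subject with CA125 above 0.87 and Age above 45 but CA153 at or below 0.3 has verification probability 0.05 + 0.35 + 0.35 = 0.75.
\preformatted{
# Hypothetical helper: verification probability from the formula above.
# pmin() guards against floating-point sums creeping just above 1.
p_verify <- function(ca125, ca153, age) {
  p <- 0.05 + 0.35 * (ca125 > 0.87) + 0.25 * (ca153 > 0.3) + 0.35 * (age > 45)
  pmin(p, 1)
}

# Made-up marker values, ages and disease classes for four subjects.
ca125  <- c(1.20, 0.50, 0.95, 0.10)
ca153  <- c(0.10, 0.20, 0.45, 0.40)
age    <- c(50, 40, 62, 30)
D_full <- factor(c(1, 2, 3, 1), levels = 1:3)

set.seed(1)  # arbitrary seed, not the one used to build EOC
p <- p_verify(ca125, ca153, age)
V <- rbinom(length(p), size = 1, prob = p)

# Keep the disease status only for verified subjects.
D <- D_full
D[V == 0] <- NA
}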
}
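\examples{
# A minimal sketch, assuming the package that ships EOC is attached
# so that the data frame is available under the name EOC.
head(EOC)
str(EOC)

# Share of verified subjects; the details section reports about 63.4 percent.
mean(EOC$V)

# Cross-tabulate verification status against missingness of D:
# NA values in D are expected only where V is 0.
table(V = EOC$V, D_missing = is.na(EOC$D))
}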
\keyword{data}

