% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/pogitBvs.R
\name{pogitBvs}
\alias{pogitBvs}
\title{Bayesian variable selection for the Pogit model}
\usage{
pogitBvs(y, E = NULL, X, W = NULL, validation = NULL, method = "val",
  model = list(), prior = list(), mcmc = list(), start = list(),
  BVS = TRUE)
}
\arguments{
\item{y}{an integer vector of observed counts for units i = 1,...,I}

\item{E}{an (optional) vector containing total exposure times (offset);
should be \code{NULL} or an integer vector of length equal to the number of
counts.}

\item{X}{a design matrix in the Poisson part of the joint model}

\item{W}{a design matrix in the logit part of the joint model (can be a subset
of \code{X}) or \code{NULL}, if the same design matrix is used in both
sub-models, i.e. \code{W} = \code{X}.}

\item{validation}{a two-column data frame or list with the number of
reported cases (= \code{v}) in the validation sample and the number of
total cases (= \code{m}) subject to the fallible reporting process
(i.e. validation sample size) for each unit (or sub-category);
required if \code{method =} "\code{val}", otherwise \code{NULL}.
The number of rows must conform with the number of rows in \code{W} or with
the number of units I (if \code{X = W}), respectively.}

\item{method}{the method to be used to obtain parameter identification:
The default method "\code{val}" requires a small sample of validation data
(see \code{validation}). If the information on all or some parameters
of the reporting process is not provided by validation data, an informative
prior distribution for the regression effects in the logit sub-model
can be used (\code{method} = "\code{infprior}"). This prior information is encoded
in a normal distribution instead of the spike and slab prior (see the details
for \code{prior}).}

\item{model}{a list specifying the structure of the model (see details)}

\item{prior}{an (optional) list of prior settings and hyper-parameters
controlling the priors (see details)}

\item{mcmc}{an (optional) list of MCMC sampling options (see details)}

\item{start}{an (optional) list containing starting values for the
regression effects in both sub-models (see details)}

\item{BVS}{if \code{TRUE} (default), Bayesian variable selection
 (in at least one part of the joint model) is performed to identify
 regressors with a non-zero effect; otherwise, an unrestricted model is
 estimated (without variable selection).}
}
\value{
The function returns an object of class "\code{pogit}" with methods
 \code{\link{print.pogit}}, \code{\link{summary.pogit}} and
 \code{\link{plot.pogit}}.

 An object of class "\code{pogit}" is a list containing the following components:

 \item{\code{samplesL}}{a named list containing the samples from the posterior
   distribution of the parameters in the logit part of the joint model
   (see also \code{msave}):
   \describe{
   \item{\code{alpha, thetaAlpha}}{regression coefficients \eqn{\alpha} and
   \eqn{\theta_\alpha}}
   \item{\code{pdeltaAlpha}}{P(\eqn{\delta_\alpha}=1)}
   \item{\code{psiAlpha}}{scale parameter \eqn{\psi_\alpha} of the slab component}
   \item{\code{pgammaAlpha}}{P(\eqn{\gamma_\alpha}=1)}
   \item{\code{ai}}{cluster-specific random intercept}
   }}
 \item{\code{samplesP}}{a named list containing the samples from the posterior
   distribution of the parameters in the Poisson part of the joint model
   (see also \code{msave}):
   \describe{
   \item{\code{beta, thetaBeta}}{regression coefficients \eqn{\beta} and
   \eqn{\theta_\beta}}
   \item{\code{pdeltaBeta}}{P(\eqn{\delta_\beta}=1)}
   \item{\code{psiBeta}}{scale parameter \eqn{\psi_\beta} of the slab component}
   \item{\code{pgammaBeta}}{P(\eqn{\gamma_\beta}=1)}
   \item{\code{bi}}{cluster-specific random intercept}
   }}
 \item{\code{data}}{a list containing the data \code{y}, \code{offset},
   \code{X}, \code{W}, \code{val} and \code{subcat}}
 \item{\code{model.logit}}{see \code{model} arguments}
 \item{\code{model.pois}}{see \code{model} arguments}
 \item{\code{mcmc}}{see \code{mcmc} arguments}
 \item{\code{prior.logit}}{see \code{prior} arguments}
 \item{\code{prior.pois}}{see \code{prior} arguments}
 \item{\code{dur}}{a list containing the total runtime (\code{total})
   and the runtime after burn-in (\code{durM}), in seconds}
 \item{\code{BVS}}{see arguments}
 \item{\code{method}}{see arguments}
 \item{\code{start}}{see \code{start} arguments}
 \item{\code{family}}{"pogit"}
 \item{\code{call}}{function call}
}
\description{
This function performs Bayesian variable selection for a Poisson-Logistic (Pogit)
model with spike and slab priors. For posterior inference, an MCMC sampling scheme
is used that relies on augmenting the observed data by the unobserved counts and
involves only Gibbs sampling steps.
}
\details{
The method provides Bayesian variable selection for regression
models of count data subject to under-reporting using mixture priors with a spike
and a slab component.
By augmenting the observed count data with the unobserved counts, the resulting
model can be factorized into a Poisson and a binomial logit model part. Hence,
for this two-part model, sampling algorithms for a Poisson and a binomial
logit model can be used which are described in \code{\link{poissonBvs}} and
\code{\link{logitBvs}}.
Bayesian variable selection is incorporated in both parts of the joint model
using mixture priors with a Dirac spike and (by default) a Student-t slab.
The implementation relies on the representation of the respective model as a
Gaussian regression model in auxiliary variables (see again the help for the
respective function). Though variable selection is primarily used to identify
regressors with a non-zero effect, it can also be useful for identification of
the Pogit model.
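
As a sketch of this factorization without random intercepts (the symbols
\eqn{n_i}, \eqn{\lambda_i} and \eqn{p_i} are notation assumed here for
illustration and need not match the reference), let \eqn{n_i} be the
unobserved total count of unit i, of which only \eqn{y_i} cases are reported:
\deqn{n_i \sim \mathrm{Poisson}(E_i \lambda_i), \quad \log \lambda_i = x_i' \beta}{n_i ~ Poisson(E_i * lambda_i),  log(lambda_i) = x_i' beta}
\deqn{y_i \mid n_i \sim \mathrm{Binomial}(n_i, p_i), \quad \mathrm{logit}(p_i) = w_i' \alpha}{y_i | n_i ~ Binomial(n_i, p_i),  logit(p_i) = w_i' alpha}
Marginally this gives
\deqn{y_i \sim \mathrm{Poisson}(E_i \lambda_i p_i)}{y_i ~ Poisson(E_i * lambda_i * p_i)}
so the observed counts alone inform only the product \eqn{\lambda_i p_i},
which is why additional information on the reporting process is needed for
identification.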

By default, identification of the Pogit model is achieved by additional
information on the reporting process through validation data and
incorporation of variable selection. If the information on the parameters
of the reporting process is not provided by validation data, the
identification of the model parameters has to be guaranteed by specifying an
informative prior distribution (see arguments).

To model under-reported clustered data, a cluster-specific random intercept can
be included in both model parts of the Pogit model to account for dependence
within clusters. Bayesian variance selection is applied to determine whether
there is within-cluster dependence in either part of the model.

For details concerning the sampling algorithm, see Dvorzak and Wagner (forthcoming).

Details for the model specification (see arguments):
\describe{
 \item{\code{model}}{\describe{\item{}{A list:}
   \item{\code{deltaBetafix, deltaAlphafix}}{indicator vectors of length
   \code{ncol(X)-1} and \code{ncol(W)-1}, respectively, for the Poisson and the
   logit sub-model, to specify which regression effects are subject to selection
   (i.e., 0 = subject to selection, 1 = fixed in the model); defaults to vectors
   of zeros.}
   \item{\code{gammaBetafix, gammaAlphafix}}{indicators for variance selection
   of the random intercept term in the Poisson and the logit sub-model
   (i.e., 0 = with variance selection (default), 1 = no variance selection);
   only used if a random intercept is included in either part of the joint
   model (see \code{riBeta} and \code{riAlpha}, respectively).}
   \item{\code{riBeta, riAlpha}}{logical. If \code{TRUE}, a cluster-specific
   random intercept is included in the respective part of the joint model;
   defaults to \code{FALSE}.}
   \item{\code{clBetaID, clAlphaID}}{numeric vectors of length equal to the
   number of observations containing the cluster ID c = 1,...,C for each unit
   (or sub-category) in the respective sub-model (required if
   \code{riBeta=TRUE} or \code{riAlpha=TRUE}, respectively).}
   \item{\code{subcat}}{a factor variable of length equal to the number of
   units that specifies for which sub-category validation data are available
   (is required if \code{W} is a subset of \code{X}).
   If \code{NULL} (default), it is presumed that validation data are available
   for each unit (see also examples).}
}}

\item{\code{prior}}{\describe{\item{}{A list:}
   \item{\code{slabP, slabL}}{distribution of the slab component in the
   Poisson and logit sub-model, i.e. "\code{Student}" (default) or "\code{Normal}".}
   \item{\code{psi.nuP, psi.nuL}}{hyper-parameter of the Student-t slab in
   the respective sub-model (only used if the slab is Student-t); defaults to 5.}
   \item{\code{m0b, m0a}}{prior mean for the intercept parameter in the
   Poisson and the logit model; defaults to 0. If the argument \code{method} =
   "\code{infprior}", the specification of \code{m0a} is required.}
   \item{\code{M0b, M0a}}{prior variance for the intercept parameter in the
   Poisson and the logit model; defaults to 100.}
   \item{\code{bj0, aj0}}{a vector of prior means for the regression effects
   in the Poisson and the logit sub-model (which is encoded in a normal distribution,
   see notes); defaults to a vector of zeros. If the argument \code{method} =
   "\code{infprior}", the specification of \code{aj0} is mandatory.}
   \item{\code{VP, VL}}{variance of the slab in the respective sub-model;
   defaults to 5.}
   \item{\code{wBeta, wAlpha}}{hyper-parameters of the Beta-prior for the mixture
   weights \eqn{\omega_\beta} and \eqn{\omega_\alpha} in the respective sub-model;
   defaults to \code{c(wa0=1, wb0=1)}, i.e. a uniform distribution.}
   \item{\code{piBeta, piAlpha}}{hyper-parameters of the Beta-prior for the mixture
   weights \eqn{\pi_\beta} and \eqn{\pi_\alpha} in the respective sub-model;
   defaults to \code{c(pa0=1, pb0=1)}, i.e. a uniform distribution.}
}}

\item{\code{mcmc}}{\describe{\item{}{A list:}
   \item{\code{M}}{number of MCMC iterations after the burn-in phase;
   defaults to 8000.}
   \item{\code{burnin}}{number of MCMC iterations discarded as burn-in;
   defaults to 2000.}
   \item{\code{thin}}{thinning parameter; defaults to 1.}
   \item{\code{startsel}}{number of MCMC iterations drawn from the unrestricted
   model (e.g., \code{burnin/2}); defaults to 1000.}
   \item{\code{verbose}}{MCMC progress report in each \code{verbose}-th
   iteration step; defaults to 500. If \code{verbose=0}, no output is
   generated.}
   \item{\code{msave}}{logical. If \code{TRUE}, additional output with variable
   selection details is returned (i.e. posterior samples for \eqn{\omega_\beta},
   \eqn{\omega_\alpha}, \eqn{\delta_\beta}, \eqn{\delta_\alpha},
   \eqn{\pi_\beta}, \eqn{\pi_\alpha}, \eqn{\gamma_\beta},
   \eqn{\gamma_\alpha}); defaults to \code{FALSE}.}
}}

\item{\code{start}}{\describe{\item{}{A list:}
   \item{\code{beta}}{a vector of length \code{ncol(X)} containing starting
   values for the regression parameters \eqn{\beta} in the Poisson model part.
   By default, a Poisson glm is fitted to the observed counts.}
   \item{\code{alpha}}{a vector of length \code{ncol(W)} containing starting
   values for the regression parameters \eqn{\alpha} in the logit model part.
   By default, a binomial glm is fitted to the validation data for
   \code{method} = "\code{val}". If \code{method} = "\code{infprior}",
   starting values for \eqn{\alpha} are sampled from the (informative) prior
   distribution.}
   \item{\code{firth}}{logical. If \code{TRUE}, a logistic regression model
   applying Firth's correction to the likelihood using
   \code{\link[logistf]{logistf}} is fitted to the validation data
   (only used if \code{method} = "\code{val}").}
}}}
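
The mixture prior used for variable selection can be sketched as follows for a
regression effect \eqn{\beta_j} in the Poisson part. This is a generic
spike and slab formulation assumed here for illustration; the subscript j and
the symbols \eqn{p_{slab}} and \eqn{\Delta_0} are not taken from the arguments
above, and the exact slab parameterization is given in the reference:
\deqn{p(\beta_j \mid \delta_{\beta,j}) = \delta_{\beta,j} \, p_{slab}(\beta_j) + (1 - \delta_{\beta,j}) \, \Delta_0(\beta_j)}{p(beta_j | delta_j) = delta_j * p_slab(beta_j) + (1 - delta_j) * Delta_0(beta_j)}
\deqn{P(\delta_{\beta,j} = 1 \mid \omega_\beta) = \omega_\beta, \quad \omega_\beta \sim \mathrm{Beta}(wa0, wb0)}{P(delta_j = 1 | omega) = omega,  omega ~ Beta(wa0, wb0)}
Here \eqn{\Delta_0} denotes a point mass at zero (the Dirac spike) and
\eqn{p_{slab}} the Student-t or normal slab chosen via \code{slabP}. With the
default \code{wBeta = c(wa0=1, wb0=1)} the prior on \eqn{\omega_\beta} is
uniform. The logit part is treated analogously with \eqn{\alpha_j},
\eqn{\delta_\alpha} and \eqn{\omega_\alpha}.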
}
\note{
If the argument \code{method} = "\code{infprior}", an
informative prior for the regression parameters in the logit model is
required to guarantee identification of the model parameters.
Otherwise, identification of the Pogit model may be weak and inference
will be biased.
}
\examples{
\dontrun{
## Examples below (except for example 3) should take 3-4 minutes.

## ------ (use simul1) ------
# load simulated data set 'simul1'
data(simul1)
y <- simul1$y
E <- simul1$E
X <- as.matrix(simul1[, -c(1,2,8,9)]) # W = X
validation <- simul1[, c("m", "v"), drop = FALSE]

# function call (with specific MCMC settings)
m1 <- pogitBvs(y = y, E = E, X = X, validation = validation,
               mcmc = list(M = 4000, thin = 5, verbose = 1000))
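
# illustrative arithmetic only (not part of the pogit API): iteration counts
# implied by the settings above together with the documented default burnin;
# 'n_iter' is a name chosen for this sketch
n_iter <- c(M = 4000, burnin = 2000, thin = 5)
n_iter[["M"]] + n_iter[["burnin"]]  # total number of MCMC iterations
# draws entering the summaries, assuming thinning is applied to the M
# post-burn-in draws
n_iter[["M"]] / n_iter[["thin"]]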

# print, summarize and plot results
print(m1)
summary(m1)
plot(m1)

# show traceplots disregarding burn-in and thinning
plot(m1, burnin = FALSE, thin = FALSE)
# show density plot of MCMC draws
plot(m1, type = "density")
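
# sketch: inspect the stored posterior draws directly; the component names are
# those listed in the 'Value' section, while the matrix layout (one row per
# draw) is an assumption that should be checked with str() first
str(m1$samplesP, max.level = 1)
str(m1$samplesL, max.level = 1)
# rough averages over all stored draws (these may still include burn-in)
colMeans(as.matrix(m1$samplesP$beta))
colMeans(as.matrix(m1$samplesP$pdeltaBeta))  # averaged P(delta_beta = 1)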

# informative prior instead of validation data (change prior settings)
# e.g. available prior information on reporting probabilities
p.a0 <- 0.9
p.a  <- c(0.125, 0.5, 0.5, 0.5)
m0a_inf <- log(p.a0/(1 - p.a0))  # prior information for alpha_0
aj0_inf <- log(p.a/(1 - p.a))    # prior information for alpha
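# optional check (base R only): qlogis() is the same logit transform, and
# plogis() maps the prior means back to the probabilities specified above
stopifnot(isTRUE(all.equal(qlogis(p.a0), m0a_inf)),
          isTRUE(all.equal(plogis(aj0_inf), p.a)))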

prior.set <- list(m0a = m0a_inf, aj0 = aj0_inf, VL = 0.005, slabL = "Normal")
m2 <- pogitBvs(y = y, E = E, X = X, method = "infprior", prior = prior.set,
               mcmc = list(M = 4000, burnin = 2000, thin = 2), BVS = FALSE)
print(m2)
summary(m2)
plot(m2)
plot(m2, type = "acf", lag.max = 30)
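
# sketch: keep the first regressor of the Poisson part fixed in the model and
# select among the remaining ones (deltaBetafix has length ncol(X) - 1;
# 1 = fixed, 0 = subject to selection); 'fixB' and 'm1fix' are names chosen
# for illustration, and the choice of regressor is arbitrary
fixB  <- c(1, rep(0, ncol(X) - 2))
m1fix <- pogitBvs(y = y, E = E, X = X, validation = validation,
                  model = list(deltaBetafix = fixB),
                  mcmc = list(M = 4000, thin = 5, verbose = 1000))
summary(m1fix)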

## ------ (use simul2) ------
# complex model (with a long (!) runtime)

# load simulated data set 'simul2'
data(simul2)
y <- simul2$y
E <- simul2$E
cID <- simul2$cID
X <- as.matrix(simul2[, -c(1:3,9,10)])
validation <- simul2[, c("v", "m"), drop = FALSE]

# function call (with random intercept in both sub-models)
model <- list(riBeta = TRUE, riAlpha = TRUE, clBetaID = cID, clAlphaID = cID)
m3 <- pogitBvs(y = y, E = E, X = X, validation = validation, model = model,
               mcmc = list(M = 6000, burnin = 200, thin = 10), BVS = TRUE)
print(m3)
summary(m3)
plot(m3)

## ------ (use cervical cancer data) ------
# load cervical cancer data
data(cervical)
data(cervical_validation)
y <- cervical$y
E <- cervical$E
X <- as.matrix(cervical[, -c(1:4)])
validation <- cervical_validation[, c(1, 2), drop = FALSE]
W          <- as.matrix(cervical_validation[, -c(1:3)])
subcat     <- factor(as.numeric(cervical$country))
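
# optional sanity checks derived from the argument descriptions: one row of
# validation data per row of W, and one sub-category label per unit
stopifnot(nrow(validation) == nrow(W),
          length(subcat) == length(y),
          nrow(X) == length(y))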

# function call
m4 <- pogitBvs(y = y, E = E, X = X, W = W, validation = validation,
               model = list(subcat = subcat), mcmc = list(M = 10000,
               burnin = 2000, thin = 10), start = list(firth = TRUE),
               BVS = TRUE)
print(m4)
# additionally compute estimated risks and reporting probabilities
summary(m4, printRes = TRUE)
plot(m4, thin = FALSE)
plot(m4, type = "acf", lag.max = 50)

# informative prior instead of validation data (change prior settings)
# e.g. prior information on country-specific reporting probabilities
p.a0 <- 0.85
p.a  <- c(0.99, 0.70, 0.85)
m0a_inf <- log(p.a0/(1 - p.a0))  # prior information for alpha_0
aj0_inf <- log(p.a/(1 - p.a))    # prior information for alpha

prior.set <- list(m0a = m0a_inf, aj0 = aj0_inf, VL = 0.005, slabL = "Normal")
m5 <- pogitBvs(y = y, X = X, W = W, E = E, method = "infprior",
               model = list(subcat = subcat), prior = prior.set,
               mcmc = list(M = 10000, burnin = 2000, thin = 10))
print(m5)
summary(m5, printRes = TRUE)
plot(m5)
plot(m5, type = "acf", lag.max = 50)
}
}
\author{
Michaela Dvorzak <m.dvorzak@gmx.at>, Helga Wagner
}
\references{
Dvorzak, M. and Wagner, H. (forthcoming). Sparse Bayesian modelling
 of underreported count data. \emph{Statistical Modelling}.
}
\seealso{
\code{\link{logitBvs}}, \code{\link{poissonBvs}}
}

