% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fitgroup_da.R
\name{fitgroup.da}
\alias{fitgroup.da}
\title{Estimation of the Dagum distribution from grouped data}
\usage{
fitgroup.da(y, x = rep(1/length(y), length(y)), gini.e, pc.inc = NULL,
  se.gmm = FALSE, se.nls = FALSE, se.scale = FALSE, N = NULL,
  nrep = 10^3, grid = 1:20, rescale = 1000, gini = FALSE)
}
\arguments{
\item{y}{Vector of (non-cumulative) income shares expressed as decimals or percentages.
At least four data points are required to estimate the parameters of the income distribution.}

\item{x}{Vector of population shares associated with the income shares provided by
\code{y}. The default is a vector of equally sized population shares of the same length as
\code{y}.}

\item{gini.e}{Specifies the survey Gini index, expressed as a decimal.}

\item{pc.inc}{specifies an estimate of per capita income. If not provided, the weighting matrix
cannot be computed, hence GMM estimates will not be reported.}

\item{se.gmm}{If \code{TRUE} and the argument \code{N} is not \code{NULL}, the standard errors
of the shape parameters of the GMM estimation are computed using results from Beach and Davidson (1983) and Hajargasht and
Griffiths (2016). See Jorda et al. (2018) for details. By default, this argument is \code{FALSE}.}

\item{se.nls}{If \code{TRUE} and the argument \code{N} is not \code{NULL}, the standard errors of the NLS parameters
are obtained using Monte Carlo simulation of random samples of size \code{N}. By default, this argument is \code{FALSE}.}

\item{se.scale}{If \code{TRUE} and the argument \code{N} is not \code{NULL}, the standard error
of the scale parameter of the GMM estimation is obtained by Monte Carlo simulation
of random samples of size \code{N}. By default, this argument is \code{FALSE}.}

\item{N}{Specifies the size of the sample from which the grouped data was generated. This
information is required to compute the standard errors.}

\item{nrep}{Number of samples to be drawn in the Monte Carlo simulation of the standard error of
the NLS parameters and the scale parameter of the GMM estimation.}

\item{grid}{A sequence of positive real numbers to be used as initial values using the
algorithm developed by Jorda et al. (2018).}

\item{rescale}{Rescaling factor for per capita income. Rescaling might help to invert
the weight matrix when the scale is too large or too small. The argument \code{rescale} should be
a positive real number which, by default, is set to 1000.}

\item{gini}{If \code{TRUE}, reports an estimate of the Gini index using NLS and, if
possible, GMM.}
}
\value{
The function \code{fitgroup.da} returns the following objects:
  \itemize{
    \item \code{nls.estimation} Matrix containing the parameters of the Dagum distribution estimated
       by NLS and, if \code{se.nls = TRUE}, their standard errors.
    \item \code{nls.rss} Residual sum of squares of the NLS estimation.
    \item \code{gmm.estimation} Matrix containing the parameters of the Dagum distribution estimated
       by GMM and, if \code{se.gmm = TRUE}, their standard errors.
    \item \code{gmm.rss} Weighted residual sum of squares of the GMM estimation.
    \item \code{gini.estimation} Vector with the survey Gini index and the estimated Gini
     indices using NLS and GMM whenever possible.
  }
}
\description{
The function \code{fitgroup.da} implements the estimation of the Dagum distribution from grouped
data in the form of income shares using the non-linear least squares (NLS) and the generalised method of
moments (GMM) estimators.
}
\details{
The Generalised Beta of the Second Kind (GB2) is a general class of distributions that is
acknowledged to provide an accurate fit to income data (McDonald, 1984; McDonald and Mantrala, 1995).
The Dagum distribution is a particular case of this model with \eqn{q = 1}, defined in terms of
the cumulative distribution function as follows:
\deqn{F(x; a, b, p) = \bigg(1+\bigg(\frac{x}{b}\bigg)^{-a}\bigg)^{-p} }

where \eqn{b} is the scale parameter and \eqn{a, p} are the shape parameters that define the
heaviness of the tail and the skewness of the distribution.
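
For reference, the Lorenz curve implied by this distribution function has a standard closed form.
The expression below is a textbook result for the Dagum family, stated here as a reading aid under
the assumption \eqn{a > 1} (so that the mean exists); it is not taken from the package source:
\deqn{L(u; a, p) = I_{z}\bigg(p + \frac{1}{a}, 1 - \frac{1}{a}\bigg), \quad z = u^{1/p}, \quad 0 \le u \le 1, }

where \eqn{I_{z}(\cdot, \cdot)} denotes the regularised incomplete beta function. The scale
parameter \eqn{b} does not appear in \eqn{L}.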

The function \code{fitgroup.da} estimates the parameters of the Dagum distribution using grouped data in the form of
income shares. These data must have been generated by setting the proportion of observations in each
group before sampling, so that the population proportions are fixed, whereas income shares are random
variables. Examples of this type of data can be found in the largest datasets of grouped data,
including the World Income Inequality Database (UNU-WIDER, 2017), PovcalNet (World Bank, 2018) and the World Wealth
and Income Database (Alvaredo et al., 2018).

For NLS, numerical optimisation is achieved using the Levenberg-Marquardt algorithm via \code{\link[minpack.lm]{nlsLM}}.
Conventionally, moment estimates of a restricted model are taken as initial values. A potential
limitation of this method is that, as the dimensionality of the parameter
space increases, it becomes more difficult to achieve global convergence. Although it seems
intuitive that the moment estimates of the restricted model are a good starting
point, the optimisation could converge to
a local minimum, which might lead to inaccurate estimates of the parameters.

To provide different non-arbitrary combinations of starting values, we propose to
define a sequence of numbers (provided by \code{grid}). For each value in this
sequence, the moment estimate of one of the parameters is obtained using the survey Gini index,
assuming that the other one is equal to the grid value. Using this procedure,
we end up with as many combinations of initial values as values in the grid,
which are used to obtain different sets of estimates, keeping the one with the smallest
residual sum of squares. Although we cannot ensure that our estimates belong to
the global minimum, this procedure covers a larger proportion of the parameter
space than just using the moment estimates of a
particular sub-model. See Jorda et al. (2018) for details.
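
As an illustration of a single grid step, recall that the Gini index of the Dagum distribution
has the well-known closed form
\deqn{G(a, p) = \frac{\Gamma(p)\,\Gamma(2p + 1/a)}{\Gamma(2p)\,\Gamma(p + 1/a)} - 1. }

Fixing one shape parameter at a grid value and solving \eqn{G(a, p) = G_e} for the other, where
\eqn{G_e} stands for \code{gini.e} (notation introduced here), yields one pair of starting values.
For instance, with \eqn{p = 1} the expression reduces to \eqn{G = 1/a}, so \code{gini.e = 0.29}
would give \eqn{a = 1/0.29 \approx 3.45}. Placing \eqn{p} on the grid in this example is an
assumption made for illustration only; the exact procedure is described in Jorda et al. (2018).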

This method, however, does not provide
an estimate of the scale parameter because the Lorenz curve is independent of scale. The scale
parameter is estimated by equating the sample mean, specified by \code{pc.inc}, to the population
mean of the Dagum distribution. Because NLS does not use the optimal
covariance matrix of the moment conditions, the standard errors of the parameters
are obtained by Monte Carlo simulation. Please be aware that the estimation of the standard errors
might take a long time, especially if the sample size is large.
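
As a sketch of this moment condition, and assuming \eqn{\hat{a} > 1} so that the mean is finite,
equating the mean of the Dagum distribution,
\eqn{b\,\Gamma(p + 1/a)\,\Gamma(1 - 1/a) / \Gamma(p)}, to per capita income gives
\deqn{\hat{b} = \bar{y}\,\frac{\Gamma(\hat{p})}{\Gamma(\hat{p} + 1/\hat{a})\,\Gamma(1 - 1/\hat{a})}, }

where \eqn{\bar{y}} stands for \code{pc.inc} and \eqn{\hat{a}, \hat{p}} are the estimated shape
parameters. The symbols \eqn{\bar{y}}, \eqn{\hat{a}}, \eqn{\hat{p}} and \eqn{\hat{b}} are notation
introduced here; how \code{rescale} enters the computation is not covered by this sketch.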

\code{fitgroup.da} also implements a two-stage GMM estimator. In the first stage, NLS estimates
are obtained as described above, which are used to compute a first stage estimator
of the weighting matrix. The weighting matrix is used in the second stage to obtain optimally
weighted estimates of the parameters. The numerical optimisation is performed using
\code{\link{optim}} with the BFGS method. If \code{optim} reports an error, the L-BFGS-B method
is used. NLS estimates are used as initial values for the optimisation algorithm. The GMM estimation
 incorporates the optimal weight matrix, which makes it possible to derive the asymptotic standard
 errors of the parameters using results from Beach and Davidson (1983) and Hajargasht and
 Griffiths (2016). As in the NLS estimation, the scale parameter is obtained by matching the
 population mean of the Dagum distribution to the sample mean. Hence, the standard error of the scale
 parameter is estimated by Monte Carlo simulation.
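
Schematically, a second-stage step of this kind minimises a quadratic form in the moment
conditions. The following generic expression is a sketch of the usual GMM criterion, not a
transcription of the package code:
\deqn{\hat{\theta} = \arg\min_{\theta}\; m(\theta)^{\top}\, W\, m(\theta), \quad \theta = (a, p), }

where \eqn{m(\theta)} stacks the differences between the observed moments and their model-implied
counterparts, and \eqn{W} is the weighting matrix computed from the first-stage NLS estimates. The
symbols \eqn{m} and \eqn{W} are notation introduced here; the exact moment conditions are given
in Jorda et al. (2018).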

The Gini index of the Dagum distribution is computed using the function
\code{simgini.da}, which makes use of \code{gini.d}.
If this function reports \code{NaN}, the Gini index is estimated by Monte
Carlo simulation of 10^6 samples of size N = 10^6.
}
\examples{
fitgroup.da(y = c(9, 13, 17, 22, 39), gini.e = 0.29)

}
\references{
Alvaredo, F., A. Atkinson, T. Piketty, E. Saez, and G. Zucman (2018): The World Wealth and Income Database.
 \url{http://www.wid.world}.

Beach, C.M. and R. Davidson (1983): Distribution-free statistical inference with
Lorenz curves and income shares, \emph{The Review of Economic Studies}, 50, 723 - 735.

 Hajargasht, G. and W.E. Griffiths (2016): Inference for Lorenz Curves, Tech. Rep.,
 The University of Melbourne.

 Jorda, V., J.M. Sarabia, and M. Jäntti (2018): Estimation of income inequality from grouped data,
 arXiv preprint arXiv:1808.09831.

 McDonald, J.B. (1984): Some Generalized Functions for the Size Distribution of Income,
 \emph{Econometrica}, 52, 647 - 665.

 McDonald, J.B. and A. Mantrala (1995): The distribution of personal income: revisited,
 \emph{Journal of Applied Econometrics}, 10, 201 - 204.

 UNU-WIDER (2017). World Income Inequality Database (WIID3.4).
 \url{https://www.wider.unu.edu/project/wiid-world-income-inequality-database}.

 World Bank (2018). PovcalNet Data Base. Washington, DC: World Bank. \url{http://iresearch.worldbank.org/PovcalNet/home.aspx}.
}

