Basic unit-level models

The basic unit-level model (Battese, Harter, and Fuller 1988; Rao and Molina 2015) is given by \[ y_j = \beta' x_j + v_{i[j]} + \epsilon_j\,,\\ v_i \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma_v^2) \qquad \epsilon_j \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma^2) \] where \(j\) runs from 1 to \(n\), the number of unit-level observations, \(\beta\) is a vector of regression coefficients for given covariates \(x_j\), and \(v_i\) are random area intercepts.

The apipop data frame contains the complete population whereas apisrs is a simple random sample from it. The variable cname is the county name, and we will be interested in estimation at the county level. Not all counties in the population are sampled. In order to be able to make predictions for out-of-sample areas we make sure that the levels of the sample’s cname variable match those of its population counterpart.

We wish to estimate the area population means \[ \theta_i = \frac{1}{N_i}\sum_{j \in U_i} y_j\,, \] where \(U_i\) is the set of units in area \(i\) of size \(N_i\). The MCMC output in variable sim can be used to obtain draws from the posterior distribution for \(\theta_i\). The \(r\)th draw can be expressed as \[ \theta_{i;r} = \frac{1}{N_i} \left(n_i \bar{y}_i + \beta_r'(t_{x;i} - n_i \bar{x}_i) + (N_i - n_i)v_{i;r} + \sum_{j \in U_i\setminus s_i} \epsilon_{j;r} \right)\,, \] where \(\bar{y}_i\) is the sample mean of \(y\) in area \(i\) and \(t_{x;i}\) is a vector of population totals for area \(i\).

Binomial Unit-Level Model

A model with binomial likelihood can also be fit. We now model the target variable sch.wide, a binary variable indicating whether a school-wide growth target has been met. We use the same mean model structure as above for the linear model, but now using a logistic link function, \[ y_j \stackrel{\mathrm{iid}}{\sim} {\cal Be}(p_j)\,,\\ \mathrm{logit}(p_j) = \beta' x_j + v_{i[j]}\,,\\ v_i \stackrel{\mathrm{iid}}{\sim} {\cal N}(0, \sigma_v^2) \]

apisrs$target.met <- as.numeric(apisrs$sch.wide == "Yes")
sampler <- create_sampler(update(mod, target.met ~ .), family="binomial", data=apisrs)
sim <- MCMCsim(sampler, store.all=TRUE, verbose=FALSE)
summary(sim)

## llh_ :
##       Mean  SD t-value   MCSE q0.05  q0.5 q0.95 n_eff R_hat
## llh_ -83.3 2.5   -33.3 0.0616 -87.7 -83.2 -79.6  1651     1
## 
## beta :
##                 Mean     SD t-value     MCSE   q0.05     q0.5    q0.95 n_eff R_hat
## (Intercept)  4.56516 1.4705  3.1044 0.053839  2.2341  4.50600  7.11086   746 1.002
## ell         -0.03653 0.0148 -2.4738 0.000397 -0.0615 -0.03621 -0.01266  1383 1.000
## meals        0.00123 0.0139  0.0886 0.000360 -0.0217  0.00131  0.02443  1496 1.001
## stypeH      -2.83251 0.6289 -4.5039 0.018450 -3.8958 -2.81534 -1.80446  1162 1.001
## stypeM      -1.66460 0.5866 -2.8378 0.018269 -2.6488 -1.66899 -0.69783  1031 1.000
## hsg         -0.02637 0.0222 -1.1892 0.000655 -0.0630 -0.02606  0.01058  1146 1.002
## col.grad    -0.03600 0.0267 -1.3468 0.000852 -0.0783 -0.03624  0.00844   984 1.001
## grad.sch     0.01267 0.0327  0.3871 0.001279 -0.0397  0.01183  0.06877   655 0.999
## 
## v_sigma :
##          Mean    SD t-value   MCSE  q0.05  q0.5 q0.95 n_eff R_hat
## v_sigma 0.372 0.267    1.39 0.0117 0.0335 0.325 0.869   525  1.01
## 
## v :
##                  Mean    SD  t-value    MCSE  q0.05      q0.5 q0.95 n_eff R_hat
## Alameda      -0.19342 0.429 -0.45131 0.01244 -1.041 -0.082722 0.335  1186 1.001
## Amador        0.01100 0.456  0.02412 0.00835 -0.704  0.002089 0.749  2986 1.000
## Butte         0.00190 0.451  0.00421 0.00824 -0.697  0.004309 0.692  2998 1.000
## Calaveras     0.04634 0.454  0.10213 0.00978 -0.613  0.004646 0.818  2154 1.001
## Colusa       -0.00633 0.455 -0.01389 0.00832 -0.729 -0.004004 0.711  3000 0.999
## Contra Costa -0.00822 0.396 -0.02074 0.00875 -0.666  0.000858 0.618  2051 1.000
## Del Norte    -0.00822 0.471 -0.01745 0.00883 -0.761 -0.002536 0.712  2843 1.000
## El Dorado    -0.00807 0.454 -0.01777 0.00933 -0.750  0.000655 0.707  2366 1.000
## Fresno       -0.04698 0.365 -0.12865 0.00715 -0.704 -0.015413 0.517  2608 1.000
## Glenn         0.01047 0.464  0.02254 0.00848 -0.687  0.002597 0.745  3000 1.000
## ... 47 elements suppressed ...

To predict the population fractions of schools that meet the growth target by county,

samplesums <- tapply(apisrs$target.met, apisrs$cname, sum)
samplesums[is.na(samplesums)] <- 0  # substitute 0 for out-of-sample areas
res <- predict(sim, newdata=apipop, labels=names(N),
         fun=function(x, p) (samplesums + tapply(x[-m], apipop$cname[-m], sum ))/N,
         show.progress=FALSE)
(summ <- summary(res))

##                  Mean     SD t-value     MCSE q0.05  q0.5 q0.95 n_eff R_hat
## Alameda         0.785 0.0601   13.05 0.001644 0.670 0.796 0.867  1338 1.002
## Amador          0.839 0.1185    7.08 0.002226 0.600 0.900 1.000  2834 1.000
## Butte           0.836 0.0744   11.23 0.001447 0.708 0.833 0.938  2645 1.000
## Calaveras       0.872 0.0960    9.08 0.001921 0.700 0.900 1.000  2499 1.000
## Colusa          0.650 0.1605    4.05 0.003092 0.333 0.667 0.889  2693 0.999
## Contra Costa    0.816 0.0572   14.27 0.001306 0.715 0.821 0.899  1918 1.001
## Del Norte       0.880 0.1090    8.08 0.002101 0.750 0.875 1.000  2693 0.999
## El Dorado       0.849 0.0752   11.29 0.001521 0.725 0.850 0.950  2444 1.000
## Fresno          0.782 0.0583   13.40 0.001205 0.683 0.785 0.866  2344 0.999
## Glenn           0.794 0.1398    5.68 0.002684 0.556 0.778 1.000  2713 0.999
## Humboldt        0.853 0.0734   11.62 0.001436 0.725 0.850 0.950  2609 1.000
## Imperial        0.698 0.1047    6.67 0.002090 0.525 0.700 0.850  2510 1.000
## Inyo            0.784 0.1504    5.21 0.002810 0.571 0.857 1.000  2867 1.000
## Kern            0.805 0.0516   15.59 0.001181 0.711 0.811 0.878  1911 1.000
## Kings           0.851 0.0798   10.66 0.001649 0.720 0.840 0.960  2345 1.000
## Lake            0.826 0.0948    8.71 0.001822 0.682 0.818 0.955  2706 1.001
## Lassen          0.837 0.1070    7.82 0.002028 0.636 0.818 1.000  2785 0.999
## Los Angeles     0.799 0.0427   18.73 0.001128 0.731 0.799 0.872  1431 1.003
## Madera          0.817 0.0738   11.07 0.001497 0.677 0.806 0.935  2430 1.001
## Marin           0.839 0.0675   12.43 0.001566 0.720 0.840 0.940  1856 1.000
## Mariposa        0.862 0.1454    5.93 0.002731 0.600 0.800 1.000  2834 1.000
## Mendocino       0.816 0.0905    9.01 0.001723 0.640 0.840 0.960  2760 1.000
## Merced          0.790 0.0807    9.80 0.001614 0.651 0.794 0.905  2500 1.000
## Modoc           0.835 0.1509    5.53 0.002993 0.600 0.800 1.000  2544 1.001
## Mono            0.760 0.2446    3.11 0.004505 0.333 0.667 1.000  2948 1.002
## Monterey        0.802 0.0685   11.71 0.001537 0.687 0.807 0.904  1988 1.000
## Napa            0.848 0.0808   10.50 0.001664 0.704 0.852 0.963  2357 1.001
## Nevada          0.870 0.0933    9.32 0.001769 0.714 0.857 1.000  2782 1.001
## Orange          0.774 0.0603   12.84 0.001327 0.672 0.778 0.871  2066 1.000
## Placer          0.815 0.0743   10.97 0.001705 0.682 0.818 0.924  1900 1.003
## Plumas          0.743 0.1524    4.87 0.002987 0.444 0.778 1.000  2602 1.000
## Riverside       0.825 0.0529   15.60 0.001238 0.729 0.831 0.895  1824 0.999
## Sacramento      0.847 0.0471   18.00 0.001229 0.767 0.847 0.924  1466 1.003
## San Benito      0.824 0.1232    6.69 0.002517 0.636 0.818 1.000  2396 1.000
## San Bernardino  0.861 0.0420   20.49 0.001134 0.790 0.862 0.928  1374 1.000
## San Diego       0.849 0.0420   20.22 0.000935 0.778 0.852 0.913  2016 0.999
## San Francisco   0.760 0.0756   10.06 0.001505 0.630 0.760 0.870  2521 1.000
## San Joaquin     0.829 0.0575   14.42 0.001274 0.726 0.839 0.911  2034 1.000
## San Luis Obispo 0.857 0.0681   12.57 0.001400 0.750 0.875 0.950  2368 1.001
## San Mateo       0.800 0.0732   10.92 0.001707 0.671 0.811 0.895  1840 1.003
## Santa Barbara   0.833 0.0640   13.01 0.001366 0.728 0.840 0.926  2196 1.001
## Santa Clara     0.784 0.0687   11.41 0.001917 0.649 0.796 0.875  1283 1.002
## Santa Cruz      0.786 0.0745   10.56 0.001571 0.653 0.796 0.898  2245 1.002
## Shasta          0.817 0.0764   10.69 0.001533 0.675 0.825 0.925  2486 0.999
## Sierra          0.730 0.2254    3.24 0.004407 0.333 0.667 1.000  2615 1.000
## Siskiyou        0.843 0.0992    8.51 0.002000 0.667 0.867 1.000  2458 1.001
## Solano          0.866 0.0621   13.94 0.001465 0.750 0.867 0.950  1799 1.001
## Sonoma          0.845 0.0630   13.41 0.001332 0.741 0.852 0.935  2235 1.002
## Stanislaus      0.812 0.0714   11.38 0.001809 0.692 0.824 0.901  1557 1.000
## Sutter          0.834 0.0907    9.20 0.001703 0.650 0.850 0.950  2835 1.000
## Tehama          0.841 0.0991    8.49 0.001876 0.647 0.824 1.000  2794 1.000
## Trinity         0.756 0.1999    3.78 0.003735 0.500 0.750 1.000  2864 1.000
## Tulare          0.831 0.0631   13.17 0.001325 0.718 0.836 0.918  2270 0.999
## Tuolumne        0.882 0.0920    9.59 0.001827 0.750 0.917 1.000  2535 1.000
## Ventura         0.837 0.0509   16.46 0.001085 0.752 0.839 0.913  2197 1.000
## Yolo            0.849 0.0737   11.52 0.001444 0.714 0.857 0.943  2608 1.002
## Yuba            0.798 0.0996    8.01 0.001933 0.632 0.789 0.947  2652 0.999

theta <- c(tapply(apipop$sch.wide == "Yes", apipop$cname, mean))  # true population quantities
plot_coef(summ, list(est=theta), n.se=2, est.names=c("mcmcsae", "true"), maxrows=30)

Basic unit-level models

Binomial Unit-Level Model

References