Simulation of confidence intervals.

Setting up the simulations

We use the dataset bfi from the package psych together with lavaan to estimate some realistic factor loadings \(\lambda\) and standard deviations \(\sigma\).

model <- c("y =~ A1 + A2 + A3 + A4 + A5")
fit <- lavaan::cfa(model, data = psych::bfi)
coefs <- lavaan::lavInspect(fit, what = "x")
lambda <- abs(c(coefs$lambda * sqrt(as.numeric(coefs$psi))))
sigma <- sqrt(diag(lavaan::lavInspect(fit, what = "x")$theta))

We take the absolute value of the lambda vector as the agreement data contains reverse-coded items.

Comparing confidence intervals coverages and lengths

We compare five confidence intervals, all without transformations. The adf interval is the asymptotic distribution-free interval, the ell interval is the interval based on elliptical distributions and a kurtosis correction, the ell_par is the elliptical interval assuming a parallel model. The same comments hold for norm (assuming normal data) and norm_par (assuming parallel normal data).

library("alphaci")
library("future.apply")
plan(multisession, workers = availableCores() - 2)
set.seed(313)

n_reps <- 10000
k <- 5
latent <- \(n) extraDistr::rlaplace(n) / sqrt(2)
true <- alphaci:::alpha(sigma, lambda)

In this simulation we normal error terms and a Laplace-distributed latent variable. This one has excess kurtosis \(3\), which caries over in large part to the data. k is the number of questions ands n_reps is the number of simulations.

success <- \(ci) true <= ci[2] & true >= ci[1]
len <- \(ci) ci[2] - ci[1]
simulations <- \(n) {
    results  <- future.apply::future_replicate(n_reps, {
      x <- alphaci:::simulate_congeneric(n, k, sigma, lambda, latent = latent)
      cis <- rbind(adf = alphaci(x, type = "adf"),
        adf_par = alphaci(x, type = "adf", parallel = TRUE),
        ell = alphaci(x, type = "elliptical"),
        ell_par = alphaci(x, type = "elliptical", parallel = TRUE),
        norm = alphaci(x, type = "normal"),
        norm_par = alphaci(x, type = "normal", parallel = TRUE)
      )
      c(cov = apply(cis, 1, success), len = apply(cis, 1, len))
      }, future.seed = TRUE)
  rowMeans(results)
}

Let’s check out the results when \(n= 10\).

simulations(10)
#>      cov.adf  cov.adf_par      cov.ell  cov.ell_par     cov.norm cov.norm_par      len.adf  len.adf_par 
#>    0.8745000    0.7942000    0.9513000    0.9566000    0.9223000    0.9297000    0.7680810   55.3239940 
#>      len.ell  len.ell_par     len.norm len.norm_par 
#>    1.0238478    1.0499270    0.9113473    0.9342908

It appears that the kurtosis corrections work well, at least for small sample size. Let’s see how they perform when \(n\) increases.

nn <- c(5, 10, 20, 30, 40, 50, 100, 200, 500, 1000, 2000, 5000)
results <- sapply(nn, simulations)

Plotting the coverages, we find, where 1 is asymptotically distribution-free, 2 is elliptical, 3 is paralell and elliptical, 4 is normal and 5 is parallel and normal.

matplot(nn, t(results[1:5, ]), xlab = "n", ylab = "Coverage", type = "b",
        log = "x")
abline(h = 0.95, lty = 2)

plot of chunk figure

Hence the kurtosis correction intervals have better coverage than the adf interval when \(n\leq 50\) and outperforms the normal theory intervals for all \(n\). If this observation is general remains to be seen.