Generate Monte Carlo Estimates

Shu Fai Cheung & Sing-Hang Cheung

2023-10-06

1 Introduction

This article is a brief illustration of how to use do_mc() from the package manymome (Cheung & Cheung, 2023) to generate Monte Carlo estimates for indirect_effect() and cond_indirect_effects() to form Monte Carlo confidence intervals.

Although indirect_effect() and cond_indirect_effects() can also be used to generate Monte Carlo estimates when they are called (see vignette("manymome")), there may be situations in which users prefer generating the Monte Carlo estimates first before calling indirect_effect() and cond_indirect_effects(). do_mc() is for this purpose.

Monte Carlo confidence intervals only support models fitted by lavaan::sem() and (since version 0.1.9.8) semTools::sem.mi() or semTools::runMI().

2 How It Works

The function do_mc() retrieves the variance-covariance matrix of the parameter estimates and then generates a number of sets of simulated sample estimates using a multivariate normal distribution. Other parameters and implied variances, covariances, and means of variables are then generated from these simulated estimates.

When a \((1 - \alpha)\)% Monte Carlo confidence interval is requested, the \(100(\alpha/2)\)th percentile and the \(100(1 - \alpha/2)\)th percentile are used to form the confidence interval. For a 95% Monte Carlo confidence interval, the 2.5th percentile and 97.5th percentile will be used.

3 The Workflow

The following workflow will be demonstrated;

  1. Fit the model as usual.

  2. Use do_mc() to generate the Monte Carlo estimates.

  3. Call other functions (e.g, indirect_effect() and cond_indirect_effects()) to compute the desired effects and form Monte Carlo confidence intervals.

4 Demonstration

4.1 Fit a Model by lavaan::sem()

The data set for illustration:

library(manymome)
dat <- data_med
head(dat)
#>           x        m        y       c1       c2
#> 1  9.931992 17.89644 20.73893 1.426513 6.103290
#> 2  8.331493 17.92150 22.91594 2.940388 3.832698
#> 3 10.327471 17.83178 22.14201 3.012678 5.770532
#> 4 11.196969 20.01750 25.05038 3.120056 4.654931
#> 5 11.887811 22.08645 28.47312 4.440018 3.959033
#> 6  8.198297 16.95198 20.73549 2.495083 3.763712

It has one predictor (x), one mediator (m), one outcome variable (y), and two control variables (c1 and c2).

The following simple mediation model with two control variables (c1 and c2) will be fitted:

plot of chunk do_mc_draw_model

Fit the model by lavaan::sem():

mod <-
"
m ~ x + c1 + c2
y ~ m + x + c1 + c2
"
fit_lavaan <- sem(model = mod, data = dat,
                  fixed.x = FALSE,
                  estimator = "MLR")
summary(fit_lavaan)
#> lavaan 0.6.16 ended normally after 1 iteration
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        15
#> 
#>   Number of observations                           100
#> 
#> Model Test User Model:
#>                                               Standard      Scaled
#>   Test Statistic                                 0.000       0.000
#>   Degrees of freedom                                 0           0
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Sandwich
#>   Information bread                           Observed
#>   Observed information based on                Hessian
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   m ~                                                 
#>     x                 0.935    0.075   12.437    0.000
#>     c1                0.198    0.079    2.507    0.012
#>     c2               -0.168    0.099   -1.703    0.089
#>   y ~                                                 
#>     m                 0.785    0.233    3.363    0.001
#>     x                 0.508    0.323    1.573    0.116
#>     c1                0.140    0.188    0.747    0.455
#>     c2               -0.154    0.214   -0.720    0.471
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   x ~~                                                
#>     c1                0.026    0.121    0.211    0.833
#>     c2                0.100    0.084    1.186    0.235
#>   c1 ~~                                               
#>     c2               -0.092    0.109   -0.841    0.400
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .m                 0.681    0.085    7.976    0.000
#>    .y                 4.030    0.580    6.944    0.000
#>     x                 1.102    0.150    7.338    0.000
#>     c1                1.218    0.161    7.540    0.000
#>     c2                0.685    0.073    9.340    0.000

Suppose we would like to use robust “sandwich” standard errors and confidence intervals provided by MLR for the free parameters, but want to use Monte Carlo confidence interval for the indirect effect. In the call above, we used estimator = "MLR" and did not set se = "boot".

4.2 Generate Monte Carlo Estimates

We can then call do_mc() on the output of lavaan::sem() to generate the Monte Carlo estimates of all free parameters and the implied statistics, such as the variances of m and y, which are not free parameters but are needed to form the confidence interval of the standardized indirect effect.

mc_out_lavaan <- do_mc(fit = fit_lavaan,
                       R = 10000,
                       seed = 4234)
#> Stage 1: Simulate estimates
#> Stage 2: Compute implied statistics

Usually, just three arguments are needed:

Parallel processing is not used. However, the time taken is rarely long because there is no need to refit the model many times.

4.3 Using the Output of do_mc() in Other Functions of manymome

When calling indirect_effect() or cond_indirect_effects(), the argument mc_out can be assigned the output of do_mc(). They will then retrieve the stored simulated estimates to form the Monte Carlo confidence intervals, if requested.

out_lavaan <- indirect_effect(x = "x",
                              y = "y",
                              m = "m",
                              fit = fit_lavaan,
                              mc_ci = TRUE,
                              mc_out = mc_out_lavaan)
out_lavaan
#> 
#> == Indirect Effect ==
#>                                        
#>  Path:                 x -> m -> y     
#>  Indirect Effect       0.733           
#>  95.0% Monte Carlo CI: [0.301 to 1.180]
#> 
#> Computation Formula:
#>   (b.m~x)*(b.y~m)
#> Computation:
#>   (0.93469)*(0.78469)
#> 
#> Monte Carlo confidence interval with 10000 replications.
#> 
#> Coefficients of Component Paths:
#>  Path Coefficient
#>   m~x       0.935
#>   y~m       0.785

Reusing the simulated estimates can ensure that all analyses with Monte Carlo confidence intervals are based on the same set of simulated estimates, without the need to generate these estimates again.

5 Missing Data

Monte Carlo confidence intervals can be formed when the variance-covariance matrix of the parameter estimates can be retrieved. Therefore, do_mc() can be used when missing data is handled by full information maximum likelihood in lavaan using missing = "fiml". It also supports multiple imputation if semTools::sem.mi() or semTools::runMI() (since version 0.1.9.8). See vignette("do_mc_lavaan_mi") for an illustration.

6 The Structure of the Output

The output of do_mc() in this case is an object of the class mc_out, which is a list of R lists, each with two elements: est and implied_stats.

This is the content of est of the first list:

mc_out_lavaan[[1]]$est
#>    lhs op rhs    est
#> 1    m  ~   x  0.909
#> 2    m  ~  c1  0.198
#> 3    m  ~  c2 -0.163
#> 4    y  ~   m  0.641
#> 5    y  ~   x  0.921
#> 6    y  ~  c1  0.010
#> 7    y  ~  c2  0.141
#> 8    m ~~   m  0.641
#> 9    y ~~   y  3.432
#> 10   x ~~   x  1.023
#> 11   x ~~  c1  0.074
#> 12   x ~~  c2  0.028
#> 13  c1 ~~  c1  1.180
#> 14  c1 ~~  c2 -0.261
#> 15  c2 ~~  c2  0.718

The content is just the first four columns of the output of lavaan::parameterEstimates(). Note that only fixed and free parameters are used so other rows, if any, are not used even if present.

This is the content of implied_stats of the first list:

mc_out_lavaan[[1]]$implied_stats
#> $cov
#>         m      y      x     c1     c2
#> m   1.586                            
#> y   1.864  6.059                     
#> x   0.939  1.548  1.023              
#> c1  0.344  0.263  0.074  1.180       
#> c2 -0.143  0.033  0.028 -0.261  0.718
#> 
#> $mean
#>  m  y  x c1 c2 
#> NA NA NA NA NA

It has three elements. cov is the implied variances and covariances of all variables. If a model has latent variables, they will be included too. The other elements, mean and mean_lv, are the implied means of the observed variables and the latent variables (if any), respectively. The elements are NA if mean structure is not in the fitted model.

7 Limitations

Monte Carlo confidence intervals require the variance-covariance matrix of all free parameters. Therefore, only models fitted by lavaan::sem() and (since 0.1.9.8) semTools::sem.mi() or semTools::runMI() are supported. Models fitted by stats::lm() do not have a variance-covariance matrix for the regression coefficients from two or more regression models and so are not supported by do_mc().

8 Further Information

For further information on do_mc(), please refer to its help page.

9 Reference

Cheung, S. F., & Cheung, S.-H. (2023). manymome: An R package for computing the indirect effects, conditional effects, and conditional indirect effects, standardized or unstandardized, and their bootstrap confidence intervals, in many (though not all) models. Behavior Research Methods. https://doi.org/10.3758/s13428-023-02224-z