Phylogenetic mixed model

Geir H. Bolstad

2021-12-08

The Almer function

The Almer function is a modification/hack of the lmer function in the lme4 package to incorporate correlated effects in the random structure. The Almer function can be used to fit phylogenetic mixed models (and other models with correlated random effects such as “animal models”).

To start, we need an ultrametric phylogeny of unit depth. We can construct this, for example, using the function rtree of the ape package.

# Only a very small sample size is used 
# in the interest of computational speed:
set.seed(57)
n_species <- 50 
tree <- ape::rtree(n = n_species)
tree <- ape::chronopl(tree, lambda = 1)

From this we can generate the phylogenetic relatedness matrix A.

A <- Matrix::Matrix(ape::vcv(tree), sparse = TRUE)

The column names of A must be the species identifier.

colnames(A) <- rownames(A) <- paste("species", 1:n_species, sep = "_")

From this we can simulate a Brownian motion process and add some residual noise.

y <- 5 + t(chol(A))%*%rnorm(n_species, 0, 2) + # BM process with mean = 5 and sd = 2
     rnorm(n_species, 0, 1)                    # residual variation with sd = 1

For Almer to work, the data must include the species identifier in addition to the species means.

dt <- data.frame(species = colnames(A), y = as.vector(y))

Almer can then be used to estimate the means and variances of the process.

mod <- Almer(y ~ 1 + (1|species), data = dt, A = list(species = A))
summary(mod)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ 1 + (1 | species)
#>    Data: dt
#> 
#> REML criterion at convergence: 209.4
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -1.57466 -0.59782 -0.02876  0.49887  1.75585 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  species  (Intercept) 1.862    1.364   
#>  Residual             2.443    1.563   
#> Number of obs: 50, groups:  species, 50
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   5.0252     0.4654    10.8

The Almer function is flexible (it is based on lmer), and can include additional fixed and random effects on top of the phylogenetic effects. Also, it is not restricted to phylogeny related problems, for example it can be used to estimate additive genetic variances and/or dominance variances (the argument A can have several entries).

The Almer_SE function

This function extends Almer by allowing the inclusion of the uncertainty of the species means. To do this, we take advantage of how weights are included in the lmer function: the diagonal of the residual co-variance matrix is the residual variance parameter \(\sigma^2\) times the vector of inverse weights. By using weights equal to \(1/(1+SE^2/\sigma^2)\), where \(SE\) is a vector of standard errors, the diagonal of the residual co-variance matrix is \(\sigma^2 + SE^2\). Because the weights include the residual variance parameter, the function uses an iterative approach.

To illustrate the approach, we add some arbitrary SE-values to the data

dt$SE <- runif(nrow(dt), min = 0.01, max = 0.02) 

Almer_SE can then be used to estimate the means and variances of the process taking the uncertainty into account.

mod_SE <- Almer_SE(y ~ 1 + (1|species), data = dt, SE = dt$SE, A = list(species = A))
summary(mod_SE)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ 1 + (1 | species)
#>    Data: dt
#> 
#> REML criterion at convergence: 209.4
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -1.57461 -0.59779 -0.02875  0.49885  1.75577 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  species  (Intercept) 1.862    1.365   
#>  Residual             2.443    1.563   
#> Number of obs: 50, groups:  species, 50
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   5.0252     0.4654    10.8

Note that the estimated residual variances represent the residual variance after correcting for the uncertainty in the means. Thus, this function can be useful for meta analyses.

Simulating data and bootstrapping

The Almer_sim function can be used to simulate the responses, in our case species means, of the fitted models of both Almer and Almer_SE. Note that the lme4::simulate.merMod function did not seem to work properly when the number of random effects equal the number of observations, and could therefore not be used.

sim_y <- Almer_sim(mod, nsim = 3)
sim_y[1:3,]
#> 3 x 3 Matrix of class "dgeMatrix"
#>      Sim_1    Sim_2    Sim_3
#> 1 5.171605 7.314536 3.453942
#> 2 1.901284 3.038564 6.394652
#> 3 6.275471 6.200653 9.409682

This can further be used to do bootstrapping, implemented in Almer_boot.

# The number of bootstrap simulations is kept very low in the interest 
# of computational speed. Often 1000 is used in real analyses.
Almer_boot_obj <- Almer_boot(mod, nsim = 10) 
Almer_boot_obj$fixef
#>                 Mean Std. Err.     2.5%    97.5%
#> (Intercept) 4.984573 0.4847826 4.465023 5.788567
Almer_boot_obj$vcov
#>              Mean Std. Err.     2.5%    97.5%
#> species  2.103980  1.820257 0.000000 4.758393
#> Residual 2.475891  1.672111 0.136031 5.004038

Phylogenetic heritability

The phylH function can be used to estimate the phylogenetic heritability of a object fitted by Almer. The 95% confidence interval is estimated by parametric bootstrapping. Both the name of the numerator of the heritability and, unless the phylogenetic residual is estimated as residuals in the model fit, the name of the phylogenetic residuals need to be specified.

# The number of bootstrap simulations is kept very low in the interest 
# of computational speed. Often 1000 is used in real analyses.
phylH_obj <- phylH(mod, numerator = "species", nsim = 10) 
phylH_obj$phylH
#> Phylo Heritability               2.5%              97.5% 
#>          0.4324541          0.0000000          0.8836447