Suggested quality control

Owen Jones

2023-09-29

Introduction

The specific requirements and assumptions of each function in Rage vary. Here we provide notes giving an overview of these requirements, which also apply to other functions and calculations in packages such as popbio and popdemo. We make some suggestions on how users should filter their dataset before analysis. To assist with this, the function Rcompadre::cdb_flag runs a series of checks on the matrices in a CompadreDB object and adds “flags” that make it easier to filter out problematic matrices. This function (cdb_flag) can be run automatically by Rcompadre::cdb_fetch using the argument flag = TRUE.
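As a minimal sketch of this workflow, the following assumes that Rcompadre is installed and uses its bundled example object Compadre rather than a freshly downloaded database. The particular pair of flags used for filtering is only an illustration; which flags matter depends on the analysis.

```r
# Sketch: flag and filter the small example database shipped with Rcompadre
library(Rcompadre)

data(Compadre)                  # example CompadreDB object
flagged <- cdb_flag(Compadre)   # adds the check_* columns described in this document

# Keep only entries whose A matrix has no NA values and is ergodic
clean <- subset(flagged, check_NA_A == FALSE & check_ergodic == TRUE)
```

With the full database, the flagging step can instead be requested at download time, e.g. cdb_fetch("compadre", flag = TRUE).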

Types of issue

Missing data

The most obvious requirement for most of the Rage methods is that the matrices contain no missing (NA) values, because NA values prevent calculations using those matrices. Sometimes the NA values are confined to one of the submatrices (i.e., U, F or C) of the matrix model, while the other submatrices are complete. For example, there may be NA entries in the F submatrix, while the U matrix remains complete. These issues are flagged with the columns check_NA_A, check_NA_U, check_NA_F and check_NA_C.
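The idea behind these flags can be illustrated in base R. The matrices below are made-up values for a hypothetical three-stage model, and the check shown is only a sketch of the concept, not the code used by cdb_flag.

```r
# Toy submatrices for a hypothetical 3-stage model (values are made up)
matU <- matrix(c(0.1, 0.0, 0.0,
                 0.5, 0.3, 0.0,
                 0.0, 0.4, 0.8), nrow = 3, byrow = TRUE)
matF <- matrix(c(0, NA, 2.5,
                 0,  0, 0,
                 0,  0, 0), nrow = 3, byrow = TRUE)
matC <- matrix(0, nrow = 3, ncol = 3)

# anyNA() is TRUE if a matrix contains at least one NA
has_na <- c(U = anyNA(matU), F = anyNA(matF), C = anyNA(matC),
            A = anyNA(matU + matF + matC))
has_na
```

Here U and C are complete, but the single NA in F propagates to A, so only analyses that rely on U (or C) alone could use this entry.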

Excessive zeros

Submatrices composed entirely of zero values can also be problematic. There may be good biological reasons for this phenomenon. Species that do not reproduce clonally will have zero-value C matrices. Another biologically reasonable explanation is that no sexual reproduction was recorded in the focal population in the focal year, so the F matrix is composed entirely of zeros. Nevertheless, zero-value submatrices can cause some calculations to fail and it may be necessary to exclude them. These issues are flagged with the columns check_zero_F, check_zero_C and check_zero_U.

Excessive survival

In a biologically reasonable matrix population model, the sum of the survival and growth transitions (i.e., in the U matrix) from a particular stage cannot exceed 1. However, in some cases, errors in the original matrices (including data entry and rounding errors) cause this situation to occur, and these errors may persist in the data set. We can check for this error using column sums of the U matrix, and may wish to exclude matrices with any column sum greater than 1.

This issue is examined using the column SurvivalIssue, which gives the maximum value of the column sums for matU. An additional column, check_surv_gte_1 (produced with cdb_flag), reports whether any single element of matU is greater than or equal to 1.
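The two checks can be sketched in base R as follows. The matrix is a made-up example in which the first column sums to more than 1, and the variable names are illustrative only.

```r
# Hypothetical U matrix; column j holds the fates of individuals in stage j
matU <- matrix(c(0.2, 0.0, 0.0,
                 0.7, 0.5, 0.0,
                 0.2, 0.4, 0.9), nrow = 3, byrow = TRUE)

col_surv <- colSums(matU)          # stage-specific survival probabilities
max_surv <- max(col_surv)          # analogous to the SurvivalIssue column
exceeds_one <- any(col_surv > 1)   # TRUE here: stage 1 sums to about 1.1
any_gte_one <- any(matU >= 1)      # analogous in spirit to check_surv_gte_1
```

Because rounding in the source publication can push a column sum marginally above 1, users may prefer to compare against 1 plus a small tolerance rather than exactly 1.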

Excessive mortality

At the opposite end of the survival spectrum, there may be some matrices where some of the column sums of the U matrix are zero, implying that there is no survival from that particular stage. This may be a perfectly valid parameterisation for a particular year/place but is biologically unreasonable in the longer term and users may wish to exclude problematic matrices from their analysis. This issue is indicated by the column check_zero_U_colsum.
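A base R sketch of this check, using a made-up U matrix in which no individual survives from stage 2:

```r
# Hypothetical U matrix with zero survival out of stage 2
matU <- matrix(c(0.1, 0.0, 0.0,
                 0.5, 0.0, 0.0,
                 0.0, 0.0, 0.8), nrow = 3, byrow = TRUE)

# Stages whose column sum is zero, i.e. stages with no survival at all
dead_end <- which(colSums(matU) == 0)
dead_end
```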

Irreducibility and ergodicity

Several matrix manipulations or calculations require that the MPM (matA) be irreducible and ergodic (Stott et al. 2010). Irreducible MPMs are those where the parameterised transition rates provide pathways from all stages to all other stages. Conversely, reducible MPMs depict incomplete life cycles where pathways from all stages to every other stage are not possible. Ergodic MPMs are those where there is a single asymptotic stable state that does not depend on initial stage structure. Conversely, non-ergodic MPMs are those where there are multiple asymptotic stable states, which depend on initial stage structure. Irreducibility is sufficient but not necessary for ergodicity: every irreducible MPM is ergodic, whereas a reducible MPM may be either ergodic or non-ergodic. MPMs that are reducible and/or non-ergodic are usually biologically unreasonable, both in terms of their life cycle description and their projected dynamics, and they cause some calculations in Rage (and elsewhere) to fail. These issues are flagged with check_irreducible and check_ergodic. Even where Rage functions do not fail because of these issues, the fact that they can indicate biologically unreasonable life cycles may mean that users nevertheless wish to exclude reducible or non-ergodic matrices from their analyses.
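To make the definition of irreducibility concrete, the sketch below uses the standard result that a non-negative n × n matrix A is irreducible if and only if (I + A)^(n − 1) has no zero entries. The function is_irreducible and both example matrices are hypothetical illustrations written for this note, not the implementation used by cdb_flag; the popdemo package provides its own isIrreducible and isErgodic functions.

```r
# TRUE if every stage can reach every other stage via non-zero transitions
is_irreducible <- function(A) {
  n <- nrow(A)
  B <- diag(n) + (A > 0)       # only the pattern of non-zero entries matters
  P <- diag(n)
  for (i in seq_len(n - 1)) {
    P <- (P %*% B > 0) * 1     # keep entries at 0/1 so values cannot overflow
  }
  all(P > 0)
}

# Complete life cycle: 1 -> 2 -> 3, and stage 3 reproduces into stage 1
A_full <- matrix(c(0.0, 0.0, 2.0,
                   0.5, 0.0, 0.0,
                   0.0, 0.4, 0.8), nrow = 3, byrow = TRUE)

# Stage 3 is post-reproductive: nothing leads from it back to stages 1 or 2
A_post <- matrix(c(0.0, 2.0, 0.0,
                   0.5, 0.0, 0.0,
                   0.0, 0.4, 0.8), nrow = 3, byrow = TRUE)

is_irreducible(A_full)
is_irreducible(A_post)
```

The first matrix is irreducible and the second is reducible, because its final stage is a dead end in the life cycle graph.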

Singularity of the U matrix

Matrices are said to be singular if they cannot be inverted. Inversion is required for many matrix calculations and, therefore, singularity can cause some calculations to fail. This issue is flagged with check_singular_U. Calculations for longevity, life_expect_mean, life_expect_var and net_repro_rate fail with singular matrices, so users may wish to exclude singular matrices when conducting analyses using these functions.
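The following base R sketch shows how singularity makes an inversion fail. It assumes that the inverse of interest is the fundamental matrix N = (I − U)^-1, which underlies many life expectancy calculations; that choice, and the made-up matrices, are assumptions for illustration rather than a description of how check_singular_U is implemented.

```r
# Returns the fundamental matrix N = (I - U)^-1, or NULL if it cannot be computed
fundamental_matrix <- function(matU) {
  tryCatch(solve(diag(nrow(matU)) - matU), error = function(e) NULL)
}

# Hypothetical U where stage 2 survives with probability exactly 1
U_bad <- matrix(c(0.0, 0.0,
                  0.5, 1.0), nrow = 2, byrow = TRUE)

# Hypothetical U with some mortality in every stage
U_ok <- matrix(c(0.0, 0.0,
                 0.5, 0.8), nrow = 2, byrow = TRUE)

is_singular_bad <- is.null(fundamental_matrix(U_bad))   # inversion fails
N_ok <- fundamental_matrix(U_ok)                        # inversion succeeds
colSums(N_ok)   # expected number of time steps spent alive, by starting stage
```

In the first matrix, individuals in stage 2 never die, so the expected time spent alive is unbounded and the inverse does not exist.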

Matrix split errors

A complete MPM (A) can be split into its component submatrices (i.e. U, F and C). The sum of these submatrices should equal the complete MPM (i.e. A = U + F + C). Sometimes, however, errors occur so that the submatrices do not sum to A. Normally, this is caused by rounding errors, but more significant errors are possible. This problem is flagged with check_component_sum, which is only relevant for divided (split) matrices. We recommend that users carefully check their matrices for these errors and correct or exclude them as appropriate.
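A base R sketch of this comparison is given below. The matrices are made up, with one element of the reported A deliberately set to disagree with U + F + C, and the tolerance values are arbitrary choices for illustration.

```r
# Hypothetical submatrices of a 2-stage model
matU <- matrix(c(0.1, 0.0,
                 0.5, 0.8), nrow = 2, byrow = TRUE)
matF <- matrix(c(0.0, 2.5,
                 0.0, 0.0), nrow = 2, byrow = TRUE)
matC <- matrix(0, nrow = 2, ncol = 2)

# Reported A matrix containing an error in element [2, 2] (0.9 instead of 0.8)
matA_reported <- matrix(c(0.1, 2.5,
                          0.5, 0.9), nrow = 2, byrow = TRUE)

matA_sum <- matU + matF + matC

# all.equal() compares within a numerical tolerance, so tiny rounding noise passes
sums_match <- isTRUE(all.equal(matA_reported, matA_sum, tolerance = 1e-6))

# Size of the largest discrepancy helps separate rounding from real errors
discrepancy <- max(abs(matA_reported - matA_sum))
```

Inspecting the size of the discrepancy, rather than only a TRUE/FALSE result, can help decide whether a matrix should be corrected or excluded.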

Function requirement summaries

It is a general requirement for almost all Rage functions that the matrices used as arguments do not include NA values. With divided (split) matrices, NA values may be present in some submatrices but not others. For example, the U matrix may be complete, but the F matrix may have NA values. In this case, functions that require an F matrix will fail, while those that only require a U matrix will work. Users should filter the data to exclude entries with NA values in the matrices required for their analysis. The functions mpm_split, mpm_rearrange and mpm_standardise do not require complete NA-free matrices.

For functions that use the U matrix, we further suggest filtering the data to exclude the biologically unreasonable entries where one or more of the matU columns sum to zero, or to greater than 1 (see Excessive mortality and Excessive survival, above). Alternatively, users could examine the offending matrices and make sensible corrections (e.g. to correct rounding errors).

For functions that use the F matrix, and where sexual reproduction is known to occur in the species, we suggest that users consider filtering the data to exclude entries where F is entirely zero. This is not always desirable because there are some situations where zero recorded reproduction is biologically reasonable. We suggest a similar approach for the C matrix.

Other issues

When using age-from-stage methods, users should be aware of the issue of convergence to the quasi-stationary distribution. Briefly, all age-from-stage calculations produce age trajectories that inevitably asymptote as a mathematical consequence of describing the vital rates as functions of discrete stages (Horvitz & Tuljapurkar, 2008). This mathematical artefact can introduce bias into measures obtained using age-from-stage methods. Rage provides a convenient and principled way of correcting for this artefact by imposing a lower probability threshold defined by the degree of convergence to the quasi-stationary distribution. We suggest that users filter out from their analyses matrices that do not pass this threshold criterion.

Users should also be aware of the issue of census type. For populations that reproduce in a pulse once per year, the demographic census may be carried out before or after the reproduction event. There are thus two types of census: pre-reproductive and post-reproductive. This distinction has potentially important implications for demographic measures because of its effects on measured population structure. For example, the fraction of individuals in the first age class will tend to be larger in a post-reproductive census than in a pre-reproductive census. There is a column in the com(p)adre metadata (CensusType) that is intended to record this information but, because authors of source publications have rarely stated it clearly, the column is very incomplete. For serious analyses we therefore recommend that users carefully collect this information themselves from the source papers.

Finally…

Although we highlight here a range of issues that could cause problems for MPM analyses, we have likely omitted some inadvertently. We therefore urge users to consider carefully any further issues that may pertain to their particular analyses.

References

Horvitz, C. C., & Tuljapurkar, S. (2008). Stage dynamics, period survival, and mortality plateaus. The American Naturalist, 172(2), 203–215.

Stott, I., Townley, S., Carslake, D., & Hodgson, D. J. (2010). On reducibility and ergodicity of population projection matrix models. Methods in Ecology and Evolution, 1(3), 242–252.