Populations are often heterogeneous in their social contact patterns. Such heterogeneity is often age-dependent and varies between age groups, and can strongly influence epidemic dynamics if the outbreak primarily circulates in a particular age group. This heterogeneity can be incorporated into final size calculations (Miller 2012), and is implemented in *finalsize*.

**New to finalsize?** It may help to read the “Get started” vignette first!

There is substantial **heterogeneity in contact patterns** in a population. We want to know how this heterogeneity could affect the final size of the epidemic.

- An estimate of the infection’s basic reproduction number \(R_0\);
- An estimate of the distribution of the population in different demographic groups (typically age groups);
- An estimate of the contacts between individuals of different demographic groups; and
- An estimate of the susceptibility of each demographic group to the infection.

- That the infection dynamics can be captured using a Susceptible-Infectious-Recovered (SIR) or Susceptible-Exposed-Infectious-Recovered (SEIR) model, where individuals who have been infected acquire immunity against subsequent infection, at least for the remaining duration of the epidemic.

```
# load finalsize
library(finalsize)
# load necessary packages
if (!require("socialmixr")) install.packages("socialmixr")
if (!require("ggplot2")) install.packages("ggplot2")
library(ggplot2)
```

A number of statistical methods can be used to estimate the \(R_0\) of an epidemic in its early stages from available data. These are not discussed here, but some examples are given in the *episoap* package.

Instead, this example considers a infection with an \(R_0\) of 1.5, similar to that which could potentially be observed for pandemic influenza.

```
# define r0 as 1.5
<- 1.5 r0
```

This example uses social contact data from the POLYMOD project (Mossong et al. 2008) to estimate the final size of an epidemic in the U.K. These data are provided with the `socialmixr`

package.

The contact data are divided into five age groups: 0 – 4, 5 – 17, 18 – 39, 40 – 64, and 65 and over, specified using the `age.limits`

argument in `socialmixr::contact_matrix()`

.
The `symmetric = TRUE`

argument to `socialmixr::contact_matrix()`

returns a symmetric contact matrix, so that the contacts reported by group \(\{i\}\) of individuals from group \(\{j\}\) are the same as those reported by group \(\{j\}\) of group \(\{i\}\).

The demographic data — the number of individuals in each age group — is also available through `socialmixr::contact_matrix()`

.

```
# get UK polymod data
<- socialmixr::polymod
polymod <- socialmixr::contact_matrix(
contact_data
polymod,countries = "United Kingdom",
age.limits = c(0, 5, 18, 40, 65),
symmetric = TRUE
)
# view the elements of the contact data list
# the contact matrix
$matrix
contact_data#> contact.age.group
#> [0,5) [5,18) [18,40) [40,65) 65+
#> [1,] 1.9157895 1.467205 3.278803 1.559611 0.3658614
#> [2,] 0.5191021 8.858736 3.421566 2.616146 0.6260247
#> [3,] 0.6252718 1.844236 5.440972 3.492124 0.8092367
#> [4,] 0.2792593 1.324010 3.278895 4.178218 1.2380907
#> [5,] 0.1306272 0.631752 1.515092 2.468756 1.7142857
# the demography data
$demography
contact_data#> age.group population proportion year
#> 1: [0,5) 3453670 0.05728738 2005
#> 2: [5,18) 9761554 0.16191873 2005
#> 3: [18,40) 18110368 0.30040378 2005
#> 4: [40,65) 19288101 0.31993930 2005
#> 5: 65+ 9673058 0.16045081 2005
# get the contact matrix and demography data
<- t(contact_data$matrix)
contact_matrix <- contact_data$demography$population
demography_vector <- contact_data$demography
demography_data
# scale the contact matrix so the largest eigenvalue is 1.0
# this is to ensure that the overall epidemic dynamics correctly reflect
# the assumed value of R0
<- contact_matrix / max(Re(eigen(contact_matrix)$values))
contact_matrix
# divide each row of the contact matrix by the corresponding demography
# this reflects the assumption that each individual in group {j} make contacts
# at random with individuals in group {i}
<- contact_matrix / demography_vector
contact_matrix
<- length(demography_vector) n_demo_grps
```

As a starting scenario, consider a novel pathogen where all age groups have a similar, high susceptibility to infection. This means it is assumed that all individuals fall into a single category: fully susceptible.

Full uniform susceptibility can be modelled as a matrix with values of 1.0, with as many rows as there are demographic groups. The matrix has a single column, representing the single susceptibility group to which all individuals belong.

```
# all individuals are equally and highly susceptible
<- 1L
n_susc_groups <- 1.0 susc_guess
```

```
<- matrix(
susc_uniform data = susc_guess,
nrow = n_demo_grps,
ncol = n_susc_groups
)
```

Final size calculations also need to know the proportion of each demographic group \(\{i\}\) that falls into the susceptibility group \(\{j\}\). This distribution of age groups into susceptibility groups can be represented by the demography-susceptibility distribution matrix.

Since all individuals in each age group have the same susceptibility, there is no variation within age groups. Consequently, all individuals in each age group are assumed to be fully susceptible. This can be represented as a single-column matrix, with as many rows as age groups, and as many columns as susceptibility groups.

In this example, the matrix `p_susc_uniform`

has 5 rows, one for each age group, and only one column, for the single high susceptibility group that holds all individuals.

```
<- matrix(
p_susc_uniform data = 1.0,
nrow = n_demo_grps,
ncol = n_susc_groups
)
```

This example models susceptibility (`susc_uniform`

) and the demography-in-susceptibility (`p_susc_uniform`

) as matrices rather than vectors. This is because a single susceptibility group is a special case of the general final size equation.

*finalsize* supports multiple susceptibility groups (this will be covered later), and these are more easily represented as a matrix, the *susceptibility matrix*.

Each element \(\{i, j\}\) in this matrix represents the susceptibility of individuals in demographic group \(\{i\}\), and susceptibility group \(\{j\}\).

In this example, all individuals are equally susceptible to infection, and thus the susceptibility matrix (`susc_uniform`

) has only a single column with identical values.

Consequently, the demography-susceptibility distribution matrix (`p_susc_uniform`

) has the same dimensions, and all of its values are 1.0.

See the “Heterogeneous susceptibility” example for more on cases where susceptibility varies within age groups.

`final_size`

The final size of the epidemic in the population can then be calculated using the only function in the package, `final_size()`

. This example allows the function to fall back on the default options for the arguments `solver`

(`"iterative"`

) and `control`

(an empty list).

```
# calculate final size
<- final_size(
final_size_data r0 = r0,
contact_matrix = contact_matrix,
demography_vector = demography_vector,
susceptibility = susc_uniform,
p_susceptibility = p_susc_uniform
)
# view the output data frame
final_size_data#> demo_grp susc_grp susceptibility p_infected
#> 1 [0,5) susc_grp_1 1 0.4160682
#> 2 [5,18) susc_grp_1 1 0.6888703
#> 3 [18,40) susc_grp_1 1 0.5379381
#> 4 [40,65) susc_grp_1 1 0.4639496
#> 5 65+ susc_grp_1 1 0.3042623
```

```
# order demographic groups as factors
$demo_grp <- factor(
final_size_data$demo_grp,
final_size_datalevels = demography_data$age.group
)
```

```
# plot data
ggplot(final_size_data) +
geom_col(
aes(
demo_grp, p_infected
),colour = "black", fill = "grey"
+
) scale_y_continuous(
labels = scales::percent,
limits = c(0, 1)
+
) expand_limits(
x = c(0.5, nrow(final_size_data) + 0.5)
+
) theme_classic() +
coord_cartesian(
expand = FALSE
+
) labs(
x = "Age group",
y = "% Infected"
)
```

`finalsize`

returns the *proportion* of each age (and susceptibility) group infected in an epidemic outbreak. The final *counts* of individuals infected can be visualised as well, by multiplying the final proportion of each age group infected with the total number of individuals in that group.

The example below show how this can be done.

```
# prepare demography data
<- contact_data$demography
demography_data
# merge final size counts with demography vector
<- merge(
final_size_data
final_size_data,
demography_data,by.x = "demo_grp",
by.y = "age.group"
)
# reset age group order
$demo_grp <- factor(
final_size_data$demo_grp,
final_size_datalevels = contact_data$demography$age.group
)
# multiply counts with proportion infected
$n_infected <- final_size_data$p_infected *
final_size_data$population final_size_data
```

```
ggplot(final_size_data) +
geom_col(
aes(
x = demo_grp, y = n_infected
),fill = "grey", col = "black"
+
) expand_limits(
x = c(0.5, nrow(final_size_data) + 0.5)
+
) scale_y_continuous(
labels = scales::comma_format(
scale = 1e-6, suffix = "M"
),limits = c(0, 15e6)
+
) theme_classic() +
coord_cartesian(
expand = FALSE
+
) labs(
x = "Age group",
y = "Number infected (millions)"
)
```

Miller, Joel C. 2012. “A Note on the Derivation of Epidemic Final Sizes.” *Bulletin of Mathematical Biology* 74 (9): 2125–41. https://doi.org/10.1007/s11538-012-9749-6.

Mossong, Joël, Niel Hens, Mark Jit, Philippe Beutels, Kari Auranen, Rafael Mikolajczyk, Marco Massari, et al. 2008. “Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases.” *PLOS Medicine* 5 (3): e74. https://doi.org/10.1371/journal.pmed.0050074.