Loading [MathJax]/jax/output/HTML-CSS/jax.js
  • Getting started
    • Load packages
    • Experimental design
    • Count matrix
    • Fitting cpam
    • Result tables
    • Plotting genes and transcripts
    • Clusters
    • Many more options
  • Session Info
  • References

About

This tutorial demonstrates the R package cpam for the analysis of time series omics data. It serves as a basic introduction to the package. There are also two detailed case studies using real world data:

These case studies and several simulation studies are presented in the accompanying manuscript by Yates et al. (2024). See also, the package website.

Data

The data for the following examples have been simulated based on empirical RNA-seq data. These data are gene-level counts from a case-only design with 6 time points and 5 replicates per time point. Code to reproduce the data is available in this repository.

Installation

You can install cpam using:

install.packages("cpam")

Getting started

Load packages

library(cpam)  
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)

Experimental design

An example experimental design is included in the cpam package. Since it is case-only design, there are no experimental conditions beyond time.


# load example data
load(system.file("extdata", "exp_design_example.rda", package = "cpam"))
exp_design_example
ABCDEFGHIJ0123456789
sample
<chr>
time
<int>
X11
X21
X31
X41
X51
X62
X72
X82
X92
X102

Count matrix

The example count data for are provided as a matrix. Let’s take a look at the first few rows.

# load example data
load(system.file("extdata", "count_matrix_example.rda", package = "cpam"))
as.data.frame(count_matrix_example) %>% head 
ABCDEFGHIJ0123456789
 
 
X1
<dbl>
X2
<dbl>
X3
<dbl>
X4
<dbl>
X5
<dbl>
X6
<dbl>
X7
<dbl>
X8
<dbl>
X9
<dbl>
g00110070293137100748903029596279708897268535094282
g002865479268930919688827993847998658015
g003325831313024350736872419245625532238
g004245721857528039272302524424315262002479026165
g005125565118382115517105728114457120633115978108458113169
g006198820582530213521312139204320682298

Fitting cpam

To fit the models, we first prepare the cpam object, then compute p-values, estimate changepoints, and select the shape for each gene. In this simple example, simulated data are gene-level, that is we do not have isoform-level counts. As such, we leave ```the transcript-to-gene mapping (t2g) and we setgene_level = T`.

  cpo <- prepare_cpam(exp_design = exp_design_example,
                      count_matrix = count_matrix_example,
                      model_type = "case-only",
                      t2g = NULL,
                      gene_level = T,
                      num_cores = 1) # just for the example
  cpo <- compute_p_values(cpo) # 6 seconds
  cpo <- estimate_changepoint(cpo) # 4 seconds
  cpo <- select_shape(cpo) # 5 seconds

We can look at a summary of the fitted cpam object

cpo

If you run the code on your own computer, you can launch the Shiny app to visualise the results interactively using visualise(cpo).

Result tables

The results of the analysis are summarised using the results function.

results(cpo)
ABCDEFGHIJ0123456789
target_id
<chr>
p
<dbl>
cp
<dbl>
shape
<chr>
lfc.1
<dbl>
lfc.2
<dbl>
lfc.3
<dbl>
lfc.4
<dbl>
g0039.262545e-3191mdcx0-0.3487989-0.6342337-0.85630888
g0139.262545e-3193ilin00.00000000.00000000.32741484
g0559.262545e-3191ilin00.19461370.38922740.58384104
g0639.262545e-3191cv00.59489010.93667041.02534244
g0699.262545e-3192ilin00.00000000.22547760.45095529
g0909.262545e-3192ilin00.00000000.23855770.47711548
g1069.262545e-3191mdcx0-0.4703372-0.7777797-0.89248001
g1269.262545e-3191ilin00.22492460.44984920.67477381
g1289.262545e-3191ilin00.20917810.41835620.62753426
g1299.262545e-3191cv00.62573260.97856151.11304090

The generated results can be filtered by specifying minimum counts, minimum log-fold changes, and maximum p-values. For example, to return only the transcripts with a log-fold change greater than 1, at least 10 counts, and a p-value less than 0.01, we can run

results(cpo, min_count = 10, min_lfc = 1, p_threshold = 0.01)
ABCDEFGHIJ0123456789
target_id
<chr>
p
<dbl>
cp
<dbl>
shape
<chr>
lfc.1
<dbl>
lfc.2
<dbl>
lfc.3
<dbl>
lfc.4
<dbl>
g0639.262545e-3191cv00.59489010.93667041.02534244
g1269.262545e-3191ilin00.22492460.44984920.67477381
g1289.262545e-3191ilin00.20917810.41835620.62753426
g1299.262545e-3191cv00.62573260.97856151.11304090
g1719.262545e-3191cv00.59320230.92237411.01283317
g1869.262545e-3191cx0-0.2333560-0.3235506-0.20905145
g3319.262545e-3193ilin00.00000000.00000000.33921584
g3349.262545e-3191micv00.40186410.69175240.88626982
g3419.262545e-3191cx0-0.4649112-0.5925479-0.38290350
g3939.262545e-3191cv00.43232400.74719400.94460072

Plotting genes and transcripts

A single gene can be plotted using the plot_cpam function. Here we plot the gene g063

plot_cpam(cpo, gene_id = "g063")

The subtitle shows (1,cv) indicating a changepoint at first time point (i.e., the gene responds immediately) and a convex ('cv') shaped trend.

Let’s look for a gene that has a more complex trend. Unconstrained shapes in cpam are denoted by 'tp'.1 We can filter the results for genes with unconstrained shapes and plot one of them.

results(cpo) %>% 
  filter(shape == "tp")
ABCDEFGHIJ0123456789
target_id
<chr>
p
<dbl>
cp
<dbl>
shape
<chr>
lfc.1
<dbl>
lfc.2
<dbl>
lfc.3
<dbl>
lfc.4
<dbl>
g2109.262545e-3191tp00.63925170.83470980.66199867
g3259.262545e-3191tp0-0.8970277-0.8822846-0.15693398
g3359.262545e-3191tp00.25192390.31118180.24016134
g3399.262545e-3191tp0-0.7559925-0.7772115-0.30624356
g3999.262545e-3191tp00.24833450.48663530.70509583
g4009.262545e-3191tp0-0.9166212-0.67434650.55843871
g4179.262545e-3191tp0-0.4254884-0.5744064-0.53110347
g4249.262545e-3191tp0-0.3248004-0.22656970.04423327
g6409.262545e-3191tp00.57474570.76675300.39499918
g7159.262545e-3191tp00.83593020.8100431-0.23898464

We plot the first gene in the list.

plot_cpam(cpo, gene_id = "g210")

This selection of 'tp' suggests that the trend for this gene does not conform to one of the simpler shape types that cpam uses. We can exclude 'tp' as an option and force cpam to choose among the simpler forms by setting shape_type = "shape2" in the plot_cpam function ("shape1", the default, allows the 'tp'). This choice can be useful for analysis such as clustering. For example:

plot_cpam(cpo, gene_id = "g210",shape_type = "shape2")

Here a monotonic increasing concave shape (‘micv’) is chosen, and we can see this trend deviates from the data substantially more that the unconstrained shape. See the manuscript for more details on the shape types available in cpam.

Next we look for a gene with a changepoint by filtering for genes with changepoints at the third time point.

results(cpo) %>% 
  filter(cp == 3)
ABCDEFGHIJ0123456789
target_id
<chr>
p
<dbl>
cp
<dbl>
shape
<chr>
lfc.1
<dbl>
lfc.2
<dbl>
lfc.3
<dbl>
lfc.4
<dbl>
lfc.5
<dbl>
g0139.262545e-3193ilin0000.32741480.6548297
g1469.262545e-3193dlin000-0.3292578-0.6585156
g1879.262545e-3193dlin000-0.4193064-0.8386128
g2979.262545e-3193ilin0000.30812350.6162471
g3049.262545e-3193cx000-1.0422935-0.6516829
g3319.262545e-3193ilin0000.33921580.6784317
g5599.262545e-3193ilin0000.29220050.5844010
g6239.262545e-3193micv0000.31694630.6338925
g6349.262545e-3193cx000-0.8798729-1.1330087
g6529.262545e-3193ilin0000.36298760.7259753

Again, we plot the first gene in the list.

plot_cpam(cpo, gene_id = "g013")

Clusters

The results function can be used to generate clusters according to selected filters. The plot_cluster function can then be used to visualise the clusters. With such a small simulated data set, we don’t have many genes in each cluster, but we can try a few different clustering options to get an idea of how the function works.

res <- results(cpo)
plot_cluster(cpo, res, changepoints = 1, shapes = c("cv"))

There are 19 genes with a concave shape and a changepoint at the first time point.

More than one shape or changepoint can be provided. For example:

plot_cluster(cpo, res, changepoints = 2, shapes = c("dlin","mdcx"))

There are just four genes with decreasing linear or monotonic decreasing convex shapes which have a changepoint at the second time point.

Clustering can be further refined based on, for example, the rate at which the above transcripts attain their maximum values. We illustrate advanced refinements such as this case study.

Many more options

This is just a simple example to get you started. The package has many more features and options. Check out the two case studies to see how cpam can be used to analyse real-world data:

Session Info

Click to expand
sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Pop!_OS 22.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Australia/Hobart
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.1 stringr_1.5.1 tidyr_1.3.1   dplyr_1.1.4   cpam_0.1.3   
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.9         generics_0.1.3     stringi_1.8.4      lattice_0.22-6     digest_0.6.37      magrittr_2.0.3    
#>  [7] RColorBrewer_1.1-3 evaluate_1.0.3     grid_4.4.3         fastmap_1.2.0      jsonlite_1.8.9     Matrix_1.7-2      
#> [13] limma_3.61.8       mgcv_1.9-1         purrr_1.0.4        scales_1.3.0       codetools_0.2-20   jquerylib_0.1.4   
#> [19] cli_3.6.4          rlang_1.1.5        pbmcapply_1.5.1    munsell_0.5.1      splines_4.4.3      scam_1.2-18       
#> [25] withr_3.0.2        cachem_1.1.0       yaml_2.3.10        tools_4.4.3        parallel_4.4.3     colorspace_2.1-1  
#> [31] locfit_1.5-9.10    vctrs_0.6.5        R6_2.6.1           matrixStats_1.5.0  lifecycle_1.0.4    edgeR_4.3.7       
#> [37] pkgconfig_2.0.3    pillar_1.10.1      bslib_0.9.0        gtable_0.3.6       glue_1.8.0         Rcpp_1.0.14       
#> [43] statmod_1.5.0      xfun_0.51          tibble_3.2.1       tidyselect_1.2.1   rstudioapi_0.17.0  knitr_1.49        
#> [49] farver_2.1.2       htmltools_0.5.8.1  nlme_3.1-167       labeling_0.4.3     rmarkdown_2.27     compiler_4.4.3    
#> [55] mvnfast_0.2.8

1 tp stands for thinplate which is the type of spline used for the ‘unconstrained’ curves as defined in the mgcv package. The curves are still penalised to be smooth, but the shape type is not fixed.

References

Yates, Luke A., Jazmine L. Humphreys, Michael A. Charleston, and Steven M. Smith. 2024. “Shape-Constrained, Changepoint Additive Models for Time Series Omics Data with Cpam.” BioRxiv, December. https://doi.org/10.1101/2024.12.22.630003.