This article shows how to register data using the sample data provided by the package. Given a suitable input data frame, users can register the data directly, as illustrated below.
The greatR package provides an example data frame containing data for two different species, A. thaliana and B. rapa, with two and three replicates, respectively. This data frame can be read as follows:
# Load the package
library(greatR)
library(data.table)
# Load a data frame from the sample data
b_rapa_data <- system.file("extdata/brapa_arabidopsis_all_replicates.csv", package = "greatR") |>
  data.table::fread()
Note that the data has all five columns required by the package:
b_rapa_data[, .SD[1:6], by = accession] |>
  knitr::kable()
gene_id | accession | timepoint | expression_value | replicate |
---|---|---|---|---|
BRAA02G018970.3C | Ro18 | 11 | 0.3968734 | Ro18-11-a |
BRAA02G018970.3C | Ro18 | 11 | 1.4147711 | Ro18-11-b |
BRAA02G018970.3C | Ro18 | 11 | 0.7423984 | Ro18-11-c |
BRAA02G018970.3C | Ro18 | 29 | 11.3007002 | Ro18-29-a |
BRAA02G018970.3C | Ro18 | 29 | 23.2055664 | Ro18-29-b |
BRAA02G018970.3C | Ro18 | 29 | 22.0307747 | Ro18-29-c |
BRAA02G018970.3C | Col0 | 7 | 0.4667855 | Col0-07-a |
BRAA02G018970.3C | Col0 | 7 | 0.0741901 | Col0-07-b |
BRAA02G018970.3C | Col0 | 8 | 0.0000000 | Col0-08-a |
BRAA02G018970.3C | Col0 | 8 | 0.0000000 | Col0-08-b |
BRAA02G018970.3C | Col0 | 9 | 0.3722542 | Col0-09-a |
BRAA02G018970.3C | Col0 | 9 | 0.0000000 | Col0-09-b |
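Before registering your own data, it can help to confirm that the five columns are present and that each accession has the expected number of replicates per time point. The following is a minimal sketch in base R, not part of greatR: `toy_data` is a hypothetical data frame built from a few rows of the table above, and `required_cols` simply lists the column names shown in that table.

```r
# Hypothetical toy subset; values are copied from the table above
toy_data <- data.frame(
  gene_id = "BRAA02G018970.3C",
  accession = c("Ro18", "Ro18", "Ro18", "Col0", "Col0"),
  timepoint = c(11, 11, 11, 7, 7),
  expression_value = c(0.3968734, 1.4147711, 0.7423984, 0.4667855, 0.0741901),
  replicate = c("Ro18-11-a", "Ro18-11-b", "Ro18-11-c", "Col0-07-a", "Col0-07-b")
)

# Column names as they appear in the sample data
required_cols <- c("gene_id", "accession", "timepoint", "expression_value", "replicate")

# Report any column that is absent from the input
missing_cols <- setdiff(required_cols, names(toy_data))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Count rows (replicates) per accession and time point
replicate_counts <- table(
  accession = toy_data$accession,
  timepoint = toy_data$timepoint
)
print(replicate_counts)
```

The same check can be run on `b_rapa_data` by substituting it for `toy_data`.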
To align the gene expression time courses of Arabidopsis Col-0 and B. rapa Ro18, we can use the function register(). By default, the best registration parameters are optimised via Nelder-Mead (optimisation_method = "nm"). When using the default optimise_registration_parameters = TRUE, the stretch and shift search space is estimated automatically. For more details on the other function parameters, see the register() reference documentation.
registration_results <- register(
  b_rapa_data,
  reference = "Ro18",
  query = "Col0"
)
#> ── Validating input data ────────────────────────────────────────────────────────
#> ℹ Will process 10 genes.
#>
#> ── Starting registration with optimisation ──────────────────────────────────────
#> ℹ Using Nelder-Mead method.
#> ℹ Using computed stretches and shifts search space limits.
#> ✔ Optimising registration parameters for genes (10/10) [2.3s]
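As a rough picture of what a stretch and a shift do to query time points, the sketch below applies a simple linear transformation in base R. The exact formula used inside register() is not stated in this article, so the form used here (stretch the time elapsed since the first time point, then add the shift) is an assumption for illustration only. The helper `stretch_and_shift()` and the parameter values are hypothetical.

```r
# Hypothetical helper, NOT part of greatR: assumes a linear transformation
# of the form (t - min(t)) * stretch + min(t) + shift
stretch_and_shift <- function(timepoints, stretch, shift) {
  origin <- min(timepoints)
  (timepoints - origin) * stretch + origin + shift
}

# Query time points as they appear for Col0 in the sample data
query_timepoints <- c(7, 8, 9)

# Made-up parameter values, chosen only to make the arithmetic easy to follow
registered_timepoints <- stretch_and_shift(query_timepoints, stretch = 2, shift = 4)
print(registered_timepoints)
```

Under this assumed form, a stretch larger than 1 spreads the query time points further apart, and the shift moves the whole series along the time axis.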
The function register() returns a list of two data frames:
data is a data frame containing the scaled expression data and an additional timepoint_reg column, which holds the registered time points obtained by applying the registration parameters to the query data.
model_comparison is a data frame containing (a) the optimal stretch and shift for each gene_id and (b) the Bayesian Information Criterion (BIC) for the separate model (BIC_separate) and for the combined model (BIC_combined) after applying the optimal registration parameters for each gene. If BIC_combined < BIC_separate, then the expression dynamics of the reference and query data can be registered (registered = TRUE).
To check whether a gene is registered or not, we can get the summary results by accessing the model_comparison table from the registration result.
registration_results$model_comparison |>
  knitr::kable()
gene_id | stretch | shift | BIC_separate | BIC_combined | registered |
---|---|---|---|---|---|
BRAA02G018970.3C | 3.973246 | -15.047106 | 48.03687 | 40.36121 | TRUE |
BRAA02G043220.3C | 3.826033 | -11.495610 | 59.00165 | 50.88363 | TRUE |
BRAA03G023790.3C | 1.664500 | 8.463653 | 52.95771 | 45.92210 | TRUE |
BRAA03G051930.3C | 2.349747 | 5.664320 | 54.83773 | 47.03630 | TRUE |
BRAA04G005470.3C | 3.298411 | -2.561531 | 41.20168 | 32.84564 | TRUE |
BRAA05G005370.3C | 1.876649 | 6.580776 | 52.01824 | 44.35298 | TRUE |
BRAA06G025360.3C | 3.170889 | -5.140066 | 57.69557 | 48.83989 | TRUE |
BRAA07G030470.3C | 3.999999 | -9.495934 | 42.17333 | 34.12138 | TRUE |
BRAA07G034100.3C | 3.999999 | -10.460581 | 41.83850 | 33.84178 | TRUE |
BRAA09G045310.3C | 3.159727 | -1.205799 | 43.44362 | 35.07316 | TRUE |
From the sample data above, we can see that registered = TRUE for all ten genes, meaning that the reference and query data for each of those genes can be aligned, or registered. These data frame outputs can be further summarised and visualised; see the visualising results article.
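The decision rule on the two BIC values can be reproduced by hand. The following is a minimal sketch in base R, assuming only the rule stated above (a gene is registered when BIC_combined is smaller than BIC_separate). The data frame `toy_comparison` is a hypothetical stand-in that copies three rows from the table above; it is not produced by register() here.

```r
# Hypothetical stand-in for registration_results$model_comparison,
# using three rows copied from the table above
toy_comparison <- data.frame(
  gene_id = c("BRAA02G018970.3C", "BRAA02G043220.3C", "BRAA03G023790.3C"),
  BIC_separate = c(48.03687, 59.00165, 52.95771),
  BIC_combined = c(40.36121, 50.88363, 45.92210)
)

# Apply the rule: the combined model must have the lower BIC
toy_comparison$registered <- toy_comparison$BIC_combined < toy_comparison$BIC_separate

# Number of genes that can be registered
n_registered <- sum(toy_comparison$registered)
print(toy_comparison)
print(n_registered)
```

With real output, the same comparison could be applied to the BIC_separate and BIC_combined columns of the model_comparison table to cross-check the registered column.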