Registering data

This article shows how to register data using the sample data provided by the package. Given input data, users can register it directly, as illustrated below.


Loading sample data

The greatR package provides an example data frame containing data for two species, A. thaliana and B. rapa, with two and three replicates, respectively. This data frame can be read as follows:

# Load the package
library(greatR)
library(data.table)
# Load a data frame from the sample data
b_rapa_data <- system.file("extdata/brapa_arabidopsis_all_replicates.csv", package = "greatR") |>
  data.table::fread()

Note that the data has all five columns required by the package:

b_rapa_data[, .SD[1:6], by = accession] |>
  knitr::kable()

gene_id           accession  timepoint  expression_value  replicate
BRAA02G018970.3C  Ro18       11         0.3968734         Ro18-11-a
BRAA02G018970.3C  Ro18       11         1.4147711         Ro18-11-b
BRAA02G018970.3C  Ro18       11         0.7423984         Ro18-11-c
BRAA02G018970.3C  Ro18       29         11.3007002        Ro18-29-a
BRAA02G018970.3C  Ro18       29         23.2055664        Ro18-29-b
BRAA02G018970.3C  Ro18       29         22.0307747        Ro18-29-c
BRAA02G018970.3C  Col0       7          0.4667855         Col0-07-a
BRAA02G018970.3C  Col0       7          0.0741901         Col0-07-b
BRAA02G018970.3C  Col0       8          0.0000000         Col0-08-a
BRAA02G018970.3C  Col0       8          0.0000000         Col0-08-b
BRAA02G018970.3C  Col0       9          0.3722542         Col0-09-a
BRAA02G018970.3C  Col0       9          0.0000000         Col0-09-b
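
Before registering your own data, it can help to confirm that it has the same five columns as the sample table above. The following is a minimal base R sketch, not part of greatR: the helper check_required_cols() and the toy data frame are hypothetical and exist only for illustration, with values copied from the table above.

```r
# Column names taken from the sample table shown above
required_cols <- c("gene_id", "accession", "timepoint", "expression_value", "replicate")

# Toy input built from five rows of the sample table (illustration only)
toy_data <- data.frame(
  gene_id = rep("BRAA02G018970.3C", 5),
  accession = c("Ro18", "Ro18", "Ro18", "Col0", "Col0"),
  timepoint = c(11, 11, 11, 7, 7),
  expression_value = c(0.3968734, 1.4147711, 0.7423984, 0.4667855, 0.0741901),
  replicate = c("Ro18-11-a", "Ro18-11-b", "Ro18-11-c", "Col0-07-a", "Col0-07-b")
)

# Hypothetical helper (not a greatR function): stop with an informative
# message if any required column is absent from the input
check_required_cols <- function(data, required = required_cols) {
  missing_cols <- setdiff(required, colnames(data))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_required_cols(toy_data)

# Count replicates per accession and time point
replicate_counts <- aggregate(
  replicate ~ accession + timepoint,
  data = toy_data,
  FUN = length
)
print(replicate_counts)
```

The same check can be applied to any data frame before passing it to register(); the replicate counts give a quick view of how many replicates each accession has at each time point.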

Registering the data

To align gene expression time courses between Arabidopsis Col-0 and B. rapa Ro18, we can use the function register(). By default, the best registration parameters are optimised using the Nelder-Mead method (optimisation_method = "nm"). With the default optimise_registration_parameters = TRUE, the stretch and shift search space is estimated automatically. For more details on the other function parameters, see the documentation for register().

registration_results <- register(
  b_rapa_data,
  reference = "Ro18",
  query = "Col0"
)
#> ── Validating input data ────────────────────────────────────────────────────────
#> ℹ Will process 10 genes.
#>
#> ── Starting registration with optimisation ──────────────────────────────────────
#> ℹ Using Nelder-Mead method.
#> ℹ Using computed stretches and shifts search space limits.
#> ✔ Optimising registration parameters for genes (10/10) [2.3s]

Registration results

The function register() returns a list of two data frames.

To check whether each gene is registered, we can get the summary results by accessing the model_comparison table in the registration result:

registration_results$model_comparison |>
  knitr::kable()

gene_id           stretch   shift       BIC_separate  BIC_combined  registered
BRAA02G018970.3C  3.973246  -15.047106  48.03687      40.36121      TRUE
BRAA02G043220.3C  3.826033  -11.495610  59.00165      50.88363      TRUE
BRAA03G023790.3C  1.664500  8.463653    52.95771      45.92210      TRUE
BRAA03G051930.3C  2.349747  5.664320    54.83773      47.03630      TRUE
BRAA04G005470.3C  3.298411  -2.561531   41.20168      32.84564      TRUE
BRAA05G005370.3C  1.876649  6.580776    52.01824      44.35298      TRUE
BRAA06G025360.3C  3.170889  -5.140066   57.69557      48.83989      TRUE
BRAA07G030470.3C  3.999999  -9.495934   42.17333      34.12138      TRUE
BRAA07G034100.3C  3.999999  -10.460581  41.83850      33.84178      TRUE
BRAA09G045310.3C  3.159727  -1.205799   43.44362      35.07316      TRUE

From the sample data above, we can see that registered = TRUE for all ten genes, meaning that the reference and query data for each of those genes can be aligned, or registered. These output data frames can be further summarised and visualised; see the article on visualising results.
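
As a rough illustration of how the model_comparison table might be summarised by hand, the base R sketch below rebuilds three rows of the table shown above and counts the registered genes. The BIC_diff column is a hypothetical name introduced here. The observation that registered genes have a lower BIC_combined than BIC_separate holds for the values in the table, but the article does not state it as the rule used by register(), so treat it as an assumption.

```r
# Three rows copied from the model_comparison table above (illustration only)
model_comparison <- data.frame(
  gene_id = c("BRAA02G018970.3C", "BRAA02G043220.3C", "BRAA03G023790.3C"),
  stretch = c(3.973246, 3.826033, 1.664500),
  shift = c(-15.047106, -11.495610, 8.463653),
  BIC_separate = c(48.03687, 59.00165, 52.95771),
  BIC_combined = c(40.36121, 50.88363, 45.92210),
  registered = c(TRUE, TRUE, TRUE)
)

# Number and IDs of genes flagged as registered
n_registered <- sum(model_comparison$registered)
registered_genes <- model_comparison$gene_id[model_comparison$registered]

# Hypothetical helper column: a negative value means the combined model
# has the lower BIC for that gene
model_comparison$BIC_diff <- model_comparison$BIC_combined - model_comparison$BIC_separate

print(n_registered)
print(model_comparison[, c("gene_id", "BIC_diff", "registered")])
```

With a real result, the same lines can be run on registration_results$model_comparison in place of the hand-built data frame.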