swaRmverse
provides a pipeline to extract metrics of
collective motion from the trajectories of grouping individuals. Metrics
include either global (group-level) or pairwise (individual-level)
characteristics of the group. After calculating the timeseries of these
metrics, the package estimates their averages over each ‘event’ of
collective motion. More details about how an event is defined are given
below.
We start by adding headings and speeds to the trajectory data, and splitting the whole dataframe into a list of dataframes, one per set. For this, we need to specify whether the data correspond to geo data (lon-lat) or not.
library(swaRmverse)
#data_df <- trackdf::tracks
#raw$set <- c(rep('ctx1', nrow(raw)/2 ), rep('ctx2', nrow(raw)/2))
raw <- read.csv(system.file("extdata/video/01.csv", package = "trackdf"))
raw <- raw[!raw$ignore, ]

## Add fake context
raw$context <- c(rep("ctx1", nrow(raw) / 2), rep("ctx2", nrow(raw) / 2))

data_df <- set_data_format(raw_x = raw$x,
                           raw_y = raw$y,
                           raw_t = raw$frame,
                           raw_id = raw$id,
                           origin = "2020-02-1 12:00:21",
                           period = "0.04S",
                           tz = "America/New_York",
                           raw_context = raw$context
                           )

is_geo <- FALSE
data_dfs <- add_velocities(data_df,
                           geo = is_geo,
                           verbose = TRUE,
                           parallelize = FALSE
                           ) ## A list of dataframes
## Adding velocity info to every set of the dataset..
## Done!
#head(data_dfs[[1]])
print(paste("Velocity information added for", length(data_dfs), "sets."))
## [1] "Velocity information added for 2 sets."
If the dataset contains a large number of sets, the function can be run in parallel by setting the parallelize argument to TRUE. This is not recommended for small to intermediate data sizes.
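To make the idea of "headings and speeds" concrete, here is a minimal base-R sketch for a single individual with planar (non-geo) coordinates. The toy data frame and the column names `head` and `speed` are assumptions made for illustration; this is not necessarily how add_velocities computes or names these quantities internally.

```r
# Hypothetical toy track for one individual (planar x/y, not lon-lat)
track <- data.frame(
  t = c(0, 0.04, 0.08, 0.12),  # seconds
  x = c(0, 1, 2, 2),
  y = c(0, 0, 1, 3)
)

# Displacement and elapsed time between consecutive samples
dx <- diff(track$x)
dy <- diff(track$y)
dt <- diff(track$t)

# Heading (radians, measured from the x axis) and speed (distance per second).
# The first sample has no previous position, so it gets NA (an assumption
# of this sketch).
track$head  <- c(NA, atan2(dy, dx))
track$speed <- c(NA, sqrt(dx^2 + dy^2) / dt)

track
```

For lon-lat data the displacement would have to be computed with a geodesic distance rather than the Euclidean one used here, which is why the geo flag matters.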
Based on the list of positional data and calculated velocities, we can now calculate the timeseries of group polarization, average speed, and shape. As a proxy for group shape we use the angle between the object-oriented bounding box that includes the positions of all group members and the average heading of the group. Small angles close to 0 rad represent oblong groups, while large angles close to pi/2 rad represent wide groups. The group_metrics_per_set function calculates the timeseries of each measurement across sets. To reduce noise, the function further calculates the smoothed timeseries of speed and polarization over a given time window (using a moving average).
sampling_timestep <- 0.04
time_window <- 1 # seconds
smoothing_time_window <- time_window / sampling_timestep

g_metr <- group_metrics_per_set(data_list = data_dfs,
                                mov_av_time_window = smoothing_time_window,
                                step2time = sampling_timestep,
                                geo = is_geo,
                                parallelize = FALSE
                                )
summary(g_metr)
## set t pol
## Length:2802 Min. :2020-02-01 12:00:21.03 Min. :0.01027
## Class :character 1st Qu.:2020-02-01 12:00:49.04 1st Qu.:0.20701
## Mode :character Median :2020-02-01 12:01:17.05 Median :0.32532
## Mean :2020-02-01 12:01:17.03 Mean :0.33785
## 3rd Qu.:2020-02-01 12:01:45.02 3rd Qu.:0.44768
## Max. :2020-02-01 12:02:13.03 Max. :0.97476
## NA's :2
## speed shape N missing_ind
## Min. : 35.42 Min. :0.0002811 Min. :3.000 Min. :0.0000
## 1st Qu.: 132.02 1st Qu.:0.4259205 1st Qu.:7.000 1st Qu.:0.0000
## Median : 175.98 Median :0.8333063 Median :7.000 Median :1.0000
## Mean : 742.80 Mean :0.8132966 Mean :7.291 Mean :0.5543
## 3rd Qu.: 243.84 3rd Qu.:1.1968371 3rd Qu.:8.000 3rd Qu.:1.0000
## Max. :12232.96 Max. :1.5706044 Max. :9.000 Max. :5.0000
## NA's :2 NA's :2 NA's :2
## speed_av pol_av
## Min. : 111.5 Min. :0.1696
## 1st Qu.: 426.3 1st Qu.:0.2852
## Median : 670.1 Median :0.3259
## Mean : 746.7 Mean :0.3379
## 3rd Qu.:1005.7 3rd Qu.:0.3812
## Max. :2241.2 Max. :0.5599
## NA's :50 NA's :50
As before, one can parallelize the function if the data come from many different days/sets. The columns N and missing_ind are also added to the dataframe, showing the group size at each time point and the number of individuals with missing (NA) data.
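As a rough illustration of the polarization and smoothing steps, the sketch below uses base R only. It assumes the common definition of polarization as the length of the mean unit heading vector and a centered moving average. Both are assumptions of this example, not a statement about the package internals, and the helper names `polarization` and `moving_average` are hypothetical.

```r
# Hypothetical headings (radians) of 4 individuals at 5 time steps;
# rows are time steps, columns are individuals.
headings <- rbind(
  c(0,   0,      0,   0),           # perfectly aligned
  c(0,   pi,     0,   pi),          # two opposing pairs
  c(0,   pi / 2, pi,  3 * pi / 2),  # evenly spread
  c(0.1, 0.2,    0.1, 0.2),         # nearly aligned
  c(0,   0,      0,   pi)           # one individual reversed
)

# Polarization: length of the mean unit heading vector
# (1 = fully aligned, 0 = no common direction)
polarization <- function(h) {
  sqrt(mean(cos(h))^2 + mean(sin(h))^2)
}
pol <- apply(headings, 1, polarization)

# Centered moving average over an odd number of time steps;
# positions without a full window are NA.
moving_average <- function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}
pol_av <- moving_average(pol, window = 3)

data.frame(pol = pol, pol_av = pol_av)
```

With the values used in this vignette, a time window of 1 second at a sampling timestep of 0.04 seconds corresponds to a window of 25 samples.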
From the timeseries of positions and velocities, we can calculate information about the nearest neighbor of each group member. Here we estimate the distance and the bearing angle (the angle between the focal individual’s heading and the direction to its neighbor) to the nearest neighbor of each individual. These, along with the id of the nearest neighbor, are added as columns to the positional timeseries dataframe:
data_df <- pairwise_metrics(data_list = data_dfs,
                            geo = is_geo,
                            verbose = TRUE,
                            parallelize = FALSE,
                            add_coords = FALSE # could be set to TRUE if the relative positions of neighbors are needed
                            )
## Pairwise analysis started..
#tail(data_df)
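The nearest-neighbor quantities described above can be sketched for a single time step in base R. The snapshot data and the column names `nn_id`, `nnd` and `bangl` are chosen for illustration only, and the signed bearing angle wrapped to [-pi, pi] is an assumption of this sketch; the package may use a different convention.

```r
# Hypothetical snapshot of 3 individuals at one time step
snap <- data.frame(
  id   = c("a", "b", "c"),
  x    = c(0, 1, 0),
  y    = c(0, 0, 5),
  head = c(0, pi / 2, pi),  # heading in radians, from the x axis
  stringsAsFactors = FALSE
)

# Pairwise Euclidean distances; the diagonal is set to Inf so that
# an individual is never its own nearest neighbor.
d <- as.matrix(dist(snap[, c("x", "y")]))
diag(d) <- Inf
nn <- apply(d, 1, which.min)

snap$nn_id <- snap$id[nn]
snap$nnd   <- d[cbind(seq_len(nrow(snap)), nn)]

# Bearing angle: direction to the neighbor relative to the focal heading,
# wrapped to [-pi, pi] with atan2
to_nn <- atan2(snap$y[nn] - snap$y, snap$x[nn] - snap$x)
snap$bangl <- atan2(sin(to_nn - snap$head), cos(to_nn - snap$head))

snap
```

A bearing angle of 0 means the neighbor is directly ahead of the focal individual, while values near pi/2 in absolute value mean the neighbor is to its side.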
Based on the global and local measurements, we then calculate a series of metrics that aim to capture the dynamics of the group's collective motion. These metrics are calculated over the parts of the trajectories during which the group performs coordinated collective motion, that is, when the group is moving (average speed higher than a given threshold) and is somewhat polarized (polarization higher than a given threshold). These parts are defined as ‘events’. If ‘interactive_mode’ is activated, the quantiles of average speed and polarization across all data are printed and the user is then prompted for the thresholds at run time. Otherwise, the thresholds (pol_lim and speed_lim) should be given as inputs. If both limits are set to 0, each set is taken as one complete event. The time between observations (step2time) is needed as input to distinguish between continuous events. Once the group and pairwise timeseries are calculated, one can calculate the metrics per event:
### Interactive mode, if the limits of speed and polarization are unknown
# new_species_metrics <- col_motion_metrics(data_df,
# global_metrics = g_metr,
# step2time = sampling_timestep,
# verbose = TRUE,
# speed_lim = NA,
# pol_lim = NA
#
# )
new_species_metrics <- col_motion_metrics(data_df,
                                          global_metrics = g_metr,
                                          step2time = sampling_timestep,
                                          verbose = TRUE,
                                          speed_lim = 150,
                                          pol_lim = 0.3
                                          )
# summary(new_species_metrics)
The number of events and their total duration given the input thresholds are also printed. If we are not interested in inspecting the timeseries of the measurements, one can calculate the metrics directly from the formatted dataset:
new_species_metrics <- col_motion_metrics_from_raw(data_df,
                                                   mov_av_time_window = smoothing_time_window,
                                                   step2time = sampling_timestep,
                                                   geo = is_geo,
                                                   verbose = TRUE,
                                                   speed_lim = 150,
                                                   pol_lim = 0.3,
                                                   parallelize_all = FALSE
                                                   )
## Adding velocity info to every set of the dataset..
## Done!
# summary(new_species_metrics)
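The event definition used in both calls (smoothed speed and polarization both above their thresholds, with consecutive qualifying time steps forming one event) can be sketched in base R as follows. The toy timeseries is made up, and the strict inequalities and the way duration is counted are assumptions of this example rather than a description of the package code.

```r
# Hypothetical smoothed group timeseries, one row per time step
g <- data.frame(
  speed_av = c(100, 160, 170, 180, 120, 200, 210, 90),
  pol_av   = c(0.5, 0.4, 0.35, 0.2, 0.6, 0.5, 0.45, 0.7)
)
speed_lim <- 150
pol_lim   <- 0.3
step2time <- 0.04  # seconds between observations

# A time step belongs to an event when both thresholds are exceeded
in_event <- g$speed_av > speed_lim & g$pol_av > pol_lim

# Consecutive TRUE steps form one event: label the runs with rle()
runs   <- rle(in_event)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
g$event <- ifelse(in_event, match(run_id, unique(run_id[in_event])), NA)

# Duration of each event in seconds (number of steps times the timestep)
event_dur <- as.numeric(table(g$event)) * step2time

g
event_dur
```

Raising either threshold splits long events into shorter ones or removes them, so it is worth checking the printed number and total duration of events when choosing speed_lim and pol_lim.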
Since we are interested in comparing different datasets across species or contexts, a new species id column should be added:
new_species_metrics$species <- "new_species_1"

head(new_species_metrics)
## event N set start_time mean_mean_nnd mean_sd_nnd
## 1 1 8 2020-02-01_ctx1 2020-02-01 12:00:21 261.3625 202.00913
## 2 2 8 2020-02-01_ctx1 2020-02-01 12:00:23 188.0053 123.32601
## 3 3 7 2020-02-01_ctx1 2020-02-01 12:00:25 184.0563 72.31109
## 4 4 8 2020-02-01_ctx1 2020-02-01 12:00:26 199.2923 73.48385
## 5 5 8 2020-02-01_ctx1 2020-02-01 12:00:27 156.2709 132.57649
## 6 6 7 2020-02-01_ctx1 2020-02-01 12:00:28 158.0017 96.51129
## sd_mean_nnd mean_pol sd_pol stdv_speed mean_sd_front mean_mean_bangl
## 1 3.542036 0.3400859 0.1434671 1.8786978 0.2875128 1.637330
## 2 24.690932 0.3337442 0.1693607 1.6556076 0.2743454 1.238517
## 3 27.403410 0.3341782 0.1962730 1.8533976 0.3199358 1.778552
## 4 11.876152 0.4152229 0.0000000 0.0000000 0.3290209 1.754450
## 5 4.002438 0.2857535 0.1386557 0.8965682 0.2441024 1.513099
## 6 26.039637 0.4056536 0.1758167 1.9728170 0.2924780 1.693242
## mean_shape sd_shape event_dur species
## 1 0.7355342 0.4642304 1.28 new_species_1
## 2 0.7145053 0.3854631 1.12 new_species_1
## 3 0.9832798 0.3983144 1.08 new_species_1
## 4 1.1002695 0.0000000 0.04 new_species_1
## 5 1.0568951 0.3513711 0.32 new_species_1
## 6 0.9961263 0.3764097 7.80 new_species_1
## Uncomment below to save the output in order to combine it with other datasets (replace 'path2file' with an appropriate local path and name).
# write.csv(new_species_metrics, file = path2file.csv, row.names = FALSE) # OR R object
# save(new_species_metrics, file = path2file.rda)
The duration, starting time and group size (N) of each event are also added to the result dataframe. We suggest filtering out events of very short duration and with fewer than 3 individuals (singletons and pairs). The calculated metrics are: