This vignette demonstrates a workflow for analyzing consumer preference data from decentralized trials of cassava (Manihot esculenta Crantz) varieties in Nigeria and Cameroon. Using the tricot approach [1], 1,000 participants evaluated gari-eba made from 13 cassava genotypes in 2022. The trial was implemented by the International Institute of Tropical Agriculture (IITA) under the RTBFoods project (https://rtbfoods.cirad.fr). Participants assessed overall preference and traits such as color, stretchability, and taste, reflecting diverse consumer priorities.
Building on studies by Olaosebikan et al. (2023) [2] and Emmanuel Alamu et al. (2023) [3], this vignette introduces an alternative workflow. It leverages statistical models, such as the Plackett-Luce model, to analyze overall preference, explore trait-specific performance, and account for consumer heterogeneity. By segmenting the data by groups like country, the analysis uncovers context-specific varietal performance and identifies the best varieties for specific groups.
Additionally, a weighted selection index is proposed to integrate multiple traits, enabling a data-driven approach to ranking varieties. This workflow emphasizes a consumer- and market-oriented approach, offering insights into plant breeding and selection strategies that align with local preferences and environmental contexts. The selection index approach is under development and open for improvements, suggestions and comments.
The cassava
data is a data frame with 1,000 observations
and 27 variables, which are described in the data documentation with
?cassava
. This vignette will require the packages
PlackettLuce [4], ClimMobTools [5], ggplot2 [6],
and patchwork [7].
library("gosset")
library("ClimMobTools")
library("PlackettLuce")
library("ggplot2")
library("patchwork")
data("cassava", package = "gosset")
dat = cassava
head(dat[, 1:11])
## id option_a option_b option_c country gender age
## 1 Osun-pkg1 TMS3 Game Changer TMS6 Nigeria Man 24
## 2 Osun-pkg10 TMS1 TMS3 Obasanjo-2 Nigeria Woman 29
## 3 Osun-pkg100 Game Changer Akpu TMS2 Nigeria Man 41
## 4 Osun-pkg101 TMS6 TMEB2 TMS1 Nigeria Woman 52
## 5 Osun-pkg102 TMEB2 TMS2 Obasanjo-2 Nigeria Woman 40
## 6 Osun-pkg103 Obasanjo-2 Akpu Game Changer Nigeria Woman 23
## consumption consumptionform colour_pos colour_neg
## 1 once a week <NA> C B
## 2 several times a week <NA> A C
## 3 several times a week <NA> A B
## 4 several times a week <NA> A C
## 5 several times a week <NA> C B
## 6 several times a week <NA> A B
Here, we select the columns with complete cases to perform the analysis. The remaining traits (tested in both studies) are color, stretchability, taste, and overall preference.
keep = unlist(lapply(dat[1:ncol(dat)], function(x) sum(is.na(x))))
keep = keep == 0
dat = dat[, keep]
names(dat)
## [1] "id" "option_a" "option_b"
## [4] "option_c" "country" "gender"
## [7] "age" "consumption" "colour_pos"
## [10] "colour_neg" "stretchability_pos" "stretchability_neg"
## [13] "taste_pos" "taste_neg" "overall_pos"
## [16] "overall_neg"
The tricot data, in its original form, has a standard structure for
storing the ranking data in the form of two columns with the trait name
and the value for the best and the worst ranking (e.g., overall_pos,
overall_neg). The function getTraitList()
from the
ClimMobTools
package runs through the columns in the data
to identify this structure and validates the rankings. The output is a
list with the identified rankings, their strings in the dataset, a
vector with the validated rankings, and the trait name. In the cassava
dataset, the pattern for the best and worst rankings is
c("_pos", "_neg")
. We input this to the function
getTraitList()
using the argument
pattern =
.
# extract list of traits from the data
trait_list = getTraitList(dat, pattern = c("_pos", "_neg"))
# trait names extracted from the function
traits = unlist(lapply(trait_list, function(x) x$trait_label))
# clean trait names and put them title case
traits = gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", traits, perl = TRUE)
traits
## [1] "Colour" "Stretchability" "Taste" "Overall"
Here, we perform some data preparation by defining the column indices for the variables with the cassava sample names, the names of varieties tested in the study, and the position of “Overall” within the trait list. We set Obasanjo-2 as the check variety.
pack = c("option_a", "option_b", "option_c")
items = sort(unique(unlist(dat[pack])))
check = "Obasanjo-2"
ov = which(traits %in% "Overall")
The Plackett-Luce model [4] is used to
analyze the data. First, we transform the data into a PlackettLuce
object using the function rank_tricot()
. We use
lapply()
to apply the function to the list of traits.
R = lapply(trait_list, function(x) {
rank_tricot(dat,
items = pack,
input = x$string,
validate.rankings = TRUE)
})
To fit the model, we use the function PlackettLuce()
from the package ‘PlackettLuce’ [4], also
using lapply()
. This returns a list of models, one per
trait. First, we focus on the overall preference. We use the function
plot()
to visualize the Plackett-Luce estimates as
log-worth parameters using the argument
log = TRUE
. The argument levels =
establishes
the order of the varieties in the plot.
mod = lapply(R, PlackettLuce)
plot(mod[[ov]],
ref = check,
log = TRUE,
levels = rev(items))
The plot shows that the varieties TMS6, Sape and TMEB1 outperform
Obasanjo-2 for overall preference. We use the function
reliability()
to identify the improvement achieved by these
varieties against reference Obasanjo-2 [8]. The function returns a data frame with the
reliability estimates, where values above 0.5 indicates that the variety
outperforms the reference. We plot these results using
ggplot()
.
rel = reliability(mod[[ov]], ref = check)
rel$improvement = round((rel$reliability / 0.5 - 1), 2)
rel = rel[order(rel$improvement), ]
rel$item = factor(rel$item, levels = rel$item)
ggplot(data = rel,
aes(x = improvement,
y = item,
fill = "white")) +
geom_bar(stat = "identity",
width = 0.7,
position = "dodge",
show.legend = FALSE) +
scale_fill_manual(values = "#5aae61") +
geom_vline(xintercept = 0,
colour = "grey40",
linewidth = 1) +
theme_classic() +
labs(x = "Probability of outperforming the check",
y = "")
The reliability estimates show an improvement in overall preference of 32% for TMS6, 18% for Sape and 8% for TMEB1.
So far, we have focused solely on the overall preference. However,
there are other traits to assess. The function worth_plot()
can be used to visually analyze and compare variety performance across
different traits. The values represented in a worth map are
log-worth estimates. The function can be used with
ggplot2
functions to improve the plot.
worth_map(mod,
labels = traits,
labels.order = c("Overall", "Taste", "Stretchability", "Colour"))
The worth map confirms the superiority of TMS6, Sape and TMEB1 across the traits, but also presents Madame among the top varieties for color. We will return to this analysis later, but for now, let us examine the data from the perspective of different groups to consider heterogeneity in the participants’ evaluations, as proposed by van Etten et al. (2023) [9].
We will focus on two covariates to analyze the data: gender and
country. The function likelihood_ratio()
is used to test
which of these covariates can provide distinguishable rankings within
the groups.
# by gender
llr1 = lapply(R, function(x){
likelihood_ratio(x, split = dat$gender)
})
llr1 = do.call("rbind", llr1)
llr1$trait = traits
llr1
## deviance DF_delta Pr(>Chisq) trait
## <dbl> <dbl> <dbl> <chr> <chr>
## 1: 11.7619 12.0000 0.4650 Colour
## 2: 9.8536 12.0000 0.6288 Stretchability
## 3: 7.4512 12.0000 0.8264 Taste
## 4: 8.6690 12.0000 0.7309 Overall
# by country
llr2 = lapply(R, function(x){
likelihood_ratio(x, split = dat$country)
})
llr2 = do.call("rbind", llr2)
llr2$trait = traits
llr2
## deviance DF_delta Pr(>Chisq) trait
## <dbl> <dbl> <dbl> <chr> <chr>
## 1: 69.3164 12.0000 0.0000 *** Colour
## 2: 24.8679 12.0000 0.0155 * Stretchability
## 3: 26.9415 12.0000 0.0079 ** Taste
## 4: 38.2459 12.0000 0.0001 *** Overall
The likelihood-ratio test indicates that sub-setting the data by country provides statistically different rankings. This test can also be used to validate more complex groupings, such as those generated by cluster analysis or farmers’ typologies, which include a diverse set of variables (e.g., socio-economic and agroecological), as proposed by Voss et al. (2024) [10]. For simplicity, we focus here only on the variable “country.”
Now, returning to the worth map analysis, we can visualize the
performance of the varieties within the groups. We iterate over the
groups by sub-setting the data in a loop. Within the loop, we create the
plots and store them in a list. Finally, we use the
patchwork
package to display the two worth maps.
# get the slice variable as a vector
slice = dat$country
# and get the unique values
slice_lvs = unique(slice)
trait_plot = list()
# order of varieties from best to worst in the full dataset
items_lvls = rev(names(sort(rank(coef(mod[[ov]], log = FALSE) * -1))))
for (i in seq_along(slice_lvs)) {
# fit the model also applying the slice
mod_i = lapply(R, function(x) {
PlackettLuce(x[slice == slice_lvs[i], ])
})
# plot the worth map
trait_plot[[i]] = worth_map(mod_i,
labels = traits,
labels.order = c("Overall", "Taste", "Stretchability", "Colour"),
items.order = items_lvls)
}
# plot the two maps using patchwork
trait_plot[[1]] + trait_plot[[2]] + plot_layout(ncol = 1)