The BEBOP phase II trial methodology incorporates predictive baseline
information to study co-primary efficacy and toxicity outcomes. BEBOP
stands for Bayesian Evaluation of Bivariate Binary Outcomes and
Predictive Information. It was developed for the PePS2 trial,
investigating pembrolizumab in non-small-cell lung cancer (NSCLC)
patients with performance status 2 (PS2). A major factor in the PePS2
trial is the data on PS0 / 1 NSCLC patients published by Garon et al. (2015), showing that patients with
greater PD-L1 tumour proportion scores are more likely to achieve an
objective response. They introduce this predictive biomarker and
validate a three-level categorisation Low, Medium and High PD-L1 score.
There was also a suggestion (albeit not shown to be *statistically
significant*) that previously untreated patients are more likely to
have a response. These same variables will likely be predictive of
response in the PS2 population. Brock et al. (publication in submission)
developed BEBOP to make efficient use of this information.

BEBOP re-uses the probability model at the core of the EffTox design (Thall and Cook 2004; Thall et al. 2014). Let \(x\) represent a vector of predictive baseline variables and \(\boldsymbol{\theta}\) a vector of parameters. The marginal probabilities of efficacy and toxicity are estimated using general functions \(\text{logit } \pi_E(x, \boldsymbol{\theta})\) and \(\text{logit } \pi_T(x, \boldsymbol{\theta})\) to be specified by the user.

In PePS2, we have \(x_{i1} = 1\) if patient \(i\) has been previously treated and \(x_{i1} = 0\) otherwise. To convey PD-L1 expression, the authors use \(x_{i2} = 1\) and \(x_{i3} = 0\) if a patient has a Low PD-L1 score; \(x_{i2} = 0\) and \(x_{i3} = 1\) if a patient has a Medium PD-L1 score; and \(x_{i2} = 0\) and \(x_{i3} = 0\) if a patient has a High PD-L1 score. Thus, we have \(x_i = (x_{i1}, x_{i2}, x_{i3})\). The marginal efficacy and toxicity functions in PePS2 take the form

\(\text{logit } \pi_E(x_i, \boldsymbol{\theta}) = \alpha + \beta x_{i1} + \gamma x_{i2} + \zeta x_{i3}\)

\(\text{logit } \pi_T(x_i, \boldsymbol{\theta}) = \lambda\)

As with EffTox, let \((Y_j, Z_j)\) be random variables, each taking values in \(\{0, 1\}\), representing the presence of efficacy and toxicity in patient \(j\). The efficacy and toxicity events are associated by the joint probability function

\(Pr(Y = a, Z = b) = \pi_{a,b}(\pi_E, \pi_T) = (\pi_E)^a (1-\pi_E)^{1-a} (\pi_T)^b (1-\pi_T)^{1-b} + (-1)^{a+b} (\pi_E) (1-\pi_E) (\pi_T) (1-\pi_T) \frac{e^\psi-1}{e^\psi+1}\).
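The joint probability function above can be checked numerically. The following is a minimal base R sketch, not code from `trialr`; the function name `joint_prob` and the values of `pi_e`, `pi_t` and `psi` are illustrative assumptions. It shows that the four cell probabilities sum to one and recover the marginal rates, and that \(\psi = 0\) corresponds to independent outcomes.

```r
# Joint probability of (efficacy = a, toxicity = b), given the marginal
# probabilities pi_e and pi_t and the association parameter psi.
# joint_prob is a hypothetical helper written for this illustration.
joint_prob <- function(a, b, pi_e, pi_t, psi) {
  independent <- pi_e^a * (1 - pi_e)^(1 - a) * pi_t^b * (1 - pi_t)^(1 - b)
  association <- (-1)^(a + b) * pi_e * (1 - pi_e) * pi_t * (1 - pi_t) *
    (exp(psi) - 1) / (exp(psi) + 1)
  independent + association
}

# Illustrative values only; these are not PePS2 estimates.
pi_e <- 0.3
pi_t <- 0.1
psi <- 0.5

# Evaluate all four (a, b) combinations
cells <- expand.grid(a = 0:1, b = 0:1)
cells$prob <- with(cells, joint_prob(a, b, pi_e, pi_t, psi))
print(cells)

# The cells sum to 1, and summing over one outcome recovers the other marginal
total <- sum(cells$prob)
marginal_eff <- sum(cells$prob[cells$a == 1])
marginal_tox <- sum(cells$prob[cells$b == 1])
```

The association term changes sign with \(a + b\), so it cancels when summing over either outcome. This is why \(\psi\) alters the dependence between efficacy and toxicity without changing \(\pi_E\) or \(\pi_T\).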

The complete vector of parameters is \(\boldsymbol{\theta} = (\alpha, \beta, \gamma, \zeta, \lambda, \psi)\). Normal priors are specified for the elements of \(\boldsymbol{\theta}\).

The treatment is acceptable for patients with predictive vector \(x\) if

\(\text{Pr}\left\{ \pi_E(x, \boldsymbol{\theta}) > \underline{\pi}_E | \mathcal{D} \right\} > p_E\)

and

\(\text{Pr}\left\{ \pi_T(x, \boldsymbol{\theta}) < \overline{\pi}_T | \mathcal{D} \right\} > p_T\)

where \(\underline{\pi}_E, \overline{\pi}_T, p_E, p_T\) are chosen for clinical relevance.
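With posterior samples in hand, the two acceptability criteria reduce to proportions of draws. The sketch below uses base R only and assumes stand-in Beta draws in place of a real posterior. The helper `accept_treatment` is hypothetical, the 10% efficacy and 30% toxicity thresholds are those quoted elsewhere in this document, and the certainty levels `p_e = 0.7` and `p_t = 0.9` are illustrative assumptions rather than values stated here.

```r
set.seed(1)

# Stand-in posterior draws for one cohort. In practice these would be
# extracted from the fitted model; the Beta shapes here are arbitrary.
pi_e_draws <- rbeta(4000, 5, 15)
pi_t_draws <- rbeta(4000, 2, 18)

# Hypothetical helper: apply both acceptability criteria to a set of draws.
accept_treatment <- function(pi_e_draws, pi_t_draws,
                             eff_threshold, tox_threshold, p_e, p_t) {
  # Posterior probability that efficacy exceeds its lower threshold
  prob_acc_eff <- mean(pi_e_draws > eff_threshold)
  # Posterior probability that toxicity is below its upper threshold
  prob_acc_tox <- mean(pi_t_draws < tox_threshold)
  list(prob_acc_eff = prob_acc_eff,
       prob_acc_tox = prob_acc_tox,
       accept = prob_acc_eff > p_e && prob_acc_tox > p_t)
}

res <- accept_treatment(pi_e_draws, pi_t_draws,
                        eff_threshold = 0.1, tox_threshold = 0.3,
                        p_e = 0.7, p_t = 0.9)  # p_e, p_t are assumed values
str(res)
```

Both criteria must hold for the treatment to be acceptable, so a cohort with convincing efficacy can still be rejected if the toxicity criterion fails.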

PePS2 is an all-comers trial, thus patients are admitted regardless of their PD-L1 or pre-treatment status. This is motivated by the dearth of treatment options for PS2 NSCLC patients who cannot use chemotherapy. The design allows the predictive information to effectively stratify the analysis without stratifying recruitment. The statistical design uses the common Bayesian tool of borrowing strength across groups to improve the performance of the analysis.

`trialr`

The cohorts in the PePS2 trial are

i | Pretreated | PDL1 | x1 | x2 | x3 |
---|---|---|---|---|---|
1 | FALSE | Low | 0 | 1 | 0 |
2 | FALSE | Medium | 0 | 0 | 1 |
3 | FALSE | High | 0 | 0 | 0 |
4 | TRUE | Low | 1 | 1 | 0 |
5 | TRUE | Medium | 1 | 0 | 1 |
6 | TRUE | High | 1 | 0 | 0 |
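The cohort coding in this table can be reproduced in base R, which also shows how the linear predictor for efficacy is assembled in each cohort. This is a sketch rather than `trialr` code. The value of `alpha` is the prior mean for \(\alpha\) reported in this document, while `beta`, `gamma` and `zeta` are hypothetical values chosen only for illustration.

```r
# Enumerate the six cohorts; expand.grid varies the first factor fastest,
# which matches the ordering of the table.
cohorts <- expand.grid(PDL1 = c("Low", "Medium", "High"),
                       Pretreated = c(FALSE, TRUE),
                       stringsAsFactors = FALSE)
cohorts$i <- seq_len(nrow(cohorts))

# Indicator coding: High PD-L1 and not-pretreated are the reference levels
cohorts$x1 <- as.integer(cohorts$Pretreated)
cohorts$x2 <- as.integer(cohorts$PDL1 == "Low")
cohorts$x3 <- as.integer(cohorts$PDL1 == "Medium")

# Illustrative parameter values (beta, gamma, zeta are assumptions)
alpha <- -2.2
beta <- -0.5
gamma <- -1.0
zeta <- -0.5

# Efficacy probability per cohort via the inverse logit
cohorts$prob_eff <- with(cohorts,
                         plogis(alpha + beta * x1 + gamma * x2 + zeta * x3))
print(cohorts[, c("i", "Pretreated", "PDL1", "x1", "x2", "x3", "prob_eff")])
```

Because cohort 3 has \(x = (0, 0, 0)\), its efficacy rate depends on \(\alpha\) alone; the other parameters act as log-odds shifts relative to that cohort.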

The trial uses a sample size of 60. Let us simulate a set of outcomes with the following efficacy and toxicity rates

```
library(trialr)

# Scenario: draw one simulated trial of 60 patients across the six cohorts
peps2_sc <- function() peps2_get_data(num_patients = 60,
                                      prob_eff = c(0.167, 0.192, 0.5, 0.091, 0.156, 0.439),
                                      prob_tox = rep(0.1, 6),
                                      eff_tox_or = rep(1, 6))
set.seed(123)
dat <- peps2_sc()
```

In this example, we have used efficacy rates that increase with PD-L1 score and are slightly higher in previously-untreated patients. We use a uniform toxicity rate of 10% across all cohorts, and no association between efficacy and toxicity events, represented by odds ratios equal to 1.

The `dat` object contains, for example, the prior parameters

`c(dat$alpha_mean, dat$alpha_sd)`

`## [1] -2.2 2.0`

and simulated predictive variables and efficacy and toxicity outcomes

```
# Show the first ten simulated patients
knitr::kable(
  head(with(dat, data.frame(eff, tox, x1, x2, x3)), 10)
)
```

eff | tox | x1 | x2 | x3 |
---|---|---|---|---|
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |

We fit the data to the BEBOP model and obtain samples from the
posterior distribution using `rstan`. The `BebopInPeps2` model is
provided by `trialr` and compiled when the package is installed.

`fit <- stan_peps2(dat$eff, dat$tox, dat$cohorts)`

It is informative to view plots of posterior beliefs. Posterior samples of parameter values are available but they are less meaningful to us than the modelled efficacy rates, for example. View posterior distributions using code like

`rstan::plot(fit, pars = 'prob_eff')`

We see that the modelled rates of efficacy are highest in cohorts 3 and 6, but likely to be greater than the critical threshold of 10% in cohort 2 and perhaps cohort 5 as well. In contrast, there is not much evidence of clinical benefit in cohorts 1 and 4. This is confirmed by invoking the formally described analysis and associated decision rules.

```
# Apply the decision rules to the posterior samples
decision <- peps2_process(fit)
knitr::kable(
  with(decision, data.frame(ProbEff, ProbAccEff, ProbTox, ProbAccTox, Accept)),
  digits = 3
)
```

ProbEff | ProbAccEff | ProbTox | ProbAccTox | Accept |
---|---|---|---|---|
0.073 | 0.254 | 0.068 | 1 | FALSE |
0.249 | 0.979 | 0.068 | 1 | TRUE |
0.424 | 0.992 | 0.068 | 1 | TRUE |
0.040 | 0.093 | 0.068 | 1 | FALSE |
0.150 | 0.742 | 0.068 | 1 | TRUE |
0.282 | 0.948 | 0.068 | 1 | TRUE |

ProbAccEff is the posterior probability that the efficacy rate in a cohort is greater than the 10% threshold. ProbAccTox is the probability that toxicity is less than 30%. The treatment is acceptable in a cohort if it is sufficiently efficacious and non-toxic. We see that in this simulated iteration, the treatment would be approved in all cohorts except 1 and 4.

We can perform the simulations on a greater number of iterations to learn about the operating characteristics of the design.

```
set.seed(123)
run_sims <- function(num_sims = 10, sample_data_func = peps2_sc,
                     summarise_func = peps2_process, ...) {
  sims <- list()
  for(i in 1:num_sims) {
    print(i)
    # Simulate one trial, fit the model, then apply the decision rules
    dat <- sample_data_func()
    fit <- stan_peps2(dat$eff, dat$tox, dat$cohorts, ...)
    sim <- summarise_func(fit)
    sims[[i]] <- sim
  }
  return(sims)
}
sims <- run_sims(num_sims = 10, sample_data_func = peps2_sc,
                 summarise_func = peps2_process)
```

In `run_sims`, the second and third arguments are delegates that
simulate trial outcomes and post-process the `rstan` sample,
respectively. The outcome sampling delegate is called without arguments.
The post-processing delegate is called with a single argument, the model
fit returned by `stan_peps2` (e.g. `fit` above). The objects returned by
the post-processing delegate form the items in the `sims` list that
`run_sims` returns to the user.

Thus, the probability of approving the treatment in each cohort under the statistical design in this scenario can be calculated using

`apply(sapply(sims, function(x) x$Accept), 1, mean)`
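To make the shape of this calculation concrete, the sketch below runs the same aggregation on a small mock list, with no model fitting involved. The object `mock_sims` and its `Accept` values are invented for illustration; they assume only that each simulated result carries a logical `Accept` vector with one entry per cohort, as in the decision table shown earlier.

```r
# Mock results standing in for the list returned by the simulation loop.
# Each element holds one logical Accept vector, one entry per cohort.
mock_sims <- list(
  list(Accept = c(FALSE, TRUE,  TRUE, FALSE, TRUE,  TRUE)),
  list(Accept = c(FALSE, TRUE,  TRUE, FALSE, FALSE, TRUE)),
  list(Accept = c(TRUE,  TRUE,  TRUE, FALSE, TRUE,  TRUE)),
  list(Accept = c(FALSE, FALSE, TRUE, FALSE, TRUE,  TRUE))
)

# sapply binds the vectors column-wise: rows are cohorts, columns iterations
accept_matrix <- sapply(mock_sims, function(x) x$Accept)
dim(accept_matrix)

# Averaging across each row gives the approval proportion per cohort
approval_prob <- apply(accept_matrix, 1, mean)
approval_prob
```

With only a handful of iterations these proportions are coarse, so a much larger `num_sims` would be needed for stable estimates of the operating characteristics.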

`trialr` is available at https://github.com/brockk/trialr and https://CRAN.R-project.org/package=trialr

Garon, Edward B, Naiyer A Rizvi, Rina Hui, Natasha Leighl, Ani S
Balmanoukian, Joseph Paul Eder, Amita Patnaik, et al. 2015.
“Pembrolizumab for the Treatment of Non-Small-Cell Lung
Cancer.” *The New England Journal of Medicine* 372 (21):
2018–28. https://doi.org/10.1056/NEJMoa1501824.

Thall, PF, and JD Cook. 2004. “Dose-Finding Based on
Efficacy-Toxicity Trade-Offs.” *Biometrics* 60 (3): 684–93.

Thall, PF, RC Herrick, HQ Nguyen, JJ Venier, and JC Norris. 2014.
“Effective Sample Size for Computing Prior Hyperparameters in
Bayesian Phase I-II Dose-Finding.” *Clinical Trials* 11 (6):
657–66. https://doi.org/10.1177/1740774514547397.