SCRIP simulation for scRNA-seq data

Fei Qin

Last updated: 11/16/2021

1. Introduction to SCRIP method

SCRIP provides two frameworks for simulating scRNA-seq data, based on the Gamma-Poisson and Beta-Gamma-Poisson distributions. Both distributions model the overdispersion of scRNA-seq data; the Beta-Gamma-Poisson model additionally models the bursting effect. The dispersion is accurately simulated by fitting the dependency between the mean and the biological coefficient of variation (BCV) with a generalized additive model (GAM). Other key characteristics of scRNA-seq data, including library size, zero inflation and outliers, are also modeled by SCRIP. With its flexible modeling, SCRIP supports various applications for different experimental designs and goals, including DE analysis, clustering analysis, trajectory-based analysis and bursting analysis.

2. Installation

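# splatter (Bioconductor) provides splatEstimate(), which is used below
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("splatter")

# Install SCRIP from GitHub
library(devtools)
install_github("thecailab/SCRIP")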

3. Quick start

Assuming you already have a count matrix of scRNA-seq data and want to simulate data based on it, only a few steps are needed to create a simulated dataset with SCRIP.

A dataset from the Xin data is used as an example.

library(splatter)
library(SCRIP)
 
data(acinar.data)
params <- splatEstimate(acinar.data)
## $start.arg
## $start.arg$shape
## [1] 0.833088
## 
## $start.arg$rate
## [1] 0.09357466
## 
## 
## $fix.arg
## NULL
## 
## $start.arg
## $start.arg$meanlog
## [1] 9.415808
## 
## $start.arg$sdlog
## [1] 1.034692
## 
## 
## $fix.arg
## NULL
## 
## $start.arg
## $start.arg$meanlog
## [1] 4.719586
## 
## $start.arg$sdlog
## [1] 0.7954047
## 
## 
## $fix.arg
## NULL
sim_trend <-  SCRIPsimu(data=acinar.data, params=params, mode="GP-trendedBCV")
sim_trend
## class: SingleCellExperiment 
## dim: 1000 80 
## metadata(13): Params method ... batch.facScale bcv.shrink
## assays(5): BatchCellMeans BaseCellMeans CellMeans TrueCounts counts
## rownames(1000): Gene1 Gene2 ... Gene999 Gene1000
## rowData names(4): Gene BaseGeneMean OutlierFactor GeneMean
## colnames(80): Cell1 Cell2 ... Cell79 Cell80
## colData names(3): Cell Batch ExpLibSize
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

4. Single cell type simulation

4.1 Parameter estimation

SCRIP uses the estimation strategy from splatter, but also provides additional parameters (fold change, dropout rates, library size, BCV degrees of freedom) to serve different experimental designs (e.g. simulation for differential expression analysis, clustering analysis and trajectory analysis). The other parameters are described in detail in later sections of this document.

4.2 Simulation

The default simulation mode in SCRIP is "GP-trendedBCV". You can also choose other modes ("GP-commonBCV", "BGP-commonBCV", "BP", "BGP-trendedBCV") in the SCRIPsimu() function. For single cell type simulation, method has to be set to "single", which is the default in the SCRIPsimu() function.

4.2.1 GP-commonBCV

GP-commonBCV is the model used by splatter. It applies the Gamma-Poisson mixture model, with the mean-BCV dependency fitted by a single common BCV across genes.

## class: SingleCellExperiment 
## dim: 1000 80 
## metadata(13): Params method ... batch.facScale bcv.shrink
## assays(5): BatchCellMeans BaseCellMeans CellMeans TrueCounts counts
## rownames(1000): Gene1 Gene2 ... Gene999 Gene1000
## rowData names(4): Gene BaseGeneMean OutlierFactor GeneMean
## colnames(80): Cell1 Cell2 ... Cell79 Cell80
## colData names(3): Cell Batch ExpLibSize
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
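
Following the quick-start pattern, this mode would presumably be selected with SCRIPsimu(data=acinar.data, params=params, mode="GP-commonBCV"). To illustrate what a Gamma-Poisson model with a common BCV means, here is a minimal base R sketch. It is not SCRIP's implementation: the gene means, the BCV value and all variable names are assumptions chosen for illustration.

```r
# Minimal Gamma-Poisson sketch with one common BCV (base R only).
# All numeric values are illustrative assumptions, not SCRIP defaults.
set.seed(1)
n_genes <- 200
n_cells <- 80
gene_means <- rlnorm(n_genes, meanlog = 1, sdlog = 1)  # hypothetical gene means
bcv_common <- 0.4                                      # same BCV for every gene

# Gamma layer: mean = gene mean, coefficient of variation = BCV,
# i.e. shape = 1 / bcv^2 and scale = mean * bcv^2.
cell_means <- matrix(
  rgamma(n_genes * n_cells,
         shape = 1 / bcv_common^2,
         scale = rep(gene_means, n_cells) * bcv_common^2),
  nrow = n_genes, ncol = n_cells
)

# Poisson layer: observed counts given the cell-specific means.
counts <- matrix(rpois(n_genes * n_cells, lambda = cell_means), nrow = n_genes)

# Overdispersion: per-gene variance exceeds the per-gene mean on average.
gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)
c(mean = mean(gene_mean), variance = mean(gene_var))
```

Under this parameterization the variance of a gene with mean mu is mu + bcv^2 * mu^2, so a larger BCV gives stronger overdispersion.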

4.2.3 BP

BP is the model used for simulating the bursting effect with a Beta-Poisson mixture distribution, without considering the BCV effect.

## $start.arg
## $start.arg$shape1
## [1] 0.1792874
## 
## $start.arg$shape2
## [1] 1.938204
## 
## 
## $fix.arg
## NULL
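
The idea behind a Beta-Poisson mixture can be sketched in a few lines of base R. This is only an illustration under assumed values: the shape parameters and the scale are hypothetical and are not taken from SCRIP or from the fit shown above.

```r
# Minimal Beta-Poisson sketch for one gene (base R only).
# shape1, shape2 and scale_rate are hypothetical illustration values.
set.seed(2)
n_cells <- 2000
shape1 <- 0.2      # small shape1: many cells are close to the "off" state
shape2 <- 2
scale_rate <- 50   # expression level when the gene is fully "on"

# Beta layer: fraction of the maximal rate that is active in each cell.
burst_frac <- rbeta(n_cells, shape1, shape2)

# Poisson layer: counts given the cell-specific rate.
bp_counts <- rpois(n_cells, lambda = scale_rate * burst_frac)

# Reference: a plain Poisson with the same average rate.
pois_counts <- rpois(n_cells, lambda = mean(scale_rate * burst_frac))

zero_frac_bp <- mean(bp_counts == 0)
zero_frac_pois <- mean(pois_counts == 0)
c(beta_poisson = zero_frac_bp, poisson = zero_frac_pois)
```

With these settings the Beta-Poisson counts contain far more zeros than a Poisson with the same mean, which is the bursting signature this mode is meant to reproduce.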

4.2.4 BGP-commonBCV

BGP-commonBCV is the model used for simulating the bursting effect with a Beta-Gamma-Poisson mixture distribution. The mean-BCV dependency is fitted by a single common BCV across genes.

## $start.arg
## $start.arg$shape1
## [1] 0.1792874
## 
## $start.arg$shape2
## [1] 1.938204
## 
## 
## $fix.arg
## NULL

4.2.5 BGP-trendedBCV

BGP-trendedBCV is the model used for simulating the bursting effect with a Beta-Gamma-Poisson mixture distribution. The mean-BCV dependency is fitted by a GAM.

## $start.arg
## $start.arg$shape1
## [1] 0.1792874
## 
## $start.arg$shape2
## [1] 1.938204
## 
## 
## $fix.arg
## NULL
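
One plausible reading of a Beta-Gamma-Poisson model with a trended BCV is sketched below in base R. The exact parameterization in SCRIP may differ. The trend function stands in for a fitted GAM and is purely hypothetical, as are the shape parameters and the rescaling of the Beta factor to mean 1.

```r
# Hedged Beta-Gamma-Poisson sketch with a mean-dependent BCV (base R only).
# Not SCRIP's implementation; every value and helper here is an assumption.
set.seed(3)
n_genes <- 100
n_cells <- 80
gene_means <- rlnorm(n_genes, meanlog = 2, sdlog = 1)  # hypothetical gene means

# Hypothetical decreasing mean-BCV trend, standing in for a GAM fit.
bcv_trend <- function(mu) 0.2 + 0.5 / sqrt(mu)
gene_bcv <- bcv_trend(gene_means)

# Gamma layer: gene-specific BCV instead of one common value.
mean_vec <- rep(gene_means, n_cells)
bcv_vec <- rep(gene_bcv, n_cells)
gamma_level <- rgamma(n_genes * n_cells,
                      shape = 1 / bcv_vec^2,
                      scale = mean_vec * bcv_vec^2)

# Beta layer: burst fraction, rescaled to mean 1 so gene means are preserved.
shape1 <- 2
shape2 <- 2
burst <- rbeta(n_genes * n_cells, shape1, shape2) / (shape1 / (shape1 + shape2))

# Poisson layer on the product of both factors.
bgp_counts <- matrix(rpois(n_genes * n_cells, lambda = gamma_level * burst),
                     nrow = n_genes, ncol = n_cells)
dim(bgp_counts)
```

Replacing bcv_trend() with a constant function gives the common-BCV variant of the same sketch.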

5. Group simulation

Group simulation is useful for studying different experimental conditions, especially for differential expression (DE) analysis. To serve different applications in scRNA-seq analysis, SCRIP provides flexible simulation. It can simulate scRNA-seq data with different parameters for multiple cell groups (i.e. cell types), which is useful for evaluating methods that detect global characteristics, such as clustering. It also allows simulation of group differences within a single cell group, which is useful for evaluating typical DE analysis methods.

5.1 Basic group simulation

DE genes (DEGs) are simulated using multiplicative differential expression factors drawn from a log-normal distribution. The relevant parameters include the number of genes (nGenes), the path-specific proportion of DE genes (de.prob), the proportion of down-regulated DE genes (de.downProb), the DE location factor (de.facLoc) and the DE scale factor (de.facScale).
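
How these parameters can interact is sketched below in base R. This is an illustration in the spirit of the description above, not the code used by SCRIP or splatter. The parameter values, the folding of factors to be at least 1 and the base means are assumptions.

```r
# Hedged sketch of log-normal DE factors (base R only).
# Parameter names follow the text; all values are illustrative assumptions.
set.seed(4)
nGenes <- 1000
de.prob <- 0.1       # proportion of DE genes
de.downProb <- 0.5   # proportion of DE genes that are down-regulated
de.facLoc <- 0.1     # location (meanlog) of the log-normal
de.facScale <- 0.4   # scale (sdlog) of the log-normal

# Step 1: choose which genes are DE.
is_de <- as.logical(rbinom(nGenes, 1, de.prob))
n_de <- sum(is_de)

# Step 2: draw fold changes and fold them so that every factor is >= 1.
raw_fac <- rlnorm(n_de, meanlog = de.facLoc, sdlog = de.facScale)
raw_fac <- ifelse(raw_fac < 1, 1 / raw_fac, raw_fac)

# Step 3: invert the factors of the down-regulated genes.
is_down <- as.logical(rbinom(n_de, 1, de.downProb))
de_fac <- rep(1, nGenes)                       # non-DE genes keep factor 1
de_fac[is_de] <- ifelse(is_down, 1 / raw_fac, raw_fac)

# Step 4: apply the factors to hypothetical base gene means.
base_means <- rgamma(nGenes, shape = 0.6, rate = 0.3)
group_means <- base_means * de_fac
summary(de_fac[is_de])
```

Genes with a factor of exactly 1 are unchanged between groups, while factors above or below 1 mark up- and down-regulated genes respectively.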

5.2 Group simulation with batch effect

Batch effect factors are also generated from a log-normal distribution with parameters including batchCells, batch.facLoc and batch.facScale.

batchCells: number of cells in each batch  
batch.facLoc: location parameter of the log-normal distribution for batch factors  
batch.facScale: scale parameter of the log-normal distribution for batch factors
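
The role of these three parameters can be sketched in base R as follows. This is a simplified illustration, not SCRIP's implementation: the parameter values and the gene means are assumptions.

```r
# Hedged sketch of log-normal batch factors (base R only).
# Parameter names follow the text; all values are illustrative assumptions.
set.seed(5)
nGenes <- 500
batchCells <- c(40, 40)   # two batches of 40 cells each
batch.facLoc <- 0.1
batch.facScale <- 0.1
n_batches <- length(batchCells)

# One multiplicative factor per gene and batch.
batch_fac <- matrix(
  rlnorm(nGenes * n_batches, meanlog = batch.facLoc, sdlog = batch.facScale),
  nrow = nGenes, ncol = n_batches
)

# Batch label of every cell, e.g. 1, ..., 1, 2, ..., 2.
batch_id <- rep(seq_len(n_batches), times = batchCells)

# Hypothetical gene means, scaled by the factor of each cell's batch.
gene_means <- rgamma(nGenes, shape = 0.6, rate = 0.3)
batch_cell_means <- gene_means * batch_fac[, batch_id]  # genes x cells
dim(batch_cell_means)
```

Cells from the same batch share one factor per gene, so a larger batch.facScale pushes the batches further apart.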