GPUs are great resources for data analysis, especially in statistics and linear algebra. Unfortunately, very few packages connect R to the GPU, and none of them are transparent enough to run the computations on the GPU without substantial changes to the code. The maintenance of these packages is cumbersome: several of the earlier attempts have been removed from their respective repositories. It would be desirable to have a properly maintained R package that takes advantage of the GPU with minimal changes to the existing code.

We have developed the GPUmatrix package (available on CRAN). GPUmatrix mimics the behavior of the Matrix package and extends R to use the GPU for computations. It includes single (FP32) and double (FP64) precision data types, and provides support for sparse matrices. It is easy to learn, and requires very few code changes to perform the operations on the GPU. GPUmatrix relies on either the torch or tensorflow R packages to perform the GPU operations.

We have demonstrated its usefulness for several statistical and machine learning applications: non-negative matrix factorization, logistic regression and general linear models. We have also included a comparison of GPU and CPU performance on different matrix operations.

Before starting, please be advised that this R package is designed to have the lowest learning curve for the R user to perform algebraic operations using the GPU. Therefore, this tutorial will mostly cover procedures that will go beyond the operations that the user can already perform with R’s CPU matrices.

GPUmatrix is an R package that utilizes tensors through the
**torch** or **tensorflow** packages (see the
Advanced Users section for more information). One or the other must be
installed to use GPUmatrix. Both packages are hosted on CRAN and
have specific installation instructions. In both cases, it is necessary
to have an NVIDIA® GPU card with the latest drivers installed in order
to use the GPU versions of the packages, as well as a version of Python 3.
The NVIDIA card must be compatible; please see the list of capable cards here.
If there is no compatible graphics card, or no graphics card at all, you can
still install TensorFlow and torch, but only in their CPU versions, which means
that GPUmatrix will only be able to run in CPU mode.
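
Before installing GPUmatrix, it can help to check which backend is present and whether a GPU is visible. The following is a minimal sketch, assuming the torch backend: `torch_is_installed()` and `cuda_is_available()` are functions of the torch R package, while the variable names are purely illustrative.

```r
# Check which backend packages are installed (without attaching them).
has_torch <- requireNamespace("torch", quietly = TRUE)
has_tensorflow <- requireNamespace("tensorflow", quietly = TRUE)

# With torch, ask whether the libtorch binaries are present and CUDA is usable.
# The && chain short-circuits, so torch is never called when it is missing.
cuda_ok <- has_torch && torch::torch_is_installed() && torch::cuda_is_available()

mode <- if (cuda_ok) "GPU" else "CPU"
message("torch: ", has_torch, ", tensorflow: ", has_tensorflow,
        ", expected mode with torch: ", mode)
```

If `mode` comes out as "CPU" on a machine with an NVIDIA card, the drivers or the CUDA-enabled build of the backend are the first things to check.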

The installation of TensorFlow allows the selection to install the
GPU, CPU, or both versions. This will depend on the version of
TensorFlow that we install with the `install_tensorflow()` function.
The mode in which the tensors are created using GPUmatrix, if
we choose to use TensorFlow, will depend on the installation mode. The
options to switch from CPU to GPU are not enabled when using GPUmatrix
with TensorFlow for this precise reason. To install the GPU version, it
is not necessary to specify the version since __if it detects that the
CUDA dependencies are met__, it will automatically install using the
GPU mode. If you want to install the CPU version, you need to specify it
as follows:

`install_tensorflow(version="nightly-cpu")`

__Once the dependencies for Torch or TensorFlow are installed__,
the GPUmatrix package, being hosted on CRAN, can be easily
installed using:

`install.packages("GPUmatrix")`

Alternatively, it is possible to install the package from GitHub to get the latest version of the package.

The GPUmatrix package is based on S4 objects in R and we have created
a constructor function that acts similarly to the default `matrix()`
constructor in R for CPU matrices. The constructor function is
`gpu.matrix()` and accepts the same parameters as `matrix()`:

`## Torch tensors allowed`

`## Tensorflow tensors allowed`

```
if (installTorch) {
#R matrix initialization
m <- matrix(c(1:20)+40,10,2)
#Show CPU matrix
m
#GPU matrix initialization
Gm <- gpu.matrix(c(1:20)+40,10,2)
#Show GPU matrix
Gm
}
```

`## Loading required namespace: torch`

```
## GPUmatrix
## torch_tensor
## 41 51
## 42 52
## 43 53
## 44 54
## 45 55
## 46 56
## 47 57
## 48 58
## 49 59
## 50 60
## [ CUDADoubleType{10,2} ]
```

Although the indexing of tensors in both torch and tensorflow is
0-based, the indexing of GPUmatrix objects is 1-based, making it as
close as possible to working with native R matrices and more convenient
for the user. In the previous example, a normal R CPU matrix called
`m` and its GPU counterpart `Gm` are created. Just
like regular matrices, the created GPU matrices allow for indexing of
their elements and assignment of values. The concatenation operators
`rbind()` and `cbind()` work independently of the
type of matrices that are to be concatenated, resulting in a
**gpu.matrix**:

```
if (installTorch) {
Gm[c(2,3),1]
Gm[,2]
Gm2 <- cbind(Gm[c(1,2),], Gm[c(6,7),])
Gm2
Gm2[1,3] <- 0
Gm2
}
```

```
## GPUmatrix
## torch_tensor
## 41 51 0 56
## 42 52 47 57
## [ CUDADoubleType{2,4} ]
```

It is also possible to initialize the data with NaN values:

```
if (installTorch) {
Gm3 <- gpu.matrix(nrow = 2,ncol=3)
Gm3[,2]
Gm3[1,2] <- 1
Gm3
Gm3[1,3] <- 0
Gm3
}
```

```
## GPUmatrix
## torch_tensor
## nan 1 0
## nan nan nan
## [ CUDADoubleType{2,3} ]
```

These examples demonstrate that, contrary to standard R, subsetting a gpu.matrix, even when selecting only one column or row, still results in a gpu.matrix. This behavior is analogous to using `drop = FALSE` in standard R.
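
For comparison, the base R behavior that this paragraph contrasts against can be seen in a short sketch. The GPU part is optional and guarded, so that the snippet also runs where GPUmatrix or torch is not available; the `gpu_ok` flag is an illustrative name, not part of the package.

```r
m <- matrix(c(1:20) + 40, 10, 2)

# Base R drops the dimension when a single column is selected...
v <- m[, 2]
is.null(dim(v))

# ...unless drop = FALSE is given, which keeps a 10 x 1 matrix.
k <- m[, 2, drop = FALSE]
dim(k)

# Optional comparison; only runs when GPUmatrix and torch are usable.
gpu_ok <- requireNamespace("GPUmatrix", quietly = TRUE) &&
  requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()
if (gpu_ok) {
  library(GPUmatrix)
  Gm <- gpu.matrix(c(1:20) + 40, 10, 2)
  # Per the text, the subset is still a two-dimensional gpu.matrix.
  print(dim(Gm[, 2]))
}
```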

In the GPUmatrix constructor, we can specify the location of the
matrix, i.e., we can decide to host it on the GPU or in RAM to
use it with the CPU. Because the package is oriented towards algebraic
operations in R using the GPU, matrices are hosted on the GPU by default,
but the same functionality is available on the CPU.
To do this, we use the `device` argument of the constructor
and assign it the value **"cpu"**.

```
if (installTorch) {
#GPUmatrix initialization with CPU option
Gm <- gpu.matrix(c(1:20)+40,10,2,device="cpu")
#Show CPU matrix from GPUmatrix
Gm
}
```

```
## GPUmatrix
## torch_tensor
## 41 51
## 42 52
## 43 53
## 44 54
## 45 55
## 46 56
## 47 57
## 48 58
## 49 59
## 50 60
## [ CPUDoubleType{10,2} ]
```

As mentioned in the introduction and dependency sections, GPUmatrix
can be used with both TensorFlow and Torch. By default, the GPU matrix
constructor is initialized with Torch tensors because, in our opinion,
it provides an advantage in terms of installation and usage compared to
TensorFlow. Additionally, it allows the use of GPUmatrix not only with
GPU tensors but also with CPU tensors. To use GPUmatrix with TensorFlow,
simply use the `type` argument of the constructor function
and assign it the value **"tensorflow"** as shown in the
following example:

```
# library(GPUmatrix)
tensorflowGPUmatrix <- gpu.matrix(c(1:20)+40,10,2, type = "tensorflow")
tensorflowGPUmatrix
```

The default matrices in R have limitations. The only numeric data types they allow are int and float64, with float64 being the type R generally uses by default. They also do not natively support the creation and handling of sparse matrices. To make up for this lack of functionality, other R packages hosted on CRAN have been created to provide these features. The problem with these packages is that in most cases they are not compatible with each other, meaning we can have a sparse matrix with float64 and a non-sparse matrix with float32, but not a sparse matrix with float32.

GPUmatrix allows for compatibility with sparse matrices and different data types such as float32. For this reason, casting operations between different matrix types from multiple packages to GPUmatrix type have been implemented:

Matrix class | Package | Default data type | Sparse | Back cast |
---|---|---|---|---|
matrix | base | float64 | FALSE | TRUE |
data.frame | base | float64 | FALSE | TRUE |
integer | base | float64 | FALSE | TRUE |
numeric | base | float64 | FALSE | TRUE |
dgeMatrix | Matrix | float64 | FALSE | FALSE |
ddiMatrix | Matrix | float64 | TRUE | FALSE |
dpoMatrix | Matrix | float64 | FALSE | FALSE |
dgCMatrix | Matrix | float64 | TRUE | FALSE |
float32 | float | float32 | FALSE | FALSE |
torch_tensor | torch | float64 | Depends on tensor type | TRUE |
tensorflow.tensor | tensorflow | float64 | Depends on tensor type | TRUE |

There are two functions for casting to create a
**gpu.matrix**: `as.gpu.matrix()` and `gpu.matrix()`.

Create ‘Gm’ from ‘m’, a base R matrix:

```
## GPUmatrix
## torch_tensor
## 41 46
## 42 47
## 43 48
## 44 49
## 45 50
## [ CUDADoubleType{5,2} ]
```

Create ‘Gm’ from ‘M’ with Matrix package:

```
##
## Attaching package: 'Matrix'
```

```
## The following object is masked from 'package:GPUmatrix':
##
## det
```

```
## GPUmatrix
## torch_tensor
## 41 46
## 42 47
## 43 48
## 44 49
## 45 50
## [ CUDADoubleType{5,2} ]
```

Create ‘Gm’ from ‘mfloat32’ with float package:

```
## GPUmatrix
## torch_tensor
## 41 46
## 42 47
## 43 48
## 44 49
## 45 50
## [ CUDAFloatType{5,2} ]
```

Interestingly, GPUmatrix returns a float32 data type matrix if the input is a float matrix.
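
The code that produced the three outputs above is not shown in this section. The following sketch would yield matrices of the same shape and values, assuming the `Matrix` and `float` packages are installed; `float::fl()` converts a numeric matrix to float32. The object names follow those used in the text, and the `gpu_ok` guard is illustrative.

```r
m <- matrix(c(1:10) + 40, 5, 2)   # 5 x 2 base R matrix holding 41..50

gpu_ok <- requireNamespace("GPUmatrix", quietly = TRUE) &&
  requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()
if (gpu_ok) {
  library(GPUmatrix)

  # Cast from a base R matrix; both functions take the object to cast.
  Gm <- gpu.matrix(m)
  Gm_alt <- as.gpu.matrix(m)
  print(Gm)

  # Cast from a dense Matrix-package object (a dgeMatrix).
  if (requireNamespace("Matrix", quietly = TRUE)) {
    M <- Matrix::Matrix(m)
    print(gpu.matrix(M))
  }

  # Cast from a float32 matrix of the float package.
  if (requireNamespace("float", quietly = TRUE)) {
    mfloat32 <- float::fl(m)
    Gm32 <- gpu.matrix(mfloat32)
    print(dtype(Gm32))   # the text reports that float32 is kept
  }
}
```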

It is also possible to create a sparse gpu.matrix ‘Gms’ from a sparse ‘Ms’ of class dgCMatrix, dgeMatrix, ddiMatrix or dpoMatrix from the Matrix package:

```
if (installTorch) {
Ms <- Matrix(sample(0:1, 10, replace = TRUE), nrow=5, ncol=2, sparse=TRUE)
Ms
Gms <- gpu.matrix(Ms)
Gms
}
```

```
## GPUmatrix
## torch_tensor
## [ SparseCUDAFloatType{}
## indices:
## 0 0 1 1 2 3 4
## 0 1 0 1 0 0 0
## [ CUDALongType{2,7} ]
## values:
## 1
## 1
## 1
## 1
## 1
## 1
## 1
## [ CUDAFloatType{7} ]
## size:
## [5, 2]
## ]
```

The data types allowed by GPUmatrix are: **float64**,
**float32**, **int**, **bool** or
**logical**, **complex64** and
**complex32**. We can create a GPU matrix with a specific
data type using the **dtype** parameter of the `gpu.matrix()`
constructor, or change the data type of an existing GPU matrix with the
`dtype()` function. Likewise, the `sparse` parameter of the constructor
takes `TRUE`/`FALSE` depending on
whether we want the resulting matrix to be sparse or not. We can also
modify the sparsity of an existing GPU matrix with the functions
`to_dense()` and `to_sparse()`:

```
if (installTorch) {
#Creating a float32 matrix
Gm32 <- gpu.matrix(c(1:20)+40,10,2, dtype = "float32")
Gm32
#Creating a non-sparse matrix with data type float32 from a sparse matrix of type float64
Ms <- Matrix(sample(0:1, 20, replace = TRUE), nrow=10, ncol=2, sparse=TRUE)
Gm32 <- gpu.matrix(Ms, dtype = "float32", sparse = FALSE)
Gm32
#Convert Gm32 into the sparse matrix Gms32
Gms32 <- to_sparse(Gm32)
Gms32
#Convert the data type of Gms32 to float64
Gms64 <- Gms32
dtype(Gms64) <- "float64"
Gms64
}
```

```
## GPUmatrix
## torch_tensor
## [ SparseCUDADoubleType{}
## indices:
## 1 2 2 4 5 5 6 7 8 9 9
## 0 0 1 0 0 1 1 1 1 0 1
## [ CUDALongType{2,11} ]
## values:
## 1
## 1
## 1
## 1
## 1
## 1
## 1
## 1
## 1
## 1
## 1
## [ CUDADoubleType{11} ]
## size:
## [10, 2]
## ]
```

GPUmatrix supports all basic arithmetic operators in R:
`+`, `-`, `*`, `^`, `/`, `%*%` and `%%`. Their usage is the
same as for basic R matrices, and they are compatible with other
matrix objects from the packages mentioned above.

```
if (installTorch) {
(Gm + Gm) == (m + m)
(Gm + M) == (mfloat32 + Gm)
(M + M) == (mfloat32 + Gm)
(M + M) > (Gm + Gm)*2
}
```

```
## [,1] [,2]
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## [3,] FALSE FALSE
## [4,] FALSE FALSE
## [5,] FALSE FALSE
```

As seen in the previous example, the comparison operators
(`==`, `!=`, `>`, `<`, `>=`, `<=`) also work following the same
dynamic as the arithmetic operators.
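
As a small illustration of these semantics, the sketch below first runs the comparisons on base R matrices and then, if GPUmatrix and torch are usable, repeats them with mixed operands. It assumes that `as.matrix()` back-casts a gpu.matrix to a base matrix (the cast table lists base matrices as back-castable); the tolerance-based check for float32 is a suggestion, not something the package requires.

```r
m <- matrix(c(1:10) + 40, 5, 2)

# Comparison operators return a logical matrix of the same shape.
eq <- (m + m) == (2 * m)
gt <- (m + m) > (m + m) * 2   # all FALSE for positive entries

gpu_ok <- requireNamespace("GPUmatrix", quietly = TRUE) &&
  requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()
if (gpu_ok) {
  library(GPUmatrix)
  Gm <- gpu.matrix(m)
  # Mixed operands: gpu.matrix on one side, base matrix on the other.
  print((Gm + Gm) == (m + m))

  # With float32 data, rounding can make exact `==` fail, so a
  # tolerance-based comparison on the back-cast matrix is safer.
  G32 <- gpu.matrix(m, dtype = "float32")
  print(all.equal(as.matrix(G32 / 3), m / 3, tolerance = 1e-6))
}
```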

Similarly to arithmetic operators, mathematical operators follow the
same operation they would perform on regular matrices of R.
`Gm` is a *gpu.matrix* variable:

Mathematical operators | Usage |
---|---|
`log` | `log(Gm)` |
`log2` | `log2(Gm)` |
`log10` | `log10(Gm)` |
`cos` | `cos(Gm)` |
`cosh` | `cosh(Gm)` |
`acos` | `acos(Gm)` |
`acosh` | `acosh(Gm)` |
`sin` | `sin(Gm)` |
`sinh` | `sinh(Gm)` |
`asin` | `asin(Gm)` |
`asinh` | `asinh(Gm)` |
`tan` | `tan(Gm)` |
`atan` | `atan(Gm)` |
`tanh` | `tanh(Gm)` |
`atanh` | `atanh(Gm)` |
`sqrt` | `sqrt(Gm)` |
`abs` | `abs(Gm)` |
`sign` | `sign(Gm)` |
`ceiling` | `ceiling(Gm)` |
`floor` | `floor(Gm)` |
`cumsum` | `cumsum(Gm)` |
`cumprod` | `cumprod(Gm)` |
`exp` | `exp(Gm)` |
`expm1` | `expm1(Gm)` |

There are certain functions only applicable to numbers of complex type. In R these functions are grouped as complex operators, and all of them are available for GPUmatrix matrices with the same functionality as in base R:

Complex operators | Usage |
---|---|
`Re` | `Re(Gm)` |
`Im` | `Im(Gm)` |
`Conj` | `Conj(Gm)` |
`Arg` | `Arg(Gm)` |
`Mod` | `Mod(Gm)` |
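
As a reminder of what these operators compute, here is a short base R sketch on a small complex matrix. According to the text, the same calls apply to a complex *gpu.matrix*; only base R is exercised here, and the matrix `z` is an illustrative example.

```r
z <- matrix(c(3+4i, 1-1i, 0+2i, -2+0i), nrow = 2)

Re(z)    # real parts
Im(z)    # imaginary parts
Conj(z)  # complex conjugate (sign of the imaginary part flipped)
Mod(z)   # modulus, e.g. |3+4i| = 5
Arg(z)   # argument (angle) in radians

# Sanity check: z * Conj(z) equals Mod(z)^2 element-wise.
stopifnot(isTRUE(all.equal(Re(z * Conj(z)), Mod(z)^2)))
```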

We can find a multitude of functions that can be applied to
*gpu.matrix* type matrices. Most of the functions are functions
from the base R package that can be used on *gpu.matrix* matrices
in the same way they would be applied to regular matrices of R. There
are other functions from other packages like **Matrix** or
**matrixStats** that have been implemented due to their
widespread use within the user community, such as `rowVars`
or `colMaxs`. The output of these functions, which originally
produced R default matrix type objects, will now return
*gpu.matrix* type matrices if the input type of the function is
*gpu.matrix*.

```
if (installTorch) {
m <- matrix(c(1:20)+40,10,2)
Gm <- gpu.matrix(c(1:20)+40,10,2)
head(tcrossprod(m),1)
head(tcrossprod(Gm),1)
Gm <- tail(Gm,3)
rownames(Gm) <- c("a","b","c")
tail(Gm,2)
colMaxs(Gm)
}
```

`## [1] 50 60`

There is a wide variety of functions implemented in GPUmatrix, and they are adapted to be used just like regular R matrices.

Functions | Usage | Package |
---|---|---|
`determinant` | `determinant(Gm, logarithm=TRUE)` | `base` |
`fft` | `fft(Gm)` | `base` |
`sort` | `sort(Gm, decreasing=FALSE)` | `base` |
`round` | `round(Gm, digits=0)` | `base` |
`show` | `show(Gm)` | `base` |
`length` | `length(Gm)` | `base` |
`dim` | `dim(Gm)` | `base` |
`dim<-` | `dim(Gm) <- c(...,...)` | `base` |
`rownames` | `rownames(Gm)` | `base` |
`rownames<-` | `rownames(Gm) <- c(...)` | `base` |
`row.names` | `row.names(Gm)` | `base` |
`row.names<-` | `row.names(Gm) <- c(...)` | `base` |
`colnames` | `colnames(Gm)` | `base` |
`colnames<-` | `colnames(Gm) <- c(...)` | `base` |
`rowSums` | `rowSums(Gm)` | `Matrix` |
`colSums` | `colSums(Gm)` | `Matrix` |
`cbind` | `cbind(Gm,...)` | `base` |
`rbind` | `rbind(Gm,...)` | `base` |
`head` | `head(Gm,...)` | `base` |
`tail` | `tail(Gm,...)` | `base` |
`nrow` | `nrow(Gm)` | `base` |
`ncol` | `ncol(Gm)` | `base` |
`t` | `t(Gm)` | `base` |
`crossprod` | `crossprod(Gm,...)` | `base` |
`tcrossprod` | `tcrossprod(Gm,…)` | `base` |
`%x%` | `Gm %x% …` or `… %x% Gm` | `base` |
`%^%` | `Gm %^% …` or `… %^% Gm` | `base` |
`diag` | `diag(Gm)` | `base` |
`diag<-` | `diag(Gm) <- c(…)` | `base` |
`solve` | `solve(Gm, …)` | `base` |
`qr` | `qr(Gm)` | `base` |
`qr.Q` | `qr.Q(…)` | `base` |
`qr.R` | `qr.R(…)` | `base` |
`qr.X` | `qr.X(…)` | `base` |
`qr.solve` | `qr.solve(…)` | `base` |
`qr.coef` | `qr.coef(…)` | `base` |
`qr.qy` | `qr.qy(…)` | `base` |
`qr.qty` | `qr.qty(…)` | `base` |
`qr.resid` | `qr.resid(…)` | `base` |
`eigen` | `eigen(Gm)` | `base` |
`svd` | `svd(Gm)` | `base` |
`ginv` | `ginv(Gm, tol = sqrt(.Machine$double.eps))` | `MASS` |
`chol` | `chol(Gm)` | `base` |
`chol_solve` | `chol_solve(Gm, …)` | `GPUmatrix` |
`mean` | `mean(Gm)` | `base` |
`density` | `density(Gm)` | `base` |
`hist` | `hist(Gm)` | `base` |
`colMeans` | `colMeans(Gm)` | `Matrix` |
`rowMeans` | `rowMeans(Gm)` | `Matrix` |
`sum` | `sum(Gm)` | `base` |
`min` | `min(Gm)` | `base` |
`max` | `max(Gm)` | `base` |
`which.max` | `which.max(Gm)` | `base` |
`which.min` | `which.min(Gm)` | `base` |
`aperm` | `aperm(Gm)` | `base` |
`apply` | `apply(Gm, MARGIN, FUN, …, simplify=TRUE)` | `base` |
`cov` | `cov(Gm)` | `stats` |
`cov2cor` | `cov2cor(Gm)` | `stats` |
`cor` | `cor(Gm, …)` | `stats` |
`rowVars` | `rowVars(Gm)` | `matrixStats` |
`colVars` | `colVars(Gm)` | `matrixStats` |
`colMaxs` | `colMaxs(Gm)` | `matrixStats` |
`rowMaxs` | `rowMaxs(Gm)` | `matrixStats` |
`rowRanks` | `rowRanks(Gm)` | `matrixStats` |
`colRanks` | `colRanks(Gm)` | `matrixStats` |
`colMins` | `colMins(Gm)` | `matrixStats` |
`rowMins` | `rowMins(Gm)` | `matrixStats` |
`dtype` | `dtype(Gm)` | `GPUmatrix` |
`dtype<-` | `dtype(Gm) <- "float64"` | `GPUmatrix` |
`to_dense` | `to_dense(Gm)` | `GPUmatrix` |
`to_sparse` | `to_sparse(Gm)` | `GPUmatrix` |
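
To show how several of the functions in the table combine, the sketch below solves an ordinary least squares problem through the normal equations with `crossprod()` and `solve()`, first on the CPU and then, if GPUmatrix and torch are usable, with the same expressions on *gpu.matrix* objects. The simulated data and all variable names are illustrative assumptions, not taken from the package.

```r
set.seed(1)
n <- 100
X <- cbind(1, matrix(rnorm(n * 2), n, 2))   # design matrix with intercept
beta <- c(2, -1, 0.5)                        # coefficients used to simulate y
y <- X %*% beta + rnorm(n, sd = 0.1)

# Normal equations on the CPU: (X'X) b = X'y
b_cpu <- solve(crossprod(X), crossprod(X, y))
print(b_cpu)

gpu_ok <- requireNamespace("GPUmatrix", quietly = TRUE) &&
  requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()
if (gpu_ok) {
  library(GPUmatrix)
  GX <- gpu.matrix(X)
  Gy <- gpu.matrix(y)
  # Same expression, now dispatched to the gpu.matrix methods.
  b_gpu <- solve(crossprod(GX), crossprod(GX, Gy))
  print(b_gpu)
}
```

The only change between the two versions is the cast to *gpu.matrix*, which is the usage pattern the package aims for.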

We have compared the computation time for different matrix functions and different precisions, running the operations either on the GPU or on the CPU (using both GPUmatrix and plain R).

The functions that we tested are `*` (Hadamard or
element-wise product of matrices), `exp` (exponential of
each element of a matrix), `rowMeans` (means of the rows of
a matrix), `%*%` (standard product of matrices),
`solve` (inverse of a matrix) and `svd`
(singular value decomposition of a matrix).

These functions were tested on square matrices whose row sizes are 500, 700, 1000, 1400, 2000, 2800 and 4000.
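
A timing harness for these six functions can be sketched as follows. This is not the benchmark code used for the figure: the sizes are deliberately smaller than those in the text so that it runs quickly, and the `ops`, `time_ops` and `wrap` names are illustrative. Substituting the sizes listed above reproduces the setup more closely.

```r
# The six operations from the text, each taking a square matrix A.
ops <- list(
  hadamard = function(A) A * A,
  exp      = function(A) exp(A),
  rowMeans = function(A) rowMeans(A),
  matmul   = function(A) A %*% A,
  solve    = function(A) solve(A),
  svd      = function(A) svd(A)
)

# Time every operation on a random n x n matrix; `wrap` converts the
# base matrix to another matrix type before timing (identity = plain R).
time_ops <- function(n, wrap = identity) {
  A <- wrap(matrix(rnorm(n * n), n, n))
  sapply(ops, function(f) system.time(f(A))[["elapsed"]])
}

sizes <- c(100, 200, 400)   # reduced sizes; the text uses 500 to 4000
res <- sapply(sizes, time_ops)
colnames(res) <- sizes
print(res)

gpu_ok <- requireNamespace("GPUmatrix", quietly = TRUE) &&
  requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()
if (gpu_ok) {
  library(GPUmatrix)
  # GPU work can be launched asynchronously, so these timings are indicative only.
  print(time_ops(400, wrap = function(A) gpu.matrix(A, dtype = "float32")))
}
```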

Figure 1 compares the different computational architectures, namely: CPU (standard R matrices running on MKL with FP64), CPU64 (GPUmatrix matrices computed on the CPU with FP64), CPU32 (the same using FP32), GPU64 (GPUmatrix matrices stored and computed on the GPU with FP64) and GPU32 (the same using FP32).

It is important to note that the y-axis is in logarithmic scale. For example, the element-wise product of a matrix on GPU32 is around five times faster than the same operation on CPU. The results show that the GPU is particularly effective for element-wise operations (Hadamard product, exponential of a matrix). For these operations, it is easier to fully utilize the huge number of cores of a GPU. R-MKL seems to use a single core to perform element-wise operations. The torch CPU implementation is much faster, though still slower than the GPU. `rowMeans` is also faster on the GPU than on the CPU with plain R. In this case, the GPUmatrix CPU implementation is on par with the GPU. When the operations become more complex, as in the standard product of matrices and computing the inverse, CPU and GPU (using double precision) are closer to each other. However, GPU32 is still much faster than its CPU32 counterpart. Finally, in terms of speed, it is not advisable to use the GPU for even more complex operations such as the SVD. In this case, GPU64 is the slowest method, and GPU32 barely stands up to the comparison with CPU32.