We have learnt various R profiling techniques in the previous article. In this twentieth article in the R series, we shall learn how to optimise R code.
We will use version 4.2.2 installed on Parabola GNU/Linux-libre (x86-64) for the code snippets.
$ R --version R version 4.2.2 (2022-10-31) -- “Innocent and Trusting” Copyright (C) 2022 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License versions 2 or 3. For more information about these matters see https://www.gnu.org/licenses/.
Recursive versus non-recursive functions
Consider a recursive function to compute the nth Fibonacci sequence:
recurFib <- function(n) { if (n == 0) {0} else if (n == 1 || n == 2) {1} else { recurFib(n - 1) + recurFib(n - 2) } }
You can verify the output sequence numbers, as shown below:
> recurFib(0) [1] 0 > recurFib(1) [1] 1 > recurFib(2) [1] 1 > recurFib(3) [1] 2 > recurFib(4) [1] 3 > recurFib(5) [1] 5
A non-recursive implementation is given by the following fibo() function:
> fibo <- function(n) { + phi <- (sqrt(5) + 1)/2 + fib <- (phi^(n+1) - (1-phi)^(n+1)) / (2*phi - 1) + round(fib) + }
In this example, we are starting the Fibonacci number with 1, as illustrated below:
> fibo(0) [1] 1 > fibo(1) [1] 1 > fibo(2) [1] 2 > fibo(3) [1] 3 > fibo(4) [1] 5 > fibo(5) [1] 8
The system.time() function can be used to obtain the computation time taken by both these functions.
> recurFib(35) [1] 9227465 > system.time(recurFib(35)) user system elapsed 6.831 0.000 6.834 > fibo(34) [1] 9227465 > system.time(fibo(34)) user system elapsed 0 0 0
The recursive function is definitely slower than the non-recursive implementation. Moreover, R uses faster native implementation for computing power of values (^), and hence its usage is recommended.
Preallocate
You do not need to allocate memory for any object in R. In the following example, the initialisation takes time.
> v1 <- c() > system.time(for (i in 1:10000000) {v1[i] <- i;}) user system elapsed 2.143 0.120 2.263
The length can be specified for a vector and memory can be pre-allocated, so that initialisation can be sped up, as illustrated below:
> v2 <- c(NA) > length(v2) <- 10000000 > system.time(for (i in 1:10000000) {v2[i] <- i;}) user system elapsed 0.299 0.020 0.319
Matrix multiplication
Consider two matrices, m1 and m2:
> m1 <-matrix(1:10000, nrow=100) > m2 <-matrix(10001:20000, nrow=100)
You can write your own version of matrix multiplication as follows:
> matrixMul <- function(mat1, mat2) { rows <- nrow(mat1) cols <- ncol(mat2) output <- matrix(nrow = rows, ncol = cols) for (i in 1:rows) { for(j in 1:cols) { vec1 <- mat1[i,] vec2 <- mat1[,j] mult1 <- vec1 * vec2 output[i,j] <- sum(mult1) } } return(output) }
The time taken for multiplying the two matrices, m1 and m2, using the above function is given below:
> system.time(matrixMul(m1,m2)) user system elapsed 0.039 0.000 0.039
The recommended practice is to perform matrix multiplication using the ‘%*%’ operator. The result for the multiplication of matrices m1 and m2 is quite instantaneous.
> system.time(m1*m2) user system elapsed 0 0 0
Bytecode compiler
The cmpfun() function provides an interface to the byte code compiler for R. It takes the following arguments:
Argument | Description |
f | A closure |
options | Compiler options |
The ‘recurFib’ function can be byte-compiled to improve the performance, as shown below:
> library(compiler) > compiled.recurFib <- cmpfun(recurFib) > system.time(compiled.recurFib(34)) user system elapsed 4.296 0.000 4.298
The compiled function, along with the reference to the bytecode, is shown below:
> compiled.recurFib function(n) { if (n == 0) {0} else if (n == 1 || n == 2) {1} else { recurFib(n - 1) + recurFib(n - 2) } } <bytecode: 0x55eff06fcba8>
You can also view the disassembly of the compiled function using the disassemble() function, as follows:
> disassemble(compiled.recurFib) list(.Code, list(12L, GETVAR.OP, 1L, LDCONST.OP,3L,EQ.OP, 4L, BRIFNOT.OP, 5L, 13L, LDCONST.OP, 3L,RETURN.OP, GETVAR.OP, 1L, LDCONST.OP, 8L, EQ.OP, 9L, OR1ST.OP,7L,30L, GETVAR.OP, 1L, LDCONST.OP, 10L, EQ.OP, 11L, OR2ND.OP, 7L, BRIFNOT.OP, 12L, 36L, LDCONST.OP, 8L, RETURN.OP, GETFUN.OP, 14L, MAKEPROM.OP, 17L, CALL.OP, 15L, GETFUN.OP, 14L, MAKEPROM.OP, 19L, CALL.OP, 18L, ADD.OP, 20L, RETURN.OP), list({ if (n == 0) { 0 } else if (n == 1 || n == 2) { 1 } else { recurFib(n - 1) + recurFib(n - 2) } }, n, structure(c(2L, 3L, 4L, 44L, 3L, 44L, 2L, 4L), srcfile = <environment>, class = “srcref”), 0, n == 0, if (n == 0) { 0 ...
The cmpfile() function can be used to compile a large source .R file. It accepts the following arguments:
Argument | Description |
infile | A closure |
outfile | Compiler options |
ascii | If compiled file should be stored as ascii |
env | Environment for compiling |
verbose | Display compilation output |
options | Compiler options |
version | Workspace format version |
Bigmemory package
The bigmemory package is used to handle large, Big Data matrices using shared memory and memory-mapped files. You can install the package as follows:
> install.packages(“bigmemory”) Installing package into ‘/home/guest/R/x86_64-pc-linux-gnu-library/4.1’ (as ‘lib’ is unspecified) --- Please select a CRAN mirror for use in this session --- ... Creating a generic function for ‘typeof’ from package ‘base’ in package ‘bigmemory’ ** help *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** checking absolute paths in shared objects and dynamic libraries ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (bigmemory)
The ‘m’ matrix is created using the ‘matrix’ function and written to a .csv file using the ‘write.table()’ function, as shown below:
> library(bigmemory) > m <- matrix(rnorm(30*15000, mean=0, sd=4), 30, 15000) > write.table(m, file = “/tmp/m1.csv”) > system.time(write.table(m, file = “/tmp/m1.csv”)) user system elapsed 0.318 0.001 0.318
You can instead use the big.matrix() function from the bigmemory package to improve the write performance, as demonstrated below:
> system.time(big.matrix(m, nrow=30, ncol=15000, backingfile=”m2.csv”, type=”double”, descriptorfile=”m2.desc”)) user system elapsed 0.006 0.000 0.006
The contents of the m2.csv file are given below for reference:
$ cat m2.desc new(“big.matrix.descriptor”, description = list(sharedType = “FileBacked”, filename = “m2.csv”, dirname = “/home/guest/20-optimization/”, totalRows = 30L, totalCols = 15000L, rowOffset = c(0, 30), colOffset = c(0, 15000), nrow = 30, ncol = 15000, rowNames = NULL, colNames = NULL, type = “double”, separated = FALSE))
The R code optimizer
The rco package takes in R code as input, and applies various optimization techniques to improve the code. You can install the package using the following command:
> install.packages(“rco”) Installing package into ‘/home/guest/R/x86_64-pc-linux-gnu-library/4.1’ (as ‘lib’ is unspecified) --- Please select a CRAN mirror for use in this session --- ... *** installing help indices ** building package indices ** installing vignettes ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (rco)
The formula to convert Fahrenheit to Celsius is given below:
celsius = (fahrenheit - 32) / 1.8
The code snippets are stored in a ‘code’ variable, as shown below:
> library(rco) > code <- paste( + “f <- 100”, + “paren <- (f - 32)”, + “v <- paren / 1.8”, + “c <- v”, + “if (c > 30) {“, + ‘ print(“It is hot!”)’, + “} else {“, + ‘ print(“It is pleasant!”)’, + “}”, + sep=”\n” + )
You can invoke the optimize_code() function on the code and view the optimised code, as illustrated below:
> pass1 <- optimize_text(code, iterations = 1) Optimization number 1 > pass1 [1] “f <- 100\nparen <- (100 - 32)\nv <- paren / 1.8\nc <- v\nif (c > 30) {\n print(\”It is hot!\”)\n} else {\n print(\”It is pleasant!\”)\n}” > pass2 <- optimize_text(pass1, iterations = 1) Optimization number 1 > pass2 [1] “f <- 100\nparen <- 68\nv <- 68 / 1.8\nc <- v\nif (c > 30) {\n print(\”It is hot!\”)\n} else {\n print(\”It is pleasant!\”)\n}”
The ‘optimize_text’ function accepts the following arguments:
Argument | Description |
text | Code in character vectors |
optimizers | List of optimizer functions |
iterations | Number of times the optimizer should run |
You are encouraged to read the ‘Help’ pages of the above R functions to learn more about their arguments, options and usage.