Understanding Packaging in R


In this eighth article in the R series, we shall explore the packaging of R code and publishing it for end users. We will be using R version 4.1.2 installed on Parabola GNU/Linux-libre (x86-64) for the code snippets.

By now we know that the R programming language comes with pre-built functions. The base installation of R provides packages with essential functions.

Install
The packages that R attaches by default at startup can be listed using the following command:

> getOption("defaultPackages")
[1] "datasets"  "utils"     "grDevices" "graphics"  "stats"     "methods"

If you need to install new packages in the R console, you can use the command given below to do so:

> install.packages("RUnit")

You will need to load the package in the R session before using its functions. For example:

> library(RUnit)
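
A script that may run on a machine where the package is missing can check for it before attaching it. The following is a minimal sketch of that idea using requireNamespace(); the helper name load_if_available is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: attach a package only when it is installed.
load_if_available <- function(pkg) {
  # requireNamespace() returns FALSE instead of raising an error
  # when the package cannot be found.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; ",
            "try install.packages(\"", pkg, "\")")
    return(invisible(FALSE))
  }
  # character.only = TRUE lets library() accept the name as a string.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

load_if_available("stats")                  # ships with R, so this succeeds
load_if_available("aPackageThatIsMissing")  # prints a hint and carries on
```

The guard keeps a long-running script from stopping halfway through because of a missing dependency.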

The packages that are currently attached to the R session can be displayed as follows:

> (.packages())
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

You can list all the installed packages by passing the all.available=TRUE option to the .packages function:

> (.packages(all.available=TRUE))
  [1] "AMR"         "base64enc"   "bit"         "bit64"       "blob"
  [6] "brio"        "bslib"       "cachem"      "callr"       "cli"
 [11] "commonmark"  "crayon"      "cubature"    "DBI"         "desc"
 [16] "diffobj"     "digest"      "ellipsis"    "evaluate"    "fansi"
 [21] "fastmap"     "fontawesome" "fs"          "glue"        "highr"
 [26] "htmltools"   "httpuv"      "IRdisplay"   "IRkernel"    "jquerylib"
 [31] "jsonlite"    "knitr"       "kyotil"      "later"       "lifecycle"
 [36] "magrittr"    "memoise"     "mime"        "mvtnorm"     "pbdZMQ"
 [41] "pillar"      "pkgconfig"   "pkgload"     "plogr"       "praise"
 [46] "processx"    "promises"    "ps"          "R6"          "rappdirs"
 [51] "Rcpp"        "rematch2"    "repr"        "rjson"       "rlang"
 [56] "rmarkdown"   "rprojroot"   "RSQLite"     "rstudioapi"  "RUnit"
 [61] "sass"        "shiny"       "sourcetools" "stringi"     "stringr"
 [66] "testthat"    "tibble"      "tinytest"    "tinytex"     "utf8"
 [71] "uuid"        "vctrs"       "waldo"       "withr"       "xfun"
 [76] "xtable"      "yaml"        "base"        "boot"        "class"
 [81] "cluster"     "codetools"   "compiler"    "datasets"    "foreign"
 [86] "graphics"    "grDevices"   "grid"        "KernSmooth"  "lattice"
 [91] "MASS"        "Matrix"      "methods"     "mgcv"        "nlme"
 [96] "nnet"        "parallel"    "rpart"       "spatial"     "splines"
[101] "stats"       "stats4"      "survival"    "tcltk"       "tools"
[106] "utils"

Calling library() without any arguments displays the list of installed packages (in a pager or a separate window, depending on the interface), along with a one-line description of each package, as illustrated below:

> library()
Packages in library ‘/home/guest/R/x86_64-pc-linux-gnu-library/4.1’:
AMR                     Antimicrobial Resistance Data Analysis
base64enc               Tools for base64 encoding
bit                     Classes and Methods for Fast Memory-Efficient
                        Boolean Selections
bit64                   A S3 Class for Vectors of 64bit Integers
blob                    A Simple S3 Class for Representing Vectors of
                        Binary Data (‘BLOBS’)
brio                    Basic R Input Output
bslib                   Custom ‘Bootstrap’ ‘Sass’ Themes for ‘shiny’
                        and ‘rmarkdown’
cachem                  Cache R Objects with Automatic Pruning
callr                   Call R from R
cli                     Helpers for Developing Command Line Interfaces
commonmark              High Performance CommonMark and Github Markdown
                        Rendering in R
crayon                  Colored Terminal Output
cubature                Adaptive Multivariate Integration over
                        Hypercubes
DBI                     R Database Interface
desc                    Manipulate DESCRIPTION Files
diffobj                 Diffs for R Objects
digest                  Create Compact Hash Digests of R Objects
ellipsis                Tools for Working with ...
evaluate                Parsing and Evaluation Tools that Provide More
                        Details than the Default
fansi                   ANSI Control Sequence Aware String Functions
fastmap                 Fast Data Structures
fontawesome             Easily Work with ‘Font Awesome’ Icons
fs                      Cross-Platform File System Operations Based on
                        ‘libuv’
...
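
When the same information is needed inside a program rather than on screen, installed.packages() returns it as a character matrix. The snippet below is a small sketch of how that matrix could be inspected; the variable names are illustrative only.

```r
# installed.packages() returns a character matrix with one row per package.
pkgs <- installed.packages()

# The Version column, named by package, is handy for quick lookups.
versions <- pkgs[, "Version"]
head(versions)

# The LibPath column records the library each package was found in.
head(pkgs[, c("Package", "LibPath")])

# The Priority column marks the packages that ship with R itself.
base_pkgs <- rownames(pkgs)[pkgs[, "Priority"] %in% "base"]
print(base_pkgs)
```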

You can install multiple packages by passing a vector of package names to the install.packages function. For example, the ‘abc’ package provides tools for approximate Bayesian computation (ABC), and the ‘abtest’ package implements Bayesian A/B testing. You can install both using the following command:

> install.packages(c("abc", "abtest"))

You can now verify that the newly installed packages appear in the list returned by the .packages function, as shown below:

> (.packages(all.available=TRUE))
  [1] "abc"        "abc.data"   "abtest"     "AMR"        "base64enc"
  [6] "bit"        "bit64"      "blob"       "brio"       "bslib"
 [11] "cachem"     "callr"      "cli"        "commonmark" "crayon"
 [16] "cubature"   "DBI"        "desc"       "diffobj"    "digest"
 [21] "doParallel" "ellipsis"   "evaluate"   "fansi"      "fastmap"
 ...
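
Scanning a long listing by eye is error-prone, so the same check can be done programmatically. This is a minimal sketch, assuming a hypothetical helper named find_missing; the install.packages() call is left commented out because it needs network access.

```r
# Hypothetical helper: which of the wanted packages are not installed yet?
find_missing <- function(wanted) {
  setdiff(wanted, rownames(installed.packages()))
}

missing_pkgs <- find_missing(c("abc", "abtest"))

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install only what is missing (requires network access):
  # install.packages(missing_pkgs)
} else {
  message("All requested packages are installed")
}
```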

Remove
You can remove a package using the remove.packages function. For example:

> remove.packages(c("abc", "abc.data", "abtest"), "/home/guest/R/x86_64-pc-linux-gnu-library/4.1")

The second argument specifies the library directory from which the packages are removed. If it is omitted, remove.packages() uses the first entry of .libPaths(), which is where packages are installed by default. If you installed the packages in a different directory, pass that path as the second argument instead. We can again verify that the packages have been removed:

> (.packages(all.available=TRUE))
  [1] "AMR"         "base64enc"   "bit"         "bit64"       "blob"
  [6] "brio"        "bslib"       "cachem"      "callr"       "cli"
 [11] "commonmark"  "crayon"      "cubature"    "DBI"         "desc"
 [16] "diffobj"     "digest"      "doParallel"  "ellipsis"    "evaluate"
 [21] "fansi"       "fastmap"     "fontawesome" "foreach"     "fs"
 ...
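
The library directories known to R can be inspected with .libPaths(), which avoids hard-coding a path such as the one above. The sketch below wraps remove.packages() in a hypothetical helper, remove_if_present, that only touches packages actually found in the chosen library; it is an illustration, not a replacement for the standard function.

```r
# Library directories searched by R, in order; the first one is the
# default target of install.packages() and remove.packages().
print(.libPaths())

# Hypothetical helper: remove only the packages that exist in `lib`.
remove_if_present <- function(pkgs, lib = .libPaths()[1]) {
  present <- intersect(pkgs, rownames(installed.packages(lib.loc = lib)))
  if (length(present) > 0) {
    remove.packages(present, lib = lib)
  }
  invisible(present)
}

# Safe to run: this name is not an installed package, so nothing is removed.
removed <- remove_if_present("noSuchPackage123")
print(removed)
```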

Update
The old.packages function reports the installed packages for which a newer version is available in a remote R repository. For example:

> old.packages()
         Package    LibPath                                           Installed
cli      "cli"      "/home/shakthi/R/x86_64-pc-linux-gnu-library/4.1" "3.1.0"
DBI      "DBI"      "/home/shakthi/R/x86_64-pc-linux-gnu-library/4.1" "1.1.1"
fansi    "fansi"    "/home/shakthi/R/x86_64-pc-linux-gnu-library/4.1" "1.0.0"
glue     "glue"     "/home/shakthi/R/x86_64-pc-linux-gnu-library/4.1" "1.4.2"
...
         Built   ReposVer  Repository
cli      "4.1.2" "3.1.1"   "https://cloud.r-project.org/src/contrib"
DBI      "4.1.0" "1.1.2"   "https://cloud.r-project.org/src/contrib"
fansi    "4.1.2" "1.0.2"   "https://cloud.r-project.org/src/contrib"
glue     "4.1.0" "1.6.1"   "https://cloud.r-project.org/src/contrib"
...

Version 3.1.0 of the cli package is installed on the system, but a newer version, 3.1.1, is available in the repository. We can update packages using the update.packages function, as follows:

> update.packages()
cli :
 Version 3.1.0 installed in /home/shakthi/R/x86_64-pc-linux-gnu-library/4.1 
 Version 3.1.1 available at https://cloud.r-project.org
Update? (Yes/no/cancel) Yes
DBI :
 Version 1.1.1 installed in /home/shakthi/R/x86_64-pc-linux-gnu-library/4.1 
 Version 1.1.2 available at https://cloud.r-project.org
Update? (Yes/no/cancel) no
...
 ** building package indices
 ** testing if installed package can be loaded from temporary location
 ** checking absolute paths in shared objects and dynamic libraries
 ** testing if installed package can be loaded from final location
 ** testing if installed package keeps a record of temporary installation path
 * DONE (cli)

The command prompts you for a confirmation before proceeding to update the package. You can choose not to update a specific package, or cancel the entire operation. In the above example, we only update the cli package to 3.1.1. The same can be verified using the packageVersion function, as follows:

> packageVersion("cli")
[1] ‘3.1.1’
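
Version numbers can also be compared in code, which is useful when a script depends on a minimum version. The following sketch uses the ‘stats’ package only because it ships with every R installation; the threshold "3.0.0" is an arbitrary value chosen for illustration.

```r
# packageVersion() returns a "package_version" object, which can be
# compared directly against a version string.
v <- packageVersion("stats")
print(v >= "3.0.0")

# compareVersion() works on plain strings and returns -1, 0 or 1.
compareVersion("3.1.0", "3.1.1")   # -1: the first version is older
compareVersion("3.1.1", "3.1.1")   #  0: the versions are identical
compareVersion("3.1.1", "3.1.0")   #  1: the first version is newer
```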

Download
The new.packages function lists the packages that are available in the remote R repositories but are not installed on the local system. For example:

> new.packages()
    [1] "A3"                               "aaSEA"
    [3] "AATtools"                         "aba"
    [5] "ABACUS"                           "abbreviate"
    [7] "abbyyR"                           "abc"
    [9] "abc.data"                         "ABC.RAP"
   [11] "abcADM"                           "ABCanalysis"
   [13] "abcdeFBA"                         "ABCoptim"
   [15] "ABCp2"                            "abcrf"
   [17] "abcrlda"                          "abctools"

You can download the R packages to a specific directory using the download.packages function.

The surveillance R package provides statistical methods for the modelling and monitoring of time series data and epidemic phenomena. Its source tarball can be downloaded to the /tmp directory, as illustrated below:

> download.packages("surveillance", destdir="/tmp")
trying URL ‘https://cloud.r-project.org/src/contrib/surveillance_1.19.1.tar.gz’
Content type ‘application/x-gzip’ length 4388559 bytes (4.2 MB)
==================================================
downloaded 4.2 MB
     [,1]           [,2]
[1,] "surveillance" "/tmp/surveillance_1.19.1.tar.gz"
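
A downloaded tarball can be inspected without unpacking it. The sketch below assumes the file path shown in the output above and uses a hypothetical helper, list_tarball, which simply returns an empty vector when the file is not there.

```r
# Hypothetical helper: list the files in a source tarball without
# extracting them. Returns character(0) when the file does not exist.
list_tarball <- function(path) {
  if (!file.exists(path)) {
    message("No such file: ", path)
    return(character(0))
  }
  # list = TRUE asks untar() for the table of contents only.
  untar(path, list = TRUE)
}

files <- list_tarball("/tmp/surveillance_1.19.1.tar.gz")
head(files)
```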

Layout
The bitops R package implements bitwise operations on integer vectors. The bitops 1.0-7 source package contains the following files:

$ ls
ChangeLog  DESCRIPTION  INDEX  man  MD5  NAMESPACE  R  README.md  src  tests
$ ls man
bitAnd.Rd  bitFlip.Rd  bitShiftL.Rd  cksum.Rd
$ ls R
bitops.R
$ ls src/
bit-ops.c  bit-ops.h  cksum.c  init.c
$ ls tests/
consistency.R

An R source package typically contains the following files and directories:

  • README: This documentation file describes the purpose of the project, and the steps to install, configure and use the package. For example, the README.md of bitops 1.0-7 contains the following text:
$ cat README.md

# R-bitops -- R CRAN package `bitops` -- implementing bitwise operations
## Functionality
Bitwise operations on (the 32-bit) \R integers
## Context and History
Probably the smallest [CRAN](https://CRAN.R-project.org)
[R](https://www.r-project.org) package I maintain,
the package has a long history, originating as S / S-plus extensions by
Steve Dutky.
...
  • ChangeLog: This file provides a brief history of the changes made to the project. The ChangeLog file in bitops 1.0-7 lists the most recent changes at the top of the file.
$ cat ChangeLog
        
2021-04-13  Martin Maechler  <maechler@stat.math.ethz.ch>
        * From Dan Robertson (dlrobertson) via GH PR #3, Feb.2016:
        Introduce C-style  "operators":
        "%&%", "%|%", "%^%", "%<<%", "%>>%"
2021-03-30  Martin Maechler  <maechler@stat.math.ethz.ch>
        * src/bit-ops.c (R_2_UINT_): cast even more, now via _2_UINT_()
2021-03-24  Martin Maechler  <maechler@stat.math.ethz.ch>
        * DESCRIPTION (Date, URL): cosmetic before finally releasing
...
  • INDEX: This file is optional and provides names of important objects in the package, along with their description. For example:
$ cat INDEX
bitAnd                  Bitwise And, Or and Xor operations
bitFlip                 Binary Flip (Not) Operator
bitShiftL               Bitwise Shift Operator (to the Left or Right)
cksum                   Compute Check Sum
  • MD5: This file contains the MD5 output of the source files in the R package. This is used to verify the checksum of the downloaded files using the md5sum utility.
  • NAMESPACE: This file contains directives that specify the imports and exports of the package's namespace. It declares which variables are imported from other packages, and which variables are exported to users of the package.
$ cat NAMESPACE
 ## Re-written 2012-11-03 B. D. Ripley
 useDynLib(bitops, .registration = TRUE, .fixes = "C_")
 export(bitFlip, bitAnd, bitOr, bitXor, bitShiftL, bitShiftR, cksum)
 export("%&%", "%|%", "%^%", "%<<%", "%>>%")
  • man: This directory contains the manual pages for the package, which are written in the ‘R documentation’ (Rd) format.
  • R: This directory contains the actual R source files of the project.
  • src: This directory contains additional source files that are compiled when the package is installed, such as C source (.c) and header (.h) files that interact with R’s foreign function interface (FFI).
$ ls src/
bit-ops.c  bit-ops.h  cksum.c  init.c
  • tests: This folder has the complete test suite to run and verify the R package. For example:
$ ls tests/
consistency.R
  • DESCRIPTION: This file provides the meta-information with details of the package name, version, date, author, maintainer, licence, URL, repository, published date, etc.
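
The metadata files described above can also be read from within R. The following sketch uses the ‘stats’ package only because it is present in every R installation; the same calls should work for any installed package.

```r
# Locate the DESCRIPTION file of an installed package.
desc_file <- system.file("DESCRIPTION", package = "stats")

# read.dcf() parses the "Field: value" format into a character matrix.
desc <- read.dcf(desc_file, fields = c("Package", "Version", "License"))
print(desc)

# packageDescription() gives access to the same fields by name.
packageDescription("stats")$License

# getNamespaceExports() lists the names a package exports through
# its NAMESPACE file.
head(sort(getNamespaceExports("stats")))
```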

Build
You can build a package from its source directory using the R CMD build command. The bitops 1.0-7 sources can be built from the command line, as shown below:

$ R CMD build bitops
 * checking for file ‘bitops/DESCRIPTION’ ... OK
 * preparing ‘bitops’:
 * checking DESCRIPTION meta-information ... OK
 * cleaning src
 * checking whether ‘INDEX’ is up-to-date ... NO
 * use ‘--force’ to remove the existing ‘INDEX’
 * checking for LF line-endings in source and make files and shell scripts
 * checking for empty or unneeded directories
 * building ‘bitops_1.0-7.tar.gz’

You can also install a package manually from the shell. After downloading or building the tarball, run the R CMD INSTALL command, as shown below:

$ R CMD INSTALL bitops_1.0-7.tar.gz 
 * installing to library ‘/home/shakthi/R/x86_64-pc-linux-gnu-library/4.1’
 * installing *source* package ‘bitops’ ...
 ** package ‘bitops’ successfully unpacked and MD5 sums checked
 ** using staged installation
 ** libs
 gcc -I"/usr/include/R/" -DNDEBUG   -D_FORTIFY_SOURCE=2   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt  -c bit-ops.c -o bit-ops.o
 gcc -I"/usr/include/R/" -DNDEBUG   -D_FORTIFY_SOURCE=2   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt  -c cksum.c -o cksum.o
 gcc -I"/usr/include/R/" -DNDEBUG   -D_FORTIFY_SOURCE=2   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt  -c init.c -o init.o
 gcc -shared -L/usr/lib64/R/lib -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -o bitops.so bit-ops.o cksum.o init.o -L/usr/lib64/R/lib -lR
 installing to /home/shakthi/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-bitops/00new/bitops/libs
 ** R
 ** byte-compile and prepare package for lazy loading
 ** help
 *** installing help indices
 ** building package indices
 ** testing if installed package can be loaded from temporary location
 ** checking absolute paths in shared objects and dynamic libraries
 ** testing if installed package can be loaded from final location
 ** testing if installed package keeps a record of temporary installation path
 * DONE (bitops)

You can verify the same using the packageVersion command in the R console:

> packageVersion("bitops")
[1] ‘1.0.7’
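
The same local tarball can be installed from inside the R console as well. This is a sketch that assumes the tarball sits in the current working directory; the file.exists() guard is added here only so that the snippet is harmless when the file is absent.

```r
# Path to the tarball produced by R CMD build (assumed to be in the
# current working directory).
tarball <- "bitops_1.0-7.tar.gz"

if (file.exists(tarball)) {
  # repos = NULL makes install.packages() treat its first argument as a
  # local file instead of looking the name up in a repository.
  install.packages(tarball, repos = NULL, type = "source")
} else {
  message("Tarball not found: ", tarball)
}
```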

The devtools package provides functions that let you install R packages from Git repositories and URLs. This is useful if you would like to test the latest development branch of an R package. You first need to install devtools and load it in the R console session:

> install.packages("devtools")
 Installing package into ‘/home/shakthi/R/x86_64-pc-linux-gnu-library/4.1’
 (as ‘lib’ is unspecified)
 also installing the dependencies ‘gert’, ‘usethis’
 trying URL ‘https://cloud.r-project.org/src/contrib/gert_1.5.0.tar.gz’
 Content type ‘application/x-gzip’ length 66958 bytes (65 KB)
 ...
 ** testing if installed package keeps a record of temporary installation path
 * DONE (devtools)
> library(devtools)
Loading required package: usethis

You can now use the install_github function with a GitHub ‘username/project-name’ argument to install the package from the repository's default branch, as shown below:

> install_github("hadley/plyr")
Downloading GitHub repo hadley/plyr@HEAD
...
─  preparing ‘plyr’:
  checking DESCRIPTION meta-information ...
─  cleaning src
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘plyr_1.8.6.9000.tar.gz’
 ** checking absolute paths in shared objects and dynamic libraries
 ** testing if installed package can be loaded from final location
 ** testing if installed package keeps a record of temporary installation path
 * DONE (plyr)

R also provides the package.skeleton function, which creates the skeleton of a new source package from R objects in an environment or from a set of R source files. This function accepts the following arguments:

Argument      Purpose
name          Package name
list          A character vector naming the R objects to put in the package
environment   The environment (default .GlobalEnv) in which the objects in list are looked up
path          The directory (default ‘.’) in which the package directory is created
force         A Boolean value to overwrite files if the directory already exists
namespace     A Boolean value to include a namespace
code_files    A character vector with the paths of the R source files

Repositories

The R packages are available on the Internet in three major repositories:

1.  Comprehensive R Archive Network (CRAN): This provides source code as well as pre-compiled binary distributions for the R base system and packages. At the time of writing this article, the latest R release is R-4.1.2.tar.gz (Bird Hippie), dated 2021-11-01. CRAN hosts over 18,900 packages, and you can view the available packages by name or date of publication.

2.  Bioconductor: This project provides Free/Libre and Open Source R software packages with powerful statistical and graphical methods to analyse genomic data. Bioconductor distributes its packages from its own repository, and many of them build on packages from CRAN. Their software is released under the Artistic License 2.0.

3.  R-Forge: This provides a platform for the development of R packages and hosts 2,145 R-related projects. It is based on FusionForge, and uses Subversion for source code management. It supports mailing lists, forums, bug tracking, web-based administration and backups.

The remote repositories configured in your R session can be displayed using the options function:

> options('repos')
$repos
                         CRAN 
"https://cloud.r-project.org"

You can add additional remote R repositories using the setRepositories function.
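
Since setRepositories presents an interactive menu when run in a console, scripts often set the ‘repos’ option directly instead. The following is a minimal sketch of that alternative; the entry name RForge is an arbitrary label chosen for this example, and the previous setting is restored at the end.

```r
# A named character vector: one entry per repository.
repos <- c(CRAN   = "https://cloud.r-project.org",
           RForge = "https://R-Forge.R-project.org")

# options() returns the previous values, which makes restoring easy.
old <- options(repos = repos)
print(getOption("repos"))

# While this setting is active, install.packages() and
# available.packages() consult both repositories.

# Restore the earlier configuration.
options(old)
```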
