In this second article in the series on ‘R, Statistics and Machine Learning’, we are going to learn the syntax of the R programming language. We will use R version 4.1.0 installed on Parabola GNU/Linux-libre (x86-64). The syntax for constants will be discussed first, followed by operators, functions, expressions and, finally, control structures. The Tidyverse R style guide, based on Google’s R style guide with a few changes, will also be addressed.
Numbers are represented in R as is and a few examples are given below:
$ R
R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 3.1415
[1] 3.1415
> 5
[1] 5
> 0x00FF
[1] 255
> 2^10
[1] 1024
> +1i
[1] 0+1i
Text can be enclosed within a pair of quotes (single or double), as shown below:
> "Foo"
[1] "Foo"
> 'Foo'
[1] "Foo"
A symbol is a name for another object in R. It is commonly a name of a variable used in an assignment statement. You can identify the type of an R object using the typeof function:
> typeof(3.1415)
[1] "double"
> typeof(5)
[1] "double"
> typeof(0x00FF)
[1] "double"
> typeof("Foo")
[1] "character"
> typeof('Foo')
[1] "character"
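As the session above shows, a bare number such as 5 is stored as a double. The following is a minimal sketch of how other constant types can be written, assuming the standard L suffix for integer literals; the values are chosen only for illustration.

```r
# A bare numeric literal is a double; the L suffix requests an integer.
typeof(5)       # "double"
typeof(5L)      # "integer"
typeof(0xFFL)   # "integer" (hexadecimal literal with the L suffix)
typeof(1i)      # "complex"
typeof(TRUE)    # "logical"
```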
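All comments in R begin with the hash (#) symbol and run to the end of the line. R has no separate multi-line comment syntax, so every line of a longer comment starts with its own hash. Examples of a single line comment and a comment spanning multiple lines are given below:
> 3.1415 # Pi
[1] 3.1415
> # This is a multi-line
> # comment example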
You assign a value to a variable using the ‘<-’ assignment operator in R:
> a <- 1
> a
[1] 1
There are a few special values used in R:
- Inf and -Inf represent positive and negative infinity respectively.
> 10^2048
[1] Inf
> -10^2048
[1] -Inf
- NA refers to ‘Not Available’, and can be seen when you load text or data into R that has missing values.
- NULL is a symbol in R that represents a null object.
- NaN means ‘Not a Number’ and is the result of computations whose value is undefined, such as dividing zero by zero. (Dividing a non-zero number by zero gives Inf or -Inf instead.)
> 0 / 0
[1] NaN
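The special values can be tested with built-in predicate functions. The sketch below uses a hypothetical vector named readings, invented purely for illustration, to show how NA propagates through a computation and how each special value can be detected.

```r
# Hypothetical vector with a missing value, for illustration only.
readings <- c(2.5, NA, 4.0)

is.na(readings)                 # FALSE TRUE FALSE
mean(readings)                  # NA, because the missing value propagates
mean(readings, na.rm = TRUE)    # 3.25, missing values are dropped first

is.nan(0 / 0)                   # TRUE
is.infinite(1 / 0)              # TRUE, since 1 / 0 is Inf rather than NaN
is.null(NULL)                   # TRUE
length(NULL)                    # 0
```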
Operations
You can perform operations in R such as arithmetic, logical, bitwise, relational, assignment, etc. A few examples of mathematical operations are given below:
> 1 + 2
[1] 3
> 1 - 2
[1] -1
> 1 * 2
[1] 2
> 1 / 2
[1] 0.5
> 2 ^ 5
[1] 32
A few comparison operator examples are shown in the following R console session:
> 1 < 2
[1] TRUE
> 1 > 2
[1] FALSE
> 1 >= 1
[1] TRUE
> 1 <= 1
[1] TRUE
> 1 == 1
[1] TRUE
> 1 != 2
[1] TRUE
Examples of logical operations are given below:
> TRUE & FALSE
[1] FALSE
> 1 && 2
[1] TRUE
> TRUE | FALSE
[1] TRUE
> TRUE || FALSE
[1] TRUE
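The single and double forms of the logical operators behave differently: ‘&’ and ‘|’ work element by element on vectors, while ‘&&’ and ‘||’ work on single values and stop evaluating as soon as the result is known. A small sketch, with vectors x and y invented for illustration:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

x & y   # element-wise and: TRUE FALSE FALSE
x | y   # element-wise or:  TRUE TRUE TRUE

# && and || short-circuit, so the right-hand side is never evaluated here
# and the call to stop() does not raise an error.
FALSE && stop("not reached")   # FALSE
TRUE || stop("not reached")    # TRUE
```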
If in doubt, you can use the help ‘?’ operator or the help() function to learn more about an operator or constant. For example:
> ?TRUE
Logical Vectors

Description:
     Create or test for objects of type '"logical"', and the basic
     logical constants.

Usage:
     TRUE
     FALSE
     T; F

     logical(length = 0)
     as.logical(x, ...)
     is.logical(x)

Arguments:
...

> help("+")
Arithmetic Operators

Description:
     These unary and binary operators perform arithmetic on numeric or
     complex vectors (or objects which can be coerced to them).

Usage:
     + x
     - x
     x + y
     x - y
     x * y
     x / y
     x ^ y
     x %% y
     x %/% y

Arguments:
     x, y: numeric or complex vectors or objects which can be coerced
          to such, or other objects for which methods have been
          written.

Details:
...
The order of precedence for the various R operators from the highest to lowest is given in the following table:
Operator | Description |
:: ::: | access variables in a namespace |
$ @ | component / slot extraction |
[ [[ | indexing |
^ | exponentiation (right to left) |
- + | unary minus and plus |
: | sequence operator |
%any% | special operators (including %% and %/%) |
* / | multiply, divide |
+ - | (binary) add, subtract |
< > <= >= == != | ordering and comparison |
! | negation |
& && | and |
| || | or |
~ | as in formulae |
-> ->> | rightwards assignment |
<- <<- | assignment (right to left) |
= | assignment (right to left) |
? | help (unary and binary) |
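A few expressions where the precedence rules in the table change the result are sketched below; the numbers are arbitrary and chosen only to make the grouping visible.

```r
-2^2        # -4: ^ binds tighter than unary minus, so this is -(2^2)
(-2)^2      # 4
2^3^2       # 512: ^ associates right to left, i.e. 2^(3^2)
-1:3        # -1 0 1 2 3: unary minus binds tighter than the sequence operator
-(1:3)      # -1 -2 -3
1:3 * 2     # 2 4 6: the sequence operator binds tighter than *
1:(3 * 2)   # 1 2 3 4 5 6
```

When the intended grouping is not obvious, adding parentheses costs nothing and makes the expression easier to read.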
Functions
A function is also an object in R that takes objects as input and returns an output object. The syntax is as follows:
function (argument1, argument2, ...) body
Everything in R is an object, and operators are themselves functions, so an expression such as 1 == 2 or -3 is a function call written in operator form. For example:
> 1 == 2
[1] FALSE
> -3
[1] -3
> +4
[1] 4
> 5 * 3
[1] 15
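The function form of an operator is written by quoting the operator in backticks. A minimal sketch of calls that are equivalent to the expressions above, followed by two uses of operators as arguments to the built-in sapply and Reduce functions:

```r
`==`(1, 2)   # same as 1 == 2, gives FALSE
`-`(3)       # same as -3
`+`(4)       # same as +4
`*`(5, 3)    # same as 5 * 3, gives 15

# Because operators are functions, they can be passed to other functions.
sapply(1:3, `-`)    # -1 -2 -3 (unary minus applied to each element)
Reduce(`+`, 1:4)    # 10
```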
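Examples of a few built-in mathematical functions that you can use from the R console are given below. Note that the trigonometric functions take angles in radians, not degrees, and that the second argument to log is the base:
> cos(90)
[1] -0.4480736
> sin(90)
[1] 0.8939967
> exp(2)
[1] 7.389056
> log(10, 2)
[1] 3.321928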
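Since sin and cos work in radians, an angle given in degrees has to be converted first. The sketch below uses a hypothetical helper named deg_to_rad, which is not part of base R, together with the built-in constant pi.

```r
# Hypothetical helper for illustration: convert degrees to radians.
deg_to_rad <- function(degrees) {
  degrees * pi / 180
}

sin(deg_to_rad(90))    # 1
cos(deg_to_rad(180))   # -1
log(8, base = 2)       # 3
log2(8)                # 3
log(exp(1))            # 1, because the default base of log is e
```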
You can define a new function using the function keyword, followed by named arguments and the body of the function. For example:
> f <- function (a, b) { a * b }
> f(2, 3)
[1] 6
You can also pass functions as arguments to other functions. In the following example, proc is a function that takes two arguments, a value and a function, applies the function to the value and returns the result.
> proc <- function (x, func) {
+   return(func(x))
+ }
> proc(90, sin)
[1] 0.8939967
If you need a function only once and do not want to give it a name, you can create an anonymous function in R as follows:
> (function(x) x * 2) (3)
[1] 6
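Passing functions as arguments and anonymous functions are often combined. The following sketch repeats the proc function from the earlier example and calls it with both named and anonymous functions; sapply is a built-in function that works the same way on each element of a vector.

```r
proc <- function(x, func) {
  func(x)
}

proc(16, sqrt)                # 4
proc(-3, abs)                 # 3
proc(5, function(n) n + 1)    # 6, using an anonymous function

# sapply applies the given function to every element of the vector.
sapply(c(1, 4, 9), sqrt)                # 1 2 3
sapply(c(1, 2, 3), function(n) n * 2)   # 2 4 6
```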
The ‘return’ function can be used explicitly to specify the value to be returned by an R function.
> p <- function (x) { return (x * x) }
> p(3)
[1] 9
Expressions
You can separate multiple expressions in a single line using a semi-colon as shown below:
> a <- 1; b <- 2; c <- 3
> a
[1] 1
> b
[1] 2
> c
[1] 3
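You can also use parentheses to group an expression so that it is evaluated before the operations around it.
> 1 * (3.1415 * 2)
[1] 6.283
Curly braces can be used if the expression spans multiple lines, and are often seen in a function body. In the console, the ‘+’ at the start of a line is the continuation prompt and not part of the expression.
> 1 * {(3.1415
+ * 2)}
[1] 6.283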
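A block in curly braces is itself an expression whose value is the value of the last expression inside it. A short sketch, with the variable names area and radius invented for illustration:

```r
area <- {
  radius <- 2
  3.1415 * radius^2   # the last expression gives the value of the block
}
area   # 12.566
```

This is the same rule that lets a function body return its last evaluated expression without an explicit call to return.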
Control structures
R provides control structures such as if for conditional execution, and repeat, while and for for iteration. The syntax for the ‘if’ statement is given below:
if (condition) expression1 else expression2
For example:
> if (TRUE) "TRUE"
[1] "TRUE"
> if (FALSE) "Not applicable" else "FALSE"
[1] "FALSE"
You can chain several conditions using ‘else if’ clauses in an ‘if’ statement, as shown below:
> x <- 1
> if (x < 0) {
+   "Less than zero"
+ } else if (x == 0) {
+   "Zero"
+ } else {
+   "Greater than zero"
+ }
[1] "Greater than zero"
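Because ‘if’ is an expression in R, its value can be assigned directly to a variable. A minimal sketch, with the names x and label chosen only for illustration:

```r
x <- -4

# The value of the chosen branch becomes the value of the whole if expression.
label <- if (x < 0) "negative" else "non-negative"
label   # "negative"

# At the top level of a script or console session, keep `else` on the same
# line as the closing brace; otherwise R treats the if as already complete.
if (x == 0) {
  size <- "zero"
} else {
  size <- "non-zero"
}
size   # "non-zero"
```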
The repeat expression can be used to produce a looping construct in R. The syntax for ‘repeat’ is as follows:
repeat expression
For example:
> i <- 1
> repeat { if (i > 5) break else { print (i); i <- i + 1 } }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
You will need to use the break keyword to exit from the loop. The while statement syntax is as follows:
while (condition) expression
Here’s an example:
> i <- 1
> while (i <= 5) { print(i); i <- i + 1 }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
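Besides break, R has the next keyword (not used in the examples above), which skips the rest of the current iteration. The following sketch collects the odd numbers below ten; the variable names are invented for illustration.

```r
i <- 0
odd_values <- c()

while (TRUE) {
  i <- i + 1
  if (i > 9) break         # leave the loop entirely
  if (i %% 2 == 0) next    # skip even numbers and start the next iteration
  odd_values <- c(odd_values, i)
}

odd_values   # 1 3 5 7 9
```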
R also has a for loop for iterating over the elements of a vector or list. Its syntax is as follows:
for (var in list) expression
An example is given below:
> for (i in seq(from=1, to=5, by=1)) print(i)
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
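A for loop can iterate over any vector, not only over a numeric sequence. The sketch below uses a hypothetical character vector named fruits and the built-in seq_along function, which generates the index positions of its argument.

```r
fruits <- c("apple", "banana", "pear")   # hypothetical data

# Iterate over the elements directly.
for (fruit in fruits) print(fruit)

# Iterate over the index positions when the position is also needed.
for (i in seq_along(fruits)) {
  cat(i, fruits[i], "\n")
}

# Accumulate a result across iterations.
total <- 0
for (i in 1:5) total <- total + i
total   # 15
```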
You can exit from the R interactive console session using q().
R coding style
A number of coding styles exist for the R programming language such as Google’s R style guide, Tidyverse style guide, etc. We will look at the recommended practices from the Tidyverse style guide. You can use styler and lintr R tools to convert and check that your R code conforms to the style guide.
1. All R source files must have the file name extension .R. For example, critical.values.R.
2. The variables and function names should use snake case style, such as str_length, as_bare_character and boundary.
3. It is a good practice to use a space after a comma for readability. For example:
> log(10, 2)
[1] 3.321928
4. Do not put spaces inside or outside the parentheses of a function call:
> proc(90, sin)
5. Use a space around infix operators like ‘+’, ‘-’, ‘==’, ‘<-’, etc.
> 1 + 2
[1] 3
The exceptions are operators with a high precedence, such as ‘::’, ‘$’, ‘@’, ‘[’, ‘^’, the unary minus and ‘:’, which are written without surrounding spaces.
6. You can align ‘=’ and ‘<-’ expressions for better readability, as illustrated below:
inside  <- 0
failure <- 0
7. If you wish to override the default value for a function argument, mention the full name:
stri_opts_regex(case_insensitive = ignore_case, ...)
8. Use two spaces for indentation. For example:
opts <- function(x) {
  if (identical(x, "")) {
    stri_opts_brkiter(type = "character")
  } else {
    attr(x, "options")
  }
}
9. For code blocks, the opening curly brace should be at the end of the line, and the closing curly brace should be the first character on a new line.
as_bare_character <- function(x) {
  if (is.character(x) && !is.object(x)) {
    # All OK!
    return(x)
  }
  ...
}
10. Ensure that code fits within 80 characters per line, so that it remains readable when printed.
11. Use one argument per line if a function call is very long. For example:
str_order <- function(x,
                      decreasing = FALSE,
                      na_last = TRUE,
                      locale = "en",
                      numeric = FALSE,
                      ...) {
}
12. Avoid using a semi-colon at the end of a line.
13. Always use ‘<-’ for an assignment operation instead of ‘=’.
> i <- 1
14. The preference is to use double quotes instead of single quotes for text.
15. Use TRUE and FALSE instead of T and F for readability.
16. A comment must always begin with the hash symbol followed by a single space (‘# ’). Comments should not explain what the code does or how it works. They should answer the question why.
17. The return function must only be used to return early from a function. Otherwise, the result of the last evaluated expression should be used as the return value of the function.
18. You can use roxygen2 to provide inline documentation in comments along with your code.
#' Count the number of matches in a string
#'
#' Vectorised over `string` and `pattern`.
#'
#' @inheritParams str_detect
#' @return An integer vector.
#' @seealso
#'  [stringi::stri_count()] which this function wraps.
#'
#'  [str_locate()]/[str_locate_all()] to locate position
#'  of matches
#'
#' @export
#' @examples
#' fruit <- c("apple", "banana", "pear", "pineapple")
#' str_count(fruit, "a")
#' str_count(fruit, "p")
#' str_count(fruit, "e")
#' str_count(fruit, c("a", "b", "p", "p"))
#'
#' str_count(c("a.", "...", ".a.a"), ".")
#' str_count(c("a.", "...", ".a.a"), fixed("."))
str_count <- function(string, pattern = "") {
  check_lengths(string, pattern)

  switch(type(pattern),
    empty = ,
    bound = stri_count_boundaries(string, opts_brkiter = opts(pattern)),
    fixed = stri_count_fixed(string, pattern, opts_fixed = opts(pattern)),
    coll  = stri_count_coll(string, pattern, opts_collator = opts(pattern)),
    regex = stri_count_regex(string, pattern, opts_regex = opts(pattern))
  )
}
19. The tests/ folder files should match the source files in the R/ folder. For example, for R/case.R the corresponding test file should be tests/testthat/test-case.R.
20. The project sources should have a NEWS file describing the relevant changes to the sources. Also add a LICENSE file to state the licence under which the sources are released.
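Several of the rules above can be seen together in a short function. The sketch below is a hypothetical example written only to illustrate the style; the function name mean_of_positives and its arguments are not from any package. It uses snake case names, two-space indentation, spaces around infix operators, ‘<-’ for assignment, TRUE instead of T, comments that explain why, and return only for the early exit.

```r
# Hypothetical function, written only to illustrate the style rules.
# Negative values are dropped so that sign errors in the input do not
# skew the average.
mean_of_positives <- function(values, na_rm = TRUE) {
  if (length(values) == 0) {
    # Nothing to average, so return early.
    return(NA_real_)
  }

  positives <- values[values > 0]
  mean(positives, na.rm = na_rm)
}

mean_of_positives(c(-1, 2, 4))   # 3
```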
You are encouraged to read the references for more information on the R syntax. In the next article in this R series, we shall explore R data structures.