Probability Distributions in R

0
513
Probability Distributions in R

In this fourteenth article in the R series, we shall explore the common probability distribution functions available in R.

We will use version 4.2.1 installed on Parabola GNU/Linux-libre (x86-64) for the code snippets.

$ R --version

R version 4.2.1 (2022-06-23) -- “Funny-Looking Kid”
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

Consider the mtcars data set in the lattice library:

> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

The summary() function can be used to produce the basic statistics of the data, as shown below:

> summary(mtcars)
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0

The Tukey’s five number summary (minimum, lower-hinge, median, upper-hinge, and maximum) can be obtained using the fivenum() function, as follows:

> mtcars$disp
[1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
[13] 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0 304.0 350.0
[25] 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0

> fivenum(mtcars$disp)
[1] 71.10 120.65 196.30 334.00 472.00

A histogram of the mtcars displacement data is shown in Figure 1.

Figure 1: mtcars$disp histogram
Figure 1: mtcars$disp histogram

The stem() function can generate a stem-and-leaf plot for input data. For example, we can obtain a plot for the mtcars cylinder, as demonstrated below:

> mtcars$cyl
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

> stem(mtcars$cyl)

The decimal point is at the |

4 | 00000000000
4 |
5 |
5 |
6 | 0000000
6 |
7 |
7 |
8 | 00000000000000

Uniform

The Uniform distribution has the following density function:

f(x) = 1/b-a

…where a <= x <= b. The runif() function generates random deviates, while the dunif() function gives the density, as illustrated below:

> d <- runif(10, min=0, max=1)
> d
[1] 0.48730291 0.26676446 0.15025110 0.77975962 0.42828890 0.05255129
[7] 0.54070173 0.54715839 0.42703064 0.31628728

> dunif(d)
[1] 1 1 1 1 1 1 1 1 1 1

These functions accept the following common arguments:

Arguments Description
x, q A vector of quantiles
p A vector of probabilities
n Number of observations
log, log.p A TRUE logical value indicates probabilities as log(p)
lower.tail A TRUE logical value that probabilities are P[X <= x]]

The specific set of arguments for the uniform distribution is given below:

Arguments Description
min, max  Finite lower and upper distribution limits

Binomial

The density of the binomial distribution with probability ‘p’ and size ‘n’ is given by:

p(x) = select(n,x) p^x (1-p)^(n-x)

…where ‘x’ varies from 0 to n. The p(x) is calculated using the Loader’s algorithm. The pbinom() function gives the distribution function while the rbinom() function provides random values, as shown below:

> rbinom(5, 5, 1)
[1] 5 5 5 5 5

> rbinom(5, 5, 0.5)
[1] 3 2 2 4 2

> rbinom(5, 5, 0.25)
[1] 0 3 0 2 2

An example of pbinom() function for five successes in 10 trials, each having probability 0.5, is as follows:

> pbinom(5, 10, 0.5)
[1] 0.6230469

In addition to the common arguments listed previously, the binomial distribution function accepts the following arguments:

Arguments Description
size Number of trials
prob Probability of success in each trial

 Exponential

The exponential function for a given rate ‘lambda’ is given by:

f(x) = lambda {e}^{- lambda x}

…where x >= 0. The rexp() function provides random deviations, the qexp() function gives the quantile function, and the pexp() function provides the distribution function. A couple of examples are given below:

> rexp(5, rate = 1)
[1] 0.44090566 0.18508747 1.18596321 0.86235515 0.05020871

> pexp(d, rate = 1)
[1] 0.38571907 0.23414656 0.13950812 0.54148378 0.34837687 0.05119434
[7] 0.41766053 0.42140839 0.34755644 0.27114996

The specific ‘rate’ argument is applicable to the exponential function.

Arguments Description
rate  A vector of rates

Geometric

The geometric distribution density with probability ‘p’ is defined as follows:

p(x) = p (1-p)^x

…where x = 0, 1, 2, etc, and ‘p’ is between 0 and 1. The rgeom() function for 10 observations and probability 0.5 is given below:

> d <- rgeom(10, 0.5)
> d
[1] 0 1 1 0 0 0 0 0 1 1

The pgeom() and qgeom() functions are based on closed-form formulae and provide the distribution and quantile functions, respectively.

> pgeom(d, 0.5)
[1] 0.50 0.75 0.75 0.50 0.50 0.50 0.50 0.50 0.75 0.75

> qgeom((1:10)/15, prob = 0.2)
[1] 0 0 0 1 1 2 2 3 4 4

The argument specific to the geometric function is given below:

Arguments Description
prob Probability of success in each trial

Normal

The normal distribution for mean ‘mu’ and standard deviation ‘sigma’ is given by:

f(x) = 1/(sqrt(2 pi) sigma) e^-((x - mu)^2/(2 sigma^2))

A sample of ten random values for mean 0 with standard deviation 1 can be generated using the rnorm() function, as follows:

> d <- rnorm(10, mean=0, sd=1)
> d
[1] 0.66512403 -0.43102217 0.50305386 1.55028087 1.20598557 0.33704698
[7] 0.09200192 0.67090448 -0.24424568 -0.78798191

Example uses of pnorm() and dnorm() functions are given below:

> pnorm (d, mean=0, sd=1)to
[1] 0.7470144 0.3332261 0.6925368 0.9394629 0.8860885 0.6319593 0.5366517
[8] 0.7488593 0.4035203 0.2153536

> dnorm(d, mean=0, sd=1)
[1] 0.3197763 0.3635536 0.3515265 0.1199568 0.1927928 0.3769138 0.3972575
[8] 0.3185439 0.3872184 0.2924691

The normal distribution specific arguments are listed for your reference below.

Arguments Description
mean A vector of means
sd A vector of standard deviations

Poisson

The Poisson distribution density is defined as follows:

p(x) = lambda^x exp(-lambda)/x!

…where x = 0, 1, 2, … The mean and variance is lambda.

> d <- rpois(10, 0.9)
> d
[1] 2 2 1 3 1 2 1 1 0 0

The dpois() function gives the log density while the ppois() function gives the log distribution function. A couple of examples are given below:

> dpois(d, 0.9)
[1] 0.16466071 0.16466071 0.36591269 0.04939821 0.36591269 0.16466071
[7] 0.36591269 0.36591269 0.40656966 0.40656966

> ppois(d, 0.9)
[1] 0.9371431 0.9371431 0.7724824 0.9865413 0.7724824 0.9371431 0.7724824
[8] 0.7724824 0.4065697 0.4065697

The Poisson distribution function specific parameter is lambda, as shown below.

Arguments Description
lambda A vector of non-negative means

The Poisson CDF can be plotted in a graph, as shown below:

> x <- seq(-0.5, 5, 0.05)

> head(x)
[1] -0.50 -0.45 -0.40 -0.35 -0.30 -0.25

> plot(x, ppois(x, 1), type = “s”, ylab = “F(x)”, main = “Poisson CDF”)
Figure 2: ppois plot
Figure 2: ppois plot

Weibull

The shape parameter ‘a’ and scale parameter ‘b’ define the Weibull distribution density, as follows:

f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a)

…where x > 0. The dweibull() function gives the density while the rweibull() function generates random deviates, as illustrated below:

> x <- c(0, rlnorm(10))
> x
[1] 0.0000000 0.5194988 2.2301132 0.1844523 0.5267321 0.5487738 1.3437871
[8] 2.0694222 3.6199835 6.3532987 5.9563711

> all.equal(dweibull(x, shape=1), dexp(x))
[1] TRUE

> rweibull(10, 5, 1)
[1] 0.5197291 0.6851027 1.0279023 1.1397738 0.8810083 0.7945575 0.5939060
[8] 0.8762381 0.7979457 0.6977511

The specific parameters for the Weibull distribution are as follows.

Arguments Description
shape, scale shape and scale parameters

You are encouraged to read the manual pages for the above R functions to learn more on their arguments, options and usage.

LEAVE A REPLY

Please enter your comment!
Please enter your name here