R Basics 3: Built-in functions, Vectors, and Help

r-basics
Author

Horacio Lopez-Nicora

Published

January 26, 2024



1 Introduction

Recap of last week

Last week, we did a little more basic interaction with R (missing prompts and data types), we wrote code in R scripts (and added comments to our code), and used and named R objects.

Here is an additional tip from our previous session: go to Tools and then Keyboard Shortcuts Help. Identify useful keyboard shortcuts and use them during today’s session.

Learning objectives for today

  • Built-in Functions

  • Vectors in R

  • Getting Help with R


2 Types of Functions in R

Functions are the foundation of almost everything in R. In programming, they are sets of organized instructions designed to perform specific tasks. The purpose of functions is to create self-contained programs that can be called upon as needed.

Fig. 1. Types of Functions in R

2.1 What’s in a function?

What exactly is a function? Let’s recall from our math knowledge:

Fig. 2. Function Rules

A function in R is a collection of statements that can be reused in a program. This is the syntax of defining a function in R:

Fig. 3. R Function Syntax
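
As a minimal sketch of that syntax, the example below defines and then calls a small function. The name add_and_double and its arguments x and y are made up for illustration; they are not part of R or of this session's material.

```r
# Hypothetical function for illustration only; the name and arguments are invented.
add_and_double <- function(x, y) {
  total <- x + y   # statements in the body run each time the function is called
  total * 2        # the value of the last expression is what the function returns
}

# Call the function by name, supplying a value for each argument.
add_and_double(5, 7)
```

Calling add_and_double(5, 7) adds the two numbers and doubles the result, giving 24.
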
Which function was covered during our first session of Code Club? (Click for the answer)

During our first session of Code Club, we examined the setwd function.

2.2 Types of built-in functions in R

Built-in functions are functions that are already defined in R, so you can use them without writing them yourself. R offers a comprehensive collection of built-in functions that handle most common tasks. They are categorized by functionality as follows.

Fig. 4. Types of built-in functions in R
Before we begin examining various functions


Math functions

A numeric function in R is defined as a function that can accept either a set of numeric values or a numeric vector (see below) as an input argument to carry out specific tasks. Here are several frequently used numeric functions in R programming.

Function                  Description
abs(x)                    absolute value
sqrt(x)                   square root
ceiling(x)                ceiling(3.475) is 4
floor(x)                  floor(3.475) is 3
trunc(x)                  trunc(5.99) is 5
round(x, digits=n)        round(3.475, digits=2) is 3.48
signif(x, digits=n)       signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)    also asin(x), acos(x), cosh(x), acosh(x), etc.
log(x)                    natural logarithm
log10(x)                  common logarithm
exp(x)                    e^x
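
To see a few of these functions in action, here is a short sketch you could paste into the console. The input values are arbitrary and chosen only so the results are easy to check by hand.

```r
abs(-4.2)                   # absolute value: 4.2
sqrt(16)                    # square root: 4
ceiling(3.475)              # rounds up to 4
floor(3.475)                # rounds down to 3
trunc(5.99)                 # drops the decimal part: 5
round(3.14159, digits = 2)  # two decimal places: 3.14
signif(3.475, digits = 2)   # two significant digits: 3.5
log(exp(1))                 # natural log of e: 1
log10(1000)                 # common log: 3
```
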

Exercises 1

A) Let’s explore one function in particular: sum

Last week we used R as a calculator. Add 5 and 7 the way we did last week and then try using the built-in function sum.

Using R as a calculator to add 5 and 7:

Solution (click here)
5 + 7
[1] 12

Using the sum built-in function in R, add 5 and 7:

Solution (click here)
sum(5, 7)
[1] 12

B) Now, let’s combine functions by adding 3, 7, 9, and 11. After that, we will multiply the sum by 3. Lastly, we will calculate the square root of this result and round it to the nearest whole number.

Solution (click here)
round(sqrt(sum(3, 7, 9, 11)*3))
[1] 9

C) Below is a very common example in my data analysis.

In the field of Plant Pathology, data such as disease incidence or severity is typically collected as a percentage or proportion. To prepare the data for analysis, it is common to apply a data transformation known as the arc-sine square root. You have gathered disease severity data from three plots: 0.75 (control), 0.70 (Trt 1), and 0.30 (Trt 2). Apply the transformation mentioned above to your data for analysis. (Click for the answer)
Control <- 0.75
Trt1 <- 0.70
Trt2 <- 0.30

asin(sqrt(Control))
[1] 1.047198
asin(sqrt(Trt1))
[1] 0.9911566
asin(sqrt(Trt2))
[1] 0.5796397
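
Since the same two steps are repeated for every plot, one possible shortcut is to wrap them in a small function of your own. This is only a sketch: the name asin_sqrt is hypothetical, not an R built-in.

```r
# Hypothetical helper (not built into R): arc-sine square root transformation.
asin_sqrt <- function(p) {
  asin(sqrt(p))   # p is a proportion between 0 and 1
}

asin_sqrt(0.75)   # same value as asin(sqrt(Control)) above
asin_sqrt(0.70)
asin_sqrt(0.30)
```
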

Statistical probability functions

The table below provides descriptions of functions pertaining to probability distributions.

Function                  Description
dnorm(x)                  normal density function (by default mean=0, sd=1)
pnorm(q)                  cumulative normal probability for q (area under the normal curve to the left of q); pnorm(1.96) is 0.975
qnorm(p)                  normal quantile: the value at the p percentile of the normal distribution; qnorm(.9) is 1.28 (the 90th percentile)
rnorm(n, mean=0, sd=1)    n random normal deviates with the given mean and standard deviation sd
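
As a quick illustration of the first three functions, the sketch below uses the standard normal distribution (mean 0, standard deviation 1). The values in the comments are approximate.

```r
dnorm(0)             # height of the standard normal curve at 0, about 0.399
pnorm(1.96)          # area to the left of 1.96, about 0.975
qnorm(0.975)         # value with 97.5% of the area to its left, about 1.96
qnorm(pnorm(1.5))    # qnorm() undoes pnorm(), so this gives back 1.5
```
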

Let’s generate 10 random normal variates with mean=50, sd=10.

x <- rnorm(10, mean=50, sd=10) # Draw 10 random values from a normal distribution.
round(x) # Rounding to the nearest whole number.
 [1] 50 62 78 58 51 38 47 56 66 57

Now let’s do the same thing, but call the result y.

Did you get the same result?

To make pseudo-random numbers reproducible, call set.seed() with an integer of your choice, for example set.seed(1234), before calling the random number generator.

Example (click here)
set.seed(1234)
x <- round(rnorm(10, mean=50, sd=10))
x
 [1] 38 53 61 27 54 55 44 45 44 41
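
One way to convince yourself that the seed is what makes the numbers repeat is to set the same seed before each draw and compare the two results, as in this sketch:

```r
set.seed(1234)                          # fix the starting point of the generator
x <- rnorm(10, mean = 50, sd = 10)

set.seed(1234)                          # reset to the same starting point
y <- rnorm(10, mean = 50, sd = 10)

identical(x, y)                         # TRUE: both draws are exactly the same
```

If you leave out the second set.seed() call, y will almost certainly differ from x.
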

Other statistical and useful functions

Other useful statistical functions are provided in the following table.

Function                        Description
seq(from, to, by)               generate a sequence
                                indices <- seq(1,10,2)
                                # indices is c(1, 3, 5, 7, 9)
rep(x, ntimes)                  repeat x n times
                                y <- rep(1:3, 2)
                                # y is c(1, 2, 3, 1, 2, 3)

Each of the functions below has the option na.rm to strip missing values before calculations. Otherwise, the presence of missing values will lead to a missing result.

Function                        Description
mean(x, trim=0, na.rm=FALSE)    mean of object x
sd(x)                           standard deviation of object x
median(x)                       median
range(x)                        range
sum(x)                          sum
min(x)                          minimum
max(x)                          maximum
Nota bene

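The object is usually a numeric vector. Some of these functions (for example sum, min, max, and range) also accept a data frame whose columns are all numeric, but mean, median, and sd do not.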
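
The sketch below illustrates seq, rep, and the effect of na.rm. It uses c() to group values, which is covered in the Vectors section; the object name values and the numbers in it are made up for illustration.

```r
seq(1, 10, by = 2)            # 1 3 5 7 9
rep(1:3, times = 2)           # 1 2 3 1 2 3

# Hypothetical data with one missing value (NA).
values <- c(5, 7, NA, 9)

sum(values)                   # NA, because one value is missing
sum(values, na.rm = TRUE)     # 21, the NA is dropped first
mean(values, na.rm = TRUE)    # 7
max(values, na.rm = TRUE)     # 9
```
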

Exercises 2

A) Let’s explore the following functions together using the group of numbers: 5, 7, 3, and 9 (in this order). These functions include: sum, min, max, and range

Solution (click here)
sum(5, 7, 3, 9)
[1] 24
min(5, 7, 3, 9)
[1] 3
max(5, 7, 3, 9)
[1] 9
range(5, 7, 3, 9)
[1] 3 9

B) Let’s now get the average and standard deviation of these numbers, and sort them, using mean, sd, and sort.

Solution (click here)
mean(5, 7, 3, 9)
[1] 5
sd(5, 7, 3, 9)
Error in sd(5, 7, 3, 9): unused arguments (3, 9)
sort(5, 7, 3, 9)
Error in sort(5, 7, 3, 9): 'decreasing' must be a length-1 logical vector.
Did you intend to set 'partial'?
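
These results are not typing mistakes. Unlike sum, min, max, and range, the functions mean, sd, and sort expect all the data in a single object as their first argument, so only the 5 is treated as data and the remaining numbers are matched to other arguments. A minimal sketch of one fix is to group the numbers with c(), which is introduced in the next section:

```r
numbers <- c(5, 7, 3, 9)   # group the four values into one object

mean(numbers)              # 6
sd(numbers)                # about 2.58
sort(numbers)              # 3 5 7 9
```
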


3 Vectors in R

Depending on the type of data that one needs to store in R, different data structures can be used. The four most commonly used data structures in R are vectors, lists, matrices, and data frames. In this session, we will only be working with vectors.

The fundamental data structure in R is the vector: a 1-dimensional data structure that can only contain one type of data (i.e., all entries must have the same mode). To create a vector in R, use the function c() (concatenate or combine), as shown below.

Let’s create a vector named “my_vector” with 5 entries.

my_vector <- c(10, 30, 50, 20, 40)
my_vector
[1] 10 30 50 20 40

The output of the previous code chunk displays the entries in your vector. The 1 in square brackets indicates the position in the vector of the entry to its right. In this case, 10 is the first entry of the vector.

To extract only the value 50 from this vector, we can use the fact that it is in the third position.

my_vector[3]
[1] 50
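
The same square-bracket notation can take more than one position. As a brief sketch, re-creating my_vector so the example runs on its own:

```r
my_vector <- c(10, 30, 50, 20, 40)

my_vector[c(1, 3)]   # first and third entries: 10 50
my_vector[2:4]       # second through fourth entries: 30 50 20
my_vector[-1]        # everything except the first entry: 30 50 20 40
length(my_vector)    # number of entries: 5
```
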

Since a vector can only contain one data type, all its members need to be of the same type. If you attempt to combine data of different types into a vector, R will not give a warning; instead, it coerces all entries to the most flexible type. (The order of flexibility, from least to most, is: logical, integer, double, character.) Therefore, if you add a number to a logical vector, the entire vector will be converted to a numeric vector.

To check what data type an object is, run the R built-in function class(), with the object as the only argument.
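
A short sketch of this coercion, using made-up object names, shows what happens when types are mixed:

```r
# Logical values combined with a number become numeric (TRUE is 1, FALSE is 0).
mixed_logical <- c(TRUE, FALSE, 5)
mixed_logical            # 1 0 5
class(mixed_logical)     # "numeric"

# Numbers combined with text become character strings.
mixed_text <- c(1, 2, "three")
mixed_text               # "1" "2" "three"
class(mixed_text)        # "character"
```
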

class(my_vector)
[1] "numeric"

If you want more information about any object stored in your R session, the function str() is very helpful.

str(my_vector)
 num [1:5] 10 30 50 20 40

Exercises 3

A) Let’s revisit mean, sd, and sort, this time using my_vector.

mean(my_vector)
[1] 30
sd(my_vector)
[1] 15.81139
sort(my_vector)
[1] 10 20 30 40 50

B) Add 7 to my_vector, multiply my_vector by 3, and check which values are greater than 25.

Solution (click here)
7 + my_vector
[1] 17 37 57 27 47
3 * my_vector
[1]  30  90 150  60 120
my_vector > 25
[1] FALSE  TRUE  TRUE FALSE  TRUE

C) Please create another_vector and add it to my_vector. Next, use the sum function to add up all the values in both vectors.
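
One possible approach is sketched below. The values in another_vector are arbitrary and only for illustration; any numeric vector with 5 entries would behave the same way.

```r
my_vector <- c(10, 30, 50, 20, 40)
another_vector <- c(1, 2, 3, 4, 5)   # hypothetical values, pick your own

# Adding two vectors of the same length works entry by entry.
my_vector + another_vector           # 11 32 53 24 45

# sum() adds up every value in both vectors and returns a single number.
sum(my_vector, another_vector)       # 165
```
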


4 Getting Help with R

Before seeking assistance from others, it is generally advisable for you to attempt to resolve the problem on your own. R provides comprehensive tools for accessing documentation and searching for help.

4.1 R Help: help() and ?

The help() function and the ? help operator in R give access to the documentation pages for R functions, data sets, and other objects. They cover both packages in the standard R distribution and contributed packages.
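
As a brief sketch, these are a few ways to look up a function you already know the name of. The functions chosen here (mean, sum, sd) are only examples.

```r
help(mean)     # open the documentation page for mean()
?sum           # the ? operator is a shortcut for help()
help("sd")     # the name can also be given as a character string
args(sd)       # show a function's arguments without opening the help page
```

On a help page, the Arguments section describes what each input should be, and the Examples section at the bottom contains code you can copy and run.
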

Exercises 4

A) Using help() or ?, can you find out what type of R object the mean() and sum() functions take?
