`<- c(26, 21, 24, 32, 33, 41, 51) temps_F `

# R Basics 4: Vectorized Operations and Data Frames

## 1 Introduction

We are going to continue our discussion about vectors that Horacio started last week and talk about vectorized operations. It’s fine if you don’t know what this means, you will by the end of today’s session. We are also going to introduce the idea of a data frame.

## 2 What are vectorized operations?

I think its easier to see 👀 an example of a vectorized operation, and then we can explain what is happening under the hood.

Let’s create a vector of high temperatures for the weather in Columbus OH in Fahrenheit from the last 7 days. I took this data from January 19 to 25, 2024.

We can call our vector to see what it contains.

` temps_F`

`[1] 26 21 24 32 33 41 51`

A reminder that:

\[ °F = 1.8 * °C + 32 \]

With some light rearrangement, we can write the code below so we can plug in a temp in Fahrenheit, and get out one in Celsius.

```
<- (temps_F - 32)/1.8
temps_C
temps_C
```

`[1] -3.3333333 -6.1111111 -4.4444444 0.0000000 0.5555556 5.0000000 10.5555556`

Here, we simply use the vector `temps_F`

as we would any single number in a smple mathematical operation, and we convert each element or number in our our entire vector using the formula we provide. As a result, we can convert every element of our vector from Fahrenheit to Celsius.

It is worth mentioning that this is

unique in R- many other programming languages would require a for loop or similar to do this operation, but because R is avectorized language, this works easily and simply.

Here is another example. Imagine you have a vector of values and you want to calculate the log base 2 value for each element.

```
<- c(20, 36, 42, 53)
raw
<- log2(raw)
log2
log2
```

`[1] 4.321928 5.169925 5.392317 5.727920`

This is an example where we apply a formula transformation to a vector, but we can also use vectorized operations with two vectors.

## 3 Vector by vector operations

Let’s create some vectors. Note I am not using the `c()`

function to create the vector - since I am using the `:`

to indicate a string of integers from number 1 to number 2, under the hood, R automatically makes this a vector. If you were to use the `c()`

function though you would get the same result.

```
# create a vector with the values from 1 to 10, incrementing by 1
# vector_1to10 <- c(1:10) would also work
<- 1:10
vector_1to10
<- 1:5
vector_1to5
<- 1:3
vector_1to3
<- 0:4 vector_0to4
```

`vector_1to10`

and `vector_1to5`

?
```
# a vector reminder
vector_1to10
```

` [1] 1 2 3 4 5 6 7 8 9 10`

` vector_1to5`

`[1] 1 2 3 4 5`

```
# adding
+ vector_1to5 vector_1to10
```

` [1] 2 4 6 8 10 7 9 11 13 15`

We see that `vector_1to5`

gets recycled, meaning once the 5 elements in the vector have been used, the addition starts back from the beginning again. If we want to be explicit:

- Element 1 of
`vector_1to10`

gets added to element 1 of`vector_1to5`

- Element 2 of
`vector_1to10`

gets added to element 2 of`vector_1to5`

- Element 3 of
`vector_1to10`

gets added to element 3 of`vector_1to5`

- Element 4 of
`vector_1to10`

gets added to element 4 of`vector_1to5`

- Element 5 of
`vector_1to10`

gets added to element 5 of`vector_1to5`

. - Element 6 of
`vector_1to10`

gets added to**element 1**of`vector_1to5`

and so on…

⚠️ **This happens without R explicitly telling you this is happening. We do not get a warning.**

`vector_1to10`

and `vector_1to3`

?
```
# a vector reminder
vector_1to10
```

` [1] 1 2 3 4 5 6 7 8 9 10`

` vector_1to3`

`[1] 1 2 3`

```
# adding
+ vector_1to3 vector_1to10
```

```
Warning in vector_1to10 + vector_1to3: longer object length is not a multiple
of shorter object length
```

` [1] 2 4 6 5 7 9 8 10 12 11`

⚠️ **We see that vector_1to3 gets recycled, but now we have a warning. We get this because the vector lengths are not divisible by each other to produce an integer. R still will perform our operation, but it tells us hey this might not be exactly what you want.**

`vector_1to10`

by `vector_0to4`

?
`/ vector_0to4 vector_1to10 `

```
[1] Inf 2.000000 1.500000 1.333333 1.250000 Inf 7.000000 4.000000
[9] 3.000000 2.500000
```

We see that `vector_0to4`

gets recycled, but now we have a new output we haven’t seen before, `Inf`

which stands for infinity/infinite.

If we are then to try and sum this vector:

`sum(vector_1to10 / vector_0to4)`

`[1] Inf`

We get an answer of infinite. This is because we have at least one infinite value, so including it as part of a sum makes that sum infinite too.

⚠️ **Think about how this might work if you have NAs in your vector**

## 4 Filtering vectors

We can also write code that allows us to filter our vector based on certain conditions. What if we wanted to know which observations had data where the temperature was above freezing? We could use a mathematical operator (in this case, `>`

) to find instances when the statement we provide is true.

`> 0 temps_C `

`[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE`

When we run this code, we get a series of values corresponding to each element of the vector, and whether or not the statement we have prompted with (`temps_C > 0`

) is true or false.

If we wanted to see how many observations meet these criteria, we can take advantage of the fact that `TRUE`

is coded as 1, and `FALSE`

and 0.

`sum(temps_C > 0)`

`[1] 3`

We know then that there are 3 observations where the temperature is above freezing.

If we want to know **which** observations are above freezing, we can use the function `which()`

which shows us the indices of the elements in our vector for which our code evaluates to `TRUE`

.

`which(temps_C > 0)`

`[1] 5 6 7`

This tells us that the 5th, 6th, and 7th elements in our data have temperatures higher than freezing. In our vector we don’t have the specific days associated with each temperature embedded. We can assign meta-data to our values by using the function `names()`

for our vector. I’ll show you how to do that below.

```
# create a new vector that has the dates we want to assign
<- c("1/19", "1/20", "1/21", "1/22", "1/23", "1/24", "1/25")
dates
# we can assign these dates as the "names" of our vector temps_C
names(temps_C) <- dates
# now our temps have dates associated with them
temps_C
```

```
1/19 1/20 1/21 1/22 1/23 1/24 1/25
-3.3333333 -6.1111111 -4.4444444 0.0000000 0.5555556 5.0000000 10.5555556
```

If you knew you wanted to do this at the outset, you could create the vector along with its names all at once using the handy function `structure()`

.

```
<- structure(temps_F,
(temps_F_names names = dates))
```

```
1/19 1/20 1/21 1/22 1/23 1/24 1/25
26 21 24 32 33 41 51
```

You can also set other attributes using the function

`structure()`

.

You might use the sometimes, but more likely you would want to work using a data frame.

## 5 Practice

`help()`

or put a `?`

before a function, you can learn more about its arguments and use. You might want to look back to last week’s Code Club if you don’t remember the simplified function for doing this.
```
<- seq(from = 2, by = 11, length.out = 10)
a_vector
a_vector
```

` [1] 2 13 24 35 46 57 68 79 90 101`

A zero remainder means the number is even.

```
# use modulo operator %% to see if a number is even
# if the remainer is 0, it is even
%% 2 a_vector
```

` [1] 0 1 0 1 0 1 0 1 0 1`

```
# how many are even?
10 - sum(a_vector %% 2)
```

`[1] 5`

## 6 What is a data frame?

Below I am creating a simple data frame, this is a rectangular data object with rows and columns. In reality, you will probably be reading your data in from a `.csv`

or `.xlsx`

file, but you can also construct a data frame directly in R. We are doing this with the function `data.frame()`

which is a part of base R.

```
# create a simple 3 by 4 data frame called small_df
# columns are a, b, and c
# values are numbers from 1 to 4, 5 to 8, and 9 to 12
<- data.frame(a = 1:4,
small_df b = 5:8,
c = 9:12)
```

Let’s look at `small_df`

.

` small_df`

```
a b c
1 1 5 9
2 2 6 10
3 3 7 11
4 4 8 12
```

We can think about a data frame as being composed of a series of vectors (as columns) squished together. In the case above, each column is of the same type, integer or numeric but this doesn’t have to be true. Note the first column of 1:4 is actually the row index - you can tell this as it doesn’t have a column name.

Vectors all must be of the same object type (i.e., character, numeric, logical). In a data frame, each column can be a different object type. This is very useful especially when you have to have meta-data combined with numeric data.

We call use some new functions to look more at our data frame.

`View()`

will open our data frame like you might be used to looking at a spreadsheet in Excel.

`View(small_df)`

`str()`

gives us the structure of the object x.

`str(small_df)`

```
'data.frame': 4 obs. of 3 variables:
$ a: int 1 2 3 4
$ b: int 5 6 7 8
$ c: int 9 10 11 12
```

`dim()`

gives us the dimensions of x.

`dim(small_df)`

`[1] 4 3`

Note, that if you use `dim()`

on a vector (which has only one dimension), you will get something unexpected.

```
# this doesn't work
dim(temps_C)
```

`NULL`

For a vector, instead you should use the function `length()`

to figure out how many elements are in the vector.

```
# this does work
length(temps_C)
```

`[1] 7`

## 7 Performing operations on a data frame

Now that we have our `small_df`

we can try performing some operations on it.

What if you wanted to take the log base 2 for every value in `small_df`

? Just like we did for a vector, we can simply use the function `log2()`

on `small_df`

. Here I am assigning the output of this to a new data frame called `small_df_log2`

.

`<- log2(small_df)) (small_df_log2 `

```
a b c
1 0.000000 2.321928 3.169925
2 1.000000 2.584963 3.321928
3 1.584963 2.807355 3.459432
4 2.000000 3.000000 3.584963
```

Remember, if you assign something to an object, but also want it to print at the same time, just surround the whole thing in parentheses.

What if I wanted to take the log base 2 of only column `a`

? There are a few ways I can do that.

### 7.1 By indexing

I can use the square brackets `[]`

to access certain parts of my data frame. Remember for data frames, the syntax is [rows, columns]. Let’s try some example where we access by using indices (i.e., the position of the row, column, or the combination) to grab different parts of our data frame.

```
# first row, second column
1,2] small_df[
```

`[1] 5`

```
# all rows, second column
2] small_df[,
```

`[1] 5 6 7 8`

```
# third row, all columns
3,] small_df[
```

```
a b c
3 3 7 11
```

Is there any difference as to the types of objects we get when doing these different types of selection? We can use the function `str()`

to see what type of R object we have in the different scenarios.

```
# first row, second column
str(small_df[1,2])
```

` int 5`

```
# all rows, second column
str(small_df[,2])
```

` int [1:4] 5 6 7 8`

```
# third row, all columns
str(small_df[3,])
```

```
'data.frame': 1 obs. of 3 variables:
$ a: int 3
$ b: int 7
$ c: int 11
```

Back to the original idea - how would we take the data only from column `a`

and log base 2 transform it, but leave the rest of `small_df`

as it is?

```
# first save our df with a new name
# this is like doing a "save as" for an existing df
<- small_df
small_df_log2_a
# log2 transform the first column, and then assign that back to the first column
1] <- log2(small_df[,1])
small_df_log2_a[,
# check to see if it worked
small_df_log2_a
```

```
a b c
1 0.000000 5 9
2 1.000000 6 10
3 1.584963 7 11
4 2.000000 8 12
```

Note that column `a`

now is the object type “dbl” or numeric, instead of “int” or integer, since our log2 transformation has made our values no longer integers.

### 7.2 By name

We can also index by providing column names - this is a nice approach since when going back to read code, a column name is often more informative than a column index/number. Note that in this case the column name `a`

is in quotes since it is a character string.

```
# all rows, the column "a"
"a"] small_df[,
```

`[1] 1 2 3 4`

We can also use the dollar sign accessor `$`

to extract an element in a data frame or list. In the case of a data frame, you are accessing a column (though you could be accessing something else if a different type of object). Note that when you type the name of a data frame, and then the $, R will prompt you with some possible column names.

`$a small_df`

`[1] 1 2 3 4`

By using this syntax, you are turning your data frame into a vector. This may or may not be what you want to do, so just be aware.

`tidyverse`

functions `select()`

and `filter()`

. If you don’t know what I’m talking about simply ignore this, we will go over it in a future Code Club.
## 8 Practice

```
# i'm just using different ways to creating strings of numbers
<- data.frame(col_1 = c(2, 3, 4),
df col_2 = seq(from = 5, to = 10, by = 2.5),
col_3 = 4:6,
col_4 = rnorm(3, mean = 10, sd = 2),
col_5 = rep(x = 6, times = 3))
```

` df`

```
col_1 col_2 col_3 col_4 col_5
1 2 5.0 4 8.706182 6
2 3 7.5 5 13.453908 6
3 4 10.0 6 7.366212 6
```

Here are some different ways to do this.

`$col_1 * df$col_2 df`

`[1] 10.0 22.5 40.0`

`1] * df[,2] df[,`

`[1] 10.0 22.5 40.0`

`"col_1"] * df[,"col_2"] df[,`

`[1] 10.0 22.5 40.0`

```
<- df$col_3 + df$col_4
col_6
<- data.frame(df,
new_df col_6)
```

` new_df`

```
col_1 col_2 col_3 col_4 col_5 col_6
1 2 5.0 4 8.706182 6 12.70618
2 3 7.5 5 13.453908 6 18.45391
3 4 10.0 6 7.366212 6 13.36621
```

```
<- c(1, 2, 3, 4, NA)
test_NA_1
class(test_NA_1)
```

`[1] "numeric"`

```
<- c(1, 2, 3, 4, "NA")
test_NA_2
class(test_NA_2)
```

`[1] "character"`