Vectors and data types

Authors
Affiliation

Software Carpentry

Jelmer Poelstra

Published

February 13, 2026



1 Introduction

1.1 What we’ll cover

In this session, you will learn about some of the ways that R stores data. Specifically, we will cover vectors and data types.

  • Vectors are R’s simplest “data structure”, that is, a type of object that R can store data in. (In the next session, you’ll get introduced to another data structure: the data frame.)

  • Data types are how R distinguishes between different kinds of data like numbers and text. We’ll talk about the 4 main data types: character, integer, double, and logical.

1.2 Setting up

To make it easier to keep track of what we do, we’ll write our code in a script:

  1. Open a new R script: click the + symbol in the top toolbar, then click R Script (see footnote 1).

  2. Save the script straight away as data-structures.R. You can save it anywhere you like, but ideally in a folder that is specifically for this workshop.

  3. If you want section headers as comments in your script, like in the script I am showing you in the live session, then copy-and-paste the following into your script:

Section headers for your script:
# 2 - Vectors ------------------------------------------------------------------
# 2.1 - Single-element vectors (and quoting)

# Challenge 1 - Quoting

# 2.2 - Multi-element vectors

# 2.3 - Vectorization

# Challenge 2

# 2.4 - Exploring vectors

# 2.5 - Extracting elements from vectors

# Challenge 3

# 3 - Data types ---------------------------------------------------------------
# 3.1 - R's main data types

# 3.2 - A vector can only contain one data type

# Challenge 4

# 3.3 - Manual type conversion

2 Vectors

A vector in R is essentially a collection of one or more items. Moving forward, we’ll call these items “elements”.

2.1 Single-element vectors (and quoting)

Vectors can consist of just a single element, so each of the two lines of code below creates a vector:

vector1 <- 8
vector2 <- "panda"

In the "panda" example, which is a character string (string for short):

  • "panda" constitutes one element, not 5 (its number of letters).
  • Unlike when dealing with numbers, we have to quote the string (see footnote 2).

Character strings need to be quoted because they are otherwise interpreted as R objects – for example, because vector1 and vector2 are objects, we refer to them without quotes:

# [Note that R will show auto-complete options after you type 3 characters]
vector1
[1] 8
vector2
[1] "panda"

Meanwhile, the code below doesn’t work because there is no object called panda:

vector_fail <- panda
Error: object 'panda' not found

Challenge 1: Quoting

Which of the following (one or more options) will produce an error?

  1. number <- "42"
  2. color <- blue
  3. animal <- "djfhjkhkjkhCGT"
  4. number <- 10^6

2.2 Multi-element vectors

A common way to make vectors with multiple elements is by using the c() (combine) function:

num_vec <- c(2, 6, 3, 41)
num_vec
[1]  2  6  3 41

The c() function can also append elements to an existing vector:

# First we create a vector:
birds <- c("cardinal", "chickadee")
birds
[1] "cardinal"  "chickadee"
# Then we append another element to it:
c(birds, "bald eagle")
[1] "cardinal"   "chickadee"  "bald eagle"

Unlike in the first few vector examples, we didn’t assign the result above to an object. The combined vector is still created, but it is only printed to the console, and birds itself is left unchanged.
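To keep the longer vector around, the result of c() has to be assigned to an object. Here is a minimal sketch of the difference; the object name more_birds is just an illustration and not part of the lesson script:

```r
birds <- c("cardinal", "chickadee")

# Without assignment, the combined vector is only printed;
# birds still has two elements afterwards:
c(birds, "bald eagle")
length(birds)
# [1] 2

# With assignment, the combined vector is stored in a new object:
more_birds <- c(birds, "bald eagle")
length(more_birds)
# [1] 3
```

You could also assign the result back to birds itself, which would overwrite the original vector.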

Series of numbers are often useful, and a handy shortcut for making a series of whole numbers (integers) is the : operator:

series_vec <- 1:10
series_vec
 [1]  1  2  3  4  5  6  7  8  9 10
# Another example:
15:20
[1] 15 16 17 18 19 20

2.3 Vectorization

Consider the output of this command:

series_vec * 2
 [1]  2  4  6  8 10 12 14 16 18 20

Above, every individual element in series_vec was multiplied by 2. We call this behavior “vectorization” and this is a key feature of the R language!
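Vectorization also works between two vectors of the same length, in which case R pairs up the elements by position. The sketch below uses made-up measurements (the names lengths_cm, widths_cm and areas are hypothetical) to show this:

```r
# Hypothetical example values, not from the lesson:
lengths_cm <- c(2, 4, 6)
widths_cm <- c(10, 20, 30)

# The 1st elements are multiplied together, then the 2nd, then the 3rd:
areas <- lengths_cm * widths_cm
areas
# [1]  20  80 180
```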


Challenge 2: Vectorization

  1. Make a vector x with the (whole) numbers 1 through 26
  2. Subtract 0.5 from each element in x and save the result in vector y
  3. Check your results by printing both vectors
Solution:
x <- 1:26
y <- x - 0.5
x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
y
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5
[16] 15.5 16.5 17.5 18.5 19.5 20.5 21.5 22.5 23.5 24.5 25.5

2.4 Exploring vectors

R has many functions that provide information about vectors and other types of objects, such as:

  • Get the number of elements with length():

    length(series_vec)
    [1] 10
  • See the first and last few elements, respectively, with head() and tail():

    # Print the first 6 elements:
    head(series_vec)
    [1] 1 2 3 4 5 6
    # Print the last 6 elements:
    tail(series_vec)
    [1]  5  6  7  8  9 10
    # Both head and tail have argument `n` to specify the number of elements:
    tail(series_vec, n = 2)
    [1]  9 10
  • Get arithmetic summaries like mean() for vectors with numbers:

    # mean() will compute the mean (average) across all elements
    mean(series_vec)
    [1] 5.5
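A few other base R summary functions work the same way as mean(). As a quick sketch, using the same series_vec as above:

```r
series_vec <- 1:10

# Add up all elements:
sum(series_vec)
# [1] 55

# Get the smallest and largest element:
min(series_vec)
# [1] 1
max(series_vec)
# [1] 10
```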

2.5 Extracting elements from vectors

Extracting elements from objects like vectors is often called indexing. In R, we can do this using “bracket notation” with square brackets [ ], for example:

  • Get the second element with [2] (see footnote 3):

    num_vec[2]
    [1] 6
  • Get the second through the fourth elements with [2:4]:

    num_vec[2:4]
    [1]  6  3 41
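It may help to realize that 2:4 is itself a vector, so what goes between the brackets is simply a vector of positions. A small sketch of that idea (the object name positions is only for illustration):

```r
num_vec <- c(2, 6, 3, 41)

# The positions can be stored in their own vector first:
positions <- 2:4
num_vec[positions]
# [1]  6  3 41

# Writing the positions out with c() gives the same result:
identical(num_vec[2:4], num_vec[c(2, 3, 4)])
# [1] TRUE
```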

Challenge 3: Indexing


  1. Given the fruits vector below, what does fruits[c(1, 3)] return?

    fruits <- c("apple", "banana", "cherry", "date")
  2. What do you expect birds[3] will return?

    1. "bald eagle"
    2. bald eagle
    3. An error
    4. Something else
  3. What do you expect mean(birds) will return?

    1. "cardidee"
    2. "chickanal"
    3. The original vector birds
    4. Nothing meaningful, maybe an error

3 Data types

3.1 R’s main data types

You saw in the exercise above that R can’t compute the mean of a vector with bird names. R distinguishes between different kinds of data, such as character strings and numbers, using several pre-defined “data types”. And R’s behavior in various operations depends heavily on the data type.

As another example, this fails:

"valerion" * 5
Error in "valerion" * 5: non-numeric argument to binary operator

We can ask what the data type of something is using the typeof() function:

typeof("valerion")
[1] "character"

R automatically set the data type of "valerion" to character, i.e. a (character) string. The earlier command failed because R can’t perform mathematical operations like multiplication (a “binary operator”) on vectors of type character (a “non-numeric argument”).

The character data type most commonly contains letters, but anything that is placed between quotes ("...") will be interpreted as this data type — even plain numbers:

typeof("5")
[1] "character"

Besides character, three other common data types are:

  • double (sometimes referred to as numeric) — numbers that can have decimal points:

    typeof(3.14)
    [1] "double"
  • integer — whole numbers only:

    typeof(1:3)
    [1] "integer"
  • logical (either TRUE or FALSE, unquoted):

    typeof(TRUE)
    [1] "logical"
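One detail that can be surprising: a whole number typed on its own is stored as a double, not an integer. The sketch below shows this, along with R's L suffix for explicitly requesting an integer:

```r
# A plain whole number is a double by default:
typeof(5)
# [1] "double"

# Adding the L suffix makes it an integer:
typeof(5L)
# [1] "integer"

# Comparisons return a logical:
typeof(5 > 3)
# [1] "logical"
```

In everyday use the difference between double and integer rarely matters, since R handles arithmetic with both.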

3.2 A vector can only contain one data type

A vector can only be composed of a single data type. As we saw above, R silently picks the “best-fitting” data type when you create a vector.

Challenge 4: Data types

In each line below, what do you think the data type (if any) will be? Try it out and see if you were right.

typeof("TRUE")
typeof(banana)
typeof(c(2, 6, "3"))
Solutions:
  1. "TRUE" is character (and not logical) because of the quotes around it:

    typeof("TRUE")
    [1] "character"

  2. Recall the earlier example: this returns an error because the object banana does not exist. Any unquoted string (that is not a special keyword like TRUE and FALSE) is interpreted as a reference to an object in R.

    typeof(banana)
    Error: object 'banana' not found

  3. This produces a character vector, and we’ll talk about why in the next section:

    typeof(c(2, 6, "3"))
    [1] "character"

R’s behavior of returning a character vector for c(2, 6, "3") in the challenge above is called type coercion. Because a vector can consist of only a single data type, R forces all elements to be of the same type.

Type coercion can be the source of many surprises, and is one reason you need to be aware of these data types and R’s behavior around them.
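When types are mixed, R coerces toward the more general type, in the order logical, integer, double, character. The following sketch illustrates that order with a few small vectors:

```r
# logical + integer gives integer (TRUE becomes 1):
typeof(c(TRUE, 2L))
# [1] "integer"

# integer + double gives double:
typeof(c(2L, 3.5))
# [1] "double"

# Anything combined with a character string gives character:
typeof(c(3.5, "a"))
# [1] "character"
```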

3.3 Manual type conversion

Luckily, you are not just at the mercy of whatever R decides to do automatically, but can convert vectors yourself using the as. family of functions, such as as.integer() and as.character():

Try to use RStudio’s auto-complete functionality here: type “as.” and then press the Tab key.
as.integer(c("0", "2"))
[1] 0 2
as.character(c(0, 2))
[1] "0" "2"

As you may have guessed, though, not all type conversions are possible — for example:

as.double("kiwi")
Warning: NAs introduced by coercion
[1] NA

(NA is R’s way of denoting missing data – see the bonus section “Missing values (NA)” below for more.)
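A typical reason to convert manually is that numbers stored as character strings can't be used in calculations. A minimal sketch, where chr_num is a hypothetical object name:

```r
# A number that was stored as a character string:
chr_num <- "5"

# chr_num * 2 would fail with "non-numeric argument to binary operator",
# but after conversion the calculation works:
as.double(chr_num) * 2
# [1] 10

# Note that converting a double to an integer drops the decimal part
# rather than rounding:
as.integer(3.9)
# [1] 3
```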

That’s it for this session! After lunch, we’ll learn to manipulate and summarize data frames using a real dataset with country-level statistics, such as population size, recorded at regular intervals.



4 Bonus material for self-study

4.1 Changing vector elements using indexing

Above, we saw how you can extract elements of a vector using indexing. To change elements in a vector, use the same bracket notation on the left-hand side of the assignment arrow (<-) – for example:

  • Change the first element to 30:

    num_vec[1] <- 30
    num_vec
    [1] 30  6  3 41
  • Change the last element to 0:

    num_vec[length(num_vec)] <- 0
    num_vec
    [1] 30  6  3  0
  • Change the second element to the mean value of the vector:

    num_vec[2] <- mean(num_vec)
    num_vec
    [1] 30.00  9.75  3.00  0.00
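The same approach can change several elements in one go, by putting a vector of positions between the brackets. A small sketch with a made-up vector (temps and its values are hypothetical):

```r
# Hypothetical example values:
temps <- c(12, 15, 11, 14)

# Replace the first and third elements at the same time:
temps[c(1, 3)] <- c(0, 0)
temps
# [1]  0 15  0 14
```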

4.2 The data frame data structure

One of R’s most powerful features is its built-in ability to deal with tabular data – i.e., data with rows and columns like you are familiar with from Excel spreadsheets and so on. In R, tabular data is stored in a data structure called a “data frame”.

Let’s start by using the data.frame() function to make a data frame with information about 3 cats:

cats <- data.frame(
  name = c("Luna", "Thomas", "Daisy"),
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2)
  )
cats
    name   coat weight
1   Luna calico    2.1
2 Thomas  black    5.0
3  Daisy  tabby    3.2

Above:

  • We created 3 vectors and pasted them side-by-side to create a data frame in which each vector constitutes a column.
  • We gave each vector a name (e.g., coat), and those names became the column names.
  • The resulting data frame has 3 rows (one for each cat) and 3 columns (each with a type of info about the cats, like coat color).

It is good practice to organize tabular data in the so-called “tidy” data format like above, where:

  • Each column contains a different “variable” (e.g. coat color, weight)
  • Each row contains a different “observation” (data on e.g. one cat/person/sample)
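Just as length() tells you about a vector, a few base R functions report the dimensions and column names of a data frame. A quick sketch, re-creating the cats data frame from above so the code runs on its own:

```r
cats <- data.frame(
  name = c("Luna", "Thomas", "Daisy"),
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2)
)

# Number of rows (observations) and columns (variables):
nrow(cats)
# [1] 3
ncol(cats)
# [1] 3

# The column names:
names(cats)
# [1] "name"   "coat"   "weight"
```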

4.3 Extracting columns from a data frame

You can extract individual columns from a data frame using the $ operator:

cats$weight
[1] 2.1 5.0 3.2
cats$coat
[1] "calico" "black"  "tabby" 

This kind of operation returns a vector, which can be indexed as well:

cats$weight[2]
[1] 5
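Because an extracted column is an ordinary vector, the vector functions and vectorized operations from earlier in this session work on it too. A brief sketch, again re-creating cats so it is self-contained:

```r
cats <- data.frame(
  name = c("Luna", "Thomas", "Daisy"),
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2)
)

# Summarize a column:
max(cats$weight)
# [1] 5

# A vectorized comparison checks every element of the column:
cats$weight > 3
# [1] FALSE  TRUE  TRUE
```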

4.4 More on the logical data type

Add a column to your cats data frame that indicates whether each cat does or does not like string:

cats$likes_string <- c(1, 0, 1)
cats
    name   coat weight likes_string
1   Luna calico    2.1            1
2 Thomas  black    5.0            0
3  Daisy  tabby    3.2            1

So, likes_string is numeric, but the 1s and 0s actually represent TRUE and FALSE.

You could instead use the logical data type here, by converting this column with the as.logical() function. That will turn 0s into FALSE and every other number, including 1, into TRUE:

as.logical(cats$likes_string)
[1]  TRUE FALSE  TRUE

To actually modify this column in the data frame itself:

cats$likes_string <- as.logical(cats$likes_string)
cats
    name   coat weight likes_string
1   Luna calico    2.1         TRUE
2 Thomas  black    5.0        FALSE
3  Daisy  tabby    3.2         TRUE

You might think that 1/0 could be a handier coding than TRUE/FALSE because that enables easy counting of the number of times something is true or false. But consider the following R behavior:

TRUE + TRUE
[1] 2

So, logicals can be used as if they were numbers, where FALSE represents 0 and TRUE represents 1.
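This means that counting with logicals is straightforward after all. As a sketch, using a standalone logical vector with the same values as the likes_string column:

```r
likes_string <- c(TRUE, FALSE, TRUE)

# sum() counts the number of TRUE values:
sum(likes_string)
# [1] 2

# mean() gives the proportion of TRUE values:
mean(likes_string)
# [1] 0.6666667
```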

4.5 Missing values (NA)

R has a concept of missing data, which is important in statistical computing, as not all information/measurements are always available for each sample.

In R, missing values are coded as NA (like TRUE/FALSE, this is not a character string so it is not quoted):

# This vector will contain one missing value
vector_NA <- c(1, 3, NA, 7)
vector_NA
[1]  1  3 NA  7

Notably, many functions operating on vectors will return NA if any element in the vector is NA:

sum(vector_NA)
[1] NA

You can get around this by setting na.rm = TRUE in such functions, for example:

sum(vector_NA, na.rm = TRUE)
[1] 11
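To find out where the missing values are, or how many there are, you can use the is.na() function, which returns a logical vector. A minimal sketch with the same vector_NA:

```r
vector_NA <- c(1, 3, NA, 7)

# TRUE marks the positions with a missing value:
is.na(vector_NA)
# [1] FALSE FALSE  TRUE FALSE

# Because TRUE counts as 1, sum() gives the number of missing values:
sum(is.na(vector_NA))
# [1] 1

# na.rm = TRUE works for mean() as well:
mean(vector_NA, na.rm = TRUE)
# [1] 3.666667
```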

4.6 Factors

Categorical data, like treatments in an experiment, can be stored as “factors” in R. Factors are useful for statistical analyses and for plotting, e.g. because they allow you to specify a custom order.

diet_vec <- c("high", "medium", "low", "low", "medium")
diet_vec
[1] "high"   "medium" "low"    "low"    "medium"
factor(diet_vec)
[1] high   medium low    low    medium
Levels: high low medium

In the example above, we turned a character vector into a factor. Its “levels” are sorted alphabetically by default (high, low, medium), but we can manually specify an order that makes more sense:

diet_fct <- factor(diet_vec, levels = c("low", "medium", "high"))
diet_fct
[1] high   medium low    low    medium
Levels: low medium high

This ordering would be automatically respected in plots and statistical analyses.

For most intents and purposes, it makes sense to think of factors as another data type, even though technically, they are a kind of data structure built on the integer data type:

typeof(diet_fct)
[1] "integer"
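To see what “built on the integer data type” means in practice, you can look at the levels and at the underlying integer codes separately. A short sketch, re-creating diet_fct so the code runs on its own:

```r
diet_vec <- c("high", "medium", "low", "low", "medium")
diet_fct <- factor(diet_vec, levels = c("low", "medium", "high"))

# The levels, in the order we specified:
levels(diet_fct)
# [1] "low"    "medium" "high"

# Each element is stored as the position of its level (low = 1, high = 3):
as.integer(diet_fct)
# [1] 3 2 1 1 2
```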

4.7 Learn more

To learn more about data types and data structures, see this episode from a separate Carpentries lesson.


Bonus Challenge

An important part of every data analysis is cleaning input data. Here, you will clean a cat data set that has an added observation with a problematic data entry.

Start by creating the new data frame:

cats_v2 <- data.frame(
  name = c("Luna", "Thomas", "Daisy", "Oliver"),
  coat = c("calico", "black", "tabby", "tabby"),
  weight = c(2.1, 5.0, 3.2, "2.3 or 2.4")
)

Then move on to the tasks below, filling in the blanks (_____) and running the code:

# 1. Explore the data frame,
#    including with an overview that shows the columns' data types:
cats_v2
_____(cats_v2)

# 2. The "weight" column has the incorrect data type _____.
#    The correct data type is: _____.

# 3. Correct the 4th weight with the mean of the two given values,
#    then print the data frame to see the effect:
cats_v2$weight[4] <- 2.35
cats_v2

# 4. Convert the weight column to the right data type:
cats_v2$weight <- _____(cats_v2$weight)

# 5. Calculate the mean weight of the cats:
_____
Solution:
# 1. Explore the data frame,
#    including with an overview that shows the columns' data types:
cats_v2
    name   coat     weight
1   Luna calico        2.1
2 Thomas  black          5
3  Daisy  tabby        3.2
4 Oliver  tabby 2.3 or 2.4
str(cats_v2)
'data.frame':   4 obs. of  3 variables:
 $ name  : chr  "Luna" "Thomas" "Daisy" "Oliver"
 $ coat  : chr  "calico" "black" "tabby" "tabby"
 $ weight: chr  "2.1" "5" "3.2" "2.3 or 2.4"
# 2. The "weight" column has the incorrect data type CHARACTER.
#    The correct data type is: DOUBLE/NUMERIC.

# 3. Correct the 4th weight data point with the mean of the two given values,
#    then print the data frame to see the effect:
cats_v2$weight[4] <- 2.35
cats_v2
    name   coat weight
1   Luna calico    2.1
2 Thomas  black      5
3  Daisy  tabby    3.2
4 Oliver  tabby   2.35
# 4. Convert the weight column to the right data type:
cats_v2$weight <- as.double(cats_v2$weight)

# 5. Calculate the mean weight of the cats:
mean(cats_v2$weight)
[1] 3.1625

Footnotes

  1. Or click File => New File => R Script.

  2. Either double quotes ("...") or single quotes ('...') work, but the former are most commonly used by convention.

  3. R uses 1-based indexing, which means it starts counting at 1 like humans do. Index 2 therefore simply corresponds to the second element. Python and several other languages use 0-based indexing, which starts counting at 0 such that the second element corresponds to index 1.