Introduction to ggplot2

More about aesthetics.

ggplot2, dataviz, plotting, tidyverse

Author: Horacio Lopez-Nicora

Published: January 21, 2025


Artwork by Allison Horst.


1 Introduction

Recap of the past session

Last week’s session marked the beginning of our data visualization journey with ggplot2. We explored the philosophy of coding graphics, created a versatile ggplot template for various charts, and discovered how to add visual elements using aesthetics and layers. Exciting times ahead!

Session Goals

  • Pick up where we left off with aesthetics and learn more about them, and introduce the theme() function.
  • Learn the basics of themes, scales, color, transparency, size, and more!


2 Our data set

Illustration by Allison Horst

We are going to continue using our 🐧 data set from the package palmerpenguins. If you haven’t done so, please install that package first:

install.packages("palmerpenguins")

palmerpenguins is a package developed by Allison Horst, Alison Hill and Kristen Gorman, including a data set collected by Dr. Kristen Gorman at the Palmer Station Antarctica, as part of the Long Term Ecological Research Network. It is a nice, relatively simple data set to practice data exploration and visualization in R.

We’ll now load the package, along with the tidyverse (which includes ggplot2):

library(palmerpenguins)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Once you’ve loaded that package you will have a data frame called penguins at your disposal — let’s take a look:

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
# Or glimpse() for a sort of transposed view, so we can see all columns:
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…


3 The Absolute Power of Aesthetics

Aesthetics enable us to showcase multiple dimensions of our dataset in a single plot by modifying elements such as color, shape, size, labels, and transparency.

3.1 Last week’s example on “The power of aesthetics”

Last week we added a third aesthetic, color, to our graph. Our plot already mapped bill_length_mm to the x aesthetic and bill_depth_mm to the y aesthetic; we then mapped species to the color aesthetic:

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species))
p

Note that ggplot() creates a plot object from our data set, to which we add a geom_point() layer with the + operator. We assign the resulting plot to the variable p, so from this point forward we can add further layers and components to p with +.
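As a small illustration of adding to a stored plot, the sketch below rebuilds p and adds axis and legend titles with labs(). The label text and the name p_labeled are our own choices for illustration, not something from last week's session.

```r
library(ggplot2)
library(palmerpenguins)

# Rebuild the plot object from above
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species))

# Add a component to the stored plot with `+`;
# the label text is an illustrative choice
p_labeled <- p +
  labs(x = "Bill length (mm)",
       y = "Bill depth (mm)",
       color = "Species")
p_labeled
```

The original p is left unchanged, so we can keep building different variants from it.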

3.2 What if we want to customize our plot?

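We can do this with themes, which control all non-data display elements of a plot, such as fonts, backgrounds, grid lines, and legend placement. ggplot2 ships with several complete themes, such as theme_bw() and theme_minimal(). Use theme() if you just need to tweak individual elements of an existing theme.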

For this session, let’s utilize theme_bw().

p <- ggplot(data = penguins) + theme_bw() +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species))
p
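To show what tweaking an existing theme can look like, here is a minimal sketch that starts from theme_bw() and adjusts a few elements with theme(). The particular settings (legend at the bottom, no minor grid lines, a title size of 12) and the name p_tweaked are illustrative assumptions.

```r
library(ggplot2)
library(palmerpenguins)

# Same plot as above, using the complete theme theme_bw()
p <- ggplot(data = penguins) + theme_bw() +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species))

# Tweak individual elements of that theme with theme()
p_tweaked <- p +
  theme(legend.position = "bottom",            # move the legend below the plot
        panel.grid.minor = element_blank(),    # drop the minor grid lines
        axis.title = element_text(size = 12))  # illustrative title size
p_tweaked
```

Because theme() comes after theme_bw(), its settings override only the elements we name and leave the rest of the theme intact.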

3.3 What if we want to change colors?

We can manually change colors.

# Manual color change
# By using scale_color_manual(),
# we can specify the exact colors we want to use
p +
  scale_color_manual(
    # Note that the color order will correspond to
    # the order of the species given in the legend
    values = c("grey55", "orange", "skyblue"))

Run the colors() function to list the names of all the built-in R colors you can choose from.
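The sketch below shows one way to browse colors() and, as a variation on the code above, uses a named vector so that each color is tied to a species by name rather than by legend order. The names species_colors and p_manual are ours.

```r
library(ggplot2)
library(palmerpenguins)

# colors() returns the names of R's built-in colors
head(colors())

# Search for color names containing "blue"
head(grep("blue", colors(), value = TRUE))

# A named vector ties each color to a species explicitly,
# so the mapping does not depend on the legend order
species_colors <- c(Adelie = "grey55",
                    Chinstrap = "orange",
                    Gentoo = "skyblue")

p_manual <- ggplot(data = penguins) + theme_bw() +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species)) +
  scale_color_manual(values = species_colors)
p_manual
```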

3.4 Using a different color palette: RColorBrewer

There is a wide variety of R color packages specifically designed to offer a range of color palette options, each evoking a distinct mood. For instance, the RColorBrewer package provides a choice of 35 palettes!

At this point, you have become an expert in the fundamentals of R. Installing packages and loading them with library() is now second nature to you.

Now, we can install RColorBrewer and choose one from the many palettes it offers.

RColorBrewer::display.brewer.all(type = "qual")

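Above, you can see RColorBrewer's qualitative palettes, because we requested type = "qual". The package organizes its palettes into three groups: sequential, diverging, and qualitative. Calling display.brewer.all() without the type argument shows all of them. Different palette types suit different kinds of data; qualitative palettes work well for categorical variables such as species.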
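To browse the palettes programmatically rather than visually, here is a minimal sketch using RColorBrewer's brewer.pal.info table and brewer.pal() function. The variable names qual_info and set1_colors are ours.

```r
# install.packages("RColorBrewer")  # run once if not yet installed
library(RColorBrewer)

# brewer.pal.info lists every palette with its category
# ("seq", "div", or "qual") and a colorblind-friendliness flag
table(brewer.pal.info$category)

# Names of the qualitative palettes flagged as colorblind-friendly
qual_info <- brewer.pal.info[brewer.pal.info$category == "qual", ]
rownames(qual_info)[qual_info$colorblind]

# Extract three hex colors from the Set1 palette
set1_colors <- brewer.pal(n = 3, name = "Set1")
set1_colors
```

The hex codes returned by brewer.pal() can be passed to scale_color_manual() when we want only part of a palette.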

Let’s use these palettes with our original penguin graph. Here we color the data points by species with the Set1 palette, using the function scale_color_brewer() and its palette argument.

p +
  scale_color_brewer(palette = "Set1")

In addition, the ggplot2 package offers other color scale functions. Two examples are the viridis scales, such as scale_color_viridis_d(), and scale_color_grey(). The viridis palettes are designed to stay readable when converted to grayscale and for readers with color blindness, while scale_color_grey() uses a grayscale palette directly.

p +
  scale_color_viridis_d()

Please note that when using a viridis scale in ggplot2, we need to pick the variant that matches our variable: scale_color_viridis_c() for a continuous variable or scale_color_viridis_d() for a discrete one. In this case, the variable species is discrete.
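For contrast, here is a sketch of the continuous variant. Mapping body_mass_g to color is our own choice for illustration; any continuous column would do.

```r
library(ggplot2)
library(palmerpenguins)

# body_mass_g is continuous, so we use the _c() variant
p_continuous <- ggplot(data = penguins) + theme_bw() +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = body_mass_g)) +
  scale_color_viridis_c()
p_continuous
```

The legend becomes a color bar instead of a set of discrete keys, which reflects the continuous nature of the variable.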


Exercise 1

  • Let us revisit the scatter plot of bill length against bill depth, with the data points colored by species.

  • What if we want to use only a grayscale palette for publication purposes?

Hints

We can use the scale_color_grey() function to color our grouped data points.

Solutions
p +
  scale_color_grey()
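Shades of grey can be hard to tell apart in print, so one possible refinement, sketched here as an assumption rather than as part of the exercise, is to widen the grey range with the start and end arguments and to map species to shape as well.

```r
library(ggplot2)
library(palmerpenguins)

p_grey <- ggplot(data = penguins) + theme_bw() +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species,
                           shape = species)) +  # shape repeats the species information
  # start and end set the darkest and lightest grey (0 = black, 1 = white);
  # the values chosen here are illustrative
  scale_color_grey(start = 0, end = 0.7)
p_grey
```

Because color and shape are mapped to the same variable, ggplot2 combines them into a single legend.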


3.5 Colorblind-friendly palettes

Have you ever contemplated how your figure might appear when viewed by individuals with different types of color blindness? We can utilize the colorBlindness package to explore this aspect.

Let’s install the colorBlindness package and load it.

# Let's install the colorBlindness package
install.packages("colorBlindness")
library(colorBlindness)

To begin with, let’s pass our current plot to the cvdPlot() function. This will demonstrate how the plot appears to individuals with different types of color blindness.

colorBlindness::cvdPlot(p)

As we can observe, our current color palette is not accessible. By using viridis palettes, which are designed to remain distinguishable under common forms of color blindness, we can help our plots convey the same information regardless of the audience.

Let’s use the same viridis palette we used above to make our plot more accessible.

p_viridis <- p +
  scale_color_viridis_d()

Were we successful? Let’s use cvdPlot() to check again.

colorBlindness::cvdPlot(p_viridis)
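Viridis is not the only option. As an alternative that this session does not cover, the sketch below uses the Okabe-Ito palette that ships with base R (R 4.0.0 or later) through palette.colors(). The choice of which three colors to take, and the names okabe_ito and p_okabe, are our own assumptions.

```r
library(ggplot2)
library(palmerpenguins)

# Okabe-Ito colors from base R (requires R >= 4.0.0).
# We skip the first color (black) and drop any names so that
# scale_color_manual() assigns the colors by position
okabe_ito <- unname(palette.colors(n = 4, palette = "Okabe-Ito"))[2:4]
okabe_ito

p_okabe <- ggplot(data = penguins) + theme_bw() +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species)) +
  scale_color_manual(values = okabe_ito)
p_okabe

# To check the result as in the text (needs the colorBlindness package):
# colorBlindness::cvdPlot(p_okabe)
```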

3.6 Providing transparency with alpha

How can we make the data points in our graph transparent? One way is to set the alpha argument of geom_point() to a constant value between 0 (fully transparent) and 1 (fully opaque).

ggplot(data = penguins) +
  geom_point(alpha = 0.5, mapping = aes(x = bill_length_mm, 
                                        y = bill_depth_mm, 
                                        color = species))

We can also map alpha to a variable inside aes(), so that transparency itself carries information. Below, alpha is mapped to flipper_length_mm, which lets us display four variables within a single graph.

ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species,
                           alpha = flipper_length_mm))
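When alpha is mapped to a variable, the points with the lowest values can become very faint. A minimal sketch of one way to control this, assuming a range of 0.2 to 1 is acceptable, uses the range argument of scale_alpha_continuous():

```r
library(ggplot2)
library(palmerpenguins)

p_alpha <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species,
                           alpha = flipper_length_mm)) +
  # range sets the minimum and maximum transparency used;
  # the values here are illustrative
  scale_alpha_continuous(range = c(0.2, 1))
p_alpha
```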

Exercise 2

  • We like the graph we produced above. We want, however, to also display body_mass_g.

  • How can we add this additional variable to our graph?

Hints

We can use the size aesthetic within aes().

Solutions
ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           color = species,
                           alpha = flipper_length_mm,
                           size = body_mass_g))

