Reproducibility 3: More about Quarto

reproducibility

quarto

Author

Jessica Cooperstone

Published

October 28, 2024

1 A recap

Last week we introduced Quarto, and using Quarto within RStudio. Next week, we will go over how to push our Quarto document to Github.

Open up the .Rproj you are using for Code Club. We can open up a new Quarto document by going to File > New File > Quarto Document.

Important

Does everyone have an .Rproj set up? We are going to need this for next week!

Let’s add some material to our document so we can better see what our resulting documents will look like. This will also give us an opportunity to practice some of what we’ve been going over in Code Club this semester.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Let’s go back to what we started with this semester. Let’s revisit data from The World Factbook, put together by the CIA to “provides basic intelligence on the history, people, government, economy, energy, geography, environment, communications, transportation, military, terrorism, and transnational issues for 265 world entities.” I thought this data would give us some opportunities to flex our R skills, and learn a bit about the world.

The data we are going to download can be found here, though I have saved the file, added it to our Code Club Github, and included some code below for you to download it. This is a little bit different than the data we started with which included only info from 2015. This dataset includes many more years.

download.file(
  url = "https://github.com/osu-codeclub/osu-codeclub.github.io/raw/refs/heads/main/posts/S08E01_wrangling_01/data/factbook.csv",
  destfile = "factbook_download.csv"
)

You should now see the file “factbook_download.csv” in your working directory.

We can read it in using the tidyverse function from the readr package called read_csv().

# i've stored mine in a folder called data for organizational sake
factbook <- read_csv("data/factbook_download.csv")

Rows: 11072 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (20): Series Name, Series Code, Country Name, Country Code, 2000 [YR2000...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s look at our data.

View(factbook)

Let’s pull just the data for total population.

factbook_pop <- factbook |> 
  filter(`Series Name` == "Population, total")

And then we can look at it:

head(factbook_pop)

Series Name	Series Code	Country Name	Country Code	2000 [YR2000]	2001 [YR2001]	2002 [YR2002]	2003 [YR2003]	2004 [YR2004]	2005 [YR2005]	2006 [YR2006]	2007 [YR2007]	2008 [YR2008]	2009 [YR2009]	2010 [YR2010]	2011 [YR2011]	2012 [YR2012]	2013 [YR2013]	2014 [YR2014]	2015 [YR2015]
Population, total	SP.POP.TOTL	Afghanistan	AFG	19542982	19688632	21000256	22645130	23553551	24411191	25442944	25903301	26427199	27385307	28189672	29249157	30466479	31541209	32716210	33753499
Population, total	SP.POP.TOTL	Albania	ALB	3089027	3060173	3051010	3039616	3026939	3011487	2992547	2970017	2947314	2927519	2913021	2905195	2900401	2895092	2889104	2880703
Population, total	SP.POP.TOTL	Algeria	DZA	30774621	31200985	31624696	32055883	32510186	32956690	33435080	33983827	34569592	35196037	35856344	36543541	37260563	38000626	38760168	39543154
Population, total	SP.POP.TOTL	American Samoa	ASM	58230	58324	58177	57941	57626	57254	56837	56383	55891	55366	54849	54310	53691	52995	52217	51368
Population, total	SP.POP.TOTL	Andorra	AND	66097	67820	70849	73907	76933	79826	80221	78168	76055	73852	71519	70567	71013	71367	71621	71746
Population, total	SP.POP.TOTL	Angola	AGO	16394062	16941587	17516139	18124342	18771125	19450959	20162340	20909684	21691522	22507674	23364185	24259111	25188292	26147002	27128337	28127721

glimpse(factbook_pop)

Rows: 217
Columns: 20
$ `Series Name`   <chr> "Population, total", "Population, total", "Population,…
$ `Series Code`   <chr> "SP.POP.TOTL", "SP.POP.TOTL", "SP.POP.TOTL", "SP.POP.T…
$ `Country Name`  <chr> "Afghanistan", "Albania", "Algeria", "American Samoa",…
$ `Country Code`  <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "ARG"…
$ `2000 [YR2000]` <chr> "19542982", "3089027", "30774621", "58230", "66097", "…
$ `2001 [YR2001]` <chr> "19688632", "3060173", "31200985", "58324", "67820", "…
$ `2002 [YR2002]` <chr> "21000256", "3051010", "31624696", "58177", "70849", "…
$ `2003 [YR2003]` <chr> "22645130", "3039616", "32055883", "57941", "73907", "…
$ `2004 [YR2004]` <chr> "23553551", "3026939", "32510186", "57626", "76933", "…
$ `2005 [YR2005]` <chr> "24411191", "3011487", "32956690", "57254", "79826", "…
$ `2006 [YR2006]` <chr> "25442944", "2992547", "33435080", "56837", "80221", "…
$ `2007 [YR2007]` <chr> "25903301", "2970017", "33983827", "56383", "78168", "…
$ `2008 [YR2008]` <chr> "26427199", "2947314", "34569592", "55891", "76055", "…
$ `2009 [YR2009]` <chr> "27385307", "2927519", "35196037", "55366", "73852", "…
$ `2010 [YR2010]` <chr> "28189672", "2913021", "35856344", "54849", "71519", "…
$ `2011 [YR2011]` <chr> "29249157", "2905195", "36543541", "54310", "70567", "…
$ `2012 [YR2012]` <chr> "30466479", "2900401", "37260563", "53691", "71013", "…
$ `2013 [YR2013]` <chr> "31541209", "2895092", "38000626", "52995", "71367", "…
$ `2014 [YR2014]` <chr> "32716210", "2889104", "38760168", "52217", "71621", "…
$ `2015 [YR2015]` <chr> "33753499", "2880703", "39543154", "51368", "71746", "…

Looks like our year columns are characters, let’s convert them to be numeric, and in the process practice pivoting.

factbook_pop_long <- factbook_pop |> 
  pivot_longer(cols = starts_with("2"), # pick columns start with 2
               names_to = "year", # take names to new col "year"
               values_to = "pop") |> # values in cells to new col "pop"
  mutate(year = parse_number(year)) |> # use mutate to remove extra year garbage
  mutate(pop = as.numeric(pop)) # convert pop to be numeric

Now that we’ve cleaned that up, let’s go back wide to calculate which country had the largest percent population growth from 2000 to 2015.

Go wide! And let’s clean up those column names at the same time.

factbook_pop_wide <- factbook_pop_long |> 
  pivot_wider(names_from = year, # go from long to wide data
              values_from = pop) |> 
  janitor::clean_names()

Let’s now see which country had the largest percent population growth from 2000 to 2015.

factbook_pop_wide |> 
  mutate(perc_pop_growth = ((x2015 - x2000)/x2000 * 100)) |> 
  select(country_name, perc_pop_growth, x2000, x2015) |> # pull only the columns we want
  slice_max(perc_pop_growth, n = 5) # pick top 5

# A tibble: 5 × 4
  country_name             perc_pop_growth   x2000   x2015
  <chr>                              <dbl>   <dbl>   <dbl>
1 Qatar                              274.   645937 2414573
2 United Arab Emirates               172.  3275333 8916899
3 Kuwait                             102.  1934901 3908743
4 Equatorial Guinea                   96.6  684977 1346973
5 Turks and Caicos Islands            94.9   18744   36538

factbook_pop_wide |> 
  mutate(perc_pop_growth = ((x2015 - x2000)/x2000 * 100)) |> 
  select(country_name, perc_pop_growth, x2000, x2015) |> # pull only the columns we want
  slice_min(perc_pop_growth, n = 5) # pick lowest 5

# A tibble: 5 × 4
  country_name             perc_pop_growth   x2000   x2015
  <chr>                              <dbl>   <dbl>   <dbl>
1 Northern Mariana Islands           -35.9   80338   51514
2 Lithuania                          -17.0 3499536 2904910
3 Latvia                             -16.5 2367550 1977527
4 Bosnia and Herzegovina             -15.7 4179350 3524324
5 Bulgaria                           -12.1 8170172 7177991

Now that we have some stuff, we can now see how making changes to our Quarto document affects the output.

Remember, there are three parts of a Quarto document:

The YAML (rhymes with camel) header
Text
Code

2 YAML

Horacio talked last week about the YAML. The YAML is where you can set the content that will show up on the top of your knitted document, as well as control how your document is rendered.

The YAML is surrounded by three dashes ---.

Here’s a simple example:

---
title: "This is my descriptive title"
author: "Jessica Cooperstone"
date: "October 28, 2024"
format: html
editor: visual 
---

But we can make some changes to arguments we pass to our YAML that will adjust how our resulting report looks. For example, the code below will add a table of contents, and number the sections according to the header levels we set.

---
title: "My document"
format:
  html: # set parameters under the html category
    toc: true # add a table of contents
    number-sections: true # incremental numbering of sections
    
---

Let’s look at what our options are for rendering to .html here.

2.1 Rendering to other formats

We’ve been practicing by rendering to a .htmlfile, but you can render your .qmd document to other formats, including PDFs, Microsoft Word, Markdown, and a special one that we will talk about in the coming weeks called Github (or Github Flavored Markdown (GFM)). Here is an example of what some code that comes from my team that you push to Github could look like, and could serve as supplementary material for a paper, for example.

You can see all the different formats you can render a Quarto document to here.

2.2 Themes

You can also change the theming of your document to make it look very pretty. Quarto comes with some complete themes, which we can look at here with Bootswatch. You can see the full list of complete themes here.

This website for example, uses the theme flatly (and darkly if you are a dark mode afficionado). The Quarto website uses the theme cosmo.

You can set your theme in your YAML like this:

---
title: "My document"
format:
  html: # set parameters under the html category
    theme: litera
---

2.3 Practice

Try adding a theme to your .qmd
Add a new parameter to your YAML - you can pick one from here and see how that goes.

3 Text

Unlike an R script (.R), where R by default interprets anything as code (and material that isn’t code needed to be commented out by using #), in an Quarto, the default is text (and code exists only within code chunks or backticks).

The text portion of the document is written in a language called Markdown. The philosophy of Markdown is that it is easy to both write and read. If you want to learn more about markup languages I’d recommend the this brief explanation by Michael Broe from a past Code Club Session and the Markup language wikipedia page.

Below I’m compiling some commonly used markdown syntax.

A list of the commonly used markdown syntax. — Figure from R Markdown Reference Guide

4 Code

Like Horacio taught us last week, Code chunks are sections of your Quarto document designated for executing code. To insert a new code chunk, you can:

Use the keyboard shortcut Cmd + Option + I (Mac) or Ctrl + Alt + I (Windows).
Type ```{r} to start the chunk and ``` to end it, placing your code in between.
Use the “Add Chunk” command from the editor toolbar and select R.

Code chunks appear as follows:

You place your code on the empty line within the chunk. You can include multiple lines of code in a single chunk; however, if you find yourself needing to scroll through the chunk, it might be too lengthy.

The gear icon allows you to modify chunk options, which we will discuss in more detail later.
The triangle with a line below it executes all code chunks that precede the current one.
The play button runs the current chunk.

Warning

When you render your Quarto document, the process will execute all the code within it. This means that if your code contains errors or doesn’t function properly, your document will not be rendered.

4.1 Code chunk options

We can set different options for our code chunks to adjust if/how they are run. Here are some that we can set.

echo: FALSE runs your code chunk, displays output, but does not display code in your final doc (this is useful if you want to show a figure but not the code used to create it)
eval: FALSE does not run your code, but does display it in your final doc
include: FALSE runs your code but does not display the code or its output in your final doc
message: FALSE prevents messages from showing up in your final doc
warning: FALSE prevents earnings from showing up in your final doc
fig.height: X and fig.width: Y will allow you to specify the dimensions of your figures (in inches)
fig.align: can be set to “left”, “right”, or “center”
fig.cap: "Your figure caption" will allow you to set a figure caption
fig.alt: "Your alt text" will allow you to set alt text for screen readers
cache: TRUE will cache results, meaning if you have a chunk that takes a long time to run, if you haven’t changed anything and you knit again, the code won’t run again but access the cache.

You can find a long list of code chunk options here.

We can set the code chunk options 3 ways:

by using the syntax |# within the chunk, like this:

{r}
#| echo: TRUE 
#| warning: FALSE

by clicking on the gear icon in the top right corner of a code chunk.

within the {r} of a chunk

{r, echo = TRUE, fig.width = 6}

The options can be very useful to get your document to render exactly how you want it.

4.2 Practice

Try adjusting your code chunk options and see how that affects the rendering of your document.
Notice what gets printed after you load the tidyverse with library(tidyverse) - can you get that to go away?

4.3 Prep for next week

We are going to render our .qmd file to GitHub Flavored Markdown (GFM) to prepare to push it to GitHub next week. Let’s do that by adjusting our YAML and rendering our document.

---
title: "My first GitHub document"
author: "Jessica Cooperstone"
date: "October 28, 2024"
format: gfm
---

5 Other things you can make with Quarto

We have focused on using Quarto to store and annotate code, and create reports based on that information. But, there is lots more you can do with Quarto, including:

Websites: we have a whole series of Code Club sessions on making a website with Quarto. This website is made with Quarto!
Presentations: you can make pretty nice looking presentations, here’s a Code Club session on the basics of making presentations
Books
Manuscripts
Dashboards
and more