1 Load libraries
library(tidyverse)
Let’s add some material to our document so we can better see what our resulting documents will look like. This will also give us an opportunity to practice some of what we’ve been going over in Code Club this semester.
2 Download data
Let’s go back to what we started with this semester and revisit data from The World Factbook, put together by the CIA, which “provides basic intelligence on the history, people, government, economy, energy, geography, environment, communications, transportation, military, terrorism, and transnational issues for 265 world entities.” I thought this data would give us some opportunities to flex our R skills and learn a bit about the world.
The data we are going to download can be found here, though I have saved the file, added it to our Code Club GitHub, and included some code below for you to download it. This is a little different from the data we started with, which included only info from 2015. This dataset includes many more years.
download.file(
  url = "https://github.com/osu-codeclub/osu-codeclub.github.io/raw/refs/heads/main/posts/S08E01_wrangling_01/data/factbook.csv",
  destfile = "factbook_download.csv"
)
You should now see the file “factbook_download.csv” in your working directory.
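If you would rather keep the file in a subfolder (the reading step later in this post uses a folder called data), a minimal sketch might look like the following. The helper name download_if_missing() is hypothetical and not part of the original post; it relies only on base R functions and skips the download when the file is already on disk.

```r
# Hypothetical helper (not part of the original post): download a file only
# if it is not already on disk, creating the destination folder first.
download_if_missing <- function(url, destfile) {
  # create the folder that will hold the file, if it does not exist yet
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile)
  }
  # invisibly report whether the file is now on disk
  invisible(file.exists(destfile))
}

# Usage (uncomment to run; this needs an internet connection):
# download_if_missing(
#   url = "https://github.com/osu-codeclub/osu-codeclub.github.io/raw/refs/heads/main/posts/S08E01_wrangling_01/data/factbook.csv",
#   destfile = "data/factbook_download.csv"
# )
```

With this approach the path you pass to read_csv() needs to match the destfile you chose here.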
3 Read in data
We can read it in using read_csv(), a tidyverse function from the readr package.
# I've stored mine in a folder called data for organizational sake
factbook <- read_csv("data/factbook_download.csv")
Let’s look at our data.
View(factbook)
4 Wrangle
Let’s pull just the data for total population.
factbook_pop <- factbook |>
  filter(`Series Name` == "Population, total")
And then we can look at it:
head(factbook_pop)
glimpse(factbook_pop)
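To see what that filter() step is doing without the full dataset, here is a small sketch on made-up data. The tibble toy and its values are invented for illustration, and the column names `Country Name` and `2000 [YR2000]` are assumptions about how the real file is laid out; only `Series Name` and "Population, total" come from the code above.

```r
library(tidyverse)

# Toy data, made up for illustration (not values from the factbook file)
toy <- tibble(
  `Country Name` = c("A", "A", "B", "B"),
  `Series Name` = c("Population, total", "Birth rate",
                    "Population, total", "Birth rate"),
  `2000 [YR2000]` = c("100", "20", "200", "30")
)

# Backticks let us refer to a column whose name contains a space
toy_pop <- toy |>
  filter(`Series Name` == "Population, total")

toy_pop
```

Only the rows whose `Series Name` matches the string exactly are kept, so spelling, capitalization, and the comma all matter.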
4.1 Pivot
It looks like our year columns are stored as characters. Let’s convert them to numeric and, in the process, practice pivoting.
factbook_pop_long <- factbook_pop |>
  pivot_longer(cols = starts_with("2"), # pick columns that start with 2
               names_to = "year", # column names go to new col "year"
               values_to = "pop") |> # values in cells go to new col "pop"
  mutate(year = parse_number(year)) |> # use mutate to remove extra year garbage
  mutate(pop = as.numeric(pop)) # convert pop to be numeric
glimpse(factbook_pop_long)
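The same pivot can be tried on a tiny made-up table to see exactly what parse_number() and as.numeric() change. The data below is invented, and the year column names in the form "2000 [YR2000]" are an assumption about what the "extra year garbage" looks like in the real file.

```r
library(tidyverse)

# Toy wide data, made up for illustration; values are stored as characters
toy_pop <- tibble(
  `Country Name` = c("A", "B"),
  `2000 [YR2000]` = c("100", "200"),
  `2015 [YR2015]` = c("150", "190")
)

toy_long <- toy_pop |>
  pivot_longer(cols = starts_with("2"), # the two year columns
               names_to = "year",
               values_to = "pop") |>
  mutate(year = parse_number(year), # keeps the first number found in each name
         pop = as.numeric(pop))     # character to numeric

toy_long
```

Each country now has one row per year, and both year and pop are numeric columns.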
Now that we’ve cleaned that up, let’s go back wide to calculate which country had the largest percent population growth from 2000 to 2015.
Go wide! And let’s clean up those column names at the same time.
factbook_pop_wide <- factbook_pop_long |>
  pivot_wider(names_from = year, # go from long to wide data
              values_from = pop) |>
  janitor::clean_names()
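As a quick illustration of where column names like x2000 come from, here is a sketch on made-up long data (the tibble toy_long is invented, and it assumes the janitor package is installed). Column names that start with a digit are awkward to use in R, so clean_names() prefixes them with an x, and it converts names with spaces to snake case.

```r
library(tidyverse)

# Toy long data, made up for illustration
toy_long <- tibble(
  `Country Name` = c("A", "A", "B", "B"),
  year = c(2000, 2015, 2000, 2015),
  pop = c(100, 150, 200, 190)
)

toy_wide <- toy_long |>
  pivot_wider(names_from = year, values_from = pop) |>
  janitor::clean_names()

names(toy_wide)
# "country_name" "x2000" "x2015"
```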
4.2 Calculate percent population growth
Let’s now see which country had the largest percent population growth from 2000 to 2015.
factbook_pop_wide |>
  mutate(perc_pop_growth = ((x2015 - x2000)/x2000 * 100)) |>
  mutate(perc_pop_growth = round(perc_pop_growth, digits = 1)) |>
  select(country_name, perc_pop_growth, x2000, x2015) |> # pull only the columns we want
  slice_max(perc_pop_growth, n = 5) # pick top 5
And which country had the smallest percent population growth from 2000 to 2015.
factbook_pop_wide |>
  mutate(perc_pop_growth = ((x2015 - x2000)/x2000 * 100)) |>
  select(country_name, perc_pop_growth, x2000, x2015) |> # pull only the columns we want
  slice_min(perc_pop_growth, n = 5) # pick lowest 5
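To sanity-check the percent growth arithmetic, it can help to run the same calculation on numbers small enough to do in your head. The two countries below are made up: A grows from 100 to 150, which is (150 - 100) / 100 * 100 = 50 percent, and B shrinks from 200 to 190, which is (190 - 200) / 200 * 100 = -5 percent.

```r
library(tidyverse)

# Toy wide data, made up for illustration
toy_wide <- tibble(
  country_name = c("A", "B"),
  x2000 = c(100, 200),
  x2015 = c(150, 190)
)

toy_growth <- toy_wide |>
  mutate(perc_pop_growth = (x2015 - x2000) / x2000 * 100)

toy_growth |> slice_max(perc_pop_growth, n = 1) # largest growth: country A
toy_growth |> slice_min(perc_pop_growth, n = 1) # smallest growth: country B
```

A negative value means the population shrank, so the lowest rows from slice_min() may be declines rather than slow growth.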