We can now add the the geom_flag() function to the graph. The y = -50 prevents the flags overlapping with the bars and places them beside their name label. The image argument takes the iso2 variable.
Quick tip: with the reorder argument, if we wanted descending order (rather than ascending order of ODA amounts, we would put a minus sign in front of the oda_per_capita in the reorder() function for the x axis value.
oda_bar <- oda %>%
ggplot(aes(x = reorder(donor, oda_per_capita), y = oda_per_capita, fill = continent)) +
geom_flag(y = -50, aes(image = iso2)) +
geom_bar(stat = "identity") +
labs(title = "ODA donor spending ",
subtitle = "Source: OECD's Development Assistance Committee, 2019 ",
x = "Donor Country",
y = "ODA per capita")
The fill argument categorises the continents of the ODA donors. Sometimes I take my hex colors from https://www.color-hex.com/ website.
We can all agree that Wikipedia is often our go-to site when we want to get information quick. When we’re doing IR or Poli Sci reesarch, Wikipedia will most likely have the most up-to-date data compared to other databases on the web that can quickly become out of date.
So in R, we can scrape a table from Wikipedia and turn into a database with the rvest package .
First, we copy and paste the Wikipedia page we want to scrape into the read_html() function as a string:
Next we save all the tables on the Wikipedia page as a list. Turn the header = TRUE.
nato_tables <- nato_members %>% html_table(header = TRUE, fill = TRUE)
The table that I want is the third table on the page, so use [[two brackets]] to access the third list.
nato_exp <- nato_tables[]
The dataset is not perfect, but it is handy to have access to data this up-to-date. It comes from the most recent NATO report, published in 2019.
Some problems we will have to fix.
The first row is a messy replication of the header / more information across two cells in Wikipedia.
The headers are long and convoluted.
There are a few values in as N/A in the dataset, which R thinks is a string.
All the numbers have commas, so R thinks all the numeric values are all strings.
There are a few NA values that I would not want to impute because they are probably zero. Iceland has no armed forces and manages only a small coast guard. North Macedonia joined NATO in March 2020, so it doesn’t have all the data completely.
So first, let’s do some quick data cleaning:
Clean the variable names to remove symbols and adds underscores with a function from the janitor package
library(magrittr) # for pipes
# to create maps
library(viridis) # for pretty colors
We will use this package to really quickly access all the indicators from the World Bank website.
Below is a screenshot of the World Bank’s data page where you can search through all the data with nice maps and information about their sources, their available years and the unit of measurement et cetera.
In R when we download the WDI package, we can download the datasets directly into our environment.
With the WDIsearch() function we can look for the World Bank indicator.
For this blog, we want to map out how dependent countries are on oil. We will download the dataset that measures oil rents as a percentage of a country’s GDP.
The output is:
"NY.GDP.PETR.RT.ZS" "Oil rents (% of GDP)"
Copy the indicator string and paste it into the WDI() function. The country codes are the iso2 codes, which you can input as many as you want in the c().
If you want all countries that the World Bank has, do not add country argument.
We can compare Iran and Saudi Arabian oil rents from 1970 until the most recent value.
data = WDI(indicator='NY.GDP.PETR.RT.ZS', country=c('IR', 'SA'), start=1970, end=2019)
And graph out the output. All only takes a few steps.
my_palette = c("#DA0000", "#239f40")
#both the hex colors are from the maps of the countries
oil_graph <- ggplot(oil_data, aes(year, NY.GDP.PETR.RT.ZS, color = country)) +
geom_line(size = 1.4) +
labs(title = "Oil rents as a percentage of GDP",
subtitle = "In Iran and Saudi Arabia from 1970 to 2019",
x = "Year",
y = "Average oil rent as percentage of GDP",
color = " ") +
scale_color_manual(values = my_palette)
plot.title = element_text(size = 30),
axis.title.y = element_text(size = 20),
axis.title.x = element_text(size = 20))
For some reason the World Bank does not have data for Iran for most of the early 1990s. But I would imagine that they broadly follow the trends in Saudi Arabia.
I added the flags myself manually after I got frustrated with geom_flag() . It is something I will need to figure out for a future blog post!
It is really something that in the late 1970s, oil accounted for over 80% of all Saudi Arabia’s Gross Domestic Product.
Now we see both countries rely on a far smaller percentage. Due both to the fact that oil prices are volatile, climate change is a new constant threat and resource exhaustion is on the horizon, both countries have adjusted policies in attempts to diversify their sources of income.
Next we can use the World Bank data to create maps and compare regions on any World Bank scores.
We will compare all Asian and Middle Eastern countries with regard to all natural rents (not just oil) as a percentage of their GDP.
What is a shiny app, you ask? Click to look at a quick Youtube explainer. It’s basically a handy GUI for R.
When we feed a panel data.frame into the ExPanD() function, a new screen pops up from R IDE (in my case, RStudio) and we can interactively toggle with various options and settings to run a bunch of statistical and visualisation analyses.
Click here to see how to convert your data.frame to pdata.frame object with the plm package.
Be careful your pdata.frame is not too large with too many variables in the mix. This will make ExPanD upset enough to crash. Which, of course, I learned the hard way.
Also I don’t know why there are random capitalizations in the PaCkaGe name. Whenever I read it, I think of that Sponge Bob meme.
If anyone knows why they capitalised the package this way. please let me know!
So to open up the new window, we just need to feed the pdata.frame into the function:
For my computer, I got error messages for the graphing sections, because I had an old version of Cairo package. So to rectify this, I had to first install a source version of Cairo and restart my R session. Then, the error message gods were placated and they went away.
Then press command + shift + F10 to restart R session
You may not have this problem, so just ignore if you have an up-to-date version of the necessary packages.
When the new window opens up, the first section allows you to filter subsections of the panel data.frame. Similar to the filter() argument in the dplyr package.
For example, I can look at just the year 1989:
But let’s look at the full sample
We can toggle with variables to look at mean scores for certain variables across different groups. For example, I look at physical integrity scores across regime types.
Purple plot: closed autocracy
Turquoise plot: electoral autocracy
Khaki plot: electoral democracy:
Peach plot: liberal democracy
The plots show that there is a high mean score for physical integrity scores for liberal democracies and less variance. However with the closed and electoral autocracies, the variance is greater.
We can look at a visualisation of the correlation matrix between the variables in the dataset.
Next we can look at a scatter plot, with option for loess smoother line, to graph the relationship between democracy score and physical integrity scores. Bigger dots indicate larger GDP level.
Last we can run regression analysis, and add different independent variables to the model.
We can add fixed effects.
And we can subset the model by groups.
The first column, the full sample is for all regions in the dataset.