In this post, we can compare countries on the left – right political spectrum and graph the trends.
In the European Social Survey, they ask respondents to indicate where they place themselves on the political spectrum with this question: “In politics people sometimes talk of ‘left’ and ‘right’. Where would you place yourself on this scale, where 0 means the left and 10 means the right?”
lrscale_graph <- round_df %>%
dplyr::filter(country == "IE" | country == "GB" | country == "FR" | country == "DE") %>%
ggplot(aes(x= round, y = mean_lr, group = country)) +
geom_line(aes(color = factor(country)), size = 1.5, alpha = 0.5) +
ggimage::geom_flag(aes(image = country), size = 0.04) +
scale_color_manual(values = my_palette) +
scale_x_discrete(name = "Year", limits=c("2002","2004","2006","2008","2010","2012","2014","2016","2018")) +
labs(title = "Where would you place yourself on this scale,\n where 0 means the left and 10 means the right?",
subtitle = "Source: European Social Survey, 2002 - 2018",
x = "Year",
y = "Left - Right Spectrum")
lrscale_graph + guides(color=guide_legend(title="Country")) + theme_economist()
The European Social Survey (ESS) measure attitudes in thirty-ish countries (depending on the year) across the European continent. It has been conducted every two years since 2001.
The survey consists of a core module and two or more ‘rotating’ modules, on social and public trust; political interest and participation; socio-political orientations; media use; moral, political and social values; social exclusion, national, ethnic and religious allegiances; well-being, health and security; demographics and socio-economics.
So lots of fun data for political scientists to look at.
The very first thing you need to do before you can download any of the data is set your email address.
Don’t forget the email address goes in as a string in “quotations marks”.
Show what countries are in the survey with the show_countries() function.
It’s important to know that country names are case sensitive and you can only use the name printed out by show_countries(). For example, you need to write “Russian Federation” to access Russian survey data; if you write “Russia”…
Using these country names, we can download specific rounds or waves (i.e survey years) with import_country. We have the option to choose the two most recent rounds, 8th (from 2016) and 9th round (from 2018).
ire_data <- import_all_cntrounds("Ireland")
The resulting data comes in the form of nine lists, one for each round
These rounds correspond to the following years:
ESS Round 9 – 2018
ESS Round 8 – 2016
ESS Round 7 – 2014
ESS Round 6 – 2012
ESS Round 5 – 2010
ESS Round 4 – 2008
ESS Round 3 – 2006
ESS Round 2 – 2004
ESS Round 1 – 2002
I want to compare the first round and most recent round to see if Irish people’s views have changed since 2002. In 2002, Ireland was in the middle of an economic boom that we called the “Celtic Tiger”. People did mad things like buy panini presses and second house in Bulgaria to resell. Then the 2008 financial crash hit the country very hard.
Irish people during the Celtic Tiger:
Irish people after the Celtic Tiger crash:
Ireland in 2018 was a very different place. So it will be interesting to see if these social changes translated into attitude changes.
First, we use the import_country() function to download data from ESS. Specify the country and rounds you want to download.
The resulting ire object is a list, so we’ll need to extract the two data.frames from the list:
ire_1 <- ire[]
ire_9 <- ire[]
The exact same questions are not asked every year in ESS; there are rotating modules, sometimes questions are added or dropped. So to merge round 1 and round 9, first we find the common columns with the intersect() function.
All the variables in the dataset are a special class called “haven_labelled“. So we must convert them to numeric variables with a quick function. We exclude the first variable because we want to keep country name as a string character variable.
We can look at the distribution of our variables and count how many missing values there are with the skim() function from the skimr package
We can run a quick t-test to compare the mean attitudes to immigrants on the statement: “Immigrants make country worse or better place to live” across the two survey rounds.
Lower scores indicate an attitude that immigrants undermine Ireland’ quality of life and higher scores indicate agreement that they enrich it!
t.test(att_df$imm_qual_life ~ att_df$round)
In future blog, I will look at converting the raw output of R into publishable tables.
The results of the independent-sample t-test show that if we compare Ireland in 2002 and Ireland in 2018, there has been a statistically significant increase in positive attitudes towards immigrants and belief that Ireland’s quality of life is more enriched by their presence in the country.
As I am currently an immigrant in a foreign country myself, I am glad to come from a country that sees the benefits of immigrants!
If we load the ggpubr package, we can graphically look at the difference in mean attitude scores.
box1 <- ggpubr::ggboxplot(att_df, x = "round", y = "imm_qual_life", color = "round", palette = c("#d11141", "#00aedb"),
ylab = "Attitude", xlab = "Round")
box1 + stat_compare_means(method = "t.test")
It’s not the most glamorous graph but it conveys the shift in Ireland to more positive attitudes to immigration!
I suspect that a country’s economic growth correlates with attitudes to immigration.
The geom_rect() function graphs the coloured rectangles on the plot. I take colours from this color-hex website; the green rectangle for times of economic growth and red for times of recession. Makes sure the geom-rect() comes before the geom_line().
And we can see that there is a relationship between attitudes to immigrants in Ireland and Irish GDP growth. When GDP is growing, Irish people see that immigrants improve quality of life in Ireland and vice versa. The red section of the graph corresponds to the financial crisis.
We can now add the the geom_flag() function to the graph. The y = -50 prevents the flags overlapping with the bars and places them beside their name label. The image argument takes the iso2 variable.
Quick tip: with the reorder argument, if we wanted descending order (rather than ascending order of ODA amounts, we would put a minus sign in front of the oda_per_capita in the reorder() function for the x axis value.
oda_bar <- oda %>%
ggplot(aes(x = reorder(donor, oda_per_capita), y = oda_per_capita, fill = continent)) +
geom_flag(y = -50, aes(image = iso2)) +
geom_bar(stat = "identity") +
labs(title = "ODA donor spending ",
subtitle = "Source: OECD's Development Assistance Committee, 2019 ",
x = "Donor Country",
y = "ODA per capita")
The fill argument categorises the continents of the ODA donors. Sometimes I take my hex colors from https://www.color-hex.com/ website.
We can all agree that Wikipedia is often our go-to site when we want to get information quick. When we’re doing IR or Poli Sci reesarch, Wikipedia will most likely have the most up-to-date data compared to other databases on the web that can quickly become out of date.
So in R, we can scrape a table from Wikipedia and turn into a database with the rvest package .
First, we copy and paste the Wikipedia page we want to scrape into the read_html() function as a string:
Next we save all the tables on the Wikipedia page as a list. Turn the header = TRUE.
nato_tables <- nato_members %>% html_table(header = TRUE, fill = TRUE)
The table that I want is the third table on the page, so use [[two brackets]] to access the third list.
nato_exp <- nato_tables[]
The dataset is not perfect, but it is handy to have access to data this up-to-date. It comes from the most recent NATO report, published in 2019.
Some problems we will have to fix.
The first row is a messy replication of the header / more information across two cells in Wikipedia.
The headers are long and convoluted.
There are a few values in as N/A in the dataset, which R thinks is a string.
All the numbers have commas, so R thinks all the numeric values are all strings.
There are a few NA values that I would not want to impute because they are probably zero. Iceland has no armed forces and manages only a small coast guard. North Macedonia joined NATO in March 2020, so it doesn’t have all the data completely.
So first, let’s do some quick data cleaning:
Clean the variable names to remove symbols and adds underscores with a function from the janitor package
With the WDIsearch() function we can look for the World Bank indicator that measures oil rents as a percentage of a country’s GDP. You can look at the World Bank website and browse all the indicators available.
The output is:
"NY.GDP.PETR.RT.ZS" "Oil rents (% of GDP)"
Copy the indicator string and paste it into the WDI() function. The country codes are the iso2 codes, which you can input as many as you want in the c().
If you want all countries as regions that the World Bank has, do not add country argument.
We can compare Iran and Saudi Arabian oil rents from 1970 until the most recent value.
data = WDI(indicator='NY.GDP.PETR.RT.ZS', country=c('IR', 'SA'), start=1970, end=2019)
And graph out the output. All only takes a few steps.
my_palette = c("#DA0000", "#239f40")
#both the hex colors are from the maps of the countries
oil_graph <- ggplot(oil_data, aes(year, NY.GDP.PETR.RT.ZS, color=country)) +
geom_line(size = 1.4) +
labs(title = "Oil rents as a percentage of GDP",
subtitle = "In Iran and Saudi Arabia from 1970 to 2019",
x = "Year",
y = "Average oil rent as percentage of GDP",
color = " ") +
scale_color_manual(values = my_palette)
oil_graph + theme_fivethirtyeight() +
plot.title = element_text(size = 30),
axis.title.y = element_text(size = 20),
axis.title.x = element_text(size = 20))
For some reason the World Bank does not have data for Iran for most of the early 1990s. But I would imagine that they broadly follow the trends in Saudi Arabia.
I added the flags myself manually after I got frustrated with geom_flag() . It is something I will need to figure out for a future blog post!
It is crazy that in the late 1970s, oil accounted for over 80% of all Saudi Arabia’s Gross Domestic Product. Now we see both countries rely on a far smaller percentage. Due both to the fact that oil prices are volatile, climate change is a new constant threat and resource exhaustion is on the horizon, both countries have adjusted policies in attempts to diversify their sources of income.
Next we can use the World Bank data to create maps and compare regions on any World Bank scores.
# to create maps
library(viridis) # for pretty colors
We will compare all Asian and Middle Easter countries with regard to all natural rents (not just oil) as a percentage of their GDP.
library(forecast) #for time series analysis and graphing
The website Quandl.com is a great resource I came across a while ago, where you can download heaps of datasets for variables such as energy prices, stock prices, World Bank indicators, OECD data other random data.
In order to download the data from the site, you need to first set up an account on the website, and indicate your intended use for the data.
Back on R, you call the API key with Quandl.api_key() function and now you can directly download data!
Now, I click to search only through the free datasets. I have no idea how much a subscription costs but I imagine it is not cheap.
You can browse through the database and when you find the dataset you want, you copy and paste the string code into Quandl() function.
We can choose the class of the time series object we will download with the type = argument.
We also toggle the start_date and end_data of the time series.
So I will download employment data for Ireland from 1980 until 2019 as a zoo object. We can check the Quandl page for the Irish employment data to learn about the data source and the unit of measurement
Click here to check out the Quandl CRAN pdf documentation and learn more about the differen arguments you can use with this function. Here is the generic arguments you can play with when downloading your dataset:
ggtitle("Employment level in Ireland") +
labs("Source: International Monetary Fund data") +
ylab("Employed people (in millions)")
The 1980s were a rough time for unemployment in Ireland. Also the 2008 recession had a sizeable impact on unemployment too. I am afraid how this graph will look with 2020 data.
Next, we can visually examine the autocorrelation plot.
With time series data, it is natural to assume that the value at the current time period is highly related (i.e. serially correlated) to its value in the previous period. This is autocorrelation, and it can be problematic when we want to forecast or run statistics. This is because it violates the assumption that values are independent of each other.
emp_ts <- ts(emp)
There is very high autocorrelation in employment level in Ireland over the years.
In next blog, we will look at how to correct for autocorrelation in time series data.
What is a shiny app, you ask? Click to look at a quick Youtube explainer. It’s basically a handy GUI for R.
When we feed a panel data.frame into the ExPanD() function, a new screen pops up from R IDE (in my case, RStudio) and we can interactively toggle with various options and settings to run a bunch of statistical and visualisation analyses.
Click here to see how to convert your data.frame to pdata.frame object with the plm package.
Be careful your pdata.frame is not too large with too many variables in the mix. This will make ExPanD upset enough to crash. Which, of course, I learned the hard way.
Also I don’t know why there are random capitalizations in the PaCkaGe name. Whenever I read it, I think of that Sponge Bob meme.
If anyone knows why they capitalised the package this way. please let me know!
So to open up the new window, we just need to feed the pdata.frame into the function:
For my computer, I got error messages for the graphing sections, because I had an old version of Cairo package. So to rectify this, I had to first install a source version of Cairo and restart my R session. Then, the error message gods were placated and they went away.
Then press command + shift + F10 to restart R session
You may not have this problem, so just ignore if you have an up-to-date version of the necessary packages.
When the new window opens up, the first section allows you to filter subsections of the panel data.frame. Similar to the filter() argument in the dplyr package.
For example, I can look at just the year 1989:
But let’s look at the full sample
We can toggle with variables to look at mean scores for certain variables across different groups. For example, I look at physical integrity scores across regime types.
Purple plot: closed autocracy
Turquoise plot: electoral autocracy
Khaki plot: electoral democracy:
Peach plot: liberal democracy
The plots show that there is a high mean score for physical integrity scores for liberal democracies and less variance. However with the closed and electoral autocracies, the variance is greater.
We can look at a visualisation of the correlation matrix between the variables in the dataset.
Next we can look at a scatter plot, with option for loess smoother line, to graph the relationship between democracy score and physical integrity scores. Bigger dots indicate larger GDP level.
Last we can run regression analysis, and add different independent variables to the model.
We can add fixed effects.
And we can subset the model by groups.
The first column, the full sample is for all regions in the dataset.
“Nodes” designate the vertices of a network, and “edges” designate its ties. Vertices are accessed using the V() function while edges are accessed with the E(). This igraph object has 232 edges and 16 vertices over the four years.
Furthermore, the igraph object has a name attribute as one of its vertices properties. To access, type:
Which prints off all the countries in the ww1 dataset; this is all the countries that engaged in militarized interstate disputes between the years 1914 to 1918.
Next we can fit an algorithm to modify the graph distances. According to our pal Wikipedia, force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically-pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions!
We can do this in one simple step by feeding the igraph into the algorithm function.
First, we access and store a map object from the rnaturalearth package, with all the spatial information in contains. We specify returnclass = "sf", which will return a dataframe with simple features information.
Simple features or simple feature access refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects. Our map has these attributes stored in the object.
With the ne_countries() function, we get the borders of all countries.
This map object comes with lots of information about 241 countries and territories around the world.
In total, it has 65 columns, mostly with different variants of the names / locations of each country / territory. For example, ISO codes for each country. Further in the dataset, there are a few other variables such as GDP and population estimates for each country. So a handy source of data.
However, I want to use values from a different source; I have a freedom_df dataframe with a freedom of association variable.
The freedom of association index broadly captures to what extent are parties, including opposition parties, allowed to form and to participate in elections, and to what extent are civil society organizations able to form and to operate freely in each country.
So, we can merge them into one dataset.
Before that, I want to only use the scores from the most recent year to map out. So, take out only those values in the year 2019 (don’t forget the comma sandwiched between the round bracket and the square bracket):
We’re all ready to graph the map. We can add the freedom of association variable into the aes() argument of the geom_sf() function. Again, the sf refers to simple features with geospatial information we will map out.
assoc_graph <- ggplot(data = map19) +
geom_sf(aes(fill = freedom_association_index),
position = "identity") +
labs(fill='Freedom of Association Index') +
scale_fill_viridis_c(option = "viridis")
The scale_fill_viridis_c(option = "viridis") changes the color spectrum of the main variable.
Other options include:
And various others. Click here to learn more about this palette package.
Finally we call the new graph stored in the assoc_graph object.
I use the theme_map() function from the ggtheme package to make the background very clean and to move the legend down to the bottom left of the screen where it takes up the otherwise very empty Pacific ocean / Antarctic expanse.
And there we have it, a map of countries showing the Freedom of Association index across countries.
The index broadly captures to what extent are parties, including opposition parties, allowed to form and to participate in elections, and to what extent are civil society organizations able to form and to operate freely.
Yellow colors indicate more freedom, green colors indicate middle scores and blue colors indicate low levels of freedom.
Some of the countries have missing data, such as Germany and Yemen, for various reasons. A true perfectionist would go and find and fill in the data manually.