Easystats is a collection of R packages, which aims to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models, according to their GitHub repo.
Click here to browse the GitHub repo and here to go to the CRAN PDF for the performance package.
We are going to look at a few questions from the 2019 US Pew survey on relations with foreign countries.
Data can be found by following this link:
We are going to make bar charts to plot out responses to the question asked of American participants: Should the US cooperate more or less with some key countries? The countries asked about were China, Russia, Germany, France, Japan and the UK.
Before we dive in, we can find some nice hex colors for the bar chart. There are four possible responses that the participants could give: cooperate more, cooperate less, cooperate the same as before and refuse to answer / don’t know.
pal <- c("Cooperate more" = "#0a9396",
"Same as before" = "#ee9b00",
"Don't know" = "#005f73",
"Cooperate less" ="#ae2012")
We first select the questions we want from the full survey and pivot the dataframe to long form with pivot_longer(). This way we have a single column with all the different survey responses that we can manipulate more easily with dplyr functions.
Then we summarise the data to count all the survey responses for each country and calculate the frequency of each response as a share of all answers for that country.
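As a rough sketch of these two steps, assuming a toy survey rather than the real Pew file: the data frame survey and its columns id, china and russia are made up for illustration, while country_question, response_string and freq follow the names used in the plotting code further down.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy survey: one row per respondent, one column per country question
survey <- tibble(
  id     = 1:4,
  china  = c("Cooperate more", "Cooperate less", "Cooperate less", "Don't know"),
  russia = c("Cooperate less", "Cooperate less", "Same as before", "Cooperate more")
)

# Long form: one row per respondent per country question
pew_long <- survey %>%
  pivot_longer(cols = c(china, russia),
               names_to = "country_question",
               values_to = "response_string")

# Count each response within each country, then turn counts into shares
pew_freq <- pew_long %>%
  count(country_question, response_string, name = "n") %>%
  group_by(country_question) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

pew_freq
```

Here freq is a proportion between 0 and 1; multiply by 100 if you prefer to label the bars as percentages.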
Then we mutate the variables so that we can add flags. The geom_flag() function from the ggflags package only recognises ISO2 country codes in lower case.
After that we change the factor levels for the four responses so they run from positive to negative views of cooperation.
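A minimal sketch of both mutations, assuming a small made-up summary table with upper-case ISO2 codes (pew_freq and pew_ready are illustrative names, and the level order simply mirrors the palette defined earlier):

```r
library(dplyr)

# Hypothetical summarised data with upper-case ISO2 codes
pew_freq <- tibble(
  country_question = c("CN", "CN", "RU", "RU"),
  response_string  = c("Cooperate more", "Cooperate less", "Don't know", "Same as before"),
  freq             = c(0.4, 0.6, 0.3, 0.7)
)

pew_ready <- pew_freq %>%
  mutate(
    # geom_flag() expects lower-case ISO2 codes such as "cn"
    country_question = tolower(country_question),
    # Order the responses from positive to negative views of cooperation
    response_string = factor(response_string,
                             levels = c("Cooperate more", "Same as before",
                                        "Don't know", "Cooperate less"))
  )

levels(pew_ready$response_string)
```

Because ggplot2 stacks and colours bars in factor-level order, setting the levels explicitly controls the order of the segments and of the legend.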
We use position = "stack" to make all the responses “stack” onto each other for each country. We use stat = "identity" because we are not counting the responses. Rather, we are using the freq variable we calculated above.
geom_bar(aes(x = forcats::fct_reorder(country_question, freq), y = freq, fill = response_string), color = "#e5e5e5", size = 3, position = "stack", stat = "identity") +
geom_flag(aes(x = country_question, y = -0.05 , country = country_question), color = "black", size = 20) -> pew_graph
And last, we change the appearance of the plot with the theme() function.
pew_graph +
scale_fill_manual(values = pal) +
ggtitle("Should the US cooperate more or less with the following country?") +
theme(legend.title = element_blank(),
legend.position = "top",
legend.key.size = unit(2, "cm"),
text = element_text(size = 25),
legend.text = element_text(size = 20),
axis.text = element_blank())
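Since the snippets above are fragments, here is a self-contained sketch of the same idea on made-up numbers. It leaves out the flags, uses geom_col() (which behaves like geom_bar() with stat = "identity"), and the data frame toy with its shares is purely illustrative.

```r
library(ggplot2)

pal <- c("Cooperate more" = "#0a9396", "Same as before" = "#ee9b00",
         "Don't know" = "#005f73", "Cooperate less" = "#ae2012")

# Illustrative shares only, not real survey results
toy <- data.frame(
  country_question = rep(c("cn", "jp"), each = 4),
  response_string  = factor(rep(names(pal), times = 2), levels = names(pal)),
  freq             = c(0.25, 0.15, 0.10, 0.50, 0.60, 0.20, 0.05, 0.15)
)

toy_graph <- ggplot(toy) +
  # y is already a share, so the bar heights are used as given and stacked
  geom_col(aes(x = country_question, y = freq, fill = response_string),
           position = "stack") +
  scale_fill_manual(values = pal) +
  theme(legend.title = element_blank(),
        legend.position = "top")

toy_graph
```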
We will plot out a lollipop plot to compare EU countries on their level of income inequality, measured by the Gini coefficient.
A Gini coefficient of zero expresses perfect equality, where all values are the same (e.g. where everyone has the same income). A Gini coefficient of one (or 100%) expresses maximal inequality among values (e.g. for a large number of people where only one person has all the income or consumption and all others have none, the Gini coefficient will be nearly one).
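To make the two extremes concrete, here is a small sketch that computes the Gini coefficient as half the mean absolute difference between all pairs of values, divided by the mean. The function name gini() is our own helper and not part of any package used in this post.

```r
# Gini coefficient via the mean absolute difference between all pairs of values
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

equal_incomes   <- c(10, 10, 10, 10)  # everyone has the same income
one_has_it_all  <- c(0, 0, 0, 100)    # one person has all the income

gini(equal_incomes)   # 0
gini(one_has_it_all)  # 0.75, i.e. 1 - 1/n with n = 4
```

With only four people the maximum is 1 - 1/4 = 0.75; as the number of people grows, the same "one person has everything" case gets closer and closer to one.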
To start, we will take data on the EU from Wikipedia. With the rvest package, we scrape the table about the EU countries from this Wikipedia page.
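The scraping step could look roughly like the sketch below. To keep it runnable offline it parses a tiny inline HTML table with minimal_html() as a stand-in for the Wikipedia page; the table contents and the "wikitable" selector are assumptions for illustration. For the real page you would pass its URL to read_html() instead.

```r
library(rvest)

# A tiny stand-in for the Wikipedia page (hypothetical content)
page <- minimal_html('
  <table class="wikitable">
    <tr><th>Country</th><th>Accession</th></tr>
    <tr><td>France</td><td>1958</td></tr>
    <tr><td>Poland</td><td>2004</td></tr>
  </table>')

# Pick out the table by its CSS class and convert it to a data frame
eu_table <- page %>%
  html_element("table.wikitable") %>%
  html_table()

eu_table
```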
Next comes some data cleaning and grouping the accession years into decades, which indicates when each country joined the EU. If we see clustering of colours at either end of the Gini scale, this may indicate a relationship between how long a country has been part of the EU and its domestic income inequality level. Are the founding members of the EU more equal than the new countries? Or, conversely, are the newer, former Soviet countries that joined in the 2000s more equal? We can visualise this with the following mutations:
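One way to do the decade grouping, sketched on a handful of countries (the column names accession and decade_joined are our own and may differ from the scraped table):

```r
library(dplyr)

eu <- tibble(
  country   = c("France", "Ireland", "Spain", "Poland", "Croatia"),
  accession = c(1958, 1973, 1986, 2004, 2013)
)

# Round each accession year down to its decade and label it, e.g. 1973 -> "1970s"
eu <- eu %>%
  mutate(decade_joined = paste0(floor(accession / 10) * 10, "s"))

eu
```

The resulting decade_joined column can then be mapped to colour in the plot.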
To create the lollipop plot, we will use the geom_segment() function. This requires x and xend arguments set to the country names (wrapped in fct_reorder() to make sure the countries print out in descending order) and y and yend arguments with the Gini number.
All the countries in the EU have a Gini score between the mid 20s and mid 30s, so I will start the y axis at 20.
We can add the flag for each country when we turn the ISO2 character code to lower case and give it to the country argument.
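Here is a bare-bones version of the lollipop, without the flags and with placeholder countries and values (the numbers are illustrative, not real Gini figures). The segment runs from 20 up to each country's score and geom_point() adds the lollipop head.

```r
library(ggplot2)

# Placeholder countries and illustrative values only
gini_df <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  gini    = c(24, 29, 35)
)

# Order the countries by their Gini score so the plot is sorted
gini_df$country_ordered <- forcats::fct_reorder(gini_df$country, gini_df$gini)

lollipop <- ggplot(gini_df) +
  # the stick: starts at 20 and ends at the country's score
  geom_segment(aes(x = country_ordered, xend = country_ordered,
                   y = 20, yend = gini)) +
  # the head of the lollipop
  geom_point(aes(x = country_ordered, y = gini), size = 4) +
  coord_flip()

lollipop
```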
Click here to read Part 1 about downloading Eurostat data.
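library(eurostat)
# Download the prison population table with human-readable labels
prison_pop <- get_eurostat("crim_pris_pop", type = "label")
# Convert country names to ISO3 codes and extract the year from the date column
prison_pop$iso3 <- countrycode::countrycode(prison_pop$geo, "country.name", "iso3c")
prison_pop$year <- as.numeric(format(prison_pop$time, format = "%Y"))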
Next we will download map data with the rnaturalearth package. Click here to read more about using this package.
We only want to zoom in on continental EU (and not include islands and territories that EU countries have around the world) so I use the coordinates for a cropped European map from this R-Bloggers post.
We will focus only on European countries and change the variable from total prison population to prison population as a share of total population. Finally, we multiply by 1000 so the variable is expressed per 1,000 people and the figures do not come out with many decimal places.
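The arithmetic is simple enough to show on two made-up rows (the column names prisoners, population and prison_per_1000 are assumptions, not the Eurostat names):

```r
# Hypothetical figures for two placeholder countries
prison <- data.frame(
  country    = c("A", "B"),
  prisoners  = c(10000, 2500),
  population = c(5e6, 1e6)
)

# Share of the population in prison, rescaled to "per 1,000 people"
prison$prison_per_1000 <- prison$prisoners / prison$population * 1000

prison
```

Country A has a raw share of 0.002, which becomes 2 prisoners per 1,000 people, a much easier number to read on a map legend.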
I will admit that I did not create the full map in ggplot. I added the final titles and block colours with canva.com because it was just easier! I always find fonts very tricky in R so it is nice to have dozens of different fonts in Canva and I can play around with colours and font sizes without needing to reload the plot each time.
Here is a short list from the package description of all the key variables that can be quickly added:
We create the dyad dataset with the create_dyadyears() function. A dyad-year dataset focuses on information about the relationship between two countries (such as whether the two countries are at war, how much they trade together, whether they are geographically contiguous et cetera).
In the literature, the study of interstate conflict has adopted a heavy focus on dyads as a unit of analysis.
Alternatively, if we want just state-year data like in the previous blog post, we use the function create_stateyears().
We can add the variables with type D to the create_dyadyears() function and we can add the variables with type S to create_stateyears()!
Focusing on the create_dyadyears() function, the arguments we can include are directed and mry.
The directed argument indicates whether we want directed or non-directed dyad relationships.
In a directed analysis, data include two observations (i.e. two rows) per dyad per year (such as one for USA – Russia and another row for Russia – USA), but in a nondirected analysis, we include only one observation (one row) per dyad per year.
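To see the difference in row counts, here is a base R sketch for a single year with three states. It does not use the package itself; the state abbreviations and the country_2 column name are just for illustration.

```r
states <- c("USA", "RUS", "CHN")

# Every ordered pair of states, then drop pairs of a state with itself
pairs <- expand.grid(country_1 = states, country_2 = states,
                     stringsAsFactors = FALSE)
directed <- pairs[pairs$country_1 != pairs$country_2, ]

# Keep only one ordering of each pair for the non-directed version
nondirected <- directed[directed$country_1 < directed$country_2, ]

nrow(directed)     # 6: USA-RUS and RUS-USA are separate rows
nrow(nondirected)  # 3: each pair appears once
```

In general, n states give n * (n - 1) directed dyads per year and half as many non-directed ones.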
The mry argument indicates whether we want to extend the data to the most recently concluded calendar year (i.e. 2020) or not (i.e. only until the data was last available).
With this dataframe, we can plot the CINC data of the top three superpowers, just looking at any variable that has a 1 at the end and only looking at the corresponding country_1!
According to our pals over at le Wikipedia, the Composite Index of National Capability (CINC) is a statistical measure of national power created by J. David Singer for the Correlates of War project in 1963. It uses an average of percentages of world totals in six different components (such as coal consumption, military expenditure and population). The components represent demographic, economic, and military strength.
Now we can create our pyramid chart with the pyramid_chart() function from the ggcharts package. The first argument is the age category for both the 2011 and 2016 data. The second is the actual population counts for each year. Last, enter the group variable that indicates the year.
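The "average of percentages of world totals" idea can be sketched with a toy world of three states and only three components instead of the six used in the real index. All names and numbers below are made up.

```r
# Toy world: three states, three components (the real CINC uses six)
components <- data.frame(
  state  = c("A", "B", "C"),
  milex  = c(50, 30, 20),   # military expenditure
  pop    = c(10, 60, 30),   # population
  energy = c(40, 40, 20)    # energy consumption
)

# Each state's share of the world total, component by component
shares <- sapply(components[-1], function(v) v / sum(v))

# The index is the plain average of those shares
components$cinc <- rowMeans(shares)

components
```

Because every component's shares sum to one, the index values across all states also sum to one, which is why CINC scores can be read as a share of world capabilities.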
To read more about the countrycode package in the CRAN PDF, click here.
First create a new name for the variable I want to make; I’ll call it COWcode in the dataset.
Then use the countrycode() function. First type in the brackets the name of the original variable that contains the list of countries in the dataset. Then finally add "country.name", "cown". This turns the word name for each country into the numeric COW code.
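Put together, the two steps might look like this on a tiny example data frame (df and its country column are hypothetical; COWcode is the new variable name chosen above):

```r
library(countrycode)

# Hypothetical dataset with a column of English country names
df <- data.frame(country = c("United States", "France", "Japan"))

# Convert the country names into numeric Correlates of War codes
df$COWcode <- countrycode(df$country, "country.name", "cown")

df
```

Names that countrycode() cannot match are returned as NA with a warning, so it is worth checking the new column for missing values afterwards.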
To check out the COW database website, click here.
Alternative codes to the country.name and cown options include:
• ccTLD: IANA country code top-level domain
• country.name: country name (English)
• country.name.de: country name (German)
• cowc: Correlates of War character
• cown: Correlates of War numeric
• dhs: Demographic and Health Surveys Program
• ecb: European Central Bank
• eurostat: Eurostat
• fao: Food and Agriculture Organization of the United Nations numerical code
• fips: FIPS 10-4 (Federal Information Processing Standard)
• gaul: Global Administrative Unit Layers
• genc2c: GENC 2-letter code
• genc3c: GENC 3-letter code
• genc3n: GENC numeric code
• gwc: Gleditsch & Ward character
• gwn: Gleditsch & Ward numeric
• imf: International Monetary Fund
• ioc: International Olympic Committee
• iso2c: ISO-2 character
• iso3c: ISO-3 character
• iso3n: ISO-3 numeric
• p4n: Polity IV numeric country code
• p4c: Polity IV character country code
• un: United Nations M49 numeric codes
• unicode.symbol: Region subtag (often displayed as emoji flag)
• unpd: United Nations Procurement Division
• vdem: Varieties of Democracy (V-Dem version 8, April 2018)
• wb: World Bank (very similar but not identical to iso3c)
• wvs: World Values Survey numeric code