Packages we will need:
library(tidyverse)
library(forcats)
library(tidytext)
library(ggthemes)
library(democracyData)
library(magrittr)
For this blog post, we are going to look at the titles of all countries’ heads of state, such as King, President, Emir or Chairman. Understandably, there are many, many ways to title the leader of a country.
First, we will download the PACL dataset with the democracyData package.
If you want to read more about the variables in this dataset, see the codebook by Cheibub et al. (2010).
pacl <- redownload_pacl()
We are going to look at the npost variable; this captures the political title of the nominal head of state. This can be King, President, Sultan et cetera!
pacl %>% count(npost) %>% arrange(desc(n))
If we count the occurrence of each title, we can see there are many ways to be called the head of a country!
"president"                          3693
"prime minister"                     2914
"king"                                470
"Chairman of Council of Ministers"    229
"premier"                             169
"chancellor"                          123
"emir"                                117
"chair of Council of Ministers"       111
"head of state"                        90
"sultan"                               67
"chief of government"                  63
"president of the confederation"       63
""                                     44
"chairman of Council of Ministers"     44
"shah"                                 33
# ... with 145 more rows
155 groups are too many to compare meaningfully.
So we can collapse some of the groups together and lump all the titles that occur rarely (sometimes only once or twice) into an “other” category.
First, we use the grepl() function to find every title that contains the word president or chair (chairman, chairwoman, chairperson et cetera) and recode it into a broader category.
Also, we use the tolower() function to make every title lower case, so there is no confusion over the random capitalisation.
pacl %<>%
  mutate(npost = tolower(npost)) %>%
  mutate(npost = ifelse(grepl("president", npost), "president", npost)) %>%
  mutate(npost = ifelse(grepl("chair", npost), "chairperson", npost))
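To see what this recoding does, here is a minimal sketch on a handful of made-up titles (the values are hypothetical, not taken from the PACL data). It applies the same three steps to a small data frame:

```r
library(dplyr)

# Hypothetical toy titles, not taken from the PACL data
titles <- data.frame(
  npost = c("President", "Vice President", "chairman of Council of Ministers",
            "Chairwoman", "king"),
  stringsAsFactors = FALSE
)

recoded <- titles %>%
  mutate(npost = tolower(npost)) %>%                                        # remove capitalisation differences
  mutate(npost = ifelse(grepl("president", npost), "president", npost)) %>% # any title containing "president"
  mutate(npost = ifelse(grepl("chair", npost), "chairperson", npost))       # any title containing "chair"

recoded$npost
```

Because grepl() matches substrings, a title like "vice president" is also folded into "president", which may or may not be what you want.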
Next, we create an "other leader type" category with the fct_lump_prop() function.
We specify a threshold, and if a title makes up a smaller proportion of the dataset than this level we set, it is added into the “other” group.
pacl %<>%
  mutate(regime_prop = fct_lump_prop(npost, prop = 0.005, other_level = "Other leader type")) %>%
  mutate(regime_prop = str_to_title(regime_prop))
Now, instead of 155 types of leader titles, we have 10 types, and the rest are all bundled into the Other Leader Type category:
President            4370
Prime Minister       2945
Chairperson           520
King                  470
Other Leader Type     225
Premier               169
Chancellor            123
Emir                  117
Head Of State          90
Sultan                 67
Chief Of Government    63
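The proportion threshold is easier to see on a tiny example. The sketch below uses a made-up factor of 100 values (the levels and counts are assumptions for illustration), so each count is also a percentage:

```r
library(forcats)

# Hypothetical toy factor with 100 values: a = 45, b = 30, c = 15, d = 6, e = 4
x <- factor(rep(c("a", "b", "c", "d", "e"), times = c(45, 30, 15, 6, 4)))

# d (6%) and e (4%) fall below the 10% threshold and are lumped together
lumped_prop <- fct_lump_prop(x, prop = 0.10, other_level = "Other")

table(lumped_prop)
```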
The forcats package has three other ways to lump factor levels together.
First, we can quickly look at fct_lump_min().
We can set the min argument to 100 and look at how it condenses the groups together:
pacl %>%
  mutate(npost = tolower(npost)) %>%
  mutate(post_min = fct_lump_min(npost, min = 100, other_level = "Other type")) %>%
  mutate(post_min = str_to_title(post_min)) %>%
  count(post_min) %>%
  arrange(desc(n))
President        4370
Prime Minister   2945
Chairperson       520
King              470
Other Type        445
Premier           169
Chancellor        123
Emir              117
We can see that if a post appears fewer than 100 times, it now falls into the Other Type category. Head Of State, which kept its own category in the previous example, appears only 90 times, so it did not make the cut here.
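The same idea on a small scale: a sketch with a made-up factor (hypothetical levels and counts), where any level with fewer than 20 values is moved into the other category:

```r
library(forcats)

# Hypothetical toy factor with 100 values: a = 45, b = 30, c = 15, d = 6, e = 4
x <- factor(rep(c("a", "b", "c", "d", "e"), times = c(45, 30, 15, 6, 4)))

# c, d and e each appear fewer than 20 times, so all three are lumped
lumped_min <- fct_lump_min(x, min = 20, other_level = "Other")

table(lumped_min)
```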
Next we look at fct_lump_lowfreq().
This function lumps together the least frequent levels, while making sure that the “other” category remains the smallest group. We don’t add another numeric argument.
pacl %>%
  mutate(npost = tolower(npost)) %>%
  mutate(post_lowfreq = fct_lump_lowfreq(npost, other_level = "Other type")) %>%
  mutate(post_lowfreq = str_to_title(post_lowfreq)) %>%
  count(post_lowfreq) %>%
  arrange(desc(n))
President        4370
Prime Minister   2945
Other Type       1844
This one only has three categories: every title except president and prime minister is chucked into the Other Type category.
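A small sketch makes the "other stays smallest" rule more concrete. The factor below is made up (hypothetical levels and counts), chosen so that lumping one more level would make the other category too big:

```r
library(forcats)

# Hypothetical toy factor with 100 values: a = 40, b = 25, c = 20, d = 10, e = 5
y <- factor(rep(c("a", "b", "c", "d", "e"), times = c(40, 25, 20, 10, 5)))

# d and e are lumped (Other = 15, still smaller than c = 20).
# Lumping c as well would give Other = 35, which is bigger than b = 25.
lumped_lowfreq <- fct_lump_lowfreq(y, other_level = "Other")

table(lumped_lowfreq)
```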
Last, we can look at fct_lump_n(), which makes sure we keep a certain number of groups. We add n = 5 to keep the five most frequent titles, and the rest go to the Other Type category.
pacl %>%
  mutate(npost = tolower(npost)) %>%
  mutate(post_n = fct_lump_n(npost, n = 5, other_level = "Other type")) %>%
  mutate(post_n = str_to_title(post_n)) %>%
  count(post_n) %>%
  arrange(desc(n))
President        4370
Prime Minister   2945
Other Type        685
Chairperson       520
King              470
Premier           169
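Again as a rough sketch on a made-up factor (hypothetical levels and counts), keeping only the two most frequent levels:

```r
library(forcats)

# Hypothetical toy factor with 100 values: a = 45, b = 30, c = 15, d = 6, e = 4
x <- factor(rep(c("a", "b", "c", "d", "e"), times = c(45, 30, 15, 6, 4)))

# Keep the two most frequent levels; c, d and e go into "Other"
lumped_n <- fct_lump_n(x, n = 2, other_level = "Other")

table(lumped_n)
```

Note that n counts the levels that are kept, so the result has n + 1 groups once the other category is included, just like the six rows in the output above.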
Next we can make a simple graph counting the different leader titles in Free, Partly Free and Not Free countries, as classified by Freedom House. We will use the download_fh() function from the democracyData package again:
fh <- download_fh()
We will also use the reorder_within() function from the tidytext package.
Julia Silge’s blog has a full post explaining the function.
First, we add the Freedom House data with inner_join().
Then we use fct_lump_n() and choose the top five categories (plus the Other Type category we make):
pacl %<>%
  inner_join(fh, by = c("cown", "year")) %>%
  mutate(npost = fct_lump_n(npost, n = 5, other_level = "Other type")) %>%
  mutate(npost = str_to_title(npost))
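To see what inner_join() does with the two key columns, here is a minimal sketch on two made-up data frames (the rows are hypothetical; only the cown and year column names come from the code above). Rows without a match in both tables are dropped:

```r
library(dplyr)

# Hypothetical leader-year rows
leaders <- data.frame(
  cown  = c(1, 1, 2),
  year  = c(2000, 2001, 2000),
  npost = c("president", "president", "king"),
  stringsAsFactors = FALSE
)

# Hypothetical status rows; there is no entry for cown 1 in 2001
fh_toy <- data.frame(
  cown   = c(1, 2),
  year   = c(2000, 2000),
  status = c("F", "NF"),
  stringsAsFactors = FALSE
)

joined <- inner_join(leaders, fh_toy, by = c("cown", "year"))

nrow(joined)  # 2: the unmatched 2001 row is dropped
```

If many country-years disappear after the real join, comparing nrow() before and after is a quick way to notice.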
Next, we group_by() the three Freedom House status levels and count the number of each title:
pacl %<>%
  group_by(status) %>%
  count(npost) %>%
  ungroup()
With reorder_within(), we order the titles from most to fewest occurrences within each status group:
pacl %<>% mutate(npost = reorder_within(npost, n, status))
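What reorder_within() actually produces is easier to see on a tiny table. In this sketch the counts are made up (hypothetical values), and the result is stored in a separate npost_ordered column, a name chosen only for illustration:

```r
library(dplyr)
library(tidytext)

# Hypothetical counts for two status groups
counts <- data.frame(
  status = c("F", "F", "NF", "NF"),
  npost  = c("President", "King", "President", "King"),
  n      = c(10, 2, 3, 8),
  stringsAsFactors = FALSE
)

reordered <- counts %>%
  mutate(npost_ordered = reorder_within(npost, n, status))

# Each level is the title plus "___" plus the status,
# and the levels are sorted by n in increasing order
levels(reordered$npost_ordered)
```

Because every title is now unique to its status group, "King" can sit at the bottom of one panel and near the top of another. The increasing order is what puts the biggest bar at the top once the axes are flipped.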
To plot the columns, we use geom_col() and separate them into each Freedom House group using facet_wrap(). We add scales = "free_y" so that each panel only shows the titles that occur in that group. Without this, we would have empty spaces in the Free group for Emir and King. So this step removes a lot of clutter.
pacl_colplot <- pacl %>%
  ggplot(aes(fct_reorder(npost, n), n)) +
  geom_col(aes(fill = npost), show.legend = FALSE) +
  facet_wrap(~status, scales = "free_y")
Last, I manually added the colors to each group (whose names are now longer, because reorder_within() appends the status to each title) so that they are consistent across each group. I am sure there is an easier and less messy way to do this but sometimes finding the easier way takes more effort!
We add the scale_x_reordered() function to clean up the names and remove everything from the underscores onwards in the title label.
pacl_colplot +
  scale_fill_manual(values = c(
    "Prime Minister___F" = "#005f73", "Prime Minister___NF" = "#005f73", "Prime Minister___PF" = "#005f73",
    "President___F" = "#94d2bd", "President___NF" = "#94d2bd", "President___PF" = "#94d2bd",
    "Other Type___F" = "#ee9b00", "Other Type___NF" = "#ee9b00", "Other Type___PF" = "#ee9b00",
    "Chairperson___F" = "#bb3e03", "Chairperson___NF" = "#bb3e03", "Chairperson___PF" = "#bb3e03",
    "King___F" = "#9b2226", "King___NF" = "#9b2226", "King___PF" = "#9b2226",
    "Emir___F" = "#001219", "Emir___NF" = "#001219", "Emir___PF" = "#001219")) +
  scale_x_reordered() +
  coord_flip() +
  ggthemes::theme_fivethirtyeight() +
  theme(text = element_text(size = 30))
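One possibly less messy option, sketched here on made-up counts rather than the real data, is to keep a copy of the plain title before calling reorder_within() and map fill to that copy. The color vector then needs only one entry per title instead of one per title and status. The title column, the title_colors vector and the toy values are assumptions for illustration, not part of the code above:

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical counts for two status groups
counts <- data.frame(
  status = c("F", "F", "NF", "NF"),
  npost  = c("President", "King", "President", "King"),
  n      = c(10, 2, 3, 8),
  stringsAsFactors = FALSE
)

plot_data <- counts %>%
  mutate(title = npost,                              # plain title, used only for the fill color
         npost = reorder_within(npost, n, status))   # per-panel ordering for the axis

# One color per title, reusing two hex codes from the palette above
title_colors <- c("President" = "#94d2bd", "King" = "#9b2226")

p <- ggplot(plot_data, aes(npost, n, fill = title)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~status, scales = "free_y") +
  scale_fill_manual(values = title_colors) +
  scale_x_reordered() +
  coord_flip()
```

The ordering within each panel still comes from the reordered npost column, while the colors stay consistent because they depend only on the plain title.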
In case you were curious about the free country that had a chairperson, Nigeria had one for two years.
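# Assumes pacl here is the joined data from before the count() step,
# where pacl_country is still present and npost is still "Chairperson"
pacl %>%
  filter(status == "F") %>%
  filter(npost == "Chairperson") %>%
  select(Country = pacl_country) %>%
  knitr::kable("latex") %>%
  kableExtra::kable_classic(font_size = 30)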
Cheibub, J. A., Gandhi, J., & Vreeland, J. R. (2010). Democracy and dictatorship revisited. Public Choice, 143(1), 67-101.