democracy – R Functions and Packages for Political Science Analysis

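library(dplyr)
library(stringr)
library(tidytext)
library(magrittr)  # for the %<>% assignment pipe

# Count words per decade in the aid activity descriptions.
# The original selected `description` but tokenised `activity_description`;
# the column name used here is assumed to be `activity_description`.
tokens <- democracy_aid %>%
  select(activity_description, year) %>%
  mutate(decade = paste0(substr(year, 1, 3), "0s")) %>%
  group_by(decade) %>%
  unnest_tokens(word, activity_description) %>%
  count(word, sort = TRUE) %>%
  ungroup() %>%
  anti_join(stop_words, by = "word")

# Collect tokens that start with a digit, then drop them
nums <- tokens %>%
  filter(str_detect(word, "^[0-9]")) %>%
  select(word) %>%
  unique()

tokens %<>% anti_join(nums, by = "word")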

decade	word	n
2010s	rights	4541
2010s	local	3981
2010s	youth	3778
2010s	promote	3679
2010s	democratic	3618
2010s	public	3444
2010s	national	3060
2010s	political	3020
2010s	human	3009
2010s	organization	2711
2000s	rights	2548
2000s	human	1745
2000s	local	1544
2000s	conduct	1381
2000s	political	1257
2000s	training	1217
2000s	promote	1142
2000s	public	1121
2000s	democratic	1071
2000s	national	988
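As a small illustration of two steps in the pipeline above, the decade label and the numeric-token filter can be tried out on toy vectors in base R. The years and words below are made up for the example and are not taken from the aid data.

```r
# Toy years (hypothetical values, not from the dataset)
years <- c(2003, 2009, 2010, 2018)

# Keep the first three characters of the year and append "0s"
decade <- paste0(substr(years, 1, 3), "0s")
decade

# Toy tokens (hypothetical); flag those that start with a digit
words <- c("rights", "2015", "youth", "3rd", "local")
starts_with_digit <- grepl("^[0-9]", words)

# Keep only the tokens that do not start with a digit
kept <- words[!starts_with_digit]
kept
```

The same regular expression, "^[0-9]", is what str_detect() uses in the pipeline, so tokens such as "2015" or "3rd" are the kind that get removed.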

library(wordcloud)

# Collapse word variants into a single label; the first matching pattern wins
tokens %<>%
  mutate(word = case_when(
    grepl("democr", word)   ~ "democracy",
    grepl("politi", word)   ~ "politics",
    grepl("institut", word) ~ "institution",
    grepl("govern", word)   ~ "government",
    grepl("organiz", word)  ~ "organization",
    grepl("elect", word)    ~ "election",
    TRUE                    ~ word
  ))

# my_colors is a colour vector defined elsewhere
wordcloud(tokens$word, tokens$n,
          random.order = FALSE,
          max.words = 50,
          colors = my_colors)
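The recoding step gives several spellings the same label, but each original spelling still has its own row and count. If a single count per label is wanted, one option is to sum the counts after recoding. The sketch below uses base R and a made-up toy table, so the words and numbers are assumptions for illustration only.

```r
# Toy word counts (hypothetical values)
toy <- data.frame(
  word = c("democratic", "democracy", "elections", "electoral", "rights"),
  n    = c(10, 5, 4, 3, 8),
  stringsAsFactors = FALSE
)

# Recode variants to one label, in the same spirit as the step above
toy$word <- ifelse(grepl("democr", toy$word), "democracy",
            ifelse(grepl("elect", toy$word), "election", toy$word))

# Sum the counts so each label appears once, then sort from largest to smallest
combined <- aggregate(n ~ word, data = toy, FUN = sum)
combined <- combined[order(-combined$n), ]
combined
```

In a dplyr pipeline the equivalent would be a group_by(word) followed by summarise(n = sum(n)).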

Decade	Word	Count
2010s	rights	4541
2010s	local	3981
2010s	youth	3778
2010s	promote	3679
2010s	democratic	3618
2010s	public	3444
2010s	national	3060
2010s	political	3020
2010s	human	3009
2010s	organization	2711
2000s	rights	2548
2000s	human	1745
2000s	local	1544
2000s	conduct	1381
2000s	political	1257
2000s	training	1217
2000s	promote	1142
2000s	public	1121
2000s	democratic	1071
2000s	national	988

library(ggplot2)

# Note: this plot needs word counts per year, so `tokens` here must be
# built with group_by(year) rather than group_by(decade).
tokens %>%
  group_by(year) %>%
  top_n(n = 20, wt = n) %>%
  # Sort the top words into two broad aid types
  mutate(word = case_when(
    word == "party"     ~ "political",
    word == "parties"   ~ "political",
    word == "election"  ~ "political",
    word == "electoral" ~ "political",
    word == "civil"     ~ "civic",
    word == "civic"     ~ "civic",
    word == "social"    ~ "civic",
    word == "education" ~ "civic",
    word == "society"   ~ "civic",
    TRUE                ~ as.character(word)
  )) %>%
  filter(word %in% c("political", "civic")) %>%
  ggplot(aes(x = year, y = n, group = word)) +
  geom_line(aes(color = word), size = 2.5, alpha = 0.6) +
  geom_point(aes(color = word), fill = "white", shape = 21, size = 3, stroke = 2) +
  bbplot::bbc_style() +
  scale_x_discrete(limits = c(2001:2019)) +
  theme(axis.text.x = element_text(size = 15, angle = 45)) +
  scale_color_discrete(name = "Aid type",
                       labels = c("Civic grants", "Political grants"))

This blog post looks at the plot_model() function from the sjPlot package, which gives a simple way to visualise the coefficients in a model.

Packages we need:

library(sjPlot)
library(knitr)     # for kable()
library(broom)     # for tidy()
library(magrittr)  # for the %>% pipe

We can look at variables that are related to citizens’ access to public services.

The dependent variable measures equal access to basic public services, such as security, primary education, clean water and healthcare, and whether they are distributed equally or unequally according to socioeconomic position.

Higher scores indicate a more equal society.

I will throw some variables into the model and see what relationships are statistically significant.

The independent variables in the model are:

level of judicial constraint on the executive branch,
freedom of information (such as freedom of speech and uncensored media),
level of democracy,
level of regime corruption and
strength of civil society.

So first, we run a simple linear regression model with the lm() function:

summary(my_model <- lm(social_access ~ judicial_constraint +
        freedom_information +
        democracy_score + 
        regime_corruption +
        civil_society_strength, 
        data = df))

We can use the knitr package to produce a nice table of the regression coefficients with kable(). The tidy() function from the broom package first turns the model into a data frame.

I write the table title in the caption argument.

I also name the columns in the col.names argument: first the predictor, then the four number columns. These numbers are:

beta coefficient,
standard error,
t-score
p-value

I can choose how many decimals I want for each column with the digits argument.

And lastly, to make the table, I can set the format argument to "html". This way, I can copy and paste it into my blog post directly.

my_model %>%
  tidy() %>%
  kable(caption = "Access to public services by socio-economic position.",
        col.names = c("Predictor", "B", "SE", "t", "p"),
        digits = c(0, 2, 3, 2, 3),
        format = "html")

Access to public services by socio-economic position
Predictor	B	SE	t	p
(Intercept)	1.98	0.380	5.21	0.000
Judicial constraints	-0.03	0.485	-0.06	0.956
Freedom information	-0.60	0.860	-0.70	0.485
Democracy Score	2.61	0.807	3.24	0.001
Regime Corruption	-2.75	0.381	-7.22	0.000
Civil Society Strength	-1.67	0.771	-2.17	0.032
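To see where the four number columns come from, the same kind of coefficient table can be pulled out in base R. This is only a sketch on the built-in mtcars data, not the model from this post, and the variable names fit and coef_table are made up for the example.

```r
# A stand-in model on a built-in dataset (not the public services data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary() stores the coefficient table as a matrix with four columns:
# Estimate, Std. Error, t value and Pr(>|t|)
coef_table <- coef(summary(fit))
colnames(coef_table)

# The t-score is the estimate divided by its standard error
t_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
all.equal(t_by_hand, coef_table[, "t value"])

# Round for display, similar to what the digits argument does in kable()
round(coef_table, 3)
```

tidy() returns the same four numbers as a data frame with a term column in front, which is why col.names needs five names.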

Higher democracy scores are significantly and positively related to equal access to public services for different socio-economic groups.

There is no statistically significant relationship between judicial constraint on the executive and equal access to public services.

But we can also graphically show the coefficients in a plot with the sjPlot package.

There are many different arguments you can add to change the colors of bars, the size of the font or the thickness of the lines.

# Axis labels are listed from the last model term to the first
p <- plot_model(my_model,
      line.size = 8,
      show.values = TRUE,
      colors = "Set1",
      vline.color = "#d62828",
      axis.labels = c("Civil Society Strength", "Regime Corruption",
                      "Democracy Score", "Freedom information",
                      "Judicial constraints"),
      title = "Equal access to public services distributed by socio-economic position")

p + theme_sjplot(base_size = 20)

So how can we interpret this graph?

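Each bar is the confidence interval around a coefficient (95% unless you change it), and the vertical red line marks zero. If a bar goes across the red line, the coefficient is not statistically significant at the 5% level. The further the whole bar sits from the line relative to its width, the larger the t-score in absolute value and the more significant the coefficient!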
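The link between a bar crossing zero and a non-significant coefficient can be checked directly with confint(). Again this is a sketch on the built-in mtcars data with a made-up model, so the variable names are assumptions and not part of the analysis above.

```r
# A stand-in model on a built-in dataset (not the public services data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# 95% confidence intervals: one row per term, lower and upper bound
ci <- confint(fit, level = 0.95)

# p-values from the usual coefficient table
p_values <- coef(summary(fit))[, "Pr(>|t|)"]

# A term "crosses the line" when zero lies inside its interval
crosses_zero <- ci[, 1] < 0 & ci[, 2] > 0

# Side by side: terms whose interval contains zero are the ones with p above 0.05
data.frame(lower = ci[, 1],
           upper = ci[, 2],
           crosses_zero = crosses_zero,
           p_value = p_values)
```

For a linear model the two checks agree, because the 95% interval and the t-test at the 5% level are built from the same estimate and standard error.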