Uncategorized – R Functions and Packages for Political Science Analysis

library(dplyr)

# Assuming df is your dataframe and it includes columns mission_1 to mission_4
df <- tibble(
  mission_1 = c("MissionA", NA, "MissionC", "MissionD"),
  mission_2 = c(NA, "MissionE", "MissionF", NA),
  mission_3 = c("MissionG", "MissionH", NA, "MissionJ"),
  mission_4 = c(NA, NA, "MissionK", "MissionL")
)

# Count the non-missing, non-empty strings in each row
df <- df %>%
  rowwise() %>%
  mutate(non_na_string_count = sum(!is.na(c_across(mission_1:mission_4)) &
                                     nchar(c_across(mission_1:mission_4)) > 0)) %>%
  ungroup()

print(df)

# Sum each personnel type per year and contributor, ignoring missing values
tcc %>%
  janitor::clean_names() %>%
  group_by(year, contributor) %>%
  summarise(across(c(experts_on_mission, formed_police_units, inidividual_police,
                     civilian_police, troops, observers, total),
                   ~ sum(.x, na.rm = TRUE))) -> tcc_sum

# Requires dplyr, tidyr (for replace_na) and lubridate (for year)
pema %>%
  mutate(across(c(signature, namepko, country), as.factor)) %>%
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%  # e.g. 18/06/1998
  select(!c(mandate_renewal:mandate_completeadjustment, comments)) %>%
  select(where(~ !all(is.na(.)))) %>%  # drop columns that are entirely NA
  mutate(across(where(is.character), ~ if_else(nchar(.) > 0, 1, 0))) %>%
  mutate(across(where(is.numeric), ~ replace_na(., 0))) %>%
  mutate(across(where(is.numeric), ~ if_else(. != 0, 1, 0))) %>%
  mutate(year = lubridate::year(date)) -> pema_check

# Requires gapminder, dplyr, tidyr and ggplot2
gapminder %>%
  group_by(continent) %>%
  mutate(across(where(is.numeric), ~ replace_na(., 0))) %>%
  mutate(across(where(is.numeric), ~ mean(.x, na.rm = TRUE), .names = "avg_{col}")) %>%
  mutate(across(where(is.numeric), log, .names = "ln_{col}")) %>%
  ggplot(aes(x = ln_avg_gdpPercap, y = ln_avg_lifeExp, group = continent)) +
  geom_point() +
  geom_label(aes(label = continent, fill = continent), color = "#f0f0f0", size = 8) -> my_plot

# pal_hash is assumed to be a vector of fill colours defined elsewhere
my_plot +
  ggtitle("Scatterplot of average GDP and life expectancy, 1952-2007") +
  xlab("Average GDP per capita (logged)") +
  ylab("Average life expectancy (logged)") +
  ggthemes::theme_fivethirtyeight() +
  xlim(7.5, 10.1) +
  scale_fill_manual(values = pal_hash) +
  theme(legend.position = "none",
        plot.title = element_text(size = 25),
        text = element_text(family = "Arial"))

This blog post will look at the plot_model() function from the sjPlot package. This function offers a simple way to visualise the coefficients in a model.

Packages we need:

library(sjPlot)
library(knitr)  # provides kable()
library(broom)  # provides tidy()
library(dplyr)  # provides the %>% pipe

We can look at variables that are related to citizens’ access to public services.

The dependent variable measures equal access to basic public services, such as security, primary education, clean water and healthcare, and whether they are distributed equally or unequally according to socioeconomic position.

Higher scores indicate a more equal society.

I will throw some variables into the model and see what relationships are statistically significant.

The independent variables in the model are:

level of judicial constraint on the executive branch,
freedom of information (such as freedom of speech and uncensored media),
level of democracy,
level of regime corruption and
strength of civil society.

So first, we run a simple linear regression model with the lm() function:

summary(my_model <- lm(social_access ~ judicial_constraint +
        freedom_information +
        democracy_score + 
        regime_corruption +
        civil_society_strength, 
        data = df))

We can use the knitr package to produce a nice table of the regression coefficients with kable().

I write the table title in the caption argument.

I also name the columns in the col.names argument: one for the predictor and four for the numbers. These numbers are:

beta coefficient,
standard error,
t-score and
p-value.

I can choose how many decimal places I want for each column with the digits argument.

And lastly, to make the table, I can set the format to "html". This way, I can copy and paste it into my blog post directly.

my_model %>% 
  tidy() %>%
  kable(caption = "Access to public services by socio-economic position.", 
        col.names = c("Predictor", "B", "SE", "t", "p"),
        digits = c(0, 2, 3, 2, 3),
        format = "html")

Access to public services by socio-economic position
Predictor	B	SE	t	p
(Intercept)	1.98	0.380	5.21	0.000
Judicial constraints	-0.03	0.485	-0.06	0.956
Freedom information	-0.60	0.860	-0.70	0.485
Democracy Score	2.61	0.807	3.24	0.001
Regime Corruption	-2.75	0.381	-7.22	0.000
Civil Society Strength	-1.67	0.771	-2.17	0.032


Higher democracy scores are significantly and positively related to equal access to public services for different socio-economic groups.

There is no statistically significant relationship between judicial constraint on the executive and equal access to public services.
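Rather than reading the table by eye, we can also filter the tidied coefficients by p-value. The snippet below is a minimal sketch that uses R's built-in mtcars data as a stand-in, because the df from this post is not available here; demo_model and its variables are only for illustration.

```r
library(broom)
library(dplyr)

# Stand-in model on built-in data (not the model from this post)
demo_model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Keep only predictors that are significant at the 5% level,
# dropping the intercept and sorting by p-value
sig_terms <- demo_model %>%
  tidy() %>%
  filter(term != "(Intercept)", p.value < 0.05) %>%
  arrange(p.value)

sig_terms
```

Swapping demo_model for my_model should give the same kind of table for the model in this post.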

But we can also graphically show the coefficients in a plot with the sjPlot package.

There are many different arguments you can add to change the colors of bars, the size of the font or the thickness of the lines.

p <- plot_model(my_model, 
      line.size = 8, 
      show.values = TRUE,
      colors = "Set1",
      vline.color = "#d62828",
      # labels are listed in reverse order of the model terms
      axis.labels = c("Civil Society Strength", "Regime Corruption",
                      "Democracy Score", "Freedom information",
                      "Judicial constraints"),
      title = "Equal access to public services distributed by socio-economic position")

p + theme_sjplot(base_size = 20)

So how can we interpret this graph?

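Each bar shows the confidence interval around a coefficient (95% by default), and the vertical red line marks zero. If a bar goes across the vertical red line, the coefficient is not statistically significant at the 5% level. The further the whole bar sits from the line relative to its width, the larger the absolute t-score and the more significant the coefficient!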
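We can check this rule of thumb numerically. The sketch below again uses the built-in mtcars data as a stand-in (an assumption, since the df from this post is not shown), and relies only on base R: confint() returns the confidence intervals, and an interval that contains zero corresponds to a bar that crosses the line.

```r
# Stand-in model on built-in data (not the model from this post)
demo_model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# 95% confidence intervals: one row per coefficient, columns = lower and upper bound
ci <- confint(demo_model, level = 0.95)

# An interval crosses zero when its bounds have opposite signs
crosses_zero <- ci[, 1] < 0 & ci[, 2] > 0

# p-values from the usual coefficient table
p_values <- summary(demo_model)$coefficients[, "Pr(>|t|)"]

# Side by side: intervals that cross zero are the ones with p above 0.05
data.frame(lower = ci[, 1], upper = ci[, 2], crosses_zero, p_value = p_values)
```

The two columns agree because confint() and the p-values in summary() are both based on the same t-distribution.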