How to run multiple t-tests in a function with the broom package in R

Packages we will need:

library(tidyverse)
library(broom)

We will use the Varieties of Democracy dataset again.

Excited Will Ferrell GIF - Find & Share on GIPHY

We will use a t-test comparing democracies (boix == 1) and non-democracies (boix == 0) in the years 2000 to 2020.

We need to remove the instances where boix is NA.

I choose three t-tests to run simultaneously. Comparing democracies and non-democracies on:

  1. the extent to which the country consults religious groups (relig_consult)
  2. the extent to which the country consults Civil Society Organization (CSO) groups (cso_consult)
  3. level of freedom from judicial corruption (judic_corruption) – higher scores mean that there are FEWER instances of judicial corruption
vdem %>% 
filter(year %in% c(2000:2020)) %>%
filter(!is.na(e_boix_regime)) %>%
group_by(e_boix_regime) %>%
select(relig_consult = v2csrlgcon,
cso_consult = v2cscnsult,
judic_corruption = v2jucorrdc) -> vdem

Next we need to pivot the data from wide to long

  vdem %<>% pivot_longer(!e_boix_regime, 
names_to = "variable",
values_to = "value")

Now we get to iterating over the three variables and conducting a t-test on each variable across democracies versus non-democracies

Excited Elf GIF - Find & Share on GIPHY
vdem %>%
group_by(variable) %>%
nest() %>%
mutate(t_test = map(data, ~t.test(value ~ e_boix_regime, data = .x)),
tidy = map(t_test, broom::tidy)) %>%
select(variable, tidy) %>%
unnest(tidy) -> ttest_results

If we look closely at the line

mutate(t_test = map(data, ~t.test(value ~ e_boix_regime, data = .x))):

Here, mutate() adds a new column named t_test to the grouped and nested data.

map() is used to apply a function to each element of the list-column (here, each nested data frame).

The function applied is t.test(value ~ e_boix_regime, data = .x).

This function performs a t-test comparing the means of value across two groups defined by e_boix_regime within each nested data frame.

.x represents each nested data frame in turn.

And here are the tidy results:

If we save the above results in a data.frame, we can graph the following:

ttest_results %>%
ggplot(aes(x = variable, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_point() +
geom_errorbar(width = 0.2) +
coord_flip() +
labs(title = "T-Test Estimates with Confidence Intervals",
x = "Variable",
y = "Estimate Difference") +
theme_minimal()

Add some colors to highlight magnitude of difference

  ggplot(aes(x = variable, y = estimate, ymin = conf.low, ymax = conf.high, color = color_value)) +
geom_point() +
geom_errorbar(width = 0.7, size = 3) +
scale_color_gradientn(colors = c("#0571b0", "#92c5de", "#f7f7f7", "#f4a582", "#ca0020")) +
coord_flip() +
labs(title = "T-Test Estimates with Confidence Intervals",
x = "Variable",
y = "Estimate Difference") +
theme_minimal() +
guides(color = guide_colorbar(title = ""))

Sometimes vdem variables are reverse scored or on different scales

  # mutate(across(judic_corruption, ~ 1- .x)) %>% 
# mutate(across(everything(), ~(.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE), .names = "z_{.col}")) %>%
# select(contains("z_"))

Leave a comment