How to run multiple t-tests in a function with the broom package in R

Packages we will need:

library(tidyverse)
library(broom)

We will use the Varieties of Democracy dataset again.

We will run t-tests comparing democracies (e_boix_regime == 1) and non-democracies (e_boix_regime == 0) in the years 2000 to 2020.

We need to remove the rows where e_boix_regime is NA.

I chose three t-tests to run at once, comparing democracies and non-democracies on:

the extent to which the country consults religious groups (relig_consult)
the extent to which the country consults Civil Society Organization (CSO) groups (cso_consult)
level of freedom from judicial corruption (judic_corruption) – higher scores mean that there are FEWER instances of judicial corruption

vdem %>% 
  filter(year %in% 2000:2020) %>%
  filter(!is.na(e_boix_regime)) %>%
  # keep the regime dummy explicitly rather than relying on group_by() to retain it
  select(e_boix_regime,
         relig_consult = v2csrlgcon,
         cso_consult = v2cscnsult,
         judic_corruption = v2jucorrdc) -> vdem

Next we need to pivot the data from wide to long:

# %<>% needs library(magrittr), so we use a plain assignment instead
vdem <- vdem %>% 
  pivot_longer(!e_boix_regime, 
               names_to = "variable",
               values_to = "value")
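To make the reshaping step concrete, here is a minimal sketch on a made-up two-row table. The tibble `wide` and its values are hypothetical and only stand in for the V-Dem columns; the point is that every column except the regime dummy is stacked into a variable/value pair.

```r
library(tidyverse)

# Hypothetical toy data: one row per observation, one column per measure
wide <- tibble(
  regime = c(0, 1),
  relig_consult = c(-0.5, 1.2),
  cso_consult = c(0.1, 1.8)
)

# Stack every column except `regime` into name/value pairs
long <- wide %>%
  pivot_longer(!regime, names_to = "variable", values_to = "value")

long
```

Each original row becomes one row per measure, so two rows and two measures give four rows in `long`.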

Now we iterate over the three variables and run a t-test on each one, comparing democracies with non-democracies:

vdem %>%
  group_by(variable) %>%
  nest() %>%
  mutate(t_test = map(data, ~t.test(value ~ e_boix_regime, data = .x)),
                tidy = map(t_test, broom::tidy)) %>%
  select(variable, tidy) %>%
  unnest(tidy) -> ttest_results

Let's look closely at this line:

mutate(t_test = map(data, ~t.test(value ~ e_boix_regime, data = .x)))

Here, mutate() adds a new list-column named t_test to the grouped and nested data.

map() is used to apply a function to each element of the list-column (here, each nested data frame).

The function applied is t.test(value ~ e_boix_regime, data = .x).

This function performs a t-test comparing the means of value across two groups defined by e_boix_regime within each nested data frame.

.x represents each nested data frame in turn.
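The same nest-and-map pattern can be tried on a dataset that ships with R. The sketch below is only an illustration under that assumption: `mtcars` replaces the V-Dem data, and the `am` column (0 = automatic, 1 = manual) stands in for the regime dummy.

```r
library(tidyverse)
library(broom)

toy_results <- mtcars %>%
  select(am, mpg, hp, wt) %>%
  pivot_longer(!am, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  nest() %>%   # one row per variable, with its data in a list-column
  mutate(t_test = map(data, ~ t.test(value ~ am, data = .x)),  # .x is one nested data frame
         tidy = map(t_test, broom::tidy)) %>%
  select(variable, tidy) %>%
  unnest(tidy) %>%
  ungroup()

toy_results
```

For a two-sample test, the `estimate` column is the mean of the first group minus the mean of the second, so here it is the automatic mean minus the manual mean.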

The tidy results contain one row per variable, with columns including estimate, statistic, p.value, conf.low and conf.high.

With the results saved in ttest_results, we can graph them:

ttest_results %>%
ggplot(aes(x = variable, y = estimate, ymin = conf.low, ymax = conf.high)) +
  geom_point() + 
  geom_errorbar(width = 0.2) + 
  coord_flip() + 
  labs(title = "T-Test Estimates with Confidence Intervals",
       x = "Variable",
       y = "Estimate Difference") +
  theme_minimal()

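We can add some colors to highlight the magnitude of the difference:

ttest_results %>%
  # assumption: color_value is not defined in the original; here it is set to the estimate
  mutate(color_value = estimate) %>%
  ggplot(aes(x = variable, y = estimate, ymin = conf.low, ymax = conf.high, color = color_value)) +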
  geom_point() + 
  geom_errorbar(width = 0.7, size = 3) +  
  scale_color_gradientn(colors = c("#0571b0", "#92c5de", "#f7f7f7", "#f4a582", "#ca0020")) +
  coord_flip() +
  labs(title = "T-Test Estimates with Confidence Intervals",
       x = "Variable",
       y = "Estimate Difference") +
  theme_minimal() +
  guides(color = guide_colorbar(title = ""))

Sometimes V-Dem variables are reverse-scored or on different scales. One option is to reverse and standardize them before pivoting, for example:

  # mutate(across(judic_corruption, ~ 1- .x)) %>% 
  # mutate(across(everything(), ~(.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE), .names = "z_{.col}")) %>% 
  # select(contains("z_"))
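As a rough illustration of what those commented-out lines do, the sketch below runs them on a made-up tibble. The data and the 0 to 1 range of `corruption` are assumptions; `1 - .x` only reverses a score that really lies between 0 and 1.

```r
library(tidyverse)

# Hypothetical toy scores; corruption is assumed to lie on a 0 to 1 scale
toy <- tibble(
  corruption = c(0.1, 0.5, 0.9),
  consult = c(-1, 0, 2)
)

toy_z <- toy %>%
  mutate(across(corruption, ~ 1 - .x)) %>%   # reverse the 0 to 1 score
  mutate(across(everything(),
                ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE),
                .names = "z_{.col}")) %>%    # add z-scored copies of every column
  select(contains("z_"))                     # keep only the z-scored columns

toy_z
```

After this step every column has mean 0 and standard deviation 1, so estimated differences are expressed in standard deviations and can be compared across variables.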

Running tidy t-tests with the infer package in R

Packages we will need:

library(tidyverse)
library(tidyr)
library(infer)
library(knitr)    # for kable()
library(bbplot)
library(ggthemes)

For this t-test, we will compare US millennials and non-millennials and their views of the UK’s influence in the world.

The data come from the 2017 Chicago Council Survey of American Public Opinion on U.S. Foreign Policy, which is available to download.

The survey investigates American public opinion on foreign policy. It focuses on respondents’ opinions of the United States’ leadership role in the world and the challenges the country faces domestically and internationally.

The question on the UK’s influence asks respondents how much influence they think the UK has in the world. Answers are on a 0 to 10 scale, with 0 meaning not at all influential and 10 meaning extremely influential.

First we select and recode the variables:

fp %>%
  select(
    milennial = XMILLENIALSSAMPLEFLAG,
    uk_influence = Q50_10) %>%
  separate(
    col = milennial,
    into = c("milennial_num", "milennial_char"),
    sep = '[)]',
    remove = TRUE) %>% 
  mutate(
     uk_influence = as.character(uk_influence),
     uk_influence = parse_number(uk_influence)) %>% 
  filter(uk_influence != -1) %>% 
  tidyr::drop_na(milennial_char) -> mil_fp
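To see what the cleaning pipeline does to a single value, here is a small sketch on invented data. The "(code) label" format of the raw flag and the column names `flag` and `score` are assumptions made for illustration; the real survey file may differ.

```r
library(tidyverse)

# Hypothetical raw values: the "(code) label" format is an assumption
raw <- tibble(
  flag = c("(1) Millennial", "(0) Non-millennial", NA),
  score = c("7", "-1", "5")
)

clean <- raw %>%
  separate(col = flag,
           into = c("flag_num", "flag_char"),
           sep = "[)]") %>%                  # split at the closing bracket
  mutate(flag_char = str_trim(flag_char),    # drop the leading space left by the split
         score = parse_number(score)) %>%    # "7" becomes 7
  filter(score != -1) %>%                    # -1 is treated as a non-response code
  drop_na(flag_char)                         # drop rows with no group label

clean
```

Only the first row survives here: the second is dropped because of its -1 score and the third because the group label is missing.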

With the infer package, we can run a t-test:

mil_fp %>% 
  t_test(formula = uk_influence ~ milennial_char,
         alternative = "less")%>% 
  kable(format = "html")

statistic	t_df	p_value	alternative	estimate	lower_ci	upper_ci
-3.048249	1329.469	0.0011736	less	-0.3274509	-Inf	-0.1506332

The estimated difference in mean ratings between the two groups is about 0.33 points on the 0 to 10 scale, and the one-sided test is statistically significant (p ≈ 0.001).
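The sign of the estimate depends on which group comes first. A minimal sketch on simulated data, assuming hypothetical group labels "millennial" and "older", shows how the order argument of t_test() fixes the direction of the subtraction so that alternative = "less" tests the intended hypothesis.

```r
library(tidyverse)
library(infer)

set.seed(123)

# Simulated ratings: purely illustrative, not the survey data
toy <- tibble(
  cohort = factor(rep(c("millennial", "older"), each = 50)),
  rating = c(rnorm(50, mean = 6.0), rnorm(50, mean = 6.5))
)

toy_test <- toy %>%
  t_test(formula = rating ~ cohort,
         order = c("millennial", "older"),  # estimate = mean(millennial) - mean(older)
         alternative = "less")              # H1: millennials rate lower than older respondents

toy_test
```

With alternative = "less" the confidence interval is one-sided, which is why lower_ci is -Inf in the output.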

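We can graph a box plot:

# assumption: my_palette is not defined in the original; any vector with one color per group works
my_palette <- c("#0571b0", "#ca0020")

mil_fp %>% 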
  ggplot(mapping = aes(x = milennial_char,
                       y = uk_influence,
                       fill = milennial_char)) +
  geom_jitter(aes(color = milennial_char),
              size = 2, alpha = 0.5, width = 0.3) +
  geom_boxplot(alpha = 0.4) +
  coord_flip() + bbplot::bbc_style() +
  scale_fill_manual(values = my_palette) + 
  scale_color_manual(values = my_palette)

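And a quick graph to compare the UK with two other countries, Germany and South Korea. Note that this assumes mil_fp also contains sk_influence and ger_influence columns, selected and cleaned in the same way as uk_influence; that step is not shown above.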

mil_fp %>% 
  select(milennial_char, uk_influence, sk_influence, ger_influence) %>% 
  pivot_longer(!milennial_char, names_to = "survey_question", values_to = "response")  %>% 
  group_by(survey_question, response) %>% 
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>% 
  ungroup() %>% 
  filter(!is.na(response)) %>% 
  mutate(survey_question = case_when(
    survey_question == "uk_influence" ~ "UK",
    survey_question == "ger_influence" ~ "Germany",
    survey_question == "sk_influence" ~ "South Korea",
    TRUE ~ as.character(survey_question))) %>% 
  ggplot() +
  geom_bar(aes(x = forcats::fct_reorder(survey_question, freq), 
               y = freq, fill = as.factor(response)), 
           color = "#e5e5e5", 
           size = 2, 
           position = "stack",
           stat = "identity") + 
  coord_flip() + 
  scale_fill_brewer(palette = "RdBu") + 
  ggthemes::theme_fivethirtyeight() + 
  ggtitle("View of Influence in the world?") +
  theme(legend.title = element_blank(),
        legend.position = "top",
        legend.key.size = unit(0.78, "cm"),
        text = element_text(size = 25),
        legend.text = element_text(size = 20))
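The share of each response within a question is what drives the stacked bars. The sketch below reproduces just that counting step on a made-up long table; the responses are invented for illustration.

```r
library(tidyverse)

# Hypothetical long-format responses, four per question
toy_long <- tibble(
  survey_question = rep(c("uk_influence", "ger_influence"), each = 4),
  response = c(5, 5, 8, 10, 3, 3, 3, 7)
)

toy_freq <- toy_long %>%
  group_by(survey_question, response) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # result stays grouped by survey_question
  mutate(freq = n / sum(n)) %>%                  # share within each question
  ungroup()

toy_freq
```

Because summarise() drops only the last grouping level, sum(n) is taken within each question, so the shares for each country add up to 1.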