From loop to purrr

The first instinct for many R users (especially those coming from other programming languages) is to use a run-of-the-mill for loop:

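library(countrycode)

# Convert country_1 ... country_8 from country names to COW numeric codes
for (i in 1:8) {
  col_name <- paste0("country_", i)

  # Skip columns that do not exist in this data frame
  if (col_name %in% colnames(press_releases_speech)) {
    press_releases_speech[[col_name]] <- countrycode(press_releases_speech[[col_name]], "country.name", "cown")
  }
}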

This works fine.

It loops through country_1 to country_8, checks if each column exists, and converts the country names into COW codes.

Buuut there’s another way to do this.

Enter purrr and dplyr

Instead of writing out a loop, we can use the across() function from dplyr, which accepts purrr-style lambdas (~ with .x), to apply the conversion to all relevant columns at once:

Use across() with a purrr-style lambda to convert all country columns from country names to COW codes

library(dplyr)
library(countrycode)

# Maximum number of country columns
country_columns <- paste0("country_", 1:8)

press_releases_speech <- press_releases_speech %>%
  mutate(across(all_of(country_columns), ~ countrycode(.x, "country.name", "cown")))

country_columns <- paste0("country_", 1:8)

This creates a vector of column names: "country_1", "country_2", …, "country_8".

mutate(across(all_of(country_columns), ~ countrycode(.x, "country.name", "cown")))

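across(all_of(country_columns), ...): selects only the country_1 to country_8 columns. Note that all_of() throws an error if any of those columns is missing, whereas the loop quietly skipped missing columns; any_of() gives the same skip-if-missing behavior.

~ countrycode(.x, "country.name", "cown"): a lambda in which .x stands for each selected column in turn; it uses countrycode() to convert country names to COW codes.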
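
To see the pieces working together, here is a minimal sketch on a made-up data frame. The toy data frame and the names toy, toy_columns, toy_cow, and toy_purrr are assumptions for illustration only; they stand in for press_releases_speech. Since the post promises purrr, the last step shows what a purrr-only version could look like with modify_at(), which is one of several ways to do it.

```r
library(dplyr)
library(purrr)
library(countrycode)

# Hypothetical toy data standing in for press_releases_speech
toy <- data.frame(
  id = 1:3,
  country_1 = c("France", "Germany", "Japan"),
  country_2 = c("Canada", "United States", "France"),
  stringsAsFactors = FALSE
)

toy_columns <- paste0("country_", 1:2)

# dplyr: across() applies the lambda to each selected column
toy_cow <- toy %>%
  mutate(across(all_of(toy_columns), ~ countrycode(.x, "country.name", "cown")))

# purrr: modify_at() applies the same lambda to the named columns
# and returns an object of the same type as its input
toy_purrr <- modify_at(toy, toy_columns, ~ countrycode(.x, "country.name", "cown"))

# country_1 becomes 220, 255, 740 and country_2 becomes 20, 2, 220
toy_cow
```

Both versions leave the id column untouched and replace the country names in place.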


    Why purrr is Easier than a For Loop

    1. Less Typing, More Doing
      • No need to manually track column names or index values.
      • across() applies the function to all specified columns at once.
    2. More Readable
      • The mutate(across(...)) approach is way easier to scan than a loop.
      • Anyone reading your code will immediately understand, “Oh, we’re applying countrycode() to multiple columns.”
    3. Vectorized Processing Under the Hood
      • R is optimized for vectorized operations, and countrycode() is vectorized: it converts a whole column in a single call.
      • across() still calls the function once per column, just as the loop does, so expect similar run times here. The gain is that the iteration bookkeeping disappears from your code, not a big speedup.
    4. Scalability
      • Need to apply the function to 10 or 100 country columns instead of 8? No problem! Just update country_columns, and you’re good to go.
      • With a for loop, you’d have to adjust your loop range (1:100) and manually ensure it still works.
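
The scalability point can go one step further with tidyselect helpers, so the column count never has to be hard-coded. This is a sketch under assumptions: the toy data frame and the names toy, wanted, toy_any, and toy_match are made up for illustration. any_of() skips names that are not present, which mirrors the %in% colnames() check in the loop, and matches() selects by a regular expression.

```r
library(dplyr)
library(countrycode)

# Hypothetical toy data with only two of the eight possible country columns
toy <- data.frame(
  country_1 = c("France", "Japan"),
  country_2 = c("Canada", "Germany"),
  stringsAsFactors = FALSE
)

# any_of() ignores country_3 ... country_8 because they do not exist here
wanted <- paste0("country_", 1:8)
toy_any <- toy %>%
  mutate(across(any_of(wanted), ~ countrycode(.x, "country.name", "cown")))

# matches() picks up every column named country_<number>, however many there are
toy_match <- toy %>%
  mutate(across(matches("^country_[0-9]+$"), ~ countrycode(.x, "country.name", "cown")))
```

With all_of(wanted) instead of any_of(wanted), the same call would stop with an error on this data, because six of the eight columns are missing.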

    When Should You Still Use a For Loop?

    To be fair, for loops aren’t evil; they’re just not always the best tool for the job. If you need more custom logic (e.g., different transformations depending on the column), a loop might be the better choice. But for straightforward “apply this function to multiple columns” situations, purrr or across() is the way to go.
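
As a rough illustration of the "custom logic" case, here is a sketch in which each column gets a different target code. The rule itself (COW codes for country_1, ISO3 codes for country_2) and the names toy and targets are invented for this example and are not part of the original workflow.

```r
library(countrycode)

# Hypothetical toy data
toy <- data.frame(
  country_1 = c("France", "Japan"),
  country_2 = c("Canada", "Germany"),
  stringsAsFactors = FALSE
)

# Hypothetical per-column rule: which code scheme each column should end up in
targets <- c(country_1 = "cown", country_2 = "iso3c")

for (col_name in names(targets)) {
  # Same existence check as the original loop
  if (col_name %in% colnames(toy)) {
    toy[[col_name]] <- countrycode(toy[[col_name]], "country.name", targets[[col_name]])
  }
}

# country_1 is now 220, 740 and country_2 is now "CAN", "DEU"
toy
```

Once the per-column logic grows beyond a lookup like this, the explicit loop tends to stay easier to read than a heavily parameterized across() call.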
