Google Trends is a Google tool that shows how frequently a given search term is entered into Google’s search engine, relative to the site’s total search volume over a given period of time.
Note that because the results are relative to all searches in the chosen period, and are scaled so that the highest point in that period equals 100, the dates you provide to the gtrendsR function will change the shape of your graph and the values on the y axis of your plot.
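To see why the chosen window matters, here is a minimal base R sketch of the rescaling idea. It assumes, as a simplification of Google's actual procedure, that each week's value is a share of total searches and that the reported index is that share divided by the window's peak, times 100. The share values are made up for illustration, and scale_to_peak() is a hypothetical helper, not a gtrendsR function.

```r
# Made-up weekly search shares (NOT real Google Trends data)
share <- c(2, 4, 10, 5, 3, 6)

# Hypothetical helper: rescale so the peak inside the window becomes 100,
# mimicking the 0-100 index that Google Trends reports
scale_to_peak <- function(x) round(100 * x / max(x))

full_window  <- scale_to_peak(share)       # all six weeks
short_window <- scale_to_peak(share[4:6])  # only the last three weeks

full_window   # 20 40 100 50 30 60
short_window  # 83 50 100
```

The same week (share 5) reads 50 in the long window but 83 in the short one, because the peak it is compared against has changed.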
To scrape data from Google Trends, we use the gtrends() function from the gtrendsR package and the get_interest() function from the trendyy package (a handy wrapper package for gtrendsR).
If necessary, also load the tidyverse and ggplot2 packages.
install.packages("gtrendsR")  # only needed once
install.packages("trendyy")   # only needed once
library(tidyverse)  # also attaches ggplot2 and the %>% pipe
library(ggplot2)
library(gtrendsR)
library(trendyy)
To scrape the Google Trends data, call the trendy() function and supply the search terms, followed by a start date and an end date.
For example, here we search for the term “Kamala Harris” from 1 January 2019 to 13 August 2020.
For more options, see the package’s reference manual (PDF). For example, we can restrict the search to a particular geographical region, such as a US state or a country.
We can also change the time argument, which sets the time span of the query. It accepts strings such as:
- “now 1-H” (previous hour)
- “now 4-H” (previous four hours)
- “today+5-y” (last five years; the default)
- “all” (since the beginning of Google Trends (2004))
If you don’t supply a string, the default is the last five years of search data.
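When you call gtrends() directly rather than trendy(), a custom date range goes into the time argument as a single “YYYY-MM-DD YYYY-MM-DD” string. The sketch below builds that string from two dates; make_time_window() is a hypothetical helper written for this example, not part of gtrendsR or trendyy.

```r
# Hypothetical helper (not part of gtrendsR or trendyy): turn two dates into
# the "start end" string that the time argument of gtrends() accepts
make_time_window <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  if (from >= to) stop("`from` must be earlier than `to`")
  paste(format(from, "%Y-%m-%d"), format(to, "%Y-%m-%d"))
}

time_window <- make_time_window("2019-01-01", "2020-08-13")
time_window  # "2019-01-01 2020-08-13"

# With gtrendsR loaded (requires an internet connection):
# gtrends("Kamala Harris", time = time_window)
```

Validating the order up front gives a clear R error instead of a failed request to Google.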
# Weekly Google Trends interest in "Kamala Harris", 1 Jan 2019 to 13 Aug 2020
kamala <- trendy("Kamala Harris", "2019-01-01", "2020-08-13") %>%
  get_interest()
We call the get_interest() function to extract the search-interest data from the trendy() result and save it in the kamala object as a data.frame. Without this last step, the data would stay in a form that we cannot use for plotting.
In this data.frame, there is a date variable for each week and a hits variable that shows the interest during that week. Remember, hits shows how frequently the search term was entered into Google’s search engine relative to the site’s total search volume, scaled so that the busiest week in the chosen period equals 100.
We will use these two variables for the x and y axes of the plot.
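Depending on the query, Google Trends can report very low interest as “<1”, in which case the hits column may come back as character rather than numeric and will not plot as a continuous line. The sketch below uses a small made-up data frame standing in for the get_interest() output (the values, and the choice to treat “<1” as 0, are assumptions for illustration) and converts both columns into plot-ready types.

```r
# Made-up stand-in for get_interest() output (NOT real Google Trends data)
interest <- data.frame(
  date = c("2019-01-06", "2019-01-13", "2019-01-20", "2019-01-27"),
  hits = c("<1", "3", "100", "41"),
  keyword = "Kamala Harris",
  stringsAsFactors = FALSE
)

# Treat "<1" (very low interest) as 0, then convert hits to numbers
interest$hits <- as.numeric(ifelse(interest$hits == "<1", "0", interest$hits))

# Store dates as Date so ggplot2 draws a date axis
interest$date <- as.Date(interest$date)

str(interest)
```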
To relate the search trends to key events in Kamala Harris’ presidential campaign and its aftermath, we can add vertical lines along the date axis. First, we store the event dates and labels in a data.frame:
# One row per event: the date of the vertical line and its label
kamala_events <- data.frame(
  date = as.Date(c("2019-01-21", "2019-06-27", "2019-12-03", "2020-08-11")),
  event = c("Launch Presidential Campaign", "First Primary Debate",
            "Drops Out Presidential Race", "Chosen as Biden's VP")
)
Note that as.Date() expects dates in year-month-day order, such as “2019-01-21”, unless you tell it otherwise with a format argument.
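To see what that ordering requirement means in practice, the base R sketch below parses the same date written two ways: the year-first form needs no extra arguments, while a day/month/year string needs an explicit format string.

```r
# Year-first "YYYY-MM-DD" strings parse without a format argument
iso <- as.Date("2019-12-03")

# Day/month/year strings need an explicit format; without it, as.Date()
# tries year-first formats and silently misreads the string
dmy <- as.Date("03/12/2019", format = "%d/%m/%Y")

iso == dmy  # TRUE
```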
Next, we can graph the trends, using the above date and hits variables:
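ggplot(kamala, aes(x = as.Date(date), y = hits)) +
  # weekly search interest (use size = 2.5 on ggplot2 versions before 3.4.0)
  geom_line(colour = "steelblue", linewidth = 2.5) +
  # one red vertical line per event
  geom_vline(data = kamala_events, mapping = aes(xintercept = date), colour = "red") +
  # angled event labels along the bottom of the plot
  geom_text(data = kamala_events, mapping = aes(x = date, y = 0, label = event),
            size = 4, angle = 40, vjust = -0.5, hjust = 0) +
  xlab(label = "Search Dates") +
  ylab(label = "Relative Hits (0-100)")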
This is a quick and easy way to visualise the ups and downs of Kamala Harris’ political career since January 2019, operationalised as the relative frequency with which people Googled her name.
If I had chosen different dates, the relative hits as shown on the y axis would be different! So play around with it and see how the trends change when you increase or decrease the time period.