Packages we will use:
library(pryr)
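If pryr isn't installed yet, the usual one-off install will sort that out:
install.packages("pryr")  # only needed once per machine/library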
I am working with a large dataset in R at the moment (it’s got event data with lots of text), and my computer fan is working overtime.
I’m also beginning to realise my coding issues are actually memory problems disguised as coding problems.
So recently I have had to learn a lot about R's functions to help me understand what is happening with my memory use.
A favourite of mine is gc(), which stands for garbage collection (garbage collector?).

When I run gc(), it outputs the following table:
gc()
              used (Mb)  gc trigger   (Mb)    max used   (Mb)
Ncells     646,297 34.6   1,234,340   66.0   1,019,471   54.5
Vcells   1,902,221 14.6 213,407,768 1628.2 255,567,715 1949.9
Ncells
Memory used by R’s internal bookkeeping: environments, expressions, symbols.
Usually stable and not a worry~
Vcells
Memory used by your data: vectors, data frames, strings, lists. So when I think "R ran out of memory," it almost always means Vcells are too high~
If I look at the columns:
- used: memory currently in use after garbage collection
- gc trigger: threshold at which R will automatically run GC next
- max used: peak memory usage since the session started
As an aside, I can also see from the table that my session at one point used around 2 GB of memory (the Vcells "max used" column), even though it now uses only ~15 MB.
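Handily, gc() doesn't just print this table, it also returns it as a matrix, so the same numbers can be pulled out in code. A small sketch (each Vcell is 8 bytes on a 64-bit build):
g <- gc()                              # runs a collection and returns the table above as a matrix
g["Vcells", "used"]                    # vector cells currently in use (a count, not MB)
g["Vcells", "max used"] * 8 / 1024^2   # peak data memory in MB (the ~2 GB from the aside above)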

At its core, gc() has two roles:
- Trigger garbage collection
- Report memory usage
The official R documentation is explicit that reporting, not memory recovery, is the primary reason we should use gc():
“the primary purpose of calling gc is for the report on memory usage”
The documentation also says it can be useful to call gc() after a large object has been removed, as this may prompt R to return memory to the operating system.
Also, if I turn on garbage collection logging with gcinfo():
gcinfo(TRUE)
This starts printing a line to the console every time a garbage collection runs:
Garbage collection 80 = 53+6+21 (level 0) ...
74.8 Mbytes of cons cells used (57%)
58.8 Mbytes of vectors used (14%)
I typed this into ChatGPT, and this is what the AI overlord told me was in this output:
1. Garbage collection 80
- This is the 80th garbage collection since the R session started.
- GC runs automatically when memory pressure crosses a trigger threshold.
- A high number here usually reflects:
- long sessions
- repeated allocation and copying
- large or complex objects being created and discarded
On its own, “80” is not a problem; it is contextual.
2. = 53+6+21
This is a breakdown of GC events by type, accumulated so far:
- 53: minor (level-0) collections
  → clean up recently allocated objects only
- 6: level-1 collections
  → more aggressive; scan more of the heap
- 21: level-2 collections
  → full, expensive sweeps of memory
The sum equals 80.
Interpretation:
- Most collections are cheap and local (good)
- But 21 full GCs indicates some sustained memory pressure over time
3. (level 0)
This refers to the current GC event that just ran:
- Level 0 = minor collection
- Triggered by short-term allocation pressure
- Typically fast
This is not a warning. It means R handled it without escalating.
4. 74.8 Mbytes of cons cells used (57%)
- Cons cells (Ncells) = internal R objects:
- environments
- symbols
- expressions
- 74.8 MB is currently in use
- This represents 57% of the current GC trigger threshold
Interpretation:
- Ncells usage is moderate
- Well below the trigger
- Not your bottleneck
5. 58.8 Mbytes of vectors used (14%)
- Vector cells (Vcells) = your actual data:
- vectors, data frames, strings
- 58.8 MB currently in use
- Only 14% of the trigger threshold
Interpretation:
- Data memory pressure is low
- R is very far from running out of vector space
- This GC was likely triggered by allocation churn, not dataset size
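Once I've seen enough of these log lines, the logging can be switched back off; gcinfo() returns the previous setting, so it can be restored later:
old <- gcinfo(FALSE)  # stop the per-collection log; 'old' keeps whatever the setting was before
# gcinfo(old) would put it back the way it was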
rm() for ReMoving objects
rm(my_unnecessarily_big_df)
gc()
As a quick way to make sure there isn't a ton of memory leaking here and there, we can use rm() to remove the object reference, and gc() then helps clean up the now-unreachable memory.
From a stackoverflow comment:
gc does not delete any variables that you are still using - it only frees up the memory for ones that you no longer have access to (whether removed using rm() or, say, created in a function that has since returned). Running gc() will never make you lose variables.
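To double-check that the memory actually comes back, here is a rough before-and-after sketch using the same (hypothetical) data frame:
before <- gc()["Vcells", 2]   # data memory in use (Mb) before, column 2 of the gc() table
rm(my_unnecessarily_big_df)
after <- gc()["Vcells", 2]    # in use (Mb) after removing the object and collecting
before - after                # roughly how many Mb the object was holding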
object.size()
object.size(my_suspiciously_big_df)
as.numeric(object.size(another_suspiciously_big_df)) / 1024^2 # size in MB (as.numeric drops the misleading "bytes" label)
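object.size() also has print and format methods that pick units for you, which I usually reach for instead of dividing by hand:
print(object.size(my_suspiciously_big_df), units = "auto")      # picks Kb/Mb/Gb automatically
format(object.size(another_suspiciously_big_df), units = "MB")  # returns a string like "xx.x MB"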
ls() + sapply() — Crude but Effective Audits
sapply(ls(), function(x) object.size(get(x)))
This reveals which objects dominate memory.
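And a slightly friendlier version of the same audit, with the biggest offenders first and sizes in MB:
sizes <- sapply(ls(), function(x) object.size(get(x)))  # bytes per object in the workspace
sort(sizes, decreasing = TRUE) / 1024^2                 # largest first, in MB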
pryr::mem_used() (Optional, Cleaner Output)
pryr::mem_used()
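pryr also has mem_change(), which shows how much memory a single step adds or frees. The exact numbers will vary by session, so treat these as rough:
pryr::mem_change(x <- numeric(1e7))   # allocating 10 million doubles: roughly +80 MB
pryr::mem_change(rm(x))               # removing them again: roughly -80 MB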
Thank you for reading along with me to help understand some of the diagnostics we can use in R. Hopefully that can help our poor computers avoid booting up the fan and suffering from overheating~
And at the end of the day, since even the R documentation stresses that gc()'s main job is reporting, a fresh session restart is still better hygiene than relying on gc() or rm()!

