jemalloc vs tcmalloc vs dlmalloc

jemalloc seems to be the newer allocator for C/C++ applications, and it is definitely more sophisticated than, for example, tcmalloc.

My application has a problem with process memory growing over time (about a week) under tcmalloc, similar to what this page claims:

What ultimately forced us off TCMalloc in production was not stability, but its inconsistent memory economy: a heap containing a given number of allocated bytes would spread across an ever larger physical footprint over time. Multiple TCMalloc back-ends (search, ads, feed, etc.) would suffer 512MB-1GB of heap bloat per day, which meant it would be hitting swap within a few days.

So I tried jemalloc and compared it with tcmalloc and dlmalloc. Here are my observations for jemalloc:

  • The good: My process RSS (resident set size) is clearly smaller than with tcmalloc (during the two days of testing), which is consistent with jemalloc's stated goal: minimize the active page set
  • The neutral: CPU% (down from 15% with dlmalloc to 2% per process) is about as low as with tcmalloc (down from 15% with dlmalloc to 1% per process), so both jemalloc and tcmalloc appear to minimize lock contention
  • The bad: With jemalloc, the process total size (swap) runs away and hits the swap limit within two days; I was not able to find a way to contain the swap growth
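Since these observations hinge on the difference between RSS and total size, here is a minimal sketch of how such numbers can be sampled from inside a process. It assumes Linux and the `/proc/<pid>/status` format; the names `MemStats` and `parse_status` are made up for this illustration, and this is not necessarily how the graphs in this post were produced. It reads VmSize, VmRSS and VmSwap because the post does not pin down which counter "total size (swap)" refers to.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Memory figures in kB as reported by the Linux kernel; -1 means "not found".
// (Hypothetical helper type for this sketch.)
struct MemStats {
    long vm_size_kb = -1;  // VmSize: total virtual size of the process
    long vm_rss_kb  = -1;  // VmRSS: resident set size
    long vm_swap_kb = -1;  // VmSwap: amount currently swapped out
};

// Parse "Key:   value kB" lines from a /proc/<pid>/status-style stream.
MemStats parse_status(std::istream& in) {
    MemStats s;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        if (!(fields >> key >> value)) continue;  // skip lines without a number
        if (key == "VmSize:")      s.vm_size_kb = value;
        else if (key == "VmRSS:")  s.vm_rss_kb = value;
        else if (key == "VmSwap:") s.vm_swap_kb = value;
    }
    return s;
}
```

Calling `parse_status` on a `std::ifstream` opened on `/proc/self/status` once a minute and logging the three values should be enough to see whether RSS stays flat while the virtual size keeps growing.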

My application is a unique beast. It runs many short-lived threads: it creates and destroys tens of worker threads every minute. The threads do not use or share object pools or memory buffers; every thread makes its own malloc/new calls and pushes the allocated objects into STL containers wrapped in boost shared pointers. My guess is that the short-lived threads (or the shared pointers) throw off jemalloc, so the swap size is never released or contained. I tried one jemalloc config, “narenas” (number of arenas), with these values: default (4*nCPUs=96 in my case), 12, and 4. None of the values helped, and I was not able to find other meaningful configs that I could use to cope with the problem.
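To make the workload pattern concrete, here is a rough sketch of it, not the actual application. `Record`, `worker` and `run_batch` are hypothetical names, the object sizes and counts are arbitrary, and `std::shared_ptr` stands in for the boost shared pointer mentioned above. Each worker thread does its own allocations, holds them in an STL container, and releases everything when it exits.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical payload; the real application's object types are not described.
struct Record {
    std::string payload;
    explicit Record(std::size_t n) : payload(n, 'x') {}
};

// One short-lived worker: allocates its own objects, keeps them in an STL
// container behind shared pointers, and frees them all when it returns.
std::size_t worker(std::size_t objects, std::size_t bytes_each) {
    std::vector<std::shared_ptr<Record>> items;
    items.reserve(objects);
    for (std::size_t i = 0; i < objects; ++i)
        items.push_back(std::make_shared<Record>(bytes_each));
    std::size_t total = 0;
    for (const auto& r : items) total += r->payload.size();
    return total;
}

// Spawn `threads` workers, join them, and return the total payload bytes.
std::size_t run_batch(std::size_t threads, std::size_t objects,
                      std::size_t bytes_each) {
    std::vector<std::size_t> results(threads, 0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t)
        pool.emplace_back([&results, t, objects, bytes_each] {
            results[t] = worker(objects, bytes_each);  // each thread owns slot t
        });
    for (auto& th : pool) th.join();
    std::size_t sum = 0;
    for (std::size_t r : results) sum += r;
    return sum;
}
```

Calling something like `run_batch(30, 10000, 256)` in a loop, once per minute, would roughly mimic the thread churn described here. The same binary could then be run with each allocator preloaded, and with jemalloc the arena count can be varied through the `MALLOC_CONF` environment variable (for example `narenas:4`).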

Still, jemalloc is more sophisticated than tcmalloc: it has more community activity and more configs/hooks/stats. On the other hand, tcmalloc is more lightweight and works great out of the box.

So, for a typical server or daemon that uses long-lived worker threads, jemalloc is a good choice; in that case, I believe jemalloc can be tuned to maintain a smaller RSS/swap than tcmalloc. It can even improve MySQL performance.

But for me, I am staying with tcmalloc (and weekly restarts).

Here are the memory and CPU graphs for each of jemalloc, tcmalloc, and dlmalloc from my tests.

jemalloc running for 22 hours: CPU is low, but total memory size (swap) grows uncontained (note: the graph does not show RSS, which is quite small and very steady):


tcmalloc running for 5 days: CPU is the lowest, and total memory size grows at a much slower pace:


dlmalloc running for 6 days: CPU is through the roof (note: many processes are running on the same physical server), and memory growth is similar to tcmalloc (kudos to tcmalloc):


2 thoughts on “jemalloc vs tcmalloc vs dlmalloc”

  1. Craig

    This is probably a little bit too late. If you are still with tcmalloc, have you ever tried to increase TCMALLOC_RELEASE_RATE?

    1. suniphrase Post author

      Good to know about that parameter, thanks! Yes, we are still running tcmalloc and are happy with it. We have a weekly rolling restart scheduled for all those processes; it releases process memory, and that works in our situation. I checked, and we are using the default value of 1.0 for TCMALLOC_RELEASE_RATE. Our process memory does increase quickly in the first couple of days, but in the last couple of days before the weekly restart it is quite stable. CPU is stable across the entire week.

