“jemalloc” appears to be the newer allocator for C/C++ applications, and it is definitely more sophisticated than, e.g., tcmalloc: https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919
With tcmalloc, my application's process memory size keeps growing over time (over the course of a week), similar to what this page claims (https://www.quora.com/Is-tcmalloc-stable-enough-for-production-use):
What ultimately forced us off TCMalloc in production was not stability, but its inconsistent memory economy: a heap containing a given number of allocated bytes would spread across an ever larger physical footprint over time. Multiple TCMalloc back-ends (search, ads, feed, etc.) would suffer 512MB-1GB of heap bloat per day, which meant it would be hitting swap within a few days.
So I tried jemalloc and compared it with tcmalloc and dlmalloc. Here are my observations for jemalloc:
- The good: My process RSS (resident set size) is clearly smaller than with tcmalloc (during the two days of testing). This is consistent with jemalloc's stated goal: "Minimize the active page set".
- The neutral: CPU usage (down from 15% per process with dlmalloc to 2%) is about as low as with tcmalloc (down from 15% to 1% per process), which suggests that both jemalloc and tcmalloc minimize lock contention.
- The bad: With jemalloc, the process's total size (swap) runs away and hits the swap limit within two days. I was not able to find a way to contain the swap growth.
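For anyone who wants to track the same numbers, here is a minimal sketch of sampling them from inside the process. It assumes Linux, where `/proc/self/status` reports `VmSize`, `VmRSS`, and `VmSwap` in kB. The names `MemSample`, `parse_status`, `sample_self`, and `virtual_growth_kb` are made up for this illustration, and this is not necessarily how the graphs further down were produced.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Memory counters in kB, as reported by Linux in /proc/<pid>/status.
// A field stays at -1 if it was not found.
struct MemSample {
    long vm_size_kb = -1;  // VmSize: total virtual size
    long vm_rss_kb = -1;   // VmRSS: resident set size
    long vm_swap_kb = -1;  // VmSwap: swapped-out size
};

// Parse "Key: <number> kB" lines from a status-formatted stream.
MemSample parse_status(std::istream& in) {
    MemSample s;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        if (!(fields >> key >> value)) continue;  // skip non-numeric lines
        if (key == "VmSize:") s.vm_size_kb = value;
        else if (key == "VmRSS:") s.vm_rss_kb = value;
        else if (key == "VmSwap:") s.vm_swap_kb = value;
    }
    return s;
}

// Sample the current process (Linux only; all fields stay -1 elsewhere).
MemSample sample_self() {
    std::ifstream status("/proc/self/status");
    return parse_status(status);
}

// Growth of the virtual size between two valid samples, in kB.
long virtual_growth_kb(const MemSample& before, const MemSample& after) {
    assert(before.vm_size_kb >= 0 && after.vm_size_kb >= 0);
    return after.vm_size_kb - before.vm_size_kb;
}
```

Calling `sample_self()` once a minute and logging the three fields is enough to see whether RSS stays flat while the virtual size or swap keeps climbing.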
My application is a unique beast. It runs many short-lived threads, creating and destroying tens of worker threads every minute. The threads do not use or share object pools or memory buffers; every thread makes its own malloc/new calls and pushes the allocated objects into STL containers, wrapped in boost shared pointers. My guess is that the short-lived threads (or the shared pointers) throw off jemalloc, so the swap is never released or contained. I tried one jemalloc config option, “narenas” (number of arenas), with these values: the default (4*nCPUs = 96 in my case), 12, and 4. None of them helped, and I was not able to find other meaningful options to cope with the problem.
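To make that allocation pattern concrete, here is a rough sketch of it, not my actual application. It uses `std::thread` and `std::shared_ptr` in place of the boost equivalents, and the names `worker` and `churn`, the object sizes, and the counts are all invented for illustration. The `malloc_conf` global is, as far as I understand the jemalloc documentation, one way to set `narenas` at build time (the `MALLOC_CONF` environment variable is another). If jemalloc is built with a symbol prefix the name differs, and without jemalloc linked in the variable is simply unused.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Assumption: jemalloc picks up its options from this global when linked in.
extern "C" {
const char* malloc_conf = "narenas:4";
}

// One short-lived worker: allocates its own objects, holds them in an STL
// container through shared pointers, and frees everything when it returns.
std::size_t worker(std::size_t objects) {
    std::vector<std::shared_ptr<std::string>> held;
    held.reserve(objects);
    for (std::size_t i = 0; i < objects; ++i) {
        // Vary the size (64..575 bytes) so several size classes are hit.
        held.push_back(std::make_shared<std::string>(64 + i % 512, 'x'));
    }
    return held.size();
}

// Create and destroy `rounds` batches of `threads_per_round` threads and
// return the total number of objects allocated across all of them.
std::size_t churn(int rounds, int threads_per_round, std::size_t objects) {
    assert(rounds >= 0 && threads_per_round >= 0);
    std::size_t total = 0;
    for (int r = 0; r < rounds; ++r) {
        std::vector<std::size_t> counts(threads_per_round, 0);
        std::vector<std::thread> pool;
        for (int t = 0; t < threads_per_round; ++t) {
            // Each thread writes only its own slot, so no locking is needed.
            pool.emplace_back([&counts, t, objects] { counts[t] = worker(objects); });
        }
        for (auto& th : pool) th.join();
        for (std::size_t c : counts) total += c;
    }
    return total;
}
```

Running `churn` in a loop under each allocator while watching the process size is a cheap way to check whether thread churn alone reproduces the growth, before blaming the shared pointers.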
Still, jemalloc is more sophisticated than tcmalloc: it has more community activity and more configs/hooks/stats (http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html#mallctl_namespace vs. http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html#Garbage_Collection). On the other hand, tcmalloc is more lightweight and works great out of the box.
So for a typical server or daemon that uses long-lived worker threads, jemalloc is a good choice; in that case, I believe jemalloc can be tuned to maintain a smaller RSS/swap than tcmalloc. It can even improve MySQL performance: https://www.percona.com/blog/2013/03/08/mysql-performance-impact-of-memory-allocators-part-2/
But for me, I am staying with tcmalloc (and weekly restarts).
Here are the memory and CPU graphs for jemalloc, tcmalloc, and dlmalloc from my tests.
jemalloc running for 22 hours: CPU is low, but total memory size (swap) grows uncontained (note: the graph does not show RSS, which is quite small and very steady):
tcmalloc running for 5 days: CPU is the lowest, and total memory size grows at a much slower pace:
dlmalloc running for 6 days: CPU is through the roof (note: many processes run on the same physical server), and memory growth is similar to tcmalloc (kudos to tcmalloc):