Matthew Gretton-Dann <matthew.gretton-dann@linaro.org> writes:
> All,
[snip]
> So before we go any further I would like to see what the view of LEG is about a better malloc. My questions boil down to:
> - Is malloc important - or do server applications just implement their own?
I was sent this question along with a list of "server applications" and did some investigation, of both the typical runtimes and the applications themselves. This is based just on source inspection and a little googling in some cases. Let me know if you want me to look into anything in more detail.
The answer is likely "sometimes" to both parts of the question :-)
These are my notes. Corrections welcome!
Runtimes
========

Perl
----
Uses glibc malloc (in practice -- it ships with its own malloc implementation but this is not used by default on Linux or in the Ubuntu builds)
Python
------
Uses its own allocator for small allocations, which are by far the commonest. Uses glibc malloc for some things (e.g. memory backing a list object), but malloc-related functions do not appear high up in perf traces.
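As an aside on methodology: besides perf, a quick-and-dirty way to see how much a process really leans on the system allocator is an LD_PRELOAD shim that counts calls. The sketch below is only illustrative -- Linux/glibc specific, not thread-safe, it ignores calloc()/realloc(), and the build line is just what I would try -- but it is the sort of check I mean:

    /* count_malloc.c -- rough sketch of an LD_PRELOAD shim that counts
     * malloc()/free() calls.  Illustrative only.
     * Build (roughly): gcc -shared -fPIC -o count_malloc.so count_malloc.c -ldl
     * Run:             LD_PRELOAD=./count_malloc.so some_program
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void *(*real_malloc)(size_t);
    static void (*real_free)(void *);
    static unsigned long malloc_calls, free_calls;

    /* dlsym() may itself allocate, so satisfy it from a static scratch
     * buffer while we are still looking up the real functions. */
    static char bootstrap[4096];
    static size_t bootstrap_used;

    static void *bootstrap_alloc(size_t size)
    {
        size = (size + 15) & ~(size_t)15;
        if (bootstrap_used + size > sizeof bootstrap)
            return NULL;
        void *p = bootstrap + bootstrap_used;
        bootstrap_used += size;
        return p;
    }

    static void report(void)
    {
        fprintf(stderr, "malloc calls: %lu, free calls: %lu\n",
                malloc_calls, free_calls);
    }

    void *malloc(size_t size)
    {
        if (real_malloc == NULL) {
            static int initialising;
            if (initialising)
                return bootstrap_alloc(size);
            initialising = 1;
            real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
            real_free   = (void (*)(void *))dlsym(RTLD_NEXT, "free");
            atexit(report);
            initialising = 0;
        }
        malloc_calls++;                 /* racy, but fine for a rough count */
        return real_malloc(size);
    }

    void free(void *ptr)
    {
        uintptr_t p = (uintptr_t)ptr;
        if (p >= (uintptr_t)bootstrap &&
            p < (uintptr_t)(bootstrap + sizeof bootstrap))
            return;                     /* came from the scratch buffer */
        free_calls++;
        if (real_free)
            real_free(ptr);
    }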
Java
----
Very much does its own heap management.
PHP
---
As Ard says, it has its own thing, and looking at its source it clearly does something quite complicated (zend_alloc.c is nearly 3000 lines). It bundles various libraries (sqlite, pcre, ...) that do call malloc(), and it does not appear to redirect those libraries to its own allocator -- so some workloads might benefit from malloc improvements.
Server processes
================

apache2
-------
As Ard says it has its own thing where it manages a pool per request. Looks like it calls malloc a fair bit though.
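For anyone who hasn't looked at this kind of allocator: the rough shape of "a pool per request" is something like the sketch below. This is just an illustration of the idea, not Apache's actual APR pool code, and all the names are invented.

    /* Minimal sketch of a per-request memory pool: grab memory in large
     * chunks, hand out pieces with a bump pointer, free everything in one
     * go when the request ends.  Not APR; illustration only. */
    #include <stdlib.h>

    #define POOL_CHUNK (8 * 1024)

    struct chunk {
        struct chunk *next;
        size_t used;
        char data[POOL_CHUNK];
    };

    struct pool {
        struct chunk *head;
    };

    struct pool *pool_create(void)
    {
        return calloc(1, sizeof(struct pool));
    }

    void *pool_alloc(struct pool *p, size_t size)
    {
        size = (size + 15) & ~(size_t)15;        /* keep results aligned */
        if (size > POOL_CHUNK)
            return NULL;                         /* a real pool special-cases big requests */
        if (p->head == NULL || p->head->used + size > POOL_CHUNK) {
            struct chunk *c = malloc(sizeof *c); /* one malloc per chunk, not per object */
            if (c == NULL)
                return NULL;
            c->next = p->head;
            c->used = 0;
            p->head = c;
        }
        void *mem = p->head->data + p->head->used;
        p->head->used += size;
        return mem;
    }

    /* End of request: throw the whole pool away; no per-object free(). */
    void pool_destroy(struct pool *p)
    {
        struct chunk *c = p->head;
        while (c != NULL) {
            struct chunk *next = c->next;
            free(c);
            c = next;
        }
        free(p);
    }

The attraction is that allocation is a pointer bump and teardown is one walk over a handful of chunks rather than a free() per object -- a property a general-purpose malloc can't really offer.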
cassandra
---------
Uses the Java heap mostly, clearly. It does store a few things "off heap" (row cache, bloom filter bitsets, compression metadata) via sun.misc.Unsafe.allocateMemory, which /probably/ backs onto glibc malloc, but mostly I think these things are allocated once at process start-up rather than on any hot path.
hadoop
------
Appears to have bits that call malloc. Hard to say more than that without inhaling the architecture more thoroughly.
ceph
----
Certainly calls malloc (and operator new) in many places. So potentially interesting.
memcached
---------
AFAICT, allocates one big chunk of memory with malloc and then does its own thing to divvy it up.
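Roughly this shape, as far as I can tell -- this is an illustration of the idea, not memcached's real slab allocator (which has multiple size classes, eviction and so on), and the numbers and names are made up.

    /* Minimal sketch of "one big malloc up front, then divvy it up":
     * reserve a region at start-up, carve fixed-size items out of it and
     * keep freed items on a free list, so the system allocator is never
     * touched again on the item path. */
    #include <stdlib.h>

    #define REGION_SIZE (64 * 1024 * 1024)   /* e.g. 64 MiB reserved up front */
    #define ITEM_SIZE   256                  /* one size class, for simplicity */

    struct item { struct item *next; };

    static char *region;
    static size_t region_used;
    static struct item *free_items;

    int store_init(void)
    {
        region = malloc(REGION_SIZE);        /* the one and only big malloc */
        return region != NULL ? 0 : -1;
    }

    void *item_alloc(void)
    {
        if (free_items != NULL) {            /* reuse a previously freed item */
            struct item *it = free_items;
            free_items = it->next;
            return it;
        }
        if (region_used + ITEM_SIZE > REGION_SIZE)
            return NULL;                     /* memcached would evict something here */
        void *p = region + region_used;
        region_used += ITEM_SIZE;
        return p;
    }

    void item_free(void *p)
    {
        struct item *it = p;
        it->next = free_items;
        free_items = it;
    }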
mongodb
-------
AIUI, pushes the problem to the kernel by mmap()ing the data files into its address space and fooling around in there. So probably not dependent on malloc() performance.
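i.e. something along these lines -- purely illustrative, not MongoDB's storage code, and the file name is made up:

    /* Minimal sketch of working directly on an mmap()ed data file: the
     * kernel pages data in and out on demand, and malloc() is not involved
     * in managing the data at all. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "data.bin";  /* hypothetical data file */

        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) {
            fprintf(stderr, "need a non-empty file\n");
            return 1;
        }

        /* The whole file becomes directly addressable memory. */
        char *data = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        data[0] ^= 0;    /* touching data[i] reads (and here dirties) file pages */

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }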
swift
-----
Seems to be pure Python, so not really dependent on malloc.
varnish
-------
Calls malloc() once per request and does its own allocation within that region -- and on Linux (incl. Ubuntu armhf) it uses a bundled version of jemalloc for even that.
haproxy
-------
I *think* this mostly uses a similar model to apache2/varnish: allocate a region once per request (there are quite a few other calls to malloc too -- I don't know whether those are on hot paths). It does just use glibc malloc to allocate that memory, though, AFAICT.
tomcat7
-------
Just uses the Java heap, AFAICT (I guess the contained JSPs can use JNI or whatever, but it looks like the container itself doesn't).
> - Do you have any benchmarks that stress malloc and would provide us with
>   some more data points?
>
> But any and all comments on the subject are welcome.
Perl and ceph almost certainly have a dependency on glibc malloc performance. In most other cases, projects that have noticed that malloc can be a little slow have implemented their own solutions. An improved system malloc might let some of these drop their own implementations, but oftentimes they are exploiting properties a general-purpose system malloc simply cannot offer (e.g. allocating an arena per request and then throwing the whole thing away in one go).
Cheers,
mwh