Recently I came across two excellent posts about accelerating clang/llvm builds with different compilers/optimizations [1] [2].
I tried some of the authors' advice and got very good results. Basically I moved to an optimized clang build, changed to the gold linker and used a different memory allocator than the system glibc one. Build times for the whole clang/llvm toolchain are summarized below (my machine is an i7-4510U, 2C/4T, 8GB, 256GB SSD):
GCC 4.8.4 + gold (Ubuntu 14.04)
real 85m17.640s user 257m1.976s sys 11m35.284s
LLVM 3.6 + gold (Ubuntu 14.04)
real 34m4.909s user 128m43.382s sys 3m51.643s
LLVM 3.7 + gold + tcmalloc
real 32m56.707s user 121m40.562s sys 3m52.358s
The gold linker also shows *much* lower RSS usage; I am able to fully use make -j4 while linking in 8GB without any swapping.
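A configure invocation along these lines should reproduce that setup (a sketch only; the exact flag spellings depend on your cmake/LLVM version, and the assumption is that a release clang/clang++ is already on PATH):

```shell
# Sketch: build LLVM/Clang with a release clang as the host compiler and
# gold as the linker. Flag availability varies by LLVM version -- adjust
# for your setup.
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER=clang \
      -DCMAKE_CXX_COMPILER=clang++ \
      -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=gold" \
      -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=gold" \
      ../llvm
make -j4
```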
Two things I would add to or check against the posts:
1. Changing from the glibc allocator to tcmalloc showed me a 3-4% improvement. I tried jemalloc, but tcmalloc is faster. I am currently using the system version 2.2, but I have pushed an aggressive-decommit patch to enable it by default in 2.4, which might show lower RSS and latency (I will check it later).
2. I first tried to accelerate my build by offloading compilation using distcc. Results were good, although the other machine (i7, 4C/8T, 8GB) showed mixed CPU utilization. The problem was link-time memory usage with ld.bfd, which generates a lot of swapping with higher job counts. I will try using distcc with clang.
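For item 1, a quick way to try a different allocator without relinking anything is to preload it for the build (a sketch; the library path below is an assumption based on Ubuntu's google-perftools package, so check your distribution for the actual location):

```shell
# Sketch: run the whole build with tcmalloc substituted for the glibc
# allocator via LD_PRELOAD. The path to libtcmalloc is an assumption.
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 make -j4
```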
[1] http://blogs.s-osg.org/an-introduction-to-accelerating-your-build-with-clang... [2] http://blogs.s-osg.org/a-conclusion-to-accelerating-your-build-with-clang/
On 24 June 2015 at 14:50, Adhemerval Zanella adhemerval.zanella@linaro.org wrote:
> I tried some of the authors' advice and got very good results. Basically I moved to an optimized clang build, changed to the gold linker and used a different memory allocator than the system glibc one. Build times for the whole clang/llvm toolchain are summarized below (my machine is an i7-4510U, 2C/4T, 8GB, 256GB SSD):
Optimised + no-assertion builds of clang are in general 2/3 of gcc's build times.
> The gold linker also shows *much* lower RSS usage; I am able to fully use make -j4 while linking in 8GB without any swapping.
BFD uses more than 2GB of RAM per process when statically linking debug versions of LLVM+Clang.
What I did was to use gold and enable shared libraries in the debug version.
> - Changing from the glibc allocator to tcmalloc showed me a 3-4% improvement. I tried jemalloc, but tcmalloc is faster. I am currently using the system version 2.2, but I have pushed an aggressive-decommit patch to enable it by default in 2.4, which might show lower RSS and latency (I will check it later).
Using Ninja generally makes that edge disappear, because it rebuilds a lot fewer files than make would.
I also recommend ccache if you're using gcc, but with Clang it tends to generate some bogus warnings.
> - I first tried to accelerate my build by offloading compilation using distcc. Results were good, although the other machine (i7, 4C/8T, 8GB) showed mixed CPU utilization. The problem was link-time memory usage with ld.bfd, which generates a lot of swapping with higher job counts. I will try using distcc with clang.
Distcc only helps if you use the Ninja "pool" feature on the linking jobs.
http://www.systemcall.eu/blog/2013/02/distributed-compilation-on-a-pandaboar...
Also, I don't want to depend on having a desktop near me, nor distributing jobs across the Internet, so distcc has very limited value.
If you have a powerful desktop, I recommend that you move your tree in there, maybe use your laptop as the distcc slave, and export the source/build trees via NFS, Samba or SSHFS.
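A sketch of the pool-based approach using LLVM's own CMake support: LLVM_PARALLEL_LINK_JOBS is the option that wires link jobs into a Ninja pool, though it requires a sufficiently recent LLVM checkout and the Ninja generator (both assumptions here):

```shell
# Sketch: cap concurrent link jobs so linking does not exhaust RAM.
# LLVM_PARALLEL_LINK_JOBS only takes effect with the Ninja generator
# and a recent enough LLVM tree (assumption).
cmake -G Ninja -DLLVM_PARALLEL_LINK_JOBS=1 ../llvm
ninja
```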
cheers, --renato
On 24-06-2015 11:15, Renato Golin wrote:
> On 24 June 2015 at 14:50, Adhemerval Zanella adhemerval.zanella@linaro.org wrote:
>> I tried some of the authors' advice and got very good results. Basically I moved to an optimized clang build, changed to the gold linker and used a different memory allocator than the system glibc one. Build times for the whole clang/llvm toolchain are summarized below (my machine is an i7-4510U, 2C/4T, 8GB, 256GB SSD):
> Optimised + no-assertion builds of clang are in general 2/3 of gcc's build times.
>> The gold linker also shows *much* lower RSS usage; I am able to fully use make -j4 while linking in 8GB without any swapping.
> BFD uses more than 2GB of RAM per process when statically linking debug versions of LLVM+Clang.
> What I did was to use gold and enable shared libraries in the debug version.
I am using the default configuration options, which I think build with shared libraries.
>> - Changing from the glibc allocator to tcmalloc showed me a 3-4% improvement. I tried jemalloc, but tcmalloc is faster. I am currently using the system version 2.2, but I have pushed an aggressive-decommit patch to enable it by default in 2.4, which might show lower RSS and latency (I will check it later).
> Using Ninja generally makes that edge disappear, because it rebuilds a lot fewer files than make would.
> I also recommend ccache if you're using gcc, but with Clang it tends to generate some bogus warnings.
The memory allocator change will help with either build system (GNU make or ninja). I got this idea from observing the 'perf top' profile of a clang/llvm build.
About ninja: as the posts reported, I also did not notice much difference in build time. I am also not very fond of out-of-tree/experimental tools.
I also checked ccache, but most of the builds I have been doing lately do not hit the cache. Usually I update my tree daily, and since llvm tends to refactor code a lot, it ends up recompiling a lot of objects (and thus invalidating the cache...).
For clang you can use 'export CCACHE_CPP2=yes' to make the warnings go away. The only issue is that it does not work with the optimized tblgen build option (I got weird warnings mixing ccache and this option).
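A sketch of that ccache + clang setup; wiring it through the CC/CXX environment variables is just one way to do it (cmake picking those up from the environment is standard behavior):

```shell
# Sketch: make ccache wrap clang. CCACHE_CPP2 makes ccache compile from
# the original source rather than its own preprocessed output, which
# avoids the bogus clang warnings mentioned above.
export CCACHE_CPP2=yes
export CC="ccache clang"
export CXX="ccache clang++"
cmake ../llvm    # cmake reads CC/CXX from the environment
```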
>> - I first tried to accelerate my build by offloading compilation using distcc. Results were good, although the other machine (i7, 4C/8T, 8GB) showed mixed CPU utilization. The problem was link-time memory usage with ld.bfd, which generates a lot of swapping with higher job counts. I will try using distcc with clang.
> Distcc only helps if you use the Ninja "pool" feature on the linking jobs.
> http://www.systemcall.eu/blog/2013/02/distributed-compilation-on-a-pandaboar...
> Also, I don't want to depend on having a desktop near me, nor distributing jobs across the Internet, so distcc has very limited value.
> If you have a powerful desktop, I recommend that you move your tree in there, maybe use your laptop as the distcc slave, and export the source/build trees via NFS, Samba or SSHFS.
Distcc in fact helped a lot with my early builds with GCC+ld.bfd; I went from roughly an 85m build time to 40m. The only issue with distcc is that I need to lower the timeout factor a bit so it won't take long to start the job locally if the remote machine is not accessible.
My desktop has more cores, but does not have an SSD. Using GCC+ld in debug mode, the total build time is roughly the same.
linaro-toolchain@lists.linaro.org