Hello,
I went through the LLVM build bots (and also the libc++ buildkites) and increased their ccache max size. There was a big impact on the flang builds on tcwg-jade-01 (which went from 1h–2h to 10min–30min), but not on other builds. One likely reason is that I only made this change earlier today, so there hasn't yet been time for enough of the several-hours-long builds to finish and warm up the caches.
Since different machines have different disk sizes and amounts of free space, I chose different ccache max sizes for them, as follows (there's a short sketch of the commands I used at the end of this list):
* tcwg-fx-02 hosts the following build bots:
  - clang-aarch64-sve-vls-2stage
  - clang-aarch64-sve-vls
  - clang-aarch64-sve-vla-2stage
  - clang-aarch64-sve-vla
All share the same ccache. I changed its max size to 20 GB. It's not a lot, but this machine is also used as a dev box, so I thought it would be good to preserve a fair amount of space.
* tcwg-jade-01 hosts the following build bots:
  - clang-armv8-lld-2stage
  - clang-armv7-vfpv3-2stage
  - clang-armv7-global-isel
  - clang-armv7-quick
  - clang-armv7-2stage
  - clang-armv7-lnt
  - flang-aarch64-latest-gcc
  - flang-aarch64-rel-assert
  - flang-aarch64-release
  - flang-aarch64-latest-clang
  - flang-aarch64-debug
  - flang-aarch64-out-of-tree
  - flang-aarch64-sharedlibs
  - flang-aarch64-dylib
  - clang-aarch64-full-2stage
  - clang-aarch64-global-isel
  - clang-aarch64-lld-2stage
  - clang-aarch64-quick
All armv7 and armv8 bots share one ccache, and all aarch64 bots share another. I changed the max size of each one to 100 GB.
* tcwg-jade-04 hosts the following build bots:
  - lldb-aarch64-ubuntu
  - lldb-arm-ubuntu
  - buildkite-linaro-armv8-libcxx-01
  - buildkite-linaro-armv8-libcxx-02
  - buildkite-linaro-armv8-libcxx-03
  - buildkite-linaro-armv8-libcxx-04
The buildkite bots share a 50 GB ccache. lldb-arm-ubuntu uses a separate 50 GB ccache because it's based on a different distro version, and lldb-aarch64-ubuntu has its own 50 GB ccache as well.
* tcwg-llvmbot_tk1-01.tcwglab hosts the following build bot:
  - silent-linaro-tk1-01
I changed the max cache size to 10 GB. There's not a lot of free space on the machine.
* tcwg-llvmbot_tk1-03.tcwglab hosts the following build bot:
  - normal-linaro-tk1-02
I changed the max cache size to 20 GB.
* tcwg-llvmbot_tk1-05.tcwglab hosts the following build bot:
  - silent-linaro-tk1-08
I changed the max cache size to 10 GB.
* The following tcwg-llvmbot_tk1-* machines are currently unreachable so I couldn't examine them:
  - tcwg-llvmbot_tk1-02.tcwglab
  - tcwg-llvmbot_tk1-04.tcwglab
* The following tcwg-llvmbot_tk1-* machines are running an llvmbot container but no builder container, so I didn't change their ccache configuration:
  - tcwg-llvmbot_tk1-06.tcwglab
  - tcwg-llvmbot_tk1-07.tcwglab
  - tcwg-llvmbot_tk1-08.tcwglab
  - tcwg-llvmbot_tk1-09.tcwglab
* tcwg-jade-02 is a GNU builder, and from peeking into a few containers running build jobs I have the impression that it doesn't use ccache. Should I look into it?
* Going through our ssh config file I didn't find these build bots that are listed at http://llvm.validation.linaro.org/ so I didn't check their ccache usage:
  - clang-arm64-windows-msvc-2stage
  - clang-arm64-windows-msvc
  - clang-arm64-windows-msvc-2stage
  - clang-arm64-windows-msvc
  - clang-native-arm-lnt-perf
  - clang-armv7-vfpv3-full-2stage
  - clang-thumbv7-full-2stage
  - libcxx aarch64
  - libcxx aarch64 -fno-exceptions
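For reference, here's roughly what applying a new limit looked like inside a builder container. Treat it as a sketch rather than the literal session, since the sizes differ per machine as described above:

  # Show the cache's current statistics (hits, misses, size used).
  ccache -s

  # Raise the size limit (100 GB on tcwg-jade-01, for example). This
  # records max_size in ccache.conf inside the cache directory, so it
  # persists for every bot sharing that cache.
  ccache --max-size=100G

-- Thiago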
> There was a big impact on the flang builds on tcwg-jade-01 (which went from 1h–2h to 10min–30min), but not on other builds.
Given how much resource flang takes up, this is a great result even if they're the only faster builds.
> Since different machines have different disk sizes and free space I chose different ccache max sizes for them, as follows:
Can you document how you set, and why you chose, the sizes? There is a bit about ccache at the end of https://linaro.atlassian.net/wiki/spaces/TCWG/pages/22343946024/LLVM+Docker+... you could put it there.
I particularly want to know whether this config must be set each time or is automated. I *think* the volumes are made by hand but I didn't confirm that. I guess the cache size is set inside each container but must agree between the ones that share a ccache volume?
> - The following tcwg-llvmbot_tk1-* machines are currently unreachable so I couldn't examine them:
> - The following tcwg-llvmbot_tk1-* machines are running an llvmbot container but no builder container, so I didn't change their ccache configuration:
For a while we only had 3/4 actually working, which is why you see some "pretending" to be someone else. It is expected that the ones that are back online now aren't doing anything.
> - Going through our ssh config file I didn't find these build bots that are listed at http://llvm.validation.linaro.org/ so I didn't check their ccache usage:
> - clang-arm64-windows-msvc-2stage
> - clang-arm64-windows-msvc
> - clang-arm64-windows-msvc-2stage
> - clang-arm64-windows-msvc
These run on the surface laptops (https://linaro.atlassian.net/wiki/spaces/TCWG/pages/22395192116/Accessing+Wi...).
These do not use ccache at this time (no mention on https://linaro.atlassian.net/wiki/spaces/TCWGPUB/pages/25310167965/How+to+se...).
> - clang-native-arm-lnt-perf
> - clang-armv7-vfpv3-full-2stage
> - clang-thumbv7-full-2stage
This is another case where the "builder" name (builder meaning the configuration it builds) is served by a "worker" that has a different name. So lnt-perf runs on whatever is running the tk1-02 container (https://lab.llvm.org/buildbot/#/builders/113).
The other 2 are on the "silent" buildmaster, for example https://lab.llvm.org/staging/#/builders/169.
I wrote out the current layout for the tk1s in https://git.linaro.org/toolchain/jenkins-scripts.git/commit/?id=e54ac9c17326.... And yeah, it makes little sense, but we can unpick it now that we've got some more nodes online. (Short term, it is easier to move containers around on our end than to change the LLVM-side config.)
> - libcxx aarch64
> - libcxx aarch64 -fno-exceptions
jade-04 runs 2 AArch64 buildkite agents that can build either of these configs depending on who picks it up first. So that might have made it less obvious.
Or these are references to the pre-buildkite buildbot libcxx bots but I'm pretty sure our monitoring is up to date and wouldn't include them.
Hello David,
Thank you very much for your clarifications. They are very helpful. Sorry for the delay in answering your message.
David Spickett david.spickett@linaro.org writes:
>> There was a big impact on the flang builds on tcwg-jade-01 (which went from 1h–2h to 10min–30min), but not on other builds.
> Given how much resource flang takes up, this is a great result even if they're the only faster builds.
That's good to know. I've been looking at the “build times” graph for other builds and they haven't changed much since July 6th (when I changed the cache sizes).
>> Since different machines have different disk sizes and free space I chose different ccache max sizes for them, as follows:
> Can you document how you set, and why you chose, the sizes? There is a bit about ccache at the end of https://linaro.atlassian.net/wiki/spaces/TCWG/pages/22343946024/LLVM+Docker+... you could put it there.
Thanks for the suggestion. I expanded that section with the information I gathered and the changes I made. Please let me know if there's anything I can improve on it.
Since the libc++ buildkites share the ccache configuration with the LLVM buildbots, I briefly mentioned that (with a link to the LLVM buildbot page) at https://linaro.atlassian.net/wiki/spaces/TCWG/pages/22405546190/Buildkite+Bo...
> I particularly want to know whether this config must be set each time or is automated. I *think* the volumes are made by hand but I didn't confirm that. I guess the cache size is set inside each container but must agree between the ones that share a ccache volume?
The config is stored in ~tcwg-buildbot/.ccache/ccache.conf. Since the shared ccache volume is mounted at ~tcwg-buildbot/.ccache, the cache size can be set manually once (either by editing the ccache.conf file or with “ccache --max-size SIZE”) and is then persisted and shared across all containers that share the same ccache volume.
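To make this concrete, here is a minimal sketch of what ends up persisted for the 100 GB aarch64 cache on tcwg-jade-01 (the real file may contain other settings):

  # ~tcwg-buildbot/.ccache/ccache.conf, stored on the shared volume
  max_size = 100G

Every container that mounts the volume reads the same file, so the limit only has to be set once per shared cache.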
I added this information to the wiki. Please let me know if it should be clarified or expanded.
I also believe that the volumes are made by hand, but I didn't confirm that either. I didn't find anything in the Dockerfiles or scripts that creates these volumes.
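If they are created by hand, I'd imagine something along these lines. This is purely a hypothetical sketch: the volume and image names are placeholders I made up, and I'm assuming the tcwg-buildbot home directory is /home/tcwg-buildbot:

  # Create a named volume on the host, once per shared cache.
  docker volume create ccache-aarch64

  # Mount it at the ccache directory in each builder container that
  # should share it.
  docker run -d \
      -v ccache-aarch64:/home/tcwg-buildbot/.ccache \
      linaro-llvm-builder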
>> - The following tcwg-llvmbot_tk1-* machines are currently unreachable so I couldn't examine them:
>> - The following tcwg-llvmbot_tk1-* machines are running an llvmbot container but no builder container, so I didn't change their ccache configuration:
> For a while we only had 3/4 actually working, which is why you see some "pretending" to be someone else. It is expected that the ones that are back online now aren't doing anything.
Ah, makes sense. Thanks for the information.
>> - Going through our ssh config file I didn't find these build bots that are listed at http://llvm.validation.linaro.org/ so I didn't check their ccache usage:
>> - clang-arm64-windows-msvc-2stage
>> - clang-arm64-windows-msvc
>> - clang-arm64-windows-msvc-2stage
>> - clang-arm64-windows-msvc
> These run on the surface laptops (https://linaro.atlassian.net/wiki/spaces/TCWG/pages/22395192116/Accessing+Wi...).
> These do not use ccache at this time (no mention on https://linaro.atlassian.net/wiki/spaces/TCWGPUB/pages/25310167965/How+to+se...).
Ok.
>> - clang-native-arm-lnt-perf
>> - clang-armv7-vfpv3-full-2stage
>> - clang-thumbv7-full-2stage
> This is another case where the "builder" name (builder meaning the configuration it builds) is served by a "worker" that has a different name. So lnt-perf runs on whatever is running the tk1-02 container (https://lab.llvm.org/buildbot/#/builders/113).
> The other 2 are on the "silent" buildmaster, for example https://lab.llvm.org/staging/#/builders/169.
Thanks for the clarifications.
> I wrote out the current layout for the tk1s in https://git.linaro.org/toolchain/jenkins-scripts.git/commit/?id=e54ac9c17326.... And yeah, it makes little sense, but we can unpick it now that we've got some more nodes online. (Short term, it is easier to move containers around on our end than to change the LLVM-side config.)
Ah, that is a very informative commit. Thanks for pointing it out.
>> - libcxx aarch64
>> - libcxx aarch64 -fno-exceptions
> jade-04 runs 2 AArch64 buildkite agents that can build either of these configs depending on who picks it up first. So that might have made it less obvious.
> Or these are references to the pre-buildkite buildbot libcxx bots but I'm pretty sure our monitoring is up to date and wouldn't include them.
Thank you again for providing this information!