Hi,
I noticed that bots like flang-aarch64-latest-gcc are quite slow and could benefit from enabling ccache. Could you make it available on the system so it could be turned on for all these builds?
Thanks,
While it's not visible in the zorg config, we are using ccache. We do it by setting the compiler to a script that runs the expected clang/gcc via ccache. We can certainly look at using the ccache option in zorg instead (for the first attempt it was easier to do it in a way we could control on our end).
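For illustration, the wrapper amounts to something like this (a sketch with made-up paths, not our exact script):

    #!/bin/sh
    # Installed as e.g. /usr/local/bin/cc, ahead of the real compiler in PATH.
    # Every compile is forwarded to the expected compiler through ccache.
    exec ccache /usr/bin/gcc "$@"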
Looking at our flang bots, overall 2 hours seems to be the average (the out-of-tree bot is an outlier); I don't know anything about non-Linaro flang bots. We will check whether there is some obvious bottleneck, but we have resource constraints that limit how fast we can go even with perfect caching. Are there any other bots you were interested in? We can check those too.
What build times were you expecting to see? It is useful for us to know what expectations are even if, unfortunately, we don't meet them at this time.
Thanks, David Spickett.
To add to David's answer, here is the logic that enables ccache in the Linaro-maintained buildbots: https://git.linaro.org/ci/dockerfiles.git/tree/tcwg-base/tcwg-llvmbot/run.sh...
We experimented with using zorg's CCACHE settings a few years back, and it turned out to be more robust to configure ccache at the level of the default system (well, container) compiler.
One thing to check is whether the default 5 GB cache limit fits us well. IIUC, flang builds are particularly big, and they may overflow the cache size.
-- Maxim Kuvyrkov https://www.linaro.org
On Wed, Jun 29, 2022 at 3:39 PM Maxim Kuvyrkov maxim.kuvyrkov@linaro.org wrote:
> To add to David's answer, here is the logic that enables ccache in the Linaro-maintained buildbots: https://git.linaro.org/ci/dockerfiles.git/tree/tcwg-base/tcwg-llvmbot/run.sh...
Nice trick!
> We experimented with using zorg's CCACHE settings a few years back, and it turned out to be more robust to configure ccache at the level of the default system (well, container) compiler.
> One thing to check is whether the default 5 GB cache limit fits us well. IIUC, flang builds are particularly big, and they may overflow the cache size.
Oh yeah, anything under 20 GB is likely doomed, in particular if you share the cache across configs (like one machine building gcc and clang).
Can you try to print cache statistics? Maybe tweak the job to clear the stats before each build and print them after it?
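Something along these lines around the build step would do it (a sketch; `ninja -C build` stands in for whatever the bot's real build command is):

    ccache -z          # zero the statistics counters before the build
    ninja -C build     # the bot's actual build step goes here
    ccache -s          # print hit/miss statistics for just this build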
On 29 Jun 2022, at 16:33, David Spickett david.spickett@linaro.org wrote:
> Looking at our flang bots, overall 2 hours seems to be the average (the out-of-tree bot is an outlier); I don't know anything about non-Linaro flang bots. We will check whether there is some obvious bottleneck, but we have resource constraints that limit how fast we can go even with perfect caching. Are there any other bots you were interested in? We can check those too.
> What build times were you expecting to see? It is useful for us to know what expectations are even if, unfortunately, we don't meet them at this time.
flang-x86_64-knl-linux seems to average 15-20 min here, which is more like what I would expect.
Even there they could go much faster: we could avoid building the world and only build flang and the test dependencies. Right now the bottleneck is linking all of the LLVM tools that aren't relevant for testing flang.
Compare with the way I set up the MLIR bots: https://lab.llvm.org/buildbot/#/builders/61/builds/28582. The build step there exclusively builds the binaries needed for running `check-mlir` and nothing more.
MLIR is smaller than flang, but we still have a turnaround of 3-5 min when the cache is hot.
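For illustration, the shape of that configuration is roughly this (a sketch, not the bot's exact invocation; most flags omitted):

    # Enable only the MLIR project and let the test target drive the build:
    cmake -G Ninja ../llvm \
      -DLLVM_ENABLE_PROJECTS=mlir \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_CCACHE_BUILD=ON
    # check-mlir builds only its own dependencies, then runs the tests.
    ninja check-mlir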
Hello,
I looked into ccache usage on the LLVM build bots.
Mehdi AMINI joker.eph@gmail.com writes:
> On Wed, Jun 29, 2022 at 3:39 PM Maxim Kuvyrkov maxim.kuvyrkov@linaro.org wrote:
>> We experimented with using zorg's CCACHE settings a few years back, and it turned out to be more robust to configure ccache at the level of the default system (well, container) compiler.
>> One thing to check is whether the default 5 GB cache limit fits us well. IIUC, flang builds are particularly big, and they may overflow the cache size.
> Oh yeah, anything under 20 GB is likely doomed, in particular if you share the cache across configs (like one machine building gcc and clang).
Yes, we do share the cache like that.
> Can you try to print cache statistics? Maybe tweak the job to clear the stats before each build and print them after it?
We were using the default ccache size of 5 GB on all the LLVM bots. I have now increased it. Some machines have bigger and/or emptier disks than others, so I chose different cache sizes on different build hosts. I'll provide more detailed information in a separate email.
The machine that runs the flang-aarch64-latest-gcc job (and also flang-aarch64-latest-clang, as well as other flang and clang jobs) has a big and relatively empty disk, so I increased its cache size to 100 GB.
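For reference, the change on each host boils down to this (a sketch; 100G is the value for this particular machine):

    ccache --max-size=100G   # equivalent to max_size = 100G in ccache.conf
    ccache -s                # the stats output reports the new max cache size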
> flang-x86_64-knl-linux seems to average 15-20 min here, which is more like what I would expect.
flang-aarch64-latest-gcc builds now take between 10m and 30m, with an occasional build taking 1h. flang-aarch64-latest-clang is similar.
> Even there they could go much faster: we could avoid building the world and only build flang and the test dependencies. Right now the bottleneck is linking all of the LLVM tools that aren't relevant for testing flang.
> Compare with the way I set up the MLIR bots: https://lab.llvm.org/buildbot/#/builders/61/builds/28582. The build step there exclusively builds the binaries needed for running `check-mlir` and nothing more.
> MLIR is smaller than flang, but we still have a turnaround of 3-5 min when the cache is hot.
I haven't looked into that approach.
On Wed, Jul 6, 2022 at 9:40 PM Thiago Jung Bauermann <thiago.bauermann@linaro.org> wrote:
> flang-aarch64-latest-gcc builds now take between 10m and 30m, with an occasional build taking 1h. flang-aarch64-latest-clang is similar.
That's a huge improvement! :)
> Indeed it is! Thank you for bringing this issue to our attention.
I don't know if you are also maintaining clang-aarch64-sve-vla-2stage, but it takes far too long right now (>10 hours). It notifies a large number of people, which is overly noisy right now. See: https://lab.llvm.org/buildbot/#/builders/198/builds/1234
One thing I noticed is that it seems to be missing the `depends_on_projects` setting in the buildbot configuration: that means it will include commits that touch parts of the codebase totally unrelated to what it is testing (like flang and mlir). Adding it would be a first easy step to reduce the number of unrelated changes that get flagged incorrectly.
Thanks,
Hello Mehdi,
Mehdi AMINI joker.eph@gmail.com writes:
> I don't know if you are also maintaining clang-aarch64-sve-vla-2stage, but it takes far too long right now (>10 hours).
Yes. I actually increased its ccache size on July 6th, but only from 5 GB to 20 GB, because the machine running that builder is also a development box. Looking at the build times graph, it looks like it helped a bit, but not a lot.
I have now increased the size again, to 40 GB. I'll monitor to see if there's an impact.
> It notifies a large number of people, which is overly noisy right now. See: https://lab.llvm.org/buildbot/#/builders/198/builds/1234
Does adding a “depends_on_projects” argument help reduce the number of notified people? Or is there something else that could/should be done about that?
> One thing I noticed is that it seems to be missing the `depends_on_projects` setting in the buildbot configuration: that means it will include commits that touch parts of the codebase totally unrelated to what it is testing (like flang and mlir). Adding it would be a first easy step to reduce the number of unrelated changes that get flagged incorrectly.
Thank you for the suggestion. I see that many builders running on Linaro workers don't have it; I'll prepare a patch to add “depends_on_projects” arguments to them.
Though specifically for clang-aarch64-sve-vla-2stage, IIUC its value would be ["llvm", "mlir", "clang", "flang"] (since it's meant to test flang as well), so perhaps it wouldn't change much in practice?
On Fri, Jul 22, 2022 at 5:46 AM Thiago Jung Bauermann <thiago.bauermann@linaro.org> wrote:
> Does adding a “depends_on_projects” argument help reduce the number of notified people? Or is there something else that could/should be done about that?
It will at least eliminate people who commit in unrelated projects.
> Though specifically for clang-aarch64-sve-vla-2stage, IIUC its value would be ["llvm", "mlir", "clang", "flang"] (since it's meant to test flang as well), so perhaps it wouldn't change much in practice?
I didn't know this builder would test flang as well: which stage of the build is doing so? In general I try to keep builder configs more focused, to avoid a bug in one component hiding a regression in another (for example, MLIR breaking stage 1 for this bot for a day, while in the meantime a stage-2 regression goes undetected).
In any case, this will still help for people contributing outside this list (libc++, lldb, lld, compiler-rt), unless you also need those.
Cheers,
> I didn't know this builder would test flang as well: which stage of the build is doing so?
It tests it only in stage 2; the 1-stage bot checks stage 1. So we do have some of the focus you talked about, between clang-aarch64-sve-vla (which is the 1-stage bot) and clang-aarch64-sve-vla-2stage. We also have plain AArch64 bots checking flang, but the point here is to exercise SVE codegen.
You are right that the 2-stage bot could skip building flang in stage 1, because it's not going to test it there. In theory our ccache use means this isn't a big deal (and now it really isn't, thanks again!), but we could explicitly disable it and at least save a lot of linking.
We will do our best to improve the build times, but ultimately we are limited by hardware availability, which is a more difficult problem to fix.
On Fri, Jul 22, 2022 at 1:18 PM David Spickett david.spickett@linaro.org wrote:
> It tests it only in stage 2; the 1-stage bot checks stage 1. So we do have some of the focus you talked about, between clang-aarch64-sve-vla (which is the 1-stage bot) and clang-aarch64-sve-vla-2stage. We also have plain AArch64 bots checking flang, but the point here is to exercise SVE codegen.
OK! But how is testing flang in stage 2 helpful for this? This is just testing that the clang from stage 1 is functioning properly, right? The test-suite is there for that, I think?
That is: no change to MLIR or Flang should affect the behavior of the stage-1 clang? (I'm trying to figure out whether these commits should trigger testing.)
> You are right that the 2-stage bot could skip building flang in stage 1, because it's not going to test it there. In theory our ccache use means this isn't a big deal (and now it really isn't, thanks again!), but we could explicitly disable it and at least save a lot of linking.
> We will do our best to improve the build times, but ultimately we are limited by hardware availability, which is a more difficult problem to fix.
Right, and stage 2 can't use any ccache anyway: this is tricky...
By the way, does your method of having ccache enabled globally mean it is enabled implicitly during stage 2 as well? That won't get cache hits (stage 2...), but it'll take up cache space unfortunately (and there is a slight overhead to going through the cache all the time).
> OK! But how is testing flang in stage 2 helpful for this? This is just testing that the clang from stage 1 is functioning properly, right? The test-suite is there for that, I think?
Yes, it is checking that the clang built in stage 1 generates correct SVE code, so that when that clang becomes the next release, flang works out of the box.
The test suite (meaning https://github.com/llvm/llvm-test-suite) does include some Fortran programs, but they are run in a mode where flang is used to "parse unparse": basically going source to AST to source, which is then compiled with gfortran. So you have some coverage there, but not a lot.
> That is: no change to MLIR or Flang should affect the behavior of the stage-1 clang? (I'm trying to figure out whether these commits should trigger testing.)
If SVE codegen were perfect then I'd agree. Given that it isn't, I wouldn't want to miss this scenario:
- some change to flang/mlir comes in
- this change hits some particular codegen situation
- the clang built in stage 1 emits incorrect code
Similarly, if there is a change to clang, then you compile the same flang but get some of it wrong, and hopefully the tests pick this up. We want to check the runtime behaviour of the code that the stage-1 clang generates.
Granted, it can be frustrating to be on a blame list just because your perfectly valid code hit a pre-existing bug. It's something we try to mitigate by proactively telling people "hey, we see your commit broke this, but don't worry, we know it's something else". We could reason that if a flang/mlir change made it through our other 2-stage bots or the 1-stage SVE bots, it cannot be the root of the issue, if that makes sense.
We'd still want to trigger a new 2-stage SVE build on such a change, though. So buildbot would need to be able to take the blame list and filter out flang/mlir changes (and if stage 1 of the 2-stage build broke, the stage-1 bot already reported that, so it's fine not to notify again). I'm not sure it can handle that.
> By the way, does your method of having ccache enabled globally mean it is enabled implicitly during stage 2 as well? That won't get cache hits (stage 2...), but it'll take up cache space unfortunately (and there is a slight overhead to going through the cache all the time).
Stage 2 bypasses the cache. Stage 1 uses /usr/bin/cc, which invokes ccache; stage 2 then has the compilers set directly to the clang executable built in stage 1.
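In CMake terms the arrangement looks roughly like this (a sketch; the real bot passes many more options):

    # Stage 1: /usr/bin/cc and /usr/bin/c++ are the ccache wrapper scripts.
    cmake -G Ninja -S llvm -B stage1 \
      -DCMAKE_C_COMPILER=/usr/bin/cc \
      -DCMAKE_CXX_COMPILER=/usr/bin/c++ \
      -DLLVM_ENABLE_PROJECTS=clang
    ninja -C stage1

    # Stage 2: the compilers point straight at the stage-1 clang binaries,
    # so nothing in this stage goes through ccache.
    cmake -G Ninja -S llvm -B stage2 \
      -DCMAKE_C_COMPILER=$PWD/stage1/bin/clang \
      -DCMAKE_CXX_COMPILER=$PWD/stage1/bin/clang++ \
      -DLLVM_ENABLE_PROJECTS="clang;flang;mlir"
    ninja -C stage2 check-flang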
On Fri, Jul 22, 2022 at 3:39 PM David Spickett david.spickett@linaro.org wrote:
Right, but as I understand it you're using flang and mlir HEAD as a test suite for clang; I don't think that MLIR/flang developers should get their patches flagged here (which is what the buildbots are doing). We (Google) run similar tests, but in a silent infrastructure, and we investigate issues and report them asynchronously: the difference is that the community does not bear the weight of the bot maintenance. This is exacerbated by the fact that this bot runs too slowly and batches too many commits together to provide a good signal. You should likely look into making it silent (that is, only you get notified automatically).
> Stage 2 bypasses the cache. Stage 1 uses /usr/bin/cc, which invokes ccache; stage 2 then has the compilers set directly to the clang executable built in stage 1.
Ah, of course, it's all WAI (working as intended) :)
> Right, but as I understand it you're using flang and mlir HEAD as a test suite for clang; I don't think that MLIR/flang developers should get their patches flagged here (which is what the buildbots are doing). We (Google) run similar tests, but in a silent infrastructure, and we investigate issues and report them asynchronously.
Point taken; moving to silent is a much simpler way to achieve that than filtering changes.
I'll discuss this with the team next week. Thanks again for the feedback!