It is caused by enabling the slp-vectorizer pass. The failing tests are the following.
https://github.com/fujitsu/compiler-test-suite/blob/main/Fortran/0105/0105_0...
https://github.com/fujitsu/compiler-test-suite/blob/main/Fortran/0631/0631_0...
https://github.com/fujitsu/compiler-test-suite/blob/main/Fortran/0631/0631_0...
The compiler flags are the following.
flang -O3 -mcpu=neoverse-v1 -msve-vector-bits=scalable -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-mve-cg -DNDEBUG -fstack-arrays
0631_0051.f and 0631_0054.f seem to be miscompiled. The output of the test execution differs from the expected one.
0105_0223.f90 fails by an assertion failure in the backend.
flang: llvm/include/llvm/CodeGen/SlotIndexes.h:626: void llvm::SlotIndexes::insertMBBInMaps(llvm::MachineBasicBlock*): Assertion `unsigned(mbb->getNumber()) == MBBRanges.size() && "Blocks must be added in order"' failed.
Thank you for pointing me in the right direction. I was able to reproduce the issue in my own environment as well. It appears that it's the combination of "-mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-mve-cg" with "-fslp-vectorize" that is causing problems in this case. Based on my testing:
flang -O3 -mcpu=neoverse-v1 -mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-mve-cg -fslp-vectorize 0105_0223.f90 <- broken
flang -O3 -mcpu=neoverse-v1 -mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-mve-cg -fno-slp-vectorize 0105_0223.f90 <- working
flang -O3 -mcpu=neoverse-v1 -fslp-vectorize 0105_0223.f90 <- working
flang -O3 -mcpu=neoverse-v1 0105_0223.f90 <- working (same as above, -fslp-vectorize is enabled by -O3)
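In case it helps anyone else reproduce this, here is a rough Python sketch of how the combinations above could be run in one go. The helper names (build_commands, run_matrix) are made up for illustration, and it assumes flang is on PATH and 0105_0223.f90 is in the current directory. It covers the full 2x2 matrix, so its last entry passes -fno-slp-vectorize without the pipeliner flags, which is slightly different from the plain -O3 run listed above. It only checks whether the compiler exits successfully, so it can flag the assertion failure but not the miscompiles in 0631_0051.f and 0631_0054.f.

```python
import itertools
import shutil
import subprocess

# Flags copied from the commands above; everything else is illustrative.
BASE = ["flang", "-O3", "-mcpu=neoverse-v1"]
PIPELINER = ["-mllvm", "-aarch64-enable-pipeliner", "-mllvm", "-pipeliner-mve-cg"]
SOURCE = "0105_0223.f90"


def build_commands(source=SOURCE):
    """Return (label, argv) pairs for every pipeliner x SLP combination."""
    commands = []
    for use_pipeliner, use_slp in itertools.product([True, False], repeat=2):
        argv = list(BASE)
        if use_pipeliner:
            argv += PIPELINER
        argv.append("-fslp-vectorize" if use_slp else "-fno-slp-vectorize")
        argv.append(source)
        label = "pipeliner={}, slp={}".format(
            "on" if use_pipeliner else "off", "on" if use_slp else "off"
        )
        commands.append((label, argv))
    return commands


def run_matrix(source=SOURCE):
    """Compile with each combination; map label -> True if flang exited with 0."""
    results = {}
    for label, argv in build_commands(source):
        proc = subprocess.run(argv, capture_output=True, text=True)
        results[label] = proc.returncode == 0
    return results


if __name__ == "__main__":
    if shutil.which("flang") is None:
        # No compiler available: just show what would be run.
        for label, argv in build_commands():
            print(label, "->", " ".join(argv))
    else:
        for label, ok in run_matrix().items():
            print("{}: {}".format(label, "compiled" if ok else "FAILED"))
```

Each successful compile writes the default output file into the current directory, so it is probably best run from a scratch directory.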
I've not had any prior experience with the two machine pipeliner flags in question, but I'll reach out to some people who should know more and try to determine where the issue is coming from.
Overall I think the consensus in the flang community is that -O3 should behave the same way it does in clang, and since clang enables the slp-vectorizer by default then so should flang. I'm not sure whether an interaction with llvm backend flags such as this one here is reason enough to diverge from that, but I'll start a discussion and get some external opinions on that. I suppose it depends on whether the issue actually originates in the slp-vectorizer or in the pipeliner flags.
In the meantime, would you be able to determine whether the reasons for those flags being passed to your test suite in the first place are already covered by the slp-vectorizer? As in, does omitting "-mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-mve-cg" actually achieve measurably worse results than leaving those in alongside "-fno-slp-vectorize"? If so then it will be a very useful datapoint for determining what the priority here should be.
Thanks, Kajetan