[Linaro-TCWG-CI] glibc-2.42.9000-284-g27effb3d50: 9 regressions on aarch64

ci_notify@linaro.org

18 Oct 2025

11:42 a.m.

Dear contributor,

Our automatic CI has detected problems related to your patch(es). Please find some details below.

In master-aarch64, after:
  | commit glibc-2.42.9000-284-g27effb3d50
  | Author: Yury Khrustalev <yury.khrustalev@arm.com>
  | Date:   Thu Sep 25 15:54:36 2025 +0100
  |
  |     aarch64: clear ZA state of SME before clone and clone3 syscalls
  |
  |     This change adds a call to the __arm_za_disable() function immediately
  |     before the SVC instruction inside clone() and clone3() wrappers. It also
  |     adds a macro for inline clone() used in fork() and adds the same call to
  |     ... 129 lines of the commit log omitted.

Produces 9 regressions:
  |
  | regressions.sum:
  | Running gdb:gdb.threads/foll-fork-other-thread.exp ...
  | FAIL: gdb.threads/foll-fork-other-thread.exp: fork_func=fork: follow=child: target-non-stop=auto: non-stop=off: displaced-stepping=auto: bt
  | FAIL: gdb.threads/foll-fork-other-thread.exp: fork_func=fork: follow=child: target-non-stop=auto: non-stop=off: displaced-stepping=off: bt
  | FAIL: gdb.threads/foll-fork-other-thread.exp: fork_func=fork: follow=child: target-non-stop=auto: non-stop=off: displaced-stepping=on: bt
  | FAIL: gdb.threads/foll-fork-other-thread.exp: fork_func=fork: follow=child: target-non-stop=off: non-stop=off: displaced-stepping=auto: bt
  | ... and 5 more

Used configuration :
 *CI config* tcwg_gnu_native_check_gdb master-aarch64
 *configure and test flags:* none, autodetected on aarch64-unknown-linux-gnu

We track this bug report under https://linaro.atlassian.net/browse/GNU-1706. Please let us know if you have a fix.

If you have any questions regarding this report, please ask on linaro-toolchain@lists.linaro.org mailing list.

-----------------8<--------------------------8<--------------------------8<--------------------------

The information below contains the details of the failures, and the ways to reproduce a debug environment:

You can find the failure logs in *.log.1.xz files in
 * https://ci.linaro.org/job/tcwg_gnu_native_check_gdb--master-aarch64-build/17...
The full lists of regressions and improvements as well as configure and make commands are in
 * https://ci.linaro.org/job/tcwg_gnu_native_check_gdb--master-aarch64-build/17...
The list of [ignored] baseline and flaky failures are in
 * https://ci.linaro.org/job/tcwg_gnu_native_check_gdb--master-aarch64-build/17...

Current build   : https://ci.linaro.org/job/tcwg_gnu_native_check_gdb--master-aarch64-build/17...
Reference build : https://ci.linaro.org/job/tcwg_gnu_native_check_gdb--master-aarch64-build/17...

Instruction to reproduce the build : https://gitlab.com/LinaroLtd/tcwg/ci/interesting-commits/-/raw/master/glibc/...

Full commit : https://sourceware.org/git/?p=glibc.git%3Ba=commitdiff%3Bh=27effb3d50424fb96...

Yury Khrustalev

20 Oct

11:13 a.m.

Hi,

> From: ci_notify@linaro.org <ci_notify@linaro.org>
> Sent: 18 October 2025 12:42 PM
> To: Yury Khrustalev
> Cc: linaro-toolchain@lists.linaro.org
> Subject: [Linaro-TCWG-CI] glibc-2.42.9000-284-g27effb3d50: 9 regressions on aarch64
>
> Dear contributor,
>
> Our automatic CI has detected problems related to your patch(es). Please find some details below.

I've rebuilt the toolchain using the commands from reproduction_instructions.txt and then ran the tests from "foll-fork-other-thread.exp" on an FVP model using kernel 6.16 (see [1] for details of the Fast Model setup). All tests pass, including the 9 tests from the report.

Can I ask which version of QEMU was used for this build? Does it support SME?

[1]: https://inbox.sourceware.org/libc-help/aIc3ElNTSQrelCK9@arm.com/

Thanks,
Yury

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Christophe Lyon

11:21 a.m.

On Mon, 20 Oct 2025 at 13:14, Yury Khrustalev via linaro-toolchain linaro-toolchain@lists.linaro.org wrote:


> I've rebuilt the toolchain using the commands from reproduction_instructions.txt and then ran the tests from "foll-fork-other-thread.exp" on an FVP model using kernel 6.16 (see [1] for details of the Fast Model setup). All tests pass, including the 9 tests from the report.
>
> Can I ask which version of QEMU was used for this build? Does it support SME?

This build didn't use QEMU; it was executed on a Graviton 3 instance.

Thanks,

Christophe


Yury Khrustalev

11:29 a.m.

> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: 20 October 2025 12:21 PM
> To: Yury Khrustalev
> Cc: linaro-toolchain@lists.linaro.org
> Subject: Re: [Linaro-TCWG-CI] glibc-2.42.9000-284-g27effb3d50: 9 regressions on aarch64
>
>> Can I ask which version of QEMU was used for this build? Does it support SME?
>
> This build didn't use QEMU; it was executed on a Graviton 3 instance.

There are QEMU variables in the make check command... In any case, I cannot reproduce these failures: all relevant tests work correctly on both FVP and a hardware aarch64 system.

Thanks, Yury

Thiago Jung Bauermann

28 Oct

4:56 a.m.

Hello Yury,

Yury Khrustalev via linaro-toolchain linaro-toolchain@lists.linaro.org writes:


> There are QEMU variables in the make check command... In any case, I cannot reproduce these failures: all relevant tests work correctly on both FVP and a hardware aarch64 system.

Sorry for the delay. It took me a bit to reproduce it outside of our CI environment because I'm not used to glibc development, but I finally managed it on a QEMU VM with -cpu max running Ubuntu 24.04:

1. Build glibc and install according to https://sourceware.org/glibc/wiki/Testing/Builds#Building_glibc_with_intent_...

2. Change GDB testcase to wait for GDB to be attached:

--- a/gdb/testsuite/gdb.threads/foll-fork-other-thread.c
+++ b/gdb/testsuite/gdb.threads/foll-fork-other-thread.c
@@ -22,6 +22,7 @@
 #include <errno.h>
 #include <assert.h>
 #include <limits.h>
+#include <stdio.h>
 
 /* Set by GDB. */
 volatile int stop_looping = 0;
@@ -66,6 +67,13 @@ main (void)
   int i;
   int ret;
   pthread_t thread;
+  volatile int gdb_attached = 0;
+
+
+  printf ("PID = %d\n", getpid ());
+
+  while (gdb_attached == 0)
+    sleep_a_bit ();
 
   alarm (60);

I'm attaching the full .c file, to simplify things.

3. Build testcase according to https://sourceware.org/glibc/wiki/Testing/Builds#Compile_against_glibc_in_an...:

$ SYSROOT=/path/to/glibc-install
$ gcc -L${SYSROOT}/usr/lib64 \
      -I${SYSROOT}/include \
      --sysroot=${SYSROOT} \
      -Wl,-rpath=${SYSROOT}/lib64 \
      -Wl,--dynamic-linker=${SYSROOT}/lib/ld-linux-aarch64.so.1 \
      -g \
      -pthread \
      -o foll-fork-other-thread \
      -DFORK_FUNC=fork \
      foll-fork-other-thread.c

4. Create the following GDB commands file:

$ cat gdb-commands.txt
set libthread-db-search-path /path/to/glibc-install/lib64
frame function main
set gdb_attached = 1
set displaced-stepping auto
catch fork
continue
thread 1
set scheduler-locking on
break foll-fork-other-thread.c:85
continue
set scheduler-locking off
delete breakpoints
set follow-fork child
next
bt

5. Run the testcase and attach GDB to it (you can use the distro's GDB):

$ ./foll-fork-other-thread &
[1] 32660
PID = 32660
$ gdb -p 32660 -x gdb-commands.txt
Attaching to process 32660
Reading symbols from /home/bauermann/scratchpad/GNU-1706/foll-fork-other-thread...
Reading symbols from /home/thiago.bauermann/tmp/glibc-install/lib64/libc.so.6...
Reading symbols from /home/thiago.bauermann/tmp/glibc-install/lib/ld-linux-aarch64.so.1...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/home/thiago.bauermann/tmp/glibc-install/lib64/libthread_db.so.1".
__internal_syscall_cancel (a1=a1@entry=0, a2=a2@entry=0, a3=a3@entry=281474087606464, a4=a4@entry=0, a5=a5@entry=0, a6=a6@entry=0, nr=nr@entry=115) at cancellation.c:40
⚠️ warning: 40 cancellation.c: No such file or directory
#5 0x0000aaaabcdf0dac in main () at foll-fork-other-thread.c:76
76 sleep_a_bit ();
Catchpoint 1 (fork)
[New Thread 0xffffacf0f1a0 (LWP 32682)]
[Switching to Thread 0xffffacf0f1a0 (LWP 32682)]

Thread 2 "foll-fork-other" hit Catchpoint 1 (forked process 32683), arch_fork (ctid=0xffffacf0f270) at ../sysdeps/unix/sysv/linux/arch-fork.h:41
⚠️ warning: 41 ../sysdeps/unix/sysv/linux/arch-fork.h: No such file or directory
[Switching to thread 1 (Thread 0xffffad0fdf60 (LWP 32660))]
#0 __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/aarch64/syscall_cancel.S:50
⚠️ warning: 50 ../sysdeps/unix/sysv/linux/aarch64/syscall_cancel.S: No such file or directory
Breakpoint 2 at 0xaaaabcdf0e0c: file foll-fork-other-thread.c, line 85.

Thread 1 "foll-fork-other" hit Breakpoint 2, main () at foll-fork-other-thread.c:85
85 sleep_a_bit (); /* break here */
[Attaching after Thread 0xffffacf0f1a0 (LWP 32682) fork to child process 32683]
[New inferior 2 (process 32683)]
[Detaching after fork from parent process 32660]
[Inferior 1 (process 32660) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/home/thiago.bauermann/tmp/glibc-install/lib64/libthread_db.so.1".
⚠️ warning: Not resuming: switched threads before following fork child.
[Switching to Thread 0xffffacf0f1a0 (LWP 32683)]
arch_fork (ctid=0xffffacf0f270) at ../sysdeps/unix/sysv/linux/arch-fork.h:41
⚠️ warning: 41 ../sysdeps/unix/sysv/linux/arch-fork.h: No such file or directory
#0 arch_fork (ctid=0xffffacf0f270) at ../sysdeps/unix/sysv/linux/arch-fork.h:41
#1 __GI__Fork () at ../sysdeps/nptl/_Fork.c:33
#2 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

The "corrupt stack" error above is the detected regression.
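For anyone who wants to check a saved GDB transcript for this signature without reading it by eye, here is a minimal sketch. It only looks for the two symptoms visible in the session above (the "(corrupt stack?)" hint and a frame whose pc is all zeros). The helper name and the idea of scanning a transcript are assumptions for illustration and are not part of the GDB testsuite.

```python
import re

# Hint text copied from the GDB session above.
CORRUPT_HINT = "(corrupt stack?)"

# Matches a frame line whose pc is all zeros, e.g. "#2 0x0000000000000000 in ?? ()".
ZERO_PC_FRAME = re.compile(r"^#\d+\s+0x0+\s+in \?\? \(\)", re.MULTILINE)


def backtrace_looks_corrupt(bt_output: str) -> bool:
    """Return True if a saved `bt` transcript shows either symptom (hypothetical helper)."""
    return CORRUPT_HINT in bt_output or ZERO_PC_FRAME.search(bt_output) is not None


# Tail of the failing backtrace, as reported in this thread.
BAD_BT = """\
#0 arch_fork (ctid=0xffffacf0f270) at ../sysdeps/unix/sysv/linux/arch-fork.h:41
#1 __GI__Fork () at ../sysdeps/nptl/_Fork.c:33
#2 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
"""

# First frames of the backtrace from the parent commit, as reported in this thread.
GOOD_BT = """\
#0 arch_fork (ctid=0xffffa806f270) at ../sysdeps/unix/sysv/linux/arch-fork.h:43
#1 __GI__Fork () at ../sysdeps/nptl/_Fork.c:33
#2 0x0000ffffa81297ec in __libc_fork () at fork.c:75
"""

print(backtrace_looks_corrupt(BAD_BT))
print(backtrace_looks_corrupt(GOOD_BT))
```

The check is deliberately narrow: it flags only what this report shows and makes no claim about other ways a backtrace can go wrong.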

I reproduced the problem also on today's trunk (commit 013f5167b9c0). Commit b4b713bd8921 (the parent of your commit) doesn't reproduce the problem. Here is the backtrace from it:

#0 arch_fork (ctid=0xffffa806f270) at ../sysdeps/unix/sysv/linux/arch-fork.h:43
#1 __GI__Fork () at ../sysdeps/nptl/_Fork.c:33
#2 0x0000ffffa81297ec in __libc_fork () at fork.c:75
#3 0x0000aaaacaf20bc0 in gdb_forker_thread (arg=0x0) at foll-fork-other-thread.c:35
#4 0x0000ffffa80f09cc in start_thread (arg=0x0) at pthread_create.c:448
#5 0x0000ffffa814f94c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone3.S:72

Looking at frame #1 from trunk shows:

(gdb) info frame 1
Stack frame at 0xffffacf0e7a0:
 pc = 0xffffacfc52b8 in __GI__Fork (../sysdeps/nptl/_Fork.c:33); saved pc = 0x0
 called by frame at 0xffffacf0e7a0, caller of frame at 0xffffacf0e7a0
 source language c.
 Arglist at 0xffffacf0e770, args:
 Locals at 0xffffacf0e770, Previous frame's sp is 0xffffacf0e7a0
 Saved registers:
  x19 at 0xffffacf0e780, x20 at 0xffffacf0e788, x29 at 0xffffacf0e770

Notice how it says "saved pc = 0x0". That doesn't look good.

Compare with frame 1 from the parent commit b4b713bd8921, which has a good backtrace:

(gdb) info frame 1
Stack frame at 0xffffa806e7a0:
 pc = 0xffffa812526c in __GI__Fork (../sysdeps/nptl/_Fork.c:33); saved pc = 0xffffa81297ec
 called by frame at 0xffffa806e8d0, caller of frame at 0xffffa806e7a0
 source language c.
 Arglist at 0xffffa806e770, args:
 Locals at 0xffffa806e770, Previous frame's sp is 0xffffa806e7a0
 Saved registers:
  x19 at 0xffffa806e780, x20 at 0xffffa806e788, x29 at 0xffffa806e770, x30 at 0xffffa806e778

Notice that it has a valid "saved pc". Also interesting is the list of saved registers, which includes x30 (aka the link register).
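The comparison between the two `info frame` dumps can also be done mechanically. The sketch below parses the "saved pc" value and the "Saved registers" list from each dump and prints which registers the failing frame is missing. The parser is a hypothetical helper written for this thread's output only; it is not a general parser for GDB output.

```python
import re


def parse_info_frame(text: str) -> dict:
    """Extract the saved pc and the saved-register locations from `info frame` text."""
    pc_match = re.search(r"saved pc = (0x[0-9a-fA-F]+)", text)
    # Only look after "Saved registers:" so "Stack frame at ..." and similar
    # lines are not mistaken for register locations.
    _, _, tail = text.partition("Saved registers:")
    regs = {name: int(addr, 16)
            for name, addr in re.findall(r"(\w+) at (0x[0-9a-fA-F]+)", tail)}
    saved_pc = int(pc_match.group(1), 16) if pc_match else None
    return {"saved_pc": saved_pc, "saved_regs": regs}


# Frame 1 on trunk (failing), copied from this thread.
BAD_FRAME = """\
Stack frame at 0xffffacf0e7a0:
 pc = 0xffffacfc52b8 in __GI__Fork (../sysdeps/nptl/_Fork.c:33); saved pc = 0x0
 called by frame at 0xffffacf0e7a0, caller of frame at 0xffffacf0e7a0
 Saved registers:
  x19 at 0xffffacf0e780, x20 at 0xffffacf0e788, x29 at 0xffffacf0e770
"""

# Frame 1 on the parent commit b4b713bd8921 (good), copied from this thread.
GOOD_FRAME = """\
Stack frame at 0xffffa806e7a0:
 pc = 0xffffa812526c in __GI__Fork (../sysdeps/nptl/_Fork.c:33); saved pc = 0xffffa81297ec
 called by frame at 0xffffa806e8d0, caller of frame at 0xffffa806e7a0
 Saved registers:
  x19 at 0xffffa806e780, x20 at 0xffffa806e788, x29 at 0xffffa806e770, x30 at 0xffffa806e778
"""

bad = parse_info_frame(BAD_FRAME)
good = parse_info_frame(GOOD_FRAME)

# Registers that have a saved location in the good frame but not in the bad one.
missing = sorted(set(good["saved_regs"]) - set(bad["saved_regs"]))
print("missing saved registers:", missing)
print("bad saved pc:", hex(bad["saved_pc"]), "good saved pc:", hex(good["saved_pc"]))
```

On the two dumps above, the only register missing from the failing frame is x30, which matches the observation that the link register is the interesting difference.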

-- Thiago

Thiago Jung Bauermann

5:50 a.m.

Thiago Jung Bauermann thiago.bauermann@linaro.org writes:

> Sorry for the delay. It took me a bit to reproduce it outside of our CI environment because I'm not used to glibc development, but I finally managed it on a QEMU VM with -cpu max running Ubuntu 24.04:

It turns out I can also reproduce it on an Ampere machine with a Neoverse-N1 processor.

I was using QEMU with -cpu max because I assumed that the regression was related to SME state, but it seems that it isn't.
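As a side note on the "does it support SME?" question: on Linux, one way to check is to look for the `sme` flag in the `Features` line of /proc/cpuinfo. The sketch below does that on a string, so it can be tried on any machine. The two sample strings are made up for illustration and are not taken from the machines in this thread.

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Return the flags from the first "Features" line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def has_sme(cpuinfo_text: str) -> bool:
    """True if the exact "sme" flag is listed (flags such as "smei16i64" alone do not count)."""
    return "sme" in cpu_features(cpuinfo_text)


# Hypothetical samples, NOT copied from any machine mentioned in this thread.
SAMPLE_WITHOUT_SME = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics\n"
SAMPLE_WITH_SME = "processor\t: 0\nFeatures\t: fp asimd aes sve sve2 sme smei16i64\n"

print(has_sme(SAMPLE_WITHOUT_SME))
print(has_sme(SAMPLE_WITH_SME))
```

To use it on a real system, pass the contents of /proc/cpuinfo to `has_sme()`. Since the failure also reproduces on a Neoverse-N1 machine, a negative answer there would be consistent with the conclusion that the regression is not tied to SME state.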

-- Thiago

Yury Khrustalev

11:50 a.m.

Hi Thiago,

> From: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
> Sent: 28 October 2025 05:50 AM
> To: Yury Khrustalev via linaro-toolchain
> Cc: Christophe Lyon; Yury Khrustalev; nd
> Subject: Re: [Linaro-TCWG-CI] glibc-2.42.9000-284-g27effb3d50: 9 regressions on aarch64
>
> Thiago Jung Bauermann <thiago.bauermann@linaro.org> writes:
>
>> Sorry for the delay. It took me a bit to reproduce it outside of our CI environment because I'm not used to glibc development, but I finally managed it on a QEMU VM with -cpu max running Ubuntu 24.04:
>
> It turns out I can also reproduce it on an Ampere machine with a Neoverse-N1 processor.
>
> I was using QEMU with -cpu max because I assumed that the regression was related to SME state, but it seems that it isn't.

Thanks for the updated instructions for reproducing this regression. They really helped: I've managed to reproduce it and find the cause. An incorrect CFI directive corrupts the state that GDB uses to work out call stack information.
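To illustrate why a CFI problem shows up as a broken backtrace, here is a toy model of how an unwinder recovers the caller's pc on aarch64 from a rule for x30. It is only a sketch: the thread does not say which directive was wrong, the function and rule names are hypothetical, and treating a register without a rule as unrecoverable is a simplifying assumption (GDB's real unwinder is far more involved and reported "saved pc = 0x0" rather than nothing). The addresses are the ones from the `info frame` output of the good build earlier in this thread.

```python
from typing import Dict, Optional, Tuple

# A register rule ("offset", N) means: the caller's value is stored at CFA + N.
Rule = Tuple[str, int]


def caller_pc(cfa: int, rules: Dict[str, Rule], memory: Dict[int, int]) -> Optional[int]:
    """Recover the return address via the rule for x30 (toy model, hypothetical helper)."""
    rule = rules.get("x30")
    if rule is None:
        # Assumption of this sketch: no rule means the value cannot be recovered.
        return None
    kind, offset = rule
    if kind != "offset":
        return None
    return memory.get(cfa + offset)


# Values from frame 1 of the good build: the frame address (CFA) and the
# stack slot where x30 was reported as saved.
CFA = 0xFFFFA806E7A0
memory = {0xFFFFA806E778: 0xFFFFA81297EC}

# x29 at CFA - 0x30 and x30 at CFA - 0x28 match the "Saved registers" list.
good_rules = {"x29": ("offset", -0x30), "x30": ("offset", -0x28)}
# Same frame, but with the x30 rule lost, as in the failing build's register list.
bad_rules = {"x29": ("offset", -0x30)}

print(hex(caller_pc(CFA, good_rules, memory)))
print(caller_pc(CFA, bad_rules, memory))
```

With the rule for x30 in place, the model returns the same saved pc that GDB showed for the good build; without it, there is nothing to walk back to, which is the situation the failing backtrace ends in.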

Patch [1] is on the way.

I'm still not sure why GDB tests were passing when I ran them via `make check`, so I can't confirm if the fix will resolve these regressions in the CI. It would be great if this could be confirmed from your side.

[1]: https://inbox.sourceware.org/libc-alpha/20251028115009.1308287-1-yury.khrust...

Thanks, Yury
