Linaro recently started building and testing the stable branches with clang. The stable 4.9 branch kernel built with clang 10 crashed during boot on x86 and qemu_x86. We do not have baseline results to compare with.
steps to build and boot:
# build kernel with tuxmake
# sudo pip3 install -U tuxmake
# tuxmake --runtime docker --target-arch x86 --toolchain clang-10 --kconfig defconfig --kconfig-add https://builds.tuxbuild.com/1kgtX7QEDmhvj6OfbZBdlGaEple/config
# boot qemu_x86_64
# /usr/bin/qemu-system-x86_64 -cpu host -enable-kvm -nographic -net nic,model=virtio,macaddr=DE:AD:BE:EF:66:14 -net tap -m 1024 -monitor none -kernel kernel/bzImage --append "root=/dev/sda rootwait console=ttyS0,115200" -hda rootfs/rpb-console-image-lkft-intel-corei7-64-20201022181159-3085.rootfs.ext4 -m 4096 -smp 4 -nographic
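The QEMU command line above is long enough that repeated options are easy to miss. The following is a minimal sketch, assuming Python 3 with only the standard library, that splits both commands the way a shell would and reports any option given more than once. The names `TUXMAKE_CMD`, `QEMU_CMD` and `repeated_options` are illustrative and not part of the report; the command strings are copied from the steps above with the leading "# " dropped.

```python
import shlex
from collections import Counter

# Commands copied from the report (leading "# " stripped).
TUXMAKE_CMD = (
    "tuxmake --runtime docker --target-arch x86 --toolchain clang-10 "
    "--kconfig defconfig "
    "--kconfig-add https://builds.tuxbuild.com/1kgtX7QEDmhvj6OfbZBdlGaEple/config"
)
QEMU_CMD = (
    "/usr/bin/qemu-system-x86_64 -cpu host -enable-kvm -nographic "
    "-net nic,model=virtio,macaddr=DE:AD:BE:EF:66:14 -net tap -m 1024 "
    "-monitor none -kernel kernel/bzImage "
    '--append "root=/dev/sda rootwait console=ttyS0,115200" '
    "-hda rootfs/rpb-console-image-lkft-intel-corei7-64-20201022181159-3085.rootfs.ext4 "
    "-m 4096 -smp 4 -nographic"
)


def repeated_options(command):
    """Return {option: count} for every option that appears more than once."""
    argv = shlex.split(command)
    # Skip argv[0] (the program) and count only tokens that look like options.
    counts = Counter(tok for tok in argv[1:] if tok.startswith("-"))
    return {opt: n for opt, n in counts.items() if n > 1}


if __name__ == "__main__":
    for name, cmd in (("tuxmake", TUXMAKE_CMD), ("qemu", QEMU_CMD)):
        print(name, repeated_options(cmd))
```

For the QEMU line this flags `-net`, `-m` and `-nographic`. The two `-net` options configure different things (the NIC and the tap backend), whereas `-m 1024` and `-m 4096` conflict; the report does not say which value the guest actually ran with.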
Crash log:
---------------
[ 14.121499] Freeing unused kernel memory: 1896K
[ 14.126962] random: fast init done
[ 14.206005] PANIC: double fault, error_code: 0x0
[ 14.210633] CPU: 1 PID: 1 Comm: systemd Not tainted 4.9.246-rc1 #2
[ 14.216809] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.2 05/23/2018
[ 14.224196] task: ffff88026e2c0000 task.stack: ffffc90000020000
[ 14.230105] RIP: 0010:[<ffffffff8117ec2b>] [<ffffffff8117ec2b>] proc_dostring+0x13b/0x1e0
[ 14.238374] RSP: 0018:000000000000000c EFLAGS: 00010297
[ 14.243676] RAX: 00005638939fb850 RBX: 000000000000000c RCX: 00005638939fb850
[ 14.250799] RDX: 000000000000000c RSI: 0000000000000000 RDI: 000000000000007f
[ 14.257925] RBP: ffffc90000023d98 R08: ffffc90000023ef8 R09: 00005638939fb850
[ 14.265049] R10: 0000000000000000 R11: ffffffff8117f9e0 R12: ffffffff82479cf0
[ 14.272171] R13: ffffc90000023ef8 R14: ffffc90000023dd8 R15: 000000000000007f
[ 14.279298] FS: 00007f57fbce8840(0000) GS:ffff880277880000(0000) knlGS:0000000000000000
[ 14.287384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.293120] CR2: fffffffffffffff8 CR3: 000000026d58a000 CR4: 0000000000360670
[ 14.300243] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.307368] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 14.314491] Stack:
[ 14.316504] Call Trace:
[ 14.318955] Code: c3 49 8b 10 31 f6 48 01 da 49 89 10 49 83 3e 00 74 49 41 83 c7 ff 49 63 ff 4c 89 c9 0f 1f 40 00 48 39 fe 73 36 48 89 c8 48 89 dc <e8> b0 9d 3a 00 85 c0 0f 85 8c 00 00 00 84 d2 74 1f 80 fa 0a 74
[ 14.338906] Kernel panic - not syncing: Machine halted.
[ 14.344123] CPU: 1 PID: 1 Comm: systemd Not tainted 4.9.246-rc1 #2
[ 14.350291] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.2 05/23/2018
[ 14.357677] ffff880277888e80 ffffffff81518ae9 ffff880277888e98 ffffffff82971a10
[ 14.365129] 000000000000000f 0000000000000000 0000000000000086 ffffffff820c5d57
[ 14.372584] ffff880277888f08 ffffffff81175736 0000003000000008 ffff880277888f18
[ 14.380038] Call Trace:
[ 14.382481] <#DF>
[ 14.384406] [<ffffffff81518ae9>] dump_stack+0xa9/0x100
[ 14.389641] [<ffffffff81175736>] panic+0xe6/0x2a0
[ 14.394432] [<ffffffff810c9911>] df_debug+0x31/0x40
[ 14.399389] [<ffffffff81096312>] do_double_fault+0x102/0x140
[ 14.405128] [<ffffffff81ccc987>] double_fault+0x27/0x30
[ 14.410440] [<ffffffff8117f9e0>] ? proc_put_long+0xc0/0xc0
[ 14.416004] [<ffffffff8117ec2b>] ? proc_dostring+0x13b/0x1e0
[ 14.421739] <EOE>
[ 14.423703] Kernel Offset: disabled
[ 14.427209] ---[ end Kernel panic - not syncing: Machine halted.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
full test log,
https://lkft.validation.linaro.org/scheduler/job/1978901#L916
https://lkft.validation.linaro.org/scheduler/job/1980839#L578
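One detail in the register dump is easy to check mechanically: RSP holds 0xc, the same value as RBX, while RBP still points into the task stack. The bytes just before the faulting call in the Code: line (`48 89 dc`) appear to decode to `mov %rbx,%rsp`, which would be consistent with that. The following is a minimal sketch, assuming Python 3, that parses an excerpt of the log and tests whether a register lies within the task stack. The helper names are illustrative, and `THREAD_SIZE` of 16 KiB is an assumption about this configuration, not something the log states.

```python
import re

# Excerpt of the crash log above; only the lines this sketch needs.
OOPS = """\
[ 14.224196] task: ffff88026e2c0000 task.stack: ffffc90000020000
[ 14.238374] RSP: 0018:000000000000000c EFLAGS: 00010297
[ 14.243676] RAX: 00005638939fb850 RBX: 000000000000000c RCX: 00005638939fb850
[ 14.250799] RDX: 000000000000000c RSI: 0000000000000000 RDI: 000000000000007f
[ 14.257925] RBP: ffffc90000023d98 R08: ffffc90000023ef8 R09: 00005638939fb850
"""

# ASSUMPTION: 16 KiB kernel stacks; adjust if the build uses another size.
THREAD_SIZE = 16 * 1024

# Matches "RAX: <16 hex>" and "RSP: 0018:<16 hex>" (optional segment prefix).
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): (?:[0-9a-f]{4}:)?([0-9a-f]{16})\b")


def parse_registers(log):
    """Return {register name: integer value} for 64-bit registers in the log."""
    return {name: int(value, 16) for name, value in REG_RE.findall(log)}


def stack_base(log):
    """Return the task.stack address printed in the oops header."""
    match = re.search(r"task\.stack: ([0-9a-f]{16})", log)
    if match is None:
        raise ValueError("no task.stack line found")
    return int(match.group(1), 16)


def on_task_stack(log, register):
    """True if the register value lies in [task.stack, task.stack + THREAD_SIZE)."""
    base = stack_base(log)
    return base <= parse_registers(log)[register] < base + THREAD_SIZE


if __name__ == "__main__":
    regs = parse_registers(OOPS)
    print("RSP on task stack:", on_task_stack(OOPS, "RSP"))
    print("RBP on task stack:", on_task_stack(OOPS, "RBP"))
    print("RSP == RBX:", regs["RSP"] == regs["RBX"])
```

This only confirms that the stack pointer was invalid when the call executed; it does not identify which source construct made the compiler emit that sequence.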
Is the mainline 4.9 tree supposed to work with clang? I didn't think that upstream effort started until 4.19 or so.
thanks,
greg k-h
We have been building and boot testing the mainline 4.9 tree for quite some time. This issue appears to be exposed by the rootfs that Linaro is using for testing; ours is incredibly simple (prints the version string then shuts down, there is no systemd or complex init).
Some initial notes, I am not sure how much time I will have to look at this in the near future:
1. This does not happen with the same configuration file on linux-4.14.y.
2. This happens with the latest version of clang on linux-4.9.y.
3. Bisecting v4.9 to v4.14 will be rather difficult because clang support was backported to 4.9 somewhere in the 4.9.130s.
There could be a clang backport missing or a bug was unintentionally fixed somewhere else.
Cheers, Nathan
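Since 4.14.y boots and 4.9.y does not, the search described in the notes above runs in the "find the fix" direction. A minimal sketch of that search, assuming Python 3 and a linear, oldest-to-newest list of commits; `first_fixed` and `boots_ok` are hypothetical names, and `boots_ok` stands in for "build with clang, boot the rootfs, report whether it survived".

```python
def first_fixed(commits, boots_ok):
    """Return the first commit for which boots_ok(commit) is True.

    Assumes commits are ordered oldest to newest, commits[0] crashes,
    commits[-1] boots, and the result flips from False to True exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[hi]


if __name__ == "__main__":
    # Toy history: commits c0..c9, with the (hypothetical) fix landing in c6.
    history = [f"c{i}" for i in range(10)]
    print(first_fixed(history, lambda c: int(c[1:]) >= 6))
```

With git, the same idea is usually expressed through `git bisect start --term-old=broken --term-new=fixed`. The single-flip assumption is what breaks down here: commits that predate clang support cannot be built with clang at all, so they are neither "broken" nor "fixed" in the sense the search needs.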
On Wed, Nov 25, 2020 at 10:38 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
Is the mainline 4.9 tree supposed to work with clang? I didn't think that upstream effort started until 4.19 or so.
(For historical records, separate from the initial bug report that started this thread)
I consider 785f11aa595b ("kbuild: Add better clang cross build support") to be the starting point of a renewed effort to upstream clang support. 785f11aa595b landed in v4.12-rc1. I think most patches landed between there and 4.15 (would have been my guess). From there, support was backported to 4.14, 4.9, and 4.4 for x86_64 and aarch64. We still have CI coverage of those branches+arches with Clang today. Pixel 2 shipped with 4.4+clang, Pixel 3 and 3a with 4.9+clang, Pixel 4 and 4a with 4.14+clang. CrOS has also shipped clang built kernels since 4.4+.
Thanks for the info. Naresh, does this help explain why maybe testing these kernel branches with clang might not be the best thing to do?
greg k-h
It is clear now.
FYI, in light of this, LKFT will not test 4.14 and older branches with clang.
- Naresh
On Tue, Dec 1, 2020 at 12:19 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
Thanks for the info. Naresh, does this help explain why maybe testing these kernel branches with clang might not be the best thing to do?
On the contrary, I think it's very much worthwhile to test these branches with Clang, particularly since CrOS has been shipping x86_64 devices built with Clang since 4.4.y. This looks like a problem that has potentially been fixed, but the fix has not yet been identified and backported. It would be good for us to identify and fix the issue before it becomes a problem for CrOS.
Though, it looks like CrOS just skipped 4.9...? Looking at: https://chromium.googlesource.com/chromiumos/third_party/kernel/+refs I don't see a chromeos-4.9 branch.
That said, I still find such reports helpful to track.
linux-stable-mirror@lists.linaro.org