Hi,
I noticed a regression report on Bugzilla [1]. Quoting from it:
Page allocation error using kernel 6.3.7-desktop-1.mga9 #1 SMP PREEMPT_DYNAMIC (Fri 09 Jun 2023 22:57:31, Key ID b742fa8b80420f66); see the backtrace in the dmesg.
cat /proc/cpuinfo
siblings : 4
core id : 1
cpu cores : 2
...
Type: regression. It worked with the previous kernel, 6.3.6 (Mon 05 Jun 2023 21:37:15, Key ID b742fa8b80420f66), before I updated today.
And then:
The first hibernation attempt resulted in the backtrace you can see in the dmesg above. My second attempt, from a text console (vt03 or so), worked without errors, and the third one I tried from the GUI/X11 again (see the debug options I had turned on). On the third attempt something strange happened: it seemed to write to disk as it should and the screen turned black, but the power LED and the power button stayed lit. Pressing the power button to wake it up had no effect, and neither did the SysRq keys (unfortunately I forgot to write 511 to /proc/sys/kernel/sysrq). After a hard power reset it booted as if it had not been hibernated. On the first hibernation attempt I could see lengthy, intermittent disk activity; on the third attempt I waited for a considerable time.
See Bugzilla for the full thread and the attached information (dmesg, journalctl, stack trace disassembly).
Unfortunately, the reporter can't provide /proc/kcore output and hasn't performed a bisection yet (he can't build a custom kernel).
Anyway, I'm adding it to regzbot (as stable-specific regression) for now:
#regzbot introduced: v6.3.6..v6.3.7 https://bugzilla.kernel.org/show_bug.cgi?id=217544
#regzbot title: page allocation error (kernel fault on hibernation involving get_zeroed_page/swsusp_write)
#regzbot link: https://bugs.mageia.org/show_bug.cgi?id=32044
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217544
Hi all, Hi Bagas S.
As the issue did not reproduce the way I would have liked (it did not reproduce at all here, not even with the same kernel version; no further comment), I have now uploaded the /proc/kcore dump together with the kernel binaries and symbol files I still had on disk to https://upload.elstel.info (this may move to something like upload.elstel.info/bugs/kernpagealloc in the future).
Regards, Elmar
On Fri, Jun 23, 2023 at 06:17:05PM +0200, Elmar Stellnberger wrote:
Hi all, Hi Bagas S.
As the issue didn't reproduce the way I would have liked (did not reproduce at all here, not even with the same kernel version; no further comment) I have now uploaded the /proc/kcore and the kernel binaries and symbol files I still had on disk at https://upload.elstel.info (This may move to something like upload.elstel.info/bugs/kernpagealloc in the future)
First, tl;dr:
A: http://en.wikipedia.org/wiki/Top_post
Q: Where do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?
Can you attach [1] to your Bugzilla report? Also, any progress on bisection?
Also, you don't need to upload full kernel images; people can take the /proc/config.gz you uploaded to Bugzilla and run `make olddefconfig` on it.
Anyway, telling regzbot:
#regzbot link: https://upload.elstel.info/kcore.xz
Thanks.
[1]: https://upload.elstel.info/kcore.xz
Hi Bagas S., Hi all
concerns: Bug 217544 - kernel fault on hibernation: get_zeroed_page/swsusp_write https://bugzilla.kernel.org/show_bug.cgi?id=217544
Bisection does not make sense here, since I cannot reproduce the issue. I packed the kernel binaries and symbol files so that gdb can be run directly on the kcore:
/usr/src/kernel-6.3.7-desktop586-1.mga9/scripts/extract-vmlinux vmlinuz-6.3.7-desktop-1.mga9 >vmlinux
file vmlinuz-6.3.7-desktop-1.mga9
vmlinuz-6.3.7-desktop-1.mga9: Linux kernel x86 boot executable bzImage, version 6.3.7-desktop-1.mga9 (iurt@ecosse.mageia.org) #1 SMP PREEMPT_DYNAMIC Fri Jun 9 17:47:53 UTC 2023, RO-rootFS, swap_dev 0X6, Normal VGA
file vmlinux
vmlinux: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, BuildID[sha1]=942674511321671b33c739cceddb1e3a48a17895, stripped
grep __alloc_pages /boot/System.map...
0xc03758... T __alloc_pages
gdb vmlinux kcore
# x/5i 0xc03758..
On Sat, Jun 24, 2023 at 08:25:39AM +0700, Bagas Sanjaya wrote:
Also, you don't need to upload full kernel images instead; people can grab /proc/config.gz you uploaded on Bugzilla and then `make olddefconfig` from it.
I strongly doubt that the same symbols would end up at the same addresses if you compiled from source, even with all Mageia-specific patches applied; we would need reproducible builds for that. Nonetheless, you are free to check afterwards whether the symbols reside at the same addresses in your System.map. I wonder whether there is a way to convert the System.map text file (which looks to me like the output of 'nm -S') back into an ELF section that could be added to the stripped vmlinux with objcopy. Shouldn't there be a script under scripts/ for this?
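For the narrower goal of turning raw addresses from a backtrace or a gdb session into symbol names, System.map can be used directly, without rebuilding an ELF symbol table. The following is a minimal sketch, assuming the usual System.map layout of one `address type name` entry per line; `load_system_map` and `resolve` are hypothetical helper names, and the addresses in the sample are made up for illustration, not taken from this report.

```python
import bisect
from typing import List, Optional, Tuple


def load_system_map(text: str) -> Tuple[List[int], List[str]]:
    """Parse 'address type name' lines (System.map layout) into two sorted lists."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr_hex, _sym_type, name = parts[0], parts[1], parts[2]
        try:
            entries.append((int(addr_hex, 16), name))
        except ValueError:
            continue  # first column is not a hex address
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def resolve(addrs: List[int], names: List[str], addr: int) -> Optional[str]:
    """Return 'symbol+0xoffset' for the closest symbol at or below addr, else None."""
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    return f"{names[i]}+{addr - addrs[i]:#x}"


# Hypothetical excerpt with invented addresses; real values must come from
# the System.map that belongs to the exact kernel build being debugged.
SAMPLE = """\
c1000000 T get_page_from_freelist
c1000100 T __alloc_pages
c1000200 T free_pages
"""

if __name__ == "__main__":
    addrs, names = load_system_map(SAMPLE)
    print(resolve(addrs, names, 0xC1000142))
```

This only yields symbol+offset pairs; it does not attach a symbol table to the stripped vmlinux, so in gdb the addresses would still be typed in by hand, as in the `x/5i` example above.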
:: Sometimes you have only one chance to catch a bug.
Cheers, Elmar
On Sat, Jun 24, 2023 at 08:25:39AM +0700, Bagas Sanjaya wrote:
On Fri, Jun 23, 2023 at 06:17:05PM +0200, Elmar Stellnberger wrote: Can you attach [1] to your Bugzilla report? Also, any report on bisection?
Pardon, what is [1]?
On 6/24/23 17:21, Elmar Stellnberger wrote:
Hi Bagas S., Hi all
concerns: Bug 217544 - kernel fault on hibernation: get_zeroed_page/swsusp_write https://bugzilla.kernel.org/show_bug.cgi?id=217544
Bisection does not make sense here, since I can not reproduce the issue. Packing the kernel binaries and symbol files was meant to invoke gdb directly on the kcore:
Thorsten: Should this be marked as invalid/inconclusive?
On Sat, Jun 24, 2023 at 08:25:39AM +0700, Bagas Sanjaya wrote:
On Fri, Jun 23, 2023 at 06:17:05PM +0200, Elmar Stellnberger wrote: Can you attach [1] to your Bugzilla report? Also, any report on bisection?
Pardon, what is [1]?
Your kcore dump.
On 24.06.23 14:15, Bagas Sanjaya wrote:
Thorsten: Should this be marked as invalid/inconclusive?
Not as invalid, as there might be a real issue here; but it's hard to say, as it is also quite possible, among other things, that something else went wrong (compiler? hardware?). Someone would have to investigate. But given that this happened with a stable kernel[1] and is impossible to reproduce, I suspect no developer will be motivated enough to do so. So it's not worth tracking[2]:
#regzbot inconclusive: impossible to reproduce
Elmar, this isn't meant as anything bad. In case this turns out to be something you can reproduce and bisect, just let us know and we'll add it back.
[1] see the sections about stable kernels https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kern...
[2] side note: due to limited resources I'm considering no longer tracking non-bisected issues in general (except those that started happening in mainline since the last mainline release), or putting them in a special category that signals "those are collected here JFYI until they are bisected, as the regression tracker due to limited resources for now can't keep a close eye on these"
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
linux-stable-mirror@lists.linaro.org