Greetings fellas,
I have encountered a critical bug in the Linux vanilla kernel that
leads to a kernel panic during the shutdown or reboot process. The
issue arises after all services, including `journald`, have been
stopped. As a result, the machine fails to complete the shutdown or
reboot procedure, effectively causing the system to hang and not shut
down or reboot.
Here are the details of the issue:
- Affected Versions: Before kernel version 6.8.10, the bug caused a
quick display of a kernel trace dump before the shutdown/reboot
completed. Starting from version 6.8.10 and continuing into version
6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
preventing the shutdown or reboot from completing and leaving the
machine stuck.
- Symptoms:
- In normal shutdown/reboot scenarios, the kernel trace dump briefly
appears as the last message on the screen.
- In rescue mode, the kernel panic message is displayed. Normally it
is not shown.
Since `journald` is stopped before this issue occurs, no textual logs
are available. However, I have captured two pictures illustrating
these related issues, which I am attaching to this email for your
reference. Also added my custom kernel config.
Thank you for your attention to this matter. Please let me know if any
additional information is required to assist in diagnosing and
resolving this bug.
Best regards,
Ilkka Naulapää
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8031b58c3a9b1db3ef68b3bd749fbee2e1e1aaa3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061719-prewashed-wimp-a695@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
8031b58c3a9b ("mptcp: ensure snd_una is properly initialized on connect")
fb7a0d334894 ("mptcp: ensure snd_nxt is properly initialized on connect")
54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
a42cf9d18278 ("mptcp: poll allow write call before actual connect")
d98a82a6afc7 ("mptcp: handle defer connect in mptcp_sendmsg")
3e5014909b56 ("mptcp: cleanup MPJ subflow list handling")
3d1d6d66e156 ("mptcp: implement support for user-space disconnect")
b29fcfb54cd7 ("mptcp: full disconnect implementation")
3ce0852c86b9 ("mptcp: enforce HoL-blocking estimation")
7cd2802d7496 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8031b58c3a9b1db3ef68b3bd749fbee2e1e1aaa3 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Fri, 7 Jun 2024 17:01:48 +0200
Subject: [PATCH] mptcp: ensure snd_una is properly initialized on connect
This is strictly related to commit fb7a0d334894 ("mptcp: ensure snd_nxt
is properly initialized on connect"). It turns out that syzkaller can
trigger the retransmit after fallback and before processing any other
incoming packet - so that snd_una is still left uninitialized.
Address the issue explicitly initializing snd_una together with snd_nxt
and write_seq.
Suggested-by: Mat Martineau <martineau(a)kernel.org>
Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect")
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1a…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 96b113854bd3..bb7dca8aa2d9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3740,6 +3740,7 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
WRITE_ONCE(msk->write_seq, subflow->idsn);
WRITE_ONCE(msk->snd_nxt, subflow->idsn);
+ WRITE_ONCE(msk->snd_una, subflow->idsn);
if (likely(!__mptcp_check_fallback(msk)))
MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEACTIVE);
After ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is
set to down/up") to not reset from usbnet_open after the reset from
usbnet_probe at initialization stage to speed up this, some issues have
been reported.
It seems to happen that if the initialization is slower, and some time
passes between the probe operation and the open operation, the second reset
from open is necessary too to have the device working. The reason is that
if there is no activity with the phy, this is "disconnected".
In order to improve this, the solution is to detect when the phy is
"disconnected", and we can use the phy status register for this. So we will
only reset the device from reset operation in this situation, that is, only
if necessary.
The same bahavior is happening when the device is stopped (link set to
down) and later is restarted (link set to up), so if the phy keeps working
we only need to enable the mac again, but if enough time passes between the
device stop and restart, reset is necessary, and we can detect the
situation checking the phy status register too.
cc: stable(a)vger.kernel.org # 6.6+
Fixes: ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is set to down/up")
Reported-by: Yongqin Liu <yongqin.liu(a)linaro.org>
Reported-by: Antje Miederhöfer <a.miederhoefer(a)gmx.de>
Reported-by: Arne Fitzenreiter <arne_f(a)ipfire.org>
Tested-by: Yongqin Liu <yongqin.liu(a)linaro.org>
Tested-by: Antje Miederhöfer <a.miederhoefer(a)gmx.de>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
drivers/net/usb/ax88179_178a.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index 51c295e1e823..c2fb736f78b2 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -174,7 +174,6 @@ struct ax88179_data {
u32 wol_supported;
u32 wolopts;
u8 disconnecting;
- u8 initialized;
};
struct ax88179_int_data {
@@ -1678,12 +1677,21 @@ static int ax88179_reset(struct usbnet *dev)
static int ax88179_net_reset(struct usbnet *dev)
{
- struct ax88179_data *ax179_data = dev->driver_priv;
+ u16 tmp16;
- if (ax179_data->initialized)
+ ax88179_read_cmd(dev, AX_ACCESS_PHY, AX88179_PHY_ID, GMII_PHY_PHYSR,
+ 2, &tmp16);
+ if (tmp16) {
+ ax88179_read_cmd(dev, AX_ACCESS_MAC, AX_MEDIUM_STATUS_MODE,
+ 2, 2, &tmp16);
+ if (!(tmp16 & AX_MEDIUM_RECEIVE_EN)) {
+ tmp16 |= AX_MEDIUM_RECEIVE_EN;
+ ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_MEDIUM_STATUS_MODE,
+ 2, 2, &tmp16);
+ }
+ } else {
ax88179_reset(dev);
- else
- ax179_data->initialized = 1;
+ }
return 0;
}
--
2.45.1
A small prescaler is beneficial, as this improves the resolution of the
duty_cycle configuration. However if the prescaler is too small, the
maximal possible period becomes considerably smaller than the requested
value.
One situation where this goes wrong is the following: With a parent
clock rate of 208877930 Hz and max_arr = 0xffff = 65535, a request for
period = 941243 ns currently results in PSC = 1. The value for ARR is
then calculated to
PSC = 941243 * 208877930 / (1000000000 * 2) - 1 = 98301
This value is bigger than 65535 however and so doesn't fit into the
respective register. In this particular case the PWM was configured for
a period of 313733.4806027616 ns (with ARR = 98301 & 0xffff). Even if
ARR was configured to its maximal value, only period = 627495.6861167669
ns would be achievable.
Fix the calculation accordingly and adapt the comment to match the new
algorithm.
With the calculation fixed the above case results in PSC = 2 and so an
actual period of 941229.1667195285 ns.
Fixes: 8002fbeef1e4 ("pwm: stm32: Calculate prescaler with a division instead of a loop")
Cc: stable(a)vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
---
drivers/pwm/pwm-stm32.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c
index 3e7b2a8e34e7..2de7195e43a9 100644
--- a/drivers/pwm/pwm-stm32.c
+++ b/drivers/pwm/pwm-stm32.c
@@ -321,17 +321,24 @@ static int stm32_pwm_config(struct stm32_pwm *priv, unsigned int ch,
* First we need to find the minimal value for prescaler such that
*
* period_ns * clkrate
- * ------------------------------
+ * ------------------------------ ≤ max_arr
* NSEC_PER_SEC * (prescaler + 1)
*
- * isn't bigger than max_arr.
+ * This equation is equivalent to
+ *
+ * period_ns * clkrate
+ * ---------------------- ≤ prescaler + 1
+ * NSEC_PER_SEC * max_arr
+ *
+ * As the left hand side might not be integer but the right hand side
+ * is, the division must be rounded up when doing integer math. There
+ * is no variant of mul_u64_u64_div_u64() that rounds up, so we're
+ * trading that against the +1 which results in a non-optimal prescaler
+ * only if the division's result is integer.
*/
prescaler = mul_u64_u64_div_u64(period_ns, clk_get_rate(priv->clk),
(u64)NSEC_PER_SEC * priv->max_arr);
- if (prescaler > 0)
- prescaler -= 1;
-
if (prescaler > MAX_TIM_PSC)
return -EINVAL;
--
2.43.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5208e7ced520a813b4f4774451fbac4e517e78b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061754-ceremony-sturdily-fedb@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
5208e7ced520 ("serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level")
cc6628f07e0d ("serial: 8250_pxa: Switch to use uart_read_port_properties()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5208e7ced520a813b4f4774451fbac4e517e78b2 Mon Sep 17 00:00:00 2001
From: Doug Brown <doug(a)schmorgal.com>
Date: Sun, 19 May 2024 12:19:30 -0700
Subject: [PATCH] serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level
The FIFO is 64 bytes, but the FCR is configured to fire the TX interrupt
when the FIFO is half empty (bit 3 = 0). Thus, we should only write 32
bytes when a TX interrupt occurs.
This fixes a problem observed on the PXA168 that dropped a bunch of TX
bytes during large transmissions.
Fixes: ab28f51c77cd ("serial: rewrite pxa2xx-uart to use 8250_core")
Signed-off-by: Doug Brown <doug(a)schmorgal.com>
Link: https://lore.kernel.org/r/20240519191929.122202-1-doug@schmorgal.com
Cc: stable <stable(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_pxa.c b/drivers/tty/serial/8250/8250_pxa.c
index f1a51b00b1b9..ba96fa913e7f 100644
--- a/drivers/tty/serial/8250/8250_pxa.c
+++ b/drivers/tty/serial/8250/8250_pxa.c
@@ -125,6 +125,7 @@ static int serial_pxa_probe(struct platform_device *pdev)
uart.port.iotype = UPIO_MEM32;
uart.port.regshift = 2;
uart.port.fifosize = 64;
+ uart.tx_loadsz = 32;
uart.dl_write = serial_pxa_dl_write;
ret = serial8250_register_8250_port(&uart);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8cf360b9d6a840700e06864236a01a883b34bbad
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061351-brick-halved-4e61@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8cf360b9d6a8 ("mm/memory-failure: fix handling of dissolved but not taken off from buddy pages")
b6fd410c32f1 ("memory-failure: use a folio in me_huge_page()")
dbe70dbb41ab ("mm: memory-failure: remove unneeded PageHuge() check")
bc1cfde19467 ("mm/memory-failure: convert try_memory_failure_hugetlb() to folios")
a38358c934f6 ("Merge branch 'mm-hotfixes-stable' into mm-stable")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cf360b9d6a840700e06864236a01a883b34bbad Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Thu, 23 May 2024 15:12:17 +0800
Subject: [PATCH] mm/memory-failure: fix handling of dissolved but not taken
off from buddy pages
When I did memory failure tests recently, below panic occurs:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS: 00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
<TASK>
__rmqueue_pcplist+0x23b/0x520
get_page_from_freelist+0x26b/0xe40
__alloc_pages_noprof+0x113/0x1120
__folio_alloc_noprof+0x11/0xb0
alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
__alloc_fresh_hugetlb_folio+0xe7/0x140
alloc_pool_huge_folio+0x68/0x100
set_max_huge_pages+0x13d/0x340
hugetlb_sysctl_handler_common+0xe8/0x110
proc_sys_call_handler+0x194/0x280
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
And before the panic, there had an warning about bad page state:
BUG: Bad page state in process page-types pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
<TASK>
dump_stack_lvl+0x83/0xa0
bad_page+0x63/0xf0
free_unref_page+0x36e/0x5c0
unpoison_memory+0x50b/0x630
simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
debugfs_attr_write+0x42/0x60
full_proxy_write+0x5b/0x80
vfs_write+0xcd/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xc2/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
</TASK>
The root cause should be the below race:
memory_failure
try_memory_failure_hugetlb
me_huge_page
__page_handle_poison
dissolve_free_hugetlb_folio
drain_all_pages -- Buddy page can be isolated e.g. for compaction.
take_page_off_buddy -- Failed as page is not in the buddy list.
-- Page can be putback into buddy after compaction.
page_ref_inc -- Leads to buddy page with refcnt = 1.
Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning. And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.
Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a9fe9eda593f..d3c830e817e3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1221,7 +1221,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
* subpages.
*/
folio_put(folio);
- if (__page_handle_poison(p) >= 0) {
+ if (__page_handle_poison(p) > 0) {
page_ref_inc(p);
res = MF_RECOVERED;
} else {
@@ -2091,7 +2091,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
*/
if (res == 0) {
folio_unlock(folio);
- if (__page_handle_poison(p) >= 0) {
+ if (__page_handle_poison(p) > 0) {
page_ref_inc(p);
res = MF_RECOVERED;
} else {