[ Upstream commit a238487f7965d102794ed9f8aff0b667cd2ae886 ]
The 4xxx drivers hardcode the ring to service mapping. However, when
additional configurations where added to the driver, the mappings were
not updated. This implies that an incorrect mapping might be reported
through pfvf for certain configurations.
This is a backport of the upstream commit with modifications, as the
original patch does not apply cleanly to kernel v6.1.x. The logic has
been simplified to reflect the limited configurations of the QAT driver
in this version: crypto-only and compression.
Instead of dynamically computing the ring to service mappings, these are
now hardcoded to simplify the backport.
Fixes: 0cec19c761e5 ("crypto: qat - add support for compression for 4xxx")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski(a)intel.com>
Reviewed-by: Tero Kristo <tero.kristo(a)linux.intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
(cherry-picked from commit a238487f7965d102794ed9f8aff0b667cd2ae886)
[Giovanni: backport to 6.1.y, conflict resolved simplifying the logic
in the function get_ring_to_svc_map() as the QAT driver in v6.1 supports
only limited configurations (crypto only and compression). Differs from
upstream as the ring to service mapping is hardcoded rather than being
dynamically computed.]
Reviewed-by: Ahsan Atta <ahsan.atta(a)intel.com>
Tested-by: Ahsan Atta <ahsan.atta(a)intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
---
V1 -> V2: changed signed-off-by area:
* added (cherry-picked from ...) after last tag from upstream commit
* added a note explaining how this backport differs from the original patch
* added a new Signed-off-by tag for the backport author.
drivers/crypto/qat/qat_4xxx/adf_4xxx_hw_data.c | 13 +++++++++++++
drivers/crypto/qat/qat_common/adf_accel_devices.h | 1 +
drivers/crypto/qat/qat_common/adf_gen4_hw_data.h | 6 ++++++
drivers/crypto/qat/qat_common/adf_init.c | 3 +++
4 files changed, 23 insertions(+)
diff --git a/drivers/crypto/qat/qat_4xxx/adf_4xxx_hw_data.c b/drivers/crypto/qat/qat_4xxx/adf_4xxx_hw_data.c
index fda5f699ff57..65b52c692add 100644
--- a/drivers/crypto/qat/qat_4xxx/adf_4xxx_hw_data.c
+++ b/drivers/crypto/qat/qat_4xxx/adf_4xxx_hw_data.c
@@ -297,6 +297,18 @@ static char *uof_get_name(struct adf_accel_dev *accel_dev, u32 obj_num)
return NULL;
}
+static u16 get_ring_to_svc_map(struct adf_accel_dev *accel_dev)
+{
+ switch (get_service_enabled(accel_dev)) {
+ case SVC_CY:
+ return ADF_GEN4_DEFAULT_RING_TO_SRV_MAP;
+ case SVC_DC:
+ return ADF_GEN4_DEFAULT_RING_TO_SRV_MAP_DC;
+ }
+
+ return 0;
+}
+
static u32 uof_get_ae_mask(struct adf_accel_dev *accel_dev, u32 obj_num)
{
switch (get_service_enabled(accel_dev)) {
@@ -353,6 +365,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data)
hw_data->uof_get_ae_mask = uof_get_ae_mask;
hw_data->set_msix_rttable = set_msix_default_rttable;
hw_data->set_ssm_wdtimer = adf_gen4_set_ssm_wdtimer;
+ hw_data->get_ring_to_svc_map = get_ring_to_svc_map;
hw_data->disable_iov = adf_disable_sriov;
hw_data->ring_pair_reset = adf_gen4_ring_pair_reset;
hw_data->enable_pm = adf_gen4_enable_pm;
diff --git a/drivers/crypto/qat/qat_common/adf_accel_devices.h b/drivers/crypto/qat/qat_common/adf_accel_devices.h
index ad01d99e6e2b..7993d0f82dea 100644
--- a/drivers/crypto/qat/qat_common/adf_accel_devices.h
+++ b/drivers/crypto/qat/qat_common/adf_accel_devices.h
@@ -176,6 +176,7 @@ struct adf_hw_device_data {
void (*get_arb_info)(struct arb_info *arb_csrs_info);
void (*get_admin_info)(struct admin_info *admin_csrs_info);
enum dev_sku_info (*get_sku)(struct adf_hw_device_data *self);
+ u16 (*get_ring_to_svc_map)(struct adf_accel_dev *accel_dev);
int (*alloc_irq)(struct adf_accel_dev *accel_dev);
void (*free_irq)(struct adf_accel_dev *accel_dev);
void (*enable_error_correction)(struct adf_accel_dev *accel_dev);
diff --git a/drivers/crypto/qat/qat_common/adf_gen4_hw_data.h b/drivers/crypto/qat/qat_common/adf_gen4_hw_data.h
index 4fb4b3df5a18..5e653ec755e6 100644
--- a/drivers/crypto/qat/qat_common/adf_gen4_hw_data.h
+++ b/drivers/crypto/qat/qat_common/adf_gen4_hw_data.h
@@ -95,6 +95,12 @@ do { \
ADF_RING_BUNDLE_SIZE * (bank) + \
ADF_RING_CSR_RING_SRV_ARB_EN, (value))
+#define ADF_GEN4_DEFAULT_RING_TO_SRV_MAP_DC \
+ (COMP << ADF_CFG_SERV_RING_PAIR_0_SHIFT | \
+ COMP << ADF_CFG_SERV_RING_PAIR_1_SHIFT | \
+ COMP << ADF_CFG_SERV_RING_PAIR_2_SHIFT | \
+ COMP << ADF_CFG_SERV_RING_PAIR_3_SHIFT)
+
/* Default ring mapping */
#define ADF_GEN4_DEFAULT_RING_TO_SRV_MAP \
(ASYM << ADF_CFG_SERV_RING_PAIR_0_SHIFT | \
diff --git a/drivers/crypto/qat/qat_common/adf_init.c b/drivers/crypto/qat/qat_common/adf_init.c
index 2e3481270c4b..49f07584f8c9 100644
--- a/drivers/crypto/qat/qat_common/adf_init.c
+++ b/drivers/crypto/qat/qat_common/adf_init.c
@@ -95,6 +95,9 @@ int adf_dev_init(struct adf_accel_dev *accel_dev)
return -EFAULT;
}
+ if (hw_data->get_ring_to_svc_map)
+ hw_data->ring_to_svc_map = hw_data->get_ring_to_svc_map(accel_dev);
+
if (adf_ae_init(accel_dev)) {
dev_err(&GET_DEV(accel_dev),
"Failed to initialise Acceleration Engine\n");
--
2.50.0
Hello,
The following is the original thread, where a bug was reported to the
linux-wireless and ath10k mailing lists. The specific bug has been
detailed clearly here.
https://lore.kernel.org/linux-wireless/690B1DB2-C9DC-4FAD-8063-4CED659B1701…
There is also a Bugzilla report by me, which was opened later:
https://bugzilla.kernel.org/show_bug.cgi?id=220264
As stated, it is highly encouraged to check out all the logs,
especially the line of IRQ #16 in /proc/interrupts.
Here is where all the logs are:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180
(these logs are taken from an Arch liveboot)
On my daily driver, I found these on my IRQ #16:
16: 173210 0 0 0 IR-IO-APIC
16-fasteoi i2c_designware.0, idma64.0, i801_smbus
The fixes stated on the Reddit post for this Wi-Fi card didn't quite
work. (But git-cloning the firmware files did give me some more time
to have stable internet)
This time, I had to go for the GRUB kernel parameters.
Right now, I'm using "irqpoll" to curb the errors caused.
"intel_iommu=off" did not work, and the Wi-Fi was constantly crashing
even then. Did not try out "pci=noaer" this time.
If it's of any concern, there is a very weird error in Chromium-based
browsers which has only happened after I started using irqpoll. When I
Google something, the background of the individual result boxes shows
as pure black, while the surrounding space is the usual
greyish-blackish, like we see in Dark Mode. Here is a picture of the
exact thing I'm experiencing: https://files.catbox.moe/mjew6g.png
If you notice anything in my logs/bug reports, please let me know.
(Because it seems like Wi-Fi errors are just a red herring, there are
some ACPI or PCIe-related errors in the computers of this model - just
a naive speculation, though.)
Thanking you,
Bandhan Pramanik
BootLoader (Grub, LILO, etc) may pass an identifier such as "BOOT_IMAGE=
/boot/vmlinuz-x.y.z" to kernel parameters. But these identifiers are not
recognized by the kernel itself so will be passed to user space. However
user space init program also doesn't recognized it.
KEXEC/KDUMP (kexec-tools) may also pass an identifier such as "kexec" on
some architectures.
We cannot change BootLoader's behavior, because this behavior exists for
many years, and there are already user space programs search BOOT_IMAGE=
in /proc/cmdline to obtain the kernel image locations:
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/util.go
(search getBootOptions)
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/main.go
(search getKernelReleaseWithBootOption)
So the the best way is handle (ignore) it by the kernel itself, which
can avoid such boot warnings (if we use something like init=/bin/bash,
bootloader identifier can even cause a crash):
Kernel command line: BOOT_IMAGE=(hd0,1)/vmlinuz-6.x root=/dev/sda3 ro console=tty
Unknown kernel command line parameters "BOOT_IMAGE=(hd0,1)/vmlinuz-6.x", will be passed to user space.
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Update comments and commit messages.
V3: Document bootloader identifiers.
init/main.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/init/main.c b/init/main.c
index 225a58279acd..b25e7da5347a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -545,6 +545,12 @@ static int __init unknown_bootoption(char *param, char *val,
const char *unused, void *arg)
{
size_t len = strlen(param);
+ /*
+ * Well-known bootloader identifiers:
+ * 1. LILO/Grub pass "BOOT_IMAGE=...";
+ * 2. kexec/kdump (kexec-tools) pass "kexec".
+ */
+ const char *bootloader[] = { "BOOT_IMAGE=", "kexec", NULL };
/* Handle params aliased to sysctls */
if (sysctl_is_alias(param))
@@ -552,6 +558,12 @@ static int __init unknown_bootoption(char *param, char *val,
repair_env_string(param, val);
+ /* Handle bootloader identifier */
+ for (int i = 0; bootloader[i]; i++) {
+ if (!strncmp(param, bootloader[i], strlen(bootloader[i])))
+ return 0;
+ }
+
/* Handle obsolete-style parameters */
if (obsolete_checksetup(param))
return 0;
--
2.47.3
Syzkaller reports a KASAN issue as below:
general protection fault, probably for non-canonical address 0xfbd59c0000000021: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead000000000108-0xdead00000000010f]
CPU: 0 PID: 5083 Comm: syz-executor.2 Not tainted 6.1.134-syzkaller-00037-g855bd1d7d838 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__list_del include/linux/list.h:114 [inline]
RIP: 0010:__list_del_entry include/linux/list.h:137 [inline]
RIP: 0010:list_del include/linux/list.h:148 [inline]
RIP: 0010:p9_fd_cancelled+0xe9/0x200 net/9p/trans_fd.c:734
Call Trace:
<TASK>
p9_client_flush+0x351/0x440 net/9p/client.c:614
p9_client_rpc+0xb6b/0xc70 net/9p/client.c:734
p9_client_version net/9p/client.c:920 [inline]
p9_client_create+0xb51/0x1240 net/9p/client.c:1027
v9fs_session_init+0x1f0/0x18f0 fs/9p/v9fs.c:408
v9fs_mount+0xba/0xcb0 fs/9p/vfs_super.c:126
legacy_get_tree+0x108/0x220 fs/fs_context.c:632
vfs_get_tree+0x8e/0x300 fs/super.c:1573
do_new_mount fs/namespace.c:3056 [inline]
path_mount+0x6a6/0x1e90 fs/namespace.c:3386
do_mount fs/namespace.c:3399 [inline]
__do_sys_mount fs/namespace.c:3607 [inline]
__se_sys_mount fs/namespace.c:3584 [inline]
__x64_sys_mount+0x283/0x300 fs/namespace.c:3584
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
This happens because of a race condition between:
- The 9p client sending an invalid flush request and later cleaning it up;
- The 9p client in p9_read_work() canceled all pending requests.
Thread 1 Thread 2
...
p9_client_create()
...
p9_fd_create()
...
p9_conn_create()
...
// start Thread 2
INIT_WORK(&m->rq, p9_read_work);
p9_read_work()
...
p9_client_rpc()
...
...
p9_conn_cancel()
...
spin_lock(&m->req_lock);
...
p9_fd_cancelled()
...
...
spin_unlock(&m->req_lock);
// status rewrite
p9_client_cb(m->client, req, REQ_STATUS_ERROR)
// first remove
list_del(&req->req_list);
...
spin_lock(&m->req_lock)
...
// second remove
list_del(&req->req_list);
spin_unlock(&m->req_lock)
...
Commit 74d6a5d56629 ("9p/trans_fd: Fix concurrency del of req_list in
p9_fd_cancelled/p9_read_work") fixes a concurrency issue in the 9p filesystem
client where the req_list could be deleted simultaneously by both
p9_read_work and p9_fd_cancelled functions, but for the case where req->status
equals REQ_STATUS_RCVD.
Add an explicit check for REQ_STATUS_ERROR in p9_fd_cancelled before
processing the request. Skip processing if the request is already in the error
state, as it has been removed and its resources cleaned up.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: afd8d6541155 ("9P: Add cancelled() to the transport functions.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nalivayko Sergey <Sergey.Nalivayko(a)kaspersky.com>
---
net/9p/trans_fd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a69422366a23..a6054a392a90 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -721,9 +721,9 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
spin_lock(&m->req_lock);
/* Ignore cancelled request if message has been received
- * before lock.
- */
- if (req->status == REQ_STATUS_RCVD) {
+ * or cancelled with error before lock.
+ */
+ if (req->status == REQ_STATUS_RCVD || req->status == REQ_STATUS_ERROR) {
spin_unlock(&m->req_lock);
return 0;
}
--
2.30.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a0ee1d5faff135e28810f29e0f06328c66f89852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062039-anger-volumes-9d75@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a0ee1d5faff135e28810f29e0f06328c66f89852 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao(a)intel.com>
Date: Mon, 24 Mar 2025 22:08:48 +0800
Subject: [PATCH] KVM: VMX: Flush shadow VMCS on emergency reboot
Ensure the shadow VMCS cache is evicted during an emergency reboot to
prevent potential memory corruption if the cache is evicted after reboot.
This issue was identified through code inspection, as __loaded_vmcs_clear()
flushes both the normal VMCS and the shadow VMCS.
Avoid checking the "launched" state during an emergency reboot, unlike the
behavior in __loaded_vmcs_clear(). This is important because reboot NMIs
can interfere with operations like copy_shadow_to_vmcs12(), where shadow
VMCSes are loaded directly using VMPTRLD. In such cases, if NMIs occur
right after the VMCS load, the shadow VMCSes will be active but the
"launched" state may not be set.
Fixes: 16f5b9034b69 ("KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chao Gao <chao.gao(a)intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Link: https://lore.kernel.org/r/20250324140849.2099723-1-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef2d7208dd20..848c4963bdb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -770,8 +770,11 @@ void vmx_emergency_disable_virtualization_cpu(void)
return;
list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
- loaded_vmcss_on_cpu_link)
+ loaded_vmcss_on_cpu_link) {
vmcs_clear(v->vmcs);
+ if (v->shadow_vmcs)
+ vmcs_clear(v->shadow_vmcs);
+ }
kvm_cpu_vmxoff();
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a0ee1d5faff135e28810f29e0f06328c66f89852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062034-chastise-wrecking-9a12@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a0ee1d5faff135e28810f29e0f06328c66f89852 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao(a)intel.com>
Date: Mon, 24 Mar 2025 22:08:48 +0800
Subject: [PATCH] KVM: VMX: Flush shadow VMCS on emergency reboot
Ensure the shadow VMCS cache is evicted during an emergency reboot to
prevent potential memory corruption if the cache is evicted after reboot.
This issue was identified through code inspection, as __loaded_vmcs_clear()
flushes both the normal VMCS and the shadow VMCS.
Avoid checking the "launched" state during an emergency reboot, unlike the
behavior in __loaded_vmcs_clear(). This is important because reboot NMIs
can interfere with operations like copy_shadow_to_vmcs12(), where shadow
VMCSes are loaded directly using VMPTRLD. In such cases, if NMIs occur
right after the VMCS load, the shadow VMCSes will be active but the
"launched" state may not be set.
Fixes: 16f5b9034b69 ("KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chao Gao <chao.gao(a)intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Link: https://lore.kernel.org/r/20250324140849.2099723-1-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef2d7208dd20..848c4963bdb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -770,8 +770,11 @@ void vmx_emergency_disable_virtualization_cpu(void)
return;
list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
- loaded_vmcss_on_cpu_link)
+ loaded_vmcss_on_cpu_link) {
vmcs_clear(v->vmcs);
+ if (v->shadow_vmcs)
+ vmcs_clear(v->shadow_vmcs);
+ }
kvm_cpu_vmxoff();
}
The check for a fast symlink in the presence of only an
external xattr inode is incorrect. If a fast symlink does
not have an xattr block (i_file_acl == 0), but does have
an external xattr inode that increases inode i_blocks, then
the check for a fast symlink will incorrectly fail and
__ext4_iget()->ext4_ind_check_inode() will report the inode
is corrupt when it "validates" i_data[] on the next read:
# ln -s foo /mnt/tmp/bar
# setfattr -h -n trusted.test \
-v "$(yes | head -n 4000)" /mnt/tmp/bar
# umount /mnt/tmp
# mount /mnt/tmp
# ls -l /mnt/tmp
ls: cannot access '/mnt/tmp/bar': Structure needs cleaning
total 4
? l?????????? ? ? ? ? ? bar
# dmesg | tail -1
EXT4-fs error (device dm-8): __ext4_iget:5098:
inode #24578: block 7303014: comm ls: invalid block
(note that "block 7303014" = 0x6f6f66 = "foo" in LE order).
ext4_inode_is_fast_symlink() should check the superblock
EXT4_FEATURE_INCOMPAT_EA_INODE feature flag, not the inode
EXT4_EA_INODE_FL, since the latter is only set on the xattr
inode itself, and not on the inode that uses this xattr.
Cc: stable(a)vger.kernel.org
Fixes: fc82228a5e38 ("ext4: support fast symlinks from ext3 file systems")
Signed-off-by: Andreas Dilger <adilger(a)whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli(a)ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz(a)whamcloud.com>
Reviewed-by: Oleg Drokin <green(a)whamcloud.com>
Reviewed-on: https://review.whamcloud.com/59879
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-19121
---
fs/ext4/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index be9a4cba35fd..caca88521c75 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -146,7 +146,7 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
*/
int ext4_inode_is_fast_symlink(struct inode *inode)
{
- if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) {
+ if (!ext4_has_feature_ea_inode(inode->i_sb)) {
int ea_blocks = EXT4_I(inode)->i_file_acl ?
EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0;
--
2.43.5
netlink_attachskb() checks for the socket's read memory allocation
constraints. Firstly, it has:
rmem < READ_ONCE(sk->sk_rcvbuf)
to check if the just increased rmem value fits into the socket's receive
buffer. If not, it proceeds and tries to wait for the memory under:
rmem + skb->truesize > READ_ONCE(sk->sk_rcvbuf)
The checks don't cover the case when skb->truesize + sk->sk_rmem_alloc is
equal to sk->sk_rcvbuf. Thus the function neither successfully accepts
these conditions, nor manages to reschedule the task - and is called in
retry loop for indefinite time which is caught as:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 0-....: (25999 ticks this GP) idle=ef2/1/0x4000000000000000 softirq=262269/262269 fqs=6212
(t=26000 jiffies g=230833 q=259957)
NMI backtrace for cpu 0
CPU: 0 PID: 22 Comm: kauditd Not tainted 5.10.240 #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc42 04/01/2014
Call Trace:
<IRQ>
dump_stack lib/dump_stack.c:120
nmi_cpu_backtrace.cold lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace lib/nmi_backtrace.c:62
rcu_dump_cpu_stacks kernel/rcu/tree_stall.h:335
rcu_sched_clock_irq.cold kernel/rcu/tree.c:2590
update_process_times kernel/time/timer.c:1953
tick_sched_handle kernel/time/tick-sched.c:227
tick_sched_timer kernel/time/tick-sched.c:1399
__hrtimer_run_queues kernel/time/hrtimer.c:1652
hrtimer_interrupt kernel/time/hrtimer.c:1717
__sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113
asm_call_irq_on_stack arch/x86/entry/entry_64.S:808
</IRQ>
netlink_attachskb net/netlink/af_netlink.c:1234
netlink_unicast net/netlink/af_netlink.c:1349
kauditd_send_queue kernel/audit.c:776
kauditd_thread kernel/audit.c:897
kthread kernel/kthread.c:328
ret_from_fork arch/x86/entry/entry_64.S:304
Restore the original behavior of the check which commit in Fixes
accidentally missed when restructuring the code.
Found by Linux Verification Center (linuxtesting.org).
Fixes: ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
Similar rmem and sk->sk_rcvbuf comparing pattern in
netlink_broadcast_deliver() accepts these values being equal, while
the netlink_dump() case does not - but it goes all the way down to
f9c2288837ba ("netlink: implement memory mapped recvmsg()") and looks
like an irrelevant issue without any real consequences. Though might be
cleaned up if needed.
Updated sk->sk_rmem_alloc vs sk->sk_rcvbuf checks throughout the kernel
diverse in treating the corner case of them being equal.
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6332a0e06596..0fc3f045fb65 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1218,7 +1218,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
nlk = nlk_sk(sk);
rmem = atomic_add_return(skb->truesize, &sk->sk_rmem_alloc);
- if ((rmem == skb->truesize || rmem < READ_ONCE(sk->sk_rcvbuf)) &&
+ if ((rmem == skb->truesize || rmem <= READ_ONCE(sk->sk_rcvbuf)) &&
!test_bit(NETLINK_S_CONGESTED, &nlk->state)) {
netlink_skb_set_owner_r(skb, sk);
return 0;
--
2.50.1
On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro(a)zeniv.linux.org.uk> wrote:
>
> On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > Hi Al and Christian,
> >
> > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > unmounted/not ours mounts") introduced an ABI backward compatibility
> > break. CRIU depends on the previous behavior, and users are now
> > reporting criu restore failures following the kernel update. This change
> > has been propagated to stable kernels. Is this check strictly required?
>
> Yes.
>
> > Would it be possible to check only if the current process has
> > CAP_SYS_ADMIN within the mount user namespace?
>
> Not enough, both in terms of permissions *and* in terms of "thou
> shalt not bugger the kernel data structures - nobody's priveleged
> enough for that".
Al,
I am still thinking in terms of "Thou shalt not break userspace"...
Seriously though, this original behavior has been in the kernel for 20
years, and it hasn't triggered any corruptions in all that time. I
understand this change might be necessary in its current form, and
that some collateral damage could be unavoidable. But if that's the
case, I'd expect a detailed explanation of why it had to be so and why
userspace breakage is unavoidable.
The original change was merged two decades ago. We need to
consider that some applications might rely on that behavior. I'm not
questioning the security aspect - that must be addressed. But for
anything else, we need to minimize the impact on user applications that
don't violate security.
We can consider a cleaner fix for the upstream kernel, but when we are
talking about stable kernels, the user-space backward compatibility
aspect should be even more critical.
Thanks,
Andrei