From: Francis Laniel <flaniel(a)linux.microsoft.com>
[ Upstream commit 03b80ff8023adae6780e491f66e932df8165e3a0 ]
If name_show() is non unique, this test will try to install a kprobe on this
function which should fail returning EADDRNOTAVAIL.
On kernel where name_show() is not unique, this test is skipped.
Link: https://lore.kernel.org/all/20231020104250.9537-3-flaniel@linux.microsoft.c…
Cc: stable(a)vger.kernel.org
Signed-off-by: Francis Laniel <flaniel(a)linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../ftrace/test.d/kprobe/kprobe_non_uniq_symbol.tc | 13 +++++++++++++
1 file changed, 13 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_non_uniq_symbol.tc
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_non_uniq_symbol.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_non_uniq_symbol.tc
new file mode 100644
index 0000000000000..bc9514428dbaf
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_non_uniq_symbol.tc
@@ -0,0 +1,13 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test failure of registering kprobe on non unique symbol
+# requires: kprobe_events
+
+SYMBOL='name_show'
+
+# We skip this test on kernel where SYMBOL is unique or does not exist.
+if [ "$(grep -c -E "[[:alnum:]]+ t ${SYMBOL}" /proc/kallsyms)" -le '1' ]; then
+ exit_unsupported
+fi
+
+! echo "p:test_non_unique ${SYMBOL}" > kprobe_events
--
2.43.0
From: Randy Dunlap <rdunlap(a)infradead.org>
[ Upstream commit 644f17412f5acf01a19af9d04a921937a2bc86c6 ]
UML supports HAS_IOMEM since 0bbadafdc49d (um: allow disabling
NO_IOMEM).
Current IMA build on UML fails on allmodconfig (with TCG_TPM=m):
ld: security/integrity/ima/ima_queue.o: in function `ima_add_template_entry':
ima_queue.c:(.text+0x2d9): undefined reference to `tpm_pcr_extend'
ld: security/integrity/ima/ima_init.o: in function `ima_init':
ima_init.c:(.init.text+0x43f): undefined reference to `tpm_default_chip'
ld: security/integrity/ima/ima_crypto.o: in function `ima_calc_boot_aggregate_tfm':
ima_crypto.c:(.text+0x1044): undefined reference to `tpm_pcr_read'
ld: ima_crypto.c:(.text+0x10d8): undefined reference to `tpm_pcr_read'
Modify the IMA Kconfig entry so that it selects TCG_TPM if HAS_IOMEM
is set, regardless of the UML Kconfig setting.
This updates TCG_TPM from =m to =y and fixes the linker errors.
Fixes: f4a0391dfa91 ("ima: fix Kconfig dependencies")
Cc: Stable <stable(a)vger.kernel.org> # v5.14+
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Fabio Estevam <festevam(a)gmail.com>
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Anton Ivanov <anton.ivanov(a)cambridgegreys.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: linux-um(a)lists.infradead.org
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
security/integrity/ima/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 6a8f67714c831..a8bf67557eb54 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -8,7 +8,7 @@ config IMA
select CRYPTO_MD5
select CRYPTO_SHA1
select CRYPTO_HASH_INFO
- select TCG_TPM if HAS_IOMEM && !UML
+ select TCG_TPM if HAS_IOMEM
select TCG_TIS if TCG_TPM && X86
select TCG_CRB if TCG_TPM && ACPI
select TCG_IBMVTPM if TCG_TPM && PPC_PSERIES
--
2.43.0
From: Filipe Manana <fdmanana(a)suse.com>
[ Upstream commit ede600e497b1461d06d22a7d17703d9096868bc3 ]
At split_node(), if we fail to log the tree mod log copy operation, we
return without unlocking the split extent buffer we just allocated and
without decrementing the reference we own on it. Fix this by unlocking
it and decrementing the ref count before returning.
Fixes: 5de865eebb83 ("Btrfs: fix tree mod logging")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/ctree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 156716f9e3e21..61ed69c688d58 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3537,6 +3537,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
ret = tree_mod_log_eb_copy(fs_info, split, c, 0, mid, c_nritems - mid);
if (ret) {
+ btrfs_tree_unlock(split);
+ free_extent_buffer(split);
btrfs_abort_transaction(trans, ret);
return ret;
}
--
2.43.0
From: Mario Limonciello <mario.limonciello(a)amd.com>
[ Upstream commit 968ab9261627fa305307e3935ca1a32fcddd36cb ]
commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe")
had a mistake in loop iteration 63 that it would clear offset 0xFC instead
of 0x100. Offset 0xFC is actually `WAKE_INT_MASTER_REG`. This was
clearing bits 13 and 15 from the register which significantly changed the
expected handling for some platforms for GPIO0.
commit b26cd9325be4 ("pinctrl: amd: Disable and mask interrupts on resume")
actually fixed this bug, but lead to regressions on Lenovo Z13 and some
other systems. This is because there was no handling in the driver for bit
15 debounce behavior.
Quoting a public BKDG:
```
EnWinBlueBtn. Read-write. Reset: 0. 0=GPIO0 detect debounced power button;
Power button override is 4 seconds. 1=GPIO0 detect debounced power button
in S3/S5/S0i3, and detect "pressed less than 2 seconds" and "pressed 2~10
seconds" in S0; Power button override is 10 seconds
```
Cross referencing the same master register in Windows it's obvious that
Windows doesn't use debounce values in this configuration. So align the
Linux driver to do this as well. This fixes wake on lid when
WAKE_INT_MASTER_REG is properly programmed.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217315
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Link: https://lore.kernel.org/r/20230421120625.3366-2-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/pinctrl/pinctrl-amd.c | 7 +++++++
drivers/pinctrl/pinctrl-amd.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 509ba4bceefcb..41f12fa15143c 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -114,6 +114,12 @@ static int amd_gpio_set_debounce(struct gpio_chip *gc, unsigned offset,
struct amd_gpio *gpio_dev = gpiochip_get_data(gc);
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+
+ /* Use special handling for Pin0 debounce */
+ pin_reg = readl(gpio_dev->base + WAKE_INT_MASTER_REG);
+ if (pin_reg & INTERNAL_GPIO0_DEBOUNCE)
+ debounce = 0;
+
pin_reg = readl(gpio_dev->base + offset * 4);
if (debounce) {
@@ -191,6 +197,7 @@ static void amd_gpio_dbg_show(struct seq_file *s, struct gpio_chip *gc)
char *output_value;
char *output_enable;
+ seq_printf(s, "WAKE_INT_MASTER_REG: 0x%08x\n", readl(gpio_dev->base + WAKE_INT_MASTER_REG));
for (bank = 0; bank < gpio_dev->hwbank_num; bank++) {
seq_printf(s, "GPIO bank%d\t", bank);
diff --git a/drivers/pinctrl/pinctrl-amd.h b/drivers/pinctrl/pinctrl-amd.h
index 884f48f7a6a36..c6be602c3df73 100644
--- a/drivers/pinctrl/pinctrl-amd.h
+++ b/drivers/pinctrl/pinctrl-amd.h
@@ -21,6 +21,7 @@
#define AMD_GPIO_PINS_BANK3 32
#define WAKE_INT_MASTER_REG 0xfc
+#define INTERNAL_GPIO0_DEBOUNCE (1 << 15)
#define EOI_MASK (1 << 29)
#define WAKE_INT_STATUS_REG0 0x2f8
--
2.43.0
From: Takashi Iwai <tiwai(a)suse.de>
[ Upstream commit 89dbb335cb6a627a4067bc42caa09c8bc3326d40 ]
snd_jack_report() is supposed to be callable from an IRQ context, too,
and it's indeed used in that way from virtsnd driver. The fix for
input_dev race in commit 1b6a6fc5280e ("ALSA: jack: Access input_dev
under mutex"), however, introduced a mutex lock in snd_jack_report(),
and this resulted in a potential sleep-in-atomic.
For addressing that problem, this patch changes the relevant code to
use the object get/put and removes the mutex usage. That is,
snd_jack_report(), it takes input_get_device() and leaves with
input_put_device() for assuring the input_dev being assigned.
Although the whole mutex could be reduced, we keep it because it can
be still a protection for potential races between creation and
deletion.
Fixes: 1b6a6fc5280e ("ALSA: jack: Access input_dev under mutex")
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/r/cf95f7fe-a748-4990-8378-000491b40329@moroto.mount…
Tested-by: Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20230706155357.3470-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/core/jack.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/sound/core/jack.c b/sound/core/jack.c
index d2f9a92453f2f..6340de60f26ef 100644
--- a/sound/core/jack.c
+++ b/sound/core/jack.c
@@ -378,6 +378,7 @@ void snd_jack_report(struct snd_jack *jack, int status)
{
struct snd_jack_kctl *jack_kctl;
#ifdef CONFIG_SND_JACK_INPUT_DEV
+ struct input_dev *idev;
int i;
#endif
@@ -389,30 +390,28 @@ void snd_jack_report(struct snd_jack *jack, int status)
status & jack_kctl->mask_bits);
#ifdef CONFIG_SND_JACK_INPUT_DEV
- mutex_lock(&jack->input_dev_lock);
- if (!jack->input_dev) {
- mutex_unlock(&jack->input_dev_lock);
+ idev = input_get_device(jack->input_dev);
+ if (!idev)
return;
- }
for (i = 0; i < ARRAY_SIZE(jack->key); i++) {
int testbit = SND_JACK_BTN_0 >> i;
if (jack->type & testbit)
- input_report_key(jack->input_dev, jack->key[i],
+ input_report_key(idev, jack->key[i],
status & testbit);
}
for (i = 0; i < ARRAY_SIZE(jack_switch_types); i++) {
int testbit = 1 << i;
if (jack->type & testbit)
- input_report_switch(jack->input_dev,
+ input_report_switch(idev,
jack_switch_types[i],
status & testbit);
}
- input_sync(jack->input_dev);
- mutex_unlock(&jack->input_dev_lock);
+ input_sync(idev);
+ input_put_device(idev);
#endif /* CONFIG_SND_JACK_INPUT_DEV */
}
EXPORT_SYMBOL(snd_jack_report);
--
2.43.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040520-unselect-antitrust-a41b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 Mon Sep 17 00:00:00 2001
From: Davide Caratti <dcaratti(a)redhat.com>
Date: Fri, 29 Mar 2024 13:08:52 +0100
Subject: [PATCH] mptcp: don't account accept() of non-MPC client as fallback
to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3a1967bc7bad..7e74b812e366 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3937,8 +3937,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
mptcp_set_state(newsk, TCP_CLOSE);
}
} else {
- MPTCP_INC_STATS(sock_net(ssk),
- MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
tcpfallback:
newsk->sk_kern_sock = kern;
lock_sock(newsk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 1626dd20c68f..6042a47da61b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -905,6 +905,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
return child;
fallback:
+ if (fallback)
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
mptcp_subflow_drop_ctx(child);
return child;
}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 4c4248554826..4131f3263a48 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -383,12 +383,14 @@ do_transfer()
local stat_cookierx_last
local stat_csum_err_s
local stat_csum_err_c
+ local stat_tcpfb_last_l
stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr")
stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr")
+ stat_tcpfb_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
timeout ${timeout_test} \
ip netns exec ${listener_ns} \
@@ -457,11 +459,13 @@ do_transfer()
local stat_cookietx_now
local stat_cookierx_now
local stat_ooo_now
+ local stat_tcpfb_now_l
stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue")
+ stat_tcpfb_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
expect_synrx=$((stat_synrx_last_l))
expect_ackrx=$((stat_ackrx_last_l))
@@ -508,6 +512,11 @@ do_transfer()
fi
fi
+ if [ ${stat_ooo_now} -eq 0 ] && [ ${stat_tcpfb_last_l} -ne ${stat_tcpfb_now_l} ]; then
+ mptcp_lib_pr_fail "unexpected fallback to TCP"
+ rets=1
+ fi
+
if [ $cookies -eq 2 ];then
if [ $stat_cookietx_last -ge $stat_cookietx_now ] ;then
extra+=" WARN: CookieSent: did not advance"
On 4/10/24 17:57, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> arm64: dts: qcom: Add support for Xiaomi Redmi Note 9S
autosel has been reeaaaaaly going over the top lately, particularly
with dts patches.. I'm not sure adding support for a device is
something that should go to stable
Konrad
MTD OTP logic is very fragile on parsing NVMEM cell and can be
problematic with some specific kind of devices.
The problem was discovered by e87161321a40 ("mtd: rawnand: macronix:
OTP access for MX30LFxG18AC") where OTP support was added to a NAND
device. With the case of NAND devices, it does require a node where ECC
info are declared and all the fixed partitions, and this cause the OTP
codepath to parse this node as OTP NVMEM cells, making probe fail and
the NAND device registration fail.
MTD OTP parsing should have been limited to always using compatible to
prevent this error by using node with compatible "otp-user" or
"otp-factory".
NVMEM across the years had various iteration on how cells could be
declared in DT, in some old implementation, no_of_node should have been
enabled but now add_legacy_fixed_of_cells should be used to disable
NVMEM to parse child node as NVMEM cell.
To fix this and limit any regression with other MTD that makes use of
declaring OTP as direct child of the dev node, disable
add_legacy_fixed_of_cells if we detect the MTD type is Nand.
With the following logic, the OTP NVMEM entry is correctly created with
no cells and the MTD Nand is correctly probed and partitions are
correctly exposed.
Fixes: 4b361cfa8624 ("mtd: core: add OTP nvmem provider support")
Cc: <stable(a)vger.kernel.org> # v6.7+
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
---
To backport this to v6.6 and previous,
config.no_of_node = mtd_type_is_nand(mtd);
should be used as it does pose the same usage of
add_legacy_fixed_of_cells.
Changes v5:
- Lower case of cell and use non-NAND
Changes v4:
- Add info on how to backport this to previous kernel
- Fix Fixes tag
- Reformat commit description as it was unprecise and
had false statement
Changes v3:
- Fix commit description
Changes v2:
- Use mtd_type_is_nand instead of node name check
drivers/mtd/mtdcore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 5887feb347a4..0de87bc63840 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -900,7 +900,7 @@ static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
config.name = compatible;
config.id = NVMEM_DEVID_AUTO;
config.owner = THIS_MODULE;
- config.add_legacy_fixed_of_cells = true;
+ config.add_legacy_fixed_of_cells = !mtd_type_is_nand(mtd);
config.type = NVMEM_TYPE_OTP;
config.root_only = true;
config.ignore_wp = true;
--
2.43.0
From: Peter Wang <peter.wang(a)mediatek.com>
[ Upstream commit 6bc5e70b1c792b31b497e48b4668a9a2909aca0d ]
When wl suspend error occurs, for example BKOP or SSU timeout, the host
triggers an error handler and returns -EBUSY to break the wl suspend
process. However, it is possible for the runtime PM to enter wl suspend
again before the error handler has finished, and return -EINVAL because the
device is in an error state. To address this, ensure that the rumtime PM
waits for the error handler to finish, or trigger the error handler in such
cases, because returning -EINVAL can cause the I/O to hang.
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
Link: https://lore.kernel.org/r/20240329015036.15707-1-peter.wang@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/ufs/core/ufshcd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index f3c25467e571f..948449a13247c 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -9044,7 +9044,10 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
/* UFS device & link must be active before we enter in this function */
if (!ufshcd_is_ufs_dev_active(hba) || !ufshcd_is_link_active(hba)) {
- ret = -EINVAL;
+ /* Wait err handler finish or trigger err recovery */
+ if (!ufshcd_eh_in_progress(hba))
+ ufshcd_force_error_recovery(hba);
+ ret = -EBUSY;
goto enable_scaling;
}
--
2.43.0
From: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
[ Upstream commit c61115b37ff964d63191dbf4a058f481daabdf57 ]
SoCs with ACE architecture are tailored to use s2idle instead deep (S3)
suspend state and the IMR content is lost when the system is forced to
enter even to S3.
When waking up from S3 state the IMR boot will fail as the content is lost.
Set the skip_imr_boot flag to make sure that we don't try IMR in this case.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood(a)intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Link: https://msgid.link/r/20240322112504.4192-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/sof/intel/hda-dsp.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dsp.c b/sound/soc/sof/intel/hda-dsp.c
index 44f39a520bb39..e80a2a5ec56a1 100644
--- a/sound/soc/sof/intel/hda-dsp.c
+++ b/sound/soc/sof/intel/hda-dsp.c
@@ -681,17 +681,27 @@ static int hda_suspend(struct snd_sof_dev *sdev, bool runtime_suspend)
struct sof_intel_hda_dev *hda = sdev->pdata->hw_pdata;
const struct sof_intel_dsp_desc *chip = hda->desc;
struct hdac_bus *bus = sof_to_bus(sdev);
+ bool imr_lost = false;
int ret, j;
/*
- * The memory used for IMR boot loses its content in deeper than S3 state
- * We must not try IMR boot on next power up (as it will fail).
- *
+ * The memory used for IMR boot loses its content in deeper than S3
+ * state on CAVS platforms.
+ * On ACE platforms due to the system architecture the IMR content is
+ * lost at S3 state already, they are tailored for s2idle use.
+ * We must not try IMR boot on next power up in these cases as it will
+ * fail.
+ */
+ if (sdev->system_suspend_target > SOF_SUSPEND_S3 ||
+ (chip->hw_ip_version >= SOF_INTEL_ACE_1_0 &&
+ sdev->system_suspend_target == SOF_SUSPEND_S3))
+ imr_lost = true;
+
+ /*
* In case of firmware crash or boot failure set the skip_imr_boot to true
* as well in order to try to re-load the firmware to do a 'cold' boot.
*/
- if (sdev->system_suspend_target > SOF_SUSPEND_S3 ||
- sdev->fw_state == SOF_FW_CRASHED ||
+ if (imr_lost || sdev->fw_state == SOF_FW_CRASHED ||
sdev->fw_state == SOF_FW_BOOT_FAILED)
hda->skip_imr_boot = true;
--
2.43.0
From: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
[ Upstream commit c61115b37ff964d63191dbf4a058f481daabdf57 ]
SoCs with ACE architecture are tailored to use s2idle instead deep (S3)
suspend state and the IMR content is lost when the system is forced to
enter even to S3.
When waking up from S3 state the IMR boot will fail as the content is lost.
Set the skip_imr_boot flag to make sure that we don't try IMR in this case.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood(a)intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Link: https://msgid.link/r/20240322112504.4192-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/sof/intel/hda-dsp.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dsp.c b/sound/soc/sof/intel/hda-dsp.c
index 2445ae7f6b2e9..1506982a56c30 100644
--- a/sound/soc/sof/intel/hda-dsp.c
+++ b/sound/soc/sof/intel/hda-dsp.c
@@ -681,17 +681,27 @@ static int hda_suspend(struct snd_sof_dev *sdev, bool runtime_suspend)
struct sof_intel_hda_dev *hda = sdev->pdata->hw_pdata;
const struct sof_intel_dsp_desc *chip = hda->desc;
struct hdac_bus *bus = sof_to_bus(sdev);
+ bool imr_lost = false;
int ret, j;
/*
- * The memory used for IMR boot loses its content in deeper than S3 state
- * We must not try IMR boot on next power up (as it will fail).
- *
+ * The memory used for IMR boot loses its content in deeper than S3
+ * state on CAVS platforms.
+ * On ACE platforms due to the system architecture the IMR content is
+ * lost at S3 state already, they are tailored for s2idle use.
+ * We must not try IMR boot on next power up in these cases as it will
+ * fail.
+ */
+ if (sdev->system_suspend_target > SOF_SUSPEND_S3 ||
+ (chip->hw_ip_version >= SOF_INTEL_ACE_1_0 &&
+ sdev->system_suspend_target == SOF_SUSPEND_S3))
+ imr_lost = true;
+
+ /*
* In case of firmware crash or boot failure set the skip_imr_boot to true
* as well in order to try to re-load the firmware to do a 'cold' boot.
*/
- if (sdev->system_suspend_target > SOF_SUSPEND_S3 ||
- sdev->fw_state == SOF_FW_CRASHED ||
+ if (imr_lost || sdev->fw_state == SOF_FW_CRASHED ||
sdev->fw_state == SOF_FW_BOOT_FAILED)
hda->skip_imr_boot = true;
--
2.43.0
FUSE attempts to detect server support for statx by trying it once and
setting no_statx=1 if it fails with ENOSYS, but consider the following
scenario:
- Userspace (e.g. sh) calls stat() on a file
* succeeds
- Userspace (e.g. lsd) calls statx(BTIME) on the same file
- request_mask = STATX_BASIC_STATS | STATX_BTIME
- first pass: sync=true due to differing cache_mask
- statx fails and returns ENOSYS
- set no_statx and retry
- retry sets mask = STATX_BASIC_STATS
- now mask == cache_mask; sync=false (time_before: still valid)
- so we take the "else if (stat)" path
- "err" is still ENOSYS from the failed statx call
Fix this by zeroing "err" before retrying the failed call.
Fixes: d3045530bdd2 ("fuse: implement statx")
Cc: stable(a)vger.kernel.org # v6.6
Signed-off-by: Danny Lin <danny(a)orbstack.dev>
---
fs/fuse/dir.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index de452cdbf3cf..a63125ce70a4 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1362,6 +1362,7 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
err = fuse_do_statx(inode, file, stat);
if (err == -ENOSYS) {
fc->no_statx = 1;
+ err = 0;
goto retry;
}
} else {
--
2.44.0
Framebuffer memory is allocated via vzalloc() from non-contiguous
physical pages. The physical framebuffer start address is therefore
meaningless. Do not set it.
The value is not used within the kernel and only exported to userspace
on dedicated ARM configs. No functional change is expected.
v2:
- refer to vzalloc() in commit message (Javier)
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: a5b44c4adb16 ("drm/fbdev-generic: Always use shadow buffering")
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Zack Rusin <zackr(a)vmware.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v6.4+
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Reviewed-by: Zack Rusin <zack.rusin(a)broadcom.com>
Reviewed-by: Sui Jingfeng <sui.jingfeng(a)linux.dev>
Tested-by: Sui Jingfeng <sui.jingfeng(a)linux.dev>
---
drivers/gpu/drm/drm_fbdev_generic.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c b/drivers/gpu/drm/drm_fbdev_generic.c
index be357f926faec..97e579c33d84a 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -113,7 +113,6 @@ static int drm_fbdev_generic_helper_fb_probe(struct drm_fb_helper *fb_helper,
/* screen */
info->flags |= FBINFO_VIRTFB | FBINFO_READS_FAST;
info->screen_buffer = screen_buffer;
- info->fix.smem_start = page_to_phys(vmalloc_to_page(info->screen_buffer));
info->fix.smem_len = screen_size;
/* deferred I/O */
--
2.44.0
Commit 54d217406afe (drm: use mgr->dev in drm_dbg_kms in
drm_dp_add_payload_part2) appears to have been accidentially reverted as
part of commit 5aa1dfcdf0a42 (drm/mst: Refactor the flow for payload
allocation/removement).
I've been seeing NULL pointer dereferences in drm_dp_add_payload_part2
due to state->dev being NULL in the debug message printed if the payload
allocation has failed.
This commit restores mgr->dev to avoid the Oops.
Fixes: 5aa1dfcdf0a42 ("drm/mst: Refactor the flow for payload allocation/removement")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 03d528209426..3dc966f25c0c 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3437,7 +3437,7 @@ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
/* Skip failed payloads */
if (payload->payload_allocation_status != DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) {
- drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
+ drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n",
payload->port->connector->name);
return -EIO;
}
--
2.44.0
Hi
Since you backported 41044b41ad2c8c8165a42ec6e9a4096826dcf153 to 6.8.3 and
higher, the kernel crashes hard on boot if BTRFS's sanity checks have been
enabled (CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y).
Please backport b2136cc288fce2f24a92f3d656531b2d50ebec5a to the stable/mainline
series fix this issue.
Best regards
Mihai Moldovan
Hi @stable@vger.kernel.org<mailto:stable@vger.kernel.org>
This patch has been merged in mainline as commit ce5d241c3ad45.
[patch diff]
[cid:image001.png@01DA8D04.BCD8A070]
Because of our customer used v5.15, v5.4 and 4.19 for MP production, which requires this solution.
We need your help apply the patch in v5.15, v5.4 and 4.19.
Thanks!
BR,
Nini Song
Hi,
This fix was sent out for fixing S4 under stress on some Phoenix laptops.
31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11")
David Markey confirmed[1] it helps Framework 13 under S4 stress
testing. Can you please bring it to 6.1.y and later?
[1] https://community.frame.work/t/responded-arch-hibernation-woes-on-amd-13/45…
Thanks
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b372e96bd0a32729d55d27f613c8bc80708a82e1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024041442-grumble-saggy-01f0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b372e96bd0a32729d55d27f613c8bc80708a82e1 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.de>
Date: Mon, 25 Mar 2024 09:21:20 +1100
Subject: [PATCH] ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
The page has been marked clean before writepage is called. If we don't
redirty it before postponing the write, it might never get written.
Cc: stable(a)vger.kernel.org
Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Reviewed-by: Xiubo Li <xiubli(a)redhat.org>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 1340d77124ae..ee9caf7916fb 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -795,8 +795,10 @@ static int ceph_writepage(struct page *page, struct writeback_control *wbc)
ihold(inode);
if (wbc->sync_mode == WB_SYNC_NONE &&
- ceph_inode_to_fs_client(inode)->write_congested)
+ ceph_inode_to_fs_client(inode)->write_congested) {
+ redirty_page_for_writepage(wbc, page);
return AOP_WRITEPAGE_ACTIVATE;
+ }
wait_on_page_fscache(page);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x b372e96bd0a32729d55d27f613c8bc80708a82e1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024041441-unabashed-swarm-244d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b372e96bd0a32729d55d27f613c8bc80708a82e1 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.de>
Date: Mon, 25 Mar 2024 09:21:20 +1100
Subject: [PATCH] ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
The page has been marked clean before writepage is called. If we don't
redirty it before postponing the write, it might never get written.
Cc: stable(a)vger.kernel.org
Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Reviewed-by: Xiubo Li <xiubli(a)redhat.org>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 1340d77124ae..ee9caf7916fb 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -795,8 +795,10 @@ static int ceph_writepage(struct page *page, struct writeback_control *wbc)
ihold(inode);
if (wbc->sync_mode == WB_SYNC_NONE &&
- ceph_inode_to_fs_client(inode)->write_congested)
+ ceph_inode_to_fs_client(inode)->write_congested) {
+ redirty_page_for_writepage(wbc, page);
return AOP_WRITEPAGE_ACTIVATE;
+ }
wait_on_page_fscache(page);
Hi,
The commit below resolves s2idle failure on processor for AMD Family 1Ah
Model 20h,
eed14eb48ee1 drm/amdgpu/vpe: power on vpe when hw_init
Please add this commit to stable kernel 6.8.y. Thanks!
Regards,
Richard
The patch titled
Subject: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled-v2.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
Date: Fri, 12 Apr 2024 10:57:54 +0800
extend comment per Oscar
Link: https://lkml.kernel.org/r/20240412025754.1897615-1-linmiaohe@huawei.com
Fixes: a6b40850c442 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/memory-failure.c~mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled-v2
+++ a/mm/memory-failure.c
@@ -159,6 +159,10 @@ static int __page_handle_poison(struct p
* dissolve_free_huge_page() might hold cpu_hotplug_lock via static_key_slow_dec()
* when hugetlb vmemmap optimization is enabled. This will break current lock
* dependency chain and leads to deadlock.
+ * Disabling pcp before dissolving the page was a deterministic approach because
+ * we made sure that those pages cannot end up in any PCP list. Draining PCP lists
+ * expels those pages to the buddy system, but nothing guarantees that those pages
+ * do not get back to a PCP queue if we need to refill those.
*/
ret = dissolve_free_huge_page(page);
if (!ret) {
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled.patch
mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled-v2.patch
fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch
The asid is only erased from the xarray when the vm refcount reaches
zero, however this leads to potential UAF since the xe_vm_get() only
works on a vm with refcount != 0. Since the asid is allocated in the vm
create ioctl, rather erase it when closing the vm, prior to dropping the
potential last ref. This should also work when user closes driver fd
without explicit vm destroy.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1594
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_vm.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a196dbe65252..c5c26b3d1b76 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1581,6 +1581,16 @@ void xe_vm_close_and_put(struct xe_vm *vm)
xe->usm.num_vm_in_fault_mode--;
else if (!(vm->flags & XE_VM_FLAG_MIGRATION))
xe->usm.num_vm_in_non_fault_mode--;
+
+ if (vm->usm.asid) {
+ void *lookup;
+
+ xe_assert(xe, xe->info.has_asid);
+ xe_assert(xe, !(vm->flags & XE_VM_FLAG_MIGRATION));
+
+ lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
+ xe_assert(xe, lookup == vm);
+ }
mutex_unlock(&xe->usm.lock);
for_each_tile(tile, xe, id)
@@ -1596,24 +1606,15 @@ static void vm_destroy_work_func(struct work_struct *w)
struct xe_device *xe = vm->xe;
struct xe_tile *tile;
u8 id;
- void *lookup;
/* xe_vm_close_and_put was not called? */
xe_assert(xe, !vm->size);
mutex_destroy(&vm->snap_mutex);
- if (!(vm->flags & XE_VM_FLAG_MIGRATION)) {
+ if (!(vm->flags & XE_VM_FLAG_MIGRATION))
xe_device_mem_access_put(xe);
- if (xe->info.has_asid && vm->usm.asid) {
- mutex_lock(&xe->usm.lock);
- lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
- xe_assert(xe, lookup == vm);
- mutex_unlock(&xe->usm.lock);
- }
- }
-
for_each_tile(tile, xe, id)
XE_WARN_ON(vm->pt_root[id]);
--
2.44.0
The patch titled
Subject: bootconfig: use memblock_free_late to free xbc memory to buddy
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
bootconfig-use-memblock_free_late-to-free-xbc-memory-to-buddy.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Qiang Zhang <qiang4.zhang(a)intel.com>
Subject: bootconfig: use memblock_free_late to free xbc memory to buddy
Date: Fri, 12 Apr 2024 10:41:04 +0800
At the time to free xbc memory, memblock has handed over memory to buddy
allocator. So it doesn't make sense to free memory back to memblock.
memblock_free() called by xbc_exit() even causes UAF bugs on architectures
with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86. Following KASAN logs
shows this case.
[ 9.410890] ==================================================================
[ 9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260
[ 9.426850] Read of size 8 at addr ffff88845dd30000 by task swapper/0/1
[ 9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 6.9.0-rc3-00208-g586b5dfb51b9 #5
[ 9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D-00000000 Dec 28 2023
[ 9.460789] Call Trace:
[ 9.463518] <TASK>
[ 9.465859] dump_stack_lvl+0x53/0x70
[ 9.469949] print_report+0xce/0x610
[ 9.473944] ? __virt_addr_valid+0xf5/0x1b0
[ 9.478619] ? memblock_isolate_range+0x12d/0x260
[ 9.483877] kasan_report+0xc6/0x100
[ 9.487870] ? memblock_isolate_range+0x12d/0x260
[ 9.493125] memblock_isolate_range+0x12d/0x260
[ 9.498187] memblock_phys_free+0xb4/0x160
[ 9.502762] ? __pfx_memblock_phys_free+0x10/0x10
[ 9.508021] ? mutex_unlock+0x7e/0xd0
[ 9.512111] ? __pfx_mutex_unlock+0x10/0x10
[ 9.516786] ? kernel_init_freeable+0x2d4/0x430
[ 9.521850] ? __pfx_kernel_init+0x10/0x10
[ 9.526426] xbc_exit+0x17/0x70
[ 9.529935] kernel_init+0x38/0x1e0
[ 9.533829] ? _raw_spin_unlock_irq+0xd/0x30
[ 9.538601] ret_from_fork+0x2c/0x50
[ 9.542596] ? __pfx_kernel_init+0x10/0x10
[ 9.547170] ret_from_fork_asm+0x1a/0x30
[ 9.551552] </TASK>
[ 9.555649] The buggy address belongs to the physical page:
[ 9.561875] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x45dd30
[ 9.570821] flags: 0x200000000000000(node=0|zone=2)
[ 9.576271] page_type: 0xffffffff()
[ 9.580167] raw: 0200000000000000 ffffea0011774c48 ffffea0012ba1848 0000000000000000
[ 9.588823] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[ 9.597476] page dumped because: kasan: bad access detected
[ 9.605362] Memory state around the buggy address:
[ 9.610714] ffff88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9.618786] ffff88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9.626857] >ffff88845dd30000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 9.634930] ^
[ 9.638534] ffff88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 9.646605] ffff88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 9.654675] ==================================================================
Link: https://lkml.kernel.org/r/20240412024103.3078378-1-qiang4.zhang@linux.intel…
Signed-off-by: Qiang Zhang <qiang4.zhang(a)intel.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/bootconfig.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/bootconfig.c~bootconfig-use-memblock_free_late-to-free-xbc-memory-to-buddy
+++ a/lib/bootconfig.c
@@ -63,7 +63,7 @@ static inline void * __init xbc_alloc_me
static inline void __init xbc_free_mem(void *addr, size_t size)
{
- memblock_free(addr, size);
+ memblock_free_late(__pa(addr), size);
}
#else /* !__KERNEL__ */
_
Patches currently in -mm which might be from qiang4.zhang(a)intel.com are
bootconfig-use-memblock_free_late-to-free-xbc-memory-to-buddy.patch
Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:
commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe active task after it
sets lazy_put.
There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between store to rq->curr
and switch_mm_cid(), which happens earlier than:
- spin_unlock(),
- switch_to().
So it's fine when the architecture switch_mm() happens to have that
barrier already, but less so when the architecture only provides the
full barrier in switch_to() or spin_unlock().
It is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have memory barriers in switch_mm(), but rather have the full
barrier either in finish_lock_switch() or switch_to() have them too late
for the needs of switch_mm_cid().
Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.
Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86 which implicitly provide this memory
barrier by writing to CR3.
Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
Reported-by: levi.yun <yeoreum.yun(a)arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com> # for arm64
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com> # for x86
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: <stable(a)vger.kernel.org> # 6.4.x
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: levi.yun <yeoreum.yun(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: linux-arch(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: x86(a)kernel.org
---
arch/x86/include/asm/barrier.h | 3 +++
include/asm-generic/barrier.h | 8 ++++++++
kernel/sched/sched.h | 20 ++++++++++++++------
3 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0216f63a366b..d0795b5fab46 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
#define __smp_mb__before_atomic() do { } while (0)
#define __smp_mb__after_atomic() do { } while (0)
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm() do { } while (0)
+
#include <asm-generic/barrier.h>
#endif /* _ASM_X86_BARRIER_H */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..5a6c94d7a598 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do { \
#define io_stop_wc() do { } while (0)
#endif
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm() smp_mb()
+#endif
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 001fe047bd5d..35717359d3ca 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
# include <asm/paravirt_api_clock.h>
#endif
+#include <asm/barrier.h>
+
#include "cpupri.h"
#include "cpudeadline.h"
@@ -3445,13 +3447,19 @@ static inline void switch_mm_cid(struct rq *rq,
* between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
* Provide it here.
*/
- if (!prev->mm) // from kernel
+ if (!prev->mm) { // from kernel
smp_mb();
- /*
- * user -> user transition guarantees a memory barrier through
- * switch_mm() when current->mm changes. If current->mm is
- * unchanged, no barrier is needed.
- */
+ } else { // from user
+ /*
+ * user -> user transition relies on an implicit
+ * memory barrier in switch_mm() when
+ * current->mm changes. If the architecture
+ * switch_mm() does not have an implicit memory
+ * barrier, it is emitted here. If current->mm
+ * is unchanged, no barrier is needed.
+ */
+ smp_mb__after_switch_mm();
+ }
}
if (prev->mm_cid_active) {
mm_cid_snapshot_time(rq, prev->mm);
--
2.25.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The "buffer_percent" logic that is used by the ring buffer splice code to
only wake up the tasks when there's no data after the buffer is filled to
the percentage of the "buffer_percent" file is dependent on three
variables that determine the amount of data that is in the ring buffer:
1) pages_read - incremented whenever a new sub-buffer is consumed
2) pages_lost - incremented every time a writer overwrites a sub-buffer
3) pages_touched - incremented when a write goes to a new sub-buffer
The percentage is the calculation of:
(pages_touched - (pages_lost + pages_read)) / nr_pages
Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.
It was observed that over time, the amount read from the splice was
constantly decreasing the longer the trace was running. That is, if one
asked for 60%, it would read over 60% when it first starts tracing, but
then it would be woken up at under 60% and would slowly decrease the
amount of data read after being woken up, where the amount becomes much
less than the buffer percent.
This was due to an accounting of the pages_touched incrementation. This
value is incremented whenever a writer transfers to a new sub-buffer. But
the place where it was incremented was incorrect. If a writer overflowed
the current sub-buffer it would go to the next one. If it gets preempted
by an interrupt at that time, and the interrupt performs a trace, it too
will end up going to the next sub-buffer. But only one should increment
the counter. Unfortunately, that was not the case.
Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched. This
will only increment the counter once for when the writer moves to a new
sub-buffer, and not when there's a race and is incremented for when a
writer and its preempting writer both move to the same new sub-buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 25476ead681b..6511dc3a00da 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1393,7 +1393,6 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
- local_inc(&cpu_buffer->pages_touched);
/*
* Just make sure we have seen our old_write and synchronize
* with any interrupts that come in.
@@ -1430,8 +1429,9 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
*/
local_set(&next_page->page->commit, 0);
- /* Again, either we update tail_page or an interrupt does */
- (void)cmpxchg(&cpu_buffer->tail_page, tail_page, next_page);
+ /* Either we update tail_page or an interrupt does */
+ if (try_cmpxchg(&cpu_buffer->tail_page, &tail_page, next_page))
+ local_inc(&cpu_buffer->pages_touched);
}
}
--
2.43.0
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: e4a6bceac98eba3c00e874892736b34ea5fdaca3
Gitweb: https://git.kernel.org/tip/e4a6bceac98eba3c00e874892736b34ea5fdaca3
Author: John Stultz <jstultz(a)google.com>
AuthorDate: Wed, 10 Apr 2024 16:26:28 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 12 Apr 2024 14:11:15 +02:00
selftests: timers: Fix posix_timers ksft_print_msg() warning
After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement
check_timer_distribution()") the following warning occurs when building
with an older gcc:
posix_timers.c:250:2: warning: format not a string literal and no format arguments [-Wformat-security]
250 | ksft_print_msg(errmsg);
| ^~~~~~~~~~~~~~
Fix this up by changing it to ksft_print_msg("%s", errmsg)
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Justin Stitt <justinstitt(a)google.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240410232637.4135564-1-jstultz@google.com
---
tools/testing/selftests/timers/posix_timers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d86a0e0..348f471 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -247,7 +247,7 @@ static int check_timer_distribution(void)
ksft_test_result_skip("check signal distribution (old kernel)\n");
return 0;
err:
- ksft_print_msg(errmsg);
+ ksft_print_msg("%s", errmsg);
return -1;
}
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: f7d5bcd35d427daac7e206b1073ca14f5db85c27
Gitweb: https://git.kernel.org/tip/f7d5bcd35d427daac7e206b1073ca14f5db85c27
Author: Nathan Chancellor <nathan(a)kernel.org>
AuthorDate: Thu, 11 Apr 2024 11:45:40 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 12 Apr 2024 14:11:15 +02:00
selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn
After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement
check_timer_distribution()"), clang warns:
tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here
401 | return major > min_major || (major == min_major && minor >= min_minor);
| ^~~~~
tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning
395 | unsigned int major, minor;
| ^
| = 0
This is a false positive because if uname() fails, ksft_exit_fail_msg()
will be called, which unconditionally calls exit(), a noreturn function.
However, clang does not know that ksft_exit_fail_msg() will call exit() at
the point in the pipeline that the warning is emitted because inlining has
not occurred, so it assumes control flow will resume normally after
ksft_exit_fail_msg() is called.
Make it clear to clang that all of the functions that call exit()
unconditionally in kselftest.h are noreturn transitively by marking them
explicitly with '__attribute__((__noreturn__))', which clears up the
warning above and any future warnings that may appear for the same reason.
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240411-mark-kselftest-exit-funcs-noreturn-v1-1-…
Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/
---
tools/testing/selftests/kselftest.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 973b18e..0591974 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -80,6 +80,9 @@
#define KSFT_XPASS 3
#define KSFT_SKIP 4
+#ifndef __noreturn
+#define __noreturn __attribute__((__noreturn__))
+#endif
#define __printf(a, b) __attribute__((format(printf, a, b)))
/* counters */
@@ -300,13 +303,13 @@ void ksft_test_result_code(int exit_code, const char *test_name,
va_end(args);
}
-static inline int ksft_exit_pass(void)
+static inline __noreturn int ksft_exit_pass(void)
{
ksft_print_cnts();
exit(KSFT_PASS);
}
-static inline int ksft_exit_fail(void)
+static inline __noreturn int ksft_exit_fail(void)
{
ksft_print_cnts();
exit(KSFT_FAIL);
@@ -333,7 +336,7 @@ static inline int ksft_exit_fail(void)
ksft_cnt.ksft_xfail + \
ksft_cnt.ksft_xskip)
-static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
@@ -348,19 +351,19 @@ static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
exit(KSFT_FAIL);
}
-static inline int ksft_exit_xfail(void)
+static inline __noreturn int ksft_exit_xfail(void)
{
ksft_print_cnts();
exit(KSFT_XFAIL);
}
-static inline int ksft_exit_xpass(void)
+static inline __noreturn int ksft_exit_xpass(void)
{
ksft_print_cnts();
exit(KSFT_XPASS);
}
-static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: ed366de8ec89d4f960d66c85fc37d9de22f7bf6d
Gitweb: https://git.kernel.org/tip/ed366de8ec89d4f960d66c85fc37d9de22f7bf6d
Author: John Stultz <jstultz(a)google.com>
AuthorDate: Wed, 10 Apr 2024 16:26:30 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 12 Apr 2024 14:11:15 +02:00
selftests: timers: Fix abs() warning in posix_timers test
Building with clang results in the following warning:
posix_timers.c:69:6: warning: absolute value function 'abs' given an
argument of type 'long long' but has parameter of type 'int' which may
cause truncation of value [-Wabsolute-value]
if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
^
So switch to using llabs() instead.
Fixes: 0bc4b0cf1570 ("selftests: add basic posix timers selftests")
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240410232637.4135564-3-jstultz@google.com
---
tools/testing/selftests/timers/posix_timers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index 348f471..c001dd7 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -66,7 +66,7 @@ static int check_diff(struct timeval start, struct timeval end)
diff = end.tv_usec - start.tv_usec;
diff += (end.tv_sec - start.tv_sec) * USECS_PER_SEC;
- if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
+ if (llabs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
printf("Diff too high: %lld..", diff);
return -1;
}
The SCSI sg driver oopsed on my 6.8.4 kernel, and I noticed that a
patch (presumably) to fix this was pulled from 6.8.5. However, I also
noticed this email:
>> Reverted this patch and I couldn't see the reposted warning.
>> scsi: sg: Avoid sg device teardown race
>> [ Upstream commit 27f58c04a8f438078583041468ec60597841284d ]
>
> Fix is here:
>
> https://git.kernel.org/mkp/scsi/c/d4e655c49f47
Is this unsuitable for 6.8.6 please?
Thanks,
Chris
The commit 407d1a51921e ("PCI: Create device tree node for bridge")
creates of_node for PCI devices.
During the insertion handling of these new DT nodes done by of_platform,
new devices (struct device) are created. For each PCI devices a struct
device is already present (created and handled by the PCI core).
Having a second struct device to represent the exact same PCI device is
not correct.
On the of_node creation:
- tell the of_platform that there is no need to create a device for this
node (OF_POPULATED flag),
- link this newly created of_node to the already present device,
- tell fwnode that the device attached to this of_node is ready using
fwnode_dev_initialized().
With this fix, the of_node are available in the sysfs device tree:
/sys/devices/platform/soc/d0070000.pcie/
+ of_node -> .../devicetree/base/soc/pcie@d0070000
+ pci0000:00
+ 0000:00:00.0
+ of_node -> .../devicetree/base/soc/pcie@d0070000/pci@0,0
+ 0000:01:00.0
+ of_node -> .../devicetree/base/soc/pcie@d0070000/pci@0,0/dev@0,0
On the of_node removal, revert the operations.
Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
Cc: stable(a)vger.kernel.org
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/pci/of.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 51e3dd0ea5ab..5afd2731e876 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -615,7 +615,8 @@ void of_pci_remove_node(struct pci_dev *pdev)
np = pci_device_to_OF_node(pdev);
if (!np || !of_node_check_flag(np, OF_DYNAMIC))
return;
- pdev->dev.of_node = NULL;
+
+ device_remove_of_node(&pdev->dev);
of_changeset_revert(np->data);
of_changeset_destroy(np->data);
@@ -668,12 +669,22 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
if (ret)
goto out_free_node;
+ /*
+ * This of_node will be added to an existing device.
+ * Avoid any device creation and use the existing device
+ */
+ of_node_set_flag(np, OF_POPULATED);
+ np->fwnode.dev = &pdev->dev;
+ fwnode_dev_initialized(&np->fwnode, true);
+
ret = of_changeset_apply(cset);
if (ret)
goto out_free_node;
np->data = cset;
- pdev->dev.of_node = np;
+
+ /* Add the of_node to the existing device */
+ device_add_of_node(&pdev->dev, np);
kfree(name);
return;
--
2.44.0
In __pci_register_driver(), the pci core overwrites the dev_groups field of
the embedded struct device_driver with the dev_groups from the outer
struct pci_driver unconditionally.
Set dev_groups in the pci_driver to make sure it is used.
This was broken since the introduction of pvpanic-pci.
Fixes: db3a4f0abefd ("misc/pvpanic: add PCI driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Greg,
does it make sense to duplicate fields between struct pci_driver and
struct device_driver?
The fields "name", "groups" and "dev_groups" are duplicated.
pci_driver::dev_groups was introduced in
commit ded13b9cfd59 ("PCI: Add support for dev_groups to struct pci_driver")
because "this helps converting PCI drivers sysfs attributes to static"
I don't understand the reasoning. The embedded device_driver shares the
same storage lifetime and the fields have the exact same type.
---
drivers/misc/pvpanic/pvpanic-pci.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/misc/pvpanic/pvpanic-pci.c b/drivers/misc/pvpanic/pvpanic-pci.c
index 9ad20e82785b..b21598a18f6d 100644
--- a/drivers/misc/pvpanic/pvpanic-pci.c
+++ b/drivers/misc/pvpanic/pvpanic-pci.c
@@ -44,8 +44,6 @@ static struct pci_driver pvpanic_pci_driver = {
.name = "pvpanic-pci",
.id_table = pvpanic_pci_id_tbl,
.probe = pvpanic_pci_probe,
- .driver = {
- .dev_groups = pvpanic_dev_groups,
- },
+ .dev_groups = pvpanic_dev_groups,
};
module_pci_driver(pvpanic_pci_driver);
---
base-commit: 00dcf5d862e86e57f5ce46344039f11bb1ad61f6
change-id: 20240411-pvpanic-pci-dev-groups-e3beebcbc4e4
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
The patch below does not apply to the 6.7-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.7.y
git checkout FETCH_HEAD
git cherry-pick -x 0e45882ca829b26b915162e8e86dbb1095768e9e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033027-expensive-footage-f3ea@gregkh' --subject-prefix 'PATCH 6.7.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e45882ca829b26b915162e8e86dbb1095768e9e Mon Sep 17 00:00:00 2001
From: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Date: Tue, 5 Mar 2024 15:35:06 +0100
Subject: [PATCH] drm/i915/vma: Fix UAF on destroy against retire race
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Object debugging tools were sporadically reporting illegal attempts to
free a still active i915 VMA object when parking a GT believed to be idle.
[161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
[161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0
...
[161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1
[161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
[161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
[161.360592] RIP: 0010:debug_print_object+0x80/0xb0
...
[161.361347] debug_object_free+0xeb/0x110
[161.361362] i915_active_fini+0x14/0x130 [i915]
[161.361866] release_references+0xfe/0x1f0 [i915]
[161.362543] i915_vma_parked+0x1db/0x380 [i915]
[161.363129] __gt_park+0x121/0x230 [i915]
[161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]
That has been tracked down to be happening when another thread is
deactivating the VMA inside __active_retire() helper, after the VMA's
active counter has been already decremented to 0, but before deactivation
of the VMA's object is reported to the object debugging tool.
We could prevent from that race by serializing i915_active_fini() with
__active_retire() via ref->tree_lock, but that wouldn't stop the VMA from
being used, e.g. from __i915_vma_retire() called at the end of
__active_retire(), after that VMA has been already freed by a concurrent
i915_vma_destroy() on return from the i915_active_fini(). Then, we should
rather fix the issue at the VMA level, not in i915_active.
Since __i915_vma_parked() is called from __gt_park() on last put of the
GT's wakeref, the issue could be addressed by holding the GT wakeref long
enough for __active_retire() to complete before that wakeref is released
and the GT parked.
I believe the issue was introduced by commit d93939730347 ("drm/i915:
Remove the vma refcount") which moved a call to i915_active_fini() from
a dropped i915_vma_release(), called on last put of the removed VMA kref,
to i915_vma_parked() processing path called on last put of a GT wakeref.
However, its visibility to the object debugging tool was suppressed by a
bug in i915_active that was fixed two weeks later with commit e92eb246feb9
("drm/i915/active: Fix missing debug object activation").
A VMA associated with a request doesn't acquire a GT wakeref by itself.
Instead, it depends on a wakeref held directly by the request's active
intel_context for a GT associated with its VM, and indirectly on that
intel_context's engine wakeref if the engine belongs to the same GT as the
VMA's VM. Those wakerefs are released asynchronously to VMA deactivation.
Fix the issue by getting a wakeref for the VMA's GT when activating it,
and putting that wakeref only after the VMA is deactivated. However,
exclude global GTT from that processing path, otherwise the GPU never goes
idle. Since __i915_vma_retire() may be called from atomic contexts, use
async variant of wakeref put. Also, to avoid circular locking dependency,
take care of acquiring the wakeref before VM mutex when both are needed.
v7: Add inline comments with justifications for:
- using untracked variants of intel_gt_pm_get/put() (Nirmoy),
- using async variant of _put(),
- not getting the wakeref in case of a global GTT,
- always getting the first wakeref outside vm->mutex.
v6: Since __i915_vma_active/retire() callbacks are not serialized, storing
a wakeref tracking handle inside struct i915_vma is not safe, and
there is no other good place for that. Use untracked variants of
intel_gt_pm_get/put_async().
v5: Replace "tile" with "GT" across commit description (Rodrigo),
- avoid mentioning multi-GT case in commit description (Rodrigo),
- explain why we need to take a temporary wakeref unconditionally inside
i915_vma_pin_ww() (Rodrigo).
v4: Refresh on top of commit 5e4e06e4087e ("drm/i915: Track gt pm
wakerefs") (Andi),
- for more easy backporting, split out removal of former insufficient
workarounds and move them to separate patches (Nirmoy).
- clean up commit message and description a bit.
v3: Identify root cause more precisely, and a commit to blame,
- identify and drop former workarounds,
- update commit message and description.
v2: Get the wakeref before VM mutex to avoid circular locking dependency,
- drop questionable Fixes: tag.
Fixes: d93939730347 ("drm/i915: Remove the vma refcount")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/8875
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Nirmoy Das <nirmoy.das(a)intel.com>
Cc: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: stable(a)vger.kernel.org # v5.19+
Reviewed-by: Nirmoy Das <nirmoy.das(a)intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240305143747.335367-6-janus…
(cherry picked from commit f3c71b2ded5c4367144a810ef25f998fd1d6c381)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d09aad34ba37..b70715b1411d 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -34,6 +34,7 @@
#include "gt/intel_engine.h"
#include "gt/intel_engine_heartbeat.h"
#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
#include "gt/intel_gt_requests.h"
#include "gt/intel_tlb.h"
@@ -103,12 +104,42 @@ static inline struct i915_vma *active_to_vma(struct i915_active *ref)
static int __i915_vma_active(struct i915_active *ref)
{
- return i915_vma_tryget(active_to_vma(ref)) ? 0 : -ENOENT;
+ struct i915_vma *vma = active_to_vma(ref);
+
+ if (!i915_vma_tryget(vma))
+ return -ENOENT;
+
+ /*
+ * Exclude global GTT VMA from holding a GT wakeref
+ * while active, otherwise GPU never goes idle.
+ */
+ if (!i915_vma_is_ggtt(vma)) {
+ /*
+ * Since we and our _retire() counterpart can be
+ * called asynchronously, storing a wakeref tracking
+ * handle inside struct i915_vma is not safe, and
+ * there is no other good place for that. Hence,
+ * use untracked variants of intel_gt_pm_get/put().
+ */
+ intel_gt_pm_get_untracked(vma->vm->gt);
+ }
+
+ return 0;
}
static void __i915_vma_retire(struct i915_active *ref)
{
- i915_vma_put(active_to_vma(ref));
+ struct i915_vma *vma = active_to_vma(ref);
+
+ if (!i915_vma_is_ggtt(vma)) {
+ /*
+ * Since we can be called from atomic contexts,
+ * use an async variant of intel_gt_pm_put().
+ */
+ intel_gt_pm_put_async_untracked(vma->vm->gt);
+ }
+
+ i915_vma_put(vma);
}
static struct i915_vma *
@@ -1404,7 +1435,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
struct i915_vma_work *work = NULL;
struct dma_fence *moving = NULL;
struct i915_vma_resource *vma_res = NULL;
- intel_wakeref_t wakeref = 0;
+ intel_wakeref_t wakeref;
unsigned int bound;
int err;
@@ -1424,8 +1455,14 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
if (err)
return err;
- if (flags & PIN_GLOBAL)
- wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
+ /*
+ * In case of a global GTT, we must hold a runtime-pm wakeref
+ * while global PTEs are updated. In other cases, we hold
+ * the rpm reference while the VMA is active. Since runtime
+ * resume may require allocations, which are forbidden inside
+ * vm->mutex, get the first rpm wakeref outside of the mutex.
+ */
+ wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
if (flags & vma->vm->bind_async_flags) {
/* lock VM */
@@ -1561,8 +1598,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
if (work)
dma_fence_work_commit_imm(&work->base);
err_rpm:
- if (wakeref)
- intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
+ intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
if (moving)
dma_fence_put(moving);
MTD OTP logic is very fragile on parsing NVMEM Cell and can be
problematic with some specific kind of devices.
The problem was discovered by e87161321a40 ("mtd: rawnand: macronix:
OTP access for MX30LFxG18AC") where OTP support was added to a NAND
device. With the case of NAND devices, it does require a node where ECC
info are declared and all the fixed partitions, and this cause the OTP
codepath to parse this node as OTP NVMEM Cells, making probe fail and
the NAND device registration fail.
MTD OTP parsing should have been limited to always using compatible to
prevent this error by using node with compatible "otp-user" or
"otp-factory".
NVMEM across the years had various iteration on how Cells could be
declared in DT, in some old implementation, no_of_node should have been
enabled but now add_legacy_fixed_of_cells should be used to disable
NVMEM to parse child node as NVMEM Cell.
To fix this and limit any regression with other MTD that makes use of
declaring OTP as direct child of the dev node, disable
add_legacy_fixed_of_cells if we detect the MTD type is Nand.
With the following logic, the OTP NVMEM entry is correctly created with
no Cells and the MTD Nand is correctly probed and partitions are
correctly exposed.
Fixes: 4b361cfa8624 ("mtd: core: add OTP nvmem provider support")
Cc: <stable(a)vger.kernel.org> # v6.7+
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
---
To backport this to v6.6 and previous,
config.no_of_node = mtd_type_is_nand(mtd);
should be used as it does pose the same usage of
add_legacy_fixed_of_cells.
Changes v4:
- Add info on how to backport this to previous kernel
- Fix Fixes tag
- Reformat commit description as it was unprecise and
had false statement
Changes v3:
- Fix commit description
Changes v2:
- Use mtd_type_is_nand instead of node name check
drivers/mtd/mtdcore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 5887feb347a4..0de87bc63840 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -900,7 +900,7 @@ static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
config.name = compatible;
config.id = NVMEM_DEVID_AUTO;
config.owner = THIS_MODULE;
- config.add_legacy_fixed_of_cells = true;
+ config.add_legacy_fixed_of_cells = !mtd_type_is_nand(mtd);
config.type = NVMEM_TYPE_OTP;
config.root_only = true;
config.ignore_wp = true;
--
2.43.0
The table of primary plane formats wasn't sorted at all, leading to
applications picking our least desirable formats by defaults.
Sort the primary plane formats according to our order of preference.
Nice side-effect of this change is that it makes IGT's kms_atomic
plane-invalid-params pass because the test picks the first format
which for vmwgfx was DRM_FORMAT_XRGB1555 and uses fb's with odd sizes
which make Pixman, which IGT depends on assert due to the fact that our
16bpp formats aren't 32 bit aligned like Pixman requires all formats
to be.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: 36cc79bc9077 ("drm/vmwgfx: Add universal plane support")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.12+
Acked-by: Pekka Paalanen <pekka.paalanen(a)collabora.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_kms.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
index bf9931e3a728..bf24f2f0dcfc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
@@ -233,10 +233,10 @@ struct vmw_framebuffer_bo {
static const uint32_t __maybe_unused vmw_primary_plane_formats[] = {
- DRM_FORMAT_XRGB1555,
- DRM_FORMAT_RGB565,
DRM_FORMAT_XRGB8888,
DRM_FORMAT_ARGB8888,
+ DRM_FORMAT_RGB565,
+ DRM_FORMAT_XRGB1555,
};
static const uint32_t __maybe_unused vmw_cursor_plane_formats[] = {
--
2.40.1
The conditional was supposed to prevent enabling of a crtc state
without a set primary plane. Accidently it also prevented disabling
crtc state with a set primary plane. Neither is correct.
Fix the conditional and just driver-warn when a crtc state has been
enabled without a primary plane which will help debug broken userspace.
Fixes IGT's kms_atomic_interruptible and kms_atomic_transition tests.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: 06ec41909e31 ("drm/vmwgfx: Add and connect CRTC helper functions")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.12+
Reviewed-by: Ian Forbes <ian.forbes(a)broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index e33e5993d8fc..13b2820cae51 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -931,6 +931,7 @@ int vmw_du_cursor_plane_atomic_check(struct drm_plane *plane,
int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
struct drm_atomic_state *state)
{
+ struct vmw_private *vmw = vmw_priv(crtc->dev);
struct drm_crtc_state *new_state = drm_atomic_get_new_crtc_state(state,
crtc);
struct vmw_display_unit *du = vmw_crtc_to_du(new_state->crtc);
@@ -938,9 +939,13 @@ int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
bool has_primary = new_state->plane_mask &
drm_plane_mask(crtc->primary);
- /* We always want to have an active plane with an active CRTC */
- if (has_primary != new_state->enable)
- return -EINVAL;
+ /*
+ * This is fine in general, but broken userspace might expect
+ * some actual rendering so give a clue as why it's blank.
+ */
+ if (new_state->enable && !has_primary)
+ drm_dbg_driver(&vmw->drm,
+ "CRTC without a primary plane will be blank.\n");
if (new_state->connector_mask != connector_mask &&
--
2.40.1
No upstream commit exists for this patch.
Fuzzing of 5.10 stable branch reports a slab-out-of-bounds error in
ata_scsi_pass_thru.
The error is fixed in 5.18 by commit ce70fd9a551a ("scsi: core: Remove the
cmd field from struct scsi_request") upstream.
Backporting this commit would require significant changes to the code so
it is bettter to use a simple fix for that particular error.
The problem is that the length of the received SCSI command is not
validated if scsi_op == VARIABLE_LENGTH_CMD. It can lead to out-of-bounds
reading if the user sends a request with SCSI command of length less than
32.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Signed-off-by: Artem Sadovnikov <ancowi69(a)gmail.com>
Signed-off-by: Mikhail Ivanov <iwanov-23(a)bk.ru>
Signed-off-by: Mikhail Ukhin <mish.uxin2012(a)yandex.ru>
---
v2: The new addresses were added and the text was updated.
drivers/ata/libata-scsi.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dfa090ccd21c..77589e911d3d 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4065,6 +4065,9 @@ int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
if (unlikely(!scmd->cmd_len))
goto bad_cdb_len;
+
+ if (scsi_op == VARIABLE_LENGTH_CMD && scmd->cmd_len < 32)
+ goto bad_cdb_len;
if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) {
if (unlikely(scmd->cmd_len > dev->cdb_len))
--
2.25.1
commit 2f4a4d63a193be6fd530d180bb13c3592052904c modified
cpc_read/cpc_write to use access_width to read CPC registers. For PCC
registers the access width field in the ACPI register macro specifies
the PCC subspace id. For non-zero PCC subspace id the access width is
incorrectly treated as access width. This causes errors when reading
from PCC registers in the CPPC driver.
For PCC registers base the size of read/write on the bit width field.
The debug message in cpc_read/cpc_write is updated to print relevant
information for the address space type used to read the register.
Signed-off-by: Vanshidhar Konda <vanshikonda(a)os.amperecomputing.com>
Tested-by: Jarred White <jarredwhite(a)linux.microsoft.com>
Reviewed-by: Jarred White <jarredwhite(a)linux.microsoft.com>
Cc: 5.15+ <stable(a)vger.kernel.org> # 5.15+
---
When testing v6.9-rc1 kernel on AmpereOne system dmesg showed that
cpufreq policy had failed to initialize on some cores during boot because
cpufreq->get() always returned 0. On this system CPPC registers are in PCC
subspace index 2 that are 32 bits wide. With this patch the CPPC driver
interpreted the access width field as 16 bits, causing the register read
to roll over too quickly to provide valid values during frequency
computation.
v2:
- Use size variable in debug print message
- Use size instead of reg->bit_width for acpi_os_read_memory and
acpi_os_write_memory
drivers/acpi/cppc_acpi.c | 53 ++++++++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 16 deletions(-)
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 4bfbe55553f4..a037e9d15f48 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1002,14 +1002,14 @@ static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val)
}
*val = 0;
+ size = GET_BIT_WIDTH(reg);
if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
- u32 width = GET_BIT_WIDTH(reg);
u32 val_u32;
acpi_status status;
status = acpi_os_read_port((acpi_io_address)reg->address,
- &val_u32, width);
+ &val_u32, size);
if (ACPI_FAILURE(status)) {
pr_debug("Error: Failed to read SystemIO port %llx\n",
reg->address);
@@ -1018,17 +1018,22 @@ static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val)
*val = val_u32;
return 0;
- } else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
+ } else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0) {
+ /*
+ * For registers in PCC space, the register size is determined
+ * by the bit width field; the access size is used to indicate
+ * the PCC subspace id.
+ */
+ size = reg->bit_width;
vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id);
+ }
else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
vaddr = reg_res->sys_mem_vaddr;
else if (reg->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE)
return cpc_read_ffh(cpu, reg, val);
else
return acpi_os_read_memory((acpi_physical_address)reg->address,
- val, reg->bit_width);
-
- size = GET_BIT_WIDTH(reg);
+ val, size);
switch (size) {
case 8:
@@ -1044,8 +1049,13 @@ static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val)
*val = readq_relaxed(vaddr);
break;
default:
- pr_debug("Error: Cannot read %u bit width from PCC for ss: %d\n",
- reg->bit_width, pcc_ss_id);
+ if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) {
+ pr_debug("Error: Cannot read %u width from for system memory: 0x%llx\n",
+ size, reg->address);
+ } else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
+ pr_debug("Error: Cannot read %u bit width to PCC for ss: %d\n",
+ size, pcc_ss_id);
+ }
return -EFAULT;
}
@@ -1063,12 +1073,13 @@ static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val)
int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
struct cpc_reg *reg = ®_res->cpc_entry.reg;
+ size = GET_BIT_WIDTH(reg);
+
if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
- u32 width = GET_BIT_WIDTH(reg);
acpi_status status;
status = acpi_os_write_port((acpi_io_address)reg->address,
- (u32)val, width);
+ (u32)val, size);
if (ACPI_FAILURE(status)) {
pr_debug("Error: Failed to write SystemIO port %llx\n",
reg->address);
@@ -1076,17 +1087,22 @@ static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val)
}
return 0;
- } else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
+ } else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0) {
+ /*
+ * For registers in PCC space, the register size is determined
+ * by the bit width field; the access size is used to indicate
+ * the PCC subspace id.
+ */
+ size = reg->bit_width;
vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id);
+ }
else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
vaddr = reg_res->sys_mem_vaddr;
else if (reg->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE)
return cpc_write_ffh(cpu, reg, val);
else
return acpi_os_write_memory((acpi_physical_address)reg->address,
- val, reg->bit_width);
-
- size = GET_BIT_WIDTH(reg);
+ val, size);
if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
val = MASK_VAL(reg, val);
@@ -1105,8 +1121,13 @@ static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val)
writeq_relaxed(val, vaddr);
break;
default:
- pr_debug("Error: Cannot write %u bit width to PCC for ss: %d\n",
- reg->bit_width, pcc_ss_id);
+ if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) {
+ pr_debug("Error: Cannot write %u width from for system memory: 0x%llx\n",
+ size, reg->address);
+ } else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
+ pr_debug("Error: Cannot write %u bit width to PCC for ss: %d\n",
+ size, pcc_ss_id);
+ }
ret_val = -EFAULT;
break;
}
--
2.43.1
The ISRs of the tps25750 and tps6598x do not handle generated events
properly under all circumstances.
The tps6598x ISR does not read all bits of the INT_EVENTX registers,
leaving events signaled with bits above 64 unattended. Moreover, these
events are not cleared, leaving the interrupt enabled.
The tps25750 reads all bits of the INT_EVENT1 register, but the event
checking is not right because the same event is checked in two different
regions of the same register by means of an OR operation.
This series aims to fix both issues by reading all bits of the
INT_EVENTX registers, and limiting the event checking to the region
where the supported events are defined (currently they are limited to
the first 64 bits of the registers, as the are defined as BIT_ULL()).
If the need for events above the first 64 bits of the INT_EVENTX
registers arises, a different mechanism might be required. But for the
current needs, all definitions can be left as they are.
Note: resend to add the Cc tag for 'stable' (fixes in the series).
Signed-off-by: Javier Carrasco <javier.carrasco(a)wolfvision.net>
---
Javier Carrasco (2):
usb: typec: tipd: fix event checking for tps25750
usb: typec: tipd: fix event checking for tps6598x
drivers/usb/typec/tipd/core.c | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240328-tps6598x_fix_event_handling-3398d3d82f85
Best regards,
--
Javier Carrasco <javier.carrasco(a)wolfvision.net>
Hi,
These patches fix and reported by xfstests tests xfs/179 xfs/270
xfs/557 xfs/606, the patchset were tested to confirm they fix those
tests. all are clean picks.
thanks,
MNAdam
From: Vasiliy Kovalev <kovalev(a)altlinux.org>
When returning from the hci_disconnect() function, the conn->state
continues to be set to BT_CONNECTED and hci_conn_drop() is executed,
which decrements the conn->refcnt.
Syzkaller has generated a reproducer that results in multiple calls to
hci_encrypt_change_evt() of the same conn object.
--
hci_encrypt_change_evt(){
// conn->state == BT_CONNECTED
hci_disconnect(){
hci_abort_conn();
}
hci_conn_drop();
// conn->state == BT_CONNECTED
}
--
This behavior can cause the conn->refcnt to go far into negative values
and cause problems. To get around this, you need to change the conn->state,
namely to BT_DISCONN, as it was before.
Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
net/bluetooth/hci_event.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 64477e1bde7cec..e0477021183f9b 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2989,6 +2989,7 @@ static void hci_cs_le_start_enc(struct hci_dev *hdev, u8 status)
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
unlock:
hci_dev_unlock(hdev);
@@ -3654,6 +3655,7 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, void *data,
hci_encrypt_cfm(conn, ev->status);
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
goto unlock;
}
@@ -5248,6 +5250,7 @@ static void hci_key_refresh_complete_evt(struct hci_dev *hdev, void *data,
if (ev->status && conn->state == BT_CONNECTED) {
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
goto unlock;
}
--
2.33.8
After a recent discussion regarding "do we need a 'nobackport' tag" I
set out to create one change for stable-kernel-rules.rst. This is now
the second patch in the series, which links to that discussion; the
other stuff is fine-tuning that happened along the way.
Ciao, Thorsten
Thorsten Leemhuis (4):
docs: stable-kernel-rules: reduce redundancy
docs: stable-kernel-rules: mention "no semi-automatic backport"
docs: stable-kernel-rules: call mainline by its name and change
example
docs: stable-kernel-rules: remove code-labels tags
Documentation/process/stable-kernel-rules.rst | 50 +++++++------------
1 file changed, 18 insertions(+), 32 deletions(-)
base-commit: 3f86ed6ec0b390c033eae7f9c487a3fea268e027
--
2.44.0
Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:
commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe active task after it
sets lazy_put.
There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between store to rq->curr
and switch_mm_cid(), which happens earlier than:
- spin_unlock(),
- switch_to().
So it's fine when the architecture switch_mm happens to have that barrier
already, but less so when the architecture only provides the full barrier
in switch_to() or spin_unlock().
It is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have memory barriers in switch_mm(), but rather have the full
barrier either in finish_lock_switch() or switch_to() have them too late
for the needs of switch_mm_cid().
Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.
Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86 which implicitly provide this memory
barrier by writing to CR3.
Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
Reported-by: levi.yun <yeoreum.yun(a)arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: <stable(a)vger.kernel.org> # 6.4.x
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: levi.yun <yeoreum.yun(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
---
arch/x86/include/asm/barrier.h | 3 +++
include/asm-generic/barrier.h | 8 ++++++++
kernel/sched/sched.h | 20 ++++++++++++++------
3 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 35389b2af88e..0d5e54201eb2 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
#define __smp_mb__before_atomic() do { } while (0)
#define __smp_mb__after_atomic() do { } while (0)
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm() do { } while (0)
+
#include <asm-generic/barrier.h>
/*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..5a6c94d7a598 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do { \
#define io_stop_wc() do { } while (0)
#endif
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm() smp_mb()
+#endif
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..044d842c696c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
# include <asm/paravirt_api_clock.h>
#endif
+#include <asm/barrier.h>
+
#include "cpupri.h"
#include "cpudeadline.h"
@@ -3481,13 +3483,19 @@ static inline void switch_mm_cid(struct rq *rq,
* between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
* Provide it here.
*/
- if (!prev->mm) // from kernel
+ if (!prev->mm) { // from kernel
smp_mb();
- /*
- * user -> user transition guarantees a memory barrier through
- * switch_mm() when current->mm changes. If current->mm is
- * unchanged, no barrier is needed.
- */
+ } else { // from user
+ /*
+ * user -> user transition relies on an implicit
+ * memory barrier in switch_mm() when
+ * current->mm changes. If the architecture
+ * switch_mm() does not have an implicit memory
+ * barrier, it is emitted here. If current->mm
+ * is unchanged, no barrier is needed.
+ */
+ smp_mb__after_switch_mm();
+ }
}
if (prev->mm_cid_active) {
mm_cid_snapshot_time(rq, prev->mm);
--
2.39.2
smp_call_function_single disables IRQs when executing the callback. To
prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere.
This is already done by qman_update_cgr and qman_delete_cgr; fix the
other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
CC: stable(a)vger.kernel.org
Signed-off-by: Sean Anderson <sean.anderson(a)seco.com>
Reviewed-by: Camelia Groza <camelia.groza(a)nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
I got no response the first time I sent this, so I am resending to net.
This issue was introduced in a series which went through net, so I hope
it makes sense to take it via net.
[1] https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ande…
(no changes since v3)
Changes in v3:
- Change blamed commit to something more appropriate
Changes in v2:
- Fix one additional call to spin_unlock
drivers/soc/fsl/qbman/qman.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 739e4eee6b75..1bf1f1ea67f0 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1456,11 +1456,11 @@ static void qm_congestion_task(struct work_struct *work)
union qm_mc_result *mcr;
struct qman_cgr *cgr;
- spin_lock(&p->cgr_lock);
+ spin_lock_irq(&p->cgr_lock);
qm_mc_start(&p->p);
qm_mc_commit(&p->p, QM_MCC_VERB_QUERYCONGESTION);
if (!qm_mc_result_timeout(&p->p, &mcr)) {
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
dev_crit(p->config->dev, "QUERYCONGESTION timeout\n");
qman_p_irqsource_add(p, QM_PIRQ_CSCI);
return;
@@ -1476,7 +1476,7 @@ static void qm_congestion_task(struct work_struct *work)
list_for_each_entry(cgr, &p->cgr_cbs, node)
if (cgr->cb && qman_cgrs_get(&c, cgr->cgrid))
cgr->cb(p, cgr, qman_cgrs_get(&rr, cgr->cgrid));
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
qman_p_irqsource_add(p, QM_PIRQ_CSCI);
}
@@ -2440,7 +2440,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
preempt_enable();
cgr->chan = p->config->channel;
- spin_lock(&p->cgr_lock);
+ spin_lock_irq(&p->cgr_lock);
if (opts) {
struct qm_mcc_initcgr local_opts = *opts;
@@ -2477,7 +2477,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
qman_cgrs_get(&p->cgrs[1], cgr->cgrid))
cgr->cb(p, cgr, 1);
out:
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
put_affine_portal();
return ret;
}
--
2.35.1.1320.gc452695387.dirty
[Embedded World 2024, SECO SpA]<https://www.messe-ticket.de/Nuernberg/embeddedworld2024/Register/ew24517689>
On 11.04.24 09:20, Toralf Förster wrote:
> It is a remote system, nothing in the logs, system is a hardened Gentoo
> Linux, 6.8.4 was fine.
>
> Linux mr-fox 6.8.4 #4 SMP Thu Apr 4 22:10:47 UTC 2024 x86_64 AMD Ryzen
> 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
>
> Another Gentoo dev reported problems too.
>
> config is below.
Thx for the report, but the harsh reality is: nearly no developer will
see your initial report, as you just sent it to LKML, which nearly
nobody ready. I CCed a few lists, which might help. But that is
unlikely, as this could be cause by all sorts of changes. Which is why
we likely need a bisection (
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
) from somebody affected to make some progress here.
That being said: there are a few EFI changes in there that in a case
like this are a suspect. I CCed the developer, maybe something rings a bell.
Ciao, Thorsten
In current driver qcom_slim_ngd_up_worker() indefinitely
waiting for ctrl->qmi_up completion object. This is
resulting in workqueue lockup on Kthread.
Added wait_for_completion_interruptible_timeout to
allow the thread to wait for specific timeout period and
bail out instead waiting infinitely.
Fixes: a899d324863a ("slimbus: qcom-ngd-ctrl: add Sub System Restart support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)linaro.org>
Signed-off-by: Viken Dadhaniya <quic_vdadhani(a)quicinc.com>
---
v1 -> v2:
- Remove macro and add value inline.
- add fix, cc and review tag.
---
---
drivers/slimbus/qcom-ngd-ctrl.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/slimbus/qcom-ngd-ctrl.c b/drivers/slimbus/qcom-ngd-ctrl.c
index efeba8275a66..a09a26bf4988 100644
--- a/drivers/slimbus/qcom-ngd-ctrl.c
+++ b/drivers/slimbus/qcom-ngd-ctrl.c
@@ -1451,7 +1451,11 @@ static void qcom_slim_ngd_up_worker(struct work_struct *work)
ctrl = container_of(work, struct qcom_slim_ngd_ctrl, ngd_up_work);
/* Make sure qmi service is up before continuing */
- wait_for_completion_interruptible(&ctrl->qmi_up);
+ if (!wait_for_completion_interruptible_timeout(&ctrl->qmi_up,
+ msecs_to_jiffies(MSEC_PER_SEC))) {
+ dev_err(ctrl->dev, "QMI wait timeout\n");
+ return;
+ }
mutex_lock(&ctrl->ssr_lock);
qcom_slim_ngd_enable(ctrl, true);
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Hi,
there has been a report of a failure in a 5.4 based kernel, which has
been fixed in kernel 5.10 with commit abee7c494d8c41bb388839bccc47e06247f0d7de.
Please apply the attached backported patch to the stable 5.4 kernel.
Juergen
1. About v3-0001-tracing-Remove-unnecessary-hist_data-destroy-in-d.patch:
The reason I write the changelog by myself is that no one found the bug
at that time, then later the code was removed on upstream, but
4.19-stable has the bug.
2. About v3-0002-tracing-Remove-unnecessary-var-destroy-in-onmax_d.patch
I also write the changelog by myself is that the upstream api is changed.
refs commits:
466f4528fbc6 ("tracing: Generalize hist trigger onmax and save action")
ff9d31d0d466 ("tracing: Remove unnecessary var_ref destroy in track_data_destroy()")
George Guo (2):
tracing: Remove unnecessary hist_data destroy in
destroy_synth_var_refs()
tracing: Remove unnecessary var destroy in onmax_destroy()
kernel/trace/trace_events_hist.c | 27 ++-------------------------
1 file changed, 2 insertions(+), 25 deletions(-)
--
2.34.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 04c35ab3bdae7fefbd7c7a7355f29fa03a035221
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040851-hamster-canary-7b07@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in COW mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 3 Apr 2024 23:21:30 +0200
Subject: [PATCH] x86/mm/pat: fix VM_PAT handling in COW mappings
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Wupeng Ma <mawupeng1(a)huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0d72183b5dd0..36b603d0cdde 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -947,6 +947,38 @@ static void free_pfn_range(u64 paddr, unsigned long size)
memtype_free(paddr, paddr + size);
}
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+ pgprot_t *pgprot)
+{
+ unsigned long prot;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+ /*
+ * We need the starting PFN and cachemode used for track_pfn_remap()
+ * that covered the whole VMA. For most mappings, we can obtain that
+ * information from the page tables. For COW mappings, we might now
+ * suddenly have anon folios mapped and follow_phys() will fail.
+ *
+ * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+ * detect the PFN. If we need the cachemode as well, we're out of luck
+ * for now and have to fail fork().
+ */
+ if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+ if (pgprot)
+ *pgprot = __pgprot(prot);
+ return 0;
+ }
+ if (is_cow_mapping(vma->vm_flags)) {
+ if (pgprot)
+ return -EINVAL;
+ *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+ return 0;
+ }
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@@ -957,20 +989,13 @@ static void free_pfn_range(u64 paddr, unsigned long size)
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
- unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
- /*
- * reserve the whole chunk covered by vma. We need the
- * starting address and protection from pte.
- */
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
- }
- pgprot = __pgprot(prot);
+ /* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@@ -1045,7 +1070,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
- unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@@ -1053,11 +1077,8 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, NULL))
return;
- }
-
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
diff --git a/mm/memory.c b/mm/memory.c
index 904f70b99498..d2155ced45f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5973,6 +5973,10 @@ int follow_phys(struct vm_area_struct *vma,
goto out;
pte = ptep_get(ptep);
+ /* Never return PFNs of anon folios in COW mappings. */
+ if (vm_normal_folio(vma, address, pte))
+ goto unlock;
+
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
From 310227f42882c52356b523e2f4e11690eebcd2ab Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Tue, 13 Feb 2024 14:54:25 +0100
Subject: [PATCH] virtio: reenable config if freezing device failed
Currently, we don't reenable the config if freezing the device failed.
For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.
Let's fix this by re-enabling the config if freezing fails.
Fixes: 22b7050a024d ("virtio: defer config changed notifications")
Cc: <stable(a)kernel.org>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Message-Id: <20240213135425.795001-1-david(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
---
drivers/virtio/virtio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b3513..f513ee21b1c18 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -510,8 +510,10 @@ int virtio_device_freeze(struct virtio_device *dev)
if (drv && drv->freeze) {
ret = drv->freeze(dev);
- if (ret)
+ if (ret) {
+ virtio_config_enable(dev);
return ret;
+ }
}
if (dev->config->destroy_avq)
--
2.43.0
Dave,
>> Could you please try the patch below on top of v6.1.80?
> Works okay on top of v6.1.80:
>
> [ 30.952668] scsi 6:0:0:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3
> [ 31.072592] scsi target6:0:0: Beginning Domain Validation
> [ 31.139334] scsi 6:0:0:0: Power-on or device reset occurred
> [ 31.186227] scsi target6:0:0: Ending Domain Validation
> [ 31.240482] scsi target6:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
> [ 31.462587] ata5: SATA link down (SStatus 0 SControl 0)
> [ 31.618798] scsi 6:0:2:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3
> [ 31.732588] scsi target6:0:2: Beginning Domain Validation
> [ 31.799201] scsi 6:0:2:0: Power-on or device reset occurred
> [ 31.846724] scsi target6:0:2: Ending Domain Validation
> [ 31.900822] scsi target6:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
Great, thanks for testing!
Greg, please revert the following commits from linux-6.1.y:
b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties")
cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions")
and include the patch below instead.
Thank you!
--
Martin K. Petersen Oracle Linux Engineering
From 87441914d491c01b73b949663c101056a9d9b8c7 Mon Sep 17 00:00:00 2001
From: "Martin K. Petersen" <martin.petersen(a)oracle.com>
Date: Tue, 13 Feb 2024 09:33:06 -0500
Subject: [PATCH] scsi: sd: usb_storage: uas: Access media prior to querying
device properties
[ Upstream commit 321da3dc1f3c92a12e3c5da934090d2992a8814c ]
It has been observed that some USB/UAS devices return generic properties
hardcoded in firmware for mode pages for a period of time after a device
has been discovered. The reported properties are either garbage or they do
not accurately reflect the characteristics of the physical storage device
attached in the case of a bridge.
Prior to commit 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to
avoid calling revalidate twice") we would call revalidate several
times during device discovery. As a result, incorrect values would
eventually get replaced with ones accurately describing the attached
storage. When we did away with the redundant revalidate pass, several
cases were reported where devices reported nonsensical values or would
end up in write-protected state.
An initial attempt at addressing this issue involved introducing a
delayed second revalidate invocation. However, this approach still
left some devices reporting incorrect characteristics.
Tasos Sahanidis debugged the problem further and identified that
introducing a READ operation prior to MODE SENSE fixed the problem and that
it wasn't a timing issue. Issuing a READ appears to cause the devices to
update their state to reflect the actual properties of the storage
media. Device properties like vendor, model, and storage capacity appear to
be correctly reported from the get-go. It is unclear why these devices
defer populating the remaining characteristics.
Match the behavior of a well known commercial operating system and
trigger a READ operation prior to querying device characteristics to
force the device to populate the mode pages.
The additional READ is triggered by a flag set in the USB storage and
UAS drivers. We avoid issuing the READ for other transport classes
since some storage devices identify Linux through our particular
discovery command sequence.
Link: https://lore.kernel.org/r/20240213143306.2194237-1-martin.petersen@oracle.c…
Fixes: 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice")
Cc: stable(a)vger.kernel.org
Reported-by: Tasos Sahanidis <tasos(a)tasossah.com>
Reviewed-by: Ewan D. Milne <emilne(a)redhat.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Tested-by: Tasos Sahanidis <tasos(a)tasossah.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 31b5273f43a7..349b1455a2c6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3284,6 +3284,24 @@ static bool sd_validate_opt_xfer_size(struct scsi_disk *sdkp,
return true;
}
+static void sd_read_block_zero(struct scsi_disk *sdkp)
+{
+ unsigned int buf_len = sdkp->device->sector_size;
+ char *buffer, cmd[10] = { };
+
+ buffer = kmalloc(buf_len, GFP_KERNEL);
+ if (!buffer)
+ return;
+
+ cmd[0] = READ_10;
+ put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */
+ put_unaligned_be16(1, &cmd[7]); /* Transfer 1 logical block */
+
+ scsi_execute_req(sdkp->device, cmd, DMA_FROM_DEVICE, buffer, buf_len,
+ NULL, SD_TIMEOUT, sdkp->max_retries, NULL);
+ kfree(buffer);
+}
+
/**
* sd_revalidate_disk - called the first time a new disk is seen,
* performs disk spin up, read_capacity, etc.
@@ -3323,7 +3341,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
*/
if (sdkp->media_present) {
sd_read_capacity(sdkp, buffer);
-
+ /*
+ * Some USB/UAS devices return generic values for mode pages
+ * until the media has been accessed. Trigger a READ operation
+ * to force the device to populate mode pages.
+ */
+ if (sdp->read_before_ms)
+ sd_read_block_zero(sdkp);
/*
* set the default to rotational. All non-rotational devices
* support the block characteristics VPD page, which will
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index c54e9805da53..12cf9940e5b6 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -179,6 +179,13 @@ static int slave_configure(struct scsi_device *sdev)
*/
sdev->use_192_bytes_for_3f = 1;
+ /*
+ * Some devices report generic values until the media has been
+ * accessed. Force a READ(10) prior to querying device
+ * characteristics.
+ */
+ sdev->read_before_ms = 1;
+
/*
* Some devices don't like MODE SENSE with page=0x3f,
* which is the command used for checking if a device
diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index de3836412bf3..ed22053b3252 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -878,6 +878,13 @@ static int uas_slave_configure(struct scsi_device *sdev)
if (devinfo->flags & US_FL_CAPACITY_HEURISTICS)
sdev->guess_capacity = 1;
+ /*
+ * Some devices report generic values until the media has been
+ * accessed. Force a READ(10) prior to querying device
+ * characteristics.
+ */
+ sdev->read_before_ms = 1;
+
/*
* Some devices don't like MODE SENSE with page=0x3f,
* which is the command used for checking if a device
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d2751ed536df..1504d3137cc6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -204,6 +204,7 @@ struct scsi_device {
unsigned use_10_for_rw:1; /* first try 10-byte read / write */
unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
+ unsigned read_before_ms:1; /* perform a READ before MODE SENSE */
unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */
unsigned no_write_same:1; /* no WRITE SAME command */
unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */
please backport
e7d24c0aa8e678f41
gcc-plugins/stackleak: Avoid .head.text section
to stable kernels v5.15 and newer. This addresses the regression reported here:
https://lkml.kernel.org/r/dc118105-b97c-4e51-9a42-a918fa875967%40hardfalcon…
On v5.15, there is a dependency that needs to be backported first:
ae978009fc013e3166c9f523f8b17e41a3c0286e
gcc-plugins/stackleak: Ignore .noinstr.text and .entry.text
The particular issue that this patch fixes does not exist [yet] in
v6.1 and v5.15, but I am working on backports that would introduce it.
But even without those backports, this change is important as it
prevents input sections from being instrumented by stackleak that may
not tolerate this for other reasons too.
Thanks,
Ard.
Hi stable team, there is a report that the recent backport of
5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement
for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke
hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the
culprit does.
> With kernel 6.6.23 hibernating usually hangs here: the display stays
> on but the mouse pointer does not move and the keyboard does not work.
> But SysRq REISUB does reboot. Sometimes it seems to hibernate: the
> computer powers down and can be waked up and the previous display comes
> visible, but it is stuck there.
See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details.
Note, you have to use bugzilla to reach the reporter, as I sadly[1] can
not CCed them in mails like this.
Side note: there is a mainline report about problems due to
5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement
for unbound workqueues") as well, but it's about "nohz_full=0 prevents
kernel from booting":
https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that
separately to Tejun.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
#regzbot introduced: 5a70baec2294e8a7d0fcc4558741c23e752dad
#regzbot from: Petri Kaukasoina
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218658
#regzbot title: workqueue: hubernate usually hangs when going to sleep
#regzbot ignore-activity
Larry Finger <Larry.Finger(a)gmail.com> wrote:
> As discussed in the links below, the SDIO part of RTW8821CS fails to
> start correctly if such startup happens while the UART portion of
> the chip is initializing.
I checked with SDIO team internally, but they didn't meet this case, so we may
take this workaround.
SDIO team wonder if something other than BT cause this failure, and after
system boots everything will be well. Could you boot the system without WiFi/BT
drivers, but insmod drivers manually after booting?
> ---
> drivers/net/wireless/realtek/rtw88/sdio.c | 28 +++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c
> index 0cae5746f540..eec0ad85be72 100644
> --- a/drivers/net/wireless/realtek/rtw88/sdio.c
> +++ b/drivers/net/wireless/realtek/rtw88/sdio.c
> @@ -1325,6 +1325,34 @@ int rtw_sdio_probe(struct sdio_func *sdio_func,
[...]
> + mdelay(500);
Will it better to use sleep function?
As discussed in the links below, the SDIO part of RTW8821CS fails to
start correctly if such startup happens while the UART portion of
the chip is initializing. The logged results with such failure is
[ 10.230516] rtw_8821cs mmc3:0001:1: Start of rtw_sdio_probe
[ 10.306569] Bluetooth: HCI UART driver ver 2.3
[ 10.306717] Bluetooth: HCI UART protocol Three-wire (H5) registered
[ 10.307167] of_dma_request_slave_channel: dma-names property of node '/serial@fe650000' missing or empty
[ 10.307199] dw-apb-uart fe650000.serial: failed to request DMA
[ 10.543474] rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
[ 10.730744] rtw_8821cs mmc3:0001:1: sdio read32 failed (0x11080): -110
[ 10.730923] rtw_8821cs mmc3:0001:1: sdio write32 failed (0x11080): -110
Due to the above errors, wifi fails to work.
For those instances when wifi works, the following is logged:
[ 10.452861] Bluetooth: HCI UART protocol Three-wire (H5) registered
[ 10.453580] of_dma_request_slave_channel: dma-names property of node '/serial@fe650000' missing or empty
[ 10.453621] dw-apb-uart fe650000.serial: failed to request DMA
[ 10.455741] rtw_8821cs mmc3:0001:1: Start of rtw_sdio_probe
[ 10.639186] rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
In this case, SDIO wifi works correctly. The correct case is ensured by
adding an mdelay(500) statement before the call to rtw_core_init(). No
adverse effects are observed.
Link: https://1EHFQ.trk.elasticemail.com/tracking/click?d=1UfsVowwwMAM6kBoyumkHP3…
Link: https://1EHFQ.trk.elasticemail.com/tracking/click?d=XUEf4t8W9xt0czASPOeeDt8…
Fixes: 65371a3f14e7 ("wifi: rtw88: sdio: Add HCI implementation for SDIO based chipsets")
Signed-off-by: Larry Finger <Larry.Finger(a)gmail.com>
Cc: stable(a)vger.kernel.org # v6.4+
---
drivers/net/wireless/realtek/rtw88/sdio.c | 28 +++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c
index 0cae5746f540..eec0ad85be72 100644
--- a/drivers/net/wireless/realtek/rtw88/sdio.c
+++ b/drivers/net/wireless/realtek/rtw88/sdio.c
@@ -1325,6 +1325,34 @@ int rtw_sdio_probe(struct sdio_func *sdio_func,
rtwdev->hci.ops = &rtw_sdio_ops;
rtwdev->hci.type = RTW_HCI_TYPE_SDIO;
+ /* Insert a delay of 500 ms. Without the delay, the wifi part
+ * and the UART that controls Bluetooth interfere with one
+ * another resulting in the following being logged:
+ *
+ * Start of SDIO probe function.
+ * Bluetooth: HCI UART driver ver 2.3
+ * Bluetooth: HCI UART protocol Three-wire (H5) registered
+ * of_dma_request_slave_channel: dma-names property of node '/serial@fe650000'
+ * missing or empty
+ * dw-apb-uart fe650000.serial: failed to request DMA
+` * rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
+ * rtw_8821cs mmc3:0001:1: sdio read32 failed (0x11080): -110
+ *
+ * If the UART is finished initializing before the SDIO probe
+ * function startw, the following is logged:
+ *
+ * Bluetooth: HCI UART protocol Three-wire (H5) registered
+ * of_dma_request_slave_channel: dma-names property of node '/serial@fe650000'
+ * missing or empty
+ * dw-apb-uart fe650000.serial: failed to request DMA
+ * Start of SDIO probe function.
+ * rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
+ * Bluetooth: hci0: RTL: examining hci_ver=08 hci_rev=000c lmp_ver=08 lmp_subver=8821
+ * SDIO wifi works correctly.
+ *
+ * No adverse effects are observed from the delay.
+ */
+ mdelay(500);
ret = rtw_core_init(rtwdev);
if (ret)
goto err_release_hw;
--
2.44.0
https://1EHFQ.trk.elasticemail.com/tracking/unsubscribe?d=XjvOA0R6jwFES_UmJ…
On Wed, Apr 10, 2024 at 10:31 PM Cem Topcuoglu
<topcuoglu.c(a)northeastern.edu> wrote:
>
> Hi,
>
>
>
> We encountered a bug labelled “KASAN: slab-out-of-bounds Write in ops_init” while fuzzing kernel version 5.15.124 with Syzkaller (lines exist in 5.15.154 as well).
>
>
>
> In the net_namespace.c file, we have an if condition at line 89. Subsequently, Syzkaller encounters the bug at line 90.
>
>
>
> 89 if (old_ng->s.len > id) {
>
> 90 old_ng->ptr[id] = data;
>
> 91 return 0;
>
> 92 }
>
>
>
> Upon inspecting the net_generic struct, we noticed that this struct uses union which puts the array and the header (including the array length information) together.
>
> We suspect that with this union, modifying the ng->ptr[0] is essentially modifying ng->s.len, which might fail the check in 89. This might be the cause for Syzkaller detecting this slab-out-of-bound.
>
Look for MIN_PERNET_OPS_ID (this should be 3)
ng->ptr[0] , [1], [2] can not be overwritten.
Do you have a repro ?
Also please use the latest stable (5.15.154).
> Since we are CS PhD students and Linux hobbyists, we do not have a full understanding of what could lead to this. We would really appreciate if you guys can share some insights into this matter : )
>
>
>
> We attached the syzkaller’s bug report below.
>
>
>
> ==================================================================
>
> BUG: KASAN: slab-out-of-bounds in net_assign_generic
>
> usr/src/kernel/net/core/net_namespace.c:90 [inline]
>
> BUG: KASAN: slab-out-of-bounds in ops_init+0x44b/0x4d0
>
> usr/src/kernel/net/core/net_namespace.c:129
>
> Write of size 8 at addr ffff888043c62ae8 by task (coredump)/5424
>
> CPU: 1 PID: 5424 Comm: (coredump) Not tainted 5.15.124-yocto-standard #1
>
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
>
> Call Trace:
>
> <TASK>
>
> __dump_stack usr/src/kernel/lib/dump_stack.c:88 [inline]
>
> dump_stack_lvl+0x51/0x70 usr/src/kernel/lib/dump_stack.c:106
>
> print_address_description.constprop.0+0x24/0x140 usr/src/kernel/mm/kasan/report.c:248
>
> __kasan_report usr/src/kernel/mm/kasan/report.c:434 [inline]
>
> kasan_report.cold+0x7d/0x117 usr/src/kernel/mm/kasan/report.c:451
>
> __asan_report_store8_noabort+0x17/0x20 usr/src/kernel/mm/kasan/report_generic.c:314
>
> net_assign_generic usr/src/kernel/net/core/net_namespace.c:90 [inline]
>
> ops_init+0x44b/0x4d0 usr/src/kernel/net/core/net_namespace.c:129
>
> setup_net+0x40a/0x970 usr/src/kernel/net/core/net_namespace.c:329
>
> copy_net_ns+0x2ac/0x680 usr/src/kernel/net/core/net_namespace.c:473
>
> create_new_namespaces+0x390/0xa50 usr/src/kernel/kernel/nsproxy.c:110
>
> unshare_nsproxy_namespaces+0xb0/0x1d0 usr/src/kernel/kernel/nsproxy.c:226
>
> ksys_unshare+0x30c/0x850 usr/src/kernel/kernel/fork.c:3094
>
> __do_sys_unshare usr/src/kernel/kernel/fork.c:3168 [inline]
>
> __se_sys_unshare usr/src/kernel/kernel/fork.c:3166 [inline]
>
> __x64_sys_unshare+0x36/0x50 usr/src/kernel/kernel/fork.c:3166
>
> do_syscall_x64 usr/src/kernel/arch/x86/entry/common.c:50 [inline]
>
> do_syscall_64+0x40/0x90 usr/src/kernel/arch/x86/entry/common.c:80
>
> entry_SYSCALL_64_after_hwframe+0x61/0xcb
>
> RIP: 0033:0x7fbafce1b39b
>
> Code: 73 01 c3 48 8b 0d 85 2a 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00
>
> 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 2a 0e 00 f7
>
> d8 64 89 01 48
>
> RSP: 002b:00007ffddc8dfda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
>
> RAX: ffffffffffffffda RBX: 0000557e645dd018 RCX: 00007fbafce1b39b
>
> RDX: 0000000000000000 RSI: 00007ffddc8dfd10 RDI: 0000000040000000
>
> RBP: 00007ffddc8dfde0 R08: 0000000000000000 R09: 00007ffd00000067
>
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000fffffff5
>
> R13: 00007fbafd26ba60 R14: 0000000040000000 R15: 0000000000000000
>
> </TASK>
>
> Allocated by task 5424:
>
> kasan_save_stack+0x26/0x60 usr/src/kernel/mm/kasan/common.c:38
>
> kasan_set_track usr/src/kernel/mm/kasan/common.c:46 [inline]
>
> set_alloc_info usr/src/kernel/mm/kasan/common.c:434 [inline]
>
> ____kasan_kmalloc usr/src/kernel/mm/kasan/common.c:513 [inline]
>
> ____kasan_kmalloc usr/src/kernel/mm/kasan/common.c:472 [inline]
>
> __kasan_kmalloc+0xae/0xe0 usr/src/kernel/mm/kasan/common.c:522
>
> kasan_kmalloc usr/src/kernel/include/linux/kasan.h:264 [inline]
>
> __kmalloc+0x308/0x560 usr/src/kernel/mm/slub.c:4407
>
> kmalloc usr/src/kernel/include/linux/slab.h:596 [inline]
>
> kzalloc usr/src/kernel/include/linux/slab.h:721 [inline]
>
> net_alloc_generic+0x28/0x80 usr/src/kernel/net/core/net_namespace.c:74
>
> net_alloc usr/src/kernel/net/core/net_namespace.c:401 [inline]
>
> copy_net_ns+0xc3/0x680 usr/src/kernel/net/core/net_namespace.c:460
>
> create_new_namespaces+0x390/0xa50 usr/src/kernel/kernel/nsproxy.c:110
>
> unshare_nsproxy_namespaces+0xb0/0x1d0 usr/src/kernel/kernel/nsproxy.c:226
>
> ksys_unshare+0x30c/0x850 usr/src/kernel/kernel/fork.c:3094
>
> __do_sys_unshare usr/src/kernel/kernel/fork.c:3168 [inline]
>
> __se_sys_unshare usr/src/kernel/kernel/fork.c:3166 [inline]
>
> __x64_sys_unshare+0x36/0x50 usr/src/kernel/kernel/fork.c:3166
>
> do_syscall_x64 usr/src/kernel/arch/x86/entry/common.c:50 [inline]
>
> do_syscall_64+0x40/0x90 usr/src/kernel/arch/x86/entry/common.c:80
>
> entry_SYSCALL_64_after_hwframe+0x61/0xcb
>
> The buggy address belongs to the object at ffff888043c62a00
>
> which belongs to the cache kmalloc-256 of size 256
>
> The buggy address is located 232 bytes inside of
>
> 256-byte region [ffff888043c62a00, ffff888043c62b00)
>
> The buggy address belongs to the page:
>
> page:000000008dd0a6b6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x43c62
>
> head:000000008dd0a6b6 order:1 compound_mapcount:0
>
> flags: 0x4000000000010200(slab|head|zone=1)
>
> raw: 4000000000010200 ffffea0001108f00 0000000700000007 ffff888001041b40
>
> raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>
> ffff888043c62980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
> ffff888043c62a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> >ffff888043c62a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>
> ^
>
> ffff888043c62b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
> ffff888043c62b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
> ==================================================================
>
> kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
>
>
> Best
>
>
Hi,
We encountered a bug labelled “KASAN: slab-out-of-bounds Write in ops_init” while fuzzing kernel version 5.15.124 with Syzkaller (lines exist in 5.15.154 as well).
In the net_namespace.c file, we have an if condition at line 89. Subsequently, Syzkaller encounters the bug at line 90.
89 if (old_ng->s.len > id) {
90 old_ng->ptr[id] = data;
91 return 0;
92 }
Upon inspecting the net_generic struct, we noticed that this struct uses union which puts the array and the header (including the array length information) together.
We suspect that with this union, modifying the ng->ptr[0] is essentially modifying ng->s.len, which might fail the check in 89. This might be the cause for Syzkaller detecting this slab-out-of-bound.
Since we are CS PhD students and Linux hobbyists, we do not have a full understanding of what could lead to this. We would really appreciate if you guys can share some insights into this matter : )
We attached the syzkaller’s bug report below.
==================================================================
BUG: KASAN: slab-out-of-bounds in net_assign_generic
usr/src/kernel/net/core/net_namespace.c:90 [inline]
BUG: KASAN: slab-out-of-bounds in ops_init+0x44b/0x4d0
usr/src/kernel/net/core/net_namespace.c:129
Write of size 8 at addr ffff888043c62ae8 by task (coredump)/5424
CPU: 1 PID: 5424 Comm: (coredump) Not tainted 5.15.124-yocto-standard #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
<TASK>
__dump_stack usr/src/kernel/lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x51/0x70 usr/src/kernel/lib/dump_stack.c:106
print_address_description.constprop.0+0x24/0x140 usr/src/kernel/mm/kasan/report.c:248
__kasan_report usr/src/kernel/mm/kasan/report.c:434 [inline]
kasan_report.cold+0x7d/0x117 usr/src/kernel/mm/kasan/report.c:451
__asan_report_store8_noabort+0x17/0x20 usr/src/kernel/mm/kasan/report_generic.c:314
net_assign_generic usr/src/kernel/net/core/net_namespace.c:90 [inline]
ops_init+0x44b/0x4d0 usr/src/kernel/net/core/net_namespace.c:129
setup_net+0x40a/0x970 usr/src/kernel/net/core/net_namespace.c:329
copy_net_ns+0x2ac/0x680 usr/src/kernel/net/core/net_namespace.c:473
create_new_namespaces+0x390/0xa50 usr/src/kernel/kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xb0/0x1d0 usr/src/kernel/kernel/nsproxy.c:226
ksys_unshare+0x30c/0x850 usr/src/kernel/kernel/fork.c:3094
__do_sys_unshare usr/src/kernel/kernel/fork.c:3168 [inline]
__se_sys_unshare usr/src/kernel/kernel/fork.c:3166 [inline]
__x64_sys_unshare+0x36/0x50 usr/src/kernel/kernel/fork.c:3166
do_syscall_x64 usr/src/kernel/arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x40/0x90 usr/src/kernel/arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7fbafce1b39b
Code: 73 01 c3 48 8b 0d 85 2a 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00
00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 2a 0e 00 f7
d8 64 89 01 48
RSP: 002b:00007ffddc8dfda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000557e645dd018 RCX: 00007fbafce1b39b
RDX: 0000000000000000 RSI: 00007ffddc8dfd10 RDI: 0000000040000000
RBP: 00007ffddc8dfde0 R08: 0000000000000000 R09: 00007ffd00000067
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000fffffff5
R13: 00007fbafd26ba60 R14: 0000000040000000 R15: 0000000000000000
</TASK>
Allocated by task 5424:
kasan_save_stack+0x26/0x60 usr/src/kernel/mm/kasan/common.c:38
kasan_set_track usr/src/kernel/mm/kasan/common.c:46 [inline]
set_alloc_info usr/src/kernel/mm/kasan/common.c:434 [inline]
____kasan_kmalloc usr/src/kernel/mm/kasan/common.c:513 [inline]
____kasan_kmalloc usr/src/kernel/mm/kasan/common.c:472 [inline]
__kasan_kmalloc+0xae/0xe0 usr/src/kernel/mm/kasan/common.c:522
kasan_kmalloc usr/src/kernel/include/linux/kasan.h:264 [inline]
__kmalloc+0x308/0x560 usr/src/kernel/mm/slub.c:4407
kmalloc usr/src/kernel/include/linux/slab.h:596 [inline]
kzalloc usr/src/kernel/include/linux/slab.h:721 [inline]
net_alloc_generic+0x28/0x80 usr/src/kernel/net/core/net_namespace.c:74
net_alloc usr/src/kernel/net/core/net_namespace.c:401 [inline]
copy_net_ns+0xc3/0x680 usr/src/kernel/net/core/net_namespace.c:460
create_new_namespaces+0x390/0xa50 usr/src/kernel/kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xb0/0x1d0 usr/src/kernel/kernel/nsproxy.c:226
ksys_unshare+0x30c/0x850 usr/src/kernel/kernel/fork.c:3094
__do_sys_unshare usr/src/kernel/kernel/fork.c:3168 [inline]
__se_sys_unshare usr/src/kernel/kernel/fork.c:3166 [inline]
__x64_sys_unshare+0x36/0x50 usr/src/kernel/kernel/fork.c:3166
do_syscall_x64 usr/src/kernel/arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x40/0x90 usr/src/kernel/arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x61/0xcb
The buggy address belongs to the object at ffff888043c62a00
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 232 bytes inside of
256-byte region [ffff888043c62a00, ffff888043c62b00)
The buggy address belongs to the page:
page:000000008dd0a6b6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x43c62
head:000000008dd0a6b6 order:1 compound_mapcount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 ffffea0001108f00 0000000700000007 ffff888001041b40
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888043c62980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888043c62a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888043c62a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
^
ffff888043c62b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888043c62b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Best
The patch titled
Subject: fork: defer linking file vma until vma is fully initialized
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: fork: defer linking file vma until vma is fully initialized
Date: Wed, 10 Apr 2024 17:14:41 +0800
Thorvald reported a WARNING [1]. And the root cause is below race:
CPU 1 CPU 2
fork hugetlbfs_fallocate
dup_mmap hugetlbfs_punch_hole
i_mmap_lock_write(mapping);
vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
i_mmap_lock_write(mapping);
hugetlb_vmdelete_list
vma_interval_tree_foreach
hugetlb_vma_trylock_write -- Vma_lock is cleared.
tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
by deferring linking file vma until vma is fully initialized. Those vmas
should be initialized first before they can be used.
Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reported-by: Thorvald Natvig <thorvald(a)google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
--- a/kernel/fork.c~fork-defer-linking-file-vma-until-vma-is-fully-initialized
+++ a/kernel/fork.c
@@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(str
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
+ /*
+ * Copy/update hugetlb private vma information.
+ */
+ if (is_vm_hugetlb_page(tmp))
+ hugetlb_dup_vma_private(tmp);
+
+ /*
+ * Link the vma into the MT. After using __mt_dup(), memory
+ * allocation is not necessary here, so it cannot fail.
+ */
+ vma_iter_bulk_store(&vmi, tmp);
+
+ mm->map_count++;
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
+
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
@@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(str
i_mmap_unlock_write(mapping);
}
- /*
- * Copy/update hugetlb private vma information.
- */
- if (is_vm_hugetlb_page(tmp))
- hugetlb_dup_vma_private(tmp);
-
- /*
- * Link the vma into the MT. After using __mt_dup(), memory
- * allocation is not necessary here, so it cannot fail.
- */
- vma_iter_bulk_store(&vmi, tmp);
-
- mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
retval = copy_page_range(tmp, mpnt);
- if (tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
if (retval) {
mpnt = vma_next(&vmi);
goto loop_out;
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled.patch
fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch
Hi Greg, Sasha, Thadeu,
Today there was mentioning of
https://www.jmpeax.dev/The-tale-of-a-GSM-Kernel-LPE.html
a LPE from the n_gsm module. I do realize, Thadeu mentioned the
possible attack surface already back in
https://lore.kernel.org/all/ZMuRoDbMcQrsCs3m@quatroqueijos.cascardo.eti.br/…
Published exploits are referenced as well through the potential
initial finder in https://github.com/YuriiCrimson/ExploitGSM .
While 67c37756898a ("tty: n_gsm: require CAP_NET_ADMIN to attach
N_GSM0710 ldisc") is not the fix itself, it helps mitigating against
this issue.
Thus can you consider applying this still to the stable series as
needed? I think it should go at least back to 5.15.y but if
Iunderstood Thadeu correctly then even further back to the still
supported stable branches.
What do you think?
Regards,
Salvatore