The patch titled
Subject: nilfs2: fix kernel bug due to missing clearing of checked flag
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-checked-flag.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix kernel bug due to missing clearing of checked flag
Date: Fri, 18 Oct 2024 04:33:10 +0900
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/page.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/nilfs2/page.c~nilfs2-fix-kernel-bug-due-to-missing-clearing-of-checked-flag
+++ a/fs/nilfs2/page.c
@@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct foli
folio_clear_uptodate(folio);
folio_clear_mappedtodisk(folio);
+ folio_clear_checked(folio);
head = folio_buffers(folio);
if (head) {
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag.patch
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-checked-flag.patch
If some remap_pfn_range() calls succeeded before one failed, we still have
buffer pages mapped into the userspace page tables when we drop the buffer
reference with comedi_buf_map_put(bm). The userspace mappings are only
cleaned up later in the mmap error path.
Fix it by explicitly flushing all mappings in our VMA on the error path.
See commit 79a61cc3fc04 ("mm: avoid leaving partial pfn mappings around in
error case").
Cc: stable(a)vger.kernel.org
Fixes: ed9eccbe8970 ("Staging: add comedi core")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Note: compile-tested only; I don't actually have comedi hardware, and I
don't know anything about comedi.
---
Changes in v2:
- only do the zapping in the pfnmap path (Ian Abbott)
- use zap_vma_ptes() instead of zap_page_range_single() (Ian Abbott)
- Link to v1: https://lore.kernel.org/r/20241014-comedi-tlb-v1-1-4b699144b438@google.com
---
drivers/comedi/comedi_fops.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/comedi/comedi_fops.c b/drivers/comedi/comedi_fops.c
index 1b481731df96..68e5301e6281 100644
--- a/drivers/comedi/comedi_fops.c
+++ b/drivers/comedi/comedi_fops.c
@@ -2407,6 +2407,16 @@ static int comedi_mmap(struct file *file, struct vm_area_struct *vma)
start += PAGE_SIZE;
}
+
+ /*
+ * Leaving behind a partial mapping of a buffer we're about to
+ * drop is unsafe, see remap_pfn_range_notrack().
+ * We need to zap the range here ourselves instead of relying
+ * on the automatic zapping in remap_pfn_range() because we call
+ * remap_pfn_range() in a loop.
+ */
+ if (retval)
+ zap_vma_ptes(vma, vma->vm_start, size);
}
if (retval == 0) {
---
base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
change-id: 20241014-comedi-tlb-400246505961
--
Jann Horn <jannh(a)google.com>
From: Mikulas Patocka <mpatocka(a)redhat.com>
commit 0a9bab391e336489169b95cb0d4553d921302189 upstream.
Tasklets have an inherent problem with memory corruption. The function
tasklet_action_common calls tasklet_trylock, then it calls the tasklet
callback and then it calls tasklet_unlock. If the tasklet callback frees
the structure that contains the tasklet or if it calls some code that may
free it, tasklet_unlock will write into free memory.
The commits 8e14f610159d and d9a02e016aaf try to fix it for dm-crypt, but
it is not a sufficient fix and the data corruption can still happen [1].
There is no fix for dm-verity and dm-verity will write into free memory
with every tasklet-processed bio.
There will be atomic workqueues implemented in the kernel 6.9 [2]. They
will have better interface and they will not suffer from the memory
corruption problem.
But we need something that stops the memory corruption now and that can be
backported to the stable kernels. So, I'm proposing this commit that
disables tasklets in both dm-crypt and dm-verity. This commit doesn't
remove the tasklet support, because the tasklet code will be reused when
atomic workqueues will be implemented.
[1] https://lore.kernel.org/all/d390d7ee-f142-44d3-822a-87949e14608b@suse.de/T/
[2] https://lore.kernel.org/lkml/20240130091300.2968534-1-tj@kernel.org/
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 39d42fa96ba1b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Fixes: 5721d4e5a9cdb ("dm verity: Add optional "try_verify_in_tasklet" feature")
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
(cherry picked from commit 30884a44e0cedc3dfda8c22432f3ba4078ec2d94)
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi(a)oracle.com>
---
drivers/md/dm-crypt.c | 37 ++-----------------------------------
1 file changed, 2 insertions(+), 35 deletions(-)
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 9889035c343e3..95b3b69a5e3c4 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -69,10 +69,8 @@ struct dm_crypt_io {
struct bio *base_bio;
u8 *integrity_metadata;
bool integrity_metadata_from_pool:1;
- bool in_tasklet:1;
struct work_struct work;
- struct tasklet_struct tasklet;
struct convert_context ctx;
@@ -1769,7 +1767,6 @@ static void crypt_io_init(struct dm_crypt_io *io, struct crypt_config *cc,
io->ctx.r.req = NULL;
io->integrity_metadata = NULL;
io->integrity_metadata_from_pool = false;
- io->in_tasklet = false;
atomic_set(&io->io_pending, 0);
}
@@ -1778,12 +1775,6 @@ static void crypt_inc_pending(struct dm_crypt_io *io)
atomic_inc(&io->io_pending);
}
-static void kcryptd_io_bio_endio(struct work_struct *work)
-{
- struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
- bio_endio(io->base_bio);
-}
-
/*
* One of the bios was finished. Check for completion of
* the whole request and correctly clean up the buffer.
@@ -1807,20 +1798,6 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
base_bio->bi_status = error;
- /*
- * If we are running this function from our tasklet,
- * we can't call bio_endio() here, because it will call
- * clone_endio() from dm.c, which in turn will
- * free the current struct dm_crypt_io structure with
- * our tasklet. In this case we need to delay bio_endio()
- * execution to after the tasklet is done and dequeued.
- */
- if (io->in_tasklet) {
- INIT_WORK(&io->work, kcryptd_io_bio_endio);
- queue_work(cc->io_queue, &io->work);
- return;
- }
-
bio_endio(base_bio);
}
@@ -2264,11 +2241,6 @@ static void kcryptd_crypt(struct work_struct *work)
kcryptd_crypt_write_convert(io);
}
-static void kcryptd_crypt_tasklet(unsigned long work)
-{
- kcryptd_crypt((struct work_struct *)work);
-}
-
static void kcryptd_queue_crypt(struct dm_crypt_io *io)
{
struct crypt_config *cc = io->cc;
@@ -2280,15 +2252,10 @@ static void kcryptd_queue_crypt(struct dm_crypt_io *io)
* irqs_disabled(): the kernel may run some IO completion from the idle thread, but
* it is being executed with irqs disabled.
*/
- if (in_hardirq() || irqs_disabled()) {
- io->in_tasklet = true;
- tasklet_init(&io->tasklet, kcryptd_crypt_tasklet, (unsigned long)&io->work);
- tasklet_schedule(&io->tasklet);
+ if (!(in_hardirq() || irqs_disabled())) {
+ kcryptd_crypt(&io->work);
return;
}
-
- kcryptd_crypt(&io->work);
- return;
}
INIT_WORK(&io->work, kcryptd_crypt);
--
2.46.0
If there is an error during some initialization related to firmware,
the buffers dp->tx_ring[i].tx_status are released.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
WARNING: CPU: 0 PID: 2098 at mm/slub.c:4689 free_large_kmalloc+0x4d/0x80
Call Trace:
free_large_kmalloc
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set the buffers to NULL after releasing in
order to avoid the double free.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v4:
- Send with cover letter to get reference with 1/2
v3: https://lore.kernel.org/linux-wireless/20241017074854.176765-1-jtornosm@red…
v2: https://lore.kernel.org/linux-wireless/20241016123722.206899-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-3-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 789d430e4455..15061782a2df 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1277,8 +1277,10 @@ void ath12k_dp_free(struct ath12k_base *ab)
ath12k_dp_rx_reo_cmd_list_cleanup(ab);
- for (i = 0; i < ab->hw_params->max_tx_ring; i++)
+ for (i = 0; i < ab->hw_params->max_tx_ring; i++) {
kfree(dp->tx_ring[i].tx_status);
+ dp->tx_ring[i].tx_status = NULL;
+ }
ath12k_dp_rx_free(ab);
/* Deinit any SOC level resource */
--
2.47.0
If there is an error during some initialization related to firmware,
the function ath12k_dp_cc_cleanup is called to release resources.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
BUG: kernel NULL pointer dereference, address: 0000000000000020
at RIP: 0010:ath12k_dp_cc_cleanup.part.0+0xb6/0x500 [ath12k]
Call Trace:
ath12k_dp_cc_cleanup
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set to NULL the released structure in
ath12k_dp_cc_cleanup at the end.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v4:
- Send with cover letter to get reference with 2/2
v3: https://lore.kernel.org/linux-wireless/20241017074654.176678-1-jtornosm@red…
v2: https://lore.kernel.org/linux-wireless/20241016123452.206671-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-2-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 61aa78d8bd8c..789d430e4455 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1241,6 +1241,7 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab)
}
kfree(dp->spt_info);
+ dp->spt_info = NULL;
}
static void ath12k_dp_reoq_lut_cleanup(struct ath12k_base *ab)
--
2.46.2
Hi, Conor,
> Hi, Conor
>
> Thanks for your patch.
>
> > From: Conor Dooley <conor.dooley(a)microchip.com>
> >
> > Aurelien reported probe failures due to the csi node being enabled
> > without having a camera attached to it. A camera was in the initial
> > submissions, but was removed from the dts, as it had not actually been
> > present on the board, but was from an addon board used by the developer
> of the relevant drivers.
> > The non-camera pipeline nodes were not disabled when this happened and
> > the probe failures are problematic for Debian. Disable them.
> >
> > CC: stable(a)vger.kernel.org
> > Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera
> > subsystem
> > nodes")
>
> Here you write it in 13 characters, should be "Fixes: 28ecaaa5af19 ..."
>
After fixing this:
Reviewed-by: Changhuang Liang <changhuang.liang(a)starfivetech.com>
It seems it is necessary to set WILC MAC address after operation mode,
otherwise the MAC address of the WILC MAC is reset back to what is in
nvmem. This causes a failure to associate with AP after the WILC MAC
address was overridden by userspace.
Test case:
"
ap$ cat << EOF > hostap.conf
interface=wlan0
ssid=ssid
hw_mode=g
channel=6
wpa=2
wpa_passphrase=pass
wpa_key_mgmt=WPA-PSK
EOF
ap$ hostapd -d hostap.conf
ap$ ifconfig wlan0 10.0.0.1
"
"
sta$ ifconfig wlan0 hw ether 00:11:22:33:44:55
sta$ wpa_supplicant -i wlan0 -c <(wpa_passphrase ssid pass)
sta$ ifconfig wlan0 10.0.0.2
sta$ ping 10.0.0.1 # fails without this patch
"
AP still indicates SA with original MAC address from nvmem without this patch:
"
nl80211: RX frame da=ff:ff:ff:ff:ff:ff sa=60:01:23:45:67:89 bssid=ff:ff:ff:ff:ff:ff ...
^^^^^^^^^^^^^^^^^
"
Fixes: 83d9b54ee5d4 ("wifi: wilc1000: read MAC address from fuse at probe")
Tested-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
---
Cc: Ajay Singh <ajay.kathat(a)microchip.com>
Cc: Alexis Lothoré <alexis.lothore(a)bootlin.com>
Cc: Claudiu Beznea <claudiu.beznea(a)tuxon.dev>
Cc: Kalle Valo <kvalo(a)kernel.org>
Cc: Marek Vasut <marex(a)denx.de>
Cc: linux-wireless(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
V2: Add TB, Fixes, CC Stable
---
drivers/net/wireless/microchip/wilc1000/netdev.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c b/drivers/net/wireless/microchip/wilc1000/netdev.c
index 9ecf3fb29b558..8bc127c5a538c 100644
--- a/drivers/net/wireless/microchip/wilc1000/netdev.c
+++ b/drivers/net/wireless/microchip/wilc1000/netdev.c
@@ -608,6 +608,9 @@ static int wilc_mac_open(struct net_device *ndev)
return ret;
}
+ wilc_set_operation_mode(vif, wilc_get_vif_idx(vif), vif->iftype,
+ vif->idx);
+
netdev_dbg(ndev, "Mac address: %pM\n", ndev->dev_addr);
ret = wilc_set_mac_address(vif, ndev->dev_addr);
if (ret) {
@@ -618,9 +621,6 @@ static int wilc_mac_open(struct net_device *ndev)
return ret;
}
- wilc_set_operation_mode(vif, wilc_get_vif_idx(vif), vif->iftype,
- vif->idx);
-
mgmt_regs.interface_stypes = vif->mgmt_reg_stypes;
/* so we detect a change */
vif->mgmt_reg_stypes = 0;
--
2.45.2
From: Conor Dooley <conor.dooley(a)microchip.com>
Aurelien reported probe failures due to the csi node being enabled
without having a camera attached to it. A camera was in the initial
submissions, but was removed from the dts, as it had not actually been
present on the board, but was from an addon board used by the
developer of the relevant drivers. The non-camera pipeline nodes were
not disabled when this happened and the probe failures are problematic
for Debian. Disable them.
CC: stable(a)vger.kernel.org
Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera subsystem nodes")
Closes: https://lore.kernel.org/all/Zw1-vcN4CoVkfLjU@aurel32.net/
Reported-by: Aurelien Jarno <aurelien(a)aurel32.net>
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
---
CC: Emil Renner Berthing <kernel(a)esmil.dk>
CC: Rob Herring <robh(a)kernel.org>
CC: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
CC: Conor Dooley <conor+dt(a)kernel.org>
CC: Changhuang Liang <changhuang.liang(a)starfivetech.com>
CC: devicetree(a)vger.kernel.org
CC: linux-riscv(a)lists.infradead.org
CC: linux-kernel(a)vger.kernel.org
---
arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
index c7771b3b64758..d6c55f1cc96a9 100644
--- a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
+++ b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
@@ -128,7 +128,6 @@ &camss {
assigned-clocks = <&ispcrg JH7110_ISPCLK_DOM4_APB_FUNC>,
<&ispcrg JH7110_ISPCLK_MIPI_RX0_PXL>;
assigned-clock-rates = <49500000>, <198000000>;
- status = "okay";
ports {
#address-cells = <1>;
@@ -151,7 +150,6 @@ camss_from_csi2rx: endpoint {
&csi2rx {
assigned-clocks = <&ispcrg JH7110_ISPCLK_VIN_SYS>;
assigned-clock-rates = <297000000>;
- status = "okay";
ports {
#address-cells = <1>;
--
2.45.2
The IMAGE_DLLCHARACTERISTICS_NX_COMPAT informs the firmware that the
EFI binary does not rely on pages that are both executable and
writable.
The flag is used by some distro versions of GRUB to decide if the EFI
binary may be executed.
As the Linux kernel neither has RWX sections nor needs RWX pages for
relocation we should set the flag.
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt(a)canonical.com>
---
arch/riscv/kernel/efi-header.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/efi-header.S b/arch/riscv/kernel/efi-header.S
index 515b2dfbca75..c5f17c2710b5 100644
--- a/arch/riscv/kernel/efi-header.S
+++ b/arch/riscv/kernel/efi-header.S
@@ -64,7 +64,7 @@ extra_header_fields:
.long efi_header_end - _start // SizeOfHeaders
.long 0 // CheckSum
.short IMAGE_SUBSYSTEM_EFI_APPLICATION // Subsystem
- .short 0 // DllCharacteristics
+ .short IMAGE_DLL_CHARACTERISTICS_NX_COMPAT // DllCharacteristics
.quad 0 // SizeOfStackReserve
.quad 0 // SizeOfStackCommit
.quad 0 // SizeOfHeapReserve
--
2.45.2
The existing code moves VF to the same namespace as the synthetic NIC
during netvsc_register_vf(). But, if the synthetic device is moved to a
new namespace after the VF registration, the VF won't be moved together.
To make the behavior more consistent, add a namespace check for synthetic
NIC's NETDEV_REGISTER event (generated during its move), and move the VF
if it is not in the same namespace.
Cc: stable(a)vger.kernel.org
Fixes: c0a41b887ce6 ("hv_netvsc: move VF to same namespace as netvsc device")
Suggested-by: Stephen Hemminger <stephen(a)networkplumber.org>
Signed-off-by: Haiyang Zhang <haiyangz(a)microsoft.com>
---
v2: Move my fix to synthetic NIC's NETDEV_REGISTER event as suggested by Stephen.
---
drivers/net/hyperv/netvsc_drv.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 153b97f8ec0d..54e98356ee93 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2798,6 +2798,30 @@ static struct hv_driver netvsc_drv = {
},
};
+/* Set VF's namespace same as the synthetic NIC */
+static void netvsc_event_set_vf_ns(struct net_device *ndev)
+{
+ struct net_device_context *ndev_ctx = netdev_priv(ndev);
+ struct net_device *vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
+ int ret;
+
+ if (!vf_netdev)
+ return;
+
+ if (!net_eq(dev_net(ndev), dev_net(vf_netdev))) {
+ ret = dev_change_net_namespace(vf_netdev, dev_net(ndev),
+ "eth%d");
+ if (ret)
+ netdev_err(vf_netdev,
+ "Cannot move to same namespace as %s: %d\n",
+ ndev->name, ret);
+ else
+ netdev_info(vf_netdev,
+ "Moved VF to namespace with: %s\n",
+ ndev->name);
+ }
+}
+
/*
* On Hyper-V, every VF interface is matched with a corresponding
* synthetic interface. The synthetic interface is presented first
@@ -2810,6 +2834,11 @@ static int netvsc_netdev_event(struct notifier_block *this,
struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
int ret = 0;
+ if (event_dev->netdev_ops == &device_ops && event == NETDEV_REGISTER) {
+ netvsc_event_set_vf_ns(event_dev);
+ return NOTIFY_DONE;
+ }
+
ret = check_dev_is_matching_vf(event_dev);
if (ret != 0)
return NOTIFY_DONE;
--
2.34.1
The quilt patch titled
Subject: maple_tree: correct tree corruption on spanning store
has been removed from the -mm tree. Its filename was
maple_tree-correct-tree-corruption-on-spanning-store.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: maple_tree: correct tree corruption on spanning store
Date: Mon, 7 Oct 2024 16:28:32 +0100
Patch series "maple_tree: correct tree corruption on spanning store", v3.
There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.
This bug seems far more likely to happen since commit f8d112a4e657
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.
We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.
The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the right-mode node.
This results in mas_wr_spanning_store() mitakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.
The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.
The fix patch commit message goes into great detail as to how this occurs.
This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.
Bert has kindly tested the fix and confirmed it resolved his issues. Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.
This patch (of 2):
There has been a subtle bug present in the maple tree implementation from
its inception.
This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.
A range may always ultimately span two leaf nodes. In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.
This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().
In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.
When a spanning store is required, this function returns false stopping at
the parent node which contains the target range, and mas_wr_store_type()
marks the mas->store_type as wr_spanning_store to denote this fact.
When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.
We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node which contains enough slots to contain two
leaf node's worth of data.
We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().
After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e.
r_mas.offset <= r_mas.end).
Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.
Consider the following tree:
Height
0 Root Node
/ \
pivot = 0xffff / \ pivot = ULONG_MAX
/ \
1 A [-----] ...
/ \
pivot = 0x4fff / \ pivot = 0xffff
/ \
2 (LEAVES) B [-----] [-----] C
^--- Last pivot 0xffff.
Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):
1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
determines that this is a spanning store across nodes B and C. The mas
state is set such that the current node from which we traverse further
is node A.
2. In mas_wr_spanning_store() we try to find elements to the right of pivot
0xffff by searching for an index of 0x10000:
- mas_wr_walk_index() invokes mas_wr_walk_descend() and
mas_wr_node_walk() in turn.
- mas_wr_node_walk() loops over entries in node A until EITHER it
finds an entry whose pivot equals or exceeds 0x10000 OR it
reaches the final entry.
- Since no entry has a pivot equal to or exceeding 0x10000, pivot
0xffff is selected, leading to node C.
- mas_wr_walk_traverse() resets the mas state to traverse node C. We
loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
in turn once again.
- Again, we reach the last entry in node C, which has a pivot of
0xffff.
3. We then copy the elements to the left of 0x4000 in node B to the big
node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
too.
4. We determine whether we have any entries to copy from the right of the
end of the range via - and with r_mas set up at the entry at pivot
0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at
pivot 0xffff.
5. BUG! The maple tree is corrupted with a duplicate entry.
This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node.
spanning store across two leaf nodes with a range that ends at that shared
pivot.
A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.
Instead, this patch detects if the right hand node is populated, i.e. has
anything we need to copy.
We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().
The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made the probability of this event
much more likely, which is the point at which reports started to be
submitted concerning this bug.
The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.
After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.
Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.17283144…
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Bert Karwatzki <spasswolf(a)web.de>
Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf(a)web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7…
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/lib/maple_tree.c~maple_tree-correct-tree-corruption-on-spanning-store
+++ a/lib/maple_tree.c
@@ -2196,6 +2196,8 @@ static inline void mas_node_or_none(stru
/*
* mas_wr_node_walk() - Find the correct offset for the index in the @mas.
+ * If @mas->index cannot be found within the containing
+ * node, we traverse to the last entry in the node.
* @wr_mas: The maple write state
*
* Uses mas_slot_locked() and does not need to worry about dead nodes.
@@ -3532,7 +3534,7 @@ static bool mas_wr_walk(struct ma_wr_sta
return true;
}
-static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
+static void mas_wr_walk_index(struct ma_wr_state *wr_mas)
{
struct ma_state *mas = wr_mas->mas;
@@ -3541,11 +3543,9 @@ static bool mas_wr_walk_index(struct ma_
wr_mas->content = mas_slot_locked(mas, wr_mas->slots,
mas->offset);
if (ma_is_leaf(wr_mas->type))
- return true;
+ return;
mas_wr_walk_traverse(wr_mas);
-
}
- return true;
}
/*
* mas_extend_spanning_null() - Extend a store of a %NULL to include surrounding %NULLs.
@@ -3765,8 +3765,8 @@ static noinline void mas_wr_spanning_sto
memset(&b_node, 0, sizeof(struct maple_big_node));
/* Copy l_mas and store the value in b_node. */
mas_store_b_node(&l_wr_mas, &b_node, l_mas.end);
- /* Copy r_mas into b_node. */
- if (r_mas.offset <= r_mas.end)
+ /* Copy r_mas into b_node if there is anything to copy. */
+ if (r_mas.max > r_mas.last)
mas_mab_cp(&r_mas, r_mas.offset, r_mas.end,
&b_node, b_node.b_end + 1);
else
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch
mm-refactor-mm_access-to-not-return-null.patch
mm-refactor-mm_access-to-not-return-null-fix.patch
mm-madvise-unrestrict-process_madvise-for-current-process.patch
maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch
misc_minor_alloc was allocating id using ida for minor only in case of
MISC_DYNAMIC_MINOR but misc_minor_free was always freeing ids
using ida_free causing a mismatch and following warn:
> > WARNING: CPU: 0 PID: 159 at lib/idr.c:525 ida_free+0x3e0/0x41f
> > ida_free called for id=127 which is not allocated.
> > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
...
> > [<60941eb4>] ida_free+0x3e0/0x41f
> > [<605ac993>] misc_minor_free+0x3e/0xbc
> > [<605acb82>] misc_deregister+0x171/0x1b3
misc_minor_alloc is changed to allocate id from ida for all minors
falling in the range of dynamic/ misc dynamic minors
Fixes: ab760791c0cf ("char: misc: Increase the maximum number of dynamic misc devices to 1048448")
Signed-off-by: Vimal Agrawal <avimalin(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
v2: Added Fixes:
added missed case for static minor in misc_minor_alloc
v3: Removed kunit changes as that will be added as second patch in this two patch series
v4: Updated Signed-off-by: to match from:
drivers/char/misc.c | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index 541edc26ec89..2cf595d2e10b 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -63,16 +63,30 @@ static DEFINE_MUTEX(misc_mtx);
#define DYNAMIC_MINORS 128 /* like dynamic majors */
static DEFINE_IDA(misc_minors_ida);
-static int misc_minor_alloc(void)
+static int misc_minor_alloc(int minor)
{
- int ret;
-
- ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
- if (ret >= 0) {
- ret = DYNAMIC_MINORS - ret - 1;
+ int ret = 0;
+
+ if (minor == MISC_DYNAMIC_MINOR) {
+ /* allocate free id */
+ ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
+ if (ret >= 0) {
+ ret = DYNAMIC_MINORS - ret - 1;
+ } else {
+ ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
+ MINORMASK, GFP_KERNEL);
+ }
} else {
- ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
- MINORMASK, GFP_KERNEL);
+ /* specific minor, check if it is in dynamic or misc dynamic range */
+ if (minor < DYNAMIC_MINORS) {
+ minor = DYNAMIC_MINORS - minor - 1;
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else if (minor > MISC_DYNAMIC_MINOR) {
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else {
+ /* case of non-dynamic minors, no need to allocate id */
+ ret = 0;
+ }
}
return ret;
}
@@ -219,7 +233,7 @@ int misc_register(struct miscdevice *misc)
mutex_lock(&misc_mtx);
if (is_dynamic) {
- int i = misc_minor_alloc();
+ int i = misc_minor_alloc(misc->minor);
if (i < 0) {
err = -EBUSY;
@@ -228,6 +242,7 @@ int misc_register(struct miscdevice *misc)
misc->minor = i;
} else {
struct miscdevice *c;
+ int i;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == misc->minor) {
@@ -235,6 +250,12 @@ int misc_register(struct miscdevice *misc)
goto out;
}
}
+
+ i = misc_minor_alloc(misc->minor);
+ if (i < 0) {
+ err = -EBUSY;
+ goto out;
+ }
}
dev = MKDEV(MISC_MAJOR, misc->minor);
--
2.17.1
misc_minor_alloc was allocating id using ida for minor only in case of
MISC_DYNAMIC_MINOR but misc_minor_free was always freeing ids
using ida_free causing a mismatch and following warn:
> > WARNING: CPU: 0 PID: 159 at lib/idr.c:525 ida_free+0x3e0/0x41f
> > ida_free called for id=127 which is not allocated.
> > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
...
> > [<60941eb4>] ida_free+0x3e0/0x41f
> > [<605ac993>] misc_minor_free+0x3e/0xbc
> > [<605acb82>] misc_deregister+0x171/0x1b3
misc_minor_alloc is changed to allocate id from ida for all minors
falling in the range of dynamic/ misc dynamic minors
Fixes: ab760791c0cf ("char: misc: Increase the maximum number of dynamic misc devices to 1048448")
Signed-off-by: Vimal Agrawal <vimal.agrawal(a)sophos.com>
Cc: stable(a)vger.kernel.org
---
v2: Added Fixes:
added missed case for static minor in misc_minor_alloc
v3: removed kunit changes as that will be added as second patch in this two patch series
drivers/char/misc.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index 541edc26ec89..9d0cd3459b4f 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -63,16 +63,30 @@ static DEFINE_MUTEX(misc_mtx);
#define DYNAMIC_MINORS 128 /* like dynamic majors */
static DEFINE_IDA(misc_minors_ida);
-static int misc_minor_alloc(void)
+static int misc_minor_alloc(int minor)
{
int ret = 0;
- ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
- if (ret >= 0) {
- ret = DYNAMIC_MINORS - ret - 1;
- } else {
- ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
+ if (minor == MISC_DYNAMIC_MINOR) {
+ /* allocate free id */
+ ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
+ if (ret >= 0) {
+ ret = DYNAMIC_MINORS - ret - 1;
+ } else {
+ ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
MINORMASK, GFP_KERNEL);
+ }
+ } else {
+ /* specific minor, check if it is in dynamic or misc dynamic range */
+ if (minor < DYNAMIC_MINORS) {
+ minor = DYNAMIC_MINORS - minor - 1;
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else if (minor > MISC_DYNAMIC_MINOR) {
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else {
+ /* case of non-dynamic minors, no need to allocate id */
+ ret = 0;
+ }
}
return ret;
}
@@ -219,7 +233,7 @@ int misc_register(struct miscdevice *misc)
mutex_lock(&misc_mtx);
if (is_dynamic) {
- int i = misc_minor_alloc();
+ int i = misc_minor_alloc(misc->minor);
if (i < 0) {
err = -EBUSY;
@@ -228,6 +242,7 @@ int misc_register(struct miscdevice *misc)
misc->minor = i;
} else {
struct miscdevice *c;
+ int i;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == misc->minor) {
@@ -235,6 +250,12 @@ int misc_register(struct miscdevice *misc)
goto out;
}
}
+
+ i = misc_minor_alloc(misc->minor);
+ if (i < 0) {
+ err = -EBUSY;
+ goto out;
+ }
}
dev = MKDEV(MISC_MAJOR, misc->minor);
--
2.17.1
misc_minor_alloc was allocating id using ida for minor only in case of
MISC_DYNAMIC_MINOR but misc_minor_free was always freeing ids
using ida_free causing a mismatch and following warn:
> > WARNING: CPU: 0 PID: 159 at lib/idr.c:525 ida_free+0x3e0/0x41f
> > ida_free called for id=127 which is not allocated.
> > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
...
> > [<60941eb4>] ida_free+0x3e0/0x41f
> > [<605ac993>] misc_minor_free+0x3e/0xbc
> > [<605acb82>] misc_deregister+0x171/0x1b3
misc_minor_alloc is changed to allocate id from ida for all minors
falling in the range of dynamic/ misc dynamic minors
Fixes: ab760791c0cf ("char: misc: Increase the maximum number of dynamic misc devices to 1048448")
Signed-off-by: Vimal Agrawal <vimal.agrawal(a)sophos.com>
Cc: stable(a)vger.kernel.org
---
v2: Added Fixes:
added missed case for static minor in misc_minor_alloc
v3: removed kunit changes as that will be added as second patch in this two patch series
drivers/char/misc.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index 541edc26ec89..9d0cd3459b4f 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -63,16 +63,30 @@ static DEFINE_MUTEX(misc_mtx);
#define DYNAMIC_MINORS 128 /* like dynamic majors */
static DEFINE_IDA(misc_minors_ida);
-static int misc_minor_alloc(void)
+static int misc_minor_alloc(int minor)
{
int ret;
- ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
- if (ret >= 0) {
- ret = DYNAMIC_MINORS - ret - 1;
- } else {
- ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
+ if (minor == MISC_DYNAMIC_MINOR) {
+ /* allocate free id */
+ ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
+ if (ret >= 0) {
+ ret = DYNAMIC_MINORS - ret - 1;
+ } else {
+ ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
MINORMASK, GFP_KERNEL);
+ }
+ } else {
+ /* specific minor, check if it is in dynamic or misc dynamic range */
+ if (minor < DYNAMIC_MINORS) {
+ minor = DYNAMIC_MINORS - minor - 1;
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else if (minor > MISC_DYNAMIC_MINOR) {
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else {
+ /* case of non-dynamic minors, no need to allocate id */
+ ret = 0;
+ }
}
return ret;
}
@@ -219,7 +233,7 @@ int misc_register(struct miscdevice *misc)
mutex_lock(&misc_mtx);
if (is_dynamic) {
- int i = misc_minor_alloc();
+ int i = misc_minor_alloc(misc->minor);
if (i < 0) {
err = -EBUSY;
@@ -228,6 +242,7 @@ int misc_register(struct miscdevice *misc)
misc->minor = i;
} else {
struct miscdevice *c;
+ int i;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == misc->minor) {
@@ -235,6 +250,12 @@ int misc_register(struct miscdevice *misc)
goto out;
}
}
+
+ i = misc_minor_alloc(misc->minor);
+ if (i < 0) {
+ err = -EBUSY;
+ goto out;
+ }
}
dev = MKDEV(MISC_MAJOR, misc->minor);
--
2.17.1
If there is an error during some initialization related to firmware,
the buffers dp->tx_ring[i].tx_status are released.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
WARNING: CPU: 0 PID: 2098 at mm/slub.c:4689 free_large_kmalloc+0x4d/0x80
Call Trace:
free_large_kmalloc
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set the buffers to NULL after releasing in
order to avoid the double free.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v3:
- Remove unnecessary check to not free if buffer is NULL.
- Trim backtrace.
- Fix typos.
v2: https://lore.kernel.org/linux-wireless/20241016123722.206899-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-3-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 789d430e4455..15061782a2df 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1277,8 +1277,10 @@ void ath12k_dp_free(struct ath12k_base *ab)
ath12k_dp_rx_reo_cmd_list_cleanup(ab);
- for (i = 0; i < ab->hw_params->max_tx_ring; i++)
+ for (i = 0; i < ab->hw_params->max_tx_ring; i++) {
kfree(dp->tx_ring[i].tx_status);
+ dp->tx_ring[i].tx_status = NULL;
+ }
ath12k_dp_rx_free(ab);
/* Deinit any SOC level resource */
--
2.47.0
If there is an error during some initialization related to firmware,
the function ath12k_dp_cc_cleanup is called to release resources.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
BUG: kernel NULL pointer dereference, address: 0000000000000020
at RIP: 0010:ath12k_dp_cc_cleanup.part.0+0xb6/0x500 [ath12k]
Call Trace:
ath12k_dp_cc_cleanup
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set to NULL the released structure in
ath12k_dp_cc_cleanup at the end.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v3:
- Trim backtrace.
- Fix typos.
v2: https://lore.kernel.org/linux-wireless/20241016123452.206671-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-2-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 61aa78d8bd8c..789d430e4455 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1241,6 +1241,7 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab)
}
kfree(dp->spt_info);
+ dp->spt_info = NULL;
}
static void ath12k_dp_reoq_lut_cleanup(struct ath12k_base *ab)
--
2.46.2
The quilt patch titled
Subject: mm/mglru: only clear kswapd_failures if reclaimable
has been removed from the -mm tree. Its filename was
mm-mglru-only-clear-kswapd_failures-if-reclaimable.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Wei Xu <weixugc(a)google.com>
Subject: mm/mglru: only clear kswapd_failures if reclaimable
Date: Mon, 14 Oct 2024 22:12:11 +0000
lru_gen_shrink_node() unconditionally clears kswapd_failures, which can
prevent kswapd from sleeping and cause 100% kswapd cpu usage even when
kswapd repeatedly fails to make progress in reclaim.
Only clear kswap_failures in lru_gen_shrink_node() if reclaim makes some
progress, similar to shrink_node().
I happened to run into this problem in one of my tests recently. It
requires a combination of several conditions: The allocator needs to
allocate a right amount of pages such that it can wake up kswapd
without itself being OOM killed; there is no memory for kswapd to
reclaim (My test disables swap and cleans page cache first); no other
process frees enough memory at the same time.
Link: https://lkml.kernel.org/r/20241014221211.832591-1-weixugc@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu <weixugc(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Jan Alexander Steffens <heftig(a)archlinux.org>
Cc: Suleiman Souhlal <suleiman(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/vmscan.c~mm-mglru-only-clear-kswapd_failures-if-reclaimable
+++ a/mm/vmscan.c
@@ -4963,8 +4963,8 @@ static void lru_gen_shrink_node(struct p
blk_finish_plug(&plug);
done:
- /* kswapd should never fail */
- pgdat->kswapd_failures = 0;
+ if (sc->nr_reclaimed > reclaimed)
+ pgdat->kswapd_failures = 0;
}
/******************************************************************************
_
Patches currently in -mm which might be from weixugc(a)google.com are
The quilt patch titled
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
has been removed from the -mm tree. Its filename was
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
Date: Tue, 15 Oct 2024 09:45:21 +0800
I got a bad pud error and lost a 1GB HugeTLB when calling swapoff. The
problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
2. Swapout the above anonymous memory.
3. run swapoff and we will get a bad pud error in kernel message:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
We can tell that pud_clear_bad is called by pud_none_or_clear_bad in
unuse_pud_range() by ftrace. And therefore the HugeTLB pages will never
be freed because we lost it from page table. We can skip HugeTLB pages
for unuse_vma to fix it.
Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swapfile-skip-hugetlb-pages-for-unuse_vma
+++ a/mm/swapfile.c
@@ -2313,7 +2313,7 @@ static int unuse_mm(struct mm_struct *mm
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
- if (vma->anon_vma) {
+ if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
ret = unuse_vma(vma, type);
if (ret)
break;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
The quilt patch titled
Subject: mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
has been removed from the -mm tree. Its filename was
mm-dont-install-pmd-mappings-when-thps-are-disabled-by-the-hw-process-vma.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
Date: Fri, 11 Oct 2024 12:24:45 +0200
We (or rather, readahead logic :) ) might be allocating a THP in the
pagecache and then try mapping it into a process that explicitly disabled
THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
starting the VM.
For example, starting a VM backed on a file system with large folios
supported makes the VM crash when the VM tries accessing such a mapping
using KVM.
Is it also a problem when the HW disabled THP using
TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow
PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
really wants. For now, fix it by essentially performing the same check as
would be done in __thp_vma_allowable_orders() or in shmem code, where this
works as expected, and disallow PMD mappings, making us fallback to PTE
mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Leo Fu <bfu(a)redhat.com>
Tested-by: Thomas Huth <thuth(a)redhat.com>
Cc: Thomas Huth <thuth(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/memory.c~mm-dont-install-pmd-mappings-when-thps-are-disabled-by-the-hw-process-vma
+++ a/mm/memory.c
@@ -4920,6 +4920,15 @@ vm_fault_t do_set_pmd(struct vm_fault *v
pmd_t entry;
vm_fault_t ret = VM_FAULT_FALLBACK;
+ /*
+ * It is too late to allocate a small folio, we already have a large
+ * folio in the pagecache: especially s390 KVM cannot tolerate any
+ * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+ * PMD mappings if THPs are disabled.
+ */
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+ return ret;
+
if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
return ret;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-pagewalk-fix-usage-of-pmd_leaf-pud_leaf-without-present-check.patch
selftests-mm-hugetlb_fault_after_madv-use-default-hguetlb-page-size.patch
selftests-mm-hugetlb_fault_after_madv-improve-test-output.patch
The quilt patch titled
Subject: mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
has been removed from the -mm tree. Its filename was
mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Subject: mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Date: Fri, 11 Oct 2024 12:24:44 +0200
Patch series "mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma".
During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the
problematic bit.
For s390 KVM this currently means that a VM backed by a file located on
filesystem with large folio support can crash when KVM tries accessing the
problematic page, because the readahead logic might decide to use a
PMD-sized THP and faulting it into the page tables will install a PMD
mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can
install a PMD mapping. khugepaged should already be taking care of not
collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
[david(a)redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Leo Fu <bfu(a)redhat.com>
Tested-by: Thomas Huth <thuth(a)redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Boqiao Fu <bfu(a)redhat.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/huge_mm.h | 18 ++++++++++++++++++
mm/huge_memory.c | 13 +------------
mm/shmem.c | 7 +------
3 files changed, 20 insertions(+), 18 deletions(-)
--- a/include/linux/huge_mm.h~mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw
+++ a/include/linux/huge_mm.h
@@ -322,6 +322,24 @@ struct thpsize {
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+ unsigned long vm_flags)
+{
+ /*
+ * Explicitly disabled through madvise or prctl, or some
+ * architectures may disable THP for some mappings, for
+ * example, s390 kvm.
+ */
+ return (vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+ /* If the hardware/firmware marked hugepage support disabled. */
+ return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
--- a/mm/huge_memory.c~mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw
+++ a/mm/huge_memory.c
@@ -109,18 +109,7 @@ unsigned long __thp_vma_allowable_orders
if (!vma->vm_mm) /* vdso */
return 0;
- /*
- * Explicitly disabled through madvise or prctl, or some
- * architectures may disable THP for some mappings, for
- * example, s390 kvm.
- * */
- if ((vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
- return 0;
- /*
- * If the hardware/firmware marked hugepage support disabled.
- */
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
return 0;
/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
--- a/mm/shmem.c~mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw
+++ a/mm/shmem.c
@@ -1664,12 +1664,7 @@ unsigned long shmem_allowable_huge_order
loff_t i_size;
int order;
- if (vma && ((vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
- return 0;
-
- /* If the hardware/firmware marked hugepage support disabled. */
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+ if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
return 0;
global_huge = shmem_huge_global_enabled(inode, index, write_end,
_
Patches currently in -mm which might be from wangkefeng.wang(a)huawei.com are
mm-remove-unused-hugepage-for-vma_alloc_folio.patch
The quilt patch titled
Subject: mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
has been removed from the -mm tree. Its filename was
mm-khugepaged-fix-the-arguments-order-in-khugepaged_collapse_file-trace-point.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yang Shi <yang(a)os.amperecomputing.com>
Subject: mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
Date: Fri, 11 Oct 2024 18:17:02 -0700
The "addr" and "is_shmem" arguments have different order in TP_PROTO and
TP_ARGS. This resulted in the incorrect trace result:
text-hugepage-644429 [276] 392092.878683: mm_khugepaged_collapse_file:
mm=0xffff20025d52c440, hpage_pfn=0x200678c00, index=512, addr=1, is_shmem=0,
filename=text-hugepage, nr=512, result=failed
The value of "addr" is wrong because it was treated as bool value, the
type of is_shmem.
Fix the order in TP_PROTO to keep "addr" is before "is_shmem" since the
original patch review suggested this order to achieve best packing.
And use "lx" for "addr" instead of "ld" in TP_printk because address is
typically shown in hex.
After the fix, the trace result looks correct:
text-hugepage-7291 [004] 128.627251: mm_khugepaged_collapse_file:
mm=0xffff0001328f9500, hpage_pfn=0x20016ea00, index=512, addr=0x400000,
is_shmem=0, filename=text-hugepage, nr=512, result=failed
Link: https://lkml.kernel.org/r/20241012011702.1084846-1-yang@os.amperecomputing.…
Fixes: 4c9473e87e75 ("mm/khugepaged: add tracepoint to collapse_file()")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Gautam Menghani <gautammenghani201(a)gmail.com>
Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org> [6.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/trace/events/huge_memory.h | 4 ++--
mm/khugepaged.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
--- a/include/trace/events/huge_memory.h~mm-khugepaged-fix-the-arguments-order-in-khugepaged_collapse_file-trace-point
+++ a/include/trace/events/huge_memory.h
@@ -208,7 +208,7 @@ TRACE_EVENT(mm_khugepaged_scan_file,
TRACE_EVENT(mm_khugepaged_collapse_file,
TP_PROTO(struct mm_struct *mm, struct folio *new_folio, pgoff_t index,
- bool is_shmem, unsigned long addr, struct file *file,
+ unsigned long addr, bool is_shmem, struct file *file,
int nr, int result),
TP_ARGS(mm, new_folio, index, addr, is_shmem, file, nr, result),
TP_STRUCT__entry(
@@ -233,7 +233,7 @@ TRACE_EVENT(mm_khugepaged_collapse_file,
__entry->result = result;
),
- TP_printk("mm=%p, hpage_pfn=0x%lx, index=%ld, addr=%ld, is_shmem=%d, filename=%s, nr=%d, result=%s",
+ TP_printk("mm=%p, hpage_pfn=0x%lx, index=%ld, addr=%lx, is_shmem=%d, filename=%s, nr=%d, result=%s",
__entry->mm,
__entry->hpfn,
__entry->index,
--- a/mm/khugepaged.c~mm-khugepaged-fix-the-arguments-order-in-khugepaged_collapse_file-trace-point
+++ a/mm/khugepaged.c
@@ -2227,7 +2227,7 @@ rollback:
folio_put(new_folio);
out:
VM_BUG_ON(!list_empty(&pagelist));
- trace_mm_khugepaged_collapse_file(mm, new_folio, index, is_shmem, addr, file, HPAGE_PMD_NR, result);
+ trace_mm_khugepaged_collapse_file(mm, new_folio, index, addr, is_shmem, file, HPAGE_PMD_NR, result);
return result;
}
_
Patches currently in -mm which might be from yang(a)os.amperecomputing.com are
The quilt patch titled
Subject: mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
has been removed from the -mm tree. Its filename was
mm-damon-fix-memory-leak-in-damon_sysfs_test_add_targets.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jinjie Ruan <ruanjinjie(a)huawei.com>
Subject: mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
Date: Thu, 10 Oct 2024 20:53:23 +0800
The sysfs_target->regions allocated in damon_sysfs_regions_alloc() is not
freed in damon_sysfs_test_add_targets(), which cause the following memory
leak, free it to fix it.
unreferenced object 0xffffff80c2a8db80 (size 96):
comm "kunit_try_catch", pid 187, jiffies 4294894363
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
[<0000000001e3714d>] kmemleak_alloc+0x34/0x40
[<000000008e6835c1>] __kmalloc_cache_noprof+0x26c/0x2f4
[<000000001286d9f8>] damon_sysfs_test_add_targets+0x1cc/0x738
[<0000000032ef8f77>] kunit_try_run_case+0x13c/0x3ac
[<00000000f3edea23>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<00000000adf936cf>] kthread+0x2e8/0x374
[<0000000041bb1628>] ret_from_fork+0x10/0x20
Link: https://lkml.kernel.org/r/20241010125323.3127187-1-ruanjinjie@huawei.com
Fixes: b8ee5575f763 ("mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()")
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/tests/sysfs-kunit.h | 1 +
1 file changed, 1 insertion(+)
--- a/mm/damon/tests/sysfs-kunit.h~mm-damon-fix-memory-leak-in-damon_sysfs_test_add_targets
+++ a/mm/damon/tests/sysfs-kunit.h
@@ -67,6 +67,7 @@ static void damon_sysfs_test_add_targets
damon_destroy_ctx(ctx);
kfree(sysfs_targets->targets_arr);
kfree(sysfs_targets);
+ kfree(sysfs_target->regions);
kfree(sysfs_target);
}
_
Patches currently in -mm which might be from ruanjinjie(a)huawei.com are
The quilt patch titled
Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
has been removed from the -mm tree. Its filename was
lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Florian Westphal <fw(a)strlen.de>
Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
Date: Mon, 7 Oct 2024 22:52:24 +0200
Ben Greear reports following splat:
------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
codetag_unload_module+0x19b/0x2a0
? codetag_load_module+0x80/0x80
nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.
Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.
Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.
[akpm(a)linux-foundation.org: add comment]
Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Reported-by: Ben Greear <greearb(a)candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candela…
Cc: Uladzislau Rezki <urezki(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/codetag.c | 3 +++
1 file changed, 3 insertions(+)
--- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls
+++ a/lib/codetag.c
@@ -228,6 +228,9 @@ bool codetag_unload_module(struct module
if (!mod)
return true;
+ /* await any module's kfree_rcu() operations to complete */
+ kvfree_rcu_barrier();
+
mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
struct codetag_module *found = NULL;
_
Patches currently in -mm which might be from fw(a)strlen.de are
The quilt patch titled
Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
has been removed from the -mm tree. Its filename was
mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
Date: Mon, 07 Oct 2024 23:42:04 +0200
In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.
At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet. For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.
The problem is: The rmap locks, which protect against concurrent page
table removal by retract_page_tables() in the THP code, are only taken
after the PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:
process A process B
========= =========
mremap
mremap_to
move_vma
move_page_tables
get_old_pmd
alloc_new_pmd
*** PREEMPT ***
madvise(MADV_COLLAPSE)
do_madvise
madvise_walk_vmas
madvise_vma_behavior
madvise_collapse
hpage_collapse_scan_file
collapse_file
retract_page_tables
i_mmap_lock_read(mapping)
pmdp_collapse_flush
i_mmap_unlock_read(mapping)
move_pgt_entry(NORMAL_PMD, ...)
take_rmap_locks
move_normal_pmd
drop_rmap_locks
When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.
Fix the race by letting process B recheck that the PMD still points to a
page table after the rmap locks have been taken. Otherwise, we bail and
let the caller fall back to the PTE-level copying path, which will then
bail immediately at the pmd_none() check.
Bug reachability: Reaching this bug requires that you can create
shmem/file THP mappings - anonymous THP uses different code that doesn't
zap stuff under rmap locks. File THP is gated on an experimental config
flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
shmem THP to hit this bug. As far as I know, getting shmem THP normally
requires that you can mount your own tmpfs with the right mount flags,
which would require creating your own user+mount namespace; though I don't
know if some distros maybe enable shmem THP by default or something like
that.
Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.
Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5…
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Closes: https://project-zero.issues.chromium.org/371047675
Acked-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
--- a/mm/mremap.c~mm-mremap-fix-move_normal_pmd-retract_page_tables-race
+++ a/mm/mremap.c
@@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_ar
{
spinlock_t *old_ptl, *new_ptl;
struct mm_struct *mm = vma->vm_mm;
+ bool res = false;
pmd_t pmd;
if (!arch_supports_page_table_move())
@@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_ar
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
- /* Clear the pmd */
pmd = *old_pmd;
+
+ /* Racing with collapse? */
+ if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
+ goto out_unlock;
+ /* Clear the pmd */
pmd_clear(old_pmd);
+ res = true;
VM_BUG_ON(!pmd_none(*new_pmd));
pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+out_unlock:
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
- return true;
+ return res;
}
#else
static inline bool move_normal_pmd(struct vm_area_struct *vma,
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-mark-mas-allocation-in-vms_abort_munmap_vmas-as-__gfp_nofail.patch
The quilt patch titled
Subject: selftests/mm: fix deadlock for fork after pthread_create on ARM
has been removed from the -mm tree. Its filename was
selftests-mm-fix-deadlock-for-fork-after-pthread_create-on-arm.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Edward Liaw <edliaw(a)google.com>
Subject: selftests/mm: fix deadlock for fork after pthread_create on ARM
Date: Thu, 3 Oct 2024 21:17:11 +0000
On Android with arm, there is some synchronization needed to avoid a
deadlock when forking after pthread_create.
Link: https://lkml.kernel.org/r/20241003211716.371786-3-edliaw@google.com
Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/uffd-unit-tests.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/tools/testing/selftests/mm/uffd-unit-tests.c~selftests-mm-fix-deadlock-for-fork-after-pthread_create-on-arm
+++ a/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -241,6 +241,9 @@ static void *fork_event_consumer(void *d
fork_event_args *args = data;
struct uffd_msg msg = { 0 };
+ /* Ready for parent thread to fork */
+ pthread_barrier_wait(&ready_for_fork);
+
/* Read until a full msg received */
while (uffd_read_msg(args->parent_uffd, &msg));
@@ -308,8 +311,12 @@ static int pagemap_test_fork(int uffd, b
/* Prepare a thread to resolve EVENT_FORK */
if (with_event) {
+ pthread_barrier_init(&ready_for_fork, NULL, 2);
if (pthread_create(&thread, NULL, fork_event_consumer, &args))
err("pthread_create()");
+ /* Wait for child thread to start before forking */
+ pthread_barrier_wait(&ready_for_fork);
+ pthread_barrier_destroy(&ready_for_fork);
}
child = fork();
_
Patches currently in -mm which might be from edliaw(a)google.com are
The quilt patch titled
Subject: selftests/mm: replace atomic_bool with pthread_barrier_t
has been removed from the -mm tree. Its filename was
selftests-mm-replace-atomic_bool-with-pthread_barrier_t.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Edward Liaw <edliaw(a)google.com>
Subject: selftests/mm: replace atomic_bool with pthread_barrier_t
Date: Thu, 3 Oct 2024 21:17:10 +0000
Patch series "selftests/mm: fix deadlock after pthread_create".
On Android arm, pthread_create followed by a fork caused a deadlock in the
case where the fork required work to be completed by the created thread.
Update the synchronization primitive to use pthread_barrier instead of
atomic_bool.
Apply the same fix to the wp-fork-with-event test.
This patch (of 2):
Swap synchronization primitive with pthread_barrier, so that stdatomic.h
does not need to be included.
The synchronization is needed on Android ARM64; we see a deadlock with
pthread_create when the parent thread races forward before the child has a
chance to start doing work.
Link: https://lkml.kernel.org/r/20241003211716.371786-1-edliaw@google.com
Link: https://lkml.kernel.org/r/20241003211716.371786-2-edliaw@google.com
Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/uffd-common.c | 5 +++--
tools/testing/selftests/mm/uffd-common.h | 3 +--
tools/testing/selftests/mm/uffd-unit-tests.c | 14 ++++++++------
3 files changed, 12 insertions(+), 10 deletions(-)
--- a/tools/testing/selftests/mm/uffd-common.c~selftests-mm-replace-atomic_bool-with-pthread_barrier_t
+++ a/tools/testing/selftests/mm/uffd-common.c
@@ -18,7 +18,7 @@ bool test_uffdio_wp = true;
unsigned long long *count_verify;
uffd_test_ops_t *uffd_test_ops;
uffd_test_case_ops_t *uffd_test_case_ops;
-atomic_bool ready_for_fork;
+pthread_barrier_t ready_for_fork;
static int uffd_mem_fd_create(off_t mem_size, bool hugetlb)
{
@@ -519,7 +519,8 @@ void *uffd_poll_thread(void *arg)
pollfd[1].fd = pipefd[cpu*2];
pollfd[1].events = POLLIN;
- ready_for_fork = true;
+ /* Ready for parent thread to fork */
+ pthread_barrier_wait(&ready_for_fork);
for (;;) {
ret = poll(pollfd, 2, -1);
--- a/tools/testing/selftests/mm/uffd-common.h~selftests-mm-replace-atomic_bool-with-pthread_barrier_t
+++ a/tools/testing/selftests/mm/uffd-common.h
@@ -33,7 +33,6 @@
#include <inttypes.h>
#include <stdint.h>
#include <sys/random.h>
-#include <stdatomic.h>
#include "../kselftest.h"
#include "vm_util.h"
@@ -105,7 +104,7 @@ extern bool map_shared;
extern bool test_uffdio_wp;
extern unsigned long long *count_verify;
extern volatile bool test_uffdio_copy_eexist;
-extern atomic_bool ready_for_fork;
+extern pthread_barrier_t ready_for_fork;
extern uffd_test_ops_t anon_uffd_test_ops;
extern uffd_test_ops_t shmem_uffd_test_ops;
--- a/tools/testing/selftests/mm/uffd-unit-tests.c~selftests-mm-replace-atomic_bool-with-pthread_barrier_t
+++ a/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -774,7 +774,7 @@ static void uffd_sigbus_test_common(bool
char c;
struct uffd_args args = { 0 };
- ready_for_fork = false;
+ pthread_barrier_init(&ready_for_fork, NULL, 2);
fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
@@ -791,8 +791,9 @@ static void uffd_sigbus_test_common(bool
if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &args))
err("uffd_poll_thread create");
- while (!ready_for_fork)
- ; /* Wait for the poll_thread to start executing before forking */
+ /* Wait for child thread to start before forking */
+ pthread_barrier_wait(&ready_for_fork);
+ pthread_barrier_destroy(&ready_for_fork);
pid = fork();
if (pid < 0)
@@ -833,7 +834,7 @@ static void uffd_events_test_common(bool
char c;
struct uffd_args args = { 0 };
- ready_for_fork = false;
+ pthread_barrier_init(&ready_for_fork, NULL, 2);
fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
if (uffd_register(uffd, area_dst, nr_pages * page_size,
@@ -844,8 +845,9 @@ static void uffd_events_test_common(bool
if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &args))
err("uffd_poll_thread create");
- while (!ready_for_fork)
- ; /* Wait for the poll_thread to start executing before forking */
+ /* Wait for child thread to start before forking */
+ pthread_barrier_wait(&ready_for_fork);
+ pthread_barrier_destroy(&ready_for_fork);
pid = fork();
if (pid < 0)
_
Patches currently in -mm which might be from edliaw(a)google.com are
The quilt patch titled
Subject: fat: fix uninitialized variable
has been removed from the -mm tree. Its filename was
fat-fix-uninitialized-variable.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Subject: fat: fix uninitialized variable
Date: Fri, 04 Oct 2024 15:03:49 +0900
syszbot produced this with a corrupted fs image. In theory, however an IO
error would trigger this also.
This affects just an error report, so should not be a serious error.
Link: https://lkml.kernel.org/r/87r08wjsnh.fsf@mail.parknet.co.jp
Link: https://lkml.kernel.org/r/66ff2c95.050a0220.49194.03e9.GAE@google.com
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Reported-by: syzbot+ef0d7bc412553291aa86(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fat/namei_vfat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fat/namei_vfat.c~fat-fix-uninitialized-variable
+++ a/fs/fat/namei_vfat.c
@@ -1037,7 +1037,7 @@ error_inode:
if (corrupt < 0) {
fat_fs_error(new_dir->i_sb,
"%s: Filesystem corrupted (i_pos %lld)",
- __func__, sinfo.i_pos);
+ __func__, new_i_pos);
}
goto out;
}
_
Patches currently in -mm which might be from hirofumi(a)mail.parknet.co.jp are
The LG Gram Pro 16 2-in-1 (2024) the 16T90SP has its keybopard IRQ (1)
described as ActiveLow in the DSDT, which the kernel overrides to EdgeHigh
which breaks the keyboard.
Add the 16T90SP to the irq1_level_low_skip_override[] quirk table to fix
this.
Reported-by: Dirk Holten <dirk.holten(a)gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219382
Cc: stable(a)vger.kernel.org
Suggested-by: Dirk Holten <dirk.holten(a)gmx.de>
Signed-off-by: Christian Heusel <christian(a)heusel.eu>
---
Note that I do not have the relevant hardware since I'm sending in this
quirk at the request of someone else.
Also does this change need a "Fixes: ..." tag?
---
drivers/acpi/resource.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 129bceb1f4a27df93439bcefdb27fd9c91258028..dd6249fb76c24f08db4149883be4548130d0ef1e 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -502,6 +502,11 @@ static const struct dmi_system_id irq1_level_low_skip_override[] = {
DMI_MATCH(DMI_SYS_VENDOR, "LG Electronics"),
DMI_MATCH(DMI_BOARD_NAME, "17U70P"),
},
+ /* LG Electronics 16T90SP */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LG Electronics"),
+ DMI_MATCH(DMI_BOARD_NAME, "16T90SP"),
+ },
},
{ }
};
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241016-lg-gram-pro-keyboard-9a9d8b9aa647
Best regards,
--
Christian Heusel <christian(a)heusel.eu>
There is a probability that the host machine will also restart
when the virtual machine is restarting.
Commit ad362fe07fec ("KVM: arm64: vgic-its: Avoid potential UAF
in LPI translation cache") released the reference count of an IRQ
when it shouldn't have. This led to a situation where, when the
system finally released the IRQ, it found that the structure had
already been freed, triggering a
'refcount_t: underflow; use-after-free' error.
In fact, the function "vgic_put_irq" should be called by
"vgic_its_inject_cached_translation" instead of
"vgic_its_trigger_msi".
Call trace:
its_free_ite+0x90/0xa0
vgic_its_free_device+0x3c/0xa0
vgic_its_destroy+0x4c/0xb8
kvm_put_kvm+0x214/0x358
kvm_vcpu_release+0x24/0x38
__fput+0x84/0x278
____fput+0x20/0x30
task_work_run+0xcc/0x190
do_exit+0x36c/0xa88
do_group_exit+0x4c/0xb8
__arm64_sys_exit_group+0x24/0x28
invoke_syscall+0x54/0x120
el0_svc_common.constprop.4+0x16c/0x1f0
do_el0_svc+0x34/0xb0
el0_svc+0x1c/0x28
el0_sync_handler+0x8c/0xb0
el0_sync+0x148/0x180
Fixes: ad362fe07fec ("KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wenyao Hai <haiwenyao(a)uniontech.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
---
arch/arm64/kvm/vgic/vgic-its.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index ba945ba78cc7..fb5f57cbab42 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -679,6 +679,7 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->pending_latch = true;
vgic_queue_irq_unlock(kvm, irq, flags);
+ vgic_put_irq(kvm, irq);
return 0;
}
@@ -697,7 +698,6 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi)
raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->pending_latch = true;
vgic_queue_irq_unlock(kvm, irq, flags);
- vgic_put_irq(kvm, irq);
return 0;
}
--
2.45.2
With these features are enabled, the EEVDF scheduler introduces a large
performance degradation, observed in multiple database tests on kernel
versions using EEVDF, across multiple architectures (x86, aarch64, amd64)
and CPU generations.
Disable the features to minimize default performance impact.
Cc: <stable(a)vger.kernel.org> # 6.6.x
Fixes: 86bfbb7ce4f6 ("sched/fair: Add lag based placement")
Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption")
Signed-off-by: Cristian Prundeanu <cpru(a)amazon.com>
---
kernel/sched/features.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index a3d331dd2d8f..8a5ca80665b3 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -4,7 +4,7 @@
* Using the avg_vruntime, do the right thing and preserve lag across
* sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled.
*/
-SCHED_FEAT(PLACE_LAG, true)
+SCHED_FEAT(PLACE_LAG, false)
/*
* Give new tasks half a slice to ease into the competition.
*/
@@ -17,7 +17,7 @@ SCHED_FEAT(PLACE_REL_DEADLINE, true)
* Inhibit (wakeup) preemption until the current task has either matched the
* 0-lag point or until is has exhausted it's slice.
*/
-SCHED_FEAT(RUN_TO_PARITY, true)
+SCHED_FEAT(RUN_TO_PARITY, false)
/*
* Allow wakeup of tasks with a shorter slice to cancel RUN_TO_PARITY for
* current.
--
2.40.1
Hi, Conor
Thanks for your patch.
> From: Conor Dooley <conor.dooley(a)microchip.com>
>
> Aurelien reported probe failures due to the csi node being enabled without
> having a camera attached to it. A camera was in the initial submissions, but
> was removed from the dts, as it had not actually been present on the board,
> but was from an addon board used by the developer of the relevant drivers.
> The non-camera pipeline nodes were not disabled when this happened and
> the probe failures are problematic for Debian. Disable them.
>
> CC: stable(a)vger.kernel.org
> Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera subsystem
> nodes")
Here you write it in 13 characters, should be "Fixes: 28ecaaa5af19 ..."
Best Regards
Changhuang.
> Closes: https://lore.kernel.org/all/Zw1-vcN4CoVkfLjU@aurel32.net/
> Reported-by: Aurelien Jarno <aurelien(a)aurel32.net>
> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
> ---
> CC: Emil Renner Berthing <kernel(a)esmil.dk>
> CC: Rob Herring <robh(a)kernel.org>
> CC: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
> CC: Conor Dooley <conor+dt(a)kernel.org>
> CC: Changhuang Liang <changhuang.liang(a)starfivetech.com>
> CC: devicetree(a)vger.kernel.org
> CC: linux-riscv(a)lists.infradead.org
> CC: linux-kernel(a)vger.kernel.org
> ---
> arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> index c7771b3b64758..d6c55f1cc96a9 100644
> --- a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> +++ b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> @@ -128,7 +128,6 @@ &camss {
> assigned-clocks = <&ispcrg JH7110_ISPCLK_DOM4_APB_FUNC>,
> <&ispcrg JH7110_ISPCLK_MIPI_RX0_PXL>;
> assigned-clock-rates = <49500000>, <198000000>;
> - status = "okay";
>
> ports {
> #address-cells = <1>;
> @@ -151,7 +150,6 @@ camss_from_csi2rx: endpoint { &csi2rx {
> assigned-clocks = <&ispcrg JH7110_ISPCLK_VIN_SYS>;
> assigned-clock-rates = <297000000>;
> - status = "okay";
>
> ports {
> #address-cells = <1>;
> --
> 2.45.2
The patch titled
Subject: ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Edward Adam Davis <eadavis(a)qq.com>
Subject: ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
Date: Wed, 16 Oct 2024 19:43:47 +0800
Syzbot reported a kernel BUG in ocfs2_truncate_inline. There are two
reasons for this: first, the parameter value passed is greater than
ocfs2_max_inline_data_with_xattr, second, the start and end parameters of
ocfs2_truncate_inline are "unsigned int".
So, we need to add a sanity check for byte_start and byte_len right before
ocfs2_truncate_inline() in ocfs2_remove_inode_range(), if they are greater
than ocfs2_max_inline_data_with_xattr return -EINVAL.
Link: https://lkml.kernel.org/r/tencent_D48DB5122ADDAEDDD11918CFB68D93258C07@qq.c…
Fixes: 1afc32b95233 ("ocfs2: Write support for inline data")
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Reported-by: syzbot+81092778aac03460d6b7(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=81092778aac03460d6b7
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/fs/ocfs2/file.c~ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow
+++ a/fs/ocfs2/file.c
@@ -1784,6 +1784,14 @@ int ocfs2_remove_inode_range(struct inod
return 0;
if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) {
+ int id_count = ocfs2_max_inline_data_with_xattr(inode->i_sb, di);
+
+ if (byte_start > id_count || byte_start + byte_len > id_count) {
+ ret = -EINVAL;
+ mlog_errno(ret);
+ goto out;
+ }
+
ret = ocfs2_truncate_inline(inode, di_bh, byte_start,
byte_start + byte_len, 0);
if (ret) {
_
Patches currently in -mm which might be from eadavis(a)qq.com are
ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow.patch
Hey,
Would you be interested in acquiring the attendees list of NEPCON NAGOYA 2024?
List contains: Names, Titles, Phone Numbers, Company Details, and more…
Interested? Let me know so that I’ll send you the pricing for the same.
Kind Regards,
Jane Wilkins
Marketing Executive
If you do not wish to receive our emails, please reply with "Not Interested."
Jeongjun Park <aha310510(a)gmail.com> wrote:
>
> I got the following KCSAN report during syzbot testing:
>
> ==================================================================
> BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current
>
> write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
> inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
> inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
> shmem_mknod+0x117/0x180 mm/shmem.c:3443
> shmem_create+0x34/0x40 mm/shmem.c:3497
> lookup_open fs/namei.c:3578 [inline]
> open_last_lookups fs/namei.c:3647 [inline]
> path_openat+0xdbc/0x1f00 fs/namei.c:3883
> do_filp_open+0xf7/0x200 fs/namei.c:3913
> do_sys_openat2+0xab/0x120 fs/open.c:1416
> do_sys_open fs/open.c:1431 [inline]
> __do_sys_openat fs/open.c:1447 [inline]
> __se_sys_openat fs/open.c:1442 [inline]
> __x64_sys_openat+0xf3/0x120 fs/open.c:1442
> x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0:
> inode_get_ctime_nsec include/linux/fs.h:1623 [inline]
> inode_get_ctime include/linux/fs.h:1629 [inline]
> generic_fillattr+0x1dd/0x2f0 fs/stat.c:62
> shmem_getattr+0x17b/0x200 mm/shmem.c:1157
> vfs_getattr_nosec fs/stat.c:166 [inline]
> vfs_getattr+0x19b/0x1e0 fs/stat.c:207
> vfs_statx_path fs/stat.c:251 [inline]
> vfs_statx+0x134/0x2f0 fs/stat.c:315
> vfs_fstatat+0xec/0x110 fs/stat.c:341
> __do_sys_newfstatat fs/stat.c:505 [inline]
> __se_sys_newfstatat+0x58/0x260 fs/stat.c:499
> __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
> x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> value changed: 0x2755ae53 -> 0x27ee44d3
>
> Since there is no special protection when shmem_getattr() calls
> generic_fillattr(), data-race occurs by functions such as shmem_unlink()
> or shmem_mknod(). This can cause unexpected results, so commenting it out
> is not enough.
>
> Therefore, when calling generic_fillattr() from shmem_getattr(), it is
> appropriate to protect the inode using inode_lock_shared() and
> inode_unlock_shared() to prevent data-race.
>
Cc: stable(a)vger.kernel.org
I think this patch should be applied from next rc version and also stable
version. When calling generic_fillattr(), if you don't hold read lock,
data-race will occur in inode member variables, which can cause unexpected
behavior. This problem is also present in several stable versions, so I think
it should be fixed as soon as possible.
Regards,
Jeongjun Park
> Reported-by: syzbot <syzkaller(a)googlegroups.com>
> Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat")
> Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
> ---
> mm/shmem.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5a77acf6ac6a..9beeb47c3743 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1154,7 +1154,9 @@ static int shmem_getattr(struct mnt_idmap *idmap,
> stat->attributes_mask |= (STATX_ATTR_APPEND |
> STATX_ATTR_IMMUTABLE |
> STATX_ATTR_NODUMP);
> + inode_lock_shared(inode);
> generic_fillattr(idmap, request_mask, inode, stat);
> + inode_unlock_shared(inode);
>
> if (shmem_is_huge(inode, 0, false, NULL, 0))
> stat->blksize = HPAGE_PMD_SIZE;
> --
The patch titled
Subject: mm/gup: stop leaking pinned pages in low memory conditions
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: mm/gup: stop leaking pinned pages in low memory conditions
Date: Wed, 16 Oct 2024 13:22:42 -0700
If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM) family
of functions, and requests "too many" pages, then the call will
erroneously leave pages pinned. This is visible in user space as an
actual memory leak.
Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls to
exhaust memory.
The root cause of the problem is this sequence, within
__gup_longterm_locked():
__get_user_pages_locked()
rc = check_and_migrate_movable_pages()
...which gets retried in a loop. The loop error handling is incomplete,
clearly due to a somewhat unusual and complicated tri-state error API.
But anyway, if -ENOMEM, or in fact, any unexpected error is returned from
check_and_migrate_movable_pages(), then __gup_longterm_locked() happily
returns the error, while leaving the pages pinned.
In the failed case, which is an app that requests (via a device driver)
30720000000 bytes to be pinned, and then exits, I see this:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 2048
And after applying this patch, it returns to balanced pins:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 7502048
Fix this by unpinning the pages that __get_user_pages_locked() has
pinned, in such error cases.
Link: https://lkml.kernel.org/r/20241016202242.456953-1-jhubbard@nvidia.com
Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Shigeru Yoshida <syoshida(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/mm/gup.c~mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions
+++ a/mm/gup.c
@@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct
/* FOLL_LONGTERM implies FOLL_PIN */
rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
+
+ /*
+ * The __get_user_pages_locked() call happens before we know
+ * that whether it's possible to successfully complete the whole
+ * operation. To compensate for this, if we get an unexpected
+ * error (such as -ENOMEM) then we must unpin everything, before
+ * erroring out.
+ */
+ if (rc != -EAGAIN && rc != 0)
+ unpin_user_pages(pages, nr_pinned_pages);
+
} while (rc == -EAGAIN);
memalloc_pin_restore(flags);
return rc ? rc : nr_pinned_pages;
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions.patch
kaslr-rename-physmem_end-and-physmem_end-to-direct_map_physmem_end.patch
The patch titled
Subject: x86/traps: move kmsan check after instrumentation_begin
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
x86-traps-move-kmsan-check-after-instrumentation_begin.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Subject: x86/traps: move kmsan check after instrumentation_begin
Date: Wed, 16 Oct 2024 20:24:07 +0500
During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following:
AR built-in.a
AR vmlinux.a
LD vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
kmsan_unpoison_entry_regs() leaves .noinstr.text section
OBJCOPY modules.builtin.modinfo
GEN modules.builtin
MODPOST Module.symvers
CC .vmlinux.export.o
Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.
There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but
it has the return condition and if we include it after
instrumentation_begin() it results the warning "return with
instrumentation enabled", hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.
Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com
Fixes: ba54d194f8da ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/traps.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/traps.c~x86-traps-move-kmsan-check-after-instrumentation_begin
+++ a/arch/x86/kernel/traps.c
@@ -261,12 +261,6 @@ static noinstr bool handle_bug(struct pt
int ud_type;
u32 imm;
- /*
- * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug()
- * is a rare case that uses @regs without passing them to
- * irqentry_enter().
- */
- kmsan_unpoison_entry_regs(regs);
ud_type = decode_bug(regs->ip, &imm);
if (ud_type == BUG_NONE)
return handled;
@@ -276,6 +270,12 @@ static noinstr bool handle_bug(struct pt
*/
instrumentation_begin();
/*
+ * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug()
+ * is a rare case that uses @regs without passing them to
+ * irqentry_enter().
+ */
+ kmsan_unpoison_entry_regs(regs);
+ /*
* Since we're emulating a CALL with exceptions, restore the interrupt
* state to what it was at the exception site.
*/
_
Patches currently in -mm which might be from snovitoll(a)gmail.com are
x86-traps-move-kmsan-check-after-instrumentation_begin.patch
mm-kasan-kmsan-copy_from-to_kernel_nofault.patch
-Wenum-enum-conversion and -Wenum-compare-conditional were strengthened
in clang-19 to warn in C mode, which caused the kernel to move them to
W=1 in commit 75b5ab134bb5 ("kbuild: Move
-Wenum-{compare-conditional,enum-conversion} into W=1") because there
were numerous instances of each that would break builds with -Werror.
Unfortunately, this is not a full solution, as more and more developers,
subsystems, and distributors are building with W=1 as well, so they
continue to see the numerous instances of these warnings.
Since the move to W=1, there have not been many new instances that have
appeared through various build reports and the ones that have appeared
seem to be following similar existing patterns, suggesting that most
instances of these warnings will not be real issues. The only
alternatives for silencing these warnings are adding casts (which is
generally seen as an ugly practice) or refactoring the enums to macro
defines or a unified enum (which may be undesirable because of type
safety in other parts of the code).
Disable the warnings altogether so that W=1 users do not see them.
Cc: stable(a)vger.kernel.org
Fixes: 75b5ab134bb5 ("kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1")
Link: https://lore.kernel.org/ZwRA9SOcOjjLJcpi@google.com/
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
I am open to discussion around whether or not to disable this entirely
but I think it needs to be out of W=1.
---
scripts/Makefile.extrawarn | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 1d13cecc7cc7808610e635ddc03476cf92b3a8c1..3f124c80bee5133899047c07bc05e3173ee4c7f0 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -31,6 +31,8 @@ KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
ifdef CONFIG_CC_IS_CLANG
# The kernel builds with '-std=gnu11' so use of GNU extensions is acceptable.
KBUILD_CFLAGS += -Wno-gnu
+KBUILD_CFLAGS += -Wno-enum-compare-conditional
+KBUILD_CFLAGS += -Wno-enum-enum-conversion
else
# gcc inanely warns about local variables called 'main'
@@ -129,8 +131,6 @@ endif
KBUILD_CFLAGS += $(call cc-disable-warning, pointer-to-enum-cast)
KBUILD_CFLAGS += -Wno-tautological-constant-out-of-range-compare
KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
-KBUILD_CFLAGS += -Wno-enum-compare-conditional
-KBUILD_CFLAGS += -Wno-enum-enum-conversion
endif
endif
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241016-disable-two-clang-enum-warnings-e7994d44f948
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Add missing Broadcast_AND region to the LLCC block for x1e80100,
as the LLCC version on this platform is 4.1 and it provides the region.
This also fixes the following error caused by the missing region:
[ 3.797768] qcom-llcc 25000000.system-cache-controller: error -EINVAL: invalid resource (null)
This error started showing up only after the new regmap region called
Broadcast_AND that has been added to the llcc-qcom driver.
Cc: stable(a)vger.kernel.org # 6.11: 055afc34fd21: soc: qcom: llcc: Add regmap for Broadcast_AND region
Fixes: af16b00578a7 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts")
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Changes in v2:
- fixed subject line to say x1e80100 instead of sm8450
- mentioned the reason why the new error is showing up
and how it is related to the llcc-qcom driver
- cc'ed stable with patch dependency for cherry-picking
- Link to v1: https://lore.kernel.org/r/20240718-x1e80100-dts-llcc-add-broadcastand_regio…
---
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index 0e6802c1d2d8375987c614ec69c440e2f38d25c6..fbf1acf8b0d84a2d2c723785242a65f47e63340b 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -6093,7 +6093,8 @@ system-cache-controller@25000000 {
<0 0x25a00000 0 0x200000>,
<0 0x25c00000 0 0x200000>,
<0 0x25e00000 0 0x200000>,
- <0 0x26000000 0 0x200000>;
+ <0 0x26000000 0 0x200000>,
+ <0 0x26200000 0 0x200000>;
reg-names = "llcc0_base",
"llcc1_base",
"llcc2_base",
@@ -6102,7 +6103,8 @@ system-cache-controller@25000000 {
"llcc5_base",
"llcc6_base",
"llcc7_base",
- "llcc_broadcast_base";
+ "llcc_broadcast_base",
+ "llcc_broadcast_and_base";
interrupts = <GIC_SPI 266 IRQ_TYPE_LEVEL_HIGH>;
};
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20240718-x1e80100-dts-llcc-add-broadcastand_region-797413ee2a8f
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
This problem reported by Clement LE GOFFIC manifest when
using CONFIG_KASAN_IN_VMALLOC and VMAP_STACK:
https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db…
After some analysis it seems we are missing to sync the
VMALLOC shadow memory in top level PGD to all CPUs.
Add some code to perform this sync, and the bug appears
to go away.
As suggested by Ard, also perform a dummy read from the
shadow memory of the new VMAP_STACK in the low level
assembly.
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
Linus Walleij (2):
ARM: ioremap: Flush PGDs for VMALLOC shadow
ARM: entry: Do a dummy read from VMAP shadow
arch/arm/kernel/entry-armv.S | 8 ++++++++
arch/arm/mm/ioremap.c | 7 +++++++
2 files changed, 15 insertions(+)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241015-arm-kasan-vmalloc-crash-fcbd51416457
Best regards,
--
Linus Walleij <linus.walleij(a)linaro.org>
When copying a namespace we won't have added the new copy into the
namespace rbtree until after the copy succeeded. Calling free_mnt_ns()
will try to remove the copy from the rbtree which is invalid. Simply
free the namespace skeleton directly.
Fixes: 1901c92497bd ("fs: keep an index of current mount namespaces")
Cc: stable(a)vger.kernel.org # v6.11+
Reported-by: Brad Spengler <spender(a)grsecurity.net>
Tested-by: Brad Spengler <spender(a)grsecurity.net>
Suggested-by: Brad Spengler <spender(a)grsecurity.net>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
In vfs.fixes unless I hear objections.
---
fs/namespace.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 93c377816d75..d26f5e6d2ca3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3944,7 +3944,9 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
new = copy_tree(old, old->mnt.mnt_root, copy_flags);
if (IS_ERR(new)) {
namespace_unlock();
- free_mnt_ns(new_ns);
+ ns_free_inum(&new_ns->ns);
+ dec_mnt_namespaces(new_ns->ucounts);
+ mnt_ns_release(new_ns);
return ERR_CAST(new);
}
if (user_ns != ns->user_ns) {
--
2.45.2
iowarrior_read() uses the iowarrior dev structure, but does not use any
lock on the structure. This can cause various bugs including data-races,
so it is more appropriate to use a mutex lock to safely protect the
iowarrior dev structure. When using a mutex lock, you should split the
branch to prevent blocking when the O_NONBLOCK flag is set.
In addition, it is unnecessary to check for NULL on the iowarrior dev
structure obtained by reading file->private_data. Therefore, it is
better to remove the check.
Cc: stable(a)vger.kernel.org
Fixes: 946b960d13c1 ("USB: add driver for iowarrior devices.")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
---
v1 -> v2: Added cc tag and change log
drivers/usb/misc/iowarrior.c | 46 ++++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 10 deletions(-)
diff --git a/drivers/usb/misc/iowarrior.c b/drivers/usb/misc/iowarrior.c
index 6d28467ce352..a513766b4985 100644
--- a/drivers/usb/misc/iowarrior.c
+++ b/drivers/usb/misc/iowarrior.c
@@ -277,28 +277,45 @@ static ssize_t iowarrior_read(struct file *file, char __user *buffer,
struct iowarrior *dev;
int read_idx;
int offset;
+ int retval;
dev = file->private_data;
+ if (file->f_flags & O_NONBLOCK) {
+ retval = mutex_trylock(&dev->mutex);
+ if (!retval)
+ return -EAGAIN;
+ } else {
+ retval = mutex_lock_interruptible(&dev->mutex);
+ if (retval)
+ return -ERESTARTSYS;
+ }
+
/* verify that the device wasn't unplugged */
- if (!dev || !dev->present)
- return -ENODEV;
+ if (!dev->present) {
+ retval = -ENODEV;
+ goto exit;
+ }
dev_dbg(&dev->interface->dev, "minor %d, count = %zd\n",
dev->minor, count);
/* read count must be packet size (+ time stamp) */
if ((count != dev->report_size)
- && (count != (dev->report_size + 1)))
- return -EINVAL;
+ && (count != (dev->report_size + 1))) {
+ retval = -EINVAL;
+ goto exit;
+ }
/* repeat until no buffer overrun in callback handler occur */
do {
atomic_set(&dev->overflow_flag, 0);
if ((read_idx = read_index(dev)) == -1) {
/* queue empty */
- if (file->f_flags & O_NONBLOCK)
- return -EAGAIN;
+ if (file->f_flags & O_NONBLOCK) {
+ retval = -EAGAIN;
+ goto exit;
+ }
else {
//next line will return when there is either new data, or the device is unplugged
int r = wait_event_interruptible(dev->read_wait,
@@ -309,28 +326,37 @@ static ssize_t iowarrior_read(struct file *file, char __user *buffer,
-1));
if (r) {
//we were interrupted by a signal
- return -ERESTART;
+ retval = -ERESTART;
+ goto exit;
}
if (!dev->present) {
//The device was unplugged
- return -ENODEV;
+ retval = -ENODEV;
+ goto exit;
}
if (read_idx == -1) {
// Can this happen ???
- return 0;
+ retval = 0;
+ goto exit;
}
}
}
offset = read_idx * (dev->report_size + 1);
if (copy_to_user(buffer, dev->read_queue + offset, count)) {
- return -EFAULT;
+ retval = -EFAULT;
+ goto exit;
}
} while (atomic_read(&dev->overflow_flag));
read_idx = ++read_idx == MAX_INTERRUPT_BUFFER ? 0 : read_idx;
atomic_set(&dev->read_idx, read_idx);
+ mutex_unlock(&dev->mutex);
return count;
+
+exit:
+ mutex_unlock(&dev->mutex);
+ return retval;
}
/*
--
On 11.10.24 10:48, Joel GUITTET wrote:
>
> I faced some issue related to scaling frequency on ZynqMP device
> using v5.15.167 kernel. As an exemple setting the scaling frequency
> below show it's not properly set:
>
> cat /sys/devices/system/cpu/cpufreq/policy0/
> scaling_available_frequencies 299999 399999 599999 1199999
>
> echo 399999 > /sys/devices/system/cpu/cpufreq/policy0/
> scaling_setspeed
>
> cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq 399999
>
> cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq 299999
> ====> Should be 399999
>
> After analysis of this issue with the help of Xilinx, it appears
> that a commit was backported on the 5.15.y branch, but probably it
> should not, or not as is. The commit is
> 9117fc44fd3a9538261e530ba5a022dfc9519620 modifying drivers/clk/
> zynqmp/divider.c.
FWIW, that is 1fe15be1fb6135 ("drivers: clk: zynqmp: update divider
round rate logic") [v6.8-rc1].
> Is anybody reading this message able to answer why it was
> backported ?
Looks like because it fixes a bug. I CCed the original author and those
that handled the patch, maybe they can help us out and tell us what's
the best strategy forward here.
> The information I have until now is that it is intended
> recent kernel version. Dependencies for this modification is
> currently under clarification with Xilinx (maybe another commit to
> backport).
>
> By the way, reverting this commit fix the issue shown above.
Does 6.12-rc work fine for you? Because if not, we should fix the
problem there.
Ciao, Thorsten
The NVMe regulator has been left enabled by the boot firmware. Mark it
as such to avoid disabling the regulator temporarily during boot.
Fixes: eb57cbe730d1 ("arm64: dts: qcom: x1e80100: Describe the PCIe 6a resources")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Abel Vesa <abel.vesa(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-qcp.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
index 1c3a6a7b3ed6..5ef030c60abe 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
@@ -253,6 +253,8 @@ vreg_nvme: regulator-nvme {
pinctrl-names = "default";
pinctrl-0 = <&nvme_reg_en>;
+
+ regulator-boot-on;
};
};
--
2.45.2
The NVMe regulator has been left enabled by the boot firmware. Mark it
as such to avoid disabling the regulator temporarily during boot.
Fixes: d0e2f8f62dff ("arm64: dts: qcom: Add device tree for ASUS Vivobook S 15")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Xilin Wu <wuxilin123(a)gmail.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts b/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts
index 20616bd4aa6c..fb4a48a1e2a8 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts
@@ -134,6 +134,8 @@ vreg_nvme: regulator-nvme {
pinctrl-0 = <&nvme_reg_en>;
pinctrl-names = "default";
+
+ regulator-boot-on;
};
};
--
2.45.2
The NVMe regulator has been left enabled by the boot firmware. Mark it
as such to avoid disabling the regulator temporarily during boot.
Fixes: eb57cbe730d1 ("arm64: dts: qcom: x1e80100: Describe the PCIe 6a resources")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Abel Vesa <abel.vesa(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-crd.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-crd.dts b/arch/arm64/boot/dts/qcom/x1e80100-crd.dts
index 10b28d870f08..5d2eec8590ce 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-crd.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-crd.dts
@@ -300,6 +300,8 @@ vreg_nvme: regulator-nvme {
pinctrl-names = "default";
pinctrl-0 = <&nvme_reg_en>;
+
+ regulator-boot-on;
};
vreg_wwan: regulator-wwan {
--
2.45.2
The environ pointer itself is never NULL, this is guaranteed by crt.h.
However if the environment is empty, environ will point to a NULL
pointer.
While this case will be checked by the loop later, this only happens
after the first loop iteration.
To avoid reading invalid memory inside the loop, fix the test that
checks for an empty environment.
Fixes: 077d0a392446 ("tools/nolibc/stdlib: add a simple getenv() implementation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
tools/include/nolibc/stdlib.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/stdlib.h b/tools/include/nolibc/stdlib.h
index 75aa273c23a6153db6a32facaea16457a522703b..c31967378cf1f699283d801487c1a91d17e4d1ce 100644
--- a/tools/include/nolibc/stdlib.h
+++ b/tools/include/nolibc/stdlib.h
@@ -90,7 +90,7 @@ char *getenv(const char *name)
{
int idx, i;
- if (environ) {
+ if (*environ) {
for (idx = 0; environ[idx]; idx++) {
for (i = 0; name[i] && name[i] == environ[idx][i];)
i++;
---
base-commit: 2f87d0916ce0d2925cedbc9e8f5d6291ba2ac7b2
change-id: 20241016-nolibc-getenv-f99f4d3df2cd
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Treat each completed full size write to /dev/ttyDBC0 as a separate usb
transfer. Make sure the size of the TRBs matches the size of the tty
write by first queuing as many max packet size TRBs as possible up to
the last TRB which will be cut short to match the size of the tty write.
This solves an issue where userspace writes several transfers back to
back via /dev/ttyDBC0 into a kfifo before dbgtty can find available
request to turn that kfifo data into TRBs on the transfer ring.
The boundary between transfer was lost as xhci-dbgtty then turned
everyting in the kfifo into as many 'max packet size' TRBs as possible.
DbC would then send more data to the host than intended for that
transfer, causing host to issue a babble error.
Refuse to write more data to kfifo until previous tty write data is
turned into properly sized TRBs with data size boundaries matching tty
write size
Tested-by: Uday M Bhat <uday.m.bhat(a)intel.com>
Tested-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-dbgcap.h | 1 +
drivers/usb/host/xhci-dbgtty.c | 55 ++++++++++++++++++++++++++++++----
2 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-dbgcap.h b/drivers/usb/host/xhci-dbgcap.h
index 8ec813b6e9fd..9dc8f4d8077c 100644
--- a/drivers/usb/host/xhci-dbgcap.h
+++ b/drivers/usb/host/xhci-dbgcap.h
@@ -110,6 +110,7 @@ struct dbc_port {
struct tasklet_struct push;
struct list_head write_pool;
+ unsigned int tx_boundary;
bool registered;
};
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index b8e78867e25a..d719c16ea30b 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -24,6 +24,29 @@ static inline struct dbc_port *dbc_to_port(struct xhci_dbc *dbc)
return dbc->priv;
}
+static unsigned int
+dbc_kfifo_to_req(struct dbc_port *port, char *packet)
+{
+ unsigned int len;
+
+ len = kfifo_len(&port->port.xmit_fifo);
+
+ if (len == 0)
+ return 0;
+
+ len = min(len, DBC_MAX_PACKET);
+
+ if (port->tx_boundary)
+ len = min(port->tx_boundary, len);
+
+ len = kfifo_out(&port->port.xmit_fifo, packet, len);
+
+ if (port->tx_boundary)
+ port->tx_boundary -= len;
+
+ return len;
+}
+
static int dbc_start_tx(struct dbc_port *port)
__releases(&port->port_lock)
__acquires(&port->port_lock)
@@ -36,7 +59,7 @@ static int dbc_start_tx(struct dbc_port *port)
while (!list_empty(pool)) {
req = list_entry(pool->next, struct dbc_request, list_pool);
- len = kfifo_out(&port->port.xmit_fifo, req->buf, DBC_MAX_PACKET);
+ len = dbc_kfifo_to_req(port, req->buf);
if (len == 0)
break;
do_tty_wake = true;
@@ -200,14 +223,32 @@ static ssize_t dbc_tty_write(struct tty_struct *tty, const u8 *buf,
{
struct dbc_port *port = tty->driver_data;
unsigned long flags;
+ unsigned int written = 0;
spin_lock_irqsave(&port->port_lock, flags);
- if (count)
- count = kfifo_in(&port->port.xmit_fifo, buf, count);
- dbc_start_tx(port);
+
+ /*
+ * Treat tty write as one usb transfer. Make sure the writes are turned
+ * into TRB request having the same size boundaries as the tty writes.
+ * Don't add data to kfifo before previous write is turned into TRBs
+ */
+ if (port->tx_boundary) {
+ spin_unlock_irqrestore(&port->port_lock, flags);
+ return 0;
+ }
+
+ if (count) {
+ written = kfifo_in(&port->port.xmit_fifo, buf, count);
+
+ if (written == count)
+ port->tx_boundary = kfifo_len(&port->port.xmit_fifo);
+
+ dbc_start_tx(port);
+ }
+
spin_unlock_irqrestore(&port->port_lock, flags);
- return count;
+ return written;
}
static int dbc_tty_put_char(struct tty_struct *tty, u8 ch)
@@ -241,6 +282,10 @@ static unsigned int dbc_tty_write_room(struct tty_struct *tty)
spin_lock_irqsave(&port->port_lock, flags);
room = kfifo_avail(&port->port.xmit_fifo);
+
+ if (port->tx_boundary)
+ room = 0;
+
spin_unlock_irqrestore(&port->port_lock, flags);
return room;
--
2.25.1
The stream contex type (SCT) bitfield is used both in the stream context
data structure, and in the 'Set TR Dequeue pointer' command TRB.
In both cases it uses bits 3:1
The SCT_FOR_TRB(p) macro used to set the stream context type (SCT) field
for the 'Set TR Dequeue pointer' command TRB incorrectly shifts the value
1 bit left before masking the three bits.
Fix this by first masking and rshifting, just like the similar
SCT_FOR_CTX(p) macro does
This issue has not been visibile as the lost bit 3 is only used with
secondary stream arrays (SSA). Xhci driver currently only supports using
a primary stream array with Linear stream addressing.
Fixes: 95241dbdf828 ("xhci: Set SCT field for Set TR dequeue on streams")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 620502de971a..f0fb696d5619 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1001,7 +1001,7 @@ enum xhci_setup_dev {
/* Set TR Dequeue Pointer command TRB fields, 6.4.3.9 */
#define TRB_TO_STREAM_ID(p) ((((p) & (0xffff << 16)) >> 16))
#define STREAM_ID_FOR_TRB(p) ((((p)) & 0xffff) << 16)
-#define SCT_FOR_TRB(p) (((p) << 1) & 0x7)
+#define SCT_FOR_TRB(p) (((p) & 0x7) << 1)
/* Link TRB specific fields */
#define TRB_TC (1<<1)
--
2.25.1
The struct vchiq_arm_state 'platform_state' is currently allocated
dynamically using kzalloc(). Unfortunately, it is never freed and is
subjected to memory leaks in the error handling paths of the probe()
function.
To address the issue, use device resource management helper
devm_kzalloc(), to ensure cleanup after its allocation.
Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index af623ad87c15..7ece82c361ee 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -285,7 +285,7 @@ vchiq_platform_init_state(struct vchiq_state *state)
{
struct vchiq_arm_state *platform_state;
- platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL);
+ platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL);
if (!platform_state)
return -ENOMEM;
--
2.45.2
The PLL checks are comparing 64 bit integers with 32 bit
ones. Depending on the values of the variables, this may
underflow.
Fix it ensuring that both sides of the expression are u64.
Fixes: 852b50aeed15 ("media: On Semi AR0521 sensor driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/i2c/ar0521.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/i2c/ar0521.c b/drivers/media/i2c/ar0521.c
index fc27238dd4d3..24873149096c 100644
--- a/drivers/media/i2c/ar0521.c
+++ b/drivers/media/i2c/ar0521.c
@@ -255,10 +255,10 @@ static u32 calc_pll(struct ar0521_dev *sensor, u32 freq, u16 *pre_ptr, u16 *mult
continue; /* Minimum value */
if (new_mult > 254)
break; /* Maximum, larger pre won't work either */
- if (sensor->extclk_freq * (u64)new_mult < AR0521_PLL_MIN *
+ if (sensor->extclk_freq * (u64)new_mult < (u64)AR0521_PLL_MIN *
new_pre)
continue;
- if (sensor->extclk_freq * (u64)new_mult > AR0521_PLL_MAX *
+ if (sensor->extclk_freq * (u64)new_mult > (u64)AR0521_PLL_MAX *
new_pre)
break; /* Larger pre won't work either */
new_pll = div64_round_up(sensor->extclk_freq * (u64)new_mult,
--
2.47.0
The error check logic at get_ctrl() is broken: if ptr_to_user()
fails to fill a control due to an error, no errors are returned
and v4l2_g_ctrl() returns success on a failed operation, which
may cause applications to fail.
Add an error check at get_ctrl() and ensure that it will
be returned to userspace without filling the control value if
get_ctrl() fails.
Fixes: 71c689dc2e73 ("media: v4l2-ctrls: split up into four source files")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/v4l2-core/v4l2-ctrls-api.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-ctrls-api.c b/drivers/media/v4l2-core/v4l2-ctrls-api.c
index e5a364efd5e6..be32dccf9830 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls-api.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls-api.c
@@ -753,9 +753,10 @@ static int get_ctrl(struct v4l2_ctrl *ctrl, struct v4l2_ext_control *c)
for (i = 0; i < master->ncontrols; i++)
cur_to_new(master->cluster[i]);
ret = call_op(master, g_volatile_ctrl);
- new_to_user(c, ctrl);
+ if (!ret)
+ ret = new_to_user(c, ctrl);
} else {
- cur_to_user(c, ctrl);
+ ret = cur_to_user(c, ctrl);
}
v4l2_ctrl_unlock(master);
return ret;
@@ -770,7 +771,10 @@ int v4l2_g_ctrl(struct v4l2_ctrl_handler *hdl, struct v4l2_control *control)
if (!ctrl || !ctrl->is_int)
return -EINVAL;
ret = get_ctrl(ctrl, &c);
- control->value = c.value;
+
+ if (!ret)
+ control->value = c.value;
+
return ret;
}
EXPORT_SYMBOL(v4l2_g_ctrl);
--
2.47.0
The logic at tpg_precalculate_line() blindly rescales the
buffer even when scaled_witdh is equal to zero. If this ever
happens, this will cause a division by zero.
Instead, add a WARN_ON() to trigger such cases and return
without doing any precalculation.
Fixes: 63881df94d3e ("[media] vivid: add the Test Pattern Generator")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
index c86343a4d0bf..a22f31515d7e 100644
--- a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
+++ b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
@@ -1795,6 +1795,9 @@ static void tpg_precalculate_line(struct tpg_data *tpg)
unsigned p;
unsigned x;
+ if (WARN_ON(tpg->src_width == 0))
+ return;
+
switch (tpg->pattern) {
case TPG_PAT_GREEN:
contrast = TPG_COLOR_100_RED;
--
2.47.0
The logic at get_edid_tag_location() returns either an offset
or an error condition. However, the error condition uses a
non-standard "-1" value.
Use instead -ENOENT to indicate that the tag was not found.
Fixes: 056f2821b631 ("media: cec: extron-da-hd-4k-plus: add the Extron DA HD 4K Plus CEC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
.../cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c b/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c
index a526464af88c..7d03a36df5cf 100644
--- a/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c
+++ b/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c
@@ -348,12 +348,12 @@ static int get_edid_tag_location(const u8 *edid, unsigned int size,
/* Return if not a CTA-861 extension block */
if (size < 256 || edid[0] != 0x02 || edid[1] != 0x03)
- return -1;
+ return -ENOENT;
/* search tag */
d = edid[0x02] & 0x7f;
if (d <= 4)
- return -1;
+ return -ENOENT;
i = 0x04;
end = 0x00 + d;
@@ -371,7 +371,7 @@ static int get_edid_tag_location(const u8 *edid, unsigned int size,
return offset + i;
i += len + 1;
} while (i < end);
- return -1;
+ return -ENOENT;
}
static void extron_edid_crc(u8 *edid)
--
2.47.0
fepriv->auto_sub_step is unsigned. Setting it to -1 is just a
trick to avoid calling continue.
It relies to have this code just afterwards:
if (!ready) fepriv->auto_sub_step++;
Simplify the code by simply setting it to zero and use
continue to return to the while loop.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/dvb-core/dvb_frontend.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c
index d48f48fda87c..c9283100332a 100644
--- a/drivers/media/dvb-core/dvb_frontend.c
+++ b/drivers/media/dvb-core/dvb_frontend.c
@@ -443,8 +443,8 @@ static int dvb_frontend_swzigzag_autotune(struct dvb_frontend *fe, int check_wra
default:
fepriv->auto_step++;
- fepriv->auto_sub_step = -1; /* it'll be incremented to 0 in a moment */
- break;
+ fepriv->auto_sub_step = 0;
+ continue;
}
if (!ready) fepriv->auto_sub_step++;
--
2.47.0
The dvbdev contains a static variable used to store dvb minors.
The behavior of it depends if CONFIG_DVB_DYNAMIC_MINORS is set
or not. When not set, dvb_register_device() won't check for
boundaries, as it will rely that a previous call to
dvb_register_adapter() would already be enforcing it.
On a similar way, dvb_device_open() uses the assumption
that the register functions already did the needed checks.
This can be fragile if some device ends using different
calls. This also generate warnings on static check analysers.
So, add explicit guards to prevent potential risk of OOM issues.
Fixes: 5dd3f3071070 ("V4L/DVB (9361): Dynamic DVB minor allocation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/dvb-core/dvbdev.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/media/dvb-core/dvbdev.c b/drivers/media/dvb-core/dvbdev.c
index b43695bc51e7..14f323fbada7 100644
--- a/drivers/media/dvb-core/dvbdev.c
+++ b/drivers/media/dvb-core/dvbdev.c
@@ -86,10 +86,15 @@ static DECLARE_RWSEM(minor_rwsem);
static int dvb_device_open(struct inode *inode, struct file *file)
{
struct dvb_device *dvbdev;
+ unsigned int minor = iminor(inode);
+
+ if (minor >= MAX_DVB_MINORS)
+ return -ENODEV;
mutex_lock(&dvbdev_mutex);
down_read(&minor_rwsem);
- dvbdev = dvb_minors[iminor(inode)];
+
+ dvbdev = dvb_minors[minor];
if (dvbdev && dvbdev->fops) {
int err = 0;
@@ -525,7 +530,7 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
for (minor = 0; minor < MAX_DVB_MINORS; minor++)
if (!dvb_minors[minor])
break;
- if (minor == MAX_DVB_MINORS) {
+ if (minor >= MAX_DVB_MINORS) {
if (new_node) {
list_del(&new_node->list_head);
kfree(dvbdevfops);
@@ -540,6 +545,14 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
}
#else
minor = nums2minor(adap->num, type, id);
+ if (minor >= MAX_DVB_MINORS) {
+ dvb_media_device_free(dvbdev);
+ list_del(&dvbdev->list_head);
+ kfree(dvbdev);
+ *pdvbdev = NULL;
+ mutex_unlock(&dvbdev_register_lock);
+ return ret;
+ }
#endif
dvbdev->minor = minor;
dvb_minors[minor] = dvb_device_get(dvbdev);
--
2.47.0
From: Ard Biesheuvel <ardb(a)kernel.org>
GCC and Clang both implement stack protector support based on Thread
Local Storage (TLS) variables, and this is used in the kernel to
implement per-task stack cookies, by copying a task's stack cookie into
a per-CPU variable every time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits
the TLS variable to be specified directly. This is useful because it
will allow us to move away from using a fixed offset of 40 bytes into
the per-CPU area on x86_64, which requires a lot of special handling in
the per-CPU code and the runtime relocation code.
However, while GCC is rather lax in its implementation of this command
line option, Clang actually requires that the provided symbol name
refers to a TLS variable (i.e., one declared with __thread), although it
also permits the variable to be undeclared entirely, in which case it
will use an implicit declaration of the right type.
The upshot of this is that Clang will emit the correct references to the
stack cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in
the same compilation unit (which amounts to the whole of vmlinux if LTO
is enabled), it will drop the per-CPU prefix and emit a load from a
bogus address.
Work around this by using a symbol name that never occurs in C code, and
emit it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Cc: <stable(a)vger.kernel.org>
Cc: Fangrui Song <i(a)maskray.me>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Uros Bizjak <ubizjak(a)gmail.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
v2: add declaration of alias to asm-prototypes.h, but expose it only to
genksyms
arch/x86/Makefile | 5 +++--
arch/x86/entry/entry.S | 16 ++++++++++++++++
arch/x86/include/asm/asm-prototypes.h | 3 +++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/vmlinux.lds.S | 3 +++
5 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index d9feadffa972..a503e6d535f8 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -46,3 +46,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 07a34d723505..ba83f54dfaa8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2085,8 +2085,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 2b7c8c14c6fd..a80ad2bf8da4 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -490,6 +490,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
--
2.47.0.rc0.187.ge670bccf7e-goog
Commit 904140fa4553 ("dt-bindings: pinctrl: samsung: use Exynos7
fallbacks for newer wake-up controllers") added
samsung,exynos7-wakeup-eint fallback to some compatibles, so the
intention in the if:then: conditions was to handle the cases:
1. Single Exynos7 compatible or Exynos5433+Exynos7 or
Exynos7885+Exynos7: only one interrupt
2. Exynos850+Exynos7: no interrupts
This was not implemented properly however and if:then: block matches
only single Exynos5433 or Exynos7885 compatibles, which do not exist in
DTS anymore, so basically is a no-op and no enforcement on number of
interrupts is made by the binding.
Fix the if:then: condition so interrupts in the Exynos5433 and
Exynos7885 wake-up pin controller will be properly constrained.
Fixes: 904140fa4553 ("dt-bindings: pinctrl: samsung: use Exynos7 fallbacks for newer wake-up controllers")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Cc: Igor Belwon <igor.belwon(a)mentallysanemainliners.org>
---
.../samsung,pinctrl-wakeup-interrupt.yaml | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml b/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml
index 91516fedc872..49cb2b1a3d28 100644
--- a/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml
@@ -92,14 +92,17 @@ allOf:
- if:
properties:
compatible:
- # Match without "contains", to skip newer variants which are still
- # compatible with samsung,exynos7-wakeup-eint
- enum:
- - samsung,s5pv210-wakeup-eint
- - samsung,exynos4210-wakeup-eint
- - samsung,exynos5433-wakeup-eint
- - samsung,exynos7-wakeup-eint
- - samsung,exynos7885-wakeup-eint
+ oneOf:
+ # Match without "contains", to skip newer variants which are still
+ # compatible with samsung,exynos7-wakeup-eint
+ - enum:
+ - samsung,exynos4210-wakeup-eint
+ - samsung,exynos7-wakeup-eint
+ - samsung,s5pv210-wakeup-eint
+ - contains:
+ enum:
+ - samsung,exynos5433-wakeup-eint
+ - samsung,exynos7885-wakeup-eint
then:
properties:
interrupts:
--
2.43.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: ffd95846c6ec6cf1f93da411ea10d504036cab42
Gitweb: https://git.kernel.org/tip/ffd95846c6ec6cf1f93da411ea10d504036cab42
Author: Zhang Rui <rui.zhang(a)intel.com>
AuthorDate: Tue, 15 Oct 2024 14:15:22 +08:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Tue, 15 Oct 2024 05:45:18 -07:00
x86/apic: Always explicitly disarm TSC-deadline timer
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag and it is not in this low power mode, it is likely
to get quite toasty while it quickly sucks the battery dry.
The problem boils down to some CPUs' inability to power down until the
CPU recognizes that the local APIC timer is shut down. The current
kernel code works in one-shot and periodic modes but does not work for
deadline mode. Deadline mode has been the supported and preferred mode
on Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.
Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.
Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.
1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revisio…
Fixes: 279f1461432c ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20241015061522.25288-1-rui.zhang%40intel.com
---
arch/x86/kernel/apic/apic.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 6513c53..c5fb28e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -440,7 +440,19 @@ static int lapic_timer_shutdown(struct clock_event_device *evt)
v = apic_read(APIC_LVTT);
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
- apic_write(APIC_TMICT, 0);
+
+ /*
+ * Setting APIC_LVT_MASKED (above) should be enough to tell
+ * the hardware that this timer will never fire. But AMD
+ * erratum 411 and some Intel CPU behavior circa 2024 say
+ * otherwise. Time for belt and suspenders programming: mask
+ * the timer _and_ zero the counter registers:
+ */
+ if (v & APIC_LVT_TIMER_TSCDEADLINE)
+ wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+ else
+ apic_write(APIC_TMICT, 0);
+
return 0;
}
The patch titled
Subject: fork: only invoke khugepaged, ksm hooks if no error
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: fork: only invoke khugepaged, ksm hooks if no error
Date: Tue, 15 Oct 2024 18:56:06 +0100
There is no reason to invoke these hooks early against an mm that is in an
incomplete state.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.
Their placement early in dup_mmap() only appears to have been meaningful
for early error checking, and since functionally it'd require a very small
allocation to fail (in practice 'too small to fail') that'd only occur in
the most dire circumstances, meaning the fork would fail or be OOM'd in
any case.
Since both khugepaged and KSM tracking are there to provide optimisations
to memory performance rather than critical functionality, it doesn't
really matter all that much if, under such dire memory pressure, we fail
to register an mm with these.
As a result, we follow the example of commit d2081b2bf819 ("mm:
khugepaged: make khugepaged_enter() void function") and make ksm_fork() a
void function also.
We only expose the mm to these functions once we are done with them and
only if no error occurred in the fork operation.
Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.17290143…
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Jann Horn <jannh(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Linus Torvalds <torvalds(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/ksm.h | 10 ++++------
kernel/fork.c | 7 ++-----
2 files changed, 6 insertions(+), 11 deletions(-)
--- a/include/linux/ksm.h~fork-only-invoke-khugepaged-ksm-hooks-if-no-error
+++ a/include/linux/ksm.h
@@ -54,12 +54,11 @@ static inline long mm_ksm_zero_pages(str
return atomic_long_read(&mm->ksm_zero_pages);
}
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
+ /* Adding mm to ksm is best effort on fork. */
if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
- return __ksm_enter(mm);
-
- return 0;
+ __ksm_enter(mm);
}
static inline int ksm_execve(struct mm_struct *mm)
@@ -107,9 +106,8 @@ static inline int ksm_disable(struct mm_
return 0;
}
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
- return 0;
}
static inline int ksm_execve(struct mm_struct *mm)
--- a/kernel/fork.c~fork-only-invoke-khugepaged-ksm-hooks-if-no-error
+++ a/kernel/fork.c
@@ -653,11 +653,6 @@ static __latent_entropy int dup_mmap(str
mm->exec_vm = oldmm->exec_vm;
mm->stack_vm = oldmm->stack_vm;
- retval = ksm_fork(mm, oldmm);
- if (retval)
- goto out;
- khugepaged_fork(mm, oldmm);
-
/* Use __mt_dup() to efficiently build an identical maple tree. */
retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL);
if (unlikely(retval))
@@ -760,6 +755,8 @@ loop_out:
vma_iter_free(&vmi);
if (!retval) {
mt_set_in_rcu(vmi.mas.tree);
+ ksm_fork(mm, oldmm);
+ khugepaged_fork(mm, oldmm);
} else if (mpnt) {
/*
* The entire maple tree has already been duplicated. If the
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-mmap-correct-error-handling-in-mmap_region.patch
maple_tree-correct-tree-corruption-on-spanning-store.patch
maple_tree-add-regression-test-for-spanning-store-bug.patch
maintainers-add-memory-mapping-vma-co-maintainers.patch
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch
mm-refactor-mm_access-to-not-return-null.patch
mm-refactor-mm_access-to-not-return-null-fix.patch
mm-madvise-unrestrict-process_madvise-for-current-process.patch
maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch
The patch titled
Subject: fork: do not invoke uffd on fork if error occurs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: fork: do not invoke uffd on fork if error occurs
Date: Tue, 15 Oct 2024 18:56:05 +0100
Patch series "fork: do not expose incomplete mm on fork".
During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.
In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.
As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.
We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and
disallow uffd from observing fork events should an error have occurred.
This patch (of 2):
Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.
This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers
for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().
This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.
We address this by, on fork error, ensuring that we roll back state that
we would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().
We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.
Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.
Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.17290143…
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Linus Torvalds <torvalds(a)linuxfoundation.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 28 ++++++++++++++++++++++++++++
include/linux/userfaultfd_k.h | 5 +++++
kernel/fork.c | 5 ++++-
3 files changed, 37 insertions(+), 1 deletion(-)
--- a/fs/userfaultfd.c~fork-do-not-invoke-uffd-on-fork-if-error-occurs
+++ a/fs/userfaultfd.c
@@ -692,6 +692,34 @@ void dup_userfaultfd_complete(struct lis
}
}
+void dup_userfaultfd_fail(struct list_head *fcs)
+{
+ struct userfaultfd_fork_ctx *fctx, *n;
+
+ /*
+ * An error has occurred on fork, we will tear memory down, but have
+ * allocated memory for fctx's and raised reference counts for both the
+ * original and child contexts (and on the mm for each as a result).
+ *
+ * These would ordinarily be taken care of by a user handling the event,
+ * but we are no longer doing so, so manually clean up here.
+ *
+ * mm tear down will take care of cleaning up VMA contexts.
+ */
+ list_for_each_entry_safe(fctx, n, fcs, list) {
+ struct userfaultfd_ctx *octx = fctx->orig;
+ struct userfaultfd_ctx *ctx = fctx->new;
+
+ atomic_dec(&octx->mmap_changing);
+ VM_BUG_ON(atomic_read(&octx->mmap_changing) < 0);
+ userfaultfd_ctx_put(octx);
+ userfaultfd_ctx_put(ctx);
+
+ list_del(&fctx->list);
+ kfree(fctx);
+ }
+}
+
void mremap_userfaultfd_prep(struct vm_area_struct *vma,
struct vm_userfaultfd_ctx *vm_ctx)
{
--- a/include/linux/userfaultfd_k.h~fork-do-not-invoke-uffd-on-fork-if-error-occurs
+++ a/include/linux/userfaultfd_k.h
@@ -249,6 +249,7 @@ static inline bool vma_can_userfault(str
extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
extern void dup_userfaultfd_complete(struct list_head *);
+void dup_userfaultfd_fail(struct list_head *);
extern void mremap_userfaultfd_prep(struct vm_area_struct *,
struct vm_userfaultfd_ctx *);
@@ -351,6 +352,10 @@ static inline void dup_userfaultfd_compl
{
}
+static inline void dup_userfaultfd_fail(struct list_head *l)
+{
+}
+
static inline void mremap_userfaultfd_prep(struct vm_area_struct *vma,
struct vm_userfaultfd_ctx *ctx)
{
--- a/kernel/fork.c~fork-do-not-invoke-uffd-on-fork-if-error-occurs
+++ a/kernel/fork.c
@@ -775,7 +775,10 @@ out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
mmap_write_unlock(oldmm);
- dup_userfaultfd_complete(&uf);
+ if (!retval)
+ dup_userfaultfd_complete(&uf);
+ else
+ dup_userfaultfd_fail(&uf);
fail_uprobe_end:
uprobe_end_dup_mmap();
return retval;
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-mmap-correct-error-handling-in-mmap_region.patch
maple_tree-correct-tree-corruption-on-spanning-store.patch
maple_tree-add-regression-test-for-spanning-store-bug.patch
maintainers-add-memory-mapping-vma-co-maintainers.patch
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch
mm-refactor-mm_access-to-not-return-null.patch
mm-refactor-mm_access-to-not-return-null-fix.patch
mm-madvise-unrestrict-process_madvise-for-current-process.patch
maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch
The patch titled
Subject: nilfs2: fix kernel bug due to missing clearing of buffer delay flag
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix kernel bug due to missing clearing of buffer delay flag
Date: Wed, 16 Oct 2024 06:32:07 +0900
Syzbot reported that after nilfs2 reads a corrupted file system image and
degrades to read-only, the BUG_ON check for the buffer delay flag in
submit_bh_wbc() may fail, causing a kernel bug.
This is because the buffer delay flag is not cleared when clearing the
buffer state flags to discard a page/folio or a buffer head. So, fix
this.
This became necessary when the use of nilfs2's own page clear routine was
expanded. This state inconsistency does not occur if the buffer is
written normally by log writing.
Link: https://lkml.kernel.org/r/20241015213300.7114-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+985ada84bf055a575c07(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=985ada84bf055a575c07
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/page.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/nilfs2/page.c~nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag
+++ a/fs/nilfs2/page.c
@@ -77,7 +77,8 @@ void nilfs_forget_buffer(struct buffer_h
const unsigned long clear_bits =
(BIT(BH_Uptodate) | BIT(BH_Dirty) | BIT(BH_Mapped) |
BIT(BH_Async_Write) | BIT(BH_NILFS_Volatile) |
- BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected));
+ BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected) |
+ BIT(BH_Delay));
lock_buffer(bh);
set_mask_bits(&bh->b_state, clear_bits, 0);
@@ -406,7 +407,8 @@ void nilfs_clear_folio_dirty(struct foli
const unsigned long clear_bits =
(BIT(BH_Uptodate) | BIT(BH_Dirty) | BIT(BH_Mapped) |
BIT(BH_Async_Write) | BIT(BH_NILFS_Volatile) |
- BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected));
+ BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected) |
+ BIT(BH_Delay));
bh = head;
do {
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-propagate-directory-read-errors-from-nilfs_find_entry.patch
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag.patch
tpm2_load_null() has weak and broken error handling:
- The return value of tpm2_create_primary() is ignored.
- Leaks TPM return codes from tpm2_load_context() to the caller.
- If the key name comparison succeeds returns previous error
instead of zero to the caller.
Implement a proper error rollback.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: eb24c9788cd9 ("tpm: disable the TPM if NULL name changes")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v6:
- Address Stefan's remark:
https://lore.kernel.org/linux-integrity/def4ec2d-584b-405f-9d5e-99267013c3c…
v5:
- Fix the TPM error code leak from tpm2_load_context().
v4:
- No changes.
v3:
- Update log messages. Previously the log message incorrectly stated
on load failure that integrity check had been failed, even tho the
check is done *after* the load operation.
v2:
- Refined the commit message.
- Reverted tpm2_create_primary() changes. They are not required if
tmp_null_key is used as the parameter.
---
drivers/char/tpm/tpm2-sessions.c | 43 +++++++++++++++++---------------
1 file changed, 23 insertions(+), 20 deletions(-)
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index 253639767c1e..1215c53f0ae7 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -915,33 +915,36 @@ static int tpm2_parse_start_auth_session(struct tpm2_auth *auth,
static int tpm2_load_null(struct tpm_chip *chip, u32 *null_key)
{
- int rc;
unsigned int offset = 0; /* dummy offset for null seed context */
u8 name[SHA256_DIGEST_SIZE + 2];
+ u32 tmp_null_key;
+ int rc;
rc = tpm2_load_context(chip, chip->null_key_context, &offset,
- null_key);
- if (rc != -EINVAL)
- return rc;
+ &tmp_null_key);
+ if (rc != -EINVAL) {
+ if (!rc)
+ *null_key = tmp_null_key;
+ goto err;
+ }
- /* an integrity failure may mean the TPM has been reset */
- dev_err(&chip->dev, "NULL key integrity failure!\n");
- /* check the null name against what we know */
- tpm2_create_primary(chip, TPM2_RH_NULL, NULL, name);
- if (memcmp(name, chip->null_key_name, sizeof(name)) == 0)
- /* name unchanged, assume transient integrity failure */
- return rc;
- /*
- * Fatal TPM failure: the NULL seed has actually changed, so
- * the TPM must have been illegally reset. All in-kernel TPM
- * operations will fail because the NULL primary can't be
- * loaded to salt the sessions, but disable the TPM anyway so
- * userspace programmes can't be compromised by it.
- */
- dev_err(&chip->dev, "NULL name has changed, disabling TPM due to interference\n");
+ rc = tpm2_create_primary(chip, TPM2_RH_NULL, &tmp_null_key, name);
+ if (rc)
+ goto err;
+
+ /* Return the null key if the name has not been changed: */
+ if (memcmp(name, chip->null_key_name, sizeof(name)) == 0) {
+ *null_key = tmp_null_key;
+ return 0;
+ }
+
+ /* Deduce from the name change TPM interference: */
+ dev_err(&chip->dev, "the null key integrity check failedh\n");
+ tpm2_flush_context(chip, tmp_null_key);
chip->flags |= TPM_CHIP_FLAG_DISABLE;
- return rc;
+err:
+ return rc ? -ENODEV : 0;
}
/**
--
2.47.0
These patches address some issues I spotted while looking at kprobes and
uprobes.
Patch 1 is the most pressing, as a uprobes user can trigger a kernel
BUG(). Patches 2 and 3 fix latent endianness bugs which only manifest on
big-endian kernels, and patchs 4-6 clean things up so that it's harder
to get this wrong again in future.
Mark.
Mark Rutland (6):
arm64: probes: Remove broken LDR (literal) uprobe support
arm64: probes: Fix simulate_ldr*_literal()
arm64: probes: Fix uprobes for big-endian kernels
arm64: probes: Move kprobes-specific fields
arm64: probes: Cleanup kprobes endianness conversions
arm64: probes: Remove probe_opcode_t
arch/arm64/include/asm/probes.h | 11 +++----
arch/arm64/include/asm/uprobes.h | 8 ++---
arch/arm64/kernel/probes/decode-insn.c | 22 ++++++++-----
arch/arm64/kernel/probes/decode-insn.h | 2 +-
arch/arm64/kernel/probes/kprobes.c | 39 ++++++++++++------------
arch/arm64/kernel/probes/simulate-insn.c | 18 +++++------
arch/arm64/kernel/probes/uprobes.c | 8 ++---
7 files changed, 53 insertions(+), 55 deletions(-)
--
2.30.2
If some remap_pfn_range() calls succeeded before one failed, we still have
buffer pages mapped into the userspace page tables when we drop the buffer
reference with comedi_buf_map_put(bm). The userspace mappings are only
cleaned up later in the mmap error path.
Fix it by explicitly flushing all mappings in our VMA on the comedi_mmap()
error path.
See commit 79a61cc3fc04 ("mm: avoid leaving partial pfn mappings around in
error case").
Cc: stable(a)vger.kernel.org
Fixes: ed9eccbe8970 ("Staging: add comedi core")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Note: compile-tested only; I don't actually have comedi hardware, and I
don't know anything about comedi.
---
drivers/comedi/comedi_fops.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/comedi/comedi_fops.c b/drivers/comedi/comedi_fops.c
index 1b481731df96..0e573df8646f 100644
--- a/drivers/comedi/comedi_fops.c
+++ b/drivers/comedi/comedi_fops.c
@@ -2414,6 +2414,15 @@ static int comedi_mmap(struct file *file, struct vm_area_struct *vma)
vma->vm_private_data = bm;
vma->vm_ops->open(vma);
+ } else {
+ /*
+ * Leaving behind a partial mapping of a buffer we're about to
+ * drop is unsafe, see remap_pfn_range_notrack().
+ * We need to zap the range here ourselves instead of relying
+ * on the automatic zapping in remap_pfn_range() because we call
+ * remap_pfn_range() in a loop.
+ */
+ zap_page_range_single(vma, vma->vm_start, size, NULL);
}
done:
---
base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
change-id: 20241014-comedi-tlb-400246505961
--
Jann Horn <jannh(a)google.com>