When handling non-swap entries in move_pages_pte(), the error handling
for entries that are NOT migration entries fails to unmap the page table
entries before jumping to the error handling label.
This results in a kmap/kunmap imbalance which on CONFIG_HIGHPTE systems
triggers a WARNING in kunmap_local_indexed() because the kmap stack is
corrupted.
Example call trace on ARM32 (CONFIG_HIGHPTE enabled):
WARNING: CPU: 1 PID: 633 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
Call trace:
kunmap_local_indexed from move_pages+0x964/0x19f4
move_pages from userfaultfd_ioctl+0x129c/0x2144
userfaultfd_ioctl from sys_ioctl+0x558/0xd24
The issue was introduced with the UFFDIO_MOVE feature but became more
frequent with the addition of guard pages (commit 7c53dfbdb024 ("mm: add
PTE_MARKER_GUARD PTE marker")) which made the non-migration entry code
path more commonly executed during userfaultfd operations.
Fix this by ensuring PTEs are properly unmapped in all non-swap entry
paths before jumping to the error handling label, not just for migration
entries.
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
mm/userfaultfd.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8253978ee0fb1..7c298e9cbc18f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1384,14 +1384,15 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
+ src_pte = dst_pte = NULL;
if (is_migration_entry(entry)) {
- pte_unmap(src_pte);
- pte_unmap(dst_pte);
- src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
- } else
+ } else {
err = -EFAULT;
+ }
goto out;
}
--
2.39.5
A new warning in Clang 22 [1] complains that @clidr passed to
get_clidr_el1() is an uninitialized const pointer. get_clidr_el1()
doesn't really care since it casts away the const-ness anyways -- it is
a false positive.
| ../arch/arm64/kvm/sys_regs.c:2838:23: warning: variable 'clidr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
| 2838 | get_clidr_el1(NULL, &clidr); /* Ugly... */
| | ^~~~~
Disable this warning for sys_regs.o with an iron fist as it doesn't make
sense to waste maintainer's time or potentially break builds by
backporting large changelists from 6.2+.
This patch isn't needed for anything past 6.1 as this code section was
reworked in Commit 7af0c2534f4c ("KVM: arm64: Normalize cache
configuration").
Cc: stable(a)vger.kernel.org
Fixes: 7c8c5e6a9101e ("arm64: KVM: system register handling")
Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d44… [1]
Signed-off-by: Justin Stitt <justinstitt(a)google.com>
---
I'm sending a similar patch for 6.1.
---
arch/arm64/kvm/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..109cca425d3e 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,3 +25,6 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
vgic/vgic-its.o vgic/vgic-debug.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o
+
+# Work around a false positive Clang 22 -Wuninitialized-const-pointer warning
+CFLAGS_sys_regs.o := $(call cc-disable-warning, uninitialized-const-pointer)
---
base-commit: 8bb7eca972ad531c9b149c0a51ab43a417385813
change-id: 20250728-b4-stable-disable-uninit-ptr-warn-5-15-c0c9db3df206
Best regards,
--
Justin Stitt <justinstitt(a)google.com>
A new warning in Clang 22 [1] complains that @clidr passed to
get_clidr_el1() is an uninitialized const pointer. get_clidr_el1()
doesn't really care since it casts away the const-ness anyways -- it is
a false positive.
| ../arch/arm64/kvm/sys_regs.c:2978:23: warning: variable 'clidr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
| 2978 | get_clidr_el1(NULL, &clidr); /* Ugly... */
| | ^~~~~
Disable this warning for sys_regs.o with an iron fist as it doesn't make
sense to waste maintainer's time or potentially break builds by
backporting large changelists from 6.2+.
This patch isn't needed for anything past 6.1 as this code section was
reworked in Commit 7af0c2534f4c ("KVM: arm64: Normalize cache
configuration").
Cc: stable(a)vger.kernel.org
Fixes: 7c8c5e6a9101e ("arm64: KVM: system register handling")
Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d44… [1]
Signed-off-by: Justin Stitt <justinstitt(a)google.com>
---
I've sent a similar patch for 5.15.
---
arch/arm64/kvm/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5e33c2d4645a..5fdb5331bfad 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -24,6 +24,9 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
+# Work around a false positive Clang 22 -Wuninitialized-const-pointer warning
+CFLAGS_sys_regs.o := $(call cc-disable-warning, uninitialized-const-pointer)
+
always-y := hyp_constants.h hyp-constants.s
define rule_gen_hyp_constants
---
base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
change-id: 20250728-stable-disable-unit-ptr-warn-281fee82539c
Best regards,
--
Justin Stitt <justinstitt(a)google.com>
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series consists of backports of two of Lorenzo's series [2] and
[3].
Note: for [2], I dropped the last patch in that series, since it
wouldn't make sense to apply it due to [4] being part of this tree. In
lieu of that, I backported [3] to ultimately allow write-sealed memfds
to be mapped as read-only.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
[3] https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.17328047…
[4] https://lore.kernel.org/all/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022…
Lorenzo Stoakes (4):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: reinstate ability to map write-sealed memfd mappings read-only
selftests/memfd: add test for mapping write-sealed memfd read-only
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 +-
include/linux/memfd.h | 14 ++++
include/linux/mm.h | 80 +++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/memfd.c | 2 +-
mm/mmap.c | 10 ++-
mm/shmem.c | 2 +-
tools/testing/selftests/memfd/memfd_test.c | 43 ++++++++++++
11 files changed, 129 insertions(+), 34 deletions(-)
--
2.50.1.552.g942d659e1b-goog
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 72 ++++++++++---------
2 files changed, 41 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index efe98ffb679a4..b2538cff222ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2570,9 +2570,6 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
adev->firmware.gpu_info_fw = NULL;
- if (adev->mman.discovery_bin)
- return 0;
-
switch (adev->asic_type) {
default:
return 0;
@@ -2594,6 +2591,8 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
chip_name = "arcturus";
break;
case CHIP_NAVI12:
+ if (adev->mman.discovery_bin)
+ return 0;
chip_name = "navi12";
break;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 81b3443c8d7f4..27bd7659961e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -2555,40 +2555,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
switch (adev->asic_type) {
case CHIP_VEGA10:
- case CHIP_VEGA12:
- case CHIP_RAVEN:
- case CHIP_VEGA20:
- case CHIP_ARCTURUS:
- case CHIP_ALDEBARAN:
- /* this is not fatal. We have a fallback below
- * if the new firmwares are not present. some of
- * this will be overridden below to keep things
- * consistent with the current behavior.
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
*/
- r = amdgpu_discovery_reg_base_init(adev);
- if (!r) {
- amdgpu_discovery_harvest_ip(adev);
- amdgpu_discovery_get_gfx_info(adev);
- amdgpu_discovery_get_mall_info(adev);
- amdgpu_discovery_get_vcn_info(adev);
- }
- break;
- default:
- r = amdgpu_discovery_reg_base_init(adev);
- if (r) {
- drm_err(&adev->ddev, "discovery failed: %d\n", r);
- return r;
- }
-
- amdgpu_discovery_harvest_ip(adev);
- amdgpu_discovery_get_gfx_info(adev);
- amdgpu_discovery_get_mall_info(adev);
- amdgpu_discovery_get_vcn_info(adev);
- break;
- }
-
- switch (adev->asic_type) {
- case CHIP_VEGA10:
+ amdgpu_discovery_init(adev);
vega10_reg_base_init(adev);
adev->sdma.num_instances = 2;
adev->gmc.num_umc = 4;
@@ -2611,6 +2582,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[DCI_HWIP][0] = IP_VERSION(12, 0, 0);
break;
case CHIP_VEGA12:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
vega10_reg_base_init(adev);
adev->sdma.num_instances = 2;
adev->gmc.num_umc = 4;
@@ -2633,6 +2609,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[DCI_HWIP][0] = IP_VERSION(12, 0, 1);
break;
case CHIP_RAVEN:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
vega10_reg_base_init(adev);
adev->sdma.num_instances = 1;
adev->vcn.num_vcn_inst = 1;
@@ -2674,6 +2655,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
}
break;
case CHIP_VEGA20:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
vega20_reg_base_init(adev);
adev->sdma.num_instances = 2;
adev->gmc.num_umc = 8;
@@ -2697,6 +2683,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[DCI_HWIP][0] = IP_VERSION(12, 1, 0);
break;
case CHIP_ARCTURUS:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
arct_reg_base_init(adev);
adev->sdma.num_instances = 8;
adev->vcn.num_vcn_inst = 2;
@@ -2725,6 +2716,11 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[UVD_HWIP][1] = IP_VERSION(2, 5, 0);
break;
case CHIP_ALDEBARAN:
+ /* This is not fatal. We only need the discovery
+ * binary for sysfs. We don't need it for a
+ * functional system.
+ */
+ amdgpu_discovery_init(adev);
aldebaran_reg_base_init(adev);
adev->sdma.num_instances = 5;
adev->vcn.num_vcn_inst = 2;
@@ -2751,6 +2747,16 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 1, 0);
break;
default:
+ r = amdgpu_discovery_reg_base_init(adev);
+ if (r) {
+ drm_err(&adev->ddev, "discovery failed: %d\n", r);
+ return r;
+ }
+
+ amdgpu_discovery_harvest_ip(adev);
+ amdgpu_discovery_get_gfx_info(adev);
+ amdgpu_discovery_get_mall_info(adev);
+ amdgpu_discovery_get_vcn_info(adev);
break;
}
--
2.50.1
Make sure to drop the references taken to the PMC OF node and device by
of_parse_phandle() and of_find_device_by_node() during probe.
Note the holding a reference to the PMC device does not prevent the
PMC regmap from going away (e.g. if the PMC driver is unbound) so there
is no need to keep the reference.
Fixes: 2d1021487273 ("phy: tegra: xusb: Add wake/sleepwalk for Tegra210")
Cc: stable(a)vger.kernel.org # 5.14
Cc: JC Kuo <jckuo(a)nvidia.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/phy/tegra/xusb-tegra210.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/tegra/xusb-tegra210.c b/drivers/phy/tegra/xusb-tegra210.c
index ebc8a7e21a31..3409924498e9 100644
--- a/drivers/phy/tegra/xusb-tegra210.c
+++ b/drivers/phy/tegra/xusb-tegra210.c
@@ -3164,18 +3164,22 @@ tegra210_xusb_padctl_probe(struct device *dev,
}
pdev = of_find_device_by_node(np);
+ of_node_put(np);
if (!pdev) {
dev_warn(dev, "PMC device is not available\n");
goto out;
}
- if (!platform_get_drvdata(pdev))
+ if (!platform_get_drvdata(pdev)) {
+ put_device(&pdev->dev);
return ERR_PTR(-EPROBE_DEFER);
+ }
padctl->regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
if (!padctl->regmap)
dev_info(dev, "failed to find PMC regmap\n");
+ put_device(&pdev->dev);
out:
return &padctl->base;
}
--
2.49.1
Recently, we encountered the following hungtask:
INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/11:2 D 0 2981147 2 0x80004000
Workqueue: cgroup_destroy css_free_rwork_fn
Call Trace:
__schedule+0x934/0xe10
schedule+0x40/0xb0
wb_wait_for_completion+0x52/0x80
? finish_wait+0x80/0x80
mem_cgroup_css_free+0x3a/0x1b0
css_free_rwork_fn+0x42/0x380
process_one_work+0x1a2/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x110/0x130
? __kthread_cancel_work+0x40/0x40
ret_from_fork+0x1f/0x30
This is because the writeback thread has been continuously and repeatedly
throttled by wbt, but at the same time, the writes of another thread
proceed quite smoothly.
After debugging, I believe it is caused by the following reasons.
When thread A is blocked by wbt, the I/O issued by thread B will
use a deeper queue depth(rwb->rq_depth.max_depth) because it
meets the conditions of wb_recent_wait(), thus allowing thread B's
I/O to be issued smoothly and resulting in the inflight I/O of wbt
remaining relatively high.
However, when I/O completes, due to the high inflight I/O of wbt,
the condition "limit - inflight >= rwb->wb_background / 2"
in wbt_rqw_done() cannot be satisfied, causing thread A's I/O
to remain unable to be woken up.
Some on-site information:
>>> rwb.rq_depth.max_depth
(unsigned int)48
>>> rqw.inflight.counter.value_()
44
>>> rqw.inflight.counter.value_()
35
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)3
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)2
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)20
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)12
cat wb_normal
24
cat wb_background
12
To fix this issue, we can use max_depth in wbt_rqw_done(), so that
the handling of wb_recent_wait by wbt_rqw_done() and get_limit()
will also be consistent, which is more reasonable.
Signed-off-by: Julian Sun <sunjunchao(a)bytedance.com>
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
---
block/blk-wbt.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a50d4cd55f41..d6a2782d442f 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
!wb_recent_wait(rwb))
limit = 0;
+ else if (wb_recent_wait(rwb))
+ limit = rwb->rq_depth.max_depth;
else
limit = rwb->wb_normal;
--
2.20.1
Using device_find_child() and of_find_device_by_node() to locate
devices could cause an imbalance in the device's reference count.
device_find_child() and of_find_device_by_node() both call
get_device() to increment the reference count of the found device
before returning the pointer. In mtk_drm_get_all_drm_priv(), these
references are never released through put_device(), resulting in
permanent reference count increments. Additionally, the
for_each_child_of_node() iterator fails to release node references in
all code paths. This leaks device node references when loop
termination occurs before reaching MAX_CRTC. These reference count
leaks may prevent device/node resources from being properly released
during driver unbind operations.
As comment of device_find_child() says, 'NOTE: you will need to drop
the reference with put_device() after use'.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 1ef7ed48356c ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/mediatek/mtk_drm_drv.c | 27 +++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 7c0c12dde488..c78186debd3e 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -388,19 +388,24 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
of_id = of_match_node(mtk_drm_of_ids, node);
if (!of_id)
- continue;
+ goto next;
pdev = of_find_device_by_node(node);
if (!pdev)
- continue;
+ goto next;
drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match);
- if (!drm_dev)
- continue;
+ if (!drm_dev) {
+ put_device(&pdev->dev);
+ goto next;
+ }
temp_drm_priv = dev_get_drvdata(drm_dev);
- if (!temp_drm_priv)
- continue;
+ if (!temp_drm_priv) {
+ put_device(drm_dev);
+ put_device(&pdev->dev);
+ goto next;
+ }
if (temp_drm_priv->data->main_len)
all_drm_priv[CRTC_MAIN] = temp_drm_priv;
@@ -412,10 +417,14 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
if (temp_drm_priv->mtk_drm_bound)
cnt++;
- if (cnt == MAX_CRTC) {
- of_node_put(node);
+ put_device(drm_dev);
+ put_device(&pdev->dev);
+
+next:
+ of_node_put(node);
+
+ if (cnt == MAX_CRTC)
break;
- }
}
if (drm_priv->data->mmsys_dev_num == cnt) {
--
2.25.1