- Linux-stable-mirror - lists.linaro.org

[PATCH 0/3] driver core: class: Fix bug and code improvements for class APIs

by Zijun Hu

This patch series is to - Fix an potential wild pointer dereference bug for API: class_dev_iter_next() - Improve the following APIs: class_for_each_device() class_find_device() Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com> --- Zijun Hu (3): driver core: class: Fix wild pointer dereference in API class_dev_iter_next() driver core: class: Correct WARN() message in APIs class_(for_each|find)_device() driver core: class: Delete a redundant check in APIs class_(for_each|find)_device() drivers/base/class.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) --- base-commit: 9bd133f05b1dca5ca4399a76d04d0f6f4d454e44 change-id: 20241104-class_fix-f176bd9eba22 Best regards, -- Zijun Hu <quic_zijuhu(a)quicinc.com>

10 months, 2 weeks

2
6
0 0

New GaN Wall charger 45W-Sample!

by Vicky

10 months, 2 weeks

1
0
0 0

[PATCH v2] mm/readahead: Fix large folio support in async readahead

by Yafang Shao

When testing large folio support with XFS on our servers, we observed that only a few large folios are mapped when reading large files via mmap. After a thorough analysis, I identified it was caused by the `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this parameter is set to 128KB. After I tune it to 2MB, the large folio can work as expected. However, I believe the large folio behavior should not be dependent on the value of read_ahead_kb. It would be more robust if the kernel can automatically adopt to it. With /sys/block/*/queue/read_ahead_kb set to 128KB and performing a sequential read on a 1GB file using MADV_HUGEPAGE, the differences in /proc/meminfo are as follows: - before this patch FileHugePages: 18432 kB FilePmdMapped: 4096 kB - after this patch FileHugePages: 1067008 kB FilePmdMapped: 1048576 kB This shows that after applying the patch, the entire 1GB file is mapped to huge pages. The stable list is CCed, as without this patch, large folios don’t function optimally in the readahead path. It's worth noting that if read_ahead_kb is set to a larger value that isn't aligned with huge page sizes (e.g., 4MB + 128KB), it may still fail to map to hugepages. Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings") Suggested-by: Matthew Wilcox <willy(a)infradead.org> Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com> Cc: stable(a)vger.kernel.org --- mm/readahead.c | 2 ++ 1 file changed, 2 insertions(+) Changes: v1->v2: - Drop the align (Matthew) - Improve commit log (Andrew) RFC->v1: https://lore.kernel.org/linux-mm/20241106092114.8408-1-laoar.shao@gmail.com/ - Simplify the code as suggested by Matthew RFC: https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@gmail.co… diff --git a/mm/readahead.c b/mm/readahead.c index 3dc6c7a128dd..9b8a48e736c6 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -385,6 +385,8 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra, return 4 * cur; if (cur <= max / 2) return 2 * cur; + if (cur > max) + return cur; return max; } -- 2.43.5

10 months, 2 weeks

3
16
0 0

[PATCH] accel/ivpu: Fix Qemu crash when running in passthrough

by Jacek Lawrynowicz

Restore PCI state after putting the NPU in D0. Restoring state before powering up the device caused a Qemu crash if NPU was running in passthrough mode and recovery was performed. Fixes: 3534eacbf101 ("accel/ivpu: Fix PCI D0 state entry in resume") Cc: <stable(a)vger.kernel.org> # v6.8+ Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski(a)linux.intel.com> --- drivers/accel/ivpu/ivpu_pm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index 59d3170f5e354..5aac3d64045d3 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -73,8 +73,8 @@ static int ivpu_resume(struct ivpu_device *vdev) int ret; retry: - pci_restore_state(to_pci_dev(vdev->drm.dev)); pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0); + pci_restore_state(to_pci_dev(vdev->drm.dev)); ret = ivpu_hw_power_up(vdev); if (ret) { -- 2.45.1

10 months, 2 weeks

1
1
0 0

[PATCH] iov_iter: fix copy_page_from_iter_atomic() for highmem

by Christian Brauner

When fixing copy_page_from_iter_atomic() in c749d9b7ebbc ("iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP") the check for PageHighMem() got moved out of the loop. If copy_page_from_iter_atomic() crosses page boundaries it will use a stale PageHighMem() check for an earlier page. Fixes: 908a1ad89466 ("iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic()") Fixes: c749d9b7ebbc ("iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP") Cc: stable(a)vger.kernel.org Reviewed-by: David Howells <dhowells(a)redhat.com> Signed-off-by: Christian Brauner <brauner(a)kernel.org> --- Hey Linus, I think the original fix was buggy but then again my knowledge of highmem isn't particularly detailed. Compile tested only. If correct, I would ask you to please apply it directly. Thanks! Christian --- lib/iov_iter.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 908e75a28d90..e90a5ababb11 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -457,12 +457,16 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i) } EXPORT_SYMBOL(iov_iter_zero); +static __always_inline bool iter_atomic_uses_kmap(struct page *page) +{ + return IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) || + PageHighMem(page); +} + size_t copy_page_from_iter_atomic(struct page *page, size_t offset, size_t bytes, struct iov_iter *i) { size_t n, copied = 0; - bool uses_kmap = IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) || - PageHighMem(page); if (!page_copy_sane(page, offset, bytes)) return 0; @@ -473,7 +477,7 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset, char *p; n = bytes - copied; - if (uses_kmap) { + if (iter_atomic_uses_kmap(page)) { page += offset / PAGE_SIZE; offset %= PAGE_SIZE; n = min_t(size_t, n, PAGE_SIZE - offset); @@ -484,7 +488,7 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset, kunmap_atomic(p); copied += n; offset += n; - } while (uses_kmap && copied != bytes && n > 0); + } while (iter_atomic_uses_kmap(page) && copied != bytes && n > 0); return copied; } -- 2.45.2

10 months, 2 weeks

3
3
0 0

[PATCH 6.1] platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors

by Xiangyu Chen

From: Xiangyu Chen <xiangyu.chen(a)windriver.com> [ Upstream commit 2fae3129c0c08e72b1fe93e61fd8fd203252094a ] x86_android_tablet_remove() frees the pdevs[] array, so it should not be used after calling x86_android_tablet_remove(). When platform_device_register() fails, store the pdevs[x] PTR_ERR() value into the local ret variable before calling x86_android_tablet_remove() to avoid using pdevs[] after it has been freed. Fixes: 5eba0141206e ("platform/x86: x86-android-tablets: Add support for instantiating platform-devs") Fixes: e2200d3f26da ("platform/x86: x86-android-tablets: Add gpio_keys support to x86_android_tablet_init()") Cc: stable(a)vger.kernel.org Reported-by: Aleksandr Burakov <a.burakov(a)rosalinux.ru> Closes: https://lore.kernel.org/platform-driver-x86/20240917120458.7300-1-a.burakov… Signed-off-by: Hans de Goede <hdegoede(a)redhat.com> Link: https://lore.kernel.org/r/20241005130545.64136-1-hdegoede@redhat.com [Xiangyu: Modified file path to backport this commit to fix CVE: CVE-2024-49986] Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com> --- drivers/platform/x86/x86-android-tablets.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/platform/x86/x86-android-tablets.c b/drivers/platform/x86/x86-android-tablets.c index 9178076d9d7d..94710471d7dd 100644 --- a/drivers/platform/x86/x86-android-tablets.c +++ b/drivers/platform/x86/x86-android-tablets.c @@ -1853,8 +1853,9 @@ static __init int x86_android_tablet_init(void) for (i = 0; i < pdev_count; i++) { pdevs[i] = platform_device_register_full(&dev_info->pdev_info[i]); if (IS_ERR(pdevs[i])) { + ret = PTR_ERR(pdevs[i]); x86_android_tablet_cleanup(); - return PTR_ERR(pdevs[i]); + return ret; } } -- 2.43.0

10 months, 2 weeks

1
0
0 0

[PATCH 6.1] ext4: fix timer use-after-free on failed mount

by Xiangyu Chen

From: Xiaxi Shen <shenxiaxi26(a)gmail.com> commit 0ce160c5bdb67081a62293028dc85758a8efb22a upstream. Syzbot has found an ODEBUG bug in ext4_fill_super The del_timer_sync function cancels the s_err_report timer, which reminds about filesystem errors daily. We should guarantee the timer is no longer active before kfree(sbi). When filesystem mounting fails, the flow goes to failed_mount3, where an error occurs when ext4_stop_mmpd is called, causing a read I/O failure. This triggers the ext4_handle_error function that ultimately re-arms the timer, leaving the s_err_report timer active before kfree(sbi) is called. Fix the issue by canceling the s_err_report timer after calling ext4_stop_mmpd. Signed-off-by: Xiaxi Shen <shenxiaxi26(a)gmail.com> Reported-and-tested-by: syzbot+59e0101c430934bc9a36(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=59e0101c430934bc9a36 Link: https://patch.msgid.link/20240715043336.98097-1-shenxiaxi26@gmail.com Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> Cc: stable(a)kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com> --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 3bf214d4afef..987d49e18dbe 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5617,8 +5617,8 @@ failed_mount9: __maybe_unused failed_mount3: /* flush s_error_work before sbi destroy */ flush_work(&sbi->s_error_work); - del_timer_sync(&sbi->s_err_report); ext4_stop_mmpd(sbi); + del_timer_sync(&sbi->s_err_report); ext4_group_desc_free(sbi); failed_mount: if (sbi->s_chksum_driver) -- 2.43.0

10 months, 2 weeks

1
0
0 0

[PATCH v3] usb: xhci: quirk for data loss in ISOC transfers

by Raju Rangoju

During the High-Speed Isochronous Audio transfers, xHCI controller on certain AMD platforms experiences momentary data loss. This results in Missed Service Errors (MSE) being generated by the xHCI. The root cause of the MSE is attributed to the ISOC OUT endpoint being omitted from scheduling. This can happen either when an IN endpoint with a 64ms service interval is pre-scheduled prior to the ISOC OUT endpoint or when the interval of the ISOC OUT endpoint is shorter than that of the IN endpoint. Consequently, the OUT service is neglected when an IN endpoint with a service interval exceeding 32ms is scheduled concurrently (every 64ms in this scenario). This issue is particularly seen on certain older AMD platforms. To mitigate this problem, it is recommended to adjust the service interval of the IN endpoint to not exceed 32ms (interval 8). This adjustment ensures that the OUT endpoint will not be bypassed, even if a smaller interval value is utilized. Cc: stable(a)vger.kernel.org Signed-off-by: Raju Rangoju <Raju.Rangoju(a)amd.com> --- Changes since v2: - added stable tag to backport to all stable kernels Changes since v1: - replaced hex values with pci device names - corrected the commit message drivers/usb/host/xhci-mem.c | 5 +++++ drivers/usb/host/xhci-pci.c | 25 +++++++++++++++++++++++++ drivers/usb/host/xhci.h | 1 + 3 files changed, 31 insertions(+) diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index d2900197a49e..4892bb9afa6e 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -1426,6 +1426,11 @@ int xhci_endpoint_init(struct xhci_hcd *xhci, /* Periodic endpoint bInterval limit quirk */ if (usb_endpoint_xfer_int(&ep->desc) || usb_endpoint_xfer_isoc(&ep->desc)) { + if ((xhci->quirks & XHCI_LIMIT_ENDPOINT_INTERVAL_9) && + usb_endpoint_xfer_int(&ep->desc) && + interval >= 9) { + interval = 8; + } if ((xhci->quirks & XHCI_LIMIT_ENDPOINT_INTERVAL_7) && udev->speed >= USB_SPEED_HIGH && interval >= 7) { diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index cb07cee9ed0c..82657ca30030 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -69,12 +69,22 @@ #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_XHCI 0x15ec #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_DD_XHCI 0x15f0 +#define PCI_DEVICE_ID_AMD_ARIEL_TYPEC_XHCI 0x13ed +#define PCI_DEVICE_ID_AMD_ARIEL_TYPEA_XHCI 0x13ee +#define PCI_DEVICE_ID_AMD_STARSHIP_XHCI 0x148c +#define PCI_DEVICE_ID_AMD_FIREFLIGHT_15D4_XHCI 0x15d4 +#define PCI_DEVICE_ID_AMD_FIREFLIGHT_15D5_XHCI 0x15d5 +#define PCI_DEVICE_ID_AMD_RAVEN_15E0_XHCI 0x15e0 +#define PCI_DEVICE_ID_AMD_RAVEN_15E1_XHCI 0x15e1 +#define PCI_DEVICE_ID_AMD_RAVEN2_XHCI 0x15e5 #define PCI_DEVICE_ID_AMD_RENOIR_XHCI 0x1639 #define PCI_DEVICE_ID_AMD_PROMONTORYA_4 0x43b9 #define PCI_DEVICE_ID_AMD_PROMONTORYA_3 0x43ba #define PCI_DEVICE_ID_AMD_PROMONTORYA_2 0x43bb #define PCI_DEVICE_ID_AMD_PROMONTORYA_1 0x43bc +#define PCI_DEVICE_ID_ATI_NAVI10_7316_XHCI 0x7316 + #define PCI_DEVICE_ID_ASMEDIA_1042_XHCI 0x1042 #define PCI_DEVICE_ID_ASMEDIA_1042A_XHCI 0x1142 #define PCI_DEVICE_ID_ASMEDIA_1142_XHCI 0x1242 @@ -284,6 +294,21 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) if (pdev->vendor == PCI_VENDOR_ID_NEC) xhci->quirks |= XHCI_NEC_HOST; + if (pdev->vendor == PCI_VENDOR_ID_AMD && + (pdev->device == PCI_DEVICE_ID_AMD_ARIEL_TYPEC_XHCI || + pdev->device == PCI_DEVICE_ID_AMD_ARIEL_TYPEA_XHCI || + pdev->device == PCI_DEVICE_ID_AMD_STARSHIP_XHCI || + pdev->device == PCI_DEVICE_ID_AMD_FIREFLIGHT_15D4_XHCI || + pdev->device == PCI_DEVICE_ID_AMD_FIREFLIGHT_15D5_XHCI || + pdev->device == PCI_DEVICE_ID_AMD_RAVEN_15E0_XHCI || + pdev->device == PCI_DEVICE_ID_AMD_RAVEN_15E1_XHCI || + pdev->device == PCI_DEVICE_ID_AMD_RAVEN2_XHCI)) + xhci->quirks |= XHCI_LIMIT_ENDPOINT_INTERVAL_9; + + if (pdev->vendor == PCI_VENDOR_ID_ATI && + pdev->device == PCI_DEVICE_ID_ATI_NAVI10_7316_XHCI) + xhci->quirks |= XHCI_LIMIT_ENDPOINT_INTERVAL_9; + if (pdev->vendor == PCI_VENDOR_ID_AMD && xhci->hci_version == 0x96) xhci->quirks |= XHCI_AMD_0x96_HOST; diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index f0fb696d5619..fa69f7ac09b5 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1624,6 +1624,7 @@ struct xhci_hcd { #define XHCI_ZHAOXIN_HOST BIT_ULL(46) #define XHCI_WRITE_64_HI_LO BIT_ULL(47) #define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48) +#define XHCI_LIMIT_ENDPOINT_INTERVAL_9 BIT_ULL(49) unsigned int num_active_eps; unsigned int limit_active_eps; -- 2.34.1

10 months, 2 weeks

1
0
0 0

Re: No sound on speakers X1 Carbon Gen 12

by Takashi Iwai

On Sun, 20 Oct 2024 17:12:14 +0200, Dean Matthew Menezes wrote: > > The first change worked to fix the sound from the speaker. Then please double-check whether my original fix in https://lore.kernel.org/87cyjzrutw.wl-tiwai@suse.de really doesn't bring back the speaker output. If it's confirmed to be broken, run as root: echo 1 > /sys/module/snd_hda_codec/parameters/dump_coef and get alsa-info.sh outputs from both working and patched-but-not-working cases again, but at this time, during the playback. (Also, please keep Cc.) thanks, Takashi

10 months, 2 weeks

3
20
0 0

[PATCH 6.1] blk-iocost: do not WARN if iocg was already offlined

by Xiangyu Chen

From: Li Nan <linan122(a)huawei.com> [ Upstream commit 01bc4fda9ea0a6b52f12326486f07a4910666cf6 ] In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which is intended to confirm iocg is active when it has debt. However, warn can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn() is run at that time: WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190 Call trace: iocg_pay_debt+0x14c/0x190 iocg_kick_waitq+0x438/0x4c0 iocg_waitq_timer_fn+0xd8/0x130 __run_hrtimer+0x144/0x45c __hrtimer_run_queues+0x16c/0x244 hrtimer_interrupt+0x2cc/0x7b0 The warn in this situation is meaningless. Since this iocg is being removed, the state of the 'active_list' is irrelevant, and 'waitq_timer' is canceled after removing 'active_list' in ioc_pd_free(), which ensures iocg is freed after iocg_waitq_timer_fn() returns. Therefore, add the check if iocg was already offlined to avoid warn when removing a blkcg or disk. Signed-off-by: Li Nan <linan122(a)huawei.com> Reviewed-by: Yu Kuai <yukuai3(a)huawei.com> Acked-by: Tejun Heo <tj(a)kernel.org> Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com> --- block/blk-iocost.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 772e909e9fbf..12affc18d030 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1423,8 +1423,11 @@ static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay, lockdep_assert_held(&iocg->ioc->lock); lockdep_assert_held(&iocg->waitq.lock); - /* make sure that nobody messed with @iocg */ - WARN_ON_ONCE(list_empty(&iocg->active_list)); + /* + * make sure that nobody messed with @iocg. Check iocg->pd.online + * to avoid warn when removing blkcg or disk. + */ + WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->pd.online); WARN_ON_ONCE(iocg->inuse > 1); iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt); -- 2.43.0

10 months, 2 weeks

2
1
0 0

[PATCH 6.1] Bluetooth: L2CAP: Fix uaf in l2cap_connect

by Xiangyu Chen

From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> [ Upstream commit 333b4fd11e89b29c84c269123f871883a30be586 ] [Syzbot reported] BUG: KASAN: slab-use-after-free in l2cap_connect.constprop.0+0x10d8/0x1270 net/bluetooth/l2cap_core.c:3949 Read of size 8 at addr ffff8880241e9800 by task kworker/u9:0/54 CPU: 0 UID: 0 PID: 54 Comm: kworker/u9:0 Not tainted 6.11.0-rc6-syzkaller-00268-g788220eee30d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Workqueue: hci2 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0xc3/0x620 mm/kasan/report.c:488 kasan_report+0xd9/0x110 mm/kasan/report.c:601 l2cap_connect.constprop.0+0x10d8/0x1270 net/bluetooth/l2cap_core.c:3949 l2cap_connect_req net/bluetooth/l2cap_core.c:4080 [inline] l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:4772 [inline] l2cap_sig_channel net/bluetooth/l2cap_core.c:5543 [inline] l2cap_recv_frame+0xf0b/0x8eb0 net/bluetooth/l2cap_core.c:6825 l2cap_recv_acldata+0x9b4/0xb70 net/bluetooth/l2cap_core.c:7514 hci_acldata_packet net/bluetooth/hci_core.c:3791 [inline] hci_rx_work+0xaab/0x1610 net/bluetooth/hci_core.c:4028 process_one_work+0x9c5/0x1b40 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xed0 kernel/workqueue.c:3389 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 ... Freed by task 5245: kasan_save_stack+0x33/0x60 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579 poison_slab_object+0xf7/0x160 mm/kasan/common.c:240 __kasan_slab_free+0x32/0x50 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2256 [inline] slab_free mm/slub.c:4477 [inline] kfree+0x12a/0x3b0 mm/slub.c:4598 l2cap_conn_free net/bluetooth/l2cap_core.c:1810 [inline] kref_put include/linux/kref.h:65 [inline] l2cap_conn_put net/bluetooth/l2cap_core.c:1822 [inline] l2cap_conn_del+0x59d/0x730 net/bluetooth/l2cap_core.c:1802 l2cap_connect_cfm+0x9e6/0xf80 net/bluetooth/l2cap_core.c:7241 hci_connect_cfm include/net/bluetooth/hci_core.h:1960 [inline] hci_conn_failed+0x1c3/0x370 net/bluetooth/hci_conn.c:1265 hci_abort_conn_sync+0x75a/0xb50 net/bluetooth/hci_sync.c:5583 abort_conn_sync+0x197/0x360 net/bluetooth/hci_conn.c:2917 hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:328 process_one_work+0x9c5/0x1b40 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xed0 kernel/workqueue.c:3389 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Reported-by: syzbot+c12e2f941af1feb5632c(a)syzkaller.appspotmail.com Tested-by: syzbot+c12e2f941af1feb5632c(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c12e2f941af1feb5632c Fixes: 7b064edae38d ("Bluetooth: Fix authentication if acl data comes before remote feature evt") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> [Xiangyu: Modified to bp this commit to fix CVE-2024-49950] Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com> --- net/bluetooth/hci_core.c | 2 ++ net/bluetooth/hci_event.c | 2 +- net/bluetooth/l2cap_core.c | 9 --------- 3 files changed, 3 insertions(+), 10 deletions(-) diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 993b98257bc2..3039dc5fbe75 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3859,6 +3859,8 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb) hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, handle); + if (conn && hci_dev_test_flag(hdev, HCI_MGMT)) + mgmt_device_connected(hdev, conn, NULL, 0); hci_dev_unlock(hdev); if (conn) { diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 0c0141c59fd1..e5ca2d188c1a 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -3789,7 +3789,7 @@ static void hci_remote_features_evt(struct hci_dev *hdev, void *data, goto unlock; } - if (!ev->status && !test_bit(HCI_CONN_MGMT_CONNECTED, &conn->flags)) { + if (!ev->status) { struct hci_cp_remote_name_req cp; memset(&cp, 0, sizeof(cp)); bacpy(&cp.bdaddr, &conn->dst); diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 209c6d458d33..187c91843876 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4300,18 +4300,9 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, static int l2cap_connect_req(struct l2cap_conn *conn, struct l2cap_cmd_hdr *cmd, u16 cmd_len, u8 *data) { - struct hci_dev *hdev = conn->hcon->hdev; - struct hci_conn *hcon = conn->hcon; - if (cmd_len < sizeof(struct l2cap_conn_req)) return -EPROTO; - hci_dev_lock(hdev); - if (hci_dev_test_flag(hdev, HCI_MGMT) && - !test_and_set_bit(HCI_CONN_MGMT_CONNECTED, &hcon->flags)) - mgmt_device_connected(hdev, hcon, NULL, 0); - hci_dev_unlock(hdev); - l2cap_connect(conn, cmd, data, L2CAP_CONN_RSP, 0); return 0; } -- 2.43.0

10 months, 2 weeks

1
0
0 0

[net v2 1/2] netdev-genl: Hold rcu_read_lock in napi_get

by Joe Damato

Hold rcu_read_lock in netdev_nl_napi_get_doit, which calls napi_by_id and is required to be called under rcu_read_lock. Cc: stable(a)vger.kernel.org Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi") Signed-off-by: Joe Damato <jdamato(a)fastly.com> --- v2: - Simplified by removing the helper and calling rcu_read_lock / unlock directly instead. net/core/netdev-genl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 765ce7c9d73b..0b684410b52d 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -233,6 +233,7 @@ int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info) return -ENOMEM; rtnl_lock(); + rcu_read_lock(); napi = napi_by_id(napi_id); if (napi) { @@ -242,6 +243,7 @@ int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info) err = -ENOENT; } + rcu_read_unlock(); rtnl_unlock(); if (err) -- 2.25.1

10 months, 2 weeks

1
0
0 0

[RFC net 1/2] netdev-genl: Hold rcu_read_lock in napi_get

by Joe Damato

Hold rcu_read_lock in netdev_nl_napi_get_doit, which calls napi_by_id and is required to be called under rcu_read_lock. Cc: stable(a)vger.kernel.org Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi") Signed-off-by: Joe Damato <jdamato(a)fastly.com> --- net/core/netdev-genl.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 765ce7c9d73b..934c63a93524 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -216,6 +216,23 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi, return -EMSGSIZE; } +/* must be called under rcu_read_lock(), because napi_by_id requires it */ +static struct napi_struct *__do_napi_by_id(unsigned int napi_id, + struct genl_info *info, int *err) +{ + struct napi_struct *napi; + + napi = napi_by_id(napi_id); + if (napi) { + *err = 0; + } else { + NL_SET_BAD_ATTR(info->extack, info->attrs[NETDEV_A_NAPI_ID]); + *err = -ENOENT; + } + + return napi; +} + int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info) { struct napi_struct *napi; @@ -233,15 +250,13 @@ int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info) return -ENOMEM; rtnl_lock(); + rcu_read_lock(); - napi = napi_by_id(napi_id); - if (napi) { + napi = __do_napi_by_id(napi_id, info, &err); + if (!err) err = netdev_nl_napi_fill_one(rsp, napi, info); - } else { - NL_SET_BAD_ATTR(info->extack, info->attrs[NETDEV_A_NAPI_ID]); - err = -ENOENT; - } + rcu_read_unlock(); rtnl_unlock(); if (err) -- 2.25.1

10 months, 2 weeks

2
4
0 0

+ crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 has been added to the -mm mm-hotfixes-unstable branch. Its filename is crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Dave Vasilevsky <dave(a)vasilevsky.ca> Subject: crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 Date: Tue, 17 Sep 2024 12:37:20 -0400 Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware. On these machines, the kernel refuses to boot from non-zero PHYSICAL_START, which occurs when CRASH_DUMP is on. Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should default to off for them. Users booting via some other mechanism can still turn it on explicitly. Does not change the default on any other architectures for the time being. Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca Fixes: 75bc255a7444 ("crash: clean up kdump related config items") Signed-off-by: Dave Vasilevsky <dave(a)vasilevsky.ca> Reported-by: Reimar D��ffinger <Reimar.Doeffinger(a)gmx.de> Closes: https://lists.debian.org/debian-powerpc/2024/07/msg00001.html Acked-by: Michael Ellerman <mpe(a)ellerman.id.au> [powerpc] Acked-by: Baoquan He <bhe(a)redhat.com> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: John Paul Adrian Glaubitz <glaubitz(a)physik.fu-berlin.de> Cc: Reimar D��ffinger <Reimar.Doeffinger(a)gmx.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/arm/Kconfig | 3 +++ arch/arm64/Kconfig | 3 +++ arch/loongarch/Kconfig | 3 +++ arch/mips/Kconfig | 3 +++ arch/powerpc/Kconfig | 4 ++++ arch/riscv/Kconfig | 3 +++ arch/s390/Kconfig | 3 +++ arch/sh/Kconfig | 3 +++ arch/x86/Kconfig | 3 +++ kernel/Kconfig.kexec | 2 +- 10 files changed, 29 insertions(+), 1 deletion(-) --- a/arch/arm64/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/arm64/Kconfig @@ -1576,6 +1576,9 @@ config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_S config ARCH_SUPPORTS_CRASH_DUMP def_bool y +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION def_bool CRASH_RESERVE --- a/arch/arm/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/arm/Kconfig @@ -1598,6 +1598,9 @@ config ATAGS_PROC config ARCH_SUPPORTS_CRASH_DUMP def_bool y +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + config AUTO_ZRELADDR bool "Auto calculation of the decompressed kernel image address" if !ARCH_MULTIPLATFORM default !(ARCH_FOOTBRIDGE || ARCH_RPC || ARCH_SA1100) --- a/arch/loongarch/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/loongarch/Kconfig @@ -604,6 +604,9 @@ config ARCH_SUPPORTS_KEXEC config ARCH_SUPPORTS_CRASH_DUMP def_bool y +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + config ARCH_SELECTS_CRASH_DUMP def_bool y depends on CRASH_DUMP --- a/arch/mips/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/mips/Kconfig @@ -2876,6 +2876,9 @@ config ARCH_SUPPORTS_KEXEC config ARCH_SUPPORTS_CRASH_DUMP def_bool y +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + config PHYSICAL_START hex "Physical address where the kernel is loaded" default "0xffffffff84000000" --- a/arch/powerpc/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/powerpc/Kconfig @@ -684,6 +684,10 @@ config RELOCATABLE_TEST config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) +config ARCH_DEFAULT_CRASH_DUMP + bool + default y if !PPC_BOOK3S_32 + config ARCH_SELECTS_CRASH_DUMP def_bool y depends on CRASH_DUMP --- a/arch/riscv/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/riscv/Kconfig @@ -898,6 +898,9 @@ config ARCH_SUPPORTS_KEXEC_PURGATORY config ARCH_SUPPORTS_CRASH_DUMP def_bool y +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION def_bool CRASH_RESERVE --- a/arch/s390/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/s390/Kconfig @@ -276,6 +276,9 @@ config ARCH_SUPPORTS_CRASH_DUMP This option also enables s390 zfcpdump. See also <file:Documentation/arch/s390/zfcpdump.rst> +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + menu "Processor type and features" config HAVE_MARCH_Z10_FEATURES --- a/arch/sh/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/sh/Kconfig @@ -550,6 +550,9 @@ config ARCH_SUPPORTS_KEXEC config ARCH_SUPPORTS_CRASH_DUMP def_bool BROKEN_ON_SMP +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + config ARCH_SUPPORTS_KEXEC_JUMP def_bool y --- a/arch/x86/Kconfig~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/arch/x86/Kconfig @@ -2084,6 +2084,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP config ARCH_SUPPORTS_CRASH_DUMP def_bool X86_64 || (X86_32 && HIGHMEM) +config ARCH_DEFAULT_CRASH_DUMP + def_bool y + config ARCH_SUPPORTS_CRASH_HOTPLUG def_bool y --- a/kernel/Kconfig.kexec~crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32 +++ a/kernel/Kconfig.kexec @@ -97,7 +97,7 @@ config KEXEC_JUMP config CRASH_DUMP bool "kernel crash dumps" - default y + default ARCH_DEFAULT_CRASH_DUMP depends on ARCH_SUPPORTS_CRASH_DUMP depends on KEXEC_CORE select VMCORE_INFO _ Patches currently in -mm which might be from dave(a)vasilevsky.ca are crash-powerpc-default-to-crash_dump=n-on-ppc_book3s_32.patch

10 months, 2 weeks

1
0
0 0

[PATCH] MAINTAINERS: update Alexey Makhalov's email address

by Alexey Makhalov

Fix a typo in an email address. Reported-by: Konstantin Ryabitsev <konstantin(a)linuxfoundation.org> Closes: https://lore.kernel.org/all/20240925-rational-succinct-vulture-cca9fb@lemur… Signed-off-by: Alexey Makhalov <alexey.makhalov(a)broadcom.com> --- MAINTAINERS | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 21fdaa19229a..bfc902d7925a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17503,7 +17503,7 @@ F: include/uapi/linux/ppdev.h PARAVIRT_OPS INTERFACE M: Juergen Gross <jgross(a)suse.com> R: Ajay Kaher <ajay.kaher(a)broadcom.com> -R: Alexey Makhalov <alexey.amakhalov(a)broadcom.com> +R: Alexey Makhalov <alexey.makhalov(a)broadcom.com> R: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com> L: virtualization(a)lists.linux.dev L: x86(a)kernel.org @@ -24691,7 +24691,7 @@ F: drivers/misc/vmw_balloon.c VMWARE HYPERVISOR INTERFACE M: Ajay Kaher <ajay.kaher(a)broadcom.com> -M: Alexey Makhalov <alexey.amakhalov(a)broadcom.com> +M: Alexey Makhalov <alexey.makhalov(a)broadcom.com> R: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com> L: virtualization(a)lists.linux.dev L: x86(a)kernel.org @@ -24719,7 +24719,7 @@ F: drivers/scsi/vmw_pvscsi.h VMWARE VIRTUAL PTP CLOCK DRIVER M: Nick Shi <nick.shi(a)broadcom.com> R: Ajay Kaher <ajay.kaher(a)broadcom.com> -R: Alexey Makhalov <alexey.amakhalov(a)broadcom.com> +R: Alexey Makhalov <alexey.makhalov(a)broadcom.com> R: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com> L: netdev(a)vger.kernel.org S: Supported -- 2.39.4

10 months, 2 weeks

2
1
0 0

[PATCH v6 1/8] KVM: SVM: Fix gctx page leak on invalid inputs

by Dionna Glaze

Ensure that snp gctx page allocation is adequately deallocated on failure during snp_launch_start. Fixes: 136d8bc931c8 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command") CC: Sean Christopherson <seanjc(a)google.com> CC: Paolo Bonzini <pbonzini(a)redhat.com> CC: Thomas Gleixner <tglx(a)linutronix.de> CC: Ingo Molnar <mingo(a)redhat.com> CC: Borislav Petkov <bp(a)alien8.de> CC: Dave Hansen <dave.hansen(a)linux.intel.com> CC: Ashish Kalra <ashish.kalra(a)amd.com> CC: Tom Lendacky <thomas.lendacky(a)amd.com> CC: John Allen <john.allen(a)amd.com> CC: Herbert Xu <herbert(a)gondor.apana.org.au> CC: "David S. Miller" <davem(a)davemloft.net> CC: Michael Roth <michael.roth(a)amd.com> CC: Luis Chamberlain <mcgrof(a)kernel.org> CC: Russ Weight <russ.weight(a)linux.dev> CC: Danilo Krummrich <dakr(a)redhat.com> CC: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> CC: "Rafael J. Wysocki" <rafael(a)kernel.org> CC: Tianfei zhang <tianfei.zhang(a)intel.com> CC: Alexey Kardashevskiy <aik(a)amd.com> CC: stable(a)vger.kernel.org Signed-off-by: Dionna Glaze <dionnaglaze(a)google.com> Acked-by: Sean Christopherson <seanjc(a)google.com> --- arch/x86/kvm/svm/sev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index c6c8524859001..357906375ec59 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2212,10 +2212,6 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp) if (sev->snp_context) return -EINVAL; - sev->snp_context = snp_context_create(kvm, argp); - if (!sev->snp_context) - return -ENOTTY; - if (params.flags) return -EINVAL; @@ -2230,6 +2226,10 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp) if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) return -EINVAL; + sev->snp_context = snp_context_create(kvm, argp); + if (!sev->snp_context) + return -ENOTTY; + start.gctx_paddr = __psp_pa(sev->snp_context); start.policy = params.policy; memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw)); -- 2.47.0.277.g8800431eea-goog

10 months, 2 weeks

1
0
0 0

[PATCH] arm64: dts: rockchip: Fix vdd_gpu voltage constraints on PinePhone Pro

by Dragan Simic

The regulator-{min,max}-microvolt values for the vdd_gpu regulator in the PinePhone Pro device dts file are too restrictive, which prevents the highest GPU OPP from being used, slowing the GPU down unnecessarily. Let's fix that by making the regulator-{min,max}-microvolt values less strict, using the voltage range that the Silergy SYR838 chip used for the vdd_gpu regulator is actually capable of producing. [1][2] This also eliminates the following error messages from the kernel log: core: _opp_supported_by_regulators: OPP minuV: 1100000 maxuV: 1150000, not supported by regulator panfrost ff9a0000.gpu: _opp_add: OPP not supported by regulators (800000000) These changes to the regulator-{min,max}-microvolt values make the PinePhone Pro device dts consistent with the dts files for other Rockchip RK3399-based boards and devices. It's possible to be more strict here, by specifying the regulator-{min,max}-microvolt values that don't go outside of what the GPU actually may use, as the consumer of the vdd_gpu regulator, but those changes are left for a later directory-wide regulator cleanup. [1] https://files.pine64.org/doc/PinePhonePro/PinephonePro-Schematic-V1.0-20211… [2] https://www.t-firefly.com/download/Firefly-RK3399/docs/Chip%20Specification… Fixes: 78a21c7d5952 ("arm64: dts: rockchip: Add initial support for Pine64 PinePhone Pro") Cc: stable(a)vger.kernel.org Signed-off-by: Dragan Simic <dsimic(a)manjaro.org> --- arch/arm64/boot/dts/rockchip/rk3399-pinephone-pro.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinephone-pro.dts b/arch/arm64/boot/dts/rockchip/rk3399-pinephone-pro.dts index 1a44582a49fb..956d64f5b271 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-pinephone-pro.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-pinephone-pro.dts @@ -410,8 +410,8 @@ vdd_gpu: regulator@41 { pinctrl-names = "default"; pinctrl-0 = <&vsel2_pin>; regulator-name = "vdd_gpu"; - regulator-min-microvolt = <875000>; - regulator-max-microvolt = <975000>; + regulator-min-microvolt = <712500>; + regulator-max-microvolt = <1500000>; regulator-ramp-delay = <1000>; regulator-always-on; regulator-boot-on;

10 months, 2 weeks

4
9
0 0

[PATCH 5.10/5.15/6.1 0/5] x86/mm: backport fixes for CVE-2023-0597 and CVE-2023-3640

by Vasiliy Kovalev

This series addresses two security vulnerabilities (CVE-2023-0597 [1], CVE-2023-3640 [2]) in the x86 memory management subsystem, alongside prerequisite [3] patches necessary for stable integration. [PATCH 5.10/5.15/6.1 1/5] x86/kasan: Map shadow for percpu pages on demand Ensures KASAN shadow mapping on demand for per-CPU pages. [PATCH 5.10/5.15/6.1 2/5] x86/mm: Recompute physical address for every page of per-CPU CEA mapping Calculates accurate physical addresses across CPU entry areas. [PATCH 5.10/5.15/6.1 3/5] x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area Populates KASAN shadow memory for debugging across CPU entry areas. [PATCH 5.10/5.15/6.1 4/5] x86/mm: Randomize per-cpu entry area Randomizes the per-CPU entry area to reduce the risk of information leakage due to predictable memory layouts, especially in systems without KASLR, as described in CVE-2023-0597 [1]. [PATCH 5.10/5.15/6.1 5/5] x86/mm: Do not shuffle CPU entry areas without KASLR Prevents CPU entry area shuffling when KASLR is disabled, mitigating information leakage risks, as stated in CVE-2023-3640 [2]. [1] https://nvd.nist.gov/vuln/detail/CVE-2023-0597 [2] https://nvd.nist.gov/vuln/detail/CVE-2023-3640 [3] https://patchwork.ozlabs.org/project/ubuntu-kernel/cover/20230903234603.859…

10 months, 2 weeks

1
5
0 0

[PATCH v3] usb: dwc3: gadget: Add missing check for single port RAM in TxFIFO resizing logic

by Selvarasu Ganesan

The existing implementation of the TxFIFO resizing logic only supports scenarios where more than one port RAM is used. However, there is a need to resize the TxFIFO in USB2.0-only mode where only a single port RAM is available. This commit introduces the necessary changes to support TxFIFO resizing in such scenarios by adding a missing check for single port RAM. This fix addresses certain platform configurations where the existing TxFIFO resizing logic does not work properly due to the absence of support for single port RAM. By adding this missing check, we ensure that the TxFIFO resizing logic works correctly in all scenarios, including those with a single port RAM. Fixes: 9f607a309fbe ("usb: dwc3: Resize TX FIFOs to meet EP bursting requirements") Cc: stable(a)vger.kernel.org # 6.12.x: fad16c82: usb: dwc3: gadget: Refine the logic for resizing Tx FIFOs Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com> --- Changes in v3: - Updated the $subject and commit message. - Added Fixes tag, and addressed some minor comments from reviewer . - Link to v2: https://lore.kernel.org/linux-usb/20241111142049.604-1-selvarasu.g@samsung.… Changes in v2: - Removed the code change that limits the number of FIFOs for bulk EP, as plan to address this issue in a separate patch. - Renamed the variable spram_type to is_single_port_ram for better understanding. - Link to v1: https://lore.kernel.org/lkml/20241107104040.502-1-selvarasu.g@samsung.com/ --- drivers/usb/dwc3/core.h | 4 +++ drivers/usb/dwc3/gadget.c | 54 +++++++++++++++++++++++++++++++++------ 2 files changed, 50 insertions(+), 8 deletions(-) diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index eaa55c0cf62f..8306b39e5c64 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -915,6 +915,7 @@ struct dwc3_hwparams { #define DWC3_MODE(n) ((n) & 0x7) /* HWPARAMS1 */ +#define DWC3_SPRAM_TYPE(n) (((n) >> 23) & 1) #define DWC3_NUM_INT(n) (((n) & (0x3f << 15)) >> 15) /* HWPARAMS3 */ @@ -925,6 +926,9 @@ struct dwc3_hwparams { #define DWC3_NUM_IN_EPS(p) (((p)->hwparams3 & \ (DWC3_NUM_IN_EPS_MASK)) >> 18) +/* HWPARAMS6 */ +#define DWC3_RAM0_DEPTH(n) (((n) & (0xffff0000)) >> 16) + /* HWPARAMS7 */ #define DWC3_RAM1_DEPTH(n) ((n) & 0xffff) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 2fed2aa01407..6101e5467b08 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -687,6 +687,44 @@ static int dwc3_gadget_calc_tx_fifo_size(struct dwc3 *dwc, int mult) return fifo_size; } +/** + * dwc3_gadget_calc_ram_depth - calculates the ram depth for txfifo + * @dwc: pointer to the DWC3 context + */ +static int dwc3_gadget_calc_ram_depth(struct dwc3 *dwc) +{ + int ram_depth; + int fifo_0_start; + bool is_single_port_ram; + + /* Check supporting RAM type by HW */ + is_single_port_ram = DWC3_SPRAM_TYPE(dwc->hwparams.hwparams1); + + /* + * If a single port RAM is utilized, then allocate TxFIFOs from + * RAM0. otherwise, allocate them from RAM1. + */ + ram_depth = is_single_port_ram ? DWC3_RAM0_DEPTH(dwc->hwparams.hwparams6) : + DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); + + /* + * In a single port RAM configuration, the available RAM is shared + * between the RX and TX FIFOs. This means that the txfifo can begin + * at a non-zero address. + */ + if (is_single_port_ram) { + u32 reg; + + /* Check if TXFIFOs start at non-zero addr */ + reg = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0)); + fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(reg); + + ram_depth -= (fifo_0_start >> 16); + } + + return ram_depth; +} + /** * dwc3_gadget_clear_tx_fifos - Clears txfifo allocation * @dwc: pointer to the DWC3 context @@ -753,7 +791,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) { struct dwc3 *dwc = dep->dwc; int fifo_0_start; - int ram1_depth; + int ram_depth; int fifo_size; int min_depth; int num_in_ep; @@ -773,7 +811,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) if (dep->flags & DWC3_EP_TXFIFO_RESIZED) return 0; - ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); + ram_depth = dwc3_gadget_calc_ram_depth(dwc); switch (dwc->gadget->speed) { case USB_SPEED_SUPER_PLUS: @@ -809,7 +847,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) /* Reserve at least one FIFO for the number of IN EPs */ min_depth = num_in_ep * (fifo + 1); - remaining = ram1_depth - min_depth - dwc->last_fifo_depth; + remaining = ram_depth - min_depth - dwc->last_fifo_depth; remaining = max_t(int, 0, remaining); /* * We've already reserved 1 FIFO per EP, so check what we can fit in @@ -835,9 +873,9 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size); /* Check fifo size allocation doesn't exceed available RAM size. */ - if (dwc->last_fifo_depth >= ram1_depth) { + if (dwc->last_fifo_depth >= ram_depth) { dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n", - dwc->last_fifo_depth, ram1_depth, + dwc->last_fifo_depth, ram_depth, dep->endpoint.name, fifo_size); if (DWC3_IP_IS(DWC3)) fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size); @@ -3090,7 +3128,7 @@ static int dwc3_gadget_check_config(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); struct usb_ep *ep; int fifo_size = 0; - int ram1_depth; + int ram_depth; int ep_num = 0; if (!dwc->do_fifo_resize) @@ -3113,8 +3151,8 @@ static int dwc3_gadget_check_config(struct usb_gadget *g) fifo_size += dwc->max_cfg_eps; /* Check if we can fit a single fifo per endpoint */ - ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); - if (fifo_size > ram1_depth) + ram_depth = dwc3_gadget_calc_ram_depth(dwc); + if (fifo_size > ram_depth) return -ENOMEM; return 0; -- 2.17.1

10 months, 2 weeks

2
1
0 0

backport "udf: Allocate name buffer in directory iterator on heap" to 5.15

by Hauke Mehrtens

Hi, I am running into this compile error in 5.15.171 in OpenWrt on 32 bit systems. This problem was introduced with kernel 5.15.169. ``` fs/udf/namei.c: In function 'udf_rename': fs/udf/namei.c:878:1: error: the frame size of 1144 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 878 | } | ^ cc1: all warnings being treated as errors make[2]: *** [scripts/Makefile.build:289: fs/udf/namei.o] Error 1 make[1]: *** [scripts/Makefile.build:552: fs/udf] Error 2 ``` This is fixed by this upstream commit: commit 0aba4860b0d0216a1a300484ff536171894d49d8 Author: Jan Kara <jack(a)suse.cz> Date: Tue Dec 20 12:38:45 2022 +0100 udf: Allocate name buffer in directory iterator on heap Please backport this patch to 5.15 too. It was already backported to kernel 6.1. Hauke

10 months, 2 weeks

2
2
0 0

[PATCH 8/9] drm/amd/display: Remove PIPE_DTO_SRC_SEL programming from set_dtbclk_dto

by Hamza Mahfooz

From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> There are cases where an OTG is remapped from driving a regular HDMI display to a DP/eDP display. There are also cases where DTBCLK needs to be enabled for HPO, but DTBCLK DTO programming may be done while OTG is still enabled which is dangerous as the PIPE_DTO_SRC_SEL programming may change the pixel clock generator source for a mapped and running OTG and cause it to hang. Remove the PIPE_DTO_SRC_SEL programming from this sequence since it is already done in program_pixel_clk(). Additionally, make sure that program_pixel_clk sets DTBCLK DTO as source for special HDMI cases. Cc: stable(a)vger.kernel.org # 6.11+ Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c b/drivers/gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c index 838d72eaa87f..b363f5360818 100644 --- a/drivers/gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c +++ b/drivers/gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c @@ -1392,10 +1392,10 @@ static void dccg35_set_dtbclk_dto( /* The recommended programming sequence to enable DTBCLK DTO to generate * valid pixel HPO DPSTREAM ENCODER, specifies that DTO source select should - * be set only after DTO is enabled + * be set only after DTO is enabled. + * PIPEx_DTO_SRC_SEL should not be programmed during DTBCLK update since OTG may still be on, and the + * programming is handled in program_pix_clk() regardless, so it can be removed from here. */ - REG_UPDATE(OTG_PIXEL_RATE_CNTL[params->otg_inst], - PIPE_DTO_SRC_SEL[params->otg_inst], 2); } else { switch (params->otg_inst) { case 0: @@ -1412,9 +1412,12 @@ static void dccg35_set_dtbclk_dto( break; } - REG_UPDATE_2(OTG_PIXEL_RATE_CNTL[params->otg_inst], - DTBCLK_DTO_ENABLE[params->otg_inst], 0, - PIPE_DTO_SRC_SEL[params->otg_inst], params->is_hdmi ? 0 : 1); + /** + * PIPEx_DTO_SRC_SEL should not be programmed during DTBCLK update since OTG may still be on, and the + * programming is handled in program_pix_clk() regardless, so it can be removed from here. + */ + REG_UPDATE(OTG_PIXEL_RATE_CNTL[params->otg_inst], + DTBCLK_DTO_ENABLE[params->otg_inst], 0); REG_WRITE(DTBCLK_DTO_MODULO[params->otg_inst], 0); REG_WRITE(DTBCLK_DTO_PHASE[params->otg_inst], 0); -- 2.46.1

10 months, 2 weeks

1
0
0 0

[PATCH 6/9] drm/amd/display: Populate Power Profile In Case of Early Return

by Hamza Mahfooz

From: Austin Zheng <Austin.Zheng(a)amd.com> Early return possible if context has no clk_mgr. This will lead to an invalid power profile being returned which looks identical to a profile with the lowest power level. Add back logic that populated the power profile and overwrite the value if needed. Cc: stable(a)vger.kernel.org Fixes: fc8c959496fa ("drm/amd/display: Update Interface to Check UCLK DPM") Reviewed-by: Dillon Varone <dillon.varone(a)amd.com> Signed-off-by: Austin Zheng <Austin.Zheng(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 0c1875d35a95..1dd26d5df6b9 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -6100,11 +6100,11 @@ struct dc_power_profile dc_get_power_profile_for_dc_state(const struct dc_state { struct dc_power_profile profile = { 0 }; - if (!context || !context->clk_mgr || !context->clk_mgr->ctx || !context->clk_mgr->ctx->dc) + profile.power_level = !context->bw_ctx.bw.dcn.clk.p_state_change_support; + if (!context->clk_mgr || !context->clk_mgr->ctx || !context->clk_mgr->ctx->dc) return profile; struct dc *dc = context->clk_mgr->ctx->dc; - if (dc->res_pool->funcs->get_power_profile) profile.power_level = dc->res_pool->funcs->get_power_profile(context); return profile; -- 2.46.1

10 months, 2 weeks

1
0
0 0

[PATCH 4/9] drm/amd/display: Enable Request rate limiter during C-State on dcn401

by Hamza Mahfooz

From: Dillon Varone <dillon.varone(a)amd.com> [WHY] When C-State entry is requested, the rate limiter will be disabled which can result in high contention in the DCHUB return path. [HOW] Enable the rate limiter during C-state requests to prevent contention. Cc: stable(a)vger.kernel.org # 6.11+ Reviewed-by: Alvin Lee <alvin.lee2(a)amd.com> Signed-off-by: Dillon Varone <dillon.varone(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../src/dml2_core/dml2_core_dcn4_calcs.c | 6 +++++ .../display/dc/hubbub/dcn10/dcn10_hubbub.h | 8 ++++++- .../display/dc/hubbub/dcn20/dcn20_hubbub.h | 1 + .../display/dc/hubbub/dcn401/dcn401_hubbub.c | 24 +++++++++++++++++-- .../display/dc/hubbub/dcn401/dcn401_hubbub.h | 7 +++++- .../amd/display/dc/hwss/dcn401/dcn401_hwseq.c | 13 ++++++---- .../gpu/drm/amd/display/dc/inc/hw/dchubbub.h | 2 +- .../dc/resource/dcn401/dcn401_resource.h | 3 ++- 8 files changed, 53 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c index 92e43a1e4dd4..601320b1be81 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c @@ -11,6 +11,7 @@ #define DML2_MAX_FMT_420_BUFFER_WIDTH 4096 #define DML_MAX_NUM_OF_SLICES_PER_DSC 4 +#define ALLOW_SDPIF_RATE_LIMIT_PRE_CSTATE const char *dml2_core_internal_bw_type_str(enum dml2_core_internal_bw_type bw_type) { @@ -3886,6 +3887,10 @@ static void CalculateSwathAndDETConfiguration(struct dml2_core_internal_scratch #endif *p->hw_debug5 = false; +#ifdef ALLOW_SDPIF_RATE_LIMIT_PRE_CSTATE + if (p->NumberOfActiveSurfaces > 1) + *p->hw_debug5 = true; +#else for (unsigned int k = 0; k < p->NumberOfActiveSurfaces; ++k) { if (!(p->mrq_present) && (!(*p->UnboundedRequestEnabled)) && (TotalActiveDPP == 1) && p->display_cfg->plane_descriptors[k].surface.dcc.enable @@ -3901,6 +3906,7 @@ static void CalculateSwathAndDETConfiguration(struct dml2_core_internal_scratch dml2_printf("DML::%s: k=%u hw_debug5 = %u\n", __func__, k, *p->hw_debug5); #endif } +#endif } static enum dml2_odm_mode DecideODMMode(unsigned int HActive, diff --git a/drivers/gpu/drm/amd/display/dc/hubbub/dcn10/dcn10_hubbub.h b/drivers/gpu/drm/amd/display/dc/hubbub/dcn10/dcn10_hubbub.h index 4bd1dda07719..9fbd45c7dfef 100644 --- a/drivers/gpu/drm/amd/display/dc/hubbub/dcn10/dcn10_hubbub.h +++ b/drivers/gpu/drm/amd/display/dc/hubbub/dcn10/dcn10_hubbub.h @@ -200,6 +200,7 @@ struct dcn_hubbub_registers { uint32_t DCHUBBUB_ARB_FRAC_URG_BW_MALL_B; uint32_t DCHUBBUB_TIMEOUT_DETECTION_CTRL1; uint32_t DCHUBBUB_TIMEOUT_DETECTION_CTRL2; + uint32_t DCHUBBUB_CTRL_STATUS; }; #define HUBBUB_REG_FIELD_LIST_DCN32(type) \ @@ -320,7 +321,12 @@ struct dcn_hubbub_registers { type DCHUBBUB_TIMEOUT_REQ_STALL_THRESHOLD;\ type DCHUBBUB_TIMEOUT_PSTATE_STALL_THRESHOLD;\ type DCHUBBUB_TIMEOUT_DETECTION_EN;\ - type DCHUBBUB_TIMEOUT_TIMER_RESET + type DCHUBBUB_TIMEOUT_TIMER_RESET;\ + type ROB_UNDERFLOW_STATUS;\ + type ROB_OVERFLOW_STATUS;\ + type ROB_OVERFLOW_CLEAR;\ + type DCHUBBUB_HW_DEBUG;\ + type CSTATE_SWATH_CHK_GOOD_MODE #define HUBBUB_STUTTER_REG_FIELD_LIST(type) \ type DCHUBBUB_ARB_ALLOW_SR_ENTER_WATERMARK_A;\ diff --git a/drivers/gpu/drm/amd/display/dc/hubbub/dcn20/dcn20_hubbub.h b/drivers/gpu/drm/amd/display/dc/hubbub/dcn20/dcn20_hubbub.h index 036bb3e6c957..46d8f5c70750 100644 --- a/drivers/gpu/drm/amd/display/dc/hubbub/dcn20/dcn20_hubbub.h +++ b/drivers/gpu/drm/amd/display/dc/hubbub/dcn20/dcn20_hubbub.h @@ -96,6 +96,7 @@ struct dcn20_hubbub { unsigned int det1_size; unsigned int det2_size; unsigned int det3_size; + bool allow_sdpif_rate_limit_when_cstate_req; }; void hubbub2_construct(struct dcn20_hubbub *hubbub, diff --git a/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.c b/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.c index 5d658e9bef64..92fab471b183 100644 --- a/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.c +++ b/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.c @@ -1192,15 +1192,35 @@ static void dcn401_wait_for_det_update(struct hubbub *hubbub, int hubp_inst) } } -static void dcn401_program_timeout_thresholds(struct hubbub *hubbub, struct dml2_display_arb_regs *arb_regs) +static bool dcn401_program_arbiter(struct hubbub *hubbub, struct dml2_display_arb_regs *arb_regs, bool safe_to_lower) { struct dcn20_hubbub *hubbub2 = TO_DCN20_HUBBUB(hubbub); + bool wm_pending = false; + uint32_t temp; + /* request backpressure and outstanding return threshold (unused)*/ //REG_UPDATE(DCHUBBUB_TIMEOUT_DETECTION_CTRL1, DCHUBBUB_TIMEOUT_REQ_STALL_THRESHOLD, arb_regs->req_stall_threshold); /* P-State stall threshold */ REG_UPDATE(DCHUBBUB_TIMEOUT_DETECTION_CTRL2, DCHUBBUB_TIMEOUT_PSTATE_STALL_THRESHOLD, arb_regs->pstate_stall_threshold); + + if (safe_to_lower || arb_regs->allow_sdpif_rate_limit_when_cstate_req > hubbub2->allow_sdpif_rate_limit_when_cstate_req) { + hubbub2->allow_sdpif_rate_limit_when_cstate_req = arb_regs->allow_sdpif_rate_limit_when_cstate_req; + + /* only update the required bits */ + REG_GET(DCHUBBUB_CTRL_STATUS, DCHUBBUB_HW_DEBUG, &temp); + if (hubbub2->allow_sdpif_rate_limit_when_cstate_req) { + temp |= (1 << 5); + } else { + temp &= ~(1 << 5); + } + REG_UPDATE(DCHUBBUB_CTRL_STATUS, DCHUBBUB_HW_DEBUG, temp); + } else { + wm_pending = true; + } + + return wm_pending; } static const struct hubbub_funcs hubbub4_01_funcs = { @@ -1226,7 +1246,7 @@ static const struct hubbub_funcs hubbub4_01_funcs = { .program_det_segments = dcn401_program_det_segments, .program_compbuf_segments = dcn401_program_compbuf_segments, .wait_for_det_update = dcn401_wait_for_det_update, - .program_timeout_thresholds = dcn401_program_timeout_thresholds, + .program_arbiter = dcn401_program_arbiter, }; void hubbub401_construct(struct dcn20_hubbub *hubbub2, diff --git a/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.h b/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.h index 5f1960722ebd..b1d9ea9d1c3d 100644 --- a/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.h +++ b/drivers/gpu/drm/amd/display/dc/hubbub/dcn401/dcn401_hubbub.h @@ -128,7 +128,12 @@ HUBBUB_SF(DCHUBBUB_TIMEOUT_DETECTION_CTRL1, DCHUBBUB_TIMEOUT_REQ_STALL_THRESHOLD, mask_sh),\ HUBBUB_SF(DCHUBBUB_TIMEOUT_DETECTION_CTRL2, DCHUBBUB_TIMEOUT_PSTATE_STALL_THRESHOLD, mask_sh),\ HUBBUB_SF(DCHUBBUB_TIMEOUT_DETECTION_CTRL2, DCHUBBUB_TIMEOUT_DETECTION_EN, mask_sh),\ - HUBBUB_SF(DCHUBBUB_TIMEOUT_DETECTION_CTRL2, DCHUBBUB_TIMEOUT_TIMER_RESET, mask_sh) + HUBBUB_SF(DCHUBBUB_TIMEOUT_DETECTION_CTRL2, DCHUBBUB_TIMEOUT_TIMER_RESET, mask_sh),\ + HUBBUB_SF(DCHUBBUB_CTRL_STATUS, ROB_UNDERFLOW_STATUS, mask_sh),\ + HUBBUB_SF(DCHUBBUB_CTRL_STATUS, ROB_OVERFLOW_STATUS, mask_sh),\ + HUBBUB_SF(DCHUBBUB_CTRL_STATUS, ROB_OVERFLOW_CLEAR, mask_sh),\ + HUBBUB_SF(DCHUBBUB_CTRL_STATUS, DCHUBBUB_HW_DEBUG, mask_sh),\ + HUBBUB_SF(DCHUBBUB_CTRL_STATUS, CSTATE_SWATH_CHK_GOOD_MODE, mask_sh) bool hubbub401_program_urgent_watermarks( struct hubbub *hubbub, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c index e8cc1bfa73f3..5de11e2837c0 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c @@ -1488,6 +1488,10 @@ void dcn401_prepare_bandwidth(struct dc *dc, &context->bw_ctx.bw.dcn.watermarks, dc->res_pool->ref_clocks.dchub_ref_clock_inKhz / 1000, false); + /* update timeout thresholds */ + if (hubbub->funcs->program_arbiter) { + dc->wm_optimized_required |= hubbub->funcs->program_arbiter(hubbub, &context->bw_ctx.bw.dcn.arb_regs, false); + } /* decrease compbuf size */ if (hubbub->funcs->program_compbuf_segments) { @@ -1529,6 +1533,10 @@ void dcn401_optimize_bandwidth( &context->bw_ctx.bw.dcn.watermarks, dc->res_pool->ref_clocks.dchub_ref_clock_inKhz / 1000, true); + /* update timeout thresholds */ + if (hubbub->funcs->program_arbiter) { + hubbub->funcs->program_arbiter(hubbub, &context->bw_ctx.bw.dcn.arb_regs, true); + } if (dc->clk_mgr->dc_mode_softmax_enabled) if (dc->clk_mgr->clks.dramclk_khz > dc->clk_mgr->bw_params->dc_mode_softmax_memclk * 1000 && @@ -1554,11 +1562,6 @@ void dcn401_optimize_bandwidth( pipe_ctx->dlg_regs.min_dst_y_next_start); } } - - /* update timeout thresholds */ - if (hubbub->funcs->program_timeout_thresholds) { - hubbub->funcs->program_timeout_thresholds(hubbub, &context->bw_ctx.bw.dcn.arb_regs); - } } void dcn401_fams2_global_control_lock(struct dc *dc, diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h index 6c1d41c0f099..52b745667ef7 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h @@ -228,7 +228,7 @@ struct hubbub_funcs { void (*program_det_segments)(struct hubbub *hubbub, int hubp_inst, unsigned det_buffer_size_seg); void (*program_compbuf_segments)(struct hubbub *hubbub, unsigned compbuf_size_seg, bool safe_to_increase); void (*wait_for_det_update)(struct hubbub *hubbub, int hubp_inst); - void (*program_timeout_thresholds)(struct hubbub *hubbub, struct dml2_display_arb_regs *arb_regs); + bool (*program_arbiter)(struct hubbub *hubbub, struct dml2_display_arb_regs *arb_regs, bool safe_to_lower); }; struct hubbub { diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.h b/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.h index 7c8d61db153d..19568c359669 100644 --- a/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.h +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.h @@ -612,7 +612,8 @@ void dcn401_prepare_mcache_programming(struct dc *dc, struct dc_state *context); SR(DCHUBBUB_SDPIF_CFG1), \ SR(DCHUBBUB_MEM_PWR_MODE_CTRL), \ SR(DCHUBBUB_TIMEOUT_DETECTION_CTRL1), \ - SR(DCHUBBUB_TIMEOUT_DETECTION_CTRL2) + SR(DCHUBBUB_TIMEOUT_DETECTION_CTRL2), \ + SR(DCHUBBUB_CTRL_STATUS) /* DCCG */ -- 2.46.1

10 months, 2 weeks

1
0
0 0

[PATCH 3/9] drm/amd/display: Fix handling of plane refcount

by Hamza Mahfooz

From: Joshua Aberback <joshua.aberback(a)amd.com> [Why] The mechanism to backup and restore plane states doesn't maintain refcount, which can cause issues if the refcount of the plane changes in between backup and restore operations, such as memory leaks if the refcount was supposed to go down, or double frees / invalid memory accesses if the refcount was supposed to go up. [How] Cache and re-apply current refcount when restoring plane states. Cc: stable(a)vger.kernel.org Reviewed-by: Josip Pavic <josip.pavic(a)amd.com> Signed-off-by: Joshua Aberback <joshua.aberback(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- drivers/gpu/drm/amd/display/dc/core/dc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 7872c6cabb14..0c1875d35a95 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -3141,7 +3141,10 @@ static void restore_planes_and_stream_state( return; for (i = 0; i < status->plane_count; i++) { + /* refcount will always be valid, restore everything else */ + struct kref refcount = status->plane_states[i]->refcount; *status->plane_states[i] = scratch->plane_states[i]; + status->plane_states[i]->refcount = refcount; } *stream = scratch->stream_state; } -- 2.46.1

10 months, 2 weeks

1
0
0 0

[PATCH 2/9] drm/amd/display: Ignore scalar validation failure if pipe is phantom

by Hamza Mahfooz

From: Chris Park <chris.park(a)amd.com> [Why] There are some pipe scaler validation failure when the pipe is phantom and causes crash in DML validation. Since, scalar parameters are not as important in phantom pipe and we require this plane to do successful MCLK switches, the failure condition can be ignored. [How] Ignore scalar validation failure if the pipe validation is marked as phantom pipe. Cc: stable(a)vger.kernel.org # 6.11+ Reviewed-by: Dillon Varone <dillon.varone(a)amd.com> Signed-off-by: Chris Park <chris.park(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 33125b95c3a1..619fad17de55 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1501,6 +1501,10 @@ bool resource_build_scaling_params(struct pipe_ctx *pipe_ctx) res = spl_calculate_scaler_params(spl_in, spl_out); // Convert respective out params from SPL to scaler data translate_SPL_out_params_to_pipe_ctx(pipe_ctx, spl_out); + + /* Ignore scaler failure if pipe context plane is phantom plane */ + if (!res && plane_state->is_phantom) + res = true; } else { #endif /* depends on h_active */ @@ -1571,6 +1575,10 @@ bool resource_build_scaling_params(struct pipe_ctx *pipe_ctx) &plane_state->scaling_quality); } + /* Ignore scaler failure if pipe context plane is phantom plane */ + if (!res && plane_state->is_phantom) + res = true; + if (res && (pipe_ctx->plane_res.scl_data.taps.v_taps != temp.v_taps || pipe_ctx->plane_res.scl_data.taps.h_taps != temp.h_taps || pipe_ctx->plane_res.scl_data.taps.v_taps_c != temp.v_taps_c || -- 2.46.1

10 months, 2 weeks

1
0
0 0

[PATCH 1/9] drm/amd/display: update pipe selection policy to check head pipe

by Hamza Mahfooz

From: Yihan Zhu <Yihan.Zhu(a)amd.com> [Why] No check on head pipe during the dml to dc hw mapping will allow illegal pipe usage. This will result in a wrong pipe topology to cause mpcc tree totally mess up then cause a display hang. [How] Avoid to use the pipe is head in all check and avoid ODM slice during preferred pipe check. v2: Added pipe type check for DPP pipe type before executing head pipe check in the pipe selection logic in DML2 to avoid NULL pointer de-reference. Cc: stable(a)vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../display/dc/dml2/dml2_dc_resource_mgmt.c | 23 ++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c index 6eccf0241d85..1ed21c1b86a5 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c @@ -258,12 +258,25 @@ static unsigned int find_preferred_pipe_candidates(const struct dc_state *existi * However this condition comes with a caveat. We need to ignore pipes that will * require a change in OPP but still have the same stream id. For example during * an MPC to ODM transiton. + * + * Adding check to avoid pipe select on the head pipe by utilizing dc resource + * helper function resource_get_primary_dpp_pipe and comparing the pipe index. */ if (existing_state) { for (i = 0; i < pipe_count; i++) { if (existing_state->res_ctx.pipe_ctx[i].stream && existing_state->res_ctx.pipe_ctx[i].stream->stream_id == stream_id) { + struct pipe_ctx *head_pipe = + resource_is_pipe_type(&existing_state->res_ctx.pipe_ctx[i], DPP_PIPE) ? + resource_get_primary_dpp_pipe(&existing_state->res_ctx.pipe_ctx[i]) : + NULL; + + // we should always respect the head pipe from selection + if (head_pipe && head_pipe->pipe_idx == i) + continue; if (existing_state->res_ctx.pipe_ctx[i].plane_res.hubp && - existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i) + existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i && + (existing_state->res_ctx.pipe_ctx[i].prev_odm_pipe || + existing_state->res_ctx.pipe_ctx[i].next_odm_pipe)) continue; preferred_pipe_candidates[num_preferred_candidates++] = i; @@ -292,6 +305,14 @@ static unsigned int find_last_resort_pipe_candidates(const struct dc_state *exis */ if (existing_state) { for (i = 0; i < pipe_count; i++) { + struct pipe_ctx *head_pipe = + resource_is_pipe_type(&existing_state->res_ctx.pipe_ctx[i], DPP_PIPE) ? + resource_get_primary_dpp_pipe(&existing_state->res_ctx.pipe_ctx[i]) : + NULL; + + // we should always respect the head pipe from selection + if (head_pipe && head_pipe->pipe_idx == i) + continue; if ((existing_state->res_ctx.pipe_ctx[i].plane_res.hubp && existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i) || existing_state->res_ctx.pipe_ctx[i].stream_res.tg) -- 2.46.1

10 months, 2 weeks

1
0
0 0

[PATCH net] netfilter: ipset: add missing range check in bitmap_ip_uadt

by Jeongjun Park

In the bitmap_ip_uadt function, if ip is greater than ip_to, they are swapped. However, there is no check to see if ip is smaller than map->first, which causes an out-of-bounds vulnerability. Therefore, you need to add a missing bounds check to prevent out-of-bounds. Cc: <stable(a)vger.kernel.org> Reported-by: syzbot+58c872f7790a4d2ac951(a)syzkaller.appspotmail.com Fixes: 72205fc68bd1 ("netfilter: ipset: bitmap:ip set type support") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> --- net/netfilter/ipset/ip_set_bitmap_ip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c b/net/netfilter/ipset/ip_set_bitmap_ip.c index e4fa00abde6a..705c316b001a 100644 --- a/net/netfilter/ipset/ip_set_bitmap_ip.c +++ b/net/netfilter/ipset/ip_set_bitmap_ip.c @@ -178,7 +178,7 @@ bitmap_ip_uadt(struct ip_set *set, struct nlattr *tb[], ip_to = ip; } - if (ip_to > map->last_ip) + if (ip < map->first_ip || ip_to > map->last_ip) return -IPSET_ERR_BITMAP_RANGE; for (; !before(ip_to, ip); ip += map->hosts) { --

10 months, 2 weeks

2
1
0 0

[PATCH] Revert "mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K"

by Aurelien Jarno

The commit 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") increased the max_req_size, even for 4K pages, causing various issues: - Panic booting the kernel/rootfs from an SD card on Rockchip RK3566 - Panic booting the kernel/rootfs from an SD card on StarFive JH7100 - "swiotlb buffer is full" and data corruption on StarFive JH7110 At this stage no fix have been found, so it's probably better to just revert the change. This reverts commit 8396c793ffdf28bb8aee7cfe0891080f8cab7890. Cc: stable(a)vger.kernel.org Cc: Sam Protsenko <semen.protsenko(a)linaro.org> Fixes: 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") Closes: https://lore.kernel.org/linux-mmc/614692b4-1dbe-31b8-a34d-cb6db1909bb7@w6rz… Closes: https://lore.kernel.org/linux-mmc/CAC8uq=Ppnmv98mpa1CrWLawWoPnu5abtU69v-=G-… Signed-off-by: Aurelien Jarno <aurelien(a)aurel32.net> --- drivers/mmc/host/dw_mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) I have posted a patch to fix the issue, but unfortunately it only fixes the JH7110 case: https://lore.kernel.org/linux-mmc/20241020142931.138277-1-aurelien@aurel32.… diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c index 41e451235f637..e9f6e4e622901 100644 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -2957,8 +2957,8 @@ static int dw_mci_init_slot(struct dw_mci *host) if (host->use_dma == TRANS_MODE_IDMAC) { mmc->max_segs = host->ring_size; mmc->max_blk_size = 65535; - mmc->max_req_size = DW_MCI_DESC_DATA_LENGTH * host->ring_size; - mmc->max_seg_size = mmc->max_req_size; + mmc->max_seg_size = 0x1000; + mmc->max_req_size = mmc->max_seg_size * host->ring_size; mmc->max_blk_count = mmc->max_req_size / 512; } else if (host->use_dma == TRANS_MODE_EDMAC) { mmc->max_segs = 64; -- 2.45.2

10 months, 2 weeks

2
1
0 0

[PATCH] mmc: sunxi-mmc: Fix A100 compatible description

by Andre Przywara

It turns out that the Allwinner A100/A133 SoC only supports 8K DMA blocks (13 bits wide), for both the SD/SDIO and eMMC instances. And while this alone would make a trivial fix, the H616 falls back to the A100 compatible string, so we have to now match the H616 compatible string explicitly against the description advertising 64K DMA blocks. As the A100 is now compatible with the D1 description, let the A100 compatible string point to that block instead, and introduce an explicit match against the H616 string, pointing to the old description. Also remove the redundant setting of clk_delays to NULL on the way. Fixes: 3536b82e5853 ("mmc: sunxi: add support for A100 mmc controller") Cc: stable(a)vger.kernel.org Signed-off-by: Andre Przywara <andre.przywara(a)arm.com> --- drivers/mmc/host/sunxi-mmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c index d3bd0ac99ec46..e0ab5fd635e6c 100644 --- a/drivers/mmc/host/sunxi-mmc.c +++ b/drivers/mmc/host/sunxi-mmc.c @@ -1191,10 +1191,9 @@ static const struct sunxi_mmc_cfg sun50i_a64_emmc_cfg = { .needs_new_timings = true, }; -static const struct sunxi_mmc_cfg sun50i_a100_cfg = { +static const struct sunxi_mmc_cfg sun50i_h616_cfg = { .idma_des_size_bits = 16, .idma_des_shift = 2, - .clk_delays = NULL, .can_calibrate = true, .mask_data0 = true, .needs_new_timings = true, @@ -1217,8 +1216,9 @@ static const struct of_device_id sunxi_mmc_of_match[] = { { .compatible = "allwinner,sun20i-d1-mmc", .data = &sun20i_d1_cfg }, { .compatible = "allwinner,sun50i-a64-mmc", .data = &sun50i_a64_cfg }, { .compatible = "allwinner,sun50i-a64-emmc", .data = &sun50i_a64_emmc_cfg }, - { .compatible = "allwinner,sun50i-a100-mmc", .data = &sun50i_a100_cfg }, + { .compatible = "allwinner,sun50i-a100-mmc", .data = &sun20i_d1_cfg }, { .compatible = "allwinner,sun50i-a100-emmc", .data = &sun50i_a100_emmc_cfg }, + { .compatible = "allwinner,sun50i-h616-mmc", .data = &sun50i_h616_cfg }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, sunxi_mmc_of_match); -- 2.46.2

10 months, 2 weeks

4
5
0 0

[merged mm-hotfixes-stable] selftests-hugetlb_dio-fixup-check-for-initial-conditions-to-skip-in-the-start.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: selftests: hugetlb_dio: fixup check for initial conditions to skip in the start has been removed from the -mm tree. Its filename was selftests-hugetlb_dio-fixup-check-for-initial-conditions-to-skip-in-the-start.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Donet Tom <donettom(a)linux.ibm.com> Subject: selftests: hugetlb_dio: fixup check for initial conditions to skip in the start Date: Sun, 10 Nov 2024 00:49:03 -0600 This test verifies that a hugepage, used as a user buffer for DIO operations, is correctly freed upon unmapping. To test this, we read the count of free hugepages before and after the mmap, DIO, and munmap operations, then check if the free hugepage count is the same. Reading free hugepages before the test was removed by commit 0268d4579901 ('selftests: hugetlb_dio: check for initial conditions to skip at the start'), causing the test to always fail. This patch adds back reading the free hugepages before starting the test. With this patch, the tests are now passing. Test results without this patch: ./tools/testing/selftests/mm/hugetlb_dio TAP version 13 1..4 # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 1 : Huge pages not freed! # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 2 : Huge pages not freed! # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 3 : Huge pages not freed! # No. Free pages before allocation : 0 # No. Free pages after munmap : 100 not ok 4 : Huge pages not freed! # Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0 Test results with this patch: /tools/testing/selftests/mm/hugetlb_dio TAP version 13 1..4 # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 1 : Huge pages freed successfully ! # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 2 : Huge pages freed successfully ! # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 3 : Huge pages freed successfully ! # No. Free pages before allocation : 100 # No. Free pages after munmap : 100 ok 4 : Huge pages freed successfully ! # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 Link: https://lkml.kernel.org/r/20241110064903.23626-1-donettom@linux.ibm.com Fixes: 0268d4579901 ("selftests: hugetlb_dio: check for initial conditions to skip in the start") Signed-off-by: Donet Tom <donettom(a)linux.ibm.com> Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- tools/testing/selftests/mm/hugetlb_dio.c | 7 +++++++ 1 file changed, 7 insertions(+) --- a/tools/testing/selftests/mm/hugetlb_dio.c~selftests-hugetlb_dio-fixup-check-for-initial-conditions-to-skip-in-the-start +++ a/tools/testing/selftests/mm/hugetlb_dio.c @@ -44,6 +44,13 @@ void run_dio_using_hugetlb(unsigned int if (fd < 0) ksft_exit_fail_perror("Error opening file\n"); + /* Get the free huge pages before allocation */ + free_hpage_b = get_free_hugepages(); + if (free_hpage_b == 0) { + close(fd); + ksft_exit_skip("No free hugepage, exiting!\n"); + } + /* Allocate a hugetlb page */ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0); if (orig_buffer == MAP_FAILED) { _ Patches currently in -mm which might be from donettom(a)linux.ibm.com are

10 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases has been removed from the -mm tree. Its filename was mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: John Hubbard <jhubbard(a)nvidia.com> Subject: mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases Date: Mon, 4 Nov 2024 19:29:44 -0800 commit 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()") created a new constraint on the pin_user_pages*() API family: a potentially large internal allocation must now occur, for FOLL_LONGTERM cases. A user-visible consequence has now appeared: user space can no longer pin more than 2GB of memory anymore on x86_64. That's because, on a 4KB PAGE_SIZE system, when user space tries to (indirectly, via a device driver that calls pin_user_pages()) pin 2GB, this requires an allocation of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for kmalloc(). In addition to the directly visible effect described above, there is also the problem of adding an unnecessary allocation. The **pages array argument has already been allocated, and there is no need for a redundant **folios array allocation in this case. Fix this by avoiding the new allocation entirely. This is done by referring to either the original page[i] within **pages, or to the associated folio. Thanks to David Hildenbrand for suggesting this approach and for providing the initial implementation (which I've tested and adjusted slightly) as well. [jhubbard(a)nvidia.com: whitespace tweak, per David] Link: https://lkml.kernel.org/r/131cf9c8-ebc0-4cbb-b722-22fa8527bf3c@nvidia.com [jhubbard(a)nvidia.com: bypass pofs_get_folio(), per Oscar] Link: https://lkml.kernel.org/r/c1587c7f-9155-45be-bd62-1e36c0dd6923@nvidia.com Link: https://lkml.kernel.org/r/20241105032944.141488-2-jhubbard@nvidia.com Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()") Signed-off-by: John Hubbard <jhubbard(a)nvidia.com> Suggested-by: David Hildenbrand <david(a)redhat.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Oscar Salvador <osalvador(a)suse.de> Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com> Cc: Dave Airlie <airlied(a)redhat.com> Cc: Gerd Hoffmann <kraxel(a)redhat.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Jason Gunthorpe <jgg(a)nvidia.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch> Cc: Dongwon Kim <dongwon.kim(a)intel.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Junxiao Chang <junxiao.chang(a)intel.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/gup.c | 116 +++++++++++++++++++++++++++++++++++------------------ 1 file changed, 77 insertions(+), 39 deletions(-) --- a/mm/gup.c~mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases +++ a/mm/gup.c @@ -2273,20 +2273,57 @@ struct page *get_dump_page(unsigned long #endif /* CONFIG_ELF_CORE */ #ifdef CONFIG_MIGRATION + +/* + * An array of either pages or folios ("pofs"). Although it may seem tempting to + * avoid this complication, by simply interpreting a list of folios as a list of + * pages, that approach won't work in the longer term, because eventually the + * layouts of struct page and struct folio will become completely different. + * Furthermore, this pof approach avoids excessive page_folio() calls. + */ +struct pages_or_folios { + union { + struct page **pages; + struct folio **folios; + void **entries; + }; + bool has_folios; + long nr_entries; +}; + +static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i) +{ + if (pofs->has_folios) + return pofs->folios[i]; + return page_folio(pofs->pages[i]); +} + +static void pofs_clear_entry(struct pages_or_folios *pofs, long i) +{ + pofs->entries[i] = NULL; +} + +static void pofs_unpin(struct pages_or_folios *pofs) +{ + if (pofs->has_folios) + unpin_folios(pofs->folios, pofs->nr_entries); + else + unpin_user_pages(pofs->pages, pofs->nr_entries); +} + /* * Returns the number of collected folios. Return value is always >= 0. */ static unsigned long collect_longterm_unpinnable_folios( - struct list_head *movable_folio_list, - unsigned long nr_folios, - struct folio **folios) + struct list_head *movable_folio_list, + struct pages_or_folios *pofs) { unsigned long i, collected = 0; struct folio *prev_folio = NULL; bool drain_allow = true; - for (i = 0; i < nr_folios; i++) { - struct folio *folio = folios[i]; + for (i = 0; i < pofs->nr_entries; i++) { + struct folio *folio = pofs_get_folio(pofs, i); if (folio == prev_folio) continue; @@ -2327,16 +2364,15 @@ static unsigned long collect_longterm_un * Returns -EAGAIN if all folios were successfully migrated or -errno for * failure (or partial success). */ -static int migrate_longterm_unpinnable_folios( - struct list_head *movable_folio_list, - unsigned long nr_folios, - struct folio **folios) +static int +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list, + struct pages_or_folios *pofs) { int ret; unsigned long i; - for (i = 0; i < nr_folios; i++) { - struct folio *folio = folios[i]; + for (i = 0; i < pofs->nr_entries; i++) { + struct folio *folio = pofs_get_folio(pofs, i); if (folio_is_device_coherent(folio)) { /* @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_f * convert the pin on the source folio to a normal * reference. */ - folios[i] = NULL; + pofs_clear_entry(pofs, i); folio_get(folio); gup_put_folio(folio, 1, FOLL_PIN); @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_f * calling folio_isolate_lru() which takes a reference so the * folio won't be freed if it's migrating. */ - unpin_folio(folios[i]); - folios[i] = NULL; + unpin_folio(folio); + pofs_clear_entry(pofs, i); } if (!list_empty(movable_folio_list)) { @@ -2387,12 +2423,26 @@ static int migrate_longterm_unpinnable_f return -EAGAIN; err: - unpin_folios(folios, nr_folios); + pofs_unpin(pofs); putback_movable_pages(movable_folio_list); return ret; } +static long +check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs) +{ + LIST_HEAD(movable_folio_list); + unsigned long collected; + + collected = collect_longterm_unpinnable_folios(&movable_folio_list, + pofs); + if (!collected) + return 0; + + return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs); +} + /* * Check whether all folios are *allowed* to be pinned indefinitely (long term). * Rather confusingly, all folios in the range are required to be pinned via @@ -2417,16 +2467,13 @@ err: static long check_and_migrate_movable_folios(unsigned long nr_folios, struct folio **folios) { - unsigned long collected; - LIST_HEAD(movable_folio_list); + struct pages_or_folios pofs = { + .folios = folios, + .has_folios = true, + .nr_entries = nr_folios, + }; - collected = collect_longterm_unpinnable_folios(&movable_folio_list, - nr_folios, folios); - if (!collected) - return 0; - - return migrate_longterm_unpinnable_folios(&movable_folio_list, - nr_folios, folios); + return check_and_migrate_movable_pages_or_folios(&pofs); } /* @@ -2436,22 +2483,13 @@ static long check_and_migrate_movable_fo static long check_and_migrate_movable_pages(unsigned long nr_pages, struct page **pages) { - struct folio **folios; - long i, ret; - - folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL); - if (!folios) { - unpin_user_pages(pages, nr_pages); - return -ENOMEM; - } - - for (i = 0; i < nr_pages; i++) - folios[i] = page_folio(pages[i]); + struct pages_or_folios pofs = { + .pages = pages, + .has_folios = false, + .nr_entries = nr_pages, + }; - ret = check_and_migrate_movable_folios(nr_pages, folios); - - kfree(folios); - return ret; + return check_and_migrate_movable_pages_or_folios(&pofs); } #else static long check_and_migrate_movable_pages(unsigned long nr_pages, _ Patches currently in -mm which might be from jhubbard(a)nvidia.com are

10 months, 2 weeks

1
0
0 0

[PATCH] drm/xe: handle flat ccs during hibernation on igpu

by Matthew Auld

Starting from LNL, CCS has moved over to flat CCS model where there is now dedicated memory reserved for storing compression state. On platforms like LNL this reserved memory lives inside graphics stolen memory, which is not treated like normal RAM and is therefore skipped by the core kernel when creating the hibernation image. Currently if something was compressed and we enter hibernation all the corresponding CCS state is lost on such HW, resulting in corrupted memory. To fix this evict user buffers from TT -> SYSTEM to ensure we take a snapshot of the raw CCS state when entering hibernation, where upon resuming we can restore the raw CCS state back when next validating the buffer. This has been confirmed to fix display corruption on LNL when coming back from hibernation. Fixes: cbdc52c11c9b ("drm/xe/xe2: Support flat ccs") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3409 Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> Cc: Matthew Brost <matthew.brost(a)intel.com> Cc: <stable(a)vger.kernel.org> # v6.8+ --- drivers/gpu/drm/xe/xe_bo_evict.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c index b01bc20eb90b..8fb2be061003 100644 --- a/drivers/gpu/drm/xe/xe_bo_evict.c +++ b/drivers/gpu/drm/xe/xe_bo_evict.c @@ -35,10 +35,21 @@ int xe_bo_evict_all(struct xe_device *xe) int ret; /* User memory */ - for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) { + for (mem_type = XE_PL_TT; mem_type <= XE_PL_VRAM1; ++mem_type) { struct ttm_resource_manager *man = ttm_manager_type(bdev, mem_type); + /* + * On igpu platforms with flat CCS we need to ensure we save and restore any CCS + * state since this state lives inside graphics stolen memory which doesn't survive + * hibernation. + * + * This can be further improved by only evicting objects that we know have actually + * used a compression enabled PAT index. + */ + if (mem_type == XE_PL_TT && (IS_DGFX(xe) || !xe_device_has_flat_ccs(xe))) + continue; + if (man) { ret = ttm_resource_manager_evict_all(bdev, man); if (ret) -- 2.47.0

10 months, 2 weeks

2
1
0 0

[PATCH] nommu: pass NULL argument to vma_iter_prealloc()

by Hajime Tazaki

When deleting a vma entry from a maple tree, it has to pass NULL to vma_iter_prealloc() in order to calculate internal state of the tree, but it passed a wrong argument. As a result, nommu kernels crashed upon accessing a vma iterator, such as acct_collect() reading the size of vma entries after do_munmap(). This commit fixes this issue by passing a right argument to the preallocation call. Fixes: b5df09226450 ("mm: set up vma iterator for vma_iter_prealloc() calls") Cc: stable(a)vger.kernel.org Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com> Signed-off-by: Hajime Tazaki <thehajime(a)gmail.com> --- mm/nommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/nommu.c b/mm/nommu.c index 385b0c15add8..0c708f85408d 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -573,7 +573,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma) VMA_ITERATOR(vmi, vma->vm_mm, vma->vm_start); vma_iter_config(&vmi, vma->vm_start, vma->vm_end); - if (vma_iter_prealloc(&vmi, vma)) { + if (vma_iter_prealloc(&vmi, NULL)) { pr_warn("Allocation of vma tree for process %d failed\n", current->pid); return -ENOMEM; -- 2.43.0

10 months, 2 weeks

3
2
0 0

[PATCH] amba: Fix atomicity violation in amba_match()

by Qiu-ji Chen

Atomicity violation occurs during consecutive reads of pcdev->driver_override. Consider a scenario: after pvdev->driver_override passes the if statement, due to possible concurrency, pvdev->driver_override may change. This leads to pvdev->driver_override passing the condition with an old value, but entering the return !strcmp(pcdev->driver_override, drv->name); statement with a new value. This causes the function to return an unexpected result. Since pvdev->driver_override is a string that is modified byte by byte, without considering atomicity, data races may cause a partially modified pvdev->driver_override to enter both the condition and return statements, resulting in an error. To fix this, we suggest protecting all reads of pvdev->driver_override with a lock, and storing the result of the strcmp() function in a new variable retval. This ensures that pvdev->driver_override does not change during the entire operation, allowing the function to return the expected result. This possible bug is found by an experimental static analysis tool developed by our team. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. Fixes: 5150a8f07f6c ("amba: reorder functions") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/amba/bus.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c index 34bc880ca20b..e310f4f83b27 100644 --- a/drivers/amba/bus.c +++ b/drivers/amba/bus.c @@ -209,6 +209,7 @@ static int amba_match(struct device *dev, const struct device_driver *drv) { struct amba_device *pcdev = to_amba_device(dev); const struct amba_driver *pcdrv = to_amba_driver(drv); + int retval; mutex_lock(&pcdev->periphid_lock); if (!pcdev->periphid) { @@ -230,8 +231,14 @@ static int amba_match(struct device *dev, const struct device_driver *drv) mutex_unlock(&pcdev->periphid_lock); /* When driver_override is set, only bind to the matching driver */ - if (pcdev->driver_override) - return !strcmp(pcdev->driver_override, drv->name); + + device_lock(dev); + if (pcdev->driver_override) { + retval = !strcmp(pcdev->driver_override, drv->name); + device_unlock(dev); + return retval; + } + device_unlock(dev); return amba_lookup(pcdrv->id_table, pcdev) != NULL; } -- 2.34.1

10 months, 2 weeks

2
2
0 0

[PATCH] cdx: Fix atomicity violation in cdx_bus_match() and cdx_probe()

by Qiu-ji Chen

An atomicity violation occurs during consecutive reads of the variable cdx_dev->driver_override. Imagine a scenario: while evaluating the statement if (cdx_dev->driver_override && strcmp(cdx_dev->driver_override, drv->name)), the value of cdx_dev->driver_override changes, leading to an inconsistency where the value of cdx_dev->driver_override is the old value when passing the non-null check, but the new value when evaluated by strcmp(). This causes an inconsistency. The second error occurs during the validation of cdx_dev->driver_override. The logic of this error is similar to the first one, as the entire process is not protected by a lock, leading to an inconsistency in the values of cdx_dev->driver_override before and after the reads. The third error occurs in driver_override_show() when executing the statement return sysfs_emit(buf, "%s\n", cdx_dev->driver_override);. Since the string changes byte by byte, it is possible for a partially modified cdx_dev->driver_override value to be used in this statement, leading to an incorrect return value from the program. To fix these issues, for the first and second problems, since we need to protect the entire process of reading the variable cdx_dev->driver_override with a lock, we introduced a variable ret and an out block. For each branch in this section, we replaced the return statements with assignments to the variable ret, and then used a goto statement to directly execute the out block, making the code overall more concise. For the third problem, we adopted a similar approach to the one used in the modalias_show() function, protecting the process of reading cdx_dev->driver_override with a lock, ensuring that the program runs correctly. This possible bug is found by an experimental static analysis tool developed by our team. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. Fixes: 2959ab247061 ("cdx: add the cdx bus driver") Fixes: 48a6c7bced2a ("cdx: add device attributes") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/cdx/cdx.c | 37 +++++++++++++++++++++++++++---------- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c index 07371cb653d3..fae03c89f818 100644 --- a/drivers/cdx/cdx.c +++ b/drivers/cdx/cdx.c @@ -268,6 +268,7 @@ static int cdx_bus_match(struct device *dev, const struct device_driver *drv) const struct cdx_driver *cdx_drv = to_cdx_driver(drv); const struct cdx_device_id *found_id = NULL; const struct cdx_device_id *ids; + int ret = false; if (cdx_dev->is_bus) return false; @@ -275,28 +276,40 @@ static int cdx_bus_match(struct device *dev, const struct device_driver *drv) ids = cdx_drv->match_id_table; /* When driver_override is set, only bind to the matching driver */ - if (cdx_dev->driver_override && strcmp(cdx_dev->driver_override, drv->name)) - return false; + device_lock(dev); + if (cdx_dev->driver_override && strcmp(cdx_dev->driver_override, drv->name)) { + ret = false; + goto out; + } found_id = cdx_match_id(ids, cdx_dev); - if (!found_id) - return false; + if (!found_id) { + ret = false; + goto out; + } do { /* * In case override_only was set, enforce driver_override * matching. */ - if (!found_id->override_only) - return true; - if (cdx_dev->driver_override) - return true; + if (!found_id->override_only) { + ret = true; + goto out; + } + if (cdx_dev->driver_override) { + ret = true; + goto out; + } ids = found_id + 1; found_id = cdx_match_id(ids, cdx_dev); } while (found_id); - return false; + ret = false; +out: + device_unlock(dev); + return ret; } static int cdx_probe(struct device *dev) @@ -470,8 +483,12 @@ static ssize_t driver_override_show(struct device *dev, struct device_attribute *attr, char *buf) { struct cdx_device *cdx_dev = to_cdx_device(dev); + ssize_t len; - return sysfs_emit(buf, "%s\n", cdx_dev->driver_override); + device_lock(dev); + len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override); + device_unlock(dev); + return len; } static DEVICE_ATTR_RW(driver_override); -- 2.34.1

10 months, 2 weeks

2
2
0 0

[PATCH] bus/fls-mc: Fix possible UAF error in driver_override_show()

by Qiu-ji Chen

There is a data race between the functions driver_override_show() and driver_override_store(). In the driver_override_store() function, the assignment to ret calls driver_set_override(), which frees the old value while writing the new value to dev. If a race occurs, it may cause a use-after-free (UAF) error in driver_override_show(). To fix this issue, we adopted a logic similar to the driver_override_show() function in vmbus_drv.c, where the dev is protected by a lock to prevent its value from changing. This possible bug is found by an experimental static analysis tool developed by our team. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/bus/fsl-mc/fsl-mc-bus.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c index 930d8a3ba722..62a9da88b4c9 100644 --- a/drivers/bus/fsl-mc/fsl-mc-bus.c +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c @@ -201,8 +201,12 @@ static ssize_t driver_override_show(struct device *dev, struct device_attribute *attr, char *buf) { struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev); + ssize_t len; - return snprintf(buf, PAGE_SIZE, "%s\n", mc_dev->driver_override); + device_lock(dev); + len = snprintf(buf, PAGE_SIZE, "%s\n", mc_dev->driver_override); + device_unlock(dev); + return len; } static DEVICE_ATTR_RW(driver_override); -- 2.34.1

10 months, 2 weeks

1
0
0 0

Re: [PATCH 1/1] x86/cpu: Add INTEL_LUNARLAKE_M to X86_BUG_MONITOR

by Rafael J. Wysocki

On Tue, Nov 12, 2024 at 3:02 PM Len Brown <lenb(a)kernel.org> wrote: > > On Tue, Nov 12, 2024 at 8:14 AM Rafael J. Wysocki <rafael(a)kernel.org> wrote: > > > > On Tue, Nov 12, 2024 at 2:12 PM Len Brown <lenb(a)kernel.org> wrote: > > > > > > On Tue, Nov 12, 2024 at 6:44 AM Rafael J. Wysocki <rafael(a)kernel.org> wrote: > > > > > > > > - if (boot_cpu_has(X86_FEATURE_MWAIT) && c->x86_vfm == INTEL_ATOM_GOLDMONT) > > > > > + if (boot_cpu_has(X86_FEATURE_MWAIT) && > > > > > + (c->x86_vfm == INTEL_ATOM_GOLDMONT > > > > > + || c->x86_vfm == INTEL_LUNARLAKE_M)) > > > > > > > > I would put the || at the end of the previous line, that is > > > > > > > > > It isn't my personal preference for human readability either, > > > but this is what scripts/Lindent does... > > > > Well, it doesn't match the coding style of the first line ... > > Fair observation. > > I'll bite. > > If you took the existing intel.c and added it as a patch to the kernel, > the resulting checkpatch would have 6 errors and 33 warnings. > > If you ran Lindent on the existing intel.c, the resulting diff would be > 408 lines -- 1 file changed, 232 insertions(+), 176 deletions(-) > > This for a file that is only 1300 lines long. > > If whitespace nirvana is the goal, tools are the answer, not the valuable > cycles of human reviewers. Well, the advice always given is to follow the coding style of the given fine in the first place. checkpatch reflects the preferences of its author is this particular respect and maintainers' preferences tend to differ from one to another.

10 months, 2 weeks

1
0
0 0

[PATCH] kbuild: switch from lz4c to lz4 for compression

by Parth Pancholi

From: Parth Pancholi <parth.pancholi(a)toradex.com> Replace lz4c with lz4 for kernel image compression. Although lz4 and lz4c are functionally similar, lz4c has been deprecated upstream since 2018. Since as early as Ubuntu 16.04 and Fedora 25, lz4 and lz4c have been packaged together, making it safe to update the requirement from lz4c to lz4. Consequently, some distributions and build systems, such as OpenEmbedded, have fully transitioned to using lz4. OpenEmbedded core adopted this change in commit fe167e082cbd ("bitbake.conf: require lz4 instead of lz4c"), causing compatibility issues when building the mainline kernel in the latest OpenEmbedded environment, as seen in the errors below. This change maintains compatibility with current kernel builds because both tools have a similar command-line interface while fixing the mainline kernel build failures with the latest master OpenEmbedded builds associated with the mentioned compatibility issues. LZ4 arch/arm/boot/compressed/piggy_data /bin/sh: 1: lz4c: not found ... ... ERROR: oe_runmake failed Cc: stable(a)vger.kernel.org Link: https://github.com/lz4/lz4/pull/553 Suggested-by: Francesco Dolcini <francesco.dolcini(a)toradex.com> Signed-off-by: Parth Pancholi <parth.pancholi(a)toradex.com> --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 79192a3024bf..7630f763f5b2 100644 --- a/Makefile +++ b/Makefile @@ -508,7 +508,7 @@ KGZIP = gzip KBZIP2 = bzip2 KLZOP = lzop LZMA = lzma -LZ4 = lz4c +LZ4 = lz4 XZ = xz ZSTD = zstd -- 2.34.1

10 months, 2 weeks

1
0
0 0

Re: [PATCH 1/1] x86/cpu: Add INTEL_LUNARLAKE_M to X86_BUG_MONITOR

by Rafael J. Wysocki

On Tue, Nov 12, 2024 at 2:12 PM Len Brown <lenb(a)kernel.org> wrote: > > On Tue, Nov 12, 2024 at 6:44 AM Rafael J. Wysocki <rafael(a)kernel.org> wrote: > > > > - if (boot_cpu_has(X86_FEATURE_MWAIT) && c->x86_vfm == INTEL_ATOM_GOLDMONT) > > > + if (boot_cpu_has(X86_FEATURE_MWAIT) && > > > + (c->x86_vfm == INTEL_ATOM_GOLDMONT > > > + || c->x86_vfm == INTEL_LUNARLAKE_M)) > > > > I would put the || at the end of the previous line, that is > > > It isn't my personal preference for human readability either, > but this is what scripts/Lindent does... Well, it doesn't match the coding style of the first line ...

10 months, 2 weeks

1
0
0 0

Re: [PATCH 6.11 000/184] 6.11.8-rc1 review

by Ronald Warsow

Hi Greg no regressions here on x86_64 (RKL, Intel 11th Gen. CPU) Thanks Tested-by: Ronald Warsow <rwarsow(a)gmx.de>

10 months, 2 weeks

1
0
0 0

[PATCH 0/4] Venus driver fixes to avoid possible OOB accesses

by Vikash Garodia

This series primarily adds check at relevant places in venus driver where there are possible OOB accesses due to unexpected payload from venus firmware. The patches describes the specific OOB possibility. Please review and share your feedback. Signed-off-by: Vikash Garodia <quic_vgarodia(a)quicinc.com> --- Vikash Garodia (4): media: venus: hfi_parser: add check to avoid out of bound access media: venus: hfi_parser: avoid OOB access beyond payload word count media: venus: hfi: add check to handle incorrect queue size media: venus: hfi: add a check to handle OOB in sfr region drivers/media/platform/qcom/venus/hfi_parser.c | 6 +++++- drivers/media/platform/qcom/venus/hfi_venus.c | 15 +++++++++++++-- 2 files changed, 18 insertions(+), 3 deletions(-) --- base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a change-id: 20241104-venus_oob-0343b143d61d Best regards, -- Vikash Garodia <quic_vgarodia(a)quicinc.com>

10 months, 2 weeks

3
30
0 0

[PATCH v2 0/1] ufs: ufs_sb_private_info: remove unused s_{2,3}apb fields

by Agathe Porte

v2: add Cc stable because the UBSAN might be triggered of previous stable kernels as well. Agathe Porte (1): ufs: ufs_sb_private_info: remove unused s_{2,3}apb fields fs/ufs/super.c | 4 ---- fs/ufs/ufs_fs.h | 4 ---- 2 files changed, 8 deletions(-) -- 2.43.0

10 months, 2 weeks

2
3
0 0

[PATCH 1/1] x86/cpu: Add INTEL_LUNARLAKE_M to X86_BUG_MONITOR

by Len Brown

From: Len Brown <len.brown(a)intel.com> Under some conditions, MONITOR wakeups on Lunar Lake processors can be lost, resulting in significant user-visible delays. Add LunarLake to X86_BUG_MONITOR so that wake_up_idle_cpu() always sends an IPI, avoiding this potential delay. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219364 Cc: stable(a)vger.kernel.org # 6.11 Signed-off-by: Len Brown <len.brown(a)intel.com> --- arch/x86/kernel/cpu/intel.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index e7656cbef68d..284cd561499c 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -586,7 +586,9 @@ static void init_intel(struct cpuinfo_x86 *c) c->x86_vfm == INTEL_WESTMERE_EX)) set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR); - if (boot_cpu_has(X86_FEATURE_MWAIT) && c->x86_vfm == INTEL_ATOM_GOLDMONT) + if (boot_cpu_has(X86_FEATURE_MWAIT) && + (c->x86_vfm == INTEL_ATOM_GOLDMONT + || c->x86_vfm == INTEL_LUNARLAKE_M)) set_cpu_bug(c, X86_BUG_MONITOR); #ifdef CONFIG_X86_64 -- 2.43.0

10 months, 2 weeks

2
1
0 0

[PATCH v2] USB: core: remove dead code in do_proc_bulk()

by Rex Nie

Since len1 is unsigned int, len1 < 0 always false. Remove it keep code simple. Cc: stable(a)vger.kernel.org Fixes: ae8709b296d8 ("USB: core: Make do_proc_control() and do_proc_bulk() killable") Signed-off-by: Rex Nie <rex.nie(a)jaguarmicro.com> --- changes in v2: - Add "Cc: stable(a)vger.kernel.org" (kernel test robot) - Add Fixes tag - Link to v1: https://lore.kernel.org/stable/20241108094255.2133-1-rex.nie@jaguarmicro.co… --- drivers/usb/core/devio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 3beb6a862e80..712e290bab04 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -1295,7 +1295,7 @@ static int do_proc_bulk(struct usb_dev_state *ps, return ret; len1 = bulk->len; - if (len1 < 0 || len1 >= (INT_MAX - sizeof(struct urb))) + if (len1 >= (INT_MAX - sizeof(struct urb))) return -EINVAL; if (bulk->ep & USB_DIR_IN) -- 2.17.1

10 months, 2 weeks

2
5
0 0

[PATCH] mm/mremap: Fix address wraparound in move_page_tables()

by Jann Horn

On 32-bit platforms, it is possible for the expression `len + old_addr < old_end` to be false-positive if `len + old_addr` wraps around. `old_addr` is the cursor in the old range up to which page table entries have been moved; so if the operation succeeded, `old_addr` is the *end* of the old region, and adding `len` to it can wrap. The overflow causes mremap() to mistakenly believe that PTEs have been copied; the consequence is that mremap() bails out, but doesn't move the PTEs back before the new VMA is unmapped, causing anonymous pages in the region to be lost. So basically if userspace tries to mremap() a private-anon region and hits this bug, mremap() will return an error and the private-anon region's contents appear to have been zeroed. The idea of this check is that `old_end - len` is the original start address, and writing the check that way also makes it easier to read; so fix the check by rearranging the comparison accordingly. (An alternate fix would be to refactor this function by introducing an "orig_old_start" variable or such.) Cc: stable(a)vger.kernel.org Fixes: af8ca1c14906 ("mm/mremap: optimize the start addresses in move_page_tables()") Signed-off-by: Jann Horn <jannh(a)google.com> --- Tested in a VM with a 32-bit X86 kernel; without the patch: ``` user@horn:~/big_mremap$ cat test.c #define _GNU_SOURCE #include <stdlib.h> #include <stdio.h> #include <err.h> #include <sys/mman.h> #define ADDR1 ((void*)0x60000000) #define ADDR2 ((void*)0x10000000) #define SIZE 0x50000000uL int main(void) { unsigned char *p1 = mmap(ADDR1, SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0); if (p1 == MAP_FAILED) err(1, "mmap 1"); unsigned char *p2 = mmap(ADDR2, SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0); if (p2 == MAP_FAILED) err(1, "mmap 2"); *p1 = 0x41; printf("first char is 0x%02hhx\n", *p1); unsigned char *p3 = mremap(p1, SIZE, SIZE, MREMAP_MAYMOVE|MREMAP_FIXED, p2); if (p3 == MAP_FAILED) { printf("mremap() failed; first char is 0x%02hhx\n", *p1); } else { printf("mremap() succeeded; first char is 0x%02hhx\n", *p3); } } user@horn:~/big_mremap$ gcc -static -o test test.c user@horn:~/big_mremap$ setarch -R ./test first char is 0x41 mremap() failed; first char is 0x00 ``` With the patch: ``` user@horn:~/big_mremap$ setarch -R ./test first char is 0x41 mremap() succeeded; first char is 0x41 ``` --- mm/mremap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/mremap.c b/mm/mremap.c index dda09e957a5d4c2546934b796e862e5e0213b311..dee98ff2bbd64439200dddac16c4bd054537c2ed 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -648,7 +648,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma, * Prevent negative return values when {old,new}_addr was realigned * but we broke out of the above loop for the first PMD itself. */ - if (len + old_addr < old_end) + if (old_addr < old_end - len) return 0; return len + old_addr - old_end; /* how much done */ --- base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623 change-id: 20241111-fix-mremap-32bit-wrap-747105730f20 -- Jann Horn <jannh(a)google.com>

10 months, 2 weeks

5
4
0 0

[PATCH AUTOSEL 4.19] proc/softirqs: replace seq_printf with seq_put_decimal_ull_width

by Sasha Levin

From: David Wang <00107082(a)163.com> [ Upstream commit 84b9749a3a704dcc824a88aa8267247c801d51e4 ] seq_printf is costy, on a system with n CPUs, reading /proc/softirqs would yield 10*n decimal values, and the extra cost parsing format string grows linearly with number of cpus. Replace seq_printf with seq_put_decimal_ull_width have significant performance improvement. On an 8CPUs system, reading /proc/softirqs show ~40% performance gain with this patch. Signed-off-by: David Wang <00107082(a)163.com> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- fs/proc/softirqs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/proc/softirqs.c b/fs/proc/softirqs.c index 12901dcf57e2b..d8f4e7d54d002 100644 --- a/fs/proc/softirqs.c +++ b/fs/proc/softirqs.c @@ -19,7 +19,7 @@ static int show_softirqs(struct seq_file *p, void *v) for (i = 0; i < NR_SOFTIRQS; i++) { seq_printf(p, "%12s:", softirq_to_name[i]); for_each_possible_cpu(j) - seq_printf(p, " %10u", kstat_softirqs_cpu(i, j)); + seq_put_decimal_ull_width(p, " ", kstat_softirqs_cpu(i, j), 10); seq_putc(p, '\n'); } return 0; -- 2.43.0

10 months, 2 weeks

1
0
0 0

[PATCH AUTOSEL 5.4 1/5] soc: qcom: Add check devm_kasprintf() returned value

by Sasha Levin

From: Charles Han <hanchunchao(a)inspur.com> [ Upstream commit e694d2b5c58ba2d1e995d068707c8d966e7f5f2a ] devm_kasprintf() can return a NULL pointer on failure but this returned value in qcom_socinfo_probe() is not checked. Signed-off-by: Charles Han <hanchunchao(a)inspur.com> Link: https://lore.kernel.org/r/20240929072349.202520-1-hanchunchao@inspur.com Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/soc/qcom/socinfo.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/soc/qcom/socinfo.c b/drivers/soc/qcom/socinfo.c index 3303bcaf67154..8a9f781ba83f7 100644 --- a/drivers/soc/qcom/socinfo.c +++ b/drivers/soc/qcom/socinfo.c @@ -433,10 +433,16 @@ static int qcom_socinfo_probe(struct platform_device *pdev) qs->attr.revision = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u.%u", SOCINFO_MAJOR(le32_to_cpu(info->ver)), SOCINFO_MINOR(le32_to_cpu(info->ver))); - if (offsetof(struct socinfo, serial_num) <= item_size) + if (!qs->attr.soc_id || qs->attr.revision) + return -ENOMEM; + + if (offsetof(struct socinfo, serial_num) <= item_size) { qs->attr.serial_number = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u", le32_to_cpu(info->serial_num)); + if (!qs->attr.serial_number) + return -ENOMEM; + } qs->soc_dev = soc_device_register(&qs->attr); if (IS_ERR(qs->soc_dev)) -- 2.43.0

10 months, 2 weeks

1
4
0 0

[PATCH AUTOSEL 5.15 1/8] soc: qcom: Add check devm_kasprintf() returned value

by Sasha Levin

From: Charles Han <hanchunchao(a)inspur.com> [ Upstream commit e694d2b5c58ba2d1e995d068707c8d966e7f5f2a ] devm_kasprintf() can return a NULL pointer on failure but this returned value in qcom_socinfo_probe() is not checked. Signed-off-by: Charles Han <hanchunchao(a)inspur.com> Link: https://lore.kernel.org/r/20240929072349.202520-1-hanchunchao@inspur.com Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/soc/qcom/socinfo.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/soc/qcom/socinfo.c b/drivers/soc/qcom/socinfo.c index 5beb452f24013..491f33973aa0c 100644 --- a/drivers/soc/qcom/socinfo.c +++ b/drivers/soc/qcom/socinfo.c @@ -614,10 +614,16 @@ static int qcom_socinfo_probe(struct platform_device *pdev) qs->attr.revision = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u.%u", SOCINFO_MAJOR(le32_to_cpu(info->ver)), SOCINFO_MINOR(le32_to_cpu(info->ver))); - if (offsetof(struct socinfo, serial_num) <= item_size) + if (!qs->attr.soc_id || qs->attr.revision) + return -ENOMEM; + + if (offsetof(struct socinfo, serial_num) <= item_size) { qs->attr.serial_number = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u", le32_to_cpu(info->serial_num)); + if (!qs->attr.serial_number) + return -ENOMEM; + } qs->soc_dev = soc_device_register(&qs->attr); if (IS_ERR(qs->soc_dev)) -- 2.43.0

10 months, 2 weeks

1
7
0 0

[PATCH AUTOSEL 6.1 01/12] soc: qcom: Add check devm_kasprintf() returned value

by Sasha Levin

From: Charles Han <hanchunchao(a)inspur.com> [ Upstream commit e694d2b5c58ba2d1e995d068707c8d966e7f5f2a ] devm_kasprintf() can return a NULL pointer on failure but this returned value in qcom_socinfo_probe() is not checked. Signed-off-by: Charles Han <hanchunchao(a)inspur.com> Link: https://lore.kernel.org/r/20240929072349.202520-1-hanchunchao@inspur.com Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/soc/qcom/socinfo.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/soc/qcom/socinfo.c b/drivers/soc/qcom/socinfo.c index aa37e1bad095c..66219ccd8d47f 100644 --- a/drivers/soc/qcom/socinfo.c +++ b/drivers/soc/qcom/socinfo.c @@ -649,10 +649,16 @@ static int qcom_socinfo_probe(struct platform_device *pdev) qs->attr.revision = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u.%u", SOCINFO_MAJOR(le32_to_cpu(info->ver)), SOCINFO_MINOR(le32_to_cpu(info->ver))); - if (offsetof(struct socinfo, serial_num) <= item_size) + if (!qs->attr.soc_id || qs->attr.revision) + return -ENOMEM; + + if (offsetof(struct socinfo, serial_num) <= item_size) { qs->attr.serial_number = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u", le32_to_cpu(info->serial_num)); + if (!qs->attr.serial_number) + return -ENOMEM; + } qs->soc_dev = soc_device_register(&qs->attr); if (IS_ERR(qs->soc_dev)) -- 2.43.0

10 months, 2 weeks

1
11
0 0

[PATCH AUTOSEL 6.6 01/15] soc: qcom: Add check devm_kasprintf() returned value

by Sasha Levin

From: Charles Han <hanchunchao(a)inspur.com> [ Upstream commit e694d2b5c58ba2d1e995d068707c8d966e7f5f2a ] devm_kasprintf() can return a NULL pointer on failure but this returned value in qcom_socinfo_probe() is not checked. Signed-off-by: Charles Han <hanchunchao(a)inspur.com> Link: https://lore.kernel.org/r/20240929072349.202520-1-hanchunchao@inspur.com Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/soc/qcom/socinfo.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/soc/qcom/socinfo.c b/drivers/soc/qcom/socinfo.c index 880b41a57da01..f979ef420354f 100644 --- a/drivers/soc/qcom/socinfo.c +++ b/drivers/soc/qcom/socinfo.c @@ -757,10 +757,16 @@ static int qcom_socinfo_probe(struct platform_device *pdev) qs->attr.revision = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u.%u", SOCINFO_MAJOR(le32_to_cpu(info->ver)), SOCINFO_MINOR(le32_to_cpu(info->ver))); - if (offsetof(struct socinfo, serial_num) <= item_size) + if (!qs->attr.soc_id || qs->attr.revision) + return -ENOMEM; + + if (offsetof(struct socinfo, serial_num) <= item_size) { qs->attr.serial_number = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u", le32_to_cpu(info->serial_num)); + if (!qs->attr.serial_number) + return -ENOMEM; + } qs->soc_dev = soc_device_register(&qs->attr); if (IS_ERR(qs->soc_dev)) -- 2.43.0

10 months, 2 weeks

1
14
0 0

[PATCH AUTOSEL 6.11 01/16] soc: qcom: Add check devm_kasprintf() returned value

by Sasha Levin

From: Charles Han <hanchunchao(a)inspur.com> [ Upstream commit e694d2b5c58ba2d1e995d068707c8d966e7f5f2a ] devm_kasprintf() can return a NULL pointer on failure but this returned value in qcom_socinfo_probe() is not checked. Signed-off-by: Charles Han <hanchunchao(a)inspur.com> Link: https://lore.kernel.org/r/20240929072349.202520-1-hanchunchao@inspur.com Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/soc/qcom/socinfo.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/soc/qcom/socinfo.c b/drivers/soc/qcom/socinfo.c index d7359a235e3cf..1d5a69eda26e5 100644 --- a/drivers/soc/qcom/socinfo.c +++ b/drivers/soc/qcom/socinfo.c @@ -782,10 +782,16 @@ static int qcom_socinfo_probe(struct platform_device *pdev) qs->attr.revision = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u.%u", SOCINFO_MAJOR(le32_to_cpu(info->ver)), SOCINFO_MINOR(le32_to_cpu(info->ver))); - if (offsetof(struct socinfo, serial_num) <= item_size) + if (!qs->attr.soc_id || qs->attr.revision) + return -ENOMEM; + + if (offsetof(struct socinfo, serial_num) <= item_size) { qs->attr.serial_number = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%u", le32_to_cpu(info->serial_num)); + if (!qs->attr.serial_number) + return -ENOMEM; + } qs->soc_dev = soc_device_register(&qs->attr); if (IS_ERR(qs->soc_dev)) -- 2.43.0

10 months, 2 weeks

1
15
0 0

[PATCH 5.10] io_uring: fix possible deadlock in io_register_iowq_max_workers()

by Hagar Hemdan

commit 73254a297c2dd094abec7c9efee32455ae875bdf upstream. The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <mheyne(a)amazon.de> Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> [Hagar: Modified to apply on v5.10] --- io_uring/io_uring.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index f1ab0cd98727..3dbc704c7001 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -10818,8 +10818,10 @@ static int io_register_iowq_max_workers(struct io_ring_ctx *ctx, } if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); } if (copy_to_user(arg, new_count, sizeof(new_count))) @@ -10844,8 +10846,11 @@ static int io_register_iowq_max_workers(struct io_ring_ctx *ctx, return 0; err: if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); + } return ret; } -- 2.40.1

10 months, 2 weeks

1
0
0 0

[PATCH 5.15] io_uring: fix possible deadlock in io_register_iowq_max_workers()

by Hagar Hemdan

commit 73254a297c2dd094abec7c9efee32455ae875bdf upstream. The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <mheyne(a)amazon.de> Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> [Hagar: Modified to apply on v5.15] Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> --- io_uring/io_uring.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index f1ab0cd98727..3dbc704c7001 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -10818,8 +10818,10 @@ static int io_register_iowq_max_workers(struct io_ring_ctx *ctx, } if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); } if (copy_to_user(arg, new_count, sizeof(new_count))) @@ -10844,8 +10846,11 @@ static int io_register_iowq_max_workers(struct io_ring_ctx *ctx, return 0; err: if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); + } return ret; } -- 2.40.1

10 months, 2 weeks

1
0
0 0

[PATCH 6.1] io_uring: fix possible deadlock in io_register_iowq_max_workers()

by Hagar Hemdan

commit 73254a297c2dd094abec7c9efee32455ae875bdf upstream. The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <mheyne(a)amazon.de> Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> [Hagar: Modified to apply on v6.1] Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> --- io_uring/io_uring.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 92c1aa8f3501..4f0ae938b146 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3921,8 +3921,10 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx, } if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); } if (copy_to_user(arg, new_count, sizeof(new_count))) @@ -3947,8 +3949,11 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx, return 0; err: if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); + } return ret; } -- 2.40.1

10 months, 2 weeks

1
0
0 0

[PATCH 6.6] io_uring: fix possible deadlock in io_register_iowq_max_workers()

by Hagar Hemdan

commit 73254a297c2dd094abec7c9efee32455ae875bdf upstream. The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <mheyne(a)amazon.de> Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> [Hagar: Modified to apply on v6.6] Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> --- io_uring/io_uring.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 484c9bcbee77..70dd6a5b9647 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -4358,8 +4358,10 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx, } if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); } if (copy_to_user(arg, new_count, sizeof(new_count))) @@ -4384,8 +4386,11 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx, return 0; err: if (sqd) { + mutex_unlock(&ctx->uring_lock); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); + mutex_lock(&ctx->uring_lock); + } return ret; } -- 2.40.1

10 months, 2 weeks

1
0
0 0

[PATCH] x86/efi: Apply EFI Memory Attributes after kexec

by Nicolas Saenz Julienne

Kexec bypasses EFI's switch to virtual mode. In exchange, it has its own routine, kexec_enter_virtual_mode(), that replays the mappings made by the original kernel. Unfortunately, the function fails to reinstate EFI's memory attributes and runtime memory protections, which would've otherwise been set after entering virtual mode. Remediate this by calling efi_runtime_update_mappings() from it. Cc: stable(a)vger.kernel.org Fixes: 18141e89a76c ("x86/efi: Add support for EFI_MEMORY_ATTRIBUTES_TABLE") Signed-off-by: Nicolas Saenz Julienne <nsaenz(a)amazon.com> --- Notes: - I tested the Memory Attributes path using QEMU/OVMF. - Although care is taken to make sure the memory backing the EFI Memory Attributes table is preserved during runtime and reachable after kexec (see efi_memattr_init()). I don't see the same happening for the EFI properties table. Maybe it's just unnecessary as there's an assumption that the table will fall in memory preserved during runtime? Or for another reason? Otherwise, we'd need to make sure it isn't possible to set EFI_NX_PE_DATA on kexec. arch/x86/platform/efi/efi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 88a96816de9a..b9b17892c495 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -784,6 +784,7 @@ static void __init kexec_enter_virtual_mode(void) efi_sync_low_kernel_mappings(); efi_native_runtime_setup(); + efi_runtime_update_mappings(); #endif } -- 2.40.1

10 months, 2 weeks

2
2
0 0

[PATCH 6.11 000/249] 6.11.7-rc2 review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 6.11.7 release. There are 249 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sat, 09 Nov 2024 06:45:18 +0000. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.11.7-rc2… or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.11.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 6.11.7-rc2 Uladzislau Rezki (Sony) <urezki(a)gmail.com> rcu/kvfree: Refactor kvfree_rcu_queue_batch() Florian Westphal <fw(a)strlen.de> lib: alloc_tag_module_unload must wait for pending kfree_rcu calls Uladzislau Rezki (Sony) <urezki(a)gmail.com> rcu/kvfree: Add kvfree_rcu_barrier() API Conor Dooley <conor.dooley(a)microchip.com> RISC-V: disallow gcc + rust builds David Sterba <dsterba(a)suse.com> MIPS: export __cmpxchg_small() Alex Deucher <alexander.deucher(a)amd.com> drm/amdgpu: handle default profile on on devices without fullscreen 3D Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Sequential field availability check in mi_enum_attr() Alex Deucher <alexander.deucher(a)amd.com> drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs Alex Deucher <alexander.deucher(a)amd.com> drm/amdgpu/swsmu: fix ordering for setting workload_mask Tejas Upadhyay <tejas.upadhyay(a)intel.com> drm/xe: Write all slices if its mcr register Tejas Upadhyay <tejas.upadhyay(a)intel.com> drm/xe: Define STATELESS_COMPRESSION_CTRL as mcr register Shekhar Chauhan <shekhar.chauhan(a)intel.com> drm/xe/xe2: Add performance turning changes Akshata Jahagirdar <akshata.jahagirdar(a)intel.com> drm/xe/xe2: Introduce performance changes Sai Teja Pottumuttu <sai.teja.pottumuttu(a)intel.com> drm/xe/xe2hpg: Introduce performance tuning changes for Xe2_HPG Tejas Upadhyay <tejas.upadhyay(a)intel.com> drm/xe: Move enable host l2 VRAM post MCR init Tejas Upadhyay <tejas.upadhyay(a)intel.com> drm/xe/xe2hpg: Add Wa_15016589081 Thomas Zimmermann <tzimmermann(a)suse.de> drm/xe: Support 'nomodeset' kernel command-line option Juha-Pekka Heikkila <juhapekka.heikkila(a)gmail.com> drm/i915/display: Don't enable decompression on Xe2 with Tile4 Jouni Högander <jouni.hogander(a)intel.com> drm/i915/psr: Prevent Panel Replay if CRC calculation is enabled Jani Nikula <jani.nikula(a)intel.com> drm/xe/display: drop unused rawclk_freq and RUNTIME_INFO() Jani Nikula <jani.nikula(a)intel.com> drm/i915: move rawclk from runtime to display runtime info Suraj Kandpal <suraj.kandpal(a)intel.com> drm/i915/pps: Disable DPLS_GATING around pps sequence Mitul Golani <mitulkumar.ajitkumar.golani(a)intel.com> drm/i915/display/dp: Compute AS SDP when vrr is also enabled Suraj Kandpal <suraj.kandpal(a)intel.com> drm/i915/dp: Clear VSC SDP during post ddi disable routine Suraj Kandpal <suraj.kandpal(a)intel.com> drm/i915/hdcp: Add encoder check in hdcp2_get_capability Suraj Kandpal <suraj.kandpal(a)intel.com> drm/i915/hdcp: Add encoder check in intel_hdcp_get_capability Mitul Golani <mitulkumar.ajitkumar.golani(a)intel.com> drm/i915/display: WA for Re-initialize dispcnlunitt1 xosc clock Mitul Golani <mitulkumar.ajitkumar.golani(a)intel.com> drm/i915/display: Cache adpative sync caps to use it later Matthew Auld <matthew.auld(a)intel.com> drm/i915: disable fbc due to Wa_16023588340 Gustavo Sousa <gustavo.sousa(a)intel.com> drm/i915: Skip programming FIA link enable bits for MTL+ Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100: fix PCIe4 and PCIe6a PHY clocks Abel Vesa <abel.vesa(a)linaro.org> arm64: dts: qcom: x1e80100: Add Broadcast_AND region in LLCC block Haibo Chen <haibo.chen(a)nxp.com> arm64: dts: imx8ulp: correct the flexspi compatible string Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-crd: fix nvme regulator boot glitch Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-qcp: fix nvme regulator boot glitch Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100: fix PCIe4 interconnect Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-vivobook-s15: fix nvme regulator boot glitch Konrad Dybcio <konradybcio(a)kernel.org> arm64: dts: qcom: x1e80100: Fix up BAR spaces Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-yoga-slim7x: fix nvme regulator boot glitch Fabien Parent <fabien.parent(a)linaro.org> arm64: dts: qcom: msm8939: revert use of APCS mbox for RPM Conor Dooley <conor.dooley(a)microchip.com> riscv: dts: starfive: disable unused csi/camss nodes E Shattow <e(a)freeshell.de> riscv: dts: starfive: Update ethernet phy0 delay parameter values for Star64 Yu Zhao <yuzhao(a)google.com> mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() Zhiguo Jiang <justinjiang(a)vivo.com> mm: shrink skip folio mapped by an exiting process Yu Zhao <yuzhao(a)google.com> mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Yuanchu Xie <yuanchu(a)google.com> mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true Dmitry Torokhov <dmitry.torokhov(a)gmail.com> Input: fix regression when re-registering input handlers Vlastimil Babka <vbabka(a)suse.cz> mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes Gregory Price <gourry(a)gourry.net> vmscan,migrate: fix page count imbalance on node stats when demoting pages Johan Hovold <johan+linaro(a)kernel.org> gpiolib: fix debugfs dangling chip separator Johan Hovold <johan+linaro(a)kernel.org> gpiolib: fix debugfs newline separators Filipe Manana <fdmanana(a)suse.com> btrfs: fix defrag not merging contiguous extents due to merged extent maps Filipe Manana <fdmanana(a)suse.com> btrfs: fix extent map merging not happening for adjacent extents Jens Axboe <axboe(a)kernel.dk> io_uring/rw: fix missing NOWAIT check for O_DIRECT start write Matthew Brost <matthew.brost(a)intel.com> drm/xe: Don't short circuit TDR on jobs not started Matthew Brost <matthew.brost(a)intel.com> drm/xe: Add mmio read before GGTT invalidate Michal Wajdeczko <michal.wajdeczko(a)intel.com> drm/xe: Kill regs/xe_sriov_regs.h Michal Wajdeczko <michal.wajdeczko(a)intel.com> drm/xe: Fix register definition order in xe_regs.h Jinjie Ruan <ruanjinjie(a)huawei.com> drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic() Jinjie Ruan <ruanjinjie(a)huawei.com> drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic() Jinjie Ruan <ruanjinjie(a)huawei.com> drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic() Andrey Konovalov <andreyknvl(a)gmail.com> kasan: remove vmalloc_percpu test Keith Busch <kbusch(a)kernel.org> nvme: re-fix error-handling for io_uring nvme-passthrough Vitaliy Shevtsov <v.shevtsov(a)maxima.ru> nvmet-auth: assign dh_key to NULL after kfree_sensitive Christoffer Sandberg <cs(a)tuxedo.de> ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1 Christoffer Sandberg <cs(a)tuxedo.de> ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3 Christoph Hellwig <hch(a)lst.de> xfs: fix finding a last resort AG in xfs_filestream_pick_ag Andrzej Kacprowski <Andrzej.Kacprowski(a)intel.com> accel/ivpu: Fix NOC firewall interrupt handling Zhihao Cheng <chengzhihao1(a)huawei.com> btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() Matt Johnston <matt(a)codeconstruct.com.au> mctp i2c: handle NULL header address Gregory Price <gourry(a)gourry.net> resource,kexec: walk_system_ram_res_rev must retain resource flags Edward Adam Davis <eadavis(a)qq.com> ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow Sabyrzhan Tasbolatov <snovitoll(a)gmail.com> x86/traps: move kmsan check after instrumentation_begin Gatlin Newhouse <gatlin.newhouse(a)gmail.com> x86/traps: Enable UBSAN traps on x86 Matt Fleming <mfleming(a)cloudflare.com> mm/page_alloc: let GFP_ATOMIC order-0 allocs access highatomic reserves Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> fork: only invoke khugepaged, ksm hooks if no error Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> fork: do not invoke uffd on fork if error occurs Alexander Usyskin <alexander.usyskin(a)intel.com> mei: use kvmalloc for read buffer Matthieu Baerts (NGI0) <matttbe(a)kernel.org> mptcp: init: protect sched with rcu_read_lock Jarkko Sakkinen <jarkko(a)kernel.org> tpm: Lazily flush the auth session Alex Deucher <alexander.deucher(a)amd.com> drm/amdgpu/smu13: fix profile reporting Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> drm/amd/pm: Vangogh: Fix kernel memory out of bounds write Jarkko Sakkinen <jarkko(a)kernel.org> tpm: Rollback tpm2_load_null() Jarkko Sakkinen <jarkko(a)kernel.org> tpm: Return tpm2_sessions_init() when null key creation fails Hugh Dickins <hughd(a)google.com> iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP Benjamin Segall <bsegall(a)google.com> posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone Shawn Wang <shawnwang(a)linux.alibaba.com> sched/numa: Fix the potential null pointer dereference in task_numa_work() Dan Williams <dan.j.williams(a)intel.com> cxl/acpi: Ensure ports ready at cxl_acpi_probe() return Dan Williams <dan.j.williams(a)intel.com> cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() Peter Wang <peter.wang(a)mediatek.com> scsi: ufs: core: Fix another deadlock during RTC update Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> riscv: Remove duplicated GET_RM Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> riscv: Remove unused GENERATING_ASM_OFFSETS WangYuli <wangyuli(a)uniontech.com> riscv: Use '%u' to format the output of 'cpu' Miquel Sabaté Solà <mikisabate(a)gmail.com> riscv: Prevent a bad reference count on CPU nodes Heinrich Schuchardt <heinrich.schuchardt(a)canonical.com> riscv: efi: Set NX compat flag in PE/COFF header Kailang Yang <kailang(a)realtek.com> ALSA: hda/realtek: Limit internal Mic boost on Dell platform Dmitry Torokhov <dmitry.torokhov(a)gmail.com> Input: edt-ft5x06 - fix regmap leak when probe fails Alexandre Ghiti <alexghiti(a)rivosinc.com> riscv: vdso: Prevent the compiler from inserting calls to memset() Frank Li <Frank.Li(a)nxp.com> spi: spi-fsl-dspi: Fix crash when not using GPIO chip select Naohiro Aota <naohiro.aota(a)wdc.com> btrfs: fix error propagation of split bios Qu Wenruo <wqu(a)suse.com> btrfs: merge btrfs_orig_bbio_end_io() into btrfs_bio_end_io() Richard Zhu <hongxing.zhu(a)nxp.com> phy: freescale: imx8m-pcie: Do CMN_RST just before PHY PLL lock check Chen Ridong <chenridong(a)huawei.com> cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction Xinyu Zhang <xizhang(a)purestorage.com> block: fix sanity checks in blk_rq_map_user_bvec Ben Chuang <ben.chuang(a)genesyslogic.com.tw> mmc: sdhci-pci-gli: GL9767: Fix low power mode in the SD Express process Ben Chuang <ben.chuang(a)genesyslogic.com.tw> mmc: sdhci-pci-gli: GL9767: Fix low power mode on the set clock function Dan Williams <dan.j.williams(a)intel.com> cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams <dan.j.williams(a)intel.com> cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com> soc: qcom: pmic_glink: Handle GLINK intent allocation rejections Gil Fine <gil.fine(a)linux.intel.com> thunderbolt: Honor TMU requirements in the domain when setting TMU mode Mika Westerberg <mika.westerberg(a)linux.intel.com> thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan() Conor Dooley <conor.dooley(a)microchip.com> firmware: microchip: auto-update: fix poll_complete() to not report spurious timeout errors Chen Ridong <chenridong(a)huawei.com> mm: shrinker: avoid memleak in alloc_shrinker_info Wladislav Wiebe <wladislav.kw(a)gmail.com> tools/mm: -Werror fixes in page-types/slabinfo Jeongjun Park <aha310510(a)gmail.com> mm: shmem: fix data-race in shmem_getattr() Yunhui Cui <cuiyunhui(a)bytedance.com> RISC-V: ACPI: fix early_ioremap to early_memremap Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix potential deadlock with newly created symlinks Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix kernel bug due to missing clearing of checked flag Javier Carrasco <javier.carrasco.cruz(a)gmail.com> iio: light: veml6030: fix microlux value calculation Jinjie Ruan <ruanjinjie(a)huawei.com> iio: gts-helper: Fix memory leaks in iio_gts_build_avail_scale_table() Jinjie Ruan <ruanjinjie(a)huawei.com> iio: gts-helper: Fix memory leaks for the error path of iio_gts_build_avail_scale_table() Zicheng Qu <quzicheng(a)huawei.com> iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr() Julien Stephan <jstephan(a)baylibre.com> dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply Zicheng Qu <quzicheng(a)huawei.com> staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg() Johannes Berg <johannes.berg(a)intel.com> wifi: iwlwifi: mvm: fix 6 GHz scan construction Ville Syrjälä <ville.syrjala(a)linux.intel.com> wifi: iwlegacy: Clear stale interrupts before resuming device Johannes Berg <johannes.berg(a)intel.com> wifi: cfg80211: clear wdev->cqm_config pointer on free Manikanta Pubbisetty <quic_mpubbise(a)quicinc.com> wifi: ath10k: Fix memory leak in management tx Felix Fietkau <nbd(a)nbd.name> wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower Edward Liaw <edliaw(a)google.com> Revert "selftests/mm: replace atomic_bool with pthread_barrier_t" Edward Liaw <edliaw(a)google.com> Revert "selftests/mm: fix deadlock for fork after pthread_create on ARM" Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Revert "driver core: Fix uevent_show() vs driver detach race" Basavaraj Natikar <Basavaraj.Natikar(a)amd.com> xhci: Use pm_runtime_get to prevent RPM on unsupported systems Faisal Hassan <quic_faisalh(a)quicinc.com> xhci: Fix Link TRB DMA in command ring stopped completion event Johan Hovold <johan+linaro(a)kernel.org> phy: qcom: qmp-usbc: fix NULL-deref on runtime suspend Johan Hovold <johan+linaro(a)kernel.org> phy: qcom: qmp-usb-legacy: fix NULL-deref on runtime suspend Johan Hovold <johan+linaro(a)kernel.org> phy: qcom: qmp-usb: fix NULL-deref on runtime suspend Javier Carrasco <javier.carrasco.cruz(a)gmail.com> usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path Javier Carrasco <javier.carrasco.cruz(a)gmail.com> usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes Amit Sunil Dhamne <amitsd(a)google.com> usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices Javier Carrasco <javier.carrasco.cruz(a)gmail.com> usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes() Zijun Hu <quic_zijuhu(a)quicinc.com> usb: phy: Fix API devm_usb_put_phy() can not release the phy Zongmin Zhou <zhouzongmin(a)kylinos.cn> usbip: tools: Fix detach_port() invalid port error path Bitterblue Smith <rtl8821cerfe2(a)gmail.com> wifi: rtlwifi: rtl8192du: Don't claim USB ID 0bda:8171 Jan Schär <jan(a)jschaer.ch> ALSA: usb-audio: Add quirks for Dell WD19 dock Chuck Lever <chuck.lever(a)oracle.com> rpcrdma: Always release the rpcrdma_device's xa_array Chuck Lever <chuck.lever(a)oracle.com> NFSD: Never decrement pending_async_copies on error Chuck Lever <chuck.lever(a)oracle.com> NFSD: Initialize struct nfsd4_copy earlier Dimitri Sivanich <sivanich(a)hpe.com> misc: sgi-gru: Don't disable preemption in GRU driver Dai Ngo <dai.ngo(a)oracle.com> NFS: remove revoked delegation from server's delegation list Daniel Palmer <daniel(a)0x0f.com> net: amd: mvme147: Fix probe banner message Zhang Rui <rui.zhang(a)intel.com> thermal: intel: int340x: processor: Add MMIO RAPL PL4 support Zhang Rui <rui.zhang(a)intel.com> thermal: intel: int340x: processor: Remove MMIO RAPL CPU hotplug support Sumeet Pawnikar <sumeet.r.pawnikar(a)intel.com> powercap: intel_rapl_msr: Add PL4 support for Arrowlake-U Hans de Goede <hdegoede(a)redhat.com> ACPI: resource: Fold Asus Vivobook Pro N6506M* DMI quirks together Pali Rohár <pali(a)kernel.org> cifs: Fix creating native symlinks pointing to current or parent directory Pali Rohár <pali(a)kernel.org> cifs: Improve creating native symlinks pointing to directory Benjamin Marzinski <bmarzins(a)redhat.com> scsi: scsi_transport_fc: Allow setting rport state to current state Guilherme Giacomo Simoes <trintaeoitogc(a)gmail.com> rust: device: change the from_raw() function Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Additional check in ntfs_file_release Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Fix general protection fault in run_is_mapped_full Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Additional check in ni_clear() Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Fix possible deadlock in mi_read Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Add rough attr alloc_size check Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Stale inode instead of bad Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Fix warning possible deadlock in ntfs_set_state Andrew Ballance <andrewjballance(a)gmail.com> fs/ntfs3: Check if more than chunk-size bytes are written lei lu <llfamsec(a)gmail.com> ntfs3: Add bounds checking to mi_enum_attr() Boris Brezillon <boris.brezillon(a)collabora.com> drm/panthor: Report group as timedout when we fail to properly suspend Boris Brezillon <boris.brezillon(a)collabora.com> drm/panthor: Fail job creation when the group is dead Boris Brezillon <boris.brezillon(a)collabora.com> drm/panthor: Fix firmware initialization on systems with a page size > 4k Keith Busch <kbusch(a)kernel.org> nvme: module parameter to disable pi with offsets Jason Gunthorpe <jgg(a)ziepe.ca> PCI: Fix pci_enable_acs() support for the ACS quirks Shiju Jose <shiju.jose(a)huawei.com> cxl/events: Fix Trace DRAM Event Record Dan Carpenter <dan.carpenter(a)linaro.org> drm/tegra: Fix NULL vs IS_ERR() check in probe() Dan Carpenter <dan.carpenter(a)linaro.org> drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy() Chun-Kuang Hu <chunkuang.hu(a)kernel.org> drm/mediatek: Use cmdq_pkt_create() and cmdq_pkt_destroy() Liankun Yang <liankun.yang(a)mediatek.com> drm/mediatek: Fix get efuse issue for MT8188 DPTX Hsin-Te Yuan <yuanhsinte(a)chromium.org> drm/mediatek: Fix color format MACROs in OVL Jason-JH.Lin <jason-jh.lin(a)mediatek.com> drm/mediatek: ovl: Remove the color format comment for ovl_fmt_convert() Paulo Alcantara <pc(a)manguebit.com> smb: client: set correct device number on nfs reparse points Paulo Alcantara <pc(a)manguebit.com> smb: client: fix parsing of device numbers Andy Shevchenko <andriy.shevchenko(a)linux.intel.com> gpio: sloppy-logic-analyzer: Check for error code from devm_mutex_init() call Pierre Gondois <pierre.gondois(a)arm.com> ACPI: CPPC: Make rmw_lock a raw_spin_lock David Howells <dhowells(a)redhat.com> afs: Fix missing subdir edit when renamed between parent dirs Xiongfeng Wang <wangxiongfeng2(a)huawei.com> firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() Marco Elver <elver(a)google.com> kasan: Fix Software Tag-Based KASAN with GCC Christoph Hellwig <hch(a)lst.de> iomap: turn iomap_want_unshare_iter into an inline function Darrick J. Wong <djwong(a)kernel.org> fsdax: dax_unshare_iter needs to copy entire blocks Darrick J. Wong <djwong(a)kernel.org> fsdax: remove zeroing code from dax_unshare_iter Darrick J. Wong <djwong(a)kernel.org> iomap: share iomap_unshare_iter predicate code with fsdax Darrick J. Wong <djwong(a)kernel.org> iomap: don't bother unsharing delalloc extents Christoph Hellwig <hch(a)lst.de> iomap: improve shared block detection in iomap_unshare_iter Toke Høiland-Jørgensen <toke(a)redhat.com> bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled Pablo Neira Ayuso <pablo(a)netfilter.org> netfilter: nft_payload: sanitize offset and length before calling skb_checksum() Daniel Golle <daniel(a)makrotopia.org> net: ethernet: mtk_wed: fix path of MT7988 WO firmware Ido Schimmel <idosch(a)nvidia.com> mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address Amit Cohen <amcohen(a)nvidia.com> mlxsw: pci: Sync Rx buffers for device Amit Cohen <amcohen(a)nvidia.com> mlxsw: pci: Sync Rx buffers for CPU Amit Cohen <amcohen(a)nvidia.com> mlxsw: spectrum_ptp: Add missing verification before pushing Tx header Benoît Monin <benoit.monin(a)gmx.fr> net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension Hou Tao <houtao1(a)huawei.com> bpf: Check the validity of nr_words in bpf_iter_bits_new() Hou Tao <houtao1(a)huawei.com> bpf: Add bpf_mem_alloc_check_size() helper Hou Tao <houtao1(a)huawei.com> bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() Sungwoo Kim <iam(a)sung-woo.kim> Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs Eric Dumazet <edumazet(a)google.com> netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6() Dong Chenchen <dongchenchen2(a)huawei.com> netfilter: Fix use-after-free in get_info() Wang Liang <wangliang74(a)huawei.com> net: fix crash when config small gso_max_size/gso_ipv4_max_size Byeonguk Jeong <jungbu2855(a)gmail.com> bpf: Fix out-of-bounds write in trie_get_next_key() Vladimir Oltean <vladimir.oltean(a)nxp.com> net/sched: sch_api: fix xa_insert() error path in tcf_block_get_ext() Zichen Xie <zichenxie0106(a)gmail.com> netdevsim: Add trailing zero to terminate the string in nsim_nexthop_bucket_activity_write() Eduard Zingerman <eddyz87(a)gmail.com> bpf: Force checkpoint when jmp history is too long Pedro Tammela <pctammela(a)mojatatu.com> net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT Pablo Neira Ayuso <pablo(a)netfilter.org> gtp: allow -1 to be specified as file description from userspace Ido Schimmel <idosch(a)nvidia.com> ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find() Ido Schimmel <idosch(a)nvidia.com> ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_init_flow() Arkadiusz Kubalewski <arkadiusz.kubalewski(a)intel.com> ice: fix crash on probe for DPLL enabled E810 LOM Arkadiusz Kubalewski <arkadiusz.kubalewski(a)intel.com> ice: add callbacks for Embedded SYNC enablement on dpll pins Arkadiusz Kubalewski <arkadiusz.kubalewski(a)intel.com> dpll: add Embedded SYNC feature for a pin Wander Lairson Costa <wander(a)redhat.com> igb: Disable threaded IRQ for igb_msix_other Furong Xu <0x1207(a)gmail.com> net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data Ley Foon Tan <leyfoon.tan(a)starfivetech.com> net: stmmac: dwmac4: Fix high address display by updating reg_space[] from register values Cong Wang <cong.wang(a)bytedance.com> sock_map: fix a NULL pointer dereference in sock_map_link_update_prog() Aleksei Vetrov <vvvvvv(a)google.com> ASoC: dapm: fix bounds checker error in dapm_widget_list_create Jianbo Liu <jianbol(a)nvidia.com> macsec: Fix use-after-free while sending the offloading packet Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> ASoC: cs42l51: Fix some error handling paths in cs42l51_probe() Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> Revert "wifi: iwlwifi: remove retry loops in start" Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> wifi: iwlwifi: mvm: don't add default link in fw restart flow Daniel Gabay <daniel.gabay(a)intel.com> wifi: iwlwifi: mvm: Fix response handling in iwl_mvm_send_recovery_cmd() Miri Korenblit <miriam.rachel.korenblit(a)intel.com> wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> wifi: iwlwifi: mvm: don't leak a link on AP removal Selvin Xavier <selvin.xavier(a)broadcom.com> RDMA/bnxt_re: synchronize the qp-handle table array Selvin Xavier <selvin.xavier(a)broadcom.com> RDMA/bnxt_re: Fix the usage of control path spin locks Patrisious Haddad <phaddad(a)nvidia.com> RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of down Leon Romanovsky <leon(a)kernel.org> RDMA/cxgb4: Dump vendor specific QP details Geert Uytterhoeven <geert(a)linux-m68k.org> wifi: brcm80211: BRCM_TRACING should depend on TRACING Ping-Ke Shih <pkshih(a)realtek.com> wifi: rtw89: pci: early chips only enable 36-bit DMA on specific PCI hosts Remi Pommarel <repk(a)triplefau.lt> wifi: ath11k: Fix invalid ring usage in full monitor mode Felix Fietkau <nbd(a)nbd.name> wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keys Geert Uytterhoeven <geert(a)linux-m68k.org> mac80211: MAC80211_MESSAGE_TRACING should depend on TRACING Ben Hutchings <ben(a)decadent.org.uk> wifi: iwlegacy: Fix "field-spanning write" warning in il_enqueue_hcmd() John Garry <john.g.garry(a)oracle.com> scsi: scsi_debug: Fix do_device_access() handling of unexpected SG copy length Arnaldo Carvalho de Melo <acme(a)redhat.com> perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT Jiri Slaby <jirislaby(a)kernel.org> perf trace: Fix non-listed archs in the syscalltbl routines Pei Xiao <xiaopei01(a)kylinos.cn> slub/kunit: fix a WARNING due to unwrapped __kmalloc_cache_noprof Georgi Djakov <djakov(a)kernel.org> spi: geni-qcom: Fix boot warning related to pm_runtime and devres Xiu Jianfeng <xiujianfeng(a)huawei.com> cgroup: Fix potential overflow issue when checking max_depth Frank Min <Frank.Min(a)amd.com> drm/amdgpu: fix random data corruption for sdma 7 ------------- Diffstat: .../devicetree/bindings/iio/adc/adi,ad7380.yaml | 21 ++ Documentation/driver-api/dpll.rst | 21 ++ Documentation/netlink/specs/dpll.yaml | 24 ++ Documentation/rust/arch-support.rst | 2 +- Makefile | 4 +- arch/arm64/boot/dts/freescale/imx8ulp.dtsi | 2 +- arch/arm64/boot/dts/qcom/msm8939.dtsi | 2 +- .../boot/dts/qcom/x1e80100-asus-vivobook-s15.dts | 2 + arch/arm64/boot/dts/qcom/x1e80100-crd.dts | 2 + .../boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts | 2 + arch/arm64/boot/dts/qcom/x1e80100-qcp.dts | 2 + arch/arm64/boot/dts/qcom/x1e80100.dtsi | 34 ++- arch/mips/kernel/cmpxchg.c | 1 + arch/riscv/Kconfig | 2 +- arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 2 - .../boot/dts/starfive/jh7110-pine64-star64.dts | 3 +- arch/riscv/kernel/acpi.c | 4 +- arch/riscv/kernel/asm-offsets.c | 2 - arch/riscv/kernel/cacheinfo.c | 7 +- arch/riscv/kernel/cpu-hotplug.c | 2 +- arch/riscv/kernel/efi-header.S | 2 +- arch/riscv/kernel/traps_misaligned.c | 2 - arch/riscv/kernel/vdso/Makefile | 1 + arch/x86/include/asm/bug.h | 12 + arch/x86/kernel/traps.c | 71 ++++- block/blk-map.c | 4 +- drivers/accel/ivpu/ivpu_debugfs.c | 9 + drivers/accel/ivpu/ivpu_hw.c | 1 + drivers/accel/ivpu/ivpu_hw.h | 1 + drivers/accel/ivpu/ivpu_hw_ip.c | 5 +- drivers/acpi/cppc_acpi.c | 9 +- drivers/acpi/resource.c | 18 +- drivers/base/core.c | 48 +++- drivers/base/module.c | 4 - drivers/char/tpm/tpm-chip.c | 10 + drivers/char/tpm/tpm-dev-common.c | 3 + drivers/char/tpm/tpm-interface.c | 6 +- drivers/char/tpm/tpm2-sessions.c | 100 ++++--- drivers/cxl/Kconfig | 1 + drivers/cxl/Makefile | 20 +- drivers/cxl/acpi.c | 7 + drivers/cxl/core/hdm.c | 50 +++- drivers/cxl/core/port.c | 13 +- drivers/cxl/core/region.c | 48 +--- drivers/cxl/core/trace.h | 17 +- drivers/cxl/cxl.h | 3 +- drivers/cxl/port.c | 17 +- drivers/dpll/dpll_netlink.c | 130 +++++++++ drivers/dpll/dpll_nl.c | 5 +- drivers/firmware/arm_sdei.c | 2 +- drivers/firmware/microchip/mpfs-auto-update.c | 42 +-- drivers/gpio/gpio-sloppy-logic-analyzer.c | 4 +- drivers/gpio/gpiolib.c | 4 +- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 9 +- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 15 +- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 4 +- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 6 +- drivers/gpu/drm/i915/display/intel_alpm.c | 2 +- drivers/gpu/drm/i915/display/intel_backlight.c | 10 +- .../gpu/drm/i915/display/intel_display_device.c | 5 + .../gpu/drm/i915/display/intel_display_device.h | 2 + drivers/gpu/drm/i915/display/intel_display_power.c | 8 + .../drm/i915/display/intel_display_power_well.c | 4 +- drivers/gpu/drm/i915/display/intel_display_types.h | 1 + drivers/gpu/drm/i915/display/intel_display_wa.h | 8 + drivers/gpu/drm/i915/display/intel_dp.c | 29 +- drivers/gpu/drm/i915/display/intel_dp.h | 1 - drivers/gpu/drm/i915/display/intel_dp_aux.c | 4 +- drivers/gpu/drm/i915/display/intel_dp_hdcp.c | 11 +- drivers/gpu/drm/i915/display/intel_fbc.c | 6 + drivers/gpu/drm/i915/display/intel_hdcp.c | 7 +- drivers/gpu/drm/i915/display/intel_pps.c | 14 +- drivers/gpu/drm/i915/display/intel_psr.c | 6 + drivers/gpu/drm/i915/display/intel_tc.c | 3 + drivers/gpu/drm/i915/display/intel_vrr.c | 3 +- drivers/gpu/drm/i915/display/skl_universal_plane.c | 5 - drivers/gpu/drm/i915/intel_device_info.c | 5 - drivers/gpu/drm/i915/intel_device_info.h | 2 - drivers/gpu/drm/mediatek/mtk_crtc.c | 47 +--- drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 9 +- drivers/gpu/drm/mediatek/mtk_dp.c | 85 +++++- drivers/gpu/drm/panthor/panthor_fw.c | 4 +- drivers/gpu/drm/panthor/panthor_gem.c | 11 +- drivers/gpu/drm/panthor/panthor_mmu.c | 16 +- drivers/gpu/drm/panthor/panthor_mmu.h | 1 + drivers/gpu/drm/panthor/panthor_sched.c | 20 +- drivers/gpu/drm/tegra/drm.c | 4 +- drivers/gpu/drm/tests/drm_connector_test.c | 24 +- drivers/gpu/drm/tests/drm_hdmi_state_helper_test.c | 8 +- drivers/gpu/drm/tests/drm_kunit_helpers.c | 42 +++ drivers/gpu/drm/xe/Makefile | 1 + drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h | 1 - drivers/gpu/drm/xe/display/xe_display_wa.c | 16 ++ drivers/gpu/drm/xe/regs/xe_gt_regs.h | 17 +- drivers/gpu/drm/xe/regs/xe_regs.h | 10 +- drivers/gpu/drm/xe/regs/xe_sriov_regs.h | 23 -- drivers/gpu/drm/xe/xe_device_types.h | 6 - drivers/gpu/drm/xe/xe_ggtt.c | 10 + drivers/gpu/drm/xe/xe_gt.c | 10 +- drivers/gpu/drm/xe/xe_gt_sriov_pf.c | 2 +- drivers/gpu/drm/xe/xe_guc_submit.c | 18 +- drivers/gpu/drm/xe/xe_lmtt.c | 2 +- drivers/gpu/drm/xe/xe_module.c | 39 ++- drivers/gpu/drm/xe/xe_sriov.c | 2 +- drivers/gpu/drm/xe/xe_tuning.c | 21 +- drivers/gpu/drm/xe/xe_wa.c | 4 + drivers/iio/adc/ad7124.c | 2 +- drivers/iio/industrialio-gts-helper.c | 4 +- drivers/iio/light/veml6030.c | 2 +- drivers/infiniband/hw/bnxt_re/qplib_fp.c | 4 + drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 38 +-- drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 + drivers/infiniband/hw/cxgb4/provider.c | 1 + drivers/infiniband/hw/mlx5/qp.c | 4 +- drivers/input/input.c | 134 +++++----- drivers/input/touchscreen/edt-ft5x06.c | 19 +- drivers/misc/mei/client.c | 4 +- drivers/misc/sgi-gru/grukservices.c | 2 - drivers/misc/sgi-gru/grumain.c | 4 - drivers/misc/sgi-gru/grutlbpurge.c | 2 - drivers/mmc/host/sdhci-pci-gli.c | 38 ++- drivers/net/ethernet/amd/mvme147.c | 7 +- drivers/net/ethernet/intel/ice/ice_dpll.c | 293 ++++++++++++++++++++- drivers/net/ethernet/intel/ice/ice_dpll.h | 1 + drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 21 +- drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 2 +- drivers/net/ethernet/mediatek/mtk_wed_wo.h | 4 +- drivers/net/ethernet/mellanox/mlxsw/pci.c | 25 +- .../net/ethernet/mellanox/mlxsw/spectrum_ipip.c | 26 +- drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c | 7 + drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c | 8 + drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h | 2 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 +- drivers/net/gtp.c | 22 +- drivers/net/macsec.c | 3 +- drivers/net/mctp/mctp-i2c.c | 3 + drivers/net/netdevsim/fib.c | 4 +- drivers/net/wireless/ath/ath10k/wmi-tlv.c | 7 +- drivers/net/wireless/ath/ath10k/wmi.c | 2 + drivers/net/wireless/ath/ath11k/dp_rx.c | 7 +- drivers/net/wireless/broadcom/brcm80211/Kconfig | 1 + drivers/net/wireless/intel/iwlegacy/common.c | 15 +- drivers/net/wireless/intel/iwlegacy/common.h | 12 + drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 34 ++- drivers/net/wireless/intel/iwlwifi/iwl-drv.h | 3 + drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 10 +- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 12 +- .../net/wireless/intel/iwlwifi/mvm/mld-mac80211.c | 34 ++- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 6 +- .../net/wireless/realtek/rtlwifi/rtl8192du/sw.c | 1 - drivers/net/wireless/realtek/rtw89/pci.c | 48 +++- drivers/nvme/host/core.c | 19 +- drivers/nvme/host/ioctl.c | 7 +- drivers/nvme/target/auth.c | 1 + drivers/pci/pci.c | 14 +- drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 10 +- drivers/phy/qualcomm/phy-qcom-qmp-usb-legacy.c | 1 + drivers/phy/qualcomm/phy-qcom-qmp-usb.c | 1 + drivers/phy/qualcomm/phy-qcom-qmp-usbc.c | 1 + drivers/powercap/intel_rapl_msr.c | 1 + drivers/scsi/scsi_debug.c | 10 +- drivers/scsi/scsi_transport_fc.c | 4 +- drivers/soc/qcom/pmic_glink.c | 25 +- drivers/spi/spi-fsl-dspi.c | 6 +- drivers/spi/spi-geni-qcom.c | 8 +- drivers/staging/iio/frequency/ad9832.c | 7 +- .../intel/int340x_thermal/processor_thermal_rapl.c | 70 ++--- drivers/thunderbolt/retimer.c | 5 +- drivers/thunderbolt/tb.c | 48 +++- drivers/ufs/core/ufshcd.c | 2 +- drivers/usb/host/xhci-pci.c | 6 +- drivers/usb/host/xhci-ring.c | 16 +- drivers/usb/phy/phy.c | 2 +- drivers/usb/typec/class.c | 1 + drivers/usb/typec/tcpm/qcom/qcom_pmic_typec.c | 10 +- drivers/usb/typec/tcpm/tcpm.c | 10 +- fs/afs/dir.c | 25 ++ fs/afs/dir_edit.c | 91 ++++++- fs/afs/internal.h | 2 + fs/btrfs/bio.c | 62 ++--- fs/btrfs/bio.h | 3 + fs/btrfs/defrag.c | 10 +- fs/btrfs/extent_map.c | 7 +- fs/btrfs/volumes.c | 1 + fs/dax.c | 49 ++-- fs/iomap/buffered-io.c | 7 +- fs/nfs/delegation.c | 5 + fs/nfsd/nfs4proc.c | 10 +- fs/nilfs2/namei.c | 3 + fs/nilfs2/page.c | 1 + fs/ntfs3/file.c | 9 +- fs/ntfs3/frecord.c | 4 +- fs/ntfs3/inode.c | 15 +- fs/ntfs3/lznt.c | 3 + fs/ntfs3/namei.c | 2 +- fs/ntfs3/ntfs_fs.h | 2 +- fs/ntfs3/record.c | 31 ++- fs/ocfs2/file.c | 8 + fs/smb/client/cifs_unicode.c | 17 +- fs/smb/client/reparse.c | 174 +++++++++++- fs/smb/client/reparse.h | 9 +- fs/smb/client/smb2inode.c | 3 +- fs/smb/client/smb2proto.h | 1 + fs/userfaultfd.c | 28 ++ fs/xfs/xfs_filestream.c | 23 +- fs/xfs/xfs_trace.h | 15 +- include/acpi/cppc_acpi.h | 2 +- include/drm/drm_kunit_helpers.h | 4 + include/linux/bpf_mem_alloc.h | 3 + include/linux/compiler-gcc.h | 4 + include/linux/device.h | 3 + include/linux/dpll.h | 15 ++ include/linux/input.h | 10 +- include/linux/iomap.h | 19 ++ include/linux/ksm.h | 10 +- include/linux/mmzone.h | 7 +- include/linux/rcutiny.h | 5 + include/linux/rcutree.h | 1 + include/linux/tick.h | 8 + include/linux/ubsan.h | 5 + include/linux/userfaultfd_k.h | 5 + include/net/ip_tunnels.h | 2 +- include/trace/events/afs.h | 7 +- include/uapi/linux/dpll.h | 3 + io_uring/rw.c | 23 +- kernel/bpf/cgroup.c | 19 +- kernel/bpf/helpers.c | 21 +- kernel/bpf/lpm_trie.c | 2 +- kernel/bpf/memalloc.c | 14 +- kernel/bpf/verifier.c | 9 +- kernel/cgroup/cgroup.c | 4 +- kernel/fork.c | 14 +- kernel/rcu/tree.c | 118 ++++++++- kernel/resource.c | 4 +- kernel/sched/fair.c | 4 +- lib/Kconfig.ubsan | 4 +- lib/codetag.c | 3 + lib/iov_iter.c | 6 +- lib/slub_kunit.c | 2 +- mm/kasan/kasan_test.c | 27 -- mm/migrate.c | 2 +- mm/mmap.c | 3 +- mm/page_alloc.c | 10 +- mm/rmap.c | 24 +- mm/shmem.c | 2 + mm/shrinker.c | 8 +- mm/vmscan.c | 109 ++++---- net/bluetooth/hci_sync.c | 18 +- net/bpf/test_run.c | 1 + net/core/dev.c | 4 + net/core/rtnetlink.c | 4 +- net/core/sock_map.c | 4 + net/ipv4/ip_tunnel.c | 2 +- net/ipv6/netfilter/nf_reject_ipv6.c | 15 +- net/mac80211/Kconfig | 2 +- net/mac80211/cfg.c | 3 +- net/mac80211/key.c | 42 +-- net/mptcp/protocol.c | 2 + net/netfilter/nft_payload.c | 3 + net/netfilter/x_tables.c | 2 +- net/sched/cls_api.c | 1 + net/sched/sch_api.c | 2 +- net/sunrpc/xprtrdma/ib_client.c | 1 + net/wireless/core.c | 1 + rust/kernel/device.rs | 15 +- rust/kernel/firmware.rs | 2 +- sound/pci/hda/patch_realtek.c | 23 +- sound/soc/codecs/cs42l51.c | 7 +- sound/soc/soc-dapm.c | 2 + sound/usb/mixer_quirks.c | 3 + tools/mm/page-types.c | 9 +- tools/mm/slabinfo.c | 4 +- tools/perf/util/python.c | 3 + tools/perf/util/syscalltbl.c | 10 + tools/testing/cxl/test/cxl.c | 14 +- tools/testing/selftests/mm/uffd-common.c | 5 +- tools/testing/selftests/mm/uffd-common.h | 3 +- tools/testing/selftests/mm/uffd-unit-tests.c | 21 +- tools/usb/usbip/src/usbip_detach.c | 1 + 281 files changed, 2933 insertions(+), 1083 deletions(-)

10 months, 2 weeks

14
16
0 0

[PATCH v6.1 1/1] net: sched: use RCU read-side critical section in taprio_dump()

by Lee Jones

From: Dmitry Antipov <dmantipov(a)yandex.ru> [ Upstream commit b22db8b8befe90b61c98626ca1a2fbb0505e9fe3 ] Fix possible use-after-free in 'taprio_dump()' by adding RCU read-side critical section there. Never seen on x86 but found on a KASAN-enabled arm64 system when investigating https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa: [T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0 [T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862 [T15862] [T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2 [T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024 [T15862] Call trace: [T15862] dump_backtrace+0x20c/0x220 [T15862] show_stack+0x2c/0x40 [T15862] dump_stack_lvl+0xf8/0x174 [T15862] print_report+0x170/0x4d8 [T15862] kasan_report+0xb8/0x1d4 [T15862] __asan_report_load4_noabort+0x20/0x2c [T15862] taprio_dump+0xa0c/0xbb0 [T15862] tc_fill_qdisc+0x540/0x1020 [T15862] qdisc_notify.isra.0+0x330/0x3a0 [T15862] tc_modify_qdisc+0x7b8/0x1838 [T15862] rtnetlink_rcv_msg+0x3c8/0xc20 [T15862] netlink_rcv_skb+0x1f8/0x3d4 [T15862] rtnetlink_rcv+0x28/0x40 [T15862] netlink_unicast+0x51c/0x790 [T15862] netlink_sendmsg+0x79c/0xc20 [T15862] __sock_sendmsg+0xe0/0x1a0 [T15862] ____sys_sendmsg+0x6c0/0x840 [T15862] ___sys_sendmsg+0x1ac/0x1f0 [T15862] __sys_sendmsg+0x110/0x1d0 [T15862] __arm64_sys_sendmsg+0x74/0xb0 [T15862] invoke_syscall+0x88/0x2e0 [T15862] el0_svc_common.constprop.0+0xe4/0x2a0 [T15862] do_el0_svc+0x44/0x60 [T15862] el0_svc+0x50/0x184 [T15862] el0t_64_sync_handler+0x120/0x12c [T15862] el0t_64_sync+0x190/0x194 [T15862] [T15862] Allocated by task 15857: [T15862] kasan_save_stack+0x3c/0x70 [T15862] kasan_save_track+0x20/0x3c [T15862] kasan_save_alloc_info+0x40/0x60 [T15862] __kasan_kmalloc+0xd4/0xe0 [T15862] __kmalloc_cache_noprof+0x194/0x334 [T15862] taprio_change+0x45c/0x2fe0 [T15862] tc_modify_qdisc+0x6a8/0x1838 [T15862] rtnetlink_rcv_msg+0x3c8/0xc20 [T15862] netlink_rcv_skb+0x1f8/0x3d4 [T15862] rtnetlink_rcv+0x28/0x40 [T15862] netlink_unicast+0x51c/0x790 [T15862] netlink_sendmsg+0x79c/0xc20 [T15862] __sock_sendmsg+0xe0/0x1a0 [T15862] ____sys_sendmsg+0x6c0/0x840 [T15862] ___sys_sendmsg+0x1ac/0x1f0 [T15862] __sys_sendmsg+0x110/0x1d0 [T15862] __arm64_sys_sendmsg+0x74/0xb0 [T15862] invoke_syscall+0x88/0x2e0 [T15862] el0_svc_common.constprop.0+0xe4/0x2a0 [T15862] do_el0_svc+0x44/0x60 [T15862] el0_svc+0x50/0x184 [T15862] el0t_64_sync_handler+0x120/0x12c [T15862] el0t_64_sync+0x190/0x194 [T15862] [T15862] Freed by task 6192: [T15862] kasan_save_stack+0x3c/0x70 [T15862] kasan_save_track+0x20/0x3c [T15862] kasan_save_free_info+0x4c/0x80 [T15862] poison_slab_object+0x110/0x160 [T15862] __kasan_slab_free+0x3c/0x74 [T15862] kfree+0x134/0x3c0 [T15862] taprio_free_sched_cb+0x18c/0x220 [T15862] rcu_core+0x920/0x1b7c [T15862] rcu_core_si+0x10/0x1c [T15862] handle_softirqs+0x2e8/0xd64 [T15862] __do_softirq+0x14/0x20 Fixes: 18cdd2f0998a ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex") Acked-by: Vinicius Costa Gomes <vinicius.gomes(a)intel.com> Signed-off-by: Dmitry Antipov <dmantipov(a)yandex.ru> Link: https://patch.msgid.link/20241018051339.418890-2-dmantipov@yandex.ru Signed-off-by: Paolo Abeni <pabeni(a)redhat.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> (cherry picked from commit 5d282467245f267c0b9ada3f7f309ff838521536) [Lee: Backported from linux-6.6.y to linux-6.1.y and fixed conflicts] Signed-off-by: Lee Jones <lee(a)kernel.org> --- net/sched/sch_taprio.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 212fef2b72f5..1d5cdc987abd 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1995,9 +1995,6 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) struct nlattr *nest, *sched_nest; unsigned int i; - oper = rtnl_dereference(q->oper_sched); - admin = rtnl_dereference(q->admin_sched); - opt.num_tc = netdev_get_num_tc(dev); memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map)); @@ -2024,18 +2021,23 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay)) goto options_error; + rcu_read_lock(); + + oper = rtnl_dereference(q->oper_sched); + admin = rtnl_dereference(q->admin_sched); + if (taprio_dump_tc_entries(q, skb)) - goto options_error; + goto options_error_rcu; if (oper && dump_schedule(skb, oper)) - goto options_error; + goto options_error_rcu; if (!admin) goto done; sched_nest = nla_nest_start_noflag(skb, TCA_TAPRIO_ATTR_ADMIN_SCHED); if (!sched_nest) - goto options_error; + goto options_error_rcu; if (dump_schedule(skb, admin)) goto admin_error; @@ -2043,11 +2045,15 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) nla_nest_end(skb, sched_nest); done: + rcu_read_unlock(); return nla_nest_end(skb, nest); admin_error: nla_nest_cancel(skb, sched_nest); +options_error_rcu: + rcu_read_unlock(); + options_error: nla_nest_cancel(skb, nest); -- 2.47.0.277.g8800431eea-goog

10 months, 2 weeks

3
3
0 0

[PATCH 5.15 00/73] 5.15.171-rc1 review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 5.15.171 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Fri, 08 Nov 2024 12:02:47 +0000. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.171-r… or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 5.15.171-rc1 Johannes Berg <johannes.berg(a)intel.com> mac80211: always have ieee80211_sta_restart() Jeongjun Park <aha310510(a)gmail.com> vt: prevent kernel-infoleak in con_font_get() Rob Clark <robdclark(a)chromium.org> drm/i915: Fix potential context UAFs Jason-JH.Lin <jason-jh.lin(a)mediatek.com> Revert "drm/mipi-dsi: Set the fwnode for mipi_dsi_device" Jeongjun Park <aha310510(a)gmail.com> mm: shmem: fix data-race in shmem_getattr() Johannes Berg <johannes.berg(a)intel.com> wifi: iwlwifi: mvm: fix 6 GHz scan construction Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix kernel bug due to missing clearing of checked flag Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com> x86/bugs: Use code segment selector for VERW operand Edward Adam Davis <eadavis(a)qq.com> ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow Matt Fleming <mfleming(a)cloudflare.com> mm/page_alloc: let GFP_ATOMIC order-0 allocs access highatomic reserves Mel Gorman <mgorman(a)techsingularity.net> mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves Mel Gorman <mgorman(a)techsingularity.net> mm/page_alloc: explicitly define what alloc flags deplete min reserves Mel Gorman <mgorman(a)techsingularity.net> mm/page_alloc: explicitly record high-order atomic allocations in alloc_flags Mel Gorman <mgorman(a)techsingularity.net> mm/page_alloc: treat RT tasks similar to __GFP_HIGH Mel Gorman <mgorman(a)techsingularity.net> mm/page_alloc: rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman <mgorman(a)techsingularity.net> mm/page_alloc: split out buddy removal code from rmqueue into separate helper Wonhyuk Yang <vvghjk1234(a)gmail.com> mm/page_alloc: fix tracepoint mm_page_alloc_zone_locked() Eric Dumazet <edumazet(a)google.com> mm/page_alloc: call check_new_pages() while zone spinlock is not held Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> riscv: Remove duplicated GET_RM Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> riscv: Remove unused GENERATING_ASM_OFFSETS WangYuli <wangyuli(a)uniontech.com> riscv: Use '%u' to format the output of 'cpu' Heinrich Schuchardt <heinrich.schuchardt(a)canonical.com> riscv: efi: Set NX compat flag in PE/COFF header Alexandre Ghiti <alexghiti(a)rivosinc.com> riscv: vdso: Prevent the compiler from inserting calls to memset() Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix potential deadlock with newly created symlinks Javier Carrasco <javier.carrasco.cruz(a)gmail.com> iio: light: veml6030: fix microlux value calculation Zicheng Qu <quzicheng(a)huawei.com> iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr() Zicheng Qu <quzicheng(a)huawei.com> staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg() Ville Syrjälä <ville.syrjala(a)linux.intel.com> wifi: iwlegacy: Clear stale interrupts before resuming device Manikanta Pubbisetty <quic_mpubbise(a)quicinc.com> wifi: ath10k: Fix memory leak in management tx Felix Fietkau <nbd(a)nbd.name> wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Revert "driver core: Fix uevent_show() vs driver detach race" Basavaraj Natikar <Basavaraj.Natikar(a)amd.com> xhci: Use pm_runtime_get to prevent RPM on unsupported systems Faisal Hassan <quic_faisalh(a)quicinc.com> xhci: Fix Link TRB DMA in command ring stopped completion event Javier Carrasco <javier.carrasco.cruz(a)gmail.com> usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes() Zijun Hu <quic_zijuhu(a)quicinc.com> usb: phy: Fix API devm_usb_put_phy() can not release the phy Zongmin Zhou <zhouzongmin(a)kylinos.cn> usbip: tools: Fix detach_port() invalid port error path Dimitri Sivanich <sivanich(a)hpe.com> misc: sgi-gru: Don't disable preemption in GRU driver Dai Ngo <dai.ngo(a)oracle.com> NFS: remove revoked delegation from server's delegation list Daniel Palmer <daniel(a)0x0f.com> net: amd: mvme147: Fix probe banner message Benjamin Marzinski <bmarzins(a)redhat.com> scsi: scsi_transport_fc: Allow setting rport state to current state Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Additional check in ni_clear() Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Fix possible deadlock in mi_read Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Fix warning possible deadlock in ntfs_set_state Andrew Ballance <andrewjballance(a)gmail.com> fs/ntfs3: Check if more than chunk-size bytes are written Pierre Gondois <pierre.gondois(a)arm.com> ACPI: CPPC: Make rmw_lock a raw_spin_lock Xiongfeng Wang <wangxiongfeng2(a)huawei.com> firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() Pablo Neira Ayuso <pablo(a)netfilter.org> netfilter: nft_payload: sanitize offset and length before calling skb_checksum() Benoît Monin <benoit.monin(a)gmx.fr> net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension Dong Chenchen <dongchenchen2(a)huawei.com> netfilter: Fix use-after-free in get_info() Byeonguk Jeong <jungbu2855(a)gmail.com> bpf: Fix out-of-bounds write in trie_get_next_key() Zichen Xie <zichenxie0106(a)gmail.com> netdevsim: Add trailing zero to terminate the string in nsim_nexthop_bucket_activity_write() Pedro Tammela <pctammela(a)mojatatu.com> net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT Pablo Neira Ayuso <pablo(a)netfilter.org> gtp: allow -1 to be specified as file description from userspace Ido Schimmel <idosch(a)nvidia.com> ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_init_flow() Wander Lairson Costa <wander(a)redhat.com> igb: Disable threaded IRQ for igb_msix_other Furong Xu <0x1207(a)gmail.com> net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> ASoC: cs42l51: Fix some error handling paths in cs42l51_probe() Daniel Gabay <daniel.gabay(a)intel.com> wifi: iwlwifi: mvm: Fix response handling in iwl_mvm_send_recovery_cmd() Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> wifi: iwlwifi: mvm: disconnect station vifs if recovery failed Youghandhar Chintala <youghand(a)codeaurora.org> mac80211: Add support to trigger sta disconnect on hardware restart Johannes Berg <johannes.berg(a)intel.com> mac80211: do drv_reconfig_complete() before restarting all Selvin Xavier <selvin.xavier(a)broadcom.com> RDMA/bnxt_re: synchronize the qp-handle table array Patrisious Haddad <phaddad(a)nvidia.com> RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of down Leon Romanovsky <leon(a)kernel.org> RDMA/cxgb4: Dump vendor specific QP details Geert Uytterhoeven <geert(a)linux-m68k.org> wifi: brcm80211: BRCM_TRACING should depend on TRACING Felix Fietkau <nbd(a)nbd.name> wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keys Geert Uytterhoeven <geert(a)linux-m68k.org> mac80211: MAC80211_MESSAGE_TRACING should depend on TRACING Xiu Jianfeng <xiujianfeng(a)huawei.com> cgroup: Fix potential overflow issue when checking max_depth Koba Ko <kobak(a)nvidia.com> ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context Sudeep Holla <sudeep.holla(a)arm.com> ACPI: PRM: Change handler_addr type to void pointer Aubrey Li <aubrey.li(a)intel.com> ACPI: PRM: Remove unnecessary blank lines Namjae Jeon <linkinjeon(a)kernel.org> ksmbd: fix user-after-free from session log off Donet Tom <donettom(a)linux.ibm.com> selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test ------------- Diffstat: Makefile | 4 +- arch/riscv/kernel/asm-offsets.c | 2 - arch/riscv/kernel/cpu-hotplug.c | 2 +- arch/riscv/kernel/efi-header.S | 2 +- arch/riscv/kernel/traps_misaligned.c | 2 - arch/riscv/kernel/vdso/Makefile | 1 + arch/x86/include/asm/nospec-branch.h | 11 +- drivers/acpi/cppc_acpi.c | 9 +- drivers/acpi/prmt.c | 33 ++-- drivers/base/core.c | 13 +- drivers/base/module.c | 4 - drivers/firmware/arm_sdei.c | 2 +- drivers/gpu/drm/drm_mipi_dsi.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 ++- drivers/iio/adc/ad7124.c | 2 +- drivers/iio/light/veml6030.c | 2 +- drivers/infiniband/hw/bnxt_re/qplib_fp.c | 4 + drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 13 +- drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 + drivers/infiniband/hw/cxgb4/provider.c | 1 + drivers/infiniband/hw/mlx5/qp.c | 4 +- drivers/misc/sgi-gru/grukservices.c | 2 - drivers/misc/sgi-gru/grumain.c | 4 - drivers/misc/sgi-gru/grutlbpurge.c | 2 - drivers/net/ethernet/amd/mvme147.c | 7 +- drivers/net/ethernet/intel/igb/igb_main.c | 2 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++- drivers/net/gtp.c | 22 +-- drivers/net/netdevsim/fib.c | 4 +- drivers/net/wireless/ath/ath10k/wmi-tlv.c | 7 +- drivers/net/wireless/ath/ath10k/wmi.c | 2 + drivers/net/wireless/broadcom/brcm80211/Kconfig | 1 + drivers/net/wireless/intel/iwlegacy/common.c | 2 + drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 22 ++- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 3 +- drivers/scsi/scsi_transport_fc.c | 4 +- drivers/staging/iio/frequency/ad9832.c | 7 +- drivers/tty/vt/vt.c | 2 +- drivers/usb/host/xhci-pci.c | 6 +- drivers/usb/host/xhci-ring.c | 16 +- drivers/usb/phy/phy.c | 2 +- drivers/usb/typec/class.c | 1 + fs/ksmbd/mgmt/user_session.c | 26 ++- fs/ksmbd/mgmt/user_session.h | 4 + fs/ksmbd/server.c | 2 + fs/ksmbd/smb2pdu.c | 8 +- fs/nfs/delegation.c | 5 + fs/nilfs2/namei.c | 3 + fs/nilfs2/page.c | 1 + fs/ntfs3/frecord.c | 4 +- fs/ntfs3/lznt.c | 3 + fs/ntfs3/namei.c | 2 +- fs/ntfs3/ntfs_fs.h | 2 +- fs/ocfs2/file.c | 8 + include/acpi/cppc_acpi.h | 2 +- include/net/ip_tunnels.h | 2 +- include/net/mac80211.h | 10 ++ include/trace/events/kmem.h | 14 +- kernel/bpf/lpm_trie.c | 2 +- kernel/cgroup/cgroup.c | 4 +- mm/internal.h | 13 +- mm/page_alloc.c | 185 +++++++++++++--------- mm/shmem.c | 2 + net/core/dev.c | 4 + net/mac80211/Kconfig | 2 +- net/mac80211/cfg.c | 3 +- net/mac80211/ieee80211_i.h | 3 + net/mac80211/key.c | 42 +++-- net/mac80211/mlme.c | 14 +- net/mac80211/util.c | 45 ++++-- net/netfilter/nft_payload.c | 3 + net/netfilter/x_tables.c | 2 +- net/sched/sch_api.c | 2 +- sound/soc/codecs/cs42l51.c | 7 +- tools/testing/selftests/vm/hmm-tests.c | 2 +- tools/usb/usbip/src/usbip_detach.c | 1 + 76 files changed, 481 insertions(+), 230 deletions(-)

10 months, 2 weeks

12
85
0 0

[PATCH 6.1+] ASoC: amd: yc: fix internal mic on Xiaomi Book Pro 14 2022

by WangYuli

From: Mingcong Bai <jeffbai(a)aosc.io> [ Upstream commit de156f3cf70e17dc6ff4c3c364bb97a6db961ffd ] Xiaomi Book Pro 14 2022 (MIA2210-AD) requires a quirk entry for its internal microphone to be enabled. This is likely due to similar reasons as seen previously on Redmi Book 14/15 Pro 2022 models (since they likely came with similar firmware): - commit dcff8b7ca92d ("ASoC: amd: yc: Add Xiaomi Redmi Book Pro 15 2022 into DMI table") - commit c1dd6bf61997 ("ASoC: amd: yc: Add Xiaomi Redmi Book Pro 14 2022 into DMI table") A quirk would likely be needed for Xiaomi Book Pro 15 2022 models, too. However, I do not have such device on hand so I will leave it for now. Signed-off-by: Mingcong Bai <jeffbai(a)aosc.io> Link: https://patch.msgid.link/20241106024052.15748-1-jeffbai@aosc.io Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: WangYuli <wangyuli(a)uniontech.com> --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index 76f5d926d1ea..e027bc1d35f4 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -381,6 +381,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "Redmi Book Pro 15 2022"), } }, + { + .driver_data = &acp6x_card, + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "TIMI"), + DMI_MATCH(DMI_PRODUCT_NAME, "Xiaomi Book Pro 14 2022"), + } + }, { .driver_data = &acp6x_card, .matches = { -- 2.45.2

10 months, 2 weeks

2
1
0 0

[PATCH 5.4] NFSD: Fix NFSv4's PUTPUBFH operation

by cel＠kernel.org

From: Chuck Lever <chuck.lever(a)oracle.com> [ Upstream commit 202f39039a11402dcbcd5fece8d9fa6be83f49ae ] According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH. Replace the XDR decoder for PUTPUBFH with a "noop" since we no longer want the minorversion check, and PUTPUBFH has no arguments to decode. (Ideally nfsd4_decode_noop should really be called nfsd4_decode_void). PUTPUBFH should now behave just like PUTROOTFH. Reported-by: Cedric Blancher <cedric.blancher(a)gmail.com> Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1") Cc: Dan Shelton <dan.f.shelton(a)gmail.com> Cc: Roland Mainz <roland.mainz(a)nrubsig.org> Cc: stable(a)vger.kernel.org [ cel: adjusted to apply to origin/linux-5.4.y ] Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> --- fs/nfsd/nfs4xdr.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) In response to: https://lore.kernel.org/stable/2024100703-decorated-bodacious-fa3c@gregkh/ here is a version of upstream commit 202f39039a11 ("NFSD: Fix NFSv4's PUTPUBFH operation") that applies to both origin/linux-5.4.y and origin/linux-4.19.y. diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1d24fff2709c..55b18c145390 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1068,14 +1068,6 @@ nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) DECODE_TAIL; } -static __be32 -nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) -{ - if (argp->minorversion == 0) - return nfs_ok; - return nfserr_notsupp; -} - static __be32 nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) { @@ -1825,7 +1817,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = { [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_open_confirm, [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade, [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh, - [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_putpubfh, + [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_noop, [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop, [OP_READ] = (nfsd4_dec)nfsd4_decode_read, [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir, -- 2.47.0

10 months, 2 weeks

2
1
0 0

[PATCH v2 0/2] PCI: endpoint: fix bugs for both API pci_epc_destroy() and pci_epc_remove_epf()

by Zijun Hu

This patch series is to fix bugs for below 2 APIs: pci_epc_destroy() pci_epc_remove_epf() Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com> --- Changes in v2: - Correct title and commit messages, and remove RFC tag - Link to v1: https://lore.kernel.org/r/20241102-epc_rfc-v1-0-5026322df5bc@quicinc.com --- Zijun Hu (2): PCI: endpoint: Fix API pci_epc_destroy() releasing domain_nr ID faults PCI: endpoint: Fix API pci_epc_remove_epf() cleaning up wrong EPC of EPF drivers/pci/endpoint/pci-epc-core.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) --- base-commit: ad5df4a631fa7eeb8eb212d21ab3f6979fd1926e change-id: 20241102-epc_rfc-e1d9d03d5101 Best regards, -- Zijun Hu <quic_zijuhu(a)quicinc.com>

10 months, 2 weeks

3
6
0 0

[PATCH] LoongArch: Fix early_numa_add_cpu() usage for FDT systems

by Huacai Chen

early_numa_add_cpu() applies on physical CPU id rather than logical CPU id, so use cpuid instead of cpu. Cc: stable(a)vger.kernel.org Fixes: 3de9c42d02a79a5 ("LoongArch: Add all CPUs enabled by fdt to NUMA node 0") Reported-by: Bibo Mao <maobibo(a)loongson.cn> Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn> --- arch/loongarch/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c index c0b498467ffa..5d59e9ce2772 100644 --- a/arch/loongarch/kernel/smp.c +++ b/arch/loongarch/kernel/smp.c @@ -302,7 +302,7 @@ static void __init fdt_smp_setup(void) __cpu_number_map[cpuid] = cpu; __cpu_logical_map[cpu] = cpuid; - early_numa_add_cpu(cpu, 0); + early_numa_add_cpu(cpuid, 0); set_cpuid_to_node(cpuid, 0); } -- 2.43.5

10 months, 2 weeks

1
0
0 0

[PATCH v2] usb: dwc3: gadget: Add TxFIFO resizing supports for single port RAM

by Selvarasu Ganesan

The existing implementation of the TxFIFO resizing logic only supports scenarios where more than one port RAM is used. However, there is a need to resize the TxFIFO in USB2.0-only mode where only a single port RAM is available. This commit introduces the necessary changes to support TxFIFO resizing in such scenarios. Cc: stable(a)vger.kernel.org # 6.12.x: fad16c82: usb: dwc3: gadget: Refine the logic for resizing Tx FIFOs Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com> --- Changes in v2: - Removed the code change that limits the number of FIFOs for bulk EP, as plan to address this issue in a separate patch. - Renamed the variable spram_type to is_single_port_ram for better understanding. - Link to v1: https://lore.kernel.org/lkml/20241107104040.502-1-selvarasu.g@samsung.com/ --- drivers/usb/dwc3/core.h | 4 +++ drivers/usb/dwc3/gadget.c | 54 +++++++++++++++++++++++++++++++++------ 2 files changed, 50 insertions(+), 8 deletions(-) diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index eaa55c0cf62f..8306b39e5c64 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -915,6 +915,7 @@ struct dwc3_hwparams { #define DWC3_MODE(n) ((n) & 0x7) /* HWPARAMS1 */ +#define DWC3_SPRAM_TYPE(n) (((n) >> 23) & 1) #define DWC3_NUM_INT(n) (((n) & (0x3f << 15)) >> 15) /* HWPARAMS3 */ @@ -925,6 +926,9 @@ struct dwc3_hwparams { #define DWC3_NUM_IN_EPS(p) (((p)->hwparams3 & \ (DWC3_NUM_IN_EPS_MASK)) >> 18) +/* HWPARAMS6 */ +#define DWC3_RAM0_DEPTH(n) (((n) & (0xffff0000)) >> 16) + /* HWPARAMS7 */ #define DWC3_RAM1_DEPTH(n) ((n) & 0xffff) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 2fed2aa01407..4f2e063c9091 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -687,6 +687,44 @@ static int dwc3_gadget_calc_tx_fifo_size(struct dwc3 *dwc, int mult) return fifo_size; } +/** + * dwc3_gadget_calc_ram_depth - calculates the ram depth for txfifo + * @dwc: pointer to the DWC3 context + */ +static int dwc3_gadget_calc_ram_depth(struct dwc3 *dwc) +{ + int ram_depth; + int fifo_0_start; + bool is_single_port_ram; + int tmp; + + /* Check supporting RAM type by HW */ + is_single_port_ram = DWC3_SPRAM_TYPE(dwc->hwparams.hwparams1); + + /* + * If a single port RAM is utilized, then allocate TxFIFOs from + * RAM0. otherwise, allocate them from RAM1. + */ + ram_depth = is_single_port_ram ? DWC3_RAM0_DEPTH(dwc->hwparams.hwparams6) : + DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); + + + /* + * In a single port RAM configuration, the available RAM is shared + * between the RX and TX FIFOs. This means that the txfifo can begin + * at a non-zero address. + */ + if (is_single_port_ram) { + /* Check if TXFIFOs start at non-zero addr */ + tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0)); + fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp); + + ram_depth -= (fifo_0_start >> 16); + } + + return ram_depth; +} + /** * dwc3_gadget_clear_tx_fifos - Clears txfifo allocation * @dwc: pointer to the DWC3 context @@ -753,7 +791,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) { struct dwc3 *dwc = dep->dwc; int fifo_0_start; - int ram1_depth; + int ram_depth; int fifo_size; int min_depth; int num_in_ep; @@ -773,7 +811,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) if (dep->flags & DWC3_EP_TXFIFO_RESIZED) return 0; - ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); + ram_depth = dwc3_gadget_calc_ram_depth(dwc); switch (dwc->gadget->speed) { case USB_SPEED_SUPER_PLUS: @@ -809,7 +847,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) /* Reserve at least one FIFO for the number of IN EPs */ min_depth = num_in_ep * (fifo + 1); - remaining = ram1_depth - min_depth - dwc->last_fifo_depth; + remaining = ram_depth - min_depth - dwc->last_fifo_depth; remaining = max_t(int, 0, remaining); /* * We've already reserved 1 FIFO per EP, so check what we can fit in @@ -835,9 +873,9 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size); /* Check fifo size allocation doesn't exceed available RAM size. */ - if (dwc->last_fifo_depth >= ram1_depth) { + if (dwc->last_fifo_depth >= ram_depth) { dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n", - dwc->last_fifo_depth, ram1_depth, + dwc->last_fifo_depth, ram_depth, dep->endpoint.name, fifo_size); if (DWC3_IP_IS(DWC3)) fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size); @@ -3090,7 +3128,7 @@ static int dwc3_gadget_check_config(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); struct usb_ep *ep; int fifo_size = 0; - int ram1_depth; + int ram_depth; int ep_num = 0; if (!dwc->do_fifo_resize) @@ -3113,8 +3151,8 @@ static int dwc3_gadget_check_config(struct usb_gadget *g) fifo_size += dwc->max_cfg_eps; /* Check if we can fit a single fifo per endpoint */ - ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); - if (fifo_size > ram1_depth) + ram_depth = dwc3_gadget_calc_ram_depth(dwc); + if (fifo_size > ram_depth) return -ENOMEM; return 0; -- 2.17.1

10 months, 2 weeks

2
3
0 0

[merged mm-hotfixes-stable] nommu-pass-null-argument-to-vma_iter_prealloc.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: nommu: pass NULL argument to vma_iter_prealloc() has been removed from the -mm tree. Its filename was nommu-pass-null-argument-to-vma_iter_prealloc.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Hajime Tazaki <thehajime(a)gmail.com> Subject: nommu: pass NULL argument to vma_iter_prealloc() Date: Sat, 9 Nov 2024 07:28:34 +0900 When deleting a vma entry from a maple tree, it has to pass NULL to vma_iter_prealloc() in order to calculate internal state of the tree, but it passed a wrong argument. As a result, nommu kernels crashed upon accessing a vma iterator, such as acct_collect() reading the size of vma entries after do_munmap(). This commit fixes this issue by passing a right argument to the preallocation call. Link: https://lkml.kernel.org/r/20241108222834.3625217-1-thehajime@gmail.com Fixes: b5df09226450 ("mm: set up vma iterator for vma_iter_prealloc() calls") Signed-off-by: Hajime Tazaki <thehajime(a)gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/nommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/nommu.c~nommu-pass-null-argument-to-vma_iter_prealloc +++ a/mm/nommu.c @@ -573,7 +573,7 @@ static int delete_vma_from_mm(struct vm_ VMA_ITERATOR(vmi, vma->vm_mm, vma->vm_start); vma_iter_config(&vmi, vma->vm_start, vma->vm_end); - if (vma_iter_prealloc(&vmi, vma)) { + if (vma_iter_prealloc(&vmi, NULL)) { pr_warn("Allocation of vma tree for process %d failed\n", current->pid); return -ENOMEM; _ Patches currently in -mm which might be from thehajime(a)gmail.com are

10 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] ocfs2-fix-ubsan-warning-in-ocfs2_verify_volume.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: ocfs2: fix UBSAN warning in ocfs2_verify_volume() has been removed from the -mm tree. Its filename was ocfs2-fix-ubsan-warning-in-ocfs2_verify_volume.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Dmitry Antipov <dmantipov(a)yandex.ru> Subject: ocfs2: fix UBSAN warning in ocfs2_verify_volume() Date: Wed, 6 Nov 2024 12:21:00 +0300 Syzbot has reported the following splat triggered by UBSAN: UBSAN: shift-out-of-bounds in fs/ocfs2/super.c:2336:10 shift exponent 32768 is too large for 32-bit type 'int' CPU: 2 UID: 0 PID: 5255 Comm: repro Not tainted 6.12.0-rc4-syzkaller-00047-gc2ee9f594da8 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x241/0x360 ? __pfx_dump_stack_lvl+0x10/0x10 ? __pfx__printk+0x10/0x10 ? __asan_memset+0x23/0x50 ? lockdep_init_map_type+0xa1/0x910 __ubsan_handle_shift_out_of_bounds+0x3c8/0x420 ocfs2_fill_super+0xf9c/0x5750 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? validate_chain+0x11e/0x5920 ? __lock_acquire+0x1384/0x2050 ? __pfx_validate_chain+0x10/0x10 ? string+0x26a/0x2b0 ? widen_string+0x3a/0x310 ? string+0x26a/0x2b0 ? bdev_name+0x2b1/0x3c0 ? pointer+0x703/0x1210 ? __pfx_pointer+0x10/0x10 ? __pfx_format_decode+0x10/0x10 ? __lock_acquire+0x1384/0x2050 ? vsnprintf+0x1ccd/0x1da0 ? snprintf+0xda/0x120 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_lock+0x14f/0x370 ? __pfx_snprintf+0x10/0x10 ? set_blocksize+0x1f9/0x360 ? sb_set_blocksize+0x98/0xf0 ? setup_bdev_super+0x4e6/0x5d0 mount_bdev+0x20c/0x2d0 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_mount_bdev+0x10/0x10 ? vfs_parse_fs_string+0x190/0x230 ? __pfx_vfs_parse_fs_string+0x10/0x10 legacy_get_tree+0xf0/0x190 ? __pfx_ocfs2_mount+0x10/0x10 vfs_get_tree+0x92/0x2b0 do_new_mount+0x2be/0xb40 ? __pfx_do_new_mount+0x10/0x10 __se_sys_mount+0x2d6/0x3c0 ? __pfx___se_sys_mount+0x10/0x10 ? do_syscall_64+0x100/0x230 ? __x64_sys_mount+0x20/0xc0 do_syscall_64+0xf3/0x230 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f37cae96fda Code: 48 8b 0d 51 ce 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e ce 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff6c1aa228 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007fff6c1aa240 RCX: 00007f37cae96fda RDX: 00000000200002c0 RSI: 0000000020000040 RDI: 00007fff6c1aa240 RBP: 0000000000000004 R08: 00007fff6c1aa280 R09: 0000000000000000 R10: 00000000000008c0 R11: 0000000000000206 R12: 00000000000008c0 R13: 00007fff6c1aa280 R14: 0000000000000003 R15: 0000000001000000 </TASK> For a really damaged superblock, the value of 'i_super.s_blocksize_bits' may exceed the maximum possible shift for an underlying 'int'. So add an extra check whether the aforementioned field represents the valid block size, which is 512 bytes, 1K, 2K, or 4K. Link: https://lkml.kernel.org/r/20241106092100.2661330-1-dmantipov@yandex.ru Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Dmitry Antipov <dmantipov(a)yandex.ru> Reported-by: syzbot+56f7cd1abe4b8e475180(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=56f7cd1abe4b8e475180 Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com> Cc: Mark Fasheh <mark(a)fasheh.com> Cc: Joel Becker <jlbec(a)evilplan.org> Cc: Junxiao Bi <junxiao.bi(a)oracle.com> Cc: Changwei Ge <gechangwei(a)live.cn> Cc: Jun Piao <piaojun(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/ocfs2/super.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) --- a/fs/ocfs2/super.c~ocfs2-fix-ubsan-warning-in-ocfs2_verify_volume +++ a/fs/ocfs2/super.c @@ -2319,6 +2319,7 @@ static int ocfs2_verify_volume(struct oc struct ocfs2_blockcheck_stats *stats) { int status = -EAGAIN; + u32 blksz_bits; if (memcmp(di->i_signature, OCFS2_SUPER_BLOCK_SIGNATURE, strlen(OCFS2_SUPER_BLOCK_SIGNATURE)) == 0) { @@ -2333,11 +2334,15 @@ static int ocfs2_verify_volume(struct oc goto out; } status = -EINVAL; - if ((1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits)) != blksz) { + /* Acceptable block sizes are 512 bytes, 1K, 2K and 4K. */ + blksz_bits = le32_to_cpu(di->id2.i_super.s_blocksize_bits); + if (blksz_bits < 9 || blksz_bits > 12) { mlog(ML_ERROR, "found superblock with incorrect block " - "size: found %u, should be %u\n", - 1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits), - blksz); + "size bits: found %u, should be 9, 10, 11, or 12\n", + blksz_bits); + } else if ((1 << le32_to_cpu(blksz_bits)) != blksz) { + mlog(ML_ERROR, "found superblock with incorrect block " + "size: found %u, should be %u\n", 1 << blksz_bits, blksz); } else if (le16_to_cpu(di->id2.i_super.s_major_rev_level) != OCFS2_MAJOR_REV_LEVEL || le16_to_cpu(di->id2.i_super.s_minor_rev_level) != _ Patches currently in -mm which might be from dmantipov(a)yandex.ru are

10 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint has been removed from the -mm tree. Its filename was nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Subject: nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint Date: Thu, 7 Nov 2024 01:07:33 +0900 When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty() may cause a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because, since the tracepoint was added in mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, nilfs_grab_buffer(), which grabs a buffer to read (or create) a block of metadata, including b-tree node blocks, does not set the block device, but instead does so only if the buffer is not in the "uptodate" state for each of its caller block reading functions. However, if the uptodate flag is set on a folio/page, and the buffer heads are detached from it by try_to_free_buffers(), and new buffer heads are then attached by create_empty_buffers(), the uptodate flag may be restored to each buffer without the block device being set to bh->b_bdev, and mark_buffer_dirty() may be called later in that state, resulting in the bug mentioned above. Fix this issue by making nilfs_grab_buffer() always set the block device of the super block structure to the buffer head, regardless of the state of the buffer's uptodate flag. Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Cc: Tejun Heo <tj(a)kernel.org> Cc: Ubisectech Sirius <bugreport(a)valiantsec.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/btnode.c | 2 -- fs/nilfs2/gcinode.c | 4 +--- fs/nilfs2/mdt.c | 1 - fs/nilfs2/page.c | 1 + 4 files changed, 2 insertions(+), 6 deletions(-) --- a/fs/nilfs2/btnode.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/btnode.c @@ -68,7 +68,6 @@ nilfs_btnode_create_block(struct address goto failed; } memset(bh->b_data, 0, i_blocksize(inode)); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = blocknr; set_buffer_mapped(bh); set_buffer_uptodate(bh); @@ -133,7 +132,6 @@ int nilfs_btnode_submit_block(struct add goto found; } set_buffer_mapped(bh); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = pblocknr; /* set block address for read */ bh->b_end_io = end_buffer_read_sync; get_bh(bh); --- a/fs/nilfs2/gcinode.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/gcinode.c @@ -83,10 +83,8 @@ int nilfs_gccache_submit_read_data(struc goto out; } - if (!buffer_mapped(bh)) { - bh->b_bdev = inode->i_sb->s_bdev; + if (!buffer_mapped(bh)) set_buffer_mapped(bh); - } bh->b_blocknr = pbn; bh->b_end_io = end_buffer_read_sync; get_bh(bh); --- a/fs/nilfs2/mdt.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/mdt.c @@ -89,7 +89,6 @@ static int nilfs_mdt_create_block(struct if (buffer_uptodate(bh)) goto failed_bh; - bh->b_bdev = sb->s_bdev; err = nilfs_mdt_insert_new_block(inode, block, bh, init_block); if (likely(!err)) { get_bh(bh); --- a/fs/nilfs2/page.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/page.c @@ -63,6 +63,7 @@ struct buffer_head *nilfs_grab_buffer(st folio_put(folio); return NULL; } + bh->b_bdev = inode->i_sb->s_bdev; return bh; } _ Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are

10 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint has been removed from the -mm tree. Its filename was nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Subject: nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint Date: Thu, 7 Nov 2024 01:07:32 +0900 Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints". This series fixes null pointer dereference bugs that occur when using nilfs2 and two block-related tracepoints. This patch (of 2): It has been reported that when using "block:block_touch_buffer" tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because since the tracepoint was added in touch_buffer(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, the block_device structure is set after the function returns to the caller. Here, touch_buffer() is used to mark the folio/page that owns the buffer head as accessed, but the common search helper for folio/page used by the caller function was optimized to mark the folio/page as accessed when it was reimplemented a long time ago, eliminating the need to call touch_buffer() here in the first place. So this solves the issue by eliminating the touch_buffer() call itself. Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: Ubisectech Sirius <bugreport(a)valiantsec.com> Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@va… Reported-by: syzbot+9982fb8d18eba905abe2(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2 Tested-by: syzbot+9982fb8d18eba905abe2(a)syzkaller.appspotmail.com Cc: Tejun Heo <tj(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 - 1 file changed, 1 deletion(-) --- a/fs/nilfs2/page.c~nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint +++ a/fs/nilfs2/page.c @@ -39,7 +39,6 @@ static struct buffer_head *__nilfs_get_f first_block = (unsigned long)index << (PAGE_SHIFT - blkbits); bh = get_nth_bh(bh, block - first_block); - touch_buffer(bh); wait_on_buffer(bh); return bh; } _ Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are

10 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: page_alloc: move mlocked flag clearance into free_pages_prepare() has been removed from the -mm tree. Its filename was mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Roman Gushchin <roman.gushchin(a)linux.dev> Subject: mm: page_alloc: move mlocked flag clearance into free_pages_prepare() Date: Wed, 6 Nov 2024 19:53:54 +0000 Syzbot reported a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage: BUG: Bad page state in process syz.5.504 pfn:61f45 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x61f45 flags: 0xfff00000080204(referenced|workingset|mlocked|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000080204 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 8443, tgid 8442 (syz.5.504), ts 201884660643, free_ts 201499827394 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99 kvm_create_vm virt/kvm/kvm_main.c:1235 [inline] kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5488 [inline] kvm_dev_ioctl+0x12dc/0x2240 virt/kvm/kvm_main.c:5530 __do_compat_sys_ioctl fs/ioctl.c:1007 [inline] __se_compat_sys_ioctl+0x510/0xc90 fs/ioctl.c:950 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e page last free pid 8399 tgid 8399 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_folios+0xf12/0x18d0 mm/page_alloc.c:2686 folios_put_refs+0x76c/0x860 mm/swap.c:1007 free_pages_and_swap_cache+0x5c8/0x690 mm/swap_state.c:335 __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline] tlb_batch_pages_flush mm/mmu_gather.c:149 [inline] tlb_flush_mmu_free mm/mmu_gather.c:366 [inline] tlb_flush_mmu+0x3a3/0x680 mm/mmu_gather.c:373 tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:465 exit_mmap+0x496/0xc40 mm/mmap.c:1926 __mmput+0x115/0x390 kernel/fork.c:1348 exit_mm+0x220/0x310 kernel/exit.c:571 do_exit+0x9b2/0x28e0 kernel/exit.c:926 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 0 UID: 0 PID: 8442 Comm: syz.5.504 Not tainted 6.12.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 bad_page+0x176/0x1d0 mm/page_alloc.c:501 free_page_is_bad mm/page_alloc.c:918 [inline] free_pages_prepare mm/page_alloc.c:1100 [inline] free_unref_page+0xed0/0xf20 mm/page_alloc.c:2638 kvm_destroy_vm virt/kvm/kvm_main.c:1327 [inline] kvm_put_kvm+0xc75/0x1350 virt/kvm/kvm_main.c:1386 kvm_vcpu_release+0x54/0x60 virt/kvm/kvm_main.c:4143 __fput+0x23f/0x880 fs/file_table.c:431 task_work_run+0x24f/0x310 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xa2f/0x28e0 kernel/exit.c:939 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __ia32_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 ia32_sys_call+0x2624/0x2630 arch/x86/include/generated/asm/syscalls_32.h:253 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e RIP: 0023:0xf745d579 Code: Unable to access opcode bytes at 0xf745d54f. RSP: 002b:00000000f75afd6c EFLAGS: 00000206 ORIG_RAX: 00000000000000fc RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 00000000ffffff9c RDI: 00000000f744cff4 RBP: 00000000f717ae61 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was focused on handling pagecache and anonymous memory and wasn't suitable for lower level get_page()/free_page() API's used for example by KVM, as with this reproducer. Fix it by moving the mlocked flag clearance down to free_page_prepare(). The bug itself if fairly old and harmless (aside from generating these warnings), aside from a small memory leak - "bad" pages are stopped from being allocated again. Link: https://lkml.kernel.org/r/20241106195354.270757-1-roman.gushchin@linux.dev Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Reported-by: syzbot+e985d3026c4fd041578e(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6729f475.050a0220.701a.0019.GAE@google.com Acked-by: Hugh Dickins <hughd(a)google.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 15 +++++++++++++++ mm/swap.c | 14 -------------- 2 files changed, 15 insertions(+), 14 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare +++ a/mm/page_alloc.c @@ -1048,6 +1048,7 @@ __always_inline bool free_pages_prepare( bool skip_kasan_poison = should_skip_kasan_poison(page); bool init = want_init_on_free(); bool compound = PageCompound(page); + struct folio *folio = page_folio(page); VM_BUG_ON_PAGE(PageTail(page), page); @@ -1057,6 +1058,20 @@ __always_inline bool free_pages_prepare( if (memcg_kmem_online() && PageMemcgKmem(page)) __memcg_kmem_uncharge_page(page, order); + /* + * In rare cases, when truncation or holepunching raced with + * munlock after VM_LOCKED was cleared, Mlocked may still be + * found set here. This does not indicate a problem, unless + * "unevictable_pgs_cleared" appears worryingly large. + */ + if (unlikely(folio_test_mlocked(folio))) { + long nr_pages = folio_nr_pages(folio); + + __folio_clear_mlocked(folio); + zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); + } + if (unlikely(PageHWPoison(page)) && !order) { /* Do not let hwpoison pages hit pcplists/buddy */ reset_page_owner(page, order); --- a/mm/swap.c~mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare +++ a/mm/swap.c @@ -78,20 +78,6 @@ static void __page_cache_release(struct lruvec_del_folio(*lruvecp, folio); __folio_clear_lru_flags(folio); } - - /* - * In rare cases, when truncation or holepunching raced with - * munlock after VM_LOCKED was cleared, Mlocked may still be - * found set here. This does not indicate a problem, unless - * "unevictable_pgs_cleared" appears worryingly large. - */ - if (unlikely(folio_test_mlocked(folio))) { - long nr_pages = folio_nr_pages(folio); - - __folio_clear_mlocked(folio); - zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); - count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); - } } /* _ Patches currently in -mm which might be from roman.gushchin(a)linux.dev are

10 months, 2 weeks

1
0
0 0

[merged mm-nonmm-stable] util_macrosh-fix-rework-find_closest-macros.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: util_macros.h: fix/rework find_closest() macros has been removed from the -mm tree. Its filename was util_macrosh-fix-rework-find_closest-macros.patch This patch was dropped because it was merged into the mm-nonmm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Alexandru Ardelean <aardelean(a)baylibre.com> Subject: util_macros.h: fix/rework find_closest() macros Date: Tue, 5 Nov 2024 16:54:05 +0200 A bug was found in the find_closest() (find_closest_descending() is also affected after some testing), where for certain values with small progressions, the rounding (done by averaging 2 values) causes an incorrect index to be returned. The rounding issues occur for progressions of 1, 2 and 3. It goes away when the progression/interval between two values is 4 or larger. It's particularly bad for progressions of 1. For example if there's an array of 'a = { 1, 2, 3 }', using 'find_closest(2, a ...)' would return 0 (the index of '1'), rather than returning 1 (the index of '2'). This means that for exact values (with a progression of 1), find_closest() will misbehave and return the index of the value smaller than the one we're searching for. For progressions of 2 and 3, the exact values are obtained correctly; but values aren't approximated correctly (as one would expect). Starting with progressions of 4, all seems to be good (one gets what one would expect). While one could argue that 'find_closest()' should not be used for arrays with progressions of 1 (i.e. '{1, 2, 3, ...}', the macro should still behave correctly. The bug was found while testing the 'drivers/iio/adc/ad7606.c', specifically the oversampling feature. For reference, the oversampling values are listed as: static const unsigned int ad7606_oversampling_avail[7] = { 1, 2, 4, 8, 16, 32, 64, }; When doing: 1. $ echo 1 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is fine 2. $ echo 2 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is wrong; 2 should be returned here 3. $ echo 3 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 2 # this is fine 4. $ echo 4 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 4 # this is fine And from here-on, the values are as correct (one gets what one would expect.) While writing a kunit test for this bug, a peculiar issue was found for the array in the 'drivers/hwmon/ina2xx.c' & 'drivers/iio/adc/ina2xx-adc.c' drivers. While running the kunit test (for 'ina226_avg_tab' from these drivers): * idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, so value. * idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, value 1; and now one could argue whether 3 is closer to 4 or to 1. This quirk only appears for value '3' in this array, but it seems to be a another rounding issue. * And from 4 onwards the 'find_closest'() works fine (one gets what one would expect). This change reworks the find_closest() macros to also check the difference between the left and right elements when 'x'. If the distance to the right is smaller (than the distance to the left), the index is incremented by 1. This also makes redundant the need for using the DIV_ROUND_CLOSEST() macro. In order to accommodate for any mix of negative + positive values, the internal variables '__fc_x', '__fc_mid_x', '__fc_left' & '__fc_right' are forced to 'long' type. This also addresses any potential bugs/issues with 'x' being of an unsigned type. In those situations any comparison between signed & unsigned would be promoted to a comparison between 2 unsigned numbers; this is especially annoying when '__fc_left' & '__fc_right' underflow. The find_closest_descending() macro was also reworked and duplicated from the find_closest(), and it is being iterated in reverse. The main reason for this is to get the same indices as 'find_closest()' (but in reverse). The comparison for '__fc_right < __fc_left' favors going the array in ascending order. For example for array '{ 1024, 512, 256, 128, 64, 16, 4, 1 }' and x = 3, we get: __fc_mid_x = 2 __fc_left = -1 __fc_right = -2 Then '__fc_right < __fc_left' evaluates to true and '__fc_i++' becomes 7 which is not quite incorrect, but 3 is closer to 4 than to 1. This change has been validated with the kunit from the next patch. Link: https://lkml.kernel.org/r/20241105145406.554365-1-aardelean@baylibre.com Fixes: 95d119528b0b ("util_macros.h: add find_closest() macro") Signed-off-by: Alexandru Ardelean <aardelean(a)baylibre.com> Cc: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/util_macros.h | 56 ++++++++++++++++++++++++---------- 1 file changed, 40 insertions(+), 16 deletions(-) --- a/include/linux/util_macros.h~util_macrosh-fix-rework-find_closest-macros +++ a/include/linux/util_macros.h @@ -4,19 +4,6 @@ #include <linux/math.h> -#define __find_closest(x, a, as, op) \ -({ \ - typeof(as) __fc_i, __fc_as = (as) - 1; \ - typeof(x) __fc_x = (x); \ - typeof(*a) const *__fc_a = (a); \ - for (__fc_i = 0; __fc_i < __fc_as; __fc_i++) { \ - if (__fc_x op DIV_ROUND_CLOSEST(__fc_a[__fc_i] + \ - __fc_a[__fc_i + 1], 2)) \ - break; \ - } \ - (__fc_i); \ -}) - /** * find_closest - locate the closest element in a sorted array * @x: The reference value. @@ -25,8 +12,27 @@ * @as: Size of 'a'. * * Returns the index of the element closest to 'x'. + * Note: If using an array of negative numbers (or mixed positive numbers), + * then be sure that 'x' is of a signed-type to get good results. */ -#define find_closest(x, a, as) __find_closest(x, a, as, <=) +#define find_closest(x, a, as) \ +({ \ + typeof(as) __fc_i, __fc_as = (as) - 1; \ + long __fc_mid_x, __fc_x = (x); \ + long __fc_left, __fc_right; \ + typeof(*a) const *__fc_a = (a); \ + for (__fc_i = 0; __fc_i < __fc_as; __fc_i++) { \ + __fc_mid_x = (__fc_a[__fc_i] + __fc_a[__fc_i + 1]) / 2; \ + if (__fc_x <= __fc_mid_x) { \ + __fc_left = __fc_x - __fc_a[__fc_i]; \ + __fc_right = __fc_a[__fc_i + 1] - __fc_x; \ + if (__fc_right < __fc_left) \ + __fc_i++; \ + break; \ + } \ + } \ + (__fc_i); \ +}) /** * find_closest_descending - locate the closest element in a sorted array @@ -36,9 +42,27 @@ * @as: Size of 'a'. * * Similar to find_closest() but 'a' is expected to be sorted in descending - * order. + * order. The iteration is done in reverse order, so that the comparison + * of '__fc_right' & '__fc_left' also works for unsigned numbers. */ -#define find_closest_descending(x, a, as) __find_closest(x, a, as, >=) +#define find_closest_descending(x, a, as) \ +({ \ + typeof(as) __fc_i, __fc_as = (as) - 1; \ + long __fc_mid_x, __fc_x = (x); \ + long __fc_left, __fc_right; \ + typeof(*a) const *__fc_a = (a); \ + for (__fc_i = __fc_as; __fc_i >= 1; __fc_i--) { \ + __fc_mid_x = (__fc_a[__fc_i] + __fc_a[__fc_i - 1]) / 2; \ + if (__fc_x <= __fc_mid_x) { \ + __fc_left = __fc_x - __fc_a[__fc_i]; \ + __fc_right = __fc_a[__fc_i - 1] - __fc_x; \ + if (__fc_right < __fc_left) \ + __fc_i--; \ + break; \ + } \ + } \ + (__fc_i); \ +}) /** * is_insidevar - check if the @ptr points inside the @var memory range. _ Patches currently in -mm which might be from aardelean(a)baylibre.com are

10 months, 2 weeks

1
0
0 0

[PATCH backport to 6.10] x86/cpu: Add INTEL_FAM6_LUNARLAKE_M to X86_BUG_MONITOR

by Len Brown

From: Len Brown <len.brown(a)intel.com> Under some conditions, MONITOR wakeups on Lunar Lake processors can be lost, resulting in significant user-visible delays. Add LunarLake to X86_BUG_MONITOR so that wake_up_idle_cpu() always sends an IPI, avoiding this potential delay. Update the X86_BUG_MONITOR workaround to handle the new smp_kick_mwait_play_dead() path. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219364 Cc: stable(a)vger.kernel.org # 6.10 Signed-off-by: Len Brown <len.brown(a)intel.com> --- This is a backport of the upstream patch to Linux-6.10 and earlier --- arch/x86/kernel/cpu/intel.c | 3 ++- arch/x86/kernel/smpboot.c | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 3ef4e0137d21..e6f4c16c0267 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -583,7 +583,8 @@ static void init_intel(struct cpuinfo_x86 *c) set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR); if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) && - ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT))) + ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT) || + (c->x86_model == INTEL_FAM6_LUNARLAKE_M))) set_cpu_bug(c, X86_BUG_MONITOR); #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 0c35207320cb..ca9358acc626 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1376,6 +1376,9 @@ void smp_kick_mwait_play_dead(void) for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) { /* Bring it out of mwait */ WRITE_ONCE(md->control, newstate); + /* If MWAIT unreliable, send IPI */ + if (boot_cpu_has_bug(X86_BUG_MONITOR)) + __apic_send_IPI(cpu, RESCHEDULE_VECTOR); udelay(5); } -- 2.43.0

10 months, 2 weeks

3
2
0 0

[PATCH] x86/cpu: Add INTEL_LUNARLAKE_M to X86_BUG_MONITOR

by Len Brown

From: Len Brown <len.brown(a)intel.com> Under some conditions, MONITOR wakeups on Lunar Lake processors can be lost, resulting in significant user-visible delays. Add LunarLake to X86_BUG_MONITOR so that wake_up_idle_cpu() always sends an IPI, avoiding this potential delay. Also update the X86_BUG_MONITOR workaround to handle the new smp_kick_mwait_play_dead() path. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219364 Cc: stable(a)vger.kernel.org # 6.11 Signed-off-by: Len Brown <len.brown(a)intel.com> --- arch/x86/kernel/cpu/intel.c | 3 ++- arch/x86/kernel/smpboot.c | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index e7656cbef68d..aa63f5f780a0 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -586,7 +586,8 @@ static void init_intel(struct cpuinfo_x86 *c) c->x86_vfm == INTEL_WESTMERE_EX)) set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR); - if (boot_cpu_has(X86_FEATURE_MWAIT) && c->x86_vfm == INTEL_ATOM_GOLDMONT) + if (boot_cpu_has(X86_FEATURE_MWAIT) && + (c->x86_vfm == INTEL_ATOM_GOLDMONT || c->x86_vfm == INTEL_LUNARLAKE_M)) set_cpu_bug(c, X86_BUG_MONITOR); #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 766f092dab80..910cb2d72c13 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1377,6 +1377,9 @@ void smp_kick_mwait_play_dead(void) for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) { /* Bring it out of mwait */ WRITE_ONCE(md->control, newstate); + /* If MONITOR unreliable, send IPI */ + if (boot_cpu_has_bug(X86_BUG_MONITOR)) + __apic_send_IPI(cpu, RESCHEDULE_VECTOR); udelay(5); } -- 2.43.0

10 months, 2 weeks

4
4
0 0

+ mm-mremap-fix-address-wraparound-in-move_page_tables.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm/mremap: fix address wraparound in move_page_tables() has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-mremap-fix-address-wraparound-in-move_page_tables.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Jann Horn <jannh(a)google.com> Subject: mm/mremap: fix address wraparound in move_page_tables() Date: Mon, 11 Nov 2024 20:34:30 +0100 On 32-bit platforms, it is possible for the expression `len + old_addr < old_end` to be false-positive if `len + old_addr` wraps around. `old_addr` is the cursor in the old range up to which page table entries have been moved; so if the operation succeeded, `old_addr` is the *end* of the old region, and adding `len` to it can wrap. The overflow causes mremap() to mistakenly believe that PTEs have been copied; the consequence is that mremap() bails out, but doesn't move the PTEs back before the new VMA is unmapped, causing anonymous pages in the region to be lost. So basically if userspace tries to mremap() a private-anon region and hits this bug, mremap() will return an error and the private-anon region's contents appear to have been zeroed. The idea of this check is that `old_end - len` is the original start address, and writing the check that way also makes it easier to read; so fix the check by rearranging the comparison accordingly. (An alternate fix would be to refactor this function by introducing an "orig_old_start" variable or such.) Tested in a VM with a 32-bit X86 kernel; without the patch: ``` user@horn:~/big_mremap$ cat test.c #define _GNU_SOURCE #include <stdlib.h> #include <stdio.h> #include <err.h> #include <sys/mman.h> #define ADDR1 ((void*)0x60000000) #define ADDR2 ((void*)0x10000000) #define SIZE 0x50000000uL int main(void) { unsigned char *p1 = mmap(ADDR1, SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0); if (p1 == MAP_FAILED) err(1, "mmap 1"); unsigned char *p2 = mmap(ADDR2, SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0); if (p2 == MAP_FAILED) err(1, "mmap 2"); *p1 = 0x41; printf("first char is 0x%02hhx\n", *p1); unsigned char *p3 = mremap(p1, SIZE, SIZE, MREMAP_MAYMOVE|MREMAP_FIXED, p2); if (p3 == MAP_FAILED) { printf("mremap() failed; first char is 0x%02hhx\n", *p1); } else { printf("mremap() succeeded; first char is 0x%02hhx\n", *p3); } } user@horn:~/big_mremap$ gcc -static -o test test.c user@horn:~/big_mremap$ setarch -R ./test first char is 0x41 mremap() failed; first char is 0x00 ``` With the patch: ``` user@horn:~/big_mremap$ setarch -R ./test first char is 0x41 mremap() succeeded; first char is 0x41 ``` Link: https://lkml.kernel.org/r/20241111-fix-mremap-32bit-wrap-v1-1-61d6be73b722@… Fixes: af8ca1c14906 ("mm/mremap: optimize the start addresses in move_page_tables()") Signed-off-by: Jann Horn <jannh(a)google.com> Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org> Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mremap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/mremap.c~mm-mremap-fix-address-wraparound-in-move_page_tables +++ a/mm/mremap.c @@ -648,7 +648,7 @@ again: * Prevent negative return values when {old,new}_addr was realigned * but we broke out of the above loop for the first PMD itself. */ - if (len + old_addr < old_end) + if (old_addr < old_end - len) return 0; return len + old_addr - old_end; /* how much done */ _ Patches currently in -mm which might be from jannh(a)google.com are mm-mremap-fix-address-wraparound-in-move_page_tables.patch

10 months, 2 weeks

1
0
0 0

Re: Patch "ALSA: usb-audio: Support jack detection on Dell dock" has been added to the 5.15-stable tree

by Jan Schär

Am Mo, 11. Nov 2024, um 18:00, schrieb Sasha Levin: > This is a note to let you know that I've just added the patch titled > > ALSA: usb-audio: Support jack detection on Dell dock > > to the 5.15-stable tree which can be found at: > > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > alsa-usb-audio-support-jack-detection-on-dell-dock.patch > and it can be found in the queue-5.15 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. I think it's fine to add the WD19 patch (upstream commit 4413665dd6c5) to newer stable trees which already have the WD15 patch (upstream commit 4b8ea38fabab), as Greg has already done. That patch just adds a new USB ID for an already existing feature. But I'm not sure if it's a good idea to also add the WD15 patch to the older stable trees. This is a feature, not a bug fix, and the device works fine without it. The only thing is that you may have to manually select the audio input and output. And, the jack detection feature only works (with both WD15 and WD19) if you also have alsa-ucm-conf at least 1.2.7.2 installed, which was released 2022-07-08 [1]. All these older kernels were released before that. I doubt that there are many people who have a new enough alsa-ucm-conf installed, and simultaneously one of these old kernels, and would benefit from this. Jan [1] https://github.com/alsa-project/alsa-ucm-conf/releases/tag/v1.2.7.2

10 months, 2 weeks

1
0
0 0

FAILED: Patch "posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From b5413156bad91dc2995a5c4eab1b05e56914638a Mon Sep 17 00:00:00 2001 From: Benjamin Segall <bsegall(a)google.com> Date: Fri, 25 Oct 2024 18:35:35 -0700 Subject: [PATCH] posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone When cloning a new thread, its posix_cputimers are not inherited, and are cleared by posix_cputimers_init(). However, this does not clear the tick dependency it creates in tsk->tick_dep_mask, and the handler does not reach the code to clear the dependency if there were no timers to begin with. Thus if a thread has a cputimer running before clone/fork, all descendants will prevent nohz_full unless they create a cputimer of their own. Fix this by entirely clearing the tick_dep_mask in copy_process(). (There is currently no inherited state that needs a tick dependency) Process-wide timers do not have this problem because fork does not copy signal_struct as a baseline, it creates one from scratch. Fixes: b78783000d5c ("posix-cpu-timers: Migrate to use new tick dependency mask model") Signed-off-by: Ben Segall <bsegall(a)google.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/all/xm26o737bq8o.fsf@google.com --- include/linux/tick.h | 8 ++++++++ kernel/fork.c | 2 ++ 2 files changed, 10 insertions(+) diff --git a/include/linux/tick.h b/include/linux/tick.h index 72744638c5b0f..99c9c5a7252aa 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -251,12 +251,19 @@ static inline void tick_dep_set_task(struct task_struct *tsk, if (tick_nohz_full_enabled()) tick_nohz_dep_set_task(tsk, bit); } + static inline void tick_dep_clear_task(struct task_struct *tsk, enum tick_dep_bits bit) { if (tick_nohz_full_enabled()) tick_nohz_dep_clear_task(tsk, bit); } + +static inline void tick_dep_init_task(struct task_struct *tsk) +{ + atomic_set(&tsk->tick_dep_mask, 0); +} + static inline void tick_dep_set_signal(struct task_struct *tsk, enum tick_dep_bits bit) { @@ -290,6 +297,7 @@ static inline void tick_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit) { } static inline void tick_dep_clear_task(struct task_struct *tsk, enum tick_dep_bits bit) { } +static inline void tick_dep_init_task(struct task_struct *tsk) { } static inline void tick_dep_set_signal(struct task_struct *tsk, enum tick_dep_bits bit) { } static inline void tick_dep_clear_signal(struct signal_struct *signal, diff --git a/kernel/fork.c b/kernel/fork.c index 89ceb4a68af25..6fa9fe62e01e3 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -105,6 +105,7 @@ #include <linux/rseq.h> #include <uapi/linux/pidfd.h> #include <linux/pidfs.h> +#include <linux/tick.h> #include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -2292,6 +2293,7 @@ __latent_entropy struct task_struct *copy_process( acct_clear_integrals(p); posix_cputimers_init(&p->posix_cputimers); + tick_dep_init_task(p); p->io_context = NULL; audit_set_context(p, NULL); -- 2.43.0

10 months, 2 weeks

3
3
0 0

FAILED: patch "[PATCH] mm/damon/core: handle zero {aggregation,ops_update} intervals" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 3488af0970445ff5532c7e8dc5e6456b877aee5e # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111132-portal-crowbar-256b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 3488af0970445ff5532c7e8dc5e6456b877aee5e Mon Sep 17 00:00:00 2001 From: SeongJae Park <sj(a)kernel.org> Date: Thu, 31 Oct 2024 11:37:56 -0700 Subject: [PATCH] mm/damon/core: handle zero {aggregation,ops_update} intervals Patch series "mm/damon/core: fix handling of zero non-sampling intervals". DAMON's internal intervals accounting logic is not correctly handling non-sampling intervals of zero values for a wrong assumption. This could cause unexpected monitoring behavior, and even result in infinite hang of DAMON sysfs interface user threads in case of zero aggregation interval. Fix those by updating the intervals accounting logic. For details of the root case and solutions, please refer to commit messages of fixes. This patch (of 2): DAMON's logics to determine if this is the time to do aggregation and ops update assumes next_{aggregation,ops_update}_sis are always set larger than current passed_sample_intervals. And therefore it further assumes continuously incrementing passed_sample_intervals every sampling interval will make it reaches to the next_{aggregation,ops_update}_sis in future. The logic therefore make the action and update next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is same to the counts, respectively. If Aggregation interval or Ops update interval are zero, however, next_aggregation_sis or next_ops_update_sis are set same to current passed_sample_intervals, respectively. And passed_sample_intervals is incremented before doing the next_{aggregation,ops_update}_sis check. Hence, passed_sample_intervals becomes larger than next_{aggregation,ops_update}_sis, and the logic says it is not the time to do the action and update next_{aggregation,ops_update}_sis forever, until an overflow happens. In other words, DAMON stops doing aggregations or ops updates effectively forever, and users cannot get monitoring results. Based on the documents and the common sense, a reasonable behavior for such inputs is doing an aggregation and an ops update for every sampling interval. Handle the case by removing the assumption. Note that this could incur particular real issue for DAMON sysfs interface users, in case of zero Aggregation interval. When user starts DAMON with zero Aggregation interval and asks online DAMON parameter tuning via DAMON sysfs interface, the request is handled by the aggregation callback. Until the callback finishes the work, the user who requested the online tuning just waits. Hence, the user will be stuck until the passed_sample_intervals overflows. Link: https://lkml.kernel.org/r/20241031183757.49610-1-sj@kernel.org Link: https://lkml.kernel.org/r/20241031183757.49610-2-sj@kernel.org Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer") Signed-off-by: SeongJae Park <sj(a)kernel.org> Cc: <stable(a)vger.kernel.org> [6.7.x] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/damon/core.c b/mm/damon/core.c index a83f3b736d51..3131a07569e4 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -2000,7 +2000,7 @@ static int kdamond_fn(void *data) if (ctx->ops.check_accesses) max_nr_accesses = ctx->ops.check_accesses(ctx); - if (ctx->passed_sample_intervals == next_aggregation_sis) { + if (ctx->passed_sample_intervals >= next_aggregation_sis) { kdamond_merge_regions(ctx, max_nr_accesses / 10, sz_limit); @@ -2018,7 +2018,7 @@ static int kdamond_fn(void *data) sample_interval = ctx->attrs.sample_interval ? ctx->attrs.sample_interval : 1; - if (ctx->passed_sample_intervals == next_aggregation_sis) { + if (ctx->passed_sample_intervals >= next_aggregation_sis) { ctx->next_aggregation_sis = next_aggregation_sis + ctx->attrs.aggr_interval / sample_interval; @@ -2028,7 +2028,7 @@ static int kdamond_fn(void *data) ctx->ops.reset_aggregated(ctx); } - if (ctx->passed_sample_intervals == next_ops_update_sis) { + if (ctx->passed_sample_intervals >= next_ops_update_sis) { ctx->next_ops_update_sis = next_ops_update_sis + ctx->attrs.ops_update_interval / sample_interval;

10 months, 2 weeks

2
1
0 0

Re: [merged mm-stable] zram-clear-idle-flag-after-recompression.patch removed from -mm tree

by Brian Geffon

On Mon, Nov 11, 2024 at 8:42 AM Brian Geffon <bgeffon(a)google.com> wrote: > > On Mon, Nov 11, 2024 at 3:28 AM Andrew Morton <akpm(a)linux-foundation.org> wrote: > > > > > > The quilt patch titled > > Subject: zram: clear IDLE flag after recompression > > has been removed from the -mm tree. Its filename was > > zram-clear-idle-flag-after-recompression.patch > > > > This patch was dropped because it was merged into the mm-stable branch > > of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm > > > > ------------------------------------------------------ > > From: Sergey Senozhatsky <senozhatsky(a)chromium.org> > > Subject: zram: clear IDLE flag after recompression > > Date: Tue, 29 Oct 2024 00:36:14 +0900 > > > > Patch series "zram: IDLE flag handling fixes", v2. > > > > zram can wrongly preserve ZRAM_IDLE flag on its entries which can result > > in premature post-processing (writeback and recompression) of such > > entries. > > > > > > This patch (of 2) > > > > Recompression should clear ZRAM_IDLE flag on the entries it has accessed, > > because otherwise some entries, specifically those for which recompression > > has failed, become immediate candidate entries for another post-processing > > (e.g. writeback). > > > > Consider the following case: > > - recompression marks entries IDLE every 4 hours and attempts > > to recompress them > > - some entries are incompressible, so we keep them intact and > > hence preserve IDLE flag > > - writeback marks entries IDLE every 8 hours and writebacks > > IDLE entries, however we have IDLE entries left from > > recompression, so writeback prematurely writebacks those > > entries. > > > > The bug was reported by Shin Kawamura. > > > > Link: https://lkml.kernel.org/r/20241028153629.1479791-1-senozhatsky@chromium.org > > Link: https://lkml.kernel.org/r/20241028153629.1479791-2-senozhatsky@chromium.org > > Fixes: 84b33bf78889 ("zram: introduce recompress sysfs knob") > > Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org> > > Reported-by: Shin Kawamura <kawasin(a)google.com> > > Acked-by: Brian Geffon <bgeffon(a)google.com> > > Cc: Minchan Kim <minchan(a)kernel.org> Cc: stable(a)vger.kernel.org > > Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> > > --- > > > > drivers/block/zram/zram_drv.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > --- a/drivers/block/zram/zram_drv.c~zram-clear-idle-flag-after-recompression > > +++ a/drivers/block/zram/zram_drv.c > > @@ -1864,6 +1864,13 @@ static int recompress_slot(struct zram * > > if (ret) > > return ret; > > > > + /* > > + * We touched this entry so mark it as non-IDLE. This makes sure that > > + * we don't preserve IDLE flag and don't incorrectly pick this entry > > + * for different post-processing type (e.g. writeback). > > + */ > > + zram_clear_flag(zram, index, ZRAM_IDLE); > > + > > class_index_old = zs_lookup_class_index(zram->mem_pool, comp_len_old); > > /* > > * Iterate the secondary comp algorithms list (in order of priority) > > _ > > > > Patches currently in -mm which might be from senozhatsky(a)chromium.org are > > > >

10 months, 2 weeks

2
1
0 0

Passenger traffic Expo 2024

by Camille Batiste

Hey, Would you be interested in acquiring the attendees list of Passenger traffic Expo 2024? List contains: Names, Titles, Phone Numbers, Company Details, and more… Interested? Let me know so that I’ll send you the pricing for the same. Kind Regards, Camille Batiste Marketing Executive If you do not wish to receive our emails, please reply with "Not Interested."

10 months, 2 weeks

1
0
0 0

[PATCH] ftrace: Fix possible use-after-free issue in ftrace_location()

by Hagar Hemdan

From: Zheng Yejian <zhengyejian1(a)huawei.com> commit e60b613df8b6253def41215402f72986fee3fc8d upstream. KASAN reports a bug: BUG: KASAN: use-after-free in ftrace_location+0x90/0x120 Read of size 8 at addr ffff888141d40010 by task insmod/424 CPU: 8 PID: 424 Comm: insmod Tainted: G W 6.9.0-rc2+ [...] Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 print_report+0xcf/0x610 kasan_report+0xb5/0xe0 ftrace_location+0x90/0x120 register_kprobe+0x14b/0xa40 kprobe_init+0x2d/0xff0 [kprobe_example] do_one_initcall+0x8f/0x2d0 do_init_module+0x13a/0x3c0 load_module+0x3082/0x33d0 init_module_from_file+0xd2/0x130 __x64_sys_finit_module+0x306/0x440 do_syscall_64+0x68/0x140 entry_SYSCALL_64_after_hwframe+0x71/0x79 The root cause is that, in ftrace_location_range(), ftrace record of some address is being searched in ftrace pages of some module, but those ftrace pages at the same time is being freed in ftrace_release_mod() as the corresponding module is being deleted: CPU1 | CPU2 register_kprobes() { | delete_module() { check_kprobe_address_safe() { | arch_check_ftrace_location() { | ftrace_location() { | lookup_rec() // USE! | ftrace_release_mod() // Free! To fix this issue: 1. Hold rcu lock as accessing ftrace pages in ftrace_location_range(); 2. Use ftrace_location_range() instead of lookup_rec() in ftrace_location(); 3. Call synchronize_rcu() before freeing any ftrace pages both in ftrace_process_locs()/ftrace_release_mod()/ftrace_free_mem(). Link: https://lore.kernel.org/linux-trace-kernel/20240509192859.1273558-1-zhengye… Cc: stable(a)vger.kernel.org Cc: <mhiramat(a)kernel.org> Cc: <mark.rutland(a)arm.com> Cc: <mathieu.desnoyers(a)efficios.com> Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization") Suggested-by: Steven Rostedt <rostedt(a)goodmis.org> Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> [Hagar: Modified to apply on v5.4.y] Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> --- only compile tested. --- kernel/trace/ftrace.c | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 412505d94865..60bf8a6d55ce 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -1552,7 +1552,9 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end) struct ftrace_page *pg; struct dyn_ftrace *rec; struct dyn_ftrace key; + unsigned long ip = 0; + rcu_read_lock(); key.ip = start; key.flags = end; /* overload flags, as it is unsigned long */ @@ -1565,10 +1567,13 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end) sizeof(struct dyn_ftrace), ftrace_cmp_recs); if (rec) - return rec->ip; + { + ip = rec->ip; + break; + } } - - return 0; + rcu_read_unlock(); + return ip; } /** @@ -5736,6 +5741,8 @@ static int ftrace_process_locs(struct module *mod, /* We should have used all pages unless we skipped some */ if (pg_unuse) { WARN_ON(!skipped); + /* Need to synchronize with ftrace_location_range() */ + synchronize_rcu(); ftrace_free_pages(pg_unuse); } return ret; @@ -5889,6 +5896,9 @@ void ftrace_release_mod(struct module *mod) out_unlock: mutex_unlock(&ftrace_lock); + /* Need to synchronize with ftrace_location_range() */ + if (tmp_page) + synchronize_rcu(); for (pg = tmp_page; pg; pg = tmp_page) { /* Needs to be called outside of ftrace_lock */ @@ -6196,6 +6206,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr) unsigned long start = (unsigned long)(start_ptr); unsigned long end = (unsigned long)(end_ptr); struct ftrace_page **last_pg = &ftrace_pages_start; + struct ftrace_page *tmp_page = NULL; struct ftrace_page *pg; struct dyn_ftrace *rec; struct dyn_ftrace key; @@ -6239,12 +6250,8 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr) ftrace_update_tot_cnt--; if (!pg->index) { *last_pg = pg->next; - if (pg->records) { - free_pages((unsigned long)pg->records, pg->order); - ftrace_number_of_pages -= 1 << pg->order; - } - ftrace_number_of_groups--; - kfree(pg); + pg->next = tmp_page; + tmp_page = pg; pg = container_of(last_pg, struct ftrace_page, next); if (!(*last_pg)) ftrace_pages = pg; @@ -6261,6 +6268,11 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr) clear_func_from_hashes(func); kfree(func); } + /* Need to synchronize with ftrace_location_range() */ + if (tmp_page) { + synchronize_rcu(); + ftrace_free_pages(tmp_page); + } } void __init ftrace_free_init_mem(void) -- 2.40.1

10 months, 2 weeks

1
0
0 0

[PATCH] usb: dwc3: gadget: Add TxFIFO resizing supports for single port RAM

by Selvarasu Ganesan

This commit adds support for resizing the TxFIFO in USB2.0-only mode where using single port RAM, and limit the use of extra FIFOs for bulk transfers in non-SS mode. It prevents the issue of limited RAM size usage. Fixes: fad16c823e66 ("usb: dwc3: gadget: Refine the logic for resizing Tx FIFOs") Cc: stable(a)vger.kernel.org # 6.12.x: fad16c82: usb: dwc3: gadget: Refine the logic for resizing Tx FIFOs Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com> --- drivers/usb/dwc3/core.h | 4 +++ drivers/usb/dwc3/gadget.c | 56 ++++++++++++++++++++++++++++++--------- 2 files changed, 48 insertions(+), 12 deletions(-) diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index eaa55c0cf62f..8306b39e5c64 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -915,6 +915,7 @@ struct dwc3_hwparams { #define DWC3_MODE(n) ((n) & 0x7) /* HWPARAMS1 */ +#define DWC3_SPRAM_TYPE(n) (((n) >> 23) & 1) #define DWC3_NUM_INT(n) (((n) & (0x3f << 15)) >> 15) /* HWPARAMS3 */ @@ -925,6 +926,9 @@ struct dwc3_hwparams { #define DWC3_NUM_IN_EPS(p) (((p)->hwparams3 & \ (DWC3_NUM_IN_EPS_MASK)) >> 18) +/* HWPARAMS6 */ +#define DWC3_RAM0_DEPTH(n) (((n) & (0xffff0000)) >> 16) + /* HWPARAMS7 */ #define DWC3_RAM1_DEPTH(n) ((n) & 0xffff) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 2fed2aa01407..d3e25f7d7cd0 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -687,6 +687,42 @@ static int dwc3_gadget_calc_tx_fifo_size(struct dwc3 *dwc, int mult) return fifo_size; } +/** + * dwc3_gadget_calc_ram_depth - calculates the ram depth for txfifo + * @dwc: pointer to the DWC3 context + */ +static int dwc3_gadget_calc_ram_depth(struct dwc3 *dwc) +{ + int ram_depth; + int fifo_0_start; + bool spram_type; + int tmp; + + /* Check supporting RAM type by HW */ + spram_type = DWC3_SPRAM_TYPE(dwc->hwparams.hwparams1); + + /* If a single port RAM is utilized, then allocate TxFIFOs from + * RAM0. otherwise, allocate them from RAM1. + */ + ram_depth = spram_type ? DWC3_RAM0_DEPTH(dwc->hwparams.hwparams6) : + DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); + + + /* In a single port RAM configuration, the available RAM is shared + * between the RX and TX FIFOs. This means that the txfifo can begin + * at a non-zero address. + */ + if (spram_type) { + /* Check if TXFIFOs start at non-zero addr */ + tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0)); + fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp); + + ram_depth -= (fifo_0_start >> 16); + } + + return ram_depth; +} + /** * dwc3_gadget_clear_tx_fifos - Clears txfifo allocation * @dwc: pointer to the DWC3 context @@ -753,7 +789,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) { struct dwc3 *dwc = dep->dwc; int fifo_0_start; - int ram1_depth; + int ram_depth; int fifo_size; int min_depth; int num_in_ep; @@ -773,7 +809,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) if (dep->flags & DWC3_EP_TXFIFO_RESIZED) return 0; - ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); + ram_depth = dwc3_gadget_calc_ram_depth(dwc); switch (dwc->gadget->speed) { case USB_SPEED_SUPER_PLUS: @@ -792,10 +828,6 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) break; } fallthrough; - case USB_SPEED_FULL: - if (usb_endpoint_xfer_bulk(dep->endpoint.desc)) - num_fifos = 2; - break; default: break; } @@ -809,7 +841,7 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) /* Reserve at least one FIFO for the number of IN EPs */ min_depth = num_in_ep * (fifo + 1); - remaining = ram1_depth - min_depth - dwc->last_fifo_depth; + remaining = ram_depth - min_depth - dwc->last_fifo_depth; remaining = max_t(int, 0, remaining); /* * We've already reserved 1 FIFO per EP, so check what we can fit in @@ -835,9 +867,9 @@ static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size); /* Check fifo size allocation doesn't exceed available RAM size. */ - if (dwc->last_fifo_depth >= ram1_depth) { + if (dwc->last_fifo_depth >= ram_depth) { dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n", - dwc->last_fifo_depth, ram1_depth, + dwc->last_fifo_depth, ram_depth, dep->endpoint.name, fifo_size); if (DWC3_IP_IS(DWC3)) fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size); @@ -3090,7 +3122,7 @@ static int dwc3_gadget_check_config(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); struct usb_ep *ep; int fifo_size = 0; - int ram1_depth; + int ram_depth; int ep_num = 0; if (!dwc->do_fifo_resize) @@ -3113,8 +3145,8 @@ static int dwc3_gadget_check_config(struct usb_gadget *g) fifo_size += dwc->max_cfg_eps; /* Check if we can fit a single fifo per endpoint */ - ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); - if (fifo_size > ram1_depth) + ram_depth = dwc3_gadget_calc_ram_depth(dwc); + if (fifo_size > ram_depth) return -ENOMEM; return 0; -- 2.17.1

10 months, 2 weeks

2
4
0 0

Re: [merged mm-stable] zram-clear-idle-flag-in-mark_idle.patch removed from -mm tree

by Brian Geffon

On Mon, Nov 11, 2024 at 3:28 AM Andrew Morton <akpm(a)linux-foundation.org> wrote: > > > The quilt patch titled > Subject: zram: clear IDLE flag in mark_idle() > has been removed from the -mm tree. Its filename was > zram-clear-idle-flag-in-mark_idle.patch > > This patch was dropped because it was merged into the mm-stable branch > of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm I think this also needs to be Cc'd to stable. > > ------------------------------------------------------ > From: Sergey Senozhatsky <senozhatsky(a)chromium.org> > Subject: zram: clear IDLE flag in mark_idle() > Date: Tue, 29 Oct 2024 00:36:15 +0900 > > If entry does not fulfill current mark_idle() parameters, e.g. cutoff > time, then we should clear its ZRAM_IDLE from previous mark_idle() > invocations. > > Consider the following case: > - mark_idle() cutoff time 8h > - mark_idle() cutoff time 4h > - writeback() idle - will writeback entries with cutoff time 8h, > while it should only pick entries with cutoff time 4h > > The bug was reported by Shin Kawamura. > > Link: https://lkml.kernel.org/r/20241028153629.1479791-3-senozhatsky@chromium.org > Fixes: 755804d16965 ("zram: introduce an aged idle interface") > Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org> > Reported-by: Shin Kawamura <kawasin(a)google.com> > Acked-by: Brian Geffon <bgeffon(a)google.com> > Cc: Minchan Kim <minchan(a)kernel.org> Cc: stable(a)vger.kernel.org > Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> > --- > > drivers/block/zram/zram_drv.c | 2 ++ > 1 file changed, 2 insertions(+) > > --- a/drivers/block/zram/zram_drv.c~zram-clear-idle-flag-in-mark_idle > +++ a/drivers/block/zram/zram_drv.c > @@ -410,6 +410,8 @@ static void mark_idle(struct zram *zram, > #endif > if (is_idle) > zram_set_flag(zram, index, ZRAM_IDLE); > + else > + zram_clear_flag(zram, index, ZRAM_IDLE); > zram_slot_unlock(zram, index); > } > } > _ > > Patches currently in -mm which might be from senozhatsky(a)chromium.org are > >

10 months, 2 weeks

1
0
0 0

[PATCH net] wifi: brcmfmac: release 'root' node in all execution paths

by Javier Carrasco

The fixed patch introduced an additional condition to enter the scope where the 'root' device_node is released (!settings->board_type, currently 'err'), which avoid decrementing the refcount with a call to of_node_put() if that second condition is not satisfied. Move the call to of_node_put() to the point where 'root' is no longer required to avoid leaking the resource if err is not zero. Cc: stable(a)vger.kernel.org Fixes: 7682de8b3351 ("wifi: brcmfmac: of: Fetch Apple properties") Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- Note that a call to of_node_put() on a NULL device_node has no effect, which simplifies this patch as there is no need to refactor the or add more conditions. --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c index fe4f65756105..af930e34c21f 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c @@ -110,9 +110,8 @@ void brcmf_of_probe(struct device *dev, enum brcmf_bus_type bus_type, } strreplace(board_type, '/', '-'); settings->board_type = board_type; - - of_node_put(root); } + of_node_put(root); if (!np || !of_device_is_compatible(np, "brcm,bcm4329-fmac")) return; --- base-commit: c05c62850a8f035a267151dd86ea3daf887e28b8 change-id: 20241030-brcmfmac-of-cleanup-000fe98821df Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

10 months, 2 weeks

2
3
0 0

FAILED: patch "[PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x 404b739e895522838f1abdc340c554654d671dde # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111133-accuracy-doozy-8ba2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 404b739e895522838f1abdc340c554654d671dde Mon Sep 17 00:00:00 2001 From: Umang Jain <umang.jain(a)ideasonboard.com> Date: Wed, 16 Oct 2024 18:32:24 +0530 Subject: [PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation The struct vchiq_arm_state 'platform_state' is currently allocated dynamically using kzalloc(). Unfortunately, it is never freed and is subjected to memory leaks in the error handling paths of the probe() function. To address the issue, use device resource management helper devm_kzalloc(), to ensure cleanup after its allocation. Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Cc: stable(a)vger.kernel.org Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 3dbeffc650d3..0d8d5555e8af 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -593,7 +593,7 @@ vchiq_platform_init_state(struct vchiq_state *state) { struct vchiq_arm_state *platform_state; - platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL); if (!platform_state) return -ENOMEM;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 404b739e895522838f1abdc340c554654d671dde # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111130-spearman-gratified-fd88@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 404b739e895522838f1abdc340c554654d671dde Mon Sep 17 00:00:00 2001 From: Umang Jain <umang.jain(a)ideasonboard.com> Date: Wed, 16 Oct 2024 18:32:24 +0530 Subject: [PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation The struct vchiq_arm_state 'platform_state' is currently allocated dynamically using kzalloc(). Unfortunately, it is never freed and is subjected to memory leaks in the error handling paths of the probe() function. To address the issue, use device resource management helper devm_kzalloc(), to ensure cleanup after its allocation. Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Cc: stable(a)vger.kernel.org Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 3dbeffc650d3..0d8d5555e8af 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -593,7 +593,7 @@ vchiq_platform_init_state(struct vchiq_state *state) { struct vchiq_arm_state *platform_state; - platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL); if (!platform_state) return -ENOMEM;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 404b739e895522838f1abdc340c554654d671dde # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111127-rectal-glandular-3e3c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 404b739e895522838f1abdc340c554654d671dde Mon Sep 17 00:00:00 2001 From: Umang Jain <umang.jain(a)ideasonboard.com> Date: Wed, 16 Oct 2024 18:32:24 +0530 Subject: [PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation The struct vchiq_arm_state 'platform_state' is currently allocated dynamically using kzalloc(). Unfortunately, it is never freed and is subjected to memory leaks in the error handling paths of the probe() function. To address the issue, use device resource management helper devm_kzalloc(), to ensure cleanup after its allocation. Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Cc: stable(a)vger.kernel.org Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 3dbeffc650d3..0d8d5555e8af 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -593,7 +593,7 @@ vchiq_platform_init_state(struct vchiq_state *state) { struct vchiq_arm_state *platform_state; - platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL); if (!platform_state) return -ENOMEM;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 404b739e895522838f1abdc340c554654d671dde # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111124-riveting-exceeding-fd63@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 404b739e895522838f1abdc340c554654d671dde Mon Sep 17 00:00:00 2001 From: Umang Jain <umang.jain(a)ideasonboard.com> Date: Wed, 16 Oct 2024 18:32:24 +0530 Subject: [PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation The struct vchiq_arm_state 'platform_state' is currently allocated dynamically using kzalloc(). Unfortunately, it is never freed and is subjected to memory leaks in the error handling paths of the probe() function. To address the issue, use device resource management helper devm_kzalloc(), to ensure cleanup after its allocation. Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Cc: stable(a)vger.kernel.org Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 3dbeffc650d3..0d8d5555e8af 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -593,7 +593,7 @@ vchiq_platform_init_state(struct vchiq_state *state) { struct vchiq_arm_state *platform_state; - platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL); if (!platform_state) return -ENOMEM;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 404b739e895522838f1abdc340c554654d671dde # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111121-reabsorb-jockstrap-464b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 404b739e895522838f1abdc340c554654d671dde Mon Sep 17 00:00:00 2001 From: Umang Jain <umang.jain(a)ideasonboard.com> Date: Wed, 16 Oct 2024 18:32:24 +0530 Subject: [PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation The struct vchiq_arm_state 'platform_state' is currently allocated dynamically using kzalloc(). Unfortunately, it is never freed and is subjected to memory leaks in the error handling paths of the probe() function. To address the issue, use device resource management helper devm_kzalloc(), to ensure cleanup after its allocation. Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Cc: stable(a)vger.kernel.org Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 3dbeffc650d3..0d8d5555e8af 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -593,7 +593,7 @@ vchiq_platform_init_state(struct vchiq_state *state) { struct vchiq_arm_state *platform_state; - platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL); if (!platform_state) return -ENOMEM;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 404b739e895522838f1abdc340c554654d671dde # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111118-underfoot-footrest-44b3@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 404b739e895522838f1abdc340c554654d671dde Mon Sep 17 00:00:00 2001 From: Umang Jain <umang.jain(a)ideasonboard.com> Date: Wed, 16 Oct 2024 18:32:24 +0530 Subject: [PATCH] staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation The struct vchiq_arm_state 'platform_state' is currently allocated dynamically using kzalloc(). Unfortunately, it is never freed and is subjected to memory leaks in the error handling paths of the probe() function. To address the issue, use device resource management helper devm_kzalloc(), to ensure cleanup after its allocation. Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Cc: stable(a)vger.kernel.org Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 3dbeffc650d3..0d8d5555e8af 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -593,7 +593,7 @@ vchiq_platform_init_state(struct vchiq_state *state) { struct vchiq_arm_state *platform_state; - platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL); if (!platform_state) return -ENOMEM;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm: resolve faulty mmap_region() error path behaviour" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 5de195060b2e251a835f622759550e6202167641 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111146-dictator-subscript-3ec4@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 5de195060b2e251a835f622759550e6202167641 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Date: Tue, 29 Oct 2024 18:11:48 +0000 Subject: [PATCH] mm: resolve faulty mmap_region() error path behaviour The mmap_region() function is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. Taking advantage of previous patches in this series we move a number of checks earlier in the code, simplifying things by moving the core of the logic into a static internal function __mmap_region(). Doing this allows us to perform a number of checks up front before we do any real work, and allows us to unwind the writable unmap check unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE validation unconditionally also. We move a number of things here: 1. We preallocate memory for the iterator before we call the file-backed memory hook, allowing us to exit early and avoid having to perform complicated and error-prone close/free logic. We carefully free iterator state on both success and error paths. 2. The enclosing mmap_region() function handles the mapping_map_writable() logic early. Previously the logic had the mapping_map_writable() at the point of mapping a newly allocated file-backed VMA, and a matching mapping_unmap_writable() on success and error paths. We now do this unconditionally if this is a file-backed, shared writable mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however doing so does not invalidate the seal check we just performed, and we in any case always decrement the counter in the wrapper. We perform a debug assert to ensure a driver does not attempt to do the opposite. 3. We also move arch_validate_flags() up into the mmap_region() function. This is only relevant on arm64 and sparc64, and the check is only meaningful for SPARC with ADI enabled. We explicitly add a warning for this arch if a driver invalidates this check, though the code ought eventually to be fixed to eliminate the need for this. With all of these measures in place, we no longer need to explicitly close the VMA on error paths, as we place all checks which might fail prior to a call to any driver mmap hook. This eliminates an entire class of errors, makes the code easier to reason about and more robust. Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Tested-by: Mark Brown <broonie(a)kernel.org> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/mmap.c b/mm/mmap.c index aee5fa08ae5d..79d541f1502b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1358,20 +1358,18 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, return do_vmi_munmap(&vmi, mm, start, len, uf, false); } -unsigned long mmap_region(struct file *file, unsigned long addr, +static unsigned long __mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; pgoff_t pglen = PHYS_PFN(len); - struct vm_area_struct *merge; unsigned long charged = 0; struct vma_munmap_struct vms; struct ma_state mas_detach; struct maple_tree mt_detach; unsigned long end = addr + len; - bool writable_file_mapping = false; int error; VMA_ITERATOR(vmi, mm, addr); VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff); @@ -1445,28 +1443,26 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); + if (vma_iter_prealloc(&vmi, vma)) { + error = -ENOMEM; + goto free_vma; + } + if (file) { vma->vm_file = get_file(file); error = mmap_file(file, vma); if (error) - goto unmap_and_free_vma; - - if (vma_is_shared_maywrite(vma)) { - error = mapping_map_writable(file->f_mapping); - if (error) - goto close_and_free_vma; - - writable_file_mapping = true; - } + goto unmap_and_free_file_vma; + /* Drivers cannot alter the address of the VMA. */ + WARN_ON_ONCE(addr != vma->vm_start); /* - * Expansion is handled above, merging is handled below. - * Drivers should not alter the address of the VMA. + * Drivers should not permit writability when previously it was + * disallowed. */ - if (WARN_ON((addr != vma->vm_start))) { - error = -EINVAL; - goto close_and_free_vma; - } + VM_WARN_ON_ONCE(vm_flags != vma->vm_flags && + !(vm_flags & VM_MAYWRITE) && + (vma->vm_flags & VM_MAYWRITE)); vma_iter_config(&vmi, addr, end); /* @@ -1474,6 +1470,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, * vma again as we may succeed this time. */ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) { + struct vm_area_struct *merge; + vmg.flags = vma->vm_flags; /* If this fails, state is reset ready for a reattempt. */ merge = vma_merge_new_range(&vmg); @@ -1491,7 +1489,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma = merge; /* Update vm_flags to pick up the change. */ vm_flags = vma->vm_flags; - goto unmap_writable; + goto file_expanded; } vma_iter_config(&vmi, addr, end); } @@ -1500,26 +1498,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr, } else if (vm_flags & VM_SHARED) { error = shmem_zero_setup(vma); if (error) - goto free_vma; + goto free_iter_vma; } else { vma_set_anonymous(vma); } - if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { - error = -EACCES; - goto close_and_free_vma; - } - - /* Allow architectures to sanity-check the vm_flags */ - if (!arch_validate_flags(vma->vm_flags)) { - error = -EINVAL; - goto close_and_free_vma; - } - - if (vma_iter_prealloc(&vmi, vma)) { - error = -ENOMEM; - goto close_and_free_vma; - } +#ifdef CONFIG_SPARC64 + /* TODO: Fix SPARC ADI! */ + WARN_ON_ONCE(!arch_validate_flags(vm_flags)); +#endif /* Lock the VMA since it is modified after insertion into VMA tree */ vma_start_write(vma); @@ -1533,10 +1520,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, */ khugepaged_enter_vma(vma, vma->vm_flags); - /* Once vma denies write, undo our temporary denial count */ -unmap_writable: - if (writable_file_mapping) - mapping_unmap_writable(file->f_mapping); +file_expanded: file = vma->vm_file; ksm_add_vma(vma); expanded: @@ -1569,23 +1553,17 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_set_page_prot(vma); - validate_mm(mm); return addr; -close_and_free_vma: - vma_close(vma); +unmap_and_free_file_vma: + fput(vma->vm_file); + vma->vm_file = NULL; - if (file || vma->vm_file) { -unmap_and_free_vma: - fput(vma->vm_file); - vma->vm_file = NULL; - - vma_iter_set(&vmi, vma->vm_end); - /* Undo any partial mapping done by a device driver. */ - unmap_region(&vmi.mas, vma, vmg.prev, vmg.next); - } - if (writable_file_mapping) - mapping_unmap_writable(file->f_mapping); + vma_iter_set(&vmi, vma->vm_end); + /* Undo any partial mapping done by a device driver. */ + unmap_region(&vmi.mas, vma, vmg.prev, vmg.next); +free_iter_vma: + vma_iter_free(&vmi); free_vma: vm_area_free(vma); unacct_error: @@ -1595,10 +1573,43 @@ unsigned long mmap_region(struct file *file, unsigned long addr, abort_munmap: vms_abort_munmap_vmas(&vms, &mas_detach); gather_failed: - validate_mm(mm); return error; } +unsigned long mmap_region(struct file *file, unsigned long addr, + unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, + struct list_head *uf) +{ + unsigned long ret; + bool writable_file_mapping = false; + + /* Check to see if MDWE is applicable. */ + if (map_deny_write_exec(vm_flags, vm_flags)) + return -EACCES; + + /* Allow architectures to sanity-check the vm_flags. */ + if (!arch_validate_flags(vm_flags)) + return -EINVAL; + + /* Map writable and ensure this isn't a sealed memfd. */ + if (file && is_shared_maywrite(vm_flags)) { + int error = mapping_map_writable(file->f_mapping); + + if (error) + return error; + writable_file_mapping = true; + } + + ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf); + + /* Clear our write mapping regardless of error. */ + if (writable_file_mapping) + mapping_unmap_writable(file->f_mapping); + + validate_mm(current->mm); + return ret; +} + static int __vm_munmap(unsigned long start, size_t len, bool unlock) { int ret;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 5baf8b037debf4ec60108ccfeccb8636d1dbad81 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111135-thumb-pretended-bad3@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 5baf8b037debf4ec60108ccfeccb8636d1dbad81 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Date: Tue, 29 Oct 2024 18:11:47 +0000 Subject: [PATCH] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Currently MTE is permitted in two circumstances (desiring to use MTE having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is specified, as checked by arch_calc_vm_flag_bits() and actualised by setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is activated in mmap_region(). The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also set is the arm64 implementation of arch_validate_flags(). Unfortunately, we intend to refactor mmap_region() to perform this check earlier, meaning that in the case of a shmem backing we will not have invoked shmem_mmap() yet, causing the mapping to fail spuriously. It is inappropriate to set this architecture-specific flag in general mm code anyway, so a sensible resolution of this issue is to instead move the check somewhere else. We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via the arch_calc_vm_flag_bits() call. This is an appropriate place to do this as we already check for the MAP_ANONYMOUS case here, and the shmem file case is simply a variant of the same idea - we permit RAM-backed memory. This requires a modification to the arch_calc_vm_flag_bits() signature to pass in a pointer to the struct file associated with the mapping, however this is not too egregious as this is only used by two architectures anyway - arm64 and parisc. So this patch performs this adjustment and removes the unnecessary assignment of VM_MTE_ALLOWED in shmem_mmap(). [akpm(a)linux-foundation.org: fix whitespace, per Catalin] Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Suggested-by: Catalin Marinas <catalin.marinas(a)arm.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h index 9e39217b4afb..798d965760d4 100644 --- a/arch/arm64/include/asm/mman.h +++ b/arch/arm64/include/asm/mman.h @@ -6,6 +6,8 @@ #ifndef BUILD_VDSO #include <linux/compiler.h> +#include <linux/fs.h> +#include <linux/shmem_fs.h> #include <linux/types.h> static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, @@ -31,19 +33,21 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, } #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey) -static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, + unsigned long flags) { /* * Only allow MTE on anonymous mappings as these are guaranteed to be * backed by tags-capable memory. The vm_flags may be overridden by a * filesystem supporting MTE (RAM-based). */ - if (system_supports_mte() && (flags & MAP_ANONYMOUS)) + if (system_supports_mte() && + ((flags & MAP_ANONYMOUS) || shmem_file(file))) return VM_MTE_ALLOWED; return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags) static inline bool arch_validate_prot(unsigned long prot, unsigned long addr __always_unused) diff --git a/arch/parisc/include/asm/mman.h b/arch/parisc/include/asm/mman.h index 89b6beeda0b8..663f587dc789 100644 --- a/arch/parisc/include/asm/mman.h +++ b/arch/parisc/include/asm/mman.h @@ -2,6 +2,7 @@ #ifndef __ASM_MMAN_H__ #define __ASM_MMAN_H__ +#include <linux/fs.h> #include <uapi/asm/mman.h> /* PARISC cannot allow mdwe as it needs writable stacks */ @@ -11,7 +12,7 @@ static inline bool arch_memory_deny_write_exec_supported(void) } #define arch_memory_deny_write_exec_supported arch_memory_deny_write_exec_supported -static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, unsigned long flags) { /* * The stack on parisc grows upwards, so if userspace requests memory @@ -23,6 +24,6 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags) #endif /* __ASM_MMAN_H__ */ diff --git a/include/linux/mman.h b/include/linux/mman.h index 8ddca62d6460..a842783ffa62 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -2,6 +2,7 @@ #ifndef _LINUX_MMAN_H #define _LINUX_MMAN_H +#include <linux/fs.h> #include <linux/mm.h> #include <linux/percpu_counter.h> @@ -94,7 +95,7 @@ static inline void vm_unacct_memory(long pages) #endif #ifndef arch_calc_vm_flag_bits -#define arch_calc_vm_flag_bits(flags) 0 +#define arch_calc_vm_flag_bits(file, flags) 0 #endif #ifndef arch_validate_prot @@ -151,13 +152,13 @@ calc_vm_prot_bits(unsigned long prot, unsigned long pkey) * Combine the mmap "flags" argument into "vm_flags" used internally. */ static inline unsigned long -calc_vm_flag_bits(unsigned long flags) +calc_vm_flag_bits(struct file *file, unsigned long flags) { return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | - arch_calc_vm_flag_bits(flags); + arch_calc_vm_flag_bits(file, flags); } unsigned long vm_commit_limit(void); diff --git a/mm/mmap.c b/mm/mmap.c index ab71d4c3464c..aee5fa08ae5d 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -344,7 +344,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(file, flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; /* Obtain the address to map to. we verify (or select) it and ensure diff --git a/mm/nommu.c b/mm/nommu.c index 635d028d647b..e9b5f527ab5b 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -842,7 +842,7 @@ static unsigned long determine_vm_flags(struct file *file, { unsigned long vm_flags; - vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags); + vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(file, flags); if (!file) { /* diff --git a/mm/shmem.c b/mm/shmem.c index 4ba1d00fabda..e87f5d6799a7 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2733,9 +2733,6 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma) if (ret) return ret; - /* arm64 - allow memory tagging on RAM-based files */ - vm_flags_set(vma, VM_MTE_ALLOWED); - file_accessed(file); /* This is anonymous shared memory if it is unlinked at the time of mmap */ if (inode->i_nlink)

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm: refactor map_deny_write_exec()" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111112-headless-facelift-4a02@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Date: Tue, 29 Oct 2024 18:11:46 +0000 Subject: [PATCH] mm: refactor map_deny_write_exec() Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit. While we're here, we refactor the function to be more readable and add some additional documentation. Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mman.h b/include/linux/mman.h index bcb201ab7a41..8ddca62d6460 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -188,16 +188,31 @@ static inline bool arch_memory_deny_write_exec_supported(void) * * d) mmap(PROT_READ | PROT_EXEC) * mmap(PROT_READ | PROT_EXEC | PROT_BTI) + * + * This is only applicable if the user has set the Memory-Deny-Write-Execute + * (MDWE) protection mask for the current process. + * + * @old specifies the VMA flags the VMA originally possessed, and @new the ones + * we propose to set. + * + * Return: false if proposed change is OK, true if not ok and should be denied. */ -static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) +static inline bool map_deny_write_exec(unsigned long old, unsigned long new) { + /* If MDWE is disabled, we have nothing to deny. */ if (!test_bit(MMF_HAS_MDWE, &current->mm->flags)) return false; - if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) + /* If the new VMA is not executable, we have nothing to deny. */ + if (!(new & VM_EXEC)) + return false; + + /* Under MDWE we do not accept newly writably executable VMAs... */ + if (new & VM_WRITE) return true; - if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) + /* ...nor previously non-executable VMAs becoming executable. */ + if (!(old & VM_EXEC)) return true; return false; diff --git a/mm/mmap.c b/mm/mmap.c index ac0604f146f6..ab71d4c3464c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_set_anonymous(vma); } - if (map_deny_write_exec(vma, vma->vm_flags)) { + if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { error = -EACCES; goto close_and_free_vma; } diff --git a/mm/mprotect.c b/mm/mprotect.c index 0c5d6d06107d..6f450af3252e 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -810,7 +810,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, break; } - if (map_deny_write_exec(vma, newflags)) { + if (map_deny_write_exec(vma->vm_flags, newflags)) { error = -EACCES; break; } diff --git a/mm/vma.h b/mm/vma.h index 75558b5e9c8c..d58068c0ff2e 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -42,7 +42,7 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - /* 1 byte hole */ + /* 2 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ unsigned long nr_accounted; /* Number of VM_ACCOUNT pages */

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm: refactor map_deny_write_exec()" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111111-pellet-mummify-1558@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Date: Tue, 29 Oct 2024 18:11:46 +0000 Subject: [PATCH] mm: refactor map_deny_write_exec() Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit. While we're here, we refactor the function to be more readable and add some additional documentation. Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mman.h b/include/linux/mman.h index bcb201ab7a41..8ddca62d6460 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -188,16 +188,31 @@ static inline bool arch_memory_deny_write_exec_supported(void) * * d) mmap(PROT_READ | PROT_EXEC) * mmap(PROT_READ | PROT_EXEC | PROT_BTI) + * + * This is only applicable if the user has set the Memory-Deny-Write-Execute + * (MDWE) protection mask for the current process. + * + * @old specifies the VMA flags the VMA originally possessed, and @new the ones + * we propose to set. + * + * Return: false if proposed change is OK, true if not ok and should be denied. */ -static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) +static inline bool map_deny_write_exec(unsigned long old, unsigned long new) { + /* If MDWE is disabled, we have nothing to deny. */ if (!test_bit(MMF_HAS_MDWE, &current->mm->flags)) return false; - if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) + /* If the new VMA is not executable, we have nothing to deny. */ + if (!(new & VM_EXEC)) + return false; + + /* Under MDWE we do not accept newly writably executable VMAs... */ + if (new & VM_WRITE) return true; - if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) + /* ...nor previously non-executable VMAs becoming executable. */ + if (!(old & VM_EXEC)) return true; return false; diff --git a/mm/mmap.c b/mm/mmap.c index ac0604f146f6..ab71d4c3464c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_set_anonymous(vma); } - if (map_deny_write_exec(vma, vma->vm_flags)) { + if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { error = -EACCES; goto close_and_free_vma; } diff --git a/mm/mprotect.c b/mm/mprotect.c index 0c5d6d06107d..6f450af3252e 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -810,7 +810,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, break; } - if (map_deny_write_exec(vma, newflags)) { + if (map_deny_write_exec(vma->vm_flags, newflags)) { error = -EACCES; break; } diff --git a/mm/vma.h b/mm/vma.h index 75558b5e9c8c..d58068c0ff2e 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -42,7 +42,7 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - /* 1 byte hole */ + /* 2 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ unsigned long nr_accounted; /* Number of VM_ACCOUNT pages */

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm: refactor map_deny_write_exec()" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111109-deck-cranial-851e@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Date: Tue, 29 Oct 2024 18:11:46 +0000 Subject: [PATCH] mm: refactor map_deny_write_exec() Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit. While we're here, we refactor the function to be more readable and add some additional documentation. Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mman.h b/include/linux/mman.h index bcb201ab7a41..8ddca62d6460 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -188,16 +188,31 @@ static inline bool arch_memory_deny_write_exec_supported(void) * * d) mmap(PROT_READ | PROT_EXEC) * mmap(PROT_READ | PROT_EXEC | PROT_BTI) + * + * This is only applicable if the user has set the Memory-Deny-Write-Execute + * (MDWE) protection mask for the current process. + * + * @old specifies the VMA flags the VMA originally possessed, and @new the ones + * we propose to set. + * + * Return: false if proposed change is OK, true if not ok and should be denied. */ -static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) +static inline bool map_deny_write_exec(unsigned long old, unsigned long new) { + /* If MDWE is disabled, we have nothing to deny. */ if (!test_bit(MMF_HAS_MDWE, &current->mm->flags)) return false; - if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) + /* If the new VMA is not executable, we have nothing to deny. */ + if (!(new & VM_EXEC)) + return false; + + /* Under MDWE we do not accept newly writably executable VMAs... */ + if (new & VM_WRITE) return true; - if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) + /* ...nor previously non-executable VMAs becoming executable. */ + if (!(old & VM_EXEC)) return true; return false; diff --git a/mm/mmap.c b/mm/mmap.c index ac0604f146f6..ab71d4c3464c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_set_anonymous(vma); } - if (map_deny_write_exec(vma, vma->vm_flags)) { + if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { error = -EACCES; goto close_and_free_vma; } diff --git a/mm/mprotect.c b/mm/mprotect.c index 0c5d6d06107d..6f450af3252e 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -810,7 +810,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, break; } - if (map_deny_write_exec(vma, newflags)) { + if (map_deny_write_exec(vma->vm_flags, newflags)) { error = -EACCES; break; } diff --git a/mm/vma.h b/mm/vma.h index 75558b5e9c8c..d58068c0ff2e 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -42,7 +42,7 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - /* 1 byte hole */ + /* 2 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ unsigned long nr_accounted; /* Number of VM_ACCOUNT pages */

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm: unconditionally close VMAs on error" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 4080ef1579b2413435413988d14ac8c68e4d42c8 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111154-upwind-corroding-eca2@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 4080ef1579b2413435413988d14ac8c68e4d42c8 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Date: Tue, 29 Oct 2024 18:11:45 +0000 Subject: [PATCH] mm: unconditionally close VMAs on error Incorrect invocation of VMA callbacks when the VMA is no longer in a consistent state is bug prone and risky to perform. With regards to the important vm_ops->close() callback We have gone to great lengths to try to track whether or not we ought to close VMAs. Rather than doing so and risking making a mistake somewhere, instead unconditionally close and reset vma->vm_ops to an empty dummy operations set with a NULL .close operator. We introduce a new function to do so - vma_close() - and simplify existing vms logic which tracked whether we needed to close or not. This simplifies the logic, avoids incorrect double-calling of the .close() callback and allows us to update error paths to simply call vma_close() unconditionally - making VMA closure idempotent. Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/internal.h b/mm/internal.h index 4eab2961e69c..64c2eb0b160e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -135,6 +135,24 @@ static inline int mmap_file(struct file *file, struct vm_area_struct *vma) return err; } +/* + * If the VMA has a close hook then close it, and since closing it might leave + * it in an inconsistent state which makes the use of any hooks suspect, clear + * them down by installing dummy empty hooks. + */ +static inline void vma_close(struct vm_area_struct *vma) +{ + if (vma->vm_ops && vma->vm_ops->close) { + vma->vm_ops->close(vma); + + /* + * The mapping is in an inconsistent state, and no further hooks + * may be invoked upon it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + } +} + #ifdef CONFIG_MMU /* Flags for folio_pte_batch(). */ diff --git a/mm/mmap.c b/mm/mmap.c index 6e3b25f7728f..ac0604f146f6 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1573,8 +1573,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return addr; close_and_free_vma: - if (file && !vms.closed_vm_ops && vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (file || vma->vm_file) { unmap_and_free_vma: @@ -1934,7 +1933,7 @@ void exit_mmap(struct mm_struct *mm) do { if (vma->vm_flags & VM_ACCOUNT) nr_accounted += vma_pages(vma); - remove_vma(vma, /* unreachable = */ true, /* closed = */ false); + remove_vma(vma, /* unreachable = */ true); count++; cond_resched(); vma = vma_next(&vmi); diff --git a/mm/nommu.c b/mm/nommu.c index f9ccc02458ec..635d028d647b 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -589,8 +589,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma) */ static void delete_vma(struct mm_struct *mm, struct vm_area_struct *vma) { - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); put_nommu_region(vma->vm_region); diff --git a/mm/vma.c b/mm/vma.c index b21ffec33f8e..7621384d64cf 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -323,11 +323,10 @@ static bool can_vma_merge_right(struct vma_merge_struct *vmg, /* * Close a vm structure and free it. */ -void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed) +void remove_vma(struct vm_area_struct *vma, bool unreachable) { might_sleep(); - if (!closed && vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); mpol_put(vma_policy(vma)); @@ -1115,9 +1114,7 @@ void vms_clean_up_area(struct vma_munmap_struct *vms, vms_clear_ptes(vms, mas_detach, true); mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); - vms->closed_vm_ops = true; + vma_close(vma); } /* @@ -1160,7 +1157,7 @@ void vms_complete_munmap_vmas(struct vma_munmap_struct *vms, /* Remove and clean up vmas */ mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - remove_vma(vma, /* = */ false, vms->closed_vm_ops); + remove_vma(vma, /* unreachable = */ false); vm_unacct_memory(vms->nr_accounted); validate_mm(mm); @@ -1684,8 +1681,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, return new_vma; out_vma_link: - if (new_vma->vm_ops && new_vma->vm_ops->close) - new_vma->vm_ops->close(new_vma); + vma_close(new_vma); if (new_vma->vm_file) fput(new_vma->vm_file); diff --git a/mm/vma.h b/mm/vma.h index 55457cb68200..75558b5e9c8c 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -42,7 +42,6 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - bool closed_vm_ops; /* call_mmap() was encountered, so vmas may be closed */ /* 1 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ @@ -198,7 +197,6 @@ static inline void init_vma_munmap(struct vma_munmap_struct *vms, vms->unmap_start = FIRST_USER_ADDRESS; vms->unmap_end = USER_PGTABLES_CEILING; vms->clear_ptes = false; - vms->closed_vm_ops = false; } #endif @@ -269,7 +267,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool unlock); -void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed); +void remove_vma(struct vm_area_struct *vma, bool unreachable); void unmap_region(struct ma_state *mas, struct vm_area_struct *vma, struct vm_area_struct *prev, struct vm_area_struct *next);

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm: avoid unsafe VMA hook invocation when error arises on" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 3dd6ed34ce1f2356a77fb88edafb5ec96784e3cf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111140-democrat-landmass-2df5@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 3dd6ed34ce1f2356a77fb88edafb5ec96784e3cf Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Date: Tue, 29 Oct 2024 18:11:44 +0000 Subject: [PATCH] mm: avoid unsafe VMA hook invocation when error arises on mmap hook Patch series "fix error handling in mmap_region() and refactor (hotfixes)", v4. mmap_region() is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. This series goes to great lengths to simplify how mmap_region() works and to avoid unwinding errors late on in the process of setting up the VMA for the new mapping, and equally avoids such operations occurring while the VMA is in an inconsistent state. The patches in this series comprise the minimal changes required to resolve existing issues in mmap_region() error handling, in order that they can be hotfixed and backported. There is additionally a follow up series which goes further, separated out from the v1 series and sent and updated separately. This patch (of 5): After an attempted mmap() fails, we are no longer in a situation where we can safely interact with VMA hooks. This is currently not enforced, meaning that we need complicated handling to ensure we do not incorrectly call these hooks. We can avoid the whole issue by treating the VMA as suspect the moment that the file->f_ops->mmap() function reports an error by replacing whatever VMA operations were installed with a dummy empty set of VMA operations. We do so through a new helper function internal to mm - mmap_file() - which is both more logically named than the existing call_mmap() function and correctly isolates handling of the vm_op reassignment to mm. All the existing invocations of call_mmap() outside of mm are ultimately nested within the call_mmap() from mm, which we now replace. It is therefore safe to leave call_mmap() in place as a convenience function (and to avoid churn). The invokers are: ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap() coda_file_operations -> mmap -> coda_file_mmap() shm_file_operations -> shm_mmap() shm_file_operations_huge -> shm_mmap() dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops -> i915_gem_dmabuf_mmap() None of these callers interact with vm_ops or mappings in a problematic way on error, quickly exiting out. Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/internal.h b/mm/internal.h index 16c1f3cd599e..4eab2961e69c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -108,6 +108,33 @@ static inline void *folio_raw_mapping(const struct folio *folio) return (void *)(mapping & ~PAGE_MAPPING_FLAGS); } +/* + * This is a file-backed mapping, and is about to be memory mapped - invoke its + * mmap hook and safely handle error conditions. On error, VMA hooks will be + * mutated. + * + * @file: File which backs the mapping. + * @vma: VMA which we are mapping. + * + * Returns: 0 if success, error otherwise. + */ +static inline int mmap_file(struct file *file, struct vm_area_struct *vma) +{ + int err = call_mmap(file, vma); + + if (likely(!err)) + return 0; + + /* + * OK, we tried to call the file hook for mmap(), but an error + * arose. The mapping is in an inconsistent state and we most not invoke + * any further hooks on it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + + return err; +} + #ifdef CONFIG_MMU /* Flags for folio_pte_batch(). */ diff --git a/mm/mmap.c b/mm/mmap.c index 9841b41e3c76..6e3b25f7728f 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1422,7 +1422,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* * clear PTEs while the vma is still in the tree so that rmap * cannot race with the freeing later in the truncate scenario. - * This is also needed for call_mmap(), which is why vm_ops + * This is also needed for mmap_file(), which is why vm_ops * close function is called. */ vms_clean_up_area(&vms, &mas_detach); @@ -1447,7 +1447,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, if (file) { vma->vm_file = get_file(file); - error = call_mmap(file, vma); + error = mmap_file(file, vma); if (error) goto unmap_and_free_vma; @@ -1470,7 +1470,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_iter_config(&vmi, addr, end); /* - * If vm_flags changed after call_mmap(), we should try merge + * If vm_flags changed after mmap_file(), we should try merge * vma again as we may succeed this time. */ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) { diff --git a/mm/nommu.c b/mm/nommu.c index 385b0c15add8..f9ccc02458ec 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -885,7 +885,7 @@ static int do_mmap_shared_file(struct vm_area_struct *vma) { int ret; - ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); if (ret == 0) { vma->vm_region->vm_top = vma->vm_region->vm_end; return 0; @@ -918,7 +918,7 @@ static int do_mmap_private(struct vm_area_struct *vma, * happy. */ if (capabilities & NOMMU_MAP_DIRECT) { - ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); /* shouldn't return success if we're not sharing */ if (WARN_ON_ONCE(!is_nommu_shared_mapping(vma->vm_flags))) ret = -ENOSYS;

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm/thp: fix deferred split unqueue naming and locking" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x f8f931bba0f92052cf842b7e30917b1afcc77d5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111131-haziness-slum-8f5e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f8f931bba0f92052cf842b7e30917b1afcc77d5a Mon Sep 17 00:00:00 2001 From: Hugh Dickins <hughd(a)google.com> Date: Sun, 27 Oct 2024 13:02:13 -0700 Subject: [PATCH] mm/thp: fix deferred split unqueue naming and locking Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <hughd(a)google.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Chris Li <chrisl(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Usama Arif <usamaarif642(a)gmail.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a1d345f1680c..03fd4bc39ea1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3588,10 +3588,27 @@ int split_folio_to_list(struct folio *folio, struct list_head *list) return split_huge_page_to_list_to_order(&folio->page, list, ret); } -void __folio_undo_large_rmappable(struct folio *folio) +/* + * __folio_unqueue_deferred_split() is not to be called directly: + * the folio_unqueue_deferred_split() inline wrapper in mm/internal.h + * limits its calls to those folios which may have a _deferred_list for + * queueing THP splits, and that list is (racily observed to be) non-empty. + * + * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is + * zero: because even when split_queue_lock is held, a non-empty _deferred_list + * might be in use on deferred_split_scan()'s unlocked on-stack list. + * + * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is + * therefore important to unqueue deferred split before changing folio memcg. + */ +bool __folio_unqueue_deferred_split(struct folio *folio) { struct deferred_split *ds_queue; unsigned long flags; + bool unqueued = false; + + WARN_ON_ONCE(folio_ref_count(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); ds_queue = get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); @@ -3603,8 +3620,11 @@ void __folio_undo_large_rmappable(struct folio *folio) MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } list_del_init(&folio->_deferred_list); + unqueued = true; } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + + return unqueued; /* useful for debug warnings */ } /* partially_mapped=false won't clear PG_partially_mapped folio flag */ @@ -3627,14 +3647,11 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped) return; /* - * The try_to_unmap() in page reclaim path might reach here too, - * this may cause a race condition to corrupt deferred split queue. - * And, if page reclaim is already handling the same folio, it is - * unnecessary to handle it again in shrinker. - * - * Check the swapcache flag to determine if the folio is being - * handled by page reclaim since THP swap would add the folio into - * swap cache before calling try_to_unmap(). + * Exclude swapcache: originally to avoid a corrupt deferred split + * queue. Nowadays that is fully prevented by mem_cgroup_swapout(); + * but if page reclaim is already handling the same folio, it is + * unnecessary to handle it again in the shrinker, so excluding + * swapcache here may still be a useful optimization. */ if (folio_test_swapcache(folio)) return; diff --git a/mm/internal.h b/mm/internal.h index 93083bbeeefa..16c1f3cd599e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -639,11 +639,11 @@ static inline void folio_set_order(struct folio *folio, unsigned int order) #endif } -void __folio_undo_large_rmappable(struct folio *folio); -static inline void folio_undo_large_rmappable(struct folio *folio) +bool __folio_unqueue_deferred_split(struct folio *folio); +static inline bool folio_unqueue_deferred_split(struct folio *folio) { if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio)) - return; + return false; /* * At this point, there is no one trying to add the folio to @@ -651,9 +651,9 @@ static inline void folio_undo_large_rmappable(struct folio *folio) * to check without acquiring the split_queue_lock. */ if (data_race(list_empty(&folio->_deferred_list))) - return; + return false; - __folio_undo_large_rmappable(folio); + return __folio_unqueue_deferred_split(folio); } static inline struct folio *page_rmappable_folio(struct page *page) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 81d8819f13cd..f8744f5630bb 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struct folio *folio, css_get(&to->css); css_put(&from->css); + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = (unsigned long)to; __folio_memcg_unlock(from); @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct folio *folio; + bool tried_split_before = false; +retry_pmd: ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (mc.precharge < HPAGE_PMD_NR) { @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); if (target_type == MC_TARGET_PAGE) { folio = target.folio; + /* + * Deferred split queue locking depends on memcg, + * and unqueue is unsafe unless folio refcount is 0: + * split or skip if on the queue? first try to split. + */ + if (!list_empty(&folio->_deferred_list)) { + spin_unlock(ptl); + if (!tried_split_before) + split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (tried_split_before) + return 0; + tried_split_before = true; + goto retry_pmd; + } + /* + * So long as that pmd lock is held, the folio cannot + * be racily added to the _deferred_list, because + * __folio_remove_rmap() will find !partially_mapped. + */ if (folio_isolate_lru(folio)) { if (!mem_cgroup_move_account(folio, true, mc.from, mc.to)) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2703227cce88..06df2af97415 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4629,9 +4629,6 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON_FOLIO(folio_order(folio) > 1 && - !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at @@ -4678,6 +4675,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) ug->nr_memory += nr_pages; ug->pgpgout++; + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = 0; } @@ -4789,6 +4787,9 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) /* Transfer the charge and the css ref */ commit_charge(new, memcg); + + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(old)); old->memcg_data = 0; } @@ -4975,6 +4976,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); + folio_unqueue_deferred_split(folio); folio->memcg_data = 0; if (!mem_cgroup_is_root(memcg)) diff --git a/mm/migrate.c b/mm/migrate.c index fab84a776088..dfa24e41e8f9 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -490,7 +490,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } @@ -515,7 +515,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, } /* Take off deferred split queue while frozen and memcg set */ - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5e108ae755cc..8ad38cd5e574 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2681,7 +2681,6 @@ void free_unref_folios(struct folio_batch *folios) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - folio_undo_large_rmappable(folio); if (!free_pages_prepare(&folio->page, order)) continue; /* diff --git a/mm/swap.c b/mm/swap.c index 835bdf324b76..b8e3259ea2c4 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -121,7 +121,7 @@ void __folio_put(struct folio *folio) } page_cache_release(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); free_unref_page(&folio->page, folio_order(folio)); } @@ -988,7 +988,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) free_huge_folio(folio); continue; } - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); if (j != i) diff --git a/mm/vmscan.c b/mm/vmscan.c index ddaaff67642e..28ba2b06fc7d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1476,7 +1476,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, */ nr_reclaimed += nr_pages; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1864,7 +1864,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (unlikely(folio_put_testzero(folio))) { __folio_clear_lru_flags(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios);

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm/thp: fix deferred split unqueue naming and locking" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x f8f931bba0f92052cf842b7e30917b1afcc77d5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111118-stove-huddling-7076@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f8f931bba0f92052cf842b7e30917b1afcc77d5a Mon Sep 17 00:00:00 2001 From: Hugh Dickins <hughd(a)google.com> Date: Sun, 27 Oct 2024 13:02:13 -0700 Subject: [PATCH] mm/thp: fix deferred split unqueue naming and locking Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <hughd(a)google.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Chris Li <chrisl(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Usama Arif <usamaarif642(a)gmail.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a1d345f1680c..03fd4bc39ea1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3588,10 +3588,27 @@ int split_folio_to_list(struct folio *folio, struct list_head *list) return split_huge_page_to_list_to_order(&folio->page, list, ret); } -void __folio_undo_large_rmappable(struct folio *folio) +/* + * __folio_unqueue_deferred_split() is not to be called directly: + * the folio_unqueue_deferred_split() inline wrapper in mm/internal.h + * limits its calls to those folios which may have a _deferred_list for + * queueing THP splits, and that list is (racily observed to be) non-empty. + * + * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is + * zero: because even when split_queue_lock is held, a non-empty _deferred_list + * might be in use on deferred_split_scan()'s unlocked on-stack list. + * + * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is + * therefore important to unqueue deferred split before changing folio memcg. + */ +bool __folio_unqueue_deferred_split(struct folio *folio) { struct deferred_split *ds_queue; unsigned long flags; + bool unqueued = false; + + WARN_ON_ONCE(folio_ref_count(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); ds_queue = get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); @@ -3603,8 +3620,11 @@ void __folio_undo_large_rmappable(struct folio *folio) MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } list_del_init(&folio->_deferred_list); + unqueued = true; } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + + return unqueued; /* useful for debug warnings */ } /* partially_mapped=false won't clear PG_partially_mapped folio flag */ @@ -3627,14 +3647,11 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped) return; /* - * The try_to_unmap() in page reclaim path might reach here too, - * this may cause a race condition to corrupt deferred split queue. - * And, if page reclaim is already handling the same folio, it is - * unnecessary to handle it again in shrinker. - * - * Check the swapcache flag to determine if the folio is being - * handled by page reclaim since THP swap would add the folio into - * swap cache before calling try_to_unmap(). + * Exclude swapcache: originally to avoid a corrupt deferred split + * queue. Nowadays that is fully prevented by mem_cgroup_swapout(); + * but if page reclaim is already handling the same folio, it is + * unnecessary to handle it again in the shrinker, so excluding + * swapcache here may still be a useful optimization. */ if (folio_test_swapcache(folio)) return; diff --git a/mm/internal.h b/mm/internal.h index 93083bbeeefa..16c1f3cd599e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -639,11 +639,11 @@ static inline void folio_set_order(struct folio *folio, unsigned int order) #endif } -void __folio_undo_large_rmappable(struct folio *folio); -static inline void folio_undo_large_rmappable(struct folio *folio) +bool __folio_unqueue_deferred_split(struct folio *folio); +static inline bool folio_unqueue_deferred_split(struct folio *folio) { if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio)) - return; + return false; /* * At this point, there is no one trying to add the folio to @@ -651,9 +651,9 @@ static inline void folio_undo_large_rmappable(struct folio *folio) * to check without acquiring the split_queue_lock. */ if (data_race(list_empty(&folio->_deferred_list))) - return; + return false; - __folio_undo_large_rmappable(folio); + return __folio_unqueue_deferred_split(folio); } static inline struct folio *page_rmappable_folio(struct page *page) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 81d8819f13cd..f8744f5630bb 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struct folio *folio, css_get(&to->css); css_put(&from->css); + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = (unsigned long)to; __folio_memcg_unlock(from); @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct folio *folio; + bool tried_split_before = false; +retry_pmd: ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (mc.precharge < HPAGE_PMD_NR) { @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); if (target_type == MC_TARGET_PAGE) { folio = target.folio; + /* + * Deferred split queue locking depends on memcg, + * and unqueue is unsafe unless folio refcount is 0: + * split or skip if on the queue? first try to split. + */ + if (!list_empty(&folio->_deferred_list)) { + spin_unlock(ptl); + if (!tried_split_before) + split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (tried_split_before) + return 0; + tried_split_before = true; + goto retry_pmd; + } + /* + * So long as that pmd lock is held, the folio cannot + * be racily added to the _deferred_list, because + * __folio_remove_rmap() will find !partially_mapped. + */ if (folio_isolate_lru(folio)) { if (!mem_cgroup_move_account(folio, true, mc.from, mc.to)) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2703227cce88..06df2af97415 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4629,9 +4629,6 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON_FOLIO(folio_order(folio) > 1 && - !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at @@ -4678,6 +4675,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) ug->nr_memory += nr_pages; ug->pgpgout++; + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = 0; } @@ -4789,6 +4787,9 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) /* Transfer the charge and the css ref */ commit_charge(new, memcg); + + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(old)); old->memcg_data = 0; } @@ -4975,6 +4976,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); + folio_unqueue_deferred_split(folio); folio->memcg_data = 0; if (!mem_cgroup_is_root(memcg)) diff --git a/mm/migrate.c b/mm/migrate.c index fab84a776088..dfa24e41e8f9 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -490,7 +490,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } @@ -515,7 +515,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, } /* Take off deferred split queue while frozen and memcg set */ - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5e108ae755cc..8ad38cd5e574 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2681,7 +2681,6 @@ void free_unref_folios(struct folio_batch *folios) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - folio_undo_large_rmappable(folio); if (!free_pages_prepare(&folio->page, order)) continue; /* diff --git a/mm/swap.c b/mm/swap.c index 835bdf324b76..b8e3259ea2c4 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -121,7 +121,7 @@ void __folio_put(struct folio *folio) } page_cache_release(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); free_unref_page(&folio->page, folio_order(folio)); } @@ -988,7 +988,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) free_huge_folio(folio); continue; } - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); if (j != i) diff --git a/mm/vmscan.c b/mm/vmscan.c index ddaaff67642e..28ba2b06fc7d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1476,7 +1476,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, */ nr_reclaimed += nr_pages; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1864,7 +1864,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (unlikely(folio_put_testzero(folio))) { __folio_clear_lru_flags(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios);

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm/thp: fix deferred split unqueue naming and locking" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x f8f931bba0f92052cf842b7e30917b1afcc77d5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111110-silencer-chitchat-1dc6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f8f931bba0f92052cf842b7e30917b1afcc77d5a Mon Sep 17 00:00:00 2001 From: Hugh Dickins <hughd(a)google.com> Date: Sun, 27 Oct 2024 13:02:13 -0700 Subject: [PATCH] mm/thp: fix deferred split unqueue naming and locking Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <hughd(a)google.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Chris Li <chrisl(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Usama Arif <usamaarif642(a)gmail.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a1d345f1680c..03fd4bc39ea1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3588,10 +3588,27 @@ int split_folio_to_list(struct folio *folio, struct list_head *list) return split_huge_page_to_list_to_order(&folio->page, list, ret); } -void __folio_undo_large_rmappable(struct folio *folio) +/* + * __folio_unqueue_deferred_split() is not to be called directly: + * the folio_unqueue_deferred_split() inline wrapper in mm/internal.h + * limits its calls to those folios which may have a _deferred_list for + * queueing THP splits, and that list is (racily observed to be) non-empty. + * + * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is + * zero: because even when split_queue_lock is held, a non-empty _deferred_list + * might be in use on deferred_split_scan()'s unlocked on-stack list. + * + * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is + * therefore important to unqueue deferred split before changing folio memcg. + */ +bool __folio_unqueue_deferred_split(struct folio *folio) { struct deferred_split *ds_queue; unsigned long flags; + bool unqueued = false; + + WARN_ON_ONCE(folio_ref_count(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); ds_queue = get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); @@ -3603,8 +3620,11 @@ void __folio_undo_large_rmappable(struct folio *folio) MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } list_del_init(&folio->_deferred_list); + unqueued = true; } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + + return unqueued; /* useful for debug warnings */ } /* partially_mapped=false won't clear PG_partially_mapped folio flag */ @@ -3627,14 +3647,11 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped) return; /* - * The try_to_unmap() in page reclaim path might reach here too, - * this may cause a race condition to corrupt deferred split queue. - * And, if page reclaim is already handling the same folio, it is - * unnecessary to handle it again in shrinker. - * - * Check the swapcache flag to determine if the folio is being - * handled by page reclaim since THP swap would add the folio into - * swap cache before calling try_to_unmap(). + * Exclude swapcache: originally to avoid a corrupt deferred split + * queue. Nowadays that is fully prevented by mem_cgroup_swapout(); + * but if page reclaim is already handling the same folio, it is + * unnecessary to handle it again in the shrinker, so excluding + * swapcache here may still be a useful optimization. */ if (folio_test_swapcache(folio)) return; diff --git a/mm/internal.h b/mm/internal.h index 93083bbeeefa..16c1f3cd599e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -639,11 +639,11 @@ static inline void folio_set_order(struct folio *folio, unsigned int order) #endif } -void __folio_undo_large_rmappable(struct folio *folio); -static inline void folio_undo_large_rmappable(struct folio *folio) +bool __folio_unqueue_deferred_split(struct folio *folio); +static inline bool folio_unqueue_deferred_split(struct folio *folio) { if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio)) - return; + return false; /* * At this point, there is no one trying to add the folio to @@ -651,9 +651,9 @@ static inline void folio_undo_large_rmappable(struct folio *folio) * to check without acquiring the split_queue_lock. */ if (data_race(list_empty(&folio->_deferred_list))) - return; + return false; - __folio_undo_large_rmappable(folio); + return __folio_unqueue_deferred_split(folio); } static inline struct folio *page_rmappable_folio(struct page *page) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 81d8819f13cd..f8744f5630bb 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struct folio *folio, css_get(&to->css); css_put(&from->css); + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = (unsigned long)to; __folio_memcg_unlock(from); @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct folio *folio; + bool tried_split_before = false; +retry_pmd: ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (mc.precharge < HPAGE_PMD_NR) { @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); if (target_type == MC_TARGET_PAGE) { folio = target.folio; + /* + * Deferred split queue locking depends on memcg, + * and unqueue is unsafe unless folio refcount is 0: + * split or skip if on the queue? first try to split. + */ + if (!list_empty(&folio->_deferred_list)) { + spin_unlock(ptl); + if (!tried_split_before) + split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (tried_split_before) + return 0; + tried_split_before = true; + goto retry_pmd; + } + /* + * So long as that pmd lock is held, the folio cannot + * be racily added to the _deferred_list, because + * __folio_remove_rmap() will find !partially_mapped. + */ if (folio_isolate_lru(folio)) { if (!mem_cgroup_move_account(folio, true, mc.from, mc.to)) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2703227cce88..06df2af97415 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4629,9 +4629,6 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON_FOLIO(folio_order(folio) > 1 && - !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at @@ -4678,6 +4675,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) ug->nr_memory += nr_pages; ug->pgpgout++; + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = 0; } @@ -4789,6 +4787,9 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) /* Transfer the charge and the css ref */ commit_charge(new, memcg); + + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(old)); old->memcg_data = 0; } @@ -4975,6 +4976,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); + folio_unqueue_deferred_split(folio); folio->memcg_data = 0; if (!mem_cgroup_is_root(memcg)) diff --git a/mm/migrate.c b/mm/migrate.c index fab84a776088..dfa24e41e8f9 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -490,7 +490,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } @@ -515,7 +515,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, } /* Take off deferred split queue while frozen and memcg set */ - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5e108ae755cc..8ad38cd5e574 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2681,7 +2681,6 @@ void free_unref_folios(struct folio_batch *folios) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - folio_undo_large_rmappable(folio); if (!free_pages_prepare(&folio->page, order)) continue; /* diff --git a/mm/swap.c b/mm/swap.c index 835bdf324b76..b8e3259ea2c4 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -121,7 +121,7 @@ void __folio_put(struct folio *folio) } page_cache_release(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); free_unref_page(&folio->page, folio_order(folio)); } @@ -988,7 +988,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) free_huge_folio(folio); continue; } - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); if (j != i) diff --git a/mm/vmscan.c b/mm/vmscan.c index ddaaff67642e..28ba2b06fc7d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1476,7 +1476,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, */ nr_reclaimed += nr_pages; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1864,7 +1864,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (unlikely(folio_put_testzero(folio))) { __folio_clear_lru_flags(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios);

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] mm/thp: fix deferred split unqueue naming and locking" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x f8f931bba0f92052cf842b7e30917b1afcc77d5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111108-kangaroo-press-8c50@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f8f931bba0f92052cf842b7e30917b1afcc77d5a Mon Sep 17 00:00:00 2001 From: Hugh Dickins <hughd(a)google.com> Date: Sun, 27 Oct 2024 13:02:13 -0700 Subject: [PATCH] mm/thp: fix deferred split unqueue naming and locking Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <hughd(a)google.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Chris Li <chrisl(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Usama Arif <usamaarif642(a)gmail.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a1d345f1680c..03fd4bc39ea1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3588,10 +3588,27 @@ int split_folio_to_list(struct folio *folio, struct list_head *list) return split_huge_page_to_list_to_order(&folio->page, list, ret); } -void __folio_undo_large_rmappable(struct folio *folio) +/* + * __folio_unqueue_deferred_split() is not to be called directly: + * the folio_unqueue_deferred_split() inline wrapper in mm/internal.h + * limits its calls to those folios which may have a _deferred_list for + * queueing THP splits, and that list is (racily observed to be) non-empty. + * + * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is + * zero: because even when split_queue_lock is held, a non-empty _deferred_list + * might be in use on deferred_split_scan()'s unlocked on-stack list. + * + * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is + * therefore important to unqueue deferred split before changing folio memcg. + */ +bool __folio_unqueue_deferred_split(struct folio *folio) { struct deferred_split *ds_queue; unsigned long flags; + bool unqueued = false; + + WARN_ON_ONCE(folio_ref_count(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); ds_queue = get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); @@ -3603,8 +3620,11 @@ void __folio_undo_large_rmappable(struct folio *folio) MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } list_del_init(&folio->_deferred_list); + unqueued = true; } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + + return unqueued; /* useful for debug warnings */ } /* partially_mapped=false won't clear PG_partially_mapped folio flag */ @@ -3627,14 +3647,11 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped) return; /* - * The try_to_unmap() in page reclaim path might reach here too, - * this may cause a race condition to corrupt deferred split queue. - * And, if page reclaim is already handling the same folio, it is - * unnecessary to handle it again in shrinker. - * - * Check the swapcache flag to determine if the folio is being - * handled by page reclaim since THP swap would add the folio into - * swap cache before calling try_to_unmap(). + * Exclude swapcache: originally to avoid a corrupt deferred split + * queue. Nowadays that is fully prevented by mem_cgroup_swapout(); + * but if page reclaim is already handling the same folio, it is + * unnecessary to handle it again in the shrinker, so excluding + * swapcache here may still be a useful optimization. */ if (folio_test_swapcache(folio)) return; diff --git a/mm/internal.h b/mm/internal.h index 93083bbeeefa..16c1f3cd599e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -639,11 +639,11 @@ static inline void folio_set_order(struct folio *folio, unsigned int order) #endif } -void __folio_undo_large_rmappable(struct folio *folio); -static inline void folio_undo_large_rmappable(struct folio *folio) +bool __folio_unqueue_deferred_split(struct folio *folio); +static inline bool folio_unqueue_deferred_split(struct folio *folio) { if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio)) - return; + return false; /* * At this point, there is no one trying to add the folio to @@ -651,9 +651,9 @@ static inline void folio_undo_large_rmappable(struct folio *folio) * to check without acquiring the split_queue_lock. */ if (data_race(list_empty(&folio->_deferred_list))) - return; + return false; - __folio_undo_large_rmappable(folio); + return __folio_unqueue_deferred_split(folio); } static inline struct folio *page_rmappable_folio(struct page *page) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 81d8819f13cd..f8744f5630bb 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struct folio *folio, css_get(&to->css); css_put(&from->css); + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = (unsigned long)to; __folio_memcg_unlock(from); @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct folio *folio; + bool tried_split_before = false; +retry_pmd: ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (mc.precharge < HPAGE_PMD_NR) { @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); if (target_type == MC_TARGET_PAGE) { folio = target.folio; + /* + * Deferred split queue locking depends on memcg, + * and unqueue is unsafe unless folio refcount is 0: + * split or skip if on the queue? first try to split. + */ + if (!list_empty(&folio->_deferred_list)) { + spin_unlock(ptl); + if (!tried_split_before) + split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (tried_split_before) + return 0; + tried_split_before = true; + goto retry_pmd; + } + /* + * So long as that pmd lock is held, the folio cannot + * be racily added to the _deferred_list, because + * __folio_remove_rmap() will find !partially_mapped. + */ if (folio_isolate_lru(folio)) { if (!mem_cgroup_move_account(folio, true, mc.from, mc.to)) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2703227cce88..06df2af97415 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4629,9 +4629,6 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON_FOLIO(folio_order(folio) > 1 && - !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at @@ -4678,6 +4675,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) ug->nr_memory += nr_pages; ug->pgpgout++; + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = 0; } @@ -4789,6 +4787,9 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) /* Transfer the charge and the css ref */ commit_charge(new, memcg); + + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(old)); old->memcg_data = 0; } @@ -4975,6 +4976,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); + folio_unqueue_deferred_split(folio); folio->memcg_data = 0; if (!mem_cgroup_is_root(memcg)) diff --git a/mm/migrate.c b/mm/migrate.c index fab84a776088..dfa24e41e8f9 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -490,7 +490,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } @@ -515,7 +515,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, } /* Take off deferred split queue while frozen and memcg set */ - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5e108ae755cc..8ad38cd5e574 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2681,7 +2681,6 @@ void free_unref_folios(struct folio_batch *folios) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - folio_undo_large_rmappable(folio); if (!free_pages_prepare(&folio->page, order)) continue; /* diff --git a/mm/swap.c b/mm/swap.c index 835bdf324b76..b8e3259ea2c4 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -121,7 +121,7 @@ void __folio_put(struct folio *folio) } page_cache_release(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); free_unref_page(&folio->page, folio_order(folio)); } @@ -988,7 +988,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) free_huge_folio(folio); continue; } - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); if (j != i) diff --git a/mm/vmscan.c b/mm/vmscan.c index ddaaff67642e..28ba2b06fc7d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1476,7 +1476,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, */ nr_reclaimed += nr_pages; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1864,7 +1864,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (unlikely(folio_put_testzero(folio))) { __folio_clear_lru_flags(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios);

10 months, 2 weeks

1
0
0 0

FAILED: patch "[PATCH] signal: restore the override_rlimit logic" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 9e05e5c7ee8758141d2db7e8fea2cab34500c6ed # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111159-repaying-whole-1063@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 9e05e5c7ee8758141d2db7e8fea2cab34500c6ed Mon Sep 17 00:00:00 2001 From: Roman Gushchin <roman.gushchin(a)linux.dev> Date: Mon, 4 Nov 2024 19:54:19 +0000 Subject: [PATCH] signal: restore the override_rlimit logic Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of signals. However now it's enforced unconditionally, even if override_rlimit is set. This behavior change caused production issues. For example, if the limit is reached and a process receives a SIGSEGV signal, sigqueue_alloc fails to allocate the necessary resources for the signal delivery, preventing the signal from being delivered with siginfo. This prevents the process from correctly identifying the fault address and handling the error. From the user-space perspective, applications are unaware that the limit has been reached and that the siginfo is effectively 'corrupted'. This can lead to unpredictable behavior and crashes, as we observed with java applications. Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip the comparison to max there if override_rlimit is set. This effectively restores the old behavior. Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Co-developed-by: Andrei Vagin <avagin(a)google.com> Signed-off-by: Andrei Vagin <avagin(a)google.com> Acked-by: Oleg Nesterov <oleg(a)redhat.com> Acked-by: Alexey Gladkov <legion(a)kernel.org> Cc: Kees Cook <kees(a)kernel.org> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 3625096d5f85..7183e5aca282 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -141,7 +141,8 @@ static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type ty long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type); +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit); void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type); bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max); diff --git a/kernel/signal.c b/kernel/signal.c index 4344860ffcac..cbabb2d05e0a 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, */ rcu_read_lock(); ucounts = task_ucounts(t); - sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, + override_rlimit); rcu_read_unlock(); if (!sigpending) return NULL; diff --git a/kernel/ucount.c b/kernel/ucount.c index 9469102c5ac0..696406939be5 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type) do_dec_rlimit_put_ucounts(ucounts, NULL, type); } -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit) { /* Caller must hold a reference to ucounts */ struct ucounts *iter; @@ -320,7 +321,8 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) goto dec_unwind; if (iter == ucounts) ret = new; - max = get_userns_rlimit_max(iter->ns, type); + if (!override_rlimit) + max = get_userns_rlimit_max(iter->ns, type); /* * Grab an extra ucount reference for the caller when * the rlimit count was previously 0.

10 months, 2 weeks

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror