Linux-stable-mirror May 2024

linux-stable-mirror@lists.linaro.org

412 participants
839 discussions

backport a patch for Linux kernel-5.10 and 6.6 stable tree

by Lin Gui (桂林)

Hi reviewers, I suggest to backport a commit to Linux kernel-5.10 and 6.6 stable tree. 　　https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/d… author Lin Gui lin.gui(a)mediatek.com 2023-12-19 07:05:32 +0800 committer Ulf Hansson ulf.hansson(a)linaro.org 2024-01-02 17:54:05 +0100 commit e4df56ad0bf3506c5189abb9be83f3bea05a4c4f (patch) tree a5db3a85f44b29dd773c5c65c3340d50b74b6687 /drivers/mmc/core/mmc.c parent b062136d0d6f46d7ad5c88219cbd75f90cb18e81 (diff) download linux-e4df56ad0bf3506c5189abb9be83f3bea05a4c4f.tar.gz mmc: core: Add wp_grp_size sysfs node The eMMC card can be set into write-protected mode to prevent data from being accidentally modified or deleted. Wp_grp_size (Write Protect Group Size) refers to an attribute of the eMMC card, used to manage write protection and is the CSD register [36:32] of the eMMC device. Wp_grp_size (Write Protect Group Size) indicates how many eMMC blocks are contained in each write protection group on the eMMC card. To allow userspace easy access of the CSD register bits, let's add sysfs node "wp_grp_size". Signed-off-by: Lin Gui lin.gui(a)mediatek.com Signed-off-by: Bo Ye bo.ye(a)mediatek.com Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno(a)collabora.com Link: https://lore.kernel.org/r/20231218230532.82427-1-bo.ye@mediatek.com Signed-off-by: Ulf Hansson ulf.hansson(a)linaro.org ------------------------------------ Best Regards ！ Guilin ===================================== MediaTek(ChengDu)Inc. Email: mailto:lin.gui@mediatek.com tel:+86-28-85939000-67009 Fax:+86-28-85929875 ==============================================

1 year, 7 months

[PATCH] drm/shmem-helper: Fix BUG_ON() on mmap(PROT_WRITE, MAP_PRIVATE)

by Jacek Lawrynowicz

From: "Wachowski, Karol" <karol.wachowski(a)intel.com> Lack of check for copy-on-write (COW) mapping in drm_gem_shmem_mmap allows users to call mmap with PROT_WRITE and MAP_PRIVATE flag causing a kernel panic due to BUG_ON in vmf_insert_pfn_prot: BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags)); Return -EINVAL early if COW mapping is detected. This bug affects all drm drivers using default shmem helpers. It can be reproduced by this simple example: void *ptr = mmap(0, size, PROT_WRITE, MAP_PRIVATE, fd, mmap_offset); ptr[0] = 0; Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects") Cc: Noralf Trønnes <noralf(a)tronnes.org> Cc: Eric Anholt <eric(a)anholt.net> Cc: Rob Herring <robh(a)kernel.org> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Maxime Ripard <mripard(a)kernel.org> Cc: Thomas Zimmermann <tzimmermann(a)suse.de> Cc: David Airlie <airlied(a)gmail.com> Cc: Daniel Vetter <daniel(a)ffwll.ch> Cc: dri-devel(a)lists.freedesktop.org Cc: <stable(a)vger.kernel.org> # v5.2+ Signed-off-by: Wachowski, Karol <karol.wachowski(a)intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com> --- drivers/gpu/drm/drm_gem_shmem_helper.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 177773bcdbfd..885a62c2e1be 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -611,6 +611,9 @@ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct return ret; } + if (is_cow_mapping(vma->vm_flags)) + return -EINVAL; + dma_resv_lock(shmem->base.resv, NULL); ret = drm_gem_shmem_get_pages(shmem); dma_resv_unlock(shmem->base.resv); -- 2.45.1

1 year, 7 months

[PATCH v2 4/4] tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()

by Steven Rostedt

From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org> When the inode is being dropped from the dentry, the TRACEFS_EVENT_INODE flag needs to be cleared to prevent a remount from calling eventfs_remount() on the tracefs_inode private data. There's a race between the inode is dropped (and the dentry freed) to where the inode is actually freed. If a remount happens between the two, the eventfs_inode could be accessed after it is freed (only the dentry keeps a ref count on it). Currently the TRACEFS_EVENT_INODE flag is cleared from the dentry iput() function. But this is incorrect, as it is possible that the inode has another reference to it. The flag should only be cleared when the inode is really being dropped and has no more references. That happens in the drop_inode callback of the inode, as that gets called when the last reference of the inode is released. Remove the tracefs_d_iput() function and move its logic to the more appropriate tracefs_drop_inode() callback function. Cc: stable(a)vger.kernel.org Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- fs/tracefs/inode.c | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 9252e0d78ea2..7c29f4afc23d 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -426,10 +426,26 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root) return 0; } +static int tracefs_drop_inode(struct inode *inode) +{ + struct tracefs_inode *ti = get_tracefs(inode); + + /* + * This inode is being freed and cannot be used for + * eventfs. Clear the flag so that it doesn't call into + * eventfs during the remount flag updates. The eventfs_inode + * gets freed after an RCU cycle, so the content will still + * be safe if the iteration is going on now. + */ + ti->flags &= ~TRACEFS_EVENT_INODE; + + return 1; +} + static const struct super_operations tracefs_super_operations = { .alloc_inode = tracefs_alloc_inode, .free_inode = tracefs_free_inode, - .drop_inode = generic_delete_inode, + .drop_inode = tracefs_drop_inode, .statfs = simple_statfs, .show_options = tracefs_show_options, }; @@ -455,22 +471,7 @@ static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags) return !(ei && ei->is_freed); } -static void tracefs_d_iput(struct dentry *dentry, struct inode *inode) -{ - struct tracefs_inode *ti = get_tracefs(inode); - - /* - * This inode is being freed and cannot be used for - * eventfs. Clear the flag so that it doesn't call into - * eventfs during the remount flag updates. The eventfs_inode - * gets freed after an RCU cycle, so the content will still - * be safe if the iteration is going on now. - */ - ti->flags &= ~TRACEFS_EVENT_INODE; -} - static const struct dentry_operations tracefs_dentry_operations = { - .d_iput = tracefs_d_iput, .d_revalidate = tracefs_d_revalidate, .d_release = tracefs_d_release, }; -- 2.43.0

1 year, 7 months

[PATCH v2 3/4] eventfs: Update all the eventfs_inodes from the events descriptor

by Steven Rostedt

From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org> The change to update the permissions of the eventfs_inode had the misconception that using the tracefs_inode would find all the eventfs_inodes that have been updated and reset them on remount. The problem with this approach is that the eventfs_inodes are freed when they are no longer used (basically the reason the eventfs system exists). When they are freed, the updated eventfs_inodes are not reset on a remount because their tracefs_inodes have been freed. Instead, since the events directory eventfs_inode always has a tracefs_inode pointing to it (it is not freed when finished), and the events directory has a link to all its children, have the eventfs_remount() function only operate on the events eventfs_inode and have it descend into its children updating their uid and gids. Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVw… Cc: stable(a)vger.kernel.org Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Reported-by: Masahiro Yamada <masahiroy(a)kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- fs/tracefs/event_inode.c | 44 ++++++++++++++++++++++++++++------------ 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 5dfb1ccd56ea..129d0f54ba62 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -305,27 +305,27 @@ static const struct file_operations eventfs_file_operations = { .llseek = generic_file_llseek, }; -/* - * On a remount of tracefs, if UID or GID options are set, then - * the mount point inode permissions should be used. - * Reset the saved permission flags appropriately. - */ -void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid) +static void eventfs_set_attrs(struct eventfs_inode *ei, bool update_uid, kuid_t uid, + bool update_gid, kgid_t gid, int level) { - struct eventfs_inode *ei = ti->private; + struct eventfs_inode *ei_child; - if (!ei) + /* Update events/<system>/<event> */ + if (WARN_ON_ONCE(level > 3)) return; if (update_uid) { ei->attr.mode &= ~EVENTFS_SAVE_UID; - ei->attr.uid = ti->vfs_inode.i_uid; + ei->attr.uid = uid; } - if (update_gid) { ei->attr.mode &= ~EVENTFS_SAVE_GID; - ei->attr.gid = ti->vfs_inode.i_gid; + ei->attr.gid = gid; + } + + list_for_each_entry(ei_child, &ei->children, list) { + eventfs_set_attrs(ei_child, update_uid, uid, update_gid, gid, level + 1); } if (!ei->entry_attrs) @@ -334,13 +334,31 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid) for (int i = 0; i < ei->nr_entries; i++) { if (update_uid) { ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID; - ei->entry_attrs[i].uid = ti->vfs_inode.i_uid; + ei->entry_attrs[i].uid = uid; } if (update_gid) { ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID; - ei->entry_attrs[i].gid = ti->vfs_inode.i_gid; + ei->entry_attrs[i].gid = gid; } } + +} + +/* + * On a remount of tracefs, if UID or GID options are set, then + * the mount point inode permissions should be used. + * Reset the saved permission flags appropriately. + */ +void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid) +{ + struct eventfs_inode *ei = ti->private; + + /* Only the events directory does the updates */ + if (!ei || !ei->is_events || ei->is_freed) + return; + + eventfs_set_attrs(ei, update_uid, ti->vfs_inode.i_uid, + update_gid, ti->vfs_inode.i_gid, 0); } /* Return the evenfs_inode of the "events" directory */ -- 2.43.0

1 year, 7 months

[PATCH v2 2/4] tracefs: Update inode permissions on remount

by Steven Rostedt

From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org> When a remount happens, if a gid or uid is specified update the inodes to have the same gid and uid. This will allow the simplification of the permissions logic for the dynamically created files and directories. Cc: stable(a)vger.kernel.org Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- fs/tracefs/event_inode.c | 17 +++++++++++++---- fs/tracefs/inode.c | 15 ++++++++++++--- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 55a40a730b10..5dfb1ccd56ea 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -317,20 +317,29 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid) if (!ei) return; - if (update_uid) + if (update_uid) { ei->attr.mode &= ~EVENTFS_SAVE_UID; + ei->attr.uid = ti->vfs_inode.i_uid; + } + - if (update_gid) + if (update_gid) { ei->attr.mode &= ~EVENTFS_SAVE_GID; + ei->attr.gid = ti->vfs_inode.i_gid; + } if (!ei->entry_attrs) return; for (int i = 0; i < ei->nr_entries; i++) { - if (update_uid) + if (update_uid) { ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID; - if (update_gid) + ei->entry_attrs[i].uid = ti->vfs_inode.i_uid; + } + if (update_gid) { ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID; + ei->entry_attrs[i].gid = ti->vfs_inode.i_gid; + } } } diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index a827f6a716c4..9252e0d78ea2 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -373,12 +373,21 @@ static int tracefs_apply_options(struct super_block *sb, bool remount) rcu_read_lock(); list_for_each_entry_rcu(ti, &tracefs_inodes, list) { - if (update_uid) + if (update_uid) { ti->flags &= ~TRACEFS_UID_PERM_SET; + ti->vfs_inode.i_uid = fsi->uid; + } - if (update_gid) + if (update_gid) { ti->flags &= ~TRACEFS_GID_PERM_SET; - + ti->vfs_inode.i_gid = fsi->gid; + } + + /* + * Note, the above ti->vfs_inode updates are + * used in eventfs_remount() so they must come + * before calling it. + */ if (ti->flags & TRACEFS_EVENT_INODE) eventfs_remount(ti, update_uid, update_gid); } -- 2.43.0

1 year, 7 months

[PATCH v2 1/4] eventfs: Keep the directories from having the same inode number as files

by Steven Rostedt

From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org> The directories require unique inode numbers but all the eventfs files have the same inode number. Prevent the directories from having the same inode numbers as the files as that can confuse some tooling. Cc: stable(a)vger.kernel.org Fixes: 834bf76add3e6 ("eventfs: Save directory inodes in the eventfs_inode structure") Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- fs/tracefs/event_inode.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 0256afdd4acf..55a40a730b10 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -50,8 +50,12 @@ static struct eventfs_root_inode *get_root_inode(struct eventfs_inode *ei) /* Just try to make something consistent and unique */ static int eventfs_dir_ino(struct eventfs_inode *ei) { - if (!ei->ino) + if (!ei->ino) { ei->ino = get_next_ino(); + /* Must not have the file inode number */ + if (ei->ino == EVENTFS_FILE_INODE_INO) + ei->ino = get_next_ino(); + } return ei->ino; } -- 2.43.0

1 year, 7 months

Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

by Linux regression tracking (Thorsten Leemhuis)

[CCing Mario, who asked for the two suspected commits to be backported] On 06.05.24 14:24, Gia wrote: > Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit > TS3 Plus Thunderbolt 3 dock. > > After the update I see this message on boot "xHCI host controller not > responding, assume dead" and the dock is not working anymore. Kernel > 6.8.7 works great. Thx for the report. Could you make the kernel log (journalctl -k/dmesg) accessible somewhere? And have you looked into the other stuff that Mario suggested in the other thread? See the following mail and the reply to it for details: https://lore.kernel.org/all/1eb96465-0a81-4187-b8e7-607d85617d5f@gmail.com/… Ciao, Thorsten P.S.: To be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot: #regzbot ^introduced v6.8.7..v6.8.8 #regzbot title thunderbolt: TB3 dock problems, xHCI host controller not responding, assume dead

1 year, 7 months

RE: [PATCH v2] net: usb: ax88179_178a: avoid writing the mac address before first reading

by Isaac Ganoung

Hello, I am using a TP-Link UE306 USB Ethernet adapter. The kernel detects it as an ASIX AX88179A USB Ethernet adapter. When using a different MAC address than the adapter's own (i.e. MAC address randomization), I am unable to send or receive packets unless set to promiscuous mode. I am using NetworkManager to manage my connections. When I set 802-3-ethernet.cloned-mac-address to the device's MAC address in the connection settings (i.e. `nmcli con edit), the device works as expected. When that property is not set (null value), the device is only able to receive packets when set to promiscuous mode. uname -a output: Linux hostname 6.8.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 28 Apr 2024 18:53:26 +0000 x86_64 GNU/Linux This is Arch Linux's kernel. The patches applied are here: <https://github.com/archlinux/linux/releases/tag/v6.8.8-arch1> dmesg: [37988.917741] usb 2-2: new SuperSpeed USB device number 4 using xhci_hcd [37989.208722] usb 2-2: New USB device found, idVendor=0b95, idProduct=1790, bcdDevice= 2.00 [37989.208744] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [37989.208753] usb 2-2: Product: AX88179A [37989.208760] usb 2-2: Manufacturer: ASIX [37989.208766] usb 2-2: SerialNumber: 0003B40D [37989.481930] cdc_ncm 2-2:2.0: MAC-Address: <removed> [37989.481949] cdc_ncm 2-2:2.0: setting rx_max = 16384 [37989.494646] cdc_ncm 2-2:2.0: setting tx_max = 16384 [37989.506072] cdc_ncm 2-2:2.0 eth1: register 'cdc_ncm' at usb-0000:00:14.0-2, CDC NCM (NO ZLP), <removed> journalctl (from when not in promiscuous mode): Apr 29 17:34:47 hostname kernel: usb 2-1: new SuperSpeed USB device number 5 using xhci_hcd Apr 29 17:34:48 hostname kernel: usb 2-1: New USB device found, idVendor=0b95, idProduct=1790, bcdDevice= 2.00 Apr 29 17:34:48 hostname kernel: usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Apr 29 17:34:48 hostname kernel: usb 2-1: Product: AX88179A Apr 29 17:34:48 hostname kernel: usb 2-1: Manufacturer: ASIX Apr 29 17:34:48 hostname kernel: usb 2-1: SerialNumber: 0003B40D Apr 29 17:34:48 hostname kernel: cdc_ncm 2-1:2.0: MAC-Address: <removed> Apr 29 17:34:48 hostname kernel: cdc_ncm 2-1:2.0: setting rx_max = 16384 Apr 29 17:34:48 hostname kernel: cdc_ncm 2-1:2.0: setting tx_max = 16384 Apr 29 17:34:48 hostname kernel: cdc_ncm 2-1:2.0 eth1: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM (NO ZLP), <removed> Apr 29 17:34:48 hostname NetworkManager[5652]: <info> [1714430088.5005] manager: (eth1): new Ethernet device (/org/freedesktop/NetworkManager/Devices/14) Apr 29 17:34:48 hostname mtp-probe[6423]: checking bus 2, device 5: "/sys/devices/pci0000:00/0000:00:14.0/usb2/2-1" Apr 29 17:34:48 hostname mtp-probe[6423]: bus: 2, device: 5 was not an MTP device Apr 29 17:34:49 hostname (udev-worker)[6422]: Network interface NamePolicy= disabled on kernel command line. Apr 29 17:34:49 hostname NetworkManager[5652]: <info> [1714430089.1558] device (eth1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Apr 29 17:34:49 hostname mtp-probe[6456]: checking bus 2, device 5: "/sys/devices/pci0000:00/0000:00:14.0/usb2/2-1" Apr 29 17:34:49 hostname mtp-probe[6456]: bus: 2, device: 5 was not an MTP device Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.2247] device (eth1): carrier: link connected Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.2255] device (eth1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed') Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.2275] policy: auto-activating connection 'Wired connection 2' (e1106b48-8695-3ed4-b512-a0909ddaa247) Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.2279] device (eth1): Activation: starting connection 'Wired connection 2' (e1106b48-8695-3ed4-b512-a0909ddaa247) Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.2280] device (eth1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.2282] manager: NetworkManager state is now CONNECTING Apr 29 17:34:51 hostname NetworkManager[5652]: <warn> [1714430091.2284] platform-linux: do-change-link[9]: failure 16 (Device or resource busy) Apr 29 17:34:51 hostname systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully. Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.3634] device (eth1): set-hw-addr: set-cloned MAC address to 6A:0D:A2:E2:9D:A6 (random) Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.3646] device (eth1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.3729] device (eth1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.3740] dhcp4 (eth1): activation: beginning transaction (timeout in 45 seconds) Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.3791] dhcp4 (eth1): dhclient started with pid 6459 Apr 29 17:34:51 hostname dhclient[6459]: DHCPREQUEST for 192.168.1.169 on eth1 to 255.255.255.255 port 67 Apr 29 17:34:51 hostname dhclient[6459]: DHCPNAK from 192.168.1.1 Apr 29 17:34:51 hostname NetworkManager[5652]: <info> [1714430091.4578] dhcp4 (eth1): state changed no lease Thanks, Isaac Ganoung

1 year, 7 months

[for-linus][PATCH 3/4] tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test

by Steven Rostedt

From: Jeff Johnson <quic_jjohnson(a)quicinc.com> Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o Link: https://lore.kernel.org/linux-trace-kernel/20240518-md-preemptirq_delay_tes… Cc: stable(a)vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org> Signed-off-by: Jeff Johnson <quic_jjohnson(a)quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- kernel/trace/preemptirq_delay_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index 8c4ffd076162..cb0871fbdb07 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void) module_init(preemptirq_delay_init) module_exit(preemptirq_delay_exit) +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency tracers"); MODULE_LICENSE("GPL v2"); -- 2.43.0

1 year, 7 months

[for-linus][PATCH 2/4] ring-buffer: Fix a race between readers and resize checks

by Steven Rostedt

From: Petr Pavlu <petr.pavlu(a)suse.com> The reader code in rb_get_reader_page() swaps a new reader page into the ring buffer by doing cmpxchg on old->list.prev->next to point it to the new page. Following that, if the operation is successful, old->list.next->prev gets updated too. This means the underlying doubly-linked list is temporarily inconsistent, page->prev->next or page->next->prev might not be equal back to page for some page in the ring buffer. The resize operation in ring_buffer_resize() can be invoked in parallel. It calls rb_check_pages() which can detect the described inconsistency and stop further tracing: [ 190.271762] ------------[ cut here ]------------ [ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0 [ 190.271789] Modules linked in: [...] [ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1 [ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f [ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014 [ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0 [ 190.272023] Code: [...] [ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206 [ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80 [ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700 [ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000 [ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720 [ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000 [ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000 [ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0 [ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 190.272077] Call Trace: [ 190.272098] <TASK> [ 190.272189] ring_buffer_resize+0x2ab/0x460 [ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0 [ 190.272206] tracing_resize_ring_buffer+0x65/0x90 [ 190.272216] tracing_entries_write+0x74/0xc0 [ 190.272225] vfs_write+0xf5/0x420 [ 190.272248] ksys_write+0x67/0xe0 [ 190.272256] do_syscall_64+0x82/0x170 [ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 190.272373] RIP: 0033:0x7f1bd657d263 [ 190.272381] Code: [...] [ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263 [ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001 [ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000 [ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500 [ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002 [ 190.272412] </TASK> [ 190.272414] ---[ end trace 0000000000000000 ]--- Note that ring_buffer_resize() calls rb_check_pages() only if the parent trace_buffer has recording disabled. Recent commit d78ab792705c ("tracing: Stop current tracer when resizing buffer") causes that it is now always the case which makes it more likely to experience this issue. The window to hit this race is nonetheless very small. To help reproducing it, one can add a delay loop in rb_get_reader_page(): ret = rb_head_page_replace(reader, cpu_buffer->reader_page); if (!ret) goto spin; for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */ __asm__ __volatile__ ("" : : : "memory"); rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list; .. and then run the following commands on the target system: echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable while true; do echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 done & while true; do for i in /sys/kernel/tracing/per_cpu/*; do timeout 0.1 cat $i/trace_pipe; sleep 0.2 done done To fix the problem, make sure ring_buffer_resize() doesn't invoke rb_check_pages() concurrently with a reader operating on the same ring_buffer_per_cpu by taking its cpu_buffer->reader_lock. Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavl… Cc: stable(a)vger.kernel.org Cc: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read") Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com> [ Fixed whitespace ] Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- kernel/trace/ring_buffer.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 42227727a49d..28853966aa9a 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1460,6 +1460,11 @@ static void rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer, * * As a safety measure we check to make sure the data pages have not * been corrupted. + * + * Callers of this function need to guarantee that the list of pages doesn't get + * modified during the check. In particular, if it's possible that the function + * is invoked with concurrent readers which can swap in a new reader page then + * the caller should take cpu_buffer->reader_lock. */ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) { @@ -2210,8 +2215,12 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size, */ synchronize_rcu(); for_each_buffer_cpu(buffer, cpu) { + unsigned long flags; + cpu_buffer = buffer->buffers[cpu]; + raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); rb_check_pages(cpu_buffer); + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); } atomic_dec(&buffer->record_disabled); } -- 2.43.0

1 year, 7 months

← Newer
1
...
36
37
38
39
40
41
42
...
84
Older →

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror May 2024