From: Christian Brauner brauner@kernel.org
[ Upstream commit 6474353a5e3d0b2cf610153cea0c61f576a36d0a ]
Epoll relies on a racy fastpath check during __fput() in eventpoll_release() to avoid the hit of pointlessly acquiring a semaphore. Annotate that race by using WRITE_ONCE() and READ_ONCE().
Link: https://lore.kernel.org/r/66edfb3c.050a0220.3195df.001a.GAE@google.com Link: https://lore.kernel.org/r/20240925-fungieren-anbauen-79b334b00542@brauner Reviewed-by: Jan Kara jack@suse.cz Reported-by: syzbot+3b6b32dc50537a49bb4a@syzkaller.appspotmail.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/eventpoll.c | 6 ++++-- include/linux/eventpoll.h | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 7221072f39fad..f296ffb57d052 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -703,7 +703,8 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi) to_free = NULL; head = file->f_ep; if (head->first == &epi->fllink && !epi->fllink.next) { - file->f_ep = NULL; + /* See eventpoll_release() for details. */ + WRITE_ONCE(file->f_ep, NULL); if (!is_file_epoll(file)) { struct epitems_head *v; v = container_of(head, struct epitems_head, epitems); @@ -1467,7 +1468,8 @@ static int attach_epitem(struct file *file, struct epitem *epi) spin_unlock(&file->f_lock); goto allocate; } - file->f_ep = head; + /* See eventpoll_release() for details. */ + WRITE_ONCE(file->f_ep, head); to_free = NULL; } hlist_add_head_rcu(&epi->fllink, file->f_ep); diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 3337745d81bd6..0c0d00fcd131f 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -42,7 +42,7 @@ static inline void eventpoll_release(struct file *file) * because the file in on the way to be removed and nobody ( but * eventpoll ) has still a reference to this file. */ - if (likely(!file->f_ep)) + if (likely(!READ_ONCE(file->f_ep))) return;
/*
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit a0bd7dacbd51c632b8e2c0500b479af564afadf3 ]
CPU hotplug remove handling triggers the following function call sequence:
CPUHP_AP_PERF_S390_SF_ONLINE --> s390_pmu_sf_offline_cpu() ... CPUHP_AP_PERF_ONLINE --> perf_event_exit_cpu()
The s390 CPUMF sampling CPU hotplug handler invokes:
s390_pmu_sf_offline_cpu() +--> cpusf_pmu_setup() +--> setup_pmc_cpu() +--> deallocate_buffers()
This function de-allocates all sampling data buffers (SDBs) allocated for that CPU at event initialization. It also clears the PMU_F_RESERVED bit. The CPU is gone and can not be sampled.
With the event still being active on the removed CPU, the CPU event hotplug support in kernel performance subsystem triggers the following function calls on the removed CPU:
perf_event_exit_cpu() +--> perf_event_exit_cpu_context() +--> __perf_event_exit_context() +--> __perf_remove_from_context() +--> event_sched_out() +--> cpumsf_pmu_del() +--> cpumsf_pmu_stop() +--> hw_perf_event_update()
to stop and remove the event. During removal of the event, the sampling device driver tries to read out the remaining samples from the sample data buffers (SDBs). But they have already been freed (and may have been re-assigned). This may lead to a use after free situation in which case the samples are most likely invalid. In the best case the memory has not been reassigned and still contains valid data.
Remedy this situation and check if the CPU is still in reserved state (bit PMU_F_RESERVED set). In this case the SDBs have not been released an contain valid data. This is always the case when the event is removed (and no CPU hotplug off occured). If the PMU_F_RESERVED bit is not set, the SDB buffers are gone.
Signed-off-by: Thomas Richter tmricht@linux.ibm.com Reviewed-by: Hendrik Brueckner brueckner@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_cpum_sf.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c index f3b0a106f7227..46a1a85a0e440 100644 --- a/arch/s390/kernel/perf_cpum_sf.c +++ b/arch/s390/kernel/perf_cpum_sf.c @@ -1896,7 +1896,9 @@ static void cpumsf_pmu_stop(struct perf_event *event, int flags) event->hw.state |= PERF_HES_STOPPED;
if ((flags & PERF_EF_UPDATE) && !(event->hw.state & PERF_HES_UPTODATE)) { - hw_perf_event_update(event, 1); + /* CPU hotplug off removes SDBs. No samples to extract. */ + if (cpuhw->flags & PMU_F_RESERVED) + hw_perf_event_update(event, 1); event->hw.state |= PERF_HES_UPTODATE; } perf_pmu_enable(event->pmu);
From: Qu Wenruo wqu@suse.com
[ Upstream commit 2e8b6bc0ab41ce41e6dfcc204b6cc01d5abbc952 ]
[PROBLEM] It is very common for udev to trigger device scan, and every time a mounted btrfs device got re-scan from different soft links, we will get some of unnecessary device path updates, this is especially common for LVM based storage:
# lvs scratch1 test -wi-ao---- 10.00g scratch2 test -wi-a----- 10.00g scratch3 test -wi-a----- 10.00g scratch4 test -wi-a----- 10.00g scratch5 test -wi-a----- 10.00g test test -wi-a----- 10.00g
# mkfs.btrfs -f /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs # dmesg -c [ 205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154) [ 205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 [ 205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm [ 205.713856] BTRFS info (device dm-4): using free-space-tree [ 205.722324] BTRFS info (device dm-4): checking UUID tree
So far so good, but even if we just touched any soft link of "dm-4", we will get quite some unnecessary device path updates.
# touch /dev/mapper/test-scratch1 # dmesg -c [ 469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221) [ 469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221)
Such device path rename is unnecessary and can lead to random path change due to the udev race.
[CAUSE] Inside device_list_add(), we are using a very primitive way checking if the device has changed, strcmp().
Which can never handle links well, no matter if it's hard or soft links.
So every different link of the same device will be treated as a different device, causing the unnecessary device path update.
[FIX] Introduce a helper, is_same_device(), and use path_equal() to properly detect the same block device. So that the different soft links won't trigger the rename race.
Reviewed-by: Filipe Manana fdmanana@suse.com Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641 Reported-by: Fabian Vogt fvogt@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8c7e74499ed17..9779ab410f8fa 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -671,6 +671,42 @@ u8 *btrfs_sb_fsid_ptr(struct btrfs_super_block *sb) return has_metadata_uuid ? sb->metadata_uuid : sb->fsid; }
+static bool is_same_device(struct btrfs_device *device, const char *new_path) +{ + struct path old = { .mnt = NULL, .dentry = NULL }; + struct path new = { .mnt = NULL, .dentry = NULL }; + char *old_path = NULL; + bool is_same = false; + int ret; + + if (!device->name) + goto out; + + old_path = kzalloc(PATH_MAX, GFP_NOFS); + if (!old_path) + goto out; + + rcu_read_lock(); + ret = strscpy(old_path, rcu_str_deref(device->name), PATH_MAX); + rcu_read_unlock(); + if (ret < 0) + goto out; + + ret = kern_path(old_path, LOOKUP_FOLLOW, &old); + if (ret) + goto out; + ret = kern_path(new_path, LOOKUP_FOLLOW, &new); + if (ret) + goto out; + if (path_equal(&old, &new)) + is_same = true; +out: + kfree(old_path); + path_put(&old); + path_put(&new); + return is_same; +} + /* * Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices * being created with a disk that has already completed its fsid change. Such @@ -889,7 +925,7 @@ static noinline struct btrfs_device *device_list_add(const char *path, disk_super->fsid, devid, found_transid, path, current->comm, task_pid_nr(current));
- } else if (!device->name || strcmp(device->name->str, path)) { + } else if (!device->name || !is_same_device(device, path)) { /* * When FS is already mounted. * 1. If you are here and if the device->name is NULL that
From: Boris Burkov boris@bur.io
[ Upstream commit 70958a949d852cbecc3d46127bf0b24786df0130 ]
If you follow the seed/sprout wiki, it suggests the following workflow:
btrfstune -S 1 seed_dev mount seed_dev mnt btrfs device add sprout_dev mount -o remount,rw mnt
The first mount mounts the FS readonly, which results in not setting BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add somewhat surprisingly clears the readonly bit on the sb (though the mount is still practically readonly, from the users perspective...). Finally, the remount checks the readonly bit on the sb against the flag and sees no change, so it does not run the code intended to run on ro->rw transitions, leaving BTRFS_FS_OPEN unset.
As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and does no work. This results in leaking deleted snapshots until we run out of space.
I propose fixing it at the first departure from what feels reasonable: when we clear the readonly bit on the sb during device add.
A new fstest I have written reproduces the bug and confirms the fix.
Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9779ab410f8fa..a4177014eb8b0 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2767,8 +2767,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
if (seeding_dev) { - btrfs_clear_sb_rdonly(sb); - /* GFP_KERNEL allocation must not be under device_list_mutex */ seed_devices = btrfs_init_sprout(fs_info); if (IS_ERR(seed_devices)) { @@ -2911,8 +2909,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path mutex_unlock(&fs_info->chunk_mutex); mutex_unlock(&fs_info->fs_devices->device_list_mutex); error_trans: - if (seeding_dev) - btrfs_set_sb_rdonly(sb); if (trans) btrfs_end_transaction(trans); error_free_zone:
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 2342d6595b608eec94187a17dc112dd4c2a812fa ]
Smatch complains about calling PTR_ERR() against a NULL pointer:
fs/btrfs/super.c:2272 btrfs_control_ioctl() warn: passing zero to 'PTR_ERR'
Fix this by calling PTR_ERR() against the device pointer only if it contains an error.
Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/super.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index d063379a031dc..3695c694dc0de 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2520,7 +2520,10 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd, &btrfs_root_fs_type); if (IS_ERR(device)) { mutex_unlock(&uuid_mutex); - ret = PTR_ERR(device); + if (IS_ERR(device)) + ret = PTR_ERR(device); + else + ret = 0; break; } ret = !(device->fs_devices->num_devices ==
From: Mark Brown broonie@kernel.org
[ Upstream commit 3e360ef0c0a1fb6ce9a302e40b8057c41ba8a9d2 ]
When building for streaming SVE the irritator for SVE skips updates of both P0 and FFR. While FFR is skipped since it might not be present there is no reason to skip corrupting P0 so switch to an instruction valid in streaming mode and move the ifdef.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20241107-arm64-fp-stress-irritator-v2-3-c4b9622e36... Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/arm64/fp/sve-test.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S index 2a18cb4c528c5..b9648daeabbb1 100644 --- a/tools/testing/selftests/arm64/fp/sve-test.S +++ b/tools/testing/selftests/arm64/fp/sve-test.S @@ -304,9 +304,9 @@ function irritator_handler movi v0.8b, #1 movi v9.16b, #2 movi v31.8b, #3 -#ifndef SSVE // And P0 - rdffr p0.b + ptrue p0.d +#ifndef SSVE // And FFR wrffr p15.b #endif
From: Mark Brown broonie@kernel.org
[ Upstream commit 27141b690547da5650a420f26ec369ba142a9ebb ]
The PAC exec_sign_all() test spawns some child processes, creating pipes to be stdin and stdout for the child. It cleans up most of the file descriptors that are created as part of this but neglects to clean up the parent end of the child stdin and stdout. Add the missing close() calls.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20241111-arm64-pac-test-collisions-v1-1-171875f37e... Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/arm64/pauth/pac.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/arm64/pauth/pac.c b/tools/testing/selftests/arm64/pauth/pac.c index b743daa772f55..5a07b3958fbf2 100644 --- a/tools/testing/selftests/arm64/pauth/pac.c +++ b/tools/testing/selftests/arm64/pauth/pac.c @@ -182,6 +182,9 @@ int exec_sign_all(struct signatures *signed_vals, size_t val) return -1; }
+ close(new_stdin[1]); + close(new_stdout[0]); + return 0; }
linux-stable-mirror@lists.linaro.org