The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to 'stable@vger.kernel.org' --in-reply-to '2023040340-happiest-next-a09c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
50d281fc434c ("btrfs: scan device in non-exclusive mode") 12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode") 5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode") b70f509774ad ("btrfs: check and enable ZONED mode") 5b316468983d ("btrfs: get zone information of zoned block devices") bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 Mon Sep 17 00:00:00 2001 From: Anand Jain anand.jain@oracle.com Date: Thu, 23 Mar 2023 15:56:48 +0800 Subject: [PATCH] btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to race with systemd-udevd scan.
During the device scan initiated by systemd-udevd, other user space EXCL operations such as mkfs, mount, or check may get blocked and result in a "Device or resource busy" error. This is because the device scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY error
- LTP pwritev03 test case, where mkfs.vfs failed with the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem on the device.
In both cases, fsck and mkfs (respectively) were racing with a systemd-udevd device scan, and systemd-udevd won, resulting in the -EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very small window during which these userspace threads can race to acquire the exclusive device open. Even on the system where the problem was observed, the problem occurrences were anywhere between 10 to 400 iterations and chances of reproducing decreases with debug printk()s.
However, an exclusive device open is unnecessary for the scan process, as there are no write operations on the device during scan. Furthermore, during the mount process, the superblock is re-read in the below function call chain:
btrfs_mount_root btrfs_open_devices open_fs_devices btrfs_open_one_device btrfs_get_bdev_and_sb
So, to fix this issue, removes the FMODE_EXCL flag from the scan operation, and add a comment.
The case where mkfs may still write to the device and a scan is running, the btrfs signature is not written at that time so scan will not recognize such device.
Reported-by: Sherry Yang sherry.yang@oracle.com Reported-by: kernel test robot oliver.sang@intel.com Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 6d0124b6e79e..ac0e8fb92fc8 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, * So, we need to add a special mount option to scan for * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ - flags |= FMODE_EXCL;
+ /* + * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may + * initiate the device scan which may race with the user's mount + * or mkfs command, resulting in failure. + * Since the device scan is solely for reading purposes, there is + * no need for FMODE_EXCL. Additionally, the devices are read again + * during the mount process. It is ok to get some inconsistent + * values temporarily, as the device paths of the fsid are the only + * required information for assembling the volume. + */ bdev = blkdev_get_by_path(path, flags, holder); if (IS_ERR(bdev)) return ERR_CAST(bdev);