On 28/02/2024 18.25, Patrick Plenefisch wrote:
I'm unsure if this is just an LVM bug, or a BTRFS+LVM interaction bug, but LVM is definitely involved somehow. After upgrading from 5.10 to 6.1, I noticed one of my filesystems was read-only. In dmesg, I found:
BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO failure (errors while submitting device barriers.)
BTRFS info (device dm-75: state E): forced readonly
BTRFS warning (device dm-75: state E): Skipping commit of aborted transaction.
BTRFS: error (device dm-75: state EA) in cleanup_transaction:1992: errno=-5 IO failure
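In the first log line, the only non-zero per-device counter is `flush`, which matches the later "errors while submitting device barriers" message. The minimal Python sketch below pulls those counters out of such a line. The regex and `parse_dev_errs` are hypothetical helpers written for this illustration and are not part of btrfs or btrfs-progs. On a live system, `btrfs device stats` reports the same per-device counters.

```python
import re

# Hypothetical helper: matches the "bdev <path> errs: ..." part of a btrfs
# kernel log line and captures the device path and the counter list.
ERRS_RE = re.compile(r"bdev (?P<dev>\S+) errs: (?P<counters>.*)$")


def parse_dev_errs(line: str) -> tuple[str, dict[str, int]]:
    """Return (device path, {counter name: value}) for a btrfs 'errs:' line."""
    m = ERRS_RE.search(line)
    if m is None:
        raise ValueError("not a btrfs device error-counter line")
    counters = {}
    # Counters look like "wr 0, rd 0, flush 1, corrupt 0, gen 0".
    for item in m.group("counters").split(","):
        name, value = item.split()
        counters[name] = int(value)
    return m.group("dev"), counters


line = ("BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk "
        "errs: wr 0, rd 0, flush 1, corrupt 0, gen 0")
dev, counters = parse_dev_errs(line)
# Keep only the counters that actually recorded an error.
nonzero = {k: v for k, v in counters.items() if v}
print(dev, nonzero)
```

Run against the log line above, this reports a single flush (barrier) error on /dev/mapper/lvm-brokenDisk. No read, write, corruption or generation errors appear, which is consistent with a scrub finding nothing.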
At first I suspected a btrfs error, but a scrub found no errors, and it continued to be read-write on 5.10 kernels.
Here is my setup:
/dev/lvm/brokenDisk is an LVM-on-LVM volume. I have /dev/sd{a,b,c,d} (of varying sizes) in a lower VG, which has three LVs, all RAID1 volumes. Two of those LVs are further used as PVs for upper VGs. One of the upper VGs has no issues. The non-PV LV has no issues. The remaining one, /dev/lowerVG/lvmPool, hosts nested LVM: it is used as the PV for VG "lvm", which contains three LVs. Two of those LVs have no issues (and are btrfs), but the last one is /dev/lvm/brokenDisk. This volume is the only one that exhibits this behavior, so something about it is special.
Or described as layers:
/dev/sd{a,b,c,d} => PV => VG "lowerVG"
    /dev/lowerVG/single (RAID1 LV) => BTRFS, works fine
    /dev/lowerVG/works (RAID1 LV) => PV => VG "workingUpper"
        /dev/workingUpper/{a,b,c} => BTRFS, works fine
    /dev/lowerVG/lvmPool (RAID1 LV) => PV => VG "lvm"
        /dev/lvm/{a,b} => BTRFS, works fine
        /dev/lvm/brokenDisk => BTRFS, Exhibits errors
I am a bit curious about the reasons for this setup. However, as I understand it, the layout is:
/dev/sda -+                +-- single (RAID1) -> ok             +-> a ok
/dev/sdb  |                |                                    |-> b ok
/dev/sdc  +--> [lowerVG]>--+-- works (RAID1) -> [workingUpper] -+-> c ok
/dev/sdd -+                |
                           |                       +-> a -> ok
                           +-- lvmPool -> [lvm] ->-|
                                                   +-> b -> ok
                                                   |
                                                   +->brokenDisk -> fail
[xxx] denotes a VG; the other names are LVs, which may also act as PVs in an upper VG.
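The layering in the diagram can also be written as a parent map, which makes it easy to check which layers two LVs share. The minimal Python sketch below is hypothetical. `PARENT` and `stack` are illustrative names, and the model captures only the layering. It does not capture RAID levels or how each LV's extents are placed on its PV.

```python
# Hypothetical model of the stack in the diagram: each node maps to the layer
# it sits on (LV -> its VG, VG -> the LV or disks acting as its PV).
PARENT = {
    "lowerVG": "/dev/sd{a,b,c,d}",
    "lowerVG/single": "lowerVG",
    "lowerVG/works": "lowerVG",
    "lowerVG/lvmPool": "lowerVG",
    "workingUpper": "lowerVG/works",
    "workingUpper/a": "workingUpper",
    "workingUpper/b": "workingUpper",
    "workingUpper/c": "workingUpper",
    "lvm": "lowerVG/lvmPool",
    "lvm/a": "lvm",
    "lvm/b": "lvm",
    "lvm/brokenDisk": "lvm",
}


def stack(node: str) -> list[str]:
    """Return the chain of layers from `node` down to the physical disks."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


for lv in ("lvm/a", "lvm/b", "lvm/brokenDisk", "workingUpper/c"):
    print(" -> ".join(stack(lv)))

# Every LV in VG "lvm" sits on exactly the same layers below it.
assert stack("lvm/a")[1:] == stack("lvm/b")[1:] == stack("lvm/brokenDisk")[1:]
```

For example, `stack("lvm/brokenDisk")` yields lvm/brokenDisk -> lvm -> lowerVG/lvmPool -> lowerVG -> /dev/sd{a,b,c,d}. This is the same chain below the LV as lvm/a and lvm/b.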
So, it seems that
1) lowerVG/lvmPool/lvm/a
2) lowerVG/lvmPool/lvm/b
3) lowerVG/lvmPool/lvm/brokenDisk
are equivalent ... so I don't understand how 1) and 2) are fine but 3) is problematic.
Is my understanding of the LVM layout correct?
After some investigation, here is what I've found:
- This regression was introduced in 5.19. On 5.18 and earlier kernels I can keep this filesystem rw and everything works as expected, while on 5.19.0 and later the filesystem goes ro immediately on any write attempt. I couldn't build rc1, but I did confirm that rc2 already has this regression.
- Passing /dev/lvm/brokenDisk to a KVM VM as /dev/vdb with an unaffected kernel inside the VM still exhibits the ro barrier problem.
Is /dev/lvm/brokenDisk *always* problematic, with both affected (>= 5.19) and UNaffected (< 5.19) kernels?
- Passing /dev/lowerVG/lvmPool to a KVM VM as /dev/vdb with an affected kernel inside the VM and using LVM inside the VM exhibits correct behavior (I can keep the filesystem rw, no barrier errors on host or guest).
Is /dev/lowerVG/lvmPool problematic only with "affected" kernels?
[...]