On Wed, 2025-12-24 at 20:02 -0800, Viacheslav Dubeyko wrote:
I think this is not a completely correct fix. First of all, we have bitmap corruption. It means that we need to complain about it and return an error code. The logic cannot continue to work normally because we cannot rely on the bitmap anymore. It could contain multiple corrupted bits.
Technically speaking, we need to check whether the bitmap is corrupted when we create the b-trees during the mount operation (we can determine it for node 0, but it could be tricky for other nodes). If we have detected corruption, then we can recommend running the FSCK tool and we can mount in READ-ONLY mode.
I think we can check the bitmap when we are trying to open/create not a new node but one that already exists in the tree. I mean, if we have mounted a volume whose b-tree contains several nodes, we can check that the bitmap has the bit set for each of these nodes. And if the bit is not there, then it's a clear sign of bitmap corruption. Currently, I have no idea how to check for corrupted bits that indicate the presence of nodes that do not exist in the b-tree. But I suppose that we can do some check in the driver's logic. Finally, if we have detected corruption, then we should complain about it. Ideally, it would be good to remount in READ-ONLY mode.
Does it make sense to you?
Hi Slava,
Yes, that makes sense.
Skipping node 0 indeed looks like only a local workaround: if the bitmap is already inconsistent, we shouldn’t proceed as if it is trustworthy for further allocations, because other bits could be wrong as well.
For the next revision I plan to replace the “skip node 0” guard with a bitmap sanity check during b-tree open/mount. At minimum, I will verify that the header node (node 0) is marked allocated, and I will also investigate whether other existing nodes can be validated. If corruption is detected, the driver will report it, force a read-only mount, and recommend running fsck.hfsplus. This avoids continuing RW operation with a known-bad allocator state.
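As a rough user-space sketch of that check (not the actual patch): the helper names, the mount_mode enum, and the flat in-memory map buffer below are all hypothetical, and the MSB-first bit order within each byte is an assumption about the on-disk map record layout. The real driver would read the map record from the header node and set the superblock read-only instead of returning an enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result of the sanity check: stay RW or fall back to RO. */
enum mount_mode { MOUNT_RW, MOUNT_RO };

/*
 * Hypothetical helper: is node `nidx` marked allocated in the map?
 * Assumes bit 0 of the map is the most significant bit of byte 0.
 * A node index beyond the map is treated as "not allocated".
 */
static bool node_bit_is_set(const uint8_t *map, size_t map_len, uint32_t nidx)
{
    size_t byte = nidx / 8;

    assert(map != NULL);
    if (byte >= map_len)
        return false;
    return (map[byte] & (0x80u >> (nidx % 8))) != 0;
}

/*
 * The header node (node 0) always exists in a valid b-tree, so a clear
 * bit for it means the map cannot be trusted for further allocations.
 */
static enum mount_mode check_header_node(const uint8_t *map, size_t map_len)
{
    if (!node_bit_is_set(map, map_len, 0)) {
        fprintf(stderr,
                "b-tree map: node 0 not marked allocated, "
                "forcing read-only; run fsck.hfsplus\n");
        return MOUNT_RO;
    }
    return MOUNT_RW;
}
```

The same node_bit_is_set() test could be reused for any other node that is known to exist, if a reliable way to enumerate those nodes at open time turns out to be available.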
In parallel, I plan to keep the -EEXIST change in hfs_bnode_create() as a robustness fix for any remaining or future inconsistency paths.
I’ll post a respin shortly.
If you’re OK with it, I can also post the hfs_bnode_create() -EEXIST change as a standalone fix, since it independently prevents a refcount underflow and panic even outside the bitmap-corruption scenario. I’ll continue working on the bitmap validation in parallel.
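To illustrate why returning an error helps, here is a toy user-space model, not the kernel code: toy_node, the fixed-size cache, and both functions are hypothetical stand-ins for the node hash and the create/put pair. The point it shows is only that a create call which fails with -EEXIST never hands the caller a pointer it does not own, so the caller has nothing to put and the count cannot drop below zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical, simplified stand-in for a cached b-tree node. */
struct toy_node {
    int in_use;   /* 1 while the node is present in the cache */
    int refcnt;   /* number of references currently held */
};

static struct toy_node cache[MAX_NODES];

/*
 * Create a node that must not exist yet.
 * Returns 0 and hands back one reference on success,
 * -EEXIST if the node is already cached (no reference is handed out),
 * -EINVAL if the index is out of range.
 */
static int toy_node_create(unsigned int idx, struct toy_node **out)
{
    *out = NULL;
    if (idx >= MAX_NODES)
        return -EINVAL;
    if (cache[idx].in_use)
        return -EEXIST;
    cache[idx].in_use = 1;
    cache[idx].refcnt = 1;
    *out = &cache[idx];
    return 0;
}

/* Drop one reference; the node leaves the cache when the count hits zero. */
static void toy_node_put(struct toy_node *node)
{
    assert(node->refcnt > 0);   /* an underflow would trip here */
    if (--node->refcnt == 0)
        node->in_use = 0;
}
```

In this model a second toy_node_create() on the same index fails cleanly and leaves the existing reference count untouched, so only the original owner ever calls toy_node_put().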
Thanks,
Shardul