On Fri, Jul 30, 2021 at 12:17:26PM -0700, Eric Biggers wrote:
> Currently, non-overwrite DIO writes are fundamentally unsafe on f2fs as they require preallocating blocks, but f2fs doesn't support unwritten blocks and therefore has to preallocate the blocks as regular blocks. f2fs has no way to reliably roll back such preallocations, so as a result, f2fs will leak uninitialized blocks to users if a DIO write doesn't fully complete.
There's another way of solving this problem that doesn't require supporting unwritten blocks. What a file system *could* do is allocate the blocks but *not* update the on-disk data structures. The allocation happens in memory only, so you know the physical blocks won't get used for any other file, and then you issue the data block writes. On block I/O completion, trigger a workqueue function which updates the on-disk metadata to assign the physical blocks to the inode.
That way if you crash before the data I/O has a chance to complete, the on-disk logical block -> physical block map hasn't been updated yet, and so you don't need to worry about leaking uninitialized blocks.
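To make the ordering concrete, here is a rough toy model in Python. It is only a sketch of the idea, not f2fs or kernel code: the class ToyFS and all of its methods and fields are made up for illustration, "disk" state is just a Python dict and list, and the deferred workqueue step is folded into complete_io().

```python
# Toy model of "allocate in memory, commit the mapping on I/O completion".
# Every name here is hypothetical; nothing below is an f2fs or kernel API.

class ToyFS:
    def __init__(self, nblocks):
        self.disk_data = ["STALE"] * nblocks  # persisted: contents of each physical block
        self.disk_map = {}                    # persisted: (inode, lblk) -> pblk
        self.reserved = set()                 # in memory only: allocated but not yet mapped
        self.pending = []                     # in memory only: in-flight data writes

    def _alloc(self):
        # A block is busy if it is mapped on disk or reserved in memory.
        used = set(self.disk_map.values()) | self.reserved
        for pblk in range(len(self.disk_data)):
            if pblk not in used:
                self.reserved.add(pblk)
                return pblk
        raise OSError("no space left")

    def submit_dio_write(self, inode, lblk, data):
        # Reserve a physical block and queue the data write.
        # The on-disk map is deliberately NOT touched here.
        pblk = self._alloc()
        self.pending.append((inode, lblk, pblk, data))
        return pblk

    def complete_io(self):
        # The oldest data write reaches the disk, and only then does the
        # deferred step (a workqueue function in the real design) update
        # the on-disk map to point at the block.
        inode, lblk, pblk, data = self.pending.pop(0)
        self.disk_data[pblk] = data
        self.disk_map[(inode, lblk)] = pblk
        self.reserved.discard(pblk)

    def crash(self):
        # Everything held only in memory is lost.
        self.reserved.clear()
        self.pending.clear()

    def read(self, inode, lblk):
        # An unmapped logical block reads as a hole (None), never stale data.
        pblk = self.disk_map.get((inode, lblk))
        return None if pblk is None else self.disk_data[pblk]


fs = ToyFS(4)
fs.submit_dio_write(1, 0, "A")
fs.complete_io()                # data written, then mapping committed
fs.submit_dio_write(1, 1, "B")  # block reserved, write still in flight
fs.crash()                      # crash before the second write completes
print(fs.read(1, 0), fs.read(1, 1))
```

After the simulated crash, logical block 0 still reads back its data, logical block 1 is a hole, and the physical block that had been reserved for it is simply free again, since the reservation never reached the disk.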
Cheers,
- Ted