Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken the logic in iomap_dio_bio_iter() in a way that when the device does support FUA (or has no writeback cache) and the direct IO happens to freshly allocated or unwritten extents, we will *not* issue fsync after completing direct IO O_SYNC / O_DSYNC write because the IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set. Fix the problem by clearing IOMAP_DIO_WRITE_THROUGH whenever we do not perform FUA write as it was originally intended.
CC: John Garry john.g.garry@oracle.com CC: "Ritesh Harjani (IBM)" ritesh.list@gmail.com Fixes: d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") CC: stable@vger.kernel.org Signed-off-by: Jan Kara jack@suse.cz --- fs/iomap/direct-io.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
BTW, I've spotted this because some performance tests got suspiciously fast on recent kernels :) Sadly no easy improvement to cherry-pick for me to fix customer issue I'm chasing...
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 6f25d4cfea9f..b84f6af2eb4c 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -363,14 +363,14 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio) if (iomap->flags & IOMAP_F_SHARED) dio->flags |= IOMAP_DIO_COW;
- if (iomap->flags & IOMAP_F_NEW) { + if (iomap->flags & IOMAP_F_NEW) need_zeroout = true; - } else if (iomap->type == IOMAP_MAPPED) { - if (iomap_dio_can_use_fua(iomap, dio)) - bio_opf |= REQ_FUA; - else - dio->flags &= ~IOMAP_DIO_WRITE_THROUGH; - } + else if (iomap->type == IOMAP_MAPPED && + iomap_dio_can_use_fua(iomap, dio)) + bio_opf |= REQ_FUA; + + if (!(bio_opf & REQ_FUA)) + dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
/* * We can only do deferred completion for pure overwrites that
On Wed, Jul 30, 2025 at 12:28:41PM +0200, Jan Kara wrote:
Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken the logic in iomap_dio_bio_iter() in a way that when the device does support FUA (or has no writeback cache) and the direct IO happens to freshly allocated or unwritten extents, we will *not* issue fsync after completing direct IO O_SYNC / O_DSYNC write because the IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set.
Uh-oh.
Looks good:
Reviewed-by: Christoph Hellwig hch@lst.de
On Wed, Jul 30, 2025 at 12:28:41PM +0200, Jan Kara wrote:
Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken the logic in iomap_dio_bio_iter() in a way that when the device does support FUA (or has no writeback cache) and the direct IO happens to freshly allocated or unwritten extents, we will *not* issue fsync after completing direct IO O_SYNC / O_DSYNC write because the IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set. Fix the problem by clearing IOMAP_DIO_WRITE_THROUGH whenever we do not perform FUA write as it was originally intended.
CC: John Garry john.g.garry@oracle.com CC: "Ritesh Harjani (IBM)" ritesh.list@gmail.com Fixes: d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") CC: stable@vger.kernel.org Signed-off-by: Jan Kara jack@suse.cz
Aha, I wonder if that's why fuse2fs+iomap occasionally fails the fstest that tries to replay all the FUA writes, finds none, and complains.
(and I bet nobody noticed this on xfs because the log sends FUA, whereas fuse2fs merely sends fsync to the block device)
Reviewed-by: "Darrick J. Wong" djwong@kernel.org
--D
fs/iomap/direct-io.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
BTW, I've spotted this because some performance tests got suspiciously fast on recent kernels :) Sadly no easy improvement to cherry-pick for me to fix customer issue I'm chasing...
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 6f25d4cfea9f..b84f6af2eb4c 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -363,14 +363,14 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio) if (iomap->flags & IOMAP_F_SHARED) dio->flags |= IOMAP_DIO_COW;
if (iomap->flags & IOMAP_F_NEW) {
if (iomap->flags & IOMAP_F_NEW) need_zeroout = true;
} else if (iomap->type == IOMAP_MAPPED) {
if (iomap_dio_can_use_fua(iomap, dio))
bio_opf |= REQ_FUA;
else
dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
}
else if (iomap->type == IOMAP_MAPPED &&
iomap_dio_can_use_fua(iomap, dio))
bio_opf |= REQ_FUA;
if (!(bio_opf & REQ_FUA))
dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
/* * We can only do deferred completion for pure overwrites that -- 2.43.0
On 30/07/2025 11:28, Jan Kara wrote:
Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken the logic in iomap_dio_bio_iter() in a way that when the device does support FUA (or has no writeback cache) and the direct IO happens to freshly allocated or unwritten extents, we will*not* issue fsync after completing direct IO O_SYNC / O_DSYNC write because the IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set. Fix the problem by clearing IOMAP_DIO_WRITE_THROUGH whenever we do not perform FUA write as it was originally intended.
CC: John Garryjohn.g.garry@oracle.com CC: "Ritesh Harjani (IBM)"ritesh.list@gmail.com Fixes: d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") CC:stable@vger.kernel.org Signed-off-by: Jan Karajack@suse.cz
thanks
Reviewed-by: John Garry john.g.garry@oracle.com
Jan Kara jack@suse.cz writes:
Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken the logic in iomap_dio_bio_iter() in a way that when the device does support FUA (or has no writeback cache) and the direct IO happens to freshly allocated or unwritten extents, we will *not* issue fsync after completing direct IO O_SYNC / O_DSYNC write because the IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set. Fix the problem by clearing IOMAP_DIO_WRITE_THROUGH whenever we do not perform FUA write as it was originally intended.
CC: John Garry john.g.garry@oracle.com CC: "Ritesh Harjani (IBM)" ritesh.list@gmail.com Fixes: d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") CC: stable@vger.kernel.org Signed-off-by: Jan Kara jack@suse.cz
Nice catch. It's been there since v6.15 I guess. Looks like it didn't get caught in xfstests either then.
We have generic/737, but looks like it only covers O_SYNC dio writes. Let me enhance that to cover O_DYSNC dio writes too.
Looks good to me. Please feel free to add:
Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
-ritesh
On Wed, 30 Jul 2025 12:28:41 +0200, Jan Kara wrote:
Commit d279c80e0bac ("iomap: inline iomap_dio_bio_opflags()") has broken the logic in iomap_dio_bio_iter() in a way that when the device does support FUA (or has no writeback cache) and the direct IO happens to freshly allocated or unwritten extents, we will *not* issue fsync after completing direct IO O_SYNC / O_DSYNC write because the IOMAP_DIO_WRITE_THROUGH flag stays mistakenly set. Fix the problem by clearing IOMAP_DIO_WRITE_THROUGH whenever we do not perform FUA write as it was originally intended.
[...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree. Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs.fixes
[1/1] iomap: Fix broken data integrity guarantees for O_SYNC writes https://git.kernel.org/vfs/vfs/c/16f206eebbf8
linux-stable-mirror@lists.linaro.org