Re: [PATCH 1/5] ceph: Do not propagate page array emplacement errors as batch errors

6 Jan 2026

On Mon, Jan 5, 2026 at 12:24 PM Viacheslav Dubeyko
Slava.Dubeyko@ibm.com wrote:
...
On Tue, 2025-12-30 at 18:43 -0800, Sam Edwards wrote:
...
When fscrypt is enabled, move_dirty_folio_in_page_array() may fail
because it needs to allocate bounce buffers to store the encrypted
versions of each folio. Each folio beyond the first allocates its bounce
buffer with GFP_NOWAIT. Failures are common (and expected) under this
allocation mode; they should flush (not abort) the batch.
However, ceph_process_folio_batch() uses the same `rc` variable for its
own return code and for capturing the return codes of its routine calls;
failing to reset `rc` back to 0 results in the error being propagated
out to the main writeback loop, which cannot actually tolerate any
errors here: once `ceph_wbc.pages` is allocated, it must be passed to
ceph_submit_write() to be freed. If it survives until the next iteration
(e.g. due to the goto being followed), ceph_allocate_page_array()'s
BUG_ON() will oops the worker. (Subsequent patches in this series make
the loop more robust.)
Hi Slava,
...
I think you are right with the fix. We have the loop here and if we already
moved some dirty folios, then we should flush it. But what if we failed on the
first one folio, then should we return no error code in this case?
The case you ask about, where move_dirty_folio_in_page_array() returns
an error for the first folio, is currently not possible:
1) The only error code that move_dirty_folio_in_page_array() can
propagate is from fscrypt_encrypt_pagecache_blocks(), which it calls
with GFP_NOFS for the first folio. The latter function's doc comment
outright states:
 * The bounce page allocation is mempool-backed, so it will always succeed when
 * @gfp_flags includes __GFP_DIRECT_RECLAIM, e.g. when it's GFP_NOFS.
2) The error return isn't even reachable for the first folio because
of the BUG_ON(ceph_wbc->locked_pages == 0); line.
...
...
Note that this failure mode is currently masked due to another bug
(addressed later in this series) that prevents multiple encrypted folios
from being selected for the same write.
So, maybe, this patch has been not correctly placed in the order?
This crash is unmasked by patch 5 of this series. (It allows multiple
folios to be batched when fscrypt is enabled.) Patch 5 has no hard
dependency on anything else in this series, so it could -- in
principle -- be ordered first as you suggest. However, that ordering
would deliberately cause a regression in kernel stability, even if
only briefly. That's not considered good practice in my view, as it
may affect people who are trying to bisect and regression test. So the
ordering of this series is: fix the crash in the unused code first,
then fix the bug that makes it unused.
...
It will be
good to see the reproduction of the issue and which symptoms we have for this
issue. Do you have the reproduction script and call trace of the issue?
Fair point!
Function inlining makes the call trace not very interesting:
Call trace:
 ceph_writepages_start+0x16ec/0x18e0 [ceph] ()
 do_writepages+0xb0/0x1c0
 __writeback_single_inode+0x4c/0x4d8
 writeback_sb_inodes+0x238/0x4c8
 __writeback_inodes_wb+0x64/0x120
 wb_writeback+0x320/0x3e8
 wb_workfn+0x42c/0x518
 process_one_work+0x17c/0x428
 worker_thread+0x260/0x390
 kthread+0x148/0x240
 ret_from_fork+0x10/0x20
Code: 34ffdee0 52800020 3903e7e0 17fffef4 (d4210000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops - BUG: Fatal exception
ceph_writepages_start+0x16ec corresponds to linux-6.18.2/fs/ceph/addr.c:1222
However, these repro steps should work:
1) Apply patch 5 from this series (and no other patches)
2) Mount CephFS and activate fscrypt
3) Copy a large directory into the CephFS mount
4) After dozens of GBs transferred, you should observe the above kernel oops
Warm regards,
Sam
...
...
For now, just reset `rc` when redirtying the folio and prevent the
error from propagating. After this change, ceph_process_folio_batch() no
longer returns errors; its only remaining failure indicator is
`locked_pages == 0`, which the caller already handles correctly. The
next patch in this series addresses this.
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards CFSworks@gmail.com

fs/ceph/addr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 63b75d214210..3462df35d245 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1369,6 +1369,7 @@ int ceph_process_folio_batch(struct address_space *mapping,
              rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc,
                              folio);
              if (rc) {

                rc = 0;



I like the fix but I would like to clarify the above questions at first.
Thanks,
Slava.
...
                  folio_redirty_for_writepage(wbc, folio);
                  folio_unlock(folio);
                  break;


    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 1/5] ceph: Do not propagate page array emplacement errors as batch errors