If `locked_pages` is zero, the page array must not already be allocated: ceph_process_folio_batch() uses `locked_pages` to decide when to allocate `pages`, and redundant allocations trigger ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and writeback stall) or even a kernel panic. Consequently, the main loop in ceph_writepages_start() assumes that the lifetime of `pages` is confined to a single iteration.
The ceph_submit_write() function claims ownership of the page array on success. On failure, however, it only redirties and unlocks the pages without freeing the array, making the failure case in ceph_submit_write() fatal.
Free the page array in ceph_submit_write()'s error-handling 'if' block so that the caller's invariant (that the array does not outlive the iteration) is maintained unconditionally, allowing failures in ceph_submit_write() to be recoverable as originally intended.
Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
 fs/ceph/addr.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 2b722916fb9b..91cc43950162 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1466,6 +1466,13 @@ int ceph_submit_write(struct address_space *mapping,
 		unlock_page(page);
 	}
 
+	if (ceph_wbc->from_pool) {
+		mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
+		ceph_wbc->from_pool = false;
+	} else
+		kfree(ceph_wbc->pages);
+	ceph_wbc->pages = NULL;
+
 	ceph_osdc_put_request(req);
 	return -EIO;
 }
On Tue, 2025-12-30 at 18:43 -0800, Sam Edwards wrote:
If `locked_pages` is zero, the page array must not already be allocated: ceph_process_folio_batch() uses `locked_pages` to decide when to allocate `pages`,
I don't quite follow how this statement is relevant to the issue. If `locked_pages` is zero, then ceph_submit_write() will not be called. Am I missing something here?
and redundant allocations trigger ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and writeback stall) or even a kernel panic. Consequently, the main loop in ceph_writepages_start() assumes that the lifetime of `pages` is confined to a single iteration.
It would be great to see a reproducer script or application, along with the call trace of the issue. Could you please share the reproduction path and the call trace?
The ceph_submit_write() function claims ownership of the page array on success.
As far as I can see, writepages_finish() should free the page array on success.
On failure, however, it only redirties and unlocks the pages without freeing the array, making the failure case in ceph_submit_write() fatal.
Free the page array in ceph_submit_write()'s error-handling 'if' block so that the caller's invariant (that the array does not outlive the iteration) is maintained unconditionally, allowing failures in ceph_submit_write() to be recoverable as originally intended.
Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
 fs/ceph/addr.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 2b722916fb9b..91cc43950162 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1466,6 +1466,13 @@ int ceph_submit_write(struct address_space *mapping,
 		unlock_page(page);
 	}
	if (ceph_wbc->from_pool) {
		mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
		ceph_wbc->from_pool = false;
	} else
		kfree(ceph_wbc->pages);
	ceph_wbc->pages = NULL;
Probably, it makes sense to introduce a ceph_free_page_array() method, similar to __ceph_allocate_page_array(), and to use it for freeing the page array in all places.
Could ceph_wbc->locked_pages be greater than zero but ceph_wbc->pages == NULL?
Thanks, Slava.
	ceph_osdc_put_request(req);
	return -EIO;
}
On Mon, Jan 5, 2026 at 1:09 PM Viacheslav Dubeyko Slava.Dubeyko@ibm.com wrote:
On Tue, 2025-12-30 at 18:43 -0800, Sam Edwards wrote:
If `locked_pages` is zero, the page array must not already be allocated: ceph_process_folio_batch() uses `locked_pages` to decide when to allocate `pages`,
Hi Slava,
I don't quite follow how this statement is relevant to the issue. If `locked_pages` is zero, then ceph_submit_write() will not be called. Am I missing something here?
That statement only notes that ceph_process_folio_batch() will BUG() when locked_pages == 0 && pages != NULL. It establishes why `pages` must be freed/NULLed before the next iteration of ceph_writepages_start()'s loop (which sets locked_pages = 0).
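To make the invariant concrete, here is a small userspace model of the allocation guard described above. The type and function names are stand-ins for this sketch (the real code is ceph_allocate_page_array() in fs/ceph/addr.c); an assert() stands in for the kernel's BUG_ON():

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct page; only pointers to it are used. */
struct page;

/*
 * Model of the invariant: the array is allocated only on the first
 * locked page of an iteration (locked_pages == 0), and a leftover
 * non-NULL `pages` from a previous iteration trips the BUG_ON()
 * equivalent.
 */
static struct page **alloc_page_array(struct page **pages,
				      unsigned int locked_pages,
				      unsigned int max_pages)
{
	if (locked_pages == 0) {
		/* Kernel code would BUG_ON(pages != NULL) here. */
		assert(pages == NULL);
		pages = calloc(max_pages, sizeof(*pages));
	}
	return pages;
}
```

If the previous iteration fails to free and NULL the array, the next call with locked_pages == 0 hits the assertion, which is exactly the oops/panic scenario in the commit message.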
and redundant allocations trigger ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and writeback stall) or even a kernel panic. Consequently, the main loop in ceph_writepages_start() assumes that the lifetime of `pages` is confined to a single iteration.
It would be great to see a reproducer script or application, along with the call trace of the issue. Could you please share the reproduction path and the call trace?
It's difficult to reproduce organically. It arises when `!ceph_inc_osd_stopping_blocker(fsc->mdsc)`, which I understand can only happen in a race. I used the fault injection framework to force ceph_inc_osd_stopping_blocker() to fail.
The call trace is uninteresting. See my reply to your comments on patch 1: it's the same trace.
The ceph_submit_write() function claims ownership of the page array on success.
As far as I can see, writepages_finish() should free the page array on success.
That's my understanding too; by "claims ownership of the page array" I only mean that ceph_writepages_start() isn't responsible for cleaning it up, once it calls ceph_submit_write().
On failure, however, it only redirties and unlocks the pages without freeing the array, making the failure case in ceph_submit_write() fatal.
Free the page array in ceph_submit_write()'s error-handling 'if' block so that the caller's invariant (that the array does not outlive the iteration) is maintained unconditionally, allowing failures in ceph_submit_write() to be recoverable as originally intended.
Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
 fs/ceph/addr.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 2b722916fb9b..91cc43950162 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1466,6 +1466,13 @@ int ceph_submit_write(struct address_space *mapping,
 		unlock_page(page);
 	}
	if (ceph_wbc->from_pool) {
		mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
		ceph_wbc->from_pool = false;
	} else
		kfree(ceph_wbc->pages);
	ceph_wbc->pages = NULL;

Probably, it makes sense to introduce a ceph_free_page_array() method, similar to __ceph_allocate_page_array(), and to use it for freeing the page array in all places.
I like the suggestion but not the name. Instead of ceph_free_page_array(), it should probably be called ceph_discard_page_array(), because it is also redirtying the pages and must not be used after successful writeback. (To me, "free" implies success while "discard" implies failure.)
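For illustration, a minimal userspace sketch of what such a discard helper could look like. The struct layout, the fake pool function, and the locked_pages reset are stand-ins/assumptions for this sketch, not the real fs/ceph definitions, and the redirtying of pages mentioned above is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in types: the real ones live in fs/ceph (assumption for this sketch). */
struct page;

struct ceph_writeback_ctl {
	struct page **pages;
	unsigned int locked_pages;
	bool from_pool;
};

/* Stand-in for mempool_free(..., ceph_wb_pagevec_pool). */
static void fake_mempool_free(void *element)
{
	free(element);
}

/*
 * Sketch of the proposed helper: release the page array by whichever
 * route allocated it, then reset the bookkeeping so the next iteration
 * of the writeback loop starts from a clean state.  (The real helper
 * would also redirty the pages on the failure path, per the discussion
 * above.)
 */
static void ceph_discard_page_array(struct ceph_writeback_ctl *ceph_wbc)
{
	if (ceph_wbc->from_pool) {
		fake_mempool_free(ceph_wbc->pages);
		ceph_wbc->from_pool = false;
	} else {
		free(ceph_wbc->pages);
	}
	ceph_wbc->pages = NULL;
	ceph_wbc->locked_pages = 0;
}
```

Centralizing the free/NULL/reset in one helper would make it harder for any error path to leave a stale `pages` pointer behind for the next loop iteration.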
Could ceph_wbc->locked_pages be greater than zero but ceph_wbc->pages == NULL?
ceph_wbc->locked_pages is the current array index into ceph_wbc->pages, so they both need to be reset sometime before the next iteration of ceph_writepages_start()'s loop.
Warm regards, Sam
Thanks, Slava.
	ceph_osdc_put_request(req);
	return -EIO;
}