On Wed, Apr 23, 2025 at 9:23 AM Max Kellermann max.kellermann@ionos.com wrote:
Sometimes, when a file was read while it was being truncated by another NFS client, the kernel could deadlock because folio_unlock() was called twice, and the second call would XOR back the `PG_locked` flag.
Most of the time (depending on the timing of the truncation), nobody notices the problem because folio_unlock() gets called three times, which flips `PG_locked` back off:
- vfs_read, nfs_read_folio, ... nfs_read_add_folio, nfs_return_empty_folio
- vfs_read, nfs_read_folio, ... netfs_read_collection, netfs_unlock_abandoned_read_pages
- vfs_read, ... nfs_do_read_folio, nfs_read_add_folio, nfs_return_empty_folio
The problem is that nfs_read_add_folio() is not supposed to unlock the folio if fscache is enabled, and a nfs_netfs_folio_unlock() check is missing in nfs_return_empty_folio().
Rarely this leads to a warning in netfs_read_collection():
------------[ cut here ]------------ R=0000031c: folio 10 is not locked WARNING: CPU: 0 PID: 29 at fs/netfs/read_collect.c:133 netfs_read_collection+0x7c0/0xf00 [...] Workqueue: events_unbound netfs_read_collection_worker RIP: 0010:netfs_read_collection+0x7c0/0xf00 [...] Call Trace:
<TASK> netfs_read_collection_worker+0x67/0x80 process_one_work+0x12e/0x2c0 worker_thread+0x295/0x3a0
Most of the time, however, processes just get stuck forever in folio_wait_bit_common(), waiting for `PG_locked` to disappear, which never happens because nobody is really holding the folio lock.
Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled") Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann max.kellermann@ionos.com
fs/nfs/read.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/read.c b/fs/nfs/read.c index 81bd1b9aba17..3c1fa320b3f1 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -56,7 +56,8 @@ static int nfs_return_empty_folio(struct folio *folio) { folio_zero_segment(folio, 0, folio_size(folio)); folio_mark_uptodate(folio);
folio_unlock(folio);
if (nfs_netfs_folio_unlock(folio))
folio_unlock(folio); return 0;
}
-- 2.47.2
LGTM Thanks for tracking this down, Max.
Reviewed-by: Dave Wysochanski dwysocha@redhat.com