On 07/11/2022 18:39, Luís Henriques wrote:
On Mon, Nov 07, 2022 at 03:17:59PM +0800, xiubli@redhat.com wrote:
From: Xiubo Li <xiubli@redhat.com>
When decoding the snaps fails, it may leave 'first_realm' and 'realm' pointing to the same snaprealm memory. The snaprealm will then be put twice, which could cause random use-after-free, BUG_ON, and similar issues.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/snap.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 9bceed2ebda3..baf17df05107 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -849,10 +849,12 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	if (realm_to_rebuild && p >= e)
 		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
 
-	if (!first_realm)
+	if (!first_realm) {
 		first_realm = realm;
-	else
+		realm = NULL;
+	} else {
 		ceph_put_snap_realm(mdsc, realm);
+	}
 
 	if (p < e)
 		goto more;
-- 
2.31.1
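The double put described in the commit message can be illustrated with a small user-space sketch. This is only a model under stated assumptions: `struct realm`, `put_realm()` and `simulate_decode_failure()` are hypothetical names that do not exist in fs/ceph/snap.c, and the error path shown (putting both 'realm' and 'first_realm' when they are non-NULL) is assumed from the commit message rather than quoted from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted snap realm; not the kernel struct. */
struct realm {
	int refs;
};

/* Drop one reference. NULL is ignored, like a guarded put on an error path. */
static void put_realm(struct realm *r)
{
	if (r)
		r->refs--;
}

/*
 * Model one pass of the loop followed by a decode failure.
 * fixed == 0: old behaviour, 'realm' keeps pointing at the object.
 * fixed == 1: patched behaviour, 'realm' is cleared once 'first_realm'
 *             has taken over the reference.
 * Returns the final count. The caller starts with exactly one reference,
 * so 0 means balanced and a negative value means one put too many.
 */
static int simulate_decode_failure(int fixed)
{
	struct realm obj = { .refs = 1 };
	struct realm *realm = &obj;
	struct realm *first_realm = NULL;

	if (!first_realm) {
		first_realm = realm;
		if (fixed)
			realm = NULL;
	} else {
		put_realm(realm);
	}

	/* Assumed error path: both pointers are put if they are set. */
	put_realm(realm);
	put_realm(first_realm);

	return obj.refs;
}
```

Calling `simulate_decode_failure(0)` leaves the count one below zero, which in the kernel would correspond to a reference dropped on memory that may already have been freed, while `simulate_decode_failure(1)` leaves it balanced.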
This patch looks correct to me. But I wonder if there's a deeper problem there (probably not on the kernel client). Because the other question is: why are we failing to decode the snaps? But I guess this fix is worth it anyway.
Yeah, good question.
At the same time, the MDS also crashed [1][2], just seconds before the kernel crash was triggered. The metadata in CephFS was also corrupted for some reason.
[1] https://tracker.ceph.com/issues/56140
[2] https://tracker.ceph.com/issues/54546
Thanks!
- Xiubo
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Cheers,
Luís