Patrick Donnelly batrick@batbytes.com wrote:
Log recovered from a user's cluster:
<7>[ 5413.970692] ceph: get_cap_refs 00000000958c114b ret 1 got Fr <7>[ 5413.970695] ceph: start_read 00000000958c114b, no cache cap ... <7>[ 5473.934609] ceph: my wanted = Fr, used = Fr, dirty - <7>[ 5473.934616] ceph: revocation: pAsLsXsFr -> pAsLsXs (revoking Fr) <7>[ 5473.934632] ceph: __ceph_caps_issued 00000000958c114b cap 00000000f7784259 issued pAsLsXs <7>[ 5473.934638] ceph: check_caps 10000000e68.fffffffffffffffe file_want - used Fr dirty - flushing - issued pAsLsXs revoking Fr retain pAsLsXsFsr AUTHONLY NOINVAL FLUSH_FORCE
The MDS subsequently complains that the kernel client is late releasing caps.
Approximately, a series of changes to this code by the three commits cited below resulted in subtle resource cleanup to be missed. The main culprit is the change in error handling in 2d31604 which meant that a failure in init_request would no longer cause cleanup to be called. That would prevent the ceph_put_cap_refs which would cleanup the leaked cap ref.
Closes: https://tracker.ceph.com/issues/67008 Fixes: 49870056005ca9387e5ee31451991491f99cc45f ("ceph: convert ceph_readpages to ceph_readahead") Fixes: 2de160417315b8d64455fe03e9bb7d3308ac3281 ("netfs: Change ->init_request() to return an error code") Fixes: a5c9dc4451394b2854493944dcc0ff71af9705a3 ("ceph: Make ceph_init_request() check caps on readahead")
Note that you only need the first 12 digits of the SHA1 sum.
Signed-off-by: Patrick Donnelly pdonnell@redhat.com Cc: stable@vger.kernel.org
Reviewed-by: David Howells dhowells@redhat.com