From: Xiubo Li <xiubli@redhat.com>

The osd code has removed the cursor initializing code, and this will send the sparse read state machine into an infinite loop. We should initialize the cursor just before each sparse-read in messenger v2.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/64607
Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---

V2:
- Just removed the unnecessary 'sparse_read_total' check.

 net/ceph/messenger_v2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..ab3ab130a911 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -2034,6 +2034,9 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
 	if (!con_secure(con))
 		con->in_data_crc = -1;

+	ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
+				  con->in_msg->sparse_read_total);
+
 	reset_in_kvecs(con);
 	con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
 	con->v2.data_len_remain = data_len(msg);
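For readers unfamiliar with the cursor, here is a toy model of the problem described above. It is not the kernel code, and every name in it (toy_cursor, toy_cursor_init, toy_sparse_read) is hypothetical. It only illustrates, under those assumptions, why a read loop that draws buffer space from a cursor cannot make progress when the cursor was not set up for the current read, and why initializing it before each sparse read avoids that:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a message data cursor: it tracks how much
 * receive-buffer space is still unfilled for the current read. */
struct toy_cursor {
	size_t total;	/* bytes the read is expected to deliver */
	size_t resid;	/* bytes of buffer space still available */
};

/* Toy analogue of initializing the cursor for a new read of 'len' bytes. */
static void toy_cursor_init(struct toy_cursor *c, size_t len)
{
	c->total = len;
	c->resid = len;
}

/*
 * Consume 'want' bytes in pieces of at most 'chunk' bytes, taking buffer
 * space from the cursor. Returns the number of loop iterations used, or
 * -1 as soon as an iteration cannot make progress. A loop without that
 * check would spin forever at this point.
 */
static int toy_sparse_read(struct toy_cursor *c, size_t want, size_t chunk)
{
	int iters = 0;

	assert(chunk > 0);

	while (want > 0) {
		size_t n = c->resid < chunk ? c->resid : chunk;

		if (n > want)
			n = want;
		if (n == 0)
			return -1;	/* cursor exhausted or never set up */
		c->resid -= n;
		want -= n;
		iters++;
	}
	return iters;
}
```

In this model, a second toy_sparse_read() call without a fresh toy_cursor_init() returns -1, while calling toy_cursor_init() first lets the same read complete. The patch above applies the same idea by initializing the cursor in prepare_sparse_read_data().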
xiubli@redhat.com writes:
> From: Xiubo Li <xiubli@redhat.com>
>
> The osd code has removed the cursor initializing code, and this will send the sparse read state machine into an infinite loop. We should initialize the cursor just before each sparse-read in messenger v2.
>
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/64607
> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> Reported-by: Luis Henriques <lhenriques@suse.de>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>
> V2:
> - Just removed the unnecessary 'sparse_read_total' check.
Thanks a lot for the quick fix, Xiubo. FWIW:
Tested-by: Luis Henriques <lhenriques@suse.de>
Note that I still see this test failing occasionally, but I haven't had time to help debug it. That's a different issue, of course. TBH I don't remember if this test ever used to pass reliably. Here's the output diff shown by fstests in case you're not able to reproduce it:
@@ -65,7 +65,7 @@
 # Getting encryption key status
 Present (user_count=1, added_by_self)
 # Removing encryption key
-Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751
+Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751, but files still busy
 # Getting encryption key status
 Absent
 # Verifying that the encrypted directory was "locked"
Cheers,
On Wed, Mar 6, 2024 at 12:24 PM Luis Henriques <lhenriques@suse.de> wrote:
> xiubli@redhat.com writes:
>
> > From: Xiubo Li <xiubli@redhat.com>
> >
> > The osd code has removed the cursor initializing code, and this will send the sparse read state machine into an infinite loop. We should initialize the cursor just before each sparse-read in messenger v2.
> >
> > Cc: stable@vger.kernel.org
> > URL: https://tracker.ceph.com/issues/64607
> > Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> > Reported-by: Luis Henriques <lhenriques@suse.de>
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >
> > V2:
> > - Just removed the unnecessary 'sparse_read_total' check.
>
> Thanks a lot for the quick fix, Xiubo. FWIW:
>
> Tested-by: Luis Henriques <lhenriques@suse.de>
Thank you for catching this, Luis! I'm still lacking clarity on how this got missed, but hopefully the fs suite will improve with regard to fscrypt + ms_type coverage.
I have staged the fix with a minor tweak to use the msg local variable instead of con->in_msg, and with a reworded changelog:
https://github.com/ceph/ceph-client/commit/321e3c3de53c7530cd518219d01f04e7e...
Ilya
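The staged commit is not quoted in this thread, so the following is only a rough illustration of what the tweak described above could look like. The toy_* types and functions are hypothetical, heavily trimmed stand-ins so that the snippet compiles on its own; only the names msg, con->in_msg and sparse_read_total are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-in for the incoming message. */
struct toy_msg {
	size_t sparse_read_total;
};

/* Hypothetical, trimmed stand-in for the data cursor. */
struct toy_cursor {
	const struct toy_msg *msg;
	size_t total_resid;
};

/* Hypothetical, trimmed stand-in for the connection. */
struct toy_con {
	struct toy_msg *in_msg;
	struct toy_cursor in_cursor;
};

/* Toy analogue of a cursor init helper taking (cursor, msg, length). */
static void toy_cursor_init(struct toy_cursor *cursor,
			    const struct toy_msg *msg, size_t length)
{
	cursor->msg = msg;
	cursor->total_resid = length;
}

static void toy_prepare_sparse_read(struct toy_con *con)
{
	struct toy_msg *msg = con->in_msg;	/* read the pointer once */

	assert(msg != NULL);

	/* Same effect as passing con->in_msg and
	 * con->in_msg->sparse_read_total, just shorter. */
	toy_cursor_init(&con->in_cursor, msg, msg->sparse_read_total);
}
```

Behavior is unchanged by such a tweak; the local variable only shortens the call and matches the data_len(msg) line that already follows in the hunk.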
On 3/6/24 19:24, Luis Henriques wrote:
> xiubli@redhat.com writes:
>
> > From: Xiubo Li <xiubli@redhat.com>
> >
> > The osd code has removed the cursor initializing code, and this will send the sparse read state machine into an infinite loop. We should initialize the cursor just before each sparse-read in messenger v2.
> >
> > Cc: stable@vger.kernel.org
> > URL: https://tracker.ceph.com/issues/64607
> > Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> > Reported-by: Luis Henriques <lhenriques@suse.de>
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >
> > V2:
> > - Just removed the unnecessary 'sparse_read_total' check.
>
> Thanks a lot for the quick fix, Xiubo. FWIW:
>
> Tested-by: Luis Henriques <lhenriques@suse.de>
>
> Note that I still see this test failing occasionally, but I haven't had time to help debug it. That's a different issue, of course. TBH I don't remember if this test ever used to pass reliably. Here's the output diff shown by fstests in case you're not able to reproduce it:
>
> @@ -65,7 +65,7 @@
>  # Getting encryption key status
>  Present (user_count=1, added_by_self)
>  # Removing encryption key
> -Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751
> +Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751, but files still busy
>  # Getting encryption key status
>  Absent
>  # Verifying that the encrypted directory was "locked"
Thanks Luis.
This is a different issue; as far as I remember, I have seen this before in msgr1.
Thanks
- Xiubo
> Cheers,
linux-stable-mirror@lists.linaro.org