In two `break` statements, the call to ceph_release_page_vector() was missing, leaking the allocation from ceph_alloc_page_vector().
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann max.kellermann@ionos.com
---
 fs/ceph/file.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..24d0f1cc9aac 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1134,6 +1134,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			extent_cnt = __ceph_sparse_read_ext_count(inode, read_len);
 			ret = ceph_alloc_sparse_ext_map(op, extent_cnt);
 			if (ret) {
+				ceph_release_page_vector(pages, num_pages);
 				ceph_osdc_put_request(req);
 				break;
 			}
@@ -1168,6 +1169,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 					       op->extent.sparse_ext_cnt);
 			if (fret < 0) {
 				ret = fret;
+				ceph_release_page_vector(pages, num_pages);
 				ceph_osdc_put_request(req);
 				break;
 			}
Pages are freed in `ceph_osdc_put_request`, trying to release them this way will end badly.
On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze amarkuze@redhat.com wrote:
> Pages are freed in `ceph_osdc_put_request`, trying to release them this way will end badly.

I don't get it. If this ends badly, why does the other ceph_release_page_vector() call after ceph_osdc_put_request() in that function not end badly?
On Thu, Nov 28, 2024 at 1:28 PM Max Kellermann max.kellermann@ionos.com wrote:
> On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze amarkuze@redhat.com wrote:
> > Pages are freed in `ceph_osdc_put_request`, trying to release them this way will end badly.
>
> I don't get it. If this ends badly, why does the other ceph_release_page_vector() call after ceph_osdc_put_request() in that function not end badly?
Look at this piece:
osd_req_op_extent_osd_data_pages(req, 0, pages, read_len, offset_in_page(read_off), false, false);
The last parameter is "own_pages". Ownership of these pages is NOT transferred to the osdc request, therefore ceph_osdc_put_request() will NOT free them, and this is really a leak bug and my patch fixes it.
I just saw this piece of code for the first time, I have no idea. What am I missing?
The ergonomics and error handling on this function seem awkward. I'll take a closer look on how to arrange it better.
On Thu, Nov 28, 2024 at 1:49 PM Alex Markuze amarkuze@redhat.com wrote:
> The ergonomics and error handling on this function seem awkward. I'll take a closer look on how to arrange it better.
Does that mean you misunderstood page ownership, or do you still believe my patch is wrong? To me, "pages must not be freed manually" and "pages must be freed manually" can't both be right at the same time.
On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze amarkuze@redhat.com wrote:
> Pages are freed in `ceph_osdc_put_request`, trying to release them this way will end badly.
Is there anybody else who can explain this to me? I believe Alex is wrong and my patch is correct, but maybe I'm missing something.
On Fri, Nov 29, 2024 at 9:06 AM Max Kellermann max.kellermann@ionos.com wrote:
> On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze amarkuze@redhat.com wrote:
> > Pages are freed in `ceph_osdc_put_request`, trying to release them this way will end badly.
>
> Is there anybody else who can explain this to me? I believe Alex is wrong and my patch is correct, but maybe I'm missing something.
It's been a week. Is there really nobody who understands this piece of code? I believe I do understand it, but my understanding conflicts with Alex's, and he's the expert (and I'm not).
On Thu, Dec 5, 2024 at 9:32 AM Max Kellermann max.kellermann@ionos.com wrote:
> On Fri, Nov 29, 2024 at 9:06 AM Max Kellermann max.kellermann@ionos.com wrote:
> > On Thu, Nov 28, 2024 at 1:18 PM Alex Markuze amarkuze@redhat.com wrote:
> > > Pages are freed in `ceph_osdc_put_request`, trying to release them this way will end badly.
> >
> > Is there anybody else who can explain this to me? I believe Alex is wrong and my patch is correct, but maybe I'm missing something.
>
> It's been a week. Is there really nobody who understands this piece of code? I believe I do understand it, but my understanding conflicts with Alex's, and he's the expert (and I'm not).
Hi Max,
Your understanding is correct. Pages would be freed automatically together with the request only if the ownership is transferred by passing true for own_pages to osd_req_op_extent_osd_data_pages(), which __ceph_sync_read() doesn't do.
These error path leaks were introduced in commits 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads") and f0fe1e54cfcf ("ceph: plumb in decryption during reads") with support for fscrypt. Looking at the former commit, it looks like a similar leak was introduced in ceph_direct_read_write() too -- on bvecs instead of pages.
I have applied this patch and will take care of the leak on bvecs myself because I think I see other issues there.
Thanks,
Ilya
This is a bad patch, I don't appreciate partial fixes that introduce unnecessary complications to the code, and it conflicts with the complete fix in the other thread. Please coordinate with me in the future.
On Thu, Dec 5, 2024 at 12:31 PM Alex Markuze amarkuze@redhat.com wrote:
> This is a bad patch, I don't appreciate partial fixes that introduce unnecessary complications to the code, and it conflicts with the complete fix in the other thread.
Alex, and I don't appreciate the unnecessary complications you introduce to the Ceph contribution process!
The mistake you made in your first review ("will end badly") is not a big deal; happens to everybody - but you still don't admit the mistake and you ghosted me for a week. But then saying you don't appreciate the work of somebody who found a bug and posted a simple fix is not good communication. You can say you prefer a different patch and explain the technical reasons; but saying you don't appreciate it is quite condescending.
Now back to the technical facts:
- What exactly about my patch is "bad"?
- Do you believe my patch is not strictly an improvement?
- Why do you believe my fix is only "partial"?
- What unnecessary complications are introduced by my two-line patch in your opinion?
- What "other thread"? I can't find anything on LKML and ceph-devel.
Max
The full fix is now in the testing branch.
Max, please follow the mailing list, I posted the patch last week on the initial thread regarding this issue. Please, comment on the correct thread, having two threads regarding the same issue introduces unnecessary confusion.
The fix resolves the following tracker.
https://tracker.ceph.com/issues/67524
Additionally, these issues are no longer recreated after the fix. https://tracker.ceph.com/issues/68981 https://tracker.ceph.com/issues/68980
I will make a couple runs with KASAN and its peers, as it's not immediately clear why these two issues are affected.
On Thu, Dec 5, 2024 at 1:17 PM Alex Markuze amarkuze@redhat.com wrote:
> The full fix is now in the testing branch.
Alex, yet again, you did not reply to any of my questions! This is tiring.
The full fix may be a "full" fix for something else, and just happens to mix in several other unrelated things, e.g. a fix for the memory leak bug I found. That is what I call "bad".
My patch is not a "partial" fix. It is a full fix for the memory leak bug. It just doesn't fix anything else, but that's how it's supposed to be (and certainly not "bad"): a tiny patch that can be reviewed easily that addresses just one issue. That's the opposite of "complications".
> Max, please follow the mailing list, I posted the patch last week on the initial thread regarding this issue. Please, comment on the correct thread, having two threads regarding the same issue introduces unnecessary confusion.
But... THIS is the initial thread regarding this issue (= memory leak). It is you who creates unnecessary confusion: with your emails and with your code.
On Thu, Dec 5, 2024 at 1:02 PM Max Kellermann max.kellermann@ionos.com wrote:
> - What "other thread"? I can't find anything on LKML and ceph-devel.
Just in case you mean this patch authored by you: https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae492dbfcd82ff412...
I don't think that's a good patch, and if I had the power, I would certainly reject it, because:
- it's big and confusing; hard to review
- it's badly documented; the commit message is just "fixing a race condition when a file shrinks", but other than that doesn't explain anything; a proper explanation is necessary for such a complicated diff
- the patch changes many distinct things in one patch, but it should really be split
- this patch is about the buffer overflow for which my patch is much simpler: https://lore.kernel.org/lkml/20241127212130.2704804-1-max.kellermann@ionos.c... which I suggested merging instead of all the other candidate patches https://lore.kernel.org/lkml/CAKPOu+9kdcjMf36bF3HAW4K8v0mHxXQX3_oQfGSshmXBKt... but you did not reply (as usual, sigh!)
- deeply hidden in this patch is also a fix for the memory leak, but instead of making one large patch which fixes everything, you should first merge my trivial leak bug fix and then the fix for the buffer overflow on top
I will explain the process for ceph client patches. It's important to note: The process itself and part of the automation is still evolving and so many things have to be done manually.
1. A patch is created, pushed into a local testing branch, and tested with xfstests on a kernel compiled with KASAN; several static analysis tools are run as well.
2. The local testing branch is merged with the `testing` branch, which implies that all Sepia lab teuthology tests run on this new kernel -- so care must be taken before pushing.
3. The patch is reviewed on the mailing list by the community; when acked, Ilya takes it to the main branch and eventually to the upstream kernel.
I will break it up into three separate patches, as it addresses three different issues. So that's a good comment. Any other constructive comments will be appreciated, both regarding the patch and the process. I didn't send an RFC yet; I need to understand how the other two issues were fixed as well, I will send an RFC when I'm done, as I need to make sure all root causes are fixed. I prefer fixing things once.
On Thu, Dec 5, 2024 at 1:57 PM Alex Markuze amarkuze@redhat.com wrote:
> I will explain the process for ceph client patches. It's important to note: The process itself and part of the automation is still evolving and so many things have to be done manually.
None of this answers any of my questions on your negative review comments.
> I will break it up into three separate patches, as it addresses three different issues.
... one of which will be my patch.
It looks like Alex's preference is to address these memory leaks by passing true for own_pages to osd_req_op_extent_osd_data_pages() and adjusting where the OSD request is put accordingly -- this is what he folded in his WIP patch which was posted in Luis' thread [1] the day after this patch and pushed to the testing branch earlier today [2]. Unfortunately an update to this thread was missed, leaving everyone including myself confused.
Max, would you be willing to redo this patch to pass true for own_pages and post a v2? There is nothing "bad", "partial" or otherwise wrong with this version, but having the pages be taken care of automatically is a bit nicer and a conflict with Alex's ongoing work would be avoided.
[1] https://lore.kernel.org/ceph-devel/CAO8a2ShzHuTizjY+T+RNr28XLGOJPNoXJf1rQUbu... [2] https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae492dbfcd82ff412...
Thanks,
Ilya
On Thu, Dec 5, 2024 at 4:12 PM Ilya Dryomov idryomov@gmail.com wrote:
> Max, would you be willing to redo this patch to pass true for own_pages and post a v2? There is nothing "bad", "partial" or otherwise wrong with this version, but having the pages be taken care of automatically is a bit nicer and a conflict with Alex's ongoing work would be avoided.
Yes, I will send a patch for this, even though I don't agree that this is the best way forward: I'd prefer to fix the leak with a minimal patch that adds only the necessary calls, without mixing in refactoring. A minimal fix is easier to review, and mixing a bug fix with refactoring is more dangerous - which matters here because this patch will be put in all stable branches. But if you want it that way, I'll do it.
btw. Alex's patch (https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae4) introduces another memory leak: by postponing the ceph_osdc_put_request() call, the -EFAULT branch now leaks the ceph_osd_request object. I'll take care not to make the same mistake in my v2.
On Thu, Dec 5, 2024 at 4:37 PM Max Kellermann max.kellermann@ionos.com wrote:
> btw. Alex's patch (https://github.com/ceph/ceph-client/commit/2a802a906f9c89f8ae4) introduces another memory leak
Sorry, that was wrong. The "break" only exits the inner "while" loop. I got confused by nested loops.
In two `break` statements, the call to ceph_release_page_vector() was missing, leaking the allocation from ceph_alloc_page_vector().
Instead of adding the missing ceph_release_page_vector() calls, the Ceph maintainers preferred to transfer page ownership to the `ceph_osd_request` by passing `own_pages=true` to osd_req_op_extent_osd_data_pages(). This requires postponing the ceph_osdc_put_request() call until after the block that accesses the `pages`.
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann max.kellermann@ionos.com
---
 fs/ceph/file.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..ce342a5d4b8b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1127,7 +1127,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 
 		osd_req_op_extent_osd_data_pages(req, 0, pages, read_len,
 						 offset_in_page(read_off),
-						 false, false);
+						 false, true);
 
 		op = &req->r_ops[0];
 		if (sparse) {
@@ -1186,8 +1186,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			ret = min_t(ssize_t, fret, len);
 		}
 
-		ceph_osdc_put_request(req);
-
 		/* Short read but not EOF? Zero out the remainder. */
 		if (ret >= 0 && ret < len && (off + ret < i_size)) {
 			int zlen = min(len - ret, i_size - off - ret);
@@ -1221,7 +1219,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			break;
 		}
 	}
-	ceph_release_page_vector(pages, num_pages);
+
+	ceph_osdc_put_request(req);
 
 	if (ret < 0) {
 		if (ret == -EBLOCKLISTED)
Good. This sequence has not been tested independently, but it should be fine.
On Thu, Dec 5, 2024 at 5:30 PM Alex Markuze amarkuze@redhat.com wrote:
> Good. This sequence has not been tested independently, but it should be fine.
Applied.
Thanks,
Ilya