From: Charan Teja Kalla quic_charante@quicinc.com Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
The process_madvise() system call is expected to skip holes in vma passed through 'struct iovec' vector list. But do_madvise, which process_madvise() calls for each vma, returns ENOMEM in case of unmapped holes, despite the VMA is processed.
Thus process_madvise() should treat ENOMEM as expected and consider the VMA passed to as processed and continue processing other vma's in the vector list. Returning -ENOMEM to user, despite the VMA is processed, will be unable to figure out where to start the next madvise.
Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.164700875... Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API") Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com Cc: David Rientjes rientjes@google.com Cc: Michal Hocko mhocko@suse.com Cc: Minchan Kim minchan@kernel.org Cc: Nadav Amit nadav.amit@gmail.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Suren Baghdasaryan surenb@google.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org ---
mm/madvise.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise +++ a/mm/madvise.c @@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi
while (iov_iter_count(&iter)) { iovec = iov_iter_iovec(&iter); + /* + * do_madvise returns ENOMEM if unmapped holes are present + * in the passed VMA. process_madvise() is expected to skip + * unmapped holes passed to it in the 'struct iovec' list + * and not fail because of them. Thus treat -ENOMEM return + * from do_madvise as valid and continue processing. + */ ret = do_madvise(mm, (unsigned long)iovec.iov_base, iovec.iov_len, behavior); - if (ret < 0) + if (ret < 0 && ret != -ENOMEM) break; iov_iter_advance(&iter, iovec.iov_len); } _
On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
From: Charan Teja Kalla quic_charante@quicinc.com Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
The process_madvise() system call is expected to skip holes in vma passed through 'struct iovec' vector list. But do_madvise, which process_madvise() calls for each vma, returns ENOMEM in case of unmapped holes, despite the VMA is processed.
Thus process_madvise() should treat ENOMEM as expected and consider the VMA passed to as processed and continue processing other vma's in the vector list. Returning -ENOMEM to user, despite the VMA is processed, will be unable to figure out where to start the next madvise.
Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.164700875...
I thought it was still under discussion and Charan will post next version along with previous patch "mm: madvise: return correct bytes advised with process_madvise"
https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicin...
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API") Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com Cc: David Rientjes rientjes@google.com Cc: Michal Hocko mhocko@suse.com Cc: Minchan Kim minchan@kernel.org Cc: Nadav Amit nadav.amit@gmail.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Suren Baghdasaryan surenb@google.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org
mm/madvise.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise +++ a/mm/madvise.c @@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi while (iov_iter_count(&iter)) { iovec = iov_iter_iovec(&iter);
/*
* do_madvise returns ENOMEM if unmapped holes are present
* in the passed VMA. process_madvise() is expected to skip
* unmapped holes passed to it in the 'struct iovec' list
* and not fail because of them. Thus treat -ENOMEM return
* from do_madvise as valid and continue processing.
ret = do_madvise(mm, (unsigned long)iovec.iov_base, iovec.iov_len, behavior);*/
if (ret < 0)
iov_iter_advance(&iter, iovec.iov_len); }if (ret < 0 && ret != -ENOMEM) break;
_
On Tue, Mar 22, 2022 at 5:25 PM Minchan Kim minchan@kernel.org wrote:
I thought it was still under discussion and Charan will post next version along with previous patch "mm: madvise: return correct bytes advised with process_madvise"
https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicin...
Hmm. It's merged now, as commit 08095d6310a7.
So any fixes please do it on top of that existing state ;(
Linus
On Tue 22-03-22 17:24:58, Minchan Kim wrote:
On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
From: Charan Teja Kalla quic_charante@quicinc.com Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
The process_madvise() system call is expected to skip holes in vma passed through 'struct iovec' vector list. But do_madvise, which process_madvise() calls for each vma, returns ENOMEM in case of unmapped holes, despite the VMA is processed.
Thus process_madvise() should treat ENOMEM as expected and consider the VMA passed to as processed and continue processing other vma's in the vector list. Returning -ENOMEM to user, despite the VMA is processed, will be unable to figure out where to start the next madvise.
Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.164700875...
I thought it was still under discussion and Charan will post next version along with previous patch "mm: madvise: return correct bytes advised with process_madvise"
https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicin...
Yes, I am not even sure the new semantic is sensible[1]. We should discuss that and see all the consequences. Changing the semantic of an existing syscall is always tricky going back and forth is even worse.
On 3/23/2022 1:58 PM, Michal Hocko wrote:
On Tue 22-03-22 17:24:58, Minchan Kim wrote:
On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
From: Charan Teja Kalla quic_charante@quicinc.com Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
The process_madvise() system call is expected to skip holes in vma passed through 'struct iovec' vector list. But do_madvise, which process_madvise() calls for each vma, returns ENOMEM in case of unmapped holes, despite the VMA is processed.
Thus process_madvise() should treat ENOMEM as expected and consider the VMA passed to as processed and continue processing other vma's in the vector list. Returning -ENOMEM to user, despite the VMA is processed, will be unable to figure out where to start the next madvise.
Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.164700875...
I thought it was still under discussion and Charan will post next version along with previous patch "mm: madvise: return correct bytes advised with process_madvise"
https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicin...
Yes, I am not even sure the new semantic is sensible[1]. We should discuss that and see all the consequences. Changing the semantic of an existing syscall is always tricky going back and forth is even worse.
Starting the discussion @ https://lore.kernel.org/linux-mm/cover.1648046642.git.quic_charante@quicinc....
Thanks, Charan
linux-stable-mirror@lists.linaro.org