On Fri 11-03-22 20:59:06, Charan Teja Kalla wrote:
The process_madvise() system call is expected to skip holes in the VMA passed through the 'struct iovec' vector list.
Where is this assumption coming from? From the man page I can see:
: The advice might be applied to only a part of iovec if one of its
: elements points to an invalid memory region in the remote
: process. No further elements will be processed beyond that
: point.
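To make the partial-application semantics concrete: process_madvise() returns the number of bytes advised on success, which may be less than the total requested. A caller could map that count back onto its vector roughly as sketched below. The helper name resume_index() and its signature are hypothetical, for illustration only; nothing like it exists in the kernel or libc.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, not part of any kernel or libc API.
 *
 * Given the byte count returned by a partially successful
 * process_madvise() call, find the first iovec element that was not
 * fully covered. Returns the index of that element, or n if every
 * element was covered. *offset receives the number of bytes already
 * covered inside the returned element (0 if the return value is n).
 */
static size_t resume_index(const struct iovec *vec, size_t n,
			   size_t advised, size_t *offset)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (advised < vec[i].iov_len)
			break;
		/* This element was fully covered; consume its length. */
		advised -= vec[i].iov_len;
	}

	*offset = (i < n) ? advised : 0;
	return i;
}
```

With elements of 4096, 8192 and 4096 bytes and a return value of 4096, this yields index 1 and offset 0, i.e. the caller would look at the second element next.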
But do_madvise, which process_madvise() calls for each VMA, returns ENOMEM in case of unmapped holes, even though the VMA is processed. Thus process_madvise() should treat ENOMEM as expected, consider the VMA passed to it as processed, and continue processing the other VMAs in the vector list. If -ENOMEM is returned to the user even though the VMA is processed, the user will be unable to figure out where to start the next madvise.
I am not sure I follow. With your previous patch and -ENOMEM from do_madvise you get the answer you are looking for, no? With this applied you are losing the information that some of the iters are not mapped or have a hole. That might be useful information, especially when processing remote tasks, which are free to manipulate their address spaces.
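A small user-space model of the loop may make the concern easier to see. This is only a sketch: struct model_range, model_do_madvise() and model_process_madvise() are made-up stand-ins, not kernel code, and the model assumes a stubbed do_madvise that returns -ENOMEM for any range containing a hole.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one vector element; not the kernel's struct iovec. */
struct model_range {
	size_t len;	/* length of the range in bytes */
	int has_hole;	/* nonzero if part of the range is unmapped */
	int invalid;	/* nonzero if the advice cannot be applied at all */
};

/* Stand-in for do_madvise(): reports holes as -ENOMEM. */
static int model_do_madvise(const struct model_range *r)
{
	if (r->invalid)
		return -EINVAL;
	if (r->has_hole)
		return -ENOMEM;
	return 0;
}

/*
 * Model of the per-element loop. Returns the number of bytes the loop
 * advanced past; *last_err receives the last stub return value.
 * skip_enomem == 0 models stopping on any error, nonzero models the
 * proposed "ret < 0 && ret != -ENOMEM" check.
 */
static size_t model_process_madvise(const struct model_range *vec, size_t n,
				    int skip_enomem, int *last_err)
{
	size_t done = 0;
	int ret = 0;

	assert(last_err != NULL);

	for (size_t i = 0; i < n; i++) {
		ret = model_do_madvise(&vec[i]);
		if (ret < 0 && !(skip_enomem && ret == -ENOMEM))
			break;
		done += vec[i].len;
	}

	*last_err = ret;
	return done;
}
```

For three ranges of 4096, 8192 and 4096 bytes where only the middle one has a hole, the model stops after 4096 bytes with -ENOMEM when skip_enomem is 0. With skip_enomem set it advances past all 16384 bytes and ends with 0, so nothing tells the caller that the middle range had a hole.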
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Changes in V2:
 -- Fixed handling of ENOMEM by process_madvise().
 -- Patch doesn't exist in V1.
 mm/madvise.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index e97e6a9..14fb76d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1426,9 +1426,16 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 	while (iov_iter_count(&iter)) {
 		iovec = iov_iter_iovec(&iter);
+		/*
+		 * do_madvise returns ENOMEM if unmapped holes are present
+		 * in the passed VMA. process_madvise() is expected to skip
+		 * unmapped holes passed to it in the 'struct iovec' list
+		 * and not fail because of them. Thus treat -ENOMEM return
+		 * from do_madvise as valid and continue processing.
+		 */
 		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
 					iovec.iov_len, behavior);
-		if (ret < 0)
+		if (ret < 0 && ret != -ENOMEM)
 			break;
 		iov_iter_advance(&iter, iovec.iov_len);
 	}
-- 2.7.4