On Thu, Jun 27, 2024 at 11:55 AM Peter Xu peterx@redhat.com wrote:
On Thu, Jun 27, 2024 at 04:53:08PM +0800, yangge1116@126.com wrote:
From: yangge yangge1116@126.com
If a large number of CMA memory are configured in system (for example, the CMA memory accounts for 50% of the system memory), starting a SEV virtual machine will fail. During starting the SEV virtual machine, it will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin memory. Normally if a page is present and in CMA area, pin_user_pages_fast() will first call __get_user_pages_locked() to pin the page in CMA area, and then call check_and_migrate_movable_pages() to migrate the page from CMA area to non-CMA area. But the current code calling __get_user_pages_locked() will fail, because it call try_grab_folio() to pin page in gup slow path.
The commit 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") uses try_grab_folio() in gup slow path, which seems to be problematic because try_grap_folio() will check if the page can be longterm pinned. This check may fail and cause __get_user_pages_lock() to fail. However, these checks are not required in gup slow path, seems we can use try_grab_page() instead of try_grab_folio(). In addition, in the current code, try_grab_page() can only add 1 to the page's refcount. We extend this function so that the page's refcount can be increased according to the parameters passed in.
The following log reveals it:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520 [ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6 [ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520 [ 464.325515] Call Trace: [ 464.325520] <TASK> [ 464.325523] ? __get_user_pages+0x423/0x520 [ 464.325528] ? __warn+0x81/0x130 [ 464.325536] ? __get_user_pages+0x423/0x520 [ 464.325541] ? report_bug+0x171/0x1a0 [ 464.325549] ? handle_bug+0x3c/0x70 [ 464.325554] ? exc_invalid_op+0x17/0x70 [ 464.325558] ? asm_exc_invalid_op+0x1a/0x20 [ 464.325567] ? __get_user_pages+0x423/0x520 [ 464.325575] __gup_longterm_locked+0x212/0x7a0 [ 464.325583] internal_get_user_pages_fast+0xfb/0x190 [ 464.325590] pin_user_pages_fast+0x47/0x60 [ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd] [ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") Cc: stable@vger.kernel.org Signed-off-by: yangge yangge1116@126.com
Thanks for the report and the fix proposed. This is unfortunate..
It's just that I worry this may not be enough, as thp slow gup isn't the only one using try_grab_folio(). There're also hugepd and memfd pinning (which just got queued, again).
I suspect both of them can also hit a cma chunk here, and fail whenever they shouldn't have.
The slight complexity resides in the hugepd path where it right now shares with fast-gup. So we may potentially need something similiar to what Yang used to introduce in this patch:
https://lore.kernel.org/r/20240604234858.948986-2-yang@os.amperecomputing.co...
So as to identify whether the hugepd gup is slow or fast, and we should only let the fast gup fail on those.
Let me also loop them in on the other relevant discussion.
Thanks, Peter. I was actually typing the same thing...
Yes, I agree my patch should be able to solve the problem. At the beginning I thought it is just a pure clean up patch, but it seems like it is more useful.
I'm going to port my patch to the latest mm-unstable, then post it.
Thanks,
-- Peter Xu