On Fri, Jun 28, 2024 at 1:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
On Fri, 28 Jun 2024 14:01:58 +0800 yangge1116@126.com wrote:
From: yangge <yangge1116@126.com>
If a large amount of CMA memory is configured in the system (for example, CMA memory accounts for 50% of the system memory), starting a SEV virtual machine will fail. While starting the SEV virtual machine, the kernel calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin memory. Normally, if a page is present and in a CMA area, pin_user_pages_fast() first calls __get_user_pages_locked() to pin the page in the CMA area, and then calls check_and_migrate_movable_pages() to migrate the page from the CMA area to a non-CMA area. But in the current code the call to __get_user_pages_locked() fails, because it calls try_grab_folio() to pin the page in the gup slow path.
Commit 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") uses try_grab_folio() in the gup slow path, which seems to be problematic because try_grab_folio() checks whether the page can be long-term pinned. This check may fail and cause __get_user_pages_locked() to fail. However, these checks are not required in the gup slow path, so it seems we can use try_grab_page() instead of try_grab_folio(). In addition, in the current code, try_grab_page() can only add 1 to the page's refcount. We extend this function so that the page's refcount can be increased according to the parameters passed in.
The following log reveals it:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520] <TASK>
[ 464.325523] ? __get_user_pages+0x423/0x520
[ 464.325528] ? __warn+0x81/0x130
[ 464.325536] ? __get_user_pages+0x423/0x520
[ 464.325541] ? report_bug+0x171/0x1a0
[ 464.325549] ? handle_bug+0x3c/0x70
[ 464.325554] ? exc_invalid_op+0x17/0x70
[ 464.325558] ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567] ? __get_user_pages+0x423/0x520
[ 464.325575] __gup_longterm_locked+0x212/0x7a0
[ 464.325583] internal_get_user_pages_fast+0xfb/0x190
[ 464.325590] pin_user_pages_fast+0x47/0x60
[ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
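To make the failure mode in the patch description easier to follow, here is a rough userspace sketch of the pin-then-migrate flow. It is a toy model, not kernel code: every toy_* name, the struct layout, and the -1 error value are hypothetical stand-ins, and the real migration step (which also drops and retakes pins) is reduced to flipping a flag. It only illustrates why a long-term pinnability check at grab time blocks the later migration out of CMA, and how a grab helper that takes a caller-supplied number of references avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    int refcount;
    bool in_cma; /* page currently lives in a CMA area */
};

/* Assumed rule for this sketch: CMA pages are not long-term pinnable. */
static bool toy_longterm_pinnable(const struct toy_page *p)
{
    return !p->in_cma;
}

/* Models the grab described for try_grab_folio(): it rejects a
 * long-term pin when the page is not long-term pinnable. */
static int toy_grab_checked(struct toy_page *p, int refs, bool longterm)
{
    if (longterm && !toy_longterm_pinnable(p))
        return -1; /* hypothetical error value */
    p->refcount += refs;
    return 0;
}

/* Models the proposed extended try_grab_page(): no pinnability check,
 * and the refcount is raised by the number of refs passed in. */
static int toy_grab_unchecked(struct toy_page *p, int refs)
{
    if (refs <= 0)
        return -1;
    p->refcount += refs;
    return 0;
}

/* Models the migration step: move the page out of CMA (simplified). */
static void toy_migrate_out_of_cma(struct toy_page *p)
{
    p->in_cma = false;
}

/* Slow path of a long-term pin: grab first, then migrate.
 * If the grab fails, the migration step is never reached. */
static int toy_pin_longterm(struct toy_page *p, int refs, bool use_checked_grab)
{
    int ret = use_checked_grab ? toy_grab_checked(p, refs, true)
                               : toy_grab_unchecked(p, refs);
    if (ret)
        return ret;
    toy_migrate_out_of_cma(p);
    return 0;
}
```

In this model, calling toy_pin_longterm() on a CMA page with use_checked_grab set fails before migration can happen, which mirrors the reported failure; with the unchecked grab the references are taken and the page is then moved out of CMA.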
Well, we also have Yang Shi's patch (https://lkml.kernel.org/r/20240627231601.1713119-1-yang@os.amperecomputing.c...) which takes a significantly different approach. Which way should we go?
IMO, my patch is more complete and should be the one sent to mainline. This patch can be considered if it is hard to backport my patch to the stable tree.