On Thu, Jun 27, 2024 at 4:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
On Thu, 27 Jun 2024 16:16:01 -0700 Yang Shi <yang@os.amperecomputing.com> wrote:
try_grab_folio() is supposed to be used in the fast path, and it elevates the folio refcount with an add-ref-unless-zero operation. In the slow path we are guaranteed to have at least one stable reference, so a simple atomic add can be used instead. The performance difference should be trivial, but the misuse may be confusing and misleading.
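To illustrate the distinction, here is a minimal user-space sketch (not kernel code, names made up for the example) of the two refcounting strategies: the fast path has to use add-ref-unless-zero because the folio may be concurrently freed, roughly what folio_try_get() does, while the slow path already holds a stable reference and a plain atomic add, roughly folio_ref_add(), is enough:

#include <stdatomic.h>
#include <stdbool.h>

/* Fast-path style: only take a reference if the count is not yet zero. */
static bool ref_try_get(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		/* on failure, 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;	/* grabbed a reference */
	}
	return false;			/* object may already be gone */
}

/* Slow-path style: caller already holds a stable reference. */
static void ref_add(atomic_int *refcount, int n)
{
	atomic_fetch_add(refcount, n);
}

int main(void)
{
	atomic_int ref = 1;	/* like a folio with one stable reference */

	ref_add(&ref, 1);	/* always safe while we hold a reference */
	ref_try_get(&ref);	/* may legitimately fail if ref hit zero */
	return 0;
}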
In another thread [1], a kernel warning was reported when pinning a folio in CMA memory while launching an SEV virtual machine. The splat looks like:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520]  <TASK>
[ 464.325523]  ? __get_user_pages+0x423/0x520
[ 464.325528]  ? __warn+0x81/0x130
[ 464.325536]  ? __get_user_pages+0x423/0x520
[ 464.325541]  ? report_bug+0x171/0x1a0
[ 464.325549]  ? handle_bug+0x3c/0x70
[ 464.325554]  ? exc_invalid_op+0x17/0x70
[ 464.325558]  ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567]  ? __get_user_pages+0x423/0x520
[ 464.325575]  __gup_longterm_locked+0x212/0x7a0
[ 464.325583]  internal_get_user_pages_fast+0xfb/0x190
[ 464.325590]  pin_user_pages_fast+0x47/0x60
[ 464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
Per the analysis done by yangge, when starting the SEV virtual machine, it calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory. But the pages are in a CMA area, so fast GUP fails the longterm pinnable check in try_grab_folio() and falls back to the slow path. The slow path is supposed to pin the pages and then migrate them out of the CMA area. But the slow path also uses try_grab_folio() to pin the pages, so it fails due to the same check and the above warning is triggered. A sketch of how a caller reaches this path follows below.
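For context, a minimal sketch of a hypothetical caller loosely modeled on sev_pin_memory() (pin_user_buffer and its error handling are made up for illustration; only pin_user_pages_fast(), unpin_user_pages() and the FOLL_* flags are real kernel APIs):

#include <linux/err.h>
#include <linux/mm.h>
#include <linux/slab.h>

/*
 * Long-term pin of a user buffer.  With FOLL_LONGTERM, pages that sit in
 * CMA (or any movable region) are refused by fast GUP and are supposed to
 * be migrated out by the slow path before being pinned.
 */
static struct page **pin_user_buffer(unsigned long uaddr, int nr_pages)
{
	struct page **pages;
	int pinned;

	pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return ERR_PTR(-ENOMEM);

	/*
	 * Tries the fast path first; for CMA pages the longterm-pinnable
	 * check fails there, so GUP falls back to the slow path, which is
	 * expected to migrate the pages out of CMA and then pin them.
	 */
	pinned = pin_user_pages_fast(uaddr, nr_pages,
				     FOLL_WRITE | FOLL_LONGTERM, pages);
	if (pinned != nr_pages) {
		if (pinned > 0)
			unpin_user_pages(pages, pinned);
		kvfree(pages);
		return ERR_PTR(pinned < 0 ? pinned : -EFAULT);
	}

	return pages;
}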
The remainder of mm-unstable actually applies OK on top of this.
I applied the below as a fixup to Vivek's "mm/gup: introduce memfd_pin_folios() for pinning memfd folios". After this, your v1 patch reverts cleanly.
Thanks for taking care of this. Yeah, it is not bad. I actually removed the memfd hunk so that the patch applies cleanly to Linus's tree.
--- a/mm/gup.c~mm-gup-introduce-memfd_pin_folios-for-pinning-memfd-folios-fix
+++ a/mm/gup.c
@@ -3856,14 +3856,15 @@ long memfd_pin_folios(struct file *memfd
 				    next_idx != folio_index(fbatch.folios[i]))
 					continue;
 
-				folio = try_grab_folio(&fbatch.folios[i]->page,
-						       1, FOLL_PIN);
-				if (!folio) {
+				if (try_grab_folio(fbatch.folios[i],
+						   1, FOLL_PIN)) {
 					folio_batch_release(&fbatch);
 					ret = -EINVAL;
 					goto err;
 				}
+				folio = fbatch.folios[i];
 
 				if (nr_folios == 0)
 					*offset = offset_in_folio(folio, start);
_
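For anyone following the fixup, the effect of the change visible in the hunk above is that try_grab_folio() now takes the folio directly and its return value is checked for failure, rather than returning the grabbed folio (or NULL). A rough side-by-side of the two calling conventions, with surrounding context omitted:

	/* Old convention (before the change): pass a page, get the folio or NULL. */
	folio = try_grab_folio(&fbatch.folios[i]->page, 1, FOLL_PIN);
	if (!folio)
		goto err;

	/* New convention (after the change): pass the folio, non-zero means failure. */
	if (try_grab_folio(fbatch.folios[i], 1, FOLL_PIN))
		goto err;
	folio = fbatch.folios[i];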