Yan Zhao <yan.y.zhao@intel.com> writes:
> On Tue, Sep 10, 2024 at 11:44:10PM +0000, Ackerley Tng wrote:
> > +/*
> > + * Allocates and then caches a folio in the filemap. Returns a folio with
> > + * refcount of 2: 1 after allocation, and 1 taken by the filemap.
> > + */
> > +static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio(struct inode *inode,
> > +							     pgoff_t index)
> > +{
> > +	struct kvm_gmem_hugetlb *hgmem;
> > +	pgoff_t aligned_index;
> > +	struct folio *folio;
> > +	int nr_pages;
> > +	int ret;
> > +
> > +	hgmem = kvm_gmem_hgmem(inode);
> > +	folio = kvm_gmem_hugetlb_alloc_folio(hgmem->h, hgmem->spool);
> > +	if (IS_ERR(folio))
> > +		return folio;
> > +
> > +	nr_pages = 1UL << huge_page_order(hgmem->h);
> > +	aligned_index = round_down(index, nr_pages);
>
> Maybe a gap here.
>
> When a guest_memfd is bound to a slot where slot->base_gfn is not
> aligned to 2M/1G and slot->gmem.pgoff is 0, even if an index is
> 2M/1G aligned, the corresponding GFN is not 2M/1G aligned.
Thanks for looking into this.
In 1G page support for guest_memfd, the offset and size are always
aligned to the hugepage size requested at guest_memfd creation time,
and it is true that when binding to a memslot, slot->base_gfn and
slot->npages may not be hugepage aligned.
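To make the misalignment concrete with hypothetical numbers, assuming
the usual index-to-GFN relationship
gfn = slot->base_gfn + (index - slot->gmem.pgoff):

	/*
	 * Hypothetical slot: gmem.pgoff = 0, base_gfn = 0x201 (513),
	 * which is not 2M-aligned (not a multiple of 512 4K pages).
	 */
	pgoff_t index = 512;			/* 2M-aligned file index */
	gfn_t gfn = 0x201 + (index - 0);	/* 0x401 (1025): not 2M-aligned */

So a 2M-aligned index maps to a GFN that is only as aligned as
slot->base_gfn itself.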
However, TDX requires that private huge pages be 2M aligned in GFN.
IIUC other factors also contribute to determining the mapping level in the guest page tables, like lpage_info and .private_max_mapping_level() in kvm_x86_ops.
If slot->base_gfn and slot->npages are not hugepage aligned,
lpage_info will record that, and the affected ranges will not be
faulted into the guest page tables at the larger page size.
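For reference, the slot-edge handling in kvm_alloc_memslot_metadata()
(arch/x86/kvm/x86.c) looks roughly like this (a simplified sketch,
not copied verbatim):

	/*
	 * If either end of the slot is not aligned to the huge page
	 * size at this level, disallow huge mappings for the first
	 * and last lpage_info entries.
	 */
	if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
		linfo[0].disallow_lpage = 1;
	if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
		linfo[lpages - 1].disallow_lpage = 1;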
Hence I think it is okay to leave it to KVM to fault pages into the
guest correctly. guest_memfd will just maintain the invariant that
offset and size are hugepage aligned, but will not require that
slot->base_gfn and slot->npages are hugepage aligned. This behavior
is consistent with other backing memory for guests, like regular
shmem or HugeTLB.
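A minimal sketch of that creation-time invariant (the function name
and parameters below are made up for illustration; this is not code
from the series):

	static int kvm_gmem_hugetlb_check_layout(loff_t offset, loff_t size,
						 unsigned long hpage_size)
	{
		/* Only the file offset and size must be hugepage aligned. */
		if (!IS_ALIGNED(offset, hpage_size) ||
		    !IS_ALIGNED(size, hpage_size))
			return -EINVAL;

		return 0;
	}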
> > +	ret = kvm_gmem_hugetlb_filemap_add_folio(inode->i_mapping, folio,
> > +						 aligned_index,
> > +						 htlb_alloc_mask(hgmem->h));
> > +	WARN_ON(ret);
> > +
> > +	spin_lock(&inode->i_lock);
> > +	inode->i_blocks += blocks_per_huge_page(hgmem->h);
> > +	spin_unlock(&inode->i_lock);
> > +
> > +	return folio;
> > +}
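As a usage note on the refcount-of-2 contract (hypothetical caller,
not from the series): the filemap holds one of the two references, so
a caller that no longer needs the folio directly drops only its
allocation reference:

	folio = kvm_gmem_hugetlb_alloc_and_cache_folio(inode, index);
	if (IS_ERR(folio))
		return PTR_ERR(folio);

	/* ... use the folio ... */

	folio_put(folio);	/* drop allocation ref; filemap keeps its own */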