Hi Mike,
Sorry for the late reply; I just got back from vacation. If it is unsafe to directly use the subpages of a hugetlb page, then reverting this patch seems like the only option for addressing this issue immediately. So, this patch is Acked-by: Vivek Kasireddy vivek.kasireddy@intel.com
As far as the use-case is concerned, there are two main users of the udmabuf driver: Qemu and CrosVM VMMs. However, it appears Qemu is the only one that uses hugetlb pages (when hugetlb=on is set) as the backing store for Guest (Linux, Android and Windows) system memory. The main goal is to share the pages associated with the Guest allocated framebuffer (FB) with the Host GPU driver and other components in a zero-copy way. To that end, the guest GPU driver (virtio-gpu) allocates 4k size pages (associated with the FB) and pins them before sharing the (guest) physical (or dma) addresses (and lengths) with Qemu. Qemu then translates the addresses into file offsets and shares these offsets with udmabuf.
The udmabuf driver obtains the pages associated with the file offsets and uses these pages to eventually populate a scatterlist. It also creates a dmabuf fd and acts as the exporter. AFAIK, it should be possible to populate the scatterlist with physical/dma addresses (of huge pages) instead of using subpages but this might limit the capabilities of some (dmabuf) importers. I'll try to figure out a solution using physical/dma addresses and see if it works as expected, and will share the patch on linux-mm to request feedback once it is ready.
Thanks, Vivek
This effectively reverts commit 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)"). Recently, Junxiao Chang found a BUG with page map counting as described here [1]. This issue pointed out that the udmabuf driver was making direct use of subpages of hugetlb pages. This is not a good idea, and no other mm code attempts such use. In addition to the mapcount issue, this also causes issues with hugetlb vmemmap optimization and page poisoning.
For now, remove hugetlb support.
If udmabuf wants to be used on hugetlb mappings, it should be changed to only use complete hugetlb pages. This will require different alignment and size requirements on the UDMABUF_CREATE API.
[1] https://lore.kernel.org/linux-mm/20230512072036.1027784-1- junxiao.chang@intel.com/
Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)") Cc: stable@vger.kernel.org Signed-off-by: Mike Kravetz mike.kravetz@oracle.com
drivers/dma-buf/udmabuf.c | 47 +++++---------------------------------- 1 file changed, 6 insertions(+), 41 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 01f2e86f3f7c..12cf6bb2e3ce 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -12,7 +12,6 @@ #include <linux/shmem_fs.h> #include <linux/slab.h> #include <linux/udmabuf.h> -#include <linux/hugetlb.h> #include <linux/vmalloc.h> #include <linux/iosys-map.h>
@@ -207,9 +206,7 @@ static long udmabuf_create(struct miscdevice *device, struct udmabuf *ubuf; struct dma_buf *buf; pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
- struct page *page, *hpage = NULL;
- pgoff_t subpgoff, maxsubpgs;
- struct hstate *hpstate;
- struct page *page; int seals, ret = -EINVAL; u32 i, flags;
@@ -245,7 +242,7 @@ static long udmabuf_create(struct miscdevice *device, if (!memfd) goto err; mapping = memfd->f_mapping;
if (!shmem_mapping(mapping) &&
!is_file_hugepages(memfd))
seals = memfd_fcntl(memfd, F_GET_SEALS, 0); if (seals == -EINVAL)if (!shmem_mapping(mapping)) goto err;
@@ -256,48 +253,16 @@ static long udmabuf_create(struct miscdevice *device, goto err; pgoff = list[i].offset >> PAGE_SHIFT; pgcnt = list[i].size >> PAGE_SHIFT;
if (is_file_hugepages(memfd)) {
hpstate = hstate_file(memfd);
pgoff = list[i].offset >> huge_page_shift(hpstate);
subpgoff = (list[i].offset &
~huge_page_mask(hpstate)) >>
PAGE_SHIFT;
maxsubpgs = huge_page_size(hpstate) >>
PAGE_SHIFT;
for (pgidx = 0; pgidx < pgcnt; pgidx++) {}
if (is_file_hugepages(memfd)) {
if (!hpage) {
hpage =
find_get_page_flags(mapping, pgoff,
FGP_ACCESSED);
if (!hpage) {
ret = -EINVAL;
goto err;
}
}
page = hpage + subpgoff;
get_page(page);
subpgoff++;
if (subpgoff == maxsubpgs) {
put_page(hpage);
hpage = NULL;
subpgoff = 0;
pgoff++;
}
} else {
page =
shmem_read_mapping_page(mapping,
pgoff + pgidx);
if (IS_ERR(page)) {
ret = PTR_ERR(page);
goto err;
}
page = shmem_read_mapping_page(mapping, pgoff
- pgidx);
if (IS_ERR(page)) {
ret = PTR_ERR(page);
} fput(memfd); memfd = NULL;goto err; } ubuf->pages[pgbuf++] = page;
if (hpage) {
put_page(hpage);
hpage = NULL;
}
}
exp_info.ops = &udmabuf_ops;
-- 2.40.1