I have tested that patches 2 and 3 work using the following reproducers. I did not write a reproducer for the issue described in patch 1.
Reproducer for F_SEAL_FUTURE_WRITE not being respected: ``` #define _GNU_SOURCE #include <err.h> #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <linux/udmabuf.h>
#define SYSCHK(x) ({ \ typeof(x) __res = (x); \ if (__res == (typeof(x))-1) \ err(1, "SYSCHK(" #x ")"); \ __res; \ })
int main(void) { int memfd = SYSCHK(memfd_create("test", MFD_ALLOW_SEALING)); SYSCHK(ftruncate(memfd, 0x1000)); SYSCHK(fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK|F_SEAL_FUTURE_WRITE)); int udmabuf_fd = SYSCHK(open("/dev/udmabuf", O_RDWR)); struct udmabuf_create create_arg = { .memfd = memfd, .flags = 0, .offset = 0, .size = 0x1000 }; int buf_fd = SYSCHK(ioctl(udmabuf_fd, UDMABUF_CREATE, &create_arg)); printf("created udmabuf buffer fd %d\n", buf_fd); char *map = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, buf_fd, 0)); *map = 'a'; } ```
Reproducer for the memory leak (if you run this for a while, your memory usage will steadily go up, and /sys/kernel/debug/dma_buf/bufinfo will contain a ton of entries): ``` #define _GNU_SOURCE #include <err.h> #include <errno.h> #include <assert.h> #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/resource.h> #include <linux/udmabuf.h>
#define SYSCHK(x) ({ \ typeof(x) __res = (x); \ if (__res == (typeof(x))-1) \ err(1, "SYSCHK(" #x ")"); \ __res; \ })
int main(void) { int memfd = SYSCHK(memfd_create("test", MFD_ALLOW_SEALING)); SYSCHK(ftruncate(memfd, 0x1000)); SYSCHK(fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK)); int udmabuf_fd = SYSCHK(open("/dev/udmabuf", O_RDWR));
// prevent creating new FDs struct rlimit rlim = { .rlim_cur = 1, .rlim_max = 1 }; SYSCHK(setrlimit(RLIMIT_NOFILE, &rlim));
while (1) { struct udmabuf_create create_arg = { .memfd = memfd, .flags = 0, .offset = 0, .size = 0x1000 }; int buf_fd = ioctl(udmabuf_fd, UDMABUF_CREATE, &create_arg); assert(buf_fd == -1); assert(errno == EMFILE); } } ```
Signed-off-by: Jann Horn jannh@google.com --- Changes in v2: - patch 1/3: use file_inode instead of ->f_inode (Vivek) - patch 1/3: add comment explaining the inode_lock_shared() - patch 3/3: remove unused parameter (Vivek) - patch 3/3: add comment on cleanup (Vivek) - collect Acks - Link to v1: https://lore.kernel.org/r/20241203-udmabuf-fixes-v1-0-f99281c345aa@google.co...
--- Jann Horn (3): udmabuf: fix racy memfd sealing check udmabuf: also check for F_SEAL_FUTURE_WRITE udmabuf: fix memory leak on last export_udmabuf() error path
drivers/dma-buf/udmabuf.c | 43 +++++++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 16 deletions(-) --- base-commit: b86545e02e8c22fb89218f29d381fa8e8b91d815 change-id: 20241203-udmabuf-fixes-d0435ebab663
The current check_memfd_seals() is racy: Since we first do check_memfd_seals() and then udmabuf_pin_folios() without holding any relevant lock across both, F_SEAL_WRITE can be set in between. This is problematic because we can end up holding pins to pages in a write-sealed memfd.
Fix it using the inode lock, that's probably the easiest way. In the future, we might want to consider moving this logic into memfd, especially if anyone else wants to use memfd_pin_folios().
Reported-by: Julian Orth ju.orth@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219106 Closes: https://lore.kernel.org/r/CAG48ez0w8HrFEZtJkfmkVKFDhE5aP7nz=obrimeTgpD+StkV9... Fixes: fbb0de795078 ("Add udmabuf misc device") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn jannh@google.com --- drivers/dma-buf/udmabuf.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 8ce1f074c2d32a0a9f59ff7184359e37d56548c6..c1d8c2766d6d36fc5fe1b3d73057f6e01ec6678f 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -436,14 +436,19 @@ static long udmabuf_create(struct miscdevice *device, goto err; }
+ /* + * Take the inode lock to protect against concurrent + * memfd_add_seals(), which takes this lock in write mode. + */ + inode_lock_shared(file_inode(memfd)); ret = check_memfd_seals(memfd); - if (ret < 0) { - fput(memfd); - goto err; - } + if (ret) + goto out_unlock;
ret = udmabuf_pin_folios(ubuf, memfd, list[i].offset, list[i].size, folios); +out_unlock: + inode_unlock_shared(file_inode(memfd)); fput(memfd); if (ret) goto err;
On Wed, Dec 4, 2024 at 11:27 AM Jann Horn jannh@google.com wrote:
The current check_memfd_seals() is racy: Since we first do check_memfd_seals() and then udmabuf_pin_folios() without holding any relevant lock across both, F_SEAL_WRITE can be set in between. This is problematic because we can end up holding pins to pages in a write-sealed memfd.
Fix it using the inode lock, that's probably the easiest way. In the future, we might want to consider moving this logic into memfd, especially if anyone else wants to use memfd_pin_folios().
I am curious, why is it not possible to have a reproducer for this issue, is it not reproducible and is theoretical?
thanks,
- Joel
Reported-by: Julian Orth ju.orth@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219106 Closes: https://lore.kernel.org/r/CAG48ez0w8HrFEZtJkfmkVKFDhE5aP7nz=obrimeTgpD+StkV9... Fixes: fbb0de795078 ("Add udmabuf misc device") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn jannh@google.com
drivers/dma-buf/udmabuf.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 8ce1f074c2d32a0a9f59ff7184359e37d56548c6..c1d8c2766d6d36fc5fe1b3d73057f6e01ec6678f 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -436,14 +436,19 @@ static long udmabuf_create(struct miscdevice *device, goto err; }
/*
* Take the inode lock to protect against concurrent
* memfd_add_seals(), which takes this lock in write mode.
*/
inode_lock_shared(file_inode(memfd)); ret = check_memfd_seals(memfd);
if (ret < 0) {
fput(memfd);
goto err;
}
if (ret)
goto out_unlock; ret = udmabuf_pin_folios(ubuf, memfd, list[i].offset, list[i].size, folios);
+out_unlock:
inode_unlock_shared(file_inode(memfd)); fput(memfd); if (ret) goto err;
-- 2.47.0.338.g60cca15819-goog
On Tue, Dec 10, 2024 at 11:51 PM Joel Fernandes joel@joelfernandes.org wrote:
On Wed, Dec 4, 2024 at 11:27 AM Jann Horn jannh@google.com wrote:
The current check_memfd_seals() is racy: Since we first do check_memfd_seals() and then udmabuf_pin_folios() without holding any relevant lock across both, F_SEAL_WRITE can be set in between. This is problematic because we can end up holding pins to pages in a write-sealed memfd.
Fix it using the inode lock, that's probably the easiest way. In the future, we might want to consider moving this logic into memfd, especially if anyone else wants to use memfd_pin_folios().
I am curious, why is it not possible to have a reproducer for this issue, is it not reproducible and is theoretical?
Sorry, I think I must have forgotten about this part when I wrote the cover letter: The original bug reporter (Julian) linked to a reproducer that is linked in the bugzilla bug report, at https://github.com/mahkoh/udmabuf-seal. I haven't tried running it myself though.
thanks,
- Joel
Reported-by: Julian Orth ju.orth@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219106 Closes: https://lore.kernel.org/r/CAG48ez0w8HrFEZtJkfmkVKFDhE5aP7nz=obrimeTgpD+StkV9... Fixes: fbb0de795078 ("Add udmabuf misc device") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn jannh@google.com
On Tue, Dec 10, 2024 at 6:12 PM Jann Horn jannh@google.com wrote:
On Tue, Dec 10, 2024 at 11:51 PM Joel Fernandes joel@joelfernandes.org wrote:
On Wed, Dec 4, 2024 at 11:27 AM Jann Horn jannh@google.com wrote:
The current check_memfd_seals() is racy: Since we first do check_memfd_seals() and then udmabuf_pin_folios() without holding any relevant lock across both, F_SEAL_WRITE can be set in between. This is problematic because we can end up holding pins to pages in a write-sealed memfd.
Fix it using the inode lock, that's probably the easiest way. In the future, we might want to consider moving this logic into memfd, especially if anyone else wants to use memfd_pin_folios().
I am curious, why is it not possible to have a reproducer for this issue, is it not reproducible and is theoretical?
Sorry, I think I must have forgotten about this part when I wrote the cover letter: The original bug reporter (Julian) linked to a reproducer that is linked in the bugzilla bug report, at https://github.com/mahkoh/udmabuf-seal. I haven't tried running it myself though.
Thanks, I appreciate the pointer to the reproducer.
Acked-by: Joel Fernandes (Google) joel@joelfernandes.org
thanks,
- Joel
Subject: [PATCH v2 1/3] udmabuf: fix racy memfd sealing check
The current check_memfd_seals() is racy: Since we first do check_memfd_seals() and then udmabuf_pin_folios() without holding any relevant lock across both, F_SEAL_WRITE can be set in between. This is problematic because we can end up holding pins to pages in a write-sealed memfd.
Fix it using the inode lock, that's probably the easiest way. In the future, we might want to consider moving this logic into memfd, especially if anyone else wants to use memfd_pin_folios().
Reported-by: Julian Orth ju.orth@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219106 Closes: https://lore.kernel.org/r/CAG48ez0w8HrFEZtJkfmkVKFDhE5aP7nz=obrimeTg pD+StkV9w@mail.gmail.com Fixes: fbb0de795078 ("Add udmabuf misc device") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn jannh@google.com
drivers/dma-buf/udmabuf.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 8ce1f074c2d32a0a9f59ff7184359e37d56548c6..c1d8c2766d6d36fc5fe1b3d73 057f6e01ec6678f 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -436,14 +436,19 @@ static long udmabuf_create(struct miscdevice *device, goto err; }
/*
* Take the inode lock to protect against concurrent
* memfd_add_seals(), which takes this lock in write mode.
*/
Thank you for adding comments.
Acked-by: Vivek Kasireddy vivek.kasireddy@intel.com
ret = check_memfd_seals(memfd);inode_lock_shared(file_inode(memfd));
if (ret < 0) {
fput(memfd);
goto err;
}
if (ret)
goto out_unlock;
ret = udmabuf_pin_folios(ubuf, memfd, list[i].offset, list[i].size, folios);
+out_unlock:
fput(memfd); if (ret) goto err;inode_unlock_shared(file_inode(memfd));
-- 2.47.0.338.g60cca15819-goog
When F_SEAL_FUTURE_WRITE was introduced, it was overlooked that udmabuf must reject memfds with this flag, just like ones with F_SEAL_WRITE. Fix it by adding F_SEAL_FUTURE_WRITE to SEALS_DENIED.
Fixes: ab3948f58ff8 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Cc: stable@vger.kernel.org Acked-by: Vivek Kasireddy vivek.kasireddy@intel.com Signed-off-by: Jann Horn jannh@google.com --- drivers/dma-buf/udmabuf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index c1d8c2766d6d36fc5fe1b3d73057f6e01ec6678f..b330b99fcc7619a05bb7dc2aeeb9c82faf9a387b 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -297,7 +297,7 @@ static const struct dma_buf_ops udmabuf_ops = { };
#define SEALS_WANTED (F_SEAL_SHRINK) -#define SEALS_DENIED (F_SEAL_WRITE) +#define SEALS_DENIED (F_SEAL_WRITE|F_SEAL_FUTURE_WRITE)
static int check_memfd_seals(struct file *memfd) {
On Wed, Dec 4, 2024 at 11:27 AM Jann Horn jannh@google.com wrote:
When F_SEAL_FUTURE_WRITE was introduced, it was overlooked that udmabuf must reject memfds with this flag, just like ones with F_SEAL_WRITE. Fix it by adding F_SEAL_FUTURE_WRITE to SEALS_DENIED.
Fixes: ab3948f58ff8 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Cc: stable@vger.kernel.org Acked-by: Vivek Kasireddy vivek.kasireddy@intel.com
Thanks!
Reviewed-by: Joel Fernandes (Google) joel@joelfernandes.org
- Joel
Signed-off-by: Jann Horn jannh@google.com
drivers/dma-buf/udmabuf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index c1d8c2766d6d36fc5fe1b3d73057f6e01ec6678f..b330b99fcc7619a05bb7dc2aeeb9c82faf9a387b 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -297,7 +297,7 @@ static const struct dma_buf_ops udmabuf_ops = { };
#define SEALS_WANTED (F_SEAL_SHRINK) -#define SEALS_DENIED (F_SEAL_WRITE) +#define SEALS_DENIED (F_SEAL_WRITE|F_SEAL_FUTURE_WRITE)
static int check_memfd_seals(struct file *memfd) {
-- 2.47.0.338.g60cca15819-goog
Hi Jann,
Subject: [PATCH v2 0/3] fixes for udmabuf (memfd sealing checks and a leak)
I have tested that patches 2 and 3 work using the following reproducers. I did not write a reproducer for the issue described in patch 1.
Reproducer for F_SEAL_FUTURE_WRITE not being respected:
#define _GNU_SOURCE #include <err.h> #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <linux/udmabuf.h> #define SYSCHK(x) ({ \ typeof(x) __res = (x); \ if (__res == (typeof(x))-1) \ err(1, "SYSCHK(" #x ")"); \ __res; \ }) int main(void) { int memfd = SYSCHK(memfd_create("test", MFD_ALLOW_SEALING)); SYSCHK(ftruncate(memfd, 0x1000)); SYSCHK(fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK|F_SEAL_FUTURE_WRITE)); int udmabuf_fd = SYSCHK(open("/dev/udmabuf", O_RDWR)); struct udmabuf_create create_arg = { .memfd = memfd, .flags = 0, .offset = 0, .size = 0x1000 }; int buf_fd = SYSCHK(ioctl(udmabuf_fd, UDMABUF_CREATE, &create_arg)); printf("created udmabuf buffer fd %d\n", buf_fd); char *map = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, buf_fd, 0)); *map = 'a'; }
Reproducer for the memory leak (if you run this for a while, your memory usage will steadily go up, and /sys/kernel/debug/dma_buf/bufinfo will contain a ton of entries):
#define _GNU_SOURCE #include <err.h> #include <errno.h> #include <assert.h> #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/resource.h> #include <linux/udmabuf.h> #define SYSCHK(x) ({ \ typeof(x) __res = (x); \ if (__res == (typeof(x))-1) \ err(1, "SYSCHK(" #x ")"); \ __res; \ }) int main(void) { int memfd = SYSCHK(memfd_create("test", MFD_ALLOW_SEALING)); SYSCHK(ftruncate(memfd, 0x1000)); SYSCHK(fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK)); int udmabuf_fd = SYSCHK(open("/dev/udmabuf", O_RDWR)); // prevent creating new FDs struct rlimit rlim = { .rlim_cur = 1, .rlim_max = 1 }; SYSCHK(setrlimit(RLIMIT_NOFILE, &rlim)); while (1) { struct udmabuf_create create_arg = { .memfd = memfd, .flags = 0, .offset = 0, .size = 0x1000 }; int buf_fd = ioctl(udmabuf_fd, UDMABUF_CREATE, &create_arg); assert(buf_fd == -1); assert(errno == EMFILE); } }
Signed-off-by: Jann Horn jannh@google.com
Changes in v2:
- patch 1/3: use file_inode instead of ->f_inode (Vivek)
- patch 1/3: add comment explaining the inode_lock_shared()
- patch 3/3: remove unused parameter (Vivek)
- patch 3/3: add comment on cleanup (Vivek)
- collect Acks
- Link to v1: https://lore.kernel.org/r/20241203-udmabuf-fixes-v1-0-
f99281c345aa@google.com
Jann Horn (3): udmabuf: fix racy memfd sealing check udmabuf: also check for F_SEAL_FUTURE_WRITE udmabuf: fix memory leak on last export_udmabuf() error path
Thank you for the fixes; all patches pushed to drm-misc-fixes.
Thanks, Vivek
drivers/dma-buf/udmabuf.c | 43 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 16 deletions(-)
base-commit: b86545e02e8c22fb89218f29d381fa8e8b91d815 change-id: 20241203-udmabuf-fixes-d0435ebab663
-- Jann Horn jannh@google.com
linux-stable-mirror@lists.linaro.org