If a process receives a signal while it executes some kernel code that calls mm_take_all_locks, we get -EINTR error. The -EINTR is propagated up the call stack to userspace and userspace may fail if it gets this error.
This commit changes -EINTR to -ERESTARTSYS, so that if the signal handler was installed with the SA_RESTART flag, the operation is automatically restarted.
For example, this problem happens when using OpenCL on AMDGPU. If some signal races with clGetDeviceIDs, clGetDeviceIDs returns an error CL_DEVICE_NOT_FOUND (and strace shows that open("/dev/kfd") failed with EINTR).
This problem can be reproduced with the following program.
To run this program, you need AMD graphics card and the package "rocm-opencl" installed. You must not have the package "mesa-opencl-icd" installed, because it redirects the default OpenCL implementation to itself.
include <stdio.h> include <stdlib.h> include <unistd.h> include <string.h> include <signal.h> include <sys/time.h>
define CL_TARGET_OPENCL_VERSION 300 include <CL/opencl.h>
static void fn(void) { while (1) { int32_t err; cl_device_id device; err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL); if (err != CL_SUCCESS) { fprintf(stderr, "clGetDeviceIDs failed: %d\n", err); exit(1); } write(2, "-", 1); } }
static void alrm(int sig) { write(2, ".", 1); }
int main(void) { struct itimerval it; struct sigaction sa; memset(&sa, 0, sizeof sa); sa.sa_handler = alrm; sa.sa_flags = SA_RESTART; sigaction(SIGALRM, &sa, NULL); it.it_interval.tv_sec = 0; it.it_interval.tv_usec = 50; it.it_value.tv_sec = 0; it.it_value.tv_usec = 50; setitimer(ITIMER_REAL, &it, NULL); fn(); return 1; }
I'm submitting this patch for the stable kernels, because the AMD ROCm stack fails if it receives EINTR from open (it seems to restart EINTR from ioctl correctly). The process may receive signals at unpredictable times, so the OpenCL implementation may fail at unpredictable times.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Link: https://lists.freedesktop.org/archives/amd-gfx/2025-November/133141.html Link: https://yhbt.net/lore/linux-mm/6f16b618-26fc-3031-abe8-65c2090262e7@redhat.c... Cc: stable@vger.kernel.org Fixes: 7906d00cd1f6 ("mmu-notifiers: add mm_take_all_locks() operation") --- mm/vma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Index: mm/mm/vma.c =================================================================== --- mm.orig/mm/vma.c 2026-01-07 20:11:21.000000000 +0100 +++ mm/mm/vma.c 2026-01-07 20:11:21.000000000 +0100 @@ -2202,7 +2202,7 @@ int mm_take_all_locks(struct mm_struct *
out_unlock: mm_drop_all_locks(mm); - return -EINTR; + return -ERESTARTSYS; }
static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
On Wed, Jan 07, 2026 at 09:31:14PM +0100, Mikulas Patocka wrote:
This commit changes -EINTR to -ERESTARTSYS, so that if the signal handler was installed with the SA_RESTART flag, the operation is automatically restarted.
No, this is bonkers. If you get a signal, you return -EINTR.
On Wed, 7 Jan 2026, Matthew Wilcox wrote:
On Wed, Jan 07, 2026 at 09:31:14PM +0100, Mikulas Patocka wrote:
This commit changes -EINTR to -ERESTARTSYS, so that if the signal handler was installed with the SA_RESTART flag, the operation is automatically restarted.
No, this is bonkers. If you get a signal, you return -EINTR.
Why?
fifo_open returns -ERESTARTSYS, so why not here?
Mikulas
On Wed, Jan 07, 2026 at 10:29:44PM +0100, Mikulas Patocka wrote:
On Wed, 7 Jan 2026, Matthew Wilcox wrote:
On Wed, Jan 07, 2026 at 09:31:14PM +0100, Mikulas Patocka wrote:
This commit changes -EINTR to -ERESTARTSYS, so that if the signal handler was installed with the SA_RESTART flag, the operation is automatically restarted.
No, this is bonkers. If you get a signal, you return -EINTR.
Why?
fifo_open returns -ERESTARTSYS, so why not here?
This is fundamentally changing what this does for all callers, and we've simply not encountered this issue elsewhere.
Given how long it might take to work it might be interrupted by another signal on the next attempt, whose to say that it might not result in a situation where it can never complete?
I thnk it should stay as it is.
Also the comment above literally has:
* mm_take_all_locks() and mm_drop_all_locks are expensive operations * that may have to take thousand of locks. * * mm_take_all_locks() can fail if it's interrupted by signals. */
Which at any rate would need to be changed even if we were to change it.
I'm more and more inclined to say let's just drop this series in general, and you should go fix how signals are handled in the driver/userland code.
This is really feeling like the wrong place to fix this.
Mikulas
Cheers, Lorenzo
* Mikulas Patocka mpatocka@redhat.com [260107 15:32]:
If a process receives a signal while it executes some kernel code that calls mm_take_all_locks, we get -EINTR error. The -EINTR is propagated up the call stack to userspace and userspace may fail if it gets this error.
This commit changes -EINTR to -ERESTARTSYS, so that if the signal handler was installed with the SA_RESTART flag, the operation is automatically restarted.
For example, this problem happens when using OpenCL on AMDGPU. If some signal races with clGetDeviceIDs, clGetDeviceIDs returns an error CL_DEVICE_NOT_FOUND (and strace shows that open("/dev/kfd") failed with EINTR).
This problem can be reproduced with the following program.
To run this program, you need AMD graphics card and the package "rocm-opencl" installed. You must not have the package "mesa-opencl-icd" installed, because it redirects the default OpenCL implementation to itself.
What about [1]? This reproducer should be fixed in the ROCm userspace library, as stated by Christian König.
1. https://lore.kernel.org/linux-mm/5c8ccd03-9a3a-47c0-9a89-be0abf6c1fb5@amd.co...
Thanks, Liam
linux-stable-mirror@lists.linaro.org