drm_wait_one_vblank() uses drm_WARN() to check for a time-dependent condition. Since syzkaller runs the kernel with the panic_on_warn set, this causes the entire kernel to panic with a "vblank wait timed out on crtc %i" message.
In this case it does not mean that there is something wrong with the kernel but is caused by time delays in vblanks handling that the fuzzer introduces as a side effect when fail_alloc_pages, failslab, fail_usercopy faults are injected with maximum verbosity. With lower verbosity this issue disappears.
drm_WARN() was introduced here by e8450f51a4b3 ("drm/irq: Implement a generic vblank_wait function") and it is intended to indicate a failure with vblank irqs handling by the underlying driver. The issue is raised during testing of the vkms driver, but it may be potentially reproduced with other drivers.
Fix this by using drm_warn() instead which does not cause the kernel to panic with panic_on_warn set, but still provides a way to tell users about this unexpected condition.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: e8450f51a4b3 ("drm/irq: Implement a generic vblank_wait function") Cc: stable@vger.kernel.org Reported-by: syzbot+9a8f87865d5e2e8ef57f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9a8f87865d5e2e8ef57f Signed-off-by: Vitaliy Shevtsov v.shevtsov@maxima.ru --- drivers/gpu/drm/drm_vblank.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index 94e45ed6869d..fa09ff5b1d48 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -1304,7 +1304,8 @@ void drm_wait_one_vblank(struct drm_device *dev, unsigned int pipe) last != drm_vblank_count(dev, pipe), msecs_to_jiffies(100));
- drm_WARN(dev, ret == 0, "vblank wait timed out on crtc %i\n", pipe); + if (!ret) + drm_warn(dev, "vblank wait timed out on crtc %i\n", pipe);
drm_vblank_put(dev, pipe); }
On Fri, Jan 10, 2025 at 04:49:13PM +0000, Vitaliy Shevtsov wrote:
drm_wait_one_vblank() uses drm_WARN() to check for a time-dependent condition. Since syzkaller runs the kernel with the panic_on_warn set, this causes the entire kernel to panic with a "vblank wait timed out on crtc %i" message.
In this case it does not mean that there is something wrong with the kernel but is caused by time delays in vblanks handling that the fuzzer introduces as a side effect when fail_alloc_pages, failslab, fail_usercopy faults are injected with maximum verbosity. With lower verbosity this issue disappears.
Hm, unless a drivers vblank handling code is extremely fun, there should be absolutely no memory allocations or user copies in there at all. Hence I think you're papering over a real bug here. The vblank itself should be purely a free-wheeling hrtimer, if those stop we have serious kernel bug at our hands.
Which wouldn't be a big surprise, because we've fixed a _lot_ of bugs in vkms' vblank and page flip code, it's surprisingly tricky.
Iow, what kind of memory allocation is holding up vkms vblanks?
Cheers, Sima
drm_WARN() was introduced here by e8450f51a4b3 ("drm/irq: Implement a generic vblank_wait function") and it is intended to indicate a failure with vblank irqs handling by the underlying driver. The issue is raised during testing of the vkms driver, but it may be potentially reproduced with other drivers.
Fix this by using drm_warn() instead which does not cause the kernel to panic with panic_on_warn set, but still provides a way to tell users about this unexpected condition.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: e8450f51a4b3 ("drm/irq: Implement a generic vblank_wait function") Cc: stable@vger.kernel.org Reported-by: syzbot+9a8f87865d5e2e8ef57f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9a8f87865d5e2e8ef57f Signed-off-by: Vitaliy Shevtsov v.shevtsov@maxima.ru
drivers/gpu/drm/drm_vblank.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index 94e45ed6869d..fa09ff5b1d48 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -1304,7 +1304,8 @@ void drm_wait_one_vblank(struct drm_device *dev, unsigned int pipe) last != drm_vblank_count(dev, pipe), msecs_to_jiffies(100));
- drm_WARN(dev, ret == 0, "vblank wait timed out on crtc %i\n", pipe);
- if (!ret)
drm_warn(dev, "vblank wait timed out on crtc %i\n", pipe);
drm_vblank_put(dev, pipe); } -- 2.47.1
On Fri, 10 Jan 2025 20:19:47 +0100 Simona Vetter simona.vetter@ffwll.ch wrote:
Hm, unless a drivers vblank handling code is extremely fun, there should be absolutely no memory allocations or user copies in there at all. Hence I think you're papering over a real bug here. The vblank itself should be purely a free-wheeling hrtimer, if those stop we have serious kernel bug at our hands.
Which wouldn't be a big surprise, because we've fixed a _lot_ of bugs in vkms' vblank and page flip code, it's surprisingly tricky.
Iow, what kind of memory allocation is holding up vkms vblanks?
Cheers, Sima
I don't think this is because of memory allocation. As far as I can see there is no memory allocation in vblanks handling. Okay, there is a kzalloc() call in vkms_atomic_crtc_reset() without checking a pointer but this is not the root cause of this issue. My first thought was that somehow a vblank might not be successfully processed by drm_crtc_handle_vblank() in vkms_vblank_simulate() function which always returns HRTIMER_RESTART even if a vblank handling failed. But this hypothesis was not confirmed - all vblanks are fine. The hrtimers in vkms have a hardcoded framedur value of 16ms and what I can see is that the fault injection creates some delays by unwinding the call stack when it simulates an allocation failure and this caused the hrtimers to lag. This what I was able to investigate while I was debugging the kernel in the gdb.
A similar issue was being discussed in https://lore.kernel.org/linux-kernel//0000000000009cd8d505bd545452@google.co...
On Sat, 11 Jan 2025, Vitaliy Shevtsov v.shevtsov@maxima.ru wrote:
On Fri, 10 Jan 2025 20:19:47 +0100 Simona Vetter simona.vetter@ffwll.ch wrote:
Hm, unless a drivers vblank handling code is extremely fun, there should be absolutely no memory allocations or user copies in there at all. Hence I think you're papering over a real bug here. The vblank itself should be purely a free-wheeling hrtimer, if those stop we have serious kernel bug at our hands.
Which wouldn't be a big surprise, because we've fixed a _lot_ of bugs in vkms' vblank and page flip code, it's surprisingly tricky.
Iow, what kind of memory allocation is holding up vkms vblanks?
Cheers, Sima
I don't think this is because of memory allocation. As far as I can see there is no memory allocation in vblanks handling. Okay, there is a kzalloc() call in vkms_atomic_crtc_reset() without checking a pointer but this is not the root cause of this issue. My first thought was that somehow a vblank might not be successfully processed by drm_crtc_handle_vblank() in vkms_vblank_simulate() function which always returns HRTIMER_RESTART even if a vblank handling failed. But this hypothesis was not confirmed - all vblanks are fine. The hrtimers in vkms have a hardcoded framedur value of 16ms and what I can see is that the fault injection creates some delays by unwinding the call stack when it simulates an allocation failure and this caused the hrtimers to lag. This what I was able to investigate while I was debugging the kernel in the gdb.
A similar issue was being discussed in https://lore.kernel.org/linux-kernel//0000000000009cd8d505bd545452@google.co...
Seems to me in most cases we do want the WARN, but there are corner cases. Arguably those should be addressed instead to ensure we won't ignore the real bugs. We want the warning, you want the panic.
BR, Jani.
linux-stable-mirror@lists.linaro.org