Hi Steve,
You're absolutely right to ask for clarification, and I now realize that I didn’t explain the background clearly enough in my cover letter.
On 8/5/25 1:24 오전, Steven Rostedt wrote:
On Sun, 3 Aug 2025 07:20:41 +0000 Yunseong Kim ysk@kzalloc.com wrote:
This patch series resolves a sleeping function called from invalid context bug that occurs when fuzzing USB with syzkaller on a PREEMPT_RT kernel.
The regression was introduced by the interaction of two separate patches: one that made kcov's internal locks sleep on PREEMPT_RT for better latency
Just so I fully understand this change. It is basically reverting the "better latency" changes? That is, with KCOV anyone running with PREEMPT_RT can expect non deterministic latency behavior?
The regression results from the interaction of two changes — and in my original description, I inaccurately characterized one of them as being "for better latency." That was misleading.
The first change d5d2c51 replaced spin_lock_irqsave() with local_lock_irqsave() in KCOV to ensure compatibility with PREEMPT_RT. This avoided using a potentially sleeping lock with interrupts disabled. At the time, as Sebastian noted:
"There is no compelling reason to change the lock type to raw_spin_lock_t... Changing it would require to move memory allocation and deallocation outside of the locked section."
However, the situation changed after another patch 8fea0c8 converted the USB HCD tasklet to a BH workqueue. As a result, usb_giveback_urb_bh() began running with interrupts enabled, and the KCOV remote coverage collection section in this path became re-entrant. To prevent nested coverage sections — which KCOV doesn’t support — kcov_remote_start_usb_softirq() was updated to explicitly disable interrupts during coverage collection f85d39d.
This combination — using a local_lock (which can sleep on RT) alongside local_irq_save() — inadvertently created a scenario where a sleeping lock was acquired in atomic context, triggering a kernel BUG on PREEMPT_RT.
So while the original KCOV locking change didn't require raw spinlocks at the time, it became effectively incompatible with the USB softirq use case once that path began relying on interrupt disabling for correctness. In this sense, the "no compelling reason" eventually turned into a "necessary compromise."
To clarify: this patch series doesn't revert the previous change entirely. It keeps the local_lock behavior for task context (where it's safe and appropriate), but ensures atomic safety in interrupt/softirq contexts by using raw spinlocks selectively where needed.
This should be fully documented. I assume this will not be a problem as kcov is more for debugging and should not be enabled in production.
-- Steve
Thanks again for raising this — I’ll make sure the changelog documents this interaction more clearly.
Best regards, Yunseong Kim