Hi Sebastian,
I was waiting for your review — thanks!
On 8/9/25 1:33 오전, Sebastian Andrzej Siewior wrote:
On 2025-07-25 20:14:01 [+0000], Yunseong Kim wrote:
When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, following bug is triggered in the ksoftirqd context.
…
This issue was introduced by commit f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq").
However, this creates a conflict on PREEMPT_RT kernels. The local_irq_save() call establishes an atomic context where sleeping is forbidden. Inside this context, kcov_remote_start() is called, which on PREEMPT_RT uses sleeping locks (spinlock_t and local_lock_t are mapped to rt_mutex). This results in a sleeping function called from invalid context.
On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario is already safely handled by the existing local_lock_t and the global kcov_remote_lock within kcov_remote_start(). Therefore, the outer local_irq_save() is not necessary.
This preserves the intended re-entrancy protection for non-RT kernels while resolving the locking violation on PREEMPT_RT kernels.
After making this modification and testing it, syzkaller fuzzing the PREEMPT_RT kernel is now running without stopping on latest announced Real-time Linux.
This looks oddly familiar because I removed the irq-disable bits while adding local-locks.
Commit f85d39dd7ed8 looks wrong not that it shouldn't disable interrupts. The statement in the added comment
| + * 2. Disables interrupts for the duration of the coverage collection section. | + * This allows avoiding nested remote coverage collection sections in the | + * softirq context (a softirq might occur during the execution of a work in | + * the BH workqueue, which runs with in_serving_softirq() > 0).
is wrong. Softirqs are never nesting. While the BH workqueue is running another softirq does not occur. The softirq is raised (again) and will be handled _after_ BH workqueue is done. So this is already serialised.
The issue is __usb_hcd_giveback_urb() always invokes kcov_remote_start_usb_softirq(). __usb_hcd_giveback_urb() itself is invoked from BH context (for the majority of HCDs) and from hardirq context for the root-HUB. This gets us to the scenario that that we are in the give-back path in softirq context and then invoke the function once again in hardirq context.
I have no idea how kcov works but reverting the original commit and avoiding the false nesting due to hardirq context should do the trick, an untested patch follows.
This isn't any different than the tasklet handling that was used before so I am not sure why it is now a problem.
Thank you for the detailed analysis and the patch. Your explanation about the real re-entrancy issue being "softirq vs. hardirq" and the faulty premise in the original commit makes perfect sense.
Could someone maybe test this?
As you requested, I have tested your patch on my setup.
I can check that your patch resolves the issue. I have been running the syzkaller for several hours, and the "sleeping function called from invalid context" bug is no longer triggered.
--- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -1636,7 +1636,6 @@ static void __usb_hcd_giveback_urb(struct urb *urb) struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus); struct usb_anchor *anchor = urb->anchor; int status = urb->unlinked;
- unsigned long flags;
urb->hcpriv = NULL; if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) && @@ -1654,14 +1653,13 @@ static void __usb_hcd_giveback_urb(struct urb *urb) /* pass ownership to the completion handler */ urb->status = status; /*
* Only collect coverage in the softirq context and disable interrupts
* to avoid scenarios with nested remote coverage collection sections
* that KCOV does not support.
* See the comment next to kcov_remote_start_usb_softirq() for details.
* This function can be called in task context inside another remote
* coverage collection section, but kcov doesn't support that kind of
*/* recursion yet. Only collect coverage in softirq context for now.
- flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
- kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum); urb->complete(urb);
- kcov_remote_stop_softirq(flags);
- kcov_remote_stop_softirq();
usb_anchor_resume_wakeups(anchor); atomic_dec(&urb->use_count); diff --git a/include/linux/kcov.h b/include/linux/kcov.h index 75a2fb8b16c32..0143358874b07 100644 --- a/include/linux/kcov.h +++ b/include/linux/kcov.h @@ -57,47 +57,21 @@ static inline void kcov_remote_start_usb(u64 id) /*
- The softirq flavor of kcov_remote_*() functions is introduced as a temporary
- workaround for KCOV's lack of nested remote coverage sections support.
- Adding support is tracked in https://bugzilla.kernel.org/show_bug.cgi?id=210337.
- kcov_remote_start_usb_softirq():
- Only collects coverage when called in the softirq context. This allows
- avoiding nested remote coverage collection sections in the task context.
- For example, USB/IP calls usb_hcd_giveback_urb() in the task context
- within an existing remote coverage collection section. Thus, KCOV should
- not attempt to start collecting coverage within the coverage collection
- section in __usb_hcd_giveback_urb() in this case.
- Disables interrupts for the duration of the coverage collection section.
- This allows avoiding nested remote coverage collection sections in the
- softirq context (a softirq might occur during the execution of a work in
- the BH workqueue, which runs with in_serving_softirq() > 0).
- For example, usb_giveback_urb_bh() runs in the BH workqueue with
- interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
- the middle of its remote coverage collection section, and the interrupt
- handler might invoke __usb_hcd_giveback_urb() again.
- work around for kcov's lack of nested remote coverage sections support in
- task context. Adding support for nested sections is tracked in:
- */
-static inline unsigned long kcov_remote_start_usb_softirq(u64 id) +static inline void kcov_remote_start_usb_softirq(u64 id) {
- unsigned long flags = 0;
- if (in_serving_softirq()) {
local_irq_save(flags);
- if (in_serving_softirq() && !in_hardirq()) kcov_remote_start_usb(id);
- }
- return flags;
} -static inline void kcov_remote_stop_softirq(unsigned long flags) +static inline void kcov_remote_stop_softirq(void) {
- if (in_serving_softirq()) {
- if (in_serving_softirq() && !in_hardirq()) kcov_remote_stop();
local_irq_restore(flags);
- }
} #ifdef CONFIG_64BIT @@ -131,11 +105,8 @@ static inline u64 kcov_common_handle(void) } static inline void kcov_remote_start_common(u64 id) {} static inline void kcov_remote_start_usb(u64 id) {} -static inline unsigned long kcov_remote_start_usb_softirq(u64 id) -{
- return 0;
-} -static inline void kcov_remote_stop_softirq(unsigned long flags) {} +static inline void kcov_remote_start_usb_softirq(u64 id) {} +static inline void kcov_remote_stop_softirq(void) {} #endif /* CONFIG_KCOV */ #endif /* _LINUX_KCOV_H */
I really impressed your "How to Not Break PREEMPT_RT" talk at LPC 22.
Tested-by: Yunseong Kim ysk@kzalloc.com
Thanks,
Yunseong Kim