Huge thanks to everyone for the feedback!
While working on earlier patches, running syzkaller on PREEMPT_RT uncovered numerous sleep-in-atomic-context bugs and other synchronization issues unique to that environment. This highlighted the need to address these problems.
On 7/26/25 8:59 오후, Thomas Gleixner wrote:
On Sat, Jul 26 2025 at 09:59, Greg Kroah-Hartman wrote:
On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
static void __usb_hcd_giveback_urb(struct urb *urb) { (...snipped...) kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) { if (in_serving_softirq()) { local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y kcov_remote_start_usb(id) { kcov_remote_start(id) { kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) { (...snipped...) local_lock_irqsave(&kcov_percpu_data.lock, flags) { __local_lock_irqsave(lock, flags) { #ifndef CONFIG_PREEMPT_RT https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... #else https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_i... // not calling local_irq_save(flags) #endif
Right, it does not invoke local_irq_save(flags), but it takes the underlying lock, which means it prevents reentrance.
Ok, but then how does the big comment section for kcov_remote_start_usb_softirq() work, where it explicitly states:
- Disables interrupts for the duration of the coverage collection section.
- This allows avoiding nested remote coverage collection sections in the
- softirq context (a softirq might occur during the execution of a work in
- the BH workqueue, which runs with in_serving_softirq() > 0).
- For example, usb_giveback_urb_bh() runs in the BH workqueue with
- interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
- the middle of its remote coverage collection section, and the interrupt
- handler might invoke __usb_hcd_giveback_urb() again.
You are removing half of this function entirely, which feels very wrong to me as any sort of solution, as you have just said that all of that documentation entry is now not needed.
I'm not so sure because kcov_percpu_data.lock is only held within kcov_remote_start() and kcov_remote_stop(), but the above comment suggests that the whole section needs to be serialized.
Though I'm not a KCOV wizard and might be completely wrong here.
If the whole section is required to be serialized, then this need another local lock in kcov_percpu_data to work correctly on RT.
Thanks,
tglx
After receiving comments from maintainers, I realized that my initial patch set wasn't heading in the right direction.
It seems that the following two patches conflict on PREEMPT_RT kernels:
1. kcov: replace local_irq_save() with a local_lock_t Link: https://github.com/torvalds/linux/commit/d5d2c51f1e5f 2. kcov, usb: disable interrupts in kcov_remote_start_usb_softirq Link: https://github.com/torvalds/linux/commit/f85d39dd7ed8
My current approach involves:
* Removing the existing 'kcov_percpu_data.lock' * Converting 'kcov->lock' and 'kcov_remote_lock' to raw spinlocks * Relocating the kmalloc call for kcov_remote_add() outside kcov_ioctl_locked(), as GFP_ATOMIC allocations can potentially sleep under PREEMPT_RT. : As expected from further testing, keeping the GFP_ATOMIC allocation inside kcov_remote_add() still leads to sleep in atomic context.
This approach allows us to keep Andrey’s patch d5d2c51f1e5f while making modifications as Sebastian suggested in his commit f85d39dd7ed8 message, which I found particularly insightful and full of helpful hints.
The work I'm doing on PATCH v2 involves a number of changes, and I would truly appreciate any critical feedback. I'm always happy to hear insights!
Best regards, Yunseong Kim