On Fri, Feb 9, 2024 at 6:05 PM Toke Høiland-Jørgensen toke@redhat.com wrote:
Benjamin Tissoires benjamin.tissoires@redhat.com writes:
On Fri, Feb 9, 2024 at 4:42 PM Toke Høiland-Jørgensen toke@redhat.com wrote:
Benjamin Tissoires bentiss@kernel.org writes:
[Putting this as a RFC because I'm pretty sure I'm not doing the things correctly at the BPF level.] [Also using bpf-next as the base tree as there will be conflicting changes otherwise]
Ideally I'd like to have something similar to bpf_timers, but not in soft IRQ context. So I'm emulating this with a sleepable bpf_tail_call() (see "HID: bpf: allow to defer work in a delayed workqueue").
Why implement a new mechanism? Sounds like what you need is essentially the bpf_timer functionality, just running in a different context, right?
Heh, that's exactly why I put in a RFC :)
So yes, the bpf_timer approach is cleaner, but I need it in a workqueue, as a hrtimer in a softIRQ would prevent me to kzalloc and wait for the device.
Right, makes sense.
So why not just add a flag to the timer setup that controls the callback context? I've been toying with something similar for restarting XDP TX for my queueing patch series (though I'm not sure if this will actually end up being needed in the end):
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/commit/?h=xdp...
Oh, nice. Good idea. But would it be OK to have a "timer-like" where it actually defers the job in a workqueue instead of using an hrtimer?
That's conceptually still a timer, though, isn't it? I.e., it's a mechanism whereby you specify a callback and a delay, and bpf_timer ensures that your callback is called after that delay. IMO it's totally congruent with that API to be able to specify a different execution context as part of the timer setup.
Yep :)
There is still a problem I wasn't able to fix over the week end and today. How can I tell the verifier that the callback is sleepable, when the tracing function that called the timer_start() function is not? (more on that below).
As for how to implement it, I suspect the easiest may be something similar to what the patch I linked above does: keep the hrtimer, and just have a different (kernel) callback function when the timer fires which does an immediate schedule_work() (without the _delayed) and then runs the BPF callback in that workqueue. I.e., keep the delay handling the way the existing bpf_timer implementation does it, and just add an indirection to start the workqueue in the kernel dispatch code.
Sounds good, especially given that's roughly how the delayed_timers are implemented.
I thought I would have to rewrite the entire bpf_timer approach without the softIRQ, but if I can just add a new flag, that will make things way simpler for me.
IMO that would be fine. You may want to wait for the maintainers to chime in before going down this route, though :)
This however raises another issue if I were to use the bpf_timers: now the HID-BPF kfuncs will not be available as they are only available to tracing prog types. And when I tried to call them from a bpf_timer (in softIRQ) they were not available.
IIUC, the bpf_timer callback is just a function (subprog) from the verifier PoV, so it is verified as whatever program type is creating the timer. So in other words, as long as you setup the timer from inside a tracing prog type, you should have access to all the same kfuncs, I think?
Yep, you are correct. But as mentioned above, I am now in trouble with the sleepable state: - I need to call timer_start() from a non sleepable tracing function (I'm in hard IRQ when dealing with a physical device) - but then, ideally, the callback function needs to be tagged as a sleepable one, so I can export my kfuncs which are doing kzalloc and device IO as such.
However, I can not really teach the BPF verifier to do so: - it seems to check for the callback first when it is loaded, and there is no SEC() equivalent for static functions - libbpf doesn't have access to the callback as a prog as it has to be a static function, and thus isn't exported as a full-blown prog. - the verifier only checks for the callback when dealing with BPF_FUNC_timer_set_callback, which doesn't have a "flag" argument (though the validation of the callback has already been done while checking it first, so we are already too late to change the sleppable state of the callback)
Right now, the only OK-ish version I have is declaring the kfunc as non-sleepable, but checking that we are in a different context than the IRQ of the initial event. This way, I am not crashing if this function is called from the initial IRQ, but will still crash if used outside of the hid context.
This is not satisfactory, but I feel like it's going to be hard to teach the verifier that the callback function is sleepable in that case (maybe we could suffix the callback name, like we do for arguments, but this is not very clean either).
Cheers, Benjamin