On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires benjamin.tissoires@redhat.com wrote:
On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen toke@redhat.com wrote:
Benjamin Tissoires benjamin.tissoires@redhat.com writes:
[...]
IIUC, the bpf_timer callback is just a function (subprog) from the verifier PoV, so it is verified as whatever program type is creating the timer. So in other words, as long as you setup the timer from inside a tracing prog type, you should have access to all the same kfuncs, I think?
Yep, you are correct. But as mentioned above, I am now in trouble with the sleepable state:
- I need to call timer_start() from a non sleepable tracing function
(I'm in hard IRQ when dealing with a physical device)
- but then, ideally, the callback function needs to be tagged as a
sleepable one, so I can export my kfuncs which are doing kzalloc and device IO as such.
However, I can not really teach the BPF verifier to do so:
- it seems to check for the callback first when it is loaded, and
there is no SEC() equivalent for static functions
- libbpf doesn't have access to the callback as a prog as it has to be
a static function, and thus isn't exported as a full-blown prog.
- the verifier only checks for the callback when dealing with
BPF_FUNC_timer_set_callback, which doesn't have a "flag" argument (though the validation of the callback has already been done while checking it first, so we are already too late to change the sleppable state of the callback)
Right now, the only OK-ish version I have is declaring the kfunc as non-sleepable, but checking that we are in a different context than the IRQ of the initial event. This way, I am not crashing if this function is called from the initial IRQ, but will still crash if used outside of the hid context.
This is not satisfactory, but I feel like it's going to be hard to teach the verifier that the callback function is sleepable in that case (maybe we could suffix the callback name, like we do for arguments, but this is not very clean either).
The callback is only set once when the timer is first setup; I *think* it works to do the setup (bpf_timer_init() and bpf_timer_set_callback()) in the context you need (from a sleepable prog), but do the arming (bpf_timer_start()) from a different program that is not itself sleepable?
Genius! It works, and I can just keep having them declared as a syscall kfunc, not as a tracing kfunc.
But isn't this an issue outside of my use case? I mean, if the callback is assuming the environment for when it is set up but can be called from any context there seems to be a problem when 2 contexts are not equivalent, no?
I agree that workqueue delegation fits into the bpf_timer concept and a lot of code can and should be shared. All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :) Too bad, bpf_timer_set_callback() doesn't have a flag argument, so we need a new kfunc to set a sleepable callback. Maybe bpf_timer_set_sleepable_cb() ? The verifier will set is_async_cb = true for it (like it does for regular cb-s). And since prog->aux->sleepable is kinda "global" we need another per subprog flag: bool is_sleepable: 1;
We can factor out a check "if (prog->aux->sleepable)" into a helper that will check that "global" flag and another env->cur_state->in_sleepable flag that will work similar to active_rcu_lock. Once the verifier starts processing subprog->is_sleepable it will set cur_state->in_sleepable = true; to make all subprogs called from that cb to be recognized as sleepable too.
A bit of a challenge is what to do with global subprogs, since they're verified lazily. They can be called from sleepable and non-sleepable contex. Should be solvable.
Overall I think this feature is needed urgently, so if you don't have cycles to work on this soon, I can prioritize it right after bpf_arena work.