On Mon, Apr 8, 2024 at 10:20 AM Benjamin Tissoires benjamin.tissoires@redhat.com wrote:
On Mon, Apr 8, 2024 at 7:08 PM Eduard Zingerman eddyz87@gmail.com wrote:
On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
[...]
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 9234174ccb21..fd05d4358b31 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1096,12 +1096,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
- freeing the timers when inner map is replaced or deleted by user space.
*/ struct bpf_hrtimer {
struct hrtimer timer;
union {
struct hrtimer timer;
struct work_struct work;
}; struct bpf_map *map; struct bpf_prog *prog; void __rcu *callback_fn; void *value;
struct rcu_head rcu;
union {
struct rcu_head rcu;
struct work_struct sync_work;
Nit: I find this name very confusing, the field is used to cancel timer execution, is it a convention to call such things '...sync...'?
};
u64 flags;
};
[...]
+static void bpf_timer_sync_work_cb(struct work_struct *work) +{
struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);
cancel_work_sync(&t->work);
kfree_rcu(t, rcu);
Sorry, I might be wrong, but this looks suspicious. The 'rcu' field of 'bpf_hrtimer' is defined as follows:
struct bpf_hrtimer { ... union { struct rcu_head rcu; struct work_struct sync_work; }; ... };
And for sleepable timers the 'sync_work' field is set as follows:
BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map, u64, flags) { ... INIT_WORK(&t->sync_work, bpf_timer_sync_work_cb); ... }
So, it looks like 'kfree_rcu' would be called for a non-rcu pointer.
That was my initial assumption too, but Alexei told me it was fine. And I think he is correct because kfree_rcu doesn't need the rcu_head to be initialized.
So in the end, we initialize the memory as a work_struct, and when that work kicks in, we reuse that exact same memory as the rcu_head. This is fine because that work will never be reused.
If I understand correctly, this is to save a few bytes as this is a critical struct used in programs with a high rate usage, and every byte counts.
Yes. All correct. Probably makes sense to add a comment before kfree_rcu() line in bpf_timer_sync_work_cb() that kfree_rcu will wait for bpf_timer_cancel() to finish as was done in commit 0281b919e175 ("bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel").
I suspect that's what confused Eduard.
The patch 1 looks great overall.
If we're going to keep this combined bpf_timer_* api for both wq and timer we'd need to add flags compatibility check to bpf_timer_start() too. We can disallow this flag in 'flags' argument and use one from t->flags. Which kinda makes sense to make bpf_timer_start() less verbose. Alternatively we can allow bpf_timer_start() to have it, but then we'd need to check that it is actually passed. Either way the patch needs an extra check in bpf_timer_start(). Just ignoring BPF_F_TIMER_SLEEPABLE in bpf_timer_start() doesn't look right.