On 2018-09-06 10:35 p.m., Al Viro wrote:
On Thu, Sep 06, 2018 at 06:21:09AM -0400, Jamal Hadi Salim wrote:
[..]
Argh... Unfortunately, there's this: in u32_delete() we have

	if (root_ht) {
		if (root_ht->refcnt > 1) {
			*last = false;
			goto ret;
		}
		if (root_ht->refcnt == 1) {
			if (!ht_empty(root_ht)) {
				*last = false;
				goto ret;
			}
		}
	}

and that would need to be updated.
It is not detrimental as you have it right now, but you are right that an adjustment is needed...
Deleting a root directly should not be allowed. But you can flush a whole tp. Consider this:
---
sudo tc qdisc add dev $P ingress
sudo tc filter add dev $P parent ffff: protocol ip prio 10 \
	u32 match ip protocol 1 0xff
---
Which creates root ht 800
You shouldn't be allowed to do this:
---
tc filter delete dev $P parent ffff: protocol ip prio 10 handle 800: u32
---
But you can delete the tp entirely as such:
---
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
---
The latter will go via the destroy() path and flush all filters.
You should also be able to delete individual filters. ex:
---
$tc filter del dev $P parent ffff: prio 10 handle 800:0:800 u32
---
Where the code you are referring to matters is when the last filter is deleted - the caller needs to know, so that it destroys the root.
i.e. you should return last=true when the last filter is deleted, so that the root gets auto-deleted (just like it was auto-created).
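A minimal userspace sketch of that contract, in C. This is not the kernel code: toy_tp, toy_delete() and toy_del_filter() are hypothetical names, and a plain filter count stands in for the real hnode/knode bookkeeping. It only illustrates that the delete step reports "last" and the caller then tears the tp down:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tp: it only counts its filters. */
struct toy_tp {
	int nfilters;	/* filters currently attached */
	bool destroyed;	/* set once the caller has torn the tp down */
};

/* Models ->delete(): remove one filter and report whether it was the last. */
static int toy_delete(struct toy_tp *tp, bool *last)
{
	if (tp->nfilters == 0)
		return -ENOENT;	/* nothing left to delete */
	tp->nfilters--;
	*last = (tp->nfilters == 0);
	return 0;
}

/* Models the caller: on success with last set, destroy the whole tp. */
static int toy_del_filter(struct toy_tp *tp)
{
	bool last = false;
	int err;

	assert(!tp->destroyed);	/* a destroyed tp must not be reused */
	err = toy_delete(tp, &last);
	if (!err && last)
		tp->destroyed = true;	/* root goes away with the tp */
	return err;
}
```

With two filters attached, the first toy_del_filter() call leaves the tp alive and the second one marks it destroyed.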
However, that logics is bloody odd to start with. First of all, root_ht has come from

	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);

and the only place where it's ever modified is

	rcu_assign_pointer(tp->root, root_ht);

in u32_init(), where we'd bloody well checked that root_ht is non-NULL (see

	if (root_ht == NULL)
		return -ENOBUFS;

upstream of that place) and where that assignment is inevitable on the way to returning 0. No matter what, if tp has passed u32_init() it will have non-NULL ->root, forever. And there is no way for tcf_proto to be seen outside of tcf_proto_create() without ->init() having returned 0 - it gets freed before anyone sees it.
Yes, the check for root_ht is not necessary - but the check for the last filter (and testing for last) is needed.
So this 'if (root_ht)' can't be false. What's more, what the hell is the whole thing checking? We are in u32_delete(). It's called (as ->delete()) from tfilter_del_notify(), which is called from tc_del_tfilter(). If we return 0 with *last true, we follow up calling tcf_proto_destroy(). OK, let's look at the logics in there:
- if there are links to root hnode => false
- if there's no links to root hnode and it has knodes => false
(BTW, if we ever get there with root_ht->refcnt < 1, we are obviously screwed)
- if there is a tcf_proto sharing tp->data => false (i.e. any filters
with different prio - don't bother)
- if tp is the only one with reference to tp->data and there are *any*
knodes => false.
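The four checks listed above can be modelled in plain userspace C roughly as follows. This is a sketch, not the kernel source: toy_hnode, toy_common, toy_ht_empty() and toy_is_last() are made-up names, and a per-hnode knode count stands in for walking the hash buckets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified hnode: just a refcount and a knode count. */
struct toy_hnode {
	struct toy_hnode *next;	/* next hnode sharing the same data */
	int refcnt;		/* 1 + number of links pointing at it */
	int nknodes;		/* knodes hashed into this hnode */
};

/* Hypothetical simplified shared data (what tp->data points to). */
struct toy_common {
	struct toy_hnode *hlist;	/* all hnodes, root included */
	int refcnt;			/* number of tps sharing this */
};

static bool toy_ht_empty(const struct toy_hnode *ht)
{
	return ht->nknodes == 0;
}

/* Returns what *last would be set to, following the four checks in order. */
static bool toy_is_last(const struct toy_hnode *root,
			const struct toy_common *tp_c)
{
	const struct toy_hnode *ht;

	assert(root->refcnt >= 1);	/* anything less means we are screwed */
	if (root->refcnt > 1)		/* links to root hnode */
		return false;
	if (!toy_ht_empty(root))	/* no links, but root has knodes */
		return false;
	if (tp_c->refcnt > 1)		/* another tp shares the data */
		return false;
	for (ht = tp_c->hlist; ht; ht = ht->next)
		if (!toy_ht_empty(ht))	/* any knodes anywhere */
			return false;
	return true;
}
```

In this model the second check is already implied by the final loop whenever root sits on hlist, which is part of why the sequence looks redundant.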
Any extra links can come only from knodes in a non-empty hnode. And it's not a common case. Shouldn't the whole thing be
- shared tp->data => false
- any non-empty hnode => false
instead? Perhaps even with the knode counter in tp->data, avoiding any loops in there, as well as the entire ht_empty()...
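A sketch of that two-rule version with a knode counter, as a userspace model in C. The names (toy_common2, toy_knode_added(), toy_knode_removed(), toy_is_last_simple()) and the counter field are assumptions for illustration only; the point is that the "last" test becomes two comparisons with no list walk and no ht_empty():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared data carrying a running knode count. */
struct toy_common2 {
	int refcnt;	/* number of tps sharing this data */
	int knodes;	/* knodes across all hnodes (assumed counter) */
};

/* Would be called wherever a knode is linked into an hnode. */
static void toy_knode_added(struct toy_common2 *c)
{
	c->knodes++;
}

/* Would be called wherever a knode is unlinked from an hnode. */
static void toy_knode_removed(struct toy_common2 *c)
{
	assert(c->knodes > 0);	/* counter must never go negative */
	c->knodes--;
}

/* shared data => false; any non-empty hnode => false; otherwise true. */
static bool toy_is_last_simple(const struct toy_common2 *c)
{
	if (c->refcnt > 1)
		return false;
	return c->knodes == 0;
}
```

The cost is that every knode insertion and removal has to keep the counter in step, so the model only holds if no path links or unlinks a knode without going through the two helpers.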
Now, in the very beginning of u32_delete() we have this:

	struct tc_u_hnode *ht = arg;

	if (ht == NULL)
		goto out;

OK, but the call of ->delete() is

	err = tp->ops->delete(tp, fh, last, extack);

and arg == NULL seen in u32_delete() means fh == NULL in tfilter_del_notify(). Which is called in

	if (!fh) {
		...
	} else {
		bool last;

		err = tfilter_del_notify(net, skb, n, tp, block, q, parent, fh, false, &last, extack);
How can we ever get there with NULL fh?
Try:
---
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
---
tcm handle is 0, so it will hit that code path.
The whole thing makes very little sense; looks like it used to live in u32_destroy() prior to commit 763dbf6328e41 ("net_sched: move the empty tp check from ->destroy() to ->delete()"), but looking at the rationale in that commit... I don't see how it fixes anything - sure, now we remove tcf_proto from the list before calling ->destroy(). Without any RCU delays in between. How could it possibly solve any issues with ->classify() called in parallel with ->destroy()? cls_u32 (at least these days) does try to survive u32_destroy() in parallel with u32_classify(); if any other classifiers do not, they are still broken and that commit has not done anything for them.
Anyway, adjusting 1/7 for that is trivial, but I would really like to understand what that code is doing... Comments?
Refer to above.
cheers, jamal