This patchset backports 4 important bug fixes for tc filter to 4.14 stable branch. Due to some big changes between 4.14 and 4.15, the backports are not trivial, I have to adjust and fix the conflicts manually.
Thanks to Roland for reporting the kernel warning and testing the patches.
Reported-by: Roland Franke fli4l@franke-prem.de Tested-by: Roland Franke fli4l@franke-prem.de Cc: Jiri Pirko jiri@mellanox.com Cc: Roman Kapl code@rkapl.cz Cc: David S. Miller davem@davemloft.net Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
---
Cong Wang (1): net_sched: get rid of rcu_barrier() in tcf_block_put_ext()
Jiri Pirko (1): net: sched: fix use-after-free in tcf_block_put_ext
Roman Kapl (2): net: sched: fix crash when deleting secondary chains net: sched: crash on blocks with goto chain action
include/net/sch_generic.h | 1 - net/sched/cls_api.c | 48 +++++++++++++++++++++++------------------------ 2 files changed, 23 insertions(+), 26 deletions(-)
From: Roman Kapl code@rkapl.cz
[ Upstream commit d7aa04a5e82b4f254d306926c81eae8df69e5200 ]
If you flush (delete) a filter chain other than chain 0 (such as when deleting the device), the kernel may run into a use-after-free. The chain refcount must not be decremented unless we are sure we are done with the chain.
To reproduce the bug, run: ip link add dtest type dummy tc qdisc add dev dtest ingress tc filter add dev dtest chain 1 parent ffff: flower ip link del dtest
Introduced in: commit f93e1cdcf42c ("net/sched: fix filter flushing"), but unless you have KAsan or luck, you won't notice it until commit 0dadc117ac8b ("cls_flower: use tcf_exts_get_net() before call_rcu()")
Fixes: f93e1cdcf42c ("net/sched: fix filter flushing") Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Roman Kapl code@rkapl.cz Signed-off-by: David S. Miller davem@davemloft.net --- net/sched/cls_api.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index ecbb019efcbd..1451a56a8f93 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -197,14 +197,15 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
static void tcf_chain_flush(struct tcf_chain *chain) { - struct tcf_proto *tp; + struct tcf_proto *tp = rtnl_dereference(chain->filter_chain);
if (chain->p_filter_chain) RCU_INIT_POINTER(*chain->p_filter_chain, NULL); - while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) { + while (tp) { RCU_INIT_POINTER(chain->filter_chain, tp->next); - tcf_chain_put(chain); tcf_proto_destroy(tp); + tp = rtnl_dereference(chain->filter_chain); + tcf_chain_put(chain); } }
From: Roman Kapl code@rkapl.cz
[ Upstream commit a60b3f515d30d0fe8537c64671926879a3548103 ]
tcf_block_put_ext has assumed that all filters (and thus their goto actions) are destroyed in RCU callback and thus can not race with our list iteration. However, that is not true during netns cleanup (see tcf_exts_get_net comment).
Prevent the user after free by holding all chains (except 0, that one is already held). foreach_safe is not enough in this case.
To reproduce, run the following in a netns and then delete the ns: ip link add dtest type dummy tc qdisc add dev dtest ingress tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2
Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()") Signed-off-by: Roman Kapl code@rkapl.cz Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net --- net/sched/cls_api.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 1451a56a8f93..ddae7053b745 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -282,7 +282,8 @@ static void tcf_block_put_final(struct work_struct *work) struct tcf_chain *chain, *tmp;
rtnl_lock(); - /* Only chain 0 should be still here. */ + + /* At this point, all the chains should have refcnt == 1. */ list_for_each_entry_safe(chain, tmp, &block->chain_list, list) tcf_chain_put(chain); rtnl_unlock(); @@ -290,17 +291,23 @@ static void tcf_block_put_final(struct work_struct *work) }
/* XXX: Standalone actions are not allowed to jump to any chain, and bound - * actions should be all removed after flushing. However, filters are now - * destroyed in tc filter workqueue with RTNL lock, they can not race here. + * actions should be all removed after flushing. */ void tcf_block_put(struct tcf_block *block) { - struct tcf_chain *chain, *tmp; + struct tcf_chain *chain;
if (!block) return;
- list_for_each_entry_safe(chain, tmp, &block->chain_list, list) + /* Hold a refcnt for all chains, except 0, so that they don't disappear + * while we are iterating. + */ + list_for_each_entry(chain, &block->chain_list, list) + if (chain->index) + tcf_chain_hold(chain); + + list_for_each_entry(chain, &block->chain_list, list) tcf_chain_flush(chain);
INIT_WORK(&block->work, tcf_block_put_final);
[ Upstream commit efbf78973978b0d25af59bc26c8013 ]
Both Eric and Paolo noticed the rcu_barrier() we use in tcf_block_put_ext() could be a performance bottleneck when we have a lot of tc classes.
Paolo provided the following to demonstrate the issue:
tc qdisc add dev lo root htb for I in `seq 1 1000`; do tc class add dev lo parent 1: classid 1:$I htb rate 100kbit tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb for J in `seq 1 10`; do tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J done done time tc qdisc del dev root
real 0m54.764s user 0m0.023s sys 0m0.000s
The rcu_barrier() there is to ensure we free the block after all chains are gone, that is, to queue tcf_block_put_final() at the tail of workqueue. We can achieve this ordering requirement by refcnt'ing tcf block instead, that is, the tcf block is freed only when the last chain in this block is gone. This also simplifies the code.
Paolo reported after this patch we get:
real 0m0.017s user 0m0.000s sys 0m0.017s
Tested-by: Paolo Abeni pabeni@redhat.com Cc: Eric Dumazet edumazet@google.com Cc: Jiri Pirko jiri@mellanox.com Cc: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net --- include/net/sch_generic.h | 1 - net/sched/cls_api.c | 29 +++++++++-------------------- 2 files changed, 9 insertions(+), 21 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 236bfe5b2ffe..6073e8bae025 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -273,7 +273,6 @@ struct tcf_chain {
struct tcf_block { struct list_head chain_list; - struct work_struct work; };
static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index ddae7053b745..135147c1deed 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -211,8 +211,12 @@ static void tcf_chain_flush(struct tcf_chain *chain)
static void tcf_chain_destroy(struct tcf_chain *chain) { + struct tcf_block *block = chain->block; + list_del(&chain->list); kfree(chain); + if (list_empty(&block->chain_list)) + kfree(block); }
static void tcf_chain_hold(struct tcf_chain *chain) @@ -276,26 +280,12 @@ int tcf_block_get(struct tcf_block **p_block, } EXPORT_SYMBOL(tcf_block_get);
-static void tcf_block_put_final(struct work_struct *work) -{ - struct tcf_block *block = container_of(work, struct tcf_block, work); - struct tcf_chain *chain, *tmp; - - rtnl_lock(); - - /* At this point, all the chains should have refcnt == 1. */ - list_for_each_entry_safe(chain, tmp, &block->chain_list, list) - tcf_chain_put(chain); - rtnl_unlock(); - kfree(block); -} - /* XXX: Standalone actions are not allowed to jump to any chain, and bound * actions should be all removed after flushing. */ void tcf_block_put(struct tcf_block *block) { - struct tcf_chain *chain; + struct tcf_chain *chain, *tmp;
if (!block) return; @@ -310,12 +300,11 @@ void tcf_block_put(struct tcf_block *block) list_for_each_entry(chain, &block->chain_list, list) tcf_chain_flush(chain);
- INIT_WORK(&block->work, tcf_block_put_final); - /* Wait for RCU callbacks to release the reference count and make - * sure their works have been queued before this. + /* At this point, all the chains should have refcnt >= 1. Block will be + * freed after all chains are gone. */ - rcu_barrier(); - tcf_queue_work(&block->work); + list_for_each_entry_safe(chain, tmp, &block->chain_list, list) + tcf_chain_put(chain); } EXPORT_SYMBOL(tcf_block_put);
From: Jiri Pirko jiri@mellanox.com
[ Upstream commit df45bf84e4f5a48f23d4b1a07d21d5 ]
Since the block is freed with last chain being put, once we reach the end of iteration of list_for_each_entry_safe, the block may be already freed. I'm hitting this only by creating and deleting clsact:
[ 202.171952] ================================================================== [ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390 [ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796 [ 202.194508] [ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5 [ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 [ 202.213613] Call Trace: [ 202.216369] dump_stack+0xda/0x169 [ 202.220192] ? dma_virt_map_sg+0x147/0x147 [ 202.224790] ? show_regs_print_info+0x54/0x54 [ 202.229691] ? tcf_chain_destroy+0x1dc/0x250 [ 202.234494] print_address_description+0x83/0x3d0 [ 202.239781] ? tcf_block_put_ext+0x240/0x390 [ 202.244575] kasan_report+0x1ba/0x460 [ 202.248707] ? tcf_block_put_ext+0x240/0x390 [ 202.253518] tcf_block_put_ext+0x240/0x390 [ 202.258117] ? tcf_chain_flush+0x290/0x290 [ 202.262708] ? qdisc_hash_del+0x82/0x1a0 [ 202.267111] ? qdisc_hash_add+0x50/0x50 [ 202.271411] ? __lock_is_held+0x5f/0x1a0 [ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress] [ 202.281323] qdisc_destroy+0xcb/0x240 [ 202.285445] qdisc_graft+0x216/0x7b0 [ 202.289497] tc_get_qdisc+0x260/0x560
Fix this by holding the block also by chain 0 and put chain 0 explicitly, out of the list_for_each_entry_safe loop at the very end of tcf_block_put_ext.
Fixes: efbf78973978 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()") Signed-off-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net --- net/sched/cls_api.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 135147c1deed..934c239cf98d 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -290,21 +290,22 @@ void tcf_block_put(struct tcf_block *block) if (!block) return;
- /* Hold a refcnt for all chains, except 0, so that they don't disappear + /* Hold a refcnt for all chains, so that they don't disappear * while we are iterating. */ list_for_each_entry(chain, &block->chain_list, list) - if (chain->index) - tcf_chain_hold(chain); + tcf_chain_hold(chain);
list_for_each_entry(chain, &block->chain_list, list) tcf_chain_flush(chain);
- /* At this point, all the chains should have refcnt >= 1. Block will be - * freed after all chains are gone. - */ + /* At this point, all the chains should have refcnt >= 1. */ list_for_each_entry_safe(chain, tmp, &block->chain_list, list) tcf_chain_put(chain); + + /* Finally, put chain 0 and allow block to be freed. */ + chain = list_first_entry(&block->chain_list, struct tcf_chain, list); + tcf_chain_put(chain); } EXPORT_SYMBOL(tcf_block_put);
From: Cong Wang xiyou.wangcong@gmail.com Date: Thu, 1 Mar 2018 13:46:35 -0800
This patchset backports 4 important bug fixes for tc filter to 4.14 stable branch. Due to some big changes between 4.14 and 4.15, the backports are not trivial, I have to adjust and fix the conflicts manually.
Thanks to Roland for reporting the kernel warning and testing the patches.
Reported-by: Roland Franke fli4l@franke-prem.de Tested-by: Roland Franke fli4l@franke-prem.de Cc: Jiri Pirko jiri@mellanox.com Cc: Roman Kapl code@rkapl.cz Cc: David S. Miller davem@davemloft.net Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Greg, please queue up this series for -stable.
Thank you!
On Thu, Mar 01, 2018 at 08:56:17PM -0500, David Miller wrote:
From: Cong Wang xiyou.wangcong@gmail.com Date: Thu, 1 Mar 2018 13:46:35 -0800
This patchset backports 4 important bug fixes for tc filter to 4.14 stable branch. Due to some big changes between 4.14 and 4.15, the backports are not trivial, I have to adjust and fix the conflicts manually.
Thanks to Roland for reporting the kernel warning and testing the patches.
Reported-by: Roland Franke fli4l@franke-prem.de Tested-by: Roland Franke fli4l@franke-prem.de Cc: Jiri Pirko jiri@mellanox.com Cc: Roman Kapl code@rkapl.cz Cc: David S. Miller davem@davemloft.net Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Greg, please queue up this series for -stable.
All now applied, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org