Hi,
I would like to request that commit b441cf3f8c4b ("xfrm: delete x->tunnel as we delete x") be backported to all LTS kernels. The patch fixes a use-after-free, but it has not been backported to any LTS branch, so those kernels are still affected.
As the patch description explains, one way to trigger the bug is:
A tunnel packet is received (e.g., in ip_local_deliver()) whose outer layer is IPComp and whose inner layer consists of fragmented packets. While the outer packet is processed, it goes through xfrm_input(), which takes a reference on the IPComp xfrm_state. The packet is then re-injected into the network stack via gro_cells_receive() and placed in the reassembly queue. When the netns exits and cleanup_net() runs, ipv4_frags_exit_net() is called before xfrm_net_exit(), but because the work is scheduled asynchronously, fqdir_free_work() may execute after xfrm_state_fini().
In xfrm_state_fini(), xfrm_state_flush() deletes the xfrm_state for IPPROTO_COMP and drops a reference to it, but does not delete the xfrm_state for IPPROTO_IPIP. The skb in the reassembly queue still holds the last reference to the IPPROTO_COMP xfrm_state, so that state is not destroyed yet. Only when that skb is freed is the IPPROTO_COMP xfrm_state fully destroyed, at which point ipcomp_destroy() is called to delete the IPPROTO_IPIP xfrm_state. By then, the hash tables (net->xfrm.state_byxxx) have already been kfreed in xfrm_state_fini(), so the deletion is a use-after-free.
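To make the ordering easier to follow, here is a small userspace model of the sequence above. It is only a sketch, not kernel code: every toy_* name is hypothetical, the freed hash tables are represented by a flag rather than by real freed memory, and the delete_tunnel_early switch only loosely models what the commit title describes (deleting x->tunnel at the time x is deleted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; these are NOT the kernel's structures. */
struct toy_net {
    bool tables_freed; /* models kfree() of net->xfrm.state_byxxx */
    int  uaf_hits;     /* unhash attempts made after the tables were freed */
};

struct toy_state {
    struct toy_net   *net;
    int               refcnt;
    bool              hashed;
    struct toy_state *tunnel; /* models x->tunnel (the IPIP state) */
};

/* Remove a state from the hash tables; record a hit if they are gone. */
static void toy_unhash(struct toy_state *x)
{
    if (!x->hashed)
        return;
    if (x->net->tables_freed)
        x->net->uaf_hits++; /* the kernel would touch freed memory here */
    x->hashed = false;
}

static void toy_delete_tunnel(struct toy_state *x)
{
    if (x->tunnel) {
        toy_unhash(x->tunnel);
        x->tunnel = NULL;
    }
}

/* Dropping the last reference runs the destroy path (ipcomp_destroy-like). */
static void toy_put(struct toy_state *x)
{
    assert(x->refcnt > 0);
    if (--x->refcnt == 0)
        toy_delete_tunnel(x);
}

/* Models the flush deleting one state and dropping the table's reference. */
static void toy_delete(struct toy_state *x, bool delete_tunnel_early)
{
    toy_unhash(x);
    if (delete_tunnel_early)
        toy_delete_tunnel(x); /* tunnel goes away while tables still exist */
    toy_put(x);
}

/* Replays the teardown order from the report; returns the number of hits. */
static int toy_teardown(bool delete_tunnel_early)
{
    struct toy_net net = { false, 0 };
    struct toy_state ipip = { &net, 1, true, NULL };
    /* refcnt 2: one for the hash tables, one held by the queued skb */
    struct toy_state comp = { &net, 2, true, &ipip };

    toy_delete(&comp, delete_tunnel_early); /* xfrm_state_flush() step */
    net.tables_freed = true;                /* rest of xfrm_state_fini() */
    toy_put(&comp);                         /* late fqdir_free_work(): skb freed */

    return net.uaf_hits;
}
```

In this model, toy_teardown(false) reports one access to the freed tables, while toy_teardown(true) reports none, because the tunnel state is unhashed before the tables are released.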
The bug has existed since kernel v2.6.29, so the patch should be backported to all LTS kernels.
Thanks,
Slavin Liu