On 9/21/18 6:33 AM, Eric Dumazet wrote:
On 09/21/2018 12:17 AM, Song Liu wrote:
On Sep 20, 2018, at 4:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
On 09/20/2018 04:43 PM, Song Liu wrote:
I tried to totally skip ndo_poll_controller() here. It did avoid hitting the issue. However, netpoll will drop (fail to send) more packets.
Why is it failing?
If you are under high memory pressure and absolutely need memory to send netpoll packets, then maybe you want to grab all NAPI contexts as a way to prevent other cpus from feeding incoming packets to the host and adding more memory pressure ;)
I did the test with Eric's latest patch (and with ndo_poll_controller disabled in the driver). The result didn't show a significant increase in dropped packets. I guess the packet drops in my earlier test were caused by some other changes I mixed in there.
So I think this patch does fix the issue. Thanks Eric!
Great, this is awesome.
I will prepare a patch series for net tree.
The core infrastructure is just better at being able to drain TX completions without risking stealing the NAPI context forever.
Should we remove ndo_poll_controller then? My understanding is that the patch helps by not letting drivers do napi_schedule() for all queues onto this_cpu, right? But most of the drivers do exactly that in their ndo_poll_controller implementations. That means most of the drivers will experience this nasty behavior.
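
To make the concern concrete, here is a rough user-space sketch of the pattern I mean. It is only a toy model, not kernel code: toy_napi, toy_dev, toy_poll_controller() and the queue count are all made-up names for illustration, and owner_cpu merely stands in for "this CPU now has to run the queue's poll".

```c
#include <stdio.h>

/* Toy model only: none of these names are real kernel APIs. */
#define TOY_NUM_QUEUES 8

struct toy_napi {
	int owner_cpu;	/* CPU that has this queue's poll pending, -1 if idle */
};

struct toy_dev {
	struct toy_napi queues[TOY_NUM_QUEUES];
};

/* Mark every queue as idle. */
void toy_dev_init(struct toy_dev *dev)
{
	for (int i = 0; i < TOY_NUM_QUEUES; i++)
		dev->queues[i].owner_cpu = -1;
}

/*
 * Models a poll-controller hook that schedules every idle queue
 * on the calling CPU, regardless of where it would normally run.
 */
void toy_poll_controller(struct toy_dev *dev, int this_cpu)
{
	for (int i = 0; i < TOY_NUM_QUEUES; i++) {
		if (dev->queues[i].owner_cpu == -1)
			dev->queues[i].owner_cpu = this_cpu;
	}
}

/* Count how many queues a given CPU is now responsible for polling. */
int toy_queues_owned_by(const struct toy_dev *dev, int cpu)
{
	int count = 0;

	for (int i = 0; i < TOY_NUM_QUEUES; i++) {
		if (dev->queues[i].owner_cpu == cpu)
			count++;
	}
	return count;
}
```

In this model, a single toy_poll_controller(&dev, cpu) call on an idle device leaves that one CPU owning all TOY_NUM_QUEUES queues, which is the pile-up I am worried about for drivers that keep such a hook.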