Hi!
[ Upstream commit be6cef69ba570ebb327eba1ef6438f7af49aaf86 ]
On embedded environments with hard memory limits it is a normal, although rare, case that an skb can't be allocated on the rx path under high traffic.
In such OOM cases napi_complete_done() was not called, so the napi object was left in an invalid state, as if it were still "scheduled". The kernel does not reschedule the poll of that napi object.
Consequently, the kernel cannot remove that object, and the system hangs on `ifconfig down` waiting for a poll.
We fix this by gracefully closing the napi poll routine with a correct invocation of napi_complete_done().
This was reproduced by artificially failing the skb allocation to simulate an "out of memory" error and checking that traffic does not get stuck.
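To illustrate the failure mode described above, here is a minimal userspace sketch in C. It is not the driver code: struct toy_napi, toy_fail_alloc, toy_rx_clean() and both poll functions are hypothetical names made up for this illustration, and the "scheduled" flag only stands in for the real NAPI state that napi_complete_done() would clear.

```c
#include <stdbool.h>

/* Toy stand-in for struct napi_struct: only tracks a "scheduled" bit. */
struct toy_napi {
	bool scheduled;
};

/* Hypothetical fault injection switch: when set, every rx allocation fails. */
static bool toy_fail_alloc;

/* Models rx cleaning; returns packets handled, or -1 on allocation failure. */
static int toy_rx_clean(int budget)
{
	int done = 0;

	for (int i = 0; i < budget / 2; i++) {
		if (toy_fail_alloc)
			return -1;	/* models a failed skb allocation */
		done++;
	}
	return done;
}

/* Buggy shape: the error path returns without running the completion step. */
static int toy_poll_buggy(struct toy_napi *napi, int budget)
{
	int work_done = toy_rx_clean(budget);

	if (work_done < 0)
		return 0;		/* napi->scheduled stays true */

	if (work_done < budget)
		napi->scheduled = false;	/* models napi_complete_done() */
	return work_done;
}

/* Fixed shape: the error path falls through to the completion step. */
static int toy_poll_fixed(struct toy_napi *napi, int budget)
{
	int work_done = toy_rx_clean(budget);

	if (work_done < 0)
		work_done = 0;		/* nothing received, but keep going */

	if (work_done < budget)
		napi->scheduled = false;	/* models napi_complete_done() */
	return work_done;
}
```

With toy_fail_alloc set, the buggy variant leaves the flag set forever, which in this toy model corresponds to the hang on `ifconfig down`; the fixed variant clears it.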
--- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
@@ -89,6 +89,7 @@ static int aq_vec_poll(struct napi_struct *napi, int budget)
 			}
 		}
+err_exit:
 		if (!was_tx_cleaned)
 			work_done = budget;
This results in some... really "interesting" code that could use some refactoring.
First, "goto err_exit" is now the same as break, since the label sits right after the loop.
Second, if (!self) now sets a variable that is never used. "if (!self) return 0;" would be more readable and would allow for less confusing indentation.
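Roughly, the shape I have in mind is something like the sketch below. It is a userspace toy, not a patch against aq_vec.c: struct toy_vec, toy_ring_clean() and toy_vec_poll() are hypothetical names, and the "completed" field merely stands in for the napi_complete_done() call.

```c
#include <stddef.h>

/* Toy stand-in for the driver's vector object. */
struct toy_vec {
	int rings;	/* number of rings to poll */
	int fail_at;	/* index of the ring whose rx clean fails, or -1 */
	int completed;	/* set once the completion step has run */
};

/* Models cleaning one ring: one packet handled, or -1 on failure. */
static int toy_ring_clean(const struct toy_vec *v, int i)
{
	return i == v->fail_at ? -1 : 1;
}

static int toy_vec_poll(struct toy_vec *self, int budget)
{
	int work_done = 0;

	/* Early return instead of setting an unused error variable. */
	if (!self)
		return 0;

	for (int i = 0; i < self->rings; i++) {
		int n = toy_ring_clean(self, i);

		if (n < 0)
			break;	/* same effect as a goto to a label right after the loop */
		work_done += n;
	}

	/* Completion runs on both the normal and the error path. */
	if (work_done < budget)
		self->completed = 1;	/* stands in for napi_complete_done() */

	return work_done;
}
```

The early return removes one level of indentation from the whole body, and the break makes it obvious that the error path still reaches the completion step.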
Best regards, Pavel