On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
From: Long Li <longli@microsoft.com>
It's inefficient to ring the doorbell page every time a WQE is posted to the receive queue. Excessive MMIO writes result in the CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing the doorbell page so that it runs after all WQEs have been posted to the receive queue during a callback from napi_poll().
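The batching idea described above can be sketched as a small standalone simulation. This is not the MANA driver code: the struct sim_rxq type and all sim_* helpers are hypothetical names made up for illustration, and the doorbell "write" is only a counter standing in for an MMIO write.

```c
#include <stdint.h>

/* Hypothetical stand-in for a receive queue; not the driver's real types. */
struct sim_rxq {
	uint32_t head;            /* producer index, advanced per posted WQE */
	uint32_t doorbell_value;  /* last value "written" to the doorbell */
	unsigned doorbell_writes; /* number of simulated MMIO writes */
};

/* Simulated doorbell ring; a real driver would issue an MMIO write here. */
static void sim_ring_doorbell(struct sim_rxq *q)
{
	q->doorbell_value = q->head;
	q->doorbell_writes++;
}

/* Simulated posting of one receive WQE. */
static void sim_post_wqe(struct sim_rxq *q)
{
	q->head++;
}

/* Old pattern: ring the doorbell after every posted WQE. */
static void sim_refill_per_wqe(struct sim_rxq *q, unsigned n)
{
	for (unsigned i = 0; i < n; i++) {
		sim_post_wqe(q);
		sim_ring_doorbell(q);
	}
}

/* New pattern: post all WQEs first, then ring the doorbell once. */
static void sim_refill_batched(struct sim_rxq *q, unsigned n)
{
	for (unsigned i = 0; i < n; i++)
		sim_post_wqe(q);
	if (n > 0)
		sim_ring_doorbell(q);
}
```

In this sketch, refilling 64 WQEs costs 64 simulated doorbell writes with sim_refill_per_wqe() and a single write with sim_refill_batched(), while the final doorbell value is the same in both cases.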
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing the doorbell to set WQE_COUNT to 0 for the Receive Queue. The hardware specification says that it should be set to 0. Although the hardware currently doesn't enforce this check, it may do so in future releases.
Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
In any case you must avoid empty lines in the tag area.
If you really intend targeting the -net tree, please repost fixing the above and explicitly specifying the target tree in the subj prefix.
thanks!
Paolo