From: Long Li longli@microsoft.com
After napi_complete_done() is called, another NAPI may be running on another CPU and ring the doorbell before the current CPU does. When combined with unnecessary rings when there is no need to ARM the CQ, this triggers error paths in the hardware.
Fix this by always ring the doorbell in sequence and avoid unnecessary rings.
Cc: stable@vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Long Li longli@microsoft.com
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 24 ++++++++++++-------
 include/net/mana/mana.h                       |  1 +
 2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d2f07e179e86..7d08e23c6749 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1788,7 +1788,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
 static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
 {
 	struct mana_cq *cq = context;
-	u8 arm_bit;
 	int w;
 
 	WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1799,16 +1798,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
 		mana_poll_tx_cq(cq);
 
 	w = cq->work_done;
-
-	if (w < cq->budget &&
-	    napi_complete_done(&cq->napi, w)) {
-		arm_bit = SET_ARM_BIT;
-	} else {
-		arm_bit = 0;
+	cq->work_done_since_doorbell += w;
+
+	if (w < cq->budget) {
+		mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+		cq->work_done_since_doorbell = 0;
+		napi_complete_done(&cq->napi, w);
+	} else if (cq->work_done_since_doorbell >
+		   cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+		/* MANA hardware requires at least one doorbell ring every 8
+		 * wraparounds of CQ even there is no need to ARM. This driver
+		 * rings the doorbell as soon as we have execceded 4
+		 * wraparounds.
+		 */
+		mana_gd_ring_cq(gdma_queue, 0);
+		cq->work_done_since_doorbell = 0;
 	}
 
-	mana_gd_ring_cq(gdma_queue, arm_bit);
-
 	return w;
 }
 
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
 	/* NAPI data */
 	struct napi_struct napi;
 	int work_done;
+	int work_done_since_doorbell;
 	int budget;
 };
 
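For readers following the thread, here is a minimal standalone sketch of the decision the patched mana_cq_handler() makes at the end of each poll. It is a model only, not driver code: `struct cq_model`, `enum doorbell_action` and `cq_model_poll_done()` are hypothetical names, and `cqe_count` stands in for `cq->gdma_cq->queue_size / COMP_ENTRY_SIZE`. It assumes the NAPI contract that a poll never reports more work than its budget.

```c
#include <assert.h>

/* Hypothetical model of what the handler does with the doorbell. */
enum doorbell_action {
	DB_NONE,        /* no doorbell ring in this poll */
	DB_RING_ARM,    /* ring with SET_ARM_BIT, then complete NAPI */
	DB_RING_NO_ARM  /* ring without arming, NAPI keeps polling */
};

/* Hypothetical stand-in for the fields of struct mana_cq used here. */
struct cq_model {
	int budget;                   /* NAPI budget for one poll */
	int cqe_count;                /* queue_size / COMP_ENTRY_SIZE */
	int work_done_since_doorbell; /* work accumulated since last ring */
};

/* Mirrors the branch structure added by the patch for one finished poll. */
static enum doorbell_action cq_model_poll_done(struct cq_model *cq, int w)
{
	assert(w >= 0 && w <= cq->budget);

	cq->work_done_since_doorbell += w;

	if (w < cq->budget) {
		/* Ring and arm first; the driver calls napi_complete_done()
		 * only after this, so no other poll can ring ahead of it.
		 */
		cq->work_done_since_doorbell = 0;
		return DB_RING_ARM;
	}

	if (cq->work_done_since_doorbell > cq->cqe_count * 4) {
		/* More than 4 wraparounds of work without a ring. */
		cq->work_done_since_doorbell = 0;
		return DB_RING_NO_ARM;
	}

	return DB_NONE;
}
```

With a budget of 64 and a 32-entry CQ in this model, two full-budget polls return DB_NONE and the third returns DB_RING_NO_ARM, because 192 units of work exceed the threshold of 128.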
-----Original Message-----
From: longli@linuxonhyperv.com longli@linuxonhyperv.com
Sent: Monday, August 5, 2024 7:38 PM
To: KY Srinivasan kys@microsoft.com; Haiyang Zhang haiyangz@microsoft.com; Wei Liu wei.liu@kernel.org; Dexuan Cui decui@microsoft.com; David S. Miller davem@davemloft.net; Eric Dumazet edumazet@google.com; Jakub Kicinski kuba@kernel.org; Paolo Abeni pabeni@redhat.com; Shradha Gupta shradhagupta@linux.microsoft.com; Simon Horman horms@kernel.org; Konstantin Taranov kotaranov@microsoft.com; Souradeep Chakrabarti schakrabarti@linux.microsoft.com; Erick Archer erick.archer@outlook.com; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org
Cc: Long Li longli@microsoft.com; stable@vger.kernel.org
Subject: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
From: Long Li longli@microsoft.com
After napi_complete_done() is called, another NAPI may be running on another CPU and ring the doorbell before the current CPU does. When combined with unnecessary rings when there is no need to ARM the CQ, this triggers error paths in the hardware.
Fix this by always ring the doorbell in sequence and avoid unnecessary rings.
Cc: stable@vger.kernel.org Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") Signed-off-by: Long Li longli@microsoft.com
Reviewed-by: Haiyang Zhang haiyangz@microsoft.com
Thank you.
On 2024/8/6 7:38, longli@linuxonhyperv.com wrote:
> From: Long Li longli@microsoft.com
>
> After napi_complete_done() is called, another NAPI may be running on another CPU and ring the doorbell before the current CPU does. When combined with unnecessary rings when there is no need to ARM the CQ, this triggers error paths in the hardware.
>
> Fix this by always ring the doorbell in sequence and avoid unnecessary rings.

Trivial problem^_^

s/ring/ringing ?

Zhu Yanjun
Subject: Re: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
> On 2024/8/6 7:38, longli@linuxonhyperv.com wrote:
> > From: Long Li longli@microsoft.com
> >
> > After napi_complete_done() is called, another NAPI may be running on another CPU and ring the doorbell before the current CPU does. When combined with unnecessary rings when there is no need to ARM the CQ, this triggers error paths in the hardware.
> >
> > Fix this by always ring the doorbell in sequence and avoid unnecessary rings.
>
> Trivial problem^_^
>
> s/ring/ringing ?
>
> Zhu Yanjun
I'm sending v2 to fix this.
Thanks,
Long
> From: longli@linuxonhyperv.com longli@linuxonhyperv.com
> Sent: Monday, August 5, 2024 4:38 PM
> [...]
> After napi_complete_done() is called, another NAPI may be running on
> another CPU and ring the doorbell before the current CPU does. When

Can you please share more details about "another NAPI"? Is it about busy_poll?

> combined with unnecessary rings when there is no need to ARM the CQ,
> this triggers error paths in the hardware.
>
> Fix this by always ring the doorbell in sequence and avoid unnecessary rings.
I'm not sure what "error paths in the hardware" means. It's better to describe the user-visible consequence.
Maybe this is clearer:
When there is no need to arm the CQ from NAPI's perspective, the driver must not combine "too many" arming operations due to a MANA hardware requirement: the driver must ring the doorbell at least once within every 8 wraparounds of the CQ, otherwise "XXX" would happen. //Dexuan: I don't know what the "XXX" is
Add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is armed within 4 wraparounds of the CQ. //Dexuan: why not 8 or 7?
> +	if (w < cq->budget) {
> +		mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> +		cq->work_done_since_doorbell = 0;
> +		napi_complete_done(&cq->napi, w);
> +	} else if (cq->work_done_since_doorbell >
> +		   cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
> +		/* MANA hardware requires at least one doorbell ring every 8

s/ring every 8/arming within every 8/ ?

> +		 * wraparounds of CQ even there is no need to ARM. This driver

s/ARM/arming/ ? s/even/even if/ ?
Thanks, Dexuan
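To make the numbers in the question above concrete, the sketch below works through the threshold arithmetic from the patch. It does not explain why the author chose 4; it only shows what the expression evaluates to and how far the counter can run before the unarmed ring happens. The helper names are hypothetical, the value 64 for COMP_ENTRY_SIZE and the 4096-byte queue size are assumptions for illustration, and it assumes one unit of reported work corresponds to one CQ entry.

```c
#include <assert.h>

#define COMP_ENTRY_SIZE 64 /* assumed completion entry size in bytes */

/* Threshold used by the patch: work equal to 4 wraparounds of the CQ.
 * The division is evaluated first, so the result is entries * 4.
 */
static unsigned int doorbell_threshold(unsigned int queue_size)
{
	assert(queue_size >= COMP_ENTRY_SIZE);
	return queue_size / COMP_ENTRY_SIZE * 4;
}

/* The check runs once per poll and uses '>', so the counter can be as
 * high as the threshold before a full-budget poll is added to it.
 */
static unsigned int max_work_before_ring(unsigned int queue_size,
					 unsigned int budget)
{
	return doorbell_threshold(queue_size) + budget;
}
```

Under these assumptions a 4096-byte CQ holds 64 entries, the threshold is 256, and with a budget of 64 the counter reaches at most 320 before the ring, which is below the 512 that 8 wraparounds would represent.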
Subject: RE: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
> > From: longli@linuxonhyperv.com longli@linuxonhyperv.com
> > Sent: Monday, August 5, 2024 4:38 PM
> > [...]
> > After napi_complete_done() is called, another NAPI may be running on
> > another CPU and ring the doorbell before the current CPU does. When
>
> Can you please share more details about "another NAPI"? Is it about busy_poll?
>
> > combined with unnecessary rings when there is no need to ARM the CQ,
> > this triggers error paths in the hardware.
> >
> > Fix this by always ring the doorbell in sequence and avoid unnecessary rings.
>
> I'm not sure what "error paths in the hardware" means. It's better to describe the user-visible consequence.
>
> Maybe this is clearer:
>
> When there is no need to arm the CQ from NAPI's perspective, the driver must not combine "too many" arming operations due to a MANA hardware requirement: the driver must ring the doorbell at least once within every 8 wraparounds of the CQ, otherwise "XXX" would happen. //Dexuan: I don't know what the "XXX" is
>
> Add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is armed within 4 wraparounds of the CQ. //Dexuan: why not 8 or 7?
I'm sending v2 to address the details in the comments.
> > +	if (w < cq->budget) {
> > +		mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> > +		cq->work_done_since_doorbell = 0;
> > +		napi_complete_done(&cq->napi, w);
> > +	} else if (cq->work_done_since_doorbell >
> > +		   cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
> > +		/* MANA hardware requires at least one doorbell ring every 8
>
> s/ring every 8/arming within every 8/ ?
>
> > +		 * wraparounds of CQ even there is no need to ARM. This driver
>
> s/ARM/arming/ ? s/even/even if/ ?
Will fix this in v2.
Thanks,
Long
linux-stable-mirror@lists.linaro.org