From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Reviewed-by: Dexuan Cui decui@microsoft.com Signed-off-by: Long Li longli@microsoft.com --- Change log: v2: Check for comp_read > 0 as it might be negative on completion error. Set rq.wqe_cnt to 0 according to BNIC spec.
v3: Add details in the commit on the reason of performance increase and test numbers. Add details in the commit on why rq.wqe_cnt should be set to 0 according to hardware spec. Add "Reviewed-by" from Haiyang and Dexuan.
drivers/net/ethernet/microsoft/mana/gdma_main.c | 5 ++++- drivers/net/ethernet/microsoft/mana/mana_en.c | 10 ++++++++-- 2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c index 8f3f78b68592..3765d3389a9a 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -300,8 +300,11 @@ static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
void mana_gd_wq_ring_doorbell(struct gdma_context *gc, struct gdma_queue *queue) { + /* Hardware Spec specifies that software client should set 0 for + * wqe_cnt for Receive Queues. This value is not used in Send Queues. + */ mana_gd_ring_doorbell(gc, queue->gdma_dev->doorbell, queue->type, - queue->id, queue->head * GDMA_WQE_BU_SIZE, 1); + queue->id, queue->head * GDMA_WQE_BU_SIZE, 0); }
void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit) diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index cd4d5ceb9f2d..1d8abe63fcb8 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1383,8 +1383,8 @@ static void mana_post_pkt_rxq(struct mana_rxq *rxq)
recv_buf_oob = &rxq->rx_oobs[curr_index];
- err = mana_gd_post_and_ring(rxq->gdma_rq, &recv_buf_oob->wqe_req, - &recv_buf_oob->wqe_inf); + err = mana_gd_post_work_request(rxq->gdma_rq, &recv_buf_oob->wqe_req, + &recv_buf_oob->wqe_inf); if (WARN_ON_ONCE(err)) return;
@@ -1654,6 +1654,12 @@ static void mana_poll_rx_cq(struct mana_cq *cq) mana_process_rx_cqe(rxq, cq, &comp[i]); }
+ if (comp_read > 0) { + struct gdma_context *gc = rxq->gdma_rq->gdma_dev->gdma_context; + + mana_gd_wq_ring_doorbell(gc, rxq->gdma_rq); + } + if (rxq->xdp_flush) xdp_do_flush(); }
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
In any case you must avoid empty lines in the tag area.
If you really intend targeting the -net tree, please repost fixing the above and explicitly specifying the target tree in the subj prefix.
thanks!
Paolo
On Thu, 29 Jun 2023 10:42:34 +0200 Paolo Abeni wrote:
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
And please split this cleanup into a separate patch, it doesn't sound like it has to be done as part of the optimization.
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Thu, 29 Jun 2023 10:42:34 +0200 Paolo Abeni wrote:
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
And please split this cleanup into a separate patch, it doesn't sound like it has to be done as part of the optimization.
Will do, thanks.
Long
-----Original Message----- From: Paolo Abeni pabeni@redhat.com Sent: Thursday, June 29, 2023 4:43 AM To: longli@linuxonhyperv.com; Jason Gunthorpe jgg@ziepe.ca; Leon Romanovsky leon@kernel.org; Ajay Sharma sharmaajay@microsoft.com; Dexuan Cui decui@microsoft.com; KY Srinivasan kys@microsoft.com; Haiyang Zhang haiyangz@microsoft.com; Wei Liu wei.liu@kernel.org; David S. Miller davem@davemloft.net; Eric Dumazet edumazet@google.com; Jakub Kicinski kuba@kernel.org Cc: linux-rdma@vger.kernel.org; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Long Li longli@microsoft.com; stable@vger.kernel.org Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
This web page shows the net-next is "open": http://vger.kernel.org/~davem/net-next.html
Is this still the right place to check net-next status?
- Haiyang
On Thu, 29 Jun 2023 16:53:42 +0000 Haiyang Zhang wrote:
This web page shows the net-next is "open": http://vger.kernel.org/~davem/net-next.html
Is this still the right place to check net-next status?
We're working on fixing it. Unfortunately it's a private page and most of the netdev maintainers don't have access to changing it :(
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
This issue is a blocker for usage on 200G physical link. I think it can be categorized as a fix.
In any case you must avoid empty lines in the tag area.
If you really intend targeting the -net tree, please repost fixing the above and explicitly specifying the target tree in the subj prefix.
Will do, thank you.
Long
thanks!
Paolo
On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
This issue is a blocker for usage on 200G physical link. I think it can be categorized as a fix.
Let me ask the question the other way around: is there any specific reason to have this fix into 6.5 and all the way back to 5.13? Especially the latest bit (CC-ing stable) looks at least debatable.
Thanks,
Paolo
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
This issue is a blocker for usage on 200G physical link. I think it can be categorized as a fix.
Let me ask the question the other way around: is there any specific reason to have this fix into 6.5 and all the way back to 5.13? Especially the latest bit (CC-ing stable) looks at least debatable.
There are many deployed Linux distributions with MANA driver on kernel 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target.
Thanks,
Long
On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote:
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote:
From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
This issue is a blocker for usage on 200G physical link. I think it can be categorized as a fix.
Let me ask the question the other way around: is there any specific reason to have this fix into 6.5 and all the way back to 5.13? Especially the latest bit (CC-ing stable) looks at least debatable.
There are many deployed Linux distributions with MANA driver on kernel 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target.
Why can't they be upgraded to get that performance target, and all the other goodness that those kernels have? We don't normally backport new features, right?
thanks,
greg k-h
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote:
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote:
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com
wrote:
From: Long Li longli@microsoft.com
It's inefficient to ring the doorbell page every time a WQE is posted to the received queue. Excessive MMIO writes result in CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance.
Move the code for ringing doorbell page to where after we have posted all WQEs to the receive queue during a callback from napi_poll().
With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues.
Tests showed no regression in network latency benchmarks on single connection.
While we are making changes in this code path, change the code for ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The hardware specification specifies that it should set to 0. Although currently the hardware doesn't enforce the check, in the future releases it may do.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now).
This issue is a blocker for usage on 200G physical link. I think it can be categorized as a fix.
Let me ask the question the other way around: is there any specific reason to have this fix into 6.5 and all the way back to 5.13? Especially the latest bit (CC-ing stable) looks at least debatable.
There are many deployed Linux distributions with MANA driver on kernel
5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target.
Why can't they be upgraded to get that performance target, and all the other goodness that those kernels have? We don't normally backport new features, right?
I think this should be considered as a fix, not a new feature.
MANA is designed to be 200GB full duplex at the start. Due to lack of hardware testing capability at early stage of the project, we could only test 100GB for the Linux driver. When hardware is fully capable of reaching designed spec, this bug in the Linux driver shows up.
Thanks,
Long
thanks,
greg k-h
On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target.
Why can't they be upgraded to get that performance target, and all the other goodness that those kernels have? We don't normally backport new features, right?
I think this should be considered as a fix, not a new feature.
MANA is designed to be 200GB full duplex at the start. Due to lack of hardware testing capability at early stage of the project, we could only test 100GB for the Linux driver. When hardware is fully capable of reaching designed spec, this bug in the Linux driver shows up.
That part we understand.
If I were you I'd try to convince Greg and Paolo that the change is small and significant for user experience. And answer Greg's question why upgrading the kernel past 6.1 is a challenge in your environment.
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target.
Why can't they be upgraded to get that performance target, and all the other goodness that those kernels have? We don't normally backport new features, right?
I think this should be considered as a fix, not a new feature.
MANA is designed to be 200GB full duplex at the start. Due to lack of hardware testing capability at early stage of the project, we could only test 100GB for the Linux driver. When hardware is fully capable of reaching designed spec, this bug in the Linux driver shows up.
That part we understand.
If I were you I'd try to convince Greg and Paolo that the change is small and significant for user experience. And answer Greg's question why upgrading the kernel past 6.1 is a challenge in your environment.
I was under the impression that this patch was considered to be a feature, not a bug fix. I was trying to justify that the "Fixes:" tag was needed.
I apologize for misunderstanding this.
Without this fix, it's not possible to run a typical workload designed for 200Gb physical link speed.
We see a large number of customers and Linux distributions committed on 5.15 and 6.1 kernels. They planned the product cycles and certification processes around these longterm kernel versions. It's difficult for them to upgrade to newer kernel versions.
Thanks,
Long
On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote:
> > Subject: Re: [Patch v3] net: mana: Batch ringing RX > > queue > > doorbell > > on receiving > > packets > > > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote: > > > > > > > > > > > > > > 5.15 and kernel 6.1. (those > > > > > > > > > > > > > > kernels are longterm) > > > > > > > > > > > > > > They need > > > > > > > > > > > > > > this > > > > > > > > > > > > > > fix to achieve the performance > > > > > > > > > > > > > > target. > > > > > > > > > > > > > > > > > > > > Why can't they be upgraded to get that > > > > > > > > > > performance > > > > > > > > > > target, and > > > > > > > > > > all > > > > > > > > > > the other goodness that those kernels > > > > > > > > > > have? We don't > > > > > > > > > > normally > > > > > > > > > > backport new features, right? > > > > > > > > > > > > I think this should be considered as a fix, not > > > > > > a new > > > > > > feature. > > > > > > > > > > > > MANA is designed to be 200GB full duplex at the > > > > > > start. Due > > > > > > to > > > > > > lack of > > > > > > hardware testing capability at early stage of > > > > > > the project, > > > > > > we > > > > > > could > > > > > > only test 100GB for the Linux driver. When > > > > > > hardware is > > > > > > fully > > > > > > capable > > > > > > of reaching designed spec, this bug in the > > > > > > Linux driver > > > > > > shows up. > > > > That part we understand. > > > > If I were you I'd try to convince Greg and Paolo that > > the > > change is > > small and > > significant for user experience. And answer Greg's > > question why > > upgrading the > > kernel past 6.1 is a challenge in your environment.
I was under the impression that this patch was considered to be a feature, not a bug fix. I was trying to justify that the "Fixes:" tag was needed.
I apologize for misunderstanding this.
Without this fix, it's not possible to run a typical workload designed for 200Gb physical link speed.
We see a large number of customers and Linux distributions committed on 5.15 and 6.1 kernels. They planned the product cycles and certification processes around these longterm kernel versions. It's difficult for them to upgrade to newer kernel versions.
I think there are some misunderstanding WRT distros and stable kernels. (Commercial) distros will backport the patch as needed, regardless such patch landing in the 5.15 upstream tree or not. Individual users running their own vanilla 5.15 kernel can't expect performance improvement landing there.
All in all I feel undecided. I would endorse this change going trough net-next (without the stable tag). I would feel less torn with this change targeting -net without the stable tag. Targeting -net with the stable tag sounds a bit too much to me.
Cheers, Paolo
Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote:
> > > Subject: Re: [Patch v3] net: mana: Batch ringing RX > > > queue doorbell on receiving packets > > > > > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote: > > > > > > > > > > > > > > > 5.15 and kernel 6.1. (those > > > > > > > > > > > > > > > kernels are longterm) They need > > > > > > > > > > > > > > > this fix to achieve the > > > > > > > > > > > > > > > performance target. > > > > > > > > > > > > > > > > > > > > > > Why can't they be upgraded to get that > > > > > > > > > > > performance target, and all the other > > > > > > > > > > > goodness that those kernels have? We > > > > > > > > > > > don't normally backport new features, > > > > > > > > > > > right? > > > > > > > > > > > > > > I think this should be considered as a fix, not > > > > > > > a new feature. > > > > > > > > > > > > > > MANA is designed to be 200GB full duplex at the > > > > > > > start. Due to lack of hardware testing > > > > > > > capability at early stage of the project, we > > > > > > > could only test 100GB for the Linux driver. When > > > > > > > hardware is fully capable of reaching designed > > > > > > > spec, this bug in the Linux driver shows up. > > > > > > That part we understand. > > > > > > If I were you I'd try to convince Greg and Paolo that > > > the change is small and significant for user experience. > > > And answer Greg's question why upgrading the kernel past > > > 6.1 is a challenge in your environment.
I was under the impression that this patch was considered to be a feature, not a bug fix. I was trying to justify that the "Fixes:" tag was needed.
I apologize for misunderstanding this.
Without this fix, it's not possible to run a typical workload designed for 200Gb physical link speed.
We see a large number of customers and Linux distributions committed on 5.15 and 6.1 kernels. They planned the product cycles and certification processes around these longterm kernel versions. It's difficult for them to upgrade to newer kernel versions.
I think there are some misunderstanding WRT distros and stable kernels. (Commercial) distros will backport the patch as needed, regardless such patch landing in the 5.15 upstream tree or not. Individual users running their own vanilla 5.15 kernel can't expect performance improvement landing there.
All in all I feel undecided. I would endorse this change going trough net-next (without the stable tag). I would feel less torn with this change targeting -net without the stable tag. Targeting -net with the stable tag sounds a bit too much to me.
Cheers, Paolo
I'm sending this patch to net-next without stable tag.
Thanks,
Long
linux-stable-mirror@lists.linaro.org