Hi,
On Thu, Jul 5, 2018 at 7:31 AM, Antti Seppälä <a.seppala@gmail.com> wrote:
The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way") introduced a common way to align DMA allocations. The code in that commit aligns the struct dma_aligned_buffer, but the actual DMA address pointed to by data[0] ends up offset from the allocated boundary by the kmalloc_ptr and old_xfer_buffer pointers stored in front of it.
This goes against the recommendation in Documentation/DMA-API.txt, which states:
Therefore, it is recommended that driver writers who don't take special care to determine the cache line size at run time only map virtual regions that begin and end on page boundaries (which are guaranteed also to be cache line boundaries).
As a result, architectures with non-coherent DMA caches may run into memory corruption or kernel crashes with "Unhandled kernel unaligned access" exceptions.
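A simplified userspace illustration of the problem, not the driver code: `struct hdr_then_data`, `region_is_line_aligned()` and `header_layout_is_safe()` are hypothetical names, `aligned_alloc()` stands in for kmalloc()'s alignment guarantee, and the 64-byte line size is an assumption. It applies the begin/end rule from the quoted text to a payload that sits behind two bookkeeping pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size; the real value is architecture-specific. */
#define LINE 64

/* Hypothetical layout: two bookkeeping pointers stored in front of the data. */
struct hdr_then_data {
	void *kmalloc_ptr;     /* start of the allocation */
	void *old_xfer_buffer; /* caller's original buffer */
	unsigned char data[];  /* area that would be mapped for DMA */
};

/* True when [addr, addr + len) begins and ends on a cache line boundary. */
static bool region_is_line_aligned(uintptr_t addr, size_t len)
{
	uintptr_t mask = LINE - 1; /* LINE must be a power of two */

	return (addr & mask) == 0 && ((addr + len) & mask) == 0;
}

/*
 * Allocates a line-aligned block with the header at its start and reports
 * whether data[] (len bytes) would satisfy the begin/end rule.
 */
static bool header_layout_is_safe(size_t len)
{
	/* aligned_alloc() wants a size that is a multiple of the alignment. */
	size_t size = sizeof(struct hdr_then_data) + len;
	size = (size + LINE - 1) / LINE * LINE;

	struct hdr_then_data *buf = aligned_alloc(LINE, size);
	if (!buf)
		return false;

	/* The allocation itself starts on a line boundary... */
	assert((uintptr_t)buf % LINE == 0);

	/* ...but data[] starts after the two pointers. */
	bool safe = region_is_line_aligned((uintptr_t)buf->data, len);
	free(buf);
	return safe;
}
```

Here `header_layout_is_safe(512)` returns false: the allocation is aligned, but data[] begins 2 * sizeof(void *) bytes into the first line (16 bytes on a 64-bit build), so that line is shared with the stored pointers.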
Fix the alignment by positioning the DMA area at the start of the allocation and using memory at the end of the area to store the original transfer_buffer pointer. This may have the added benefit of increased performance, as the DMA area is now fully aligned on all architectures.
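The layout described above can be sketched in userspace as follows. This is not the patch itself: `bounce_alloc()` and `bounce_free()` are hypothetical names, `aligned_alloc()` stands in for kmalloc()'s alignment guarantee, the 64-byte line size is an assumption, and data is copied in both directions for simplicity (a driver would copy only for OUT or IN transfers as appropriate).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed cache line size standing in for kmalloc()'s DMA alignment. */
#define LINE 64

/*
 * Hypothetical bounce buffer: the copy of the caller's data starts at the
 * aligned beginning of the allocation, and the caller's pointer is stashed
 * in the bytes right after the data.
 */
static void *bounce_alloc(void *orig, size_t len)
{
	size_t size = len + sizeof(orig);

	/* Round up because aligned_alloc() wants a multiple of the alignment. */
	size = (size + LINE - 1) / LINE * LINE;

	unsigned char *buf = aligned_alloc(LINE, size);
	if (!buf)
		return NULL;

	memcpy(buf, orig, len);
	/* memcpy() because buf + len need not be aligned for a pointer store. */
	memcpy(buf + len, &orig, sizeof(orig));
	return buf;
}

/* Recovers the stashed pointer, copies the data back, frees the buffer. */
static void *bounce_free(void *bounce, size_t len)
{
	void *orig;

	assert(bounce != NULL);
	memcpy(&orig, (unsigned char *)bounce + len, sizeof(orig));
	memcpy(orig, bounce, len);
	free(bounce);
	return orig;
}
```

Keeping the pointer at the tail means the DMA area needs no leading header or padding, so its start is simply the allocation's start.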
Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
Fixes: 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way")
Signed-off-by: Antti Seppälä <a.seppala@gmail.com>
drivers/usb/dwc2/hcd.c | 44 +++++++++++++++++++++++---------------------
1 file changed, 23 insertions(+), 21 deletions(-)
Thanks for tracking this down and sorry for the original regression. Seems like a good fix. With this fix, I'd be curious about your observations on how dwc2 performs (both performance and compatibility under stress) with the newest driver compared to whatever you were using before.
Also: you're using the dwc2_set_ltq_params() parameters? Have you checked if removing the "max_transfer_size" limit boosts your performance?
Cc: stable@vger.kernel.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
On 6 July 2018 at 18:57, Doug Anderson <dianders@chromium.org> wrote:
Hi,
> Thanks for tracking this down and sorry for the original regression. Seems like a good fix. With this fix, I'd be curious about your observations on how dwc2 performs (both performance and compatibility under stress) with the newest driver compared to whatever you were using before.
My totally unscientific performance test consisted of running iperf through my LTE dongle connected to dwc2. I saw an increase in download throughput.
Before (kernel 4.9.109 with the offending commit reverted), iperf reported a download bandwidth of 33.2 Mbits/sec.
With the newest dwc2 driver and the "Fix DMA alignment to start at allocated boundary" patch applied, I got 38.2 Mbits/sec.
If I also apply the "Fix inefficient copy of unaligned buffers" patch, total download throughput reaches around 44.6 Mbits/sec, which I believe is capped by my 50 Mbit/s subscription.
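For scale, taking 33.2 Mbits/sec as the baseline, the alignment fix alone is roughly a 15% gain and both patches together roughly 34%. These come from the informal test described above, so treat them as rough. A trivial helper (hypothetical name) for the arithmetic:

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent. */
static double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```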
> Also: you're using the dwc2_set_ltq_params() parameters? Have you checked if removing the "max_transfer_size" limit boosts your performance?
Yes, I'm using the parameters set there. I tried removing max_transfer_size, but it had no noticeable impact on performance in my tests.
> Cc: stable@vger.kernel.org
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
Thanks for reviewing :)
-Antti
linux-stable-mirror@lists.linaro.org