On Thu, 09 Nov 2023 13:24:48 +0100 Niklas Schnelle <schnelle@linux.ibm.com> wrote:
On Wed, 2023-11-08 at 13:21 +0100, Petr Tesařík wrote:
On Wed, 8 Nov 2023 12:12:49 +0100 Petr Tesarik <petrtesarik@huaweicloud.com> wrote:
From: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Limit the free list length to the size of the IO TLB. A transient pool can be smaller than IO_TLB_SEGSIZE, but the free list is initialized with the assumption that the total number of slots is a multiple of IO_TLB_SEGSIZE. As a result, swiotlb_area_find_slots() may allocate slots past the end of a transient IO TLB buffer.
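To illustrate the description above, here is a minimal userspace sketch of the free list length calculation. It is not the kernel code: the helper names (list_len_uncapped, list_len_capped, fits) are hypothetical, the value 128 for IO_TLB_SEGSIZE is an assumption made for the example, and the acceptance check is reduced to a single comparison.

```c
#include <stddef.h>

/* Assumed value for illustration; the real definition lives in the kernel. */
#define IO_TLB_SEGSIZE 128

/*
 * Hypothetical helper: free list length at slot index i when the pool is
 * assumed to contain a whole number of segments (the behaviour described
 * as problematic for small transient pools).
 */
static unsigned int list_len_uncapped(size_t i)
{
	return (unsigned int)(IO_TLB_SEGSIZE - (i % IO_TLB_SEGSIZE));
}

/*
 * Hypothetical helper: the same length, additionally limited to the number
 * of slots that remain in a pool of nslabs slots. Requires i < nslabs.
 */
static unsigned int list_len_capped(size_t i, size_t nslabs)
{
	size_t seg_left = IO_TLB_SEGSIZE - (i % IO_TLB_SEGSIZE);
	size_t pool_left = nslabs - i;

	return (unsigned int)(seg_left < pool_left ? seg_left : pool_left);
}

/*
 * Simplified acceptance check: a request for nslots contiguous slots is
 * accepted at an index only if the free list there is long enough.
 */
static int fits(unsigned int list_len, unsigned int nslots)
{
	return list_len >= nslots;
}
```

With a 4-slot transient pool, the uncapped length at index 0 is 128, so a request for 8 slots would be accepted and run past the end of the buffer. With the cap the length is 4, so the same request is rejected and the allocation fails instead.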
Just to make it clear, this patch addresses only the memory corruption reported by Niklas, without addressing the underlying issues. Where corruption happened before, allocations will fail with this patch.
I am still looking into improving the allocation strategy itself.
Petr T
I know this has already been applied, but for what it's worth, I did finally manage to test this with my reproducer, and the allocation overrun is fixed by this change. I also confirmed that at least my ConnectX VF TCP/IP test case seems to handle the DMA error gracefully enough.
Thank you for testing!
Indeed, the failed request is often retried at a later time. For example, I tested with a SCSI driver, and by the time the SCSI layer retried the request, a new standard pool was already available. But this situation is not ideal. If nothing else, it incurs an unnecessary delay.
Petr T