Unfortunately, we ended up with the wrong version of the patch "fix info leak with DMA_FROM_DEVICE" getting merged. We got v4 merged, but the version we want is v7, with some minor tweaks that were supposed to be applied by Christoph (the swiotlb maintainer). After pointing this out, I was asked by Christoph to create an incremental fix.

IMHO the cleanest way to do this is reverting the incorrect version of the patch and applying the correct one. I hope that qualifies as an incremental fix.
The main differences between what we got and what we need are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* It was decided that we want to defer introducing DMA_ATTR_OVERWRITE
  until we have exploiters
* And anyway DMA_ATTR_OVERWRITE must take precedence over
  DMA_ATTR_SKIP_CPU_SYNC, so the v4 implementation of DMA_ATTR_OVERWRITE
  ain't even correct (illustrated below)
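To illustrate that last point, here is a sketch (my illustration; the first condition is the v4 code as merged, the second only sketches the intended semantics and is not code from any submitted version):

/* v4 as merged: DMA_ATTR_SKIP_CPU_SYNC alone suppresses the protective
 * bounce, so the caller's DMA_ATTR_OVERWRITE promise is never even
 * consulted for DMA_FROM_DEVICE mappings:
 */
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
    (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
    dir == DMA_BIDIRECTIONAL))
	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);

/* Intended precedence (illustration only): only an explicit
 * DMA_ATTR_OVERWRITE promise from the caller may skip the bounce that
 * scrubs stale slot content:
 */
if (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
    dir == DMA_BIDIRECTIONAL)
	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);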
Halil Pasic (2):
  Revert "swiotlb: fix info leak with DMA_FROM_DEVICE"
  swiotlb: fix info leak with DMA_FROM_DEVICE
 Documentation/core-api/dma-attributes.rst |  8 --------
 include/linux/dma-mapping.h               |  8 --------
 kernel/dma/swiotlb.c                      | 23 +++++++++++++++--------
 3 files changed, 15 insertions(+), 24 deletions(-)
base-commit: 38f80f42147ff658aff218edb0a88c37e58bf44f
This reverts commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.
Signed-off-by: Halil Pasic pasic@linux.ibm.com
---
 Documentation/core-api/dma-attributes.rst | 8 --------
 include/linux/dma-mapping.h               | 8 --------
 kernel/dma/swiotlb.c                      | 3 +--
 3 files changed, 1 insertion(+), 18 deletions(-)
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 17706dc91ec9..1887d92e8e92 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,11 +130,3 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
 subsystem that the buffer is fully accessible at the elevated privilege
 level (and ideally inaccessible or at least read-only at the
 lesser-privileged levels).
-
-DMA_ATTR_OVERWRITE
-------------------
-
-This is a hint to the DMA-mapping subsystem that the device is expected to
-overwrite the entire mapped size, thus the caller does not require any of the
-previous buffer contents to be preserved. This allows bounce-buffering
-implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6150d11a607e..dca2b1355bb1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,14 +61,6 @@
  */
 #define DMA_ATTR_PRIVILEGED		(1UL << 9)
 
-/*
- * This is a hint to the DMA-mapping subsystem that the device is expected
- * to overwrite the entire mapped size, thus the caller does not require any
- * of the previous buffer contents to be preserved. This allows
- * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
- */
-#define DMA_ATTR_OVERWRITE		(1UL << 10)
-
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform. It can
  * be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bfc56cb21705..f1e7ea160b43 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -628,8 +628,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
 	tlb_addr = slot_addr(mem->start, index) + offset;
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-	    (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
-	    dir == DMA_BIDIRECTIONAL))
+	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
 	return tlb_addr;
 }
On Fri, Mar 04, 2022 at 02:58:58PM +0100, Halil Pasic wrote:
This reverts commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.
Why???
Signed-off-by: Halil Pasic pasic@linux.ibm.com
You need a blank line before this one.
also:
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
On Fri, 4 Mar 2022 15:28:44 +0100 Greg KH gregkh@linuxfoundation.org wrote:
On Fri, Mar 04, 2022 at 02:58:58PM +0100, Halil Pasic wrote:
This reverts commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.
Why???
TLDR; We got v4 merged instead of v7
Signed-off-by: Halil Pasic pasic@linux.ibm.com
You need a blank line before this one.
Sorry, I wasn't careful enough, and checkpatch.pl didn't complain about it.
Regards, Halil
also:
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
On Fri, Mar 04, 2022 at 05:34:47PM +0100, Halil Pasic wrote:
On Fri, 4 Mar 2022 15:28:44 +0100 Greg KH gregkh@linuxfoundation.org wrote:
On Fri, Mar 04, 2022 at 02:58:58PM +0100, Halil Pasic wrote:
This reverts commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.
Why???
TLDR; We got v4 merged instead of v7
That makes no sense at all to me, you need to describe it in detail.
You know better than this :(
thanks,
greg k-h
On Fri, 4 Mar 2022 17:55:40 +0100 Greg KH gregkh@linuxfoundation.org wrote:
On Fri, Mar 04, 2022 at 05:34:47PM +0100, Halil Pasic wrote:
On Fri, 4 Mar 2022 15:28:44 +0100 Greg KH gregkh@linuxfoundation.org wrote:
On Fri, Mar 04, 2022 at 02:58:58PM +0100, Halil Pasic wrote:
This reverts commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.
Why???
TLDR; We got v4 merged instead of v7
That makes no sense at all to me, you need to describe it in detail.
You know better than this :(
I have described the why in the cover letter of the series. Let me copy-paste it here:
Unfortunately, we ended up with the wrong version of the patch "fix info leak with DMA_FROM_DEVICE" getting merged. We got v4 merged, but the version we want is v7, with some minor tweaks that were supposed to be applied by Christoph (the swiotlb maintainer). After pointing this out, I was asked by Christoph to create an incremental fix.

IMHO the cleanest way to do this is reverting the incorrect version of the patch and applying the correct one. I hope that qualifies as an incremental fix.

The main differences between what we got and what we need are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* It was decided that we want to defer introducing DMA_ATTR_OVERWRITE
  until we have exploiters
* And anyway DMA_ATTR_OVERWRITE must take precedence over
  DMA_ATTR_SKIP_CPU_SYNC, so the v4 implementation of DMA_ATTR_OVERWRITE
  ain't even correct
Describing it in the revert commit would have been a wiser choice, I agree.
BTW the consensus seems to be that a revert should be avoided, so I will send a single-patch version of this fix soon(ish).
Sorry for the inconvenience!
Regards, Halil
The problem I'm addressing was discovered by the LTP test covering CVE-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
   interface with: dxfer_len == 524288, dxfer_direction == SG_DXFER_FROM_DEV
   and a corresponding dxferp (a minimal sketch of such a call follows
   after this list). The peculiar thing about this is that TUR is not
   reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
   bounces the user-space buffer, as if the device were to transfer into
   it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
   sg_build_indirect()") we make sure this first bounce buffer is
   allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
   device won't touch the buffer we prepare as if we had a DMA_FROM_DEVICE
   type of situation. My setup uses a virtio-scsi device and the buffer
   allocated by SG is mapped by the function virtqueue_add_split(), which
   uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi
   generics). This mapping involves bouncing via the swiotlb (we need
   swiotlb to do virtio in protected guests like s390 Secure Execution, or
   AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
   (that is swiotlb) bounce buffer (which most likely contains some
   previous IO data) to the first bounce buffer, which contains all zeros.
   Then we copy back the content of the first bounce buffer to the
   user-space buffer.
5) The test case detects that the buffer, which it zero-initialized, ain't
   all zeros and fails.
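For illustration, the essence of such an SG_IO call looks roughly like the sketch below (my reconstruction of the reproducer shape, not the actual LTP source; issue_tur() and the device path are made up):

/* Sketch only: a TEST UNIT READY via SG_IO that claims a 512 KiB read. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

static int issue_tur(const char *path)	/* e.g. "/dev/sg0" (hypothetical) */
{
	unsigned char cdb[6] = { 0x00 };	/* TEST UNIT READY */
	static unsigned char buf[524288];	/* zero-initialised (static) */
	struct sg_io_hdr io;
	int ret, fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	memset(&io, 0, sizeof(io));
	io.interface_id = 'S';
	io.cmdp = cdb;
	io.cmd_len = sizeof(cdb);
	io.dxfer_direction = SG_DXFER_FROM_DEV;	/* claims a read... */
	io.dxferp = buf;
	io.dxfer_len = sizeof(buf);		/* ...but TUR moves no data */
	io.timeout = 5000;			/* milliseconds */
	ret = ioctl(fd, SG_IO, &io);
	close(fd);
	/* On success, the test then checks that buf is still all zeros. */
	return ret;
}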
This is a swiotlb problem, because swiotlb should be transparent in the sense that it does not affect the outcome (if all other participants are well behaved), and without swiotlb we leak all zeros. Even if swiotlb_tbl_map_single() zero-initialised the allocated slot(s) to avoid leaking stale data back to the caller later, when it comes to unmap or sync_for_cpu it still fundamentally cannot tell how much of the content of the bounce slot has actually changed. Therefore, if the caller was expecting the device to do a partial write, the rest of the mapped buffer *will* be corrupted by bouncing the whole thing back again.
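To make the failure mode concrete, here is a minimal sketch (generic streaming-DMA API; demo_leak() is my illustration, not code from this series) of the sequence in which the leak shows up:

#include <linux/dma-mapping.h>

/* Illustration only: the device is told to "read" but writes nothing,
 * as with the TEST UNIT READY case above.
 */
static void demo_leak(struct device *dev, void *buf, size_t len)
{
	/* buf is all zeros here (allocated with GFP_ZERO). */
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	if (dma_mapping_error(dev, dma))
		return;

	/* ... device completes without touching the buffer ... */

	/*
	 * Unmap bounces the *whole* swiotlb slot back into buf. Without
	 * the map-time bounce added by this patch, the slot still holds
	 * whatever a previous mapping left there, and buf is no longer
	 * all zeros.
	 */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
}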
Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that.
The extra bounce is expected to hurt performance. For the cases where the extra bounce is not necessary, we could get rid of it if the client code told us it is not needed. Such optimisations are out of scope for this patch and are likely to be the subject of future work.
Signed-off-by: Halil Pasic pasic@linux.ibm.com
Reported-by: Ali Haider ali.haider@ibm.com
Reviewed-by: Christoph Hellwig hch@lst.de
---
 kernel/dma/swiotlb.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..6db1c475ec82 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -627,9 +627,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	for (i = 0; i < nr_slots(alloc_size + offset); i++)
 		mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
 	tlb_addr = slot_addr(mem->start, index) + offset;
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+	/*
+	 * When dir == DMA_FROM_DEVICE we could omit the copy from the orig
+	 * to the tlb buffer, if we knew for sure the device will
+	 * overwrite the entire current content. But we don't. Thus
+	 * unconditional bounce may prevent leaking swiotlb content (i.e.
+	 * kernel memory) to user-space.
+	 */
+	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
 	return tlb_addr;
 }
@@ -696,10 +701,13 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
 		size_t size, enum dma_data_direction dir)
 {
-	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
-		swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
-	else
-		BUG_ON(dir != DMA_FROM_DEVICE);
+	/*
+	 * Unconditional bounce is necessary to avoid corruption on
+	 * sync_*_for_cpu or dma_unmap_* when the device didn't overwrite
+	 * the whole length of the bounce buffer.
+	 */
+	swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+	BUG_ON(!valid_dma_direction(dir));
 }
 
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
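For the sync_single_for_device part, the extra bounce matters in sequences like the following sketch (generic DMA API; demo_partial_readback() and start_device_read() are made-up illustrations, not code from this series):

#include <linux/dma-mapping.h>

/* Hypothetical placeholder for real driver code that kicks off a read. */
void start_device_read(struct device *dev, dma_addr_t addr, size_t len);

static void demo_partial_readback(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	if (dma_mapping_error(dev, dma))
		return;
	start_device_read(dev, dma, len);	/* hypothetical */
	dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);
	/* CPU looks at buf; the device may have written only part of it. */
	dma_sync_single_for_device(dev, dma, len, DMA_FROM_DEVICE);
	/*
	 * The bounce added by this patch copies buf back into the swiotlb
	 * slot here, so the parts the device never writes stay consistent
	 * when a later sync_for_cpu or the unmap bounces the whole slot
	 * into buf again.
	 */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
}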
On Fri, Mar 04, 2022 at 02:58:59PM +0100, Halil Pasic wrote:
The problem I'm addressing was discovered by the LTP test covering CVE-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
   interface with: dxfer_len == 524288, dxfer_direction == SG_DXFER_FROM_DEV
   and a corresponding dxferp. The peculiar thing about this is that TUR
   is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
   bounces the user-space buffer, as if the device were to transfer into
   it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
   sg_build_indirect()") we make sure this first bounce buffer is
   allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
   device won't touch the buffer we prepare as if we had a DMA_FROM_DEVICE
   type of situation. My setup uses a virtio-scsi device and the buffer
   allocated by SG is mapped by the function virtqueue_add_split(), which
   uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi
   generics). This mapping involves bouncing via the swiotlb (we need
   swiotlb to do virtio in protected guests like s390 Secure Execution, or
   AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
   (that is swiotlb) bounce buffer (which most likely contains some
   previous IO data) to the first bounce buffer, which contains all zeros.
   Then we copy back the content of the first bounce buffer to the
   user-space buffer.
5) The test case detects that the buffer, which it zero-initialized, ain't
   all zeros and fails.
This is a swiotlb problem, because swiotlb should be transparent in the sense that it does not affect the outcome (if all other participants are well behaved), and without swiotlb we leak all zeros. Even if swiotlb_tbl_map_single() zero-initialised the allocated slot(s) to avoid leaking stale data back to the caller later, when it comes to unmap or sync_for_cpu it still fundamentally cannot tell how much of the content of the bounce slot has actually changed. Therefore, if the caller was expecting the device to do a partial write, the rest of the mapped buffer *will* be corrupted by bouncing the whole thing back again.
Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that.
The extra bounce is expected to hurt performance. For the cases where the extra bounce is not necessary, we could get rid of it if the client code told us it is not needed. Such optimisations are out of scope for this patch and are likely to be the subject of future work.
Signed-off-by: Halil Pasic pasic@linux.ibm.com
Reported-by: Ali Haider ali.haider@ibm.com
Reviewed-by: Christoph Hellwig hch@lst.de
 kernel/dma/swiotlb.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
On Fri, Mar 04, 2022 at 02:58:57PM +0100, Halil Pasic wrote:
Unfortunately, we ended up with the wrong version of the patch "fix info leak with DMA_FROM_DEVICE" getting merged. We got v4 merged, but the version we want is v7, with some minor tweaks that were supposed to be applied by Christoph (the swiotlb maintainer). After pointing this out, I was asked by Christoph to create an incremental fix.

IMHO the cleanest way to do this is reverting the incorrect version of the patch and applying the correct one. I hope that qualifies as an incremental fix.
I'd really do one patch to move to the expected state. I'd volunteer to merge the two patches, but I've recently shown that I'm not exactly good at that...
On Fri, 4 Mar 2022 07:53:48 -0800 Christoph Hellwig hch@infradead.org wrote:
On Fri, Mar 04, 2022 at 02:58:57PM +0100, Halil Pasic wrote:
Unfortunately, we ended up with the wrong version of the patch "fix info leak with DMA_FROM_DEVICE" getting merged. We got v4 merged, but the version we want is v7, with some minor tweaks that were supposed to be applied by Christoph (the swiotlb maintainer). After pointing this out, I was asked by Christoph to create an incremental fix.

IMHO the cleanest way to do this is reverting the incorrect version of the patch and applying the correct one. I hope that qualifies as an incremental fix.

I'd really do one patch to move to the expected state. I'd volunteer to merge the two patches, but I've recently shown that I'm not exactly good at that...
No problem, I can do that. It isn't hard to squash things together, but when I was about to write the commit message, I had the feeling that doing a revert is cleaner.
Any other opinions?
Regards, Halil