On Sun, Mar 06, 2022 at 12:30:55PM -0800, Linus Torvalds wrote:
> So I would expect that
>  (a) READ/WRITE actually fills the whole buffer
>  (b) READ/WRITE are the only ones where we care about performance at a bounce-buffer level
> so it boils down to "do we still do this horrible memcpy even for regular IO commands"? Because that would, in my opinion, just be stupid.
For one thing, this is not just for block I/O, but for all DMA. Second, "normal" I/O can always fail, including after partial transfers. SCSI even considers that normal. Network devices consider it normal to not fill the entire receive buffer, etc.
In short: anything that operates directly on user memory is a trivial reproducer here. The CVE uses SG_IO, but staying in block land, direct I/O will work just the same, because swiotlb will copy back the uninitialized data to user memory after an I/O failure.
What we've been thinking of is a version of the dma map calls where the unmap gets passed how much data actually was transferred and only copies that out. That seems like the only sane interface.
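To make the idea concrete, here is a rough userspace sketch of the two unmap behaviours. It is only a model under stated assumptions: struct bounce_map, bounce_unmap_full(), bounce_unmap_len() and stale_bytes_after_short_io() are hypothetical names made up for illustration, not existing kernel or swiotlb interfaces, and the 0xAA pattern simply stands in for whatever a previous user left in the bounce slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a bounce-buffer mapping; NOT the kernel API. */
struct bounce_map {
	unsigned char *orig;   /* caller's buffer (stands in for user memory) */
	unsigned char *bounce; /* bounce slot the "device" writes into */
	size_t len;            /* mapped length */
};

/* Behaviour described above: unmap copies the whole mapping back. */
static void bounce_unmap_full(struct bounce_map *m)
{
	memcpy(m->orig, m->bounce, m->len);
}

/* Sketched alternative: the caller says how much was actually transferred. */
static void bounce_unmap_len(struct bounce_map *m, size_t transferred)
{
	if (transferred > m->len)
		transferred = m->len;
	memcpy(m->orig, m->bounce, transferred);
}

/*
 * Simulate a 16-byte mapping where the device writes 4 bytes and then
 * fails.  Returns how many stale (0xAA) bytes end up in the caller's buffer.
 */
static size_t stale_bytes_after_short_io(int length_aware)
{
	unsigned char slot[16];
	unsigned char user[16];
	struct bounce_map m = { user, slot, sizeof(user) };
	size_t i, stale = 0;

	memset(slot, 0xAA, sizeof(slot)); /* leftovers from a previous user */
	memset(user, 0, sizeof(user));
	memset(slot, 0x11, 4);            /* short transfer: only 4 bytes */

	if (length_aware)
		bounce_unmap_len(&m, 4);
	else
		bounce_unmap_full(&m);

	for (i = 0; i < sizeof(user); i++)
		if (user[i] == 0xAA)
			stale++;
	return stale;
}
```

In this model stale_bytes_after_short_io(0) reports the leftover bytes reaching the caller's buffer, while stale_bytes_after_short_io(1) leaves everything past the transferred length untouched. The catch is that every caller has to know and report the transferred length at unmap time.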
Now IFF we know that the buffer is never looked at on I/O failure or short I/O, we could do away with all this. But without adding new interfaces where the caller guarantees that, we can't know it. For userspace memory it is guaranteed to be not true. For kernel memory it most likely is true, but there are some amazingly awful pieces of code that probably still get it wrong.