On Thu, Sep 7, 2023 at 9:11 PM Keith Busch kbusch@kernel.org wrote:
On Wed, Sep 06, 2023 at 09:18:15PM +0530, Kanchan Joshi wrote:
Would you really prefer to have nvme_add_user_metadata() changed to do away with allocation and use userspace meta-buffer directly?
I mean, sure, if it's possible. We can avoid a costly copy if the user metabuffer is aligned and physically contiguous.
Seems possible, but that does not really solve the actual problem (which is not performance) this patch is for. It will require replicating big code of blk_rq_map_user_iov() for integrity metadata and map the pages into bip. But since the user-space meta buffer can be unaligned (and bunch of other conditions present there), it has to make the meta copy in kernel-space. And we will be back to where we started - how to avoid corruption into kernel memory.
Same situation for the case when we are dealing with extended-lba-format and interleaved user-buffer is unaligned. .
Handling both these anyway requires adding the kind of code/checks mentioned in the previous patch. Do you see another way?
While I agree there is value in avoiding the meta copy in general and I can look into it, but that should be a separate effort (with focus on performance).
Even with that route, extended-lba-with-short-unaligned-buffer remains unhandled. That will still require similar checks that I would like to avoid but cannnot.
So how about this -
There's lots of bad things you can do with this interface. Example, provide an unaligned single byte user buffer and send an Identify command.
Not sure I follow. Do you mean the patch does not handle these cases?
We never provided opcode decoding sanity checks before because it's a bad maintenance burden, adds performance killing overhead, couldn't catch all the cases anyway due to vendor specific and future opcodes, and harms the flexibility of the interface.
Given the way things are (integrity schemes, cdw12 etc.), I do not see a way to avoid opcode checks. Flexibility is not getting reduced in the previous patch. All the other commands (beyond read/write/compare/append) remain untouched. And metadata-io is not the fast path at the moment (given memory allocation, bunch of extra things by blk-integrity, bunch of extra things done by device etc.).
The burden is usually on the user for these kinds of priviledged interfaces: if you abuse it, "you get to keep both pieces" territory.
Not sure I got that. Have you seen the crash mentioned in this cover letter: https://lore.kernel.org/linux-nvme/20230811155906.15883-1-joshi.k@samsung.co... A simple unprivileged read by a rogue application can wreck other applications/system at the moment. Is it fine to keep the status quo?