On Tue, Oct 10, 2023 at 1:16 PM Christoph Hellwig <hch@lst.de> wrote:
On Fri, Oct 06, 2023 at 07:17:06PM +0530, Kanchan Joshi wrote:
The same issue is also possible in the extended-LBA case. When the user specifies a short unaligned buffer, the kernel makes a copy and uses that for DMA.
I fail to understand the extended LBA case, and also from looking at the code, mixing it up with validation of the metadata_len seems very confusing. Can you try to clearly explain it and maybe split it into a separate patch?
The case is for a single interleaved buffer holding both data and metadata. When the driver sends this buffer to blk_rq_map_user_iov(), it may make a copy of it. This kernel buffer will be used for DMA rather than the user buffer. If the user buffer is short, the kernel buffer is also short.
Does this explanation help? I can move the part to a separate patch.
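To make the extended-LBA case concrete, here is a minimal userspace sketch of the length arithmetic, not the driver code. The helper names `ext_lba_buf_len` and `ext_lba_buf_ok` and their parameters are hypothetical; the sketch assumes `nlb` is the zero-based block count from the command and that every block carries `lba_size` data bytes followed by `ms` metadata bytes in the same buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes the device transfers for an
 * extended-LBA (interleaved data + metadata) command.
 * nlb is zero-based, so the block count is nlb + 1.
 */
static uint64_t ext_lba_buf_len(uint16_t nlb, uint32_t lba_size, uint16_t ms)
{
	return ((uint64_t)nlb + 1) * ((uint64_t)lba_size + ms);
}

/* True when the user-supplied buffer covers the whole transfer. */
static bool ext_lba_buf_ok(uint64_t bufflen, uint16_t nlb,
			   uint32_t lba_size, uint16_t ms)
{
	return bufflen >= ext_lba_buf_len(nlb, lba_size, ms);
}
```

With 512-byte blocks and 8 bytes of metadata, an 8-block command (nlb = 7) moves 4160 bytes, so a 4096-byte user buffer is short, and so is any kernel copy made from it.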
Fixes: 456cba386e94 ("nvme: wire-up uring-cmd support for io-passthru on char-device")
Is this really io_uring specific? I think we also had the same issue before and this should go back to adding metadata support to the general passthrough ioctl?
Yes, it is not io_uring specific. I was just not sure (i) whether to go back that far in history, and (ii) which patch to tag.
+static inline bool nvme_nlb_in_cdw12(u8 opcode)
+{
+	switch (opcode) {
+	case nvme_cmd_read:
+	case nvme_cmd_write:
+	case nvme_cmd_compare:
+	case nvme_cmd_zone_append:
+		return true;
+	}
+	return false;
+}
Nitpick: I find it nicer to read to have a switch that catches everything with a default statement instead of falling out of it for checks like this. It's not making any difference in practice but just reads a little nicer.
Sure, I can change it.
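For reference, the default-statement variant could look roughly like the sketch below. It is a standalone userspace rendering, not the next iteration of the patch: the opcode values are redefined locally so it builds outside the kernel, and `uint8_t` stands in for the kernel's `u8`.

```c
#include <stdbool.h>
#include <stdint.h>

/* NVM command set opcodes, redefined here so the sketch builds standalone. */
enum {
	nvme_cmd_flush		= 0x00,
	nvme_cmd_write		= 0x01,
	nvme_cmd_read		= 0x02,
	nvme_cmd_compare	= 0x05,
	nvme_cmd_zone_append	= 0x7d,
};

/* Every path returns from inside the switch; nothing falls out of it. */
static inline bool nvme_nlb_in_cdw12(uint8_t opcode)
{
	switch (opcode) {
	case nvme_cmd_read:
	case nvme_cmd_write:
	case nvme_cmd_compare:
	case nvme_cmd_zone_append:
		return true;
	default:
		return false;
	}
}
```

Behaviour is identical to the quoted version; only the fall-through `return false` moves under `default:`.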
/* Exclude commands that do not have nlb in cdw12 */
if (!nvme_nlb_in_cdw12(c->common.opcode))
return true;
So we can still get exactly the same corruption for all commands that are not known? That's not a very safe way to deal with the issue..
Given the way things are in NVMe, I do not find a better way. Maybe another day for commands that do (or can do) things very differently for nlb and PI representation.
control = upper_16_bits(le32_to_cpu(c->common.cdw12));
/* Exclude when meta transfer from/to host is not done */
if (control & NVME_RW_PRINFO_PRACT && ns->ms == ns->pi_size)
return true;
nlb = lower_16_bits(le32_to_cpu(c->common.cdw12));
I'd use the rw field of the union and the typed control and length fields to clean this up a bit.
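A rough userspace sketch of what the typed fields buy, under stated assumptions: `struct rw_cdw12`, `split_cdw12()` and `meta_transfer_skipped()` are hypothetical names used only for illustration, the value is taken as already converted to host endianness, and `NVME_RW_PRINFO_PRACT` is assumed to be bit 13 of the 16-bit control field (bit 29 of cdw12). In the driver itself the equivalent would be reading the length and control members of the rw command directly instead of splitting cdw12 by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel definition: PRACT is bit 13 of control. */
#define NVME_RW_PRINFO_PRACT	(1 << 13)

/* Hypothetical stand-in for the length/control pair that overlays cdw12. */
struct rw_cdw12 {
	uint16_t length;	/* zero-based number of logical blocks */
	uint16_t control;	/* PRINFO and other control bits */
};

/* Split a host-endian cdw12 into its two 16-bit halves. */
static struct rw_cdw12 split_cdw12(uint32_t cdw12)
{
	struct rw_cdw12 f = {
		.length  = (uint16_t)(cdw12 & 0xffff),
		.control = (uint16_t)(cdw12 >> 16),
	};
	return f;
}

/*
 * Mirrors the quoted check: with PRACT set and metadata consisting only
 * of protection information, no metadata moves to or from the host.
 */
static bool meta_transfer_skipped(struct rw_cdw12 f, uint16_t ms,
				  uint16_t pi_size)
{
	return (f.control & NVME_RW_PRINFO_PRACT) && ms == pi_size;
}
```

The named fields make it obvious which half of cdw12 carries the block count and which carries the PRINFO bits, which is the readability gain over `upper_16_bits()`/`lower_16_bits()`.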
if (bdev && meta_buffer && meta_len) {
if (!nvme_validate_passthru_meta(ns, nvme_req(req)->cmd,
meta_len, bufflen)) {
ret = -EINVAL;
goto out_unmap;
}
meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
I'd move the check into nvme_add_user_metadata to keep it out of the hot path.
FYI: here is what I'd do for the external metadata only case:
Since you have improved the comments too, I may just use this for the next iteration.