On 7/16/25 04:15, Rick Wertenbroek wrote:
Have nvmet_req_init() and req->execute() complete failed commands.
Description of the problem: nvmet_req_init() calls __nvmet_req_complete() internally upon failure, e.g., unsupported opcode, which calls the "queue_response" callback, this results in nvmet_pci_epf_queue_response() being called, which will call nvmet_pci_epf_complete_iod() if data_len is 0 or if dma_dir is different from DMA_TO_DEVICE. This results in a double completion as nvmet_pci_epf_exec_iod_work() also calls nvmet_pci_epf_complete_iod() when nvmet_req_init() fails.
Steps to reproduce: On the host send a command with an unsupported opcode with nvme-cli, For example the admin command "security receive" $ sudo nvme security-recv /dev/nvme0n1 -n1 -x4096
This triggers a double completion as nvmet_req_init() fails and nvmet_pci_epf_queue_response() is called, here iod->dma_dir is still in the default state of "DMA_NONE" as set by default in nvmet_pci_epf_alloc_iod(), so nvmet_pci_epf_complete_iod() is called. Because nvmet_req_init() failed nvmet_pci_epf_complete_iod() is also called in nvmet_pci_epf_exec_iod_work() leading to a double completion. This not only sends two completions to the host but also corrupts the state of the PCI NVMe target leading to kernel oops.
This patch lets nvmet_req_init() and req->execute() complete all failed commands, and removes the double completion case in nvmet_pci_epf_exec_iod_work() therefore fixing the edge cases where double completions occurred.
Signed-off-by: Rick Wertenbroekrick.wertenbroek@gmail.com Reviewed-by: Damien Le Moaldlemoal@kernel.org Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Cc:stable@vger.kernel.org
Good catch, looks good, I wish we have tests for this part of target to it will get tested on regular basis, not the requirement, just a thought.
Reviewed-by: Chaitanya Kulkarni kch@nvidia.com
-ck