There isn't an sg iterator helper for building bvecs, so invent one and fix the issue in nvmet_file_init_bvec().

The issue is that one sg entry may cover multiple contiguous pages: only the 1st bvec's .bv_offset can be non-zero, and the length of the last bvec has to account for the remaining length.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support")
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/target/io-cmd-file.c | 57 ++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 28 deletions(-)
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 3e43212d3c1c..1c76e9e2a474 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -75,13 +75,6 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 	return ret;
 }
 
-static void nvmet_file_init_bvec(struct bio_vec *bv, struct sg_page_iter *iter)
-{
-	bv->bv_page = sg_page_iter_page(iter);
-	bv->bv_offset = iter->sg->offset;
-	bv->bv_len = PAGE_SIZE - iter->sg->offset;
-}
-
 static ssize_t nvmet_file_submit_bvec(struct nvmet_req *req, loff_t pos,
 		unsigned long nr_segs, size_t count, int ki_flags)
 {
@@ -129,13 +122,14 @@ static void nvmet_file_io_done(struct kiocb *iocb, long ret, long ret2)
 static bool nvmet_file_execute_io(struct nvmet_req *req, int ki_flags)
 {
 	ssize_t nr_bvec = DIV_ROUND_UP(req->data_len, PAGE_SIZE);
-	struct sg_page_iter sg_pg_iter;
+	struct scatterlist *sg;
+	struct bio_vec *bv;
 	unsigned long bv_cnt = 0;
 	bool is_sync = false;
 	size_t len = 0, total_len = 0;
 	ssize_t ret = 0;
 	loff_t pos;
-
+	int i, j, sg_done;
 	if (req->f.mpool_alloc && nr_bvec > NVMET_MAX_MPOOL_BVEC)
 		is_sync = true;
 
@@ -147,26 +141,33 @@ static bool nvmet_file_execute_io(struct nvmet_req *req, int ki_flags)
 	}
 
 	memset(&req->f.iocb, 0, sizeof(struct kiocb));
-	for_each_sg_page(req->sg, &sg_pg_iter, req->sg_cnt, 0) {
-		nvmet_file_init_bvec(&req->f.bvec[bv_cnt], &sg_pg_iter);
-		len += req->f.bvec[bv_cnt].bv_len;
-		total_len += req->f.bvec[bv_cnt].bv_len;
-		bv_cnt++;
-
-		WARN_ON_ONCE((nr_bvec - 1) < 0);
-
-		if (unlikely(is_sync) &&
-		    (nr_bvec - 1 == 0 || bv_cnt == NVMET_MAX_MPOOL_BVEC)) {
-			ret = nvmet_file_submit_bvec(req, pos, bv_cnt, len, 0);
-			if (ret < 0)
-				goto complete;
-
-			pos += len;
-			bv_cnt = 0;
-			len = 0;
+	for_each_sg(req->sg, sg, req->sg_cnt, i)
+		for (j = 0, sg_done = 0;
+		     (bv = &req->f.bvec[bv_cnt]) && sg_done < sg->length;
+		     j++, sg_done += bv->bv_len) {
+			bv->bv_offset = j ? 0 : sg->offset;
+			bv->bv_len = min_t(unsigned, PAGE_SIZE - bv->bv_offset,
+					sg->length - sg_done);
+			bv->bv_page = nth_page(sg_page(sg), j);
+
+			len += bv->bv_len;
+			total_len += bv->bv_len;
+			bv_cnt++;
+
+			WARN_ON_ONCE((nr_bvec - 1) < 0);
+
+			if (unlikely(is_sync) &&
+			    (nr_bvec - 1 == 0 || bv_cnt == NVMET_MAX_MPOOL_BVEC)) {
+				ret = nvmet_file_submit_bvec(req, pos, bv_cnt, len, 0);
+				if (ret < 0)
+					goto complete;
+
+				pos += len;
+				bv_cnt = 0;
+				len = 0;
+			}
+			nr_bvec--;
 		}
-		nr_bvec--;
-	}
 
 	if (WARN_ON_ONCE(total_len != req->data_len)) {
 		ret = -EIO;
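For reference, the splitting rule the new inner loop applies to a single scatterlist entry can also be written as a standalone helper. This is only an illustrative sketch of the same logic, not part of the patch: the name nvmet_sg_to_bvecs() is made up, and it assumes the usual kernel context (struct scatterlist, struct bio_vec, sg_page(), nth_page(), min_t()).

static unsigned int nvmet_sg_to_bvecs(struct scatterlist *sg,
				      struct bio_vec *bv)
{
	unsigned int done = 0, cnt = 0, j = 0;

	while (done < sg->length) {
		/* only the first page of the entry keeps the sg offset */
		bv[cnt].bv_offset = j ? 0 : sg->offset;
		/* the last bvec is trimmed to the bytes that remain */
		bv[cnt].bv_len = min_t(unsigned int,
				       PAGE_SIZE - bv[cnt].bv_offset,
				       sg->length - done);
		bv[cnt].bv_page = nth_page(sg_page(sg), j);
		done += bv[cnt].bv_len;
		cnt++;
		j++;
	}

	return cnt;	/* number of bvecs produced for this sg entry */
}

Calling something like this for every entry of req->sg would produce the same bvec array the patch builds in place.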
On Mon, Mar 25, 2019 at 06:07:07PM +0800, Ming Lei wrote:
> There isn't an sg iterator helper for building bvecs, so invent one and fix the issue in nvmet_file_init_bvec().
> 
> The issue is that one sg entry may cover multiple contiguous pages: only the 1st bvec's .bv_offset can be non-zero, and the length of the last bvec has to account for the remaining length.
The scatterlist in the nvme target is always allocated by the nvmet code itself and thus never contains multi-page sg list entries.
On Mon, Mar 25, 2019 at 11:52:31AM +0100, Christoph Hellwig wrote:
> On Mon, Mar 25, 2019 at 06:07:07PM +0800, Ming Lei wrote:
> > There isn't an sg iterator helper for building bvecs, so invent one and fix the issue in nvmet_file_init_bvec().
> > 
> > The issue is that one sg entry may cover multiple contiguous pages: only the 1st bvec's .bv_offset can be non-zero, and the length of the last bvec has to account for the remaining length.
> 
> The scatterlist in the nvme target is always allocated by the nvmet code itself and thus never contains multi-page sg list entries.
I am wondering if it is true.
I haven't looked at the other target code yet, but it seems it isn't true for loop; see the following code in nvme_loop_queue_rq():

	iod->req.sg = iod->sg_table.sgl;
	iod->req.sg_cnt = blk_rq_map_sg(req->q, req, iod->sg_table.sgl);
	iod->req.transfer_len = blk_rq_payload_bytes(req);
And it has been triggered by nvme/011 in Yi's test.
Thanks,
Ming
> > The scatterlist in the nvme target is always allocated by the nvmet code itself and thus never contains multi-page sg list entries.
> 
> I am wondering if it is true.
> 
> I haven't looked at the other target code yet, but it seems it isn't true for loop; see the following code in nvme_loop_queue_rq():
> 
> 	iod->req.sg = iod->sg_table.sgl;
> 	iod->req.sg_cnt = blk_rq_map_sg(req->q, req, iod->sg_table.sgl);
> 	iod->req.transfer_len = blk_rq_payload_bytes(req);
> 
> And it has been triggered by nvme/011 in Yi's test.
Yes, loop is an exception in this case.
On Mon, Mar 25, 2019 at 07:03:23PM -0700, Sagi Grimberg wrote:
> > > The scatterlist in the nvme target is always allocated by the nvmet code itself and thus never contains multi-page sg list entries.
> > 
> > I am wondering if it is true.
> > 
> > I haven't looked at the other target code yet, but it seems it isn't true for loop; see the following code in nvme_loop_queue_rq():
> > 
> > 	iod->req.sg = iod->sg_table.sgl;
> > 	iod->req.sg_cnt = blk_rq_map_sg(req->q, req, iod->sg_table.sgl);
> > 	iod->req.transfer_len = blk_rq_payload_bytes(req);
> > 
> > And it has been triggered by nvme/011 in Yi's test.
> 
> Yes, loop is an exception in this case.
Thanks for the clarification!
Another candidate fix is to set nvmet-loop's queue segment boundary mask to PAGE_SIZE - 1.

Also, the same issue exists for the block device backed target.

If no one objects, I'd like to take the approach of adjusting the segment boundary mask.
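To illustrate why a PAGE_SIZE - 1 segment boundary mask would be enough, here is a small self-contained userspace sketch (not kernel code; same_segment() and the addresses are made up). It mirrors the block layer's segment-boundary test: a byte range can only live in one segment if it doesn't straddle the mask, so with a PAGE_SIZE - 1 mask blk_rq_map_sg() would never hand the target an sg entry spanning more than one page.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EX_PAGE_SIZE 4096UL

/* true if the byte range [addr, addr + len) stays within one segment */
static bool same_segment(uint64_t addr, uint64_t len, uint64_t mask)
{
	return (addr | mask) == ((addr + len - 1) | mask);
}

int main(void)
{
	uint64_t mask = EX_PAGE_SIZE - 1;
	uint64_t page = 0x100000;	/* arbitrary page-aligned address */

	/* a range inside a single page may be merged into one segment */
	printf("within a page: %d\n", same_segment(page + 512, 1024, mask));
	/* a range crossing into the next page may not */
	printf("across a page: %d\n", same_segment(page + 2048, 4096, mask));
	return 0;
}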
Thanks,
Ming
On Tue, Mar 26, 2019 at 10:16:28AM +0800, Ming Lei wrote:
> Also, the same issue exists for the block device backed target.
> 
> If no one objects, I'd like to take the approach of adjusting the segment boundary mask.
Just go for the merge of your two patches. That fixes the issue and actually makes the code simpler.