Hi,
When one request is dispatched to LLD via dm-rq, if the result is BLK_STS_*RESOURCE, dm-rq will free the request. However, LLD may allocate private data for this request, so this way will cause memory leak.
Add .cleanup_rq() callback and implement it in SCSI for fixing the issue, since SCSI is the only driver which allocates private requst data in .queue_rq() path.
Another use case of this callback is to free the request and re-submit bios during cpu hotplug when the hctx is dead, see the following link:
https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@hua...
V3: - run .cleanup_rq() from dm-rq because this issue is dm-rq specific, and even in future it should be still very unusual to free request in this way. If we call .cleanup_rq() in generic rq free code(fast path), cost will be introduced unnecessarily, also we have to consider related race.
V2: - run .cleanup_rq() in blk_mq_free_request(), as suggested by Mike
Ming Lei (2): blk-mq: add callback of .cleanup_rq scsi: implement .cleanup_rq callback
drivers/md/dm-rq.c | 1 + drivers/scsi/scsi_lib.c | 13 +++++++++++++ include/linux/blk-mq.h | 13 +++++++++++++ 3 files changed, 27 insertions(+)
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
dm-rq needs to free request which has been dispatched and not completed by underlying queue. However, the underlying queue may have allocated private data for this request in .queue_rq(), so the request private data will be leaked in dm multipath IO code path.
Add one new callback of .cleanup_rq() to fix the memory leak.
Another use case is to free request when the hctx is dead during cpu hotplug context.
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei ming.lei@redhat.com --- drivers/md/dm-rq.c | 1 + include/linux/blk-mq.h | 13 +++++++++++++ 2 files changed, 14 insertions(+)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index c9e44ac1f9a6..21d5c1784d0c 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -408,6 +408,7 @@ static int map_request(struct dm_rq_target_io *tio) ret = dm_dispatch_clone_request(clone, rq); if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { blk_rq_unprep_clone(clone); + blk_mq_cleanup_rq(clone); tio->ti->type->release_clone_rq(clone, &tio->info); tio->clone = NULL; return DM_MAPIO_REQUEUE; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 3fa1fa59f9b2..ab25e69a15d1 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -140,6 +140,7 @@ typedef int (poll_fn)(struct blk_mq_hw_ctx *); typedef int (map_queues_fn)(struct blk_mq_tag_set *set); typedef bool (busy_fn)(struct request_queue *); typedef void (complete_fn)(struct request *); +typedef void (cleanup_rq_fn)(struct request *);
struct blk_mq_ops { @@ -200,6 +201,12 @@ struct blk_mq_ops { /* Called from inside blk_get_request() */ void (*initialize_rq_fn)(struct request *rq);
+ /* + * Called before freeing one request which isn't completed yet, + * and usually for freeing the driver private data + */ + cleanup_rq_fn *cleanup_rq; + /* * If set, returns whether or not this queue currently is busy */ @@ -366,4 +373,10 @@ static inline blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, BLK_QC_T_INTERNAL; }
+static inline void blk_mq_cleanup_rq(struct request *rq) +{ + if (rq->q->mq_ops->cleanup_rq) + rq->q->mq_ops->cleanup_rq(rq); +} + #endif
On Tue, Jul 23 2019 at 11:12pm -0400, Ming Lei ming.lei@redhat.com wrote:
dm-rq needs to free request which has been dispatched and not completed by underlying queue. However, the underlying queue may have allocated private data for this request in .queue_rq(), so the request private data will be leaked in dm multipath IO code path.
Add one new callback of .cleanup_rq() to fix the memory leak.
Think the above kind of glosses over the nature of the issue. While this issue _is_ unique to request-based DM multipath's use of blk-mq it ultimately is a failing of the existing blk-mq interface that SCSI's per-request private data is leaking. SO all said, I'd rather this patch header reflect the nuance of why you skinned the cat like you have.
Something like this would be a better header IMHO:
SCSI maintains its own driver private data hooked off of each SCSI request. An upper layer driver (e.g. dm-rq) may need to retry these SCSI requests, before SCSI has fully dispatched them, due to a lower level SCSI driver's resource limitation identified in scsi_queue_rq(). Currently SCSI's per-request private data is leaked when the upper layer driver (dm-rq) frees and then retries these requests in response to BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().
This usecase is so specialized that it doesn't warrant training an existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to account for freeing its driver private data -- doing so would add an extra branch for handling a special case that all other consumers of SCSI (and blk-mq) won't ever need to worry about.
So the most pragmatic way forward is to delegate freeing SCSI driver private data to the upper layer driver (dm-rq). Do so by calling a new blk_mq_cleanup_rq() method from dm-rq. A following commit will implement the .cleanup_rq() hook in scsi_mq_ops.
Another use case is to free request when the hctx is dead during cpu hotplug context.
Not seeing any point forecasting this .cleanup_rq() hook's potential ability to address that cpu hotplug case; the future patch that provides that fix can deal with it. Reality is the existing SCSI per-request private data leak justifies this new hook on its own.
Mike
Implement .cleanup_rq() callback for freeing driver private part of the request. Then we can avoid to leak this part if the request isn't completed by SCSI, and freed by blk-mq or upper layer(such as dm-rq) finally.
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei ming.lei@redhat.com --- drivers/scsi/scsi_lib.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index e1da8c70a266..eb848ff46afd 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1089,6 +1089,18 @@ static void scsi_initialize_rq(struct request *rq) cmd->retries = 0; }
+/* + * Only called when the request isn't completed by SCSI, and not freed by + * SCSI + */ +static void scsi_cleanup_rq(struct request *rq) +{ + if (rq->rq_flags & RQF_DONTPREP) { + scsi_mq_uninit_cmd(blk_mq_rq_to_pdu(rq)); + rq->rq_flags &= ~RQF_DONTPREP; + } +} + /* Add a command to the list used by the aacraid and dpt_i2o drivers */ void scsi_add_cmd_to_list(struct scsi_cmnd *cmd) { @@ -1816,6 +1828,7 @@ static const struct blk_mq_ops scsi_mq_ops = { .init_request = scsi_mq_init_request, .exit_request = scsi_mq_exit_request, .initialize_rq_fn = scsi_initialize_rq, + .cleanup_rq = scsi_cleanup_rq, .busy = scsi_mq_lld_busy, .map_queues = scsi_map_queues, };
linux-stable-mirror@lists.linaro.org