Hi,
When one request is dispatched to LLD via dm-rq, if the result is BLK_STS_*RESOURCE, dm-rq will free the request. However, LLD may allocate private data for this request, so this way will cause memory leak.
Add .cleanup_rq() callback and implement it in SCSI for fixing the issue, since SCSI is the only driver which allocates private requst data in .queue_rq() path.
Another use case of this callback is to free the request and re-submit bios during cpu hotplug when the hctx is dead, see the following link:
https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@hua...
V4: - add more commit log on the new .cleanup_rq callback, as suggested by Mike
V3: - run .cleanup_rq() from dm-rq because this issue is dm-rq specific, and even in future it should be still very unusual to free request in this way. If we call .cleanup_rq() in generic rq free code(fast path), cost will be introduced unnecessarily, also we have to consider related race.
V2: - run .cleanup_rq() in blk_mq_free_request(), as suggested by Mike
Ming Lei (2): blk-mq: add callback of .cleanup_rq scsi: implement .cleanup_rq callback
drivers/md/dm-rq.c | 1 + drivers/scsi/scsi_lib.c | 13 +++++++++++++ include/linux/blk-mq.h | 13 +++++++++++++ 3 files changed, 27 insertions(+)
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
SCSI maintains its own driver private data hooked off of each SCSI request, and the pridate data won't be freed after scsi_queue_rq() returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver (e.g. dm-rq) may need to retry these SCSI requests, before SCSI has fully dispatched them, due to a lower level SCSI driver's resource limitation identified in scsi_queue_rq(). Currently SCSI's per-request private data is leaked when the upper layer driver (dm-rq) frees and then retries these requests in response to BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().
This usecase is so specialized that it doesn't warrant training an existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to account for freeing its driver private data -- doing so would add an extra branch for handling a special case that all other consumers of SCSI (and blk-mq) won't ever need to worry about.
So the most pragmatic way forward is to delegate freeing SCSI driver private data to the upper layer driver (dm-rq). Do so by adding new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method from dm-rq. A following commit will implement the .cleanup_rq() hook in scsi_mq_ops.
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei ming.lei@redhat.com --- drivers/md/dm-rq.c | 1 + include/linux/blk-mq.h | 13 +++++++++++++ 2 files changed, 14 insertions(+)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index c9e44ac1f9a6..21d5c1784d0c 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -408,6 +408,7 @@ static int map_request(struct dm_rq_target_io *tio) ret = dm_dispatch_clone_request(clone, rq); if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { blk_rq_unprep_clone(clone); + blk_mq_cleanup_rq(clone); tio->ti->type->release_clone_rq(clone, &tio->info); tio->clone = NULL; return DM_MAPIO_REQUEUE; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 3fa1fa59f9b2..ab25e69a15d1 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -140,6 +140,7 @@ typedef int (poll_fn)(struct blk_mq_hw_ctx *); typedef int (map_queues_fn)(struct blk_mq_tag_set *set); typedef bool (busy_fn)(struct request_queue *); typedef void (complete_fn)(struct request *); +typedef void (cleanup_rq_fn)(struct request *);
struct blk_mq_ops { @@ -200,6 +201,12 @@ struct blk_mq_ops { /* Called from inside blk_get_request() */ void (*initialize_rq_fn)(struct request *rq);
+ /* + * Called before freeing one request which isn't completed yet, + * and usually for freeing the driver private data + */ + cleanup_rq_fn *cleanup_rq; + /* * If set, returns whether or not this queue currently is busy */ @@ -366,4 +373,10 @@ static inline blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, BLK_QC_T_INTERNAL; }
+static inline void blk_mq_cleanup_rq(struct request *rq) +{ + if (rq->q->mq_ops->cleanup_rq) + rq->q->mq_ops->cleanup_rq(rq); +} + #endif
Implement .cleanup_rq() callback for freeing driver private part of the request. Then we can avoid to leak this part if the request isn't completed by SCSI, and freed by blk-mq or upper layer(such as dm-rq) finally.
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei ming.lei@redhat.com --- drivers/scsi/scsi_lib.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index e1da8c70a266..eb848ff46afd 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1089,6 +1089,18 @@ static void scsi_initialize_rq(struct request *rq) cmd->retries = 0; }
+/* + * Only called when the request isn't completed by SCSI, and not freed by + * SCSI + */ +static void scsi_cleanup_rq(struct request *rq) +{ + if (rq->rq_flags & RQF_DONTPREP) { + scsi_mq_uninit_cmd(blk_mq_rq_to_pdu(rq)); + rq->rq_flags &= ~RQF_DONTPREP; + } +} + /* Add a command to the list used by the aacraid and dpt_i2o drivers */ void scsi_add_cmd_to_list(struct scsi_cmnd *cmd) { @@ -1816,6 +1828,7 @@ static const struct blk_mq_ops scsi_mq_ops = { .init_request = scsi_mq_init_request, .exit_request = scsi_mq_exit_request, .initialize_rq_fn = scsi_initialize_rq, + .cleanup_rq = scsi_cleanup_rq, .busy = scsi_mq_lld_busy, .map_queues = scsi_map_queues, };
On Thu, Jul 25, 2019 at 10:04:58AM +0800, Ming Lei wrote:
Hi,
When one request is dispatched to LLD via dm-rq, if the result is BLK_STS_*RESOURCE, dm-rq will free the request. However, LLD may allocate private data for this request, so this way will cause memory leak.
Add .cleanup_rq() callback and implement it in SCSI for fixing the issue, since SCSI is the only driver which allocates private requst data in .queue_rq() path.
Another use case of this callback is to free the request and re-submit bios during cpu hotplug when the hctx is dead, see the following link:
https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@hua...
V4:
- add more commit log on the new .cleanup_rq callback, as suggested by Mike
V3:
- run .cleanup_rq() from dm-rq because this issue is dm-rq specific,
and even in future it should be still very unusual to free request in this way. If we call .cleanup_rq() in generic rq free code(fast path), cost will be introduced unnecessarily, also we have to consider related race.
V2:
- run .cleanup_rq() in blk_mq_free_request(), as suggested by Mike
Ming Lei (2): blk-mq: add callback of .cleanup_rq scsi: implement .cleanup_rq callback
drivers/md/dm-rq.c | 1 + drivers/scsi/scsi_lib.c | 13 +++++++++++++ include/linux/blk-mq.h | 13 +++++++++++++ 3 files changed, 27 insertions(+)
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Hello Jens & guys,
Ping on this fix.
Thanks, Ming
On Tue, Jul 30, 2019 at 08:43:59AM +0800, Ming Lei wrote:
On Thu, Jul 25, 2019 at 10:04:58AM +0800, Ming Lei wrote:
Hi,
When one request is dispatched to LLD via dm-rq, if the result is BLK_STS_*RESOURCE, dm-rq will free the request. However, LLD may allocate private data for this request, so this way will cause memory leak.
Add .cleanup_rq() callback and implement it in SCSI for fixing the issue, since SCSI is the only driver which allocates private requst data in .queue_rq() path.
Another use case of this callback is to free the request and re-submit bios during cpu hotplug when the hctx is dead, see the following link:
https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@hua...
V4:
- add more commit log on the new .cleanup_rq callback, as suggested by Mike
V3:
- run .cleanup_rq() from dm-rq because this issue is dm-rq specific,
and even in future it should be still very unusual to free request in this way. If we call .cleanup_rq() in generic rq free code(fast path), cost will be introduced unnecessarily, also we have to consider related race.
V2:
- run .cleanup_rq() in blk_mq_free_request(), as suggested by Mike
Ming Lei (2): blk-mq: add callback of .cleanup_rq scsi: implement .cleanup_rq callback
drivers/md/dm-rq.c | 1 + drivers/scsi/scsi_lib.c | 13 +++++++++++++ include/linux/blk-mq.h | 13 +++++++++++++ 3 files changed, 27 insertions(+)
Cc: Ewan D. Milne emilne@redhat.com Cc: Bart Van Assche bvanassche@acm.org Cc: Hannes Reinecke hare@suse.com Cc: Christoph Hellwig hch@lst.de Cc: Mike Snitzer snitzer@redhat.com Cc: dm-devel@redhat.com Cc: stable@vger.kernel.org Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Hello Jens & guys,
Ping on this fix.
Hi Jens,
Could you make the patcheset merged for 5.4? And it has been verified that big memory leak issue can be fixed by this patchset.
thanks, Ming
On 7/24/19 7:04 PM, Ming Lei wrote:
Hi,
When one request is dispatched to LLD via dm-rq, if the result is BLK_STS_*RESOURCE, dm-rq will free the request. However, LLD may allocate private data for this request, so this way will cause memory leak.
Add .cleanup_rq() callback and implement it in SCSI for fixing the issue, since SCSI is the only driver which allocates private requst data in .queue_rq() path.
Another use case of this callback is to free the request and re-submit bios during cpu hotplug when the hctx is dead, see the following link:
https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@hua...
Applied for 5.4, thanks.
linux-stable-mirror@lists.linaro.org