The code "max(1U, 3 * (1U << shift) / 4)" comes from the Kyber I/O scheduler. The Kyber I/O scheduler maintains one internal queue per hwq and hence derives its async_depth from the number of hwq tags. Using this approach for the mq-deadline scheduler is wrong since the mq-deadline scheduler maintains one internal queue for all hwqs combined. Hence this revert.
Cc: stable@vger.kernel.org
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Cc: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Fixes: d47f9717e5cf ("block/mq-deadline: use correct way to throttling write requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..02a916ba62ee 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -646,9 +646,8 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
-	unsigned int shift = tags->bitmap_tags.sb.shift;
 
-	dd->async_depth = max(1U, 3 * (1U << shift) / 4);
+	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
 
 	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
 }
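For reference, here is a small standalone sketch (not kernel code; the 128-tag queue depth and sbitmap word shift of 5 are assumed example values) showing what the shift-based formula and the restored queue-wide formula evaluate to:

#include <stdio.h>

int main(void)
{
	unsigned int nr_requests = 128;	/* total scheduler tags for the queue (assumed) */
	unsigned int shift = 5;		/* sbitmap word shift: 32 tags per word (assumed) */

	/* Reverted, Kyber-style formula: depth is tied to one sbitmap word. */
	unsigned int per_word_depth = 3 * (1U << shift) / 4;	/* 24 */

	/* Restored formula: depth is tied to the whole queue. */
	unsigned int queue_wide_depth = 3 * nr_requests / 4;	/* 96 */

	printf("per-word formula:   async_depth = %u\n", per_word_depth);
	printf("queue-wide formula: async_depth = %u\n", queue_wide_depth);
	return 0;
}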
On Wed, 13 Mar 2024 14:42:18 -0700, Bart Van Assche wrote:
The code "max(1U, 3 * (1U << shift) / 4)" comes from the Kyber I/O scheduler. The Kyber I/O scheduler maintains one internal queue per hwq and hence derives its async_depth from the number of hwq tags. Using this approach for the mq-deadline scheduler is wrong since the mq-deadline scheduler maintains one internal queue for all hwqs combined. Hence this revert.
[...]
Applied, thanks!
[1/1] Revert "block/mq-deadline: use correct way to throttling write requests" commit: 256aab46e31683d76d45ccbedc287b4d3f3e322b
Best regards,
Hi Bart,
Just as mentioned in the original patch, "dd->async_depth = max(1UL, 3 * q->nr_requests / 4);" does not look like it has any throttling effect, because tag allocation is based on the sbitmap, not on the whole nr_requests. Right? Thanks!
For write requests, when we allocate a tag from sched_tags, data->shallow_depth is passed to sbitmap_find_bit(); see the following code:
nr = sbitmap_find_bit_in_word(&sb->map[index], min_t(unsigned int, __map_depth(sb, index), depth), alloc_hint, wrap);
The smaller of data->shallow_depth and __map_depth(sb, index) will be used as the maximum range when allocating bits.
For an MMC device (one hw queue, deadline I/O scheduler): q->nr_requests = sched_tags = 128, so according to the previous calculation method, dd->async_depth = data->shallow_depth = 96. The platform is 64-bit with 8 CPUs, so sched_tags.bitmap_tags.sb.shift = 5 and sb.map[] = 32/32/32/32. Since 32 is smaller than 96, tags can be allocated from the full range of each word every time, for both read and write I/O, so there is no throttling effect.
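A minimal standalone sketch of the per-word clamp being described (not kernel code; the helper name is illustrative, the 128/96/32 numbers restate the MMC example above, and the real sbitmap code is more involved):

#include <stdio.h>

static unsigned int effective_word_limit(unsigned int map_depth,
					 unsigned int shallow_depth)
{
	/* sbitmap_find_bit() searches at most min(word depth, shallow depth) bits per word. */
	return map_depth < shallow_depth ? map_depth : shallow_depth;
}

int main(void)
{
	unsigned int nr_requests = 128;				/* sched_tags depth (example) */
	unsigned int map_depth = 1U << 5;			/* 32 tags per sbitmap word (example) */
	unsigned int shallow_depth = 3 * nr_requests / 4;	/* 96 */

	/* 96 > 32, so every word is still searched in full: no throttling. */
	printf("shallow_depth=%u map_depth=%u per-word search limit=%u\n",
	       shallow_depth, map_depth,
	       effective_word_limit(map_depth, shallow_depth));
	return 0;
}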
-----Original Message-----
From: Bart Van Assche <bvanassche@acm.org>
Sent: March 14, 2024 5:42
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org; Christoph Hellwig <hch@lst.de>; Bart Van Assche <bvanassche@acm.org>; stable@vger.kernel.org; Damien Le Moal <dlemoal@kernel.org>; Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>; 牛志国 (Zhiguo Niu) <Zhiguo.Niu@unisoc.com>
Subject: [PATCH] Revert "block/mq-deadline: use correct way to throttling write requests"
The code "max(1U, 3 * (1U << shift) / 4)" comes from the Kyber I/O scheduler. The Kyber I/O scheduler maintains one internal queue per hwq and hence derives its async_depth from the number of hwq tags. Using this approach for the mq-deadline scheduler is wrong since the mq-deadline scheduler maintains one internal queue for all hwqs combined. Hence this revert.
Cc: stable@vger.kernel.org
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Cc: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Fixes: d47f9717e5cf ("block/mq-deadline: use correct way to throttling write requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..02a916ba62ee 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -646,9 +646,8 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
-	unsigned int shift = tags->bitmap_tags.sb.shift;
 
-	dd->async_depth = max(1U, 3 * (1U << shift) / 4);
+	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
 
 	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
 }
On 3/13/24 18:03, 牛志国 (Zhiguo Niu) wrote:
Just as mentioned in the original patch, "dd->async_depth = max(1UL, 3 * q->nr_requests / 4);" does not look like it has any throttling effect, because tag allocation is based on the sbitmap, not on the whole nr_requests. Right? Thanks!
For write requests, when we allocate a tag from sched_tags, data->shallow_depth is passed to sbitmap_find_bit(); see the following code:
nr = sbitmap_find_bit_in_word(&sb->map[index], min_t(unsigned int, __map_depth(sb, index), depth), alloc_hint, wrap);
The smaller of data->shallow_depth and __map_depth(sb, index) will be used as the maximum range when allocating bits.
For an MMC device (one hw queue, deadline I/O scheduler): q->nr_requests = sched_tags = 128, so according to the previous calculation method, dd->async_depth = data->shallow_depth = 96. The platform is 64-bit with 8 CPUs, so sched_tags.bitmap_tags.sb.shift = 5 and sb.map[] = 32/32/32/32. Since 32 is smaller than 96, tags can be allocated from the full range of each word every time, for both read and write I/O, so there is no throttling effect.
Whether or not the code in my patch effectively performs throttling, we need this revert to be merged. The patch that is being reverted ("block/mq-deadline: use correct way to throttling write requests") ended up in Greg KH's stable branches. Hence, the first step is to revert that patch and tag it with "Cc: stable" such that the revert lands in the stable branches.
Thanks,
Bart.
On 3/14/24 11:08 AM, Bart Van Assche wrote:
On 3/13/24 18:03, 牛志国 (Zhiguo Niu) wrote:
Just as mentioned in the original patch, "dd->async_depth = max(1UL, 3 * q->nr_requests / 4);" does not look like it has any throttling effect, because tag allocation is based on the sbitmap, not on the whole nr_requests. Right? Thanks!
For write requests, when we allocate a tag from sched_tags, data->shallow_depth is passed to sbitmap_find_bit(); see the following code:
nr = sbitmap_find_bit_in_word(&sb->map[index], min_t(unsigned int, __map_depth(sb, index), depth), alloc_hint, wrap);
The smaller of data->shallow_depth and __map_depth(sb, index) will be used as the maximum range when allocating bits.
For an MMC device (one hw queue, deadline I/O scheduler): q->nr_requests = sched_tags = 128, so according to the previous calculation method, dd->async_depth = data->shallow_depth = 96. The platform is 64-bit with 8 CPUs, so sched_tags.bitmap_tags.sb.shift = 5 and sb.map[] = 32/32/32/32. Since 32 is smaller than 96, tags can be allocated from the full range of each word every time, for both read and write I/O, so there is no throttling effect.
Whether or not the code in my patch effectively performs throttling, we need this revert to be merged. The patch that is being reverted ("block/mq-deadline: use correct way to throttling write requests") ended up in Greg KH's stable branches. Hence, the first step is to revert that patch and tag it with "Cc: stable" such that the revert lands in the stable branches.
Indeed, no amount of arguing is going to change that fact. Zhiguo, it caused a regression. Rather than arguing about why the change is correct, it'd be much more productive to figure out a future solution.
Hi Bart, Jens and Zhiguo,
On 14/03/24 03:12, Bart Van Assche wrote:
The code "max(1U, 3 * (1U << shift) / 4)" comes from the Kyber I/O scheduler. The Kyber I/O scheduler maintains one internal queue per hwq and hence derives its async_depth from the number of hwq tags. Using this approach for the mq-deadline scheduler is wrong since the mq-deadline scheduler maintains one internal queue for all hwqs combined. Hence this revert.
Thanks a lot for helping with this performance regression[1].
Regards, Harshit
[1] https://lore.kernel.org/all/5ce2ae5d-61e2-4ede-ad55-551112602401@oracle.com/
Cc: stable@vger.kernel.org
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Cc: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Fixes: d47f9717e5cf ("block/mq-deadline: use correct way to throttling write requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..02a916ba62ee 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -646,9 +646,8 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
-	unsigned int shift = tags->bitmap_tags.sb.shift;
 
-	dd->async_depth = max(1U, 3 * (1U << shift) / 4);
+	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
 
 	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
 }