From: Sanjay R Mehta <sanju.mehta(a)amd.com>
Previously, on unplug events, the TMU mode was disabled first
followed by the Time Synchronization Handshake, irrespective of
whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3)
devices, as the TSPacketInterval bits were always enabled by default,
leading the host router to assume that the device router's TMU was
already enabled and preventing it from initiating the Time
Synchronization Handshake. As a result, TBT3 monitors experienced
display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the
Time Synchronization Handshake during TMU disable if the
tb_switch_tmu_rate_write() function is successful. This ensures that
the TBT3 devices function correctly and eliminates the display
flickering issue.
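Condensed, the resulting flow in tb_switch_tmu_disable() is (taken from
the backport diff below, surrounding code elided):

	ret = tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
	if (ret)
		return ret;	/* leave the handshake untouched on failure */

	tb_port_tmu_time_sync_disable(up);
	ret = tb_port_tmu_time_sync_disable(down);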
Co-developed-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
(cherry picked from commit 583893a66d731f5da010a3fa38a0460e05f0149b)
USB4v2 introduced support for enhanced uni-directional TMU mode as part of
d49b4f043d63 ("thunderbolt: Add support for enhanced uni-directional TMU mode").
That is not a stable candidate commit, so adjust the code for the backport.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/thunderbolt/tmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/thunderbolt/tmu.c b/drivers/thunderbolt/tmu.c
index 626aca3124b1..d9544600b386 100644
--- a/drivers/thunderbolt/tmu.c
+++ b/drivers/thunderbolt/tmu.c
@@ -415,7 +415,9 @@ int tb_switch_tmu_disable(struct tb_switch *sw)
* uni-directional mode and we don't want to change it's TMU
* mode.
*/
- tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ ret = tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ if (ret)
+ return ret;
tb_port_tmu_time_sync_disable(up);
ret = tb_port_tmu_time_sync_disable(down);
--
2.34.1
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9d3de7ee192a6a253f475197fe4d2e2af10a731f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072413-glamorous-unjustly-bb12@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d3de7ee192a6a253f475197fe4d2e2af10a731f Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Sat, 22 Jul 2023 22:45:24 +0530
Subject: [PATCH] ext4: fix rbtree traversal bug in ext4_mb_use_preallocated
During allocations, while looking for preallocations (PA) in the per-inode
rbtree, we can't do a direct traversal of the tree because
ext4_mb_discard_group_preallocation() can, in parallel, mark the pa deleted
and that can cause direct traversal to skip some entries. This was
leading to a BUG_ON() being hit [1] when we missed a PA that could satisfy
our request and ultimately tried to create a new PA that would overlap
with the missed one.
To make sure we handle that case while still keeping the performance of
the rbtree, we make use of the fact that the only pa that could possibly
overlap the original goal start is the one that satisfies the below
conditions:
1. It must have its logical start immediately to the left of
(i.e. less than) the original logical start.
2. It must not be deleted
To find this pa we use the following traversal method:
1. Descend into the rbtree normally to find the immediate neighboring
PA. Here we keep descending irrespective of whether the PA is deleted or
whether it overlaps with our request. The goal is to find an immediately
adjacent PA.
2. If the found PA is on right of original goal, use rb_prev() to find
the left adjacent PA.
3. Check if this PA is deleted and keep moving left with rb_prev() until
a non deleted PA is found.
4. This is the PA we are looking for. Now we can check if it can satisfy
the original request and proceed accordingly.
This approach also takes care of having deleted PAs in the tree.
(While we are at it, also fix a possible overflow bug in calculating the
end of a PA)
[1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+t…
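To illustrate the traversal, here is a minimal userspace sketch of the
same "nearest left, skip deleted" idea, with a sorted array standing in
for the rbtree (all names are hypothetical and locking is elided):

#include <stdio.h>

/* toy stand-ins for the kernel structures */
struct pa { unsigned int lstart, len; int deleted; };

/*
 * Return the only PA that can overlap 'goal': the nearest non-deleted
 * entry whose logical start is <= goal, provided it actually covers goal.
 */
static struct pa *find_left_candidate(struct pa *tab, int n, unsigned int goal)
{
	int i;

	/* Steps 1+2: nearest entry at or to the left of goal */
	for (i = n - 1; i >= 0 && tab[i].lstart > goal; i--)
		;
	/* Step 3: skip deleted entries, moving further left */
	while (i >= 0 && tab[i].deleted)
		i--;
	/* Step 4: the candidate must actually cover goal */
	if (i >= 0 && goal < tab[i].lstart + tab[i].len)
		return &tab[i];
	return NULL;	/* no overlap possible */
}

int main(void)
{
	/* the middle PA covers goal 12 but is marked deleted */
	struct pa tab[] = { { 0, 8, 0 }, { 10, 8, 1 }, { 24, 8, 0 } };
	struct pa *hit = find_left_candidate(tab, 3, 12);

	if (hit)
		printf("use PA starting at %u\n", hit->lstart);
	else
		printf("no usable PA, fall back to group preallocation\n");
	return 0;
}

Note how the deleted PA is stepped over rather than used, and the
non-deleted neighbour further left may then fail the overlap check; that
is exactly the try_group_pa fallback in the diff below.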
Cc: stable(a)kernel.org # 6.4
Fixes: 3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Tested-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.16900459…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 456150ef6111..21b903fe546e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4765,8 +4765,8 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
int order, i;
struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
struct ext4_locality_group *lg;
- struct ext4_prealloc_space *tmp_pa, *cpa = NULL;
- ext4_lblk_t tmp_pa_start, tmp_pa_end;
+ struct ext4_prealloc_space *tmp_pa = NULL, *cpa = NULL;
+ loff_t tmp_pa_end;
struct rb_node *iter;
ext4_fsblk_t goal_block;
@@ -4774,47 +4774,151 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
return false;
- /* first, try per-file preallocation */
+ /*
+ * first, try per-file preallocation by searching the inode pa rbtree.
+ *
+ * Here, we can't do a direct traversal of the tree because
+ * ext4_mb_discard_group_preallocation() can paralelly mark the pa
+ * deleted and that can cause direct traversal to skip some entries.
+ */
read_lock(&ei->i_prealloc_lock);
+
+ if (RB_EMPTY_ROOT(&ei->i_prealloc_node)) {
+ goto try_group_pa;
+ }
+
+ /*
+ * Step 1: Find a pa with logical start immediately adjacent to the
+ * original logical start. This could be on the left or right.
+ *
+ * (tmp_pa->pa_lstart never changes so we can skip locking for it).
+ */
for (iter = ei->i_prealloc_node.rb_node; iter;
iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical,
- tmp_pa_start, iter)) {
+ tmp_pa->pa_lstart, iter)) {
tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
pa_node.inode_node);
+ }
- /* all fields in this condition don't change,
- * so we can skip locking for them */
- tmp_pa_start = tmp_pa->pa_lstart;
- tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+ /*
+ * Step 2: The adjacent pa might be to the right of logical start, find
+ * the left adjacent pa. After this step we'd have a valid tmp_pa whose
+ * logical start is towards the left of original request's logical start
+ */
+ if (tmp_pa->pa_lstart > ac->ac_o_ex.fe_logical) {
+ struct rb_node *tmp;
+ tmp = rb_prev(&tmp_pa->pa_node.inode_node);
- /* original request start doesn't lie in this PA */
- if (ac->ac_o_ex.fe_logical < tmp_pa_start ||
- ac->ac_o_ex.fe_logical >= tmp_pa_end)
- continue;
-
- /* non-extent files can't have physical blocks past 2^32 */
- if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
- (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
- EXT4_MAX_BLOCK_FILE_PHYS)) {
+ if (tmp) {
+ tmp_pa = rb_entry(tmp, struct ext4_prealloc_space,
+ pa_node.inode_node);
+ } else {
/*
- * Since PAs don't overlap, we won't find any
- * other PA to satisfy this.
+ * If there is no adjacent pa to the left then finding
+ * an overlapping pa is not possible hence stop searching
+ * inode pa tree
+ */
+ goto try_group_pa;
+ }
+ }
+
+ BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
+
+ /*
+ * Step 3: If the left adjacent pa is deleted, keep moving left to find
+ * the first non deleted adjacent pa. After this step we should have a
+ * valid tmp_pa which is guaranteed to be non deleted.
+ */
+ for (iter = &tmp_pa->pa_node.inode_node;; iter = rb_prev(iter)) {
+ if (!iter) {
+ /*
+ * no non deleted left adjacent pa, so stop searching
+ * inode pa tree
+ */
+ goto try_group_pa;
+ }
+ tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
+ pa_node.inode_node);
+ spin_lock(&tmp_pa->pa_lock);
+ if (tmp_pa->pa_deleted == 0) {
+ /*
+ * We will keep holding the pa_lock from
+ * this point on because we don't want group discard
+ * to delete this pa underneath us. Since group
+ * discard is anyways an ENOSPC operation it
+ * should be okay for it to wait a few more cycles.
*/
break;
- }
-
- /* found preallocated blocks, use them */
- spin_lock(&tmp_pa->pa_lock);
- if (tmp_pa->pa_deleted == 0 && tmp_pa->pa_free &&
- likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
- atomic_inc(&tmp_pa->pa_count);
- ext4_mb_use_inode_pa(ac, tmp_pa);
+ } else {
spin_unlock(&tmp_pa->pa_lock);
- read_unlock(&ei->i_prealloc_lock);
- return true;
}
- spin_unlock(&tmp_pa->pa_lock);
}
+
+ BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
+ BUG_ON(tmp_pa->pa_deleted == 1);
+
+ /*
+ * Step 4: We now have the non deleted left adjacent pa. Only this
+ * pa can possibly satisfy the request hence check if it overlaps
+ * original logical start and stop searching if it doesn't.
+ */
+ tmp_pa_end = (loff_t)tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+ if (ac->ac_o_ex.fe_logical >= tmp_pa_end) {
+ spin_unlock(&tmp_pa->pa_lock);
+ goto try_group_pa;
+ }
+
+ /* non-extent files can't have physical blocks past 2^32 */
+ if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
+ (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
+ EXT4_MAX_BLOCK_FILE_PHYS)) {
+ /*
+ * Since PAs don't overlap, we won't find any other PA to
+ * satisfy this.
+ */
+ spin_unlock(&tmp_pa->pa_lock);
+ goto try_group_pa;
+ }
+
+ if (tmp_pa->pa_free && likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
+ atomic_inc(&tmp_pa->pa_count);
+ ext4_mb_use_inode_pa(ac, tmp_pa);
+ spin_unlock(&tmp_pa->pa_lock);
+ read_unlock(&ei->i_prealloc_lock);
+ return true;
+ } else {
+ /*
+ * We found a valid overlapping pa but couldn't use it because
+ * it had no free blocks. This should ideally never happen
+ * because:
+ *
+ * 1. When a new inode pa is added to rbtree it must have
+ * pa_free > 0 since otherwise we won't actually need
+ * preallocation.
+ *
+ * 2. An inode pa that is in the rbtree can only have it's
+ * pa_free become zero when another thread calls:
+ * ext4_mb_new_blocks
+ * ext4_mb_use_preallocated
+ * ext4_mb_use_inode_pa
+ *
+ * 3. Further, after the above calls make pa_free == 0, we will
+ * immediately remove it from the rbtree in:
+ * ext4_mb_new_blocks
+ * ext4_mb_release_context
+ * ext4_mb_put_pa
+ *
+ * 4. Since the pa_free becoming 0 and pa_free getting removed
+ * from tree both happen in ext4_mb_new_blocks, which is always
+ * called with i_data_sem held for data allocations, we can be
+ * sure that another process will never see a pa in rbtree with
+ * pa_free == 0.
+ */
+ WARN_ON_ONCE(tmp_pa->pa_free == 0);
+ }
+ spin_unlock(&tmp_pa->pa_lock);
+try_group_pa:
read_unlock(&ei->i_prealloc_lock);
/* can we use group allocation? */
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072456-starting-gauging-768c@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 order below the goal length and the min_order to be, at most, 1 order
above the original length. This commit fixes an off-by-one issue that
arose due to the fact that 1 << fls(n) > n.
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
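As a quick sanity check of that fls() claim, a self-contained userspace
sketch (fls_() is a hypothetical stand-in for the kernel's fls()):

#include <stdio.h>

/* 1-based index of the highest set bit; 0 for n == 0, like the kernel fls() */
static int fls_(unsigned int n)
{
	int i = 0;

	while (n) {
		n >>= 1;
		i++;
	}
	return i;
}

int main(void)
{
	unsigned int len = 12;	/* example goal length in clusters */

	/* old start order: 1 << fls(len) always overshoots len */
	printf("1 << fls(%u)       = %d (above goal)\n", len, 1 << fls_(len));
	/* new start order: one order below the goal length, as intended */
	printf("1 << (fls(%u) - 1) = %d (just below goal)\n",
	       len, 1 << (fls_(len) - 1));
	return 0;
}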
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 4b430d4ac99750ee2ae2f893f1055c7af1ec3dc5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082118-donation-clench-604d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
4b430d4ac997 ("mmc: block: Fix in_flight[issue_type] value error")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4b430d4ac99750ee2ae2f893f1055c7af1ec3dc5 Mon Sep 17 00:00:00 2001
From: Yibin Ding <yibin.ding(a)unisoc.com>
Date: Wed, 2 Aug 2023 10:30:23 +0800
Subject: [PATCH] mmc: block: Fix in_flight[issue_type] value error
For a completed request, after the mmc_blk_mq_complete_rq(mq, req)
function is executed, the bitmap_tags corresponding to the
request will be cleared, that is, the request will be regarded as
idle. If the request is acquired by a different type of process at
this time, the issue_type of the request may change. It further
caused the value of mq->in_flight[issue_type] to be abnormal,
and a large number of requests could not be sent.
p1:                                     p2:
mmc_blk_mq_complete_rq
blk_mq_free_request
                                        blk_mq_get_request
                                        blk_mq_rq_ctx_init
mmc_blk_mq_dec_in_flight
    mmc_issue_type(mq, req)
Fix this by snapshotting the issue_type before completing the request,
which ensures that issue_type is consistent before and after
mmc_blk_mq_complete_rq executes.
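In other words, the type is sampled while the request still belongs to
p1 (a condensed view assembled from the hunks below, not independently
compilable):

	enum mmc_issue_type issue_type = mmc_issue_type(mq, req); /* snapshot first */

	/* ... the request completes here; its tag may be freed and
	 * re-initialized by another process from this point on ... */
	blk_mq_complete_request(req);

	mmc_blk_mq_dec_in_flight(mq, issue_type);	/* uses the snapshot, not req */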
Fixes: 81196976ed94 ("mmc: block: Add blk-mq support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yibin Ding <yibin.ding(a)unisoc.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20230802023023.1318134-1-yunlong.xing@unisoc.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index f701efb1fa78..b6f4be25b31b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2097,14 +2097,14 @@ static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
mmc_blk_urgent_bkops(mq, mqrq);
}
-static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, enum mmc_issue_type issue_type)
{
unsigned long flags;
bool put_card;
spin_lock_irqsave(&mq->lock, flags);
- mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+ mq->in_flight[issue_type] -= 1;
put_card = (mmc_tot_in_flight(mq) == 0);
@@ -2117,6 +2117,7 @@ static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, struct request *req)
static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req,
bool can_sleep)
{
+ enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
struct mmc_request *mrq = &mqrq->brq.mrq;
struct mmc_host *host = mq->card->host;
@@ -2136,7 +2137,7 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req,
blk_mq_complete_request(req);
}
- mmc_blk_mq_dec_in_flight(mq, req);
+ mmc_blk_mq_dec_in_flight(mq, issue_type);
}
void mmc_blk_mq_recovery(struct mmc_queue *mq)
We observed a 35% regression running phoronix pts/ramspeed and also a 16%
regression with unixbench. The regression is caused by the following commit:
dd0f194cfeb5 ("mm: rewrite wait_on_page_bit_common() logic")
Backporting this fixes the regression (this is already in 5.9+):
- 5ef64cc8987a ("mm: allow a controlled amount of unfairness in the page lock")
Linus Torvalds (1):
mm: allow a controlled amount of unfairness in the page lock
include/linux/mm.h | 2 +
include/linux/wait.h | 2 +
kernel/sysctl.c | 8 +++
mm/filemap.c | 160 ++++++++++++++++++++++++++++++++++---------
4 files changed, 141 insertions(+), 31 deletions(-)
--
2.41.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a337b64f0d5717248a0c894e2618e658e6a9de9f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080742-ion-implement-ceb1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a337b64f0d57 ("drm/i915: Fix premature release of request's reusable memory")
506006055769 ("drm/i915/active: Fix misuse of non-idle barriers as fence trackers")
ad5c99e02047 ("drm/i915: Remove unused bits of i915_vma/active api")
f6c466b84cfa ("drm/i915: Add support for moving fence waiting")
544460c33821 ("drm/i915: Multi-BB execbuf")
5851387a422c ("drm/i915/guc: Implement no mid batch preemption for multi-lrc")
e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
d38a9294491d ("drm/i915/guc: Update debugfs for GuC multi-lrc")
bc955204919e ("drm/i915/guc: Insert submit fences between requests in parent-child relationship")
6b540bf6f143 ("drm/i915/guc: Implement multi-lrc submission")
99b47aaddfa9 ("drm/i915/guc: Implement parallel context pin / unpin functions")
c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
3897df4c0187 ("drm/i915/guc: Introduce context parent-child relationship")
4f3059dc2dbb ("drm/i915: Add logical engine mapping")
1a52faed3131 ("drm/i915/guc: Take GT PM ref when deregistering context")
0ea92ace8b95 ("drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct")
0d8ee5ba8db4 ("drm/i915: Don't back up pinned LMEM context images and rings during suspend")
c56ce9565374 ("drm/i915 Implement LMEM backup and restore for suspend / resume")
0d9388635a22 ("drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects")
68c03c0e985e ("drm/i915/debugfs: Do not report currently active engine when describing objects")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a337b64f0d5717248a0c894e2618e658e6a9de9f Mon Sep 17 00:00:00 2001
From: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Date: Thu, 20 Jul 2023 11:35:44 +0200
Subject: [PATCH] drm/i915: Fix premature release of request's reusable memory
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading to infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_TYPESAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to a
significant (up to 10x) increase in SLAB utilization by the i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
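Condensed, the update path in __i915_active_fence_set() becomes the
following (assembled from the hunks below; the fence spinlock handling is
elided, so treat this as a sketch rather than compilable code):

	prev = i915_active_fence_get(active);	/* the ref keeps prev alive */
	if (fence == prev)
		return fence;

	while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
		if (prev)
			dma_fence_put(prev);	/* drop the stale reference */
		prev = i915_active_fence_get(active);	/* and retry */
	}

	return prev;	/* the caller must dma_fence_put() it when done */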
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janus…
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 8ef93889061a..5ec293011d99 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
}
} while (unlikely(is_barrier(active)));
- if (!__i915_active_fence_set(active, fence))
+ fence = __i915_active_fence_set(active, fence);
+ if (!fence)
__i915_active_acquire(ref);
+ else
+ dma_fence_put(fence);
out:
i915_active_release(ref);
@@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_active *ref,
return NULL;
}
- rcu_read_lock();
prev = __i915_active_fence_set(active, fence);
- if (prev)
- prev = dma_fence_get_rcu(prev);
- else
+ if (!prev)
__i915_active_acquire(ref);
- rcu_read_unlock();
return prev;
}
@@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(struct i915_request *rq)
*
* Records the new @fence as the last active fence along its timeline in
* this active tracker, moving the tracking callbacks from the previous
- * fence onto this one. Returns the previous fence (if not already completed),
- * which the caller must ensure is executed before the new fence. To ensure
- * that the order of fences within the timeline of the i915_active_fence is
- * understood, it should be locked by the caller.
+ * fence onto this one. Gets and returns a reference to the previous fence
+ * (if not already completed), which the caller must put after making sure
+ * that it is executed before the new fence. To ensure that the order of
+ * fences within the timeline of the i915_active_fence is understood, it
+ * should be locked by the caller.
*/
struct dma_fence *
__i915_active_fence_set(struct i915_active_fence *active,
@@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_active_fence *active,
struct dma_fence *prev;
unsigned long flags;
- if (fence == rcu_access_pointer(active->fence))
+ /*
+ * In case of fences embedded in i915_requests, their memory is
+ * SLAB_FAILSAFE_BY_RCU, then it can be reused right after release
+ * by new requests. Then, there is a risk of passing back a pointer
+ * to a new, completely unrelated fence that reuses the same memory
+ * while tracked under a different active tracker. Combined with i915
+ * perf open/close operations that build await dependencies between
+ * engine kernel context requests and user requests from different
+ * timelines, this can lead to dependency loops and infinite waits.
+ *
+ * As a countermeasure, we try to get a reference to the active->fence
+ * first, so if we succeed and pass it back to our user then it is not
+ * released and potentially reused by an unrelated request before the
+ * user has a chance to set up an await dependency on it.
+ */
+ prev = i915_active_fence_get(active);
+ if (fence == prev)
return fence;
GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
@@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_active_fence *active,
* Consider that we have two threads arriving (A and B), with
* C already resident as the active->fence.
*
- * A does the xchg first, and so it sees C or NULL depending
- * on the timing of the interrupt handler. If it is NULL, the
- * previous fence must have been signaled and we know that
- * we are first on the timeline. If it is still present,
- * we acquire the lock on that fence and serialise with the interrupt
- * handler, in the process removing it from any future interrupt
- * callback. A will then wait on C before executing (if present).
- *
- * As B is second, it sees A as the previous fence and so waits for
- * it to complete its transition and takes over the occupancy for
- * itself -- remembering that it needs to wait on A before executing.
+ * Both A and B have got a reference to C or NULL, depending on the
+ * timing of the interrupt handler. Let's assume that if A has got C
+ * then it has locked C first (before B).
*
* Note the strong ordering of the timeline also provides consistent
* nesting rules for the fence->lock; the inner lock is always the
* older lock.
*/
spin_lock_irqsave(fence->lock, flags);
- prev = xchg(__active_fence_slot(active), fence);
- if (prev) {
- GEM_BUG_ON(prev == fence);
+ if (prev)
spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+
+ /*
+ * A does the cmpxchg first, and so it sees C or NULL, as before, or
+ * something else, depending on the timing of other threads and/or
+ * interrupt handler. If not the same as before then A unlocks C if
+ * applicable and retries, starting from an attempt to get a new
+ * active->fence. Meanwhile, B follows the same path as A.
+ * Once A succeeds with cmpxch, B fails again, retires, gets A from
+ * active->fence, locks it as soon as A completes, and possibly
+ * succeeds with cmpxchg.
+ */
+ while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
+ if (prev) {
+ spin_unlock(prev->lock);
+ dma_fence_put(prev);
+ }
+ spin_unlock_irqrestore(fence->lock, flags);
+
+ prev = i915_active_fence_get(active);
+ GEM_BUG_ON(prev == fence);
+
+ spin_lock_irqsave(fence->lock, flags);
+ if (prev)
+ spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+ }
+
+ /*
+ * If prev is NULL then the previous fence must have been signaled
+ * and we know that we are first on the timeline. If it is still
+ * present then, having the lock on that fence already acquired, we
+ * serialise with the interrupt handler, in the process of removing it
+ * from any future interrupt callback. A will then wait on C before
+ * executing (if present).
+ *
+ * As B is second, it sees A as the previous fence and so waits for
+ * it to complete its transition and takes over the occupancy for
+ * itself -- remembering that it needs to wait on A before executing.
+ */
+ if (prev) {
__list_del_entry(&active->cb.node);
spin_unlock(prev->lock); /* serialise with prev->cb_list */
}
@@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_active_fence *active,
int err = 0;
/* Must maintain timeline ordering wrt previous active requests */
- rcu_read_lock();
fence = __i915_active_fence_set(active, &rq->fence);
- if (fence) /* but the previous fence may not belong to that timeline! */
- fence = dma_fence_get_rcu(fence);
- rcu_read_unlock();
if (fence) {
err = i915_request_await_dma_fence(rq, fence);
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 894068bb37b6..833b73edefdb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1661,6 +1661,11 @@ __i915_request_ensure_parallel_ordering(struct i915_request *rq,
request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
+ /*
+ * Users have to put a reference potentially got by
+ * __i915_active_fence_set() to the returned request
+ * when no longer needed
+ */
return to_request(__i915_active_fence_set(&timeline->last_request,
&rq->fence));
}
@@ -1707,6 +1712,10 @@ __i915_request_ensure_ordering(struct i915_request *rq,
0);
}
+ /*
+ * Users have to put the reference to prev potentially got
+ * by __i915_active_fence_set() when no longer needed
+ */
return prev;
}
@@ -1760,6 +1769,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
prev = __i915_request_ensure_ordering(rq, timeline);
else
prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+ if (prev)
+ i915_request_put(prev);
/*
* Make sure that no request gazumped us - if it was allocated after
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 656f9aec07dba7c61d469727494a5d1b18d0bef4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082705-predator-enjoyable-15fb@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 656f9aec07dba7c61d469727494a5d1b18d0bef4 Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Sat, 26 Aug 2023 22:21:57 +0800
Subject: [PATCH] LoongArch: Ensure FP/SIMD registers in the core dump file is
up to date
This is a port of commit 379eb01c21795edb4c ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of FP/SIMD registers in the core dump file come from the
thread.fpu. However, kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, kernel will not have a chance to save the latest
values of FP/SIMD registers. So the values in the core dump file may be
incorrect. To solve this problem, force fpr_get()/simd_get()
to save the FP/SIMD registers into the thread.fpu if the target task
equals the current task.
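Condensed, both regset read paths gain the same prologue (from the hunks
below; bodies elided):

	static int fpr_get(struct task_struct *target, ...)
	{
		/* flush live FP/SIMD state into thread.fpu when target is
		 * the current task; otherwise it was already saved at the
		 * last context switch */
		save_fpu_regs(target);
		/* ... then copy target->thread.fpu out as before ... */
	}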
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index b541f6248837..c2d8962fda00 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -173,16 +173,30 @@ static inline void restore_fp(struct task_struct *tsk)
_restore_fp(&tsk->thread.fpu);
}
-static inline union fpureg *get_fpu_regs(struct task_struct *tsk)
+static inline void save_fpu_regs(struct task_struct *tsk)
{
+ unsigned int euen;
+
if (tsk == current) {
preempt_disable();
- if (is_fpu_owner())
+
+ euen = csr_read32(LOONGARCH_CSR_EUEN);
+
+#ifdef CONFIG_CPU_HAS_LASX
+ if (euen & CSR_EUEN_LASXEN)
+ _save_lasx(&current->thread.fpu);
+ else
+#endif
+#ifdef CONFIG_CPU_HAS_LSX
+ if (euen & CSR_EUEN_LSXEN)
+ _save_lsx(&current->thread.fpu);
+ else
+#endif
+ if (euen & CSR_EUEN_FPEN)
_save_fp(&current->thread.fpu);
+
preempt_enable();
}
-
- return tsk->thread.fpu.fpr;
}
static inline int is_simd_owner(void)
diff --git a/arch/loongarch/kernel/ptrace.c b/arch/loongarch/kernel/ptrace.c
index a0767c3a0f0a..f72adbf530c6 100644
--- a/arch/loongarch/kernel/ptrace.c
+++ b/arch/loongarch/kernel/ptrace.c
@@ -147,6 +147,8 @@ static int fpr_get(struct task_struct *target,
{
int r;
+ save_fpu_regs(target);
+
if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
r = gfpr_get(target, &to);
else
@@ -278,6 +280,8 @@ static int simd_get(struct task_struct *target,
{
const unsigned int wr_size = NUM_FPU_REGS * regset->size;
+ save_fpu_regs(target);
+
if (!tsk_used_math(target)) {
/* The task hasn't used FP or LSX, fill with 0xff */
copy_pad_fprs(target, regset, &to, 0);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a337b64f0d5717248a0c894e2618e658e6a9de9f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080739-trinity-overfed-f523@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a337b64f0d57 ("drm/i915: Fix premature release of request's reusable memory")
506006055769 ("drm/i915/active: Fix misuse of non-idle barriers as fence trackers")
ad5c99e02047 ("drm/i915: Remove unused bits of i915_vma/active api")
f6c466b84cfa ("drm/i915: Add support for moving fence waiting")
544460c33821 ("drm/i915: Multi-BB execbuf")
5851387a422c ("drm/i915/guc: Implement no mid batch preemption for multi-lrc")
e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
d38a9294491d ("drm/i915/guc: Update debugfs for GuC multi-lrc")
bc955204919e ("drm/i915/guc: Insert submit fences between requests in parent-child relationship")
6b540bf6f143 ("drm/i915/guc: Implement multi-lrc submission")
99b47aaddfa9 ("drm/i915/guc: Implement parallel context pin / unpin functions")
c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
3897df4c0187 ("drm/i915/guc: Introduce context parent-child relationship")
4f3059dc2dbb ("drm/i915: Add logical engine mapping")
1a52faed3131 ("drm/i915/guc: Take GT PM ref when deregistering context")
0ea92ace8b95 ("drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct")
0d8ee5ba8db4 ("drm/i915: Don't back up pinned LMEM context images and rings during suspend")
c56ce9565374 ("drm/i915 Implement LMEM backup and restore for suspend / resume")
0d9388635a22 ("drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects")
68c03c0e985e ("drm/i915/debugfs: Do not report currently active engine when describing objects")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a337b64f0d5717248a0c894e2618e658e6a9de9f Mon Sep 17 00:00:00 2001
From: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Date: Thu, 20 Jul 2023 11:35:44 +0200
Subject: [PATCH] drm/i915: Fix premature release of request's reusable memory
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based on ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading to infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_TYPESAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to a
significant (up to 10x) increase in SLAB utilization by the i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janus…
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 8ef93889061a..5ec293011d99 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
}
} while (unlikely(is_barrier(active)));
- if (!__i915_active_fence_set(active, fence))
+ fence = __i915_active_fence_set(active, fence);
+ if (!fence)
__i915_active_acquire(ref);
+ else
+ dma_fence_put(fence);
out:
i915_active_release(ref);
@@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_active *ref,
return NULL;
}
- rcu_read_lock();
prev = __i915_active_fence_set(active, fence);
- if (prev)
- prev = dma_fence_get_rcu(prev);
- else
+ if (!prev)
__i915_active_acquire(ref);
- rcu_read_unlock();
return prev;
}
@@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(struct i915_request *rq)
*
* Records the new @fence as the last active fence along its timeline in
* this active tracker, moving the tracking callbacks from the previous
- * fence onto this one. Returns the previous fence (if not already completed),
- * which the caller must ensure is executed before the new fence. To ensure
- * that the order of fences within the timeline of the i915_active_fence is
- * understood, it should be locked by the caller.
+ * fence onto this one. Gets and returns a reference to the previous fence
+ * (if not already completed), which the caller must put after making sure
+ * that it is executed before the new fence. To ensure that the order of
+ * fences within the timeline of the i915_active_fence is understood, it
+ * should be locked by the caller.
*/
struct dma_fence *
__i915_active_fence_set(struct i915_active_fence *active,
@@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_active_fence *active,
struct dma_fence *prev;
unsigned long flags;
- if (fence == rcu_access_pointer(active->fence))
+ /*
+ * In case of fences embedded in i915_requests, their memory is
+ * SLAB_FAILSAFE_BY_RCU, then it can be reused right after release
+ * by new requests. Then, there is a risk of passing back a pointer
+ * to a new, completely unrelated fence that reuses the same memory
+ * while tracked under a different active tracker. Combined with i915
+ * perf open/close operations that build await dependencies between
+ * engine kernel context requests and user requests from different
+ * timelines, this can lead to dependency loops and infinite waits.
+ *
+ * As a countermeasure, we try to get a reference to the active->fence
+ * first, so if we succeed and pass it back to our user then it is not
+ * released and potentially reused by an unrelated request before the
+ * user has a chance to set up an await dependency on it.
+ */
+ prev = i915_active_fence_get(active);
+ if (fence == prev)
return fence;
GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
@@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_active_fence *active,
* Consider that we have two threads arriving (A and B), with
* C already resident as the active->fence.
*
- * A does the xchg first, and so it sees C or NULL depending
- * on the timing of the interrupt handler. If it is NULL, the
- * previous fence must have been signaled and we know that
- * we are first on the timeline. If it is still present,
- * we acquire the lock on that fence and serialise with the interrupt
- * handler, in the process removing it from any future interrupt
- * callback. A will then wait on C before executing (if present).
- *
- * As B is second, it sees A as the previous fence and so waits for
- * it to complete its transition and takes over the occupancy for
- * itself -- remembering that it needs to wait on A before executing.
+ * Both A and B have got a reference to C or NULL, depending on the
+ * timing of the interrupt handler. Let's assume that if A has got C
+ * then it has locked C first (before B).
*
* Note the strong ordering of the timeline also provides consistent
* nesting rules for the fence->lock; the inner lock is always the
* older lock.
*/
spin_lock_irqsave(fence->lock, flags);
- prev = xchg(__active_fence_slot(active), fence);
- if (prev) {
- GEM_BUG_ON(prev == fence);
+ if (prev)
spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+
+ /*
+ * A does the cmpxchg first, and so it sees C or NULL, as before, or
+ * something else, depending on the timing of other threads and/or
+ * interrupt handler. If not the same as before then A unlocks C if
+ * applicable and retries, starting from an attempt to get a new
+ * active->fence. Meanwhile, B follows the same path as A.
+ * Once A succeeds with cmpxch, B fails again, retires, gets A from
+ * active->fence, locks it as soon as A completes, and possibly
+ * succeeds with cmpxchg.
+ */
+ while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
+ if (prev) {
+ spin_unlock(prev->lock);
+ dma_fence_put(prev);
+ }
+ spin_unlock_irqrestore(fence->lock, flags);
+
+ prev = i915_active_fence_get(active);
+ GEM_BUG_ON(prev == fence);
+
+ spin_lock_irqsave(fence->lock, flags);
+ if (prev)
+ spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+ }
+
+ /*
+ * If prev is NULL then the previous fence must have been signaled
+ * and we know that we are first on the timeline. If it is still
+ * present then, having the lock on that fence already acquired, we
+ * serialise with the interrupt handler, in the process of removing it
+ * from any future interrupt callback. A will then wait on C before
+ * executing (if present).
+ *
+ * As B is second, it sees A as the previous fence and so waits for
+ * it to complete its transition and takes over the occupancy for
+ * itself -- remembering that it needs to wait on A before executing.
+ */
+ if (prev) {
__list_del_entry(&active->cb.node);
spin_unlock(prev->lock); /* serialise with prev->cb_list */
}
@@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_active_fence *active,
int err = 0;
/* Must maintain timeline ordering wrt previous active requests */
- rcu_read_lock();
fence = __i915_active_fence_set(active, &rq->fence);
- if (fence) /* but the previous fence may not belong to that timeline! */
- fence = dma_fence_get_rcu(fence);
- rcu_read_unlock();
if (fence) {
err = i915_request_await_dma_fence(rq, fence);
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 894068bb37b6..833b73edefdb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1661,6 +1661,11 @@ __i915_request_ensure_parallel_ordering(struct i915_request *rq,
request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
+ /*
+ * Users have to put a reference potentially got by
+ * __i915_active_fence_set() to the returned request
+ * when no longer needed
+ */
return to_request(__i915_active_fence_set(&timeline->last_request,
&rq->fence));
}
@@ -1707,6 +1712,10 @@ __i915_request_ensure_ordering(struct i915_request *rq,
0);
}
+ /*
+ * Users have to put the reference to prev potentially got
+ * by __i915_active_fence_set() when no longer needed
+ */
return prev;
}
@@ -1760,6 +1769,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
prev = __i915_request_ensure_ordering(rq, timeline);
else
prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+ if (prev)
+ i915_request_put(prev);
/*
* Make sure that no request gazumped us - if it was allocated after
From: Joel Fernandes <joel(a)joelfernandes.org>
[ Upstream commit d52d3a2bf408ff86f3a79560b5cce80efb340239 ]
During shutdown of rcutorture, the shutdown thread in
rcu_torture_cleanup() calls torture_cleanup_begin() which sets fullstop
to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
readers and fakewriters to breakout of their main while loop and start
shutting down.
Once out of their main loop, they then call torture_kthread_stopping()
which in turn waits for kthread_stop() to be called, however
rcu_torture_cleanup() has not even called kthread_stop() on those
threads yet, it does that a bit later. However, before it gets a chance
to do so, torture_kthread_stopping() calls
schedule_timeout_uninterruptible(1) in a tight loop. Tracing confirmed
this makes the timer softirq constantly execute timer callbacks, never
returning to the softirq exit path; it is essentially "locked up" because
of that. If the softirq preempts the shutdown thread,
kthread_stop() may never be called.
This commit improves the situation dramatically by increasing the timeout
passed to schedule_timeout_uninterruptible() to 1/20th of a second. This
keeps the timer softirq from locking up a CPU, and everything works fine.
Testing has shown 100 runs of TREE07 passing reliably, which was not the
case before because of RCU stalls.
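For scale, a quick userspace sketch of why HZ/20 behaves so differently
from a 1-jiffy timeout (the HZ values are just common configuration
choices):

#include <stdio.h>

int main(void)
{
	const int hz[] = { 100, 250, 1000 };

	for (int i = 0; i < 3; i++)
		/* a 1-jiffy timeout expires on every timer tick, so the
		 * loop re-arms a timer each tick; HZ/20 jiffies is ~50 ms
		 * regardless of HZ */
		printf("HZ=%4d: 1 jiffy = %4.0f ms, HZ/20 = %2d jiffies = %2.0f ms\n",
		       hz[i], 1000.0 / hz[i], hz[i] / 20,
		       hz[i] / 20 * 1000.0 / hz[i]);
	return 0;
}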
Cc: Paul McKenney <paulmck(a)kernel.org>
Cc: Frederic Weisbecker <fweisbec(a)gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
---
kernel/torture.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492f14bd..477d9b601438 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
VERBOSE_TOROUT_STRING(buf);
while (!kthread_should_stop()) {
torture_shutdown_absorb(title);
- schedule_timeout_uninterruptible(1);
+ schedule_timeout_uninterruptible(HZ/20);
}
}
EXPORT_SYMBOL_GPL(torture_kthread_stopping);
--
2.41.0.640.ga95def55d0-goog
These two are backports for 6.1.y. Conflict resolution is done in
both patches.
I have tested LTP-nfs fchown02 and chown02 on 6.1.y with below patches
applied. The tests passed.
I would like to have a review as I am not familiar with this code.
Thanks to Vegard for helping me with this.
Thanks,
Harshit
Christian Brauner (2):
nfs: use vfs setgid helper
nfsd: use vfs setgid helper
fs/attr.c | 1 +
fs/internal.h | 2 --
fs/nfs/inode.c | 4 +---
fs/nfsd/vfs.c | 4 +++-
include/linux/fs.h | 2 ++
5 files changed, 7 insertions(+), 6 deletions(-)
--
2.34.1