The quilt patch titled
Subject: maple_tree: fix mas_skip_node() end slot detection
has been removed from the -mm tree. Its filename was
maple_tree-fix-mas_skip_node-end-slot-detection.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: fix mas_skip_node() end slot detection
Date: Tue, 7 Mar 2023 13:02:46 -0500
Patch series "Fix mas_skip_node() for mas_empty_area()", v2.
mas_empty_area() was incorrectly returning an error when there was room.
The issue was tracked down to mas_skip_node() using the incorrect
end-of-slot count. Instead of the node's hard limit, the limit of the
data should be used.
mas_skip_node() was also setting the min and max to those of the child
node, which was unnecessary. While these limits were being set, there was
also a bug that corrupted the maple state's max if the offset was set to
the maximum node pivot. The bug had no consequence unless there was a
sufficient gap in the next child node, in which case an error would be
returned.
This patch set fixes these errors by removing the limit setting from
mas_skip_node(), using mas_data_end() for the slot limits, and adding
tests for all of the failures discovered.
This patch (of 2):
mas_skip_node() is used to move the maple state to a node with a higher
limit. It does this by walking up the tree and increasing the slot count.
Since the slot count cannot always be increased, it may need to walk up
multiple levels to find room to walk right to a node with a higher limit.
The slot limit that was being used was the node's hard limit and not the
last location of data in the node. This could shift the maple state
outside the actual data and into an error state, thus returning -EBUSY.
As a result of this incorrect error state, mas_awalk() would return an
error instead of finding the allocation space.
The fix is to use mas_data_end() in mas_skip_node() to detect the node's
data end point, and to continue walking up the tree until it is safe to
move to a node with a higher limit.
The walk up the tree also sets the maple state limits, so remove the buggy
limit-setting code from mas_skip_node(). Setting the limits had the
unfortunate side effect of triggering another bug if the parent node was
full and there was no suitable gap in the second-to-last child, but room
in the next child.
mas_skip_node() may also be passed a maple state in an error state from
mas_anode_descend() when no allocations are available. Return on such an
error state immediately.
Link: https://lkml.kernel.org/r/20230307180247.2220303-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230307180247.2220303-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Snild Dolkow <snild(a)sony.com>
Link: https://lore.kernel.org/linux-mm/cb8dc31a-fef2-1d09-f133-e9f7b9f9e77a@sony.…
Tested-by: Snild Dolkow <snild(a)sony.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
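As an illustration only (not part of the patch), below is a minimal sketch
of the kind of allocation-range gap search that could hit the bogus -EBUSY
described above. mt_init_flags(), mtree_store_range(), MA_STATE(),
mas_empty_area() and mtree_destroy() are the existing maple tree API; the
helper name, ranges and stored values are made up, and a real reproducer
needs enough entries for the tree to span multiple nodes/levels.

#include <linux/maple_tree.h>
#include <linux/xarray.h>

/* Illustrative only: search for a free gap in an allocation-range tree. */
static int example_gap_search(void)
{
	struct maple_tree mt;
	MA_STATE(mas, &mt, 0, 0);
	unsigned long i;
	int err;

	mt_init_flags(&mt, MT_FLAGS_ALLOC_RANGE);

	/* Store enough ranges that the tree grows past a single node, so
	 * the gap search has to cross node boundaries via mas_skip_node(). */
	for (i = 0; i < 128; i++)
		mtree_store_range(&mt, i * 0x10, i * 0x10 + 0xf,
				  xa_mk_value(i), GFP_KERNEL);

	mas_lock(&mas);
	/* Look for a 0x1000-sized gap between 0x3000 and 0x10000. */
	err = mas_empty_area(&mas, 0x3000, 0x10000, 0x1000);
	mas_unlock(&mas);

	mtree_destroy(&mt);
	/* Before the fix, a search of this shape could return -EBUSY even
	 * though room existed; on success mas.index/mas.last describe the
	 * gap that was found. */
	return err;
}
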
--- a/lib/maple_tree.c~maple_tree-fix-mas_skip_node-end-slot-detection
+++ a/lib/maple_tree.c
@@ -5099,35 +5099,21 @@ static inline bool mas_rewind_node(struc
*/
static inline bool mas_skip_node(struct ma_state *mas)
{
- unsigned char slot, slot_count;
- unsigned long *pivots;
- enum maple_type mt;
+ if (mas_is_err(mas))
+ return false;
- mt = mte_node_type(mas->node);
- slot_count = mt_slots[mt] - 1;
do {
if (mte_is_root(mas->node)) {
- slot = mas->offset;
- if (slot > slot_count) {
+ if (mas->offset >= mas_data_end(mas)) {
mas_set_err(mas, -EBUSY);
return false;
}
} else {
mas_ascend(mas);
- slot = mas->offset;
- mt = mte_node_type(mas->node);
- slot_count = mt_slots[mt] - 1;
}
- } while (slot > slot_count);
-
- mas->offset = ++slot;
- pivots = ma_pivots(mas_mn(mas), mt);
- if (slot > 0)
- mas->min = pivots[slot - 1] + 1;
-
- if (slot <= slot_count)
- mas->max = pivots[slot];
+ } while (mas->offset >= mas_data_end(mas));
+ mas->offset++;
return true;
}
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-be-more-cautious-about-dead-nodes.patch
maple_tree-detect-dead-nodes-in-mas_start.patch
maple_tree-fix-freeing-of-nodes-in-rcu-mode.patch
maple_tree-remove-extra-smp_wmb-from-mas_dead_leaves.patch
maple_tree-fix-write-memory-barrier-of-nodes-once-dead-for-rcu-mode.patch
maple_tree-add-smp_rmb-to-dead-node-detection.patch
maple_tree-add-rcu-lock-checking-to-rcu-callback-functions.patch
mm-enable-maple-tree-rcu-mode-by-default.patch
On Tue, Mar 21, 2023 at 10:25 PM Guenter Roeck <linux(a)roeck-us.net> wrote:
>
> On Tue, Mar 21, 2023 at 08:35:34PM +0800, Xi Ruoyao wrote:
> > On Tue, 2023-03-21 at 14:29 +0800, Tiezhu Yang wrote:
> > > We can see the following messages with CONFIG_PROVE_LOCKING=y on
> > > LoongArch:
> > >
> > > BUG: MAX_STACK_TRACE_ENTRIES too low!
> > > turning off the locking correctness validator.
> > >
> > > This is because stack_trace_save() returns a big value after calling
> > > arch_stack_walk(); here is the call trace:
> > >
> > > save_trace()
> > > stack_trace_save()
> > > arch_stack_walk()
> > > stack_trace_consume_entry()
> > >
> > > arch_stack_walk() should return immediately if unwind_next_frame()
> > > fails, with no need for the useless loops that only increase the value
> > > of c->len in stack_trace_consume_entry(); that fixes the above
> > > problem.
> > >
> > > Reported-by: Guenter Roeck <linux(a)roeck-us.net>
> > > Link: https://lore.kernel.org/all/8a44ad71-68d2-4926-892f-72bfc7a67e2a@roeck-us.n…
> > > Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
> >
> > The fix makes sense, but I'm asking the same question again (sorry if
> > it's noisy): should we Cc stable(a)vger.kernel.org and/or make a PR for
> > 6.3?
> >
> > To me, bug fixes should be backported to all stable branches affected
> > by the bug, unless there is some serious difficulty. As the 6.3 release
> > will work on launched 3A5000 boards out of the box, people may want to
> > stop staying on the leading edge and use an LTS/stable release series.
> > We can't just say (or behave as if) "we don't backport, please use the
> > latest mainline" IMO :).
>
> It is a bug fix, isn't it? It should be backported to v6.1+. Otherwise,
> if your policy is to not backport bug fixes, I might as well stop testing
> loongarch on all but the most recent kernel branch. Let me know if this is
> what you want. If so, I think you should let all other regression testers
> know that they should only test loongarch on mainline and possibly on
> linux-next.
This is of course a bug fix, but should Tiezhu resend this patch? Or is
just replying to this message with a Cc to stable(a)vger.kernel.org
enough?
Huacai
>
> Thanks,
> Guenter
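
The LoongArch patch itself is not quoted in this thread. For illustration
only, the shape of the fix described above -- have arch_stack_walk() stop
as soon as the unwinder reports an error, rather than keep feeding entries
to the consume callback -- might look like the sketch below. The
unwind_*() helpers are assumed to follow the usual x86-style unwinder
interface, and the dummy pt_regs setup the real function performs is
omitted; this is not the actual patch.

#include <linux/stacktrace.h>
#include <asm/unwind.h>

/* Sketch only: stop walking once the unwinder errors out, so bogus frames
 * never reach consume_entry() and the saved trace (and lockdep's
 * MAX_STACK_TRACE_ENTRIES budget) stays bounded. */
void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
		     struct task_struct *task, struct pt_regs *regs)
{
	unsigned long addr;
	struct unwind_state state;

	/* (the real code fills in dummy regs here when regs == NULL) */
	for (unwind_start(&state, task, regs);
	     !unwind_done(&state) && !unwind_error(&state);
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr || !consume_entry(cookie, addr))
			break;
	}
}
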
From: Dean Luick <dean.luick(a)cornelisnetworks.com>
[ Upstream commit 892ede5a77f337831609fb9c248ac60948061894 ]
Fix possible RMT overflow: Use the correct netdev size.
Don't allow adjusted user contexts to go negative.
Fix QOS calculation: Send the kernel context count as an argument since
dd->n_krcv_queues is not yet set up at the earliest call. Do not include
the control context in the QOS calculation. Use the same-sized variable
to find the max of the krcvq[] entries.
Update the RMT count explanation to make more sense.
Signed-off-by: Dean Luick <dean.luick(a)cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Link: https://lore.kernel.org/r/167329106946.1472990.18385495251650939054.stgit@a…
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/infiniband/hw/hfi1/chip.c | 59 +++++++++++++++++--------------
1 file changed, 32 insertions(+), 27 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index ebe970f76232d..90b672feed83d 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -1056,7 +1056,7 @@ static void read_link_down_reason(struct hfi1_devdata *dd, u8 *ldr);
static void handle_temp_err(struct hfi1_devdata *dd);
static void dc_shutdown(struct hfi1_devdata *dd);
static void dc_start(struct hfi1_devdata *dd);
-static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
+static int qos_rmt_entries(unsigned int n_krcv_queues, unsigned int *mp,
unsigned int *np);
static void clear_full_mgmt_pkey(struct hfi1_pportdata *ppd);
static int wait_link_transfer_active(struct hfi1_devdata *dd, int wait_ms);
@@ -13362,7 +13362,6 @@ static int set_up_context_variables(struct hfi1_devdata *dd)
int ret;
unsigned ngroups;
int rmt_count;
- int user_rmt_reduced;
u32 n_usr_ctxts;
u32 send_contexts = chip_send_contexts(dd);
u32 rcv_contexts = chip_rcv_contexts(dd);
@@ -13421,28 +13420,34 @@ static int set_up_context_variables(struct hfi1_devdata *dd)
(num_kernel_contexts + n_usr_ctxts),
&node_affinity.real_cpu_mask);
/*
- * The RMT entries are currently allocated as shown below:
- * 1. QOS (0 to 128 entries);
- * 2. FECN (num_kernel_context - 1 + num_user_contexts +
- * num_netdev_contexts);
- * 3. netdev (num_netdev_contexts).
- * It should be noted that FECN oversubscribe num_netdev_contexts
- * entries of RMT because both netdev and PSM could allocate any receive
- * context between dd->first_dyn_alloc_text and dd->num_rcv_contexts,
- * and PSM FECN must reserve an RMT entry for each possible PSM receive
- * context.
+ * RMT entries are allocated as follows:
+ * 1. QOS (0 to 128 entries)
+ * 2. FECN (num_kernel_context - 1 [a] + num_user_contexts +
+ * num_netdev_contexts [b])
+ * 3. netdev (NUM_NETDEV_MAP_ENTRIES)
+ *
+ * Notes:
+ * [a] Kernel contexts (except control) are included in FECN if kernel
+ * TID_RDMA is active.
+ * [b] Netdev and user contexts are randomly allocated from the same
+ * context pool, so FECN must cover all contexts in the pool.
*/
- rmt_count = qos_rmt_entries(dd, NULL, NULL) + (num_netdev_contexts * 2);
- if (HFI1_CAP_IS_KSET(TID_RDMA))
- rmt_count += num_kernel_contexts - 1;
- if (rmt_count + n_usr_ctxts > NUM_MAP_ENTRIES) {
- user_rmt_reduced = NUM_MAP_ENTRIES - rmt_count;
- dd_dev_err(dd,
- "RMT size is reducing the number of user receive contexts from %u to %d\n",
- n_usr_ctxts,
- user_rmt_reduced);
- /* recalculate */
- n_usr_ctxts = user_rmt_reduced;
+ rmt_count = qos_rmt_entries(num_kernel_contexts - 1, NULL, NULL)
+ + (HFI1_CAP_IS_KSET(TID_RDMA) ? num_kernel_contexts - 1
+ : 0)
+ + n_usr_ctxts
+ + num_netdev_contexts
+ + NUM_NETDEV_MAP_ENTRIES;
+ if (rmt_count > NUM_MAP_ENTRIES) {
+ int over = rmt_count - NUM_MAP_ENTRIES;
+ /* try to squish user contexts, minimum of 1 */
+ if (over >= n_usr_ctxts) {
+ dd_dev_err(dd, "RMT overflow: reduce the requested number of contexts\n");
+ return -EINVAL;
+ }
+ dd_dev_err(dd, "RMT overflow: reducing # user contexts from %u to %u\n",
+ n_usr_ctxts, n_usr_ctxts - over);
+ n_usr_ctxts -= over;
}
/* the first N are kernel contexts, the rest are user/netdev contexts */
@@ -14299,15 +14304,15 @@ static void clear_rsm_rule(struct hfi1_devdata *dd, u8 rule_index)
}
/* return the number of RSM map table entries that will be used for QOS */
-static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
+static int qos_rmt_entries(unsigned int n_krcv_queues, unsigned int *mp,
unsigned int *np)
{
int i;
unsigned int m, n;
- u8 max_by_vl = 0;
+ uint max_by_vl = 0;
/* is QOS active at all? */
- if (dd->n_krcv_queues <= MIN_KERNEL_KCTXTS ||
+ if (n_krcv_queues < MIN_KERNEL_KCTXTS ||
num_vls == 1 ||
krcvqsset <= 1)
goto no_qos;
@@ -14365,7 +14370,7 @@ static void init_qos(struct hfi1_devdata *dd, struct rsm_map_table *rmt)
if (!rmt)
goto bail;
- rmt_entries = qos_rmt_entries(dd, &m, &n);
+ rmt_entries = qos_rmt_entries(dd->n_krcv_queues - 1, &m, &n);
if (rmt_entries == 0)
goto bail;
qpns_per_vl = 1 << m;
--
2.39.2