smp_call_function_single disables IRQs when executing the callback. To prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere. This is already done by qman_update_cgr and qman_delete_cgr; fix the other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()") CC: stable@vger.kernel.org Signed-off-by: Sean Anderson sean.anderson@seco.com Reviewed-by: Camelia Groza camelia.groza@nxp.com Tested-by: Vladimir Oltean vladimir.oltean@nxp.com --- I got no response the first time I sent this, so I am resending to net. This issue was introduced in a series which went through net, so I hope it makes sense to take it via net.
[1] https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ander...
(no changes since v3)
Changes in v3: - Change blamed commit to something more appropriate
Changes in v2: - Fix one additional call to spin_unlock
drivers/soc/fsl/qbman/qman.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c index 739e4eee6b75..1bf1f1ea67f0 100644 --- a/drivers/soc/fsl/qbman/qman.c +++ b/drivers/soc/fsl/qbman/qman.c @@ -1456,11 +1456,11 @@ static void qm_congestion_task(struct work_struct *work) union qm_mc_result *mcr; struct qman_cgr *cgr;
- spin_lock(&p->cgr_lock); + spin_lock_irq(&p->cgr_lock); qm_mc_start(&p->p); qm_mc_commit(&p->p, QM_MCC_VERB_QUERYCONGESTION); if (!qm_mc_result_timeout(&p->p, &mcr)) { - spin_unlock(&p->cgr_lock); + spin_unlock_irq(&p->cgr_lock); dev_crit(p->config->dev, "QUERYCONGESTION timeout\n"); qman_p_irqsource_add(p, QM_PIRQ_CSCI); return; @@ -1476,7 +1476,7 @@ static void qm_congestion_task(struct work_struct *work) list_for_each_entry(cgr, &p->cgr_cbs, node) if (cgr->cb && qman_cgrs_get(&c, cgr->cgrid)) cgr->cb(p, cgr, qman_cgrs_get(&rr, cgr->cgrid)); - spin_unlock(&p->cgr_lock); + spin_unlock_irq(&p->cgr_lock); qman_p_irqsource_add(p, QM_PIRQ_CSCI); }
@@ -2440,7 +2440,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags, preempt_enable();
cgr->chan = p->config->channel; - spin_lock(&p->cgr_lock); + spin_lock_irq(&p->cgr_lock);
if (opts) { struct qm_mcc_initcgr local_opts = *opts; @@ -2477,7 +2477,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags, qman_cgrs_get(&p->cgrs[1], cgr->cgrid)) cgr->cb(p, cgr, 1); out: - spin_unlock(&p->cgr_lock); + spin_unlock_irq(&p->cgr_lock); put_affine_portal(); return ret; } -- 2.35.1.1320.gc452695387.dirty
[Embedded World 2024, SECO SpA]https://www.messe-ticket.de/Nuernberg/embeddedworld2024/Register/ew24517689
cgr_lock may be locked with interrupts already disabled by smp_call_function_single. As such, we must use a raw spinlock to avoid problems on PREEMPT_RT kernels. Although this bug has existed for a while, it was not apparent until commit ef2a8d5478b9 ("net: dpaa: Adjust queue depth on rate change") which invokes smp_call_function_single via qman_update_cgr_safe every time a link goes up or down.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()") CC: stable@vger.kernel.org Reported-by: Vladimir Oltean vladimir.oltean@nxp.com Closes: https://lore.kernel.org/all/20230323153935.nofnjucqjqnz34ej@skbuf/ Reported-by: Steffen Trumtrar s.trumtrar@pengutronix.de Closes: https://lore.kernel.org/linux-arm-kernel/87wmsyvclu.fsf@pengutronix.de/ Signed-off-by: Sean Anderson sean.anderson@seco.com Reviewed-by: Camelia Groza camelia.groza@nxp.com Tested-by: Vladimir Oltean vladimir.oltean@nxp.com
---
Changes in v4: - Add a note about how raw spinlocks aren't quite right
Changes in v3: - Change blamed commit to something more appropriate
drivers/soc/fsl/qbman/qman.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c index 1bf1f1ea67f0..7e9074519ad2 100644 --- a/drivers/soc/fsl/qbman/qman.c +++ b/drivers/soc/fsl/qbman/qman.c @@ -991,7 +991,7 @@ struct qman_portal { /* linked-list of CSCN handlers. */ struct list_head cgr_cbs; /* list lock */ - spinlock_t cgr_lock; + raw_spinlock_t cgr_lock; struct work_struct congestion_work; struct work_struct mr_work; char irqname[MAX_IRQNAME]; @@ -1281,7 +1281,7 @@ static int qman_create_portal(struct qman_portal *portal, /* if the given mask is NULL, assume all CGRs can be seen */ qman_cgrs_fill(&portal->cgrs[0]); INIT_LIST_HEAD(&portal->cgr_cbs); - spin_lock_init(&portal->cgr_lock); + raw_spin_lock_init(&portal->cgr_lock); INIT_WORK(&portal->congestion_work, qm_congestion_task); INIT_WORK(&portal->mr_work, qm_mr_process_task); portal->bits = 0; @@ -1456,11 +1456,14 @@ static void qm_congestion_task(struct work_struct *work) union qm_mc_result *mcr; struct qman_cgr *cgr;
- spin_lock_irq(&p->cgr_lock); + /* + * FIXME: QM_MCR_TIMEOUT is 10ms, which is too long for a raw spinlock! + */ + raw_spin_lock_irq(&p->cgr_lock); qm_mc_start(&p->p); qm_mc_commit(&p->p, QM_MCC_VERB_QUERYCONGESTION); if (!qm_mc_result_timeout(&p->p, &mcr)) { - spin_unlock_irq(&p->cgr_lock); + raw_spin_unlock_irq(&p->cgr_lock); dev_crit(p->config->dev, "QUERYCONGESTION timeout\n"); qman_p_irqsource_add(p, QM_PIRQ_CSCI); return; @@ -1476,7 +1479,7 @@ static void qm_congestion_task(struct work_struct *work) list_for_each_entry(cgr, &p->cgr_cbs, node) if (cgr->cb && qman_cgrs_get(&c, cgr->cgrid)) cgr->cb(p, cgr, qman_cgrs_get(&rr, cgr->cgrid)); - spin_unlock_irq(&p->cgr_lock); + raw_spin_unlock_irq(&p->cgr_lock); qman_p_irqsource_add(p, QM_PIRQ_CSCI); }
@@ -2440,7 +2443,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags, preempt_enable();
cgr->chan = p->config->channel; - spin_lock_irq(&p->cgr_lock); + raw_spin_lock_irq(&p->cgr_lock);
if (opts) { struct qm_mcc_initcgr local_opts = *opts; @@ -2477,7 +2480,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags, qman_cgrs_get(&p->cgrs[1], cgr->cgrid)) cgr->cb(p, cgr, 1); out: - spin_unlock_irq(&p->cgr_lock); + raw_spin_unlock_irq(&p->cgr_lock); put_affine_portal(); return ret; } @@ -2512,7 +2515,7 @@ int qman_delete_cgr(struct qman_cgr *cgr) return -EINVAL;
memset(&local_opts, 0, sizeof(struct qm_mcc_initcgr)); - spin_lock_irqsave(&p->cgr_lock, irqflags); + raw_spin_lock_irqsave(&p->cgr_lock, irqflags); list_del(&cgr->node); /* * If there are no other CGR objects for this CGRID in the list, @@ -2537,7 +2540,7 @@ int qman_delete_cgr(struct qman_cgr *cgr) /* add back to the list */ list_add(&cgr->node, &p->cgr_cbs); release_lock: - spin_unlock_irqrestore(&p->cgr_lock, irqflags); + raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags); put_affine_portal(); return ret; } @@ -2577,9 +2580,9 @@ static int qman_update_cgr(struct qman_cgr *cgr, struct qm_mcc_initcgr *opts) if (!p) return -EINVAL;
- spin_lock_irqsave(&p->cgr_lock, irqflags); + raw_spin_lock_irqsave(&p->cgr_lock, irqflags); ret = qm_modify_cgr(cgr, 0, opts); - spin_unlock_irqrestore(&p->cgr_lock, irqflags); + raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags); put_affine_portal(); return ret; } -- 2.35.1.1320.gc452695387.dirty
[Embedded World 2024, SECO SpA]https://www.messe-ticket.de/Nuernberg/embeddedworld2024/Register/ew24517689
Hi Sean,
On Thu, Feb 15, 2024 at 11:23:26AM -0500, Sean Anderson wrote:
smp_call_function_single disables IRQs when executing the callback. To prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere. This is already done by qman_update_cgr and qman_delete_cgr; fix the other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()") CC: stable@vger.kernel.org Signed-off-by: Sean Anderson sean.anderson@seco.com Reviewed-by: Camelia Groza camelia.groza@nxp.com Tested-by: Vladimir Oltean vladimir.oltean@nxp.com
I got no response the first time I sent this, so I am resending to net. This issue was introduced in a series which went through net, so I hope it makes sense to take it via net.
[1] https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ander...
(no changes since v3)
Changes in v3:
- Change blamed commit to something more appropriate
Changes in v2:
- Fix one additional call to spin_unlock
Leo Li (Li Yang) is no longer with NXP. Until we figure out within NXP how to continue with the maintainership of drivers/soc/fsl/, yes, please continue to submit this series to 'net'. I would also like to point out to Arnd that this is the case.
Arnd, a large portion of drivers/soc/fsl/ is networking-related (dpio, qbman). Would it make sense to transfer the maintainership of these under the respective networking drivers, to simplify the procedures?
Also, your patches are whitespace-damaged. They do not apply to the kernel, and patchwork shows this as well. https://patchwork.kernel.org/project/netdevbpf/patch/20240215162327.3663092-...
Please repost with this fixed.
On 2/19/24 10:30, Vladimir Oltean wrote:
Hi Sean,
On Thu, Feb 15, 2024 at 11:23:26AM -0500, Sean Anderson wrote:
smp_call_function_single disables IRQs when executing the callback. To prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere. This is already done by qman_update_cgr and qman_delete_cgr; fix the other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()") CC: stable@vger.kernel.org Signed-off-by: Sean Anderson sean.anderson@seco.com Reviewed-by: Camelia Groza camelia.groza@nxp.com Tested-by: Vladimir Oltean vladimir.oltean@nxp.com
I got no response the first time I sent this, so I am resending to net. This issue was introduced in a series which went through net, so I hope it makes sense to take it via net.
[1] https://cas5-0-urlprotect.trendmicro.com:443/wis/clicktime/v1/query?url=http...
(no changes since v3)
Changes in v3:
- Change blamed commit to something more appropriate
Changes in v2:
- Fix one additional call to spin_unlock
Leo Li (Li Yang) is no longer with NXP. Until we figure out within NXP how to continue with the maintainership of drivers/soc/fsl/, yes, please continue to submit this series to 'net'. I would also like to point out to Arnd that this is the case.
Arnd, a large portion of drivers/soc/fsl/ is networking-related (dpio, qbman). Would it make sense to transfer the maintainership of these under the respective networking drivers, to simplify the procedures?
Also, your patches are whitespace-damaged. They do not apply to the kernel, and patchwork shows this as well. https://cas5-0-urlprotect.trendmicro.com:443/wis/clicktime/v1/query?url=http...
Please repost with this fixed.
Hm, I used the same method I have in the past (git send-email). But I guess something is converting my tabs to spaces? Maybe it is related to the embedded world advertisement...
Maybe the solution is to get a kernel.org email...
--Sean
[Embedded World 2024, SECO SpA]https://www.messe-ticket.de/Nuernberg/embeddedworld2024/Register/ew24517689
Hi Vladimir,
Le 19/02/2024 à 16:30, Vladimir Oltean a écrit :
Hi Sean,
On Thu, Feb 15, 2024 at 11:23:26AM -0500, Sean Anderson wrote:
smp_call_function_single disables IRQs when executing the callback. To prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere. This is already done by qman_update_cgr and qman_delete_cgr; fix the other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()") CC: stable@vger.kernel.org Signed-off-by: Sean Anderson sean.anderson@seco.com Reviewed-by: Camelia Groza camelia.groza@nxp.com Tested-by: Vladimir Oltean vladimir.oltean@nxp.com
I got no response the first time I sent this, so I am resending to net. This issue was introduced in a series which went through net, so I hope it makes sense to take it via net.
[1] https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ander...
(no changes since v3)
Changes in v3:
- Change blamed commit to something more appropriate
Changes in v2:
- Fix one additional call to spin_unlock
Leo Li (Li Yang) is no longer with NXP. Until we figure out within NXP how to continue with the maintainership of drivers/soc/fsl/, yes, please continue to submit this series to 'net'. I would also like to point out to Arnd that this is the case.
Arnd, a large portion of drivers/soc/fsl/ is networking-related (dpio, qbman). Would it make sense to transfer the maintainership of these under the respective networking drivers, to simplify the procedures?
I see FREESCALE QUICC ENGINE LIBRARY (drivers/soc/fsl/qe/) is maintained by Qiang Zhao qiang.zhao@nxp.com but I can't find any mail from him in the past 4 years in linuxppc-dev list, and everytime I wanted to submit something I only got responses from Leo Ly.
The last commit he reviewed is 661ea25e5319 ("soc: fsl: qe: Replace one-element array and use struct_size() helper"), it was in May 2020.
Is he still working at NXP and actively maintaining that library ? Keeping this part maintained is vital for me as this SOC is embedded in the two powerpc platform I maintain (8xx and 83xx).
If Qiang Zhao is not able to activaly maintain that SOC anymore, I volonteer to maintain it.
Thanks Christophe
On Wed, Apr 10, 2024, at 06:54, Christophe Leroy wrote:
Le 19/02/2024 à 16:30, Vladimir Oltean a écrit :
On Thu, Feb 15, 2024 at 11:23:26AM -0500, Sean Anderson wrote:
smp_call_function_single disables IRQs when executing the callback. To prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere. This is already done by qman_update_cgr and qman_delete_cgr; fix the other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()") CC: stable@vger.kernel.org Signed-off-by: Sean Anderson sean.anderson@seco.com Reviewed-by: Camelia Groza camelia.groza@nxp.com Tested-by: Vladimir Oltean vladimir.oltean@nxp.com
I got no response the first time I sent this, so I am resending to net. This issue was introduced in a series which went through net, so I hope it makes sense to take it via net.
[1] https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ander...
(no changes since v3)
Changes in v3:
- Change blamed commit to something more appropriate
Changes in v2:
- Fix one additional call to spin_unlock
Leo Li (Li Yang) is no longer with NXP. Until we figure out within NXP how to continue with the maintainership of drivers/soc/fsl/, yes, please continue to submit this series to 'net'. I would also like to point out to Arnd that this is the case.
Arnd, a large portion of drivers/soc/fsl/ is networking-related (dpio, qbman). Would it make sense to transfer the maintainership of these under the respective networking drivers, to simplify the procedures?
If there are parts that are only used by networking, I'm definitely fine with moving those out of drivers/soc into the respective users, but as far as I can tell, all the code there is shared by multiple subsystems (crypto, dma, usb, ...), so that would likely require at least a reorganization.
I see FREESCALE QUICC ENGINE LIBRARY (drivers/soc/fsl/qe/) is maintained by Qiang Zhao qiang.zhao@nxp.com but I can't find any mail from him in the past 4 years in linuxppc-dev list, and everytime I wanted to submit something I only got responses from Leo Ly.
The last commit he reviewed is 661ea25e5319 ("soc: fsl: qe: Replace one-element array and use struct_size() helper"), it was in May 2020.
Is he still working at NXP and actively maintaining that library ? Keeping this part maintained is vital for me as this SOC is embedded in the two powerpc platform I maintain (8xx and 83xx).
If Qiang Zhao is not able to activaly maintain that SOC anymore, I volonteer to maintain it.
Thanks, much appreciated. The QE driver is also used on arm64/ls1043a, but I have not seen any email or pull requests from Qiang Zhao for that driver either.
The previous setup was that Li Yang picked up patches for anything under drivers/soc/fsl/ and forwarded them to soc@kernel.org for me to pick up.
I would very much like to get back to the state of having one or two maintainers for all of drivers/soc/fsl/ and not have to worry about individual drivers under it when they are all maintained by different people.
Shawn Guo is already maintaining the arm64 side of Layerscape in addition to the i.MX code. Herve Codina in turn has taken responsibility for qe/qmc.c and qe/tsa.c.
Maybe you can pick one more more maintainers for drivers/soc/fsl/ between the three of you to collect patches into a git branch and send pull requests to soc@kernel.org?
Arnd
linux-stable-mirror@lists.linaro.org