arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main
powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an
error when building with clang after a recent change in main:
  error: option '-msoft-float' cannot be specified with '-maltivec'
  make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error 1
Explicitly add '-mhard-float' before '-maltivec' in xor_vmx.o's CFLAGS
to override the previous inclusion of '-msoft-float' (as the last option
wins), which matches how other areas of the kernel use '-maltivec', such
as AMDGPU.
Cc: stable(a)vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1986
Link: https://github.com/llvm/llvm-project/commit/4792f912b232141ecba4cbae538873b…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
arch/powerpc/lib/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 6eac63e79a89..0ab65eeb93ee 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -76,7 +76,7 @@ obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
-CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
# Enable <altivec.h>
CFLAGS_xor_vmx.o += -isystem $(shell $(CC) -print-file-name=include)
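Review bot: the changed CFLAGS_xor_vmx.o line gives no hint of why -mhard-float is there, and its effect depends on it coming after the -msoft-float that the main powerpc Makefile adds. A short comment above the line, saying that it overrides the global -msoft-float because clang rejects -msoft-float together with -maltivec, would keep a later cleanup from dropping it. Note also that -mhard-float is passed unconditionally while -mabi=altivec on the same line goes through cc-option; if any supported compiler could reject it, it should be wrapped the same way.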
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240127-ppc-xor_vmx-drop-msoft-float-ad68b437f86c
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Without the device table the driver will not auto-load when compiled as
a module.
Fixes: 273db8f03509 ("Input: add IOC3 serio driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Karel Balej <balejk(a)matfyz.cz>
---
I do not own any device using this driver, but I verified that modinfo
does not show the "platform:ioc3-kbd" alias when run on the module
compiled without this patch and it does with it.
drivers/input/serio/ioc3kbd.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/input/serio/ioc3kbd.c b/drivers/input/serio/ioc3kbd.c
index 50552dc7b4f5..676b0bda3d72 100644
--- a/drivers/input/serio/ioc3kbd.c
+++ b/drivers/input/serio/ioc3kbd.c
@@ -200,9 +200,16 @@ static void ioc3kbd_remove(struct platform_device *pdev)
serio_unregister_port(d->aux);
}
+static const struct platform_device_id ioc3kbd_id_table[] = {
+ { "ioc3-kbd", },
+ { }
+};
+MODULE_DEVICE_TABLE(platform, ioc3kbd_id_table);
+
static struct platform_driver ioc3kbd_driver = {
.probe = ioc3kbd_probe,
.remove_new = ioc3kbd_remove,
+ .id_table = ioc3kbd_id_table,
.driver = {
.name = "ioc3-kbd",
},
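Review bot: the string "ioc3-kbd" now appears twice, in ioc3kbd_id_table[] and in .driver.name, and the generated "platform:ioc3-kbd" alias is only useful while both match the name the device is registered under. Consider a single #define for the name used in both places, so a future rename cannot update one and silently break auto-loading again.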
--
2.44.0
From: Zi Yan <ziy(a)nvidia.com>
The tail pages in a THP can have swap entry information stored in their
private field. When migrating to a new page, all tail pages of the new
page need to update ->private to avoid future data corruption.
This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory:
work on folio->swap instead of page->private when splitting folio"),
subpages of a swapcached THP no longer require the maintenance.
Adding THPs to the swapcache was introduced in commit
38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out"),
where each subpage of a THP added to the swapcache had its own swapcache
entry and required the ->private field to point to the correct swapcache
entry. Later, when THP migration functionality was implemented in commit
616b8371539a6 ("mm: thp: enable thp migration in generic path"),
it initially did not handle the subpages of swapcached THPs, failing to
update their ->private fields or replace the subpage pointers in the
swapcache. Subsequently, commit e71769ae5260 ("mm: enable thp migration
for shmem thp") addressed the swapcache update aspect. This patch fixes
the update of subpage ->private fields.
Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_cha…
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
---
mm/migrate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index c93dd6a31c31..c5968021fde0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -423,8 +423,12 @@ int folio_migrate_mapping(struct address_space *mapping,
if (folio_test_swapbacked(folio)) {
__folio_set_swapbacked(newfolio);
if (folio_test_swapcache(folio)) {
+ int i;
+
folio_set_swapcache(newfolio);
- newfolio->private = folio_get_private(folio);
+ for (i = 0; i < nr; i++)
+ set_page_private(folio_page(newfolio, i),
+ page_private(folio_page(folio, i)));
}
entries = nr;
} else {
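Review bot: the new loop in the folio_test_swapcache() branch has no comment, so the reason each page of newfolio needs its own ->private (tail pages of a swapcached THP carry their own swap entry) lives only in the commit message; a one-line comment above the for loop would help whoever backports or reads this stable-only code. Also check that the type of nr, declared outside this hunk, is compatible with the new "int i" counter, and that the loop's i = 0 iteration fully replaces the removed "newfolio->private = folio_get_private(folio);" assignment for the head page.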
--
2.43.0
In f2fs_do_write_data_page, FI_ATOMIC_FILE flag selects the target inode
between the original inode and COW inode. When aborting atomic write and
writeback occur simultaneously, invalid data can be written to original
inode if the FI_ATOMIC_FILE flag is cleared meanwhile.
To prevent the problem, let's truncate all pages before clearing the flag
Atomic write thread                      Writeback thread
  f2fs_abort_atomic_write
    clear_inode_flag(inode, FI_ATOMIC_FILE)
                                         __writeback_single_inode
                                           do_writepages
                                             f2fs_do_write_data_page
                                               - use dn of original inode
    truncate_inode_pages_final
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/segment.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 7901ede58113..7e47b8054413 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -192,6 +192,9 @@ void f2fs_abort_atomic_write(struct inode *inode, bool clean)
if (!f2fs_is_atomic_file(inode))
return;
+ if (clean)
+ truncate_inode_pages_final(inode->i_mapping);
+
release_atomic_write_cnt(inode);
clear_inode_flag(inode, FI_ATOMIC_COMMITTED);
clear_inode_flag(inode, FI_ATOMIC_REPLACE);
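Review bot: the ordering requirement behind the moved truncate_inode_pages_final() call is not documented at the new "if (clean)" block, and nothing in the code stops someone from moving it back next to the i_size restore where the other "if (clean)" block still is. Add a comment above it stating that the page cache must be truncated before FI_ATOMIC_FILE is cleared, so a concurrent f2fs_do_write_data_page() cannot pick the original inode for stale atomic-write pages.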
@@ -201,7 +204,6 @@ void f2fs_abort_atomic_write(struct inode *inode, bool clean)
F2FS_I(inode)->atomic_write_task = NULL;
if (clean) {
- truncate_inode_pages_final(inode->i_mapping);
f2fs_i_size_write(inode, fi->original_i_size);
fi->original_i_size = 0;
}
--
2.25.1
In f2fs_update_inode, i_size of the atomic file isn't updated until
FI_ATOMIC_COMMITTED flag is set. When committing atomic write right
after the writeback of the inode, i_size of the raw inode will not be
updated. It can cause the atomicity corruption due to a mismatch between
old file size and new data.
To prevent the problem, let's mark inode dirty for FI_ATOMIC_COMMITTED
Atomic write thread                      Writeback thread
                                         __writeback_single_inode
                                           write_inode
                                             f2fs_update_inode
                                               - skip i_size update
  f2fs_ioc_commit_atomic_write
    f2fs_commit_atomic_write
      set_inode_flag(inode, FI_ATOMIC_COMMITTED)
    f2fs_do_sync_file
      f2fs_fsync_node_pages
        - skip f2fs_update_inode since the inode is clean
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/f2fs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 543898482f8b..a000cb024dbe 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3039,6 +3039,7 @@ static inline void __mark_inode_dirty_flag(struct inode *inode,
case FI_INLINE_DOTS:
case FI_PIN_FILE:
case FI_COMPRESS_RELEASED:
+ case FI_ATOMIC_COMMITTED:
f2fs_mark_inode_dirty_sync(inode, true);
}
}
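Review bot: the added "case FI_ATOMIC_COMMITTED:" joins the fall-through list without saying that it is there for a different reason than a flag change that must reach disk, namely to force f2fs_update_inode() to run again so the i_size skipped earlier gets written. A short comment on that case would make the intent clear. If __mark_inode_dirty_flag() is also reached when the flag is cleared, the abort path that clears FI_ATOMIC_COMMITTED will now dirty the inode as well; please confirm that is intended.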
--
2.25.1
Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:
commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe active task after it
sets lazy_put.
There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between store to rq->curr
and switch_mm_cid(), which happens earlier than:
- spin_unlock(),
- switch_to().
So it's fine when the architecture switch_mm happens to have that barrier
already, but less so when the architecture only provides the full barrier
in switch_to() or spin_unlock().
It is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have memory barriers in switch_mm(), but rather have the full
barrier either in finish_lock_switch() or switch_to() have them too late
for the needs of switch_mm_cid().
Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.
Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86 which implicitly provides this memory
barrier by writing to CR3.
Reported-by: levi.yun <yeoreum.yun(a)arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: <stable(a)vger.kernel.org> # 6.4.x
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
---
arch/x86/include/asm/barrier.h | 3 +++
include/asm-generic/barrier.h | 8 ++++++++
kernel/sched/sched.h | 19 +++++++++++++------
3 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 35389b2af88e..0d5e54201eb2 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
#define __smp_mb__before_atomic() do { } while (0)
#define __smp_mb__after_atomic() do { } while (0)
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm() do { } while (0)
+
#include <asm-generic/barrier.h>
/*
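Review bot: the comment on the x86 smp_mb__after_switch_mm() override asserts that the CR3 write gives a full barrier "in switch_mm()", but does not say whether every switch_mm() path that reaches the user-to-user case in switch_mm_cid() actually writes CR3. Since the no-op is only correct if it does, the comment should state that assumption explicitly (or name the path that guarantees it), so a later change to the x86 switch_mm() code knows this define depends on it.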
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..5a6c94d7a598 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do { \
#define io_stop_wc() do { } while (0)
#endif
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm() smp_mb()
+#endif
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..638ebd355912 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
# include <asm/paravirt_api_clock.h>
#endif
+#include <asm/barrier.h>
+
#include "cpupri.h"
#include "cpudeadline.h"
@@ -3481,13 +3483,18 @@ static inline void switch_mm_cid(struct rq *rq,
* between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
* Provide it here.
*/
- if (!prev->mm) // from kernel
+ if (!prev->mm) { // from kernel
smp_mb();
- /*
- * user -> user transition guarantees a memory barrier through
- * switch_mm() when current->mm changes. If current->mm is
- * unchanged, no barrier is needed.
- */
+ } else { // from user
+ /*
+ * user -> user transition relies on an implicit the
+ * memory barrier in switch_mm() when current->mm
+ * changes. If the architecture switch_mm() does not
+ * have an implicit memory barrier, it is emitted here.
+ * If current->mm is unchanged, no barrier is needed.
+ */
+ smp_mb__after_switch_mm();
+ }
}
if (prev->mm_cid_active) {
mm_cid_snapshot_time(rq, prev->mm);
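Review bot: in the new "else { // from user" branch the comment says "If current->mm is unchanged, no barrier is needed", yet smp_mb__after_switch_mm() is called unconditionally there, so on every architecture using the generic definition an smp_mb() is emitted in that case too. Either guard the call on the mm actually changing or reword the comment to say the barrier is issued regardless. The same comment also contains a wording slip, "relies on an implicit the memory barrier", which should read "relies on an implicit memory barrier".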
--
2.39.2
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
     CPU 0                              CPU 1
     -----                              -----
                                   wait_index++;
   index = wait_index;
                                   ring_buffer_wake_waiters();
   wait_on_pipe()
     ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
        prepare_to_wait();
        if (!condition)
                schedule();
Where the missing condition check is the iter->wait_index update.
Have the ring_buffer_wait() take a conditional callback function and a
data parameter that can be used within the wait_event_interruptible() of
the ring_buffer_wait() function.
In wait_on_pipe(), pass a condition function that will check if the
wait_index has been updated, if it has, it will return true to break out
of the wait_event_interruptible() loop.
Create a new field "closed" in the trace_iterator and set it in the
.flush() callback before calling ring_buffer_wake_waiters().
This will keep any new readers from waiting on a closed file descriptor.
Have the wait_on_pipe() condition callback also check the closed field.
Change the wait_index field of the trace_iterator to atomic_t. There's no
reason it needs to be 'long' and making it atomic and using
atomic_read_acquire() and atomic_fetch_inc_release() will provide the
necessary memory barriers.
Add a "woken" flag to tracing_buffers_splice_read() to exit the loop after
one more try to fetch data. That is, if it waited for data and something
woke it up, it should try to collect any new data and then exit back to
user space.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Link: https://lore.kernel.org/linux-trace-kernel/20240312121703.557950713@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 3 ++-
include/linux/trace_events.h | 5 ++++-
kernel/trace/ring_buffer.c | 13 ++++++-----
kernel/trace/trace.c | 43 ++++++++++++++++++++++++++----------
4 files changed, 45 insertions(+), 19 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 338a33db1577..dc5ae4e96aee 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -99,7 +99,8 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
})
typedef bool (*ring_buffer_cond_fn)(void *data);
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..fc6d0af56bb1 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,13 +103,16 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ atomic_t wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
cpumask_var_t started;
+ /* Set when the file is closed to prevent new waiters */
+ bool closed;
+
/* it's true when current open file is snapshot */
bool snapshot;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f4c34b7c7e1e..350607cce869 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -902,23 +902,26 @@ static bool rb_wait_once(void *data)
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
* @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @cond: condition function to break out of wait (NULL to run once)
+ * @data: the data to pass to @cond.
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
* it will wait for data to be added to a specific cpu buffer.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct wait_queue_head *waitq;
- ring_buffer_cond_fn cond;
struct rb_irq_work *rbwork;
- void *data;
long once = 0;
int ret = 0;
- cond = rb_wait_once;
- data = &once;
+ if (!cond) {
+ cond = rb_wait_once;
+ data = &once;
+ }
/*
* Depending on what the caller is waiting for, either any
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..d390fea3a6a5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,15 +1955,36 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+struct pipe_wait {
+ struct trace_iterator *iter;
+ int wait_index;
+};
+
+static bool wait_pipe_cond(void *data)
+{
+ struct pipe_wait *pwait = data;
+ struct trace_iterator *iter = pwait->iter;
+
+ if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
+ return true;
+
+ return iter->closed;
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct pipe_wait pwait;
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ pwait.wait_index = atomic_read_acquire(&iter->wait_index);
+ pwait.iter = iter;
+
+ ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full,
+ wait_pipe_cond, &pwait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
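Review bot: wait_pipe_cond() reads iter->closed with a plain load ("return iter->closed;") while tracing_buffers_flush() sets it from another context, and the only ordering comes from the acquire/release pair on wait_index. Since the condition is re-evaluated inside the wait loop, use READ_ONCE() here paired with WRITE_ONCE() at the store, or add a comment explaining why the plain accesses are sufficient. It would also help to document that pwait.wait_index is sampled inside wait_on_pipe(), so a wake-up issued before that sample is only caught through the closed flag.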
@@ -8398,9 +8419,9 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
+ iter->closed = true;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
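Review bot: in tracing_buffers_flush() the existing comment "Make sure the waiters see the new wait_index" now sits between "iter->closed = true;" and the atomic_fetch_inc_release(), and says nothing about closed, which is the field that actually keeps new waiters out. Update the comment to state that the release ordering of the increment also publishes the closed store, so the dependency between the two lines is not lost if they are reordered later.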
@@ -8500,6 +8521,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ bool woken = false;
int page_size;
int entries, i;
ssize_t ret = 0;
@@ -8573,17 +8595,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
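Review bot: the new "if (woken) goto out;" comes right after "if (ret) goto out;", so ret is 0 whenever it is taken and tracing_buffers_splice_read() returns 0 to user space. A zero return from splice normally reads as end of input, and the wake-up may have come from the ioctl path rather than from a close. Please say in a comment that returning 0 is the intended result for a woken reader with no data, or set a distinct return value before jumping to out.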
@@ -8592,10 +8614,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ /* Iterate one more time to collect any new data then exit */
+ woken = true;
goto again;
}
@@ -8618,9 +8638,8 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
mutex_lock(&trace_types_lock);
- iter->wait_index++;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Convert ring_buffer_wait() over to wait_event_interruptible(). The default
condition is to execute the wait loop inside __wait_event() just once.
This does not change the ring_buffer_wait() prototype yet, but
restructures the code so that it can take a "cond" and "data" parameter
and will call wait_event_interruptible() with a helper function as the
condition.
The helper function (rb_wait_cond) takes the cond function and data
parameters. It will first check if the buffer hit the watermark defined by
the "full" parameter and then call the passed in condition parameter. If
either is true, it returns true.
If rb_wait_cond() does not return true, it will set the appropriate
"waiters_pending" flag and return false.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Link: https://lore.kernel.org/linux-trace-kernel/20240312121703.399598519@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 116 +++++++++++++++++++++---------------
2 files changed, 69 insertions(+), 48 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..338a33db1577 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,6 +98,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+typedef bool (*ring_buffer_cond_fn)(void *data);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
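Review bot: the new ring_buffer_cond_fn typedef is added to a public header with no description of its contract. Callers need to know that the callback is used as a wait_event_interruptible() condition, so it can be called more than once per wait and must not sleep; a short comment above the typedef should say so, and a blank line between the typedef and the ring_buffer_wait() prototype would match the surrounding layout.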
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8c3730a88662..f4c34b7c7e1e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -843,43 +843,15 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
-/**
- * ring_buffer_wait - wait for input to the ring buffer
- * @buffer: buffer to wait on
- * @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
- *
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
- */
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+static inline bool
+rb_wait_cond(struct rb_irq_work *rbwork, struct trace_buffer *buffer,
+ int cpu, int full, ring_buffer_cond_fn cond, void *data)
{
- struct ring_buffer_per_cpu *cpu_buffer;
- DEFINE_WAIT(wait);
- struct rb_irq_work *work;
- int ret = 0;
-
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ if (rb_watermark_hit(buffer, cpu, full))
+ return true;
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ if (cond(data))
+ return true;
/*
* The events can happen in critical sections where
@@ -902,27 +874,75 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return false;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
+/*
+ * The default wait condition for ring_buffer_wait() is to just to exit the
+ * wait loop the first time it is woken up.
+ */
+static bool rb_wait_once(void *data)
+{
+ long *once = data;
+
+ /* wait_event() actually calls this twice before scheduling*/
+ if (*once > 1)
+ return true;
+
+ (*once)++;
+ return false;
+}
+
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct wait_queue_head *waitq;
+ ring_buffer_cond_fn cond;
+ struct rb_irq_work *rbwork;
+ void *data;
+ long once = 0;
+ int ret = 0;
+
+ cond = rb_wait_once;
+ data = &once;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -ENODEV;
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
}
- schedule();
- out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ waitq = &rbwork->full_waiters;
else
- finish_wait(&work->waiters, &wait);
+ waitq = &rbwork->waiters;
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+ ret = wait_event_interruptible((*waitq),
+ rb_wait_cond(rbwork, buffer, cpu, full, cond, data));
return ret;
}
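Review bot: rb_wait_once() depends on an implementation detail of the wait_event macros: the "*once > 1" test is only right as long as the condition is evaluated exactly twice before the first schedule(), as its own comment admits. If that count ever changes, ring_buffer_wait() either returns without sleeping or needs an extra wake-up. Consider making the default condition independent of the call count, or at least expand the comment to name the macro this relies on. Note too that rb_wait_cond() has a side effect (setting waiters_pending or full_waiters_pending) on each evaluation that returns false, which deserves a line in a comment above the function. Minor: "is to just to exit" and the missing space before "*/" in the rb_wait_once() comments.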
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a reader of the ring buffer is doing a poll, and waiting for the ring
buffer to hit a specific watermark, there could be a case where it gets
into an infinite ping-pong loop.
The poll code has:
        rbwork->full_waiters_pending = true;
        if (!cpu_buffer->shortest_full ||
            cpu_buffer->shortest_full > full)
                cpu_buffer->shortest_full = full;
The writer will see full_waiters_pending and check if the ring buffer is
filled over the percentage of the shortest_full value. If it is, it calls
an irq_work to wake up all the waiters.
But the code could get into a circular loop:
        CPU 0                                   CPU 1
        -----                                   -----
 [ Poll ]
   [ shortest_full = 0 ]
   rbwork->full_waiters_pending = true;
                                          if (rbwork->full_waiters_pending &&
                                              [ buffer percent ] > shortest_full) {
                                                 rbwork->wakeup_full = true;
                                                 [ queue_irqwork ]

   cpu_buffer->shortest_full = full;

                                          [ IRQ work ]
                                          if (rbwork->wakeup_full) {
                                                cpu_buffer->shortest_full = 0;
                                                wakeup poll waiters;
  [woken]
   if ([ buffer percent ] > full)
      break;
   rbwork->full_waiters_pending = true;
                                          if (rbwork->full_waiters_pending &&
                                              [ buffer percent ] > shortest_full) {
                                                 rbwork->wakeup_full = true;
                                                 [ queue_irqwork ]

   cpu_buffer->shortest_full = full;

                                          [ IRQ work ]
                                          if (rbwork->wakeup_full) {
                                                cpu_buffer->shortest_full = 0;
                                                wakeup poll waiters;
  [woken]

 [ Wash, rinse, repeat! ]
In the poll, the shortest_full needs to be set before the
full_pending_waiters, as once that is set, the writer will compare the
current shortest_full (which is incorrect) to decide to call the irq_work,
which will reset the shortest_full (expecting the readers to update it).
Also move the setting of full_waiters_pending after the check if the ring
buffer has the required percentage filled. There's no reason to tell the
writer to wake up waiters if there are no waiters.
Link: https://lore.kernel.org/linux-trace-kernel/20240312131952.630922155@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6ffbccb9bcf0..99fdda29ce4e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -965,16 +965,32 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
poll_wait(filp, &rbwork->full_waiters, poll_table);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- } else {
- poll_wait(filp, &rbwork->waiters, poll_table);
- rbwork->waiters_pending = true;
+ if (full_hit(buffer, cpu, full))
+ return EPOLLIN | EPOLLRDNORM;
+ /*
+ * Only allow full_waiters_pending update to be seen after
+ * the shortest_full is set. If the writer sees the
+ * full_waiters_pending flag set, it will compare the
+ * amount in the ring buffer to shortest_full. If the amount
+ * in the ring buffer is greater than the shortest_full
+ * percent, it will call the irq_work handler to wake up
+ * this list. The irq_handler will reset shortest_full
+ * back to zero. That's done under the reader_lock, but
+ * the below smp_mb() makes sure that the update to
+ * full_waiters_pending doesn't leak up into the above.
+ */
+ smp_mb();
+ rbwork->full_waiters_pending = true;
+ return 0;
}
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
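Review bot: in the "full" branch of ring_buffer_poll_wait(), full_hit() is now tested before "rbwork->full_waiters_pending = true;" and is not tested again afterwards, so a writer that crosses the watermark between the check and the store will not see the pending flag and the poller returns 0 with no wake-up queued. The non-full path just below does it the other way round (set waiters_pending, smp_mb(), then check for data), and its comment describes exactly this race. Either re-check full_hit() after setting full_waiters_pending or explain in the new comment why the missed-wakeup window cannot occur here.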
@@ -990,9 +1006,6 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
- if (full)
- return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
-
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The rb_watermark_hit() checks if the amount of data in the ring buffer is
above the percentage level passed in by the "full" variable. If it is, it
returns true.
But it also sets the "shortest_full" field of the cpu_buffer that informs
writers that it needs to call the irq_work if the amount of data on the
ring buffer is above the requested amount.
The rb_watermark_hit() always sets the shortest_full even if the amount in
the ring buffer is what it wants. As it is not going to wait, because it
has what it wants, there's no reason to set shortest_full.
Link: https://lore.kernel.org/linux-trace-kernel/20240312115641.6aa8ba08@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..6ffbccb9bcf0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -834,9 +834,10 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
ret = !pagebusy && full_hit(buffer, cpu, full);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
+ if (!ret && (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)) {
+ cpu_buffer->shortest_full = full;
+ }
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
}
return ret;
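Review bot: the new "!ret &&" guard in rb_watermark_hit() has no comment, and the reason for it lives only in the changelog. Since ret is "!pagebusy && full_hit(...)", shortest_full is still updated when the watermark has been reached but pagebusy is true. A short comment above the if, saying that shortest_full is recorded only when the caller is about to wait, would make that case clearly intentional.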
--
2.43.0
Hi,
this series does basically two things:
1. Disables automatic load balancing as advised by the hardware
workaround.
2. Assigns all the CCS slices to one single user engine. The user
will then be able to query only one CCS engine
In this v5 I have created a new file, gt/intel_gt_ccs_mode.c
where I added the intel_gt_apply_ccs_mode(). In the upcoming
patches, this file will contain the implementation for dynamic
CCS mode setting.
I saw also necessary the creation of a new mechanism for looping
through engines in order to exclude the CCS's that are merged
into one single stream. It's called for_each_available_engine()
and I started using it in the hangcheck selftest. I might still
need to iterate a few CI runs in order to cover more cases when
this call is needed.
I'm using here the "Requires: " tag, but I'm not sure the commit
id will be valid, on the other hand, I don't know what commit id
I should use.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
=========
v4 -> v5
- Use the workaround framework to do all the CCS balancing
settings in order to always apply the modes also when the
engine resets. Put everything in its own specific function to
be executed for the first CCS engine encountered. (Thanks
Matt)
- Calculate the CCS ID for the CCS mode as the first available
CCS among all the engines (Thanks Matt)
- create the intel_gt_ccs_mode.c function to host the CCS
configuration. We will have it ready for the next series.
- Fix a selftest that was failing because could not set CCS2.
- Add the for_each_available_engine() macro to exclude CCS1+ and
start using it in the hangcheck selftest.
v3 -> v4
- Reword correctly the comment in the workaround
- Fix a buffer overflow (Thanks Joonas)
- Handle properly the fused engines when setting the CCS mode.
v2 -> v3
- Simplified the algorithm for creating the list of the exported
uabi engines. (Patch 1) (Thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list
(Patch 2) (Thanks, Matt)
- Patch 4 now uses the refactoring from patch 1, in a cleaner
outcome.
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed
UABI engine list and adapt the engine counting accordingly
(thanks Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (4):
drm/i915/gt: Disable HW load balancing for CCS
drm/i915/gt: Refactor uabi engine class/instance list creation
drm/i915/gt: Disable tests for CCS engines beyond the first
drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gt/intel_engine_user.c | 40 ++++++++++++++------
drivers/gpu/drm/i915/gt/intel_gt.h | 13 +++++++
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 +++
drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-
drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 22 +++++------
8 files changed, 139 insertions(+), 25 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
--
2.43.0
This is a backport of recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already present in
v6.8.
v6.8 just got released so the backport was very smooth.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
12 files changed, 278 insertions(+), 9 deletions(-)
---
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
change-id: 20240312-rfds-backport-6-8-y-d67dcdfe4e51
Best regards,
--
Thanks,
Pawan
This is a backport of recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already backported and
merged in linux-6.1.y.
- There was a minor conflict for patch 2/4 in Documentation index.
- There were many easy to resolve conflicts in patch 3/4 related to
sysfs reporting.
- s/ATOM_GRACEMONT/ALDERLAKE_N/
- ATOM_GRACEMONT is called ALDERLAKE_N in 6.6.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 8 ++
include/linux/cpu.h | 2 +
12 files changed, 283 insertions(+), 9 deletions(-)
---
base-commit: 61adba85cc40287232a539e607164f273260e0fe
change-id: 20240312-rfds-backport-6-1-y-d17ecf8c1ec5
Best regards,
--
Thanks,
Pawan
This is a backport of recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already backported and
merged in linux-6.6.y.
There were no hiccups in backporting this.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
12 files changed, 278 insertions(+), 9 deletions(-)
---
base-commit: 62e5ae5007ef14cf9b12da6520d50fe90079d8d4
change-id: 20240312-rfds-backport-6-6-y-e1425616b52a
Best regards,
--
Thanks,
Pawan
This is a backport of recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already backported and
merged in linux-6.7.y.
There were no hiccups in backporting this.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
12 files changed, 278 insertions(+), 9 deletions(-)
---
base-commit: 2e7cdd29fc42c410eab52fffe5710bf656619222
change-id: 20240312-rfds-backport-6-7-y-4fbe3fc5366a
Best regards,
--
Thanks,
Pawan
On 3/11/24 15:32, Vasant k wrote:
> Hi Tom,
>
> Right, it just escaped my mind that the SNP uses the secrets page
> to hand over APs to the next stage. I will correct that in the next
Not quite... The MADT table lists the APs and the GHCB AP Create NAE event
is used to start the APs.
Thanks,
Tom
> version. Please let me know if you have any corrections or improvement
> suggestions on the rest of the patchset.
>
> Thanks,
> Vasant
>
The patch titled
Subject: memtest: use {READ,WRITE}_ONCE in memory scanning
has been added to the -mm mm-unstable branch. Its filename is
memtest-use-readwrite_once-in-memory-scanning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Qiang Zhang <qiang4.zhang(a)intel.com>
Subject: memtest: use {READ,WRITE}_ONCE in memory scanning
Date: Tue, 12 Mar 2024 16:04:23 +0800
memtest failed to find bad memory when compiled with clang. So use
{WRITE,READ}_ONCE to access memory to avoid compiler over optimization.
Link: https://lkml.kernel.org/r/20240312080422.691222-1-qiang4.zhang@intel.com
Signed-off-by: Qiang Zhang <qiang4.zhang(a)intel.com>
Cc: Bill Wendling <morbo(a)google.com>
Cc: Justin Stitt <justinstitt(a)google.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memtest.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memtest.c~memtest-use-readwrite_once-in-memory-scanning
+++ a/mm/memtest.c
@@ -51,10 +51,10 @@ static void __init memtest(u64 pattern,
last_bad = 0;
for (p = start; p < end; p++)
- *p = pattern;
+ WRITE_ONCE(*p, pattern);
for (p = start; p < end; p++, start_phys_aligned += incr) {
- if (*p == pattern)
+ if (READ_ONCE(*p) == pattern)
continue;
if (start_phys_aligned == last_bad + incr) {
last_bad += incr;
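Review bot: the two converted accesses in memtest() carry no comment saying why a plain store and a plain load are not enough, so a later cleanup could turn them back into "*p = pattern" and "*p == pattern". A one-line comment above the first loop, stating that the compiler must not elide or merge these accesses because the test depends on each word really being written and read back, would keep the fix from regressing. The changelog would also be stronger if it named the transformation clang performs.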
_
Patches currently in -mm which might be from qiang4.zhang(a)intel.com are
memtest-use-readwrite_once-in-memory-scanning.patch
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a reader of the ring buffer is doing a poll, and waiting for the ring
buffer to hit a specific watermark, there could be a case where it gets
into an infinite ping-pong loop.
The poll code has:
rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
The writer will see full_waiters_pending and check if the ring buffer is
filled over the percentage of the shortest_full value. If it is, it calls
an irq_work to wake up all the waiters.
But the code could get into a circular loop:
CPU 0 CPU 1
----- -----
[ Poll ]
[ shortest_full = 0 ]
rbwork->full_waiters_pending = true;
if (rbwork->full_waiters_pending &&
[ buffer percent ] > shortest_full) {
rbwork->wakeup_full = true;
[ queue_irqwork ]
cpu_buffer->shortest_full = full;
[ IRQ work ]
if (rbwork->wakeup_full) {
cpu_buffer->shortest_full = 0;
wakeup poll waiters;
[woken]
if ([ buffer percent ] > full)
break;
rbwork->full_waiters_pending = true;
if (rbwork->full_waiters_pending &&
[ buffer percent ] > shortest_full) {
rbwork->wakeup_full = true;
[ queue_irqwork ]
cpu_buffer->shortest_full = full;
[ IRQ work ]
if (rbwork->wakeup_full) {
cpu_buffer->shortest_full = 0;
wakeup poll waiters;
[woken]
[ Wash, rinse, repeat! ]
In the poll, the shortest_full needs to be set before the
full_pending_waiters, as once that is set, the writer will compare the
current shortest_full (which is incorrect) to decide to call the irq_work,
which will reset the shortest_full (expecting the readers to update it).
Also move the setting of full_waiters_pending after the check if the ring
buffer has the required percentage filled. There's no reason to tell the
writer to wake up waiters if there are no waiters.
Cc: stable(a)vger.kernel.org
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..adfe603a769b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -964,16 +964,32 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
poll_wait(filp, &rbwork->full_waiters, poll_table);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- } else {
- poll_wait(filp, &rbwork->waiters, poll_table);
- rbwork->waiters_pending = true;
+ if (full_hit(buffer, cpu, full))
+ return EPOLLIN | EPOLLRDNORM;
+ /*
+ * Only allow full_waiters_pending update to be seen after
+ * the shortest_full is set. If the writer sees the
+ * full_waiters_pending flag set, it will compare the
+ * amount in the ring buffer to shortest_full. If the amount
+ * in the ring buffer is greater than the shortest_full
+ * percent, it will call the irq_work handler to wake up
+ * this list. The irq_handler will reset shortest_full
+ * back to zero. That's done under the reader_lock, but
+ * the below smp_mb() makes sure that the update to
+ * full_waiters_pending doesn't leak up into the above.
+ */
+ smp_mb();
+ rbwork->full_waiters_pending = true;
+ return 0;
}
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
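Review bot: in the "full" branch, full_hit() is evaluated before full_waiters_pending is set, which is the opposite order from the non-full path just below, where waiters_pending is set, smp_mb() is issued, and only then is the buffer checked (the "tight race" comment). If a writer crosses the watermark after full_hit() returns false but before the flag is stored, the writer sees no pending waiter and the poll returns 0. Please either re-check full_hit() after setting the flag or extend the new comment to explain why that window is harmless. The comment also says shortest_full is reset "under the reader_lock" while this smp_mb() sits outside the lock, so it would help to name the writer-side barrier it pairs with.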
@@ -989,9 +1005,6 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
- if (full)
- return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
-
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
--
2.43.0
Since commit 1a50d9403fb9 ("treewide: Fix probing of devices in DT
overlays"), when using device-tree overlays, the FWNODE_FLAG_NOT_DEVICE
is set on each overlay nodes.
When an overlay contains a node related to a bus (i2c for instance)
and its children nodes representing i2c devices, the flag is cleared for
the bus node by the OF notifier but the "standard" probe sequence takes
place (the same one is performed without an overlay) for the bus and
children devices are created simply by walking the children DT nodes
without clearing the FWNODE_FLAG_NOT_DEVICE flag for these devices.
Clear the FWNODE_FLAG_NOT_DEVICE when the device is added, no matter if
an overlay is used or not.
Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/base/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d46af40f9a..61d09ac57bfb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3619,6 +3619,7 @@ int device_add(struct device *dev)
*/
if (dev->fwnode && !dev->fwnode->dev) {
dev->fwnode->dev = dev;
+ dev->fwnode->flags &= ~FWNODE_FLAG_NOT_DEVICE;
fw_devlink_link_device(dev);
}
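Review bot: the new line in device_add() clears FWNODE_FLAG_NOT_DEVICE only inside "if (dev->fwnode && !dev->fwnode->dev)", so a device added with a fwnode whose ->dev is already set leaves the flag untouched. If that is intended it deserves a word in the comment above the block. The patch also adds nothing that restores the flag (the diffstat is a single insertion), so please confirm that no code relies on the flag being set again when the device is removed or when device_add() fails after this point.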
--
2.43.0
Dear developers,
(sorry for the long CC list, it looks quite long to me, but I tried to
follow the issue reporting guide as closely as possible)
Since patches [1], [2] and [3] were applied to the kernel, there is a
regression with Lenovo ThinkPad Compact USB Keyboard (old model, not II).
[1]
https://github.com/torvalds/linux/commit/46a0a2c96f0f47628190f122c2e3d879e5…
[2]
https://github.com/torvalds/linux/commit/2f2bd7cbd1d1548137b351040dc4e037d1…
[3]
https://github.com/torvalds/linux/commit/43527a0094c10dfbf0d5a2e7979395a38d…
The regression is that a middle click is performed when releasing middle
button after wheel emulation.
The bug appears randomly, it can be after 5 minutes or 1 hour of
keyboard usage, and can only be worked around by unplugging/re-plugging
the keyboard. (I ended up resorting to simulate an unplug/replug, with a
script which echoes 0 then 1 to /sys/bus/usb/devices/<id>/authorized,
since I was afraid to damage the Micro-USB outlet by physically
unplugging/re-plugging too much).
Those spurious clicks are very annoying, since they can open links in
new tabs when scrolling in Firefox, or pasting text when scrolling in
terminals, or other unwanted stuff.
I witnessed it with latest kernels (Debian unstable) as well as stable
kernels (Debian 12 Bookworm, stable).
On Debian Stable, the last working kernel was 5.10.127, the regression
appeared in 5.10.136 (I read all changelogs on kernel.org between those
two releases but couldn't find anything about hid-lenovo, so I can't
tell exactly in which release the regression appeared, Debian upgraded
directly from .127 to .136).
I reported it in Debian [4], and apparently I'm not the only person
suffering from it [5].
[4] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058758#32
[5] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058758#42
I would understand that such bugs would end up in a development kernel
like the ones provided by Debian Unstable, but not with stable kernels
like the ones provided by Debian Stable.
Regards,
--
Raphaël Halimi
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
CPU 0 CPU 1
----- -----
wait_index++;
index = wait_index;
ring_buffer_wake_waiters();
wait_on_pipe()
ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
prepare_to_wait();
if (!condition)
schedule();
Where the missing condition check is the iter->wait_index update.
Have the ring_buffer_wait() take a conditional callback function and a
data parameter that can be used within the wait_event_interruptible() of
the ring_buffer_wait() function.
In wait_on_pipe(), pass a condition function that will check if the
wait_index has been updated, if it has, it will return true to break out
of the wait_event_interruptible() loop.
Create a new field "closed" in the trace_iterator and set it in the
.flush() callback before calling ring_buffer_wake_waiters().
This will keep any new readers from waiting on a closed file descriptor.
Have the wait_on_pipe() condition callback also check the closed field.
Change the wait_index field of the trace_iterator to atomic_t. There's no
reason it needs to be 'long' and making it atomic and using
atomic_read_acquire() and atomic_fetch_inc_release() will provide the
necessary memory barriers.
Add a "woken" flag to tracing_buffers_splice_read() to exit the loop after
one more try to fetch data. That is, if it waited for data and something
woke it up, it should try to collect any new data and then exit back to
user space.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 3 ++-
include/linux/trace_events.h | 5 ++++-
kernel/trace/ring_buffer.c | 13 ++++++-----
kernel/trace/trace.c | 43 ++++++++++++++++++++++++++----------
4 files changed, 45 insertions(+), 19 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 338a33db1577..dc5ae4e96aee 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -99,7 +99,8 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
})
typedef bool (*ring_buffer_cond_fn)(void *data);
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..fc6d0af56bb1 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,13 +103,16 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ atomic_t wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
cpumask_var_t started;
+ /* Set when the file is closed to prevent new waiters */
+ bool closed;
+
/* it's true when current open file is snapshot */
bool snapshot;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c198ba466853..67d8405f4451 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -901,23 +901,26 @@ static bool rb_wait_once(void *data)
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
* @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @cond: condition function to break out of wait (NULL to run once)
+ * @data: the data to pass to @cond.
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
* it will wait for data to be added to a specific cpu buffer.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct wait_queue_head *waitq;
- ring_buffer_cond_fn cond;
struct rb_irq_work *rbwork;
- void *data;
long once = 0;
int ret = 0;
- cond = rb_wait_once;
- data = &once;
+ if (!cond) {
+ cond = rb_wait_once;
+ data = &once;
+ }
/*
* Depending on what the caller is waiting for, either any
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..d390fea3a6a5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,15 +1955,36 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+struct pipe_wait {
+ struct trace_iterator *iter;
+ int wait_index;
+};
+
+static bool wait_pipe_cond(void *data)
+{
+ struct pipe_wait *pwait = data;
+ struct trace_iterator *iter = pwait->iter;
+
+ if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
+ return true;
+
+ return iter->closed;
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct pipe_wait pwait;
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ pwait.wait_index = atomic_read_acquire(&iter->wait_index);
+ pwait.iter = iter;
+
+ ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full,
+ wait_pipe_cond, &pwait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
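Review bot: wait_pipe_cond() reads iter->closed with a plain load, while the field is written from tracing_buffers_flush() in another context and no lock is taken around either access in this patch. The only ordering comes from the atomic_read_acquire() on wait_index pairing with the atomic_fetch_inc_release() in the writers, and that covers the closed load only because it happens after the acquire. Please say so in a comment on wait_pipe_cond(), and consider READ_ONCE()/WRITE_ONCE() for closed so the compiler cannot cache or tear it.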
@@ -8398,9 +8419,9 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
+ iter->closed = true;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
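Review bot: iter->closed is set to true in tracing_buffers_flush() and no hunk in this patch ever clears it, so once flush has run every later wait_on_pipe() on this iterator returns immediately. If the iterator can still be used after a flush, that turns blocking reads into a busy loop, so please state in the changelog why it cannot. The retained comment "Make sure the waiters see the new wait_index" is also out of date: the release increment now publishes closed as well, and the comment should say that the store to closed must stay above it.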
@@ -8500,6 +8521,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ bool woken = false;
int page_size;
int entries, i;
ssize_t ret = 0;
@@ -8573,17 +8595,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
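Review bot: at the new "if (woken) goto out;", ret is necessarily 0 because the "if (ret) goto out;" test sits directly above it, so a reader that was woken and then found nothing leaves with ret == 0 rather than the -EAGAIN assigned two lines later. If 0 is the intended result for that case, a comment on the woken check saying so would help; otherwise set ret explicitly before the goto.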
@@ -8592,10 +8614,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ /* Iterate one more time to collect any new data then exit */
+ woken = true;
goto again;
}
@@ -8618,9 +8638,8 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
mutex_lock(&trace_types_lock);
- iter->wait_index++;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
--
2.43.0
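The lost wakeup described in the commit message above, replayed as an added single-threaded model (not code from the patch or from the kernel): it runs the CPU 0 / CPU 1 interleaving deterministically and shows why both the wait_index re-check and the closed flag are needed. Every model_* name and the enum are hypothetical, and the buffer is assumed to stay empty so the watermark check never ends the wait.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum wait_result { WAIT_RETURNED, WAIT_WOULD_SLEEP };

struct model_iter {
	atomic_int wait_index;	/* bumped by the closer */
	bool closed;		/* set by the closer before the bump */
	int queued;		/* tasks on the wait queue right now */
};

static void model_init(struct model_iter *it)
{
	atomic_init(&it->wait_index, 0);
	it->closed = false;
	it->queued = 0;
}

/* Closer: publish state, then wake only the tasks queued at this instant. */
static void model_close(struct model_iter *it)
{
	it->closed = true;
	atomic_fetch_add_explicit(&it->wait_index, 1, memory_order_release);
	it->queued = 0;		/* a wakeup with nobody queued is simply lost */
}

/* Reader: remember the index that was current when it decided to wait. */
static int model_snapshot(struct model_iter *it)
{
	return atomic_load_explicit(&it->wait_index, memory_order_acquire);
}

/*
 * Reader: queue itself, then optionally re-check the conditions before
 * sleeping. With both checks off this is the old shape: queue and sleep.
 */
static enum wait_result model_wait(struct model_iter *it, int snapshot,
				   bool check_index, bool check_closed)
{
	it->queued++;		/* prepare_to_wait() */
	if ((check_index &&
	     atomic_load_explicit(&it->wait_index,
				  memory_order_acquire) != snapshot) ||
	    (check_closed && it->closed)) {
		it->queued--;	/* finish_wait() */
		return WAIT_RETURNED;
	}
	return WAIT_WOULD_SLEEP;	/* schedule(), nobody left to wake us */
}

/* Interleaving from the commit message: snapshot, close+wake, then wait. */
static enum wait_result run_race(bool with_cond)
{
	struct model_iter it;
	int snap;

	model_init(&it);
	snap = model_snapshot(&it);	/* reader: index = wait_index */
	model_close(&it);		/* closer: wait_index++, wake waiters */
	return model_wait(&it, snap, with_cond, with_cond);
}

/* A reader that arrives after the close: the index alone cannot stop it. */
static enum wait_result run_late_reader(bool check_closed)
{
	struct model_iter it;
	int snap;

	model_init(&it);
	model_close(&it);
	snap = model_snapshot(&it);	/* already sees the new index */
	return model_wait(&it, snap, true, check_closed);
}

int main(void)
{
	printf("race, no condition:      %s\n",
	       run_race(false) == WAIT_WOULD_SLEEP ? "sleeps" : "returns");
	printf("race, with condition:    %s\n",
	       run_race(true) == WAIT_WOULD_SLEEP ? "sleeps" : "returns");
	printf("late reader, index only: %s\n",
	       run_late_reader(false) == WAIT_WOULD_SLEEP ? "sleeps" : "returns");
	printf("late reader, +closed:    %s\n",
	       run_late_reader(true) == WAIT_WOULD_SLEEP ? "sleeps" : "returns");
	return 0;
}
```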
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Convert ring_buffer_wait() over to wait_event_interruptible(). The default
condition is to execute the wait loop inside __wait_event() just once.
This does not change the ring_buffer_wait() prototype yet, but
restructures the code so that it can take a "cond" and "data" parameter
and will call wait_event_interruptible() with a helper function as the
condition.
The helper function (rb_wait_cond) takes the cond function and data
parameters. It will first check if the buffer hit the watermark defined by
the "full" parameter and then call the passed in condition parameter. If
either is true, it returns true.
If rb_wait_cond() does not return true, it will set the appropriate
"waiters_pending" flag and return false.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 116 +++++++++++++++++++++---------------
2 files changed, 69 insertions(+), 48 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..338a33db1577 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,6 +98,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+typedef bool (*ring_buffer_cond_fn)(void *data);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6ef763f57c66..c198ba466853 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,43 +842,15 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
-/**
- * ring_buffer_wait - wait for input to the ring buffer
- * @buffer: buffer to wait on
- * @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
- *
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
- */
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+static inline bool
+rb_wait_cond(struct rb_irq_work *rbwork, struct trace_buffer *buffer,
+ int cpu, int full, ring_buffer_cond_fn cond, void *data)
{
- struct ring_buffer_per_cpu *cpu_buffer;
- DEFINE_WAIT(wait);
- struct rb_irq_work *work;
- int ret = 0;
-
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ if (rb_watermark_hit(buffer, cpu, full))
+ return true;
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ if (cond(data))
+ return true;
/*
* The events can happen in critical sections where
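Review bot: rb_wait_cond() reads like a pure predicate, but as the following hunk shows it also sets full_waiters_pending or waiters_pending before returning false, and it calls cond(data) with no NULL check. A short comment above the function stating both points (the side effect on the pending flags, and that cond must be non-NULL) would make the contract clear for the callers that the changelog says will pass their own cond later.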
@@ -901,27 +873,75 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return false;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
+/*
+ * The default wait condition for ring_buffer_wait() is to just to exit the
+ * wait loop the first time it is woken up.
+ */
+static bool rb_wait_once(void *data)
+{
+ long *once = data;
+
+ /* wait_event() actually calls this twice before scheduling*/
+ if (*once > 1)
+ return true;
+
+ (*once)++;
+ return false;
+}
+
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct wait_queue_head *waitq;
+ ring_buffer_cond_fn cond;
+ struct rb_irq_work *rbwork;
+ void *data;
+ long once = 0;
+ int ret = 0;
+
+ cond = rb_wait_once;
+ data = &once;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -ENODEV;
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
}
- schedule();
- out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ waitq = &rbwork->full_waiters;
else
- finish_wait(&work->waiters, &wait);
+ waitq = &rbwork->waiters;
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+ ret = wait_event_interruptible((*waitq),
+ rb_wait_cond(rbwork, buffer, cpu, full, cond, data));
return ret;
}
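Review bot: rb_wait_once() depends on how many times the wait_event loop evaluates its condition before the first sleep (`if (*once > 1)`, justified only by the comment "wait_event() actually calls this twice before scheduling"). If that count ever changes, the default wait either returns without sleeping or needs a second wake-up to exit. Please state that dependency explicitly in the comment, or derive "woken once" from something the wake-up path sets rather than from a call counter. The comment above the function also reads "is to just to exit"; one "to" should go.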
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
     CPU 0                              CPU 1
     -----                              -----
                                   wait_index++;
   index = wait_index;
                                   ring_buffer_wake_waiters();
   wait_on_pipe()
     ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
prepare_to_wait();
if (!condition)
schedule();
Where the missing condition check is the iter->wait_index update.
Have the ring_buffer_wait() take a conditional callback function and a
data parameter that can be used within the wait_event_interruptible() of
the ring_buffer_wait() function.
In wait_on_pipe(), pass a condition function that will check if the
wait_index has been updated, if it has, it will return true to break out
of the wait_event_interruptible() loop.
Create a new field "closed" in the trace_iterator and set it in the
.flush() callback before calling ring_buffer_wake_waiters().
This will keep any new readers from waiting on a closed file descriptor.
Have the wait_on_pipe() condition callback also check the closed field.
Change the wait_index field of the trace_iterator to atomic_t. There's no
reason it needs to be 'long' and making it atomic and using
atomic_read_acquire() and atomic_fetch_inc_release() will provide the
necessary memory barriers.
Add a "woken" flag to tracing_buffers_splice_read() to exit the loop after
one more try to fetch data. That is, if it waited for data and something
woke it up, it should try to collect any new data and then exit back to
user space.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 3 ++-
include/linux/trace_events.h | 5 ++++-
kernel/trace/ring_buffer.c | 11 ++++-----
kernel/trace/trace.c | 43 ++++++++++++++++++++++++++----------
4 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 338a33db1577..dc5ae4e96aee 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -99,7 +99,8 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
})
typedef bool (*ring_buffer_cond_fn)(void *data);
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
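Review bot: the new cond/data parameters of ring_buffer_wait() reach every caller through this prototype, but nothing here documents the callback contract. Please add a comment next to the ring_buffer_cond_fn typedef saying that NULL selects the default "wake once" behaviour, that the callback is evaluated from inside wait_event_interruptible() and may run several times per wait, and that it therefore must not sleep.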
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..fc6d0af56bb1 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,13 +103,16 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ atomic_t wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
cpumask_var_t started;
+ /* Set when the file is closed to prevent new waiters */
+ bool closed;
+
/* it's true when current open file is snapshot */
bool snapshot;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c198ba466853..81a5303bdc09 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -906,18 +906,19 @@ static bool rb_wait_once(void *data)
* as data is added to any of the @buffer's cpu buffers. Otherwise
* it will wait for data to be added to a specific cpu buffer.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct wait_queue_head *waitq;
- ring_buffer_cond_fn cond;
struct rb_irq_work *rbwork;
- void *data;
long once = 0;
int ret = 0;
- cond = rb_wait_once;
- data = &once;
+ if (!cond) {
+ cond = rb_wait_once;
+ data = &once;
+ }
/*
* Depending on what the caller is waiting for, either any
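Review bot: the kernel-doc for ring_buffer_wait() is not updated for the two new parameters. This is the only ring_buffer.c hunk (the diffstat shows 11 changed lines, all of them here) and it starts below the @-lines, so @cond and @data are missing from the comment. Please add them, including what a NULL @cond means, since the `if (!cond)` fallback to rb_wait_once is otherwise only discoverable by reading the body.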
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..d390fea3a6a5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,15 +1955,36 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+struct pipe_wait {
+ struct trace_iterator *iter;
+ int wait_index;
+};
+
+static bool wait_pipe_cond(void *data)
+{
+ struct pipe_wait *pwait = data;
+ struct trace_iterator *iter = pwait->iter;
+
+ if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
+ return true;
+
+ return iter->closed;
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct pipe_wait pwait;
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ pwait.wait_index = atomic_read_acquire(&iter->wait_index);
+ pwait.iter = iter;
+
+ ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full,
+ wait_pipe_cond, &pwait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
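Review bot: wait_pipe_cond() reads iter->closed with a plain load on the path where wait_index is unchanged, while tracing_buffers_flush() writes it with a plain store from another task. The release/acquire pair on wait_index orders the two only when the reader observes the new index, and in that case it returns before reading closed at all. Marking both accesses (READ_ONCE()/WRITE_ONCE()) or adding a comment on why the unmarked race is benign would make the intent clear. A short comment on struct pipe_wait saying that wait_index is the value sampled before waiting would help too.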
@@ -8398,9 +8419,9 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
+ iter->closed = true;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
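Review bot: iter->closed is set in .flush() and nothing in this patch clears it. .flush() runs on every close() of a descriptor that refers to this file, so if the file is shared (dup(), fork()) the first close makes wait_pipe_cond() return true for all remaining holders from then on. If that is intended, please say so in the comment on the field; if not, the flag belongs in .release(). The retained comment "Make sure the waiters see the new wait_index" now also covers the closed store above it and should mention it.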
@@ -8500,6 +8521,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ bool woken = false;
int page_size;
int entries, i;
ssize_t ret = 0;
@@ -8573,17 +8595,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
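Review bot: the new `if (woken) goto out;` sits above `ret = -EAGAIN;`, so a reader that was woken and found no pages on the second pass reaches out with ret still 0. Please confirm that returning 0 from splice_read in that case is what user space should see (it reads as end of data rather than "try again"), and mention it in the commit message or in a comment next to the check.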
@@ -8592,10 +8614,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ /* Iterate one more time to collect any new data then exit */
+ woken = true;
goto again;
}
@@ -8618,9 +8638,8 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
mutex_lock(&trace_types_lock);
- iter->wait_index++;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
--
2.43.0
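The interleaving from the commit message above, replayed as a single-threaded user-space model. This is an added example, not code from the patch or the kernel: `struct model`, `replay()` and the helper names are constructed for illustration, and the ring buffer is assumed to stay empty so that only the wake-up decides whether the reader sleeps.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Constructed model of the state named in the commit message. */
struct model {
	atomic_int wait_index;	/* bumped by the closer before it wakes waiters */
	bool closed;		/* set by .flush() in the patched code */
	bool queued;		/* reader is on the wait queue */
	bool woken;		/* a wake-up reached a queued reader */
};

enum order {
	BUMP_SAMPLE_WAKE_WAIT,	/* the interleaving drawn in the commit message */
	SAMPLE_BUMP_WAKE_WAIT	/* reader sampled wait_index before the bump */
};

static int sample_index(struct model *m)
{
	return atomic_load_explicit(&m->wait_index, memory_order_acquire);
}

/* CPU 1: publish the change (the wake-up follows separately). */
static void closer_bump(struct model *m, bool set_closed)
{
	if (set_closed)
		m->closed = true;
	atomic_fetch_add_explicit(&m->wait_index, 1, memory_order_release);
}

/* A wake-up only reaches readers that are already queued. */
static void closer_wake(struct model *m)
{
	if (m->queued)
		m->woken = true;
}

/* Unpatched wait: queue and sleep, no condition between the two. */
static bool old_wait_blocks(struct model *m)
{
	m->queued = true;	/* prepare_to_wait() */
	return !m->woken;	/* schedule() */
}

/* Patched condition: index moved since it was sampled, or file closed. */
static bool pipe_cond(struct model *m, int sampled)
{
	if (sample_index(m) != sampled)
		return true;
	return m->closed;
}

/* Patched wait: check, queue, check again, only then sleep. */
static bool new_wait_blocks(struct model *m, int sampled)
{
	if (pipe_cond(m, sampled))
		return false;
	m->queued = true;
	if (pipe_cond(m, sampled))
		return false;
	return !m->woken;
}

/* Replay one interleaving; true means the reader is left asleep. */
bool replay(enum order o, bool patched, bool set_closed)
{
	struct model m = { .closed = false, .queued = false, .woken = false };
	int sampled;

	atomic_init(&m.wait_index, 0);
	if (o == BUMP_SAMPLE_WAKE_WAIT) {
		closer_bump(&m, patched && set_closed);
		sampled = sample_index(&m);
	} else {
		sampled = sample_index(&m);
		closer_bump(&m, patched && set_closed);
	}
	closer_wake(&m);	/* reader is not queued yet: the wake-up is lost */
	return patched ? new_wait_blocks(&m, sampled) : old_wait_blocks(&m);
}

int main(void)
{
	printf("unpatched, bump before sample:       blocks=%d\n",
	       replay(BUMP_SAMPLE_WAKE_WAIT, false, false));
	printf("patched via flush, bump before sample: blocks=%d\n",
	       replay(BUMP_SAMPLE_WAKE_WAIT, true, true));
	printf("patched, sample before bump:         blocks=%d\n",
	       replay(SAMPLE_BUMP_WAKE_WAIT, true, false));
	printf("patched, no closed, bump before sample: blocks=%d\n",
	       replay(BUMP_SAMPLE_WAKE_WAIT, true, false));
	return 0;
}
```

In the model, the last case still sleeps: when the index was bumped before the reader sampled it, only the `closed` flag keeps the reader from waiting.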
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Convert ring_buffer_wait() over to wait_event_interruptible(). The default
condition is to execute the wait loop inside __wait_event() just once.
This does not change the ring_buffer_wait() prototype yet, but
restructures the code so that it can take a "cond" and "data" parameter
and will call wait_event_interruptible() with a helper function as the
condition.
The helper function (rb_wait_cond) takes the cond function and data
parameters. It will first check if the buffer hit the watermark defined by
the "full" parameter and then call the passed in condition parameter. If
either is true, it returns true.
If rb_wait_cond() does not return true, it will set the appropriate
"waiters_pending" flag and return false.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 116 +++++++++++++++++++++---------------
2 files changed, 69 insertions(+), 48 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..338a33db1577 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,6 +98,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+typedef bool (*ring_buffer_cond_fn)(void *data);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6ef763f57c66..c198ba466853 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,43 +842,15 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
-/**
- * ring_buffer_wait - wait for input to the ring buffer
- * @buffer: buffer to wait on
- * @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
- *
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
- */
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+static inline bool
+rb_wait_cond(struct rb_irq_work *rbwork, struct trace_buffer *buffer,
+ int cpu, int full, ring_buffer_cond_fn cond, void *data)
{
- struct ring_buffer_per_cpu *cpu_buffer;
- DEFINE_WAIT(wait);
- struct rb_irq_work *work;
- int ret = 0;
-
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ if (rb_watermark_hit(buffer, cpu, full))
+ return true;
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ if (cond(data))
+ return true;
/*
* The events can happen in critical sections where
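Review bot: rb_wait_cond() tests rb_watermark_hit() at its top and stores the waiters_pending flag only afterwards (next hunk), which reverses the removed sequence where the flag was set before the watermark test. A writer that crosses the watermark between the test and the store sees the flag clear. The function appears to rely on the wait loop evaluating the condition again after the store; please add a comment to rb_wait_cond() that says so, and that this condition has a side effect, since it is passed to wait_event_interruptible() where conditions are normally pure.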
@@ -901,27 +873,75 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return false;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
+/*
+ * The default wait condition for ring_buffer_wait() is to just to exit the
+ * wait loop the first time it is woken up.
+ */
+static bool rb_wait_once(void *data)
+{
+ long *once = data;
+
+ /* wait_event() actually calls this twice before scheduling*/
+ if (*once > 1)
+ return true;
+
+ (*once)++;
+ return false;
+}
+
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct wait_queue_head *waitq;
+ ring_buffer_cond_fn cond;
+ struct rb_irq_work *rbwork;
+ void *data;
+ long once = 0;
+ int ret = 0;
+
+ cond = rb_wait_once;
+ data = &once;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -ENODEV;
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
}
- schedule();
- out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ waitq = &rbwork->full_waiters;
else
- finish_wait(&work->waiters, &wait);
+ waitq = &rbwork->waiters;
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+ ret = wait_event_interruptible((*waitq),
+ rb_wait_cond(rbwork, buffer, cpu, full, cond, data));
return ret;
}
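Review bot: the signal path changes its return value without the commit message saying so. The removed code set `ret = -EINTR` when signal_pending(current) was true, while wait_event_interruptible() returns -ERESTARTSYS when interrupted. Please check the callers that compare against -EINTR, and document the return values in the ring_buffer_wait() kernel-doc, which currently lists none. `int ret = 0;` no longer needs its initializer now that ret is assigned unconditionally.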
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a reader of the ring buffer is doing a poll, and waiting for the ring
buffer to hit a specific watermark, there could be a case where it gets
into an infinite ping-pong loop.
The poll code has:
rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
The writer will see full_waiters_pending and check if the ring buffer is
filled over the percentage of the shortest_full value. If it is, it calls
an irq_work to wake up all the waiters.
But the code could get into a circular loop:
        CPU 0                                   CPU 1
        -----                                   -----
 [ Poll ]
   [ shortest_full = 0 ]
   rbwork->full_waiters_pending = true;
                                          if (rbwork->full_waiters_pending &&
                                              [ buffer percent ] > shortest_full) {
                                                 rbwork->wakeup_full = true;
                                                 [ queue_irqwork ]
   cpu_buffer->shortest_full = full;
                                          [ IRQ work ]
                                          if (rbwork->wakeup_full) {
                                                cpu_buffer->shortest_full = 0;
                                                wakeup poll waiters;
  [woken]
   if ([ buffer percent ] > full)
      break;
   rbwork->full_waiters_pending = true;
                                          if (rbwork->full_waiters_pending &&
                                              [ buffer percent ] > shortest_full) {
                                                 rbwork->wakeup_full = true;
                                                 [ queue_irqwork ]
   cpu_buffer->shortest_full = full;
                                          [ IRQ work ]
                                          if (rbwork->wakeup_full) {
                                                cpu_buffer->shortest_full = 0;
                                                wakeup poll waiters;
  [woken]
 [ Wash, rinse, repeat! ]
In the poll, the shortest_full needs to be set before the
full_pending_waiters, as once that is set, the writer will compare the
current shortest_full (which is incorrect) to decide to call the irq_work,
which will reset the shortest_full (expecting the readers to update it).
Also move the setting of full_waiters_pending after the check if the ring
buffer has the required percentage filled. There's no reason to tell the
writer to wake up waiters if there are no waiters.
Cc: stable(a)vger.kernel.org
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..adfe603a769b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -964,16 +964,32 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
poll_wait(filp, &rbwork->full_waiters, poll_table);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- } else {
- poll_wait(filp, &rbwork->waiters, poll_table);
- rbwork->waiters_pending = true;
+ if (full_hit(buffer, cpu, full))
+ return EPOLLIN | EPOLLRDNORM;
+ /*
+ * Only allow full_waiters_pending update to be seen after
+ * the shortest_full is set. If the writer sees the
+ * full_waiters_pending flag set, it will compare the
+ * amount in the ring buffer to shortest_full. If the amount
+ * in the ring buffer is greater than the shortest_full
+ * percent, it will call the irq_work handler to wake up
+ * this list. The irq_handler will reset shortest_full
+ * back to zero. That's done under the reader_lock, but
+ * the below smp_mb() makes sure that the update to
+ * full_waiters_pending doesn't leak up into the above.
+ */
+ smp_mb();
+ rbwork->full_waiters_pending = true;
+ return 0;
}
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
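Review bot: in the `full` branch the watermark is tested with full_hit() before full_waiters_pending is stored, and the function then returns 0 without testing again. The non-full path below does the opposite (flag, smp_mb(), then the emptiness check) and its comment explains the race that ordering closes. If a writer crosses the watermark between full_hit() and the store, it sees the flag clear and queues no wake-up. Please either re-check full_hit() after the store or extend the new comment to explain why the window is harmless here. The comment also says the smp_mb() keeps the flag update from leaking "up into the above"; naming shortest_full there would make it unambiguous.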
@@ -989,9 +1005,6 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
- if (full)
- return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
-
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
--
2.43.0
The PD of Type-C port needs to be updated in pd_set. Unlink the Type-C
port device to the old PD before linking it to a new one.
Fixes: cd099cde4ed2 ("usb: typec: tcpm: Support multiple capabilities")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kyle Tso <kyletso(a)google.com>
---
drivers/usb/typec/tcpm/tcpm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 3d505614bff1..896f594b9328 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -6907,7 +6907,9 @@ static int tcpm_pd_set(struct typec_port *p, struct usb_power_delivery *pd)
port->port_source_caps = data->source_cap;
port->port_sink_caps = data->sink_cap;
+ typec_port_set_usb_power_delivery(p, NULL);
port->selected_pd = pd;
+ typec_port_set_usb_power_delivery(p, port->selected_pd);
unlock:
mutex_unlock(&port->lock);
return ret;
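Review bot: both new typec_port_set_usb_power_delivery() calls discard their result although tcpm_pd_set() has a ret to report. If linking the new PD can fail, port->selected_pd is already updated while the port device is left unlinked from any PD. Consider assigning the result of the second call to ret and deciding what state the port should be left in on failure, or add a comment stating why the result can be ignored.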
--
2.44.0.278.ge034bb2e1d-goog
From: Wenjing Liu <wenjing.liu(a)amd.com>
[why]
During minimal transition commit, the base state could be freed if it is current state.
This is because after committing minimal transition state, the current state will be
swapped to the minimal transition state and the old current state will be released.
The release could cause the old current state's memory to be freed. However dc
will dereference this memory when release minimal transition state. Therefore, we
need to retain the old current state until we release minimal transition state.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Josip Pavic <josip.pavic(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index a372c4965adf..ab0c920333be 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -4203,7 +4203,6 @@ static void release_minimal_transition_state(struct dc *dc,
{
restore_minimal_pipe_split_policy(dc, base_context, policy);
dc_state_release(minimal_transition_context);
- /* restore previous pipe split and odm policy */
}
static void force_vsync_flip_in_minimal_transition_context(struct dc_state *context)
@@ -4258,7 +4257,7 @@ static bool is_pipe_topology_transition_seamless_with_intermediate_step(
intermediate_state, final_state);
}
-static void swap_and_free_current_context(struct dc *dc,
+static void swap_and_release_current_context(struct dc *dc,
struct dc_state *new_context, struct dc_stream_state *stream)
{
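Review bot: the rename of swap_and_free_current_context() to swap_and_release_current_context() is not mentioned in the commit message and touches four call sites that are otherwise unchanged. For a fix that is Cc'd to stable, splitting the rename into its own patch would keep the use-after-free fix small and easier to backport.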
@@ -4321,7 +4320,7 @@ static bool commit_minimal_transition_based_on_new_context(struct dc *dc,
commit_planes_for_stream(dc, srf_updates,
surface_count, stream, NULL,
UPDATE_TYPE_FULL, intermediate_context);
- swap_and_free_current_context(
+ swap_and_release_current_context(
dc, intermediate_context, stream);
dc_state_retain(dc->current_state);
success = true;
@@ -4338,6 +4337,7 @@ static bool commit_minimal_transition_based_on_current_context(struct dc *dc,
bool success = false;
struct pipe_split_policy_backup policy;
struct dc_state *intermediate_context;
+ struct dc_state *old_current_state = dc->current_state;
struct dc_surface_update srf_updates[MAX_SURFACE_NUM];
int surface_count;
@@ -4353,8 +4353,10 @@ static bool commit_minimal_transition_based_on_current_context(struct dc *dc,
* with the current state.
*/
restore_planes_and_stream_state(&dc->scratch.current_state, stream);
+ dc_state_retain(old_current_state);
intermediate_context = create_minimal_transition_state(dc,
- dc->current_state, &policy);
+ old_current_state, &policy);
+
if (intermediate_context) {
if (is_pipe_topology_transition_seamless_with_intermediate_step(
dc,
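Review bot: the added dc_state_retain(old_current_state) has no comment, and the reason for it exists only in the commit message. A reader of this function sees a retain on a pointer captured at declaration and a release forty lines later with nothing tying them to release_minimal_transition_state(). Please add a short comment here saying that the reference keeps the base state alive until release_minimal_transition_state() has used it, because the swap below drops the reference held through dc->current_state.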
@@ -4367,14 +4369,15 @@ static bool commit_minimal_transition_based_on_current_context(struct dc *dc,
commit_planes_for_stream(dc, srf_updates,
surface_count, stream, NULL,
UPDATE_TYPE_FULL, intermediate_context);
- swap_and_free_current_context(
+ swap_and_release_current_context(
dc, intermediate_context, stream);
dc_state_retain(dc->current_state);
success = true;
}
release_minimal_transition_state(dc, intermediate_context,
- dc->current_state, &policy);
+ old_current_state, &policy);
}
+ dc_state_release(old_current_state);
/*
* Restore stream and plane states back to the values associated with
* new context.
@@ -4496,12 +4499,14 @@ static bool commit_minimal_transition_state(struct dc *dc,
dc->debug.pipe_split_policy != MPC_SPLIT_AVOID ? "MPC in Use" :
"Unknown");
+ dc_state_retain(transition_base_context);
transition_context = create_minimal_transition_state(dc,
transition_base_context, &policy);
if (transition_context) {
ret = dc_commit_state_no_check(dc, transition_context);
release_minimal_transition_state(dc, transition_context, transition_base_context, &policy);
}
+ dc_state_release(transition_base_context);
if (ret != DC_OK) {
/* this should never happen */
@@ -4839,7 +4844,7 @@ static bool update_planes_and_stream_v2(struct dc *dc,
context);
}
if (dc->current_state != context)
- swap_and_free_current_context(dc, context, stream);
+ swap_and_release_current_context(dc, context, stream);
return true;
}
@@ -4941,7 +4946,7 @@ static bool update_planes_and_stream_v3(struct dc *dc,
commit_planes_and_stream_update_with_new_context(dc,
srf_updates, surface_count, stream,
stream_update, update_type, new_context);
- swap_and_free_current_context(dc, new_context, stream);
+ swap_and_release_current_context(dc, new_context, stream);
}
return true;
--
2.37.3
From: Chris Park <chris.park(a)amd.com>
[Why]
Disabling stream encoder invokes a function that no longer exists
in bring-up.
[How]
Check if the function declaration is NULL in disable stream encoder.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Chris Park <chris.park(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index 9d5df4c0da59..0ba1feaf96c0 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -1185,7 +1185,8 @@ void dce110_disable_stream(struct pipe_ctx *pipe_ctx)
if (dccg) {
dccg->funcs->disable_symclk32_se(dccg, dp_hpo_inst);
dccg->funcs->set_dpstreamclk(dccg, REFCLK, tg->inst, dp_hpo_inst);
- dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
+ if (dccg && dccg->funcs->set_dtbclk_dto)
+ dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
}
} else if (dccg && dccg->funcs->disable_symclk_se) {
dccg->funcs->disable_symclk_se(dccg, stream_enc->stream_enc_inst,
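Review bot: the added `dccg &&` is redundant because the call already sits inside `if (dccg) {`; `if (dccg->funcs->set_dtbclk_dto)` is enough. The two calls directly above it, disable_symclk32_se() and set_dpstreamclk(), are still made without checking that the function pointer is set, while the `else if` branch below does check disable_symclk_se. If a dccg can lack set_dtbclk_dto during bring-up, please confirm the other two hooks are always populated or guard them the same way.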
--
2.37.3
From: Wenjing Liu <wenjing.liu(a)amd.com>
This reverts commit 6cf00f4c4d5c ("drm/amd/display: Remove pixle rate
limit for subvp")
[why]
The original commit causes a regression when subvp is applied
on ODM required 8k60hz timing. The display shows black screen
on boot. The issue can be recovered with hotplug. It also causes
MPO to fail. We will temporarily revert this commit and investigate
the root cause further.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Chaitanya Dhere <chaitanya.dhere(a)amd.com>
Reviewed-by: Martin Leung <martin.leung(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index b49e1dc9d8ba..a0a65e099104 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -623,6 +623,7 @@ static bool dcn32_assign_subvp_pipe(struct dc *dc,
* - Not TMZ surface
*/
if (pipe->plane_state && !pipe->top_pipe && !dcn32_is_center_timing(pipe) &&
+ !(pipe->stream->timing.pix_clk_100hz / 10000 > DCN3_2_MAX_SUBVP_PIXEL_RATE_MHZ) &&
(!dcn32_is_psr_capable(pipe) || (context->stream_count == 1 && dc->caps.dmub_caps.subvp_psr)) &&
dc_state_get_pipe_subvp_type(context, pipe) == SUBVP_NONE &&
(refresh_rate < 120 || dcn32_allow_subvp_high_refresh_rate(dc, context, pipe)) &&
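Review bot: the restored check compares `pix_clk_100hz / 10000` against DCN3_2_MAX_SUBVP_PIXEL_RATE_MHZ, so if pix_clk_100hz is an integer the division truncates and a timing up to just under 1 MHz above the limit still passes. Comparing in 100 Hz units (multiplying the limit by 10000) avoids that, and `!(a > b)` reads more directly as `a <= b`. The condition list in the comment above ends at "Not TMZ surface" in the visible context; please make sure the pixel-rate limit is listed there as well.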
--
2.37.3
From: Leo Ma <hanghong.ma(a)amd.com>
[Why]
When mode switching is triggered there is momentary noise visible on
some HDMI TV or displays.
[How]
Wait for 2 frames to make sure we have enough time to send out AV mute
and sink receives a full frame.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Wenjing Liu <wenjing.liu(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Leo Ma <hanghong.ma(a)amd.com>
---
.../gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
index 7e6b7f2a6dc9..8bc3d01537bb 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
@@ -812,10 +812,20 @@ void dcn30_set_avmute(struct pipe_ctx *pipe_ctx, bool enable)
if (pipe_ctx == NULL)
return;
- if (dc_is_hdmi_signal(pipe_ctx->stream->signal) && pipe_ctx->stream_res.stream_enc != NULL)
+ if (dc_is_hdmi_signal(pipe_ctx->stream->signal) && pipe_ctx->stream_res.stream_enc != NULL) {
pipe_ctx->stream_res.stream_enc->funcs->set_avmute(
pipe_ctx->stream_res.stream_enc,
enable);
+
+ /* Wait for two frame to make sure AV mute is sent out */
+ if (enable) {
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VACTIVE);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VBLANK);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VACTIVE);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VBLANK);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VACTIVE);
+ }
+ }
}
void dcn30_update_info_frame(struct pipe_ctx *pipe_ctx)
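Review bot: the new wait sequence dereferences pipe_ctx->stream_res.tg and tg->funcs->wait_for_state without a check, while the same condition explicitly tests stream_res.stream_enc for NULL. Please guard tg and the wait_for_state hook before calling it. The comment says "two frame" but the code issues five waits (VACTIVE, VBLANK, VACTIVE, VBLANK, VACTIVE); a sentence on why the sequence starts and ends on VACTIVE would help, and a local `tg` variable would shorten the five repeated lines.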
--
2.37.3
When orientation switch is enabled in ucsi glink, there is a xhci
probe failure seen when booting up in host mode in reverse
orientation.
During bootup the following things happen in multiple drivers:
a) DWC3 controller driver initializes the core in device mode when the
dr_mode is set to DRD. It relies on role_switch call to change role to
host.
b) QMP driver initializes the lanes to TYPEC_ORIENTATION_NORMAL as a
normal routine. It relies on the typec_switch_set call to get notified
of orientation changes.
c) UCSI core reads the UCSI_GET_CONNECTOR_STATUS via the glink and
provides initial role switch to dwc3 controller.
When booting up in host mode with orientation TYPEC_ORIENTATION_REVERSE,
then we see the following things happening in order:
a) UCSI gives initial role as host to dwc3 controller ucsi_register_port.
Upon receiving this notification, the dwc3 core needs to program GCTL from
PRTCAP_DEVICE to PRTCAP_HOST and as part of this change, it asserts GCTL
Core soft reset and waits for it to be completed before shifting it to
host. Only after the reset is done will the dwc3_host_init be invoked and
xhci is probed. DWC3 controller expects that the usb phy's are stable
during this process i.e., the phy init is already done.
b) During the 100ms wait for GCTL core soft reset, the actual notification
from PPM is received by ucsi_glink via pmic glink for changing role to
host. The pmic_glink_ucsi_notify routine first sends the orientation
change to QMP and then sends role to dwc3 via ucsi framework. This is
happening exactly at the time GCTL core soft reset is being processed.
c) When QMP driver receives typec switch to TYPEC_ORIENTATION_REVERSE, it
then re-programs the phy at the instant GCTL core soft reset has been
asserted by dwc3 controller due to which the QMP PLL lock fails in
qmp_combo_usb_power_on.
d) After the 100ms of GCTL core soft reset is completed, the dwc3 core
goes for initializing the host mode and invokes xhci probe. But at this
point the QMP is non-responsive and as a result, the xhci plat probe fails
during xhci_reset.
Fix this by passing orientation switch to available ucsi instances if
their gpio configuration is available before ucsi_register is invoked so
that by the time, the pmic_glink_ucsi_notify provides typec_switch to QMP,
the lane is already configured and the call would be a NOP thus not racing
with role switch.
Cc: <stable(a)vger.kernel.org>
Fixes: c6165ed2f425 ("usb: ucsi: glink: use the connector orientation GPIO to provide switch events")
Suggested-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
Signed-off-by: Krishna Kurapati <quic_kriskura(a)quicinc.com>
---
drivers/usb/typec/ucsi/ucsi_glink.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c
index 0bd3f6dee678..466df7b9f953 100644
--- a/drivers/usb/typec/ucsi/ucsi_glink.c
+++ b/drivers/usb/typec/ucsi/ucsi_glink.c
@@ -255,6 +255,20 @@ static void pmic_glink_ucsi_notify(struct work_struct *work)
static void pmic_glink_ucsi_register(struct work_struct *work)
{
struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
+ int orientation;
+ int i;
+
+ for (i = 0; i < PMIC_GLINK_MAX_PORTS; i++) {
+ if (!ucsi->port_orientation[i])
+ continue;
+ orientation = gpiod_get_value(ucsi->port_orientation[i]);
+
+ if (orientation >= 0) {
+ typec_switch_set(ucsi->port_switch[i],
+ orientation ? TYPEC_ORIENTATION_REVERSE
+ : TYPEC_ORIENTATION_NORMAL);
+ }
+ }
ucsi_register(ucsi->ucsi);
}
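Review bot: In the new loop in pmic_glink_ucsi_register(), a negative return from gpiod_get_value() is dropped silently by the "if (orientation >= 0)" test, so a failed GPIO read leaves the switch unset with no trace; a warning in an else branch would make that visible. Because this runs from a work item in process context, gpiod_get_value_cansleep() would also cover orientation GPIOs that sit behind a sleeping controller. A blank line before ucsi_register() and a short comment saying the orientation is pushed before registration to avoid racing with the role switch would carry the commit message's reasoning into the code.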
--
2.34.1
While commit 69f89168b310 ("usb: typec: tpcm: Fix issues with power being
removed during reset") fixes the boot issues for bus powered devices such
as LibreTech Renegade Elite/Firefly, it trades off the CC pins NOT being
Hi-Zed during error recovery (i.e. PORT_RESET) for devices which are NOT
bus powered (a.k.a. self powered). This change Hi-Zs the CC pins only for
self powered devices, thus preventing brown out for bus powered devices.
Adhering to spec is gaining more importance due to the Common charger
initiative enforced by the European Union.
Quoting from the spec:
4.5.2.2.2.1 ErrorRecovery State Requirements
The port shall not drive VBUS or VCONN, and shall present a
high-impedance to ground (above zOPEN) on its CC1 and CC2 pins.
Hi-Zing the CC pins is the intended behavior for PORT_RESET.
CC pins are set to default state after tErrorRecovery in
PORT_RESET_WAIT_OFF.
4.5.2.2.2.2 Exiting From ErrorRecovery State
A Sink shall transition to Unattached.SNK after tErrorRecovery.
A Source shall transition to Unattached.SRC after tErrorRecovery.
Fixes: 69f89168b310 ("usb: typec: tpcm: Fix issues with power being removed during reset")
Cc: stable(a)vger.kernel.org
Cc: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
Changes since V1:
* Fix CC for linux stable
---
drivers/usb/typec/tcpm/tcpm.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index c9a78f55ca48..bbe1381232eb 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -5593,8 +5593,11 @@ static void run_state_machine(struct tcpm_port *port)
break;
case PORT_RESET:
tcpm_reset_port(port);
- tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ?
- TYPEC_CC_RD : tcpm_rp_cc(port));
+ if (port->self_powered)
+ tcpm_set_cc(port, TYPEC_CC_OPEN);
+ else
+ tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ?
+ TYPEC_CC_RD : tcpm_rp_cc(port));
tcpm_set_state(port, PORT_RESET_WAIT_OFF,
PD_T_ERROR_RECOVERY);
break;
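Review bot: The new "if (port->self_powered)" branch in PORT_RESET carries no comment, so the reason the two kinds of port get different CC terminations lives only in the commit message; a short comment citing the ErrorRecovery requirement (section 4.5.2.2.2.1) and the brown-out concern for bus powered ports would keep that next to the code. With a multi-line else body, braces on both branches would also make the pairing harder to break in a later edit.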
base-commit: a560a5672826fc1e057068bda93b3d4c98d037a2
--
2.44.0.rc1.240.g4c46232300-goog
We have found a regression bug, where more than 512 URBs cannot be reliably submitted to XHCI. URBs beyond that return 0x00 instead of valid data in the buffer.
Our software works reliably on kernel versions through 6.4.x and fails on versions 6.5, 6.6, 6.7, and 6.8.0-rc6. This was discovered when Ubuntu recently updated their latest kernel package to version 6.5.
The issue is limited to the XHCI driver and appears to be isolated to this specific commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/d…
Attached is a test program that demonstrates the problem. We used a few different USB-to-Serial adapters with no driver installed as a convenient way to reproduce. We check the TRB debug information before and after to verify the actual number of allocated TRBs.
With some adapters on unaffected kernels, the TRB map gets expanded correctly. This directly corresponds to correct functional behavior. On affected kernels, the TRB ring does not expand, and our functional tests also will fail.
We don't know exactly why this happens. Some adapters do work correctly, so there seems to also be some subtle problem that was being masked by the liberal expansion of the TRB ring in older kernels. We also saw on one system that the TRB expansion did work correctly with one particular adapter. However, on all systems at least two adapters did exhibit the problem and fail.
Would it be possible to resolve this regression for the 6.8 release and backport the fix to versions 6.5, 6.6, and 6.7?
#regzbot ^introduced: f5af638f0609af889f15c700c60b93c06cc76675
Hi all,
This patch series includes backports for the changes that fix CVE-2023-52447.
The changes were applied cleanly from my 5.15 backport, viewable at
https://lore.kernel.org/stable/cover.1710187165.git.rkolchmeyer@google.com/….
The notes in that patch set largely apply here.
The only note I have for 5.10 is that the test cases with sleepable BPF
programs in commit 1624918be84a ("selftests/bpf: Add test cases for inner map")
do not seem to be compatible with the 5.10 kernel. But the situation with the
other test cases matches the observations I shared in the 5.15 patch set.
Thanks,
-Robert
Hou Tao (1):
bpf: Defer the free of inner map when necessary
Paul E. McKenney (1):
rcu-tasks: Provide rcu_trace_implies_rcu_gp()
include/linux/bpf.h | 7 ++++++-
include/linux/rcupdate.h | 12 ++++++++++++
kernel/bpf/map_in_map.c | 11 ++++++++---
kernel/bpf/syscall.c | 26 ++++++++++++++++++++++++--
kernel/rcu/tasks.h | 2 ++
5 files changed, 52 insertions(+), 6 deletions(-)
--
2.44.0.278.ge034bb2e1d-goog
From: Ryan Roberts <ryan.roberts(a)arm.com>
There was previously a theoretical window where swapoff() could run and
tear down a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from a
test case. But there has been agreement based on code review that this is
possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was not free. This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff. So I've added an equivalent check directly in
free_swap_and_cache().
Details of how to provoke one possible issue:
--8<-----
CPU0                               CPU1
----                               ----
shmem_undo_range
  shmem_free_swap
    xa_cmpxchg_irq
    free_swap_and_cache
      __swap_entry_free
      /* swap_count() become 0 */
                                   swapoff
                                     try_to_unuse
                                       shmem_unuse /* cannot find swap entry */
                                       find_next_to_unuse
                                       filemap_get_folio
                                       folio_free_swap
                                       /* remove swap cache */
                                       /* free si->swap_map[] */
      swap_page_trans_huge_swapped <-- access freed si->swap_map !!!
--8<-----
Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Closes: https://lore.kernel.org/linux-mm/8734t27awd.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com> [patch description]
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Hi, Andrew,
If it's not too late, please replace v2 of this patch in mm-stable
with this version.
Changes since v2:
- Remove comments for get_swap_device() because it's not correct.
- Revised patch description about the race condition description.
Changes since v1:
- Added comments for get_swap_device() as suggested by David
- Moved check that swap entry is not free from get_swap_device() to
free_swap_and_cache() since there are some paths that legitimately call with
a free offset.
Best Regards,
Huang, Ying
mm/swapfile.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b3a2d85e350..9e0691276f5e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1609,13 +1609,19 @@ int free_swap_and_cache(swp_entry_t entry)
if (non_swap_entry(entry))
return 1;
- p = _swap_info_get(entry);
+ p = get_swap_device(entry);
if (p) {
+ if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
+ put_swap_device(p);
+ return 0;
+ }
+
count = __swap_entry_free(p, entry);
if (count == SWAP_HAS_CACHE &&
!swap_page_trans_huge_swapped(p, entry))
__try_to_reclaim_swap(p, swp_offset(entry),
TTRS_UNMAPPED | TTRS_FULL);
+ put_swap_device(p);
}
return p != NULL;
}
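Review bot: The WARN_ON(data_race(!p->swap_map[swp_offset(entry)])) check added after get_swap_device() has no comment, and its early "return 0" makes a free entry look the same to the caller as a failed device lookup at the final "return p != NULL". A comment saying that this stands in for the not-free check _swap_info_get() used to perform, and that 0 is the intended result, would help. It is also worth confirming that no caller can legitimately arrive here with a free entry, since the WARN_ON would then fire on a valid path.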
--
2.39.2
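The interleaving in the message above, as an added single-threaded model that is not part of the patch or of the kernel: free_swap_and_cache() is split at the point where the diagram schedules swapoff, once without and once with a device reference. Every model_ name is hypothetical, a plain counter stands in for the device reference, and a NULL check stands in for the freed-memory access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for swap_info_struct; not the kernel's layout. */
struct model_swap_dev {
	int users;               /* references held by in-flight users */
	bool off;                /* swapoff has begun, no new references */
	unsigned char *swap_map; /* per-entry use counts, freed by swapoff */
};

/* Models get_swap_device(): fails once swapoff has started. */
static struct model_swap_dev *model_get_device(struct model_swap_dev *d)
{
	if (d->off)
		return NULL;
	d->users++;
	return d;
}

/* Models put_swap_device(). */
static void model_put_device(struct model_swap_dev *d)
{
	d->users--;
}

/* One swapoff attempt: true if the map was freed, false if it had to wait. */
static bool model_swapoff(struct model_swap_dev *d)
{
	d->off = true;
	if (d->users > 0)
		return false;
	free(d->swap_map);
	d->swap_map = NULL;
	return true;
}

/*
 * Returns 1 if the late read found a live map, 0 if the entry was skipped,
 * -1 if the late read would have touched a freed map.
 */
static int model_free_swap_and_cache(struct model_swap_dev *d, size_t offset,
				     bool take_ref, bool *swapoff_ran_in_gap)
{
	struct model_swap_dev *p = take_ref ? model_get_device(d) : d;
	int ret = 1;

	if (!p)
		return 0;
	if (take_ref && !p->swap_map[offset]) { /* the check the patch adds */
		model_put_device(p);
		return 0;
	}
	p->swap_map[offset]--;                  /* __swap_entry_free: count hits 0 */
	*swapoff_ran_in_gap = model_swapoff(d); /* CPU1 runs swapoff in this gap */
	if (!p->swap_map)
		ret = -1; /* swap_page_trans_huge_swapped would read freed memory */
	if (take_ref)
		model_put_device(p);
	return ret;
}

/* Constructed scenario: one entry at offset 1 with a single user. */
static int model_run(bool take_ref, bool *swapoff_ran_in_gap)
{
	struct model_swap_dev dev = { 0, false, calloc(4, 1) };
	int ret;

	if (!dev.swap_map)
		return -2;
	dev.swap_map[1] = 1;
	*swapoff_ran_in_gap = false;
	ret = model_free_swap_and_cache(&dev, 1, take_ref, swapoff_ran_in_gap);
	(void)model_swapoff(&dev); /* no reference left: teardown completes */
	return ret;
}

int main(void)
{
	bool gap;
	int ret;

	ret = model_run(false, &gap);
	printf("no reference:   result=%d swapoff_in_gap=%d\n", ret, gap);
	ret = model_run(true, &gap);
	printf("with reference: result=%d swapoff_in_gap=%d\n", ret, gap);
	return 0;
}
```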
From: Vasant Karasulli <vkarasulli(a)suse.de>
Hi,
here are changes to enable kexec/kdump in SEV-ES guests. The biggest
problem for supporting kexec/kdump under SEV-ES is to find a way to
hand the non-boot CPUs (APs) from one kernel to another.
Without SEV-ES the first kernel parks the CPUs in a HLT loop until
they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
For virtual machines the CPU reset is emulated by the hypervisor,
which sets the vCPU registers back to reset state.
This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and can't make modifications to them. So an
SEV-ES guest needs to reset the vCPU itself and park it using the
AP-reset-hold protocol. Upon wakeup the guest needs to jump to
real-mode and to the reset-vector configured in the AP-Jump-Table.
The code to do this is the main part of this patch-set. It works by
placing code on the AP Jump-Table page itself to park the vCPU and for
jumping to the reset vector upon wakeup. The code on the AP Jump Table
runs in 16-bit protected mode with segment base set to the beginning
of the page. The AP Jump-Table is usually not within the first 1MB of
memory, so the code can't run in real-mode.
The AP Jump-Table is the best place to put the parking code, because
the memory is owned, but read-only by the firmware and writeable by
the OS. Only the first 4 bytes are used for the reset-vector, leaving
the rest of the page for code/data/stack to park a vCPU. The code
can't be in kernel memory because by the time the vCPU wakes up the
memory will be owned by the new kernel, which might have overwritten it
already.
The other patches add initial GHCB Version 2 protocol support, because
kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXIT,
which is a GHCB protocol version 2 feature.
The kexec'ed kernel is also entered via the decompressor and needs
MMIO support there, so this patch-set also adds MMIO #VC support to
the decompressor and support for handling CLFLUSH instructions.
Finally there is also code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol
version 2 support or AP Jump Table over 4GB).
The diffstat looks big, but most of it is moving code for MMIO #VC
support around to make it available to the decompressor.
The previous version of this patch-set can be found here:
https://lore.kernel.org/lkml/20220127101044.13803-1-joro@8bytes.org/
Please review.
Thanks,
Vasant
Changes v3->v4:
- Rebased to v6.8 kernel
- Applied review comments by Sean Christopherson
- Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
into a single function which makes caching jump table address
unnecessary
- annotated struct sev_ap_jump_table_header with __packed attribute
- added code to set up real mode data segment at boot time instead of
hardcoding the value.
Changes v2->v3:
- Rebased to v5.17-rc1
- Applied most review comments by Boris
- Use the name 'AP jump table' consistently
- Make kexec-disabling for unsupported guests x86-specific
- Cleanup and consolidate patches to detect GHCB v2 protocol
support
Joerg Roedel (9):
x86/kexec/64: Disable kexec when SEV-ES is active
x86/sev: Save and print negotiated GHCB protocol version
x86/sev: Set GHCB data structure version
x86/sev: Setup code to park APs in the AP Jump Table
x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
x86/sev: Use AP Jump Table blob to stop CPU
x86/sev: Add MMIO handling support to boot/compressed/ code
x86/sev: Handle CLFLUSH MMIO events
x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob
arch/x86/boot/compressed/sev.c | 45 +-
arch/x86/include/asm/insn-eval.h | 1 +
arch/x86/include/asm/realmode.h | 5 +
arch/x86/include/asm/sev-ap-jumptable.h | 30 +
arch/x86/include/asm/sev.h | 7 +
arch/x86/kernel/machine_kexec_64.c | 12 +
arch/x86/kernel/process.c | 8 +
arch/x86/kernel/sev-shared.c | 234 +++++-
arch/x86/kernel/sev.c | 372 +++++-----
arch/x86/lib/insn-eval-shared.c | 912 ++++++++++++++++++++++++
arch/x86/lib/insn-eval.c | 911 +----------------------
arch/x86/realmode/Makefile | 9 +-
arch/x86/realmode/rm/Makefile | 11 +-
arch/x86/realmode/rm/header.S | 3 +
arch/x86/realmode/rm/sev.S | 85 +++
arch/x86/realmode/rmpiggy.S | 6 +
arch/x86/realmode/sev/Makefile | 33 +
arch/x86/realmode/sev/ap_jump_table.S | 131 ++++
arch/x86/realmode/sev/ap_jump_table.lds | 24 +
19 files changed, 1695 insertions(+), 1144 deletions(-)
create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
create mode 100644 arch/x86/lib/insn-eval-shared.c
create mode 100644 arch/x86/realmode/rm/sev.S
create mode 100644 arch/x86/realmode/sev/Makefile
create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
2.34.1
This is the backport of a recently upstreamed series that moves VERW
execution to a later point in the exit-to-user path. This is needed because
in some cases it may be possible for data accessed after VERW execution
to end up in MDS affected CPU buffers. Moving VERW closer to the ring
transition reduces the attack surface.
- The series includes a dependency commit f87bc8dc7a7c ("x86/asm: Add
_ASM_RIP() macro for x86-64 (%rip) suffix").
- Patch 2 includes a change that adds runtime patching for jmp (instead
of verw in original series) due to lack of rip-relative relocation
support in kernels <v6.5.
- Fixed warning:
arch/x86/entry/entry.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction.
- Resolved merge conflicts in:
swapgs_restore_regs_and_return_to_usermode in entry_64.S.
__vmx_vcpu_run in vmenter.S.
vmx_update_fb_clear_dis in vmx.c.
- Boot tested with KASLR and KPTI enabled.
- Verified VERW being executed with mitigation ON, and not being
executed with mitigation turned OFF.
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
H. Peter Anvin (Intel) (1):
x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
Pawan Gupta (5):
x86/bugs: Add asm helpers for executing VERW
x86/entry_64: Add VERW just before userspace transition
x86/entry_32: Add VERW just before userspace transition
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Sean Christopherson (1):
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Documentation/x86/mds.rst | 38 +++++++++++++++++++++++++-----------
arch/x86/entry/entry.S | 23 ++++++++++++++++++++++
arch/x86/entry/entry_32.S | 3 +++
arch/x86/entry/entry_64.S | 11 +++++++++++
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/include/asm/asm.h | 5 +++++
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/nospec-branch.h | 27 +++++++++++++------------
arch/x86/kernel/cpu/bugs.c | 15 ++++++--------
arch/x86/kernel/nmi.c | 3 ---
arch/x86/kvm/vmx/run_flags.h | 7 +++++--
arch/x86/kvm/vmx/vmenter.S | 9 ++++++---
arch/x86/kvm/vmx/vmx.c | 12 ++++++++----
14 files changed, 111 insertions(+), 46 deletions(-)
---
base-commit: 80efc6265290d34b75921bf7294e0d9c5a8749dc
change-id: 20240304-delay-verw-backport-5-15-y-e16f07fbb71e
Best regards,
--
Thanks,
Pawan
From: Filipe Manana <fdmanana(a)suse.com>
[ Upstream commit e06cc89475eddc1f3a7a4d471524256152c68166 ]
At space_info.c we have several places where we access the ->reserved
field of a block reserve without taking the block reserve's spinlock
first, which makes KCSAN warn about a data race since that field is
always updated while holding the spinlock.
The reports from KCSAN are like the following:
[117.193526] BUG: KCSAN: data-race in btrfs_block_rsv_release [btrfs] / need_preemptive_reclaim [btrfs]
[117.195148] read to 0x000000017f587190 of 8 bytes by task 6303 on cpu 3:
[117.195172] need_preemptive_reclaim+0x222/0x2f0 [btrfs]
[117.195992] __reserve_bytes+0xbb0/0xdc8 [btrfs]
[117.196807] btrfs_reserve_metadata_bytes+0x4c/0x120 [btrfs]
[117.197620] btrfs_block_rsv_add+0x78/0xa8 [btrfs]
[117.198434] btrfs_delayed_update_inode+0x154/0x368 [btrfs]
[117.199300] btrfs_update_inode+0x108/0x1c8 [btrfs]
[117.200122] btrfs_dirty_inode+0xb4/0x140 [btrfs]
[117.200937] btrfs_update_time+0x8c/0xb0 [btrfs]
[117.201754] touch_atime+0x16c/0x1e0
[117.201789] filemap_read+0x674/0x728
[117.201823] btrfs_file_read_iter+0xf8/0x410 [btrfs]
[117.202653] vfs_read+0x2b6/0x498
[117.203454] ksys_read+0xa2/0x150
[117.203473] __s390x_sys_read+0x68/0x88
[117.203495] do_syscall+0x1c6/0x210
[117.203517] __do_syscall+0xc8/0xf0
[117.203539] system_call+0x70/0x98
[117.203579] write to 0x000000017f587190 of 8 bytes by task 11 on cpu 0:
[117.203604] btrfs_block_rsv_release+0x2e8/0x578 [btrfs]
[117.204432] btrfs_delayed_inode_release_metadata+0x7c/0x1d0 [btrfs]
[117.205259] __btrfs_update_delayed_inode+0x37c/0x5e0 [btrfs]
[117.206093] btrfs_async_run_delayed_root+0x356/0x498 [btrfs]
[117.206917] btrfs_work_helper+0x160/0x7a0 [btrfs]
[117.207738] process_one_work+0x3b6/0x838
[117.207768] worker_thread+0x75e/0xb10
[117.207797] kthread+0x21a/0x230
[117.207830] __ret_from_fork+0x6c/0xb8
[117.207861] ret_from_fork+0xa/0x30
So add a helper to get the reserved amount of a block reserve while
holding the lock. The value may not be up to date anymore when used by
need_preemptive_reclaim() and btrfs_preempt_reclaim_metadata_space(), but
that's ok since the worst it can do is cause more reclaim work to be done
sooner rather than later. Reading the field while holding the lock instead
of using the data_race() annotation is used in order to prevent load
tearing.
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/block-rsv.h | 16 ++++++++++++++++
fs/btrfs/space-info.c | 26 +++++++++++++-------------
2 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index 578c3497a455c..cda79d3e0c263 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -101,4 +101,20 @@ static inline bool btrfs_block_rsv_full(const struct btrfs_block_rsv *rsv)
return data_race(rsv->full);
}
+/*
+ * Get the reserved mount of a block reserve in a context where getting a stale
+ * value is acceptable, instead of accessing it directly and trigger data race
+ * warning from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_reserved(struct btrfs_block_rsv *rsv)
+{
+ u64 ret;
+
+ spin_lock(&rsv->lock);
+ ret = rsv->reserved;
+ spin_unlock(&rsv->lock);
+
+ return ret;
+}
+
#endif /* BTRFS_BLOCK_RSV_H */
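Review bot: The comment above btrfs_block_rsv_reserved() says "reserved mount" where "reserved amount" is meant, and "and trigger" should read "and triggering". Since the helper takes rsv->lock itself, the comment could also state that it must not be called with that lock already held.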
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 2635fb4bffa06..8b75f436a9a3c 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -847,7 +847,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info)
{
- u64 global_rsv_size = fs_info->global_block_rsv.reserved;
+ const u64 global_rsv_size = btrfs_block_rsv_reserved(&fs_info->global_block_rsv);
u64 ordered, delalloc;
u64 total = writable_total_bytes(fs_info, space_info);
u64 thresh;
@@ -948,8 +948,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
ordered = percpu_counter_read_positive(&fs_info->ordered_bytes) >> 1;
delalloc = percpu_counter_read_positive(&fs_info->delalloc_bytes);
if (ordered >= delalloc)
- used += fs_info->delayed_refs_rsv.reserved +
- fs_info->delayed_block_rsv.reserved;
+ used += btrfs_block_rsv_reserved(&fs_info->delayed_refs_rsv) +
+ btrfs_block_rsv_reserved(&fs_info->delayed_block_rsv);
else
used += space_info->bytes_may_use - global_rsv_size;
@@ -1164,7 +1164,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
enum btrfs_flush_state flush;
u64 delalloc_size = 0;
u64 to_reclaim, block_rsv_size;
- u64 global_rsv_size = global_rsv->reserved;
+ const u64 global_rsv_size = btrfs_block_rsv_reserved(global_rsv);
loops++;
@@ -1176,9 +1176,9 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
* assume it's tied up in delalloc reservations.
*/
block_rsv_size = global_rsv_size +
- delayed_block_rsv->reserved +
- delayed_refs_rsv->reserved +
- trans_rsv->reserved;
+ btrfs_block_rsv_reserved(delayed_block_rsv) +
+ btrfs_block_rsv_reserved(delayed_refs_rsv) +
+ btrfs_block_rsv_reserved(trans_rsv);
if (block_rsv_size < space_info->bytes_may_use)
delalloc_size = space_info->bytes_may_use - block_rsv_size;
@@ -1198,16 +1198,16 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
to_reclaim = delalloc_size;
flush = FLUSH_DELALLOC;
} else if (space_info->bytes_pinned >
- (delayed_block_rsv->reserved +
- delayed_refs_rsv->reserved)) {
+ (btrfs_block_rsv_reserved(delayed_block_rsv) +
+ btrfs_block_rsv_reserved(delayed_refs_rsv))) {
to_reclaim = space_info->bytes_pinned;
flush = COMMIT_TRANS;
- } else if (delayed_block_rsv->reserved >
- delayed_refs_rsv->reserved) {
- to_reclaim = delayed_block_rsv->reserved;
+ } else if (btrfs_block_rsv_reserved(delayed_block_rsv) >
+ btrfs_block_rsv_reserved(delayed_refs_rsv)) {
+ to_reclaim = btrfs_block_rsv_reserved(delayed_block_rsv);
flush = FLUSH_DELAYED_ITEMS_NR;
} else {
- to_reclaim = delayed_refs_rsv->reserved;
+ to_reclaim = btrfs_block_rsv_reserved(delayed_refs_rsv);
flush = FLUSH_DELAYED_REFS_NR;
}
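Review bot: In the else-if chain of btrfs_preempt_reclaim_metadata_space(), btrfs_block_rsv_reserved(delayed_block_rsv) and btrfs_block_rsv_reserved(delayed_refs_rsv) are each called up to three times, so every call takes and drops the spinlock and the value compared can differ from the value later assigned to to_reclaim. Reading each reserve once into a local, as the patch already does for global_rsv_size, would give one consistent snapshot for the whole chain and fewer lock round trips.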
--
2.43.0
From: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
[ Upstream commit ee0017c3ed8a8abfa4d40e42f908fb38c31e7515 ]
If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.
Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.
Signed-off-by: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/mpt3sas/mpt3sas_base.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 447ac667f4b2b..7588c2c11a879 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -5584,7 +5584,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
return -EFAULT;
}
- issue_diag_reset:
+ return 0;
+
+issue_diag_reset:
rc = _base_diag_reset(ioc);
return rc;
}
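Review bot: The new "return 0" sits on the fall-through path just above the issue_diag_reset label, so any branch earlier in _base_wait_for_iocstate() that used to fall through into the reset on purpose now reports success instead; it is worth confirming that every not-ready outcome above either returns an error or jumps to the label explicitly. A one-line comment above "return 0" stating that the IOC reached the ready state would make the intent clear, and the re-indent of the label could be mentioned in the commit message since it is otherwise an unrelated whitespace change.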
--
2.43.0
From: Filipe Manana <fdmanana(a)suse.com>
[ Upstream commit c7bb26b847e5b97814f522686068c5628e2b3646 ]
At btrfs_use_block_rsv() we read the size of a block reserve without
locking its spinlock, which makes KCSAN complain because the size of a
block reserve is always updated while holding its spinlock. The report
from KCSAN is the following:
[653.313148] BUG: KCSAN: data-race in btrfs_update_delayed_refs_rsv [btrfs] / btrfs_use_block_rsv [btrfs]
[653.314755] read to 0x000000017f5871b8 of 8 bytes by task 7519 on cpu 0:
[653.314779] btrfs_use_block_rsv+0xe4/0x2f8 [btrfs]
[653.315606] btrfs_alloc_tree_block+0xdc/0x998 [btrfs]
[653.316421] btrfs_force_cow_block+0x220/0xe38 [btrfs]
[653.317242] btrfs_cow_block+0x1ac/0x568 [btrfs]
[653.318060] btrfs_search_slot+0xda2/0x19b8 [btrfs]
[653.318879] btrfs_del_csums+0x1dc/0x798 [btrfs]
[653.319702] __btrfs_free_extent.isra.0+0xc24/0x2028 [btrfs]
[653.320538] __btrfs_run_delayed_refs+0xd3c/0x2390 [btrfs]
[653.321340] btrfs_run_delayed_refs+0xae/0x290 [btrfs]
[653.322140] flush_space+0x5e4/0x718 [btrfs]
[653.322958] btrfs_preempt_reclaim_metadata_space+0x102/0x2f8 [btrfs]
[653.323781] process_one_work+0x3b6/0x838
[653.323800] worker_thread+0x75e/0xb10
[653.323817] kthread+0x21a/0x230
[653.323836] __ret_from_fork+0x6c/0xb8
[653.323855] ret_from_fork+0xa/0x30
[653.323887] write to 0x000000017f5871b8 of 8 bytes by task 576 on cpu 3:
[653.323906] btrfs_update_delayed_refs_rsv+0x1a4/0x250 [btrfs]
[653.324699] btrfs_add_delayed_data_ref+0x468/0x6d8 [btrfs]
[653.325494] btrfs_free_extent+0x76/0x120 [btrfs]
[653.326280] __btrfs_mod_ref+0x6a8/0x6b8 [btrfs]
[653.327064] btrfs_dec_ref+0x50/0x70 [btrfs]
[653.327849] walk_up_proc+0x236/0xa50 [btrfs]
[653.328633] walk_up_tree+0x21c/0x448 [btrfs]
[653.329418] btrfs_drop_snapshot+0x802/0x1328 [btrfs]
[653.330205] btrfs_clean_one_deleted_snapshot+0x184/0x238 [btrfs]
[653.330995] cleaner_kthread+0x2b0/0x2f0 [btrfs]
[653.331781] kthread+0x21a/0x230
[653.331800] __ret_from_fork+0x6c/0xb8
[653.331818] ret_from_fork+0xa/0x30
So add a helper to get the size of a block reserve while holding the lock.
Reading the field while holding the lock instead of using the data_race()
annotation is used in order to prevent load tearing.
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/block-rsv.c | 2 +-
fs/btrfs/block-rsv.h | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index 36ef3228bac86..63205d2f4d84c 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -392,7 +392,7 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
block_rsv = get_block_rsv(trans, root);
- if (unlikely(block_rsv->size == 0))
+ if (unlikely(btrfs_block_rsv_size(block_rsv) == 0))
goto try_reserve;
again:
ret = btrfs_block_rsv_use_bytes(block_rsv, blocksize);
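Review bot: The "unlikely(btrfs_block_rsv_size(block_rsv) == 0)" test now takes and releases the reserve's spinlock on every call to btrfs_use_block_rsv(), and the size can change again before btrfs_block_rsv_use_bytes() runs after the again label. A short comment at this call site saying that a stale size is acceptable here, and why, would save the next reader from checking the helper's header comment.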
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index d1428bb73fc5a..69770360917cb 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -98,4 +98,20 @@ static inline void btrfs_unuse_block_rsv(struct btrfs_fs_info *fs_info,
btrfs_block_rsv_release(fs_info, block_rsv, 0);
}
+/*
+ * Get the size of a block reserve in a context where getting a stale value is
+ * acceptable, instead of accessing it directly and trigger data race warning
+ * from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_size(struct btrfs_block_rsv *rsv)
+{
+ u64 ret;
+
+ spin_lock(&rsv->lock);
+ ret = rsv->size;
+ spin_unlock(&rsv->lock);
+
+ return ret;
+}
+
#endif /* BTRFS_BLOCK_RSV_H */
--
2.43.0
From: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
[ Upstream commit ee0017c3ed8a8abfa4d40e42f908fb38c31e7515 ]
If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.
Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.
Signed-off-by: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/mpt3sas/mpt3sas_base.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 814ac25238058..105d781d0cacf 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -6357,7 +6357,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
return -EFAULT;
}
- issue_diag_reset:
+ return 0;
+
+issue_diag_reset:
rc = _base_diag_reset(ioc);
return rc;
}
--
2.43.0
Currently for devices requiring masking at the irqchip for INTx, ie.
devices without DisINTx support, the IRQ is enabled in request_irq()
and subsequently disabled as necessary to align with the masked status
flag. This presents a window where the interrupt could fire between
these events, resulting in the IRQ incrementing the disable depth twice.
This would be unrecoverable for a user since the masked flag prevents
nested enables through vfio.
Instead, invert the logic using IRQF_NO_AUTOEN such that exclusive INTx
is never auto-enabled, then unmask as required.
Cc: stable(a)vger.kernel.org
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/pci/vfio_pci_intrs.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 237beac83809..136101179fcb 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -296,8 +296,15 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
ctx->trigger = trigger;
+ /*
+ * Devices without DisINTx support require an exclusive interrupt,
+ * IRQ masking is performed at the IRQ chip. The masked status is
+ * protected by vdev->irqlock. Setup the IRQ without auto-enable and
+ * unmask as necessary below under lock. DisINTx is unmodified by
+ * the IRQ configuration and may therefore use auto-enable.
+ */
if (!vdev->pci_2_3)
- irqflags = 0;
+ irqflags = IRQF_NO_AUTOEN;
ret = request_irq(pdev->irq, vfio_intx_handler,
irqflags, ctx->name, vdev);
@@ -308,13 +315,9 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
return ret;
}
- /*
- * INTx disable will stick across the new irq setup,
- * disable_irq won't.
- */
spin_lock_irqsave(&vdev->irqlock, flags);
- if (!vdev->pci_2_3 && ctx->masked)
- disable_irq_nosync(pdev->irq);
+ if (!vdev->pci_2_3 && !ctx->masked)
+ enable_irq(pdev->irq);
spin_unlock_irqrestore(&vdev->irqlock, flags);
return 0;
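Review bot: enable_irq(pdev->irq) is now called with vdev->irqlock held and interrupts disabled by spin_lock_irqsave(); enable_irq() is only safe in atomic context when the irqchip has no bus lock callbacks, so the comment or commit message should state that this holds for the INTx lines handled here. The removed comment about INTx disable sticking across IRQ setup is covered by the new comment before request_irq(), so nothing else needs to be carried over.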
--
2.44.0
From: Vitor Soares <vitor.soares(a)toradex.com>
When the mcp251xfd_start_xmit() function fails, the driver stops
processing messages, and the interrupt routine does not return,
running indefinitely even after killing the running application.
Error messages:
[ 441.298819] mcp251xfd spi2.0 can0: ERROR in mcp251xfd_start_xmit: -16
[ 441.306498] mcp251xfd spi2.0 can0: Transmit Event FIFO buffer not empty. (seq=0x000017c7, tef_tail=0x000017cf, tef_head=0x000017d0, tx_head=0x000017d3).
... and repeat forever.
The issue can be triggered when multiple devices share the same
SPI interface and there is concurrent access to the bus.
The problem occurs because tx_ring->head increments even if
mcp251xfd_start_xmit() fails. Consequently, the driver skips one
TX package while still expecting a response in
mcp251xfd_handle_tefif_one().
This patch resolves the issue by decreasing tx_ring->head if
mcp251xfd_start_xmit() fails. With the fix, if we trigger the issue and
the err = -EBUSY, the driver returns NETDEV_TX_BUSY. The network stack
retries to transmit the message.
Otherwise, it prints an error and discards the message.
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vitor Soares <vitor.soares(a)toradex.com>
---
V2->V3:
- Add tx_dropped stats.
- netdev_sent_queue() only if can_put_echo_skb() succeeds.
V1->V2:
- Return NETDEV_TX_BUSY if mcp251xfd_tx_obj_write() == -EBUSY.
- Rework the commit message to address the change above.
- Change can_put_echo_skb() to be called after mcp251xfd_tx_obj_write() succeeds.
Otherwise, we get Kernel NULL pointer dereference error.
drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c | 34 ++++++++++++--------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
index 160528d3cc26..146c44e47c60 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
@@ -166,6 +166,7 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
struct net_device *ndev)
{
struct mcp251xfd_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
struct mcp251xfd_tx_ring *tx_ring = priv->tx;
struct mcp251xfd_tx_obj *tx_obj;
unsigned int frame_len;
@@ -181,25 +182,32 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
tx_obj = mcp251xfd_get_tx_obj_next(tx_ring);
mcp251xfd_tx_obj_from_skb(priv, tx_obj, skb, tx_ring->head);
- /* Stop queue if we occupy the complete TX FIFO */
tx_head = mcp251xfd_get_tx_head(tx_ring);
- tx_ring->head++;
- if (mcp251xfd_get_tx_free(tx_ring) == 0)
- netif_stop_queue(ndev);
-
frame_len = can_skb_get_frame_len(skb);
- err = can_put_echo_skb(skb, ndev, tx_head, frame_len);
- if (!err)
- netdev_sent_queue(priv->ndev, frame_len);
+
+ tx_ring->head++;
err = mcp251xfd_tx_obj_write(priv, tx_obj);
- if (err)
- goto out_err;
+ if (err) {
+ tx_ring->head--;
- return NETDEV_TX_OK;
+ if (err == -EBUSY)
+ return NETDEV_TX_BUSY;
- out_err:
- netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ stats->tx_dropped++;
+
+ if (net_ratelimit())
+ netdev_err(priv->ndev,
+ "ERROR in %s: %d\n", __func__, err);
+ } else {
+ err = can_put_echo_skb(skb, ndev, tx_head, frame_len);
+ if (!err)
+ netdev_sent_queue(priv->ndev, frame_len);
+
+ /* Stop queue if we occupy the complete TX FIFO */
+ if (mcp251xfd_get_tx_free(tx_ring) == 0)
+ netif_stop_queue(ndev);
+ }
return NETDEV_TX_OK;
}
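Review bot: In the non-EBUSY error branch the frame is counted in stats->tx_dropped and the function then falls through to return NETDEV_TX_OK, but nothing in the lines shown frees the skb. NETDEV_TX_OK tells the stack the driver has consumed the buffer, and can_put_echo_skb() has not been called on this path, so the branch should release it (for example with kfree_skb(skb)) unless ownership is handled somewhere outside this hunk. The -EBUSY branch returns NETDEV_TX_BUSY with the queue still running; it is worth stating in a comment why no netif_stop_queue() is needed there, since the stack will otherwise retry immediately while the SPI bus is still contended.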
--
2.34.1
When running the simult_flow selftest in slow environments -- e.g. QEmu
without KVM support --, the results can be unstable. This selftest
checks if the aggregated bandwidth is (almost) fully used as expected.
To help improve the stability while still keeping the same validation
in place, the BW and the delay are reduced to lower the pressure on the
CPU.
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space")
Cc: stable(a)vger.kernel.org
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
(cherry picked from commit 5e2f3c65af47e527ccac54060cf909e3306652ff)
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- Conflicts in simult_flows.sh, because v5.15 doesn't have commit
675d99338e7a ("selftests: mptcp: simult flows: format subtests
results in TAP") which modifies the context for a new but unrelated
feature.
- This is a new version to the one recently proposed by Sasha, this
time without dependencies:
https://lore.kernel.org/stable/9f185a3f-9373-401c-9a5c-ec0f106c0cbc@kernel.…
- This is the same patch as the one recently sent for v6.1:
https://lore.kernel.org/stable/20240311111224.1421344-2-matttbe@kernel.org/
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 752cef168804..cab3e3a5481d 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -289,10 +289,11 @@ done
setup
run_test 10 10 0 0 "balanced bwidth"
-run_test 10 10 1 50 "balanced bwidth with unbalanced delay"
+run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
# we still need some additional infrastructure to pass the following test-cases
-run_test 30 10 0 0 "unbalanced bwidth"
-run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay"
-run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 0 0 "unbalanced bwidth"
+run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
+
exit $ret
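Review bot: The comment "we still need some additional infrastructure to pass the following test-cases" still sits above the three unbalanced run_test lines, and the new values (10 3, delay 25) carry no explanation in the script itself. A one-line comment saying the bandwidth and delay are kept low for slow environments would stop someone from raising them again later. The added blank line before exit $ret is unrelated whitespace and can be dropped.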
--
2.43.0
When running the simult_flow selftest in slow environments -- e.g. QEmu
without KVM support --, the results can be unstable. This selftest
checks if the aggregated bandwidth is (almost) fully used as expected.
To help improve the stability while still keeping the same validation
in place, the BW and the delay are reduced to lower the pressure on the
CPU.
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space")
Cc: stable(a)vger.kernel.org
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
(cherry picked from commit 5e2f3c65af47e527ccac54060cf909e3306652ff)
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- Conflicts in simult_flows.sh, because v6.1 doesn't have commit
675d99338e7a ("selftests: mptcp: simult flows: format subtests
results in TAP") which modifies the context for a new but unrelated
feature.
- This is a new version to the one recently proposed by Sasha, this
time without dependencies:
https://lore.kernel.org/stable/9f185a3f-9373-401c-9a5c-ec0f106c0cbc@kernel.…
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 4a417f9d51d6..ee24e06521e6 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -301,10 +301,11 @@ done
setup
run_test 10 10 0 0 "balanced bwidth"
-run_test 10 10 1 50 "balanced bwidth with unbalanced delay"
+run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
# we still need some additional infrastructure to pass the following test-cases
-run_test 30 10 0 0 "unbalanced bwidth"
-run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay"
-run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 0 0 "unbalanced bwidth"
+run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
+
exit $ret
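Review bot: The unbalanced cases go from 30 and 10 to 10 and 3, so the ratio between the two values moves from 3:1 to roughly 3.3:1 rather than being a straight scale-down. If the pass criterion inside run_test is derived from these two arguments, the commit message should say that the check was confirmed to still hold with the new numbers on this branch, since the stated goal is to keep the same validation in place.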
--
2.43.0
The eventfd_ctx trigger pointer of the vfio_fsl_mc_irq object is
initially NULL and may become NULL if the user sets the trigger
eventfd to -1. The interrupt handler itself is guaranteed that
trigger is always valid between request_irq() and free_irq(), but
the loopback testing mechanisms to invoke the handler function
need to test the trigger. The triggering and setting ioctl paths
both make use of igate and are therefore mutually exclusive.
The vfio-fsl-mc driver does not make use of irqfds, nor does it
support any sort of masking operations, therefore unlike vfio-pci
and vfio-platform, the flow can remain essentially unchanged.
Cc: Diana Craciun <diana.craciun(a)oss.nxp.com>
Cc: stable(a)vger.kernel.org
Fixes: cc0ee20bd969 ("vfio/fsl-mc: trigger an interrupt via eventfd")
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
index d62fbfff20b8..82b2afa9b7e3 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
@@ -141,13 +141,14 @@ static int vfio_fsl_mc_set_irq_trigger(struct vfio_fsl_mc_device *vdev,
irq = &vdev->mc_irqs[index];
if (flags & VFIO_IRQ_SET_DATA_NONE) {
- vfio_fsl_mc_irq_handler(hwirq, irq);
+ if (irq->trigger)
+ eventfd_signal(irq->trigger);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
u8 trigger = *(u8 *)data;
- if (trigger)
- vfio_fsl_mc_irq_handler(hwirq, irq);
+ if (trigger && irq->trigger)
+ eventfd_signal(irq->trigger);
}
return 0;
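Review bot: Both removed vfio_fsl_mc_irq_handler(hwirq, irq) calls were the users of hwirq in this hunk; if nothing else in vfio_fsl_mc_set_irq_trigger() reads it, the variable and its assignment are now dead and should be removed to avoid a set-but-unused warning. The irq->trigger test followed by eventfd_signal(irq->trigger) is only safe against a concurrent trigger change because of igate, as the commit message explains, so a lockdep_assert_held() on that mutex or a short comment at this site would document the dependency.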
--
2.44.0
irqfds for mask and unmask that are not specifically disabled by the
user are leaked. Remove any irqfds during cleanup.
Cc: Eric Auger <eric.auger(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: a7fa7c77cf15 ("vfio/platform: implement IRQ masking/unmasking via an eventfd")
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/platform/vfio_platform_irq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 61a1bfb68ac7..e5dcada9e86c 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -321,8 +321,11 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
{
int i;
- for (i = 0; i < vdev->num_irqs; i++)
+ for (i = 0; i < vdev->num_irqs; i++) {
+ vfio_virqfd_disable(&vdev->irqs[i].mask);
+ vfio_virqfd_disable(&vdev->irqs[i].unmask);
vfio_set_trigger(vdev, i, -1, NULL);
+ }
vdev->num_irqs = 0;
kfree(vdev->irqs);
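Review bot: vfio_virqfd_disable() is now called on the mask and unmask slot of every IRQ, including indexes where no irqfd was ever configured, so the loop depends on that helper tolerating an empty slot. A short comment saying so, and noting that the irqfds are torn down before vfio_set_trigger(vdev, i, -1, NULL) releases the interrupt, would make the ordering read as intentional rather than incidental.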
--
2.44.0
A vulnerability exists where the eventfd for INTx signaling can be
deconfigured, which unregisters the IRQ handler but still allows
eventfds to be signaled with a NULL context through the SET_IRQS ioctl
or through unmask irqfd if the device interrupt is pending.
Ideally this could be solved with some additional locking; the igate
mutex serializes the ioctl and config space accesses, and the interrupt
handler is unregistered relative to the trigger, but the irqfd path
runs asynchronous to those. The igate mutex cannot be acquired from the
atomic context of the eventfd wake function. Disabling the irqfd
relative to the eventfd registration is potentially incompatible with
existing userspace.
As a result, the solution implemented here moves configuration of the
INTx interrupt handler to track the lifetime of the INTx context object
and irq_type configuration, rather than registration of a particular
trigger eventfd. Synchronization is added between the ioctl path and
eventfd_signal() wrapper such that the eventfd trigger can be
dynamically updated relative to in-flight interrupts or irqfd callbacks.
Cc: stable(a)vger.kernel.org
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Reported-by: Reinette Chatre <reinette.chatre(a)intel.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/pci/vfio_pci_intrs.c | 145 ++++++++++++++++--------------
1 file changed, 78 insertions(+), 67 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 75c85eec21b3..fb5392b749ff 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -90,11 +90,15 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
struct vfio_pci_irq_ctx *ctx;
+ struct eventfd_ctx *trigger;
ctx = vfio_irq_ctx_get(vdev, 0);
if (WARN_ON_ONCE(!ctx))
return;
- eventfd_signal(ctx->trigger);
+
+ trigger = READ_ONCE(ctx->trigger);
+ if (likely(trigger))
+ eventfd_signal(trigger);
}
}
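Review bot: The READ_ONCE(ctx->trigger) in vfio_send_intx_eventfd() only makes sense together with the WRITE_ONCE() in vfio_intx_set_signal() and the synchronization that follows it, but nothing at this site says so. A one-line comment stating that trigger may legitimately be NULL while INTx is enabled, and which writer this load pairs with, would keep a later cleanup from turning it back into a plain dereference.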
@@ -253,100 +257,100 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
return ret;
}
-static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
+static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
+ struct eventfd_ctx *trigger)
{
+ struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
+ unsigned long irqflags;
+ char *name;
+ int ret;
if (!is_irq_none(vdev))
return -EINVAL;
- if (!vdev->pdev->irq)
+ if (!pdev->irq)
return -ENODEV;
+ name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
+ if (!name)
+ return -ENOMEM;
+
ctx = vfio_irq_ctx_alloc(vdev, 0);
if (!ctx)
return -ENOMEM;
+ ctx->name = name;
+ ctx->trigger = trigger;
+
/*
- * If the virtual interrupt is masked, restore it. Devices
- * supporting DisINTx can be masked at the hardware level
- * here, non-PCI-2.3 devices will have to wait until the
- * interrupt is enabled.
+ * Fill the initial masked state based on virq_disabled. After
+ * enable, changing the DisINTx bit in vconfig directly changes INTx
+ * masking. igate prevents races during setup, once running masked
+ * is protected via irqlock.
+ *
+ * Devices supporting DisINTx also reflect the current mask state in
+ * the physical DisINTx bit, which is not affected during IRQ setup.
+ *
+ * Devices without DisINTx support require an exclusive interrupt.
+ * IRQ masking is performed at the IRQ chip. Again, igate protects
+ * against races during setup and IRQ handlers and irqfds are not
+ * yet active, therefore masked is stable and can be used to
+ * conditionally auto-enable the IRQ.
+ *
+ * irq_type must be stable while the IRQ handler is registered,
+ * therefore it must be set before request_irq().
*/
ctx->masked = vdev->virq_disabled;
- if (vdev->pci_2_3)
- pci_intx(vdev->pdev, !ctx->masked);
+ if (vdev->pci_2_3) {
+ pci_intx(pdev, !ctx->masked);
+ irqflags = IRQF_SHARED;
+ } else {
+ irqflags = ctx->masked ? IRQF_NO_AUTOEN : 0;
+ }
vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
+ ret = request_irq(pdev->irq, vfio_intx_handler,
+ irqflags, ctx->name, vdev);
+ if (ret) {
+ vdev->irq_type = VFIO_PCI_NUM_IRQS;
+ kfree(name);
+ vfio_irq_ctx_free(vdev, ctx, 0);
+ return ret;
+ }
+
return 0;
}
-static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
+static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
+ struct eventfd_ctx *trigger)
{
struct pci_dev *pdev = vdev->pdev;
- unsigned long irqflags = IRQF_SHARED;
struct vfio_pci_irq_ctx *ctx;
- struct eventfd_ctx *trigger;
- unsigned long flags;
- int ret;
+ struct eventfd_ctx *old;
ctx = vfio_irq_ctx_get(vdev, 0);
if (WARN_ON_ONCE(!ctx))
return -EINVAL;
- if (ctx->trigger) {
- free_irq(pdev->irq, vdev);
- kfree(ctx->name);
- eventfd_ctx_put(ctx->trigger);
- ctx->trigger = NULL;
- }
-
- if (fd < 0) /* Disable only */
- return 0;
-
- ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
- pci_name(pdev));
- if (!ctx->name)
- return -ENOMEM;
-
- trigger = eventfd_ctx_fdget(fd);
- if (IS_ERR(trigger)) {
- kfree(ctx->name);
- return PTR_ERR(trigger);
- }
+ old = ctx->trigger;
- ctx->trigger = trigger;
+ WRITE_ONCE(ctx->trigger, trigger);
- /*
- * Devices without DisINTx support require an exclusive interrupt,
- * IRQ masking is performed at the IRQ chip. The masked status is
- * protected by vdev->irqlock. Setup the IRQ without auto-enable and
- * unmask as necessary below under lock. DisINTx is unmodified by
- * the IRQ configuration and may therefore use auto-enable.
- */
- if (!vdev->pci_2_3)
- irqflags = IRQF_NO_AUTOEN;
-
- ret = request_irq(pdev->irq, vfio_intx_handler,
- irqflags, ctx->name, vdev);
- if (ret) {
- ctx->trigger = NULL;
- kfree(ctx->name);
- eventfd_ctx_put(trigger);
- return ret;
+ /* Releasing an old ctx requires synchronizing in-flight users */
+ if (old) {
+ synchronize_irq(pdev->irq);
+ vfio_virqfd_flush_thread(&ctx->unmask);
+ eventfd_ctx_put(old);
}
- spin_lock_irqsave(&vdev->irqlock, flags);
- if (!vdev->pci_2_3 && !ctx->masked)
- enable_irq(pdev->irq);
- spin_unlock_irqrestore(&vdev->irqlock, flags);
-
return 0;
}
static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
{
+ struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
ctx = vfio_irq_ctx_get(vdev, 0);
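Review bot: vfio_intx_enable() leaks name when vfio_irq_ctx_alloc() fails: the kasprintf() result is assigned to ctx->name only after the allocation succeeds, and the "if (!ctx) return -ENOMEM;" path returns without kfree(name). Free it on that path, or allocate the context before the name. In vfio_intx_set_signal(), synchronize_irq() and vfio_virqfd_flush_thread(&ctx->unmask) are what make eventfd_ctx_put(old) safe; the existing one-line comment would be more useful if it said which in-flight user each of the two calls covers.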
@@ -354,10 +358,13 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
if (ctx) {
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
+ free_irq(pdev->irq, vdev);
+ if (ctx->trigger)
+ eventfd_ctx_put(ctx->trigger);
+ kfree(ctx->name);
+ vfio_irq_ctx_free(vdev, ctx, 0);
}
- vfio_intx_set_signal(vdev, -1);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
- vfio_irq_ctx_free(vdev, ctx, 0);
}
/*
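Review bot: vfio_intx_disable() reads ctx->trigger with a plain load while the signaling path now uses READ_ONCE(); that is fine only because both irqfds have been disabled and free_irq() has returned by then, and the hunk does not say so. The enable path carries a long comment on ordering, and a matching short one here (irqfds off, then free_irq(), then drop the eventfd and the name) would make the teardown order explicit.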
@@ -641,19 +648,23 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
+ struct eventfd_ctx *trigger = NULL;
int32_t fd = *(int32_t *)data;
int ret;
- if (is_intx(vdev))
- return vfio_intx_set_signal(vdev, fd);
+ if (fd >= 0) {
+ trigger = eventfd_ctx_fdget(fd);
+ if (IS_ERR(trigger))
+ return PTR_ERR(trigger);
+ }
- ret = vfio_intx_enable(vdev);
- if (ret)
- return ret;
+ if (is_intx(vdev))
+ ret = vfio_intx_set_signal(vdev, trigger);
+ else
+ ret = vfio_intx_enable(vdev, trigger);
- ret = vfio_intx_set_signal(vdev, fd);
- if (ret)
- vfio_intx_disable(vdev);
+ if (ret && trigger)
+ eventfd_ctx_put(trigger);
return ret;
}
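Review bot: The reference taken by eventfd_ctx_fdget() is handed to vfio_intx_set_signal() or vfio_intx_enable() on success and dropped here only when ret is non-zero, so neither callee may put it on its own error path. That ownership rule is not written down anywhere in the hunk; a comment above the "if (ret && trigger)" cleanup would prevent a future double put. With fd < 0, trigger stays NULL and vfio_intx_enable(vdev, NULL) now registers the handler with no eventfd attached, which deserves a note that this enabled-but-unsignaled state is intended.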
--
2.44.0
This patch changes drm_mode_legacy_fb_format() to only return formats
which are supported by the current drm-device. The motivation for this
change is to fix a regression introduced by commit
c91acda3a380 ("drm/gem: Check for valid formats") which stops the Xorg
modesetting driver from working on the Beagleboard Black (it uses the
tilcdc kernel driver).
When the Xorg modesetting driver starts up, it tries to determine the
default bpp for the device. It does this by allocating a dumb 32bpp
frame buffer (using DRM_IOCTL_MODE_CREATE_DUMB) and then calling
drmModeAddFB() with that frame buffer asking for a 24-bit depth and 32
bpp. As the modesetting driver uses drmModeAddFB() which doesn't
supply a format, the kernel's drm_mode_legacy_fb_format() is called to
provide a format matching the requested depth and bpp. If the
drmModeAddFB() call fails, it forces both depth and bpp to 24. If
drmModeAddFB() succeeds, depth is assumed to be 24 and bpp 32. The
dummy frame buffer is then removed (using drmModeRmFB()).
If the modesetting driver finds that both the default bpp and depth
are 24, it forces the use of a 32bpp shadow buffer and a 24bpp front
buffer. Following this, the driver reads the user-specified color
depth option and tries to create a framebuffer of that depth, but if
the use of a shadow buffer has been forced, the bpp and depth of it
override the user-supplied option.
The Xorg modesetting driver on top of the tilcdc kernel driver used to
work on the Beagleboard Black if a 16 bit color depth was
configured. The hardware in the Beagleboard Black supports the RG16,
BG24, and XB24 formats. When drm_mode_legacy_fb_format() was called to
request a format for a 24-bit depth and 32 bpp, it would return the
unsupported RG24 format which drmModeAddFB() would happily accept (as
there was no check for a valid format). As a shadow buffer wasn't
forced, the modesetting driver would try the user specified 16 bit
color depth and drm_mode_legacy_fb_format() would return RG16 which is
supported by the hardware. Color depths of 24 bits were not supported,
as the unsupported RG24 would be detected when drmModeSetCrtc() was
called.
Following commit c91acda3a380 ("drm/gem: Check for valid formats"),
which adds a check for a valid (supported by the hardware) format to
the code path for the kernel part of drmModeAddFB(), the modesetting
driver fails to configure and add a frame buffer. This is because the
call to create a 24-bit depth and 32 bpp framebuffer during detection
of the default bpp will now fail and a 24-bit depth and 24 bpp front
buffer will be forced. As drm_mode_legacy_fb_format() will return RG24
which isn't supported, the creation of that framebuffer will also
fail.
To fix the regression, this patch extends drm_mode_legacy_fb_format()
to list all formats with a particular bpp and color depth known to the
kernel, and have it probe the current drm-device for a supported
format. This fixes the regression and, as a bonus, a color depth of 24
bits on the Beagleboard Black is now working.
As this patch changes drm_mode_legacy_fb_format() which is used by
other drivers, it has, in addition to the Beagleboard Black, also been
tested with the nouveau and modesetting drivers on a NVIDIA NV96, and
with the intel and modesetting drivers on an Intel HD Graphics 4000
chipset.
Signed-off-by: Frej Drejhammar <frej.drejhammar(a)gmail.com>
Fixes: c91acda3a380 ("drm/gem: Check for valid formats")
Cc: stable(a)vger.kernel.org
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Cc: Patrik Jakobsson <patrik.r.jakobsson(a)gmail.com>
Cc: Rob Clark <robdclark(a)gmail.com>
Cc: Abhinav Kumar <quic_abhinavk(a)quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Cc: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: "Maíra Canal" <mcanal(a)igalia.com>
Cc: "Ville Syrjälä" <ville.syrjala(a)linux.intel.com>
Cc: dri-devel(a)lists.freedesktop.org
---
drivers/gpu/drm/armada/armada_fbdev.c | 2 +-
drivers/gpu/drm/drm_fb_helper.c | 2 +-
drivers/gpu/drm/drm_fbdev_dma.c | 3 +-
drivers/gpu/drm/drm_fbdev_generic.c | 3 +-
drivers/gpu/drm/drm_fourcc.c | 91 ++++++++++++++++---
drivers/gpu/drm/exynos/exynos_drm_fbdev.c | 3 +-
drivers/gpu/drm/gma500/fbdev.c | 2 +-
drivers/gpu/drm/i915/display/intel_fbdev_fb.c | 3 +-
drivers/gpu/drm/msm/msm_fbdev.c | 3 +-
drivers/gpu/drm/omapdrm/omap_fbdev.c | 2 +-
drivers/gpu/drm/radeon/radeon_fbdev.c | 3 +-
drivers/gpu/drm/tegra/fbdev.c | 3 +-
drivers/gpu/drm/tiny/ofdrm.c | 6 +-
drivers/gpu/drm/xe/display/intel_fbdev_fb.c | 3 +-
include/drm/drm_fourcc.h | 3 +-
15 files changed, 104 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/armada/armada_fbdev.c b/drivers/gpu/drm/armada/armada_fbdev.c
index d223176912b6..82f312f76980 100644
--- a/drivers/gpu/drm/armada/armada_fbdev.c
+++ b/drivers/gpu/drm/armada/armada_fbdev.c
@@ -54,7 +54,7 @@ static int armada_fbdev_create(struct drm_fb_helper *fbh,
mode.width = sizes->surface_width;
mode.height = sizes->surface_height;
mode.pitches[0] = armada_pitch(mode.width, sizes->surface_bpp);
- mode.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode.pixel_format = drm_mode_legacy_fb_format(dev, sizes->surface_bpp,
sizes->surface_depth);
size = mode.pitches[0] * mode.height;
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index d612133e2cf7..62f81a14fb2e 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -1453,7 +1453,7 @@ static uint32_t drm_fb_helper_find_format(struct drm_fb_helper *fb_helper, const
* the framebuffer emulation can only deal with such
* formats, specifically RGB/BGA formats.
*/
- format = drm_mode_legacy_fb_format(bpp, depth);
+ format = drm_mode_legacy_fb_format(dev, bpp, depth);
if (!format)
goto err;
diff --git a/drivers/gpu/drm/drm_fbdev_dma.c b/drivers/gpu/drm/drm_fbdev_dma.c
index 6c9427bb4053..cdb315c6d110 100644
--- a/drivers/gpu/drm/drm_fbdev_dma.c
+++ b/drivers/gpu/drm/drm_fbdev_dma.c
@@ -90,7 +90,8 @@ static int drm_fbdev_dma_helper_fb_probe(struct drm_fb_helper *fb_helper,
sizes->surface_width, sizes->surface_height,
sizes->surface_bpp);
- format = drm_mode_legacy_fb_format(sizes->surface_bpp, sizes->surface_depth);
+ format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp, sizes->surface_depth);
buffer = drm_client_framebuffer_create(client, sizes->surface_width,
sizes->surface_height, format);
if (IS_ERR(buffer))
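Review bot: format is passed straight to drm_client_framebuffer_create() without a check, and with this patch drm_mode_legacy_fb_format() returns DRM_FORMAT_INVALID whenever no plane on the device supports a candidate, not only for an unknown bpp/depth pair. An explicit test for DRM_FORMAT_INVALID with a drm_err() naming the requested bpp and depth would give a clearer failure than whatever the framebuffer creation path reports. The same applies to the other fbdev callers updated in this patch that use the result unchecked.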
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c b/drivers/gpu/drm/drm_fbdev_generic.c
index d647d89764cb..aba8c272560c 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -84,7 +84,8 @@ static int drm_fbdev_generic_helper_fb_probe(struct drm_fb_helper *fb_helper,
sizes->surface_width, sizes->surface_height,
sizes->surface_bpp);
- format = drm_mode_legacy_fb_format(sizes->surface_bpp, sizes->surface_depth);
+ format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp, sizes->surface_depth);
buffer = drm_client_framebuffer_create(client, sizes->surface_width,
sizes->surface_height, format);
if (IS_ERR(buffer))
diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
index 193cf8ed7912..034f2087af9a 100644
--- a/drivers/gpu/drm/drm_fourcc.c
+++ b/drivers/gpu/drm/drm_fourcc.c
@@ -29,47 +29,97 @@
#include <drm/drm_device.h>
#include <drm/drm_fourcc.h>
+#include <drm/drm_plane.h>
+#include <drm/drm_print.h>
+
+/*
+ * Internal helper to find a valid format among a list of potentially
+ * valid formats.
+ *
+ * Traverses the variadic arguments until a format supported by @dev
+ * or an DRM_FORMAT_INVALID argument is found. If a supported format
+ * is found it is returned, otherwise DRM_FORMAT_INVALID is returned.
+ */
+static uint32_t select_valid_format(struct drm_device *dev, ...)
+{
+ va_list va;
+ uint32_t fmt = DRM_FORMAT_INVALID;
+ uint32_t to_try;
+
+ va_start(va, dev);
+
+ for (to_try = va_arg(va, uint32_t);
+ to_try != DRM_FORMAT_INVALID;
+ to_try = va_arg(va, uint32_t)) {
+ if (drm_any_plane_has_format(dev, to_try, 0)) {
+ fmt = to_try;
+ break;
+ }
+ }
+
+ va_end(va);
+
+ return fmt;
+}
/**
* drm_mode_legacy_fb_format - compute drm fourcc code from legacy description
+ * @dev: DRM device
* @bpp: bits per pixels
* @depth: bit depth per pixel
*
* Computes a drm fourcc pixel format code for the given @bpp/@depth values.
* Useful in fbdev emulation code, since that deals in those values.
*/
-uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t depth)
+uint32_t drm_mode_legacy_fb_format(struct drm_device *dev,
+ uint32_t bpp, uint32_t depth)
{
uint32_t fmt = DRM_FORMAT_INVALID;
switch (bpp) {
case 1:
if (depth == 1)
- fmt = DRM_FORMAT_C1;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C1,
+ DRM_FORMAT_INVALID);
break;
case 2:
if (depth == 2)
- fmt = DRM_FORMAT_C2;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C2,
+ DRM_FORMAT_INVALID);
break;
case 4:
if (depth == 4)
- fmt = DRM_FORMAT_C4;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C4,
+ DRM_FORMAT_INVALID);
break;
case 8:
if (depth == 8)
- fmt = DRM_FORMAT_C8;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C8,
+ DRM_FORMAT_INVALID);
break;
case 16:
switch (depth) {
case 15:
- fmt = DRM_FORMAT_XRGB1555;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_XRGB1555,
+ DRM_FORMAT_XBGR1555,
+ DRM_FORMAT_RGBX5551,
+ DRM_FORMAT_BGRX5551,
+ DRM_FORMAT_INVALID);
break;
case 16:
- fmt = DRM_FORMAT_RGB565;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_RGB565,
+ DRM_FORMAT_BGR565,
+ DRM_FORMAT_INVALID);
break;
default:
break;
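Review bot: select_valid_format() depends on every call site ending its argument list with DRM_FORMAT_INVALID, and the compiler cannot check that. A static const array of candidates per bpp/depth pair, walked with ARRAY_SIZE(), would remove the sentinel and the va_arg() handling entirely. The kernel-doc for drm_mode_legacy_fb_format() gains @dev but still does not say that the function now returns DRM_FORMAT_INVALID when the device supports none of the candidates, which is a behaviour change callers need to know about.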
@@ -78,19 +128,36 @@ uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t depth)
case 24:
if (depth == 24)
- fmt = DRM_FORMAT_RGB888;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_RGB888,
+ DRM_FORMAT_BGR888);
break;
case 32:
switch (depth) {
case 24:
- fmt = DRM_FORMAT_XRGB8888;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_XRGB8888,
+ DRM_FORMAT_XBGR8888,
+ DRM_FORMAT_RGBX8888,
+ DRM_FORMAT_BGRX8888,
+ DRM_FORMAT_INVALID);
break;
case 30:
- fmt = DRM_FORMAT_XRGB2101010;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_XRGB2101010,
+ DRM_FORMAT_XBGR2101010,
+ DRM_FORMAT_RGBX1010102,
+ DRM_FORMAT_BGRX1010102,
+ DRM_FORMAT_INVALID);
break;
case 32:
- fmt = DRM_FORMAT_ARGB8888;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_ARGB8888,
+ DRM_FORMAT_ABGR8888,
+ DRM_FORMAT_RGBA8888,
+ DRM_FORMAT_BGRA8888,
+ DRM_FORMAT_INVALID);
break;
default:
break;
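Review bot: The case 24 call passes DRM_FORMAT_RGB888 and DRM_FORMAT_BGR888 but no terminating DRM_FORMAT_INVALID, unlike every other select_valid_format() call in this patch. When neither format is supported by the device, the loop keeps calling va_arg() past the last argument, which is undefined behaviour. Add DRM_FORMAT_INVALID after DRM_FORMAT_BGR888.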
@@ -119,7 +186,7 @@ EXPORT_SYMBOL(drm_mode_legacy_fb_format);
uint32_t drm_driver_legacy_fb_format(struct drm_device *dev,
uint32_t bpp, uint32_t depth)
{
- uint32_t fmt = drm_mode_legacy_fb_format(bpp, depth);
+ uint32_t fmt = drm_mode_legacy_fb_format(dev, bpp, depth);
if (dev->mode_config.quirk_addfb_prefer_host_byte_order) {
if (fmt == DRM_FORMAT_XRGB8888)
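Review bot: drm_driver_legacy_fb_format() still keys its quirk_addfb_prefer_host_byte_order handling on an exact match such as "fmt == DRM_FORMAT_XRGB8888", but drm_mode_legacy_fb_format() can now return one of the alternative orderings (for example DRM_FORMAT_XBGR8888) when the first candidate is unsupported. Those results skip the quirk conversion, and a converted format is not probed against the device again. Either document that the quirk applies only to the first-choice format or extend the checks to cover the new candidates.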
diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
index a379c8ca435a..e114ebd44169 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
@@ -104,7 +104,8 @@ static int exynos_drm_fbdev_create(struct drm_fb_helper *helper,
mode_cmd.width = sizes->surface_width;
mode_cmd.height = sizes->surface_height;
mode_cmd.pitches[0] = sizes->surface_width * (sizes->surface_bpp >> 3);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
diff --git a/drivers/gpu/drm/gma500/fbdev.c b/drivers/gpu/drm/gma500/fbdev.c
index 98b44974d42d..811ae5cccf2c 100644
--- a/drivers/gpu/drm/gma500/fbdev.c
+++ b/drivers/gpu/drm/gma500/fbdev.c
@@ -189,7 +189,7 @@ static int psb_fbdev_fb_probe(struct drm_fb_helper *fb_helper,
mode_cmd.width = sizes->surface_width;
mode_cmd.height = sizes->surface_height;
mode_cmd.pitches[0] = ALIGN(mode_cmd.width * DIV_ROUND_UP(bpp, 8), 64);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(bpp, depth);
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev, bpp, depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
size = ALIGN(size, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev_fb.c b/drivers/gpu/drm/i915/display/intel_fbdev_fb.c
index 0665f943f65f..cb32fcff8fb5 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev_fb.c
@@ -30,7 +30,8 @@ struct drm_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
mode_cmd.pitches[0] = ALIGN(mode_cmd.width *
DIV_ROUND_UP(sizes->surface_bpp, 8), 64);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
index 030bedac632d..8748610299b4 100644
--- a/drivers/gpu/drm/msm/msm_fbdev.c
+++ b/drivers/gpu/drm/msm/msm_fbdev.c
@@ -77,7 +77,8 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
uint32_t format;
int ret, pitch;
- format = drm_mode_legacy_fb_format(sizes->surface_bpp, sizes->surface_depth);
+ format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp, sizes->surface_depth);
DBG("create fbdev: %dx%d@%d (%dx%d)", sizes->surface_width,
sizes->surface_height, sizes->surface_bpp,
diff --git a/drivers/gpu/drm/omapdrm/omap_fbdev.c b/drivers/gpu/drm/omapdrm/omap_fbdev.c
index 6b08b137af1a..98f01d80abd8 100644
--- a/drivers/gpu/drm/omapdrm/omap_fbdev.c
+++ b/drivers/gpu/drm/omapdrm/omap_fbdev.c
@@ -139,7 +139,7 @@ static int omap_fbdev_create(struct drm_fb_helper *helper,
sizes->surface_height, sizes->surface_bpp,
sizes->fb_width, sizes->fb_height);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev, sizes->surface_bpp,
sizes->surface_depth);
mode_cmd.width = sizes->surface_width;
diff --git a/drivers/gpu/drm/radeon/radeon_fbdev.c b/drivers/gpu/drm/radeon/radeon_fbdev.c
index 02bf25759059..bf1843529c7c 100644
--- a/drivers/gpu/drm/radeon/radeon_fbdev.c
+++ b/drivers/gpu/drm/radeon/radeon_fbdev.c
@@ -221,7 +221,8 @@ static int radeon_fbdev_fb_helper_fb_probe(struct drm_fb_helper *fb_helper,
if ((sizes->surface_bpp == 24) && ASIC_IS_AVIVO(rdev))
sizes->surface_bpp = 32;
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
ret = radeon_fbdev_create_pinned_object(fb_helper, &mode_cmd, &gobj);
diff --git a/drivers/gpu/drm/tegra/fbdev.c b/drivers/gpu/drm/tegra/fbdev.c
index db6eaac3d30e..290e8c426b0c 100644
--- a/drivers/gpu/drm/tegra/fbdev.c
+++ b/drivers/gpu/drm/tegra/fbdev.c
@@ -87,7 +87,8 @@ static int tegra_fbdev_probe(struct drm_fb_helper *helper,
cmd.pitches[0] = round_up(sizes->surface_width * bytes_per_pixel,
tegra->pitch_align);
- cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = cmd.pitches[0] * cmd.height;
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index ab89b7fc7bf6..ded868601aea 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -100,14 +100,14 @@ static const struct drm_format_info *display_get_validated_format(struct drm_dev
switch (depth) {
case 8:
- format = drm_mode_legacy_fb_format(8, 8);
+ format = drm_mode_legacy_fb_format(dev, 8, 8);
break;
case 15:
case 16:
- format = drm_mode_legacy_fb_format(16, depth);
+ format = drm_mode_legacy_fb_format(dev, 16, depth);
break;
case 32:
- format = drm_mode_legacy_fb_format(32, 24);
+ format = drm_mode_legacy_fb_format(dev, 32, 24);
break;
default:
drm_err(dev, "unsupported framebuffer depth %u\n", depth);
diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
index 51ae3561fd0d..a38a8143d632 100644
--- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
@@ -32,7 +32,8 @@ struct drm_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
mode_cmd.pitches[0] = ALIGN(mode_cmd.width *
DIV_ROUND_UP(sizes->surface_bpp, 8), XE_PAGE_SIZE);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
diff --git a/include/drm/drm_fourcc.h b/include/drm/drm_fourcc.h
index ccf91daa4307..75d06393a564 100644
--- a/include/drm/drm_fourcc.h
+++ b/include/drm/drm_fourcc.h
@@ -310,7 +310,8 @@ const struct drm_format_info *drm_format_info(u32 format);
const struct drm_format_info *
drm_get_format_info(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd);
-uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t depth);
+uint32_t drm_mode_legacy_fb_format(struct drm_device *dev,
+ uint32_t bpp, uint32_t depth);
uint32_t drm_driver_legacy_fb_format(struct drm_device *dev,
uint32_t bpp, uint32_t depth);
unsigned int drm_format_info_block_width(const struct drm_format_info *info,
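Review bot: with the added dev parameter, the drm_mode_legacy_fb_format() prototype is now identical to drm_driver_legacy_fb_format() declared directly below it, and nothing at the declaration says how the two differ. Add a comment here, or update the kernel-doc at the definition, stating what dev is used for in drm_mode_legacy_fb_format() and when a caller should pick one function over the other.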
base-commit: b9511c6d277c31b13d4f3128eba46f4e0733d734
--
2.44.0
As a matter of fact, continuous reads require additional handling at the
operation level in order for them to work properly. The core helpers do
have this additional logic now, but any time a controller implements its
own page helper, this extra logic is "lost". This means we need another
level of per-controller driver checks to ensure they can leverage
continuous reads. This is for now unsupported, so in order to ensure
continuous reads are enabled only when fully using the core page
helpers, we need to add more initial checks.
Also, as performance is not relevant during raw accesses, we also
prevent these from enabling the feature.
This should solve the issue seen with controllers such as the STM32 FMC2
when in sequencer mode. In this case, the continuous read feature would
be enabled but not leveraged, and most importantly not disabled, leading
further operations to fail.
Reported-by: Christophe Kerello <christophe.kerello(a)foss.st.com>
Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/nand_base.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 4d5a663e4e05..2479fa98f991 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -3594,7 +3594,8 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
oob = ops->oobbuf;
oob_required = oob ? 1 : 0;
- rawnand_enable_cont_reads(chip, page, readlen, col);
+ if (likely(ops->mode != MTD_OPS_RAW))
+ rawnand_enable_cont_reads(chip, page, readlen, col);
while (1) {
struct mtd_ecc_stats ecc_stats = mtd->ecc_stats;
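Review bot: the new check "if (likely(ops->mode != MTD_OPS_RAW))" has no comment explaining why raw reads must not enable continuous reads; the reason (performance is not relevant for raw accesses) exists only in the commit message. Add a one-line comment above the check. Consider dropping likely() as well: the call sits before the read loop and runs once per operation, so the branch hint adds noise without a measurable benefit.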
@@ -5212,6 +5213,15 @@ static void rawnand_late_check_supported_ops(struct nand_chip *chip)
if (!nand_has_exec_op(chip))
return;
+ /*
+ * For now, continuous reads can only be used with the core page helpers.
+ * This can be extended later.
+ */
+ if (!(chip->ecc.read_page == nand_read_page_hwecc ||
+ chip->ecc.read_page == nand_read_page_syndrome ||
+ chip->ecc.read_page == nand_read_page_swecc))
+ return;
+
rawnand_check_cont_read_support(chip);
}
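Review bot: the allow-list in rawnand_late_check_supported_ops() is a negated chain of three pointer comparisons, which is easy to misread and easy to miss when another core helper becomes eligible. Moving the three comparisons into a small predicate (for example a hypothetical rawnand_uses_core_read_page()) would make the intent explicit and give the "can be extended later" comment an obvious place to be extended. The early return also disables the feature silently; a debug message naming the reason would help whoever later wonders why continuous reads are off for a given controller.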
--
2.40.1
None of the callers of drm_panel_get_modes() expect it to return
negative error codes. Either they propagate the return value in their
struct drm_connector_helper_funcs .get_modes() hook (which is also not
supposed to return negative codes), or add it to other counts leading to
bogus values.
On the other hand, many of the struct drm_panel_funcs .get_modes() hooks
do return negative error codes, so handle them gracefully instead of
propagating further.
Return 0 for no modes, whatever the reason.
Cc: Neil Armstrong <neil.armstrong(a)linaro.org>
Cc: Jessica Zhang <quic_jesszhan(a)quicinc.com>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/drm_panel.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_panel.c b/drivers/gpu/drm/drm_panel.c
index e814020bbcd3..cfbe020de54e 100644
--- a/drivers/gpu/drm/drm_panel.c
+++ b/drivers/gpu/drm/drm_panel.c
@@ -274,19 +274,24 @@ EXPORT_SYMBOL(drm_panel_disable);
* The modes probed from the panel are automatically added to the connector
* that the panel is attached to.
*
- * Return: The number of modes available from the panel on success or a
- * negative error code on failure.
+ * Return: The number of modes available from the panel on success, or 0 on
+ * failure (no modes).
*/
int drm_panel_get_modes(struct drm_panel *panel,
struct drm_connector *connector)
{
if (!panel)
- return -EINVAL;
+ return 0;
- if (panel->funcs && panel->funcs->get_modes)
- return panel->funcs->get_modes(panel, connector);
+ if (panel->funcs && panel->funcs->get_modes) {
+ int num;
- return -EOPNOTSUPP;
+ num = panel->funcs->get_modes(panel, connector);
+ if (num > 0)
+ return num;
+ }
+
+ return 0;
}
EXPORT_SYMBOL(drm_panel_get_modes);
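Review bot: a negative return from panel->funcs->get_modes() is now discarded without a trace by the "if (num > 0) return num;" test, so a failing panel driver becomes indistinguishable from a panel that has no modes. Log the error code at debug level before falling through to "return 0;". The same applies to the !panel case, which used to return -EINVAL and now returns 0 silently.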
--
2.39.2
Hi Sasha,
On 10/03/2024 03:33, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> selftests: mptcp: simult flows: format subtests results in TAP
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> selftests-mptcp-simult-flows-format-subtests-results.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Thank you for having backported this commit 675d99338e7a ("selftests:
mptcp: simult flows: format subtests results in TAP") -- as well as
commit 4d8e0dde0403 ("selftests: mptcp: simult flows: fix some subtest
names"), a fix for it -- as a "dependence" for commit 5e2f3c65af47
("selftests: mptcp: decrease BW in simult flows"), but I think it is
better not to include 675d99338e7a (and 4d8e0dde0403): they are not
dependences, just modifying the lines around, and they depend on other
commits for this feature to work.
In other words, commit 675d99338e7a ("selftests: mptcp: simult flows:
format subtests results in TAP") -- and 4d8e0dde0403 ("selftests: mptcp:
simult flows: fix some subtest names") -- is now causing the MPTCP
simult flows selftest to fail. Could it be possible to remove them from
6.1 and 5.15 queues please?
> commit 4eeef0aaffa567f812390612c30f800de02edd73
> Author: Matthieu Baerts <matttbe(a)kernel.org>
> Date: Mon Jul 17 15:21:31 2023 +0200
>
> selftests: mptcp: simult flows: format subtests results in TAP
>
> [ Upstream commit 675d99338e7a6cd925d61d7dbf8c26612f7f08a9 ]
>
> The current selftests infrastructure formats the results in TAP 13. This
> version doesn't support subtests and only the end result of each
> selftest is taken into account. It means that a single issue in a
> subtest of a selftest containing multiple subtests forces the whole
> selftest to be marked as failed. It also means that subtests results are
> not tracked by CIs executing selftests.
>
> MPTCP selftests run hundreds of various subtests. It is then important
> to track each of them and not one result per selftest.
>
> It is particularly interesting to do that when validating stable kernels
> with the last version of the test suite: tests might fail because a
> feature is not supported but the test didn't skip that part. In this
> case, if subtests are not tracked, the whole selftest will be marked as
> failed making the other subtests useless because their results are
> ignored.
>
> This patch formats subtests results in TAP in simult_flows.sh selftest.
>
> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
> Acked-by: Paolo Abeni <pabeni(a)redhat.com>
> Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Stable-dep-of: 5e2f3c65af47 ("selftests: mptcp: decrease BW in simult flows")
If needed, I can help to resolve the conflicts to have commit
5e2f3c65af47 ("selftests: mptcp: decrease BW in simult flows")
backported to 6.1 and 5.15.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
On Sun, Mar 10, 2024 at 03:43:38PM -0400, Kent Overstreet wrote:
> The following changes since commit 2e7cdd29fc42c410eab52fffe5710bf656619222:
>
> Linux 6.7.9 (2024-03-06 14:54:01 +0000)
>
> are available in the Git repository at:
>
> https://evilpiepirate.org/git/bcachefs.git tags/bcachefs-for-v6.7-stable-20240310
>
> for you to fetch changes up to 560ceb6a4d9e3bea57c29f5f3a7a1d671dfc7983:
>
> bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree (2024-03-10 14:36:57 -0400)
>
> ----------------------------------------------------------------
> bcachefs fixes for 6.7 stable
>
> "bcachefs: fix simulateously upgrading & downgrading" is the important
> one here. This fixes a really nasty bug where in a rare situation we
> wouldn't downgrade; we'd write a superblock where the version number is
> higher than the currently supported version.
>
> This caused total failure to mount multi device filesystems with the
> splitbrain checking in 6.8, since now we wouldn't be updating the member
> sequence numbers used for splitbrain checking, but the version number
> said we would be - and newer versions would attempt to kick every device
> out of the fs.
>
> ----------------------------------------------------------------
> Helge Deller (1):
> bcachefs: Fix build on parisc by avoiding __multi3()
>
> Kent Overstreet (3):
> bcachefs: check for failure to downgrade
> bcachefs: fix simulateously upgrading & downgrading
> bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
>
> Mathias Krause (1):
> bcachefs: install fd later to avoid race with close
>
> fs/bcachefs/btree_iter.c | 4 +++-
> fs/bcachefs/chardev.c | 3 +--
> fs/bcachefs/errcode.h | 1 +
> fs/bcachefs/mean_and_variance.h | 2 +-
> fs/bcachefs/super-io.c | 27 ++++++++++++++++++++++++---
> 5 files changed, 30 insertions(+), 7 deletions(-)
Qcom SoCs making use of ARM SMMU require BDF to SID translation table in
the driver to properly map the SID for the PCIe devices based on their BDF
identifier. This is currently achieved with the help of
qcom_pcie_config_sid_1_9_0() function for SoCs supporting the 1_9_0 config.
But with newer Qcom SoCs starting from SM8450, BDF to SID translation is
set to bypass mode by default in hardware. Due to this, the translation
table that is set in the qcom_pcie_config_sid_1_9_0() is essentially
unused and the default SID is used for all endpoints in SoCs starting from
SM8450.
This is a security concern and also warrants swapping the DeviceID in DT
while using the GIC ITS to handle MSIs from endpoints. The swapping is
currently done like below in DT when using GIC ITS:
    /*
     * MSIs for BDF (1:0.0) only works with Device ID 0x5980.
     * Hence, the IDs are swapped.
     */
    msi-map = <0x0 &gic_its 0x5981 0x1>,
              <0x100 &gic_its 0x5980 0x1>;
Here, swapping of the DeviceIDs ensures that the endpoint with BDF (1:0.0)
gets the DeviceID 0x5980 which is associated with the default SID as per
the iommu mapping in DT. So MSIs were delivered with IDs swapped so far.
But this also means the Root Port (0:0.0) won't receive any MSIs (for PME,
AER etc...).
So let's fix these issues by clearing the BDF to SID bypass mode for all
SoCs making use of the 1_9_0 config. This allows the PCIe devices to use
the correct SID, thus avoiding the DeviceID swapping hack in DT and also
achieving the isolation between devices.
Cc: <stable(a)vger.kernel.org> # 5.11
Fixes: 4c9398822106 ("PCI: qcom: Add support for configuring BDF to SID mapping for SM8250")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
I will send the DT patches to fix the msi-map entries once this patch gets
merged.
---
drivers/pci/controller/dwc/pcie-qcom.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 10f2d0bb86be..84e47c6f95fe 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -53,6 +53,7 @@
#define PARF_SLV_ADDR_SPACE_SIZE 0x358
#define PARF_DEVICE_TYPE 0x1000
#define PARF_BDF_TO_SID_TABLE_N 0x2000
+#define PARF_BDF_TO_SID_CFG 0x2c00
/* ELBI registers */
#define ELBI_SYS_CTRL 0x04
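Review bot: PARF_BDF_TO_SID_CFG at 0x2c00 is introduced for the whole 1_9_0 family, while the commit message ties the bypass default to SoCs starting from SM8450. Confirm that this register exists, with bit 0 meaning bypass, on the earlier 1_9_0 parts such as the SM8250 named in the Fixes tag, because qcom_pcie_config_sid_1_9_0() now reads and writes it unconditionally. If it does not, the access needs to be gated per SoC.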
@@ -120,6 +121,9 @@
/* PARF_DEVICE_TYPE register fields */
#define DEVICE_TYPE_RC 0x4
+/* PARF_BDF_TO_SID_CFG fields */
+#define BDF_TO_SID_BYPASS BIT(0)
+
/* ELBI_SYS_CTRL register fields */
#define ELBI_SYS_CTRL_LT_ENABLE BIT(0)
@@ -1008,11 +1012,17 @@ static int qcom_pcie_config_sid_1_9_0(struct qcom_pcie *pcie)
u8 qcom_pcie_crc8_table[CRC8_TABLE_SIZE];
int i, nr_map, size = 0;
u32 smmu_sid_base;
+ u32 val;
of_get_property(dev->of_node, "iommu-map", &size);
if (!size)
return 0;
+ /* Enable BDF to SID translation by disabling bypass mode (default) */
+ val = readl(pcie->parf + PARF_BDF_TO_SID_CFG);
+ val &= ~BDF_TO_SID_BYPASS;
+ writel(val, pcie->parf + PARF_BDF_TO_SID_CFG);
+
map = kzalloc(size, GFP_KERNEL);
if (!map)
return -ENOMEM;
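Review bot: bypass is cleared in PARF_BDF_TO_SID_CFG before the translation table is populated. The writel() sits above "map = kzalloc(size, GFP_KERNEL);", so there is a window with translation enabled and no table written by this call, and the "return -ENOMEM" path leaves the hardware in that state permanently. Since the point of the patch is isolation between devices, move the read-modify-write after the table has been programmed, or explain in the comment why the ordering is safe. The comment "by disabling bypass mode (default)" is also ambiguous about which state is the default; say that bypass is the hardware default.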
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240307-pci-bdf-sid-fix-c9cd8c0023d0
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
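The msi-map translation that the DT snippet in the message above relies on, as an added example (not code from the patch or the driver): each entry maps a requester-ID range to a DeviceID range, so the swapped entries hand 0x5980 to BDF 1:0.0 and leave the Root Port with 0x5981. The struct and function names are made up for illustration, natural_map is constructed for comparison (the page does not give the post-fix DT values), and no msi-map-mask is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MSI_UNMAPPED 0xffffffffu

/* One msi-map entry: <rid-base &msi-controller msi-base length>. */
struct msi_map_entry {
	uint32_t rid_base;
	uint32_t msi_base;
	uint32_t length;
};

/* PCI requester ID layout: bus[15:8], device[7:3], function[2:0]. */
static uint32_t bdf_to_rid(unsigned int bus, unsigned int dev, unsigned int fn)
{
	return ((bus & 0xffu) << 8) | ((dev & 0x1fu) << 3) | (fn & 0x7u);
}

/*
 * DeviceID for a requester ID: an entry covers [rid_base, rid_base + length)
 * and yields rid - rid_base + msi_base. Returns MSI_UNMAPPED when no entry
 * covers the RID, in which case the device has no DeviceID at the ITS.
 * Assumption: no msi-map-mask is applied, as in the quoted snippet.
 */
static uint32_t msi_map_lookup(const struct msi_map_entry *map, size_t n,
			       uint32_t rid)
{
	for (size_t i = 0; i < n; i++) {
		if (rid >= map[i].rid_base &&
		    rid - map[i].rid_base < map[i].length)
			return rid - map[i].rid_base + map[i].msi_base;
	}
	return MSI_UNMAPPED;
}

/* Entries quoted in the commit message (DeviceIDs swapped). */
static const struct msi_map_entry swapped_map[] = {
	{ 0x000, 0x5981, 0x1 },
	{ 0x100, 0x5980, 0x1 },
};

/* Constructed for comparison: the same two IDs in natural order. */
static const struct msi_map_entry natural_map[] = {
	{ 0x000, 0x5980, 0x1 },
	{ 0x100, 0x5981, 0x1 },
};

int main(void)
{
	/* Root Port, first endpoint, and a function no entry covers. */
	const unsigned int bdfs[][3] = { {0, 0, 0}, {1, 0, 0}, {1, 0, 1} };

	for (size_t i = 0; i < 3; i++) {
		uint32_t rid = bdf_to_rid(bdfs[i][0], bdfs[i][1], bdfs[i][2]);

		printf("%u:%u.%u rid=0x%03x swapped=0x%x natural=0x%x\n",
		       bdfs[i][0], bdfs[i][1], bdfs[i][2], (unsigned int)rid,
		       (unsigned int)msi_map_lookup(swapped_map, 2, rid),
		       (unsigned int)msi_map_lookup(natural_map, 2, rid));
	}
	return 0;
}
```

Both maps cover exactly one function per bus, so 1:0.1 resolves to MSI_UNMAPPED in either case.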
Clang enables -Wenum-enum-conversion and -Wenum-compare-conditional
under -Wenum-conversion. A recent change in Clang strengthened these
warnings and they appear frequently in common builds, primarily due to
several instances in common headers but there are quite a few drivers
that have individual instances as well.
include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
508 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
509 | item];
| ~~~~
drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:955:24: warning: conditional expression between different enumeration types ('enum iwl_mac_beacon_flags' and 'enum iwl_mac_beacon_flags_v1') [-Wenum-compare-conditional]
955 | flags |= is_new_rate ? IWL_MAC_BEACON_CCK
| ^ ~~~~~~~~~~~~~~~~~~
956 | : IWL_MAC_BEACON_CCK_V1;
| ~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:1120:21: warning: conditional expression between different enumeration types ('enum iwl_mac_beacon_flags' and 'enum iwl_mac_beacon_flags_v1') [-Wenum-compare-conditional]
1120 | 0) > 10 ?
| ^
1121 | IWL_MAC_BEACON_FILS :
| ~~~~~~~~~~~~~~~~~~~
1122 | IWL_MAC_BEACON_FILS_V1;
| ~~~~~~~~~~~~~~~~~~~~~~
Doing arithmetic between or returning two different types of enums could
be a bug, so each instance of the warning needs to be evaluated.
Unfortunately, as mentioned above, there are many instances of this
warning in many different configurations, which can break the build when
CONFIG_WERROR is enabled.
To avoid introducing new instances of the warnings while cleaning up the
disruption for the majority of users, disable these warnings for the
default build while leaving them on for W=1 builds.
Cc: stable(a)vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2002
Link: https://github.com/llvm/llvm-project/commit/8c2ae42b3e1c6aa7c18f873edcebff7…
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
Changes in v2:
- Only disable the warning for the default build, leave it on for W=1 (Arnd)
- Add Yonghong's ack, as the warning is still disabled for the default
build.
- Link to v1: https://lore.kernel.org/r/20240305-disable-extra-clang-enum-warnings-v1-1-6…
---
scripts/Makefile.extrawarn | 2 ++
1 file changed, 2 insertions(+)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index a9e552a1e910..2f25a1de129d 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -132,6 +132,8 @@ KBUILD_CFLAGS += $(call cc-disable-warning, pointer-to-enum-cast)
KBUILD_CFLAGS += -Wno-tautological-constant-out-of-range-compare
KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
KBUILD_CFLAGS += $(call cc-disable-warning, cast-function-type-strict)
+KBUILD_CFLAGS += -Wno-enum-compare-conditional
+KBUILD_CFLAGS += -Wno-enum-enum-conversion
endif
endif
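Review bot: the two new -Wno- lines carry no comment, so the reason for silencing them and the intent to keep them visible under W=1 exist only in the commit message. Add a short comment above them saying they are disabled for the default build because of the number of existing instances. Both flags are Clang-specific and are added bare, while neighbouring lines use $(call cc-disable-warning, ...); the hunk does not show the enclosing conditional, so confirm this block is only reached when building with Clang.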
---
base-commit: 90d35da658da8cff0d4ecbb5113f5fac9d00eb72
change-id: 20240304-disable-extra-clang-enum-warnings-bf574c7c99fd
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on another descriptor than the
one that was closed(). But there's other wake ups that solve that issue.
When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d16b95ca58a7..c9c898307348 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8393,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
return size;
}
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ return 0;
+}
+
static int tracing_buffers_release(struct inode *inode, struct file *file)
{
struct ftrace_buffer_info *info = file->private_data;
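Review bot: tracing_buffers_flush() does a plain "iter->wait_index++" with no lock or atomic operation. In .release() that ran once, after all users of the file were gone; .flush() runs on every close() of a descriptor that refers to this file, so two threads closing duplicated descriptors can race on the increment. Make the update atomic or document why a lost increment is harmless. A comment above the function explaining why the wake-up lives in .flush() rather than .release() would also spare the next reader a trip to the commit message.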
@@ -8404,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
-
- ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8625,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
+ .flush = tracing_buffers_flush,
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The "shortest_full" variable is used to keep track of the waiter that is
waiting for the smallest amount on the ring buffer before being woken up.
When a task waits on the ring buffer, it passes in a "full" value that is
a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
100% full buffer.
As all waiters are on the same wait queue, the wake up happens for the
waiter with the smallest percentage.
The problem is that the smallest_full on the cpu_buffer that stores the
smallest amount doesn't get reset when all the waiters are woken up. It
does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
This means that tasks may be woken up more often than when they want to
be. Instead, have the shortest_full field get reset just before waking up
all the tasks. If the tasks wait again, they will update the shortest_full
before sleeping.
Also add locking around setting of shortest_full in the poll logic, and
change "work" to "rbwork" to match the variable name for rb_irq_work
structures that are used in other places.
Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.948914369@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3400f11286e3..aa332ace108b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -755,8 +755,19 @@ static void rb_wake_up_waiters(struct irq_work *work)
wake_up_all(&rbwork->waiters);
if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
+ /* Only cpu_buffer sets the above flags */
+ struct ring_buffer_per_cpu *cpu_buffer =
+ container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+
+ /* Called from interrupt context */
+ raw_spin_lock(&cpu_buffer->reader_lock);
rbwork->wakeup_full = false;
rbwork->full_waiters_pending = false;
+
+ /* Waking up all waiters, they will reset the shortest full */
+ cpu_buffer->shortest_full = 0;
+ raw_spin_unlock(&cpu_buffer->reader_lock);
+
wake_up_all(&rbwork->full_waiters);
}
}
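Review bot: the container_of(rbwork, struct ring_buffer_per_cpu, irq_work) is valid only when rbwork is embedded in a per-CPU buffer, and the sole guard is the comment "Only cpu_buffer sets the above flags". The second hunk shows a buffer-wide rb_irq_work as well (&buffer->irq_work); if that one is ever handled by this callback with either flag set, the code takes a spinlock at a bogus address. Name in the comment where the flags are set, or add a sanity check that makes the invariant enforceable.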
@@ -934,28 +945,33 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct rb_irq_work *work;
+ struct rb_irq_work *rbwork;
if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
+ rbwork = &buffer->irq_work;
full = 0;
} else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return EPOLLERR;
cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
+ rbwork = &cpu_buffer->irq_work;
}
if (full) {
- poll_wait(filp, &work->full_waiters, poll_table);
- work->full_waiters_pending = true;
+ unsigned long flags;
+
+ poll_wait(filp, &rbwork->full_waiters, poll_table);
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
} else {
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
}
/*
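Review bot: in the "if (full)" branch the new raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags) dereferences cpu_buffer, which is assigned only in the else arm of the RING_BUFFER_ALL_CPUS check. That is safe only because the other arm forces "full = 0;", and the dependency is implicit. Initialise cpu_buffer to NULL at its declaration or add a comment tying the two together. Separately, full_waiters_pending is now set under reader_lock while waiters_pending in the else branch is still set without it; state in a comment that the asymmetry is intentional.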
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs. One trivial one and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule() which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers to the
ring_buffer_wait() also have their own loop, as the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 139 ++++++++++++++++++-------------------
1 file changed, 68 insertions(+), 71 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0699027b4f4c..3400f11286e3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -384,7 +384,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
- long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -798,14 +797,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}
- rbwork->wait_index++;
- /* make sure the waiters see the new index */
- smp_wmb();
-
/* This can be called in any context */
irq_work_queue(&rbwork->work);
}
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ bool ret = false;
+
+ /* Reads of all CPUs always waits for any data */
+ if (cpu == RING_BUFFER_ALL_CPUS)
+ return !ring_buffer_empty(buffer);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ if (!ring_buffer_empty_cpu(buffer, cpu)) {
+ unsigned long flags;
+ bool pagebusy;
+
+ if (!full)
+ return true;
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+ ret = !pagebusy && full_hit(buffer, cpu, full);
+
+ if (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)
+ cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ }
+ return ret;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
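Review bot: rb_watermark_hit() reads like a pure predicate but also updates cpu_buffer->shortest_full under reader_lock. Because ring_buffer_wait() calls it a second time after finish_wait(), the side effect now also happens when the task is no longer queued as a waiter. Split the shortest_full update out of the check, or document the side effect in a comment above the function. Minor: the comment "Reads of all CPUs always waits for any data" should read "always wait".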
@@ -821,7 +846,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
- long wait_index;
int ret = 0;
/*
@@ -840,81 +864,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
- wait_index = READ_ONCE(work->wait_index);
-
- while (true) {
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
- /*
- * The events can happen in critical sections where
- * checking a work queue can cause deadlocks.
- * After adding a task to the queue, this flag is set
- * only to notify events to try to wake up the queue
- * using irq_work.
- *
- * We don't clear it even if the buffer is no longer
- * empty. The flag only causes the next event to run
- * irq_work to do the work queue wake up. The worse
- * that can happen if we race with !trace_empty() is that
- * an event will cause an irq_work to try to wake up
- * an empty queue.
- *
- * There's no reason to protect this flag either, as
- * the work queue and irq_work logic will do the necessary
- * synchronization for the wake ups. The only thing
- * that is necessary is that the wake up happens after
- * a task has been queued. It's OK for spurious wake ups.
- */
- if (full)
- work->full_waiters_pending = true;
- else
- work->waiters_pending = true;
-
- if (signal_pending(current)) {
- ret = -EINTR;
- break;
- }
-
- if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
- break;
-
- if (cpu != RING_BUFFER_ALL_CPUS &&
- !ring_buffer_empty_cpu(buffer, cpu)) {
- unsigned long flags;
- bool pagebusy;
- bool done;
-
- if (!full)
- break;
-
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- done = !pagebusy && full_hit(buffer, cpu, full);
+ if (full)
+ prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ else
+ prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (done)
- break;
- }
+ /*
+ * The events can happen in critical sections where
+ * checking a work queue can cause deadlocks.
+ * After adding a task to the queue, this flag is set
+ * only to notify events to try to wake up the queue
+ * using irq_work.
+ *
+ * We don't clear it even if the buffer is no longer
+ * empty. The flag only causes the next event to run
+ * irq_work to do the work queue wake up. The worse
+ * that can happen if we race with !trace_empty() is that
+ * an event will cause an irq_work to try to wake up
+ * an empty queue.
+ *
+ * There's no reason to protect this flag either, as
+ * the work queue and irq_work logic will do the necessary
+ * synchronization for the wake ups. The only thing
+ * that is necessary is that the wake up happens after
+ * a task has been queued. It's OK for spurious wake ups.
+ */
+ if (full)
+ work->full_waiters_pending = true;
+ else
+ work->waiters_pending = true;
- schedule();
+ if (rb_watermark_hit(buffer, cpu, full))
+ goto out;
- /* Make sure to see the new wait index */
- smp_rmb();
- if (wait_index != work->wait_index)
- break;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
}
+ schedule();
+ out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);
+ if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ ret = -EINTR;
+
return ret;
}
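Review bot: with the loop removed, ring_buffer_wait() can return 0 after a single schedule() without the watermark having been reached, so correctness now depends on every caller re-checking its condition and looping. The kernel-doc above the function, which this patch does not touch, should state that contract explicitly. The trailing "if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))" also needs a comment: it turns a wake-up with no data and a pending signal into -EINTR, which is not obvious from the expression alone.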
--
2.43.0
The DebugSwap feature of SEV-ES provides a way for confidential guests to use
data breakpoints. However, because the status of the DebugSwap feature is
recorded in the VMSA, enabling it by default invalidates the attestation
signatures. In 6.10 we will introduce a new API to create SEV VMs that
will allow enabling DebugSwap based on what the user tells KVM to do.
Contextually, we will change the legacy KVM_SEV_ES_INIT API to never
enable DebugSwap.
For compatibility with kernels that pre-date the introduction of DebugSwap,
as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
the feature by default. If anybody wants to use it, for now they can enable
the sev_es_debug_swap_enabled module parameter, but this will result in a
warning.
Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/svm/sev.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f760106c31f8..69b37956c1c8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -57,7 +57,7 @@ static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
/* enable/disable SEV-ES DebugSwap support */
-static bool sev_es_debug_swap_enabled = true;
+static bool sev_es_debug_swap_enabled = false;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
#else
#define sev_enabled false
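Review bot: The comment above sev_es_debug_swap_enabled still reads only "enable/disable SEV-ES DebugSwap support" and gives no hint why the default is now off. A line noting that the feature bit is recorded in the VMSA and so changes what the guest's attestation covers would keep a later cleanup from flipping the default back to true.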
@@ -612,8 +612,11 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;
- if (sev_es_debug_swap_enabled)
+ if (sev_es_debug_swap_enabled) {
save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+ pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
+ "This will not work starting with Linux 6.10\n");
+ }
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
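Review bot: The pr_warn_once() text in sev_es_sync_vmsa() is split across two string literals, which makes the message harder to grep for; kernel coding style asks that user-visible strings stay on one line even when that exceeds the column limit. The message would also be more useful if it named the debug_swap module parameter, so that whoever sees the warning knows which setting triggered it.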
--
2.39.1
This is a small set of patches that address build breakage with
allyesconfig / allmodconfig.
This solves some, but not all, build breakage.
The parport fix depends on the previous patch, the rest are independent
fixes.
With v2 there is an extra patch that drops ZONE_DMA support.
It does not fix any build failure, but a nice cleanup.
Cc: Miquel Raynal <miquel.raynal(a)bootlin.com>
To: Maciej W. Rozycki <macro(a)orcam.me.uk>
To: <sparclinux(a)vger.kernel.org>
Cc: <linux-parport(a)lists.infradead.org>
Cc: David S. Miller <davem(a)davemloft.net>
To: Andreas Larsson <andreas(a)gaisler.com>
To: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: <linux-kernel(a)vger.kernel.org>
Changes in v2:
- Added r-b/tested by (thanks to Randy and Maciej)
- Dropped patch for uhci-grlib.c as it is already upstream (Randy)
- Added a few Fixes (Maciej)
- Fixed commit message when dropping GENERIC_ISA_DMA (Maciej)
- Added new patch that drops ZONE_DMA (Maciej)
- Added new patch to fix section mismatch error
In an allmodconfig build I see a lot of:
modpost: "__udelay" [module] has no CRC!
Similar for a handful of other symbols.
Any hint how to get rid of them would be nice.
I have tried to add the prototype to asm-prototypes.h with no luck.
On top of this the link fails, but I assume this is the kernel that grows
too big which is no surprise.
- Link to v1: https://lore.kernel.org/r/20240223-sam-fix-sparc32-all-builds-v1-0-5c60fd5c…
---
Sam Ravnborg (7):
sparc32: Use generic cmpdi2/ucmpdi2 variants
sparc32: Fix build with trapbase
mtd: maps: sun_uflash: Declare uflash_devinit static
sparc32: Do not select ZONE_DMA
sparc32: Do not select GENERIC_ISA_DMA
sparc32: Fix parport build with sparc32
sparc32: Fix section mismatch in leon_pci_grpci
arch/sparc/Kconfig | 7 +-
arch/sparc/include/asm/parport.h | 259 +-----------------------------------
arch/sparc/include/asm/parport_64.h | 256 +++++++++++++++++++++++++++++++++++
arch/sparc/kernel/irq_32.c | 6 +-
arch/sparc/kernel/kernel.h | 8 +-
arch/sparc/kernel/kgdb_32.c | 4 +-
arch/sparc/kernel/leon_pci_grpci1.c | 2 +-
arch/sparc/kernel/leon_pci_grpci2.c | 2 +-
arch/sparc/kernel/leon_smp.c | 6 +-
arch/sparc/kernel/setup_32.c | 4 +-
arch/sparc/lib/Makefile | 4 +-
arch/sparc/lib/cmpdi2.c | 28 ----
arch/sparc/lib/ucmpdi2.c | 20 ---
arch/sparc/mm/srmmu.c | 1 -
drivers/mtd/maps/sun_uflash.c | 2 +-
15 files changed, 284 insertions(+), 325 deletions(-)
---
base-commit: 626db6ee8ee1edac206610db407114aa83b53fd3
change-id: 20240223-sam-fix-sparc32-all-builds-0a0403d6e1b3
Best regards,
--
Sam Ravnborg <sam(a)ravnborg.org>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
    CPU 0                              CPU 1
    -----                              -----
    wait_woken_prepare()
      if (waking)
        woken = true;
      index = wait_index;
                                       wait_woken_set()
                                         waking = true
                                         wait_index++;
                                       ring_buffer_wake_waiters();
    wait_on_pipe()
      ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
    prepare_to_wait();
    if (!condition)
        schedule();
Where the missing condition check is the iter->waking.
Either that condition check needs to be passed to ring_buffer_wait() or
the function needs to be broken up into three parts. This chooses to do
the break up.
Break ring_buffer_wait() into:
ring_buffer_prepare_to_wait();
ring_buffer_wait();
ring_buffer_finish_wait();
Now wait_on_pipe() can have:
    ring_buffer_prepare_to_wait();
    if (!iter->waking)
        ring_buffer_wait();
    ring_buffer_finish_wait();
And this will catch the above race, as the waiter will either see waking,
or already have been woken up.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 4 ++
kernel/trace/ring_buffer.c | 88 ++++++++++++++++++++++++++-----------
kernel/trace/trace.c | 14 +++++-
3 files changed, 78 insertions(+), 28 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..e5b5903cdc21 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,7 +98,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
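Review bot: ring_buffer_wait() keeps its old prototype in this header while its contract changes: it is now only correct when bracketed by ring_buffer_prepare_to_wait() and ring_buffer_finish_wait(), and nothing at the declaration says so. A short comment above the three prototypes would help, and it should also mention that "full" is passed by pointer to the prepare call and that the possibly updated value is what must be handed to the other two.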
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 856d0e5b0da5..fa7090f6b4fc 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -868,29 +868,29 @@ rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
}
/**
- * ring_buffer_wait - wait for input to the ring buffer
+ * ring_buffer_prepare_to_wait - Prepare to wait for data on the ring buffer
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @full: wait until the percentage of pages are available,
+ * if @cpu != RING_BUFFER_ALL_CPUS. It may be updated via this function.
+ * @wait: The wait queue entry.
*
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
+ * This must be called before ring_buffer_wait(). It calls the prepare_to_wait()
+ * on @wait for the necessary wait queue defined by @buffer, @cpu, and @full.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait)
{
struct rb_irq_work *rbwork;
- DEFINE_WAIT(wait);
- int ret = 0;
- rbwork = rb_get_work_queue(buffer, cpu, &full);
+ rbwork = rb_get_work_queue(buffer, cpu, full);
if (IS_ERR(rbwork))
return PTR_ERR(rbwork);
- if (full)
- prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ if (*full)
+ prepare_to_wait(&rbwork->full_waiters, wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
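Review bot: The kerneldoc for ring_buffer_prepare_to_wait() has no Return: line although the function now returns PTR_ERR(rbwork) on failure and 0 otherwise. Please document the return value, and say when @full "may be updated" (the helper writes through the pointer), since callers have to pass that updated value on to ring_buffer_wait() and ring_buffer_finish_wait().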
@@ -912,30 +912,66 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* that is necessary is that the wake up happens after
* a task has been queued. It's OK for spurious wake ups.
*/
- if (full)
+ if (*full)
rbwork->full_waiters_pending = true;
else
rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return 0;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
- }
+/**
+ * ring_buffer_finish_wait - clean up of ring_buffer_prepare_to_wait()
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @wait: The wait queue entry.
+ *
+ * This must be called after ring_buffer_prepare_to_wait(). It cleans up
+ * the @wait for the queue defined by @buffer, @cpu, and @full.
+ */
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait)
+{
+ struct rb_irq_work *rbwork;
+
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (WARN_ON_ONCE(IS_ERR(rbwork)))
+ return;
- schedule();
- out:
if (full)
- finish_wait(&rbwork->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, wait);
else
- finish_wait(&rbwork->waiters, &wait);
+ finish_wait(&rbwork->waiters, wait);
+}
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ *
+ * ring_buffer_prepare_to_wait() must be called before this function
+ * and ring_buffer_finish_wait() must be called after.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ if (rb_watermark_hit(buffer, cpu, full))
+ return 0;
- return ret;
+ if (signal_pending(current))
+ return -EINTR;
+
+ schedule();
+
+ if (!rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ return -EINTR;
+
+ return 0;
}
/**
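Review bot: In ring_buffer_finish_wait(), the early return under WARN_ON_ONCE(IS_ERR(rbwork)) skips finish_wait() altogether, so if the second rb_get_work_queue() lookup ever failed, the caller's on-stack wait entry would stay on the queue after the caller's frame is gone. Doing the lookup once and handing the rbwork (or the wait queue head) from the prepare call to the finish call would remove both the repeated lookup and this path. Separately, ring_buffer_wait() has no way to detect that the prepare call was skipped; a comment on what happens in that case would be worthwhile.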
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a184dbdf8e91..2d6bc6ee8a58 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1984,7 +1984,8 @@ static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
spin_lock(&wait_lock);
if (iter->waking)
woken = true;
- *wait_index = iter->wait_index;
+ if (wait_index)
+ *wait_index = iter->wait_index;
spin_unlock(&wait_lock);
return woken;
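Review bot: wait_woken_prepare() now accepts a NULL wait_index, but nothing at the function says what a NULL argument means. One sentence stating that NULL is for callers that only want the current waking state, and do not intend to call wait_woken_check() afterwards, would make the new call site in wait_on_pipe() easier to follow.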
@@ -2019,13 +2020,22 @@ static void wait_woken_clear(struct trace_iterator *iter)
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct trace_buffer *buffer;
+ DEFINE_WAIT(wait);
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ buffer = iter->array_buffer->buffer;
+
+ ret = ring_buffer_prepare_to_wait(buffer, iter->cpu_file, &full, &wait);
+ if (ret < 0)
+ return ret;
+ if (!wait_woken_prepare(iter, NULL))
+ ret = ring_buffer_wait(buffer, iter->cpu_file, full);
+ ring_buffer_finish_wait(buffer, iter->cpu_file, full, &wait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
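Review bot: The position of the wait_woken_prepare(iter, NULL) test in wait_on_pipe(), after ring_buffer_prepare_to_wait() and before ring_buffer_wait(), is the entire fix, yet the code carries no comment saying so. Please add one explaining that the waking check has to happen after the task is queued, otherwise a later cleanup can move it back above the prepare call and reintroduce the race described in the commit message.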
--
2.43.0
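The ordering argument in the commit message above can be checked with a small userspace model. This is an added example, not code from the patch or from the kernel; struct model, lost_wakeup() and count_lost() are hypothetical names, and the three waker slots are constructed to mirror the CPU 0 / CPU 1 diagram.

```c
#include <stdbool.h>
#include <stdio.h>

/* Constructed single-waiter model; names echo the patch but this is not kernel code. */
struct model {
	bool queued;   /* waiter is on the wait queue */
	bool runnable; /* false once prepare_to_wait() set TASK_INTERRUPTIBLE */
	bool waking;   /* stands in for iter->waking */
};

static void model_prepare_to_wait(struct model *m)
{
	m->queued = true;
	m->runnable = false;
}

static void model_finish_wait(struct model *m)
{
	m->queued = false;
	m->runnable = true;
}

/* Waker side: wait_woken_set() followed by ring_buffer_wake_waiters(). */
static void model_waker(struct model *m)
{
	m->waking = true;
	if (m->queued)
		m->runnable = true; /* a wake-up only reaches tasks already queued */
}

/* Run the waker if this is the slot it was scheduled into. */
static void maybe_wake(struct model *m, int slot, int waker_slot)
{
	if (slot == waker_slot)
		model_waker(m);
}

/*
 * Returns true when the waiter ends up asleep although the waker has
 * already run, i.e. the wake-up was lost.
 * waker_slot: 0 = before the waiter's first step, 1 = between its first
 * and second step, 2 = between its second step and schedule().
 */
bool lost_wakeup(bool prepare_first, int waker_slot)
{
	struct model m = { .queued = false, .runnable = true, .waking = false };
	bool woken, asleep = false;

	maybe_wake(&m, 0, waker_slot);
	if (prepare_first) {
		/* Patched order: queue, then test the condition, then sleep. */
		model_prepare_to_wait(&m);
		maybe_wake(&m, 1, waker_slot);
		woken = m.waking;
		maybe_wake(&m, 2, waker_slot);
		if (!woken)
			asleep = !m.runnable; /* schedule() blocks only if nobody woke us */
		if (!asleep)
			model_finish_wait(&m);
	} else {
		/* Old order: test the condition, then queue, then sleep. */
		woken = m.waking;
		maybe_wake(&m, 1, waker_slot);
		if (!woken) {
			model_prepare_to_wait(&m);
			maybe_wake(&m, 2, waker_slot);
			asleep = !m.runnable;
			if (!asleep)
				model_finish_wait(&m);
		}
	}
	return asleep;
}

/* Number of waker placements (out of three) that lose the wake-up. */
int count_lost(bool prepare_first)
{
	int slot, lost = 0;

	for (slot = 0; slot < 3; slot++)
		lost += lost_wakeup(prepare_first, slot) ? 1 : 0;
	return lost;
}

int main(void)
{
	int slot;

	for (slot = 0; slot < 3; slot++)
		printf("waker in slot %d: check-then-queue lost=%d, queue-then-check lost=%d\n",
		       slot, lost_wakeup(false, slot), lost_wakeup(true, slot));
	return 0;
}
```

Only the placement between the condition check and the queueing loses the wake-up, which is the window shown in the diagram; with the queueing moved first, no placement does.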
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The ring_buffer_wait() needs to be broken into three functions for proper
synchronization from the context of the callers:
ring_buffer_prepare_to_wait()
ring_buffer_wait()
ring_buffer_finish_wait()
To simplify the process, pull out the logic for getting the right work
queue to wait on, as it will be needed for the above functions.
There are three work queues depending on the cpu value.
If cpu == RING_BUFFER_ALL_CPUS, then the main "buffer->irq_work" is used.
Otherwise, the cpu_buffer representing the CPU buffer's irq_work is used.
Create a rb_get_work_queue() helper function to retrieve the proper queue.
Also rename "work" to "rbwork" as the variable points to struct rb_irq_work,
and to be more consistent with the variable naming elsewhere in the file.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 58 +++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 23 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..856d0e5b0da5 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,6 +842,31 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
+static struct rb_irq_work *
+rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ *full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return ERR_PTR(-ENODEV);
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ return rbwork;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
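Review bot: rb_get_work_queue() reads like a plain lookup, but on the RING_BUFFER_ALL_CPUS path it also writes "*full = 0" through the caller's pointer, and it reports failure as ERR_PTR(-ENODEV) rather than NULL. Both deserve a short comment above the function (output parameter, error convention), since the commit message says this helper is about to be shared by three entry points.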
@@ -854,31 +879,18 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
*/
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
{
- struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
DEFINE_WAIT(wait);
- struct rb_irq_work *work;
int ret = 0;
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (IS_ERR(rbwork))
+ return PTR_ERR(rbwork);
if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
@@ -901,9 +913,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
if (rb_watermark_hit(buffer, cpu, full))
goto out;
@@ -916,9 +928,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
schedule();
out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, &wait);
else
- finish_wait(&work->waiters, &wait);
+ finish_wait(&rbwork->waiters, &wait);
if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
ret = -EINTR;
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the tracing_pipe_raw file is closed, if there are readers still
blocked on it, they need to be woken up. Currently a wait_index is used.
When the readers need to be woken, the index is updated and they are all
woken up.
But there is a race where a new reader could be coming in just as the file
is being closed, and it could still block if the wake up happens just
before the reader enters the wait.
Add another field called "waking" and wrap both the waking and wait_index
around a new wait_lock to synchronize them.
When a reader comes in, it will save the current wait_index, but if waking
is set, then it will not block no matter what wait_index is.
After it wakes from the wait, if either the waking is set or the
wait_index is not the same as what it read before, then it will not block.
The waker will set waking and increment the wait_index. For the .flush()
function, it will not clear waking so that all new readers must not block.
There's an ioctl() that kicks all current waiters, but does not care about
new waiters. It will set the waking count back to what it was when it came
in.
There's still a race with the wait_on_pipe() with respect to the
ring_buffer_wait(), but that will be dealt with separately.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240308184007.805898590@goodmis…
- My tests triggered a warning about calling a mutex_lock() after a
prepare_to_wait() that changed the task's state. Convert the affected
mutex over to a spinlock.
include/linux/trace_events.h | 3 +-
kernel/trace/trace.c | 104 +++++++++++++++++++++++++++++------
2 files changed, 89 insertions(+), 18 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..adf8e163a7be 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,7 +103,8 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ int wait_index;
+ int waking; /* set by a waker */
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
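Review bot: The new field comment "set by a waker" describes waking as a flag, while wait_woken_set() and wait_woken_clear() in the same patch increment and decrement it as a counter. Please make the comment say that it is a count of active wakers and that both fields are protected by wait_lock. The narrowing of wait_index from long to int is also not mentioned in the commit message; a word on why would help.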
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..a184dbdf8e91 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,6 +1955,68 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+/*
+ * In order to wake up readers and have them return back to user space,
+ * the iterator has two counters:
+ *
+ * wait_index - always increases every time a waker wakes up the readers.
+ * waking - Set by the waker when waking and cleared afterward.
+ *
+ * Both are protected together with the wait_lock.
+ * When waking, the lock is taken and both indexes are incremented.
+ * The reader will first prepare the wait by taking the lock,
+ * if waking is set, it will sleep regardless of what wait_index is.
+ * Then after it sleeps it checks if wait_index has been updated
+ * and if it has, it will not sleep again.
+ *
+ * Note, if wait_woken_clear() is not called, then all new readers
+ * will not sleep (this happens in closing the file).
+ *
+ * wait_lock needs to be a spinlock and not a mutex as it can be called
+ * after prepare_to_wait(), which changes the task's state.
+ */
+static DEFINE_SPINLOCK(wait_lock);
+
+static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ spin_lock(&wait_lock);
+ if (iter->waking)
+ woken = true;
+ *wait_index = iter->wait_index;
+ spin_unlock(&wait_lock);
+
+ return woken;
+}
+
+static bool wait_woken_check(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ spin_lock(&wait_lock);
+ if (iter->waking || *wait_index != iter->wait_index)
+ woken = true;
+ spin_unlock(&wait_lock);
+
+ return woken;
+}
+
+static void wait_woken_set(struct trace_iterator *iter)
+{
+ spin_lock(&wait_lock);
+ iter->waking++;
+ iter->wait_index++;
+ spin_unlock(&wait_lock);
+}
+
+static void wait_woken_clear(struct trace_iterator *iter)
+{
+ spin_lock(&wait_lock);
+ iter->waking--;
+ spin_unlock(&wait_lock);
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
int ret;
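Review bot: The comment block above wait_lock contradicts both the code and the commit message: it says that if waking is set the reader "will sleep regardless of what wait_index is", whereas wait_woken_prepare() returns woken = true in that case and the commit message says the reader will not block. The same block calls waking something that is "Set ... and cleared" and then says "both indexes are incremented". Since this comment is the only in-tree description of the protocol, it should be corrected to "will not sleep" and should describe waking consistently as a counter.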
@@ -8312,9 +8374,11 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
struct ftrace_buffer_info *info = filp->private_data;
struct trace_iterator *iter = &info->iter;
void *trace_data;
+ int wait_index;
int page_size;
ssize_t ret = 0;
ssize_t size;
+ bool woken;
if (!count)
return 0;
@@ -8353,6 +8417,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (info->read < page_size)
goto read;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
ret = ring_buffer_read_page(iter->array_buffer->buffer,
@@ -8362,7 +8427,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
trace_access_unlock(iter->cpu_file);
if (ret < 0) {
- if (trace_empty(iter)) {
+ if (trace_empty(iter) && !woken) {
if ((filp->f_flags & O_NONBLOCK))
return -EAGAIN;
@@ -8370,6 +8435,8 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (ret)
return ret;
+ woken = wait_woken_check(iter, &wait_index);
+
goto again;
}
return 0;
@@ -8398,12 +8465,14 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+ /*
+ * Do not call wait_woken_clear(), as the file is being closed.
+ * this will prevent any new readers from sleeping.
+ */
return 0;
}
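Review bot: tracing_buffers_flush() leaves waking set permanently on the assumption that "the file is being closed", but .flush() runs on every close() of a descriptor that refers to this struct file, not only the last one. If the descriptor was duplicated (dup() or inherited across fork()), the iterator lives on and every later reader on the remaining descriptor will refuse to block. If that is acceptable, please say so in the comment; otherwise waking needs to be cleared again somewhere for the surviving users.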
@@ -8500,9 +8569,11 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ int wait_index;
int page_size;
int entries, i;
ssize_t ret = 0;
+ bool woken = false;
#ifdef CONFIG_TRACER_MAX_TRACE
if (iter->snapshot && iter->tr->current_trace->use_max_tr)
@@ -8522,6 +8593,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (splice_grow_spd(pipe, &spd))
return -ENOMEM;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
@@ -8573,17 +8645,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
@@ -8592,10 +8664,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ woken = wait_woken_check(iter, &wait_index);
goto again;
}
@@ -8616,15 +8685,16 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
if (cmd)
return -ENOIOCTLCMD;
- mutex_lock(&trace_types_lock);
-
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
- mutex_unlock(&trace_types_lock);
+ /*
+ * This just kicks existing readers, a new reader coming in may
+ * still sleep.
+ */
+ wait_woken_clear(iter);
+
return 0;
}
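Review bot: In tracing_buffers_ioctl(), wait_woken_clear() runs immediately after ring_buffer_wake_waiters(), and elsewhere on this page that function only does irq_work_queue(), so waking can already be back to its old value by the time the woken readers run. Those readers are then caught only by the wait_index comparison in wait_woken_check(). That seems to be the intent, but the new comment talks only about new readers; please also state that existing readers rely on wait_index here, and mention in the commit message why trace_types_lock is no longer needed around this sequence.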
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on another descriptor than the
one that was closed(). But there's other wake ups that solve that issue.
When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d16b95ca58a7..c9c898307348 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8393,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
return size;
}
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ return 0;
+}
+
static int tracing_buffers_release(struct inode *inode, struct file *file)
{
struct ftrace_buffer_info *info = file->private_data;
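Review bot: tracing_buffers_flush() increments iter->wait_index with no lock held, while the ioctl path touching the same field (visible in the follow-up patch on this page) did its increment under trace_types_lock. With the wake-up moved to .flush(), which can run concurrently from several closers, the two paths should agree on how the field is protected. A comment explaining why the smp_wmb() alone is sufficient, and why the wake-up belongs in .flush() rather than .release(), would also help the next reader of tracing_buffers_fops.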
@@ -8404,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
-
- ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8625,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
+ .flush = tracing_buffers_flush,
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The "shortest_full" variable is used to keep track of the waiter that is
waiting for the smallest amount on the ring buffer before being woken up.
When a task waits on the ring buffer, it passes in a "full" value that is
a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
100% full buffer.
As all waiters are on the same wait queue, the wake up happens for the
waiter with the smallest percentage.
The problem is that the smallest_full on the cpu_buffer that stores the
smallest amount doesn't get reset when all the waiters are woken up. It
does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
This means that tasks may be woken up more often than when they want to
be. Instead, have the shortest_full field get reset just before waking up
all the tasks. If the tasks wait again, they will update the shortest_full
before sleeping.
Also add locking around setting of shortest_full in the poll logic, and
change "work" to "rbwork" to match the variable name for rb_irq_work
structures that are used in other places.
Link: https://lore.kernel.org/linux-trace-kernel/20240308184007.485732758@goodmis…
Cc: stable(a)vger.kernel.org
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3400f11286e3..aa332ace108b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -755,8 +755,19 @@ static void rb_wake_up_waiters(struct irq_work *work)
wake_up_all(&rbwork->waiters);
if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
+ /* Only cpu_buffer sets the above flags */
+ struct ring_buffer_per_cpu *cpu_buffer =
+ container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+
+ /* Called from interrupt context */
+ raw_spin_lock(&cpu_buffer->reader_lock);
rbwork->wakeup_full = false;
rbwork->full_waiters_pending = false;
+
+ /* Waking up all waiters, they will reset the shortest full */
+ cpu_buffer->shortest_full = 0;
+ raw_spin_unlock(&cpu_buffer->reader_lock);
+
wake_up_all(&rbwork->full_waiters);
}
}
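Review bot: The container_of(rbwork, struct ring_buffer_per_cpu, irq_work) in rb_wake_up_waiters() is only valid when rbwork is embedded in a per-cpu buffer, and the sole guard is the comment "Only cpu_buffer sets the above flags". The top-level buffer->irq_work uses the same struct rb_irq_work type, so if either flag were ever set on it this would compute a bogus cpu_buffer pointer and take a lock inside it. Consider making the assumption checkable, for example by warning when the work item is the buffer-wide one, instead of relying on the comment.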
@@ -934,28 +945,33 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct rb_irq_work *work;
+ struct rb_irq_work *rbwork;
if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
+ rbwork = &buffer->irq_work;
full = 0;
} else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return EPOLLERR;
cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
+ rbwork = &cpu_buffer->irq_work;
}
if (full) {
- poll_wait(filp, &work->full_waiters, poll_table);
- work->full_waiters_pending = true;
+ unsigned long flags;
+
+ poll_wait(filp, &rbwork->full_waiters, poll_table);
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
} else {
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
}
/*
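Review bot: In ring_buffer_poll_wait(), the new locked section under "if (full)" dereferences cpu_buffer, which is assigned only in the per-cpu else branch above. That is safe only because the RING_BUFFER_ALL_CPUS branch forces full to 0, a dependency that is easy to miss and that some compilers will flag as maybe-uninitialized. Initialising cpu_buffer to NULL at its declaration, or a one-line comment at the "if (full)" test, would make the invariant explicit. It is also worth stating why waiters_pending in the else branch stays outside the lock while full_waiters_pending moves inside it.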
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs. One trivial one and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule() which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers to the
ring_buffer_wait() also have their own loop, as the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 139 ++++++++++++++++++-------------------
1 file changed, 68 insertions(+), 71 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0699027b4f4c..3400f11286e3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -384,7 +384,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
- long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -798,14 +797,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}
- rbwork->wait_index++;
- /* make sure the waiters see the new index */
- smp_wmb();
-
/* This can be called in any context */
irq_work_queue(&rbwork->work);
}
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ bool ret = false;
+
+ /* Reads of all CPUs always waits for any data */
+ if (cpu == RING_BUFFER_ALL_CPUS)
+ return !ring_buffer_empty(buffer);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ if (!ring_buffer_empty_cpu(buffer, cpu)) {
+ unsigned long flags;
+ bool pagebusy;
+
+ if (!full)
+ return true;
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+ ret = !pagebusy && full_hit(buffer, cpu, full);
+
+ if (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)
+ cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ }
+ return ret;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
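Review bot: rb_watermark_hit() reads like a pure predicate, but in the "full" branch it also lowers cpu_buffer->shortest_full under reader_lock, and this patch calls it twice per ring_buffer_wait() (before schedule() and again after finish_wait()). A short comment above the function should say that it records the caller's watermark for the writer side, and that the caller must already have validated cpu, since it indexes buffer->buffers[cpu] without a cpumask check.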
@@ -821,7 +846,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
- long wait_index;
int ret = 0;
/*
@@ -840,81 +864,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
- wait_index = READ_ONCE(work->wait_index);
-
- while (true) {
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
- /*
- * The events can happen in critical sections where
- * checking a work queue can cause deadlocks.
- * After adding a task to the queue, this flag is set
- * only to notify events to try to wake up the queue
- * using irq_work.
- *
- * We don't clear it even if the buffer is no longer
- * empty. The flag only causes the next event to run
- * irq_work to do the work queue wake up. The worse
- * that can happen if we race with !trace_empty() is that
- * an event will cause an irq_work to try to wake up
- * an empty queue.
- *
- * There's no reason to protect this flag either, as
- * the work queue and irq_work logic will do the necessary
- * synchronization for the wake ups. The only thing
- * that is necessary is that the wake up happens after
- * a task has been queued. It's OK for spurious wake ups.
- */
- if (full)
- work->full_waiters_pending = true;
- else
- work->waiters_pending = true;
-
- if (signal_pending(current)) {
- ret = -EINTR;
- break;
- }
-
- if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
- break;
-
- if (cpu != RING_BUFFER_ALL_CPUS &&
- !ring_buffer_empty_cpu(buffer, cpu)) {
- unsigned long flags;
- bool pagebusy;
- bool done;
-
- if (!full)
- break;
-
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- done = !pagebusy && full_hit(buffer, cpu, full);
+ if (full)
+ prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ else
+ prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (done)
- break;
- }
+ /*
+ * The events can happen in critical sections where
+ * checking a work queue can cause deadlocks.
+ * After adding a task to the queue, this flag is set
+ * only to notify events to try to wake up the queue
+ * using irq_work.
+ *
+ * We don't clear it even if the buffer is no longer
+ * empty. The flag only causes the next event to run
+ * irq_work to do the work queue wake up. The worse
+ * that can happen if we race with !trace_empty() is that
+ * an event will cause an irq_work to try to wake up
+ * an empty queue.
+ *
+ * There's no reason to protect this flag either, as
+ * the work queue and irq_work logic will do the necessary
+ * synchronization for the wake ups. The only thing
+ * that is necessary is that the wake up happens after
+ * a task has been queued. It's OK for spurious wake ups.
+ */
+ if (full)
+ work->full_waiters_pending = true;
+ else
+ work->waiters_pending = true;
- schedule();
+ if (rb_watermark_hit(buffer, cpu, full))
+ goto out;
- /* Make sure to see the new wait index */
- smp_rmb();
- if (wait_index != work->wait_index)
- break;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
}
+ schedule();
+ out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);
+ if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ ret = -EINTR;
+
return ret;
}
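Review bot: with the loop gone, ring_buffer_wait() returns 0 after a single wakeup even when the watermark was not hit and no signal is pending (the final check after finish_wait() only turns that case into -EINTR when signal_pending() is true), so a 0 return no longer means data is available. The kernel-doc above the function should state that it blocks at most once and that callers must re-check their own condition on a 0 return; the commit message says the callers already loop, but that contract is not written at the function.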
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the tracing_pipe_raw file is closed, if there are readers still
blocked on it, they need to be woken up. Currently a wait_index is used.
When the readers need to be woken, the index is updated and they are all
woken up.
But there is a race where a new reader could be coming in just as the file
is being closed, and it could still block if the wake up happens just
before the reader enters the wait.
Add another field called "waking" and wrap both the waking and wait_index
around a new wait_mutex to synchronize them.
When a reader comes in, it will save the current wait_index, but if waking
is set, then it will not block no matter what wait_index is.
After it wakes from the wait, if either the waking is set or the
wait_index is not the same as what it read before, then it will not block.
The waker will set waking and increment the wait_index. For the .flush()
function, it will not clear waking so that all new readers must not block.
There's an ioctl() that kicks all current waiters, but does not care about
new waiters. It will set the waking count back to what it was when it came
in.
There's still a race with the wait_on_pipe() with respect to the
ring_buffer_wait(), but that will be dealt with separately.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/trace_events.h | 3 +-
kernel/trace/trace.c | 101 +++++++++++++++++++++++++++++------
2 files changed, 86 insertions(+), 18 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..adf8e163a7be 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,7 +103,8 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ int wait_index;
+ int waking; /* set by a waker */
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
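Review bot: the comment on the new waking field says "set by a waker", but wait_woken_set() and wait_woken_clear() in trace.c use it as a counter (waking++ / waking--), and neither field notes that it is protected by wait_mutex. Suggest "count of wakers in progress, protected by wait_mutex" with a matching note on wait_index. The change of wait_index from long to int is also not mentioned in the commit message.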
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..4e8f6cdeafd5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,6 +1955,65 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+/*
+ * In order to wake up readers and have them return back to user space,
+ * the iterator has two counters:
+ *
+ * wait_index - always increases every time a waker wakes up the readers.
+ * waking - Set by the waker when waking and cleared afterward.
+ *
+ * Both are protected together with the wait_mutex.
+ * When waking, the lock is taken and both indexes are incremented.
+ * The reader will first prepare the wait by taking the lock,
+ * if waking is set, it will sleep regardless of what wait_index is.
+ * Then after it sleeps it checks if wait_index has been updated
+ * and if it has, it will not sleep again.
+ *
+ * Note, if wait_woken_clear() is not called, then all new readers
+ * will not sleep (this happens in closing the file).
+ */
+static DEFINE_MUTEX(wait_mutex);
+
+static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ mutex_lock(&wait_mutex);
+ if (iter->waking)
+ woken = true;
+ *wait_index = iter->wait_index;
+ mutex_unlock(&wait_mutex);
+
+ return woken;
+}
+
+static bool wait_woken_check(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ mutex_lock(&wait_mutex);
+ if (iter->waking || *wait_index != iter->wait_index)
+ woken = true;
+ mutex_unlock(&wait_mutex);
+
+ return woken;
+}
+
+static void wait_woken_set(struct trace_iterator *iter)
+{
+ mutex_lock(&wait_mutex);
+ iter->waking++;
+ iter->wait_index++;
+ mutex_unlock(&wait_mutex);
+}
+
+static void wait_woken_clear(struct trace_iterator *iter)
+{
+ mutex_lock(&wait_mutex);
+ iter->waking--;
+ mutex_unlock(&wait_mutex);
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
int ret;
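Review bot: the block comment above wait_mutex says "if waking is set, it will sleep regardless of what wait_index is", which is the opposite of what wait_woken_prepare() does and of what the commit message describes (the reader must not block when waking is set); it should read "it will not sleep". The same comment says "both indexes are incremented" while describing waking as "Set by the waker when waking and cleared afterward"; describe it as a counter to match waking++ / waking--. wait_mutex is also a single file-scope mutex shared by every iterator, which is worth a word in the comment.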
@@ -8312,9 +8371,11 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
struct ftrace_buffer_info *info = filp->private_data;
struct trace_iterator *iter = &info->iter;
void *trace_data;
+ int wait_index;
int page_size;
ssize_t ret = 0;
ssize_t size;
+ bool woken;
if (!count)
return 0;
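Review bot: "bool woken;" is declared without an initializer here, while tracing_buffers_splice_read() in this same patch uses "bool woken = false;". The assignment from wait_woken_prepare() sits after the "if (info->read < page_size) goto read;" jump, so on that path woken is never written. Initializing it to false at the declaration keeps the two functions consistent and does not rely on every later use being unreachable from the read label.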
@@ -8353,6 +8414,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (info->read < page_size)
goto read;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
ret = ring_buffer_read_page(iter->array_buffer->buffer,
@@ -8362,7 +8424,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
trace_access_unlock(iter->cpu_file);
if (ret < 0) {
- if (trace_empty(iter)) {
+ if (trace_empty(iter) && !woken) {
if ((filp->f_flags & O_NONBLOCK))
return -EAGAIN;
@@ -8370,6 +8432,8 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (ret)
return ret;
+ woken = wait_woken_check(iter, &wait_index);
+
goto again;
}
return 0;
@@ -8398,12 +8462,14 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+ /*
+ * Do not call wait_woken_clear(), as the file is being closed.
+ * this will prevent any new readers from sleeping.
+ */
return 0;
}
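Review bot: leaving iter->waking raised in tracing_buffers_flush() makes the "no new reader may sleep" state permanent for this iterator, and .flush() is called on each close() of a descriptor for the file, not only the final one. If the file is still reachable through a duplicated or inherited descriptor after one close(), every later read on it stops blocking; the new comment should state whether that is intended. waking is also incremented again on every such call with no matching decrement.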
@@ -8500,9 +8566,11 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ int wait_index;
int page_size;
int entries, i;
ssize_t ret = 0;
+ bool woken = false;
#ifdef CONFIG_TRACER_MAX_TRACE
if (iter->snapshot && iter->tr->current_trace->use_max_tr)
@@ -8522,6 +8590,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (splice_grow_spd(pipe, &spd))
return -ENOMEM;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
@@ -8573,17 +8642,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
@@ -8592,10 +8661,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ woken = wait_woken_check(iter, &wait_index);
goto again;
}
@@ -8616,15 +8682,16 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
if (cmd)
return -ENOIOCTLCMD;
- mutex_lock(&trace_types_lock);
-
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
- mutex_unlock(&trace_types_lock);
+ /*
+ * This just kicks existing readers, a new reader coming in may
+ * still sleep.
+ */
+ wait_woken_clear(iter);
+
return 0;
}
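Review bot: the removal of mutex_lock(&trace_types_lock) / mutex_unlock(&trace_types_lock) around the wake-up in tracing_buffers_ioctl() is not explained in the commit message. State why it is no longer needed (the counters are now covered by wait_mutex inside wait_woken_set() and wait_woken_clear()), and extend the new comment to say that a reader woken here which only reaches wait_woken_check() after wait_woken_clear() still exits because wait_index has changed.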
--
2.43.0
From: Vitor Soares <vitor.soares(a)toradex.com>
When the mcp251xfd_start_xmit() function fails, the driver stops
processing messages and the interrupt routine does not return,
running indefinitely even after killing the running application.
Error messages:
[ 441.298819] mcp251xfd spi2.0 can0: ERROR in mcp251xfd_start_xmit: -16
[ 441.306498] mcp251xfd spi2.0 can0: Transmit Event FIFO buffer not empty. (seq=0x000017c7, tef_tail=0x000017cf, tef_head=0x000017d0, tx_head=0x000017d3).
... and repeat forever.
The issue can be triggered when multiple devices share the same
SPI interface. And there is concurrent access to the bus.
The problem occurs because tx_ring->head increments even if
mcp251xfd_start_xmit() fails. Consequently, the driver skips one
TX package while still expecting a response in
mcp251xfd_handle_tefif_one().
This patch resolves the issue by decreasing tx_ring->head if
mcp251xfd_start_xmit() fails. With the fix, if we trigger the issue and
the err = -EBUSY, the driver returns NETDEV_TX_BUSY. The network stack
retries to transmit the message.
Otherwise, it prints an error and discards the message.
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vitor Soares <vitor.soares(a)toradex.com>
---
V1->V2:
- Return NETDEV_TX_BUSY if mcp251xfd_tx_obj_write() == -EBUSY
- Rework the commit message to address the change above
- Change can_put_echo_skb() to be called after mcp251xfd_tx_obj_write() succeeds. Otherwise, we get Kernel NULL pointer dereference error.
drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c | 29 +++++++++++---------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
index 160528d3cc26..0fdaececebdd 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
@@ -181,25 +181,28 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
tx_obj = mcp251xfd_get_tx_obj_next(tx_ring);
mcp251xfd_tx_obj_from_skb(priv, tx_obj, skb, tx_ring->head);
- /* Stop queue if we occupy the complete TX FIFO */
tx_head = mcp251xfd_get_tx_head(tx_ring);
- tx_ring->head++;
- if (mcp251xfd_get_tx_free(tx_ring) == 0)
- netif_stop_queue(ndev);
-
frame_len = can_skb_get_frame_len(skb);
- err = can_put_echo_skb(skb, ndev, tx_head, frame_len);
- if (!err)
- netdev_sent_queue(priv->ndev, frame_len);
+
+ tx_ring->head++;
err = mcp251xfd_tx_obj_write(priv, tx_obj);
- if (err)
- goto out_err;
+ if (err) {
+ tx_ring->head--;
- return NETDEV_TX_OK;
+ if (err == -EBUSY)
+ return NETDEV_TX_BUSY;
- out_err:
- netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ } else {
+ can_put_echo_skb(skb, ndev, tx_head, frame_len);
+
+ /* Stop queue if we occupy the complete TX FIFO */
+ if (mcp251xfd_get_tx_free(tx_ring) == 0)
+ netif_stop_queue(ndev);
+
+ netdev_sent_queue(priv->ndev, frame_len);
+ }
return NETDEV_TX_OK;
}
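Review bot: in the new error branch, when mcp251xfd_tx_obj_write() fails with anything other than -EBUSY the function logs the error and falls through to "return NETDEV_TX_OK" without the skb ever being handed to can_put_echo_skb() or freed, so the "discards the message" path from the commit message appears to leak the skb; free the skb in that branch. Separately, the return value of can_put_echo_skb() is now ignored and netdev_sent_queue() is called unconditionally, whereas the old code only called it when can_put_echo_skb() returned 0.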
--
2.34.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
      CPU 0                               CPU 1
      -----                               -----
   wait_woken_prepare()
      if (waking)
         woken = true;
      index = wait_index;
                                      wait_woken_set()
                                         waking = true
                                         wait_index++;
                                      ring_buffer_wake_waiters();
   wait_on_pipe()
      ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
        prepare_to_wait();
        if (!condition)
                schedule();
Where the missing condition check is the iter->waking.
Either that condition check needs to be passed to ring_buffer_wait() or
the function needs to be broken up into three parts. This chooses to do
the break up.
Break ring_buffer_wait() into:
        ring_buffer_prepare_to_wait();
        ring_buffer_wait();
        ring_buffer_finish_wait();
Now wait_on_pipe() can have:
        ring_buffer_prepare_to_wait();
        if (!iter->waking)
                ring_buffer_wait();
        ring_buffer_finish_wait();
And this will catch the above race, as the waiter will either see waking,
or already have been woken up.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 4 ++
kernel/trace/ring_buffer.c | 88 ++++++++++++++++++++++++++-----------
kernel/trace/trace.c | 14 +++++-
3 files changed, 78 insertions(+), 28 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..e5b5903cdc21 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,7 +98,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 856d0e5b0da5..fa7090f6b4fc 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -868,29 +868,29 @@ rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
}
/**
- * ring_buffer_wait - wait for input to the ring buffer
+ * ring_buffer_prepare_to_wait - Prepare to wait for data on the ring buffer
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @full: wait until the percentage of pages are available,
+ * if @cpu != RING_BUFFER_ALL_CPUS. It may be updated via this function.
+ * @wait: The wait queue entry.
*
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
+ * This must be called before ring_buffer_wait(). It calls the prepare_to_wait()
+ * on @wait for the necessary wait queue defined by @buffer, @cpu, and @full.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait)
{
struct rb_irq_work *rbwork;
- DEFINE_WAIT(wait);
- int ret = 0;
- rbwork = rb_get_work_queue(buffer, cpu, &full);
+ rbwork = rb_get_work_queue(buffer, cpu, full);
if (IS_ERR(rbwork))
return PTR_ERR(rbwork);
- if (full)
- prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ if (*full)
+ prepare_to_wait(&rbwork->full_waiters, wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
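Review bot: the kernel-doc for ring_buffer_prepare_to_wait() says @full "may be updated via this function" without saying how, and it has no description of the return value. From rb_get_work_queue() the update is *full = 0 when @cpu == RING_BUFFER_ALL_CPUS and the error is -ENODEV for a CPU not in buffer->cpumask; document both, plus a line that on error nothing was queued and ring_buffer_finish_wait() must not be called.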
@@ -912,30 +912,66 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* that is necessary is that the wake up happens after
* a task has been queued. It's OK for spurious wake ups.
*/
- if (full)
+ if (*full)
rbwork->full_waiters_pending = true;
else
rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return 0;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
- }
+/**
+ * ring_buffer_finish_wait - clean up of ring_buffer_prepare_to_wait()
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @wait: The wait queue entry.
+ *
+ * This must be called after ring_buffer_prepare_to_wait(). It cleans up
+ * the @wait for the queue defined by @buffer, @cpu, and @full.
+ */
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait)
+{
+ struct rb_irq_work *rbwork;
+
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (WARN_ON_ONCE(IS_ERR(rbwork)))
+ return;
- schedule();
- out:
if (full)
- finish_wait(&rbwork->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, wait);
else
- finish_wait(&rbwork->waiters, &wait);
+ finish_wait(&rbwork->waiters, wait);
+}
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ *
+ * ring_buffer_prepare_to_wait() must be called before this function
+ * and ring_buffer_finish_wait() must be called after.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ if (rb_watermark_hit(buffer, cpu, full))
+ return 0;
- return ret;
+ if (signal_pending(current))
+ return -EINTR;
+
+ schedule();
+
+ if (!rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ return -EINTR;
+
+ return 0;
}
/**
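Review bot: ring_buffer_wait() keeps its name and signature but now calls schedule() without putting the task on a wait queue itself, so any caller not converted to the prepare/wait/finish sequence would sleep with nothing to wake it. Only wait_on_pipe() is converted in this patch; either rename the function so an unconverted caller fails to build, or confirm in the commit message that there are no others. In ring_buffer_finish_wait(), the WARN_ON_ONCE(IS_ERR(rbwork)) early return also skips finish_wait() and leaves @wait queued, which deserves a comment that it cannot happen after a successful prepare.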
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e8f6cdeafd5..790ce3ba2acb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1981,7 +1981,8 @@ static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
mutex_lock(&wait_mutex);
if (iter->waking)
woken = true;
- *wait_index = iter->wait_index;
+ if (wait_index)
+ *wait_index = iter->wait_index;
mutex_unlock(&wait_mutex);
return woken;
@@ -2016,13 +2017,22 @@ static void wait_woken_clear(struct trace_iterator *iter)
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct trace_buffer *buffer;
+ DEFINE_WAIT(wait);
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ buffer = iter->array_buffer->buffer;
+
+ ret = ring_buffer_prepare_to_wait(buffer, iter->cpu_file, &full, &wait);
+ if (ret < 0)
+ return ret;
+ if (!wait_woken_prepare(iter, NULL))
+ ret = ring_buffer_wait(buffer, iter->cpu_file, full);
+ ring_buffer_finish_wait(buffer, iter->cpu_file, full, &wait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
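Review bot: wait_on_pipe() calls wait_woken_prepare(iter, NULL) after ring_buffer_prepare_to_wait() has already set the task state to TASK_INTERRUPTIBLE, and wait_woken_prepare() takes wait_mutex with mutex_lock(), a sleeping lock. Blocking on the mutex in that window returns with the task running again, so the following schedule() may not sleep, and it is the pattern the scheduler's debug checks warn about. Read iter->waking here without a sleeping lock, or use a wait primitive that takes the condition as an argument.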
--
2.43.0
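The race in the commit message above, modelled as an added example (not code from the patch or the kernel): a single-threaded C simulation that enumerates every interleaving of one reader and one waker and counts the runs in which the reader ends up asleep with no wakeup left. The struct, enum and function names are assumptions made for the model, and each line of the diagram is treated as one atomic step.

```c
#include <stdbool.h>
#include <stdio.h>

/* Which of the two reader orderings from the commit message to model. */
enum reader_order {
	CHECK_THEN_QUEUE,	/* test waking, then prepare_to_wait() + schedule() */
	QUEUE_THEN_CHECK,	/* prepare_to_wait(), test waking, then schedule() */
};

/* Model state (hypothetical names; one reader, one waker). */
struct sim {
	bool waking;		/* stands in for iter->waking */
	bool queued;		/* reader is on the wait queue */
	bool runnable;		/* false once prepare_to_wait() marked it sleeping */
	bool saw_waking;	/* result of the reader's check of waking */
	bool asleep;		/* reader entered schedule() while not runnable */
};

/* Reader: three steps (check, queue, schedule); order depends on the variant. */
static void reader_step(struct sim *s, enum reader_order order, int step)
{
	int check_at = (order == CHECK_THEN_QUEUE) ? 0 : 1;
	int queue_at = (order == CHECK_THEN_QUEUE) ? 1 : 0;

	if (step == check_at) {
		s->saw_waking = s->waking;
	} else if (step == queue_at) {
		/* A reader that already saw waking returns without queueing. */
		if (s->saw_waking)
			return;
		s->queued = true;
		s->runnable = false;
	} else if (!s->saw_waking && !s->runnable) {
		s->asleep = true;	/* schedule() really blocks */
	}
}

/* Waker: step 0 sets waking, step 1 wakes whoever is queued at that moment. */
static void waker_step(struct sim *s, int step)
{
	if (step == 0) {
		s->waking = true;
	} else if (s->queued) {
		s->runnable = true;
		s->asleep = false;
	}
}

/*
 * Run all 10 interleavings of the 3 reader steps and 2 waker steps
 * (program order kept on each side) and count the runs that end with
 * the reader asleep after the waker has finished: a lost wakeup.
 */
static int count_lost_wakeups(enum reader_order order)
{
	int lost = 0;

	for (unsigned int mask = 0; mask < 32; mask++) {
		int bits = 0;

		for (int i = 0; i < 5; i++)
			bits += (mask >> i) & 1;
		if (bits != 2)
			continue;	/* exactly two slots belong to the waker */

		struct sim s = { .runnable = true };
		int r = 0, w = 0;

		for (int slot = 0; slot < 5; slot++) {
			if ((mask >> slot) & 1)
				waker_step(&s, w++);
			else
				reader_step(&s, order, r++);
		}
		if (s.asleep)
			lost++;
	}
	return lost;
}

int main(void)
{
	printf("check then queue: %d lost wakeup(s)\n",
	       count_lost_wakeups(CHECK_THEN_QUEUE));
	printf("queue then check: %d lost wakeup(s)\n",
	       count_lost_wakeups(QUEUE_THEN_CHECK));
	return 0;
}
```

The single losing run for the check-first ordering is the one drawn in the diagram: check, set waking, wake, queue, schedule.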
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The ring_buffer_wait() needs to be broken into three functions for proper
synchronization from the context of the callers:
        ring_buffer_prepare_to_wait()
        ring_buffer_wait()
        ring_buffer_finish_wait()
To simplify the process, pull out the logic for getting the right work
queue to wait on, as it will be needed for the above functions.
There are three work queues depending on the cpu value.
If cpu == RING_BUFFER_ALL_CPUS, then the main "buffer->irq_work" is used.
Otherwise, the cpu_buffer representing the CPU buffer's irq_work is used.
Create a rb_get_work_queue() helper function to retrieve the proper queue.
Also rename "work" to "rbwork" as the variable points to struct rb_irq_work,
and to be more consistent with the variable naming elsewhere in the file.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 58 +++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 23 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..856d0e5b0da5 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,6 +842,31 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
+static struct rb_irq_work *
+rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ *full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return ERR_PTR(-ENODEV);
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ return rbwork;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
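Review bot: rb_get_work_queue() has two behaviours beyond its return value that are not documented: it clears *full for RING_BUFFER_ALL_CPUS, and it returns ERR_PTR(-ENODEV) rather than NULL for a CPU outside buffer->cpumask. A short comment above the helper stating both would help, since callers must test with IS_ERR() and must pass the possibly modified full on to their later checks. The commit message also speaks of three work queues while the code selects between two.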
@@ -854,31 +879,18 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
*/
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
{
- struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
DEFINE_WAIT(wait);
- struct rb_irq_work *work;
int ret = 0;
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (IS_ERR(rbwork))
+ return PTR_ERR(rbwork);
if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
@@ -901,9 +913,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
if (rb_watermark_hit(buffer, cpu, full))
goto out;
@@ -916,9 +928,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
schedule();
out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, &wait);
else
- finish_wait(&work->waiters, &wait);
+ finish_wait(&rbwork->waiters, &wait);
if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
ret = -EINTR;
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on another descriptor than the
one that was closed(). But there's other wake ups that solve that issue.
When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d16b95ca58a7..c9c898307348 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8393,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
return size;
}
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ return 0;
+}
+
static int tracing_buffers_release(struct inode *inode, struct file *file)
{
struct ftrace_buffer_info *info = file->private_data;
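Review bot: iter->wait_index++ in the new tracing_buffers_flush() is a plain read-modify-write with no lock, while tracing_buffers_ioctl() updates the same field under trace_types_lock (visible in the ioctl hunk elsewhere in this series), so a concurrent flush and ioctl on the same file can lose an increment. Either take the same lock around the increment and the wake-up, or say in a comment why a lost increment is harmless.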
@@ -8404,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
-
- ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8625,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
+ .flush = tracing_buffers_flush,
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs. One trivial one and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule() which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers to the
ring_buffer_wait() also have their own loop, as the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 139 ++++++++++++++++++-------------------
1 file changed, 68 insertions(+), 71 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0699027b4f4c..3400f11286e3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -384,7 +384,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
- long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -798,14 +797,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}
- rbwork->wait_index++;
- /* make sure the waiters see the new index */
- smp_wmb();
-
/* This can be called in any context */
irq_work_queue(&rbwork->work);
}
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ bool ret = false;
+
+ /* Reads of all CPUs always waits for any data */
+ if (cpu == RING_BUFFER_ALL_CPUS)
+ return !ring_buffer_empty(buffer);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ if (!ring_buffer_empty_cpu(buffer, cpu)) {
+ unsigned long flags;
+ bool pagebusy;
+
+ if (!full)
+ return true;
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+ ret = !pagebusy && full_hit(buffer, cpu, full);
+
+ if (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)
+ cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ }
+ return ret;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
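Review bot: rb_watermark_hit() reads like a pure predicate, but inside the reader_lock section it also lowers cpu_buffer->shortest_full to the caller's "full" value, and nothing at the definition says so. Add a short comment above the function stating that it registers the caller's watermark as a side effect, or move the shortest_full update into a separate helper so the check can be made without modifying state. Minor: the comment "Reads of all CPUs always waits for any data" should read "always wait". Separately, with the wait_index increment removed, ring_buffer_wake_waiters() only queues the irq_work, so a woken waiter has nothing in the ring buffer to distinguish a close from any other wakeup. The kerneldoc should say that callers are expected to recheck their own exit condition.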
@@ -821,7 +846,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
- long wait_index;
int ret = 0;
/*
@@ -840,81 +864,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
- wait_index = READ_ONCE(work->wait_index);
-
- while (true) {
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
- /*
- * The events can happen in critical sections where
- * checking a work queue can cause deadlocks.
- * After adding a task to the queue, this flag is set
- * only to notify events to try to wake up the queue
- * using irq_work.
- *
- * We don't clear it even if the buffer is no longer
- * empty. The flag only causes the next event to run
- * irq_work to do the work queue wake up. The worse
- * that can happen if we race with !trace_empty() is that
- * an event will cause an irq_work to try to wake up
- * an empty queue.
- *
- * There's no reason to protect this flag either, as
- * the work queue and irq_work logic will do the necessary
- * synchronization for the wake ups. The only thing
- * that is necessary is that the wake up happens after
- * a task has been queued. It's OK for spurious wake ups.
- */
- if (full)
- work->full_waiters_pending = true;
- else
- work->waiters_pending = true;
-
- if (signal_pending(current)) {
- ret = -EINTR;
- break;
- }
-
- if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
- break;
-
- if (cpu != RING_BUFFER_ALL_CPUS &&
- !ring_buffer_empty_cpu(buffer, cpu)) {
- unsigned long flags;
- bool pagebusy;
- bool done;
-
- if (!full)
- break;
-
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- done = !pagebusy && full_hit(buffer, cpu, full);
+ if (full)
+ prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ else
+ prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (done)
- break;
- }
+ /*
+ * The events can happen in critical sections where
+ * checking a work queue can cause deadlocks.
+ * After adding a task to the queue, this flag is set
+ * only to notify events to try to wake up the queue
+ * using irq_work.
+ *
+ * We don't clear it even if the buffer is no longer
+ * empty. The flag only causes the next event to run
+ * irq_work to do the work queue wake up. The worse
+ * that can happen if we race with !trace_empty() is that
+ * an event will cause an irq_work to try to wake up
+ * an empty queue.
+ *
+ * There's no reason to protect this flag either, as
+ * the work queue and irq_work logic will do the necessary
+ * synchronization for the wake ups. The only thing
+ * that is necessary is that the wake up happens after
+ * a task has been queued. It's OK for spurious wake ups.
+ */
+ if (full)
+ work->full_waiters_pending = true;
+ else
+ work->waiters_pending = true;
- schedule();
+ if (rb_watermark_hit(buffer, cpu, full))
+ goto out;
- /* Make sure to see the new wait index */
- smp_rmb();
- if (wait_index != work->wait_index)
- break;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
}
+ schedule();
+ out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);
+ if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ ret = -EINTR;
+
return ret;
}
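Review bot: The check added after finish_wait(), "if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))", calls rb_watermark_hit() a second time, which takes reader_lock again and can update shortest_full on behalf of a task that has already left the wait queue. If only the boolean is wanted here, use a variant that does not touch shortest_full, or add a comment explaining why the update is harmless. Also, after schedule() the function returns 0 whether the watermark was reached or the task was woken for another reason, such as ring_buffer_wake_waiters(). The kerneldoc for ring_buffer_wait() should state that a 0 return does not mean the watermark was hit and that every caller must re-evaluate its own loop condition, since this change relies on all callers having such a loop.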
--
2.43.0