From: Aradhya Bhatia <a-bhatia1(a)ti.com>
Once the DSI link and DSI PHY are initialized, the driver needs to
wait for the clock and data lanes to be ready before continuing
configuration. This is in accordance with the DSI start-up procedure
found in the Technical Reference Manual of Texas Instruments' J721E
SoC [0], which houses this DSI TX controller.
If the previous bridge (or CRTC/encoder) is configured prematurely,
the input signal FIFO gets corrupted. This introduces a color shift
on the display.
Make the driver wait for the clock and data lanes to become ready
during DSI enable.
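Concretely (a worked illustration of the wait added below, using the
driver's own macros): with nlanes = 2, the code polls MCTL_MAIN_STS
every 100us for up to 500ms, until CLK_LANE_RDY, DATA_LANE_RDY(0) and
DATA_LANE_RDY(1) are all set, and logs an error (without bailing out)
if the lanes never become ready.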
[0]: See section 12.6.5.7.3 "Start-up Procedure" in J721E SoC TRM
TRM Link: http://www.ti.com/lit/pdf/spruil1
Fixes: e19233955d9e ("drm/bridge: Add Cadence DSI driver")
Cc: Stable List <stable(a)vger.kernel.org>
Tested-by: Dominik Haller <d.haller(a)phytec.de>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Signed-off-by: Aradhya Bhatia <a-bhatia1(a)ti.com>
Signed-off-by: Aradhya Bhatia <aradhya.bhatia(a)linux.dev>
---
drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
index 87921a748cdb..6a77ca36cb9d 100644
--- a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
+++ b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
@@ -769,7 +769,7 @@ static void cdns_dsi_bridge_enable(struct drm_bridge *bridge)
struct phy_configure_opts_mipi_dphy *phy_cfg = &output->phy_opts.mipi_dphy;
unsigned long tx_byte_period;
struct cdns_dsi_cfg dsi_cfg;
- u32 tmp, reg_wakeup, div;
+ u32 tmp, reg_wakeup, div, status;
int nlanes;
if (WARN_ON(pm_runtime_get_sync(dsi->base.dev) < 0))
@@ -786,6 +786,19 @@ static void cdns_dsi_bridge_enable(struct drm_bridge *bridge)
cdns_dsi_hs_init(dsi);
cdns_dsi_init_link(dsi);
+ /*
+ * Now that the DSI Link and DSI Phy are initialized,
+ * wait for the CLK and Data Lanes to be ready.
+ */
+ tmp = CLK_LANE_RDY;
+ for (int i = 0; i < nlanes; i++)
+ tmp |= DATA_LANE_RDY(i);
+
+ if (readl_poll_timeout(dsi->regs + MCTL_MAIN_STS, status,
+ (tmp == (status & tmp)), 100, 500000))
+ dev_err(dsi->base.dev,
+ "Timed Out: DSI-DPhy Clock and Data Lanes not ready.\n");
+
writel(HBP_LEN(dsi_cfg.hbp) | HSA_LEN(dsi_cfg.hsa),
dsi->regs + VID_HSIZE1);
writel(HFP_LEN(dsi_cfg.hfp) | HACT_LEN(dsi_cfg.hact),
--
2.34.1
From: Aradhya Bhatia <a-bhatia1(a)ti.com>
The crtc_* mode parameters are not generated (duplicated, in this
case) from the regular parameters before the mode validation phase
begins.
The rest of the code conditionally uses the crtc_* parameters only
during the bridge enable phase, and sticks to the regular parameters
for mode validation. In this one instance, however, the driver uses
the crtc_clock parameter even during mode validation, causing the
validation to fail.
Allow the D-Phy config checks to use mode->clock instead of
mode->crtc_clock during mode_valid checks, like everywhere else in the
driver.
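For context (an assumption about the DRM call flow, not stated in the
driver): the crtc_* timing fields are normally filled in from the
regular fields by drm_mode_set_crtcinfo() only once a mode is actually
being set, so at mode_valid time mode->crtc_clock is still zero and
phy_mipi_dphy_get_default_config() ends up being asked for a 0 kHz
configuration, which cannot validate.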
Fixes: fced5a364dee ("drm/bridge: cdns: Convert to phy framework")
Cc: Stable List <stable(a)vger.kernel.org>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Aradhya Bhatia <a-bhatia1(a)ti.com>
Signed-off-by: Aradhya Bhatia <aradhya.bhatia(a)linux.dev>
---
drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
index b0a1a6774ea6..19cc8734a4c8 100644
--- a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
+++ b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
@@ -568,13 +568,14 @@ static int cdns_dsi_check_conf(struct cdns_dsi *dsi,
struct phy_configure_opts_mipi_dphy *phy_cfg = &output->phy_opts.mipi_dphy;
unsigned long dsi_hss_hsa_hse_hbp;
unsigned int nlanes = output->dev->lanes;
+ int mode_clock = (mode_valid_check ? mode->clock : mode->crtc_clock);
int ret;
ret = cdns_dsi_mode2cfg(dsi, mode, dsi_cfg, mode_valid_check);
if (ret)
return ret;
- phy_mipi_dphy_get_default_config(mode->crtc_clock * 1000,
+ phy_mipi_dphy_get_default_config(mode_clock * 1000,
mipi_dsi_pixel_format_to_bpp(output->dev->format),
nlanes, phy_cfg);
--
2.34.1
From: Thomas Gleixner <tglx(a)linutronix.de>
commit fe513c2ef0a172a58f158e2e70465c4317f0a9a2 upstream.
static_call_module_notify() triggers a WARN_ON() when memory
allocation fails in __static_call_add_module().
That's not really justified, because the failure case must be
correctly handled by the well-known call chain and the error code is
passed through to the initiating userspace application.
A memory allocation failure is not a fatal problem, but the WARN_ON()
takes the machine out when panic_on_warn is set.
Replace it with a pr_warn().
Fixes: 9183c3f9ed71 ("static_call: Add inline static call infrastructure")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lkml.kernel.org/r/8734mf7pmb.ffs@tglx
Signed-off-by: Alexey Nepomnyashih <sdl(a)nppct.ru>
---
Backport to fix CVE-2024-49954
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-49954
---
kernel/static_call.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/static_call.c b/kernel/static_call.c
index e9408409eb46..a008250e2533 100644
--- a/kernel/static_call.c
+++ b/kernel/static_call.c
@@ -431,7 +431,7 @@ static int static_call_module_notify(struct notifier_block *nb,
case MODULE_STATE_COMING:
ret = static_call_add_module(mod);
if (ret) {
- WARN(1, "Failed to allocate memory for static calls");
+ pr_warn("Failed to allocate memory for static calls\n");
static_call_del_module(mod);
}
break;
--
2.43.0
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check it
beforehand to ensure there will not be a possible NULL pointer
dereference there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[ To fix CVE-2024-27041, I resolved the merge conflict by still
guarding the first adev->dm.dc check with CONFIG_DRM_AMD_DC_HDCP. ]
Signed-off-by: Imkanmod Khan <imkanmodkhan(a)gmail.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8dc0f70df24f..7b4d44dcb343 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1882,14 +1882,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
From: "Luis Henriques (SUSE)" <luis.henriques(a)linux.dev>
[ commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream ]
The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x66/0x90
register_lock_class+0x759/0x7d0
__lock_acquire+0x85/0x2630
? __find_get_block+0xb4/0x380
lock_acquire+0xd1/0x2d0
? __ext4_journal_get_write_access+0xd5/0x160
_raw_spin_lock+0x33/0x40
? __ext4_journal_get_write_access+0xd5/0x160
__ext4_journal_get_write_access+0xd5/0x160
ext4_reserve_inode_write+0x61/0xb0
__ext4_mark_inode_dirty+0x79/0x270
? ext4_ext_replay_set_iblocks+0x2f8/0x450
ext4_ext_replay_set_iblocks+0x330/0x450
ext4_fc_replay+0x14c8/0x1540
? jread+0x88/0x2e0
? rcu_is_watching+0x11/0x40
do_one_pass+0x447/0xd00
jbd2_journal_recover+0x139/0x1b0
jbd2_journal_load+0x96/0x390
ext4_load_and_init_journal+0x253/0xd40
ext4_fill_super+0x2cc6/0x3180
...
In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error(). Unfortunately, at this point this
spinlock has not been initialized yet. Moving its initialization to an
earlier point in __ext4_fill_super() fixes this splat.
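For illustration, a minimal sketch of the bug pattern (generic code,
not from ext4), assuming CONFIG_LOCKDEP is enabled:

	#include <linux/spinlock.h>

	struct foo {
		spinlock_t lock;	/* lives in kzalloc()ed memory */
	};

	static void broken(struct foo *f)
	{
		/*
		 * Taking a dynamically allocated spinlock before
		 * spin_lock_init() leaves lockdep without a registered
		 * class for it, producing the "trying to register
		 * non-static key" splat quoted above.
		 */
		spin_lock(&f->lock);
		spin_unlock(&f->lock);
		spin_lock_init(&f->lock);	/* too late */
	}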
Signed-off-by: Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Imkanmod Khan <imkanmodkhan(a)gmail.com>
---
fs/ext4/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0b2591c07166..53f1deb049ec 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5264,6 +5264,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+
ext4_fast_commit_init(sb);
sb->s_root = NULL;
@@ -5514,7 +5516,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* Save the original bdev mapping's wb_err value which could be
* used to detect the metadata async write error.
*/
- spin_lock_init(&sbi->s_bdev_wb_lock);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
sb->s_bdev->bd_super = sb;
--
2.25.1
From: "Luis Henriques (SUSE)" <luis.henriques(a)linux.dev>
[ commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream ]
The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x66/0x90
register_lock_class+0x759/0x7d0
__lock_acquire+0x85/0x2630
? __find_get_block+0xb4/0x380
lock_acquire+0xd1/0x2d0
? __ext4_journal_get_write_access+0xd5/0x160
_raw_spin_lock+0x33/0x40
? __ext4_journal_get_write_access+0xd5/0x160
__ext4_journal_get_write_access+0xd5/0x160
ext4_reserve_inode_write+0x61/0xb0
__ext4_mark_inode_dirty+0x79/0x270
? ext4_ext_replay_set_iblocks+0x2f8/0x450
ext4_ext_replay_set_iblocks+0x330/0x450
ext4_fc_replay+0x14c8/0x1540
? jread+0x88/0x2e0
? rcu_is_watching+0x11/0x40
do_one_pass+0x447/0xd00
jbd2_journal_recover+0x139/0x1b0
jbd2_journal_load+0x96/0x390
ext4_load_and_init_journal+0x253/0xd40
ext4_fill_super+0x2cc6/0x3180
...
In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error(). Unfortunately, at this point this
spinlock has not been initialized yet. Moving its initialization to an
earlier point in __ext4_fill_super() fixes this splat.
Signed-off-by: Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Imkanmod Khan <imkanmodkhan(a)gmail.com>
---
fs/ext4/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 71ced0ada9a2..f019ce64eba4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5366,6 +5366,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+
ext4_fast_commit_init(sb);
sb->s_root = NULL;
@@ -5586,7 +5588,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* Save the original bdev mapping's wb_err value which could be
* used to detect the metadata async write error.
*/
- spin_lock_init(&sbi->s_bdev_wb_lock);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
--
2.25.1
From: David Woodhouse <dwmw(a)amazon.co.uk>
[ Upstream commit 4b5bc2ec9a239bce261ffeafdd63571134102323 ]
Now that the following fix:
d0ceea662d45 ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")
stops kernel_ident_mapping_init() from scribbling over the end of a
4KiB PGD by assuming the following 4KiB will be a userspace PGD,
there's no good reason for the kexec PGD to be part of a single
8KiB allocation with the control_code_page.
( It's not clear that that was the reason for x86_64 kexec doing it that
way in the first place either; there were no comments to that effect and
it seems to have been the case even before PTI came along. It looks like
it was just a happy accident which prevented memory corruption on kexec. )
Either way, it definitely isn't needed now. Just allocate the PGD
separately on x86_64, like i386 already does.
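A sketch of the layout change (my reading of the diff below):

	Before (KEXEC_CONTROL_PAGE_SIZE = 8KiB, single allocation):
	  control_code_page + 0    : identity-map PGD (PA_TABLE_PAGE)
	  control_code_page + 4KiB : relocate_kernel control code

	After (KEXEC_CONTROL_PAGE_SIZE = 4KiB):
	  control_code_page        : relocate_kernel control code
	  image->arch.pgd          : PGD in its own control page,
	                             allocated via alloc_pgt_page()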
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-6-dwmw2@infradead.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/include/asm/kexec.h | 18 +++++++++---
arch/x86/kernel/machine_kexec_64.c | 45 ++++++++++++++++--------------
2 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 256eee99afc8f..e2e1ec99c9998 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -16,6 +16,7 @@
# define PAGES_NR 4
#endif
+# define KEXEC_CONTROL_PAGE_SIZE 4096
# define KEXEC_CONTROL_CODE_MAX_SIZE 2048
#ifndef __ASSEMBLY__
@@ -44,7 +45,6 @@ struct kimage;
/* Maximum address we can use for the control code buffer */
# define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
-# define KEXEC_CONTROL_PAGE_SIZE 4096
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_386
@@ -59,9 +59,6 @@ struct kimage;
/* Maximum address we can use for the control pages */
# define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)
-/* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
-
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_X86_64
#endif
@@ -146,6 +143,19 @@ struct kimage_arch {
};
#else
struct kimage_arch {
+ /*
+ * This is a kimage control page, as it must not overlap with either
+ * source or destination address ranges.
+ */
+ pgd_t *pgd;
+ /*
+ * The virtual mapping of the control code page itself is used only
+ * during the transition, while the current kernel's pages are all
+ * in place. Thus the intermediate page table pages used to map it
+ * are not control pages, but instead just normal pages obtained
+ * with get_zeroed_page(). And have to be tracked (below) so that
+ * they can be freed.
+ */
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 24b6eaacc81eb..5d61a342871b5 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -149,7 +149,8 @@ static void free_transition_pgtable(struct kimage *image)
image->arch.pte = NULL;
}
-static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd,
+ unsigned long control_page)
{
pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
unsigned long vaddr, paddr;
@@ -160,7 +161,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pte_t *pte;
vaddr = (unsigned long)relocate_kernel;
- paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+ paddr = control_page;
pgd += pgd_index(vaddr);
if (!pgd_present(*pgd)) {
p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
@@ -219,7 +220,7 @@ static void *alloc_pgt_page(void *data)
return p;
}
-static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
+static int init_pgtable(struct kimage *image, unsigned long control_page)
{
struct x86_mapping_info info = {
.alloc_pgt_page = alloc_pgt_page,
@@ -228,12 +229,12 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
.kernpg_flag = _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
- pgd_t *level4p;
int result;
int i;
- level4p = (pgd_t *)__va(start_pgtable);
- clear_page(level4p);
+ image->arch.pgd = alloc_pgt_page(image);
+ if (!image->arch.pgd)
+ return -ENOMEM;
if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
info.page_flag |= _PAGE_ENC;
@@ -247,8 +248,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
}
@@ -263,8 +264,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = image->segment[i].mem;
mend = mstart + image->segment[i].memsz;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
@@ -274,15 +275,19 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
* Prepare EFI systab and ACPI tables for kexec kernel since they are
* not covered by pfn_mapped.
*/
- result = map_efi_systab(&info, level4p);
+ result = map_efi_systab(&info, image->arch.pgd);
if (result)
return result;
- result = map_acpi_tables(&info, level4p);
+ result = map_acpi_tables(&info, image->arch.pgd);
if (result)
return result;
- return init_transition_pgtable(image, level4p);
+ /*
+ * This must be last because the intermediate page table pages it
+ * allocates will not be control pages and may overlap the image.
+ */
+ return init_transition_pgtable(image, image->arch.pgd, control_page);
}
static void load_segments(void)
@@ -299,14 +304,14 @@ static void load_segments(void)
int machine_kexec_prepare(struct kimage *image)
{
- unsigned long start_pgtable;
+ unsigned long control_page;
int result;
/* Calculate the offsets */
- start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
+ control_page = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
/* Setup the identity mapped 64bit page table */
- result = init_pgtable(image, start_pgtable);
+ result = init_pgtable(image, control_page);
if (result)
return result;
@@ -353,13 +358,12 @@ void machine_kexec(struct kimage *image)
#endif
}
- control_page = page_address(image->control_code_page) + PAGE_SIZE;
+ control_page = page_address(image->control_code_page);
__memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
page_list[PA_CONTROL_PAGE] = virt_to_phys(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
- page_list[PA_TABLE_PAGE] =
- (unsigned long)__pa(page_address(image->control_code_page));
+ page_list[PA_TABLE_PAGE] = (unsigned long)__pa(image->arch.pgd);
if (image->type == KEXEC_TYPE_DEFAULT)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
@@ -578,8 +582,7 @@ static void kexec_mark_crashkres(bool protect)
/* Don't touch the control code page used in crash_kexec().*/
control = PFN_PHYS(page_to_pfn(kexec_crash_image->control_code_page));
- /* Control code page is located in the 2nd page. */
- kexec_mark_range(crashk_res.start, control + PAGE_SIZE - 1, protect);
+ kexec_mark_range(crashk_res.start, control - 1, protect);
control += KEXEC_CONTROL_PAGE_SIZE;
kexec_mark_range(control, crashk_res.end, protect);
}
--
2.39.5
From: David Woodhouse <dwmw(a)amazon.co.uk>
[ Upstream commit 4b5bc2ec9a239bce261ffeafdd63571134102323 ]
Now that the following fix:
d0ceea662d45 ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")
stops kernel_ident_mapping_init() from scribbling over the end of a
4KiB PGD by assuming the following 4KiB will be a userspace PGD,
there's no good reason for the kexec PGD to be part of a single
8KiB allocation with the control_code_page.
( It's not clear that that was the reason for x86_64 kexec doing it that
way in the first place either; there were no comments to that effect and
it seems to have been the case even before PTI came along. It looks like
it was just a happy accident which prevented memory corruption on kexec. )
Either way, it definitely isn't needed now. Just allocate the PGD
separately on x86_64, like i386 already does.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-6-dwmw2@infradead.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/include/asm/kexec.h | 18 +++++++++---
arch/x86/kernel/machine_kexec_64.c | 45 ++++++++++++++++--------------
2 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index c9f6a6c5de3cf..d54dd7084a3e1 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -16,6 +16,7 @@
# define PAGES_NR 4
#endif
+# define KEXEC_CONTROL_PAGE_SIZE 4096
# define KEXEC_CONTROL_CODE_MAX_SIZE 2048
#ifndef __ASSEMBLY__
@@ -44,7 +45,6 @@ struct kimage;
/* Maximum address we can use for the control code buffer */
# define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
-# define KEXEC_CONTROL_PAGE_SIZE 4096
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_386
@@ -59,9 +59,6 @@ struct kimage;
/* Maximum address we can use for the control pages */
# define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)
-/* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
-
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_X86_64
#endif
@@ -146,6 +143,19 @@ struct kimage_arch {
};
#else
struct kimage_arch {
+ /*
+ * This is a kimage control page, as it must not overlap with either
+ * source or destination address ranges.
+ */
+ pgd_t *pgd;
+ /*
+ * The virtual mapping of the control code page itself is used only
+ * during the transition, while the current kernel's pages are all
+ * in place. Thus the intermediate page table pages used to map it
+ * are not control pages, but instead just normal pages obtained
+ * with get_zeroed_page(). And have to be tracked (below) so that
+ * they can be freed.
+ */
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 2fa12d1dc6760..8509d809a9f1b 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -149,7 +149,8 @@ static void free_transition_pgtable(struct kimage *image)
image->arch.pte = NULL;
}
-static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd,
+ unsigned long control_page)
{
pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
unsigned long vaddr, paddr;
@@ -160,7 +161,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pte_t *pte;
vaddr = (unsigned long)relocate_kernel;
- paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+ paddr = control_page;
pgd += pgd_index(vaddr);
if (!pgd_present(*pgd)) {
p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
@@ -219,7 +220,7 @@ static void *alloc_pgt_page(void *data)
return p;
}
-static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
+static int init_pgtable(struct kimage *image, unsigned long control_page)
{
struct x86_mapping_info info = {
.alloc_pgt_page = alloc_pgt_page,
@@ -228,12 +229,12 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
.kernpg_flag = _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
- pgd_t *level4p;
int result;
int i;
- level4p = (pgd_t *)__va(start_pgtable);
- clear_page(level4p);
+ image->arch.pgd = alloc_pgt_page(image);
+ if (!image->arch.pgd)
+ return -ENOMEM;
if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
info.page_flag |= _PAGE_ENC;
@@ -247,8 +248,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
}
@@ -263,8 +264,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = image->segment[i].mem;
mend = mstart + image->segment[i].memsz;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
@@ -274,15 +275,19 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
* Prepare EFI systab and ACPI tables for kexec kernel since they are
* not covered by pfn_mapped.
*/
- result = map_efi_systab(&info, level4p);
+ result = map_efi_systab(&info, image->arch.pgd);
if (result)
return result;
- result = map_acpi_tables(&info, level4p);
+ result = map_acpi_tables(&info, image->arch.pgd);
if (result)
return result;
- return init_transition_pgtable(image, level4p);
+ /*
+ * This must be last because the intermediate page table pages it
+ * allocates will not be control pages and may overlap the image.
+ */
+ return init_transition_pgtable(image, image->arch.pgd, control_page);
}
static void load_segments(void)
@@ -299,14 +304,14 @@ static void load_segments(void)
int machine_kexec_prepare(struct kimage *image)
{
- unsigned long start_pgtable;
+ unsigned long control_page;
int result;
/* Calculate the offsets */
- start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
+ control_page = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
/* Setup the identity mapped 64bit page table */
- result = init_pgtable(image, start_pgtable);
+ result = init_pgtable(image, control_page);
if (result)
return result;
@@ -360,13 +365,12 @@ void machine_kexec(struct kimage *image)
#endif
}
- control_page = page_address(image->control_code_page) + PAGE_SIZE;
+ control_page = page_address(image->control_code_page);
__memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
page_list[PA_CONTROL_PAGE] = virt_to_phys(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
- page_list[PA_TABLE_PAGE] =
- (unsigned long)__pa(page_address(image->control_code_page));
+ page_list[PA_TABLE_PAGE] = (unsigned long)__pa(image->arch.pgd);
if (image->type == KEXEC_TYPE_DEFAULT)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
@@ -574,8 +578,7 @@ static void kexec_mark_crashkres(bool protect)
/* Don't touch the control code page used in crash_kexec().*/
control = PFN_PHYS(page_to_pfn(kexec_crash_image->control_code_page));
- /* Control code page is located in the 2nd page. */
- kexec_mark_range(crashk_res.start, control + PAGE_SIZE - 1, protect);
+ kexec_mark_range(crashk_res.start, control - 1, protect);
control += KEXEC_CONTROL_PAGE_SIZE;
kexec_mark_range(control, crashk_res.end, protect);
}
--
2.39.5
From: Bard Liao <yung-chuan.liao(a)linux.intel.com>
[ Upstream commit 569922b82ca660f8b24e705f6cf674e6b1f99cc7 ]
Each CPU DAI should be associated with a widget. However, the
topology might not create the right number of DAI widgets for
aggregated amps, which will cause a NULL pointer dereference.
Check that the DAI widget associated with the CPU DAI is valid to
prevent a NULL pointer dereference due to missing DAI widgets in
topologies with aggregated amps.
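For context (an illustrative scenario, not from the original commit
message): with two amps aggregated on one link, the machine driver may
register two CPU DAIs while the topology defines only one DAI widget,
so snd_soc_dai_get_widget() returns NULL for the second CPU DAI and
the subsequent widget_to_copier() lookup dereferences that NULL.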
Signed-off-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood(a)intel.com>
Link: https://patch.msgid.link/20241203104853.56956-1-yung-chuan.liao@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 12 ++++++++++++
sound/soc/sof/intel/hda.c | 5 +++++
2 files changed, 17 insertions(+)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index 82f46ecd94301..2e58a264da556 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -503,6 +503,12 @@ int sdw_hda_dai_hw_params(struct snd_pcm_substream *substream,
int ret;
int i;
+ if (!w) {
+ dev_err(cpu_dai->dev, "%s widget not found, check amp link num in the topology\n",
+ cpu_dai->name);
+ return -EINVAL;
+ }
+
ops = hda_dai_get_ops(substream, cpu_dai);
if (!ops) {
dev_err(cpu_dai->dev, "DAI widget ops not set\n");
@@ -582,6 +588,12 @@ int sdw_hda_dai_hw_params(struct snd_pcm_substream *substream,
*/
for_each_rtd_cpu_dais(rtd, i, dai) {
w = snd_soc_dai_get_widget(dai, substream->stream);
+ if (!w) {
+ dev_err(cpu_dai->dev,
+ "%s widget not found, check amp link num in the topology\n",
+ dai->name);
+ return -EINVAL;
+ }
ipc4_copier = widget_to_copier(w);
memcpy(&ipc4_copier->dma_config_tlv[cpu_dai_id], dma_config_tlv,
sizeof(*dma_config_tlv));
diff --git a/sound/soc/sof/intel/hda.c b/sound/soc/sof/intel/hda.c
index 70fc08c8fc99e..f10ed4d102501 100644
--- a/sound/soc/sof/intel/hda.c
+++ b/sound/soc/sof/intel/hda.c
@@ -63,6 +63,11 @@ static int sdw_params_stream(struct device *dev,
struct snd_soc_dapm_widget *w = snd_soc_dai_get_widget(d, params_data->substream->stream);
struct snd_sof_dai_config_data data = { 0 };
+ if (!w) {
+ dev_err(dev, "%s widget not found, check amp link num in the topology\n",
+ d->name);
+ return -EINVAL;
+ }
data.dai_index = (params_data->link_id << 8) | d->id;
data.dai_data = params_data->alh_stream_id;
data.dai_node_id = data.dai_data;
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user has matched.
Besides going against the usual permission semantics, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_NET_ADMIN still can, but that is the same
as not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation in which no
one can access the device even though the ownership is properly set.
Also, I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
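A worked example (hypothetical IDs): take owner = 1000, group = 27,
and a caller with euid 1000 that is not in group 27 and lacks
CAP_NET_ADMIN. The old check evaluated
  ((uid_valid && !uid_eq) || (gid_valid && !in_egroup_p)) && !ns_capable
  = (false || true) && true = true
and returned -EPERM despite the matching owner; with tun_capable(),
the uid match alone now grants access.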
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0adce9bf7a1e5..87cc7d778c3cf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -636,14 +636,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2838,7 +2842,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user has matched.
Besides going against the usual permission semantics, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_NET_ADMIN still can, but that is the same
as not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation in which no
one can access the device even though the ownership is properly set.
Also, I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c34c6f0d23efe..52ea9f81d388b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -586,14 +586,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2772,7 +2776,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user has matched.
Besides going against the usual permission semantics, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_NET_ADMIN still can, but that is the same
as not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation in which no
one can access the device even though the ownership is properly set.
Also, I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 959ca6b9cd138..a85f743aa1573 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -575,14 +575,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2718,7 +2722,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user has matched.
Besides going against the usual permission semantics, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_NET_ADMIN still can, but that is the same
as not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation in which no
one can access the device even though the ownership is properly set.
Also, I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ea98d93138c12..a6c9f9062dbd4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -574,14 +574,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2767,7 +2771,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5
From: Suleiman Souhlal <suleiman(a)google.com>
[ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
When steal time exceeds the measured delta when updating clock_task, we
currently try to catch up the excess in future updates.
However, in some situations this results in inaccurate run times for
whatever subsequently uses clock_task, as it ends up being charged
additional steal time that did not actually happen.
This is because there is a window between reading the elapsed time in
update_rq_clock() and sampling the steal time in update_rq_clock_task().
If the VCPU gets preempted between those two points, any additional
steal time is accounted to the outgoing task even though the calculated
delta did not actually contain any of that "stolen" time.
When this race happens, we can end up with steal time that exceeds the
calculated delta, and the previous code would try to catch up that excess
steal time in future clock updates, which is given to the next,
incoming task, even though it did not actually have any time stolen.
This behavior is particularly bad when steal time can be very long,
which we've seen when trying to extend steal time to contain the duration
that the host was suspended [0]. When this happens, clock_task stays
frozen, during which the running task stays running for the whole
duration, since its run time doesn't increase.
However the race can happen even under normal operation.
Ideally we would read the elapsed cpu time and the steal time atomically,
to prevent this race from happening in the first place, but doing so
is non-trivial.
Since the time between those two points isn't otherwise accounted anywhere,
neither to the outgoing task nor the incoming task (because the "end of
outgoing task" and "start of incoming task" timestamps are the same),
I would argue that the right thing to do is to simply drop any excess steal
time, in order to prevent these issues.
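A worked example (made-up numbers): update_rq_clock() measures
delta = 3ms, but by the time update_rq_clock_task() samples
paravirt_steal_clock() another 2ms got stolen, so the computed steal
is 5ms. Both before and after this patch, the steal charged now is
clamped to delta (3ms). Before, prev_steal_time_rq advanced by only
3ms, so the remaining 2ms reappeared as steal in the next update and
was charged to the incoming task; now, prev_steal_time_rq is set to
the sampled value and the excess 2ms is simply dropped.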
[0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@google.com/
Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51ac62637e4ed..39ce8a3d8c573 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -176,13 +176,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
- steal = paravirt_steal_clock(cpu_of(rq));
+ u64 prev_steal;
+
+ steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
- rq->prev_steal_time_rq += steal;
+ rq->prev_steal_time_rq = prev_steal;
delta -= steal;
}
#endif
--
2.39.5
From: Suleiman Souhlal <suleiman(a)google.com>
[ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
When steal time exceeds the measured delta when updating clock_task, we
currently try to catch up the excess in future updates.
However, in some situations this results in inaccurate run times for
whatever subsequently uses clock_task, as it ends up being charged
additional steal time that did not actually happen.
This is because there is a window between reading the elapsed time in
update_rq_clock() and sampling the steal time in update_rq_clock_task().
If the VCPU gets preempted between those two points, any additional
steal time is accounted to the outgoing task even though the calculated
delta did not actually contain any of that "stolen" time.
When this race happens, we can end up with steal time that exceeds the
calculated delta, and the previous code would try to catch up that excess
steal time in future clock updates, which is given to the next,
incoming task, even though it did not actually have any time stolen.
This behavior is particularly bad when steal time can be very long,
which we've seen when trying to extend steal time to contain the duration
that the host was suspended [0]. When this happens, clock_task stays
frozen, during which the running task stays running for the whole
duration, since its run time doesn't increase.
However the race can happen even under normal operation.
Ideally we would read the elapsed cpu time and the steal time atomically,
to prevent this race from happening in the first place, but doing so
is non-trivial.
Since the time between those two points isn't otherwise accounted anywhere,
neither to the outgoing task nor the incoming task (because the "end of
outgoing task" and "start of incoming task" timestamps are the same),
I would argue that the right thing to do is to simply drop any excess steal
time, in order to prevent these issues.
[0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@google.com/
Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7cf45d506688c..42dad8c8d6f28 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -279,13 +279,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
- steal = paravirt_steal_clock(cpu_of(rq));
+ u64 prev_steal;
+
+ steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
- rq->prev_steal_time_rq += steal;
+ rq->prev_steal_time_rq = prev_steal;
delta -= steal;
}
#endif
--
2.39.5
From: Suleiman Souhlal <suleiman(a)google.com>
[ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
When steal time exceeds the measured delta when updating clock_task, we
currently try to catch up the excess in future updates.
However, in some situations this results in inaccurate run times for
whatever subsequently uses clock_task, as it ends up being charged
additional steal time that did not actually happen.
This is because there is a window between reading the elapsed time in
update_rq_clock() and sampling the steal time in update_rq_clock_task().
If the VCPU gets preempted between those two points, any additional
steal time is accounted to the outgoing task even though the calculated
delta did not actually contain any of that "stolen" time.
When this race happens, we can end up with steal time that exceeds the
calculated delta, and the previous code would try to catch up that excess
steal time in future clock updates, which is given to the next,
incoming task, even though it did not actually have any time stolen.
This behavior is particularly bad when steal time can be very long,
which we've seen when trying to extend steal time to contain the duration
that the host was suspended [0]. When this happens, clock_task stays
frozen, during which the running task stays running for the whole
duration, since its run time doesn't increase.
However the race can happen even under normal operation.
Ideally we would read the elapsed cpu time and the steal time atomically,
to prevent this race from happening in the first place, but doing so
is non-trivial.
Since the time between those two points isn't otherwise accounted anywhere,
neither to the outgoing task nor the incoming task (because the "end of
outgoing task" and "start of incoming task" timestamps are the same),
I would argue that the right thing to do is to simply drop any excess steal
time, in order to prevent these issues.
[0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@google.com/
Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ed92b75f7e024..691af58c156e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -642,13 +642,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
- steal = paravirt_steal_clock(cpu_of(rq));
+ u64 prev_steal;
+
+ steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
- rq->prev_steal_time_rq += steal;
+ rq->prev_steal_time_rq = prev_steal;
delta -= steal;
}
#endif
--
2.39.5
From: Suleiman Souhlal <suleiman(a)google.com>
[ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
When steal time exceeds the measured delta when updating clock_task, we
currently try to catch up the excess in future updates.
However, in some situations this results in inaccurate run times for
whatever subsequently uses clock_task, as it ends up being charged
additional steal time that did not actually happen.
This is because there is a window between reading the elapsed time in
update_rq_clock() and sampling the steal time in update_rq_clock_task().
If the VCPU gets preempted between those two points, any additional
steal time is accounted to the outgoing task even though the calculated
delta did not actually contain any of that "stolen" time.
When this race happens, we can end up with steal time that exceeds the
calculated delta, and the previous code would try to catch up that excess
steal time in future clock updates, which is given to the next,
incoming task, even though it did not actually have any time stolen.
This behavior is particularly bad when steal time can be very long,
which we've seen when trying to extend steal time to contain the duration
that the host was suspended [0]. When this happens, clock_task stays
frozen, during which the running task stays running for the whole
duration, since its run time doesn't increase.
However the race can happen even under normal operation.
Ideally we would read the elapsed cpu time and the steal time atomically,
to prevent this race from happening in the first place, but doing so
is non-trivial.
Since the time between those two points isn't otherwise accounted anywhere,
neither to the outgoing task nor the incoming task (because the "end of
outgoing task" and "start of incoming task" timestamps are the same),
I would argue that the right thing to do is to simply drop any excess steal
time, in order to prevent these issues.
[0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@google.com/
Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 54af671e8d510..3927904984fd5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -704,13 +704,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
- steal = paravirt_steal_clock(cpu_of(rq));
+ u64 prev_steal;
+
+ steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
- rq->prev_steal_time_rq += steal;
+ rq->prev_steal_time_rq = prev_steal;
delta -= steal;
}
#endif
--
2.39.5
From: Suleiman Souhlal <suleiman(a)google.com>
[ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
When steal time exceeds the measured delta when updating clock_task, we
currently try to catch up the excess in future updates.
However, in some situations this results in inaccurate run times for
whatever subsequently uses clock_task, as it ends up being charged
additional steal time that did not actually happen.
This is because there is a window between reading the elapsed time in
update_rq_clock() and sampling the steal time in update_rq_clock_task().
If the VCPU gets preempted between those two points, any additional
steal time is accounted to the outgoing task even though the calculated
delta did not actually contain any of that "stolen" time.
When this race happens, we can end up with steal time that exceeds the
calculated delta, and the previous code would try to catch up that excess
steal time in future clock updates, which is given to the next,
incoming task, even though it did not actually have any time stolen.
This behavior is particularly bad when steal time can be very long,
which we've seen when trying to extend steal time to contain the duration
that the host was suspended [0]. When this happens, clock_task stays
frozen, during which the running task stays running for the whole
duration, since its run time doesn't increase.
However the race can happen even under normal operation.
Ideally we would read the elapsed cpu time and the steal time atomically,
to prevent this race from happening in the first place, but doing so
is non-trivial.
Since the time between those two points isn't otherwise accounted anywhere,
neither to the outgoing task nor the incoming task (because the "end of
outgoing task" and "start of incoming task" timestamps are the same),
I would argue that the right thing to do is to simply drop any excess steal
time, in order to prevent these issues.
[0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@google.com/
Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 86606fb9e6bc6..c686d826a91cf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -726,13 +726,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
- steal = paravirt_steal_clock(cpu_of(rq));
+ u64 prev_steal;
+
+ steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
- rq->prev_steal_time_rq += steal;
+ rq->prev_steal_time_rq = prev_steal;
delta -= steal;
}
#endif
--
2.39.5
From: Suleiman Souhlal <suleiman(a)google.com>
[ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
When steal time exceeds the measured delta when updating clock_task, we
currently try to catch up the excess in future updates.
However, in some situations this results in inaccurate run times for
whatever subsequently uses clock_task, as it ends up being charged
additional steal time that did not actually happen.
This is because there is a window between reading the elapsed time in
update_rq_clock() and sampling the steal time in update_rq_clock_task().
If the VCPU gets preempted between those two points, any additional
steal time is accounted to the outgoing task even though the calculated
delta did not actually contain any of that "stolen" time.
When this race happens, we can end up with steal time that exceeds the
calculated delta, and the previous code would try to catch up that excess
steal time in future clock updates, which is given to the next,
incoming task, even though it did not actually have any time stolen.
This behavior is particularly bad when steal time can be very long,
which we've seen when trying to extend steal time to contain the duration
that the host was suspended [0]. When this happens, clock_task stays
frozen, during which the running task stays running for the whole
duration, since its run time doesn't increase.
However the race can happen even under normal operation.
Ideally we would read the elapsed cpu time and the steal time atomically,
to prevent this race from happening in the first place, but doing so
is non-trivial.
Since the time between those two points isn't otherwise accounted anywhere,
neither to the outgoing task nor the incoming task (because the "end of
outgoing task" and "start of incoming task" timestamps are the same),
I would argue that the right thing to do is to simply drop any excess steal
time, in order to prevent these issues.
[0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@google.com/
Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d07dc87787dff..9015c792830c2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -766,13 +766,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((&paravirt_steal_rq_enabled))) {
- steal = paravirt_steal_clock(cpu_of(rq));
+ u64 prev_steal;
+
+ steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
- rq->prev_steal_time_rq += steal;
+ rq->prev_steal_time_rq = prev_steal;
delta -= steal;
}
#endif
--
2.39.5
From: Suleiman Souhlal <suleiman(a)google.com>
[ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
When steal time exceeds the measured delta when updating clock_task, we
currently try to catch up the excess in future updates.
However, in some situations this results in inaccurate run times for
whatever uses clock_task next, as it ends up being charged additional
steal time that did not actually happen.
This is because there is a window between reading the elapsed time in
update_rq_clock() and sampling the steal time in update_rq_clock_task().
If the VCPU gets preempted between those two points, any additional
steal time is accounted to the outgoing task even though the calculated
delta did not actually contain any of that "stolen" time.
When this race happens, we can end up with steal time that exceeds the
calculated delta, and the previous code would try to catch up that excess
steal time in future clock updates; the excess then gets charged to the
next, incoming task, even though it did not actually have any time stolen.
This behavior is particularly bad when steal time can be very long,
which we've seen when trying to extend steal time to contain the duration
that the host was suspended [0]. When this happens, clock_task stays
frozen, and the running task stays running for the whole duration, since
its accounted run time doesn't increase.
However, the race can happen even under normal operation.
Ideally we would read the elapsed cpu time and the steal time atomically,
to prevent this race from happening in the first place, but doing so
is non-trivial.
Since the time between those two points isn't otherwise accounted anywhere,
neither to the outgoing task nor the incoming task (because the "end of
outgoing task" and "start of incoming task" timestamps are the same),
I would argue that the right thing to do is to simply drop any excess steal
time, in order to prevent these issues.
[0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@google.com/
Signed-off-by: Suleiman Souhlal <suleiman(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3e5a6bf587f91..296e77380318e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -766,13 +766,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((&paravirt_steal_rq_enabled))) {
- steal = paravirt_steal_clock(cpu_of(rq));
+ u64 prev_steal;
+
+ steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
- rq->prev_steal_time_rq += steal;
+ rq->prev_steal_time_rq = prev_steal;
delta -= steal;
}
#endif
--
2.39.5
From: Josef Bacik <josef(a)toxicpanda.com>
[ Upstream commit 6a4730b325aaa48f7a5d5ba97aff0a955e2d9cec ]
This BUG_ON is meant to catch backref cache problems, but these can
arise from either bugs in the backref cache or corruption in the extent
tree. Fix it to be a proper error.
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/relocation.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 89ad7e12c08bb..062154c6a65f6 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4789,8 +4789,18 @@ int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
WARN_ON(!first_cow && level == 0);
node = rc->backref_cache.path[level];
- BUG_ON(node->bytenr != buf->start &&
- node->new_bytenr != buf->start);
+
+ /*
+ * If node->bytenr != buf->start and node->new_bytenr !=
+ * buf->start then we've got the wrong backref node for what we
+ * expected to see here and the cache is incorrect.
+ */
+ if (unlikely(node->bytenr != buf->start && node->new_bytenr != buf->start)) {
+ btrfs_err(fs_info,
+"bytenr %llu was found but our backref cache was expecting %llu or %llu",
+ buf->start, node->bytenr, node->new_bytenr);
+ return -EUCLEAN;
+ }
drop_node_buffer(node);
extent_buffer_get(cow);
--
2.39.5
From: Josef Bacik <josef(a)toxicpanda.com>
[ Upstream commit 6a4730b325aaa48f7a5d5ba97aff0a955e2d9cec ]
This BUG_ON is meant to catch backref cache problems, but these can
arise from either bugs in the backref cache or corruption in the extent
tree. Fix it to be a proper error.
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/relocation.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 98e3b3749ec12..5b921e6ed94e2 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3976,8 +3976,18 @@ int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
WARN_ON(!first_cow && level == 0);
node = rc->backref_cache.path[level];
- BUG_ON(node->bytenr != buf->start &&
- node->new_bytenr != buf->start);
+
+ /*
+ * If node->bytenr != buf->start and node->new_bytenr !=
+ * buf->start then we've got the wrong backref node for what we
+ * expected to see here and the cache is incorrect.
+ */
+ if (unlikely(node->bytenr != buf->start && node->new_bytenr != buf->start)) {
+ btrfs_err(fs_info,
+"bytenr %llu was found but our backref cache was expecting %llu or %llu",
+ buf->start, node->bytenr, node->new_bytenr);
+ return -EUCLEAN;
+ }
btrfs_backref_drop_node_buffer(node);
atomic_inc(&cow->refs);
--
2.39.5
From: Hao-ran Zheng <zhenghaoran154(a)gmail.com>
[ Upstream commit 5324c4e10e9c2ce307a037e904c0d9671d7137d9 ]
A data race occurs when `insert_ordered_extent_file_extent()` and
`btrfs_inode_safe_disk_i_size_write()` are executed concurrently:
`insert_ordered_extent_file_extent()` reads inode->disk_i_size without
holding a lock, so it races with `btrfs_inode_safe_disk_i_size_write()`
writing inode->disk_i_size, thus affecting the value of `modify_tree`.
The specific call stack that appears during testing is as follows:
============DATA_RACE============
btrfs_drop_extents+0x89a/0xa060 [btrfs]
insert_reserved_file_extent+0xb54/0x2960 [btrfs]
insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs]
btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs]
btrfs_finish_ordered_io+0x37/0x60 [btrfs]
finish_ordered_fn+0x3e/0x50 [btrfs]
btrfs_work_helper+0x9c9/0x27a0 [btrfs]
process_scheduled_works+0x716/0xf10
worker_thread+0xb6a/0x1190
kthread+0x292/0x330
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
============OTHER_INFO============
btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs]
btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs]
btrfs_finish_ordered_io+0x37/0x60 [btrfs]
finish_ordered_fn+0x3e/0x50 [btrfs]
btrfs_work_helper+0x9c9/0x27a0 [btrfs]
process_scheduled_works+0x716/0xf10
worker_thread+0xb6a/0x1190
kthread+0x292/0x330
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
=================================
The main purpose of the check of the inode's disk_i_size is to avoid
taking write locks on a btree path when we have a write at or beyond
EOF, since in these cases we don't expect to find extent items in the
root to drop. However if we end up taking write locks due to a data
race on disk_i_size, everything is still correct, we only add extra
lock contention on the tree in case there's concurrency from other tasks.
If the race causes us to not take write locks when we actually need them,
then everything is functionally correct as well, since if we find out we
have extent items to drop and we took read locks (modify_tree set to 0),
we release the path and retry again with write locks.
Since this data race does not affect the correctness of the function,
it is a harmless data race; use data_race() to read inode->disk_i_size.
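For reference, data_race() is the KCSAN annotation from
<linux/compiler.h> that marks an intentional, benign lockless access. A
minimal kernel-style sketch of the pattern (the struct and functions
below are hypothetical, not btrfs code):

#include <linux/compiler.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct example_inode {
	spinlock_t lock;
	u64 disk_i_size;
};

/* Writer side: updates disk_i_size under the lock, as usual. */
static void example_set_size(struct example_inode *inode, u64 new_size)
{
	spin_lock(&inode->lock);
	inode->disk_i_size = new_size;
	spin_unlock(&inode->lock);
}

/*
 * Reader side: a lockless read that is benign for correctness, wrapped
 * in data_race() so KCSAN knows the race is intentional and stays quiet.
 */
static bool example_starts_past_eof(struct example_inode *inode, u64 start)
{
	return data_race(start >= inode->disk_i_size);
}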
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Hao-ran Zheng <zhenghaoran154(a)gmail.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 44160d4ad53e0..31b25cb2f5cc3 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -731,7 +731,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
if (args->drop_cache)
btrfs_drop_extent_cache(inode, args->start, args->end - 1, 0);
- if (args->start >= inode->disk_i_size && !args->replace_extent)
+ if (data_race(args->start >= inode->disk_i_size) && !args->replace_extent)
modify_tree = 0;
update_refs = (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID);
--
2.39.5
The quilt patch titled
Subject: mm/compaction: fix UBSAN shift-out-of-bounds warning
has been removed from the -mm tree. Its filename was
mm-compaction-fix-ubsan-shift-out-of-bounds-warning.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm/compaction: fix UBSAN shift-out-of-bounds warning
Date: Thu, 23 Jan 2025 10:10:29 +0800
syzkaller reported a UBSAN shift-out-of-bounds warning of (1UL << order)
in isolate_freepages_block(). The bogus compound_order can be any value
because the field is in a union with the page flags. Add back the
MAX_PAGE_ORDER check to fix the warning.
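For background, shifting 1UL by BITS_PER_LONG or more is undefined
behavior in C, which is what UBSAN reports when a garbage order is used
unchecked. A small userspace sketch of the guarded pattern (the
MAX_PAGE_ORDER value below is illustrative only):

#include <stdio.h>

#define MAX_PAGE_ORDER 10 /* illustrative; the real value is config dependent */

static unsigned long pages_for_order(unsigned int order)
{
	/*
	 * Without this check, order >= BITS_PER_LONG (64 on 64-bit)
	 * would make 1UL << order undefined behavior.
	 */
	if (order > MAX_PAGE_ORDER)
		return 0;
	return 1UL << order;
}

int main(void)
{
	printf("%lu\n", pages_for_order(3));   /* 8 */
	printf("%lu\n", pages_for_order(200)); /* 0: rejected, no UB */
	return 0;
}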
Link: https://lkml.kernel.org/r/20250123021029.2826736-1-liushixin2@huawei.com
Fixes: 3da0272a4c7d ("mm/compaction: correctly return failure with bogus compound_order in strict mode")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kemeng Shi <shikemeng(a)huaweicloud.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/compaction.c~mm-compaction-fix-ubsan-shift-out-of-bounds-warning
+++ a/mm/compaction.c
@@ -631,7 +631,8 @@ static unsigned long isolate_freepages_b
if (PageCompound(page)) {
const unsigned int order = compound_order(page);
- if (blockpfn + (1UL << order) <= end_pfn) {
+ if ((order <= MAX_PAGE_ORDER) &&
+ (blockpfn + (1UL << order) <= end_pfn)) {
blockpfn += (1UL << order) - 1;
page += (1UL << order) - 1;
nr_scanned += (1UL << order) - 1;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
mm-page_isolation-avoid-call-folio_hstate-without-hugetlb_lock.patch
Also add a kunit testcase to make sure the function works correctly now
and doesn't regress in the future.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (2):
of: address: Fix empty resource handling in __of_address_resource_bounds()
of: address: Add kunit test for __of_address_resource_bounds()
drivers/of/address.c | 17 +++----
drivers/of/of_private.h | 4 ++
drivers/of/of_test.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 132 insertions(+), 9 deletions(-)
---
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
change-id: 20250120-of-address-overflow-a59476362885
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
From: Gabriele Monaco <gmonaco(a)redhat.com>
RV per-task monitors are implemented through a monitor structure
available for each task_struct. This structure is reset every time the
monitor is (re-)started, to avoid inconsistencies if the monitor was
activated previously.
To do so, we reset the monitor on all threads using the macro
for_each_process_thread. However, this macro excludes the idle tasks on
each CPU. Idle tasks can be considered tasks in their own right, and it
should be up to the model whether to ignore them or not.
Reset monitors also on the idle tasks for each present CPU whenever we
reset all per-task monitors.
Cc: stable(a)vger.kernel.org
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: John Kacur <jkacur(a)redhat.com>
Link: https://lore.kernel.org/20250115151547.605750-2-gmonaco@redhat.com
Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/rv/da_monitor.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 9705b2a98e49..510c88bfabd4 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -14,6 +14,7 @@
#include <rv/automata.h>
#include <linux/rv.h>
#include <linux/bug.h>
+#include <linux/sched.h>
#ifdef CONFIG_RV_REACTORS
@@ -324,10 +325,13 @@ static inline struct da_monitor *da_get_monitor_##name(struct task_struct *tsk)
static void da_monitor_reset_all_##name(void) \
{ \
struct task_struct *g, *p; \
+ int cpu; \
\
read_lock(&tasklist_lock); \
for_each_process_thread(g, p) \
da_monitor_reset_##name(da_get_monitor_##name(p)); \
+ for_each_present_cpu(cpu) \
+ da_monitor_reset_##name(da_get_monitor_##name(idle_task(cpu))); \
read_unlock(&tasklist_lock); \
} \
\
--
2.45.2
From: Tomas Glozar <tglozar(a)redhat.com>
When using rtla timerlat with userspace threads (-u or -U), rtla
disables the OSNOISE_WORKLOAD option in
/sys/kernel/tracing/osnoise/options. This option is not re-enabled in a
subsequent run with kernel-space threads, leading to rtla collecting no
results if the previous run exited abnormally:
$ rtla timerlat top -u
^\Quit (core dumped)
$ rtla timerlat top -k -d 1s
Timer Latency
0 00:00:01 | IRQ Timer Latency (us) | Thread Timer Latency (us)
CPU COUNT | cur min avg max | cur min avg max
The issue persists until OSNOISE_WORKLOAD is set manually by running:
$ echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
Set OSNOISE_WORKLOAD when running rtla with kernel-space threads if
available to fix the issue.
Cc: stable(a)vger.kernel.org
Cc: John Kacur <jkacur(a)redhat.com>
Cc: Luis Goncalves <lgoncalv(a)redhat.com>
Link: https://lore.kernel.org/20250107144823.239782-4-tglozar@redhat.com
Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support")
Signed-off-by: Tomas Glozar <tglozar(a)redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
tools/tracing/rtla/src/timerlat_top.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index d358cd39f360..f387597d3ac2 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -851,12 +851,15 @@ timerlat_top_apply_config(struct osnoise_tool *top, struct timerlat_top_params *
}
}
- if (params->user_top) {
- retval = osnoise_set_workload(top->context, 0);
- if (retval) {
- err_msg("Failed to set OSNOISE_WORKLOAD option\n");
- goto out_err;
- }
+ /*
+ * Set workload according to type of thread if the kernel supports it.
+ * On kernels without support, user threads will have already failed
+ * on missing timerlat_fd, and kernel threads do not need it.
+ */
+ retval = osnoise_set_workload(top->context, params->kernel_workload);
+ if (retval < -1) {
+ err_msg("Failed to set OSNOISE_WORKLOAD option\n");
+ goto out_err;
}
if (isatty(STDOUT_FILENO) && !params->quiet)
--
2.45.2
From: Tomas Glozar <tglozar(a)redhat.com>
When using rtla timerlat with userspace threads (-u or -U), rtla
disables the OSNOISE_WORKLOAD option in
/sys/kernel/tracing/osnoise/options. This option is not re-enabled in a
subsequent run with kernel-space threads, leading to rtla collecting no
results if the previous run exited abnormally:
$ rtla timerlat hist -u
^\Quit (core dumped)
$ rtla timerlat hist -k -d 1s
Index
over:
count:
min:
avg:
max:
ALL: IRQ Thr Usr
count: 0 0 0
min: - - -
avg: - - -
max: - - -
The issue persists until OSNOISE_WORKLOAD is set manually by running:
$ echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
Set OSNOISE_WORKLOAD when running rtla with kernel-space threads if
available to fix the issue.
Cc: stable(a)vger.kernel.org
Cc: John Kacur <jkacur(a)redhat.com>
Cc: Luis Goncalves <lgoncalv(a)redhat.com>
Link: https://lore.kernel.org/20250107144823.239782-3-tglozar@redhat.com
Fixes: ed774f7481fa ("rtla/timerlat_hist: Add timerlat user-space support")
Signed-off-by: Tomas Glozar <tglozar(a)redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
tools/tracing/rtla/src/timerlat_hist.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c
index 53ded187b33a..d4bd02c01c53 100644
--- a/tools/tracing/rtla/src/timerlat_hist.c
+++ b/tools/tracing/rtla/src/timerlat_hist.c
@@ -1085,12 +1085,15 @@ timerlat_hist_apply_config(struct osnoise_tool *tool, struct timerlat_hist_param
}
}
- if (params->user_hist) {
- retval = osnoise_set_workload(tool->context, 0);
- if (retval) {
- err_msg("Failed to set OSNOISE_WORKLOAD option\n");
- goto out_err;
- }
+ /*
+ * Set workload according to type of thread if the kernel supports it.
+ * On kernels without support, user threads will have already failed
+ * on missing timerlat_fd, and kernel threads do not need it.
+ */
+ retval = osnoise_set_workload(tool->context, params->kernel_workload);
+ if (retval < -1) {
+ err_msg("Failed to set OSNOISE_WORKLOAD option\n");
+ goto out_err;
}
return 0;
--
2.45.2
From: Tomas Glozar <tglozar(a)redhat.com>
Currently, when either SIGINT from the user or SIGALRM from the duration
timer is caught by rtla-timerlat, stop_tracing is set to break out of
the main loop. This is not sufficient for cases where the timerlat
tracer is producing more data than rtla can consume; in that case,
rtla loops indefinitely inside tracefs_iterate_raw_events, never
reaches the stop_tracing check, and hangs.
In addition to setting stop_tracing, also stop the timerlat tracer on
received signal (SIGINT or SIGALRM). This will stop new samples so that
the existing samples may be processed and tracefs_iterate_raw_events
eventually exits.
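The general shape of the fix, as a self-contained userspace C sketch
(all names are illustrative; stop_producer() merely stands in for
trace_instance_stop()):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested;
static volatile sig_atomic_t producer_running = 1;

/* Stands in for trace_instance_stop(): stop the data source itself. */
static void stop_producer(void)
{
	producer_running = 0;
}

static void stop_handler(int sig)
{
	(void)sig;
	stop_requested = 1;
	/*
	 * Setting the flag alone is not enough if the consumer loop is
	 * stuck draining an endless stream and never reaches the flag
	 * check; stopping the producer guarantees the stream ends.
	 */
	stop_producer();
}

int main(void)
{
	signal(SIGINT, stop_handler);
	while (!stop_requested) {
		/* placeholder for tracefs_iterate_raw_events() */
		while (producer_running)
			pause();
	}
	puts("remaining samples drained, exiting");
	return 0;
}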
Cc: stable(a)vger.kernel.org
Cc: John Kacur <jkacur(a)redhat.com>
Cc: Luis Goncalves <lgoncalv(a)redhat.com>
Cc: Gabriele Monaco <gmonaco(a)redhat.com>
Link: https://lore.kernel.org/20250116144931.649593-4-tglozar@redhat.com
Fixes: a828cd18bc4a ("rtla: Add timerlat tool and timelart top mode")
Signed-off-by: Tomas Glozar <tglozar(a)redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
tools/tracing/rtla/src/timerlat_top.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index 059b468981e4..d21a21053917 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -900,9 +900,12 @@ static struct osnoise_tool
}
static int stop_tracing;
+static struct trace_instance *top_inst = NULL;
static void stop_top(int sig)
{
stop_tracing = 1;
+ if (top_inst)
+ trace_instance_stop(top_inst);
}
/*
@@ -950,6 +953,13 @@ int timerlat_top_main(int argc, char *argv[])
}
trace = &top->trace;
+ /*
+ * Save trace instance into global variable so that SIGINT can stop
+ * the timerlat tracer.
+ * Otherwise, rtla could loop indefinitely when overloaded.
+ */
+ top_inst = trace;
+
retval = enable_timerlat(trace);
if (retval) {
@@ -1131,7 +1141,7 @@ int timerlat_top_main(int argc, char *argv[])
return_value = 0;
- if (trace_is_off(&top->trace, &record->trace)) {
+ if (trace_is_off(&top->trace, &record->trace) && !stop_tracing) {
printf("rtla timerlat hit stop tracing\n");
if (!params->no_aa)
--
2.45.2
From: Tomas Glozar <tglozar(a)redhat.com>
Currently, when either SIGINT from the user or SIGALRM from the duration
timer is caught by rtla-timerlat, stop_tracing is set to break out of
the main loop. This is not sufficient for cases where the timerlat
tracer is producing more data than rtla can consume; in that case,
rtla loops indefinitely inside tracefs_iterate_raw_events, never
reaches the stop_tracing check, and hangs.
In addition to setting stop_tracing, also stop the timerlat tracer on
received signal (SIGINT or SIGALRM). This will stop new samples so that
the existing samples may be processed and tracefs_iterate_raw_events
eventually exits.
Cc: stable(a)vger.kernel.org
Cc: John Kacur <jkacur(a)redhat.com>
Cc: Luis Goncalves <lgoncalv(a)redhat.com>
Cc: Gabriele Monaco <gmonaco(a)redhat.com>
Link: https://lore.kernel.org/20250116144931.649593-3-tglozar@redhat.com
Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode")
Signed-off-by: Tomas Glozar <tglozar(a)redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
tools/tracing/rtla/src/timerlat_hist.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c
index 8b66387e5f35..f1edf1c8a7b0 100644
--- a/tools/tracing/rtla/src/timerlat_hist.c
+++ b/tools/tracing/rtla/src/timerlat_hist.c
@@ -1131,9 +1131,12 @@ static struct osnoise_tool
}
static int stop_tracing;
+static struct trace_instance *hist_inst = NULL;
static void stop_hist(int sig)
{
stop_tracing = 1;
+ if (hist_inst)
+ trace_instance_stop(hist_inst);
}
/*
@@ -1180,6 +1183,12 @@ int timerlat_hist_main(int argc, char *argv[])
}
trace = &tool->trace;
+ /*
+ * Save trace instance into global variable so that SIGINT can stop
+ * the timerlat tracer.
+ * Otherwise, rtla could loop indefinitely when overloaded.
+ */
+ hist_inst = trace;
retval = enable_timerlat(trace);
if (retval) {
@@ -1348,7 +1357,7 @@ int timerlat_hist_main(int argc, char *argv[])
return_value = 0;
- if (trace_is_off(&tool->trace, &record->trace)) {
+ if (trace_is_off(&tool->trace, &record->trace) && !stop_tracing) {
printf("rtla timerlat hit stop tracing\n");
if (!params->no_aa)
--
2.45.2
In the US country code, to avoid including 6 GHz rules in the 5 GHz rules
list, the number of 5 GHz rules is set to a default constant value of 4
(REG_US_5G_NUM_REG_RULES). However, if there are more than 4 valid 5 GHz
rules, the current logic will bypass the legitimate 6 GHz rules.
For example, if there are 5 valid 5 GHz rules and 1 valid 6 GHz rule, the
current logic will only consider 4 of the 5 GHz rules, treating the last
valid rule as a 6 GHz rule. Consequently, the actual 6 GHz rule is never
processed, leading to the eventual disabling of 6 GHz channels.
To fix this issue, instead of hardcoding the value to 4, use a helper
function to determine the number of 6 GHz rules present in the 5 GHz rules
list and ignore only those rules.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1
Cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh(a)oss.qualcomm.com>
---
drivers/net/wireless/ath/ath12k/wmi.c | 61 ++++++++++++++++++++++++++---------
drivers/net/wireless/ath/ath12k/wmi.h | 1 -
2 files changed, 45 insertions(+), 17 deletions(-)
diff --git a/drivers/net/wireless/ath/ath12k/wmi.c b/drivers/net/wireless/ath/ath12k/wmi.c
index a7625a3d1a3a9b1bbcf90cee30d3c707de03c439..296238c7ee0fd95045a3e0859d730e3bd73ec906 100644
--- a/drivers/net/wireless/ath/ath12k/wmi.c
+++ b/drivers/net/wireless/ath/ath12k/wmi.c
@@ -4852,6 +4852,22 @@ static struct ath12k_reg_rule
return reg_rule_ptr;
}
+static u8 ath12k_wmi_ignore_num_extra_rules(struct ath12k_wmi_reg_rule_ext_params *rule,
+ u32 num_reg_rules)
+{
+ u8 num_invalid_5ghz_rules = 0;
+ u32 count, start_freq;
+
+ for (count = 0; count < num_reg_rules; count++) {
+ start_freq = le32_get_bits(rule[count].freq_info, REG_RULE_START_FREQ);
+
+ if (start_freq >= ATH12K_MIN_6G_FREQ)
+ num_invalid_5ghz_rules++;
+ }
+
+ return num_invalid_5ghz_rules;
+}
+
static int ath12k_pull_reg_chan_list_ext_update_ev(struct ath12k_base *ab,
struct sk_buff *skb,
struct ath12k_reg_info *reg_info)
@@ -4862,6 +4878,7 @@ static int ath12k_pull_reg_chan_list_ext_update_ev(struct ath12k_base *ab,
u32 num_2g_reg_rules, num_5g_reg_rules;
u32 num_6g_reg_rules_ap[WMI_REG_CURRENT_MAX_AP_TYPE];
u32 num_6g_reg_rules_cl[WMI_REG_CURRENT_MAX_AP_TYPE][WMI_REG_MAX_CLIENT_TYPE];
+ u8 num_invalid_5ghz_ext_rules;
u32 total_reg_rules = 0;
int ret, i, j;
@@ -4955,20 +4972,6 @@ static int ath12k_pull_reg_chan_list_ext_update_ev(struct ath12k_base *ab,
memcpy(reg_info->alpha2, &ev->alpha2, REG_ALPHA2_LEN);
- /* FIXME: Currently FW includes 6G reg rule also in 5G rule
- * list for country US.
- * Having same 6G reg rule in 5G and 6G rules list causes
- * intersect check to be true, and same rules will be shown
- * multiple times in iw cmd. So added hack below to avoid
- * parsing 6G rule from 5G reg rule list, and this can be
- * removed later, after FW updates to remove 6G reg rule
- * from 5G rules list.
- */
- if (memcmp(reg_info->alpha2, "US", 2) == 0) {
- reg_info->num_5g_reg_rules = REG_US_5G_NUM_REG_RULES;
- num_5g_reg_rules = reg_info->num_5g_reg_rules;
- }
-
reg_info->dfs_region = le32_to_cpu(ev->dfs_region);
reg_info->phybitmap = le32_to_cpu(ev->phybitmap);
reg_info->num_phy = le32_to_cpu(ev->num_phy);
@@ -5071,8 +5074,29 @@ static int ath12k_pull_reg_chan_list_ext_update_ev(struct ath12k_base *ab,
}
}
+ ext_wmi_reg_rule += num_2g_reg_rules;
+
+ /* Firmware might include 6 GHz reg rule in 5 GHz rule list
+ * for few countries along with separate 6 GHz rule.
+ * Having same 6 GHz reg rule in 5 GHz and 6 GHz rules list
+ * causes intersect check to be true, and same rules will be
+ * shown multiple times in iw cmd.
+ * Hence, avoid parsing 6 GHz rule from 5 GHz reg rule list
+ */
+ num_invalid_5ghz_ext_rules = ath12k_wmi_ignore_num_extra_rules(ext_wmi_reg_rule,
+ num_5g_reg_rules);
+
+ if (num_invalid_5ghz_ext_rules) {
+ ath12k_dbg(ab, ATH12K_DBG_WMI,
+ "CC: %s 5 GHz reg rules number %d from fw, %d number of invalid 5 GHz rules",
+ reg_info->alpha2, reg_info->num_5g_reg_rules,
+ num_invalid_5ghz_ext_rules);
+
+ num_5g_reg_rules = num_5g_reg_rules - num_invalid_5ghz_ext_rules;
+ reg_info->num_5g_reg_rules = num_5g_reg_rules;
+ }
+
if (num_5g_reg_rules) {
- ext_wmi_reg_rule += num_2g_reg_rules;
reg_info->reg_rules_5g_ptr =
create_ext_reg_rules_from_wmi(num_5g_reg_rules,
ext_wmi_reg_rule);
@@ -5084,7 +5108,12 @@ static int ath12k_pull_reg_chan_list_ext_update_ev(struct ath12k_base *ab,
}
}
- ext_wmi_reg_rule += num_5g_reg_rules;
+ /* We have adjusted the number of 5 GHz reg rules above. But still those
+ * many rules needs to be adjusted in ext_wmi_reg_rule.
+ *
+ * NOTE: num_invalid_5ghz_ext_rules will be 0 for rest other cases.
+ */
+ ext_wmi_reg_rule += (num_5g_reg_rules + num_invalid_5ghz_ext_rules);
for (i = 0; i < WMI_REG_CURRENT_MAX_AP_TYPE; i++) {
reg_info->reg_rules_6g_ap_ptr[i] =
diff --git a/drivers/net/wireless/ath/ath12k/wmi.h b/drivers/net/wireless/ath/ath12k/wmi.h
index 38aed18b99f4239e40d369181549b4bc78faa862..81248253a36872f6fd93db772b5394f3ef9d3bf9 100644
--- a/drivers/net/wireless/ath/ath12k/wmi.h
+++ b/drivers/net/wireless/ath/ath12k/wmi.h
@@ -4093,7 +4093,6 @@ struct ath12k_wmi_eht_rate_set_params {
#define MAX_REG_RULES 10
#define REG_ALPHA2_LEN 2
#define MAX_6G_REG_RULES 5
-#define REG_US_5G_NUM_REG_RULES 4
enum wmi_start_event_param {
WMI_VDEV_START_RESP_EVENT = 0,
---
base-commit: c4e4e37afc8b8d178b0f00f607c30b3b81253597
change-id: 20250123-fix_6ghz_rules_handling-ff85f1efb30b
This change is specific to Hyper-V VM users.
If the Virtual Machine Connection window is focused,
a Hyper-V VM user can unintentionally touch the keyboard/mouse
when the VM is hibernating or resuming, and consequently the
hibernation or resume operation can be aborted unexpectedly.
Fix the issue by no longer registering the keyboard/mouse as
wakeup devices (see the other two patches for the
changes to drivers/input/serio/hyperv-keyboard.c and
drivers/hid/hid-hyperv.c).
The keyboard/mouse were registered as wakeup devices because the
VM needs to be woken up from the Suspend-to-Idle state after
a user runs "echo freeze > /sys/power/state". The Suspend-to-Idle
feature seems to have no real users in practice, so drop support
for it by returning -EOPNOTSUPP if a user tries to use it.
Fixes: 1a06d017fb3f ("Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis(a)linux.microsoft.com>
---
Changes in v4:
* No change
Changes in v3:
* Add "Cc: stable(a)vger.kernel.org" in sign-off area.
Changes in v2:
* Add "#define vmbus_freeze NULL" when CONFIG_PM_SLEEP is not
enabled.
---
drivers/hv/vmbus_drv.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 6d89d37b069a..4df6b12bf6a1 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -900,6 +900,19 @@ static void vmbus_shutdown(struct device *child_device)
}
#ifdef CONFIG_PM_SLEEP
+/*
+ * vmbus_freeze - Suspend-to-Idle
+ */
+static int vmbus_freeze(struct device *child_device)
+{
+/*
+ * Do not support Suspend-to-Idle ("echo freeze > /sys/power/state") as
+ * that would require registering the Hyper-V synthetic mouse/keyboard
+ * devices as wakeup devices, which can abort hibernation/resume unexpectedly.
+ */
+ return -EOPNOTSUPP;
+}
+
/*
* vmbus_suspend - Suspend a vmbus device
*/
@@ -938,6 +951,7 @@ static int vmbus_resume(struct device *child_device)
return drv->resume(dev);
}
#else
+#define vmbus_freeze NULL
#define vmbus_suspend NULL
#define vmbus_resume NULL
#endif /* CONFIG_PM_SLEEP */
@@ -969,7 +983,7 @@ static void vmbus_device_release(struct device *device)
*/
static const struct dev_pm_ops vmbus_pm = {
- .suspend_noirq = NULL,
+ .suspend_noirq = vmbus_freeze,
.resume_noirq = NULL,
.freeze_noirq = vmbus_suspend,
.thaw_noirq = vmbus_resume,
--
2.34.1
Hi,
This series has two main parts: 1) add error handling all around, and 2)
update the drivers according to latest (mostly non-public) information
from TI.
The "Update UB9702 init sequences" patch basically rewrites the init
sequence from scratch, and to make that patch easier to read, the
previous patch first removes the current init sequence. In the final
version these two patches need to be squashed together.
Tomi
Signed-off-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
---
Changes in v2:
- Add new patch "media: i2c: ds90ub913: Fix returned fmt from .set_fmt()"
- Reformat the reg write in "Speed-up I2C watchdog timer"
- Add 'it' parameter to rxport for_each macros
- Move 'enable_sscg' module parameter to DT
- Change serializer 'i2c-addr' property to serializer 'reg' property
- Split ub953 registers to a header file
- Link to v1: https://lore.kernel.org/r/20250110-ub9xx-improvements-v1-0-e0b9a1f644da@ide…
---
Jai Luthra (6):
media: i2c: ds90ub953: Speed-up I2C watchdog timer
media: dt-bindings: ti,ds90ub960: Add ti,enable-sscg property
media: dt-bindings: ti,ds90ub960: Allow setting serializer address
media: i2c: ds90ub960: Enable SSCG for UB9702
media: i2c: ds90ub960: Configure serializer using back-channel
media: i2c: ds90ub9xx: Set serializer temperature ramp
Tomi Valkeinen (16):
media: i2c: ds90ub953: Fix error prints
media: i2c: ds90ub913: Fix returned fmt from .set_fmt()
media: i2c: ds90ub913: Align ub913_read() with other similar functions
media: i2c: ds90ub9xx: Add err parameter to read/write funcs
media: i2c: ds90ub960: Add error handling to multiple places
media: i2c: ds90ub953: Add error handling to ub953_log_status()
media: i2c: ds90ub913: Add error handling to ub913_log_status()
media: i2c: ds90ub960: Move UB9702 registers to a separate section
media: i2c: ds90ub960: Add UB9702 specific registers
media: i2c: ds90ub960: Split ub960_init_tx_ports()
media: i2c: ds90ub960: Refresh ub960_init_tx_ports_ub9702()
media: i2c: ds90ub960: Add RX port iteration support
media: i2c: ds90ub960: Move all RX port init code into ub960_init_rx_ports()
media: i2c: ds90ub960: Remove old ub9702 RX port init code (SQUASH)
media: i2c: ds90ub960: Update UB9702 init sequences
media: i2c: ds90ub953: Move reg defines to a header file
.../bindings/media/i2c/ti,ds90ub953.yaml | 77 +-
.../bindings/media/i2c/ti,ds90ub960.yaml | 21 +-
drivers/media/i2c/ds90ub913.c | 82 +-
drivers/media/i2c/ds90ub953.c | 242 +--
drivers/media/i2c/ds90ub953.h | 104 +
drivers/media/i2c/ds90ub960.c | 2264 +++++++++++++++-----
6 files changed, 2070 insertions(+), 720 deletions(-)
---
base-commit: c4b7779abc6633677e6edb79e2809f4f61fde157
change-id: 20250110-ub9xx-improvements-9172b44eb0bc
Best regards,
--
Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
When device_register(&child->dev) fails, call put_device() to
explicitly release child->dev. Otherwise, it could cause a double-free
problem.
device_register() includes device_add(). As the comment for device_add()
says, 'if device_add() succeeds, you should call device_del() when you
want to get rid of it. If device_add() has not succeeded, use only
put_device() to drop the reference count'.
Found by code review.
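A minimal kernel-style sketch of the rule quoted above
(example_register_child() is hypothetical; device_register(),
device_del() and put_device() are the real driver-core calls):

#include <linux/device.h>

static int example_register_child(struct device *child)
{
	int ret;

	ret = device_register(child);
	if (ret) {
		/*
		 * device_register() = device_initialize() + device_add().
		 * On failure the embedded kobject already holds a
		 * reference, so drop it with put_device(); do not kfree()
		 * the device directly and do not call device_del().
		 */
		put_device(child);
		return ret;
	}

	/*
	 * On success, tear down later with device_unregister(), which is
	 * device_del() followed by put_device().
	 */
	return 0;
}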
Cc: stable(a)vger.kernel.org
Fixes: 4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- added the bug description about the comment of device_add();
- fixed the patch as suggestions;
- added Cc and Fixes table.
---
drivers/pci/probe.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2e81ab0f5a25..ae12f92c6a9d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1174,7 +1174,10 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
add_dev:
pci_set_bus_msi_domain(child);
ret = device_register(&child->dev);
- WARN_ON(ret < 0);
+ if (WARN_ON(ret < 0)) {
+ put_device(&child->dev);
+ return NULL;
+ }
pcibios_add_bus(child);
--
2.25.1
devm_blk_crypto_profile_init() registers a cleanup handler to run when
the associated (platform-) device is being released. For UFS, the
crypto private data and pointers are stored as part of the ufs_hba's
data structure 'struct ufs_hba::crypto_profile'. This structure is
allocated as part of the underlying ufshd allocation.
During driver release or during error handling in ufshcd_pltfrm_init(),
this structure is released as part of ufshcd_dealloc_host() before the
(platform-) device associated with the crypto call above is released.
Once this device is released, the crypto cleanup code will run, using
the just-released 'struct ufs_hba::crypto_profile'. This causes a
use-after-free situation:
exynos-ufshc 14700000.ufs: ufshcd_pltfrm_init() failed -11
exynos-ufshc 14700000.ufs: probe with driver exynos-ufshc failed with error -11
Unable to handle kernel paging request at virtual address 01adafad6dadad88
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[01adafad6dadad88] address between user and kernel address ranges
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.13.0-rc5-next-20250106+ #70
Tainted: [W]=WARN
Hardware name: Oriole (DT)
pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kfree+0x60/0x2d8
lr : kvfree+0x44/0x60
sp : ffff80008009ba80
x29: ffff80008009ba90 x28: 0000000000000000 x27: ffffbcc6591e0130
x26: ffffbcc659309960 x25: ffffbcc658f89c50 x24: ffffbcc659539d80
x23: ffff22e000940040 x22: ffff22e001539010 x21: ffffbcc65714b22c
x20: 6b6b6b6b6b6b6b6b x19: 01adafad6dadad80 x18: 0000000000000000
x17: ffffbcc6579fbac8 x16: ffffbcc657a04300 x15: ffffbcc657a027f4
x14: ffffbcc656f969cc x13: ffffbcc6579fdc80 x12: ffffbcc6579fb194
x11: ffffbcc6579fbc34 x10: 0000000000000000 x9 : ffffbcc65714b22c
x8 : ffff80008009b880 x7 : 0000000000000000 x6 : ffff80008009b940
x5 : ffff80008009b8c0 x4 : ffff22e000940518 x3 : ffff22e006f54f40
x2 : ffffbcc657a02268 x1 : ffff80007fffffff x0 : ffffc1ffc0000000
Call trace:
kfree+0x60/0x2d8 (P)
kvfree+0x44/0x60
blk_crypto_profile_destroy_callback+0x28/0x70
devm_action_release+0x1c/0x30
release_nodes+0x6c/0x108
devres_release_all+0x98/0x100
device_unbind_cleanup+0x20/0x70
really_probe+0x218/0x2d0
In other words, the initialisation code flow is:
platform-device probe
ufshcd_pltfrm_init()
ufshcd_alloc_host()
scsi_host_alloc()
allocation of struct ufs_hba
creation of scsi-host devices
devm_blk_crypto_profile_init()
devm registration of cleanup handler using platform-device
and during error handling of ufshcd_pltfrm_init() or during driver
removal:
ufshcd_dealloc_host()
scsi_host_put()
put_device(scsi-host)
release of struct ufs_hba
put_device(platform-device)
crypto cleanup handler
To fix this use-after free, change ufshcd_alloc_host() to register a
devres action to automatically cleanup the underlying SCSI device on
ufshcd destruction, without requiring explicit calls to
ufshcd_dealloc_host() (see the sketch after this list). This way:
* the crypto profile and all other ufs_hba-owned resources are
destroyed before SCSI (as they've been registered after)
* a memleak is plugged in tc-dwc-g210-pci.c remove() as a
side-effect
* EXPORT_SYMBOL_GPL(ufshcd_dealloc_host) can be removed fully as
it's not needed anymore
* no future drivers using ufshcd_alloc_host() could ever forget
adding the cleanup
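Devres actions run in reverse (LIFO) order of registration, which is
what makes registering the scsi_host_put() action inside
ufshcd_alloc_host() sufficient: every later-registered cleanup, such as
the crypto profile's, runs before it. A minimal kernel-style sketch of
that ordering (devm_add_action_or_reset() is the real API; the rest is
hypothetical):

#include <linux/device.h>

/* Runs LAST on release, because it is registered first. */
static void release_container(void *data)
{
	/* e.g. scsi_host_put(): frees the allocation that everything
	 * else lives inside */
}

/* Runs FIRST on release, because it is registered last. */
static void release_embedded_resource(void *data)
{
	/* e.g. the crypto profile cleanup: must run while the
	 * containing allocation is still alive */
}

static int example_probe(struct device *dev)
{
	int err;

	/* Register the container's release first... */
	err = devm_add_action_or_reset(dev, release_container, NULL);
	if (err)
		return err;

	/* ...then the embedded resources. Devres unwinds LIFO, so
	 * release_embedded_resource() is guaranteed to run before
	 * release_container(). */
	return devm_add_action_or_reset(dev, release_embedded_resource, NULL);
}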
Fixes: cb77cb5abe1f ("blk-crypto: rename blk_keyslot_manager to blk_crypto_profile")
Fixes: d76d9d7d1009 ("scsi: ufs: use devm_blk_ksm_init()")
Cc: stable(a)vger.kernel.org
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
Changes in v3:
- rename devres action handler to ufshcd_devres_release() (Bart)
- Link to v2: https://lore.kernel.org/r/20250114-ufshcd-fix-v2-1-2dc627590a4a@linaro.org
Changes in v2:
- completely new approach using devres action for Scsi_host cleanup, to
ensure ordering
- add Fixes: and CC: stable tags (Eric)
- Link to v1: https://lore.kernel.org/r/20250113-ufshcd-fix-v1-1-ca63d1d4bd55@linaro.org
---
In my case, as per above trace I initially encountered an error in
ufshcd_verify_dev_init(), which made me notice this problem both during
error handling and release. For reproducing, it'd be possible to change
that function to just return an error, or rmmod the platform glue
driver.
Other approaches for solving this issue I see are the following, but I
believe this one here is the cleanest:
* turn 'struct ufs_hba::crypto_profile' into a dynamically allocated
pointer, in which case it doesn't matter if cleanup runs after
scsi_host_put()
* add an explicit devm_blk_crypto_profile_deinit() to be called by API
users when necessary, e.g. before ufshcd_dealloc_host() in this case
* register the crypto cleanup handler against the scsi-host device
instead, like in v1 of this patch
---
drivers/ufs/core/ufshcd.c | 27 +++++++++++++++++----------
drivers/ufs/host/ufshcd-pci.c | 2 --
drivers/ufs/host/ufshcd-pltfrm.c | 11 ++++-------
include/ufs/ufshcd.h | 1 -
4 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 43ddae7318cb..8351795296bb 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -10279,16 +10279,6 @@ int ufshcd_system_thaw(struct device *dev)
EXPORT_SYMBOL_GPL(ufshcd_system_thaw);
#endif /* CONFIG_PM_SLEEP */
-/**
- * ufshcd_dealloc_host - deallocate Host Bus Adapter (HBA)
- * @hba: pointer to Host Bus Adapter (HBA)
- */
-void ufshcd_dealloc_host(struct ufs_hba *hba)
-{
- scsi_host_put(hba->host);
-}
-EXPORT_SYMBOL_GPL(ufshcd_dealloc_host);
-
/**
* ufshcd_set_dma_mask - Set dma mask based on the controller
* addressing capability
@@ -10307,6 +10297,16 @@ static int ufshcd_set_dma_mask(struct ufs_hba *hba)
return dma_set_mask_and_coherent(hba->dev, DMA_BIT_MASK(32));
}
+/**
+ * ufshcd_devres_release - devres cleanup handler, invoked during release of
+ * hba->dev
+ * @host: pointer to SCSI host
+ */
+static void ufshcd_devres_release(void *host)
+{
+ scsi_host_put(host);
+}
+
/**
* ufshcd_alloc_host - allocate Host Bus Adapter (HBA)
* @dev: pointer to device handle
@@ -10334,6 +10334,13 @@ int ufshcd_alloc_host(struct device *dev, struct ufs_hba **hba_handle)
err = -ENOMEM;
goto out_error;
}
+
+ err = devm_add_action_or_reset(dev, ufshcd_devres_release,
+ host);
+ if (err)
+ return dev_err_probe(dev, err,
+ "failed to add ufshcd dealloc action\n");
+
host->nr_maps = HCTX_TYPE_POLL + 1;
hba = shost_priv(host);
hba->host = host;
diff --git a/drivers/ufs/host/ufshcd-pci.c b/drivers/ufs/host/ufshcd-pci.c
index ea39c5d5b8cf..9cfcaad23cf9 100644
--- a/drivers/ufs/host/ufshcd-pci.c
+++ b/drivers/ufs/host/ufshcd-pci.c
@@ -562,7 +562,6 @@ static void ufshcd_pci_remove(struct pci_dev *pdev)
pm_runtime_forbid(&pdev->dev);
pm_runtime_get_noresume(&pdev->dev);
ufshcd_remove(hba);
- ufshcd_dealloc_host(hba);
}
/**
@@ -605,7 +604,6 @@ ufshcd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
err = ufshcd_init(hba, mmio_base, pdev->irq);
if (err) {
dev_err(&pdev->dev, "Initialization failed\n");
- ufshcd_dealloc_host(hba);
return err;
}
diff --git a/drivers/ufs/host/ufshcd-pltfrm.c b/drivers/ufs/host/ufshcd-pltfrm.c
index 505572d4fa87..adb0a65d9df5 100644
--- a/drivers/ufs/host/ufshcd-pltfrm.c
+++ b/drivers/ufs/host/ufshcd-pltfrm.c
@@ -488,13 +488,13 @@ int ufshcd_pltfrm_init(struct platform_device *pdev,
if (err) {
dev_err(dev, "%s: clock parse failed %d\n",
__func__, err);
- goto dealloc_host;
+ goto out;
}
err = ufshcd_parse_regulator_info(hba);
if (err) {
dev_err(dev, "%s: regulator init failed %d\n",
__func__, err);
- goto dealloc_host;
+ goto out;
}
ufshcd_init_lanes_per_dir(hba);
@@ -502,14 +502,14 @@ int ufshcd_pltfrm_init(struct platform_device *pdev,
err = ufshcd_parse_operating_points(hba);
if (err) {
dev_err(dev, "%s: OPP parse failed %d\n", __func__, err);
- goto dealloc_host;
+ goto out;
}
err = ufshcd_init(hba, mmio_base, irq);
if (err) {
dev_err_probe(dev, err, "Initialization failed with error %d\n",
err);
- goto dealloc_host;
+ goto out;
}
pm_runtime_set_active(dev);
@@ -517,8 +517,6 @@ int ufshcd_pltfrm_init(struct platform_device *pdev,
return 0;
-dealloc_host:
- ufshcd_dealloc_host(hba);
out:
return err;
}
@@ -534,7 +532,6 @@ void ufshcd_pltfrm_remove(struct platform_device *pdev)
pm_runtime_get_sync(&pdev->dev);
ufshcd_remove(hba);
- ufshcd_dealloc_host(hba);
pm_runtime_disable(&pdev->dev);
pm_runtime_put_noidle(&pdev->dev);
}
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index da0fa5c65081..58eb6e897827 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -1311,7 +1311,6 @@ static inline void ufshcd_rmwl(struct ufs_hba *hba, u32 mask, u32 val, u32 reg)
void ufshcd_enable_irq(struct ufs_hba *hba);
void ufshcd_disable_irq(struct ufs_hba *hba);
int ufshcd_alloc_host(struct device *, struct ufs_hba **);
-void ufshcd_dealloc_host(struct ufs_hba *);
int ufshcd_hba_enable(struct ufs_hba *hba);
int ufshcd_init(struct ufs_hba *, void __iomem *, unsigned int);
int ufshcd_link_recovery(struct ufs_hba *hba);
---
base-commit: 4e16367cfe0ce395f29d0482b78970cce8e1db73
change-id: 20250113-ufshcd-fix-52409f2d32ff
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
The patch titled
Subject: mm/compaction: fix UBSAN shift-out-of-bounds warning
has been added to the -mm mm-unstable branch. Its filename is
mm-compaction-fix-ubsan-shift-out-of-bounds-warning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm/compaction: fix UBSAN shift-out-of-bounds warning
Date: Thu, 23 Jan 2025 10:10:29 +0800
syzkaller reported a UBSAN shift-out-of-bounds warning of (1UL << order)
in isolate_freepages_block(). The bogus compound_order can be any value
because the field is in a union with the page flags. Add back the
MAX_PAGE_ORDER check to fix the warning.
Link: https://lkml.kernel.org/r/20250123021029.2826736-1-liushixin2@huawei.com
Fixes: 3da0272a4c7d ("mm/compaction: correctly return failure with bogus compound_order in strict mode")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Kemeng Shi <shikemeng(a)huaweicloud.com>
Cc: Matthew Wilcox <willy(a)infradead.org> [English fixes]
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/compaction.c~mm-compaction-fix-ubsan-shift-out-of-bounds-warning
+++ a/mm/compaction.c
@@ -631,7 +631,8 @@ static unsigned long isolate_freepages_b
if (PageCompound(page)) {
const unsigned int order = compound_order(page);
- if (blockpfn + (1UL << order) <= end_pfn) {
+ if ((order <= MAX_PAGE_ORDER) &&
+ (blockpfn + (1UL << order) <= end_pfn)) {
blockpfn += (1UL << order) - 1;
page += (1UL << order) - 1;
nr_scanned += (1UL << order) - 1;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
mm-page_isolation-avoid-call-folio_hstate-without-hugetlb_lock.patch
mm-compaction-fix-ubsan-shift-out-of-bounds-warning.patch
Protect access to fore200e->available_cell_rate with the rate_mtx lock to
prevent a potential data race.
In this case, since the update depends on a prior read, a data race
could lead to a wrong fore200e.available_cell_rate value.
The field fore200e.available_cell_rate is generally protected by the lock
fore200e.rate_mtx when accessed. In all other read and write cases, this
field is consistently protected by the lock, except for this case and
during initialization.
This potential bug was detected by our experimental static analysis tool,
which analyzes locking APIs and paired functions to identify data races
and atomicity violations.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <2045gemini(a)gmail.com>
---
v2:
* Added a description of the data race hazard in fore200e_open(), as
suggested by Jakub Kicinski and Simon Horman.
---
drivers/atm/fore200e.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index 4fea1149e003..f62e38571440 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -1374,7 +1374,9 @@ fore200e_open(struct atm_vcc *vcc)
vcc->dev_data = NULL;
+ mutex_lock(&fore200e->rate_mtx);
fore200e->available_cell_rate += vcc->qos.txtp.max_pcr;
+ mutex_unlock(&fore200e->rate_mtx);
kfree(fore200e_vcc);
return -EINVAL;
--
2.25.1
From: Rand Deeb <rand.sec96(a)gmail.com>
[ Upstream commit 789c17185fb0f39560496c2beab9b57ce1d0cbe7 ]
The ssb_device_uevent() function first attempts to convert the 'dev' pointer
to 'struct ssb_device *'. However, it mistakenly dereferences 'dev' before
performing the NULL check, potentially leading to a NULL pointer
dereference if 'dev' is NULL.
To fix this issue, move the NULL check before dereferencing the 'dev' pointer,
ensuring that the pointer is valid before attempting to use it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96(a)gmail.com>
Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
Link: https://msgid.link/20240306123028.164155-1-rand.sec96@gmail.com
Signed-off-by: Imkanmod Khan <imkanmodkhan(a)gmail.com>
---
drivers/ssb/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index d52e91258e98..aae50a5dfb57 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -341,11 +341,13 @@ static int ssb_bus_match(struct device *dev, struct device_driver *drv)
static int ssb_device_uevent(struct device *dev, struct kobj_uevent_env *env)
{
- struct ssb_device *ssb_dev = dev_to_ssb_dev(dev);
+ struct ssb_device *ssb_dev;
if (!dev)
return -ENODEV;
+ ssb_dev = dev_to_ssb_dev(dev);
+
return add_uevent_var(env,
"MODALIAS=ssb:v%04Xid%04Xrev%02X",
ssb_dev->id.vendor, ssb_dev->id.coreid,
--
2.25.1
From: Lucas De Marchi <lucas.demarchi(a)intel.com>
Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to mesa tools. However, in doing so it also broke fetching the
GuC log via debugfs since xe_print_blob_ascii85() simply bails out.
The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented and adding random newlines in the middle breaks it. If a
tool is able to parse it by looking at the data and checking for chars
that are out of the ascii85 space, it can still do so. A format change
that breaks the line-oriented output on devcoredump however needs better
coordination with existing tools.
Reviewed-by: José Roberto de Souza <jose.souza(a)intel.com>
Cc: John Harrison <John.C.Harrison(a)Intel.com>
Cc: Julia Filipchuk <julia.filipchuk(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
---
drivers/gpu/drm/xe/xe_devcoredump.c | 30 +++++++++--------------------
drivers/gpu/drm/xe/xe_devcoredump.h | 2 +-
drivers/gpu/drm/xe/xe_guc_ct.c | 3 ++-
drivers/gpu/drm/xe/xe_guc_log.c | 4 +++-
4 files changed, 15 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 81dc7795c0651..1c86e6456d60f 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -395,42 +395,30 @@ int xe_devcoredump_init(struct xe_device *xe)
/**
* xe_print_blob_ascii85 - print a BLOB to some useful location in ASCII85
*
- * The output is split to multiple lines because some print targets, e.g. dmesg
- * cannot handle arbitrarily long lines. Note also that printing to dmesg in
- * piece-meal fashion is not possible, each separate call to drm_puts() has a
- * line-feed automatically added! Therefore, the entire output line must be
- * constructed in a local buffer first, then printed in one atomic output call.
+ * The output is split to multiple print calls because some print targets, e.g.
+ * dmesg cannot handle arbitrarily long lines. These targets may add newline
+ * between calls.
*
* There is also a scheduler yield call to prevent the 'task has been stuck for
* 120s' kernel hang check feature from firing when printing to a slow target
* such as dmesg over a serial port.
*
- * TODO: Add compression prior to the ASCII85 encoding to shrink huge buffers down.
- *
* @p: the printer object to output to
* @prefix: optional prefix to add to output string
* @blob: the Binary Large OBject to dump out
* @offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32)
* @size: the size in bytes of the BLOB, must be a multiple of sizeof(u32)
*/
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size)
{
const u32 *blob32 = (const u32 *)blob;
char buff[ASCII85_BUFSZ], *line_buff;
size_t line_pos = 0;
- /*
- * Splitting blobs across multiple lines is not compatible with the mesa
- * debug decoder tool. Note that even dropping the explicit '\n' below
- * doesn't help because the GuC log is so big some underlying implementation
- * still splits the lines at 512K characters. So just bail completely for
- * the moment.
- */
- return;
-
#define DMESG_MAX_LINE_LEN 800
-#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */
+ /* Always leave space for the suffix char and the \0 */
+#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "<suffix>\0" */
if (size & 3)
drm_printf(p, "Size not word aligned: %zu", size);
@@ -462,7 +450,6 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
line_pos += strlen(line_buff + line_pos);
if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
- line_buff[line_pos++] = '\n';
line_buff[line_pos++] = 0;
drm_puts(p, line_buff);
@@ -474,10 +461,11 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
}
}
+ if (suffix)
+ line_buff[line_pos++] = suffix;
+
if (line_pos) {
- line_buff[line_pos++] = '\n';
line_buff[line_pos++] = 0;
-
drm_puts(p, line_buff);
}
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
index 6a17e6d601022..5391a80a4d1ba 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.h
+++ b/drivers/gpu/drm/xe/xe_devcoredump.h
@@ -29,7 +29,7 @@ static inline int xe_devcoredump_init(struct xe_device *xe)
}
#endif
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size);
#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 8b65c5e959cc2..50c8076b51585 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1724,7 +1724,8 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot,
snapshot->g2h_outstanding);
if (snapshot->ctb)
- xe_print_blob_ascii85(p, "CTB data", snapshot->ctb, 0, snapshot->ctb_size);
+ xe_print_blob_ascii85(p, "CTB data", '\n',
+ snapshot->ctb, 0, snapshot->ctb_size);
} else {
drm_puts(p, "CT disabled\n");
}
diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c
index 80151ff6a71f8..44482ea919924 100644
--- a/drivers/gpu/drm/xe/xe_guc_log.c
+++ b/drivers/gpu/drm/xe/xe_guc_log.c
@@ -207,8 +207,10 @@ void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_
remain = snapshot->size;
for (i = 0; i < snapshot->num_chunks; i++) {
size_t size = min(GUC_LOG_CHUNK_SIZE, remain);
+ const char *prefix = i ? NULL : "Log data";
+ char suffix = i == snapshot->num_chunks - 1 ? '\n' : 0;
- xe_print_blob_ascii85(p, i ? NULL : "Log data", snapshot->copy[i], 0, size);
+ xe_print_blob_ascii85(p, prefix, suffix, snapshot->copy[i], 0, size);
remain -= size;
}
}
--
2.48.1
Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to work around breakage
in a mesa debug tool. However, in doing so it also broke fetching the
GuC log via debugfs, since xe_print_blob_ascii85() now simply bails out.
The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented, and adding random newlines in the middle breaks it. If a
tool is able to parse the dump by looking at the data and checking for
chars that are outside the ascii85 space, it can still do so. A format
change that breaks the line-oriented output of devcoredump, however,
needs better coordination with existing tools.
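As a side note for tool authors: a consumer that wants to stay robust
against targets that inject line breaks can pre-filter the stream before
decoding, since a newline can never be a payload byte. The sketch below
is illustrative only (ascii85_filter is a hypothetical helper, not part
of this patch) and assumes the standard ascii85 alphabet of '!'..'u'
plus 'z':

#include <stddef.h>

/*
 * Drop every byte outside the ascii85 alphabet (e.g. the newlines that
 * some print targets insert between calls) so the remainder can be fed
 * to a regular ascii85 decoder.
 */
static size_t ascii85_filter(char *dst, const char *src, size_t len)
{
	size_t out = 0;

	for (size_t i = 0; i < len; i++) {
		char c = src[i];

		if ((c >= '!' && c <= 'u') || c == 'z')
			dst[out++] = c;
	}
	return out;
}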
Cc: John Harrison <John.C.Harrison(a)Intel.com>
Cc: Julia Filipchuk <julia.filipchuk(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
---
drivers/gpu/drm/xe/xe_devcoredump.c | 30 +++++++++--------------------
drivers/gpu/drm/xe/xe_devcoredump.h | 2 +-
drivers/gpu/drm/xe/xe_guc_ct.c | 3 ++-
drivers/gpu/drm/xe/xe_guc_log.c | 4 +++-
4 files changed, 15 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index a7946a76777e7..d9b71bb690860 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -391,42 +391,30 @@ int xe_devcoredump_init(struct xe_device *xe)
/**
* xe_print_blob_ascii85 - print a BLOB to some useful location in ASCII85
*
- * The output is split to multiple lines because some print targets, e.g. dmesg
- * cannot handle arbitrarily long lines. Note also that printing to dmesg in
- * piece-meal fashion is not possible, each separate call to drm_puts() has a
- * line-feed automatically added! Therefore, the entire output line must be
- * constructed in a local buffer first, then printed in one atomic output call.
+ * The output is split across multiple print calls because some print targets,
+ * e.g. dmesg, cannot handle arbitrarily long lines. These targets may add a
+ * newline between calls.
*
* There is also a scheduler yield call to prevent the 'task has been stuck for
* 120s' kernel hang check feature from firing when printing to a slow target
* such as dmesg over a serial port.
*
- * TODO: Add compression prior to the ASCII85 encoding to shrink huge buffers down.
- *
* @p: the printer object to output to
* @prefix: optional prefix to add to output string
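+ * @suffix: optional suffix to add at the end of the output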
* @blob: the Binary Large OBject to dump out
* @offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32)
* @size: the size in bytes of the BLOB, must be a multiple of sizeof(u32)
*/
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size)
{
const u32 *blob32 = (const u32 *)blob;
char buff[ASCII85_BUFSZ], *line_buff;
size_t line_pos = 0;
- /*
- * Splitting blobs across multiple lines is not compatible with the mesa
- * debug decoder tool. Note that even dropping the explicit '\n' below
- * doesn't help because the GuC log is so big some underlying implementation
- * still splits the lines at 512K characters. So just bail completely for
- * the moment.
- */
- return;
-
#define DMESG_MAX_LINE_LEN 800
-#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */
+ /* Always leave space for the suffix char and the \0 */
+#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "<suffix>\0" */
if (size & 3)
drm_printf(p, "Size not word aligned: %zu", size);
@@ -458,7 +446,6 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
line_pos += strlen(line_buff + line_pos);
if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
- line_buff[line_pos++] = '\n';
line_buff[line_pos++] = 0;
drm_puts(p, line_buff);
@@ -470,10 +457,11 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
}
}
+ if (suffix)
+ line_buff[line_pos++] = suffix;
+
if (line_pos) {
- line_buff[line_pos++] = '\n';
line_buff[line_pos++] = 0;
-
drm_puts(p, line_buff);
}
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
index 6a17e6d601022..5391a80a4d1ba 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.h
+++ b/drivers/gpu/drm/xe/xe_devcoredump.h
@@ -29,7 +29,7 @@ static inline int xe_devcoredump_init(struct xe_device *xe)
}
#endif
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size);
#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 8b65c5e959cc2..50c8076b51585 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1724,7 +1724,8 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot,
snapshot->g2h_outstanding);
if (snapshot->ctb)
- xe_print_blob_ascii85(p, "CTB data", snapshot->ctb, 0, snapshot->ctb_size);
+ xe_print_blob_ascii85(p, "CTB data", '\n',
+ snapshot->ctb, 0, snapshot->ctb_size);
} else {
drm_puts(p, "CT disabled\n");
}
diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c
index 80151ff6a71f8..44482ea919924 100644
--- a/drivers/gpu/drm/xe/xe_guc_log.c
+++ b/drivers/gpu/drm/xe/xe_guc_log.c
@@ -207,8 +207,10 @@ void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_
remain = snapshot->size;
for (i = 0; i < snapshot->num_chunks; i++) {
size_t size = min(GUC_LOG_CHUNK_SIZE, remain);
+ const char *prefix = i ? NULL : "Log data";
+ char suffix = i == snapshot->num_chunks - 1 ? '\n' : 0;
- xe_print_blob_ascii85(p, i ? NULL : "Log data", snapshot->copy[i], 0, size);
+ xe_print_blob_ascii85(p, prefix, suffix, snapshot->copy[i], 0, size);
remain -= size;
}
}
--
2.48.0
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 53dac345395c0d2493cbc2f4c85fe38aef5b63f5
Gitweb: https://git.kernel.org/tip/53dac345395c0d2493cbc2f4c85fe38aef5b63f5
Author: Frederic Weisbecker <frederic(a)kernel.org>
AuthorDate: Sat, 18 Jan 2025 00:24:33 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 23 Jan 2025 20:06:35 +01:00
hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
hrtimers are migrated away from the dying CPU to any online target at
the CPUHP_AP_HRTIMERS_DYING stage, in order not to delay the bandwidth
timers that handle tasks involved in the CPU hotplug forward progress.
However, wakeups can still be performed by the outgoing CPU after
CPUHP_AP_HRTIMERS_DYING, and these can again result in bandwidth timers
being armed. Depending on several considerations (crystal-ball power
management based election, earliest timer already enqueued, timer
migration enabled or not), the target may eventually be the current
CPU even though it is offline. If that happens, the timer is eventually
ignored.
The most notable example is RCU, which had to deal with each and every
one of those wake-ups by deferring them to an online CPU, along with
related workarounds:
_ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
_ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
_ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
The problem isn't confined to RCU, though: the stop-machine kthread
(which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
of its work through cpu_stop_signal_done() and performs a wake-up that
eventually arms the deadline server timer:
WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
Call Trace:
<TASK>
start_dl_timer
enqueue_dl_entity
dl_server_start
enqueue_task_fair
enqueue_task
ttwu_do_activate
try_to_wake_up
complete
cpu_stopper_thread
Instead of providing yet another band-aid to work around the situation,
fix it in the hrtimer infrastructure itself: always migrate a timer away
to an online target whenever it is enqueued from an offline CPU.
This will also make it possible to revert all of the above disgraceful
RCU hacks.
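For reference, the csd member added to hrtimer_cpu_base below follows
the generic asynchronous cross-call pattern from <linux/smp.h>. A
minimal, hypothetical sketch of that pattern (my_retrigger, my_csd and
kick_remote_cpu are illustrative names, not from this patch):

#include <linux/smp.h>

/* Runs on the target CPU, in IPI context. */
static void my_retrigger(void *arg)
{
}

static call_single_data_t my_csd = CSD_INIT(my_retrigger, NULL);

static void kick_remote_cpu(int cpu)
{
	/*
	 * Fire-and-forget IPI; the caller must ensure the csd is not
	 * reused before the previous invocation has completed.
	 */
	smp_call_function_single_async(cpu, &my_csd);
}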
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Reported-by: Vlad Poenaru <vlad.wing(a)gmail.com>
Reported-by: Usama Arif <usamaarif642(a)gmail.com>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Tested-by: Paul E. McKenney <paulmck(a)kernel.org>
Link: https://lore.kernel.org/all/20250117232433.24027-1-frederic@kernel.org
Closes: 20241213203739.1519801-1-usamaarif642(a)gmail.com
---
include/linux/hrtimer_defs.h | 1 +-
kernel/time/hrtimer.c | 103 +++++++++++++++++++++++++++-------
2 files changed, 83 insertions(+), 21 deletions(-)
diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h
index c3b4b7e..84a5045 100644
--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
ktime_t softirq_expires_next;
struct hrtimer *softirq_next_timer;
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+ call_single_data_t csd;
} ____cacheline_aligned;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 4fb81f8..deb1aa3 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -58,6 +58,8 @@
#define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
#define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
+static void retrigger_next_event(void *arg);
+
/*
* The timer bases:
*
@@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
.clockid = CLOCK_TAI,
.get_time = &ktime_get_clocktai,
},
- }
+ },
+ .csd = CSD_INIT(retrigger_next_event, NULL)
};
static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
@@ -124,6 +127,14 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
[CLOCK_TAI] = HRTIMER_BASE_TAI,
};
+static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
+{
+ if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+ return true;
+ else
+ return likely(base->online);
+}
+
/*
* Functions and macros which are different for UP/SMP systems are kept in a
* single place
@@ -178,27 +189,54 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
}
/*
- * We do not migrate the timer when it is expiring before the next
- * event on the target cpu. When high resolution is enabled, we cannot
- * reprogram the target cpu hardware and we would cause it to fire
- * late. To keep it simple, we handle the high resolution enabled and
- * disabled case similar.
+ * Check if the elected target is suitable considering its next
+ * event and the hotplug state of the current CPU.
+ *
+ * If the elected target is remote and its next event is after the timer
+ * to queue, then a remote reprogram is necessary. However there is no
+ * guarantee the IPI handling the operation would arrive in time to meet
+ * the high resolution deadline. In this case the local CPU becomes a
+ * preferred target, unless it is offline.
+ *
+ * High and low resolution modes are handled the same way for simplicity.
*
* Called with cpu_base->lock of target cpu held.
*/
-static int
-hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
+static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base,
+ struct hrtimer_cpu_base *new_cpu_base,
+ struct hrtimer_cpu_base *this_cpu_base)
{
ktime_t expires;
+ /*
+ * The local CPU clockevent can be reprogrammed. Also get_target_base()
+ * guarantees it is online.
+ */
+ if (new_cpu_base == this_cpu_base)
+ return true;
+
+ /*
+ * The offline local CPU can't be the default target if the
+ * next remote target event is after this timer. Keep the
+ * elected new base. An IPI will be issued to reprogram
+ * it as a last resort.
+ */
+ if (!hrtimer_base_is_online(this_cpu_base))
+ return true;
+
expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
- return expires < new_base->cpu_base->expires_next;
+
+ return expires >= new_base->cpu_base->expires_next;
}
-static inline
-struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
- int pinned)
+static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, int pinned)
{
+ if (!hrtimer_base_is_online(base)) {
+ int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+
+ return &per_cpu(hrtimer_bases, cpu);
+ }
+
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
if (static_branch_likely(&timers_migration_enabled) && !pinned)
return &per_cpu(hrtimer_bases, get_nohz_timer_target());
@@ -249,8 +287,8 @@ again:
raw_spin_unlock(&base->cpu_base->lock);
raw_spin_lock(&new_base->cpu_base->lock);
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
+ this_cpu_base)) {
raw_spin_unlock(&new_base->cpu_base->lock);
raw_spin_lock(&base->cpu_base->lock);
new_cpu_base = this_cpu_base;
@@ -259,8 +297,7 @@ again:
}
WRITE_ONCE(timer->base, new_base);
} else {
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_base)) {
new_cpu_base = this_cpu_base;
goto again;
}
@@ -706,8 +743,6 @@ static inline int hrtimer_is_hres_enabled(void)
return hrtimer_hres_enabled;
}
-static void retrigger_next_event(void *arg);
-
/*
* Switch to high resolution mode
*/
@@ -1195,6 +1230,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
u64 delta_ns, const enum hrtimer_mode mode,
struct hrtimer_clock_base *base)
{
+ struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
struct hrtimer_clock_base *new_base;
bool force_local, first;
@@ -1206,10 +1242,16 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
* and enforce reprogramming after it is queued no matter whether
* it is the new first expiring timer again or not.
*/
- force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+ force_local = base->cpu_base == this_cpu_base;
force_local &= base->cpu_base->next_timer == timer;
/*
+ * Don't force local queuing if this enqueue happens on an unplugged
+ * CPU after hrtimer_cpu_dying() has been invoked.
+ */
+ force_local &= this_cpu_base->online;
+
+ /*
* Remove an active timer from the queue. In case it is not queued
* on the current CPU, make sure that remove_hrtimer() updates the
* remote data correctly.
@@ -1238,8 +1280,27 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
}
first = enqueue_hrtimer(timer, new_base, mode);
- if (!force_local)
- return first;
+ if (!force_local) {
+ /*
+ * If the current CPU base is online, then the timer is
+ * never queued on a remote CPU if it would be the first
+ * expiring timer there.
+ */
+ if (hrtimer_base_is_online(this_cpu_base))
+ return first;
+
+ /*
+ * Timer was enqueued remotely because the current base is
+ * already offline. If the timer is the first to expire,
+ * kick the remote CPU to reprogram the clock event.
+ */
+ if (first) {
+ struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
+
+ smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
+ }
+ return 0;
+ }
/*
* Timer was forced to stay on the current CPU to avoid
From: Zhihao Cheng <chengzhihao1(a)huawei.com>
This is a backport of upstream commit
aec8e6bf839101784f3ef037dcdb9432c3f32343 ("btrfs: fix use-after-free of
block device file in __btrfs_free_extra_devids()").
Mounting btrfs from two images (which share the same fsid but have two
different dev_uuids) in a certain order may trigger a UAF on the
variable 'device->bdev_file' in __btrfs_free_extra_devids(). The
details follow:
1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs
devices by ioctl(BTRFS_IOC_SCAN_DEV):
/ btrfs_device_1 → loop0
fs_device
\ btrfs_device_2 → loop1
2. mount /dev/loop0 /mnt
btrfs_open_devices
btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0)
btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1)
btrfs_fill_super
open_ctree
fail: btrfs_close_devices // -ENOMEM
btrfs_close_bdev(btrfs_device_1)
fput(btrfs_device_1->bdev_file)
// btrfs_device_1->bdev_file is freed
btrfs_close_bdev(btrfs_device_2)
fput(btrfs_device_2->bdev_file)
3. mount /dev/loop1 /mnt
btrfs_open_devices
btrfs_get_bdev_and_sb(&bdev_file)
// EIO, btrfs_device_1->bdev_file is not assigned,
// which points to a freed memory area
btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1)
btrfs_fill_super
open_ctree
btrfs_free_extra_devids
if (btrfs_device_1->bdev_file)
fput(btrfs_device_1->bdev_file) // UAF !
Fix it by setting 'device->bdev_file' to NULL after closing the
btrfs_device in btrfs_close_one_device().
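The underlying rule is the usual one for reference-dropping paths:
clear the pointer as soon as the reference is put, so that any later
conditional release sees NULL instead of a stale pointer. A reduced,
hypothetical illustration (dev_slot is not a real btrfs structure):

#include <linux/file.h>
#include <linux/fs.h>

struct dev_slot {
	struct file *bdev_file;
};

static void close_slot(struct dev_slot *d)
{
	if (d->bdev_file) {
		fput(d->bdev_file);
		d->bdev_file = NULL;	/* a later "if (d->bdev_file)" now sees NULL */
	}
}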
Fixes: CVE-2024-50217
Fixes: 142388194191 ("btrfs: do not background blkdev_put()")
CC: stable(a)vger.kernel.org # 4.19+
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
(cherry picked from commit aec8e6bf839101784f3ef037dcdb9432c3f32343)
Signed-off-by: Shubham Pushpkar <spushpka(a)cisco.com>
---
fs/btrfs/volumes.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b9a0b26d08e1..ab2412542ce5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1176,6 +1176,7 @@ static void btrfs_close_one_device(struct btrfs_device *device)
if (device->bdev) {
fs_devices->open_devices--;
device->bdev = NULL;
+ device->bdev_file = NULL;
}
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
btrfs_destroy_dev_zone_info(device);
--
2.35.6
This reverts commit 27ab05e1b7e5c5ec9b4f658e1b2464c0908298a6.
I tried to upgrade a RasPi 3B+ with a Waveshare 7inch HDMI LCD
from 6.1.y to 6.6.y, but found that the display is broken, with
this log message:
[ 17.776315] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_drm_unregister [vc4])
[ 17.784034] platform 3f806000.vec: deferred probe pending
Some tests revealed that while 6.1.y works, 6.2-rc1 is already broken,
as are all newer kernels. A bisect led me to this patch.
I could repair several versions up to 6.13-rc7 by applying this
revert. Newer kernels additionally have to take care of
commit f702475b839c ("ARM: dts: bcm2835-rpi: Move duplicate firmware-clocks to bcm2835-rpi.dtsi"),
but that is straightforward.
Fixes: 27ab05e1b7e5 ("ARM: dts: bcm2835-rpi: Use firmware clocks for display")
Signed-off-by: H. Nikolaus Schaller <hns(a)goldelico.com>
---
arch/arm/boot/dts/bcm2835-rpi-common.dtsi | 17 -----------------
1 file changed, 17 deletions(-)
diff --git a/arch/arm/boot/dts/bcm2835-rpi-common.dtsi b/arch/arm/boot/dts/bcm2835-rpi-common.dtsi
index 4e7b4a592da7c..8a55b6cded592 100644
--- a/arch/arm/boot/dts/bcm2835-rpi-common.dtsi
+++ b/arch/arm/boot/dts/bcm2835-rpi-common.dtsi
@@ -7,23 +7,6 @@
#include <dt-bindings/power/raspberrypi-power.h>
-&firmware {
- firmware_clocks: clocks {
- compatible = "raspberrypi,firmware-clocks";
- #clock-cells = <1>;
- };
-};
-
-&hdmi {
- clocks = <&firmware_clocks 9>,
- <&firmware_clocks 13>;
- clock-names = "pixel", "hdmi";
-};
-
&v3d {
power-domains = <&power RPI_POWER_DOMAIN_V3D>;
};
-
-&vec {
- clocks = <&firmware_clocks 15>;
-};
--
2.47.0