The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9228b26194d1cc00449f12f306f53ef2e234a55b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040355-harsh-endowment-6808@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
9228b26194d1 ("KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value")
0ab410a93d62 ("KVM: arm64: Narrow PMU sysreg reset values to architectural requirements")
11663111cd49 ("KVM: arm64: Hide PMU registers from userspace when not available")
7ccadf23b861 ("KVM: arm64: Add missing reset handlers for PMU emulation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9228b26194d1cc00449f12f306f53ef2e234a55b Mon Sep 17 00:00:00 2001
From: Reiji Watanabe <reijiw(a)google.com>
Date: Sun, 12 Mar 2023 20:32:08 -0700
Subject: [PATCH] KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the
current value
Have KVM_GET_ONE_REG for vPMU counter (vPMC) registers (PMCCNTR_EL0
and PMEVCNTR<n>_EL0) return the sum of the register value in the sysreg
file and the current perf event counter value.
Values of vPMC registers are saved in sysreg files on certain occasions.
These saved values don't represent the current values of the vPMC
registers if the perf events for the vPMCs count events after the save.
The current values of those registers are the sum of the sysreg file
value and the current perf event counter value. But, when userspace
reads those registers (using KVM_GET_ONE_REG), KVM returns the sysreg
file value to userspace (not the sum value).
Fix this to return the sum value for KVM_GET_ONE_REG.
Fixes: 051ff581ce70 ("arm64: KVM: Add access handler for event counter register")
Cc: stable(a)vger.kernel.org
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Reiji Watanabe <reijiw(a)google.com>
Link: https://lore.kernel.org/r/20230313033208.1475499-1-reijiw@google.com
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 53749d3a0996..1b2c161120be 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -856,6 +856,22 @@ static bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx)
return true;
}
+static int get_pmu_evcntr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 *val)
+{
+ u64 idx;
+
+ if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 0)
+ /* PMCCNTR_EL0 */
+ idx = ARMV8_PMU_CYCLE_IDX;
+ else
+ /* PMEVCNTRn_EL0 */
+ idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
+
+ *val = kvm_pmu_get_counter_value(vcpu, idx);
+ return 0;
+}
+
static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -1072,7 +1088,7 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
/* Macro to expand the PMEVCNTRn_EL0 register */
#define PMU_PMEVCNTR_EL0(n) \
{ PMU_SYS_REG(SYS_PMEVCNTRn_EL0(n)), \
- .reset = reset_pmevcntr, \
+ .reset = reset_pmevcntr, .get_user = get_pmu_evcntr, \
.access = access_pmu_evcntr, .reg = (PMEVCNTR0_EL0 + n), }
/* Macro to expand the PMEVTYPERn_EL0 register */
@@ -1982,7 +1998,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ PMU_SYS_REG(SYS_PMCEID1_EL0),
.access = access_pmceid, .reset = NULL },
{ PMU_SYS_REG(SYS_PMCCNTR_EL0),
- .access = access_pmu_evcntr, .reset = reset_unknown, .reg = PMCCNTR_EL0 },
+ .access = access_pmu_evcntr, .reset = reset_unknown,
+ .reg = PMCCNTR_EL0, .get_user = get_pmu_evcntr},
{ PMU_SYS_REG(SYS_PMXEVTYPER_EL0),
.access = access_pmu_evtyper, .reset = NULL },
{ PMU_SYS_REG(SYS_PMXEVCNTR_EL0),
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9228b26194d1cc00449f12f306f53ef2e234a55b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040354-armful-augmented-752b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
9228b26194d1 ("KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value")
0ab410a93d62 ("KVM: arm64: Narrow PMU sysreg reset values to architectural requirements")
11663111cd49 ("KVM: arm64: Hide PMU registers from userspace when not available")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9228b26194d1cc00449f12f306f53ef2e234a55b Mon Sep 17 00:00:00 2001
From: Reiji Watanabe <reijiw(a)google.com>
Date: Sun, 12 Mar 2023 20:32:08 -0700
Subject: [PATCH] KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the
current value
Have KVM_GET_ONE_REG for vPMU counter (vPMC) registers (PMCCNTR_EL0
and PMEVCNTR<n>_EL0) return the sum of the register value in the sysreg
file and the current perf event counter value.
Values of vPMC registers are saved in sysreg files on certain occasions.
These saved values don't represent the current values of the vPMC
registers if the perf events for the vPMCs count events after the save.
The current values of those registers are the sum of the sysreg file
value and the current perf event counter value. But, when userspace
reads those registers (using KVM_GET_ONE_REG), KVM returns the sysreg
file value to userspace (not the sum value).
Fix this to return the sum value for KVM_GET_ONE_REG.
Fixes: 051ff581ce70 ("arm64: KVM: Add access handler for event counter register")
Cc: stable(a)vger.kernel.org
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Reiji Watanabe <reijiw(a)google.com>
Link: https://lore.kernel.org/r/20230313033208.1475499-1-reijiw@google.com
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 53749d3a0996..1b2c161120be 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -856,6 +856,22 @@ static bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx)
return true;
}
+static int get_pmu_evcntr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 *val)
+{
+ u64 idx;
+
+ if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 0)
+ /* PMCCNTR_EL0 */
+ idx = ARMV8_PMU_CYCLE_IDX;
+ else
+ /* PMEVCNTRn_EL0 */
+ idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
+
+ *val = kvm_pmu_get_counter_value(vcpu, idx);
+ return 0;
+}
+
static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -1072,7 +1088,7 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
/* Macro to expand the PMEVCNTRn_EL0 register */
#define PMU_PMEVCNTR_EL0(n) \
{ PMU_SYS_REG(SYS_PMEVCNTRn_EL0(n)), \
- .reset = reset_pmevcntr, \
+ .reset = reset_pmevcntr, .get_user = get_pmu_evcntr, \
.access = access_pmu_evcntr, .reg = (PMEVCNTR0_EL0 + n), }
/* Macro to expand the PMEVTYPERn_EL0 register */
@@ -1982,7 +1998,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ PMU_SYS_REG(SYS_PMCEID1_EL0),
.access = access_pmceid, .reset = NULL },
{ PMU_SYS_REG(SYS_PMCCNTR_EL0),
- .access = access_pmu_evcntr, .reset = reset_unknown, .reg = PMCCNTR_EL0 },
+ .access = access_pmu_evcntr, .reset = reset_unknown,
+ .reg = PMCCNTR_EL0, .get_user = get_pmu_evcntr},
{ PMU_SYS_REG(SYS_PMXEVTYPER_EL0),
.access = access_pmu_evtyper, .reset = NULL },
{ PMU_SYS_REG(SYS_PMXEVCNTR_EL0),
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 6165a16a5ad9b237bb3131cff4d3c601ccb8f9a3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040341-excitable-jaws-269d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
6165a16a5ad9 ("NFSv4: Fix hangs when recovering open state after a server reboot")
e3c8dc761ead ("NFSv4: Check the return value of update_open_stateid()")
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
c9399f21c215 ("NFSv4: Fix OPEN / CLOSE race")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6165a16a5ad9b237bb3131cff4d3c601ccb8f9a3 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Tue, 21 Mar 2023 00:17:36 -0400
Subject: [PATCH] NFSv4: Fix hangs when recovering open state after a server
reboot
When we're using a cached open stateid or a delegation in order to avoid
sending a CLAIM_PREVIOUS open RPC call to the server, we don't have a
new open stateid to present to update_open_stateid().
Instead rely on nfs4_try_open_cached(), just as if we were doing a
normal open.
Fixes: d2bfda2e7aa0 ("NFSv4: don't reprocess cached open CLAIM_PREVIOUS")
Cc: stable(a)vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 22a93ae46cd7..5607b1e2b821 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1980,8 +1980,7 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (!data->rpc_done) {
if (data->rpc_status)
return ERR_PTR(data->rpc_status);
- /* cached opens have already been processed */
- goto update;
+ return nfs4_try_open_cached(data);
}
ret = nfs_refresh_inode(inode, &data->f_attr);
@@ -1990,7 +1989,7 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
-update:
+
if (!update_open_stateid(state, &data->o_res.stateid,
NULL, data->o_arg.fmode))
return ERR_PTR(-EAGAIN);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6165a16a5ad9b237bb3131cff4d3c601ccb8f9a3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040340-bouncing-sampling-7639@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
6165a16a5ad9 ("NFSv4: Fix hangs when recovering open state after a server reboot")
e3c8dc761ead ("NFSv4: Check the return value of update_open_stateid()")
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6165a16a5ad9b237bb3131cff4d3c601ccb8f9a3 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Tue, 21 Mar 2023 00:17:36 -0400
Subject: [PATCH] NFSv4: Fix hangs when recovering open state after a server
reboot
When we're using a cached open stateid or a delegation in order to avoid
sending a CLAIM_PREVIOUS open RPC call to the server, we don't have a
new open stateid to present to update_open_stateid().
Instead rely on nfs4_try_open_cached(), just as if we were doing a
normal open.
Fixes: d2bfda2e7aa0 ("NFSv4: don't reprocess cached open CLAIM_PREVIOUS")
Cc: stable(a)vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 22a93ae46cd7..5607b1e2b821 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1980,8 +1980,7 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (!data->rpc_done) {
if (data->rpc_status)
return ERR_PTR(data->rpc_status);
- /* cached opens have already been processed */
- goto update;
+ return nfs4_try_open_cached(data);
}
ret = nfs_refresh_inode(inode, &data->f_attr);
@@ -1990,7 +1989,7 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
-update:
+
if (!update_open_stateid(state, &data->o_res.stateid,
NULL, data->o_arg.fmode))
return ERR_PTR(-EAGAIN);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x fd7276189450110ed835eb0a334e62d2f1c4e3be
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040306-emit-earthling-ac1b@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
fd7276189450 ("powerpc: Don't try to copy PPR for task with NULL pt_regs")
47e12855a91d ("powerpc: switch to ->regset_get()")
6e0b79750ce2 ("powerpc/ptrace: move register viewing functions out of ptrace.c")
7c1f8db019f8 ("powerpc/ptrace: split out TRANSACTIONAL_MEM related functions.")
60ef9dbd9d2a ("powerpc/ptrace: split out SPE related functions.")
1b20773b00b7 ("powerpc/ptrace: split out ALTIVEC related functions.")
7b99ed4e8e3a ("powerpc/ptrace: split out VSX related functions.")
963ae6b2ff1c ("powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET")
f1763e623c69 ("powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64")
b3138536c837 ("powerpc/ptrace: remove unused header includes")
da9a1c10e2c7 ("powerpc: Move ptrace into a subdirectory.")
68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
965dd3ad3076 ("powerpc/64/syscall: Remove non-volatile GPR save optimisation")
fddb5d430ad9 ("open: introduce openat2(2) syscall")
793b08e2efff ("powerpc/kexec: Move kexec files into a dedicated subdir.")
9f7bd9201521 ("powerpc/32: Split kexec low level code out of misc_32.S")
b020aa9d1e87 ("powerpc: cleanup hw_irq.h")
0671c5b84e9e ("MIPS: Wire up clone3 syscall")
45824fc0da6e ("Merge tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd7276189450110ed835eb0a334e62d2f1c4e3be Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Sun, 26 Mar 2023 16:15:57 -0600
Subject: [PATCH] powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:
Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
BUG: Kernel NULL pointer dereference on read at 0x00000160
Faulting instruction address: 0xc0000000000c3a60
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
REGS: c0000000041833b0 TRAP: 0300 Not tainted (6.3.0-rc2+)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082828 XER: 200400f8
...
NIP memcpy_power7+0x200/0x7d0
LR ppr_get+0x64/0xb0
Call Trace:
ppr_get+0x40/0xb0 (unreliable)
__regset_get+0x180/0x1f0
regset_get_alloc+0x64/0x90
elf_core_dump+0xb98/0x1b60
do_coredump+0x1c34/0x24a0
get_signal+0x71c/0x1410
do_notify_resume+0x140/0x6f0
interrupt_exit_user_prepare_main+0x29c/0x320
interrupt_exit_user_prepare+0x6c/0xa0
interrupt_return_srr_user+0x8/0x138
Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.
Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.
Fixes: fa439810cc1b ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable(a)vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c
index 2087a785f05f..5fff0d04b23f 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -290,6 +290,9 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
static int ppr_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
+ if (!target->thread.regs)
+ return -EINVAL;
+
return membuf_write(&to, &target->thread.regs->ppr, sizeof(u64));
}
@@ -297,6 +300,9 @@ static int ppr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count, const void *kbuf,
const void __user *ubuf)
{
+ if (!target->thread.regs)
+ return -EINVAL;
+
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.regs->ppr, 0, sizeof(u64));
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x fd7276189450110ed835eb0a334e62d2f1c4e3be
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040304-splashed-consult-3a39@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
fd7276189450 ("powerpc: Don't try to copy PPR for task with NULL pt_regs")
47e12855a91d ("powerpc: switch to ->regset_get()")
6e0b79750ce2 ("powerpc/ptrace: move register viewing functions out of ptrace.c")
7c1f8db019f8 ("powerpc/ptrace: split out TRANSACTIONAL_MEM related functions.")
60ef9dbd9d2a ("powerpc/ptrace: split out SPE related functions.")
1b20773b00b7 ("powerpc/ptrace: split out ALTIVEC related functions.")
7b99ed4e8e3a ("powerpc/ptrace: split out VSX related functions.")
963ae6b2ff1c ("powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET")
f1763e623c69 ("powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64")
b3138536c837 ("powerpc/ptrace: remove unused header includes")
da9a1c10e2c7 ("powerpc: Move ptrace into a subdirectory.")
68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
965dd3ad3076 ("powerpc/64/syscall: Remove non-volatile GPR save optimisation")
fddb5d430ad9 ("open: introduce openat2(2) syscall")
793b08e2efff ("powerpc/kexec: Move kexec files into a dedicated subdir.")
9f7bd9201521 ("powerpc/32: Split kexec low level code out of misc_32.S")
b020aa9d1e87 ("powerpc: cleanup hw_irq.h")
0671c5b84e9e ("MIPS: Wire up clone3 syscall")
45824fc0da6e ("Merge tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd7276189450110ed835eb0a334e62d2f1c4e3be Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Sun, 26 Mar 2023 16:15:57 -0600
Subject: [PATCH] powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:
Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
BUG: Kernel NULL pointer dereference on read at 0x00000160
Faulting instruction address: 0xc0000000000c3a60
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
REGS: c0000000041833b0 TRAP: 0300 Not tainted (6.3.0-rc2+)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082828 XER: 200400f8
...
NIP memcpy_power7+0x200/0x7d0
LR ppr_get+0x64/0xb0
Call Trace:
ppr_get+0x40/0xb0 (unreliable)
__regset_get+0x180/0x1f0
regset_get_alloc+0x64/0x90
elf_core_dump+0xb98/0x1b60
do_coredump+0x1c34/0x24a0
get_signal+0x71c/0x1410
do_notify_resume+0x140/0x6f0
interrupt_exit_user_prepare_main+0x29c/0x320
interrupt_exit_user_prepare+0x6c/0xa0
interrupt_return_srr_user+0x8/0x138
Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.
Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.
Fixes: fa439810cc1b ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable(a)vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c
index 2087a785f05f..5fff0d04b23f 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -290,6 +290,9 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
static int ppr_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
+ if (!target->thread.regs)
+ return -EINVAL;
+
return membuf_write(&to, &target->thread.regs->ppr, sizeof(u64));
}
@@ -297,6 +300,9 @@ static int ppr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count, const void *kbuf,
const void __user *ubuf)
{
+ if (!target->thread.regs)
+ return -EINVAL;
+
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.regs->ppr, 0, sizeof(u64));
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x b26cd9325be4c1fcd331b77f10acb627c560d4d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040315-prodigal-chief-ace2@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
b26cd9325be4 ("pinctrl: amd: Disable and mask interrupts on resume")
4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe")
e81376ebbafc ("pinctrl: amd: Use irqchip template")
279ffafaf39d ("pinctrl: Added IRQF_SHARED flag for amd-pinctrl driver")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b26cd9325be4c1fcd331b77f10acb627c560d4d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kornel=20Dul=C4=99ba?= <korneld(a)chromium.org>
Date: Mon, 20 Mar 2023 09:32:59 +0000
Subject: [PATCH] pinctrl: amd: Disable and mask interrupts on resume
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This fixes a similar problem to the one observed in:
commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe").
On some systems, during suspend/resume cycle firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
This confuses the AMD pinctrl driver and causes spurious interrupts.
The driver already has logic to detect if a pin is used by the kernel.
Leverage it to re-initialize interrupt fields of a pin only if it's not
used by us.
Cc: stable(a)vger.kernel.org
Fixes: dbad75dd1f25 ("pinctrl: add AMD GPIO driver support.")
Signed-off-by: Kornel Dulęba <korneld(a)chromium.org>
Link: https://lore.kernel.org/r/20230320093259.845178-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 9236a132c7ba..609821b756c2 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -872,32 +872,34 @@ static const struct pinconf_ops amd_pinconf_ops = {
.pin_config_group_set = amd_pinconf_group_set,
};
-static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
+static void amd_gpio_irq_init_pin(struct amd_gpio *gpio_dev, int pin)
{
- struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
+ const struct pin_desc *pd;
unsigned long flags;
u32 pin_reg, mask;
- int i;
mask = BIT(WAKE_CNTRL_OFF_S0I3) | BIT(WAKE_CNTRL_OFF_S3) |
BIT(INTERRUPT_MASK_OFF) | BIT(INTERRUPT_ENABLE_OFF) |
BIT(WAKE_CNTRL_OFF_S4);
- for (i = 0; i < desc->npins; i++) {
- int pin = desc->pins[i].number;
- const struct pin_desc *pd = pin_desc_get(gpio_dev->pctrl, pin);
-
- if (!pd)
- continue;
+ pd = pin_desc_get(gpio_dev->pctrl, pin);
+ if (!pd)
+ return;
- raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+ raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+ pin_reg = readl(gpio_dev->base + pin * 4);
+ pin_reg &= ~mask;
+ writel(pin_reg, gpio_dev->base + pin * 4);
+ raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
+}
- pin_reg = readl(gpio_dev->base + i * 4);
- pin_reg &= ~mask;
- writel(pin_reg, gpio_dev->base + i * 4);
+static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
+{
+ struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
+ int i;
- raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
- }
+ for (i = 0; i < desc->npins; i++)
+ amd_gpio_irq_init_pin(gpio_dev, i);
}
#ifdef CONFIG_PM_SLEEP
@@ -950,8 +952,10 @@ static int amd_gpio_resume(struct device *dev)
for (i = 0; i < desc->npins; i++) {
int pin = desc->pins[i].number;
- if (!amd_gpio_should_save(gpio_dev, pin))
+ if (!amd_gpio_should_save(gpio_dev, pin)) {
+ amd_gpio_irq_init_pin(gpio_dev, pin);
continue;
+ }
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
gpio_dev->saved_regs[i] |= readl(gpio_dev->base + pin * 4) & PIN_IRQ_PENDING;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x b26cd9325be4c1fcd331b77f10acb627c560d4d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040314-starring-unwanted-c4c0@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
b26cd9325be4 ("pinctrl: amd: Disable and mask interrupts on resume")
4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe")
e81376ebbafc ("pinctrl: amd: Use irqchip template")
279ffafaf39d ("pinctrl: Added IRQF_SHARED flag for amd-pinctrl driver")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b26cd9325be4c1fcd331b77f10acb627c560d4d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kornel=20Dul=C4=99ba?= <korneld(a)chromium.org>
Date: Mon, 20 Mar 2023 09:32:59 +0000
Subject: [PATCH] pinctrl: amd: Disable and mask interrupts on resume
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This fixes a similar problem to the one observed in:
commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe").
On some systems, during suspend/resume cycle firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
This confuses the AMD pinctrl driver and causes spurious interrupts.
The driver already has logic to detect if a pin is used by the kernel.
Leverage it to re-initialize interrupt fields of a pin only if it's not
used by us.
Cc: stable(a)vger.kernel.org
Fixes: dbad75dd1f25 ("pinctrl: add AMD GPIO driver support.")
Signed-off-by: Kornel Dulęba <korneld(a)chromium.org>
Link: https://lore.kernel.org/r/20230320093259.845178-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 9236a132c7ba..609821b756c2 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -872,32 +872,34 @@ static const struct pinconf_ops amd_pinconf_ops = {
.pin_config_group_set = amd_pinconf_group_set,
};
-static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
+static void amd_gpio_irq_init_pin(struct amd_gpio *gpio_dev, int pin)
{
- struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
+ const struct pin_desc *pd;
unsigned long flags;
u32 pin_reg, mask;
- int i;
mask = BIT(WAKE_CNTRL_OFF_S0I3) | BIT(WAKE_CNTRL_OFF_S3) |
BIT(INTERRUPT_MASK_OFF) | BIT(INTERRUPT_ENABLE_OFF) |
BIT(WAKE_CNTRL_OFF_S4);
- for (i = 0; i < desc->npins; i++) {
- int pin = desc->pins[i].number;
- const struct pin_desc *pd = pin_desc_get(gpio_dev->pctrl, pin);
-
- if (!pd)
- continue;
+ pd = pin_desc_get(gpio_dev->pctrl, pin);
+ if (!pd)
+ return;
- raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+ raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+ pin_reg = readl(gpio_dev->base + pin * 4);
+ pin_reg &= ~mask;
+ writel(pin_reg, gpio_dev->base + pin * 4);
+ raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
+}
- pin_reg = readl(gpio_dev->base + i * 4);
- pin_reg &= ~mask;
- writel(pin_reg, gpio_dev->base + i * 4);
+static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
+{
+ struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
+ int i;
- raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
- }
+ for (i = 0; i < desc->npins; i++)
+ amd_gpio_irq_init_pin(gpio_dev, i);
}
#ifdef CONFIG_PM_SLEEP
@@ -950,8 +952,10 @@ static int amd_gpio_resume(struct device *dev)
for (i = 0; i < desc->npins; i++) {
int pin = desc->pins[i].number;
- if (!amd_gpio_should_save(gpio_dev, pin))
+ if (!amd_gpio_should_save(gpio_dev, pin)) {
+ amd_gpio_irq_init_pin(gpio_dev, pin);
continue;
+ }
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
gpio_dev->saved_regs[i] |= readl(gpio_dev->base + pin * 4) & PIN_IRQ_PENDING;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x b26cd9325be4c1fcd331b77f10acb627c560d4d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040312-paramedic-spectacle-0fcf@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
b26cd9325be4 ("pinctrl: amd: Disable and mask interrupts on resume")
4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe")
e81376ebbafc ("pinctrl: amd: Use irqchip template")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b26cd9325be4c1fcd331b77f10acb627c560d4d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kornel=20Dul=C4=99ba?= <korneld(a)chromium.org>
Date: Mon, 20 Mar 2023 09:32:59 +0000
Subject: [PATCH] pinctrl: amd: Disable and mask interrupts on resume
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This fixes a similar problem to the one observed in:
commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe").
On some systems, during suspend/resume cycle firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
This confuses the AMD pinctrl driver and causes spurious interrupts.
The driver already has logic to detect if a pin is used by the kernel.
Leverage it to re-initialize interrupt fields of a pin only if it's not
used by us.
Cc: stable(a)vger.kernel.org
Fixes: dbad75dd1f25 ("pinctrl: add AMD GPIO driver support.")
Signed-off-by: Kornel Dulęba <korneld(a)chromium.org>
Link: https://lore.kernel.org/r/20230320093259.845178-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 9236a132c7ba..609821b756c2 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -872,32 +872,34 @@ static const struct pinconf_ops amd_pinconf_ops = {
.pin_config_group_set = amd_pinconf_group_set,
};
-static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
+static void amd_gpio_irq_init_pin(struct amd_gpio *gpio_dev, int pin)
{
- struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
+ const struct pin_desc *pd;
unsigned long flags;
u32 pin_reg, mask;
- int i;
mask = BIT(WAKE_CNTRL_OFF_S0I3) | BIT(WAKE_CNTRL_OFF_S3) |
BIT(INTERRUPT_MASK_OFF) | BIT(INTERRUPT_ENABLE_OFF) |
BIT(WAKE_CNTRL_OFF_S4);
- for (i = 0; i < desc->npins; i++) {
- int pin = desc->pins[i].number;
- const struct pin_desc *pd = pin_desc_get(gpio_dev->pctrl, pin);
-
- if (!pd)
- continue;
+ pd = pin_desc_get(gpio_dev->pctrl, pin);
+ if (!pd)
+ return;
- raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+ raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+ pin_reg = readl(gpio_dev->base + pin * 4);
+ pin_reg &= ~mask;
+ writel(pin_reg, gpio_dev->base + pin * 4);
+ raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
+}
- pin_reg = readl(gpio_dev->base + i * 4);
- pin_reg &= ~mask;
- writel(pin_reg, gpio_dev->base + i * 4);
+static void amd_gpio_irq_init(struct amd_gpio *gpio_dev)
+{
+ struct pinctrl_desc *desc = gpio_dev->pctrl->desc;
+ int i;
- raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
- }
+ for (i = 0; i < desc->npins; i++)
+ amd_gpio_irq_init_pin(gpio_dev, i);
}
#ifdef CONFIG_PM_SLEEP
@@ -950,8 +952,10 @@ static int amd_gpio_resume(struct device *dev)
for (i = 0; i < desc->npins; i++) {
int pin = desc->pins[i].number;
- if (!amd_gpio_should_save(gpio_dev, pin))
+ if (!amd_gpio_should_save(gpio_dev, pin)) {
+ amd_gpio_irq_init_pin(gpio_dev, pin);
continue;
+ }
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
gpio_dev->saved_regs[i] |= readl(gpio_dev->base + pin * 4) & PIN_IRQ_PENDING;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c1976bd8f23016d8706973908f2bb0ac0d852a8f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040358-blade-preheated-5a2d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
c1976bd8f230 ("zonefs: Always invalidate last cached page on append write")
4008e2a0b01a ("zonefs: Reorganize code")
a608da3bd730 ("zonefs: Detect append writes at invalid locations")
8745889a7fd0 ("Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c1976bd8f23016d8706973908f2bb0ac0d852a8f Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Wed, 29 Mar 2023 13:16:01 +0900
Subject: [PATCH] zonefs: Always invalidate last cached page on append write
When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode which might have been cached
already by buffered reads, page faults with mmap-read or non-direct
readahead. To ensure that the on-disk and cached data is consistant for
such last cached page, make sure to always invalidate it in
zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to
userspace to differentiate from IO errors.
This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).
Reported-by: Hans Holmberg <Hans.Holmberg(a)wdc.com>
Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 617e4f9db42e..c6ab2732955e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -382,6 +382,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
struct zonefs_zone *z = zonefs_inode_zone(inode);
struct block_device *bdev = inode->i_sb->s_bdev;
unsigned int max = bdev_max_zone_append_sectors(bdev);
+ pgoff_t start, end;
struct bio *bio;
ssize_t size = 0;
int nr_pages;
@@ -390,6 +391,19 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
iov_iter_truncate(from, max);
+ /*
+ * If the inode block size (zone write granularity) is smaller than the
+ * page size, we may be appending data belonging to the last page of the
+ * inode straddling inode->i_size, with that page already cached due to
+ * a buffered read or readahead. So make sure to invalidate that page.
+ * This will always be a no-op for the case where the block size is
+ * equal to the page size.
+ */
+ start = iocb->ki_pos >> PAGE_SHIFT;
+ end = (iocb->ki_pos + iov_iter_count(from) - 1) >> PAGE_SHIFT;
+ if (invalidate_inode_pages2_range(inode->i_mapping, start, end))
+ return -EBUSY;
+
nr_pages = iov_iter_npages(from, BIO_MAX_VECS);
if (!nr_pages)
return 0;
Hi, Thorsten here, the Linux kernel's regression tracker.
I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by mail.
Note, you have to use bugzilla to reach the reporter, as I sadly[1] can
not CCed them in mails like this.
Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=217282 :
> Carsten Hatger 2023-03-31 14:49:06 UTC
>
> Created attachment 304068 [details]
> dmesg of 6.1.22 vanilla boot
>
> Dear all,
>
> ath11k hangs when booting v6.1.22 on a HP Pro x360 G9 w/ Qualcomm FastConnect 6900 using firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23.
>
> However, v.6.1.21 works fine.
>
> Please let me know if I can further assist in resolving this issue - testing would be fine.
>
> Yours,
> Carsten
See the ticket for more details.
[TLDR for the rest of this mail: I'm adding this report to the list of
tracked Linux kernel regressions; the text you find below is based on a
few templates paragraphs you might have encountered already in similar
form.]
BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it's doesn't fall through the cracks:
#regzbot introduced: v6.1.21..v6.1.22
https://bugzilla.kernel.org/show_bug.cgi?id=217282
#regzbot title: net: wireless: ath11k: hang on boot
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (e.g. the buzgzilla ticket and maybe this mail as well, if
this thread sees some discussion). See page linked in footer for details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 005308f7bdacf5685ed1a431244a183dbbb9e0e8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040353-preview-jackknife-52f7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
005308f7bdac ("io_uring/poll: clear single/double poll flags on poll arming")
5204aa8c43bd ("io_uring: add a helper for apoll alloc")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
a9c210cebe13 ("io_uring: move epoll handler to its own file")
4cf90495281b ("io_uring: add a dummy -EOPNOTSUPP prep handler")
99f15d8d6136 ("io_uring: move uring_cmd handling to its own file")
cd40cae29ef8 ("io_uring: split out open/close operations")
453b329be5ea ("io_uring: separate out file table handling code")
f4c163dd7d4b ("io_uring: split out fadvise/madvise operations")
0d5847274037 ("io_uring: split out fs related sync/fallocate functions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 005308f7bdacf5685ed1a431244a183dbbb9e0e8 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 27 Mar 2023 19:56:18 -0600
Subject: [PATCH] io_uring/poll: clear single/double poll flags on poll arming
Unless we have at least one entry queued, then don't call into
io_poll_remove_entries(). Normally this isn't possible, but if we
retry poll then we can have ->nr_entries cleared again as we're
setting it up. If this happens for a poll retry, then we'll still have
at least REQ_F_SINGLE_POLL set. io_poll_remove_entries() then thinks
it has entries to remove.
Clear REQ_F_SINGLE_POLL and REQ_F_DOUBLE_POLL unconditionally when
arming a poll request.
Fixes: c16bda37594f ("io_uring/poll: allow some retries for poll triggering spuriously")
Cc: stable(a)vger.kernel.org
Reported-by: Pengfei Xu <pengfei.xu(a)intel.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 795facbd0e9f..55306e801081 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -726,6 +726,7 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
apoll = io_req_alloc_apoll(req, issue_flags);
if (!apoll)
return IO_APOLL_ABORTED;
+ req->flags &= ~(REQ_F_SINGLE_POLL | REQ_F_DOUBLE_POLL);
req->flags |= REQ_F_POLLED;
ipt.pt._qproc = io_async_queue_proc;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 77af13ba3c7f91d91c377c7e2d122849bbc17128
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040327-freehand-water-9a7b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
77af13ba3c7f ("zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space")
aa7f243f32e1 ("zonefs: Separate zone information from inode information")
34422914dc00 ("zonefs: Reduce struct zonefs_inode_info size")
46a9c526eef7 ("zonefs: Simplify IO error handling")
4008e2a0b01a ("zonefs: Reorganize code")
a608da3bd730 ("zonefs: Detect append writes at invalid locations")
db58653ce0c7 ("zonefs: Fix active zone accounting")
7dd12d65ac64 ("zonefs: fix zone report size in __zonefs_io_error()")
8745889a7fd0 ("Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 77af13ba3c7f91d91c377c7e2d122849bbc17128 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Thu, 30 Mar 2023 09:47:58 +0900
Subject: [PATCH] zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user
space
The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space trhough iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index c6ab2732955e..132f01d3461f 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -581,11 +581,21 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
append = sync;
}
- if (append)
+ if (append) {
ret = zonefs_file_dio_append(iocb, from);
- else
+ } else {
+ /*
+ * iomap_dio_rw() may return ENOTBLK if there was an issue with
+ * page invalidation. Overwrite that error code with EBUSY to
+ * be consistent with zonefs_file_dio_append() return value for
+ * similar issues.
+ */
ret = iomap_dio_rw(iocb, from, &zonefs_write_iomap_ops,
&zonefs_write_dio_ops, 0, NULL, 0);
+ if (ret == -ENOTBLK)
+ ret = -EBUSY;
+ }
+
if (zonefs_zone_is_seq(z) &&
(ret > 0 || ret == -EIOCBQUEUED)) {
if (ret > 0)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 77af13ba3c7f91d91c377c7e2d122849bbc17128
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040326-rind-claw-0bcb@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
77af13ba3c7f ("zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space")
aa7f243f32e1 ("zonefs: Separate zone information from inode information")
34422914dc00 ("zonefs: Reduce struct zonefs_inode_info size")
46a9c526eef7 ("zonefs: Simplify IO error handling")
4008e2a0b01a ("zonefs: Reorganize code")
a608da3bd730 ("zonefs: Detect append writes at invalid locations")
db58653ce0c7 ("zonefs: Fix active zone accounting")
7dd12d65ac64 ("zonefs: fix zone report size in __zonefs_io_error()")
8745889a7fd0 ("Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 77af13ba3c7f91d91c377c7e2d122849bbc17128 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Thu, 30 Mar 2023 09:47:58 +0900
Subject: [PATCH] zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user
space
The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space trhough iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Tested-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index c6ab2732955e..132f01d3461f 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -581,11 +581,21 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
append = sync;
}
- if (append)
+ if (append) {
ret = zonefs_file_dio_append(iocb, from);
- else
+ } else {
+ /*
+ * iomap_dio_rw() may return ENOTBLK if there was an issue with
+ * page invalidation. Overwrite that error code with EBUSY to
+ * be consistent with zonefs_file_dio_append() return value for
+ * similar issues.
+ */
ret = iomap_dio_rw(iocb, from, &zonefs_write_iomap_ops,
&zonefs_write_dio_ops, 0, NULL, 0);
+ if (ret == -ENOTBLK)
+ ret = -EBUSY;
+ }
+
if (zonefs_zone_is_seq(z) &&
(ret > 0 || ret == -EIOCBQUEUED)) {
if (ret > 0)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2280d425ba3599bdd85c41bd0ec8ba568f00c032
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040354-ramrod-papyrus-415b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
2280d425ba35 ("btrfs: ignore fiemap path cache when there are multiple paths for a node")
73e339e6ab74 ("btrfs: cache sharedness of the last few data extents during fiemap")
b629685803bc ("btrfs: remove roots ulist when checking data extent sharedness")
84a7949d4097 ("btrfs: move ulists to data extent sharedness check context")
61dbb952f0a5 ("btrfs: turn the backref sharedness check cache into a context object")
ceb707da9ad9 ("btrfs: directly pass the inode to btrfs_is_data_extent_shared()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2280d425ba3599bdd85c41bd0ec8ba568f00c032 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Tue, 28 Mar 2023 10:45:20 +0100
Subject: [PATCH] btrfs: ignore fiemap path cache when there are multiple paths
for a node
During fiemap, when walking backreferences to determine if a b+tree
node/leaf is shared, we may find a tree block (leaf or node) for which
two parents were added to the references ulist. This happens if we get
for example one direct ref (shared tree block ref) and one indirect ref
(non-shared tree block ref) for the tree block at the current level,
which can happen during relocation.
In that case the fiemap path cache can not be used since it's meant for
a single path, with one tree block at each possible level, so having
multiple references for a tree block at any level may result in getting
the level counter exceed BTRFS_MAX_LEVEL and eventually trigger the
warning:
WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)
at lookup_backref_shared_cache() and at store_backref_shared_cache().
This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL, the
warning is there just to catch any unexpected case like the one described
above. However if a user finds this it may be scary and get reported.
So just ignore the path cache once we find a tree block for which there
are more than one reference, which is the less common case, and update
the cache with the sharedness check result for all levels below the level
for which we found multiple references.
Reported-by: Jarno Pelkonen <jarno.pelkonen(a)gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkT…
Fixes: 12a824dc67a6 ("btrfs: speedup checking for extent sharedness during fiemap")
CC: stable(a)vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 90e40d5ceccd..e54f0884802a 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1921,8 +1921,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
level = -1;
ULIST_ITER_INIT(&uiter);
while (1) {
- bool is_shared;
- bool cached;
+ const unsigned long prev_ref_count = ctx->refs.nnodes;
walk_ctx.bytenr = bytenr;
ret = find_parent_nodes(&walk_ctx, &shared);
@@ -1940,21 +1939,36 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
ret = 0;
/*
- * If our data extent was not directly shared (without multiple
- * reference items), than it might have a single reference item
- * with a count > 1 for the same offset, which means there are 2
- * (or more) file extent items that point to the data extent -
- * this happens when a file extent item needs to be split and
- * then one item gets moved to another leaf due to a b+tree leaf
- * split when inserting some item. In this case the file extent
- * items may be located in different leaves and therefore some
- * of the leaves may be referenced through shared subtrees while
- * others are not. Since our extent buffer cache only works for
- * a single path (by far the most common case and simpler to
- * deal with), we can not use it if we have multiple leaves
- * (which implies multiple paths).
+ * More than one extent buffer (bytenr) may have been added to
+ * the ctx->refs ulist, in which case we have to check multiple
+ * tree paths in case the first one is not shared, so we can not
+ * use the path cache which is made for a single path. Multiple
+ * extent buffers at the current level happen when:
+ *
+ * 1) level -1, the data extent: If our data extent was not
+ * directly shared (without multiple reference items), then
+ * it might have a single reference item with a count > 1 for
+ * the same offset, which means there are 2 (or more) file
+ * extent items that point to the data extent - this happens
+ * when a file extent item needs to be split and then one
+ * item gets moved to another leaf due to a b+tree leaf split
+ * when inserting some item. In this case the file extent
+ * items may be located in different leaves and therefore
+ * some of the leaves may be referenced through shared
+ * subtrees while others are not. Since our extent buffer
+ * cache only works for a single path (by far the most common
+ * case and simpler to deal with), we can not use it if we
+ * have multiple leaves (which implies multiple paths).
+ *
+ * 2) level >= 0, a tree node/leaf: We can have a mix of direct
+ * and indirect references on a b+tree node/leaf, so we have
+ * to check multiple paths, and the extent buffer (the
+ * current bytenr) may be shared or not. One example is
+ * during relocation as we may get a shared tree block ref
+ * (direct ref) and a non-shared tree block ref (indirect
+ * ref) for the same node/leaf.
*/
- if (level == -1 && ctx->refs.nnodes > 1)
+ if ((ctx->refs.nnodes - prev_ref_count) > 1)
ctx->use_path_cache = false;
if (level >= 0)
@@ -1964,18 +1978,45 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
if (!node)
break;
bytenr = node->val;
- level++;
- cached = lookup_backref_shared_cache(ctx, root, bytenr, level,
- &is_shared);
- if (cached) {
- ret = (is_shared ? 1 : 0);
- break;
+ if (ctx->use_path_cache) {
+ bool is_shared;
+ bool cached;
+
+ level++;
+ cached = lookup_backref_shared_cache(ctx, root, bytenr,
+ level, &is_shared);
+ if (cached) {
+ ret = (is_shared ? 1 : 0);
+ break;
+ }
}
shared.share_count = 0;
shared.have_delayed_delete_refs = false;
cond_resched();
}
+ /*
+ * If the path cache is disabled, then it means at some tree level we
+ * got multiple parents due to a mix of direct and indirect backrefs or
+ * multiple leaves with file extent items pointing to the same data
+ * extent. We have to invalidate the cache and cache only the sharedness
+ * result for the levels where we got only one node/reference.
+ */
+ if (!ctx->use_path_cache) {
+ int i = 0;
+
+ level--;
+ if (ret >= 0 && level >= 0) {
+ bytenr = ctx->path_cache_entries[level].bytenr;
+ ctx->use_path_cache = true;
+ store_backref_shared_cache(ctx, root, bytenr, level, ret);
+ i = level + 1;
+ }
+
+ for ( ; i < BTRFS_MAX_LEVEL; i++)
+ ctx->path_cache_entries[i].bytenr = 0;
+ }
+
/*
* Cache the sharedness result for the data extent if we know our inode
* has more than 1 file extent item that refers to the data extent.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 50d281fc434cb8e2497f5e70a309ccca6b1a09f0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040340-happiest-next-a09c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
50d281fc434c ("btrfs: scan device in non-exclusive mode")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 23 Mar 2023 15:56:48 +0800
Subject: [PATCH] btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to race with systemd-udevd
scan.
During the device scan initiated by systemd-udevd, other user space
EXCL operations such as mkfs, mount, or check may get blocked and result
in a "Device or resource busy" error. This is because the device
scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY
error
- LTP pwritev03 test case, where mkfs.vfs failed with
the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
on the device.
In both cases, fsck and mkfs (respectively) were racing with a
systemd-udevd device scan, and systemd-udevd won, resulting in the
-EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very
small window during which these userspace threads can race to
acquire the exclusive device open. Even on the system where the problem
was observed, the problem occurrences were anywhere between 10 to 400
iterations and chances of reproducing decreases with debug printk()s.
However, an exclusive device open is unnecessary for the scan process,
as there are no write operations on the device during scan. Furthermore,
during the mount process, the superblock is re-read in the below
function call chain:
btrfs_mount_root
btrfs_open_devices
open_fs_devices
btrfs_open_one_device
btrfs_get_bdev_and_sb
So, to fix this issue, removes the FMODE_EXCL flag from the scan
operation, and add a comment.
The case where mkfs may still write to the device and a scan is running,
the btrfs signature is not written at that time so scan will not
recognize such device.
Reported-by: Sherry Yang <sherry.yang(a)oracle.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6d0124b6e79e..ac0e8fb92fc8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
* So, we need to add a special mount option to scan for
* later supers, using BTRFS_SUPER_MIRROR_MAX instead
*/
- flags |= FMODE_EXCL;
+ /*
+ * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
+ * initiate the device scan which may race with the user's mount
+ * or mkfs command, resulting in failure.
+ * Since the device scan is solely for reading purposes, there is
+ * no need for FMODE_EXCL. Additionally, the devices are read again
+ * during the mount process. It is ok to get some inconsistent
+ * values temporarily, as the device paths of the fsid are the only
+ * required information for assembling the volume.
+ */
bdev = blkdev_get_by_path(path, flags, holder);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 50d281fc434cb8e2497f5e70a309ccca6b1a09f0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040338-carving-ripping-8786@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
50d281fc434c ("btrfs: scan device in non-exclusive mode")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
96c2e067ed3e ("btrfs: skip devices without magic signature when mounting")
c3e1f96c37d0 ("btrfs: enumerate the type of exclusive operation in progress")
944d3f9fac61 ("btrfs: switch seed device to list api")
c4989c2fd0eb ("btrfs: simplify setting/clearing fs_info to btrfs_fs_devices")
54eed6ae8d8e ("btrfs: make close_fs_devices return void")
3712ccb7f1cc ("btrfs: factor out loop logic from btrfs_free_extra_devids")
dc0ab488d2cb ("btrfs: factor out reada loop in __reada_start_machine")
adca4d945c8d ("btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT")
3092c68fc58c ("btrfs: sysfs: add bdi link to the fsid directory")
998a0671961f ("btrfs: include non-missing as a qualifier for the latest_bdev")
1ed802c972c6 ("btrfs: drop useless goto in open_fs_devices")
b335eab890ed ("btrfs: make btrfs_read_disk_super return struct btrfs_disk_super")
c4a816c67c39 ("btrfs: introduce chunk allocation policy")
9a8658e33d8f ("btrfs: open code trivial helper btrfs_header_fsid")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 23 Mar 2023 15:56:48 +0800
Subject: [PATCH] btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to race with systemd-udevd
scan.
During the device scan initiated by systemd-udevd, other user space
EXCL operations such as mkfs, mount, or check may get blocked and result
in a "Device or resource busy" error. This is because the device
scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY
error
- LTP pwritev03 test case, where mkfs.vfs failed with
the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
on the device.
In both cases, fsck and mkfs (respectively) were racing with a
systemd-udevd device scan, and systemd-udevd won, resulting in the
-EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very
small window during which these userspace threads can race to
acquire the exclusive device open. Even on the system where the problem
was observed, the problem occurrences were anywhere between 10 to 400
iterations and chances of reproducing decreases with debug printk()s.
However, an exclusive device open is unnecessary for the scan process,
as there are no write operations on the device during scan. Furthermore,
during the mount process, the superblock is re-read in the below
function call chain:
btrfs_mount_root
btrfs_open_devices
open_fs_devices
btrfs_open_one_device
btrfs_get_bdev_and_sb
So, to fix this issue, removes the FMODE_EXCL flag from the scan
operation, and add a comment.
The case where mkfs may still write to the device and a scan is running,
the btrfs signature is not written at that time so scan will not
recognize such device.
Reported-by: Sherry Yang <sherry.yang(a)oracle.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6d0124b6e79e..ac0e8fb92fc8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
* So, we need to add a special mount option to scan for
* later supers, using BTRFS_SUPER_MIRROR_MAX instead
*/
- flags |= FMODE_EXCL;
+ /*
+ * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
+ * initiate the device scan which may race with the user's mount
+ * or mkfs command, resulting in failure.
+ * Since the device scan is solely for reading purposes, there is
+ * no need for FMODE_EXCL. Additionally, the devices are read again
+ * during the mount process. It is ok to get some inconsistent
+ * values temporarily, as the device paths of the fsid are the only
+ * required information for assembling the volume.
+ */
bdev = blkdev_get_by_path(path, flags, holder);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
The patch below does not apply to the 5.20-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.20.y
git checkout FETCH_HEAD
git cherry-pick -x 50d281fc434cb8e2497f5e70a309ccca6b1a09f0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040334-reproduce-granola-78a1@gregkh' --subject-prefix 'PATCH 5.20.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 23 Mar 2023 15:56:48 +0800
Subject: [PATCH] btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to race with systemd-udevd
scan.
During the device scan initiated by systemd-udevd, other user space
EXCL operations such as mkfs, mount, or check may get blocked and result
in a "Device or resource busy" error. This is because the device
scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY
error
- LTP pwritev03 test case, where mkfs.vfs failed with
the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
on the device.
In both cases, fsck and mkfs (respectively) were racing with a
systemd-udevd device scan, and systemd-udevd won, resulting in the
-EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very
small window during which these userspace threads can race to
acquire the exclusive device open. Even on the system where the problem
was observed, the problem occurrences were anywhere between 10 to 400
iterations and chances of reproducing decreases with debug printk()s.
However, an exclusive device open is unnecessary for the scan process,
as there are no write operations on the device during scan. Furthermore,
during the mount process, the superblock is re-read in the below
function call chain:
btrfs_mount_root
btrfs_open_devices
open_fs_devices
btrfs_open_one_device
btrfs_get_bdev_and_sb
So, to fix this issue, removes the FMODE_EXCL flag from the scan
operation, and add a comment.
The case where mkfs may still write to the device and a scan is running,
the btrfs signature is not written at that time so scan will not
recognize such device.
Reported-by: Sherry Yang <sherry.yang(a)oracle.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6d0124b6e79e..ac0e8fb92fc8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
* So, we need to add a special mount option to scan for
* later supers, using BTRFS_SUPER_MIRROR_MAX instead
*/
- flags |= FMODE_EXCL;
+ /*
+ * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
+ * initiate the device scan which may race with the user's mount
+ * or mkfs command, resulting in failure.
+ * Since the device scan is solely for reading purposes, there is
+ * no need for FMODE_EXCL. Additionally, the devices are read again
+ * during the mount process. It is ok to get some inconsistent
+ * values temporarily, as the device paths of the fsid are the only
+ * required information for assembling the volume.
+ */
bdev = blkdev_get_by_path(path, flags, holder);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 50d281fc434cb8e2497f5e70a309ccca6b1a09f0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023040333-vanquish-wriggle-e007@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
50d281fc434c ("btrfs: scan device in non-exclusive mode")
12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
5d1ab66c56fe ("btrfs: disallow space_cache in ZONED mode")
b70f509774ad ("btrfs: check and enable ZONED mode")
5b316468983d ("btrfs: get zone information of zoned block devices")
bacce86ae8a7 ("btrfs: drop unused argument step from btrfs_free_extra_devids")
96c2e067ed3e ("btrfs: skip devices without magic signature when mounting")
c3e1f96c37d0 ("btrfs: enumerate the type of exclusive operation in progress")
944d3f9fac61 ("btrfs: switch seed device to list api")
c4989c2fd0eb ("btrfs: simplify setting/clearing fs_info to btrfs_fs_devices")
54eed6ae8d8e ("btrfs: make close_fs_devices return void")
3712ccb7f1cc ("btrfs: factor out loop logic from btrfs_free_extra_devids")
dc0ab488d2cb ("btrfs: factor out reada loop in __reada_start_machine")
adca4d945c8d ("btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT")
3092c68fc58c ("btrfs: sysfs: add bdi link to the fsid directory")
998a0671961f ("btrfs: include non-missing as a qualifier for the latest_bdev")
1ed802c972c6 ("btrfs: drop useless goto in open_fs_devices")
b335eab890ed ("btrfs: make btrfs_read_disk_super return struct btrfs_disk_super")
c4a816c67c39 ("btrfs: introduce chunk allocation policy")
9a8658e33d8f ("btrfs: open code trivial helper btrfs_header_fsid")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 23 Mar 2023 15:56:48 +0800
Subject: [PATCH] btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to race with systemd-udevd
scan.
During the device scan initiated by systemd-udevd, other user space
EXCL operations such as mkfs, mount, or check may get blocked and result
in a "Device or resource busy" error. This is because the device
scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY
error
- LTP pwritev03 test case, where mkfs.vfs failed with
the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
on the device.
In both cases, fsck and mkfs (respectively) were racing with a
systemd-udevd device scan, and systemd-udevd won, resulting in the
-EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very
small window during which these userspace threads can race to
acquire the exclusive device open. Even on the system where the problem
was observed, the problem occurrences were anywhere between 10 to 400
iterations and chances of reproducing decreases with debug printk()s.
However, an exclusive device open is unnecessary for the scan process,
as there are no write operations on the device during scan. Furthermore,
during the mount process, the superblock is re-read in the below
function call chain:
btrfs_mount_root
btrfs_open_devices
open_fs_devices
btrfs_open_one_device
btrfs_get_bdev_and_sb
So, to fix this issue, removes the FMODE_EXCL flag from the scan
operation, and add a comment.
The case where mkfs may still write to the device and a scan is running,
the btrfs signature is not written at that time so scan will not
recognize such device.
Reported-by: Sherry Yang <sherry.yang(a)oracle.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6d0124b6e79e..ac0e8fb92fc8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1366,8 +1366,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
* So, we need to add a special mount option to scan for
* later supers, using BTRFS_SUPER_MIRROR_MAX instead
*/
- flags |= FMODE_EXCL;
+ /*
+ * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
+ * initiate the device scan which may race with the user's mount
+ * or mkfs command, resulting in failure.
+ * Since the device scan is solely for reading purposes, there is
+ * no need for FMODE_EXCL. Additionally, the devices are read again
+ * during the mount process. It is ok to get some inconsistent
+ * values temporarily, as the device paths of the fsid are the only
+ * required information for assembling the volume.
+ */
bdev = blkdev_get_by_path(path, flags, holder);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
On 4/3/23 12:10 PM, Matthew Wilcox wrote:
> On Sun, Apr 02, 2023 at 06:19:20AM +0800, Rongwei Wang wrote:
>> Without this modification, a core will wait (mostly)
>> 'swap_info_struct->lock' when completing
>> 'del_from_avail_list(p)'. Immediately, other cores
>> soon calling 'add_to_avail_list()' to add the same
>> object again when acquiring the lock that released
>> by former. It's not the desired result but exists
>> indeed. This case can be described as below:
> This feels like a very verbose way of saying
>
> "The si->lock must be held when deleting the si from the
> available list. Otherwise, another thread can re-add the
> si to the available list, which can lead to memory corruption.
> The only place we have found where this happens is in the
> swapoff path."
It looks better than mine. Sorry for my confusing description, it will
be fixed in the next version.
>
>> +++ b/mm/swapfile.c
>> @@ -2610,8 +2610,12 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>> spin_unlock(&swap_lock);
>> goto out_dput;
>> }
>> - del_from_avail_list(p);
>> + /*
>> + * Here lock is used to protect deleting and SWP_WRITEOK clearing
>> + * can be seen concurrently.
>> + */
> This comment isn't necessary. But I would add a lockdep assert inside
> __del_from_avail_list() that p->lock is held.
Thanks. Actually, I have this line in previous test version, but delete
for saving one line of code.
I will update here as you said.
Thanks for your time.
>
>> spin_lock(&p->lock);
>> + del_from_avail_list(p);
>> if (p->prio < 0) {
>> struct swap_info_struct *si = p;
>> int nid;
>> --
>> 2.27.0
>>
>>
This is a proposal to revert commit 914eedcb9ba0ff53c33808.
I found this when writting a simple UFFDIO_API test to be the first unit
test in this set. Two things breaks with the commit:
- UFFDIO_API check was lost and missing. According to man page, the
kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa. This
check is needed if the api version will be extended in the future, or
user app won't be able to identify which is a new kernel.
- Feature flags checks were removed, which means UFFDIO_API with a
feature that does not exist will also succeed. According to the man
page, we should (and it makes sense) to reject ioctl(UFFDIO_API) if
unknown features passed in.
Link: https://lore.kernel.org/r/20220722201513.1624158-1-axelrasmussen@google.com
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
fs/userfaultfd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8395605790f6..3b2a41c330e6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1977,8 +1977,10 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
ret = -EFAULT;
if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
goto out;
- /* Ignore unsupported features (userspace built against newer kernel) */
- features = uffdio_api.features & UFFD_API_FEATURES;
+ features = uffdio_api.features;
+ ret = -EINVAL;
+ if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
+ goto err_out;
ret = -EPERM;
if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
goto err_out;
--
2.39.1
Hello my beloved, good morning from here, how are you doing today? My
name is Mrs. Rabi Juanni Marcus, I have something very important that
i want to discuss with you.
If the value read from the CHDBOFF and ERDBOFF registers is outside the
range of the MHI register space then an invalid address might be computed
which later causes a kernel panic. Range check the read value to prevent
a crash due to bad data from the device.
Fixes: 6cd330ae76ff ("bus: mhi: core: Add support for ringing channel/event ring doorbells")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy(a)quicinc.com>
---
v2:
-CC stable
-Use ERANGE for the error code
drivers/bus/mhi/host/init.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index 3d779ee..b46a082 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -516,6 +516,12 @@ int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
return -EIO;
}
+ if (val >= mhi_cntrl->reg_len - (8 * MHI_DEV_WAKE_DB)) {
+ dev_err(dev, "CHDB offset: 0x%x is out of range: 0x%zx\n",
+ val, mhi_cntrl->reg_len - (8 * MHI_DEV_WAKE_DB));
+ return -ERANGE;
+ }
+
/* Setup wake db */
mhi_cntrl->wake_db = base + val + (8 * MHI_DEV_WAKE_DB);
mhi_cntrl->wake_set = false;
@@ -532,6 +538,12 @@ int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
return -EIO;
}
+ if (val >= mhi_cntrl->reg_len - (8 * mhi_cntrl->total_ev_rings)) {
+ dev_err(dev, "ERDB offset: 0x%x is out of range: 0x%zx\n",
+ val, mhi_cntrl->reg_len - (8 * mhi_cntrl->total_ev_rings));
+ return -ERANGE;
+ }
+
/* Setup event db address for each ev_ring */
mhi_event = mhi_cntrl->mhi_event;
for (i = 0; i < mhi_cntrl->total_ev_rings; i++, val += 8, mhi_event++) {
--
2.7.4
If two or more suitable entries with the same filename are found in
__uc_fw_auto_select's fw_blobs, and that filename fails to load in the
first attempt and in the retry, when __uc_fw_auto_select is called for
the third time, the coincidence of strings will cause it to clear
file_selected.path at the first hit, so it will return the second hit
over and over again, indefinitely.
Of course this doesn't occur with the pristine blob lists, but a
modified version could run into this, e.g., patching in a duplicate
entry, or (as in our case) disarming blob loading by remapping their
names to "/*(DEBLOBBED)*/", given a toolchain that unifies identical
string literals.
Of course I'm ready to carry a patchlet to avoid this problem
triggered by our (GNU Linux-libre's) intentional changes, but I
figured you might be interested in fail-safing it even in accidental
backporting circumstances. I realize it's not entirely foolproof: if
the same string appears in two entries separated by a different one,
the infinite loop might still occur. Catching that even more unlikely
situation seemed too expensive.
Link: https://www.fsfla.org/pipermail/linux-libre/2023-March/003506.html
Cc: intel-gfx(a)lists.freedesktop.org
Cc: stable(a)vger.kernel.org # 6.[12].x
Signed-off-by: Alexandre Oliva <lxoliva(a)fsfla.org>
---
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index 9d6f571097e6..2b7564a3ed82 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -259,7 +259,10 @@ __uc_fw_auto_select(struct drm_i915_private *i915, struct intel_uc_fw *uc_fw)
uc_fw->file_selected.path = NULL;
continue;
- }
+ } else if (uc_fw->file_wanted.path == blob->path)
+ /* Avoid retrying forever when neighbor
+ entries point to the same path. */
+ continue;
uc_fw->file_selected.path = blob->path;
uc_fw->file_wanted.path = blob->path;
--
2.25.1
--
Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/
Free Software Activist GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts. Ask me about <https://stallmansupport.org>
In dual-bridges scenario, some bugs were found for irq
controllers drivers, so the patch serie is used to fix them.
V1->V2:
1. Remove all of ChangeID in patches
2. Exchange the sequence of some patches
3. Adjust code style of if...else... in patch[2]
Jianmin Lv (5):
irqchip/loongson-eiointc: Fix returned value on parsing MADT
irqchip/loongson-eiointc: Fix incorrect use of acpi_get_vec_parent
irqchip/loongson-eiointc: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix pch_pic_acpi_init calling
drivers/irqchip/irq-loongson-eiointc.c | 32 ++++++++++++++++++--------
drivers/irqchip/irq-loongson-pch-pic.c | 6 ++++-
2 files changed, 27 insertions(+), 11 deletions(-)
--
2.31.1
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
v2: https://lore.kernel.org/r/20230326182120.194541-1-hi@alyssa.is
Difference from v2: replaced asflags-remove-y with every possible
debug flag with asflags-y += -g0, as suggested by Nick Desaulniers.
Additionally, I've CCed the x86 maintainers this time, since Masahiro
said he would like acks from subsystem maintainers, and
get_maintainer.pl didn't pick them the first time around.
arch/riscv/purgatory/Makefile | 7 +------
arch/x86/purgatory/Makefile | 3 +--
2 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index d16bf715a586..9c1e71853ee7 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -84,12 +84,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_ctype.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strcmp.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strlen.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_strncmp.o += -Wa,-gdwarf-2
+asflags-y += -g0
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 17f09dc26381..8e6c81b1c8f7 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-y += -g0
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
The patch titled
Subject: nilfs2: fix sysfs interface lifetime
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-sysfs-interface-lifetime.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix sysfs interface lifetime
Date: Fri, 31 Mar 2023 05:55:15 +0900
The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.
Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.
Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.
In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".
Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking. This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".
This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.
Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com
Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+979fa7f9c0d086fdc282(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com
Reported-by: syzbot+5b7d542076d9bddc3c6a(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com
Cc: Viacheslav Dubeyko <slava(a)dubeyko.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/super.c | 2 ++
fs/nilfs2/the_nilfs.c | 12 +++++++-----
2 files changed, 9 insertions(+), 5 deletions(-)
--- a/fs/nilfs2/super.c~nilfs2-fix-sysfs-interface-lifetime
+++ a/fs/nilfs2/super.c
@@ -482,6 +482,7 @@ static void nilfs_put_super(struct super
up_write(&nilfs->ns_sem);
}
+ nilfs_sysfs_delete_device_group(nilfs);
iput(nilfs->ns_sufile);
iput(nilfs->ns_cpfile);
iput(nilfs->ns_dat);
@@ -1105,6 +1106,7 @@ nilfs_fill_super(struct super_block *sb,
nilfs_put_root(fsroot);
failed_unload:
+ nilfs_sysfs_delete_device_group(nilfs);
iput(nilfs->ns_sufile);
iput(nilfs->ns_cpfile);
iput(nilfs->ns_dat);
--- a/fs/nilfs2/the_nilfs.c~nilfs2-fix-sysfs-interface-lifetime
+++ a/fs/nilfs2/the_nilfs.c
@@ -87,7 +87,6 @@ void destroy_nilfs(struct the_nilfs *nil
{
might_sleep();
if (nilfs_init(nilfs)) {
- nilfs_sysfs_delete_device_group(nilfs);
brelse(nilfs->ns_sbh[0]);
brelse(nilfs->ns_sbh[1]);
}
@@ -305,6 +304,10 @@ int load_nilfs(struct the_nilfs *nilfs,
goto failed;
}
+ err = nilfs_sysfs_create_device_group(sb);
+ if (unlikely(err))
+ goto sysfs_error;
+
if (valid_fs)
goto skip_recovery;
@@ -366,6 +369,9 @@ int load_nilfs(struct the_nilfs *nilfs,
goto failed;
failed_unload:
+ nilfs_sysfs_delete_device_group(nilfs);
+
+ sysfs_error:
iput(nilfs->ns_cpfile);
iput(nilfs->ns_sufile);
iput(nilfs->ns_dat);
@@ -697,10 +703,6 @@ int init_nilfs(struct the_nilfs *nilfs,
if (err)
goto failed_sbh;
- err = nilfs_sysfs_create_device_group(sb);
- if (err)
- goto failed_sbh;
-
set_nilfs_init(nilfs);
err = 0;
out:
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-potential-uaf-of-struct-nilfs_sc_info-in-nilfs_segctor_thread.patch
nilfs2-fix-sysfs-interface-lifetime.patch
As I attached a USB SSD into CentOS 9 Stream computer, after a short
while it swaps /dev/sdb into /dev/sdc and the I/O gets ruined.
Kind regards, Ilari Jääskeläinen.
When cleaning up peer group ids in the failure path we need to make sure
to hold on to the namespace lock. Otherwise another thread might just
turn the mount from a shared into a non-shared mount concurrently.
Reported-by: syzbot+8ac3859139c685c4f597(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/00000000000088694505f8132d77@google.com
Fixes: 2a1867219c7b ("fs: add mount_setattr()")
Cc: stable(a)vger.kernel.org # 5.12+
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
fs/namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index bc0f15257b49..6836e937ee61 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4183,9 +4183,9 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
unlock_mount_hash();
if (kattr->propagation) {
- namespace_unlock();
if (err)
cleanup_group_ids(mnt, NULL);
+ namespace_unlock();
}
return err;
---
base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
change-id: 20230330-vfs-mount_setattr-propagation-fix-363b7c59d7fb
I am fairly certain that CentOS 9 Stream GNOME mounted it without UUID,
though.
pe, 2023-03-31 kello 09:35 +0000, David Laight kirjoitti:
> From: Ilari Jääskeläinen
> > Sent: 31 March 2023 10:09
> >
> > I am afraid I cant do that. I already almost wiped my root
> > partition by
> > accident because of this flaw.
>
> Always mount filsystems by uuid, not device name.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> MK1 1PT, UK
> Registration No: 1397386 (Wales)
roger that. good advice. thanks.
pe, 2023-03-31 kello 09:35 +0000, David Laight kirjoitti:
> From: Ilari Jääskeläinen
> > Sent: 31 March 2023 10:09
> >
> > I am afraid I cant do that. I already almost wiped my root
> > partition by
> > accident because of this flaw.
>
> Always mount filsystems by uuid, not device name.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> MK1 1PT, UK
> Registration No: 1397386 (Wales)
Please can I trust you?.
I need your assistance to help me to invest my inheritance fund in
your country and to help me to come over to your country for the
betterment of my life and continue my education.
I will be happy to hear from you, then I will give you more details.
Best regards,
Miss Sarah Arabian
It the device is probed in non-zero ACPI D state, the module
identification is delayed until the first streamon.
The module identification has two parts: deviceID and version. To rea
the version we have to enable OTP read. This cannot be done during
streamon, becase it modifies REG_MODE_SELECT.
Since the driver has the same behaviour for all the module versions, do
not read the module version from the sensor's OTP.
Cc: stable(a)vger.kernel.org
Fixes: 0e014f1a8d54 ("media: ov8856: support device probe in non-zero ACPI D state")
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
To: Dongchun Zhu <dongchun.zhu(a)mediatek.com>
To: Mauro Carvalho Chehab <mchehab(a)kernel.org>
To: Sakari Ailus <sakari.ailus(a)linux.intel.com>
To: Bingbu Cao <bingbu.cao(a)intel.com>
Cc: Max Staudt <mstaudt(a)chromium.org>
Cc: Jimmy Su <jimmy.su(a)intel.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
Cc: linux-media(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
drivers/media/i2c/ov8856.c | 40 ----------------------------------------
1 file changed, 40 deletions(-)
diff --git a/drivers/media/i2c/ov8856.c b/drivers/media/i2c/ov8856.c
index cf8384e09413..b5c7881383ca 100644
--- a/drivers/media/i2c/ov8856.c
+++ b/drivers/media/i2c/ov8856.c
@@ -1709,46 +1709,6 @@ static int ov8856_identify_module(struct ov8856 *ov8856)
return -ENXIO;
}
- ret = ov8856_write_reg(ov8856, OV8856_REG_MODE_SELECT,
- OV8856_REG_VALUE_08BIT, OV8856_MODE_STREAMING);
- if (ret)
- return ret;
-
- ret = ov8856_write_reg(ov8856, OV8856_OTP_MODE_CTRL,
- OV8856_REG_VALUE_08BIT, OV8856_OTP_MODE_AUTO);
- if (ret) {
- dev_err(&client->dev, "failed to set otp mode");
- return ret;
- }
-
- ret = ov8856_write_reg(ov8856, OV8856_OTP_LOAD_CTRL,
- OV8856_REG_VALUE_08BIT,
- OV8856_OTP_LOAD_CTRL_ENABLE);
- if (ret) {
- dev_err(&client->dev, "failed to enable load control");
- return ret;
- }
-
- ret = ov8856_read_reg(ov8856, OV8856_MODULE_REVISION,
- OV8856_REG_VALUE_08BIT, &val);
- if (ret) {
- dev_err(&client->dev, "failed to read module revision");
- return ret;
- }
-
- dev_info(&client->dev, "OV8856 revision %x (%s) at address 0x%02x\n",
- val,
- val == OV8856_2A_MODULE ? "2A" :
- val == OV8856_1B_MODULE ? "1B" : "unknown revision",
- client->addr);
-
- ret = ov8856_write_reg(ov8856, OV8856_REG_MODE_SELECT,
- OV8856_REG_VALUE_08BIT, OV8856_MODE_STANDBY);
- if (ret) {
- dev_err(&client->dev, "failed to exit streaming mode");
- return ret;
- }
-
ov8856->identified = true;
return 0;
---
base-commit: 9fd6ba5420ba2b637d1ecc6de8613ec8b9c87e5a
change-id: 20230323-ov8856-otp-112f3cdc74b1
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
From: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
When considering whether to mark one context as stopped and another as
started we need to look at whether the previous and new _contexts_ are
different and not just requests. Otherwise the software tracked context
start time was incorrectly updated to the most recent lite-restore time-
stamp, which was in some cases resulting in active time going backward,
until the context switch (typically the hearbeat pulse) would synchronise
with the hardware tracked context runtime. Easiest use case to observe
this behaviour was with a full screen clients with close to 100% engine
load.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Fixes: bb6287cb1886 ("drm/i915: Track context current active time")
Cc: <stable(a)vger.kernel.org> # v5.19+
---
drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 1bbe6708d0a7..750326434677 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2018,6 +2018,8 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
* inspecting the queue to see if we need to resumbit.
*/
if (*prev != *execlists->active) { /* elide lite-restores */
+ struct intel_context *prev_ce = NULL, *active_ce = NULL;
+
/*
* Note the inherent discrepancy between the HW runtime,
* recorded as part of the context switch, and the CPU
@@ -2029,9 +2031,15 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
* and correct overselves later when updating from HW.
*/
if (*prev)
- lrc_runtime_stop((*prev)->context);
+ prev_ce = (*prev)->context;
if (*execlists->active)
- lrc_runtime_start((*execlists->active)->context);
+ active_ce = (*execlists->active)->context;
+ if (prev_ce != active_ce) {
+ if (prev_ce)
+ lrc_runtime_stop(prev_ce);
+ if (active_ce)
+ lrc_runtime_start(active_ce);
+ }
new_timeslice(execlists);
}
--
2.37.2
The bug was obswerved while reading code. There are not many users of
addr_mode_nbytes. Anyway, we should update the flash's current address
mode when changing the address mode, fix it. We don't care for now about
the set_4byte_addr_mode(nor, false) from spi_nor_restore(), as it is
used at driver remove and shutdown.
Fixes: d7931a215063 ("mtd: spi-nor: core: Track flash's internal address mode")
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
Cc: stable(a)vger.kernel.org
---
drivers/mtd/spi-nor/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 0517a61975e4..4f0d90d3dad5 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -3135,6 +3135,7 @@ static int spi_nor_quad_enable(struct spi_nor *nor)
static int spi_nor_init(struct spi_nor *nor)
{
+ struct spi_nor_flash_parameter *params = nor->params;
int err;
err = spi_nor_octal_dtr_enable(nor, true);
@@ -3176,9 +3177,10 @@ static int spi_nor_init(struct spi_nor *nor)
*/
WARN_ONCE(nor->flags & SNOR_F_BROKEN_RESET,
"enabling reset hack; may not recover from unexpected reboots\n");
- err = nor->params->set_4byte_addr_mode(nor, true);
+ err = params->set_4byte_addr_mode(nor, true);
if (err && err != -ENOTSUPP)
return err;
+ params->addr_mode_nbytes = 4;
}
return 0;
--
2.40.0.348.gf938b09366-goog
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Reiserfs sets a security xattr at inode creation time in two stages: first,
it calls reiserfs_security_init() to obtain the xattr from active LSMs;
then, it calls reiserfs_security_write() to actually write that xattr.
Unfortunately, it seems there is a wrong expectation that LSMs provide the
full xattr name in the form 'security.<suffix>'. However, LSMs always
provided just the suffix, causing reiserfs to not write the xattr at all
(if the suffix is shorter than the prefix), or to write an xattr with the
wrong name.
Add a temporary buffer in reiserfs_security_write(), and write to it the
full xattr name, before passing it to reiserfs_xattr_set_handle().
Since the 'security.' prefix is always prepended, remove the name length
check.
Cc: stable(a)vger.kernel.org # v2.6.x
Fixes: 57fe60df6241 ("reiserfs: add atomic addition of selinux attributes during inode creation")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
fs/reiserfs/xattr_security.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/reiserfs/xattr_security.c b/fs/reiserfs/xattr_security.c
index 6bffdf9a4fd..b0c354ab113 100644
--- a/fs/reiserfs/xattr_security.c
+++ b/fs/reiserfs/xattr_security.c
@@ -95,11 +95,13 @@ int reiserfs_security_write(struct reiserfs_transaction_handle *th,
struct inode *inode,
struct reiserfs_security_handle *sec)
{
+ char xattr_name[XATTR_NAME_MAX + 1];
int error;
- if (strlen(sec->name) < sizeof(XATTR_SECURITY_PREFIX))
- return -EINVAL;
- error = reiserfs_xattr_set_handle(th, inode, sec->name, sec->value,
+ snprintf(xattr_name, sizeof(xattr_name), "%s%s", XATTR_SECURITY_PREFIX,
+ sec->name);
+
+ error = reiserfs_xattr_set_handle(th, inode, xattr_name, sec->value,
sec->length, XATTR_CREATE);
if (error == -ENODATA || error == -EOPNOTSUPP)
error = 0;
--
2.25.1
While determining the initial pin assignment to be sent in the configure
message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to
send both Pin Assignment C and E when both are supported by the DFP_U and
UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates,
VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never
selects Pin Assignment E when Pin Assignment C is offered.
Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only
Pin Assignment C if it is available.
Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera(a)google.com>
---
drivers/usb/typec/altmodes/displayport.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/typec/altmodes/displayport.c b/drivers/usb/typec/altmodes/displayport.c
index 662cd043b50e..8f3e884222ad 100644
--- a/drivers/usb/typec/altmodes/displayport.c
+++ b/drivers/usb/typec/altmodes/displayport.c
@@ -112,8 +112,12 @@ static int dp_altmode_configure(struct dp_altmode *dp, u8 con)
if (dp->data.status & DP_STATUS_PREFER_MULTI_FUNC &&
pin_assign & DP_PIN_ASSIGN_MULTI_FUNC_MASK)
pin_assign &= DP_PIN_ASSIGN_MULTI_FUNC_MASK;
- else if (pin_assign & DP_PIN_ASSIGN_DP_ONLY_MASK)
+ else if (pin_assign & DP_PIN_ASSIGN_DP_ONLY_MASK) {
pin_assign &= DP_PIN_ASSIGN_DP_ONLY_MASK;
+ /* Default to pin assign C if available */
+ if (pin_assign & BIT(DP_PIN_ASSIGN_C))
+ pin_assign = BIT(DP_PIN_ASSIGN_C);
+ }
if (!pin_assign)
return -EINVAL;
base-commit: 97318d6427f62b723c89f4150f8f48126ef74961
--
2.40.0.348.gf938b09366-goog
SUBJECT: ovl: fail on invalid uid/gid mapping at copy up
COMMIT: 4f11ada10d0ad3fd53e2bd67806351de63a4f9c3
Reason for request:
This resolves CVE-2023-0386
CVE context: https://nvd.nist.gov/vuln/detail/CVE-2023-0386
SVACE reports return value of a function 'usb_alloc_urb' is dereferenced
without checking for null in 5.10 stable releases.
The problem has been fixed by the following
patch which can be cleanly applied to the 5.10 branch.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
The patch titled
Subject: mm: take a page reference when removing device exclusive entries
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-take-a-page-reference-when-removing-device-exclusive-entries.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alistair Popple <apopple(a)nvidia.com>
Subject: mm: take a page reference when removing device exclusive entries
Date: Thu, 30 Mar 2023 12:25:19 +1100
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
--- a/mm/memory.c~mm-take-a-page-reference-when-removing-device-exclusive-entries
+++ a/mm/memory.c
@@ -3563,8 +3563,21 @@ static vm_fault_t remove_device_exclusiv
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a reference to lock the folio because we don't hold
+ * the PTL so a racing thread can remove the device-exclusive
+ * entry and unmap it. If the folio is free the entry must
+ * have been removed already. If it happens to have already
+ * been re-allocated after being freed all we do is lock and
+ * unlock it.
+ */
+ if (!folio_try_get(folio))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ folio_put(folio);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3577,6 +3590,7 @@ static vm_fault_t remove_device_exclusiv
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ folio_put(folio);
mmu_notifier_invalidate_range_end(&range);
return 0;
_
Patches currently in -mm which might be from apopple(a)nvidia.com are
mm-take-a-page-reference-when-removing-device-exclusive-entries.patch
commit f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
introduces a memory leak by missing a call to destroy_context() when a
percpu_counter fails to allocate.
Before introducing the per-cpu counter allocations, init_new_context()
was the last call that could fail in mm_init(), and thus there was no
need to ever invoke destroy_context() in the error paths. Adding the
following percpu counter allocations adds error paths after
init_new_context(), which means its associated destroy_context() needs
to be called when percpu counters fail to allocate.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: linux-mm(a)kvack.org
Cc: stable(a)vger.kernel.org # 6.2
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index c0257cbee093..c983c4fe3090 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1171,6 +1171,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
fail_pcpu:
while (i > 0)
percpu_counter_destroy(&mm->rss_stat[--i]);
+ destroy_context(mm);
fail_nocontext:
mm_free_pgd(mm);
fail_nopgd:
--
2.25.1
The patch titled
Subject: mm: fix memory leak on mm_init error handling
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-memory-leak-on-mm_init-error-handling.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: fix memory leak on mm_init error handling
Date: Thu, 30 Mar 2023 09:38:22 -0400
commit f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
introduces a memory leak by missing a call to destroy_context() when a
percpu_counter fails to allocate.
Before introducing the per-cpu counter allocations, init_new_context() was
the last call that could fail in mm_init(), and thus there was no need to
ever invoke destroy_context() in the error paths. Adding the following
percpu counter allocations adds error paths after init_new_context(),
which means its associated destroy_context() needs to be called when
percpu counters fail to allocate.
Link: https://lkml.kernel.org/r/20230330133822.66271-1-mathieu.desnoyers@efficios…
Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Acked-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/fork.c~mm-fix-memory-leak-on-mm_init-error-handling
+++ a/kernel/fork.c
@@ -1174,6 +1174,7 @@ static struct mm_struct *mm_init(struct
fail_pcpu:
while (i > 0)
percpu_counter_destroy(&mm->rss_stat[--i]);
+ destroy_context(mm);
fail_nocontext:
mm_free_pgd(mm);
fail_nopgd:
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
mm-fix-memory-leak-on-mm_init-error-handling.patch