vhost_vsock_handle_tx_kick() already holds the mutex during its call
to vhost_get_vq_desc(). All we have to do here is take the same lock
during virtqueue clean-up and we mitigate the reported issues.
Also WARN() as a precautionary measure. The purpose of this is to
capture possible future race conditions which may pop up over time.
Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot+adc3cb32385586bec859(a)syzkaller.appspotmail.com
Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
---
drivers/vhost/vhost.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe28..ef7e371e3e649 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
int i;
for (i = 0; i < dev->nvqs; ++i) {
+ /* No workers should run here by design. However, races have
+ * previously occurred where drivers have been unable to flush
+ * all work properly prior to clean-up. Without a successful
+ * flush the guest will malfunction, but avoiding host memory
+ * corruption in those cases does seem preferable.
+ */
+ WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
+
+ mutex_lock(&dev->vqs[i]->mutex);
if (dev->vqs[i]->error_ctx)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
if (dev->vqs[i]->kick)
@@ -700,6 +709,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
if (dev->vqs[i]->call_ctx.ctx)
eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
vhost_vq_reset(dev, dev->vqs[i]);
+ mutex_unlock(&dev->vqs[i]->mutex);
}
vhost_dev_free_iovecs(dev);
if (dev->log_ctx)
--
2.35.1.616.g0bdcbb4464-goog
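For readers who want to see the locking idea outside the kernel, here is a
minimal userspace sketch of the pattern the vhost patch applies: clean-up
takes the same per-queue mutex that the worker path takes. It uses POSIX
threads rather than kernel mutexes, and struct demo_vq, demo_vq_cleanup()
and demo_dev_cleanup() are hypothetical names, not vhost code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue; not the kernel structure. */
struct demo_vq {
	pthread_mutex_t mutex;   /* same lock the worker path would take */
	int *error_ctx;          /* stands in for a refcounted context */
	int last_avail_idx;
};

/* Reset one queue while holding its lock, so a late worker cannot
 * observe (or modify) half-reset state. */
static void demo_vq_cleanup(struct demo_vq *vq)
{
	pthread_mutex_lock(&vq->mutex);
	vq->error_ctx = NULL;
	vq->last_avail_idx = 0;
	pthread_mutex_unlock(&vq->mutex);
}

/* Clean up every queue of a device; returns the number of queues reset. */
static int demo_dev_cleanup(struct demo_vq *vqs, int nvqs)
{
	int i;

	assert(vqs != NULL || nvqs == 0);
	for (i = 0; i < nvqs; ++i)
		demo_vq_cleanup(&vqs[i]);
	return nvqs;
}
```

The sketch leaves out the WARN_ON() part of the patch, which only reports
that the lock was unexpectedly held when clean-up started.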
This is the start of the stable review cycle for the 4.14.271 release.
There are 18 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Mar 2022 15:58:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.271-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.271-rc1
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++++--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 ++
arch/arm/include/asm/spectre.h | 32 ++++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++++++-
arch/arm/kernel/entry-common.S | 24 +++
arch/arm/kernel/spectre.c | 71 ++++++++
arch/arm/kernel/traps.c | 65 ++++++-
arch/arm/kernel/vmlinux-xip.lds.S | 37 ++--
arch/arm/kernel/vmlinux.lds.S | 37 ++--
arch/arm/mm/Kconfig | 11 ++
arch/arm/mm/proc-v7-bugs.c | 198 +++++++++++++++++++---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 214 +++++++++++++++++-------
drivers/firmware/psci.c | 15 ++
include/linux/arm-smccc.h | 74 ++++++++
include/linux/bpf.h | 11 ++
kernel/sysctl.c | 8 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
22 files changed, 822 insertions(+), 146 deletions(-)
This is the start of the stable review cycle for the 5.4.184 release.
There are 18 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Mar 2022 15:58:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.184-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.184-rc1
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++++--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 ++
arch/arm/include/asm/spectre.h | 32 ++++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++++++-
arch/arm/kernel/entry-common.S | 24 +++
arch/arm/kernel/spectre.c | 71 ++++++++
arch/arm/kernel/traps.c | 65 ++++++-
arch/arm/kernel/vmlinux.lds.h | 35 +++-
arch/arm/mm/Kconfig | 11 ++
arch/arm/mm/proc-v7-bugs.c | 199 +++++++++++++++++++---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 216 +++++++++++++++++-------
drivers/firmware/psci/psci.c | 15 ++
include/linux/arm-smccc.h | 74 ++++++++
include/linux/bpf.h | 12 ++
kernel/sysctl.c | 8 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
21 files changed, 796 insertions(+), 137 deletions(-)
This is a complement of f6795053dac8 ("mm: mmap: Allow for "high"
userspace addresses") for hugetlb.
This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).
Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of hugetlb_get_unmapped_area() function.
However, arm64 uses the generic version of that function.
So take into account arch_get_mmap_base() and arch_get_mmap_end() in
hugetlb_get_unmapped_area(). To allow that, move those two macros
out of mm/mmap.c into include/linux/sched/mm.h
If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.
For the time being, only ARM64 is affected by this change.
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Steve Capper <steve.capper(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses")
Cc: <stable(a)vger.kernel.org> # 5.0.x
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
---
fs/hugetlbfs/inode.c | 9 +++++----
include/linux/sched/mm.h | 8 ++++++++
mm/mmap.c | 8 --------
3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ba15a85c7dfb..73f0a1f94d78 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -205,7 +205,7 @@ hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
info.flags = 0;
info.length = len;
info.low_limit = current->mm->mmap_base;
- info.high_limit = TASK_SIZE;
+ info.high_limit = arch_get_mmap_end(addr, len, flags);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
info.align_offset = 0;
return vm_unmapped_area(&info);
@@ -221,7 +221,7 @@ hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
info.low_limit = max(PAGE_SIZE, mmap_min_addr);
- info.high_limit = current->mm->mmap_base;
+ info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
info.align_offset = 0;
addr = vm_unmapped_area(&info);
@@ -236,7 +236,7 @@ hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
VM_BUG_ON(addr != -ENOMEM);
info.flags = 0;
info.low_limit = current->mm->mmap_base;
- info.high_limit = TASK_SIZE;
+ info.high_limit = arch_get_mmap_end(addr, len, flags);
addr = vm_unmapped_area(&info);
}
@@ -251,6 +251,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
struct hstate *h = hstate_file(file);
+ const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
if (len & ~huge_page_mask(h))
return -EINVAL;
@@ -266,7 +267,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
if (addr) {
addr = ALIGN(addr, huge_page_size(h));
vma = find_vma(mm, addr);
- if (TASK_SIZE - len >= addr &&
+ if (mmap_end - len >= addr &&
(!vma || addr + len <= vm_start_gap(vma)))
return addr;
}
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 1dcf54360aea..319bbf90cc96 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -135,6 +135,14 @@ static inline void mm_update_next_owner(struct mm_struct *mm)
#endif /* CONFIG_MEMCG */
#ifdef CONFIG_MMU
+#ifndef arch_get_mmap_end
+#define arch_get_mmap_end(addr, len, flags) (TASK_SIZE)
+#endif
+
+#ifndef arch_get_mmap_base
+#define arch_get_mmap_base(addr, base) (base)
+#endif
+
extern void arch_pick_mmap_layout(struct mm_struct *mm,
struct rlimit *rlim_stack);
extern unsigned long
diff --git a/mm/mmap.c b/mm/mmap.c
index 18cd52620719..b52e6df32ddf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2119,14 +2119,6 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info)
return addr;
}
-#ifndef arch_get_mmap_end
-#define arch_get_mmap_end(addr, len, flags) (TASK_SIZE)
-#endif
-
-#ifndef arch_get_mmap_base
-#define arch_get_mmap_base(addr, base) (base)
-#endif
-
/* Get an address range which is currently unmapped.
* For shmat() with addr=0.
*
--
2.34.1
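The default-macro mechanism the hugetlb patch relies on can be tried in
isolation. The sketch below is a userspace approximation: the TASK_SIZE
value is made up for illustration, and demo_hint_fits() is a hypothetical
helper that mirrors only the hinted-address bound check from the patch.

```c
#include <assert.h>

/* Illustrative value only; the real TASK_SIZE is architecture-specific. */
#define TASK_SIZE 0x80000000UL

/* An architecture with "high" address support would define these macros
 * before this point. This sketch defines neither, so the defaults apply. */
#ifndef arch_get_mmap_end
#define arch_get_mmap_end(addr, len, flags)	(TASK_SIZE)
#endif

#ifndef arch_get_mmap_base
#define arch_get_mmap_base(addr, base)		(base)
#endif

/* Returns non-zero if [addr, addr + len) ends at or below the mmap limit.
 * The caller must ensure len does not exceed the limit. */
static int demo_hint_fits(unsigned long addr, unsigned long len)
{
	const unsigned long mmap_end = arch_get_mmap_end(addr, len, 0);

	assert(len <= mmap_end);
	return mmap_end - len >= addr;
}
```

With neither macro overridden, the check behaves exactly like the old
TASK_SIZE comparison, which is why architectures that do not define them
see no change.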
From: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
The CODA960 manual states that ASO/FMO features of baseline are not
supported, so for this reason this driver should only report
constrained baseline support.
This fixes negotiation issue with constrained baseline content
on GStreamer 1.17.1.
ASO/FMO features are unsupported for the encoder and untested for the
decoder because there is currently no userspace support. Neither GStreamer
parsers nor FFMPEG parsers support ASO/FMO.
Cc: stable(a)vger.kernel.org
Fixes: 42a68012e67c2 ("media: coda: add read-only h.264 decoder profile/level controls")
Signed-off-by: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Signed-off-by: Ezequiel Garcia <ezequiel(a)collabora.com>
Tested-by: Pascal Speck <kernel(a)iktek.de>
Signed-off-by: Fabio Estevam <festevam(a)denx.de>
---
Changes since v1:
- Followed Phillip's suggestion to change the commit message to say that
ASO/FMO features are unsupported for the encoder and untested for the
decoder because there is no userspace support.
https://patchwork.kernel.org/project/linux-media/patch/20200717034923.21952…
- Added Pascal's Tested-by tag.
drivers/media/platform/coda/coda-common.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/coda/coda-common.c b/drivers/media/platform/coda/coda-common.c
index a57822b05070..53b2dd1b268c 100644
--- a/drivers/media/platform/coda/coda-common.c
+++ b/drivers/media/platform/coda/coda-common.c
@@ -2344,8 +2344,8 @@ static void coda_encode_ctrls(struct coda_ctx *ctx)
V4L2_CID_MPEG_VIDEO_H264_CHROMA_QP_INDEX_OFFSET, -12, 12, 1, 0);
v4l2_ctrl_new_std_menu(&ctx->ctrls, &coda_ctrl_ops,
V4L2_CID_MPEG_VIDEO_H264_PROFILE,
- V4L2_MPEG_VIDEO_H264_PROFILE_BASELINE, 0x0,
- V4L2_MPEG_VIDEO_H264_PROFILE_BASELINE);
+ V4L2_MPEG_VIDEO_H264_PROFILE_CONSTRAINED_BASELINE, 0x0,
+ V4L2_MPEG_VIDEO_H264_PROFILE_CONSTRAINED_BASELINE);
if (ctx->dev->devtype->product == CODA_HX4 ||
ctx->dev->devtype->product == CODA_7541) {
v4l2_ctrl_new_std_menu(&ctx->ctrls, &coda_ctrl_ops,
@@ -2426,7 +2426,7 @@ static void coda_decode_ctrls(struct coda_ctx *ctx)
ctx->h264_profile_ctrl = v4l2_ctrl_new_std_menu(&ctx->ctrls,
&coda_ctrl_ops, V4L2_CID_MPEG_VIDEO_H264_PROFILE,
V4L2_MPEG_VIDEO_H264_PROFILE_HIGH,
- ~((1 << V4L2_MPEG_VIDEO_H264_PROFILE_BASELINE) |
+ ~((1 << V4L2_MPEG_VIDEO_H264_PROFILE_CONSTRAINED_BASELINE) |
(1 << V4L2_MPEG_VIDEO_H264_PROFILE_MAIN) |
(1 << V4L2_MPEG_VIDEO_H264_PROFILE_HIGH)),
V4L2_MPEG_VIDEO_H264_PROFILE_HIGH);
--
2.25.1
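The decoder hunk of the coda patch builds a menu skip mask by complementing
the set of supported profiles. A small standalone sketch of that bit
arithmetic follows. The enum values are illustrative stand-ins for the V4L2
profile indices, and the demo_* names are hypothetical; the sketch assumes,
as the patch's use of the mask suggests, that a set bit hides the
corresponding menu item.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the V4L2 H.264 profile menu indices. */
enum demo_profile {
	DEMO_PROFILE_BASELINE = 0,
	DEMO_PROFILE_CONSTRAINED_BASELINE = 1,
	DEMO_PROFILE_MAIN = 2,
	DEMO_PROFILE_EXTENDED = 3,
	DEMO_PROFILE_HIGH = 4,
};

/* Skip mask: every bit set except those of the supported profiles. */
static uint64_t demo_skip_mask(void)
{
	return ~((UINT64_C(1) << DEMO_PROFILE_CONSTRAINED_BASELINE) |
		 (UINT64_C(1) << DEMO_PROFILE_MAIN) |
		 (UINT64_C(1) << DEMO_PROFILE_HIGH));
}

/* A profile is offered when its bit is clear in the skip mask. */
static int demo_profile_offered(uint64_t skip_mask, enum demo_profile p)
{
	assert((unsigned int)p < 64);
	return !(skip_mask & (UINT64_C(1) << p));
}
```

After the change, plain baseline is hidden from the menu while constrained
baseline, main and high remain selectable.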
From: Li RongQing <lirongqing(a)baidu.com>
[ Upstream commit 9ee83635d872812f3920209c606c6ea9e412ffcc ]
When sending a call-function IPI-many to vCPUs, yield to the
IPI target vCPU which is marked as preempted.
However, when emulating HLT, an idling vCPU is voluntarily
scheduled out and marked as preempted from the guest kernel's
perspective. Yielding to an idle vCPU is pointless: it causes
unnecessary vmexits and may miss the vCPU that is truly preempted.
So yield to the IPI target vCPU only if it is busy and preempted.
Signed-off-by: Li RongQing <lirongqing(a)baidu.com>
Message-Id: <1644380201-29423-1-git-send-email-lirongqing(a)baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7462b79c39de..8fe6eb5bed3f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -590,7 +590,7 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
/* Make sure other vCPUs get a chance to run if they need to. */
for_each_cpu(cpu, mask) {
- if (vcpu_is_preempted(cpu)) {
+ if (!idle_cpu(cpu) && vcpu_is_preempted(cpu)) {
kvm_hypercall1(KVM_HC_SCHED_YIELD, per_cpu(x86_cpu_to_apicid, cpu));
break;
}
--
2.34.1
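The selection rule from the KVM patch above can be modelled in a few lines
of userspace C. This is only a sketch: struct demo_vcpu and
demo_pick_yield_target() are hypothetical, and the two flags stand in for
what idle_cpu() and vcpu_is_preempted() report in the guest kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU view as seen from the guest. */
struct demo_vcpu {
	bool idle;       /* models idle_cpu(cpu) */
	bool preempted;  /* models vcpu_is_preempted(cpu) */
};

/* Return the index of the first vCPU that is busy and preempted,
 * or -1 if there is none. Idle vCPUs are skipped even when they are
 * marked preempted, since yielding to them gains nothing. */
static int demo_pick_yield_target(const struct demo_vcpu *vcpus, int n)
{
	int i;

	assert(vcpus != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!vcpus[i].idle && vcpus[i].preempted)
			return i;
	}
	return -1;
}
```

Before the patch, the first entry marked preempted would have been chosen
even if it was merely halted.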
From: Li RongQing <lirongqing(a)baidu.com>
[ Upstream commit 9ee83635d872812f3920209c606c6ea9e412ffcc ]
When sending a call-function IPI-many to vCPUs, yield to the
IPI target vCPU which is marked as preempted.
However, when emulating HLT, an idling vCPU is voluntarily
scheduled out and marked as preempted from the guest kernel's
perspective. Yielding to an idle vCPU is pointless: it causes
unnecessary vmexits and may miss the vCPU that is truly preempted.
So yield to the IPI target vCPU only if it is busy and preempted.
Signed-off-by: Li RongQing <lirongqing(a)baidu.com>
Message-Id: <1644380201-29423-1-git-send-email-lirongqing(a)baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6ff2c7cac4c4..77e4d875a468 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -543,7 +543,7 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
/* Make sure other vCPUs get a chance to run if they need to. */
for_each_cpu(cpu, mask) {
- if (vcpu_is_preempted(cpu)) {
+ if (!idle_cpu(cpu) && vcpu_is_preempted(cpu)) {
kvm_hypercall1(KVM_HC_SCHED_YIELD, per_cpu(x86_cpu_to_apicid, cpu));
break;
}
--
2.34.1
From: Li RongQing <lirongqing(a)baidu.com>
[ Upstream commit 9ee83635d872812f3920209c606c6ea9e412ffcc ]
When sending a call-function IPI-many to vCPUs, yield to the
IPI target vCPU which is marked as preempted.
However, when emulating HLT, an idling vCPU is voluntarily
scheduled out and marked as preempted from the guest kernel's
perspective. Yielding to an idle vCPU is pointless: it causes
unnecessary vmexits and may miss the vCPU that is truly preempted.
So yield to the IPI target vCPU only if it is busy and preempted.
Signed-off-by: Li RongQing <lirongqing(a)baidu.com>
Message-Id: <1644380201-29423-1-git-send-email-lirongqing(a)baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b656456c3a94..49f19e572a25 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -565,7 +565,7 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
/* Make sure other vCPUs get a chance to run if they need to. */
for_each_cpu(cpu, mask) {
- if (vcpu_is_preempted(cpu)) {
+ if (!idle_cpu(cpu) && vcpu_is_preempted(cpu)) {
kvm_hypercall1(KVM_HC_SCHED_YIELD, per_cpu(x86_cpu_to_apicid, cpu));
break;
}
--
2.34.1
From: Li RongQing <lirongqing(a)baidu.com>
[ Upstream commit 9ee83635d872812f3920209c606c6ea9e412ffcc ]
When sending a call-function IPI-many to vCPUs, yield to the
IPI target vCPU which is marked as preempted.
However, when emulating HLT, an idling vCPU is voluntarily
scheduled out and marked as preempted from the guest kernel's
perspective. Yielding to an idle vCPU is pointless: it causes
unnecessary vmexits and may miss the vCPU that is truly preempted.
So yield to the IPI target vCPU only if it is busy and preempted.
Signed-off-by: Li RongQing <lirongqing(a)baidu.com>
Message-Id: <1644380201-29423-1-git-send-email-lirongqing(a)baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 59abbdad7729..2121c20e877f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -619,7 +619,7 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
/* Make sure other vCPUs get a chance to run if they need to. */
for_each_cpu(cpu, mask) {
- if (vcpu_is_preempted(cpu)) {
+ if (!idle_cpu(cpu) && vcpu_is_preempted(cpu)) {
kvm_hypercall1(KVM_HC_SCHED_YIELD, per_cpu(x86_cpu_to_apicid, cpu));
break;
}
--
2.34.1
From: Yan Yan <evitayan(a)google.com>
[ Upstream commit e03c3bba351f99ad932e8f06baa9da1afc418e02 ]
xfrm_migrate cannot handle address family change of an xfrm_state.
The symptoms are that the xfrm_state will be migrated to a wrong
address, and sending as well as receiving packets will be broken.
This commit fixes it by breaking the original xfrm_state_clone
method into two steps so as to update the props.family before
running xfrm_init_state. As a result, xfrm_state's inner mode,
outer mode, type and IP header length in xfrm_state_migrate can
be updated with the new address family.
Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/1885354
Signed-off-by: Yan Yan <evitayan(a)google.com>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/xfrm/xfrm_state.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 4d19f2ff6e05..73b4e7c0d336 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1238,9 +1238,6 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig)
memcpy(&x->mark, &orig->mark, sizeof(x->mark));
- if (xfrm_init_state(x) < 0)
- goto error;
-
x->props.flags = orig->props.flags;
x->props.extra_flags = orig->props.extra_flags;
@@ -1317,6 +1314,11 @@ struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
if (!xc)
return NULL;
+ xc->props.family = m->new_family;
+
+ if (xfrm_init_state(xc) < 0)
+ goto error;
+
memcpy(&xc->id.daddr, &m->new_daddr, sizeof(xc->id.daddr));
memcpy(&xc->props.saddr, &m->new_saddr, sizeof(xc->props.saddr));
--
2.34.1
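Why the ordering in the xfrm patch above matters can be shown with a toy
model: anything the init step derives from the address family is only
correct if the family is already the new one. Everything below is
hypothetical (the demo_* names, the constants, and the choice of header
length as the derived field); it is not xfrm code.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_AF_INET = 2, DEMO_AF_INET6 = 10 };

/* Hypothetical state with one field derived from the family. */
struct demo_state {
	int family;
	int header_len;
};

/* Stand-in for the init step: derives header_len from family.
 * 20 and 40 are the fixed IPv4 (no options) and IPv6 header sizes. */
static int demo_init_state(struct demo_state *x)
{
	switch (x->family) {
	case DEMO_AF_INET:
		x->header_len = 20;
		return 0;
	case DEMO_AF_INET6:
		x->header_len = 40;
		return 0;
	default:
		return -1;
	}
}

/* Clone first (plain copy), then set the new family, then initialise. */
static int demo_migrate(const struct demo_state *orig, int new_family,
			struct demo_state *out)
{
	assert(orig != NULL && out != NULL);
	*out = *orig;
	out->family = new_family;
	return demo_init_state(out);
}
```

Had the init step run inside the clone, header_len would have been derived
from the old family and left stale after the family change.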
From: Yan Yan <evitayan(a)google.com>
[ Upstream commit e03c3bba351f99ad932e8f06baa9da1afc418e02 ]
xfrm_migrate cannot handle address family change of an xfrm_state.
The symptoms are that the xfrm_state will be migrated to a wrong
address, and sending as well as receiving packets will be broken.
This commit fixes it by breaking the original xfrm_state_clone
method into two steps so as to update the props.family before
running xfrm_init_state. As a result, xfrm_state's inner mode,
outer mode, type and IP header length in xfrm_state_migrate can
be updated with the new address family.
Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/1885354
Signed-off-by: Yan Yan <evitayan(a)google.com>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/xfrm/xfrm_state.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 5164dfe0aa09..2c17fbdd2366 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1421,9 +1421,6 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
memcpy(&x->mark, &orig->mark, sizeof(x->mark));
- if (xfrm_init_state(x) < 0)
- goto error;
-
x->props.flags = orig->props.flags;
x->props.extra_flags = orig->props.extra_flags;
@@ -1501,6 +1498,11 @@ struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
if (!xc)
return NULL;
+ xc->props.family = m->new_family;
+
+ if (xfrm_init_state(xc) < 0)
+ goto error;
+
memcpy(&xc->id.daddr, &m->new_daddr, sizeof(xc->id.daddr));
memcpy(&xc->props.saddr, &m->new_saddr, sizeof(xc->props.saddr));
--
2.34.1
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 445c1470b6ef96440e7cfc42dfc160f5004fd149
Gitweb: https://git.kernel.org/tip/445c1470b6ef96440e7cfc42dfc160f5004fd149
Author: Ross Philipson <ross.philipson(a)oracle.com>
AuthorDate: Wed, 23 Feb 2022 21:07:36 -05:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Wed, 09 Mar 2022 12:49:46 +01:00
x86/boot: Add setup_indirect support in early_memremap_is_setup_data()
The x86 boot documentation describes the setup_indirect structures and
how they are used. Only one of the two functions in ioremap.c that needed
to be modified to be aware of the introduction of setup_indirect
functionality was updated. Add comparable support to the other function
where it was missing.
Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect")
Signed-off-by: Ross Philipson <ross.philipson(a)oracle.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Daniel Kiper <daniel.kiper(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1645668456-22036-3-git-send-email-ross.philipson@…
---
arch/x86/mm/ioremap.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ab666c4..17a492c 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -676,22 +676,51 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
unsigned long size)
{
+ struct setup_indirect *indirect;
struct setup_data *data;
u64 paddr, paddr_next;
paddr = boot_params.hdr.setup_data;
while (paddr) {
- unsigned int len;
+ unsigned int len, size;
if (phys_addr == paddr)
return true;
data = early_memremap_decrypted(paddr, sizeof(*data));
+ if (!data) {
+ pr_warn("failed to early memremap setup_data entry\n");
+ return false;
+ }
+
+ size = sizeof(*data);
paddr_next = data->next;
len = data->len;
- early_memunmap(data, sizeof(*data));
+ if ((phys_addr > paddr) && (phys_addr < (paddr + len))) {
+ early_memunmap(data, sizeof(*data));
+ return true;
+ }
+
+ if (data->type == SETUP_INDIRECT) {
+ size += len;
+ early_memunmap(data, sizeof(*data));
+ data = early_memremap_decrypted(paddr, size);
+ if (!data) {
+ pr_warn("failed to early memremap indirect setup_data\n");
+ return false;
+ }
+
+ indirect = (struct setup_indirect *)data->data;
+
+ if (indirect->type != SETUP_INDIRECT) {
+ paddr = indirect->addr;
+ len = indirect->len;
+ }
+ }
+
+ early_memunmap(data, size);
if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
return true;
--
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58c9a5060cb7cd529d49c93954cdafe81c1d642a Mon Sep 17 00:00:00 2001
From: James Morse <james.morse(a)arm.com>
Date: Thu, 3 Mar 2022 16:53:56 +0000
Subject: [PATCH] arm64: proton-pack: Include unprivileged eBPF status in
Spectre v2 mitigation reporting
The mitigations for Spectre-BHB are only applied when an exception is
taken from user-space. The mitigation status is reported via the spectre_v2
sysfs vulnerabilities file.
When unprivileged eBPF is enabled the mitigation in the exception vectors
can be avoided by an eBPF program.
When unprivileged eBPF is enabled, print a warning and report vulnerable
via the sysfs vulnerabilities file.
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: James Morse <james.morse(a)arm.com>
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index d3fbff00993d..6d45c63c6454 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -18,6 +18,7 @@
*/
#include <linux/arm-smccc.h>
+#include <linux/bpf.h>
#include <linux/cpu.h>
#include <linux/device.h>
#include <linux/nospec.h>
@@ -111,6 +112,15 @@ static const char *get_bhb_affected_string(enum mitigation_state bhb_state)
}
}
+static bool _unprivileged_ebpf_enabled(void)
+{
+#ifdef CONFIG_BPF_SYSCALL
+ return !sysctl_unprivileged_bpf_disabled;
+#else
+ return false;
+#endif
+}
+
ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -130,6 +140,9 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr,
v2_str = "CSV2";
fallthrough;
case SPECTRE_MITIGATED:
+ if (bhb_state == SPECTRE_MITIGATED && _unprivileged_ebpf_enabled())
+ return sprintf(buf, "Vulnerable: Unprivileged eBPF enabled\n");
+
return sprintf(buf, "Mitigation: %s%s\n", v2_str, bhb_str);
case SPECTRE_VULNERABLE:
fallthrough;
@@ -1125,3 +1138,16 @@ void __init spectre_bhb_patch_clearbhb(struct alt_instr *alt,
*updptr++ = cpu_to_le32(aarch64_insn_gen_nop());
*updptr++ = cpu_to_le32(aarch64_insn_gen_nop());
}
+
+#ifdef CONFIG_BPF_SYSCALL
+#define EBPF_WARN "Unprivileged eBPF is enabled, data leaks possible via Spectre v2 BHB attacks!\n"
+void unpriv_ebpf_notify(int new_state)
+{
+ if (spectre_v2_state == SPECTRE_VULNERABLE ||
+ spectre_bhb_state != SPECTRE_MITIGATED)
+ return;
+
+ if (!new_state)
+ pr_err("WARNING: %s", EBPF_WARN);
+}
+#endif
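The reporting decision added by the proton-pack patch above can be
exercised on its own. The sketch below models only the mitigated case of
cpu_show_spectre_v2(); the enum, the function name and the explicit buffer
length are hypothetical, and snprintf() replaces the kernel's sprintf()
into a sysfs buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's mitigation_state enum. */
enum demo_mitigation_state {
	DEMO_UNAFFECTED,
	DEMO_MITIGATED,
	DEMO_VULNERABLE,
};

/* Format the status line for the case where Spectre v2 is mitigated.
 * If the BHB mitigation is in use and unprivileged eBPF is enabled,
 * report vulnerable instead, as the patch does. */
static int demo_show_spectre_v2(char *buf, size_t len,
				enum demo_mitigation_state bhb_state,
				bool unpriv_ebpf_enabled,
				const char *v2_str, const char *bhb_str)
{
	assert(buf != NULL && len > 0);
	if (bhb_state == DEMO_MITIGATED && unpriv_ebpf_enabled)
		return snprintf(buf, len,
				"Vulnerable: Unprivileged eBPF enabled\n");

	return snprintf(buf, len, "Mitigation: %s%s\n", v2_str, bhb_str);
}
```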
On Wed, Mar 09, 2022 at 02:08:42PM +0100, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> arm64: entry.S: Add ventry overflow sanity checks
>
> to the 5.4-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> arm64-entry.s-add-ventry-overflow-sanity-checks.patch
> and it can be found in the queue-5.4 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Oops, no, sorry, I've dropped this whole series from the 5.4 queue
again.
James, I've emailed you about this, let's take the followup to the
stable list and work on this there.
thanks,
greg k-h
--
London WC2N 4JS, UK
Dear stable kernel maintainers,
Please consider cherry-picking
commit 6c23d54f4cb8 ("arm64: Import latest memcpy()/memmove() implementation")
to v5.10.y. It first landed in v5.14-rc1.
It fixes a linkage failure observed when building kernels for ChromeOS
under AutoFDO:
ld.lld: error: arch/arm64/lib/lib.a(memmove.o):(function __memmove:
.text+0x8): relocation R_AARCH64_CONDBR19 out of range: -6331272 is
not in [-1048576, 1048575]; references __memcpy
>>> defined in arch/arm64/lib/lib.a(memcpy.o)
(The prior version of memmove used assembler conditional branches to
memcpy; under AutoFDO the linker will decide where best to place
memmove; it may be > 1MB away from memcpy. After this patch, memcpy
and memmove are the same function).
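To make the numbers in the linker error concrete, here is a small sketch of
the range check, taking the bounds directly from the message above
([-1048576, 1048575], i.e. plus or minus 1 MiB). The function name is
hypothetical; this is not lld code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a branch displacement in bytes fits the range that the
 * linker reports for R_AARCH64_CONDBR19: [-2^20, 2^20 - 1]. */
static bool demo_condbr19_in_range(int64_t disp)
{
	return disp >= -(INT64_C(1) << 20) && disp < (INT64_C(1) << 20);
}
```

The reported displacement of -6331272 bytes is roughly 6 MiB backwards, far
outside what a conditional branch can encode.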
--
Thanks,
~Nick Desaulniers
Dear kernel developers!
I am using Linux Mint Xfce 20.3 with kernel version 5.16. I had to use
kernel 5.16 because with the standard kernel version of Linux Mint 20.3
(which is 5.13) my laptop did not resume correctly after I closed the
lid.
With kernel 5.16 my laptop suspended correctly when I closed the lid
and resumed correctly when I opened it again. That is, I had to press
the power button once after reopening the lid, and then the laptop
resumed (to the login screen). This held up to and including kernel
version 5.16.10. With kernel versions > 5.16.10 my laptop no longer
goes into suspend. That is, when I open the lid I am back at the
login screen immediately (I don't have to press the power button
anymore).
System information for my laptop:
----------------------------------------------------------------------
System: Kernel: 5.16.10-051610-generic x86_64 bits: 64 compiler: N/A
Desktop: Xfce 4.16.0
tk: Gtk 3.24.20 wm: xfwm4 dm: LightDM Distro: Linux Mint
20.3 Una
base: Ubuntu 20.04 focal
Machine: Type: Laptop System: HP product: HP ProBook 455 G8 Notebook
PC v: N/A serial: <filter>
Chassis: type: 10 serial: <filter>
Mobo: HP model: 8864 v: KBC Version 41.1E.00 serial:
<filter> UEFI: HP
v: T78 Ver. 01.07.00 date: 10/08/2021
Battery: ID-1: BAT0 charge: 43.8 Wh condition: 44.5/45.0 Wh (99%)
volts: 13.0/11.4
model: Hewlett-Packard Primary serial: <filter> status:
Unknown
CPU: Topology: 8-Core model: AMD Ryzen 7 5800U with Radeon
Graphics bits: 64 type: MT MCP
arch: Zen 3 L2 cache: 4096 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a
ssse3 svm bogomips: 60685
Speed: 3497 MHz min/max: 1600/1900 MHz Core speeds (MHz): 1:
3474 2: 3464 3: 3473
4: 3471 5: 4362 6: 4332 7: 3478 8: 3455 9: 3459 10: 3452 11:
3462 12: 3468 13: 3468
14: 3468 15: 3467 16: 3472
Graphics: Device-1: AMD vendor: Hewlett-Packard driver: amdgpu v:
kernel bus ID: 05:00.0
chip ID: 1002:1638
Display: x11 server: X.Org 1.20.13 driver: amdgpu,ati
unloaded: fbdev,modesetting,vesa
resolution: 1920x1080~60Hz
OpenGL: renderer: AMD RENOIR (DRM 3.44.0 5.16.10-051610-
generic LLVM 12.0.0)
v: 4.6 Mesa 21.2.6 direct render: Yes
----------------------------------------------------------------------
Best regards,
Reinhold Mannsberger
Hi,
On 9/14/21 09:57, Qu Wenruo wrote:
> [BUG]
...
>
> ================================================
> WARNING: lock held when returning to user space!
> 5.15.0-rc1 #16 Not tainted
> ------------------------------------------------
> syz-executor/7579 is leaving the kernel with locks still held!
> 1 lock held by syz-executor/7579:
> #0: ffff888104b73da8 (btrfs-tree-01/1){+.+.}-{3:3}, at:
> __btrfs_tree_lock+0x2e/0x1a0 fs/btrfs/locking.c:112
>
> [CAUSE]
> In btrfs_alloc_tree_block(), after btrfs_init_new_buffer(), the new
> extent buffer @buf is locked, but if later operations like adding
> delayed tree ref fails, we just free @buf without unlocking it,
> resulting above warning.
This patch fixes CVE-2021-4149. Upstream it is commit 19ea40dddf18
("btrfs: unlock newly allocated extent buffer after error").
The patch was backported to kernels 5.15, 5.10 and 5.4 because it contains
"CC: stable(a)vger.kernel.org # 5.4+" in the commit message.
However, it looks to me like kernels 4.9, 4.14 and 4.19 are also vulnerable.
In the v4.9 kernel there is a btrfs_init_new_buffer() call:
btrfs_alloc_tree_block(...)
{
	...
	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
	...
out_free_buf:
	free_extent_buffer(buf);
	...
}
and btrfs_init_new_buffer() contains btrfs_tree_lock(buf) inside it.
The patch can be cherry-picked to v4.9 kernel without a conflict.
The error was probably introduced in commit 67b7859e9bfa
("btrfs: handle ENOMEM in btrfs_alloc_tree_block"), which has been in
the kernel since v4.1.
Can you confirm that kernels v4.9, 4.14, 4.19 are also vulnerable?
Thanks,
Denis
>
> [FIX]
> Unlock @buf in out_free_buf: tag.
>
> Reported-by: Hao Sun <sunhao.th(a)gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CACkBjsZ9O6Zr0KK1yGn=1rQi6Crh1yeCRdTSBx…
> Signed-off-by: Qu Wenruo <wqu(a)suse.com>
> ---
> fs/btrfs/extent-tree.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index c88e7727a31a..8aa981ffe7b7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4898,6 +4898,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
> out_free_delayed:
> btrfs_free_delayed_extent_op(extent_op);
> out_free_buf:
> + btrfs_tree_unlock(buf);
> free_extent_buffer(buf);
> out_free_reserved:
> btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 0);
The implementations of aead and skcipher in the QAT driver do not
properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY but does not enqueue
the request.
This can result in applications like dm-crypt waiting indefinitely for a
completion of a request that was never submitted to the hardware.
To avoid this problem, disable the registration of all crypto algorithms
in the QAT driver by setting the number of crypto instances to 0 at
configuration time.
Cc: stable(a)vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
---
drivers/crypto/qat/qat_4xxx/adf_drv.c | 7 +++++++
drivers/crypto/qat/qat_common/qat_crypto.c | 7 +++++++
2 files changed, 14 insertions(+)
diff --git a/drivers/crypto/qat/qat_4xxx/adf_drv.c b/drivers/crypto/qat/qat_4xxx/adf_drv.c
index a6c78b9c730b..fa4c350c1bf9 100644
--- a/drivers/crypto/qat/qat_4xxx/adf_drv.c
+++ b/drivers/crypto/qat/qat_4xxx/adf_drv.c
@@ -75,6 +75,13 @@ static int adf_crypto_dev_config(struct adf_accel_dev *accel_dev)
if (ret)
goto err;
+ /* Temporarily set the number of crypto instances to zero to avoid
+ * registering the crypto algorithms.
+ * This will be removed when the algorithms will support the
+ * CRYPTO_TFM_REQ_MAY_BACKLOG flag
+ */
+ instances = 0;
+
for (i = 0; i < instances; i++) {
val = i;
bank = i * 2;
diff --git a/drivers/crypto/qat/qat_common/qat_crypto.c b/drivers/crypto/qat/qat_common/qat_crypto.c
index 7234c4940fae..67c9588e89df 100644
--- a/drivers/crypto/qat/qat_common/qat_crypto.c
+++ b/drivers/crypto/qat/qat_common/qat_crypto.c
@@ -161,6 +161,13 @@ int qat_crypto_dev_config(struct adf_accel_dev *accel_dev)
if (ret)
goto err;
+ /* Temporarily set the number of crypto instances to zero to avoid
+ * registering the crypto algorithms.
+ * This will be removed when the algorithms will support the
+ * CRYPTO_TFM_REQ_MAY_BACKLOG flag
+ */
+ instances = 0;
+
for (i = 0; i < instances; i++) {
val = i;
snprintf(key, sizeof(key), ADF_CY "%d" ADF_RING_ASYM_BANK_NUM, i);
--
2.35.1
This is the start of the stable review cycle for the 4.14.270 release.
There are 42 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 09 Mar 2022 09:16:25 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.270-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.270-rc1
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: disable softirqs in dcbnl_flush_dev()
Hugh Dickins <hughd(a)google.com>
memfd: fix F_SEAL_WRITE after shmem huge page allocated
William Mahon <wmahon(a)chromium.org>
HID: add mapping for KEY_ALL_APPLICATIONS
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
nl80211: Handle nla_memdup failures in handle_nan_filter
Jia-Ju Bai <baijiaju1990(a)gmail.com>
net: chelsio: cxgb3: check the return value of pci_find_capability()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
soc: fsl: qe: Check of ioremap return value
Randy Dunlap <rdunlap(a)infradead.org>
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
can: gs_usb: change active_channels's type from atomic_t to u8
Jann Horn <jannh(a)google.com>
efivars: Respect "block" flag in efivar_entry_set_safe()
Zheyu Ma <zheyuma97(a)gmail.com>
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
Randy Dunlap <rdunlap(a)infradead.org>
net: sxgbe: fix return value of __setup handler
Randy Dunlap <rdunlap(a)infradead.org>
net: stmmac: fix return value of __setup handler
Nicolas Escande <nico.escande(a)gmail.com>
mac80211: fix forwarded mesh frames AC & queue selection
Johan Hovold <johan(a)kernel.org>
firmware: qemu_fw_cfg: fix kobject leak in probe error path
Qiushi Wu <wu000273(a)umn.edu>
firmware: Fix a reference count leak.
D. Wythe <alibuda(a)linux.alibaba.com>
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
D. Wythe <alibuda(a)linux.alibaba.com>
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: flush lingering app table entries for unregistered devices
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Don't expect inter-netns unique iflink indices
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Request iflink once in batadv_get_real_netdevice
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Request iflink once in batadv-on-batadv check
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: fix possible use-after-free
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: don't assume sk is full socket
Leon Romanovsky <leonro(a)nvidia.com>
xfrm: enforce validity of offload input flags
Eric Dumazet <edumazet(a)google.com>
netfilter: fix use-after-free in __nf_register_net_hook()
Jiri Bohac <jbohac(a)suse.cz>
xfrm: fix MTU regression
Marek Vasut <marex(a)denx.de>
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
Zhen Ni <nizhen(a)uniontech.com>
ALSA: intel_hdmi: Fix reference to PCM buffer address
Sergey Shtylyov <s.shtylyov(a)omp.ru>
ata: pata_hpt37x: fix PCI clock detection
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: clear related members when goto fail
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: don't release an existing dev->buf
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Wolfram Sang <wsa(a)kernel.org>
i2c: qup: allow COMPILE_TEST
Wolfram Sang <wsa(a)kernel.org>
i2c: cadence: allow COMPILE_TEST
Yongzhi Liu <lyz_cs(a)pku.edu.cn>
dmaengine: shdma: Fix runtime PM imbalance on error
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: fix double free race when mount fails in cifs_get_root()
José Expósito <jose.exposito89(a)gmail.com>
Input: clear BTN_RIGHT/MIDDLE on buttonpads
Eric Anholt <eric(a)anholt.net>
i2c: bcm2835: Avoid clock stretching timeouts
JaeMan Park <jaeman(a)google.com>
mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
Benjamin Beichler <benjamin.beichler(a)uni-rostock.de>
mac80211_hwsim: report NOACK frames in tx_status
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mm/mmu.c | 2 +
drivers/ata/pata_hpt37x.c | 4 +-
drivers/dma/sh/shdma-base.c | 4 +-
drivers/firmware/efi/vars.c | 5 +-
drivers/firmware/qemu_fw_cfg.c | 10 ++--
drivers/hid/hid-debug.c | 4 +-
drivers/hid/hid-input.c | 2 +
drivers/i2c/busses/Kconfig | 4 +-
drivers/i2c/busses/i2c-bcm2835.c | 11 ++++
drivers/input/input.c | 6 +++
drivers/input/mouse/elan_i2c_core.c | 64 ++++++++---------------
drivers/net/arcnet/com20020-pci.c | 3 ++
drivers/net/can/usb/gs_usb.c | 10 ++--
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c | 2 +
drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 6 +--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +--
drivers/net/usb/cdc_mbim.c | 5 ++
drivers/net/wireless/mac80211_hwsim.c | 13 +++++
drivers/soc/fsl/qe/qe_io.c | 2 +
drivers/usb/gadget/legacy/inode.c | 10 ++--
fs/cifs/cifsfs.c | 1 +
include/net/netfilter/nf_queue.h | 2 +-
include/uapi/linux/input-event-codes.h | 3 +-
include/uapi/linux/xfrm.h | 6 +++
mm/shmem.c | 7 +--
net/batman-adv/hard-interface.c | 29 ++++++----
net/dcb/dcbnl.c | 44 ++++++++++++++++
net/ipv6/ip6_output.c | 11 ++--
net/mac80211/rx.c | 4 +-
net/netfilter/core.c | 5 +-
net/netfilter/nf_queue.c | 23 ++++++--
net/netfilter/nfnetlink_queue.c | 12 +++--
net/smc/smc_core.c | 5 +-
net/wireless/nl80211.c | 12 +++++
net/xfrm/xfrm_device.c | 6 ++-
sound/soc/soc-ops.c | 4 +-
sound/x86/intel_hdmi_audio.c | 2 +-
38 files changed, 250 insertions(+), 103 deletions(-)
This is the start of the stable review cycle for the 4.9.305 release.
There are 32 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 09 Mar 2022 09:16:25 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.305-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.305-rc1
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: disable softirqs in dcbnl_flush_dev()
Hugh Dickins <hughd(a)google.com>
memfd: fix F_SEAL_WRITE after shmem huge page allocated
William Mahon <wmahon(a)chromium.org>
HID: add mapping for KEY_ALL_APPLICATIONS
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Jia-Ju Bai <baijiaju1990(a)gmail.com>
net: chelsio: cxgb3: check the return value of pci_find_capability()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
soc: fsl: qe: Check of ioremap return value
Randy Dunlap <rdunlap(a)infradead.org>
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
can: gs_usb: change active_channels's type from atomic_t to u8
Jann Horn <jannh(a)google.com>
efivars: Respect "block" flag in efivar_entry_set_safe()
Zheyu Ma <zheyuma97(a)gmail.com>
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
Randy Dunlap <rdunlap(a)infradead.org>
net: sxgbe: fix return value of __setup handler
Randy Dunlap <rdunlap(a)infradead.org>
net: stmmac: fix return value of __setup handler
Nicolas Escande <nico.escande(a)gmail.com>
mac80211: fix forwarded mesh frames AC & queue selection
Johan Hovold <johan(a)kernel.org>
firmware: qemu_fw_cfg: fix kobject leak in probe error path
Qiushi Wu <wu000273(a)umn.edu>
firmware: Fix a reference count leak.
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: flush lingering app table entries for unregistered devices
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: fix possible use-after-free
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: don't assume sk is full socket
Jiri Bohac <jbohac(a)suse.cz>
xfrm: fix MTU regression
Marek Vasut <marex(a)denx.de>
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
Sergey Shtylyov <s.shtylyov(a)omp.ru>
ata: pata_hpt37x: fix PCI clock detection
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: clear related members when goto fail
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: don't release an existing dev->buf
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Wolfram Sang <wsa(a)kernel.org>
i2c: qup: allow COMPILE_TEST
Yongzhi Liu <lyz_cs(a)pku.edu.cn>
dmaengine: shdma: Fix runtime PM imbalance on error
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: fix double free race when mount fails in cifs_get_root()
José Expósito <jose.exposito89(a)gmail.com>
Input: clear BTN_RIGHT/MIDDLE on buttonpads
Eric Anholt <eric(a)anholt.net>
i2c: bcm2835: Avoid clock stretching timeouts
JaeMan Park <jaeman(a)google.com>
mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
Benjamin Beichler <benjamin.beichler(a)uni-rostock.de>
mac80211_hwsim: report NOACK frames in tx_status
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mm/mmu.c | 2 +
drivers/ata/pata_hpt37x.c | 4 +-
drivers/dma/sh/shdma-base.c | 4 +-
drivers/firmware/efi/vars.c | 5 +-
drivers/firmware/qemu_fw_cfg.c | 10 ++--
drivers/hid/hid-debug.c | 4 +-
drivers/hid/hid-input.c | 2 +
drivers/i2c/busses/Kconfig | 2 +-
drivers/i2c/busses/i2c-bcm2835.c | 11 ++++
drivers/input/input.c | 6 +++
drivers/input/mouse/elan_i2c_core.c | 64 ++++++++---------------
drivers/net/arcnet/com20020-pci.c | 3 ++
drivers/net/can/usb/gs_usb.c | 10 ++--
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c | 2 +
drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 6 +--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +--
drivers/net/usb/cdc_mbim.c | 5 ++
drivers/net/wireless/mac80211_hwsim.c | 13 +++++
drivers/soc/fsl/qe/qe_io.c | 2 +
drivers/usb/gadget/legacy/inode.c | 10 ++--
fs/cifs/cifsfs.c | 1 +
include/net/netfilter/nf_queue.h | 2 +-
include/uapi/linux/input-event-codes.h | 3 +-
mm/shmem.c | 7 +--
net/dcb/dcbnl.c | 44 ++++++++++++++++
net/ipv6/ip6_output.c | 11 ++--
net/mac80211/rx.c | 4 +-
net/netfilter/nf_queue.c | 23 ++++++--
net/netfilter/nfnetlink_queue.c | 12 +++--
sound/soc/soc-ops.c | 4 +-
31 files changed, 199 insertions(+), 87 deletions(-)
This is the start of the stable review cycle for the 4.19.233 release.
There are 51 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 09 Mar 2022 09:16:25 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.233-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.233-rc1
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: disable softirqs in dcbnl_flush_dev()
Filipe Manana <fdmanana(a)suse.com>
btrfs: add missing run of delayed items after unlink during log replay
Steven Rostedt (Google) <rostedt(a)goodmis.org>
tracing/histogram: Fix sorting on old "cpu" value
Hugh Dickins <hughd(a)google.com>
memfd: fix F_SEAL_WRITE after shmem huge page allocated
William Mahon <wmahon(a)chromium.org>
HID: add mapping for KEY_ALL_APPLICATIONS
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
nl80211: Handle nla_memdup failures in handle_nan_filter
Jia-Ju Bai <baijiaju1990(a)gmail.com>
net: chelsio: cxgb3: check the return value of pci_find_capability()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
soc: fsl: qe: Check of ioremap return value
Sukadev Bhattiprolu <sukadev(a)linux.ibm.com>
ibmvnic: free reset-work-item when flushing
Randy Dunlap <rdunlap(a)infradead.org>
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
Brian Norris <briannorris(a)chromium.org>
arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
can: gs_usb: change active_channels's type from atomic_t to u8
Alyssa Ross <hi(a)alyssa.is>
firmware: arm_scmi: Remove space in MODULE_ALIAS name
Jann Horn <jannh(a)google.com>
efivars: Respect "block" flag in efivar_entry_set_safe()
Zheyu Ma <zheyuma97(a)gmail.com>
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
Randy Dunlap <rdunlap(a)infradead.org>
net: sxgbe: fix return value of __setup handler
Randy Dunlap <rdunlap(a)infradead.org>
net: stmmac: fix return value of __setup handler
Nicolas Escande <nico.escande(a)gmail.com>
mac80211: fix forwarded mesh frames AC & queue selection
Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
xen/netfront: destroy queues before real_num_tx_queues is zeroed
Lukas Wunner <lukas(a)wunner.de>
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
Ye Bin <yebin10(a)huawei.com>
block: Fix fsync always failed if once failed
D. Wythe <alibuda(a)linux.alibaba.com>
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
D. Wythe <alibuda(a)linux.alibaba.com>
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: flush lingering app table entries for unregistered devices
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Don't expect inter-netns unique iflink indices
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Request iflink once in batadv_get_real_netdevice
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Request iflink once in batadv-on-batadv check
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: fix possible use-after-free
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: don't assume sk is full socket
Leon Romanovsky <leonro(a)nvidia.com>
xfrm: enforce validity of offload input flags
Antony Antony <antony.antony(a)secunet.com>
xfrm: fix the if_id check in changelink
Eric Dumazet <edumazet(a)google.com>
netfilter: fix use-after-free in __nf_register_net_hook()
Jiri Bohac <jbohac(a)suse.cz>
xfrm: fix MTU regression
Marek Vasut <marex(a)denx.de>
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
Zhen Ni <nizhen(a)uniontech.com>
ALSA: intel_hdmi: Fix reference to PCM buffer address
Sergey Shtylyov <s.shtylyov(a)omp.ru>
ata: pata_hpt37x: fix PCI clock detection
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: clear related members when goto fail
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: don't release an existing dev->buf
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Wolfram Sang <wsa(a)kernel.org>
i2c: qup: allow COMPILE_TEST
Wolfram Sang <wsa(a)kernel.org>
i2c: cadence: allow COMPILE_TEST
Yongzhi Liu <lyz_cs(a)pku.edu.cn>
dmaengine: shdma: Fix runtime PM imbalance on error
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: fix double free race when mount fails in cifs_get_root()
José Expósito <jose.exposito89(a)gmail.com>
Input: clear BTN_RIGHT/MIDDLE on buttonpads
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ASoC: rt5682: do not block workqueue if card is unbound
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ASoC: rt5668: do not block workqueue if card is unbound
Eric Anholt <eric(a)anholt.net>
i2c: bcm2835: Avoid clock stretching timeouts
JaeMan Park <jaeman(a)google.com>
mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
Benjamin Beichler <benjamin.beichler(a)uni-rostock.de>
mac80211_hwsim: report NOACK frames in tx_status
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mm/mmu.c | 2 +
arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 17 ++++--
block/blk-flush.c | 4 +-
drivers/ata/pata_hpt37x.c | 4 +-
drivers/dma/sh/shdma-base.c | 4 +-
drivers/firmware/arm_scmi/driver.c | 2 +-
drivers/firmware/efi/vars.c | 5 +-
drivers/hid/hid-debug.c | 4 +-
drivers/hid/hid-input.c | 2 +
drivers/i2c/busses/Kconfig | 4 +-
drivers/i2c/busses/i2c-bcm2835.c | 11 ++++
drivers/input/input.c | 6 +++
drivers/input/mouse/elan_i2c_core.c | 64 ++++++++---------------
drivers/net/arcnet/com20020-pci.c | 3 ++
drivers/net/can/usb/gs_usb.c | 10 ++--
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c | 2 +
drivers/net/ethernet/ibm/ibmvnic.c | 4 +-
drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 6 +--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +--
drivers/net/usb/cdc_mbim.c | 5 ++
drivers/net/wireless/mac80211_hwsim.c | 13 +++++
drivers/net/xen-netfront.c | 39 ++++++++------
drivers/pci/hotplug/pciehp_hpc.c | 7 +--
drivers/soc/fsl/qe/qe_io.c | 2 +
drivers/usb/gadget/legacy/inode.c | 10 ++--
fs/btrfs/tree-log.c | 18 +++++++
fs/cifs/cifsfs.c | 1 +
include/net/netfilter/nf_queue.h | 2 +-
include/uapi/linux/input-event-codes.h | 3 +-
include/uapi/linux/xfrm.h | 6 +++
kernel/trace/trace_events_hist.c | 6 +--
mm/memfd.c | 30 ++++++++---
net/batman-adv/hard-interface.c | 29 ++++++----
net/dcb/dcbnl.c | 44 ++++++++++++++++
net/ipv6/ip6_output.c | 11 ++--
net/mac80211/rx.c | 4 +-
net/netfilter/core.c | 5 +-
net/netfilter/nf_queue.c | 22 ++++++--
net/netfilter/nfnetlink_queue.c | 12 +++--
net/smc/smc_core.c | 5 +-
net/wireless/nl80211.c | 12 +++++
net/xfrm/xfrm_device.c | 6 ++-
net/xfrm/xfrm_interface.c | 2 +-
sound/soc/codecs/rt5668.c | 12 +++--
sound/soc/codecs/rt5682.c | 12 +++--
sound/soc/soc-ops.c | 4 +-
sound/x86/intel_hdmi_audio.c | 2 +-
48 files changed, 344 insertions(+), 144 deletions(-)
On Tue, Mar 08, 2022 at 04:29:15PM +0100, Luna Jernberg wrote:
> oh i meant to send that too the 5.16.13 emails
>
> On Tue, Mar 8, 2022 at 3:30 PM Luna Jernberg <droidbittin(a)gmail.com> wrote:
>
> > Both rc2, and rc1 works fine from the linux-rc package in AUR on my Arch
> > Linux systems :)
If you respond with a "Tested-by:" line like others do, my scripts will
automatically put it in the commit log for the release commit if you
want.
thanks for testing!
greg k-h
This is the start of the stable review cycle for the 5.4.183 release.
There are 64 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 09 Mar 2022 09:16:25 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.183-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.183-rc1
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: disable softirqs in dcbnl_flush_dev()
Jiri Bohac <jbohac(a)suse.cz>
Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
Filipe Manana <fdmanana(a)suse.com>
btrfs: add missing run of delayed items after unlink during log replay
Sidong Yang <realwakka(a)gmail.com>
btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix lost prealloc extents beyond eof after full fsync
Randy Dunlap <rdunlap(a)infradead.org>
tracing: Fix return value of __setup handlers
Steven Rostedt (Google) <rostedt(a)goodmis.org>
tracing/histogram: Fix sorting on old "cpu" value
William Mahon <wmahon(a)chromium.org>
HID: add mapping for KEY_ALL_APPLICATIONS
William Mahon <wmahon(a)chromium.org>
HID: add mapping for KEY_DICTATE
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Hans de Goede <hdegoede(a)redhat.com>
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
nl80211: Handle nla_memdup failures in handle_nan_filter
Jia-Ju Bai <baijiaju1990(a)gmail.com>
net: chelsio: cxgb3: check the return value of pci_find_capability()
Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
soc: fsl: qe: Check of ioremap return value
Hugh Dickins <hughd(a)google.com>
memfd: fix F_SEAL_WRITE after shmem huge page allocated
Sukadev Bhattiprolu <sukadev(a)linux.ibm.com>
ibmvnic: free reset-work-item when flushing
Sasha Neftin <sasha.neftin(a)intel.com>
igc: igc_write_phy_reg_gpy: drop premature return
Randy Dunlap <rdunlap(a)infradead.org>
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Fix kgdb breakpoint for Thumb2
Corinna Vinschen <vinschen(a)redhat.com>
igc: igc_read_phy_reg_gpy: drop premature return
Brian Norris <briannorris(a)chromium.org>
arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
can: gs_usb: change active_channels's type from atomic_t to u8
Fabio Estevam <festevam(a)denx.de>
ASoC: cs4265: Fix the duplicated control name
Alyssa Ross <hi(a)alyssa.is>
firmware: arm_scmi: Remove space in MODULE_ALIAS name
Jann Horn <jannh(a)google.com>
efivars: Respect "block" flag in efivar_entry_set_safe()
Maciej Fijalkowski <maciej.fijalkowski(a)intel.com>
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
Zheyu Ma <zheyuma97(a)gmail.com>
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
Randy Dunlap <rdunlap(a)infradead.org>
net: sxgbe: fix return value of __setup handler
Slawomir Laba <slawomirx.laba(a)intel.com>
iavf: Fix missing check for running netdev
Randy Dunlap <rdunlap(a)infradead.org>
net: stmmac: fix return value of __setup handler
Nicolas Escande <nico.escande(a)gmail.com>
mac80211: fix forwarded mesh frames AC & queue selection
Valentin Schneider <valentin.schneider(a)arm.com>
ia64: ensure proper NUMA distance and possible map initialization
Dietmar Eggemann <dietmar.eggemann(a)arm.com>
sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
Valentin Schneider <valentin.schneider(a)arm.com>
sched/topology: Make sched_init_numa() use a set for the deduplicating sort
Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
xen/netfront: destroy queues before real_num_tx_queues is zeroed
Ye Bin <yebin10(a)huawei.com>
block: Fix fsync always failed if once failed
D. Wythe <alibuda(a)linux.alibaba.com>
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
D. Wythe <alibuda(a)linux.alibaba.com>
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dcb: flush lingering app table entries for unregistered devices
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Don't expect inter-netns unique iflink indices
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Request iflink once in batadv_get_real_netdevice
Sven Eckelmann <sven(a)narfation.org>
batman-adv: Request iflink once in batadv-on-batadv check
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: fix possible use-after-free
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: don't assume sk is full socket
Leon Romanovsky <leonro(a)nvidia.com>
xfrm: enforce validity of offload input flags
Antony Antony <antony.antony(a)secunet.com>
xfrm: fix the if_id check in changelink
Eric Dumazet <edumazet(a)google.com>
netfilter: fix use-after-free in __nf_register_net_hook()
Jiri Bohac <jbohac(a)suse.cz>
xfrm: fix MTU regression
Marek Vasut <marex(a)denx.de>
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
Zhen Ni <nizhen(a)uniontech.com>
ALSA: intel_hdmi: Fix reference to PCM buffer address
Sergey Shtylyov <s.shtylyov(a)omp.ru>
ata: pata_hpt37x: fix PCI clock detection
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: clear related members when goto fail
Hangyu Hua <hbh25y(a)gmail.com>
usb: gadget: don't release an existing dev->buf
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Wolfram Sang <wsa(a)kernel.org>
i2c: qup: allow COMPILE_TEST
Wolfram Sang <wsa(a)kernel.org>
i2c: cadence: allow COMPILE_TEST
Yongzhi Liu <lyz_cs(a)pku.edu.cn>
dmaengine: shdma: Fix runtime PM imbalance on error
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: fix double free race when mount fails in cifs_get_root()
José Expósito <jose.exposito89(a)gmail.com>
Input: clear BTN_RIGHT/MIDDLE on buttonpads
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ASoC: rt5682: do not block workqueue if card is unbound
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ASoC: rt5668: do not block workqueue if card is unbound
Eric Anholt <eric(a)anholt.net>
i2c: bcm2835: Avoid clock stretching timeouts
JaeMan Park <jaeman(a)google.com>
mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
Benjamin Beichler <benjamin.beichler(a)uni-rostock.de>
mac80211_hwsim: report NOACK frames in tx_status
-------------
Diffstat:
Makefile | 4 +-
arch/arm/kernel/kgdb.c | 36 +++++++--
arch/arm/mm/mmu.c | 2 +
arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 17 ++--
arch/ia64/kernel/acpi.c | 7 +-
block/blk-flush.c | 4 +-
drivers/ata/pata_hpt37x.c | 4 +-
drivers/dma/sh/shdma-base.c | 4 +-
drivers/firmware/arm_scmi/driver.c | 2 +-
drivers/firmware/efi/vars.c | 5 +-
drivers/hid/hid-debug.c | 5 +-
drivers/hid/hid-input.c | 3 +
drivers/i2c/busses/Kconfig | 4 +-
drivers/i2c/busses/i2c-bcm2835.c | 11 +++
drivers/input/input.c | 6 ++
drivers/input/mouse/elan_i2c_core.c | 64 ++++++---------
drivers/net/arcnet/com20020-pci.c | 3 +
drivers/net/can/usb/gs_usb.c | 10 +--
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c | 2 +
drivers/net/ethernet/ibm/ibmvnic.c | 4 +-
drivers/net/ethernet/intel/iavf/iavf_main.c | 7 +-
drivers/net/ethernet/intel/igc/igc_phy.c | 4 -
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 6 +-
drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 6 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +-
drivers/net/usb/cdc_mbim.c | 5 ++
drivers/net/wireless/mac80211_hwsim.c | 13 +++
drivers/net/xen-netfront.c | 39 +++++----
drivers/soc/fsl/qe/qe_io.c | 2 +
drivers/usb/gadget/legacy/inode.c | 10 ++-
fs/btrfs/qgroup.c | 9 ++-
fs/btrfs/tree-log.c | 61 +++++++++++---
fs/cifs/cifsfs.c | 1 +
include/linux/topology.h | 1 +
include/net/netfilter/nf_queue.h | 2 +-
include/net/xfrm.h | 1 -
include/uapi/linux/input-event-codes.h | 4 +-
include/uapi/linux/xfrm.h | 6 ++
kernel/sched/topology.c | 99 +++++++++++------------
kernel/trace/trace.c | 4 +-
kernel/trace/trace_events_hist.c | 6 +-
kernel/trace/trace_kprobe.c | 2 +-
mm/memfd.c | 40 ++++++---
net/batman-adv/hard-interface.c | 29 ++++---
net/dcb/dcbnl.c | 44 ++++++++++
net/ipv4/esp4.c | 2 +-
net/ipv6/esp6.c | 2 +-
net/ipv6/ip6_output.c | 11 ++-
net/mac80211/rx.c | 4 +-
net/netfilter/core.c | 5 +-
net/netfilter/nf_queue.c | 24 ++++--
net/netfilter/nfnetlink_queue.c | 12 ++-
net/smc/smc_core.c | 5 +-
net/wireless/nl80211.c | 12 +++
net/xfrm/xfrm_device.c | 6 +-
net/xfrm/xfrm_interface.c | 2 +-
net/xfrm/xfrm_state.c | 14 +---
sound/soc/codecs/cs4265.c | 3 +-
sound/soc/codecs/rt5668.c | 12 +--
sound/soc/codecs/rt5682.c | 12 +--
sound/soc/soc-ops.c | 4 +-
sound/x86/intel_hdmi_audio.c | 2 +-
62 files changed, 487 insertions(+), 249 deletions(-)
Even if SPI_NOR_NO_ERASE was set, one could still send erase opcodes
to the flash. It is not recommended to send unsupported opcodes to
flashes. Fix the logic and do not set mtd->_erase when SPI_NOR_NO_ERASE
is specified. With this change, users can no longer issue erase opcodes
to such flashes and will instead receive an -ENOTSUPP error.
Cc: stable(a)vger.kernel.org
Fixes: b199489d37b2 ("mtd: spi-nor: add the framework for SPI NOR")
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
---
drivers/mtd/spi-nor/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 86a536c97c18..cd2d094ef837 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -2969,10 +2969,11 @@ static void spi_nor_set_mtd_info(struct spi_nor *nor)
mtd->flags = MTD_CAP_NORFLASH;
if (nor->info->flags & SPI_NOR_NO_ERASE)
mtd->flags |= MTD_NO_ERASE;
+ else
+ mtd->_erase = spi_nor_erase;
mtd->writesize = nor->params->writesize;
mtd->writebufsize = nor->params->page_size;
mtd->size = nor->params->size;
- mtd->_erase = spi_nor_erase;
mtd->_read = spi_nor_read;
/* Might be already set by some SST flashes. */
if (!mtd->_write)
--
2.25.1
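For illustration only, a minimal userspace sketch of the dispatch the
spi-nor fix relies on: when the erase callback is left unset, the caller
gets -ENOTSUPP. All names below (fake_mtd, fake_set_mtd_info,
fake_mtd_erase, FAKE_NO_ERASE) are hypothetical stand-ins and not the
kernel's MTD API; ENOTSUPP is kernel-internal, so it is defined locally.

```c
#include <assert.h>
#include <stddef.h>

#define ENOTSUPP      524   /* kernel-internal errno, defined here for the sketch */
#define FAKE_NO_ERASE 0x1   /* hypothetical stand-in for SPI_NOR_NO_ERASE */

struct fake_mtd {
	int (*_erase)(struct fake_mtd *mtd);   /* NULL when erase is unsupported */
};

/* Placeholder erase routine; pretends the erase succeeded. */
int fake_erase(struct fake_mtd *mtd)
{
	(void)mtd;
	return 0;
}

/* Mirrors the patched logic: install the callback only when erase is allowed. */
void fake_set_mtd_info(struct fake_mtd *mtd, unsigned int flags)
{
	assert(mtd != NULL);
	mtd->_erase = (flags & FAKE_NO_ERASE) ? NULL : fake_erase;
}

/* Caller-side dispatch: refuse when no callback was installed. */
int fake_mtd_erase(struct fake_mtd *mtd)
{
	assert(mtd != NULL);
	if (!mtd->_erase)
		return -ENOTSUPP;
	return mtd->_erase(mtd);
}
```

The point of the sketch is that the refusal happens in the dispatcher, so
no erase opcode ever reaches a device flagged as not erasable.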
We tested the RS485 function on an EVB which has an SC16IS752. After
finishing that test, we started the RS232 function test, but found that
RTS was still working in RS485 mode.
That is because both startup and shutdown call port_update() to set
EFCR_REG, and neither call clears the RS485 bits once they have been
set in reconf_rs485(). To fix it, clear the RS485 bits in shutdown.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
drivers/tty/serial/sc16is7xx.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 3a6c68e19c80..6adc51d9ecf3 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1055,10 +1055,12 @@ static void sc16is7xx_shutdown(struct uart_port *port)
/* Disable all interrupts */
sc16is7xx_port_write(port, SC16IS7XX_IER_REG, 0);
- /* Disable TX/RX */
+ /* Disable TX/RX, clear auto RS485 and RTS invert */
sc16is7xx_port_update(port, SC16IS7XX_EFCR_REG,
SC16IS7XX_EFCR_RXDISABLE_BIT |
- SC16IS7XX_EFCR_TXDISABLE_BIT,
+ SC16IS7XX_EFCR_TXDISABLE_BIT |
+ SC16IS7XX_EFCR_AUTO_RS485_BIT |
+ SC16IS7XX_EFCR_RTS_INVERT_BIT,
SC16IS7XX_EFCR_RXDISABLE_BIT |
SC16IS7XX_EFCR_TXDISABLE_BIT);
--
2.25.1
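A small sketch of why widening the mask clears the RS485 bits, assuming
sc16is7xx_port_update() has the usual mask/value read-modify-write
semantics (only bits in the mask change, and they take their value from
val). The helper names and the bit positions below are assumptions made
for this example, not copied from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions are assumptions for this sketch. */
#define EFCR_RXDISABLE  (1u << 1)
#define EFCR_TXDISABLE  (1u << 2)
#define EFCR_AUTO_RS485 (1u << 4)
#define EFCR_RTS_INVERT (1u << 5)

/* Read-modify-write: bits in mask are replaced by the matching bits of val. */
uint8_t update_bits(uint8_t old, uint8_t mask, uint8_t val)
{
	assert((val & ~mask) == 0);   /* val should not set bits outside the mask */
	return (uint8_t)((old & ~mask) | (val & mask));
}

/* Before the fix: the RS485 bits are outside the mask, so they survive. */
uint8_t shutdown_before(uint8_t efcr)
{
	return update_bits(efcr,
			   EFCR_RXDISABLE | EFCR_TXDISABLE,
			   EFCR_RXDISABLE | EFCR_TXDISABLE);
}

/* After the fix: the RS485 bits are in the mask but not in val, so they clear. */
uint8_t shutdown_after(uint8_t efcr)
{
	return update_bits(efcr,
			   EFCR_RXDISABLE | EFCR_TXDISABLE |
			   EFCR_AUTO_RS485 | EFCR_RTS_INVERT,
			   EFCR_RXDISABLE | EFCR_TXDISABLE);
}
```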
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Using the iterator outside of list_for_each_entry() is considered
harmful; see https://lkml.org/lkml/2022/2/17/1032
Do not reference the loop variable outside of the loop; rearrange the
order of execution instead.
Rather than running a search loop and then checking after the loop
whether the end of the list was hit without a match, do the work
inside the loop on a successful match and then goto the next step.
No condition then needs to be checked after the loop has ended.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
---
drivers/misc/mei/interrupt.c | 35 +++++++++++++++--------------------
1 file changed, 15 insertions(+), 20 deletions(-)
diff --git a/drivers/misc/mei/interrupt.c b/drivers/misc/mei/interrupt.c
index a67f4f2d33a9..0706322154cb 100644
--- a/drivers/misc/mei/interrupt.c
+++ b/drivers/misc/mei/interrupt.c
@@ -424,31 +424,26 @@ int mei_irq_read_handler(struct mei_device *dev,
list_for_each_entry(cl, &dev->file_list, link) {
if (mei_cl_hbm_equal(cl, mei_hdr)) {
cl_dbg(dev, cl, "got a message\n");
- break;
+ ret = mei_cl_irq_read_msg(cl, mei_hdr, meta_hdr, cmpl_list);
+ goto reset_slots;
}
}
/* if no recipient cl was found we assume corrupted header */
- if (&cl->link == &dev->file_list) {
- /* A message for not connected fixed address clients
- * should be silently discarded
- * On power down client may be force cleaned,
- * silently discard such messages
- */
- if (hdr_is_fixed(mei_hdr) ||
- dev->dev_state == MEI_DEV_POWER_DOWN) {
- mei_irq_discard_msg(dev, mei_hdr, mei_hdr->length);
- ret = 0;
- goto reset_slots;
- }
- dev_err(dev->dev, "no destination client found 0x%08X\n",
- dev->rd_msg_hdr[0]);
- ret = -EBADMSG;
- goto end;
+ /* A message for not connected fixed address clients
+ * should be silently discarded
+ * On power down client may be force cleaned,
+ * silently discard such messages
+ */
+ if (hdr_is_fixed(mei_hdr) ||
+ dev->dev_state == MEI_DEV_POWER_DOWN) {
+ mei_irq_discard_msg(dev, mei_hdr, mei_hdr->length);
+ ret = 0;
+ goto reset_slots;
}
-
- ret = mei_cl_irq_read_msg(cl, mei_hdr, meta_hdr, cmpl_list);
-
+ dev_err(dev->dev, "no destination client found 0x%08X\n", dev->rd_msg_hdr[0]);
+ ret = -EBADMSG;
+ goto end;
reset_slots:
/* reset the number of slots and header */
--
2.35.1
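The pattern described in the mei patch above can be sketched in plain C.
This is a userspace analogue under stated assumptions: it uses a
NULL-terminated singly linked list rather than the kernel's struct
list_head, and struct client, handle_msg() and dispatch() are
hypothetical names. With list_for_each_entry() the iterator left over
after a full traversal does not point at a valid entry, which is why the
match is handled inside the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct client {
	int id;
	struct client *next;
};

/* Hypothetical per-client handler; returns the id it handled. */
int handle_msg(const struct client *cl)
{
	assert(cl != NULL);
	return cl->id;
}

/* Act on the match inside the loop so the iterator is never read afterwards. */
int dispatch(const struct client *head, int wanted_id)
{
	const struct client *cl;

	for (cl = head; cl != NULL; cl = cl->next) {
		if (cl->id == wanted_id)
			return handle_msg(cl);   /* handle the match right here */
	}

	/* Falling out of the loop means no match; cl is not touched again. */
	return -EBADMSG;
}
```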
Good day,
Can I write you here? I have urgent information for you here, With
utmost good faith?, as you know that my country have been in deep
crisis due to the war,
Miss. Mariam Musa.
Good morning,
Are you interested in a solution for monitoring company vehicles and optimizing their maintenance costs?
Regards,
Marcin Chruszcz
The patch titled
Subject: hugetlb: do not demote poisoned hugetlb pages
has been added to the -mm tree. Its filename is
hugetlb-do-not-demote-poisoned-hugetlb-pages.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/hugetlb-do-not-demote-poisoned-hu…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/hugetlb-do-not-demote-poisoned-hu…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: do not demote poisoned hugetlb pages
It is possible for poisoned hugetlb pages to reside on the free lists.
The huge page allocation routines which dequeue entries from the free
lists make a point of avoiding poisoned pages. There is no such check and
avoidance in the demote code path.
If a hugetlb page is on a free list, poison will only be set in the
head page rather than the page with the actual error. If such a page is
demoted, then the poison flag may follow the wrong page. A page without
error could have poison set, and a page with poison might not have the
flag set.
Check for poison before attempting to demote a hugetlb page. Also, return
-EBUSY to the caller if only poisoned pages are on the free list.
Link: https://lkml.kernel.org/r/20220307215707.50916-1-mike.kravetz@oracle.com
Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
--- a/mm/hugetlb.c~hugetlb-do-not-demote-poisoned-hugetlb-pages
+++ a/mm/hugetlb.c
@@ -3469,7 +3469,6 @@ static int demote_pool_huge_page(struct
{
int nr_nodes, node;
struct page *page;
- int rc = 0;
lockdep_assert_held(&hugetlb_lock);
@@ -3480,15 +3479,19 @@ static int demote_pool_huge_page(struct
}
for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
- if (!list_empty(&h->hugepage_freelists[node])) {
- page = list_entry(h->hugepage_freelists[node].next,
- struct page, lru);
- rc = demote_free_huge_page(h, page);
- break;
+ list_for_each_entry(page, &h->hugepage_freelists[node], lru) {
+ if (PageHWPoison(page))
+ continue;
+
+ return demote_free_huge_page(h, page);
}
}
- return rc;
+ /*
+ * Only way to get here is if all pages on free lists are poisoned.
+ * Return -EBUSY so that caller will not retry.
+ */
+ return -EBUSY;
}
#define HSTATE_ATTR_RO(_name) \
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlb-do-not-demote-poisoned-hugetlb-pages.patch
hugetlb-clean-up-potential-spectre-issue-warnings.patch
hugetlb-clean-up-potential-spectre-issue-warnings-v2.patch
mm-enable-madv_dontneed-for-hugetlb-mappings.patch
selftests-vm-add-hugetlb-madvise-madv_dontneed-madv_remove-test.patch
userfaultfd-selftests-enable-hugetlb-remap-and-remove-event-testing.patch
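A minimal sketch of the selection logic in the hugetlb patch above: walk
the free list, skip poisoned entries, act on the first clean one, and
report -EBUSY when nothing usable is found. It is a userspace analogue
over an array; struct fake_page and pick_page_to_demote() are
hypothetical names. Unlike the kernel function, this sketch also returns
-EBUSY for an empty list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_page {
	int id;          /* identifier returned when the page is chosen */
	bool poisoned;   /* stand-in for PageHWPoison() */
};

/* Returns the id of the first non-poisoned page, or -EBUSY if there is none. */
int pick_page_to_demote(const struct fake_page *free_list, size_t n)
{
	assert(free_list != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (free_list[i].poisoned)
			continue;              /* never demote a poisoned page */
		return free_list[i].id;
	}

	/* Every candidate was poisoned; tell the caller not to retry. */
	return -EBUSY;
}
```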
The following changes since commit 7e57714cd0ad2d5bb90e50b5096a0e671dec1ef3:
Linux 5.17-rc6 (2022-02-27 14:36:33 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 3dd7d135e75cb37c8501ba02977332a2a487dd39:
tools/virtio: handle fallout from folio work (2022-03-06 06:06:50 -0500)
----------------------------------------------------------------
virtio: last minute fixes
Some fixes that took a while to get ready. Not regressions,
but they look safe and seem worth having.
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Anirudh Rayabharam (1):
vhost: fix hung thread due to erroneous iotlb entries
Michael S. Tsirkin (6):
virtio: unexport virtio_finalize_features
virtio: acknowledge all features before access
virtio: document virtio_reset_device
virtio_console: break out of buf poll on remove
virtio: drop default for virtio-mem
tools/virtio: handle fallout from folio work
Si-Wei Liu (3):
vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
Stefano Garzarella (2):
vhost: remove avail_event arg from vhost_update_avail_event()
tools/virtio: fix virtio_test execution
Xie Yongji (3):
vduse: Fix returning wrong type in vduse_domain_alloc_iova()
virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
virtio-blk: Remove BUG_ON() in virtio_queue_rq()
Zhang Min (1):
vdpa: fix use-after-free on vp_vdpa_remove
drivers/block/virtio_blk.c | 20 ++++++-------
drivers/char/virtio_console.c | 7 +++++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 34 ++++++++++++++++++++--
drivers/vdpa/vdpa.c | 2 +-
drivers/vdpa/vdpa_user/iova_domain.c | 2 +-
drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +-
drivers/vhost/iotlb.c | 11 +++++++
drivers/vhost/vdpa.c | 2 +-
drivers/vhost/vhost.c | 9 ++++--
drivers/virtio/Kconfig | 1 -
drivers/virtio/virtio.c | 56 ++++++++++++++++++++++++------------
drivers/virtio/virtio_vdpa.c | 2 +-
include/linux/vdpa.h | 18 ++++++++----
include/linux/virtio.h | 1 -
include/linux/virtio_config.h | 3 +-
tools/virtio/linux/mm_types.h | 3 ++
tools/virtio/virtio_test.c | 1 +
17 files changed, 127 insertions(+), 47 deletions(-)
create mode 100644 tools/virtio/linux/mm_types.h
Commit 4e6292114c74 ("x86/paravirt: Add new features for paravirt
patching") changed the order in which altinstructions and paravirt
instructions are patched at boot time. However, no analogous change was
made in module_finalize, where we apply altinstructions and
parainstructions during module load.
As a result, any code that generates "stacked up" altinstructions and
parainstructions (e.g. local_irq_save/restore) will produce different
results when used in built-in kernel code vs. kernel modules. This also
makes it possible to inadvertently replace altinstructions in the booted
kernel with their parainstruction counterparts when using
livepatch/kpatch.
To fix this, re-order the processing in module_finalize, so that we do
things in this order:
1. apply_paravirt
2. apply_retpolines
3. apply_alternatives
4. alternatives_smp_module_add
This is the same ordering that is used at boot time in
alternative_instructions.
Fixes: 4e6292114c74 ("x86/paravirt: Add new features for paravirt patching")
Signed-off-by: Alex Thorlton <alex.thorlton(a)oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: x86(a)kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.13+
---
arch/x86/kernel/module.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 95fa745e310a5..4edc9c87ad0bc 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -273,6 +273,10 @@ int module_finalize(const Elf_Ehdr *hdr,
retpolines = s;
}
+ if (para) {
+ void *pseg = (void *)para->sh_addr;
+ apply_paravirt(pseg, pseg + para->sh_size);
+ }
if (retpolines) {
void *rseg = (void *)retpolines->sh_addr;
apply_retpolines(rseg, rseg + retpolines->sh_size);
@@ -290,11 +294,6 @@ int module_finalize(const Elf_Ehdr *hdr,
tseg, tseg + text->sh_size);
}
- if (para) {
- void *pseg = (void *)para->sh_addr;
- apply_paravirt(pseg, pseg + para->sh_size);
- }
-
/* make jump label nops */
jump_label_apply_nops(me);
--
2.33.1
From: Brett Creeley <brett.creeley(a)intel.com>
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
[I had to fix the cherry-pick manually as the patch added a line around
some context that was missing.]
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This causes many issues, one of
them being that the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
---
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 25 +++++++++++++++++++
.../net/ethernet/intel/ice/ice_virtchnl_pf.h | 5 ++++
2 files changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index a78e8f00cf71..a0b8436f50ba 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -646,6 +646,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1894,6 +1896,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
*/
ice_vf_ctrl_invalidate_vsi(vf);
ice_vf_fdir_init(vf);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -4082,6 +4086,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -4091,6 +4097,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4465,6 +4472,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ice_vc_get_ver_msg(vf, msg);
@@ -4553,6 +4569,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4668,6 +4686,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4686,6 +4706,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4715,11 +4736,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 38b4dc82c5c1..a750e9a9d712 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -74,6 +74,11 @@ struct ice_mdd_vf_events {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
--
2.35.1.129.gb80121027d12
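The trylock side of the ice patch above can be sketched in userspace.
This is only an analogue: a C11 atomic_flag stands in for the per-VF
struct mutex, and struct fake_vf, cfg_trylock(), cfg_unlock() and
process_vf_msg() are hypothetical names. The kernel's ndo paths use a
blocking mutex_lock(), which this sketch does not model; it only shows
the message handler dropping a message when the lock is already held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_vf {
	atomic_flag cfg_busy;   /* stand-in for the per-VF cfg_lock */
	int handled;            /* number of messages actually processed */
};

/* Returns true if the lock was free and is now held by the caller. */
bool cfg_trylock(struct fake_vf *vf)
{
	assert(vf != NULL);
	return !atomic_flag_test_and_set(&vf->cfg_busy);
}

void cfg_unlock(struct fake_vf *vf)
{
	assert(vf != NULL);
	atomic_flag_clear(&vf->cfg_busy);
}

/* Message path: if a reconfiguration holds the lock, the message is stale. */
bool process_vf_msg(struct fake_vf *vf)
{
	if (!cfg_trylock(vf))
		return false;       /* a reset is pending, so drop the message */

	vf->handled++;              /* placeholder for the opcode switch */
	cfg_unlock(vf);
	return true;
}
```

Dropping rather than waiting matches the reasoning in the commit message:
the pending reset invalidates whatever the message was asking for.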
From: Brett Creeley <brett.creeley(a)intel.com>
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
[I had to fix the cherry-pick manually as the patch added a line around
some context that was missing.]
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This causes many issues, one of
them being that the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Cc: <stable(a)vger.kernel.org> # 5.14.x
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
---
This should apply to 5.14.x
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 25 +++++++++++++++++++
.../net/ethernet/intel/ice/ice_virtchnl_pf.h | 5 ++++
2 files changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 7e3ae4cc17a3..d2f79d579745 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -646,6 +646,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1892,6 +1894,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
*/
ice_vf_ctrl_invalidate_vsi(vf);
ice_vf_fdir_init(vf);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -4080,6 +4084,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -4089,6 +4095,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4463,6 +4470,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ice_vc_get_ver_msg(vf, msg);
@@ -4551,6 +4567,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4666,6 +4684,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4684,6 +4704,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4713,11 +4734,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 38b4dc82c5c1..a750e9a9d712 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -74,6 +74,11 @@ struct ice_mdd_vf_events {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
--
2.35.1.355.ge7e302376dd6
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e4a41c2c1fa916547e63440c73a51a5eb06247af Mon Sep 17 00:00:00 2001
From: Hou Tao <houtao1(a)huawei.com>
Date: Fri, 31 Dec 2021 23:10:18 +0800
Subject: [PATCH] bpf, arm64: Use emit_addr_mov_i64() for BPF_PSEUDO_FUNC
The following error is reported when running "./test_progs -t for_each"
under arm64:
bpf_jit: multi-func JIT bug 58 != 56
[...]
JIT doesn't support bpf-to-bpf calls
The root cause is that the size of the BPF_PSEUDO_FUNC instruction
increases from 2 to 3 after the address of the called bpf-function is
settled, and there are two bpf-to-bpf calls in test_pkt_access. The
generated instructions are shown below:
0x48: 21 00 C0 D2 movz x1, #0x1, lsl #32
0x4c: 21 00 80 F2 movk x1, #0x1
0x48: E1 3F C0 92 movn x1, #0x1ff, lsl #32
0x4c: 41 FE A2 F2 movk x1, #0x17f2, lsl #16
0x50: 81 70 9F F2 movk x1, #0xfb84
Fix it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so
the size of the jited image will not change.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/bpf/20211231151018.3781550-1-houtao1@huawei.com
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 07aad85848fa..e96d4d87291f 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -792,7 +792,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
u64 imm64;
imm64 = (u64)insn1.imm << 32 | (u32)imm;
- emit_a64_mov_i64(dst, imm64, ctx);
+ if (bpf_pseudo_func(insn))
+ emit_addr_mov_i64(dst, imm64, ctx);
+ else
+ emit_a64_mov_i64(dst, imm64, ctx);
return 1;
}
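The size change described in the commit message above can be reproduced with a small userspace model. This is only a sketch, not the kernel's code: count_chunks(), variable_len_insns(), fixed_len_insns() and jit_size_demo() are hypothetical names, the two 64-bit values are my decoding of the immediates in the disassembly quoted above, and the fixed sequence is assumed to be one movn plus two movk.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16-bit chunks of val that equal pattern (0x0000 or 0xffff). */
static int count_chunks(uint64_t val, uint16_t pattern)
{
	int n = 0;

	for (int shift = 0; shift < 64; shift += 16)
		if (((val >> shift) & 0xffff) == pattern)
			n++;
	return n;
}

/*
 * Variable-length strategy (simplified model): start from all-zeros
 * (movz) or all-ones (movn), whichever leaves fewer chunks to patch
 * with movk. At least one instruction is always needed.
 */
static int variable_len_insns(uint64_t val)
{
	int zeros = count_chunks(val, 0x0000);
	int ones = count_chunks(val, 0xffff);
	int skipped = ones > zeros ? ones : zeros;
	int n = 4 - skipped;

	return n > 0 ? n : 1;
}

/*
 * Fixed-length strategy (assumption: one movn plus two movk): the
 * instruction count does not depend on the value being loaded.
 */
static int fixed_len_insns(uint64_t val)
{
	(void)val;
	return 3;
}

/* Hypothetical driver: compares both strategies across the two passes. */
int jit_size_demo(void)
{
	uint64_t first_pass = 0x0000000100000001ULL;  /* movz + movk */
	uint64_t later_pass = 0xfffffe0017f2fb84ULL;  /* movn + 2 * movk */
	int before = variable_len_insns(first_pass);
	int after = variable_len_insns(later_pass);

	assert(before == 2);
	assert(after == 3);
	assert(fixed_len_insns(first_pass) == fixed_len_insns(later_pass));
	return 0;
}
```

With two such loads in one program, the variable-length strategy grows the image by two instructions between passes, while the fixed-length strategy keeps every pass the same size.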
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 25f1488bdbba63415239ff301fe61a8546140d9f Mon Sep 17 00:00:00 2001
From: Bas Nieuwenhuizen <bas(a)basnieuwenhuizen.nl>
Date: Mon, 24 Jan 2022 01:23:36 +0100
Subject: [PATCH] drm/amd/display: Wrap dcn301_calculate_wm_and_dlg for FPU.
Mirrors the logic for dcn30. Cue lots of WARNs and some
kernel panics without this fix.
Cc: stable(a)vger.kernel.org
Signed-off-by: Bas Nieuwenhuizen <bas(a)basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index b4001233867c..5d9637b07429 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -1380,6 +1380,17 @@ static void set_wm_ranges(
pp_smu->nv_funcs.set_wm_ranges(&pp_smu->nv_funcs.pp_smu, &ranges);
}
+static void dcn301_calculate_wm_and_dlg(
+ struct dc *dc, struct dc_state *context,
+ display_e2e_pipe_params_st *pipes,
+ int pipe_cnt,
+ int vlevel)
+{
+ DC_FP_START();
+ dcn301_calculate_wm_and_dlg_fp(dc, context, pipes, pipe_cnt, vlevel);
+ DC_FP_END();
+}
+
static struct resource_funcs dcn301_res_pool_funcs = {
.destroy = dcn301_destroy_resource_pool,
.link_enc_create = dcn301_link_encoder_create,
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
index 94c32832a0e7..0a7a33864973 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
@@ -327,7 +327,7 @@ void dcn301_fpu_init_soc_bounding_box(struct bp_soc_bb_info bb_info)
dcn3_01_soc.sr_exit_time_us = bb_info.dram_sr_exit_latency_100ns * 10;
}
-void dcn301_calculate_wm_and_dlg(struct dc *dc,
+void dcn301_calculate_wm_and_dlg_fp(struct dc *dc,
struct dc_state *context,
display_e2e_pipe_params_st *pipes,
int pipe_cnt,
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h
index fc7065d17842..774b0fdfc80b 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h
@@ -34,7 +34,7 @@ void dcn301_fpu_set_wm_ranges(int i,
void dcn301_fpu_init_soc_bounding_box(struct bp_soc_bb_info bb_info);
-void dcn301_calculate_wm_and_dlg(struct dc *dc,
+void dcn301_calculate_wm_and_dlg_fp(struct dc *dc,
struct dc_state *context,
display_e2e_pipe_params_st *pipes,
int pipe_cnt,
At least some PL2303GS have a bcdDevice of 0x605 instead of 0x100 as the
datasheet claims. Add it to the list of known release numbers for the
HXN (G) type.
Fixes: 894758d0571d ("USB: serial: pl2303: tighten type HXN (G) detection")
Reported-by: Matyáš Kroupa <kroupa.matyas(a)gmail.com>
Link: https://lore.kernel.org/r/165de6a0-43e9-092c-2916-66b115c7fbf4@gmail.com
Cc: stable(a)vger.kernel.org # 5.13
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/pl2303.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
index e2ef761ed39c..88b284d61681 100644
--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -436,6 +436,7 @@ static int pl2303_detect_type(struct usb_serial *serial)
case 0x105:
case 0x305:
case 0x405:
+ case 0x605:
/*
* Assume it's an HXN-type if the device doesn't
* support the old read request value.
--
2.34.1
Recent tightening of the opcode table in binutils so as to consistently
disallow the assembly or disassembly of CP0 instructions not supported
by the processor architecture chosen has caused a regression like below:
arch/mips/dec/prom/locore.S: Assembler messages:
arch/mips/dec/prom/locore.S:29: Error: opcode not supported on this processor: r4600 (mips3) `rfe'
in a piece of code used to probe for memory with PMAX DECstation models,
which have non-REX firmware. Those computers always have an R2000 CPU
and consequently the exception handler used in memory probing uses the
RFE instruction, which those processors use.
While adding 64-bit support this code was correctly excluded for 64-bit
configurations, however it should have also been excluded for irrelevant
32-bit configurations. Do this now then, and only enable PMAX memory
probing for R3k systems.
Reported-by: Jan-Benedict Glaw <jbglaw(a)lug-owl.de>
Reported-by: Sudip Mukherjee <sudipm.mukherjee(a)gmail.com>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org # v2.6.12+
---
Hi,
I'm assuming this won't go back beyond commit 2a11c8ea20bf ("kconfig:
Introduce IS_ENABLED(), IS_BUILTIN() and IS_MODULE()") or any backport
will have to be rewritten to avoid IS_ENABLED.
The original actual change named to fix ought to be commit dd82ef87e4c9
("PROM interface rework to support a 64-bit kernel.") from the LMO repo:
<https://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git/commit/?id=d…>,
which introduced the `prom_is_rex' macro, which guards this code. Said
commit predates the history of our main repository though.
This change has actually been verified at runtime with a PMIN system
(effectively a PMAX, but with a slower R2000 CPU) and a 4MAX+ system (an
R4400SC-based machine), and naturally throughout the three possible build
configurations: R3k, R4k/32-bit, R4k/64-bit.
It took longer than expected, but oh well... Sorry for the inconvenience
caused.
Please apply,
Maciej
---
arch/mips/dec/prom/Makefile | 2 +-
arch/mips/include/asm/dec/prom.h | 15 +++++----------
2 files changed, 6 insertions(+), 11 deletions(-)
linux-dec-locore-r3000.diff
Index: linux-macro/arch/mips/dec/prom/Makefile
===================================================================
--- linux-macro.orig/arch/mips/dec/prom/Makefile
+++ linux-macro/arch/mips/dec/prom/Makefile
@@ -6,4 +6,4 @@
lib-y += init.o memory.o cmdline.o identify.o console.o
-lib-$(CONFIG_32BIT) += locore.o
+lib-$(CONFIG_CPU_R3000) += locore.o
Index: linux-macro/arch/mips/include/asm/dec/prom.h
===================================================================
--- linux-macro.orig/arch/mips/include/asm/dec/prom.h
+++ linux-macro/arch/mips/include/asm/dec/prom.h
@@ -43,16 +43,11 @@
*/
#define REX_PROM_MAGIC 0x30464354
-#ifdef CONFIG_64BIT
-
-#define prom_is_rex(magic) 1 /* KN04 and KN05 are REX PROMs. */
-
-#else /* !CONFIG_64BIT */
-
-#define prom_is_rex(magic) ((magic) == REX_PROM_MAGIC)
-
-#endif /* !CONFIG_64BIT */
-
+/* KN04 and KN05 are REX PROMs, so only do the check for R3k systems. */
+static inline bool prom_is_rex(u32 magic)
+{
+ return !IS_ENABLED(CONFIG_CPU_R3000) || magic == REX_PROM_MAGIC;
+}
/*
* 3MIN/MAXINE PROM entry points for DS5000/1xx's, DS5000/xx's and
With KCFLAGS="-O3", I was able to trigger a fortify-source
memcpy() overflow panic on set_vi_srs_handler().
Although the O3 level is not supported in mainline, under some
conditions this could have happened with any optimization setting;
it's just a matter of inlining luck. The panic itself is half
justified: it is a false positive in one respect and a real finding
in another.
On the one hand, no real overflow happens. The exception handler
defined in asm just gets copied to some reserved places in memory.
On the other hand, the C code that refers to that exception handler
declares it as `char`, i.e. something 1 byte long. The asm function
itself is obviously far larger than 1 byte, so the fortify logic
concluded that we were going to copy past the end of the declared
symbol.
The standard way to refer to asm symbols from C code which is not
supposed to be called from C is to declare them as
`extern const u8[]`. This is fully correct from any point of view,
as any code itself is just a bunch of bytes (including 0 as it is
for syms like _stext/_etext/etc.), and the exact size is not known
at the moment of compilation.
Adjust the type of the except_vec_vi_*() and related variables.
Make set_handler() take `const` as a second argument to avoid
cast-away warnings and give a little more room for optimization.
Fixes: e01402b115cc ("More AP / SP bits for the 34K, the Malta bits and things. Still wants")
Fixes: c65a5480ff29 ("[MIPS] Fix potential latency problem due to non-atomic cpu_wait.")
Cc: stable(a)vger.kernel.org # 3.10+
Signed-off-by: Alexander Lobakin <alobakin(a)pm.me>
---
arch/mips/include/asm/setup.h | 2 +-
arch/mips/kernel/traps.c | 22 +++++++++++-----------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/mips/include/asm/setup.h b/arch/mips/include/asm/setup.h
index bb36a400203d..8c56b862fd9c 100644
--- a/arch/mips/include/asm/setup.h
+++ b/arch/mips/include/asm/setup.h
@@ -16,7 +16,7 @@ static inline void setup_8250_early_printk_port(unsigned long base,
unsigned int reg_shift, unsigned int timeout) {}
#endif
-extern void set_handler(unsigned long offset, void *addr, unsigned long len);
+void set_handler(unsigned long offset, const void *addr, unsigned long len);
extern void set_uncached_handler(unsigned long offset, void *addr, unsigned long len);
typedef void (*vi_handler_t)(void);
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index a486486b2355..246c6a6b0261 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2091,19 +2091,19 @@ static void *set_vi_srs_handler(int n, vi_handler_t addr, int srs)
* If no shadow set is selected then use the default handler
* that does normal register saving and standard interrupt exit
*/
- extern char except_vec_vi, except_vec_vi_lui;
- extern char except_vec_vi_ori, except_vec_vi_end;
- extern char rollback_except_vec_vi;
- char *vec_start = using_rollback_handler() ?
- &rollback_except_vec_vi : &except_vec_vi;
+ extern const u8 except_vec_vi[], except_vec_vi_lui[];
+ extern const u8 except_vec_vi_ori[], except_vec_vi_end[];
+ extern const u8 rollback_except_vec_vi[];
+ const u8 *vec_start = using_rollback_handler() ?
+ rollback_except_vec_vi : except_vec_vi;
#if defined(CONFIG_CPU_MICROMIPS) || defined(CONFIG_CPU_BIG_ENDIAN)
- const int lui_offset = &except_vec_vi_lui - vec_start + 2;
- const int ori_offset = &except_vec_vi_ori - vec_start + 2;
+ const int lui_offset = except_vec_vi_lui - vec_start + 2;
+ const int ori_offset = except_vec_vi_ori - vec_start + 2;
#else
- const int lui_offset = &except_vec_vi_lui - vec_start;
- const int ori_offset = &except_vec_vi_ori - vec_start;
+ const int lui_offset = except_vec_vi_lui - vec_start;
+ const int ori_offset = except_vec_vi_ori - vec_start;
#endif
- const int handler_len = &except_vec_vi_end - vec_start;
+ const int handler_len = except_vec_vi_end - vec_start;
if (handler_len > VECTORSPACING) {
/*
@@ -2311,7 +2311,7 @@ void per_cpu_trap_init(bool is_boot_cpu)
}
/* Install CPU exception handler */
-void set_handler(unsigned long offset, void *addr, unsigned long size)
+void set_handler(unsigned long offset, const void *addr, unsigned long size)
{
#ifdef CONFIG_CPU_MICROMIPS
memcpy((void *)(ebase + offset), ((unsigned char *)addr - 1), size);
--
2.35.1
On 2022-03-06 22:47, Matthew Wilcox wrote:
> On Sat, Mar 05, 2022 at 03:56:38PM -0500, Sasha Levin wrote:
>> This is a note to let you know that I've just added the patch titled
>>
>> iommu/amd: Use put_pages_list
>>
>> to the 5.16-stable tree which can be found at:
>> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>>
>> The filename of the patch is:
>> iommu-amd-use-put_pages_list.patch
>> and it can be found in the queue-5.16 subdirectory.
>>
>> If you, or anyone else, feels it should not be added to the stable tree,
>> please let <stable(a)vger.kernel.org> know about it.
>
> I would defer to Robin, but I don't think this is a good candidate for
> a stable backport. It has some dependencies, IIRC. And it's not really
> fixing a bug.
Indeed, I'd feel a bit uncomfortable about backporting this
unnecessarily. It doesn't make much sense in isolation without the other
patch(es) converting the rest of the IOMMU subsystem as well, but either
way unless the whole SLAB rework needs backporting for some reason then
it just represents risk without benefit.
Thanks,
Robin.
--
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: venus: hfi_cmds: List HDR10 property as unsupported for v1 and v3
Author: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Date: Tue Feb 1 16:51:29 2022 +0100
The HFI_PROPERTY_PARAM_VENC_HDR10_PQ_SEI HFI property is not supported
on Venus v1 and v3.
cc: stable(a)vger.kernel.org # 5.13+
Fixes: 9172652d72f8 ("media: venus: venc: Add support for CLL and Mastering display controls")
Signed-off-by: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
drivers/media/platform/qcom/venus/hfi_cmds.c | 2 ++
1 file changed, 2 insertions(+)
---
diff --git a/drivers/media/platform/qcom/venus/hfi_cmds.c b/drivers/media/platform/qcom/venus/hfi_cmds.c
index 5aea07307e02..4ecd444050bb 100644
--- a/drivers/media/platform/qcom/venus/hfi_cmds.c
+++ b/drivers/media/platform/qcom/venus/hfi_cmds.c
@@ -1054,6 +1054,8 @@ static int pkt_session_set_property_1x(struct hfi_session_set_property_pkt *pkt,
pkt->shdr.hdr.size += sizeof(u32) + sizeof(*info);
break;
}
+ case HFI_PROPERTY_PARAM_VENC_HDR10_PQ_SEI:
+ return -ENOTSUPP;
/* FOLLOWING PROPERTIES ARE NOT IMPLEMENTED IN CORE YET */
case HFI_PROPERTY_CONFIG_BUFFER_REQUIREMENTS:
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: venus: venc: Fix h264 8x8 transform control
Author: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Date: Tue Feb 8 02:18:16 2022 +0100
During encoder driver open controls are initialized via a call
to v4l2_ctrl_handler_setup which returns EINVAL error for
V4L2_CID_MPEG_VIDEO_H264_8X8_TRANSFORM v4l2 control. The control
default value is disabled and because of firmware limitations
8x8 transform cannot be disabled for the supported HIGH and
CONSTRAINED_HIGH profiles.
To fix the issue change the control default value to enabled
(this is fine because the firmware enables 8x8 transform for
high and constrained_high profiles by default). Also, correct
the checking of profile ids in s_ctrl from hfi to v4l2 ids.
cc: stable(a)vger.kernel.org # 5.15+
Fixes: bfee75f73c37 ("media: venus: venc: add support for V4L2_CID_MPEG_VIDEO_H264_8X8_TRANSFORM control")
Signed-off-by: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
drivers/media/platform/qcom/venus/venc.c | 4 ++--
drivers/media/platform/qcom/venus/venc_ctrls.c | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
---
diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c
index 84bafc3118cc..adea4c3b8c20 100644
--- a/drivers/media/platform/qcom/venus/venc.c
+++ b/drivers/media/platform/qcom/venus/venc.c
@@ -662,8 +662,8 @@ static int venc_set_properties(struct venus_inst *inst)
ptype = HFI_PROPERTY_PARAM_VENC_H264_TRANSFORM_8X8;
h264_transform.enable_type = 0;
- if (ctr->profile.h264 == HFI_H264_PROFILE_HIGH ||
- ctr->profile.h264 == HFI_H264_PROFILE_CONSTRAINED_HIGH)
+ if (ctr->profile.h264 == V4L2_MPEG_VIDEO_H264_PROFILE_HIGH ||
+ ctr->profile.h264 == V4L2_MPEG_VIDEO_H264_PROFILE_CONSTRAINED_HIGH)
h264_transform.enable_type = ctr->h264_8x8_transform;
ret = hfi_session_set_property(inst, ptype, &h264_transform);
diff --git a/drivers/media/platform/qcom/venus/venc_ctrls.c b/drivers/media/platform/qcom/venus/venc_ctrls.c
index 1ada42df314d..ea5805e71c14 100644
--- a/drivers/media/platform/qcom/venus/venc_ctrls.c
+++ b/drivers/media/platform/qcom/venus/venc_ctrls.c
@@ -320,8 +320,8 @@ static int venc_op_s_ctrl(struct v4l2_ctrl *ctrl)
ctr->intra_refresh_period = ctrl->val;
break;
case V4L2_CID_MPEG_VIDEO_H264_8X8_TRANSFORM:
- if (ctr->profile.h264 != HFI_H264_PROFILE_HIGH &&
- ctr->profile.h264 != HFI_H264_PROFILE_CONSTRAINED_HIGH)
+ if (ctr->profile.h264 != V4L2_MPEG_VIDEO_H264_PROFILE_HIGH &&
+ ctr->profile.h264 != V4L2_MPEG_VIDEO_H264_PROFILE_CONSTRAINED_HIGH)
return -EINVAL;
/*
@@ -457,7 +457,7 @@ int venc_ctrl_init(struct venus_inst *inst)
V4L2_CID_MPEG_VIDEO_H264_I_FRAME_MIN_QP, 1, 51, 1, 1);
v4l2_ctrl_new_std(&inst->ctrl_handler, &venc_ctrl_ops,
- V4L2_CID_MPEG_VIDEO_H264_8X8_TRANSFORM, 0, 1, 1, 0);
+ V4L2_CID_MPEG_VIDEO_H264_8X8_TRANSFORM, 0, 1, 1, 1);
v4l2_ctrl_new_std(&inst->ctrl_handler, &venc_ctrl_ops,
V4L2_CID_MPEG_VIDEO_H264_P_FRAME_MIN_QP, 1, 51, 1, 1);
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: venus: vdec: fixed possible memory leak issue
Author: Ameer Hamza <amhamza.mgc(a)gmail.com>
Date: Mon Dec 6 11:43:15 2021 +0100
The venus_helper_alloc_dpb_bufs() implementation allows an early return
on an error path when checking the id from ida_alloc_min() which would
not release the earlier buffer allocation.
Move the direct kfree() from the error checking of dma_alloc_attrs() to
the common fail path to ensure that allocations are released on all
error paths in this function.
Addresses-Coverity: 1494120 ("Resource leak")
cc: stable(a)vger.kernel.org # 5.16+
Fixes: 40d87aafee29 ("media: venus: vdec: decoded picture buffer handling during reconfig sequence")
Signed-off-by: Ameer Hamza <amhamza.mgc(a)gmail.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas(a)ideasonboard.com>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
drivers/media/platform/qcom/venus/helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/platform/qcom/venus/helpers.c b/drivers/media/platform/qcom/venus/helpers.c
index 84c3a511ec31..0bca95d01650 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -189,7 +189,6 @@ int venus_helper_alloc_dpb_bufs(struct venus_inst *inst)
buf->va = dma_alloc_attrs(dev, buf->size, &buf->da, GFP_KERNEL,
buf->attrs);
if (!buf->va) {
- kfree(buf);
ret = -ENOMEM;
goto fail;
}
@@ -209,6 +208,7 @@ int venus_helper_alloc_dpb_bufs(struct venus_inst *inst)
return 0;
fail:
+ kfree(buf);
venus_helper_free_dpb_bufs(inst);
return ret;
}
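The idea of releasing the in-flight allocation on one common fail path can be sketched in userspace C. This is a toy model under stated assumptions, not the venus driver: struct toy_buf, toy_alloc_bufs(), the failure-injection parameters and the live_bufs counter are all hypothetical, malloc() stands in for dma_alloc_attrs(), and a plain integer stands in for the id from ida_alloc_min(). Clearing buf after it joins the list is an extra precaution of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum toy_fail { FAIL_NONE, FAIL_PAYLOAD, FAIL_ID };

struct toy_buf {
	void *va;              /* stands in for the DMA allocation */
	int id;                /* stands in for the allocated id */
	struct toy_buf *next;
};

static int live_bufs;          /* toy_buf objects currently allocated */

static void toy_buf_free(struct toy_buf *buf)
{
	if (!buf)
		return;        /* like kfree(NULL), a no-op */
	free(buf->va);
	free(buf);
	live_bufs--;
}

static void toy_free_list(struct toy_buf **head)
{
	while (*head) {
		struct toy_buf *next = (*head)->next;

		toy_buf_free(*head);
		*head = next;
	}
}

/* Allocate count buffers; inject failure `how` at iteration fail_at. */
static int toy_alloc_bufs(struct toy_buf **head, int count,
			  int fail_at, enum toy_fail how)
{
	struct toy_buf *buf = NULL;
	int ret = 0;

	for (int i = 0; i < count; i++) {
		buf = calloc(1, sizeof(*buf));
		if (!buf) {
			ret = -ENOMEM;
			goto fail;
		}
		live_bufs++;

		buf->va = (i == fail_at && how == FAIL_PAYLOAD) ?
			  NULL : malloc(16);
		if (!buf->va) {
			ret = -ENOMEM;
			goto fail;
		}

		buf->id = (i == fail_at && how == FAIL_ID) ? -ENOSPC : i;
		if (buf->id < 0) {
			ret = buf->id;
			goto fail;
		}

		buf->next = *head;     /* ownership moves to the list */
		*head = buf;
		buf = NULL;
	}
	return 0;

fail:
	toy_buf_free(buf);             /* the buffer not yet on the list */
	toy_free_list(head);           /* everything queued so far */
	return ret;
}

/* Hypothetical driver: a late id failure must leave nothing allocated. */
int toy_leak_demo(void)
{
	struct toy_buf *head = NULL;
	int ret = toy_alloc_bufs(&head, 3, 2, FAIL_ID);

	assert(ret == -ENOSPC);
	assert(head == NULL && live_bufs == 0);
	return 0;
}
```

Because every error branch jumps to the same label, a new failure point added later in the loop body inherits the cleanup without needing its own free call.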
I would like to request that three commits be backported to stable/4.14:
c75ab8a55ac1 ("net/rds: remove user triggered WARN_ON in rds_sendmsg")
7dba92037baf ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()")
bdc2ab5c61a5 ("net/rds: Fix a use after free in rds_message_map_pages")
The commits fix a bug where the input parameters to 'rds_message_alloc_sgs()'
are only checked with WARN_ON instead of returning an error.
Regards
Hans Westgaard Ry
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 353050be4c19e102178ccc05988101887c25ae53 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 9 Nov 2021 18:48:08 +0000
Subject: [PATCH] bpf: Fix toctou on read-only map's constant scalar tracking
Commit a23740ec43ba ("bpf: Track contents of read-only maps as scalars") is
checking whether maps are read-only both from BPF program side and user space
side, and then, given their content is constant, reading out their data via
map->ops->map_direct_value_addr() which is then subsequently used as known
scalar value for the register, that is, it is marked as __mark_reg_known()
with the read value at verification time. Before a23740ec43ba, the register
content was marked as an unknown scalar so the verifier could not make any
assumptions about the map content.
The current implementation however is prone to a TOCTOU race, meaning, the
value read as known scalar for the register is not guaranteed to be exactly
the same at a later point when the program is executed, and as such, the
prior made assumptions of the verifier with regards to the program will be
invalid which can cause issues such as OOB access, etc.
While the BPF_F_RDONLY_PROG map flag is always fixed and required to be
specified at map creation time, the map->frozen property is initially set to
false for the map given the map value needs to be populated, e.g. for global
data sections. Once complete, the loader "freezes" the map from user space
such that no subsequent updates/deletes are possible anymore. For the rest
of the lifetime of the map, this freeze one-time trigger cannot be undone
anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_*
cmd calls which would update/delete map entries will be rejected with -EPERM
since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also
means that pending update/delete map entries must still complete before this
guarantee is given. This corner case is not an issue for loaders since they
create and prepare such program private map in successive steps.
However, a malicious user is able to trigger this TOCTOU race in two different
ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is
used to expand the competition interval, so that map_update_elem() can modify
the contents of the map after map_freeze() and bpf_prog_load() were executed.
This works, because userfaultfd halts the parallel thread which triggered a
map_update_elem() at the time where we copy key/value from the user buffer and
this already passed the FMODE_CAN_WRITE capability test given at that time the
map was not "frozen". Then, the main thread performs the map_freeze() and
bpf_prog_load(), and once that had completed successfully, the other thread
is woken up to complete the pending map_update_elem() which then changes the
map content. For ii) the idea of the batched update is similar, meaning, when
there are a large number of updates to be processed, it can increase the
competition interval between the two. It is therefore possible in practice to
modify the contents of the map after executing map_freeze() and bpf_prog_load().
One way to fix both i) and ii) at the same time is to expand the use of the
map's map->writecnt. The latter was introduced in fc9702273e2e ("bpf: Add mmap()
support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2e2 ("bpf:
Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with
the rationale to make a writable mmap()'ing of a map mutually exclusive with
read-only freezing. The counter indicates writable mmap() mappings and then
prevents/fails the freeze operation. Its semantics can be expanded beyond
just mmap() by generally indicating ongoing write phases. This would essentially
span any parallel regular and batched flavor of update/delete operation and
then also have map_freeze() fail with -EBUSY. For the check_mem_access() in
the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all
last pending writes have completed via bpf_map_write_active() test. Once the
map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0
only then we are really guaranteed to use the map's data as known constants.
For map->frozen being set and pending writes in process of still being completed
we fall back to marking that register as unknown scalar so we don't end up
making assumptions about it. With this, both TOCTOU reproducers from i) and
ii) are fixed.
Note that the map->writecnt has been converted into an atomic64 in the fix in
order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating
map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning
the freeze_mutex over entire map update/delete operations in syscall side
would not be possible due to then causing everything to be serialized.
Similarly, something like synchronize_rcu() after setting map->frozen to wait
for update/deletes to complete is not possible either since it would also
have to span the user copy which can sleep. On the libbpf side, this won't
break d66562fba1ce ("libbpf: Add BPF object skeleton support") as the
anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed
mmap()-ed memory where for .rodata it's non-writable.
Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars")
Reported-by: w1tcher.bupt(a)gmail.com
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f715e8863f4d..e7a163a3146b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -193,7 +193,7 @@ struct bpf_map {
atomic64_t usercnt;
struct work_struct work;
struct mutex freeze_mutex;
- u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
+ atomic64_t writecnt;
};
static inline bool map_value_has_spin_lock(const struct bpf_map *map)
@@ -1419,6 +1419,7 @@ void bpf_map_put(struct bpf_map *map);
void *bpf_map_area_alloc(u64 size, int numa_node);
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
void bpf_map_area_free(void *base);
+bool bpf_map_write_active(const struct bpf_map *map);
void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
int generic_map_lookup_batch(struct bpf_map *map,
const union bpf_attr *attr,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 50f96ea4452a..1033ee8c0caf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -132,6 +132,21 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
return map;
}
+static void bpf_map_write_active_inc(struct bpf_map *map)
+{
+ atomic64_inc(&map->writecnt);
+}
+
+static void bpf_map_write_active_dec(struct bpf_map *map)
+{
+ atomic64_dec(&map->writecnt);
+}
+
+bool bpf_map_write_active(const struct bpf_map *map)
+{
+ return atomic64_read(&map->writecnt) != 0;
+}
+
static u32 bpf_map_value_size(const struct bpf_map *map)
{
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
@@ -601,11 +616,8 @@ static void bpf_map_mmap_open(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
- if (vma->vm_flags & VM_MAYWRITE) {
- mutex_lock(&map->freeze_mutex);
- map->writecnt++;
- mutex_unlock(&map->freeze_mutex);
- }
+ if (vma->vm_flags & VM_MAYWRITE)
+ bpf_map_write_active_inc(map);
}
/* called for all unmapped memory region (including initial) */
@@ -613,11 +625,8 @@ static void bpf_map_mmap_close(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
- if (vma->vm_flags & VM_MAYWRITE) {
- mutex_lock(&map->freeze_mutex);
- map->writecnt--;
- mutex_unlock(&map->freeze_mutex);
- }
+ if (vma->vm_flags & VM_MAYWRITE)
+ bpf_map_write_active_dec(map);
}
static const struct vm_operations_struct bpf_map_default_vmops = {
@@ -668,7 +677,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
goto out;
if (vma->vm_flags & VM_MAYWRITE)
- map->writecnt++;
+ bpf_map_write_active_inc(map);
out:
mutex_unlock(&map->freeze_mutex);
return err;
@@ -1139,6 +1148,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
@@ -1174,6 +1184,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
free_key:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1196,6 +1207,7 @@ static int map_delete_elem(union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
@@ -1226,6 +1238,7 @@ static int map_delete_elem(union bpf_attr *attr)
out:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1533,6 +1546,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) ||
!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
@@ -1597,6 +1611,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
free_key:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1624,8 +1639,7 @@ static int map_freeze(const union bpf_attr *attr)
}
mutex_lock(&map->freeze_mutex);
-
- if (map->writecnt) {
+ if (bpf_map_write_active(map)) {
err = -EBUSY;
goto err_put;
}
@@ -4171,6 +4185,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
union bpf_attr __user *uattr,
int cmd)
{
+ bool has_read = cmd == BPF_MAP_LOOKUP_BATCH ||
+ cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH;
+ bool has_write = cmd != BPF_MAP_LOOKUP_BATCH;
struct bpf_map *map;
int err, ufd;
struct fd f;
@@ -4183,16 +4200,13 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if ((cmd == BPF_MAP_LOOKUP_BATCH ||
- cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) &&
- !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
+ if (has_write)
+ bpf_map_write_active_inc(map);
+ if (has_read && !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
err = -EPERM;
goto err_put;
}
-
- if (cmd != BPF_MAP_LOOKUP_BATCH &&
- !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
+ if (has_write && !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
}
@@ -4205,8 +4219,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
BPF_DO_BATCH(map->ops->map_update_batch);
else
BPF_DO_BATCH(map->ops->map_delete_batch);
-
err_put:
+ if (has_write)
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 65d2f93b7030..50efda51515b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4056,7 +4056,22 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
static bool bpf_map_is_rdonly(const struct bpf_map *map)
{
- return (map->map_flags & BPF_F_RDONLY_PROG) && map->frozen;
+ /* A map is considered read-only if the following condition are true:
+ *
+ * 1) BPF program side cannot change any of the map content. The
+ * BPF_F_RDONLY_PROG flag is throughout the lifetime of a map
+ * and was set at map creation time.
+ * 2) The map value(s) have been initialized from user space by a
+ * loader and then "frozen", such that no new map update/delete
+ * operations from syscall side are possible for the rest of
+ * the map's lifetime from that point onwards.
+ * 3) Any parallel/pending map update/delete operations from syscall
+ * side have been completed. Only after that point, it's safe to
+ * assume that map value(s) are immutable.
+ */
+ return (map->map_flags & BPF_F_RDONLY_PROG) &&
+ READ_ONCE(map->frozen) &&
+ !bpf_map_write_active(map);
}
static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
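The write-phase counter described in the commit message above can be modelled in userspace with C11 atomics. This is a minimal sketch under stated assumptions, not the kernel implementation: struct toy_map and every toy_* function are hypothetical, a single int slot stands in for the map contents, and the freeze_mutex that the kernel holds in map_freeze() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_map {
	bool rdonly_prog;       /* models BPF_F_RDONLY_PROG, fixed at creation */
	atomic_bool frozen;     /* models map->frozen */
	atomic_llong writecnt;  /* models map->writecnt: ongoing write phases */
	int slot;               /* the "map contents" */
};

static void toy_map_init(struct toy_map *m, bool rdonly_prog)
{
	m->rdonly_prog = rdonly_prog;
	atomic_init(&m->frozen, false);
	atomic_init(&m->writecnt, 0);
	m->slot = 0;
}

static void toy_write_begin(struct toy_map *m)
{
	atomic_fetch_add(&m->writecnt, 1);
}

static void toy_write_end(struct toy_map *m)
{
	atomic_fetch_sub(&m->writecnt, 1);
}

static bool toy_write_active(struct toy_map *m)
{
	return atomic_load(&m->writecnt) != 0;
}

/* Freezing is refused while any write phase is still in flight. */
static int toy_freeze(struct toy_map *m)
{
	if (toy_write_active(m))
		return -EBUSY;
	atomic_store(&m->frozen, true);
	return 0;
}

/* The write phase is opened before the permission check, as in the patch. */
static int toy_update(struct toy_map *m, int value)
{
	int err = 0;

	toy_write_begin(m);
	if (atomic_load(&m->frozen)) {
		err = -EPERM;
		goto out;
	}
	m->slot = value;
out:
	toy_write_end(m);
	return err;
}

/* Contents may be treated as constant only when all three conditions hold. */
static bool toy_is_rdonly(struct toy_map *m)
{
	return m->rdonly_prog && atomic_load(&m->frozen) &&
	       !toy_write_active(m);
}

/* Hypothetical driver: a pending write blocks the freeze. */
int toy_freeze_demo(void)
{
	struct toy_map m;
	int busy, ok;

	toy_map_init(&m, true);
	toy_write_begin(&m);            /* an update is stalled mid-flight */
	busy = toy_freeze(&m);
	toy_write_end(&m);              /* the stalled update completes */
	ok = toy_freeze(&m);

	assert(busy == -EBUSY);
	assert(ok == 0 && toy_is_rdonly(&m));
	return 0;
}
```

In this model a writer that was stalled before the freeze keeps the counter non-zero, so the freeze fails instead of racing with it, and a reader never treats the slot as constant while a write phase is open.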
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of the latest one. After I pointed
out the mix-up and asked him for guidance, Christoph (the swiotlb
maintainer) asked me to create an incremental fix.
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
data from the swiotlb buffer.
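The stale-data scenario described above can be illustrated with a toy model. This is a sketch only, not the swiotlb API: `toy_mapping` and `toy_run()` are hypothetical names, the "device" is simulated with memset(), and the buffer sizes are arbitrary.

```c
#include <assert.h>
#include <string.h>

#define BUF_LEN 8

struct toy_mapping {
	unsigned char orig[BUF_LEN];   /* the driver's own buffer */
	unsigned char bounce[BUF_LEN]; /* the bounce slot the device sees */
};

/*
 * Reuse one DMA_FROM_DEVICE mapping for two transfers and return the last
 * byte the CPU sees afterwards. The second transfer is short: the device
 * only writes the first half of the buffer.
 */
static unsigned char toy_run(int bounce_on_sync_for_device)
{
	struct toy_mapping m;

	memset(m.orig, 0x00, BUF_LEN);
	memcpy(m.bounce, m.orig, BUF_LEN);     /* map: orig -> bounce */

	memset(m.bounce, 0x11, BUF_LEN);       /* first transfer, full length */
	memcpy(m.orig, m.bounce, BUF_LEN);     /* sync for CPU: bounce -> orig */

	memset(m.orig, 0x00, BUF_LEN);         /* CPU clears its buffer */
	if (bounce_on_sync_for_device)
		memcpy(m.bounce, m.orig, BUF_LEN); /* sync for device: orig -> bounce */

	memset(m.bounce, 0x22, BUF_LEN / 2);   /* second transfer, short write */
	memcpy(m.orig, m.bounce, BUF_LEN);     /* sync for CPU: bounce -> orig */

	return m.orig[BUF_LEN - 1];
}
```

In this model, toy_run(0) hands the CPU a byte left over from the first transfer, while toy_run(1) preserves the zero the CPU wrote before handing the buffer back to the device.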
Note that if the size used with the dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is not a
perfect fix either.
To make this bulletproof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable(a)vger.kernel.org
---
I just realized that there are still scenarios where swiotlb may produce
some strange effects. Thus I don't think we have discussed the
dma_sync_* part in detail.
v1 -> v2:
* single patch instead of revert + right version
---
Documentation/core-api/dma-attributes.rst | 8 --------
include/linux/dma-mapping.h | 8 --------
kernel/dma/swiotlb.c | 23 +++++++++++++++--------
3 files changed, 15 insertions(+), 24 deletions(-)
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 17706dc91ec9..1887d92e8e92 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,11 +130,3 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
-
-DMA_ATTR_OVERWRITE
-------------------
-
-This is a hint to the DMA-mapping subsystem that the device is expected to
-overwrite the entire mapped size, thus the caller does not require any of the
-previous buffer contents to be preserved. This allows bounce-buffering
-implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6150d11a607e..dca2b1355bb1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,14 +61,6 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
-/*
- * This is a hint to the DMA-mapping subsystem that the device is expected
- * to overwrite the entire mapped size, thus the caller does not require any
- * of the previous buffer contents to be preserved. This allows
- * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
- */
-#define DMA_ATTR_OVERWRITE (1UL << 10)
-
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bfc56cb21705..6db1c475ec82 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -627,10 +627,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
- dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+ /*
+ * When dir == DMA_FROM_DEVICE we could omit the copy from the orig
+ * to the tlb buffer, if we knew for sure the device will
+ * overwrite the entire current content. But we don't. Thus
+ * unconditional bounce may prevent leaking swiotlb content (i.e.
+ * kernel memory) to user-space.
+ */
+ swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
@@ -697,10 +701,13 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
- if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
- swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
- else
- BUG_ON(dir != DMA_FROM_DEVICE);
+ /*
+ * Unconditional bounce is necessary to avoid corruption on
+ * sync_*_for_cpu or dma_unmap_* when the device didn't overwrite
+ * the whole length of the bounce buffer.
+ */
+ swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+ BUG_ON(!valid_dma_direction(dir));
}
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
base-commit: 38f80f42147ff658aff218edb0a88c37e58bf44f
--
2.32.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40eb0dcf4114cbfff4d207890fa5a19e82da9fdc Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Thu, 10 Feb 2022 17:10:53 +0800
Subject: [PATCH] tee: optee: fix error return code in probe function
If teedev_open() fails, the probe function needs to return an
error code.
Fixes: aceeafefff73 ("optee: use driver internal tee_context for some rpc")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c
index 545f61af1248..0c90355896a0 100644
--- a/drivers/tee/optee/ffa_abi.c
+++ b/drivers/tee/optee/ffa_abi.c
@@ -860,8 +860,10 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
optee_supp_init(&optee->supp);
ffa_dev_set_drvdata(ffa_dev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_rhashtable_free;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, OPTEE_DEFAULT_MAX_NOTIF_VALUE);
if (rc)
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index bacd1a1d79ee..4157f4b41bdd 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1427,8 +1427,10 @@ static int optee_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_supp_uninit;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, max_notif_value);
if (rc)
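The bug class fixed by this patch can be reproduced in user space. The sketch below is hedged: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified re-implementations of the kernel helpers, `fake_open()` is a hypothetical stand-in for teedev_open(), and it is an assumption of the example that rc still holds 0 from an earlier successful step when the open fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_ctx { int unused; };

/* Hypothetical stand-in for teedev_open(): fails with -ENOMEM on request. */
static struct toy_ctx *fake_open(int fail)
{
	static struct toy_ctx ctx;

	return fail ? ERR_PTR(-ENOMEM) : &ctx;
}

/* Before the fix: the error path returns whatever rc held, here 0. */
static int probe_buggy(int fail)
{
	int rc = 0; /* assumption: left at 0 by a previous successful call */
	struct toy_ctx *ctx = fake_open(fail);

	if (IS_ERR(ctx))
		goto err;
	return 0;
err:
	return rc;
}

/* After the fix: rc is loaded from the error pointer before the jump. */
static int probe_fixed(int fail)
{
	int rc = 0;
	struct toy_ctx *ctx = fake_open(fail);

	if (IS_ERR(ctx)) {
		rc = PTR_ERR(ctx);
		goto err;
	}
	return 0;
err:
	return rc;
}
```

Under these assumptions probe_buggy(1) reports success even though the open failed, whereas probe_fixed(1) propagates -ENOMEM to the caller.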
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40eb0dcf4114cbfff4d207890fa5a19e82da9fdc Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Thu, 10 Feb 2022 17:10:53 +0800
Subject: [PATCH] tee: optee: fix error return code in probe function
If teedev_open() fails, the probe function needs to return an
error code.
Fixes: aceeafefff73 ("optee: use driver internal tee_context for some rpc")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c
index 545f61af1248..0c90355896a0 100644
--- a/drivers/tee/optee/ffa_abi.c
+++ b/drivers/tee/optee/ffa_abi.c
@@ -860,8 +860,10 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
optee_supp_init(&optee->supp);
ffa_dev_set_drvdata(ffa_dev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_rhashtable_free;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, OPTEE_DEFAULT_MAX_NOTIF_VALUE);
if (rc)
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index bacd1a1d79ee..4157f4b41bdd 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1427,8 +1427,10 @@ static int optee_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_supp_uninit;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, max_notif_value);
if (rc)
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40eb0dcf4114cbfff4d207890fa5a19e82da9fdc Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Thu, 10 Feb 2022 17:10:53 +0800
Subject: [PATCH] tee: optee: fix error return code in probe function
If teedev_open() fails, the probe function needs to return an
error code.
Fixes: aceeafefff73 ("optee: use driver internal tee_context for some rpc")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c
index 545f61af1248..0c90355896a0 100644
--- a/drivers/tee/optee/ffa_abi.c
+++ b/drivers/tee/optee/ffa_abi.c
@@ -860,8 +860,10 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
optee_supp_init(&optee->supp);
ffa_dev_set_drvdata(ffa_dev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_rhashtable_free;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, OPTEE_DEFAULT_MAX_NOTIF_VALUE);
if (rc)
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index bacd1a1d79ee..4157f4b41bdd 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1427,8 +1427,10 @@ static int optee_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_supp_uninit;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, max_notif_value);
if (rc)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40eb0dcf4114cbfff4d207890fa5a19e82da9fdc Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Thu, 10 Feb 2022 17:10:53 +0800
Subject: [PATCH] tee: optee: fix error return code in probe function
If teedev_open() fails, the probe function needs to return an
error code.
Fixes: aceeafefff73 ("optee: use driver internal tee_context for some rpc")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c
index 545f61af1248..0c90355896a0 100644
--- a/drivers/tee/optee/ffa_abi.c
+++ b/drivers/tee/optee/ffa_abi.c
@@ -860,8 +860,10 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
optee_supp_init(&optee->supp);
ffa_dev_set_drvdata(ffa_dev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_rhashtable_free;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, OPTEE_DEFAULT_MAX_NOTIF_VALUE);
if (rc)
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index bacd1a1d79ee..4157f4b41bdd 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1427,8 +1427,10 @@ static int optee_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, optee);
ctx = teedev_open(optee->teedev);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ rc = PTR_ERR(ctx);
goto err_supp_uninit;
+ }
optee->ctx = ctx;
rc = optee_notif_init(optee, max_notif_value);
if (rc)
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab66eb8541d61b0a11d70980f07b4c2dfeddc5 Mon Sep 17 00:00:00 2001
From: Su Yue <l(a)damenly.su>
Date: Tue, 22 Feb 2022 16:42:07 +0800
Subject: [PATCH] btrfs: tree-checker: use u64 for item data end to avoid
overflow
A user reported an array-index-out-of-bounds access while
mounting a crafted image:
[350.411942 ] loop0: detected capacity change from 0 to 262144
[350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
[350.428564 ] BTRFS info (device loop0): disk space caching is enabled
[350.428568 ] BTRFS info (device loop0): has skinny extents
[350.429589 ]
[350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636 ] index 1048096 is out of range for type 'page *[16]'
[350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
[350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772 ] Call Trace:
[350.429774 ] <TASK>
[350.429776 ] dump_stack_lvl+0x47/0x5c
[350.429780 ] ubsan_epilogue+0x5/0x50
[350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832 ] check_leaf+0x754/0x1a40 [btrfs]
[350.429874 ] ? filemap_read+0x34a/0x390
[350.429878 ] ? load_balance+0x175/0xfc0
[350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969 ] ? newidle_balance+0x259/0x480
[350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030 ] ? __schedule+0x2fb/0xa40
[350.430033 ] process_one_work+0x1f6/0x400
[350.430035 ] ? process_one_work+0x400/0x400
[350.430036 ] worker_thread+0x2d/0x3d0
[350.430037 ] ? process_one_work+0x400/0x400
[350.430038 ] kthread+0x165/0x190
[350.430041 ] ? set_kthread_struct+0x40/0x40
[350.430043 ] ret_from_fork+0x1f/0x30
[350.430047 ] </TASK>
[350.430047 ]
[350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
btrfs check reports:
corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
item end, have 4294971193 expect 3897
The first slot item offset is 4293005033 and the size is 1966160.
In check_leaf(), we use btrfs_item_end() to check the item boundary against
the extent_buffer data size. However, the return type of btrfs_item_end() is u32.
(u32)(4293005033 + 1966160) == 3897: the sum overflows, and the wrapped
result 3897 plausibly equals the leaf data size, so the check passes.
Fix it by using a u64 variable to store the item data end in check_leaf() to
avoid the u32 overflow.
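The arithmetic can be checked outside the kernel with fixed-width types. In this sketch `item_end_u32()` and `item_end_u64()` are hypothetical helper names; the second one widens one operand before the addition, which is the same idea as the (u64) cast in the fix.

```c
#include <assert.h>
#include <stdint.h>

/* Item end computed in 32 bits: the sum wraps modulo 2^32. */
static uint32_t item_end_u32(uint32_t offset, uint32_t size)
{
	return offset + size;
}

/* Item end computed in 64 bits: widen one operand before adding. */
static uint64_t item_end_u64(uint32_t offset, uint32_t size)
{
	return (uint64_t)offset + size;
}
```

With the values from the report, the 32-bit version yields 3897 (4294971193 - 2^32), while the 64-bit version keeps the true sum 4294971193, which no longer compares equal to a u32 expectation.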
This commit does solve the invalid memory access shown by the stack
trace. However, the image's metadata profile is DUP and the other copy of
the leaf is fine, so the image can be mounted successfully. But when umount
is called, the ASSERT in btrfs_mark_buffer_dirty() will be triggered
because the only node in the extent tree has 0 items and an invalid owner.
That is solved by another commit,
"btrfs: check extent buffer owner against the owner rootid".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120(a)gmail.com>
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Su Yue <l(a)damenly.su>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 9fd145f1c4bc..aae5697dde32 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1682,6 +1682,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
*/
for (slot = 0; slot < nritems; slot++) {
u32 item_end_expected;
+ u64 item_data_end;
int ret;
btrfs_item_key_to_cpu(leaf, &key, slot);
@@ -1696,6 +1697,8 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
return -EUCLEAN;
}
+ item_data_end = (u64)btrfs_item_offset(leaf, slot) +
+ btrfs_item_size(leaf, slot);
/*
* Make sure the offset and ends are right, remember that the
* item data starts at the end of the leaf and grows towards the
@@ -1706,11 +1709,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
else
item_end_expected = btrfs_item_offset(leaf,
slot - 1);
- if (unlikely(btrfs_item_data_end(leaf, slot) != item_end_expected)) {
+ if (unlikely(item_data_end != item_end_expected)) {
generic_err(leaf, slot,
- "unexpected item end, have %u expect %u",
- btrfs_item_data_end(leaf, slot),
- item_end_expected);
+ "unexpected item end, have %llu expect %u",
+ item_data_end, item_end_expected);
return -EUCLEAN;
}
@@ -1719,12 +1721,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
* just in case all the items are consistent to each other, but
* all point outside of the leaf.
*/
- if (unlikely(btrfs_item_data_end(leaf, slot) >
- BTRFS_LEAF_DATA_SIZE(fs_info))) {
+ if (unlikely(item_data_end > BTRFS_LEAF_DATA_SIZE(fs_info))) {
generic_err(leaf, slot,
- "slot end outside of leaf, have %u expect range [0, %u]",
- btrfs_item_data_end(leaf, slot),
- BTRFS_LEAF_DATA_SIZE(fs_info));
+ "slot end outside of leaf, have %llu expect range [0, %u]",
+ item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info));
return -EUCLEAN;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab66eb8541d61b0a11d70980f07b4c2dfeddc5 Mon Sep 17 00:00:00 2001
From: Su Yue <l(a)damenly.su>
Date: Tue, 22 Feb 2022 16:42:07 +0800
Subject: [PATCH] btrfs: tree-checker: use u64 for item data end to avoid
overflow
A user reported an array-index-out-of-bounds access while
mounting a crafted image:
[350.411942 ] loop0: detected capacity change from 0 to 262144
[350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
[350.428564 ] BTRFS info (device loop0): disk space caching is enabled
[350.428568 ] BTRFS info (device loop0): has skinny extents
[350.429589 ]
[350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636 ] index 1048096 is out of range for type 'page *[16]'
[350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
[350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772 ] Call Trace:
[350.429774 ] <TASK>
[350.429776 ] dump_stack_lvl+0x47/0x5c
[350.429780 ] ubsan_epilogue+0x5/0x50
[350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832 ] check_leaf+0x754/0x1a40 [btrfs]
[350.429874 ] ? filemap_read+0x34a/0x390
[350.429878 ] ? load_balance+0x175/0xfc0
[350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969 ] ? newidle_balance+0x259/0x480
[350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030 ] ? __schedule+0x2fb/0xa40
[350.430033 ] process_one_work+0x1f6/0x400
[350.430035 ] ? process_one_work+0x400/0x400
[350.430036 ] worker_thread+0x2d/0x3d0
[350.430037 ] ? process_one_work+0x400/0x400
[350.430038 ] kthread+0x165/0x190
[350.430041 ] ? set_kthread_struct+0x40/0x40
[350.430043 ] ret_from_fork+0x1f/0x30
[350.430047 ] </TASK>
[350.430047 ]
[350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
btrfs check reports:
corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
item end, have 4294971193 expect 3897
The first slot item offset is 4293005033 and the size is 1966160.
In check_leaf(), we use btrfs_item_end() to check the item boundary against
the extent_buffer data size. However, the return type of btrfs_item_end() is u32.
(u32)(4293005033 + 1966160) == 3897: the sum overflows, and the wrapped
result 3897 plausibly equals the leaf data size, so the check passes.
Fix it by using a u64 variable to store the item data end in check_leaf() to
avoid the u32 overflow.
This commit does solve the invalid memory access shown by the stack
trace. However, the image's metadata profile is DUP and the other copy of
the leaf is fine, so the image can be mounted successfully. But when umount
is called, the ASSERT in btrfs_mark_buffer_dirty() will be triggered
because the only node in the extent tree has 0 items and an invalid owner.
That is solved by another commit,
"btrfs: check extent buffer owner against the owner rootid".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120(a)gmail.com>
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Su Yue <l(a)damenly.su>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 9fd145f1c4bc..aae5697dde32 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1682,6 +1682,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
*/
for (slot = 0; slot < nritems; slot++) {
u32 item_end_expected;
+ u64 item_data_end;
int ret;
btrfs_item_key_to_cpu(leaf, &key, slot);
@@ -1696,6 +1697,8 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
return -EUCLEAN;
}
+ item_data_end = (u64)btrfs_item_offset(leaf, slot) +
+ btrfs_item_size(leaf, slot);
/*
* Make sure the offset and ends are right, remember that the
* item data starts at the end of the leaf and grows towards the
@@ -1706,11 +1709,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
else
item_end_expected = btrfs_item_offset(leaf,
slot - 1);
- if (unlikely(btrfs_item_data_end(leaf, slot) != item_end_expected)) {
+ if (unlikely(item_data_end != item_end_expected)) {
generic_err(leaf, slot,
- "unexpected item end, have %u expect %u",
- btrfs_item_data_end(leaf, slot),
- item_end_expected);
+ "unexpected item end, have %llu expect %u",
+ item_data_end, item_end_expected);
return -EUCLEAN;
}
@@ -1719,12 +1721,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
* just in case all the items are consistent to each other, but
* all point outside of the leaf.
*/
- if (unlikely(btrfs_item_data_end(leaf, slot) >
- BTRFS_LEAF_DATA_SIZE(fs_info))) {
+ if (unlikely(item_data_end > BTRFS_LEAF_DATA_SIZE(fs_info))) {
generic_err(leaf, slot,
- "slot end outside of leaf, have %u expect range [0, %u]",
- btrfs_item_data_end(leaf, slot),
- BTRFS_LEAF_DATA_SIZE(fs_info));
+ "slot end outside of leaf, have %llu expect range [0, %u]",
+ item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info));
return -EUCLEAN;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab66eb8541d61b0a11d70980f07b4c2dfeddc5 Mon Sep 17 00:00:00 2001
From: Su Yue <l(a)damenly.su>
Date: Tue, 22 Feb 2022 16:42:07 +0800
Subject: [PATCH] btrfs: tree-checker: use u64 for item data end to avoid
overflow
A user reported an array-index-out-of-bounds access while
mounting a crafted image:
[350.411942 ] loop0: detected capacity change from 0 to 262144
[350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
[350.428564 ] BTRFS info (device loop0): disk space caching is enabled
[350.428568 ] BTRFS info (device loop0): has skinny extents
[350.429589 ]
[350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636 ] index 1048096 is out of range for type 'page *[16]'
[350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
[350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772 ] Call Trace:
[350.429774 ] <TASK>
[350.429776 ] dump_stack_lvl+0x47/0x5c
[350.429780 ] ubsan_epilogue+0x5/0x50
[350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832 ] check_leaf+0x754/0x1a40 [btrfs]
[350.429874 ] ? filemap_read+0x34a/0x390
[350.429878 ] ? load_balance+0x175/0xfc0
[350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969 ] ? newidle_balance+0x259/0x480
[350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030 ] ? __schedule+0x2fb/0xa40
[350.430033 ] process_one_work+0x1f6/0x400
[350.430035 ] ? process_one_work+0x400/0x400
[350.430036 ] worker_thread+0x2d/0x3d0
[350.430037 ] ? process_one_work+0x400/0x400
[350.430038 ] kthread+0x165/0x190
[350.430041 ] ? set_kthread_struct+0x40/0x40
[350.430043 ] ret_from_fork+0x1f/0x30
[350.430047 ] </TASK>
[350.430047 ]
[350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
btrfs check reports:
corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
item end, have 4294971193 expect 3897
The first slot item offset is 4293005033 and the size is 1966160.
In check_leaf(), we use btrfs_item_end() to check the item boundary against
the extent_buffer data size. However, the return type of btrfs_item_end() is u32.
(u32)(4293005033 + 1966160) == 3897: the sum overflows, and the wrapped
result 3897 plausibly equals the leaf data size, so the check passes.
Fix it by using a u64 variable to store the item data end in check_leaf() to
avoid the u32 overflow.
This commit does solve the invalid memory access shown by the stack
trace. However, the image's metadata profile is DUP and the other copy of
the leaf is fine, so the image can be mounted successfully. But when umount
is called, the ASSERT in btrfs_mark_buffer_dirty() will be triggered
because the only node in the extent tree has 0 items and an invalid owner.
That is solved by another commit,
"btrfs: check extent buffer owner against the owner rootid".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120(a)gmail.com>
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Su Yue <l(a)damenly.su>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 9fd145f1c4bc..aae5697dde32 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1682,6 +1682,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
*/
for (slot = 0; slot < nritems; slot++) {
u32 item_end_expected;
+ u64 item_data_end;
int ret;
btrfs_item_key_to_cpu(leaf, &key, slot);
@@ -1696,6 +1697,8 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
return -EUCLEAN;
}
+ item_data_end = (u64)btrfs_item_offset(leaf, slot) +
+ btrfs_item_size(leaf, slot);
/*
* Make sure the offset and ends are right, remember that the
* item data starts at the end of the leaf and grows towards the
@@ -1706,11 +1709,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
else
item_end_expected = btrfs_item_offset(leaf,
slot - 1);
- if (unlikely(btrfs_item_data_end(leaf, slot) != item_end_expected)) {
+ if (unlikely(item_data_end != item_end_expected)) {
generic_err(leaf, slot,
- "unexpected item end, have %u expect %u",
- btrfs_item_data_end(leaf, slot),
- item_end_expected);
+ "unexpected item end, have %llu expect %u",
+ item_data_end, item_end_expected);
return -EUCLEAN;
}
@@ -1719,12 +1721,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
* just in case all the items are consistent to each other, but
* all point outside of the leaf.
*/
- if (unlikely(btrfs_item_data_end(leaf, slot) >
- BTRFS_LEAF_DATA_SIZE(fs_info))) {
+ if (unlikely(item_data_end > BTRFS_LEAF_DATA_SIZE(fs_info))) {
generic_err(leaf, slot,
- "slot end outside of leaf, have %u expect range [0, %u]",
- btrfs_item_data_end(leaf, slot),
- BTRFS_LEAF_DATA_SIZE(fs_info));
+ "slot end outside of leaf, have %llu expect range [0, %u]",
+ item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info));
return -EUCLEAN;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab66eb8541d61b0a11d70980f07b4c2dfeddc5 Mon Sep 17 00:00:00 2001
From: Su Yue <l(a)damenly.su>
Date: Tue, 22 Feb 2022 16:42:07 +0800
Subject: [PATCH] btrfs: tree-checker: use u64 for item data end to avoid
overflow
A user reported an array-index-out-of-bounds access while
mounting a crafted image:
[350.411942 ] loop0: detected capacity change from 0 to 262144
[350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
[350.428564 ] BTRFS info (device loop0): disk space caching is enabled
[350.428568 ] BTRFS info (device loop0): has skinny extents
[350.429589 ]
[350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636 ] index 1048096 is out of range for type 'page *[16]'
[350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
[350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772 ] Call Trace:
[350.429774 ] <TASK>
[350.429776 ] dump_stack_lvl+0x47/0x5c
[350.429780 ] ubsan_epilogue+0x5/0x50
[350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832 ] check_leaf+0x754/0x1a40 [btrfs]
[350.429874 ] ? filemap_read+0x34a/0x390
[350.429878 ] ? load_balance+0x175/0xfc0
[350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969 ] ? newidle_balance+0x259/0x480
[350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030 ] ? __schedule+0x2fb/0xa40
[350.430033 ] process_one_work+0x1f6/0x400
[350.430035 ] ? process_one_work+0x400/0x400
[350.430036 ] worker_thread+0x2d/0x3d0
[350.430037 ] ? process_one_work+0x400/0x400
[350.430038 ] kthread+0x165/0x190
[350.430041 ] ? set_kthread_struct+0x40/0x40
[350.430043 ] ret_from_fork+0x1f/0x30
[350.430047 ] </TASK>
[350.430047 ]
[350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
btrfs check reports:
corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
item end, have 4294971193 expect 3897
The first slot item offset is 4293005033 and the size is 1966160.
In check_leaf(), we use btrfs_item_end() to check the item boundary against
the extent_buffer data size. However, the return type of btrfs_item_end() is u32.
(u32)(4293005033 + 1966160) == 3897, so the sum overflows and the wrapped
result, 3897, matches the expected value and the check passes.
Fix it by using a u64 variable to store the item data end in check_leaf() to
avoid the u32 overflow.
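The wrap-around described above can be reproduced in plain userspace C. This is only a minimal sketch: item_end_u32() and item_end_u64() are hypothetical names chosen for illustration, not btrfs functions, and the inputs are the offset and size quoted in the report.

```c
#include <stdint.h>

/* Hypothetical helper: the sum is truncated to 32 bits, as with a u32 return type. */
uint32_t item_end_u32(uint32_t offset, uint32_t size)
{
	return offset + size;	/* unsigned arithmetic wraps modulo 2^32 */
}

/* Hypothetical helper: widen one operand first so the addition is done in 64 bits. */
uint64_t item_end_u64(uint32_t offset, uint32_t size)
{
	return (uint64_t)offset + size;
}
```

With offset 4293005033 and size 1966160, the 32-bit version yields 3897 while the 64-bit version yields 4294971193, which is the value btrfs check prints and which a comparison against the leaf data size can reject.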
This commit fixes the invalid memory access shown in the stack
trace. However, the image's metadata profile is DUP and the other copy of
the leaf is fine, so the image can still be mounted successfully. But when
umount is called, the ASSERT in btrfs_mark_buffer_dirty() will be triggered
because the only node in the extent tree has 0 items and an invalid owner.
That is solved by another commit
"btrfs: check extent buffer owner against the owner rootid".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120(a)gmail.com>
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Su Yue <l(a)damenly.su>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 9fd145f1c4bc..aae5697dde32 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1682,6 +1682,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
*/
for (slot = 0; slot < nritems; slot++) {
u32 item_end_expected;
+ u64 item_data_end;
int ret;
btrfs_item_key_to_cpu(leaf, &key, slot);
@@ -1696,6 +1697,8 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
return -EUCLEAN;
}
+ item_data_end = (u64)btrfs_item_offset(leaf, slot) +
+ btrfs_item_size(leaf, slot);
/*
* Make sure the offset and ends are right, remember that the
* item data starts at the end of the leaf and grows towards the
@@ -1706,11 +1709,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
else
item_end_expected = btrfs_item_offset(leaf,
slot - 1);
- if (unlikely(btrfs_item_data_end(leaf, slot) != item_end_expected)) {
+ if (unlikely(item_data_end != item_end_expected)) {
generic_err(leaf, slot,
- "unexpected item end, have %u expect %u",
- btrfs_item_data_end(leaf, slot),
- item_end_expected);
+ "unexpected item end, have %llu expect %u",
+ item_data_end, item_end_expected);
return -EUCLEAN;
}
@@ -1719,12 +1721,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
* just in case all the items are consistent to each other, but
* all point outside of the leaf.
*/
- if (unlikely(btrfs_item_data_end(leaf, slot) >
- BTRFS_LEAF_DATA_SIZE(fs_info))) {
+ if (unlikely(item_data_end > BTRFS_LEAF_DATA_SIZE(fs_info))) {
generic_err(leaf, slot,
- "slot end outside of leaf, have %u expect range [0, %u]",
- btrfs_item_data_end(leaf, slot),
- BTRFS_LEAF_DATA_SIZE(fs_info));
+ "slot end outside of leaf, have %llu expect range [0, %u]",
+ item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info));
return -EUCLEAN;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab66eb8541d61b0a11d70980f07b4c2dfeddc5 Mon Sep 17 00:00:00 2001
From: Su Yue <l(a)damenly.su>
Date: Tue, 22 Feb 2022 16:42:07 +0800
Subject: [PATCH] btrfs: tree-checker: use u64 for item data end to avoid
overflow
A user reported an array-index-out-of-bounds access while
mounting a crafted image:
[350.411942 ] loop0: detected capacity change from 0 to 262144
[350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
[350.428564 ] BTRFS info (device loop0): disk space caching is enabled
[350.428568 ] BTRFS info (device loop0): has skinny extents
[350.429589 ]
[350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636 ] index 1048096 is out of range for type 'page *[16]'
[350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
[350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772 ] Call Trace:
[350.429774 ] <TASK>
[350.429776 ] dump_stack_lvl+0x47/0x5c
[350.429780 ] ubsan_epilogue+0x5/0x50
[350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832 ] check_leaf+0x754/0x1a40 [btrfs]
[350.429874 ] ? filemap_read+0x34a/0x390
[350.429878 ] ? load_balance+0x175/0xfc0
[350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969 ] ? newidle_balance+0x259/0x480
[350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030 ] ? __schedule+0x2fb/0xa40
[350.430033 ] process_one_work+0x1f6/0x400
[350.430035 ] ? process_one_work+0x400/0x400
[350.430036 ] worker_thread+0x2d/0x3d0
[350.430037 ] ? process_one_work+0x400/0x400
[350.430038 ] kthread+0x165/0x190
[350.430041 ] ? set_kthread_struct+0x40/0x40
[350.430043 ] ret_from_fork+0x1f/0x30
[350.430047 ] </TASK>
[350.430047 ]
[350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
btrfs check reports:
corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
item end, have 4294971193 expect 3897
The first slot item offset is 4293005033 and the size is 1966160.
In check_leaf(), we use btrfs_item_end() to check the item boundary against
the extent_buffer data size. However, the return type of btrfs_item_end() is u32.
(u32)(4293005033 + 1966160) == 3897, so the sum overflows and the wrapped
result, 3897, matches the expected value and the check passes.
Fix it by using a u64 variable to store the item data end in check_leaf() to
avoid the u32 overflow.
This commit fixes the invalid memory access shown in the stack
trace. However, the image's metadata profile is DUP and the other copy of
the leaf is fine, so the image can still be mounted successfully. But when
umount is called, the ASSERT in btrfs_mark_buffer_dirty() will be triggered
because the only node in the extent tree has 0 items and an invalid owner.
That is solved by another commit
"btrfs: check extent buffer owner against the owner rootid".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120(a)gmail.com>
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Su Yue <l(a)damenly.su>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 9fd145f1c4bc..aae5697dde32 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1682,6 +1682,7 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
*/
for (slot = 0; slot < nritems; slot++) {
u32 item_end_expected;
+ u64 item_data_end;
int ret;
btrfs_item_key_to_cpu(leaf, &key, slot);
@@ -1696,6 +1697,8 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
return -EUCLEAN;
}
+ item_data_end = (u64)btrfs_item_offset(leaf, slot) +
+ btrfs_item_size(leaf, slot);
/*
* Make sure the offset and ends are right, remember that the
* item data starts at the end of the leaf and grows towards the
@@ -1706,11 +1709,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
else
item_end_expected = btrfs_item_offset(leaf,
slot - 1);
- if (unlikely(btrfs_item_data_end(leaf, slot) != item_end_expected)) {
+ if (unlikely(item_data_end != item_end_expected)) {
generic_err(leaf, slot,
- "unexpected item end, have %u expect %u",
- btrfs_item_data_end(leaf, slot),
- item_end_expected);
+ "unexpected item end, have %llu expect %u",
+ item_data_end, item_end_expected);
return -EUCLEAN;
}
@@ -1719,12 +1721,10 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
* just in case all the items are consistent to each other, but
* all point outside of the leaf.
*/
- if (unlikely(btrfs_item_data_end(leaf, slot) >
- BTRFS_LEAF_DATA_SIZE(fs_info))) {
+ if (unlikely(item_data_end > BTRFS_LEAF_DATA_SIZE(fs_info))) {
generic_err(leaf, slot,
- "slot end outside of leaf, have %u expect range [0, %u]",
- btrfs_item_data_end(leaf, slot),
- BTRFS_LEAF_DATA_SIZE(fs_info));
+ "slot end outside of leaf, have %llu expect range [0, %u]",
+ item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info));
return -EUCLEAN;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 18 Feb 2022 14:56:10 -0500
Subject: [PATCH] btrfs: do not start relocation until in progress drops are
done
We hit a bug with a recovering relocation on mount for one of our file
systems in production. I reproduced this locally by injecting errors
into snapshot delete with balance running at the same time. This
presented as an error while looking up an extent item:
WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
RIP: 0010:lookup_inline_extent_backref+0x647/0x680
RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
FS: 0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
Call Trace:
<TASK>
insert_inline_extent_backref+0x46/0xd0
__btrfs_inc_extent_ref.isra.0+0x5f/0x200
? btrfs_merge_delayed_refs+0x164/0x190
__btrfs_run_delayed_refs+0x561/0xfa0
? btrfs_search_slot+0x7b4/0xb30
? btrfs_update_root+0x1a9/0x2c0
btrfs_run_delayed_refs+0x73/0x1f0
? btrfs_update_root+0x1a9/0x2c0
btrfs_commit_transaction+0x50/0xa50
? btrfs_update_reloc_root+0x122/0x220
prepare_to_merge+0x29f/0x320
relocate_block_group+0x2b8/0x550
btrfs_relocate_block_group+0x1a6/0x350
btrfs_relocate_chunk+0x27/0xe0
btrfs_balance+0x777/0xe60
balance_kthread+0x35/0x50
? btrfs_balance+0xe60/0xe60
kthread+0x16b/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
</TASK>
Normally snapshot deletion and relocation are excluded from running at
the same time by the fs_info->cleaner_mutex. However if we had a
pending balance waiting to get the ->cleaner_mutex, and a snapshot
deletion was running, and then the box crashed, we would come up in a
state where we have a half deleted snapshot.
Again, in the normal case the snapshot deletion needs to complete before
relocation can start, but in this case relocation could very well start
before the snapshot deletion completes, as we simply add the root to the
dead roots list and wait for the next time the cleaner runs to clean up
the snapshot.
Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
had a pending drop_progress key. If they do then we know we were in the
middle of the drop operation and set a flag on the fs_info. Then
balance can wait until this flag is cleared to start up again.
If there are DEAD_ROOT's that don't have a drop_progress set then we're
safe to start balance right away as we'll be properly protected by the
cleaner_mutex.
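The ordering and wake-up rule described above can be modelled in userspace C. This is a simplified sketch under stated assumptions, not the kernel code: struct model_root and struct model_fs are hypothetical stand-ins for btrfs_root and btrfs_fs_info, plain booleans replace the BTRFS_ROOT_UNFINISHED_DROP and BTRFS_FS_UNFINISHED_DROPS bits, there is no locking or sleeping, and the fs-wide flag is set at queue time for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dead root; not the kernel's struct btrfs_root. */
struct model_root {
	bool unfinished_drop;		/* models BTRFS_ROOT_UNFINISHED_DROP */
	struct model_root *next;
};

/* Hypothetical stand-in for fs_info with its dead roots list. */
struct model_fs {
	bool unfinished_drops;		/* models BTRFS_FS_UNFINISHED_DROPS */
	struct model_root *head;
	struct model_root *tail;
};

/* Queue a dead root: half-dropped roots go to the front and set the fs flag. */
void model_add_dead_root(struct model_fs *fs, struct model_root *root)
{
	root->next = NULL;
	if (root->unfinished_drop) {
		fs->unfinished_drops = true;
		root->next = fs->head;
		fs->head = root;
		if (!fs->tail)
			fs->tail = root;
	} else {
		if (fs->tail)
			fs->tail->next = root;
		else
			fs->head = root;
		fs->tail = root;
	}
}

/* Drop the first queued root, then clear the flag if no half-dropped root is left. */
void model_drop_next(struct model_fs *fs)
{
	struct model_root *root = fs->head;

	if (!root)
		return;
	fs->head = root->next;
	if (!fs->head)
		fs->tail = NULL;
	/* Unfinished roots sit at the front, so inspecting the new head suffices. */
	if (root->unfinished_drop && !(fs->head && fs->head->unfinished_drop))
		fs->unfinished_drops = false;
}

/* Relocation is allowed to start only once the flag is clear. */
bool model_relocation_may_start(const struct model_fs *fs)
{
	return !fs->unfinished_drops;
}
```

Because half-dropped roots are always queued at the front, checking only the first list entry is enough to decide whether any remain, which is why the model never walks the whole list.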
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 947f04789389..ebb2d109e8bb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -602,6 +602,9 @@ enum {
/* Indicate that we want the transaction kthread to commit right now. */
BTRFS_FS_COMMIT_TRANS,
+ /* Indicate we have half completed snapshot deletions pending. */
+ BTRFS_FS_UNFINISHED_DROPS,
+
#if BITS_PER_LONG == 32
/* Indicate if we have error/warn message printed on 32bit systems */
BTRFS_FS_32BIT_ERROR,
@@ -1106,8 +1109,15 @@ enum {
BTRFS_ROOT_QGROUP_FLUSHING,
/* We started the orphan cleanup for this root. */
BTRFS_ROOT_ORPHAN_CLEANUP,
+ /* This root has a drop operation that was started previously. */
+ BTRFS_ROOT_UNFINISHED_DROP,
};
+static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
+{
+ clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
+}
+
/*
* Record swapped tree blocks of a subvolume tree for delayed subtree trace
* code. For detail check comment in fs/btrfs/qgroup.c.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 87a5addbedf6..48590a380762 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3813,6 +3813,10 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
set_bit(BTRFS_FS_OPEN, &fs_info->flags);
+ /* Kick the cleaner thread so it'll start deleting snapshots. */
+ if (test_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags))
+ wake_up_process(fs_info->cleaner_kthread);
+
clear_oneshot:
btrfs_clear_oneshot_options(fs_info);
return 0;
@@ -4538,6 +4542,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
*/
kthread_park(fs_info->cleaner_kthread);
+ /*
+ * If we had UNFINISHED_DROPS we could still be processing them, so
+ * clear that bit and wake up relocation so it can stop.
+ */
+ btrfs_wake_unfinished_drop(fs_info);
+
/* wait for the qgroup rescan worker to stop */
btrfs_qgroup_wait_for_completion(fs_info, false);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d89273c4b6b8..96427b1ecac3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5622,6 +5622,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
int ret;
int level;
bool root_dropped = false;
+ bool unfinished_drop = false;
btrfs_debug(fs_info, "Drop subvolume %llu", root->root_key.objectid);
@@ -5664,6 +5665,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
* already dropped.
*/
set_bit(BTRFS_ROOT_DELETING, &root->state);
+ unfinished_drop = test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state);
+
if (btrfs_disk_key_objectid(&root_item->drop_progress) == 0) {
level = btrfs_header_level(root->node);
path->nodes[level] = btrfs_lock_root_node(root);
@@ -5838,6 +5841,13 @@ out_free:
kfree(wc);
btrfs_free_path(path);
out:
+ /*
+ * We were an unfinished drop root, check to see if there are any
+ * pending, and if not clear and wake up any waiters.
+ */
+ if (!err && unfinished_drop)
+ btrfs_maybe_wake_unfinished_drop(fs_info);
+
/*
* So if we need to stop dropping the snapshot for whatever reason we
* need to make sure to add it back to the dead root list so that we
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f5465197996d..9d8054839782 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3960,6 +3960,19 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
int rw = 0;
int err = 0;
+ /*
+ * This only gets set if we had a half-deleted snapshot on mount. We
+ * cannot allow relocation to start while we're still trying to clean up
+ * these pending deletions.
+ */
+ ret = wait_on_bit(&fs_info->flags, BTRFS_FS_UNFINISHED_DROPS, TASK_INTERRUPTIBLE);
+ if (ret)
+ return ret;
+
+ /* We may have been woken up by close_ctree, so bail if we're closing. */
+ if (btrfs_fs_closing(fs_info))
+ return -EINTR;
+
bg = btrfs_lookup_block_group(fs_info, group_start);
if (!bg)
return -ENOENT;
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 3d68d2dcd83e..ca7426ef61c8 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -278,6 +278,21 @@ int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info)
WARN_ON(!test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state));
if (btrfs_root_refs(&root->root_item) == 0) {
+ struct btrfs_key drop_key;
+
+ btrfs_disk_key_to_cpu(&drop_key, &root->root_item.drop_progress);
+ /*
+ * If we have a non-zero drop_progress then we know we
+ * made it partly through deleting this snapshot, and
+ * thus we need to make sure we block any balance from
+ * happening until this snapshot is completely dropped.
+ */
+ if (drop_key.objectid != 0 || drop_key.type != 0 ||
+ drop_key.offset != 0) {
+ set_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
+ set_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state);
+ }
+
set_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
btrfs_add_dead_root(root);
}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c3cfdfd8de9b..f17bf3764ce8 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1319,6 +1319,32 @@ again:
return 0;
}
+/*
+ * If we had a pending drop we need to see if there are any others left in our
+ * dead roots list, and if not clear our bit and wake any waiters.
+ */
+void btrfs_maybe_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
+{
+ /*
+ * We put the drop in progress roots at the front of the list, so if the
+ * first entry doesn't have UNFINISHED_DROP set we can wake everybody
+ * up.
+ */
+ spin_lock(&fs_info->trans_lock);
+ if (!list_empty(&fs_info->dead_roots)) {
+ struct btrfs_root *root = list_first_entry(&fs_info->dead_roots,
+ struct btrfs_root,
+ root_list);
+ if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state)) {
+ spin_unlock(&fs_info->trans_lock);
+ return;
+ }
+ }
+ spin_unlock(&fs_info->trans_lock);
+
+ btrfs_wake_unfinished_drop(fs_info);
+}
+
/*
* dead roots are old snapshots that need to be deleted. This allocates
* a dirty root struct and adds it into the list of dead roots that need to
@@ -1331,7 +1357,12 @@ void btrfs_add_dead_root(struct btrfs_root *root)
spin_lock(&fs_info->trans_lock);
if (list_empty(&root->root_list)) {
btrfs_grab_root(root);
- list_add_tail(&root->root_list, &fs_info->dead_roots);
+
+ /* We want to process the partially complete drops first. */
+ if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state))
+ list_add(&root->root_list, &fs_info->dead_roots);
+ else
+ list_add_tail(&root->root_list, &fs_info->dead_roots);
}
spin_unlock(&fs_info->trans_lock);
}
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 9402d8d94484..ba8a9826eb37 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -216,6 +216,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid);
void btrfs_add_dead_root(struct btrfs_root *root);
int btrfs_defrag_root(struct btrfs_root *root);
+void btrfs_maybe_wake_unfinished_drop(struct btrfs_fs_info *fs_info);
int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
int btrfs_commit_transaction(struct btrfs_trans_handle *trans);
void btrfs_commit_transaction_async(struct btrfs_trans_handle *trans);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 18 Feb 2022 10:17:39 -0500
Subject: [PATCH] btrfs: do not WARN_ON() if we have PageError set
Whenever we do any extent buffer operations we call
assert_eb_page_uptodate() to complain loudly if we're operating on a
non-uptodate page. Our overnight tests caught this warning earlier this
week:
WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G W 5.17.0-rc3+ #564
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: btrfs-cache btrfs_work_helper
RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
FS: 0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
Call Trace:
extent_buffer_test_bit+0x3f/0x70
free_space_test_bit+0xa6/0xc0
load_free_space_tree+0x1f6/0x470
caching_thread+0x454/0x630
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? lock_release+0x1f0/0x2d0
btrfs_work_helper+0xf2/0x3e0
? lock_release+0x1f0/0x2d0
? finish_task_switch.isra.0+0xf9/0x3a0
process_one_work+0x26d/0x580
? process_one_work+0x580/0x580
worker_thread+0x55/0x3b0
? process_one_work+0x580/0x580
kthread+0xf0/0x120
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer
uptodate when we fail to write it"), however all that fix did was keep
us from finding extent buffers after a failed writeout. It didn't keep
us from continuing to use a buffer that we already had found.
In this case we're searching the commit root to cache the block group,
so we can start committing the transaction and switch the commit root
and then start writing. After the switch we can look up an extent
buffer that hasn't been written yet and start processing that block
group. Then we fail to write that block out and clear Uptodate on the
page, and then we start spewing these errors.
Normally we're protected by the tree lock to a certain degree here. If
we read a block we have that block read locked, and we block the writer
from locking the block before we submit it for the write. However this
isn't necessarily foolproof because the read could happen before we do
the submit_bio and after we locked and unlocked the extent buffer.
Also in this particular case we have path->skip_locking set, so that
won't save us here. We'll simply get a block that was valid when we
read it, but became invalid while we were using it.
What we really want is to catch the case where we've "read" a block but
it's not marked Uptodate. On read we ClearPageError(), so if we're
!Uptodate and !Error we know we didn't do the right thing for reading
the page.
Fix this by checking !Uptodate && !Error, this way we will not complain
if our buffer gets invalidated while we're using it, and we'll maintain
the spirit of the check which is to make sure we have a fully in-cache
block while we're messing with it.
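The difference between the old and the new condition can be written out as two predicates. This is a minimal illustration only: warn_old() and warn_new() are hypothetical names, and the two booleans stand in for the Uptodate and Error page state rather than calling any kernel API.

```c
#include <stdbool.h>

/* Old condition: complain whenever the page is not uptodate. */
bool warn_old(bool uptodate, bool error)
{
	(void)error;	/* the old check ignored the error state */
	return !uptodate;
}

/* New condition: complain only if the page is not uptodate and no error is recorded. */
bool warn_new(bool uptodate, bool error)
{
	return !uptodate && !error;
}
```

The only input on which the two differ is a page that is not uptodate but has the error state set, which is the failed-writeout case described above; a page that was never read properly (not uptodate, no error) still triggers the warning.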
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d6d48ecf823c..9081223c3230 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6851,14 +6851,24 @@ static void assert_eb_page_uptodate(const struct extent_buffer *eb,
{
struct btrfs_fs_info *fs_info = eb->fs_info;
+ /*
+ * If we are using the commit root we could potentially clear a page
+ * Uptodate while we're using the extent buffer that we've previously
+ * looked up. We don't want to complain in this case, as the page was
+ * valid before, we just didn't write it out. Instead we want to catch
+ * the case where we didn't actually read the block properly, which
+ * would have !PageUptodate && !PageError, as we clear PageError before
+ * reading.
+ */
if (fs_info->sectorsize < PAGE_SIZE) {
- bool uptodate;
+ bool uptodate, error;
uptodate = btrfs_subpage_test_uptodate(fs_info, page,
eb->start, eb->len);
- WARN_ON(!uptodate);
+ error = btrfs_subpage_test_error(fs_info, page, eb->start, eb->len);
+ WARN_ON(!uptodate && !error);
} else {
- WARN_ON(!PageUptodate(page));
+ WARN_ON(!PageUptodate(page) && !PageError(page));
}
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 18 Feb 2022 10:17:39 -0500
Subject: [PATCH] btrfs: do not WARN_ON() if we have PageError set
Whenever we do any extent buffer operations we call
assert_eb_page_uptodate() to complain loudly if we're operating on a
non-uptodate page. Our overnight tests caught this warning earlier this
week:
WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G W 5.17.0-rc3+ #564
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: btrfs-cache btrfs_work_helper
RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
FS: 0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
Call Trace:
extent_buffer_test_bit+0x3f/0x70
free_space_test_bit+0xa6/0xc0
load_free_space_tree+0x1f6/0x470
caching_thread+0x454/0x630
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? lock_release+0x1f0/0x2d0
btrfs_work_helper+0xf2/0x3e0
? lock_release+0x1f0/0x2d0
? finish_task_switch.isra.0+0xf9/0x3a0
process_one_work+0x26d/0x580
? process_one_work+0x580/0x580
worker_thread+0x55/0x3b0
? process_one_work+0x580/0x580
kthread+0xf0/0x120
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer
uptodate when we fail to write it"), however all that fix did was keep
us from finding extent buffers after a failed writeout. It didn't keep
us from continuing to use a buffer that we already had found.
In this case we're searching the commit root to cache the block group,
so we can start committing the transaction and switch the commit root
and then start writing. After the switch we can look up an extent
buffer that hasn't been written yet and start processing that block
group. Then we fail to write that block out and clear Uptodate on the
page, and then we start spewing these errors.
Normally we're protected by the tree lock to a certain degree here. If
we read a block we have that block read locked, and we block the writer
from locking the block before we submit it for the write. However this
isn't necessarily foolproof because the read could happen before we do
the submit_bio and after we locked and unlocked the extent buffer.
Also in this particular case we have path->skip_locking set, so that
won't save us here. We'll simply get a block that was valid when we
read it, but became invalid while we were using it.
What we really want is to catch the case where we've "read" a block but
it's not marked Uptodate. On read we ClearPageError(), so if we're
!Uptodate and !Error we know we didn't do the right thing for reading
the page.
Fix this by checking !Uptodate && !Error, this way we will not complain
if our buffer gets invalidated while we're using it, and we'll maintain
the spirit of the check which is to make sure we have a fully in-cache
block while we're messing with it.
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d6d48ecf823c..9081223c3230 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6851,14 +6851,24 @@ static void assert_eb_page_uptodate(const struct extent_buffer *eb,
{
struct btrfs_fs_info *fs_info = eb->fs_info;
+ /*
+ * If we are using the commit root we could potentially clear a page
+ * Uptodate while we're using the extent buffer that we've previously
+ * looked up. We don't want to complain in this case, as the page was
+ * valid before, we just didn't write it out. Instead we want to catch
+ * the case where we didn't actually read the block properly, which
+ * would have !PageUptodate && !PageError, as we clear PageError before
+ * reading.
+ */
if (fs_info->sectorsize < PAGE_SIZE) {
- bool uptodate;
+ bool uptodate, error;
uptodate = btrfs_subpage_test_uptodate(fs_info, page,
eb->start, eb->len);
- WARN_ON(!uptodate);
+ error = btrfs_subpage_test_error(fs_info, page, eb->start, eb->len);
+ WARN_ON(!uptodate && !error);
} else {
- WARN_ON(!PageUptodate(page));
+ WARN_ON(!PageUptodate(page) && !PageError(page));
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d99478874355d3a7b9d86dfb5d7590d5b1754b1f Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 17 Feb 2022 12:12:02 +0000
Subject: [PATCH] btrfs: fix lost prealloc extents beyond eof after full fsync
When doing a full fsync, if we have prealloc extents beyond (or at) eof,
and the leaves that contain them were not modified in the current
transaction, we end up not logging them. This results in losing those
extents when we replay the log after a power failure, since the inode is
truncated to the current value of the logged i_size.
Just like for the fast fsync path, we need to always log all prealloc
extents starting at or beyond i_size. The fast fsync case was fixed in
commit 471d557afed155 ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay") but it missed the full fsync path. The problem
exists since the very early days, when the log tree was added by
commit e02119d5a7b439 ("Btrfs: Add a write ahead tree log to optimize
synchronous operations").
Example reproducer:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
# Create our test file with many file extent items, so that they span
# several leaves of metadata, even if the node/page size is 64K. Use
# direct IO and not fsync/O_SYNC because it's both faster and it avoids
# clearing the full sync flag from the inode - we want the fsync below
# to trigger the slow full sync code path.
$ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo
# Now add two preallocated extents to our file without extending the
# file's size. One right at i_size, and another further beyond, leaving
# a gap between the two prealloc extents.
$ xfs_io -c "falloc -k 16M 1M" /mnt/foo
$ xfs_io -c "falloc -k 20M 1M" /mnt/foo
# Make sure everything is durably persisted and the transaction is
# committed. This makes all created extents have a generation lower
# than the generation of the transaction used by the next write and
# fsync.
$ sync
# Now overwrite only the first extent, which will result in modifying
# only the first leaf of metadata for our inode. Then fsync it. This
# fsync will use the slow code path (inode full sync bit is set) because
# it's the first fsync since the inode was created/loaded.
$ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo
# Extent list before power failure.
$ xfs_io -c "fiemap -v" /mnt/foo
/mnt/foo:
 EXT: FILE-OFFSET       BLOCK-RANGE          TOTAL  FLAGS
   0: [0..7]:           2178048..2178055         8  0x0
   1: [8..16383]:       26632..43007         16376  0x0
   2: [16384..32767]:   2156544..2172927     16384  0x0
   3: [32768..34815]:   2172928..2174975      2048  0x800
   4: [34816..40959]:   hole                  6144
   5: [40960..43007]:   2174976..2177023      2048  0x801
<power fail>
# Mount fs again, trigger log replay.
$ mount /dev/sdc /mnt
# Extent list after power failure and log replay.
$ xfs_io -c "fiemap -v" /mnt/foo
/mnt/foo:
 EXT: FILE-OFFSET       BLOCK-RANGE          TOTAL  FLAGS
   0: [0..7]:           2178048..2178055         8  0x0
   1: [8..16383]:       26632..43007         16376  0x0
   2: [16384..32767]:   2156544..2172927     16384  0x1
# The prealloc extents at file offsets 16M and 20M are missing.
So fix this by calling btrfs_log_prealloc_extents() when we are doing a
full fsync, so that we always log all prealloc extents beyond eof.
A test case for fstests will follow soon.
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3ee014c06b82..42caf595c936 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4635,7 +4635,7 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
/*
* Log all prealloc extents beyond the inode's i_size to make sure we do not
- * lose them after doing a fast fsync and replaying the log. We scan the
+ * lose them after doing a full/fast fsync and replaying the log. We scan the
* subvolume's root instead of iterating the inode's extent map tree because
* otherwise we can log incorrect extent items based on extent map conversion.
* That can happen due to the fact that extent maps are merged when they
@@ -5414,6 +5414,7 @@ static int copy_inode_items_to_log(struct btrfs_trans_handle *trans,
struct btrfs_log_ctx *ctx,
bool *need_log_inode_item)
{
+ const u64 i_size = i_size_read(&inode->vfs_inode);
struct btrfs_root *root = inode->root;
int ins_start_slot = 0;
int ins_nr = 0;
@@ -5434,13 +5435,21 @@ static int copy_inode_items_to_log(struct btrfs_trans_handle *trans,
if (min_key->type > max_key->type)
break;
- if (min_key->type == BTRFS_INODE_ITEM_KEY)
+ if (min_key->type == BTRFS_INODE_ITEM_KEY) {
*need_log_inode_item = false;
-
- if ((min_key->type == BTRFS_INODE_REF_KEY ||
- min_key->type == BTRFS_INODE_EXTREF_KEY) &&
- inode->generation == trans->transid &&
- !recursive_logging) {
+ } else if (min_key->type == BTRFS_EXTENT_DATA_KEY &&
+ min_key->offset >= i_size) {
+ /*
+ * Extents at and beyond eof are logged with
+ * btrfs_log_prealloc_extents().
+ * Only regular files have BTRFS_EXTENT_DATA_KEY keys,
+ * and no keys greater than that, so bail out.
+ */
+ break;
+ } else if ((min_key->type == BTRFS_INODE_REF_KEY ||
+ min_key->type == BTRFS_INODE_EXTREF_KEY) &&
+ inode->generation == trans->transid &&
+ !recursive_logging) {
u64 other_ino = 0;
u64 other_parent = 0;
@@ -5471,10 +5480,8 @@ static int copy_inode_items_to_log(struct btrfs_trans_handle *trans,
btrfs_release_path(path);
goto next_key;
}
- }
-
- /* Skip xattrs, we log them later with btrfs_log_all_xattrs() */
- if (min_key->type == BTRFS_XATTR_ITEM_KEY) {
+ } else if (min_key->type == BTRFS_XATTR_ITEM_KEY) {
+ /* Skip xattrs, logged later with btrfs_log_all_xattrs() */
if (ins_nr == 0)
goto next_slot;
ret = copy_items(trans, inode, dst_path, path,
@@ -5527,9 +5534,21 @@ static int copy_inode_items_to_log(struct btrfs_trans_handle *trans,
break;
}
}
- if (ins_nr)
+ if (ins_nr) {
ret = copy_items(trans, inode, dst_path, path, ins_start_slot,
ins_nr, inode_only, logged_isize);
+ if (ret)
+ return ret;
+ }
+
+ if (inode_only == LOG_INODE_ALL && S_ISREG(inode->vfs_inode.i_mode)) {
+ /*
+ * Release the path because otherwise we might attempt to double
+ * lock the same leaf with btrfs_log_prealloc_extents() below.
+ */
+ btrfs_release_path(path);
+ ret = btrfs_log_prealloc_extents(trans, inode, dst_path);
+ }
return ret;
}
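For readers who want the control flow without the surrounding btrfs machinery, here is a minimal standalone sketch of what the hunks above arrange: the generic copy loop stops at the first extent item at or beyond i_size, and a dedicated pass then handles every such extent. Everything named model_* and the MODEL_* constants are hypothetical stand-ins invented for this illustration; they are not btrfs code, and the sketch only counts items instead of logging them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item kinds; the values only need to sort extent data last. */
enum model_item_type {
	MODEL_INODE_ITEM = 1,
	MODEL_INODE_REF = 12,
	MODEL_XATTR_ITEM = 24,
	MODEL_EXTENT_DATA = 108,
};

struct model_item {
	enum model_item_type type;
	uint64_t offset;	/* file offset, meaningful for extent items */
};

/*
 * Generic copy loop: walk the items in key order and count the ones that
 * would be copied. Extent items at or beyond i_size end the walk, because
 * the dedicated pass below is responsible for them.
 */
static size_t model_copy_items(const struct model_item *items, size_t n,
			       uint64_t i_size)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		if (items[i].type == MODEL_EXTENT_DATA &&
		    items[i].offset >= i_size)
			break;
		if (items[i].type == MODEL_XATTR_ITEM)
			continue;	/* xattrs are handled separately */
		copied++;
	}
	return copied;
}

/* Dedicated pass: count every extent item at or beyond i_size. */
static size_t model_log_prealloc(const struct model_item *items, size_t n,
				 uint64_t i_size)
{
	size_t logged = 0;

	for (size_t i = 0; i < n; i++)
		if (items[i].type == MODEL_EXTENT_DATA &&
		    items[i].offset >= i_size)
			logged++;
	return logged;
}
```

With an i_size of 16M and extent items at offsets 0, 16M and 20M (the layout from the reproducer), the first function stops before the 16M extent and the second one picks up both extents beyond eof, regardless of which leaves were modified.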
From: Yunfei Wang <yf.wang(a)mediatek.com>
In alloc_iova_fast(), if an IOVA allocation request fails, the function
frees the IOVA ranges held in the per-CPU rcaches and in the global
rcache and then retries. However, it flushes the per-CPU rcaches only
for each online CPU, so the rcaches of CPUs that are not online are
never flushed. The cleaning is therefore incomplete, and because the
remaining rcaches can also fragment the IOVA space, the retry may
still fail.
Flush the rcaches of every possible CPU instead: use
for_each_possible_cpu() rather than for_each_online_cpu(), as
free_iova_rcaches() already does, so that all rcaches are completely
released before the allocation is retried.
Signed-off-by: Yunfei Wang <yf.wang(a)mediatek.com>
Cc: <stable(a)vger.kernel.org> # 5.4.*
---
drivers/iommu/iova.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b28c9435b898..5a0637cd7bc2 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -460,7 +460,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
/* Try replenishing IOVAs by flushing rcache. */
flush_rcache = false;
- for_each_online_cpu(cpu)
+ for_each_possible_cpu(cpu)
free_cpu_cached_iovas(cpu, iovad);
free_global_cached_iovas(iovad);
goto retry;
--
2.18.0
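The difference between the two iterators can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_cpu_cache and the model_* functions are hypothetical and merely imitate the idea of per-CPU caches where some CPUs are offline; they do not use the kernel's cpumask or rcache APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU cache state for the illustration. */
struct model_cpu_cache {
	bool online;		/* is this CPU currently online? */
	unsigned int cached;	/* ranges parked in this CPU's cache */
};

/* Imitates flushing with an "online CPUs only" iterator. */
static void model_flush_online(struct model_cpu_cache *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (c[i].online)
			c[i].cached = 0;
}

/* Imitates flushing with an "all possible CPUs" iterator. */
static void model_flush_possible(struct model_cpu_cache *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		c[i].cached = 0;
}

/* Ranges that are still parked in some cache and thus unavailable. */
static unsigned int model_total_cached(const struct model_cpu_cache *c,
				       size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += c[i].cached;
	return total;
}
```

If a CPU went offline while its cache still held ranges, the online-only flush leaves those ranges parked, which is the situation in which the retry can still fail.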
The patch titled
Subject: memcg: sync flush only if periodic flush is delayed
has been added to the -mm tree. Its filename is
memcg-sync-flush-only-if-periodic-flush-is-delayed.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/memcg-sync-flush-only-if-periodic…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/memcg-sync-flush-only-if-periodic…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: memcg: sync flush only if periodic flush is delayed
Daniel Dao has reported [1] a regression on workloads that may trigger a
lot of refaults (anon and file). The underlying issue is that flushing
rstat is expensive. Although rstat flushes are batched with (nr_cpus *
MEMCG_BATCH) stat updates, it seems like there are workloads which
genuinely do more stat updates than the batch value within a short amount
of time. Since the rstat flush can happen in performance critical
codepaths like page faults, such workloads can suffer greatly.
This patch fixes this regression by making the rstat flushing conditional
in the performance critical codepaths. More specifically, the kernel
relies on the async periodic rstat flusher to flush the stats and only if
the periodic flusher is delayed by more than twice the amount of its
normal time window then the kernel allows rstat flushing from the
performance critical codepaths.
Now the question: what are the side-effects of this change? The worst
that can happen is the refault codepath will see 4sec old lruvec stats and
may cause false (or missed) activations of the refaulted page which may
under-or-overestimate the workingset size. Though that is not very
concerning as the kernel can already miss or do false activations.
There are two more codepaths whose flushing behavior is not changed by
this patch and we may need to come to them in future. One is the
writeback stats used by dirty throttling and second is the deactivation
heuristic in the reclaim. For now we are keeping an eye on them, and if
there is a report of a regression due to these codepaths, we will
reevaluate then.
Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndg… [1]
Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Daniel Dao <dqminh(a)cloudflare.com>
Tested-by: Ivan Babrou <ivan(a)cloudflare.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Koutný <mkoutny(a)suse.com>
Cc: Frank Hofmann <fhofmann(a)cloudflare.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 5 +++++
mm/memcontrol.c | 12 +++++++++++-
mm/workingset.c | 2 +-
3 files changed, 17 insertions(+), 2 deletions(-)
--- a/include/linux/memcontrol.h~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/include/linux/memcontrol.h
@@ -999,6 +999,7 @@ static inline unsigned long lruvec_page_
}
void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_delayed(void);
void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
@@ -1442,6 +1443,10 @@ static inline void mem_cgroup_flush_stat
{
}
+static inline void mem_cgroup_flush_stats_delayed(void)
+{
+}
+
static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx, int val)
{
--- a/mm/memcontrol.c~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/mm/memcontrol.c
@@ -628,6 +628,9 @@ static DECLARE_DEFERRABLE_WORK(stats_flu
static DEFINE_SPINLOCK(stats_flush_lock);
static DEFINE_PER_CPU(unsigned int, stats_updates);
static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+static u64 flush_next_time;
+
+#define FLUSH_TIME (2UL*HZ)
static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
@@ -649,6 +652,7 @@ static void __mem_cgroup_flush_stats(voi
if (!spin_trylock_irqsave(&stats_flush_lock, flag))
return;
+ flush_next_time = jiffies_64 + 2*FLUSH_TIME;
cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
atomic_set(&stats_flush_threshold, 0);
spin_unlock_irqrestore(&stats_flush_lock, flag);
@@ -660,10 +664,16 @@ void mem_cgroup_flush_stats(void)
__mem_cgroup_flush_stats();
}
+void mem_cgroup_flush_stats_delayed(void)
+{
+ if (rstat_flush_time && time_after64(jiffies_64, flush_next_time))
+ mem_cgroup_flush_stats();
+}
+
static void flush_memcg_stats_dwork(struct work_struct *w)
{
__mem_cgroup_flush_stats();
- queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
+ queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}
/**
--- a/mm/workingset.c~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/mm/workingset.c
@@ -354,7 +354,7 @@ void workingset_refault(struct folio *fo
mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
- mem_cgroup_flush_stats();
+ mem_cgroup_flush_stats_delayed();
/*
* Compare the distance to the existing workingset size. We
* don't activate pages that couldn't stay resident even if
_
Patches currently in -mm which might be from shakeelb(a)google.com are
memcg-sync-flush-only-if-periodic-flush-is-delayed.patch
memcg-replace-in_interrupt-with-in_task.patch
memcg-refactor-mem_cgroup_oom.patch
memcg-unify-force-charging-conditions.patch
selftests-memcg-test-high-limit-for-single-entry-allocation.patch
memcg-synchronously-enforce-memoryhigh-for-large-overcharges.patch
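The deadline logic described in the changelog can be tried out in userspace with the following sketch. It rests on several assumptions: MODEL_HZ is an arbitrary tick rate, the model_* names are hypothetical, and model_time_after64() only imitates the wrap-safe comparison idea behind the kernel's time_after64(). The extra rstat_flush_time term in the posted condition is not modelled, since its definition is not part of the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HZ		250ULL			/* assumed tick rate */
#define MODEL_FLUSH_TIME	(2ULL * MODEL_HZ)	/* 2 seconds in ticks */

struct model_flush_state {
	uint64_t flush_next_time;	/* deadline for hot-path flushing */
	unsigned int flushes;		/* number of flushes performed */
};

/* Wrap-safe "a is after b" on 64-bit tick counters. */
static bool model_time_after64(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/* Any flush pushes the deadline two periods into the future. */
static void model_flush(struct model_flush_state *s, uint64_t now)
{
	s->flush_next_time = now + 2 * MODEL_FLUSH_TIME;
	s->flushes++;
}

/* Hot path: flush only when the periodic flusher has fallen behind. */
static bool model_flush_delayed(struct model_flush_state *s, uint64_t now)
{
	if (!model_time_after64(now, s->flush_next_time))
		return false;
	model_flush(s, now);
	return true;
}
```

With a period of 2 seconds and a deadline of twice that, the hot path stays out of the way for up to 4 seconds after the last flush, which matches the "4sec old lruvec stats" worst case mentioned in the changelog.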
The patch titled
Subject: memcg: sync flush only if periodic flush is delayed
has been removed from the -mm tree. Its filename was
memcg-sync-flush-only-if-periodic-flush-is-delayed.patch
This patch was dropped because it had testing failures
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: memcg: sync flush only if periodic flush is delayed
Daniel Dao has reported [1] a regression on workloads that may trigger a
lot of refaults (anon and file). The underlying issue is that flushing
rstat is expensive. Although rstat flushes are batched with (nr_cpus *
MEMCG_BATCH) stat updates, it seems like there are workloads which
genuinely do more stat updates than the batch value within a short amount
of time. Since the rstat flush can happen in performance critical
codepaths like page faults, such workloads can suffer greatly.
This patch fixes this regression by making the rstat flushing conditional
in the performance critical codepaths. More specifically, the kernel
relies on the async periodic rstat flusher to flush the stats and only if
the periodic flusher is delayed by more than twice the amount of its
normal time window then the kernel allows rstat flushing from the
performance critical codepaths.
Now the question: what are the side-effects of this change? The worst
that can happen is the refault codepath will see 4sec old lruvec stats and
may cause false (or missed) activations of the refaulted page which may
under-or-overestimate the workingset size. Though that is not very
concerning as the kernel can already miss or do false activations.
There are two more codepaths whose flushing behavior is not changed by
this patch and we may need to come to them in future. One is the
writeback stats used by dirty throttling and second is the deactivation
heuristic in the reclaim. For now we are keeping an eye on them, and if
there is a report of a regression due to these codepaths, we will
reevaluate then.
Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndg… [1]
Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Daniel Dao <dqminh(a)cloudflare.com>
Tested-by: Ivan Babrou <ivan(a)cloudflare.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Koutný <mkoutny(a)suse.com>
Cc: Frank Hofmann <fhofmann(a)cloudflare.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 5 +++++
mm/memcontrol.c | 12 +++++++++++-
mm/workingset.c | 2 +-
3 files changed, 17 insertions(+), 2 deletions(-)
--- a/include/linux/memcontrol.h~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/include/linux/memcontrol.h
@@ -999,6 +999,7 @@ static inline unsigned long lruvec_page_
}
void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_delayed(void);
void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
@@ -1442,6 +1443,10 @@ static inline void mem_cgroup_flush_stat
{
}
+static inline void mem_cgroup_flush_stats_delayed(void)
+{
+}
+
static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx, int val)
{
--- a/mm/memcontrol.c~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/mm/memcontrol.c
@@ -628,6 +628,9 @@ static DECLARE_DEFERRABLE_WORK(stats_flu
static DEFINE_SPINLOCK(stats_flush_lock);
static DEFINE_PER_CPU(unsigned int, stats_updates);
static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+static u64 flush_next_time;
+
+#define FLUSH_TIME (2UL*HZ)
static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
@@ -649,6 +652,7 @@ static void __mem_cgroup_flush_stats(voi
if (!spin_trylock_irqsave(&stats_flush_lock, flag))
return;
+ flush_next_time = jiffies_64 + 2*FLUSH_TIME;
cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
atomic_set(&stats_flush_threshold, 0);
spin_unlock_irqrestore(&stats_flush_lock, flag);
@@ -660,10 +664,16 @@ void mem_cgroup_flush_stats(void)
__mem_cgroup_flush_stats();
}
+void mem_cgroup_flush_stats_delayed(void)
+{
+ if (rstat_flush_time && time_after64(jiffies_64, flush_next_time))
+ mem_cgroup_flush_stats();
+}
+
static void flush_memcg_stats_dwork(struct work_struct *w)
{
__mem_cgroup_flush_stats();
- queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
+ queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}
/**
--- a/mm/workingset.c~memcg-sync-flush-only-if-periodic-flush-is-delayed
+++ a/mm/workingset.c
@@ -354,7 +354,7 @@ void workingset_refault(struct folio *fo
mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
- mem_cgroup_flush_stats();
+ mem_cgroup_flush_stats_delayed();
/*
* Compare the distance to the existing workingset size. We
* don't activate pages that couldn't stay resident even if
_
Patches currently in -mm which might be from shakeelb(a)google.com are
memcg-replace-in_interrupt-with-in_task.patch
memcg-refactor-mem_cgroup_oom.patch
memcg-unify-force-charging-conditions.patch
selftests-memcg-test-high-limit-for-single-entry-allocation.patch
memcg-synchronously-enforce-memoryhigh-for-large-overcharges.patch
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 96403e11283def1d1c465c8279514c9a504d8630 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Fri, 4 Mar 2022 20:28:55 -0800
Subject: [PATCH] mm: prevent vm_area_struct::anon_name refcount saturation
The anon_vma_name refcount of a deep process chain with many vmas could
grow really high. With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).
Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults. Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (though that still requires heavy sharing of that anon_vma_name
between processes).
The kref refcounting interface used in the anon_vma_name structure will
detect a counter overflow when it reaches the REFCOUNT_SATURATED value,
but will only generate a warning and freeze the ref counter. This would
lead to the refcounted object never being freed. A determined attacker
could leak memory like that, but it would be a rather expensive and
inefficient way to do so.
To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED. This should provide enough headroom for raising the
refcounts temporarily.
Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Suggested-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Alexey Gladkov <legion(a)kernel.org>
Cc: Chris Hyser <chris.hyser(a)oracle.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Colin Cross <ccross(a)google.com>
Cc: Cyrill Gorcunov <gorcunov(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng(a)yulong.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index dd3accaa4e6d..cf90b1fa2c60 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -161,15 +161,25 @@ static inline void anon_vma_name_put(struct anon_vma_name *anon_name)
kref_put(&anon_name->kref, anon_vma_name_free);
}
+static inline
+struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name)
+{
+ /* Prevent anon_name refcount saturation early on */
+ if (kref_read(&anon_name->kref) < REFCOUNT_MAX) {
+ anon_vma_name_get(anon_name);
+ return anon_name;
+
+ }
+ return anon_vma_name_alloc(anon_name->name);
+}
+
static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma,
struct vm_area_struct *new_vma)
{
struct anon_vma_name *anon_name = anon_vma_name(orig_vma);
- if (anon_name) {
- anon_vma_name_get(anon_name);
- new_vma->anon_name = anon_name;
- }
+ if (anon_name)
+ new_vma->anon_name = anon_vma_name_reuse(anon_name);
}
static inline void free_anon_vma_name(struct vm_area_struct *vma)
diff --git a/mm/madvise.c b/mm/madvise.c
index 081b1cded21e..1f2693dccf7b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -113,8 +113,7 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
if (anon_vma_name_eq(orig_name, anon_name))
return 0;
- anon_vma_name_get(anon_name);
- vma->anon_name = anon_name;
+ vma->anon_name = anon_vma_name_reuse(anon_name);
anon_vma_name_put(orig_name);
return 0;
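The figures in the changelog and the share-or-copy decision can be checked with a short standalone sketch. The MODEL_* constants and model_* functions are hypothetical names for this illustration only. In particular, the factors 65535 and 32768 are an assumption: they are the values that reproduce the product 2147450880 quoted in the message. The sketch also assumes a platform where INT_MAX is 2147483647.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed factors behind the "64k" and "32k" figures in the changelog. */
#define MODEL_MAX_MAP_COUNT		65535ULL
#define MODEL_PID_MAX			32768ULL
/* Limits as quoted in the changelog. */
#define MODEL_REFCOUNT_MAX		((uint64_t)INT_MAX)
#define MODEL_REFCOUNT_SATURATED	3221225472ULL

/* Upper bound on vmas in the system under the assumed defaults. */
static uint64_t model_max_vmas(void)
{
	return MODEL_MAX_MAP_COUNT * MODEL_PID_MAX;
}

/* Distance from a given count to the saturation value. */
static uint64_t model_headroom(uint64_t count)
{
	return MODEL_REFCOUNT_SATURATED - count;
}

struct model_name {
	uint64_t refs;
};

/*
 * Share-or-copy decision: take another reference while the count is
 * below the limit; otherwise tell the caller to make a private copy.
 */
static bool model_try_reuse(struct model_name *name)
{
	if (name->refs < MODEL_REFCOUNT_MAX) {
		name->refs++;
		return true;
	}
	return false;
}
```

Under these assumptions the exact distance from REFCOUNT_MAX to the saturation value works out to 1073741825, in line with the roughly INT_MAX/2 of headroom the changelog cites.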
The patch titled
Subject: memfd: fix F_SEAL_WRITE after shmem huge page allocated
has been removed from the -mm tree. Its filename was
memfd-fix-f_seal_write-after-shmem-huge-page-allocated.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: memfd: fix F_SEAL_WRITE after shmem huge page allocated
Wangyong reports: after enabling tmpfs filesystem to support transparent
hugepage with the following command:
echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:
fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.
Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict. Bear in mind
that total_mapcount() itself scans all of the THP subpages, when choosing
to take an XA_CHECK_SCHED latency break.
Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.
Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Zeal Robot <zealci(a)zte.com.cn>
Reported-by: wangyong <wang.yong12(a)zte.com.cn>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: CGEL ZTE <cgel.zte(a)gmail.com>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memfd.c | 40 ++++++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)
--- a/mm/memfd.c~memfd-fix-f_seal_write-after-shmem-huge-page-allocated
+++ a/mm/memfd.c
@@ -31,20 +31,28 @@
static void memfd_tag_pins(struct xa_state *xas)
{
struct page *page;
- unsigned int tagged = 0;
+ int latency = 0;
+ int cache_count;
lru_add_drain();
xas_lock_irq(xas);
xas_for_each(xas, page, ULONG_MAX) {
- if (xa_is_value(page))
- continue;
- page = find_subpage(page, xas->xa_index);
- if (page_count(page) - page_mapcount(page) > 1)
+ cache_count = 1;
+ if (!xa_is_value(page) &&
+ PageTransHuge(page) && !PageHuge(page))
+ cache_count = HPAGE_PMD_NR;
+
+ if (!xa_is_value(page) &&
+ page_count(page) - total_mapcount(page) != cache_count)
xas_set_mark(xas, MEMFD_TAG_PINNED);
+ if (cache_count != 1)
+ xas_set(xas, page->index + cache_count);
- if (++tagged % XA_CHECK_SCHED)
+ latency += cache_count;
+ if (latency < XA_CHECK_SCHED)
continue;
+ latency = 0;
xas_pause(xas);
xas_unlock_irq(xas);
@@ -73,7 +81,8 @@ static int memfd_wait_for_pins(struct ad
error = 0;
for (scan = 0; scan <= LAST_SCAN; scan++) {
- unsigned int tagged = 0;
+ int latency = 0;
+ int cache_count;
if (!xas_marked(&xas, MEMFD_TAG_PINNED))
break;
@@ -87,10 +96,14 @@ static int memfd_wait_for_pins(struct ad
xas_lock_irq(&xas);
xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) {
bool clear = true;
- if (xa_is_value(page))
- continue;
- page = find_subpage(page, xas.xa_index);
- if (page_count(page) - page_mapcount(page) != 1) {
+
+ cache_count = 1;
+ if (!xa_is_value(page) &&
+ PageTransHuge(page) && !PageHuge(page))
+ cache_count = HPAGE_PMD_NR;
+
+ if (!xa_is_value(page) && cache_count !=
+ page_count(page) - total_mapcount(page)) {
/*
* On the last scan, we clean up all those tags
* we inserted; but make a note that we still
@@ -103,8 +116,11 @@ static int memfd_wait_for_pins(struct ad
}
if (clear)
xas_clear_mark(&xas, MEMFD_TAG_PINNED);
- if (++tagged % XA_CHECK_SCHED)
+
+ latency += cache_count;
+ if (latency < XA_CHECK_SCHED)
continue;
+ latency = 0;
xas_pause(&xas);
xas_unlock_irq(&xas);
_
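The pin test and the latency accounting from the diff above can be modelled in isolation as follows. This is a sketch, not kernel code: struct model_page and the model_* helpers are hypothetical, MODEL_HPAGE_NR assumes 512 base pages per huge page, and MODEL_CHECK_SCHED assumes a budget of 4096 entries between latency breaks. The real values depend on the architecture and kernel configuration.

```c
#include <stdbool.h>

#define MODEL_HPAGE_NR		512	/* assumed subpages per huge page */
#define MODEL_CHECK_SCHED	4096	/* assumed entries between breaks */

struct model_page {
	bool huge;		/* transparent huge page head? */
	int refcount;		/* stand-in for page_count() */
	int total_mapcount;	/* stand-in for total_mapcount() */
};

/* References the page cache itself is expected to hold on this page. */
static int model_cache_count(const struct model_page *p)
{
	return p->huge ? MODEL_HPAGE_NR : 1;
}

/* Strict check: any reference beyond cache and mappings counts as a pin. */
static bool model_is_pinned(const struct model_page *p)
{
	return p->refcount - p->total_mapcount != model_cache_count(p);
}

/*
 * Latency accounting: a huge page costs as much as all of its subpages.
 * Returns true, and resets the counter, when a break is due.
 */
static bool model_latency_break(int *latency, const struct model_page *p)
{
	*latency += model_cache_count(p);
	if (*latency < MODEL_CHECK_SCHED)
		return false;
	*latency = 0;
	return true;
}
```

Counting a huge page as MODEL_HPAGE_NR entries means eight unpinned huge pages already exhaust the assumed budget, whereas a per-entry counter would have allowed 4096 of them before taking a break.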
Patches currently in -mm which might be from hughd(a)google.com are
tmpfs-support-for-file-creation-time-fix.patch
mm-_install_special_mapping-apply-vm_locked_clear_mask.patch
mm-thp-refix-__split_huge_pmd_locked-for-migration-pmd.patch
mm-thp-clearpagedoublemap-in-first-page_add_file_rmap.patch
mm-thp-fix-nr_file_mapped-accounting-in-page__file_rmap.patch