Hi,
some of our customers (Proxmox VE) are seeing issues with file
corruptions when accessing contents located on CephFS via the in-kernel
Ceph client [0,1], we managed to reproduce this regression on kernels up
to the latest 6.11-rc6.
Accessing the same content on the CephFS using the FUSE client or the
in-kernel ceph client with older kernels (Ubuntu kernel on v6.5) does
not show file corruptions.
Unfortunately the corruption is hard to reproduce, seemingly only a
small subset of files is affected. However, once a file is affected, the
issue is persistent and can easily be reproduced.
Bisection with the reproducer points to this commit:
"92b6cc5d: netfs: Add iov_iters to (sub)requests to describe various
buffers"
Description of the issue:
A file was copied from local filesystem to cephfs via:
```
cp /tmp/proxmox-backup-server_3.2-1.iso
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
```
* sha256sum on local
filesystem:`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/tmp/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel up to above commit:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel after above commit:
`89ad3620bf7b1e0913b534516cfbe48580efbaec944b79951e2c14e5e551f736
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* removing and/or recopying the file does not change the issue, the
corrupt checksum remains the same.
* accessing the same file from different clients results in the same
output: the one with above patch applied do show the incorrect checksum,
ones without the patch show the correct checksum.
* the issue persists even across reboot of the ceph cluster and/or clients.
* the file is indeed corrupt after reading, as verified by a `cmp -b`.
Interestingly, the first 4M contain the correct data, the following 4M
are read as all zeros, which differs from the original data.
* the issue is related to the readahead size: mounting the cephfs with a
`rasize=0` makes the issue disappear, same is true for sizes up to 128k
(please note that the ranges as initially reported on the mailing list
[3] are not correct for rasize [0..128k] the file is not corrupted).
In the bugtracker issue [4] I attached a ftrace with "*ceph*" as filter
while performing a read on the latest kernel 6.11-rc6 while performing
```
dd if=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso of=/tmp/test.out
bs=8M count=1
```
the relevant part shown by task `dd-26192`.
Please let me know if I can provide further information or debug outputs
in order to narrow down the issue.
[0] https://forum.proxmox.com/threads/78340/post-676129
[1] https://forum.proxmox.com/threads/149249/
[2] https://forum.proxmox.com/threads/151291/
[3]
https://lore.kernel.org/lkml/db686d0c-2f27-47c8-8c14-26969433b13b@proxmox.c…
[4] https://bugzilla.kernel.org/show_bug.cgi?id=219237
#regzbot introduced: 92b6cc5d
Regards,
Christian Ebner
It is possible that an interrupt is disabled and masked at the same time.
When the interrupt is enabled again by enable_irq(), only plic_irq_enable()
is called, not plic_irq_unmask(). The interrupt remains masked and never
raises.
An example where interrupt is both disabled and masked is when
handle_fasteoi_irq() is the handler, and IRQS_ONESHOT is set. The interrupt
handler:
1. Mask the interrupt
2. Handle the interrupt
3. Check if interrupt is still enabled, and unmask it (see
cond_unmask_eoi_irq())
If another task disables the interrupt in the middle of the above steps,
the interrupt will not get unmasked, and will remain masked when it is
enabled in the future.
The problem is occasionally observed when PREEMPT_RT is enabled, because
PREEMPT_RT add the IRQS_ONESHOT flag. But PREEMPT_RT only makes the
problem more likely to appear, the bug has been around since
commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
operations").
Fix it by unmasking interrupt in plic_irq_enable().
Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations").
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-sifive-plic.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 2f6ef5c495bd..0efbf14ec9fa 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -128,6 +128,9 @@ static inline void plic_irq_toggle(const struct cpumask *mask,
static void plic_irq_enable(struct irq_data *d)
{
+ struct plic_priv *priv = irq_data_get_irq_chip_data(d);
+
+ writel(1, priv->regs + PRIORITY_BASE + d->hwirq * PRIORITY_PER_ID);
plic_irq_toggle(irq_data_get_effective_affinity_mask(d), d, 1);
}
--
2.39.5
Changes in v3:
- Drops assigned-clock-* from description retains in example - Sakari,
Krzysztof
- Updates example fake clock names to ov08x40_* instead of copy/paste
ov9282_clk -> ov08x40_clk, ov9282_clk_parent -> ov08x40_clk_parent - bod
- Link to v2: https://lore.kernel.org/r/20241001-b4-master-24-11-25-ov08x40-v2-0-e478976b…
Changes in v2:
- Drops "-" in ovti,ov08x40.yaml after description: - Rob
- Adds ":" after first line of description text - Rob
- dts -> DT in commit log - Rob
- Removes dependency on 'xvclk' as a name in yaml
and driver - Sakari
- Uses assigned-clock, assigned-clock-parents and assigned-clock-rates -
Sakari
- Drops clock-frequency - Sakarai, Krzysztof
- Drops dovdd-supply, avdd-supply, dvdd-supply and reset-gpios
as required, its perfectly possible not to have the reset GPIO or the
power rails under control of the SoC. - bod
- Link to v1: https://lore.kernel.org/r/20240926-b4-master-24-11-25-ov08x40-v1-0-e4d5fbd3…
V1:
This series brings fixes and updates to ov08x40 which allows for use of
this sensor on the Qualcomm x1e80100 CRD but also on any other dts based
system.
Firstly there's a fix for the pseudo burst mode code that was added in
8f667d202384 ("media: ov08x40: Reduce start streaming time"). Not every I2C
controller can handle an arbitrary sized write, this is the case on
Qualcomm CAMSS/CCI I2C sensor interfaces which limit the transaction size
and communicate this limit via I2C quirks. A simple fix to optionally break
up the large submitted burst into chunks not exceeding adapter->quirk size
fixes.
Secondly then is addition of a yaml description for the ov08x40 and
extension of the driver to support OF probe and powering on of the power
rails from the driver instead of from ACPI.
Once done the sensor works without further modification on the Qualcomm
x1e80100 CRD.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (4):
media: ov08x40: Fix burst write sequence
media: dt-bindings: Add OmniVision OV08X40
media: ov08x40: Rename ext_clk to xvclk
media: ov08x40: Add OF probe support
.../bindings/media/i2c/ovti,ov08x40.yaml | 116 +++++++++++++
drivers/media/i2c/ov08x40.c | 179 ++++++++++++++++++---
2 files changed, 272 insertions(+), 23 deletions(-)
---
base-commit: 2b7275670032a98cba266bd1b8905f755b3e650f
change-id: 20240926-b4-master-24-11-25-ov08x40-c6f477aaa6a4
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Hi,
Subject: [PATCH net] netkit: Assign missing bpf_net_context
commit id: 157f29152b61ca41809dd7ead29f5733adeced19
kernel versions: 6.11
The patch was sent before 6.11 was cut final, but it didn't make it.
We are seeing kernel crashes on 6.11.1 without this commit, with the
exact signature that the patch fixes.
Thank you.
Regards,
Maksym
commit 41130c32f3a18fcc930316da17f3a5f3bc326aa1 upstream
This was not marked as stable (but has a Fixes: tag), but causes the USB stack
to crash with 6.1.x kernels.
The patch does not apply when cherry-picking because of the context.
Should I send the patch again with updated context so that it applies without
conflicts?
Best regards,
Georg
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x d4fc4d01471528da8a9797a065982e05090e1d81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100100-emperor-thespian-397f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
d4fc4d014715 ("x86/tdx: Fix "in-kernel MMIO" check")
859e63b789d6 ("x86/tdx: Convert shared memory back to private on kexec")
533c67e63584 ("mm/mglru: add dummy pmd_dirty()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4fc4d01471528da8a9797a065982e05090e1d81 Mon Sep 17 00:00:00 2001
From: "Alexey Gladkov (Intel)" <legion(a)kernel.org>
Date: Fri, 13 Sep 2024 19:05:56 +0200
Subject: [PATCH] x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.172623…
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66dce0da..327c45c5013f 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d4fc4d01471528da8a9797a065982e05090e1d81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100104-everybody-retrace-89ed@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d4fc4d014715 ("x86/tdx: Fix "in-kernel MMIO" check")
859e63b789d6 ("x86/tdx: Convert shared memory back to private on kexec")
533c67e63584 ("mm/mglru: add dummy pmd_dirty()")
df57721f9a63 ("Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4fc4d01471528da8a9797a065982e05090e1d81 Mon Sep 17 00:00:00 2001
From: "Alexey Gladkov (Intel)" <legion(a)kernel.org>
Date: Fri, 13 Sep 2024 19:05:56 +0200
Subject: [PATCH] x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.172623…
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66dce0da..327c45c5013f 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*