Hi,
some of our customers (Proxmox VE) are seeing issues with file
corruptions when accessing contents located on CephFS via the in-kernel
Ceph client [0,1], we managed to reproduce this regression on kernels up
to the latest 6.11-rc6.
Accessing the same content on the CephFS using the FUSE client or the
in-kernel ceph client with older kernels (Ubuntu kernel on v6.5) does
not show file corruptions.
Unfortunately the corruption is hard to reproduce, seemingly only a
small subset of files is affected. However, once a file is affected, the
issue is persistent and can easily be reproduced.
Bisection with the reproducer points to this commit:
"92b6cc5d: netfs: Add iov_iters to (sub)requests to describe various
buffers"
Description of the issue:
A file was copied from local filesystem to cephfs via:
```
cp /tmp/proxmox-backup-server_3.2-1.iso
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
```
* sha256sum on local
filesystem:`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/tmp/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel up to above commit:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel after above commit:
`89ad3620bf7b1e0913b534516cfbe48580efbaec944b79951e2c14e5e551f736
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* removing and/or recopying the file does not change the issue, the
corrupt checksum remains the same.
* accessing the same file from different clients results in the same
output: the one with above patch applied do show the incorrect checksum,
ones without the patch show the correct checksum.
* the issue persists even across reboot of the ceph cluster and/or clients.
* the file is indeed corrupt after reading, as verified by a `cmp -b`.
Interestingly, the first 4M contain the correct data, the following 4M
are read as all zeros, which differs from the original data.
* the issue is related to the readahead size: mounting the cephfs with a
`rasize=0` makes the issue disappear, same is true for sizes up to 128k
(please note that the ranges as initially reported on the mailing list
[3] are not correct for rasize [0..128k] the file is not corrupted).
In the bugtracker issue [4] I attached a ftrace with "*ceph*" as filter
while performing a read on the latest kernel 6.11-rc6 while performing
```
dd if=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso of=/tmp/test.out
bs=8M count=1
```
the relevant part shown by task `dd-26192`.
Please let me know if I can provide further information or debug outputs
in order to narrow down the issue.
[0] https://forum.proxmox.com/threads/78340/post-676129
[1] https://forum.proxmox.com/threads/149249/
[2] https://forum.proxmox.com/threads/151291/
[3]
https://lore.kernel.org/lkml/db686d0c-2f27-47c8-8c14-26969433b13b@proxmox.c…
[4] https://bugzilla.kernel.org/show_bug.cgi?id=219237
#regzbot introduced: 92b6cc5d
Regards,
Christian Ebner
It is possible that an interrupt is disabled and masked at the same time.
When the interrupt is enabled again by enable_irq(), only plic_irq_enable()
is called, not plic_irq_unmask(). The interrupt remains masked and never
raises.
An example where interrupt is both disabled and masked is when
handle_fasteoi_irq() is the handler, and IRQS_ONESHOT is set. The interrupt
handler:
1. Mask the interrupt
2. Handle the interrupt
3. Check if interrupt is still enabled, and unmask it (see
cond_unmask_eoi_irq())
If another task disables the interrupt in the middle of the above steps,
the interrupt will not get unmasked, and will remain masked when it is
enabled in the future.
The problem is occasionally observed when PREEMPT_RT is enabled, because
PREEMPT_RT add the IRQS_ONESHOT flag. But PREEMPT_RT only makes the
problem more likely to appear, the bug has been around since
commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
operations").
Fix it by unmasking interrupt in plic_irq_enable().
Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations").
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-sifive-plic.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 2f6ef5c495bd..0efbf14ec9fa 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -128,6 +128,9 @@ static inline void plic_irq_toggle(const struct cpumask *mask,
static void plic_irq_enable(struct irq_data *d)
{
+ struct plic_priv *priv = irq_data_get_irq_chip_data(d);
+
+ writel(1, priv->regs + PRIORITY_BASE + d->hwirq * PRIORITY_PER_ID);
plic_irq_toggle(irq_data_get_effective_affinity_mask(d), d, 1);
}
--
2.39.5
Changes in v3:
- Drops assigned-clock-* from description retains in example - Sakari,
Krzysztof
- Updates example fake clock names to ov08x40_* instead of copy/paste
ov9282_clk -> ov08x40_clk, ov9282_clk_parent -> ov08x40_clk_parent - bod
- Link to v2: https://lore.kernel.org/r/20241001-b4-master-24-11-25-ov08x40-v2-0-e478976b…
Changes in v2:
- Drops "-" in ovti,ov08x40.yaml after description: - Rob
- Adds ":" after first line of description text - Rob
- dts -> DT in commit log - Rob
- Removes dependency on 'xvclk' as a name in yaml
and driver - Sakari
- Uses assigned-clock, assigned-clock-parents and assigned-clock-rates -
Sakari
- Drops clock-frequency - Sakarai, Krzysztof
- Drops dovdd-supply, avdd-supply, dvdd-supply and reset-gpios
as required, its perfectly possible not to have the reset GPIO or the
power rails under control of the SoC. - bod
- Link to v1: https://lore.kernel.org/r/20240926-b4-master-24-11-25-ov08x40-v1-0-e4d5fbd3…
V1:
This series brings fixes and updates to ov08x40 which allows for use of
this sensor on the Qualcomm x1e80100 CRD but also on any other dts based
system.
Firstly there's a fix for the pseudo burst mode code that was added in
8f667d202384 ("media: ov08x40: Reduce start streaming time"). Not every I2C
controller can handle an arbitrary sized write, this is the case on
Qualcomm CAMSS/CCI I2C sensor interfaces which limit the transaction size
and communicate this limit via I2C quirks. A simple fix to optionally break
up the large submitted burst into chunks not exceeding adapter->quirk size
fixes.
Secondly then is addition of a yaml description for the ov08x40 and
extension of the driver to support OF probe and powering on of the power
rails from the driver instead of from ACPI.
Once done the sensor works without further modification on the Qualcomm
x1e80100 CRD.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (4):
media: ov08x40: Fix burst write sequence
media: dt-bindings: Add OmniVision OV08X40
media: ov08x40: Rename ext_clk to xvclk
media: ov08x40: Add OF probe support
.../bindings/media/i2c/ovti,ov08x40.yaml | 116 +++++++++++++
drivers/media/i2c/ov08x40.c | 179 ++++++++++++++++++---
2 files changed, 272 insertions(+), 23 deletions(-)
---
base-commit: 2b7275670032a98cba266bd1b8905f755b3e650f
change-id: 20240926-b4-master-24-11-25-ov08x40-c6f477aaa6a4
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Hi,
Subject: [PATCH net] netkit: Assign missing bpf_net_context
commit id: 157f29152b61ca41809dd7ead29f5733adeced19
kernel versions: 6.11
The patch was sent before 6.11 was cut final, but it didn't make it.
We are seeing kernel crashes on 6.11.1 without this commit, with the
exact signature that the patch fixes.
Thank you.
Regards,
Maksym
commit 41130c32f3a18fcc930316da17f3a5f3bc326aa1 upstream
This was not marked as stable (but has a Fixes: tag), but causes the USB stack
to crash with 6.1.x kernels.
The patch does not apply when cherry-picking because of the context.
Should I send the patch again with updated context so that it applies without
conflicts?
Best regards,
Georg
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x d4fc4d01471528da8a9797a065982e05090e1d81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100100-emperor-thespian-397f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
d4fc4d014715 ("x86/tdx: Fix "in-kernel MMIO" check")
859e63b789d6 ("x86/tdx: Convert shared memory back to private on kexec")
533c67e63584 ("mm/mglru: add dummy pmd_dirty()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4fc4d01471528da8a9797a065982e05090e1d81 Mon Sep 17 00:00:00 2001
From: "Alexey Gladkov (Intel)" <legion(a)kernel.org>
Date: Fri, 13 Sep 2024 19:05:56 +0200
Subject: [PATCH] x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.172623…
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66dce0da..327c45c5013f 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d4fc4d01471528da8a9797a065982e05090e1d81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100104-everybody-retrace-89ed@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d4fc4d014715 ("x86/tdx: Fix "in-kernel MMIO" check")
859e63b789d6 ("x86/tdx: Convert shared memory back to private on kexec")
533c67e63584 ("mm/mglru: add dummy pmd_dirty()")
df57721f9a63 ("Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4fc4d01471528da8a9797a065982e05090e1d81 Mon Sep 17 00:00:00 2001
From: "Alexey Gladkov (Intel)" <legion(a)kernel.org>
Date: Fri, 13 Sep 2024 19:05:56 +0200
Subject: [PATCH] x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.172623…
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66dce0da..327c45c5013f 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
When sending data using DMA at high baudrate (4 Mbdps in local test case) to
a device with small RX buffer which keeps asserting RTS after every received
byte, it is possible that the iMX UART driver would not recognize the falling
edge of RTS input signal and get stuck, unable to transmit any more data.
This condition happens when the following sequence of events occur:
- imx_uart_mctrl_check() is called at some point and takes a snapshot of UART
control signal status into sport->old_status using imx_uart_get_hwmctrl().
The RTSS/TIOCM_CTS bit is of interest here (*).
- DMA transfer occurs, the remote device asserts RTS signal after each byte.
The i.MX UART driver recognizes each such RTS signal change, raises an
interrupt with USR1 register RTSD bit set, which leads to invocation of
__imx_uart_rtsint(), which calls uart_handle_cts_change().
- If the RTS signal is deasserted, uart_handle_cts_change() clears
port->hw_stopped and unblocks the port for further data transfers.
- If the RTS is asserted, uart_handle_cts_change() sets port->hw_stopped
and blocks the port for further data transfers. This may occur as the
last interrupt of a transfer, which means port->hw_stopped remains set
and the port remains blocked (**).
- Any further data transfer attempts will trigger imx_uart_mctrl_check(),
which will read current status of UART control signals by calling
imx_uart_get_hwmctrl() (***) and compare it with sport->old_status .
- If current status differs from sport->old_status for RTS signal,
uart_handle_cts_change() is called and possibly unblocks the port
by clearing port->hw_stopped .
- If current status does not differ from sport->old_status for RTS
signal, no action occurs. This may occur in case prior snapshot (*)
was taken before any transfer so the RTS is deasserted, current
snapshot (***) was taken after a transfer and therefore RTS is
deasserted again, which means current status and sport->old_status
are identical. In case (**) triggered when RTS got asserted, and
made port->hw_stopped set, the port->hw_stopped will remain set
because no change on RTS line is recognized by this driver and
uart_handle_cts_change() is not called from here to unblock the
port->hw_stopped.
Update sport->old_status in __imx_uart_rtsint() accordingly to make
imx_uart_mctrl_check() detect such RTS change. Note that TIOCM_CAR
and TIOCM_RI bits in sport->old_status do not suffer from this problem.
Fixes: ceca629e0b48 ("[ARM] 2971/1: i.MX uart handle rts irq")
Signed-off-by: Marek Vasut <marex(a)denx.de>
---
Cc: "Uwe Kleine-König" <u.kleine-koenig(a)pengutronix.de>
Cc: Christoph Niedermaier <cniedermaier(a)dh-electronics.com>
Cc: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Cc: Esben Haabendal <esben(a)geanix.com>
Cc: Fabio Estevam <festevam(a)gmail.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
Cc: Pengutronix Kernel Team <kernel(a)pengutronix.de>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Rickard x Andersson <rickaran(a)axis.com>
Cc: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: Shawn Guo <shawnguo(a)kernel.org>
Cc: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
Cc: imx(a)lists.linux.dev
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-serial(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
drivers/tty/serial/imx.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 67d4a72eda770..3ad7f42790ef9 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -762,6 +762,10 @@ static irqreturn_t __imx_uart_rtsint(int irq, void *dev_id)
imx_uart_writel(sport, USR1_RTSD, USR1);
usr1 = imx_uart_readl(sport, USR1) & USR1_RTSS;
+ if (usr1 & USR1_RTSS)
+ sport->old_status |= TIOCM_CTS;
+ else
+ sport->old_status &= ~TIOCM_CTS;
uart_handle_cts_change(&sport->port, usr1);
wake_up_interruptible(&sport->port.state->port.delta_msr_wait);
--
2.45.2