While the set_msix() callback function in pcie-cadence-ep writes the
Table Size field correctly (N-1), the calculation of the PBA offset
is wrong because it calculates space for (N-1) entries instead of N.
This results in e.g. the following error when using QEMU with PCI
passthrough on a device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
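For illustration only (not part of the patch): with N MSI-X vectors the
QSIZE field stores N - 1, but the PBA region must start after all N table
entries. A minimal sketch of the intended layout, reusing the driver's
names plus a hypothetical n_vectors variable:

  u16 n_vectors  = interrupts + 1;                  /* QSIZE holds N - 1 */
  u32 table_size = n_vectors * PCI_MSIX_ENTRY_SIZE; /* 16 bytes per entry */
  u32 pba_offset = offset + table_size;             /* PBA follows table */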
Cc: stable(a)vger.kernel.org
Fixes: 3ef5d16f50f8 ("PCI: cadence: Add MSI-X support to Endpoint driver")
Reviewed-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/pci/controller/cadence/pcie-cadence-ep.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 599ec4b1223e..112ae200b393 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -292,13 +292,14 @@ static int cdns_pcie_ep_set_msix(struct pci_epc *epc, u8 fn, u8 vfn,
struct cdns_pcie *pcie = &ep->pcie;
u32 cap = CDNS_PCIE_EP_FUNC_MSIX_CAP_OFFSET;
u32 val, reg;
+ u16 actual_interrupts = interrupts + 1;
fn = cdns_pcie_get_fn_from_vfn(pcie, fn, vfn);
reg = cap + PCI_MSIX_FLAGS;
val = cdns_pcie_ep_fn_readw(pcie, fn, reg);
val &= ~PCI_MSIX_FLAGS_QSIZE;
- val |= interrupts;
+ val |= interrupts; /* 0's based value */
cdns_pcie_ep_fn_writew(pcie, fn, reg, val);
/* Set MSI-X BAR and offset */
@@ -308,7 +309,7 @@ static int cdns_pcie_ep_set_msix(struct pci_epc *epc, u8 fn, u8 vfn,
/* Set PBA BAR and offset. BAR must match MSI-X BAR */
reg = cap + PCI_MSIX_PBA;
- val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
+ val = (offset + (actual_interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
cdns_pcie_ep_fn_writel(pcie, fn, reg, val);
return 0;
--
2.49.0
While the set_msix() callback function in pcie-designware-ep writes the
Table Size field correctly (N-1), the calculation of the PBA offset
is wrong because it calculates space for (N-1) entries instead of N.
This results in e.g. the following error when using QEMU with PCI
passthrough on a device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
Cc: stable(a)vger.kernel.org
Fixes: 83153d9f36e2 ("PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments")
Reviewed-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/pci/controller/dwc/pcie-designware-ep.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 1a0bf9341542..24026f3f3413 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -585,6 +585,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct dw_pcie_ep_func *ep_func;
u32 val, reg;
+ u16 actual_interrupts = interrupts + 1;
ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
if (!ep_func || !ep_func->msix_cap)
@@ -595,7 +596,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
reg = ep_func->msix_cap + PCI_MSIX_FLAGS;
val = dw_pcie_ep_readw_dbi(ep, func_no, reg);
val &= ~PCI_MSIX_FLAGS_QSIZE;
- val |= interrupts;
+ val |= interrupts; /* 0's based value */
dw_pcie_writew_dbi(pci, reg, val);
reg = ep_func->msix_cap + PCI_MSIX_TABLE;
@@ -603,7 +604,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
dw_pcie_ep_writel_dbi(ep, func_no, reg, val);
reg = ep_func->msix_cap + PCI_MSIX_PBA;
- val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
+ val = (offset + (actual_interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
dw_pcie_ep_writel_dbi(ep, func_no, reg, val);
dw_pcie_dbi_ro_wr_dis(pci);
--
2.49.0
The first patch of the series fixes a possible infinite loop.
The remaining three patches fix an address alignment issue observed
after commit 9382bc44b5f5 ("arm64: allow kmalloc() caches aligned to the
smaller cache_line_size()").
Patch 2 and patch 3 apply to stable versions 6.6 onwards.
Patch 4 applies to stable versions 6.12 onwards.
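For context only, a hedged sketch (not taken from this series; the helper
and variable names are hypothetical) of one common way a driver can keep a
DMA buffer cache-line aligned now that kmalloc() may return allocations
aligned only to the smaller runtime cache_line_size():

  #include <linux/slab.h>
  #include <linux/dma-mapping.h>

  /* Over-allocate and align by hand; keep the original pointer for kfree(). */
  static void *alloc_aligned_buf(size_t len, void **to_free)
  {
          size_t align = dma_get_cache_alignment();
          void *buf = kmalloc(len + align, GFP_KERNEL);

          if (!buf)
                  return NULL;
          *to_free = buf;
          return PTR_ALIGN(buf, align);
  }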
Bharat Bhushan (4):
crypto: octeontx2: add timeout for load_fvc completion poll
crypto: octeontx2: Fix address alignment issue on ucode loading
crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2
crypto: octeontx2: Fix address alignment on CN10KB and CN10KA-B0
.../marvell/octeontx2/otx2_cpt_reqmgr.h | 119 +++++++++++++-----
.../marvell/octeontx2/otx2_cptpf_ucode.c | 46 ++++---
2 files changed, 121 insertions(+), 44 deletions(-)
--
2.34.1
Keyboard and touchpad stopped working on several Apple MacBooks from the
year 2017 using kernel 6.12.xx. Until now I could only find this
discussion affirming the bug on Debian and Fedora:
https://github.com/Dunedan/mbp-2016-linux/issues/202
On siduction I also tried the more recent kernels 6.14.5 and mainline
6.15-rc4 (from Ubuntu) and the issue persisted with my test device, a
MacBookPro14,1 -- see the relevant output:
kernel: platform pxa2xx-spi.3: Adding to iommu group 20
kernel: input: Apple SPI Keyboard as
/devices/pci0000:00/0000:00:1e.3/pxa2xx-spi.3/spi_master/spi2/spi-APP000D:00/input/input0
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: DMAR: DRHD: handling fault status reg 3
kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
Many thanks,
Jörg Berkel
From: Linus Torvalds <torvalds(a)linux-foundation.org>
[ Upstream commit 6bd23e0c2bb6c65d4f5754d1456bc9a4427fc59b ]
... and use it to limit the virtual terminals to just N_TTY. They are
kind of special, and in particular, the "con_write()" routine violates
the "writes cannot sleep" rule that some ldiscs rely on.
This avoids the
BUG: sleeping function called from invalid context at kernel/printk/printk.c:2659
when N_GSM has been attached to a virtual console, and gsmld_write()
calls con_write() while holding a spinlock, and con_write() then tries
to get the console lock.
Tested-by: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Daniel Starke <daniel.starke(a)siemens.com>
Reported-by: syzbot <syzbot+dbac96d8e73b61aa559c(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=dbac96d8e73b61aa559c
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20240423163339.59780-1-torvalds@linux-foundation.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Minor conflict resolved due to code context change. Also backported the
kerneldoc description comments for struct tty_operations.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified with a build test
---
drivers/tty/tty_ldisc.c | 6 +
drivers/tty/vt/vt.c | 10 ++
include/linux/tty_driver.h | 339 +++++++++++++++++++++++++++++++++++++
3 files changed, 355 insertions(+)
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index c34c01579b75..1ab3c4eb3359 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -567,6 +567,12 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
goto out;
}
+ if (tty->ops->ldisc_ok) {
+ retval = tty->ops->ldisc_ok(tty, disc);
+ if (retval)
+ goto out;
+ }
+
old_ldisc = tty->ldisc;
/* Shutdown the old discipline. */
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index bd125ea5c51f..5b35ea7744a4 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3440,6 +3440,15 @@ static void con_cleanup(struct tty_struct *tty)
tty_port_put(&vc->port);
}
+/*
+ * We can't deal with anything but the N_TTY ldisc,
+ * because we can sleep in our write() routine.
+ */
+static int con_ldisc_ok(struct tty_struct *tty, int ldisc)
+{
+ return ldisc == N_TTY ? 0 : -EINVAL;
+}
+
static int default_color = 7; /* white */
static int default_italic_color = 2; // green (ASCII)
static int default_underline_color = 3; // cyan (ASCII)
@@ -3567,6 +3576,7 @@ static const struct tty_operations con_ops = {
.resize = vt_resize,
.shutdown = con_shutdown,
.cleanup = con_cleanup,
+ .ldisc_ok = con_ldisc_ok,
};
static struct cdev vc0_cdev;
diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h
index c20431d8def8..9707328595df 100644
--- a/include/linux/tty_driver.h
+++ b/include/linux/tty_driver.h
@@ -244,6 +244,344 @@ struct tty_driver;
struct serial_icounter_struct;
struct serial_struct;
+/**
+ * struct tty_operations -- interface between driver and tty
+ *
+ * @lookup: ``struct tty_struct *()(struct tty_driver *self, struct file *,
+ * int idx)``
+ *
+ * Return the tty device corresponding to @idx, %NULL if there is not
+ * one currently in use and an %ERR_PTR value on error. Called under
+ * %tty_mutex (for now!)
+ *
+ * Optional method. Default behaviour is to use the @self->ttys array.
+ *
+ * @install: ``int ()(struct tty_driver *self, struct tty_struct *tty)``
+ *
+ * Install a new @tty into the @self's internal tables. Used in
+ * conjunction with @lookup and @remove methods.
+ *
+ * Optional method. Default behaviour is to use the @self->ttys array.
+ *
+ * @remove: ``void ()(struct tty_driver *self, struct tty_struct *tty)``
+ *
+ * Remove a closed @tty from the @self's internal tables. Used in
+ * conjunction with @lookup and @remove methods.
+ *
+ * Optional method. Default behaviour is to use the @self->ttys array.
+ *
+ * @open: ``int ()(struct tty_struct *tty, struct file *)``
+ *
+ * This routine is called when a particular @tty device is opened. This
+ * routine is mandatory; if this routine is not filled in, the attempted
+ * open will fail with %ENODEV.
+ *
+ * Required method. Called with tty lock held. May sleep.
+ *
+ * @close: ``void ()(struct tty_struct *tty, struct file *)``
+ *
+ * This routine is called when a particular @tty device is closed. At the
+ * point of return from this call the driver must make no further ldisc
+ * calls of any kind.
+ *
+ * Remark: called even if the corresponding @open() failed.
+ *
+ * Required method. Called with tty lock held. May sleep.
+ *
+ * @shutdown: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine is called under the tty lock when a particular @tty device
+ * is closed for the last time. It executes before the @tty resources
+ * are freed so may execute while another function holds a @tty kref.
+ *
+ * @cleanup: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine is called asynchronously when a particular @tty device
+ * is closed for the last time freeing up the resources. This is
+ * actually the second part of shutdown for routines that might sleep.
+ *
+ * @write: ``int ()(struct tty_struct *tty, const unsigned char *buf,
+ * int count)``
+ *
+ * This routine is called by the kernel to write a series (@count) of
+ * characters (@buf) to the @tty device. The characters may come from
+ * user space or kernel space. This routine will return the
+ * number of characters actually accepted for writing.
+ *
+ * May occur in parallel in special cases. Because this includes panic
+ * paths drivers generally shouldn't try and do clever locking here.
+ *
+ * Optional: Required for writable devices. May not sleep.
+ *
+ * @put_char: ``int ()(struct tty_struct *tty, unsigned char ch)``
+ *
+ * This routine is called by the kernel to write a single character @ch to
+ * the @tty device. If the kernel uses this routine, it must call the
+ * @flush_chars() routine (if defined) when it is done stuffing characters
+ * into the driver. If there is no room in the queue, the character is
+ * ignored.
+ *
+ * Optional: Kernel will use the @write method if not provided. Do not
+ * call this function directly, call tty_put_char().
+ *
+ * @flush_chars: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine is called by the kernel after it has written a
+ * series of characters to the tty device using @put_char().
+ *
+ * Optional. Do not call this function directly, call
+ * tty_driver_flush_chars().
+ *
+ * @write_room: ``unsigned int ()(struct tty_struct *tty)``
+ *
+ * This routine returns the numbers of characters the @tty driver
+ * will accept for queuing to be written. This number is subject
+ * to change as output buffers get emptied, or if the output flow
+ * control is acted.
+ *
+ * The ldisc is responsible for being intelligent about multi-threading of
+ * write_room/write calls
+ *
+ * Required if @write method is provided else not needed. Do not call this
+ * function directly, call tty_write_room()
+ *
+ * @chars_in_buffer: ``unsigned int ()(struct tty_struct *tty)``
+ *
+ * This routine returns the number of characters in the device private
+ * output queue. Used in tty_wait_until_sent() and for poll()
+ * implementation.
+ *
+ * Optional: if not provided, it is assumed there is no queue on the
+ * device. Do not call this function directly, call tty_chars_in_buffer().
+ *
+ * @ioctl: ``int ()(struct tty_struct *tty, unsigned int cmd,
+ * unsigned long arg)``
+ *
+ * This routine allows the @tty driver to implement device-specific
+ * ioctls. If the ioctl number passed in @cmd is not recognized by the
+ * driver, it should return %ENOIOCTLCMD.
+ *
+ * Optional.
+ *
+ * @compat_ioctl: ``long ()(struct tty_struct *tty, unsigned int cmd,
+ * unsigned long arg)``
+ *
+ * Implement ioctl processing for 32 bit process on 64 bit system.
+ *
+ * Optional.
+ *
+ * @set_termios: ``void ()(struct tty_struct *tty, struct ktermios *old)``
+ *
+ * This routine allows the @tty driver to be notified when device's
+ * termios settings have changed. New settings are in @tty->termios.
+ * Previous settings are passed in the @old argument.
+ *
+ * The API is defined such that the driver should return the actual modes
+ * selected. This means that the driver is responsible for modifying any
+ * bits in @tty->termios it cannot fulfill to indicate the actual modes
+ * being used.
+ *
+ * Optional. Called under the @tty->termios_rwsem. May sleep.
+ *
+ * @ldisc_ok: ``int ()(struct tty_struct *tty, int ldisc)``
+ *
+ * This routine allows the @tty driver to decide if it can deal
+ * with a particular @ldisc.
+ *
+ * Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
+ * @set_ldisc: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine allows the @tty driver to be notified when the device's
+ * line discipline is being changed. At the point this is done the
+ * discipline is not yet usable.
+ *
+ * Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
+ * @throttle: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that input buffers for the line
+ * discipline are close to full, and it should somehow signal that no more
+ * characters should be sent to the @tty.
+ *
+ * Serialization including with @unthrottle() is the job of the ldisc
+ * layer.
+ *
+ * Optional: Always invoke via tty_throttle_safe(). Called under the
+ * @tty->termios_rwsem.
+ *
+ * @unthrottle: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it should signal that
+ * characters can now be sent to the @tty without fear of overrunning the
+ * input buffers of the line disciplines.
+ *
+ * Optional. Always invoke via tty_unthrottle(). Called under the
+ * @tty->termios_rwsem.
+ *
+ * @stop: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it should stop outputting
+ * characters to the tty device.
+ *
+ * Called with @tty->flow.lock held. Serialized with @start() method.
+ *
+ * Optional. Always invoke via stop_tty().
+ *
+ * @start: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it resumed sending
+ * characters to the @tty device.
+ *
+ * Called with @tty->flow.lock held. Serialized with stop() method.
+ *
+ * Optional. Always invoke via start_tty().
+ *
+ * @hangup: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it should hang up the @tty
+ * device.
+ *
+ * Optional. Called with tty lock held.
+ *
+ * @break_ctl: ``int ()(struct tty_struct *tty, int state)``
+ *
+ * This optional routine requests the @tty driver to turn on or off BREAK
+ * status on the RS-232 port. If @state is -1, then the BREAK status
+ * should be turned on; if @state is 0, then BREAK should be turned off.
+ *
+ * If this routine is implemented, the high-level tty driver will handle
+ * the following ioctls: %TCSBRK, %TCSBRKP, %TIOCSBRK, %TIOCCBRK.
+ *
+ * If the driver sets %TTY_DRIVER_HARDWARE_BREAK in tty_alloc_driver(),
+ * then the interface will also be called with actual times and the
+ * hardware is expected to do the delay work itself. 0 and -1 are still
+ * used for on/off.
+ *
+ * Optional: Required for %TCSBRK/%BRKP/etc. handling. May sleep.
+ *
+ * @flush_buffer: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine discards device private output buffer. Invoked on close,
+ * hangup, to implement %TCOFLUSH ioctl and similar.
+ *
+ * Optional: if not provided, it is assumed there is no queue on the
+ * device. Do not call this function directly, call
+ * tty_driver_flush_buffer().
+ *
+ * @wait_until_sent: ``void ()(struct tty_struct *tty, int timeout)``
+ *
+ * This routine waits until the device has written out all of the
+ * characters in its transmitter FIFO. Or until @timeout (in jiffies) is
+ * reached.
+ *
+ * Optional: If not provided, the device is assumed to have no FIFO.
+ * Usually correct to invoke via tty_wait_until_sent(). May sleep.
+ *
+ * @send_xchar: ``void ()(struct tty_struct *tty, char ch)``
+ *
+ * This routine is used to send a high-priority XON/XOFF character (@ch)
+ * to the @tty device.
+ *
+ * Optional: If not provided, then the @write method is called under
+ * the @tty->atomic_write_lock to keep it serialized with the ldisc.
+ *
+ * @tiocmget: ``int ()(struct tty_struct *tty)``
+ *
+ * This routine is used to obtain the modem status bits from the @tty
+ * driver.
+ *
+ * Optional: If not provided, then %ENOTTY is returned from the %TIOCMGET
+ * ioctl. Do not call this function directly, call tty_tiocmget().
+ *
+ * @tiocmset: ``int ()(struct tty_struct *tty,
+ * unsigned int set, unsigned int clear)``
+ *
+ * This routine is used to set the modem status bits to the @tty driver.
+ * First, @clear bits should be cleared, then @set bits set.
+ *
+ * Optional: If not provided, then %ENOTTY is returned from the %TIOCMSET
+ * ioctl. Do not call this function directly, call tty_tiocmset().
+ *
+ * @resize: ``int ()(struct tty_struct *tty, struct winsize *ws)``
+ *
+ * Called when a termios request is issued which changes the requested
+ * terminal geometry to @ws.
+ *
+ * Optional: the default action is to update the termios structure
+ * without error. This is usually the correct behaviour. Drivers should
+ * not force errors here if they are not resizable objects (e.g. a serial
+ * line). See tty_do_resize() if you need to wrap the standard method
+ * in your own logic -- the usual case.
+ *
+ * @get_icount: ``int ()(struct tty_struct *tty,
+ * struct serial_icounter *icount)``
+ *
+ * Called when the @tty device receives a %TIOCGICOUNT ioctl. Passed a
+ * kernel structure @icount to complete.
+ *
+ * Optional: called only if provided, otherwise %ENOTTY will be returned.
+ *
+ * @get_serial: ``int ()(struct tty_struct *tty, struct serial_struct *p)``
+ *
+ * Called when the @tty device receives a %TIOCGSERIAL ioctl. Passed a
+ * kernel structure @p (&struct serial_struct) to complete.
+ *
+ * Optional: called only if provided, otherwise %ENOTTY will be returned.
+ * Do not call this function directly, call tty_tiocgserial().
+ *
+ * @set_serial: ``int ()(struct tty_struct *tty, struct serial_struct *p)``
+ *
+ * Called when the @tty device receives a %TIOCSSERIAL ioctl. Passed a
+ * kernel structure @p (&struct serial_struct) to set the values from.
+ *
+ * Optional: called only if provided, otherwise %ENOTTY will be returned.
+ * Do not call this function directly, call tty_tiocsserial().
+ *
+ * @show_fdinfo: ``void ()(struct tty_struct *tty, struct seq_file *m)``
+ *
+ * Called when the @tty device file descriptor receives a fdinfo request
+ * from VFS (to show in /proc/<pid>/fdinfo/). @m should be filled with
+ * information.
+ *
+ * Optional: called only if provided, otherwise nothing is written to @m.
+ * Do not call this function directly, call tty_show_fdinfo().
+ *
+ * @poll_init: ``int ()(struct tty_driver *driver, int line, char *options)``
+ *
+ * kgdboc support (Documentation/dev-tools/kgdb.rst). This routine is
+ * called to initialize the HW for later use by calling @poll_get_char or
+ * @poll_put_char.
+ *
+ * Optional: called only if provided, otherwise skipped as a non-polling
+ * driver.
+ *
+ * @poll_get_char: ``int ()(struct tty_driver *driver, int line)``
+ *
+ * kgdboc support (see @poll_init). @driver should read a character from a
+ * tty identified by @line and return it.
+ *
+ * Optional: called only if @poll_init provided.
+ *
+ * @poll_put_char: ``void ()(struct tty_driver *driver, int line, char ch)``
+ *
+ * kgdboc support (see @poll_init). @driver should write character @ch to
+ * a tty identified by @line.
+ *
+ * Optional: called only if @poll_init provided.
+ *
+ * @proc_show: ``int ()(struct seq_file *m, void *driver)``
+ *
+ * Driver @driver (cast to &struct tty_driver) can show additional info in
+ * /proc/tty/driver/<driver_name>. It is enough to fill in the information
+ * into @m.
+ *
+ * Optional: called only if provided, otherwise no /proc entry created.
+ *
+ * This structure defines the interface between the low-level tty driver and
+ * the tty routines. These routines can be defined. Unless noted otherwise,
+ * they are optional, and can be filled in with a %NULL pointer.
+ */
struct tty_operations {
struct tty_struct * (*lookup)(struct tty_driver *driver,
struct file *filp, int idx);
@@ -271,6 +609,7 @@ struct tty_operations {
void (*hangup)(struct tty_struct *tty);
int (*break_ctl)(struct tty_struct *tty, int state);
void (*flush_buffer)(struct tty_struct *tty);
+ int (*ldisc_ok)(struct tty_struct *tty, int ldisc);
void (*set_ldisc)(struct tty_struct *tty);
void (*wait_until_sent)(struct tty_struct *tty, int timeout);
void (*send_xchar)(struct tty_struct *tty, char ch);
--
2.34.1
The patch titled
Subject: mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table
has been added to the -mm mm-new branch. Its filename is
mm-hugetlb-fix-a-deadlock-with-pagecache_folio-and-hugetlb_fault_mutex_table.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gavin Guo <gavinguo(a)igalia.com>
Subject: mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table
Date: Tue, 13 May 2025 17:34:48 +0800
Fix a deadlock which can be triggered by an internal syzkaller [1]
reproducer and captured by bpftrace script [2] and its log [3] in this
scenario:
Process 1 Process 2
--- ---
hugetlb_fault
mutex_lock(B) // take B
filemap_lock_hugetlb_folio
filemap_lock_folio
__filemap_get_folio
folio_lock(A) // take A
hugetlb_wp
mutex_unlock(B) // release B
... hugetlb_fault
... mutex_lock(B) // take B
filemap_lock_hugetlb_folio
filemap_lock_folio
__filemap_get_folio
folio_lock(A) // blocked
unmap_ref_private
...
mutex_lock(B) // retake and blocked
This is an ABBA deadlock involving two locks:
- Lock A: pagecache_folio lock
- Lock B: hugetlb_fault_mutex_table lock
The deadlock occurs between two processes as follows:
1. The first process (let's call it Process 1) is handling a
copy-on-write (COW) operation on a hugepage via hugetlb_wp. Due to
insufficient reserved hugetlb pages, Process 1, owner of the reserved
hugetlb page, attempts to unmap a hugepage owned by another process
(non-owner) to satisfy the reservation. Before unmapping, Process 1
acquires lock B (hugetlb_fault_mutex_table lock) and then lock A
(pagecache_folio lock). To proceed with the unmap, it releases Lock B
but retains Lock A. After the unmap, Process 1 tries to reacquire Lock
B. However, at this point, Lock B has already been acquired by another
process.
2. The second process (Process 2) enters the hugetlb_fault handler
during the unmap operation. It successfully acquires Lock B
(hugetlb_fault_mutex_table lock) that was just released by Process 1,
but then attempts to acquire Lock A (pagecache_folio lock), which is
still held by Process 1.
As a result, Process 1 (holding Lock A) is blocked waiting for Lock B
(held by Process 2), while Process 2 (holding Lock B) is blocked waiting
for Lock A (held by Process 1), forming an ABBA deadlock scenario.
The solution here is to unlock the pagecache_folio and provide a
pagecache_folio_unlocked variable so that the caller has visibility into
the pagecache_folio's lock status for subsequent handling.
The error message:
INFO: task repro_20250402_:13229 blocked for more than 64 seconds.
Not tainted 6.15.0-rc3+ #24
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro_20250402_ state:D stack:25856 pid:13229 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x1755/0x4f50
schedule+0x158/0x330
schedule_preempt_disabled+0x15/0x30
__mutex_lock+0x75f/0xeb0
hugetlb_wp+0xf88/0x3440
hugetlb_fault+0x14c8/0x2c30
trace_clock_x86_tsc+0x20/0x20
do_user_addr_fault+0x61d/0x1490
exc_page_fault+0x64/0x100
asm_exc_page_fault+0x26/0x30
RIP: 0010:__put_user_4+0xd/0x20
copy_process+0x1f4a/0x3d60
kernel_clone+0x210/0x8f0
__x64_sys_clone+0x18d/0x1f0
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x41b26d
</TASK>
INFO: task repro_20250402_:13229 is blocked on a mutex likely owned by task repro_20250402_:13250.
task:repro_20250402_ state:D stack:28288 pid:13250 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00000006
Call Trace:
<TASK>
__schedule+0x1755/0x4f50
schedule+0x158/0x330
io_schedule+0x92/0x110
folio_wait_bit_common+0x69a/0xba0
__filemap_get_folio+0x154/0xb70
hugetlb_fault+0xa50/0x2c30
trace_clock_x86_tsc+0x20/0x20
do_user_addr_fault+0xace/0x1490
exc_page_fault+0x64/0x100
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x402619
</TASK>
INFO: task repro_20250402_:13250 blocked for more than 65 seconds.
Not tainted 6.15.0-rc3+ #24
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro_20250402_ state:D stack:28288 pid:13250 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00000006
Call Trace:
<TASK>
__schedule+0x1755/0x4f50
schedule+0x158/0x330
io_schedule+0x92/0x110
folio_wait_bit_common+0x69a/0xba0
__filemap_get_folio+0x154/0xb70
hugetlb_fault+0xa50/0x2c30
trace_clock_x86_tsc+0x20/0x20
do_user_addr_fault+0xace/0x1490
exc_page_fault+0x64/0x100
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x402619
</TASK>
Showing all locks held in the system:
1 lock held by khungtaskd/35:
#0: ffffffff879a7440 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x30/0x180
2 locks held by repro_20250402_/13229:
#0: ffff888017d801e0 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x37/0x300
#1: ffff888000fec848 (&hugetlb_fault_mutex_table[i]){+.+.}-{4:4}, at: hugetlb_wp+0xf88/0x3440
3 locks held by repro_20250402_/13250:
#0: ffff8880177f3d08 (vm_lock){++++}-{0:0}, at: do_user_addr_fault+0x41b/0x1490
#1: ffff888000fec848 (&hugetlb_fault_mutex_table[i]){+.+.}-{4:4}, at: hugetlb_fault+0x3b8/0x2c30
#2: ffff8880129500e8 (&resv_map->rw_sema){++++}-{4:4}, at: hugetlb_fault+0x494/0x2c30
Link: https://drive.google.com/file/d/1DVRnIW-vSayU5J1re9Ct_br3jJQU6Vpb/view?usp=… [1]
Link: https://github.com/bboymimi/bpftracer/blob/master/scripts/hugetlb_lock_debu… [2]
Link: https://drive.google.com/file/d/1bWq2-8o-BJAuhoHWX7zAhI6ggfhVzQUI/view?usp=… [3]
Link: https://lkml.kernel.org/r/20250513093448.592150-1-gavinguo@igalia.com
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Signed-off-by: Gavin Guo <gavinguo(a)igalia.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Florent Revest <revest(a)google.com>
Cc: Gavin Shan <gshan(a)redhat.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Byungchul Park <byungchul(a)sk.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-deadlock-with-pagecache_folio-and-hugetlb_fault_mutex_table
+++ a/mm/hugetlb.c
@@ -6131,7 +6131,8 @@ static void unmap_ref_private(struct mm_
* Keep the pte_same checks anyway to make transition from the mutex easier.
*/
static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
- struct vm_fault *vmf)
+ struct vm_fault *vmf,
+ bool *pagecache_folio_unlocked)
{
struct vm_area_struct *vma = vmf->vma;
struct mm_struct *mm = vma->vm_mm;
@@ -6229,6 +6230,22 @@ retry_avoidcopy:
folio_put(old_folio);
/*
+ * The pagecache_folio needs to be unlocked to avoid
+ * deadlock and we won't re-lock it in hugetlb_wp(). The
+ * pagecache_folio could be truncated after being
+ * unlocked. So its state should not be relied upon
+ * subsequently.
+ *
+ * Setting *pagecache_folio_unlocked to true allows the
+ * caller to handle any necessary logic related to the
+ * folio's unlocked state.
+ */
+ if (pagecache_folio) {
+ folio_unlock(pagecache_folio);
+ if (pagecache_folio_unlocked)
+ *pagecache_folio_unlocked = true;
+ }
+ /*
* Drop hugetlb_fault_mutex and vma_lock before
* unmapping. unmapping needs to hold vma_lock
* in write mode. Dropping vma_lock in read mode
@@ -6581,7 +6598,7 @@ static vm_fault_t hugetlb_no_page(struct
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
- ret = hugetlb_wp(folio, vmf);
+ ret = hugetlb_wp(folio, vmf, NULL);
}
spin_unlock(vmf->ptl);
@@ -6653,6 +6670,7 @@ vm_fault_t hugetlb_fault(struct mm_struc
struct hstate *h = hstate_vma(vma);
struct address_space *mapping;
int need_wait_lock = 0;
+ bool pagecache_folio_unlocked = false;
struct vm_fault vmf = {
.vma = vma,
.address = address & huge_page_mask(h),
@@ -6807,7 +6825,8 @@ vm_fault_t hugetlb_fault(struct mm_struc
if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
if (!huge_pte_write(vmf.orig_pte)) {
- ret = hugetlb_wp(pagecache_folio, &vmf);
+ ret = hugetlb_wp(pagecache_folio, &vmf,
+ &pagecache_folio_unlocked);
goto out_put_page;
} else if (likely(flags & FAULT_FLAG_WRITE)) {
vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte);
@@ -6824,10 +6843,14 @@ out_put_page:
out_ptl:
spin_unlock(vmf.ptl);
- if (pagecache_folio) {
+ /*
+ * If the pagecache_folio is unlocked in hugetlb_wp(), we skip
+ * folio_unlock() here.
+ */
+ if (pagecache_folio && !pagecache_folio_unlocked)
folio_unlock(pagecache_folio);
+ if (pagecache_folio)
folio_put(pagecache_folio);
- }
out_mutex:
hugetlb_vma_unlock_read(vma);
_
Patches currently in -mm which might be from gavinguo(a)igalia.com are
mm-hugetlb-fix-a-deadlock-with-pagecache_folio-and-hugetlb_fault_mutex_table.patch
V4: add a read-back for the non-DPG case. This is for protection
purposes, as that path is not used in production.
On VCN v4.0.5 there is a race condition where the WPTR is not
updated after starting from idle when the doorbell is used. Adding a
register read-back after the writes, at the end of the function, ensures
all register writes have completed before they take effect.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Cc: stable(a)vger.kernel.org
Signed-off-by: David (Ming Qiang) Wu <David.Wu3(a)amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Tested-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index ed00d35039c1..a09f9a2dd471 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -1034,6 +1034,10 @@ static int vcn_v4_0_5_start_dpg_mode(struct amdgpu_vcn_inst *vinst,
ring->doorbell_index << VCN_RB1_DB_CTRL__OFFSET__SHIFT |
VCN_RB1_DB_CTRL__EN_MASK);
+ /* Keeping one read-back to ensure all register writes are done, otherwise
+ * it may introduce race conditions */
+ RREG32_SOC15(VCN, inst_idx, regVCN_RB1_DB_CTRL);
+
return 0;
}
@@ -1216,6 +1220,10 @@ static int vcn_v4_0_5_start(struct amdgpu_vcn_inst *vinst)
WREG32_SOC15(VCN, i, regVCN_RB_ENABLE, tmp);
fw_shared->sq.queue_mode &= ~(FW_QUEUE_RING_RESET | FW_QUEUE_DPG_HOLD_OFF);
+ /* Keeping one read-back to ensure all register writes are done, otherwise
+ * it may introduce race conditions */
+ RREG32_SOC15(VCN, i, regVCN_RB_ENABLE);
+
return 0;
}
--
2.34.1
The series contains several small patches to fix various
issues in the pinctrl driver for Armada 3700.
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
---
Gabor Juhos (7):
pinctrl: armada-37xx: use correct OUTPUT_VAL register for GPIOs > 31
pinctrl: armada-37xx: propagate error from armada_37xx_gpio_direction_output()
pinctrl: armada-37xx: set GPIO output value before setting direction
pinctrl: armada-37xx: propagate error from armada_37xx_gpio_get()
pinctrl: armada-37xx: propagate error from armada_37xx_pmx_gpio_set_direction()
pinctrl: armada-37xx: propagate error from armada_37xx_gpio_get_direction()
pinctrl: armada-37xx: propagate error from armada_37xx_pmx_set_by_name()
drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 35 ++++++++++++++++-------------
1 file changed, 20 insertions(+), 15 deletions(-)
---
base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
change-id: 20250512-pinctrl-a37xx-fixes-98fabc45cb11
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
V2: do not add an extra read-back in vcn_v4_0_5_start() as there is
already one. New comment for better understanding.
On VCN v4.0.5 there is a race condition where the WPTR is not
updated after starting from idle when the doorbell is used. Reading back
the regVCN_RB1_DB_CTRL register after it is written ensures the
doorbell_index is updated before the doorbell is used.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Cc: stable(a)vger.kernel.org
Signed-off-by: David (Ming Qiang) Wu <David.Wu3(a)amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Tested-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index ed00d35039c1..e55b76d71367 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -1034,6 +1034,10 @@ static int vcn_v4_0_5_start_dpg_mode(struct amdgpu_vcn_inst *vinst,
ring->doorbell_index << VCN_RB1_DB_CTRL__OFFSET__SHIFT |
VCN_RB1_DB_CTRL__EN_MASK);
+ /* Keeping one read-back to ensure all register writes are done, otherwise
+ * it may introduce race conditions */
+ RREG32_SOC15(VCN, inst_idx, regVCN_RB1_DB_CTRL);
+
return 0;
}
--
2.34.1
From: Johannes Berg <johannes.berg(a)intel.com>
[ Upstream commit 9ad7974856926129f190ffbe3beea78460b3b7cc ]
If it looks like there's another subframe in the A-MSDU
but the header isn't fully there, we can end up reading
data out of bounds, only to discard later. Make this a
bit more careful and check if the subframe header can
even be present.
Reported-by: syzbot+d050d437fe47d479d210(a)syzkaller.appspotmail.com
Link: https://msgid.link/20240226203405.a731e2c95e38.I82ce7d8c0cc8970ce29d0a39fdc…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
[Minor conflict resolved due to code context change. Also note that the
routine ieee80211_is_valid_amsdu() was only introduced after 6.4 by commit
fe4a6d2db3ba ("wifi: mac80211: implement support for yet another mesh
A-MSDU format").]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified with a build test
---
net/wireless/util.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c
index 6ebc6567b287..0fd48361e3e1 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -751,24 +751,27 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list,
struct sk_buff *frame = NULL;
u16 ethertype;
u8 *payload;
- int offset = 0, remaining;
+ int offset = 0;
struct ethhdr eth;
bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb);
bool reuse_skb = false;
bool last = false;
while (!last) {
+ int remaining = skb->len - offset;
unsigned int subframe_len;
int len;
u8 padding;
+ if (sizeof(eth) > remaining)
+ goto purge;
+
skb_copy_bits(skb, offset, ð, sizeof(eth));
len = ntohs(eth.h_proto);
subframe_len = sizeof(struct ethhdr) + len;
padding = (4 - subframe_len) & 0x3;
/* the last MSDU has no padding */
- remaining = skb->len - offset;
if (subframe_len > remaining)
goto purge;
/* mitigate A-MSDU aggregation injection attacks */
--
2.34.1
From: Filipe Manana <fdmanana(a)suse.com>
commit a0f0cf8341e34e5d2265bfd3a7ad68342da1e2aa upstream.
When using the flushoncommit mount option, during almost every transaction
commit we trigger a warning from __writeback_inodes_sb_nr():
$ cat fs/fs-writeback.c:
(...)
static void __writeback_inodes_sb_nr(struct super_block *sb, ...
{
(...)
WARN_ON(!rwsem_is_locked(&sb->s_umount));
(...)
}
(...)
The trace produced in dmesg looks like the following:
[947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3
[947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif
[947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186
[947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3
[947.502097] Code: 24 10 4c 89 44 24 18 c6 (...)
[947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246
[947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000
[947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50
[947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000
[947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488
[947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460
[947.553621] FS: 0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000
[947.560537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0
[947.571072] Call Trace:
[947.572354] <TASK>
[947.573266] btrfs_commit_transaction+0x1f1/0x998
[947.576785] ? start_transaction+0x3ab/0x44e
[947.579867] ? schedule_timeout+0x8a/0xdd
[947.582716] transaction_kthread+0xe9/0x156
[947.585721] ? btrfs_cleanup_transaction.isra.0+0x407/0x407
[947.590104] kthread+0x131/0x139
[947.592168] ? set_kthread_struct+0x32/0x32
[947.595174] ret_from_fork+0x22/0x30
[947.597561] </TASK>
[947.598553] ---[ end trace 644721052755541c ]---
This is because we started using writeback_inodes_sb() to flush delalloc
when committing a transaction (when using -o flushoncommit), in order to
avoid deadlocks with filesystem freeze operations. This change was made
by commit ce8ea7cc6eb313 ("btrfs: don't call btrfs_start_delalloc_roots
in flushoncommit"). After that change we started producing that warning,
and every now and then a user reports this, since the warning happens too
often, spams dmesg/syslog, and leaves the user unsure whether it reflects a
problem that might compromise the filesystem's reliability.
We can not just lock the sb->s_umount semaphore before calling
writeback_inodes_sb(), because that would at least deadlock with
filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem()
is called while we are holding that semaphore in write mode, and that can
trigger a transaction commit, resulting in a deadlock. It would also
trigger the same type of deadlock in the unmount path. Possibly, it could
also introduce some other locking dependencies that lockdep would report.
To fix this call try_to_writeback_inodes_sb() instead of
writeback_inodes_sb(), because that will try to read lock sb->s_umount
and then will only call writeback_inodes_sb() if it was able to lock it.
This is fine because the cases where it can't read lock sb->s_umount
are during a filesystem unmount or during a filesystem freeze - in those
cases sb->s_umount is write locked and sync_filesystem() is called, which
calls writeback_inodes_sb(). In other words, in all cases where we can't
take a read lock on sb->s_umount, writeback is already being triggered
elsewhere.
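To make the reasoning above concrete, this is roughly what
try_to_writeback_inodes_sb() does (a simplified sketch of the
fs/fs-writeback.c logic, not the exact upstream code):

  void try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
  {
          /* Unmount and freeze hold s_umount for write and flush on their
           * own, so just skip instead of deadlocking. */
          if (!down_read_trylock(&sb->s_umount))
                  return;
          writeback_inodes_sb(sb, reason);
          up_read(&sb->s_umount);
  }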
An alternative would be to call btrfs_start_delalloc_roots() with a
number of pages different from LONG_MAX, for example matching the number
of delalloc bytes we currently have, in which case we would end up
starting all delalloc with filemap_fdatawrite_wbc() and not with an
async flush via filemap_flush() - that is only possible after the rather
recent commit e076ab2a2ca70a ("btrfs: shrink delalloc pages instead of
full inodes"). However that creates a whole new can of worms due to new
lock dependencies, which lockdep complains, like for example:
[ 8948.247280] ======================================================
[ 8948.247823] WARNING: possible circular locking dependency detected
[ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted
[ 8948.248786] ------------------------------------------------------
[ 8948.249320] kworker/u16:18/933570 is trying to acquire lock:
[ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.250638]
but task is already holding lock:
[ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.252018]
which lock already depends on the new lock.
[ 8948.252710]
the existing dependency chain (in reverse order) is:
[ 8948.253343]
-> #2 (&root->delalloc_mutex){+.+.}-{3:3}:
[ 8948.253950] __mutex_lock+0x90/0x900
[ 8948.254354] start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.254859] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.255408] btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.255942] btrfs_mksubvol+0x380/0x570 [btrfs]
[ 8948.256406] btrfs_mksnapshot+0x81/0xb0 [btrfs]
[ 8948.256870] __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs]
[ 8948.257413] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
[ 8948.257961] btrfs_ioctl+0x1196/0x3630 [btrfs]
[ 8948.258418] __x64_sys_ioctl+0x83/0xb0
[ 8948.258793] do_syscall_64+0x3b/0xc0
[ 8948.259146] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.259709]
-> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}:
[ 8948.260330] __mutex_lock+0x90/0x900
[ 8948.260692] btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.261234] btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.261766] btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs]
[ 8948.262379] btrfs_start_pre_rw_mount+0x119/0x180 [btrfs]
[ 8948.262909] open_ctree+0x1511/0x171e [btrfs]
[ 8948.263359] btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 8948.263863] legacy_get_tree+0x30/0x50
[ 8948.264242] vfs_get_tree+0x28/0xc0
[ 8948.264594] vfs_kern_mount.part.0+0x71/0xb0
[ 8948.265017] btrfs_mount+0x11d/0x3a0 [btrfs]
[ 8948.265462] legacy_get_tree+0x30/0x50
[ 8948.265851] vfs_get_tree+0x28/0xc0
[ 8948.266203] path_mount+0x2d4/0xbe0
[ 8948.266554] __x64_sys_mount+0x103/0x140
[ 8948.266940] do_syscall_64+0x3b/0xc0
[ 8948.267300] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.267790]
-> #0 (sb_internal#2){.+.+}-{0:0}:
[ 8948.268322] __lock_acquire+0x12e8/0x2260
[ 8948.268733] lock_acquire+0xd7/0x310
[ 8948.269092] start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.269591] find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.270087] btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.270588] cow_file_range+0x17e/0x490 [btrfs]
[ 8948.271051] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.271586] writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.272071] __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.272579] extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.273113] extent_writepages+0x76/0x130 [btrfs]
[ 8948.273573] do_writepages+0xd2/0x1c0
[ 8948.273942] filemap_fdatawrite_wbc+0x68/0x90
[ 8948.274371] start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.274876] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.275417] flush_space+0x1f2/0x630 [btrfs]
[ 8948.275863] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.276438] process_one_work+0x252/0x5a0
[ 8948.276829] worker_thread+0x55/0x3b0
[ 8948.277189] kthread+0xf2/0x120
[ 8948.277506] ret_from_fork+0x22/0x30
[ 8948.277868]
other info that might help us debug this:
[ 8948.278548] Chain exists of:
sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex
[ 8948.279601] Possible unsafe locking scenario:
[ 8948.280102] CPU0 CPU1
[ 8948.280508] ---- ----
[ 8948.280915] lock(&root->delalloc_mutex);
[ 8948.281271] lock(&fs_info->delalloc_root_mutex);
[ 8948.281915] lock(&root->delalloc_mutex);
[ 8948.282487] lock(sb_internal#2);
[ 8948.282800]
*** DEADLOCK ***
[ 8948.283333] 4 locks held by kworker/u16:18/933570:
[ 8948.283750] #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.284609] #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.285637] #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.286674] #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.287596]
stack backtrace:
[ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1
[ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[ 8948.290298] Call Trace:
[ 8948.290517] <TASK>
[ 8948.290700] dump_stack_lvl+0x59/0x73
[ 8948.291026] check_noncircular+0xf3/0x110
[ 8948.291375] ? start_transaction+0x228/0x6e0 [btrfs]
[ 8948.291826] __lock_acquire+0x12e8/0x2260
[ 8948.292241] lock_acquire+0xd7/0x310
[ 8948.292714] ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.293241] ? lock_is_held_type+0xea/0x140
[ 8948.293601] start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.294055] ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294518] find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294957] ? _raw_spin_unlock+0x29/0x40
[ 8948.295312] ? btrfs_get_alloc_profile+0x124/0x290 [btrfs]
[ 8948.295813] btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.296270] cow_file_range+0x17e/0x490 [btrfs]
[ 8948.296691] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.297175] ? find_lock_delalloc_range+0x247/0x270 [btrfs]
[ 8948.297678] writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.298123] __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.298570] extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.299061] extent_writepages+0x76/0x130 [btrfs]
[ 8948.299495] do_writepages+0xd2/0x1c0
[ 8948.299817] ? sched_clock_cpu+0xd/0x110
[ 8948.300160] ? lock_release+0x155/0x4a0
[ 8948.300494] filemap_fdatawrite_wbc+0x68/0x90
[ 8948.300874] ? do_raw_spin_unlock+0x4b/0xa0
[ 8948.301243] start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.301706] ? lock_release+0x155/0x4a0
[ 8948.302055] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.302564] flush_space+0x1f2/0x630 [btrfs]
[ 8948.302970] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.303510] process_one_work+0x252/0x5a0
[ 8948.303860] ? process_one_work+0x5a0/0x5a0
[ 8948.304221] worker_thread+0x55/0x3b0
[ 8948.304543] ? process_one_work+0x5a0/0x5a0
[ 8948.304904] kthread+0xf2/0x120
[ 8948.305184] ? kthread_complete_and_exit+0x20/0x20
[ 8948.305598] ret_from_fork+0x22/0x30
[ 8948.305921] </TASK>
It all comes from the fact that btrfs_start_delalloc_roots() takes the
delalloc_root_mutex, in the transaction commit path we are holding a
read lock on one of the superblock's freeze semaphores (via
sb_start_intwrite()), the async reclaim task can also do a call to
btrfs_start_delalloc_roots(), which ends up triggering writeback with
calls to filemap_fdatawrite_wbc(), resulting in extent allocation which
in turn can call btrfs_start_transaction(), which will result in taking
the freeze semaphore via sb_start_intwrite(), forming a nasty dependency
on all those locks which can be taken in different orders by different
code paths.
So just adopt the simple approach of calling try_to_writeback_inodes_sb()
at btrfs_start_delalloc_flush().
Link: https://lore.kernel.org/linux-btrfs/20220130005258.GA7465@cuci.nl/
Link: https://lore.kernel.org/linux-btrfs/43acc426-d683-d1b6-729d-c6bc4a2fff4d@gm…
Link: https://lore.kernel.org/linux-btrfs/6833930a-08d7-6fbc-0141-eb9cdfd6bb4d@gm…
Link: https://lore.kernel.org/linux-btrfs/20190322041731.GF16651@hungrycats.org/
Reviewed-by: Omar Sandoval <osandov(a)fb.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
[ add more link reports ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
[Minor context change fixed]
Signed-off-by: Zhi Yang <Zhi.Yang(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
fs/btrfs/transaction.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 21a5a963c70e..8c0703f8037b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2021,16 +2021,24 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_trans_handle *trans)
struct btrfs_fs_info *fs_info = trans->fs_info;
/*
- * We use writeback_inodes_sb here because if we used
+ * We use try_to_writeback_inodes_sb() here because if we used
* btrfs_start_delalloc_roots we would deadlock with fs freeze.
* Currently are holding the fs freeze lock, if we do an async flush
* we'll do btrfs_join_transaction() and deadlock because we need to
* wait for the fs freeze lock. Using the direct flushing we benefit
* from already being in a transaction and our join_transaction doesn't
* have to re-take the fs freeze lock.
+ *
+ * Note that try_to_writeback_inodes_sb() will only trigger writeback
+ * if it can read lock sb->s_umount. It will always be able to lock it,
+ * except when the filesystem is being unmounted or being frozen, but in
+ * those cases sync_filesystem() is called, which results in calling
+ * writeback_inodes_sb() while holding a write lock on sb->s_umount.
+ * Note that we don't call writeback_inodes_sb() directly, because it
+ * will emit a warning if sb->s_umount is not locked.
*/
if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) {
- writeback_inodes_sb(fs_info->sb, WB_REASON_SYNC);
+ try_to_writeback_inodes_sb(fs_info->sb, WB_REASON_SYNC);
} else {
struct btrfs_pending_snapshot *pending;
struct list_head *head = &trans->transaction->pending_snapshots;
--
2.34.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x ab00ddd802f80e31fc9639c652d736fe3913feae
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051243-handprint-impulse-c20a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ab00ddd802f80e31fc9639c652d736fe3913feae Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang(a)linux.alibaba.com>
Date: Wed, 23 Apr 2025 18:36:45 +0800
Subject: [PATCH] selftests/mm: compaction_test: support platform with huge
mount of memory
When running the mm selftests to verify mm patches, the 'compaction_test'
case failed on an x86 server with 1 TB of memory. The root cause is that
the machine has more free memory than the test supports.
The test case tries to allocate 100000 huge pages, which is about 200 GB
on that x86 server, and when it succeeds, it expects that to be larger than
1/3 of 80% of the free memory in the system. This logic only works for
platforms with 750 GB ( 200 / (1/3) / 80% ) or less free memory, and may
raise a false alarm on others.
Fix it by changing the fixed page count to a self-adjusting number based
on the actual amount of free memory.
Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Feng Tang <feng.tang(a)linux.alibaba.com>
Acked-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)inux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 2c3a0eb6b22d..9bc4591c7b16 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -90,6 +90,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
int compaction_index = 0;
char nr_hugepages[20] = {0};
char init_nr_hugepages[24] = {0};
+ char target_nr_hugepages[24] = {0};
+ int slen;
snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
"%lu", initial_nr_hugepages);
@@ -106,11 +108,18 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
goto out;
}
- /* Request a large number of huge pages. The Kernel will allocate
- as much as it can */
- if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
- ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
+ /*
+ * Request huge pages for about half of the free memory. The Kernel
+ * will allocate as much as it can, and we expect it will get at least 1/3
+ */
+ nr_hugepages_ul = mem_free / hugepage_size / 2;
+ snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+ "%lu", nr_hugepages_ul);
+
+ slen = strlen(target_nr_hugepages);
+ if (write(fd, target_nr_hugepages, slen) != slen) {
+ ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+ nr_hugepages_ul, strerror(errno));
goto close_fd;
}
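To make the sizing arithmetic above concrete, here is a minimal standalone C sketch (not the selftest itself; the 2 MB huge page size and the 1 TB free-memory figure are assumptions for illustration) contrasting the old fixed request with the self-adjusted one:

#include <stdio.h>

int main(void)
{
	unsigned long hugepage_kb = 2048;        /* assume 2 MB huge pages */
	unsigned long fixed_request = 100000;    /* old hard-coded request */
	unsigned long fixed_kb = fixed_request * hugepage_kb;

	/*
	 * The test passes when the allocation exceeds 1/3 of 80% of free
	 * memory, so the fixed request only suffices up to roughly
	 * 200 GB / (1/3) / 80% = 750 GB in the commit's round numbers
	 * (~732 GB with the exact binary units used here).
	 */
	double limit_kb = (double)fixed_kb * 3.0 / 0.8;
	printf("fixed request covers up to ~%.0f GB of free memory\n",
	       limit_kb / (1024 * 1024));

	/*
	 * New scheme: request huge pages for about half of the free memory,
	 * which scales with the machine (1 TB of free memory as an example).
	 */
	unsigned long mem_free_kb = 1024UL * 1024 * 1024;
	unsigned long target = mem_free_kb / hugepage_kb / 2;
	printf("self-adjusted request for %lu kB free: %lu huge pages\n",
	       mem_free_kb, target);
	return 0;
}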
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 94cff94634e506a4a44684bee1875d2dbf782722
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051257-skater-skeleton-9a3d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 94cff94634e506a4a44684bee1875d2dbf782722 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Date: Fri, 4 Apr 2025 15:31:16 +0200
Subject: [PATCH] clocksource/i8253: Use raw_spinlock_irqsave() in
clockevent_i8253_disable()
On x86 during boot, clockevent_i8253_disable() can be invoked via
x86_late_time_init -> hpet_time_init() -> pit_timer_init() which happens
with enabled interrupts.
If some of the old i8253 hardware is actually used then lockdep will notice
that i8253_lock is used in hard interrupt context. This causes lockdep to
complain because it observed the lock being acquired with interrupts
enabled and in hard interrupt context.
Make clockevent_i8253_disable() acquire the lock with
raw_spinlock_irqsave() to cure this.
[ tglx: Massage change log and use guard() ]
Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index 39f7c2d736d1..b603c25f3dfa 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -103,7 +103,7 @@ int __init clocksource_i8253_init(void)
#ifdef CONFIG_CLKEVT_I8253
void clockevent_i8253_disable(void)
{
- raw_spin_lock(&i8253_lock);
+ guard(raw_spinlock_irqsave)(&i8253_lock);
/*
* Writing the MODE register should stop the counter, according to
@@ -132,8 +132,6 @@ void clockevent_i8253_disable(void)
outb_p(0, PIT_CH0);
outb_p(0x30, PIT_MODE);
-
- raw_spin_unlock(&i8253_lock);
}
static int pit_shutdown(struct clock_event_device *evt)
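The guard(raw_spinlock_irqsave)(&i8253_lock) line in the hunk above takes the lock with interrupts disabled and releases it automatically when the function returns, which is why the explicit raw_spin_unlock() is dropped. Below is a user-space sketch of the same scope-based idea, built on the compiler cleanup attribute that the kernel's guard() helpers also rely on; a pthread mutex stands in for the raw spinlock, and the real guard(raw_spinlock_irqsave) additionally saves and restores the IRQ flags:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Cleanup handler: runs automatically when the guarded variable goes
 * out of scope, mirroring how guard() releases the lock at scope exit. */
static void unlock_guard(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

static void bump(void)
{
	/* Take the lock and arrange for it to be dropped at scope exit. */
	pthread_mutex_t *guard __attribute__((cleanup(unlock_guard))) =
		(pthread_mutex_lock(&lock), &lock);

	counter++;
	/* No explicit unlock: leaving the scope, even via an early return,
	 * runs unlock_guard() on 'guard'. */
}

int main(void)
{
	bump();
	bump();
	printf("counter = %d\n", counter);
	return 0;
}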
From: Johannes Berg <johannes.berg(a)intel.com>
[ Upstream commit 9ad7974856926129f190ffbe3beea78460b3b7cc ]
If it looks like there's another subframe in the A-MSDU
but the header isn't fully there, we can end up reading
data out of bounds, only to discard later. Make this a
bit more careful and check if the subframe header can
even be present.
Reported-by: syzbot+d050d437fe47d479d210(a)syzkaller.appspotmail.com
Link: https://msgid.link/20240226203405.a731e2c95e38.I82ce7d8c0cc8970ce29d0a39fdc…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
[Minor conflict resolved due to a code context change. The routine
ieee80211_is_valid_amsdu was introduced by commit fe4a6d2db3ba
("wifi: mac80211: implement support for yet another mesh A-MSDU format")
after 6.4.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified with a build test
---
net/wireless/util.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c
index 6ebc6567b287..0fd48361e3e1 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -751,24 +751,27 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list,
struct sk_buff *frame = NULL;
u16 ethertype;
u8 *payload;
- int offset = 0, remaining;
+ int offset = 0;
struct ethhdr eth;
bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb);
bool reuse_skb = false;
bool last = false;
while (!last) {
+ int remaining = skb->len - offset;
unsigned int subframe_len;
int len;
u8 padding;
+ if (sizeof(eth) > remaining)
+ goto purge;
+
skb_copy_bits(skb, offset, ð, sizeof(eth));
len = ntohs(eth.h_proto);
subframe_len = sizeof(struct ethhdr) + len;
padding = (4 - subframe_len) & 0x3;
/* the last MSDU has no padding */
- remaining = skb->len - offset;
if (subframe_len > remaining)
goto purge;
/* mitigate A-MSDU aggregation injection attacks */
--
2.34.1
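The essential change in the patch above is ordering: check that a full subframe header fits in the remaining bytes before copying it out of the skb. A self-contained user-space sketch of that pattern (not the mac80211 code; the 14-byte header and the test frames are made up for illustration):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 14 bytes, standing in for struct ethhdr: 6+6 addresses + big-endian length */
struct hdr {
	uint8_t dst[6], src[6];
	uint8_t len_be[2];
};

static void parse(const uint8_t *buf, size_t len)
{
	size_t offset = 0;

	while (offset < len) {
		size_t remaining = len - offset;
		struct hdr h;

		if (sizeof(h) > remaining) {       /* the check added by the fix */
			puts("  truncated header -> purge");
			return;
		}
		memcpy(&h, buf + offset, sizeof(h));

		size_t sublen = sizeof(h) + ((h.len_be[0] << 8) | h.len_be[1]);
		size_t padding = (4 - sublen) & 0x3;

		if (sublen > remaining) {          /* pre-existing length check */
			puts("  truncated payload -> purge");
			return;
		}
		printf("  subframe of %zu bytes\n", sublen);
		offset += sublen + padding;
	}
}

int main(void)
{
	uint8_t frame[16] = { 0 };

	frame[13] = 2;                   /* 2-byte payload after the header */
	parse(frame, sizeof(frame));     /* one complete subframe */
	parse(frame, 10);                /* header cut short -> purge */
	return 0;
}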
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x f31fe8165d365379d858c53bef43254c7d6d1cfd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051208-hardly-dole-1422@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f31fe8165d365379d858c53bef43254c7d6d1cfd Mon Sep 17 00:00:00 2001
From: Naman Jain <namjain(a)linux.microsoft.com>
Date: Fri, 2 May 2025 13:18:10 +0530
Subject: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
On regular bootup, devices get registered to VMBus first, so when the
uio_hv_generic driver for a particular device type is probed,
the device is already initialized and added, so sysfs creation in
hv_uio_probe() works fine. However, when the device is removed
and brought back, the channel gets rescinded and the device gets
registered to VMBus again. This time, the uio_hv_generic driver is
already registered to probe for that device, and in this case sysfs
creation is attempted before the device's kobject is fully
initialized.
Fix this by moving the core logic of sysfs creation of ring buffer,
from uio_hv_generic to HyperV's VMBus driver, where the rest of the
sysfs attributes for the channels are defined. While doing that, make
use of attribute groups and macros, instead of creating sysfs
directly, to ensure better error handling and code flow.
Problematic path:
vmbus_process_offer (A new offer comes for the VMBus device)
  vmbus_add_channel_work
    vmbus_device_register
      |-> device_register
      |      |...
      |      |-> hv_uio_probe
      |             |...
      |             |-> sysfs_create_bin_file (leads to a warning as
      |                 the primary channel's kobject, which is used to
      |                 create the sysfs file, is not yet initialized)
      |-> kset_create_and_add
      |-> vmbus_add_channel_kobj (initialization of the primary
             channel's kobject happens later)
The above code flow is sequential and the warning is always reproducible in
this path.
Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
Cc: stable(a)kernel.org
Suggested-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Suggested-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Michael Kelley <mhklinux(a)outlook.com>
Tested-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 29780f3a7478..0b450e53161e 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -477,4 +477,10 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
#endif /* CONFIG_HYPERV_TESTING */
+/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+ int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_struct *vma));
+int hv_remove_ring_sysfs(struct vmbus_channel *channel);
+
#endif /* _HYPERV_VMBUS_H */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 8d3cff42bdbb..0f16a83cc2d6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1802,6 +1802,27 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR_RO(subchannel_id);
+static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
+ const struct bin_attribute *attr,
+ struct vm_area_struct *vma)
+{
+ struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
+
+ /*
+ * hv_(create|remove)_ring_sysfs implementation ensures that mmap_ring_buffer
+ * is not NULL.
+ */
+ return channel->mmap_ring_buffer(channel, vma);
+}
+
+static struct bin_attribute chan_attr_ring_buffer = {
+ .attr = {
+ .name = "ring",
+ .mode = 0600,
+ },
+ .size = 2 * SZ_2M,
+ .mmap = hv_mmap_ring_buffer_wrapper,
+};
static struct attribute *vmbus_chan_attrs[] = {
&chan_attr_out_mask.attr,
&chan_attr_in_mask.attr,
@@ -1821,6 +1842,11 @@ static struct attribute *vmbus_chan_attrs[] = {
NULL
};
+static struct bin_attribute *vmbus_chan_bin_attrs[] = {
+ &chan_attr_ring_buffer,
+ NULL
+};
+
/*
* Channel-level attribute_group callback function. Returns the permission for
* each attribute, and returns 0 if an attribute is not visible.
@@ -1841,9 +1867,24 @@ static umode_t vmbus_chan_attr_is_visible(struct kobject *kobj,
return attr->mode;
}
+static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
+ const struct bin_attribute *attr, int idx)
+{
+ const struct vmbus_channel *channel =
+ container_of(kobj, struct vmbus_channel, kobj);
+
+ /* Hide ring attribute if channel's ring_sysfs_visible is set to false */
+ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
+ return 0;
+
+ return attr->attr.mode;
+}
+
static const struct attribute_group vmbus_chan_group = {
.attrs = vmbus_chan_attrs,
- .is_visible = vmbus_chan_attr_is_visible
+ .bin_attrs = vmbus_chan_bin_attrs,
+ .is_visible = vmbus_chan_attr_is_visible,
+ .is_bin_visible = vmbus_chan_bin_attr_is_visible,
};
static const struct kobj_type vmbus_chan_ktype = {
@@ -1851,6 +1892,63 @@ static const struct kobj_type vmbus_chan_ktype = {
.release = vmbus_chan_release,
};
+/**
+ * hv_create_ring_sysfs() - create "ring" sysfs entry corresponding to ring buffers for a channel.
+ * @channel: Pointer to vmbus_channel structure
+ * @hv_mmap_ring_buffer: function pointer for initializing the function to be called on mmap of
+ * channel's "ring" sysfs node, which is for the ring buffer of that channel.
+ * Function pointer is of below type:
+ * int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ * struct vm_area_struct *vma))
+ * This has a pointer to the channel and a pointer to vm_area_struct,
+ * used for mmap, as arguments.
+ *
+ * Sysfs node for ring buffer of a channel is created along with other fields, however its
+ * visibility is disabled by default. Sysfs creation needs to be controlled when the use-case
+ * is running.
+ * For example, HV_NIC device is used either by uio_hv_generic or hv_netvsc at any given point of
+ * time, and "ring" sysfs is needed only when uio_hv_generic is bound to that device. To avoid
+ * exposing the ring buffer by default, this function is reponsible to enable visibility of
+ * ring for userspace to use.
+ * Note: Race conditions can happen with userspace and it is not encouraged to create new
+ * use-cases for this. This was added to maintain backward compatibility, while solving
+ * one of the race conditions in uio_hv_generic while creating sysfs.
+ *
+ * Returns 0 on success or error code on failure.
+ */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+ int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_struct *vma))
+{
+ struct kobject *kobj = &channel->kobj;
+
+ channel->mmap_ring_buffer = hv_mmap_ring_buffer;
+ channel->ring_sysfs_visible = true;
+
+ return sysfs_update_group(kobj, &vmbus_chan_group);
+}
+EXPORT_SYMBOL_GPL(hv_create_ring_sysfs);
+
+/**
+ * hv_remove_ring_sysfs() - remove ring sysfs entry corresponding to ring buffers for a channel.
+ * @channel: Pointer to vmbus_channel structure
+ *
+ * Hide "ring" sysfs for a channel by changing its is_visible attribute and updating sysfs group.
+ *
+ * Returns 0 on success or error code on failure.
+ */
+int hv_remove_ring_sysfs(struct vmbus_channel *channel)
+{
+ struct kobject *kobj = &channel->kobj;
+ int ret;
+
+ channel->ring_sysfs_visible = false;
+ ret = sysfs_update_group(kobj, &vmbus_chan_group);
+ channel->mmap_ring_buffer = NULL;
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hv_remove_ring_sysfs);
+
/*
* vmbus_add_channel_kobj - setup a sub-directory under device/channels
*/
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..69c1df0f4ca5 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -131,15 +131,12 @@ static void hv_uio_rescind(struct vmbus_channel *channel)
vmbus_device_unregister(channel->device_obj);
}
-/* Sysfs API to allow mmap of the ring buffers
+/* Function used for mmap of ring buffer sysfs interface.
* The ring buffer is allocated as contiguous memory by vmbus_open
*/
-static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
- const struct bin_attribute *attr,
- struct vm_area_struct *vma)
+static int
+hv_uio_ring_mmap(struct vmbus_channel *channel, struct vm_area_struct *vma)
{
- struct vmbus_channel *channel
- = container_of(kobj, struct vmbus_channel, kobj);
void *ring_buffer = page_address(channel->ringbuffer_page);
if (channel->state != CHANNEL_OPENED_STATE)
@@ -149,15 +146,6 @@ static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
channel->ringbuffer_pagecount << PAGE_SHIFT);
}
-static const struct bin_attribute ring_buffer_bin_attr = {
- .attr = {
- .name = "ring",
- .mode = 0600,
- },
- .size = 2 * SZ_2M,
- .mmap = hv_uio_ring_mmap,
-};
-
/* Callback from VMBUS subsystem when new channel created. */
static void
hv_uio_new_channel(struct vmbus_channel *new_sc)
@@ -178,8 +166,7 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
/* Disable interrupts on sub channel */
new_sc->inbound.ring_buffer->interrupt_mask = 1;
set_channel_read_mode(new_sc, HV_CALL_ISR);
-
- ret = sysfs_create_bin_file(&new_sc->kobj, &ring_buffer_bin_attr);
+ ret = hv_create_ring_sysfs(new_sc, hv_uio_ring_mmap);
if (ret) {
dev_err(device, "sysfs create ring bin file failed; %d\n", ret);
vmbus_close(new_sc);
@@ -350,10 +337,18 @@ hv_uio_probe(struct hv_device *dev,
goto fail_close;
}
- ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
- if (ret)
- dev_notice(&dev->device,
- "sysfs create ring bin file failed; %d\n", ret);
+ /*
+ * This internally calls sysfs_update_group, which returns a non-zero value if it executes
+ * before sysfs_create_group. This is expected as the 'ring' will be created later in
+ * vmbus_device_register() -> vmbus_add_channel_kobj(). Thus, no need to check the return
+ * value and print warning.
+ *
+ * Creating/exposing sysfs in driver probe is not encouraged as it can lead to race
+ * conditions with userspace. For backward compatibility, "ring" sysfs could not be removed
+ * or decoupled from uio_hv_generic probe. Userspace programs can make use of inotify
+ * APIs to make sure that ring is created.
+ */
+ hv_create_ring_sysfs(channel, hv_uio_ring_mmap);
hv_set_drvdata(dev, pdata);
@@ -375,7 +370,7 @@ hv_uio_remove(struct hv_device *dev)
if (!pdata)
return;
- sysfs_remove_bin_file(&dev->channel->kobj, &ring_buffer_bin_attr);
+ hv_remove_ring_sysfs(dev->channel);
uio_unregister_device(&pdata->info);
hv_uio_cleanup(dev, pdata);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 675959fb97ba..d6ffe01962c2 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1002,6 +1002,12 @@ struct vmbus_channel {
/* The max size of a packet on this channel */
u32 max_pkt_size;
+
+ /* function to mmap ring buffer memory to the channel's sysfs ring attribute */
+ int (*mmap_ring_buffer)(struct vmbus_channel *channel, struct vm_area_struct *vma);
+
+ /* boolean to control visibility of sysfs for ring buffer */
+ bool ring_sysfs_visible;
};
#define lock_requestor(channel, flags) \
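The key design point in the patch above is that the "ring" bin attribute is registered together with the rest of the channel attributes but stays invisible until the UIO driver opts in; hv_create_ring_sysfs() merely flips ring_sysfs_visible and asks for the group to be re-evaluated. A small user-space model of that deferred-visibility idea (the structures and update_group() below are invented for the sketch and are not kernel APIs):

#include <stdbool.h>
#include <stdio.h>

struct channel {
	bool ring_visible;                 /* like vmbus_channel::ring_sysfs_visible */
};

struct attr {
	const char *name;
	/* returns nonzero if the attribute should be exposed for this channel */
	int (*is_visible)(const struct channel *ch);
};

static int ring_is_visible(const struct channel *ch)
{
	return ch->ring_visible;
}

static const struct attr group[] = {
	{ "out_mask", NULL },              /* always visible */
	{ "in_mask",  NULL },
	{ "ring",     ring_is_visible },   /* gated by the flag */
};

/* Re-evaluate the group, like sysfs_update_group() does after the flag flips. */
static void update_group(const struct channel *ch)
{
	for (size_t i = 0; i < sizeof(group) / sizeof(group[0]); i++) {
		bool shown = !group[i].is_visible || group[i].is_visible(ch);
		printf("  %-8s %s\n", group[i].name, shown ? "exposed" : "hidden");
	}
}

int main(void)
{
	struct channel ch = { .ring_visible = false };

	puts("after device registration (driver not bound yet):");
	update_group(&ch);

	ch.ring_visible = true;            /* what hv_create_ring_sysfs() toggles */
	puts("after the UIO driver enables the ring:");
	update_group(&ch);
	return 0;
}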
From: Wayne Lin <wayne.lin(a)amd.com>
[ Upstream commit fcf6a49d79923a234844b8efe830a61f3f0584e4 ]
[Why]
When unplugging one of the monitors connected behind an MST hub, a null pointer
dereference is encountered. This is because dc_sink is released immediately in
early_unregister() or detect_ctx(), so committing a new state that directly refers
to info stored in dc_sink causes a null pointer dereference.
[how]
Remove the redundant checking condition. The relevant condition should already be
covered by checking whether dsc_aux is null or not. Also reset dsc_aux to NULL when
the connector is disconnected.
Reviewed-by: Jerry Zuo <jerry.zuo(a)amd.com>
Acked-by: Zaeem Mohamed <zaeem.mohamed(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[The code deleted in this fix was introduced by commit b9b5a82c5321
("drm/amd/display: Fix DSC-re-computing") after 6.11.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified with a build test
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 5eb994ed5471..6bb590bc7c19 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -183,6 +183,8 @@ amdgpu_dm_mst_connector_early_unregister(struct drm_connector *connector)
dc_sink_release(dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
}
aconnector->mst_status = MST_STATUS_DEFAULT;
@@ -487,6 +489,8 @@ dm_dp_mst_detect(struct drm_connector *connector,
dc_sink_release(aconnector->dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
amdgpu_dm_set_mst_status(&aconnector->mst_status,
MST_REMOTE_EDID | MST_ALLOCATE_NEW_PAYLOAD | MST_CLEAR_ALLOCATED_PAYLOAD,
--
2.34.1
From: Wei Fang <wei.fang(a)nxp.com>
[ Upstream commit c2e0c58b25a0a0c37ec643255558c5af4450c9f5 ]
There is a deadlock issue in the sungem driver; please refer to commit
ac0a230f719b ("eth: sungem: remove .ndo_poll_controller to avoid
deadlocks"). The root cause of the issue is that netpoll runs in atomic
context while disable_irq(), which is called by the driver's
.ndo_poll_controller interface, might sleep. After analyzing the
implementation of fec_poll_controller(), the fec driver has the same
issue. Because the fec driver uses NAPI for TX completions,
.ndo_poll_controller does not need to be implemented in the fec driver,
so fec_poll_controller() can be safely removed.
Fixes: 7f5c6addcdc0 ("net/fec: add poll controller function for fec nic")
Signed-off-by: Wei Fang <wei.fang(a)nxp.com>
Link: https://lore.kernel.org/r/20240511062009.652918-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[Minor context change fixed]
Signed-off-by: Feng Liu <Feng.Liu3(a)windriver.com>
Signed-off-by: He Zhe <Zhe.He(a)windriver.com>
---
Verified with a build test.
---
drivers/net/ethernet/freescale/fec_main.c | 26 -----------------------
1 file changed, 26 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 8e30e999456d..01579719f4ce 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3255,29 +3255,6 @@ fec_set_mac_address(struct net_device *ndev, void *p)
return 0;
}
-#ifdef CONFIG_NET_POLL_CONTROLLER
-/**
- * fec_poll_controller - FEC Poll controller function
- * @dev: The FEC network adapter
- *
- * Polled functionality used by netconsole and others in non interrupt mode
- *
- */
-static void fec_poll_controller(struct net_device *dev)
-{
- int i;
- struct fec_enet_private *fep = netdev_priv(dev);
-
- for (i = 0; i < FEC_IRQ_NUM; i++) {
- if (fep->irq[i] > 0) {
- disable_irq(fep->irq[i]);
- fec_enet_interrupt(fep->irq[i], dev);
- enable_irq(fep->irq[i]);
- }
- }
-}
-#endif
-
static inline void fec_enet_set_netdev_features(struct net_device *netdev,
netdev_features_t features)
{
@@ -3351,9 +3328,6 @@ static const struct net_device_ops fec_netdev_ops = {
.ndo_tx_timeout = fec_timeout,
.ndo_set_mac_address = fec_set_mac_address,
.ndo_do_ioctl = fec_enet_ioctl,
-#ifdef CONFIG_NET_POLL_CONTROLLER
- .ndo_poll_controller = fec_poll_controller,
-#endif
.ndo_set_features = fec_set_features,
};
--
2.34.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 94cff94634e506a4a44684bee1875d2dbf782722
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051256-safeguard-purchase-6c4e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 94cff94634e506a4a44684bee1875d2dbf782722 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Date: Fri, 4 Apr 2025 15:31:16 +0200
Subject: [PATCH] clocksource/i8253: Use raw_spinlock_irqsave() in
clockevent_i8253_disable()
On x86 during boot, clockevent_i8253_disable() can be invoked via
x86_late_time_init -> hpet_time_init() -> pit_timer_init() which happens
with enabled interrupts.
If some of the old i8253 hardware is actually used then lockdep will notice
that i8253_lock is used in hard interrupt context. This causes lockdep to
complain because it observed the lock being acquired with interrupts
enabled and in hard interrupt context.
Make clockevent_i8253_disable() acquire the lock with
raw_spinlock_irqsave() to cure this.
[ tglx: Massage change log and use guard() ]
Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index 39f7c2d736d1..b603c25f3dfa 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -103,7 +103,7 @@ int __init clocksource_i8253_init(void)
#ifdef CONFIG_CLKEVT_I8253
void clockevent_i8253_disable(void)
{
- raw_spin_lock(&i8253_lock);
+ guard(raw_spinlock_irqsave)(&i8253_lock);
/*
* Writing the MODE register should stop the counter, according to
@@ -132,8 +132,6 @@ void clockevent_i8253_disable(void)
outb_p(0, PIT_CH0);
outb_p(0x30, PIT_MODE);
-
- raw_spin_unlock(&i8253_lock);
}
static int pit_shutdown(struct clock_event_device *evt)
LoongArch's toolchain may change the default code model from normal to
medium. This is unnecessary for the kernel, and it generates some relocations
which cannot be handled by the module loader. So explicitly set the
code model to normal in the Makefile (for Rust, 'normal' is 'small').
Cc: stable(a)vger.kernel.org
Tested-by: Haiyong Sun <sunhaiyong(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
We may use a new toolchain to build the 6.1/6.6 LTS kernels, so backport
this to avoid problems.
arch/loongarch/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index a74bbcb05ee1..f2966745b058 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -55,7 +55,7 @@ endif
ifdef CONFIG_64BIT
ld-emul = $(64bit-emul)
-cflags-y += -mabi=lp64s
+cflags-y += -mabi=lp64s -mcmodel=normal
endif
cflags-y += -pipe -msoft-float
--
2.47.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ab00ddd802f80e31fc9639c652d736fe3913feae
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051243-poster-expiring-81bc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ab00ddd802f80e31fc9639c652d736fe3913feae Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang(a)linux.alibaba.com>
Date: Wed, 23 Apr 2025 18:36:45 +0800
Subject: [PATCH] selftests/mm: compaction_test: support platforms with a huge
 amount of memory
When running the mm selftests to verify mm patches, the 'compaction_test' case
failed on an x86 server with 1 TB of memory. The root cause is that the server
has more free memory than the test supports.
The test case tries to allocate 100000 huge pages, which is about 200 GB on
that x86 server, and when it succeeds it expects the allocation to be larger
than 1/3 of 80% of the free memory in the system. This logic only works for
platforms with 750 GB (200 / (1/3) / 80%) or less free memory, and may raise
a false alarm on others.
Fix it by changing the fixed page number to a self-adjusting number based on
the actual amount of free memory.
Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Feng Tang <feng.tang(a)linux.alibaba.com>
Acked-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)inux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 2c3a0eb6b22d..9bc4591c7b16 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -90,6 +90,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
int compaction_index = 0;
char nr_hugepages[20] = {0};
char init_nr_hugepages[24] = {0};
+ char target_nr_hugepages[24] = {0};
+ int slen;
snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
"%lu", initial_nr_hugepages);
@@ -106,11 +108,18 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
goto out;
}
- /* Request a large number of huge pages. The Kernel will allocate
- as much as it can */
- if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
- ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
+ /*
+ * Request huge pages for about half of the free memory. The Kernel
+ * will allocate as much as it can, and we expect it will get at least 1/3
+ */
+ nr_hugepages_ul = mem_free / hugepage_size / 2;
+ snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+ "%lu", nr_hugepages_ul);
+
+ slen = strlen(target_nr_hugepages);
+ if (write(fd, target_nr_hugepages, slen) != slen) {
+ ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+ nr_hugepages_ul, strerror(errno));
goto close_fd;
}
Hi Greg
Two patches for the synaptics driver seem not to have been backported to 5.4,
but have been added to 5.10 and later versions. These patches are:
1. https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tre…
2. https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tre…
I'm sending them after rebasing them to 5.4.
Thanks
RESEND: since I forgot to Cc stable
Aditya Garg (1):
Input: synaptics - enable InterTouch on TUXEDO InfinityBook Pro 14 v5
Dmitry Torokhov (1):
Input: synaptics - enable SMBus for HP Elitebook 850 G1
drivers/input/mouse/synaptics.c | 2 ++
1 file changed, 2 insertions(+)
--
2.49.0
From: Puranjay Mohan <puranjay(a)kernel.org>
[ Upstream commit 19d3c179a37730caf600a97fed3794feac2b197b ]
When BPF_TRAMP_F_CALL_ORIG is set, the trampoline calls
__bpf_tramp_enter() and __bpf_tramp_exit() functions, passing them
the struct bpf_tramp_image *im pointer as an argument in R0.
The trampoline generation code uses emit_addr_mov_i64() to emit
instructions for moving the bpf_tramp_image address into R0, but
emit_addr_mov_i64() assumes the address to be in the vmalloc() space
and uses only 48 bits. Because bpf_tramp_image is allocated using
kzalloc(), its address can use more than 48 bits; in this case the
trampoline will pass an invalid address to __bpf_tramp_enter/exit(),
causing a kernel crash.
Fix this by using emit_a64_mov_i64() in place of emit_addr_mov_i64(),
as it can work with addresses that are wider than 48 bits.
Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64")
Signed-off-by: Puranjay Mohan <puranjay(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Closes: https://lore.kernel.org/all/SJ0PR15MB461564D3F7E7A763498CA6A8CBDB2@SJ0PR15M…
Link: https://lore.kernel.org/bpf/20240711151838.43469-1-puranjay@kernel.org
[Minor context change fixed.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
arch/arm64/net/bpf_jit_comp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index c04ace8f4843..3168343815b3 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1893,7 +1893,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
if (flags & BPF_TRAMP_F_CALL_ORIG) {
- emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
emit_call((const u64)__bpf_tramp_enter, ctx);
}
@@ -1937,7 +1937,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
if (flags & BPF_TRAMP_F_CALL_ORIG) {
im->ip_epilogue = ctx->image + ctx->idx;
- emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
emit_call((const u64)__bpf_tramp_exit, ctx);
}
--
2.34.1
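To make the 48-bit assumption concrete: emit_addr_mov_i64() materializes only the low three 16-bit chunks of the address and relies on the top 16 bits being the all-ones pattern of typical kernel virtual addresses, while emit_a64_mov_i64() materializes all four chunks. A user-space sketch of that difference in plain C (not the JIT code; the instruction sequences are only modelled, and the example addresses are made up):

#include <inttypes.h>
#include <stdio.h>

/* Models emit_addr_mov_i64(): correct only if bits 63..48 are all ones. */
static uint64_t mov_48bit_assumed(uint64_t addr)
{
	return 0xffff000000000000ULL | (addr & 0x0000ffffffffffffULL);
}

/* Models emit_a64_mov_i64(): materializes all four 16-bit chunks. */
static uint64_t mov_full_64bit(uint64_t addr)
{
	uint64_t out = 0;

	for (int shift = 0; shift < 64; shift += 16)
		out |= addr & (0xffffULL << shift);   /* one MOVZ/MOVK per chunk */
	return out;
}

static void check(const char *what, uint64_t addr)
{
	printf("%-28s %#018" PRIx64 " -> 48-bit: %s, 64-bit: %s\n", what, addr,
	       mov_48bit_assumed(addr) == addr ? "ok" : "CORRUPTED",
	       mov_full_64bit(addr) == addr ? "ok" : "CORRUPTED");
}

int main(void)
{
	check("vmalloc-style address", 0xffff800010001000ULL);
	check("address with other top bits", 0xffb0123456789000ULL);
	return 0;
}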
From: Puranjay Mohan <puranjay(a)kernel.org>
[ Upstream commit 19d3c179a37730caf600a97fed3794feac2b197b ]
When BPF_TRAMP_F_CALL_ORIG is set, the trampoline calls
__bpf_tramp_enter() and __bpf_tramp_exit() functions, passing them
the struct bpf_tramp_image *im pointer as an argument in R0.
The trampoline generation code uses emit_addr_mov_i64() to emit
instructions for moving the bpf_tramp_image address into R0, but
emit_addr_mov_i64() assumes the address to be in the vmalloc() space
and uses only 48 bits. Because bpf_tramp_image is allocated using
kzalloc(), its address can use more than 48 bits; in this case the
trampoline will pass an invalid address to __bpf_tramp_enter/exit(),
causing a kernel crash.
Fix this by using emit_a64_mov_i64() in place of emit_addr_mov_i64(),
as it can work with addresses that are wider than 48 bits.
Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64")
Signed-off-by: Puranjay Mohan <puranjay(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Closes: https://lore.kernel.org/all/SJ0PR15MB461564D3F7E7A763498CA6A8CBDB2@SJ0PR15M…
Link: https://lore.kernel.org/bpf/20240711151838.43469-1-puranjay@kernel.org
[Minor context change fixed.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
arch/arm64/net/bpf_jit_comp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 5074bd1d37b5..7c5156e7d31e 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1952,7 +1952,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
if (flags & BPF_TRAMP_F_CALL_ORIG) {
- emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
emit_call((const u64)__bpf_tramp_enter, ctx);
}
@@ -1996,7 +1996,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
if (flags & BPF_TRAMP_F_CALL_ORIG) {
im->ip_epilogue = ctx->image + ctx->idx;
- emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ emit_a64_mov_i64(A64_R(0), (const u64)im, ctx);
emit_call((const u64)__bpf_tramp_exit, ctx);
}
--
2.34.1
From: Zi Yan <ziy(a)nvidia.com>
nr_failed was missing the large folio splits from migrate_pages_batch()
and can cause a mismatch between migrate_pages() return value and the
number of not migrated pages, i.e., when the return value of
migrate_pages() is 0, there are still pages left in the from page list.
It will happen when a non-PMD THP large folio fails to migrate due to
-ENOMEM and is split successfully, but not all of the split pages are
migrated; migrate_pages_batch() would return non-zero, but
astats.nr_thp_split = 0. nr_failed would be 0 and returned to the caller
of migrate_pages(), but the not-migrated pages are left in the from page
list without being added back to LRU lists.
Fix it by adding a new nr_split counter for large folio splits and adding
it to nr_failed in migrate_pages_sync() after migrate_pages_batch() is
done.
Link: https://lkml.kernel.org/r/20231017163129.2025214-1-zi.yan@sent.com
Fixes: 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: Huang Ying <ying.huang(a)intel.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
This patch has a Fixes tag and should be backported to 6.6; I don't know
why it hasn't been backported.
mm/migrate.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 1004b1def1c2..4ed470885217 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1504,6 +1504,7 @@ struct migrate_pages_stats {
int nr_thp_succeeded; /* THP migrated successfully */
int nr_thp_failed; /* THP failed to be migrated */
int nr_thp_split; /* THP split before migrating */
+ int nr_split; /* Large folio (include THP) split before migrating */
};
/*
@@ -1623,6 +1624,7 @@ static int migrate_pages_batch(struct list_head *from,
int nr_retry_pages = 0;
int pass = 0;
bool is_thp = false;
+ bool is_large = false;
struct folio *folio, *folio2, *dst = NULL, *dst2;
int rc, rc_saved = 0, nr_pages;
LIST_HEAD(unmap_folios);
@@ -1638,7 +1640,8 @@ static int migrate_pages_batch(struct list_head *from,
nr_retry_pages = 0;
list_for_each_entry_safe(folio, folio2, from, lru) {
- is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
+ is_large = folio_test_large(folio);
+ is_thp = is_large && folio_test_pmd_mappable(folio);
nr_pages = folio_nr_pages(folio);
cond_resched();
@@ -1658,6 +1661,7 @@ static int migrate_pages_batch(struct list_head *from,
stats->nr_thp_failed++;
if (!try_split_folio(folio, split_folios)) {
stats->nr_thp_split++;
+ stats->nr_split++;
continue;
}
stats->nr_failed_pages += nr_pages;
@@ -1686,11 +1690,12 @@ static int migrate_pages_batch(struct list_head *from,
nr_failed++;
stats->nr_thp_failed += is_thp;
/* Large folio NUMA faulting doesn't split to retry. */
- if (folio_test_large(folio) && !nosplit) {
+ if (is_large && !nosplit) {
int ret = try_split_folio(folio, split_folios);
if (!ret) {
stats->nr_thp_split += is_thp;
+ stats->nr_split += is_large;
break;
} else if (reason == MR_LONGTERM_PIN &&
ret == -EAGAIN) {
@@ -1836,6 +1841,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
stats->nr_succeeded += astats.nr_succeeded;
stats->nr_thp_succeeded += astats.nr_thp_succeeded;
stats->nr_thp_split += astats.nr_thp_split;
+ stats->nr_split += astats.nr_split;
if (rc < 0) {
stats->nr_failed_pages += astats.nr_failed_pages;
stats->nr_thp_failed += astats.nr_thp_failed;
@@ -1843,7 +1849,11 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
return rc;
}
stats->nr_thp_failed += astats.nr_thp_split;
- nr_failed += astats.nr_thp_split;
+ /*
+ * Do not count rc, as pages will be retried below.
+ * Count nr_split only, since it includes nr_thp_split.
+ */
+ nr_failed += astats.nr_split;
/*
* Fall back to migrate all failed folios one by one synchronously. All
* failed folios except split THPs will be retried, so their failure
--
2.47.1
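A toy model of the accounting change above (bookkeeping only, not the real migration code): when the async batch pass splits a non-PMD-sized large folio, the old code added only nr_thp_split to nr_failed and so reported zero failures, while the new nr_split counter captures the split:

#include <stdbool.h>
#include <stdio.h>

struct stats {
	int nr_thp_split;   /* splits of PMD-sized THPs */
	int nr_split;       /* splits of any large folio, THPs included */
};

/* Pretend the async batch pass had to split one large folio. */
static void async_batch_pass(struct stats *s, bool folio_is_pmd_sized)
{
	s->nr_thp_split += folio_is_pmd_sized;
	s->nr_split     += 1;
}

int main(void)
{
	struct stats astats = { 0 };
	bool pmd_sized = false;          /* a non-PMD (e.g. 64K) large folio */

	async_batch_pass(&astats, pmd_sized);

	int nr_failed_old = astats.nr_thp_split;   /* old: stays 0, pages "lost" */
	int nr_failed_new = astats.nr_split;       /* fixed: 1, pages get retried */

	printf("old accounting: nr_failed = %d\n", nr_failed_old);
	printf("new accounting: nr_failed = %d\n", nr_failed_new);
	return 0;
}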
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ab00ddd802f80e31fc9639c652d736fe3913feae
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051244-backwater-computer-184f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ab00ddd802f80e31fc9639c652d736fe3913feae Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang(a)linux.alibaba.com>
Date: Wed, 23 Apr 2025 18:36:45 +0800
Subject: [PATCH] selftests/mm: compaction_test: support platforms with a huge
 amount of memory
When running the mm selftests to verify mm patches, the 'compaction_test' case
failed on an x86 server with 1 TB of memory. The root cause is that the server
has more free memory than the test supports.
The test case tries to allocate 100000 huge pages, which is about 200 GB on
that x86 server, and when it succeeds it expects the allocation to be larger
than 1/3 of 80% of the free memory in the system. This logic only works for
platforms with 750 GB (200 / (1/3) / 80%) or less free memory, and may raise
a false alarm on others.
Fix it by changing the fixed page number to a self-adjusting number based on
the actual amount of free memory.
Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Feng Tang <feng.tang(a)linux.alibaba.com>
Acked-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)inux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 2c3a0eb6b22d..9bc4591c7b16 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -90,6 +90,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
int compaction_index = 0;
char nr_hugepages[20] = {0};
char init_nr_hugepages[24] = {0};
+ char target_nr_hugepages[24] = {0};
+ int slen;
snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
"%lu", initial_nr_hugepages);
@@ -106,11 +108,18 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
goto out;
}
- /* Request a large number of huge pages. The Kernel will allocate
- as much as it can */
- if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
- ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
+ /*
+ * Request huge pages for about half of the free memory. The Kernel
+ * will allocate as much as it can, and we expect it will get at least 1/3
+ */
+ nr_hugepages_ul = mem_free / hugepage_size / 2;
+ snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+ "%lu", nr_hugepages_ul);
+
+ slen = strlen(target_nr_hugepages);
+ if (write(fd, target_nr_hugepages, slen) != slen) {
+ ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+ nr_hugepages_ul, strerror(errno));
goto close_fd;
}
commit e67e0eb6a98b261caf45048f9eb95fd7609289c0 upstream.
LoongArch's toolchain may change the default code model from normal to
medium. This is unnecessary for the kernel, and it generates some relocations
which cannot be handled by the module loader. So explicitly set the
code model to normal in the Makefile (for Rust, 'normal' is 'small').
Cc: stable(a)vger.kernel.org
Tested-by: Haiyong Sun <sunhaiyong(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Add upstream commit id.
arch/loongarch/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index a74bbcb05ee1..f2966745b058 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -55,7 +55,7 @@ endif
ifdef CONFIG_64BIT
ld-emul = $(64bit-emul)
-cflags-y += -mabi=lp64s
+cflags-y += -mabi=lp64s -mcmodel=normal
endif
cflags-y += -pipe -msoft-float
--
2.47.1
From: Zi Yan <ziy(a)nvidia.com>
commit a259945efe6ada94087ef666e9b38f8e34ea34ba upstream.
nr_failed was missing the large folio splits from migrate_pages_batch()
and can cause a mismatch between migrate_pages() return value and the
number of not migrated pages, i.e., when the return value of
migrate_pages() is 0, there are still pages left in the from page list.
It will happen when a non-PMD THP large folio fails to migrate due to
-ENOMEM and is split successfully, but not all of the split pages are
migrated; migrate_pages_batch() would return non-zero, but
astats.nr_thp_split = 0. nr_failed would be 0 and returned to the caller
of migrate_pages(), but the not-migrated pages are left in the from page
list without being added back to LRU lists.
Fix it by adding a new nr_split counter for large folio splits and adding
it to nr_failed in migrate_pages_sync() after migrate_pages_batch() is
done.
Link: https://lkml.kernel.org/r/20231017163129.2025214-1-zi.yan@sent.com
Fixes: 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: Huang Ying <ying.huang(a)intel.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Add upstream commit id.
mm/migrate.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 1004b1def1c2..4ed470885217 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1504,6 +1504,7 @@ struct migrate_pages_stats {
int nr_thp_succeeded; /* THP migrated successfully */
int nr_thp_failed; /* THP failed to be migrated */
int nr_thp_split; /* THP split before migrating */
+ int nr_split; /* Large folio (include THP) split before migrating */
};
/*
@@ -1623,6 +1624,7 @@ static int migrate_pages_batch(struct list_head *from,
int nr_retry_pages = 0;
int pass = 0;
bool is_thp = false;
+ bool is_large = false;
struct folio *folio, *folio2, *dst = NULL, *dst2;
int rc, rc_saved = 0, nr_pages;
LIST_HEAD(unmap_folios);
@@ -1638,7 +1640,8 @@ static int migrate_pages_batch(struct list_head *from,
nr_retry_pages = 0;
list_for_each_entry_safe(folio, folio2, from, lru) {
- is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
+ is_large = folio_test_large(folio);
+ is_thp = is_large && folio_test_pmd_mappable(folio);
nr_pages = folio_nr_pages(folio);
cond_resched();
@@ -1658,6 +1661,7 @@ static int migrate_pages_batch(struct list_head *from,
stats->nr_thp_failed++;
if (!try_split_folio(folio, split_folios)) {
stats->nr_thp_split++;
+ stats->nr_split++;
continue;
}
stats->nr_failed_pages += nr_pages;
@@ -1686,11 +1690,12 @@ static int migrate_pages_batch(struct list_head *from,
nr_failed++;
stats->nr_thp_failed += is_thp;
/* Large folio NUMA faulting doesn't split to retry. */
- if (folio_test_large(folio) && !nosplit) {
+ if (is_large && !nosplit) {
int ret = try_split_folio(folio, split_folios);
if (!ret) {
stats->nr_thp_split += is_thp;
+ stats->nr_split += is_large;
break;
} else if (reason == MR_LONGTERM_PIN &&
ret == -EAGAIN) {
@@ -1836,6 +1841,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
stats->nr_succeeded += astats.nr_succeeded;
stats->nr_thp_succeeded += astats.nr_thp_succeeded;
stats->nr_thp_split += astats.nr_thp_split;
+ stats->nr_split += astats.nr_split;
if (rc < 0) {
stats->nr_failed_pages += astats.nr_failed_pages;
stats->nr_thp_failed += astats.nr_thp_failed;
@@ -1843,7 +1849,11 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
return rc;
}
stats->nr_thp_failed += astats.nr_thp_split;
- nr_failed += astats.nr_thp_split;
+ /*
+ * Do not count rc, as pages will be retried below.
+ * Count nr_split only, since it includes nr_thp_split.
+ */
+ nr_failed += astats.nr_split;
/*
* Fall back to migrate all failed folios one by one synchronously. All
* failed folios except split THPs will be retried, so their failure
--
2.47.1
From: Baokun Li <libaokun1(a)huawei.com>
[ Upstream commit b4b4fda34e535756f9e774fb2d09c4537b7dfd1c ]
In the following concurrency we will access the uninitialized rs->lock:
ext4_fill_super
  ext4_register_sysfs
    // sysfs registered msg_ratelimit_interval_ms
                        // Other processes modify rs->interval to
                        // non-zero via msg_ratelimit_interval_ms
  ext4_orphan_cleanup
    ext4_msg(sb, KERN_INFO, "Errors on filesystem, "
      __ext4_msg
        ___ratelimit(&(EXT4_SB(sb)->s_msg_ratelimit_state)
          if (!rs->interval)  // do nothing if interval is 0
            return 1;
          raw_spin_trylock_irqsave(&rs->lock, flags)
            raw_spin_trylock(lock)
              _raw_spin_trylock
                __raw_spin_trylock
                  spin_acquire(&lock->dep_map, 0, 1, _RET_IP_)
                    lock_acquire
                      __lock_acquire
                        register_lock_class
                          assign_lock_key
                            dump_stack();
  ratelimit_state_init(&sbi->s_msg_ratelimit_state, 5 * HZ, 10);
    raw_spin_lock_init(&rs->lock);
    // init rs->lock here
and get the following dump_stack:
=========================================================
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 12 PID: 753 Comm: mount Tainted: G E 6.7.0-rc6-next-20231222 #504
[...]
Call Trace:
dump_stack_lvl+0xc5/0x170
dump_stack+0x18/0x30
register_lock_class+0x740/0x7c0
__lock_acquire+0x69/0x13a0
lock_acquire+0x120/0x450
_raw_spin_trylock+0x98/0xd0
___ratelimit+0xf6/0x220
__ext4_msg+0x7f/0x160 [ext4]
ext4_orphan_cleanup+0x665/0x740 [ext4]
__ext4_fill_super+0x21ea/0x2b10 [ext4]
ext4_fill_super+0x14d/0x360 [ext4]
[...]
=========================================================
Normally interval is 0 until s_msg_ratelimit_state is initialized, so
___ratelimit() does nothing. But registering sysfs precedes initializing
rs->lock, so it is possible to change rs->interval to a non-zero value
via the msg_ratelimit_interval_ms interface of sysfs while rs->lock is
uninitialized, and then a call to ext4_msg triggers the problem by
accessing an uninitialized rs->lock. Therefore register sysfs after all
initializations are complete to avoid such problems.
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20240102133730.1098120-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
[Minor context change fixed.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
fs/ext4/super.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7f0231b34905..8528f61854ab 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5496,19 +5496,15 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
if (err)
goto failed_mount6;
- err = ext4_register_sysfs(sb);
- if (err)
- goto failed_mount7;
-
err = ext4_init_orphan_info(sb);
if (err)
- goto failed_mount8;
+ goto failed_mount7;
#ifdef CONFIG_QUOTA
/* Enable quota usage during mount. */
if (ext4_has_feature_quota(sb) && !sb_rdonly(sb)) {
err = ext4_enable_quotas(sb);
if (err)
- goto failed_mount9;
+ goto failed_mount8;
}
#endif /* CONFIG_QUOTA */
@@ -5534,7 +5530,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
ext4_msg(sb, KERN_INFO, "recovery complete");
err = ext4_mark_recovery_complete(sb, es);
if (err)
- goto failed_mount10;
+ goto failed_mount9;
}
if (test_opt(sb, DISCARD) && !bdev_max_discard_sectors(sb->s_bdev))
@@ -5551,15 +5547,17 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
atomic_set(&sbi->s_warning_count, 0);
atomic_set(&sbi->s_msg_count, 0);
+ /* Register sysfs after all initializations are complete. */
+ err = ext4_register_sysfs(sb);
+ if (err)
+ goto failed_mount9;
+
return 0;
-failed_mount10:
+failed_mount9:
ext4_quota_off_umount(sb);
-failed_mount9: __maybe_unused
+failed_mount8: __maybe_unused
ext4_release_orphan_info(sb);
-failed_mount8:
- ext4_unregister_sysfs(sb);
- kobject_put(&sbi->s_kobj);
failed_mount7:
ext4_unregister_li_request(sb);
failed_mount6:
--
2.34.1
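A user-space model of the race window described in the commit message above (not ext4 code; the names are chosen to mirror the message): ___ratelimit()-style code only touches rs->lock once rs->interval is non-zero, so exposing the interval knob before the lock is initialized is what makes the uninitialized lock reachable. Initializing everything before publishing the knob closes the window:

#include <pthread.h>
#include <stdio.h>

struct ratelimit_state {
	pthread_spinlock_t lock;   /* stands in for rs->lock */
	int interval;              /* 0 until explicitly configured */
	int initialized;           /* only for this sketch's safety check */
};

static int ratelimit(struct ratelimit_state *rs)
{
	if (!rs->interval)                 /* do nothing if interval is 0 */
		return 1;
	if (!rs->initialized) {            /* the bug: lock used before init */
		puts("BUG: taking an uninitialized lock");
		return 1;
	}
	pthread_spin_lock(&rs->lock);
	/* ... the rate limiting decision would go here ... */
	pthread_spin_unlock(&rs->lock);
	return 1;
}

int main(void)
{
	struct ratelimit_state rs = { .interval = 0, .initialized = 0 };

	/* Broken ordering: the sysfs knob is "published" first, so another
	 * process may set a non-zero interval before the state is set up. */
	rs.interval = 5;
	ratelimit(&rs);                    /* hits the uninitialized lock */

	/* Fixed ordering: initialize everything, then expose the knob. */
	pthread_spin_init(&rs.lock, PTHREAD_PROCESS_PRIVATE);
	rs.initialized = 1;
	ratelimit(&rs);                    /* safe now */
	return 0;
}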
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 94cff94634e506a4a44684bee1875d2dbf782722
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051258-washbowl-alongside-de3d@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 94cff94634e506a4a44684bee1875d2dbf782722 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Date: Fri, 4 Apr 2025 15:31:16 +0200
Subject: [PATCH] clocksource/i8253: Use raw_spinlock_irqsave() in
clockevent_i8253_disable()
On x86 during boot, clockevent_i8253_disable() can be invoked via
x86_late_time_init -> hpet_time_init() -> pit_timer_init() which happens
with enabled interrupts.
If some of the old i8253 hardware is actually used then lockdep will notice
that i8253_lock is used in hard interrupt context. This causes lockdep to
complain because it observed the lock being acquired with interrupts
enabled and in hard interrupt context.
Make clockevent_i8253_disable() acquire the lock with
raw_spinlock_irqsave() to cure this.
[ tglx: Massage change log and use guard() ]
Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index 39f7c2d736d1..b603c25f3dfa 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -103,7 +103,7 @@ int __init clocksource_i8253_init(void)
#ifdef CONFIG_CLKEVT_I8253
void clockevent_i8253_disable(void)
{
- raw_spin_lock(&i8253_lock);
+ guard(raw_spinlock_irqsave)(&i8253_lock);
/*
* Writing the MODE register should stop the counter, according to
@@ -132,8 +132,6 @@ void clockevent_i8253_disable(void)
outb_p(0, PIT_CH0);
outb_p(0x30, PIT_MODE);
-
- raw_spin_unlock(&i8253_lock);
}
static int pit_shutdown(struct clock_event_device *evt)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ab00ddd802f80e31fc9639c652d736fe3913feae
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051245-jailbreak-unlinked-27ec@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ab00ddd802f80e31fc9639c652d736fe3913feae Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang(a)linux.alibaba.com>
Date: Wed, 23 Apr 2025 18:36:45 +0800
Subject: [PATCH] selftests/mm: compaction_test: support platform with huge
mount of memory
When running mm selftest to verify mm patches, 'compaction_test' case
failed on an x86 server with 1TB memory. And the root cause is that it
has too much free memory than what the test supports.
The test case tries to allocate 100000 huge pages, which is about 200 GB
for that x86 server, and when it succeeds, it expects it's large than 1/3
of 80% of the free memory in system. This logic only works for platform
with 750 GB ( 200 / (1/3) / 80% ) or less free memory, and may raise false
alarm for others.
Fix it by changing the fixed page number to self-adjustable number
according to the real number of free memory.
Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Feng Tang <feng.tang(a)linux.alibaba.com>
Acked-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)inux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 2c3a0eb6b22d..9bc4591c7b16 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -90,6 +90,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
int compaction_index = 0;
char nr_hugepages[20] = {0};
char init_nr_hugepages[24] = {0};
+ char target_nr_hugepages[24] = {0};
+ int slen;
snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
"%lu", initial_nr_hugepages);
@@ -106,11 +108,18 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
goto out;
}
- /* Request a large number of huge pages. The Kernel will allocate
- as much as it can */
- if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
- ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
+ /*
+ * Request huge pages for about half of the free memory. The Kernel
+ * will allocate as much as it can, and we expect it will get at least 1/3
+ */
+ nr_hugepages_ul = mem_free / hugepage_size / 2;
+ snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+ "%lu", nr_hugepages_ul);
+
+ slen = strlen(target_nr_hugepages);
+ if (write(fd, target_nr_hugepages, slen) != slen) {
+ ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+ nr_hugepages_ul, strerror(errno));
goto close_fd;
}
V2: do not add an extra read-back in vcn_v4_0_5_start() as there is
already a read-back there. Reworded the comment for better understanding.
On VCN v4.0.5 there is a race condition where the WPTR is not
updated after starting from idle when the doorbell is used. The read-back
of the regVCN_RB1_DB_CTRL register after it is written ensures that the
doorbell_index is updated before the doorbell can work properly.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Cc: stable(a)vger.kernel.org
Signed-off-by: David (Ming Qiang) Wu <David.Wu3(a)amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Tested-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index ed00d35039c13..e55b76d71367d 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -1034,6 +1034,10 @@ static int vcn_v4_0_5_start_dpg_mode(struct amdgpu_vcn_inst *vinst,
ring->doorbell_index << VCN_RB1_DB_CTRL__OFFSET__SHIFT |
VCN_RB1_DB_CTRL__EN_MASK);
+ /* Keeping one read-back to ensure all register writes are done, otherwise
+ * it may introduce race conditions */
+ RREG32_SOC15(VCN, inst_idx, regVCN_RB1_DB_CTRL);
+
return 0;
}
--
2.34.1
From: Ashish Kalra <ashish.kalra(a)amd.com>
When the shared pages are being made private during kdump preparation
there are additional checks to handle shared GHCB pages.
These additional checks include handling the case of the GHCB page being
contained within a huge page.
The check for handling the case of the GHCB contained within a huge
page incorrectly skips a page just below the GHCB page from being
transitioned back to private during kdump preparation.
This skipped page causes a 0x404 #VC exception when it is accessed
later while dumping guest memory during vmcore generation via kdump.
Correct the range to be checked for GHCB contained in a huge page.
Also ensure that the skipped huge page containing the GHCB page is
transitioned back to private by applying the correct address mask
later when changing GHCBs to private at end of kdump preparation.
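For illustration only, a small user-space sketch (made-up addresses, not
kernel code) of the off-by-one in the containment check:
#include <stdbool.h>
#include <stdio.h>
int main(void)
{
	/* A 4 KB range starting at 0x1000 covers 0x1000..0x1fff; a GHCB at
	 * 0x2000 lies in the next page, so this range must not be skipped. */
	unsigned long addr = 0x1000, size = 0x1000, ghcb = 0x2000;
	bool old_check = addr <= ghcb && ghcb <= addr + size;	/* wrongly true */
	bool new_check = addr <= ghcb && ghcb < addr + size;	/* correctly false */
	printf("old: %d, fixed: %d\n", old_check, new_check);
	return 0;
}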
Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
---
arch/x86/coco/sev/core.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index d35fec7b164a..30b74e4e4e88 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -1019,7 +1019,8 @@ static void unshare_all_memory(void)
data = per_cpu(runtime_data, cpu);
ghcb = (unsigned long)&data->ghcb_page;
- if (addr <= ghcb && ghcb <= addr + size) {
+ /* Handle the case of a huge page containing the GHCB page */
+ if (addr <= ghcb && ghcb < addr + size) {
skipped_addr = true;
break;
}
@@ -1131,8 +1132,8 @@ static void shutdown_all_aps(void)
void snp_kexec_finish(void)
{
struct sev_es_runtime_data *data;
+ unsigned long size, addr;
unsigned int level, cpu;
- unsigned long size;
struct ghcb *ghcb;
pte_t *pte;
@@ -1160,8 +1161,10 @@ void snp_kexec_finish(void)
ghcb = &data->ghcb_page;
pte = lookup_address((unsigned long)ghcb, &level);
size = page_level_size(level);
- set_pte_enc(pte, level, (void *)ghcb);
- snp_set_memory_private((unsigned long)ghcb, (size / PAGE_SIZE));
+ /* Handle the case of a huge page containing the GHCB page */
+ addr = (unsigned long)ghcb & page_level_mask(level);
+ set_pte_enc(pte, level, (void *)addr);
+ snp_set_memory_private(addr, (size / PAGE_SIZE));
}
}
--
2.34.1
On VCN v4.0.5 there is a race condition where the WPTR is not
updated after starting from idle when the doorbell is used. Reading back
the regVCN_RB1_DB_CTRL register after it is written ensures that the
doorbell_index is updated before the doorbell can work properly.
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Signed-off-by: David (Ming Qiang) Wu <David.Wu3(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index ed00d35039c1..d6be8b05d7a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -1033,6 +1033,8 @@ static int vcn_v4_0_5_start_dpg_mode(struct amdgpu_vcn_inst *vinst,
WREG32_SOC15(VCN, inst_idx, regVCN_RB1_DB_CTRL,
ring->doorbell_index << VCN_RB1_DB_CTRL__OFFSET__SHIFT |
VCN_RB1_DB_CTRL__EN_MASK);
+ /* Read DB_CTRL to flush the write DB_CTRL command. */
+ RREG32_SOC15(VCN, inst_idx, regVCN_RB1_DB_CTRL);
return 0;
}
@@ -1195,6 +1197,8 @@ static int vcn_v4_0_5_start(struct amdgpu_vcn_inst *vinst)
WREG32_SOC15(VCN, i, regVCN_RB1_DB_CTRL,
ring->doorbell_index << VCN_RB1_DB_CTRL__OFFSET__SHIFT |
VCN_RB1_DB_CTRL__EN_MASK);
+ /* Read DB_CTRL to flush the write DB_CTRL command. */
+ RREG32_SOC15(VCN, i, regVCN_RB1_DB_CTRL);
WREG32_SOC15(VCN, i, regUVD_RB_BASE_LO, ring->gpu_addr);
WREG32_SOC15(VCN, i, regUVD_RB_BASE_HI, upper_32_bits(ring->gpu_addr));
--
2.49.0
The current HID bpf implementation assumes no output report/request will
go through it after hid_bpf_destroy_device() has been called. This leads
to a bug where unplugging certain types of HID devices causes a cleaned-
up SRCU to be accessed. The bug was previously a hidden failure until a
recent x86 percpu change [1] made it access not-present pages.
The bug will be triggered if the conditions below are met:
A) a device under the driver has some LEDs on
B) hid_ll_driver->request() is unimplemented (e.g., logitech-djreceiver)
If condition A is met, hidinput_led_worker() is always scheduled *after*
hid_bpf_destroy_device().
hid_destroy_device
` hid_bpf_destroy_device
` cleanup_srcu_struct(&hdev->bpf.srcu)
` hid_remove_device
` ...
` led_classdev_unregister
` led_trigger_set(led_cdev, NULL)
` led_set_brightness(led_cdev, LED_OFF)
` ...
` input_inject_event
` input_event_dispose
` hidinput_input_event
` schedule_work(&hid->led_work) [hidinput_led_worker]
This is fine when condition B is not met, where hidinput_led_worker()
calls hid_ll_driver->request(). This is the case for most HID drivers,
which implement it or use the generic one from usbhid. The driver itself
or an underlying driver will then abort processing the request.
Otherwise, hidinput_led_worker() tries hid_hw_output_report() and leads
to the bug.
hidinput_led_worker
` hid_hw_output_report
` dispatch_hid_bpf_output_report
` srcu_read_lock(&hdev->bpf.srcu)
` srcu_read_unlock(&hdev->bpf.srcu, idx)
The bug has existed since the introduction [2] of
dispatch_hid_bpf_output_report(). However, the same bug also exists in
dispatch_hid_bpf_raw_requests(), and I've reproduced (no visible effect
because of the lack of [1], but confirmed bpf.destroyed == 1) the bug
against the commit (i.e., the Fixes:) introducing the function. This is
because hidinput_led_worker() falls back to hid_hw_raw_request() when
hid_ll_driver->output_report() is unimplemented (e.g., logitech-
djreceiver).
hidinput_led_worker
` hid_hw_output_report: -ENOSYS
` hid_hw_raw_request
` dispatch_hid_bpf_raw_requests
` srcu_read_lock(&hdev->bpf.srcu)
` srcu_read_unlock(&hdev->bpf.srcu, idx)
Fix the issue by returning early in the two mentioned functions if
hid_bpf has been marked as destroyed. Though
dispatch_hid_bpf_device_event() handles input events, and there is no
evidence that it may be called after the destruction, the same check, as
a safety net, is also added to it to maintain the consistency among all
dispatch functions.
The impact of the bug on other architectures is unclear. Even if it acts
as a hidden failure, this is still dangerous because it corrupts
whatever is at the address calculated by SRCU. Thus, CC'ing the stable
list.
[1]: commit 9d7de2aa8b41 ("x86/percpu/64: Use relative percpu offsets")
[2]: commit 9286675a2aed ("HID: bpf: add HID-BPF hooks for
hid_hw_output_report")
Closes: https://lore.kernel.org/all/20250506145548.GGaBoi9Jzp3aeJizTR@fat_crate.loc…
Fixes: 8bd0488b5ea5 ("HID: bpf: add HID-BPF hooks for hid_hw_raw_requests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rong Zhang <i(a)rong.moe>
---
drivers/hid/bpf/hid_bpf_dispatch.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
index 2e96ec6a3073..9a06f9b0e4ef 100644
--- a/drivers/hid/bpf/hid_bpf_dispatch.c
+++ b/drivers/hid/bpf/hid_bpf_dispatch.c
@@ -38,6 +38,9 @@ dispatch_hid_bpf_device_event(struct hid_device *hdev, enum hid_report_type type
struct hid_bpf_ops *e;
int ret;
+ if (unlikely(hdev->bpf.destroyed))
+ return ERR_PTR(-ENODEV);
+
if (type >= HID_REPORT_TYPES)
return ERR_PTR(-EINVAL);
@@ -93,6 +96,9 @@ int dispatch_hid_bpf_raw_requests(struct hid_device *hdev,
struct hid_bpf_ops *e;
int ret, idx;
+ if (unlikely(hdev->bpf.destroyed))
+ return -ENODEV;
+
if (rtype >= HID_REPORT_TYPES)
return -EINVAL;
@@ -130,6 +136,9 @@ int dispatch_hid_bpf_output_report(struct hid_device *hdev,
struct hid_bpf_ops *e;
int ret, idx;
+ if (unlikely(hdev->bpf.destroyed))
+ return -ENODEV;
+
idx = srcu_read_lock(&hdev->bpf.srcu);
list_for_each_entry_srcu(e, &hdev->bpf.prog_list, list,
srcu_read_lock_held(&hdev->bpf.srcu)) {
base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
--
2.49.0
There is a long standing bug which causes I2C communication not to
work on the Armada 3700 based boards. This small series restores
that functionality.
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
Signed-off-by: Imre Kaloz <kaloz(a)openwrt.org>
---
Gabor Juhos (3):
i2c: add init_recovery() callback
i2c: pxa: prevent calling of the generic recovery init code
i2c: pxa: handle 'Early Bus Busy' condition on Armada 3700
drivers/i2c/busses/i2c-pxa.c | 25 +++++++++++++++++++------
drivers/i2c/i2c-core-base.c | 8 +++++++-
include/linux/i2c.h | 4 ++++
3 files changed, 30 insertions(+), 7 deletions(-)
---
base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
change-id: 20250510-i2c-pxa-fix-i2c-communication-3e6de1e3d0c6
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
With CONFIG_CPU_MITIGATIONS unset, or more precisely with
CONFIG_MITIGATION_ITS unset, I get the following link error:
ld: arch/x86/net/bpf_jit_comp.o: in function `emit_indirect_jump':
bpf_jit_comp.c:(.text+0x8cb): undefined reference to
`__x86_indirect_its_thunk_array'
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [/usr/src/linux/Makefile:1182: vmlinux] Error 2
make: *** [Makefile:224: __sub-make] Error 2
From: Emanuele Ghidoli <emanuele.ghidoli(a)toradex.com>
If an input changes state during wake-up and is used as an interrupt
source, the IRQ handler reads the volatile input register to clear the
interrupt mask and deassert the IRQ line. However, the IRQ handler is
triggered before access to the register is granted, causing the read
operation to fail.
As a result, the IRQ handler enters a loop, repeatedly printing the
"failed reading register" message, until `pca953x_resume()` is eventually
called, which restores the driver context and enables access to
registers.
Fix by disabling the IRQ line before entering suspend mode, and
re-enabling it after the driver context is restored in `pca953x_resume()`.
An IRQ can be disabled with disable_irq() and still wake the system as
long as the IRQ has wake enabled, so the wake-up functionality is
preserved.
Fixes: b76574300504 ("gpio: pca953x: Restore registers after suspend/resume cycle")
Cc: stable(a)vger.kernel.org
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli(a)toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini(a)toradex.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Tested-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
---
v2 -> v3
- add r-b Andy, t-b Geert
- fixed commit message
v1 -> v2
- Instead of calling PM ops with disabled interrupts, just disable the
irq while going in suspend and re-enable it after restoring the
context in resume function.
---
drivers/gpio/gpio-pca953x.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index ab2c0fd428fb..b852e4997629 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -1226,6 +1226,8 @@ static int pca953x_restore_context(struct pca953x_chip *chip)
guard(mutex)(&chip->i2c_lock);
+ if (chip->client->irq > 0)
+ enable_irq(chip->client->irq);
regcache_cache_only(chip->regmap, false);
regcache_mark_dirty(chip->regmap);
ret = pca953x_regcache_sync(chip);
@@ -1238,6 +1240,10 @@ static int pca953x_restore_context(struct pca953x_chip *chip)
static void pca953x_save_context(struct pca953x_chip *chip)
{
guard(mutex)(&chip->i2c_lock);
+
+ /* Disable IRQ to prevent early triggering while regmap "cache only" is on */
+ if (chip->client->irq > 0)
+ disable_irq(chip->client->irq);
regcache_cache_only(chip->regmap, true);
}
--
2.39.5
A user reported on the Arch Linux Forums that their device is emitting
the following message in the kernel journal, which is fixed by adding
the quirk as submitted in this patch:
> kernel: usb 1-2: current rate 8436480 is different from the runtime rate 48000
There is also an entry for this product line that was added a long time ago.
Their specific device has the following ID:
$ lsusb | grep Audio
Bus 001 Device 002: ID 1101:0003 EasyPass Industrial Co., Ltd Audioengine D1
Link: https://bbs.archlinux.org/viewtopic.php?id=305494
Fixes: 93f9d1a4ac593 ("ALSA: usb-audio: Apply sample rate quirk for Audioengine D1")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Heusel <christian(a)heusel.eu>
---
sound/usb/quirks.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c
index 9112313a9dbc005d8ab6076a6f2f4b0c0cecc64f..eb192834db68ca599770055cb563201e648f20ba 100644
--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -2250,6 +2250,8 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
QUIRK_FLAG_FIXED_RATE),
DEVICE_FLG(0x0fd9, 0x0008, /* Hauppauge HVR-950Q */
QUIRK_FLAG_SHARE_MEDIA_DEVICE | QUIRK_FLAG_ALIGN_TRANSFER),
+ DEVICE_FLG(0x1101, 0x0003, /* Audioengine D1 */
+ QUIRK_FLAG_GET_SAMPLE_RATE),
DEVICE_FLG(0x1224, 0x2a25, /* Jieli Technology USB PHY 2.0 */
QUIRK_FLAG_GET_SAMPLE_RATE | QUIRK_FLAG_MIC_RES_16),
DEVICE_FLG(0x1395, 0x740a, /* Sennheiser DECT */
---
base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
change-id: 20250512-audioengine-quirk-addition-718e6c86a2cc
Best regards,
--
Christian Heusel <christian(a)heusel.eu>
This patch series addresses a regression in Energy Efficient Ethernet
(EEE) handling for KSZ switches with integrated PHYs, introduced in
kernel v6.9 by commit fe0d4fd9285e ("net: phy: Keep track of EEE
configuration").
The first patch updates the DSA driver to allow phylink to properly
manage PHY EEE configuration. Since integrated PHYs handle LPI
internally and ports without integrated PHYs do not document MAC-level
LPI support, dummy MAC LPI callbacks are provided.
The second patch removes outdated EEE workarounds from the micrel PHY
driver, as they are no longer needed with correct phylink handling.
This series addresses the regression for mainline and kernels starting
from v6.14. It is not easily possible to fully fix older kernels due
to missing infrastructure changes.
Tested on KSZ9893 hardware.
Oleksij Rempel (2):
net: dsa: microchip: let phylink manage PHY EEE configuration on KSZ
switches
net: phy: micrel: remove KSZ9477 EEE quirks now handled by phylink
drivers/net/dsa/microchip/ksz_common.c | 135 ++++++++++++++++++++-----
drivers/net/phy/micrel.c | 7 --
include/linux/micrel_phy.h | 1 -
3 files changed, 107 insertions(+), 36 deletions(-)
--
2.39.5
While the set_msix() callback function in pcie-cadence-ep writes the
Table Size field correctly (N-1), the calculation of the PBA offset
is wrong because it calculates space for (N-1) entries instead of N.
This results in e.g. the following error when using QEMU with PCI
passthrough on a device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
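To make the arithmetic concrete, here is a stand-alone sketch (user-space
only, not driver code; it assumes a function configured with 8 vectors and
uses the 16-byte PCI_MSIX_ENTRY_SIZE). The same reasoning applies to the
pcie-designware-ep patch further below.
#include <stdio.h>
int main(void)
{
	unsigned int interrupts = 7;	/* Table Size field, 0's based: 8 vectors */
	unsigned int entry_size = 16;	/* PCI_MSIX_ENTRY_SIZE */
	unsigned int table_offset = 0;	/* MSI-X table offset within the BAR */
	unsigned int table_end = table_offset + (interrupts + 1) * entry_size;
	unsigned int old_pba = table_offset + interrupts * entry_size;
	/* table_end = 128, old_pba = 112: the old PBA offset lands on the
	 * last table entry, which is exactly the overlap QEMU complains about. */
	printf("table ends at %u, old PBA at %u, fixed PBA at %u\n",
	       table_end, old_pba, table_end);
	return 0;
}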
Cc: stable(a)vger.kernel.org
Fixes: 3ef5d16f50f8 ("PCI: cadence: Add MSI-X support to Endpoint driver")
Reviewed-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/pci/controller/cadence/pcie-cadence-ep.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 599ec4b1223e..112ae200b393 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -292,13 +292,14 @@ static int cdns_pcie_ep_set_msix(struct pci_epc *epc, u8 fn, u8 vfn,
struct cdns_pcie *pcie = &ep->pcie;
u32 cap = CDNS_PCIE_EP_FUNC_MSIX_CAP_OFFSET;
u32 val, reg;
+ u16 actual_interrupts = interrupts + 1;
fn = cdns_pcie_get_fn_from_vfn(pcie, fn, vfn);
reg = cap + PCI_MSIX_FLAGS;
val = cdns_pcie_ep_fn_readw(pcie, fn, reg);
val &= ~PCI_MSIX_FLAGS_QSIZE;
- val |= interrupts;
+ val |= interrupts; /* 0's based value */
cdns_pcie_ep_fn_writew(pcie, fn, reg, val);
/* Set MSI-X BAR and offset */
@@ -308,7 +309,7 @@ static int cdns_pcie_ep_set_msix(struct pci_epc *epc, u8 fn, u8 vfn,
/* Set PBA BAR and offset. BAR must match MSI-X BAR */
reg = cap + PCI_MSIX_PBA;
- val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
+ val = (offset + (actual_interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
cdns_pcie_ep_fn_writel(pcie, fn, reg, val);
return 0;
--
2.49.0
While the set_msix() callback function in pcie-designware-ep writes the
Table Size field correctly (N-1), the calculation of the PBA offset
is wrong because it calculates space for (N-1) entries instead of N.
This results in e.g. the following error when using QEMU with PCI
passthrough on a device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
Cc: stable(a)vger.kernel.org
Fixes: 83153d9f36e2 ("PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments")
Reviewed-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/pci/controller/dwc/pcie-designware-ep.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 1a0bf9341542..24026f3f3413 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -585,6 +585,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct dw_pcie_ep_func *ep_func;
u32 val, reg;
+ u16 actual_interrupts = interrupts + 1;
ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
if (!ep_func || !ep_func->msix_cap)
@@ -595,7 +596,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
reg = ep_func->msix_cap + PCI_MSIX_FLAGS;
val = dw_pcie_ep_readw_dbi(ep, func_no, reg);
val &= ~PCI_MSIX_FLAGS_QSIZE;
- val |= interrupts;
+ val |= interrupts; /* 0's based value */
dw_pcie_writew_dbi(pci, reg, val);
reg = ep_func->msix_cap + PCI_MSIX_TABLE;
@@ -603,7 +604,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
dw_pcie_ep_writel_dbi(ep, func_no, reg, val);
reg = ep_func->msix_cap + PCI_MSIX_PBA;
- val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
+ val = (offset + (actual_interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
dw_pcie_ep_writel_dbi(ep, func_no, reg, val);
dw_pcie_dbi_ro_wr_dis(pci);
--
2.49.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 94cff94634e506a4a44684bee1875d2dbf782722
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051256-encrust-scribe-9996@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 94cff94634e506a4a44684bee1875d2dbf782722 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Date: Fri, 4 Apr 2025 15:31:16 +0200
Subject: [PATCH] clocksource/i8253: Use raw_spinlock_irqsave() in
clockevent_i8253_disable()
On x86 during boot, clockevent_i8253_disable() can be invoked via
x86_late_time_init -> hpet_time_init() -> pit_timer_init() which happens
with enabled interrupts.
If some of the old i8253 hardware is actually used then lockdep will notice
that i8253_lock is used in hard interrupt context. This causes lockdep to
complain because it observed the lock being acquired with interrupts
enabled and in hard interrupt context.
Make clockevent_i8253_disable() acquire the lock with
raw_spinlock_irqsave() to cure this.
[ tglx: Massage change log and use guard() ]
Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index 39f7c2d736d1..b603c25f3dfa 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -103,7 +103,7 @@ int __init clocksource_i8253_init(void)
#ifdef CONFIG_CLKEVT_I8253
void clockevent_i8253_disable(void)
{
- raw_spin_lock(&i8253_lock);
+ guard(raw_spinlock_irqsave)(&i8253_lock);
/*
* Writing the MODE register should stop the counter, according to
@@ -132,8 +132,6 @@ void clockevent_i8253_disable(void)
outb_p(0, PIT_CH0);
outb_p(0x30, PIT_MODE);
-
- raw_spin_unlock(&i8253_lock);
}
static int pit_shutdown(struct clock_event_device *evt)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6beb6835c1fbb3f676aebb51a5fee6b77fed9308
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050913-rubble-confirm-99ee@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6beb6835c1fbb3f676aebb51a5fee6b77fed9308 Mon Sep 17 00:00:00 2001
From: Eelco Chaudron <echaudro(a)redhat.com>
Date: Tue, 6 May 2025 16:28:54 +0200
Subject: [PATCH] openvswitch: Fix unsafe attribute parsing in
output_userspace()
This patch replaces the manual Netlink attribute iteration in
output_userspace() with nla_for_each_nested(), which ensures that only
well-formed attributes are processed.
Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Signed-off-by: Eelco Chaudron <echaudro(a)redhat.com>
Acked-by: Ilya Maximets <i.maximets(a)ovn.org>
Acked-by: Aaron Conole <aconole(a)redhat.com>
Link: https://patch.msgid.link/0bd65949df61591d9171c0dc13e42cea8941da10.174654173…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 61fea7baae5d..2f22ca59586f 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -975,8 +975,7 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
upcall.cmd = OVS_PACKET_CMD_ACTION;
upcall.mru = OVS_CB(skb)->mru;
- for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
- a = nla_next(a, &rem)) {
+ nla_for_each_nested(a, attr, rem) {
switch (nla_type(a)) {
case OVS_USERSPACE_ATTR_USERDATA:
upcall.userdata = a;
From: Mario Limonciello <mario.limonciello(a)amd.com>
On VCN 4.0.5 using a doorbell to notify VCN hardware for WPTR changes
while dynamic power gating is enabled introduces a timing dependency
that can sometimes cause WPTR to not be properly updated. This manifests
as a job timeout which will trigger a VCN reset and cause the application
that submitted the job to crash.
Writing directly to the WPTR register instead of using the doorbell changes
the timing enough that the issue doesn't happen. Turn off doorbell use for
now while the issue continues to be debugged.
Cc: stable(a)vger.kernel.org
Cc: David.Wu3(a)amd.com
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3909
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index ba603b2246e2e..ea9513f65d7e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -181,7 +181,7 @@ static int vcn_v4_0_5_sw_init(struct amdgpu_ip_block *ip_block)
return r;
ring = &adev->vcn.inst[i].ring_enc[0];
- ring->use_doorbell = true;
+ ring->use_doorbell = false;
if (amdgpu_sriov_vf(adev))
ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) +
i * (adev->vcn.inst[i].num_enc_rings + 1) + 1;
--
2.43.0
The patch titled
Subject: taskstats: fix struct taskstats breaks backward compatibility since version 15
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
taskstats-fix-struct-taskstats-breaks-backward-compatibility-since-version-15.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Wang Yaxin <wang.yaxin(a)zte.com.cn>
Subject: taskstats: fix struct taskstats breaks backward compatibility since version 15
Date: Sat, 10 May 2025 15:54:13 +0800 (CST)
Problem
========
commit 658eb5ab916d ("delayacct: add delay max to record delay peak")
- adding more fields
commit f65c64f311ee ("delayacct: add delay min to record delay peak")
- adding more fields
commit b016d0873777 ("taskstats: modify taskstats version")
- version bump to 15
Since version 15 (TASKSTATS_VERSION=15) the new layout of the structure
adds fields in the middle of the structure, rendering all old software
incompatible with newer kernels and software compiled against the new
kernel headers incompatible with older kernels.
Solution
=========
Move delay max and delay min to the end of struct taskstats, and bump
the version to 16 after the change.
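A minimal user-space sketch (field names trimmed for brevity; the layouts
only approximate the real struct) of why inserting fields in the middle
breaks old consumers while appending at the end does not:
#include <stddef.h>
#include <stdio.h>
/* Roughly the layout an old (pre-v15) binary was built against ... */
struct taskstats_v14 {
	unsigned long long cpu_count, cpu_delay_total;
	unsigned long long blkio_count, blkio_delay_total;
};
/* ... and the v15 layout with max/min fields inserted in the middle. */
struct taskstats_v15 {
	unsigned long long cpu_count, cpu_delay_total;
	unsigned long long cpu_delay_max, cpu_delay_min;
	unsigned long long blkio_count, blkio_delay_total;
};
int main(void)
{
	/* The old binary reads blkio_count at offset 16, where the v15 layout
	 * now holds cpu_delay_max. Moving the new fields to the tail (v16)
	 * keeps every pre-existing offset stable. */
	printf("v14 blkio_count offset: %zu\n", offsetof(struct taskstats_v14, blkio_count));
	printf("v15 blkio_count offset: %zu\n", offsetof(struct taskstats_v15, blkio_count));
	return 0;
}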
Link: https://lkml.kernel.org/r/20250510155413259V4JNRXxukdDgzsaL0Fo6a@zte.com.cn
Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak")
Signed-off-by: Wang Yaxin <wang.yaxin(a)zte.com.cn>
Signed-off-by: xu xin <xu.xin16(a)zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2(a)zte.com.cn>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/uapi/linux/taskstats.h | 43 +++++++++++++++++++------------
1 file changed, 27 insertions(+), 16 deletions(-)
--- a/include/uapi/linux/taskstats.h~taskstats-fix-struct-taskstats-breaks-backward-compatibility-since-version-15
+++ a/include/uapi/linux/taskstats.h
@@ -34,7 +34,7 @@
*/
-#define TASKSTATS_VERSION 15
+#define TASKSTATS_VERSION 16
#define TS_COMM_LEN 32 /* should be >= TASK_COMM_LEN
* in linux/sched.h */
@@ -72,8 +72,6 @@ struct taskstats {
*/
__u64 cpu_count __attribute__((aligned(8)));
__u64 cpu_delay_total;
- __u64 cpu_delay_max;
- __u64 cpu_delay_min;
/* Following four fields atomically updated using task->delays->lock */
@@ -82,14 +80,10 @@ struct taskstats {
*/
__u64 blkio_count;
__u64 blkio_delay_total;
- __u64 blkio_delay_max;
- __u64 blkio_delay_min;
/* Delay waiting for page fault I/O (swap in only) */
__u64 swapin_count;
__u64 swapin_delay_total;
- __u64 swapin_delay_max;
- __u64 swapin_delay_min;
/* cpu "wall-clock" running time
* On some architectures, value will adjust for cpu time stolen
@@ -172,14 +166,11 @@ struct taskstats {
/* Delay waiting for memory reclaim */
__u64 freepages_count;
__u64 freepages_delay_total;
- __u64 freepages_delay_max;
- __u64 freepages_delay_min;
+
/* Delay waiting for thrashing page */
__u64 thrashing_count;
__u64 thrashing_delay_total;
- __u64 thrashing_delay_max;
- __u64 thrashing_delay_min;
/* v10: 64-bit btime to avoid overflow */
__u64 ac_btime64; /* 64-bit begin time */
@@ -187,8 +178,6 @@ struct taskstats {
/* v11: Delay waiting for memory compact */
__u64 compact_count;
__u64 compact_delay_total;
- __u64 compact_delay_max;
- __u64 compact_delay_min;
/* v12 begin */
__u32 ac_tgid; /* thread group ID */
@@ -210,15 +199,37 @@ struct taskstats {
/* v13: Delay waiting for write-protect copy */
__u64 wpcopy_count;
__u64 wpcopy_delay_total;
- __u64 wpcopy_delay_max;
- __u64 wpcopy_delay_min;
/* v14: Delay waiting for IRQ/SOFTIRQ */
__u64 irq_count;
__u64 irq_delay_total;
+
+ /* v15: add Delay max and Delay min */
+
+ /* v16: move Delay max and Delay min to the end of taskstat */
+ __u64 cpu_delay_max;
+ __u64 cpu_delay_min;
+
+ __u64 blkio_delay_max;
+ __u64 blkio_delay_min;
+
+ __u64 swapin_delay_max;
+ __u64 swapin_delay_min;
+
+ __u64 freepages_delay_max;
+ __u64 freepages_delay_min;
+
+ __u64 thrashing_delay_max;
+ __u64 thrashing_delay_min;
+
+ __u64 compact_delay_max;
+ __u64 compact_delay_min;
+
+ __u64 wpcopy_delay_max;
+ __u64 wpcopy_delay_min;
+
__u64 irq_delay_max;
__u64 irq_delay_min;
- /* v15: add Delay max */
};
_
Patches currently in -mm which might be from wang.yaxin(a)zte.com.cn are
taskstats-fix-struct-taskstats-breaks-backward-compatibility-since-version-15.patch
When defining VEND1_GLOBAL_LED_PROV_ACT_STRETCH there was a typo where
the GENMASK arguments were swapped.
Fix it to prevent any misconfiguration if this define is ever used in
the future.
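For reference, a stand-alone sketch that mirrors the kernel's GENMASK(h, l)
semantics for a 64-bit mask (DEMO_GENMASK is a local stand-in, not the kernel
macro) and shows why the argument order matters. Since the define is currently
unused, the swapped mask was never expanded, which is presumably why it went
unnoticed.
#include <stdio.h>
/* GENMASK(h, l) sets bits h down to l, so h must be >= l. */
#define DEMO_GENMASK(h, l)	(((~0UL) >> (63 - (h))) & ((~0UL) << (l)))
int main(void)
{
	printf("GENMASK(1, 0) = 0x%lx\n", DEMO_GENMASK(1, 0));	/* 0x3: intended 2-bit field */
	printf("GENMASK(0, 1) = 0x%lx\n", DEMO_GENMASK(0, 1));	/* 0x0: swapped args, empty mask */
	return 0;
}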
Cc: <stable(a)vger.kernel.org>
Fixes: 61578f679378 ("net: phy: aquantia: add support for PHY LEDs")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
---
drivers/net/phy/aquantia/aquantia.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/aquantia/aquantia.h b/drivers/net/phy/aquantia/aquantia.h
index 0c78bfabace5..e2bb66a21589 100644
--- a/drivers/net/phy/aquantia/aquantia.h
+++ b/drivers/net/phy/aquantia/aquantia.h
@@ -76,7 +76,7 @@
#define VEND1_GLOBAL_LED_PROV_LINK100 BIT(5)
#define VEND1_GLOBAL_LED_PROV_RX_ACT BIT(3)
#define VEND1_GLOBAL_LED_PROV_TX_ACT BIT(2)
-#define VEND1_GLOBAL_LED_PROV_ACT_STRETCH GENMASK(0, 1)
+#define VEND1_GLOBAL_LED_PROV_ACT_STRETCH GENMASK(1, 0)
#define VEND1_GLOBAL_LED_PROV_LINK_MASK (VEND1_GLOBAL_LED_PROV_LINK100 | \
VEND1_GLOBAL_LED_PROV_LINK1000 | \
--
2.48.1
Hi All,
Changes since v6:
- do not unnecessarily free pages across iterations
Changes since v5:
- full error message included in the commit description
Changes since v4:
- unused pages leak is avoided
Changes since v3:
- pfn_to_virt() changed to page_to_virt() due to a compile error
Changes since v2:
- page allocation moved out of the atomic context
Changes since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 76 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 62 insertions(+), 14 deletions(-)
--
2.45.2
From: Steve Wilkins <steve.wilkins(a)raymarine.com>
[ Upstream commit 9cf71eb0faef4bff01df4264841b8465382d7927 ]
While transmitting with rx_len == 0, the RX FIFO is not going to be
emptied in the interrupt handler. A subsequent transfer could then
read crap from the previous transfer out of the RX FIFO into the
start of the RX buffer. The core provides a register that will empty the RX and
TX FIFOs, so do that before each transfer.
Fixes: 9ac8d17694b6 ("spi: add support for microchip fpga spi controllers")
Signed-off-by: Steve Wilkins <steve.wilkins(a)raymarine.com>
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
Link: https://patch.msgid.link/20240715-flammable-provoke-459226d08e70@wendy
Signed-off-by: Mark Brown <broonie(a)kernel.org>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
drivers/spi/spi-microchip-core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/spi/spi-microchip-core.c b/drivers/spi/spi-microchip-core.c
index bfad0fe743ad..acc05f5a929e 100644
--- a/drivers/spi/spi-microchip-core.c
+++ b/drivers/spi/spi-microchip-core.c
@@ -91,6 +91,8 @@
#define REG_CONTROL2 (0x28)
#define REG_COMMAND (0x2c)
#define COMMAND_CLRFRAMECNT BIT(4)
+#define COMMAND_TXFIFORST BIT(3)
+#define COMMAND_RXFIFORST BIT(2)
#define REG_PKTSIZE (0x30)
#define REG_CMD_SIZE (0x34)
#define REG_HWSTATUS (0x38)
@@ -489,6 +491,8 @@ static int mchp_corespi_transfer_one(struct spi_controller *host,
mchp_corespi_set_xfer_size(spi, (spi->tx_len > FIFO_DEPTH)
? FIFO_DEPTH : spi->tx_len);
+ mchp_corespi_write(spi, REG_COMMAND, COMMAND_RXFIFORST | COMMAND_TXFIFORST);
+
while (spi->tx_len)
mchp_corespi_write_fifo(spi);
--
2.34.1
On 5/12/25 05:34, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> memblock: Accept allocated memory before use in memblock_double_array()
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> memblock-accept-allocated-memory-before-use-in-memblock_double_array.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
The 6.6 version of the patch needs a fixup. As mentioned in the patch
description, any release before v6.12 needs to have the accept_memory()
call changed from:
accept_memory(addr, new_alloc_size);
to
accept_memory(addr, addr + new_alloc_size);
Do you need me to send a v6.6-specific patch?
Thanks,
Tom
>
>
> From da8bf5daa5e55a6af2b285ecda460d6454712ff4 Mon Sep 17 00:00:00 2001
> From: Tom Lendacky <thomas.lendacky(a)amd.com>
> Date: Thu, 8 May 2025 12:24:10 -0500
> Subject: memblock: Accept allocated memory before use in memblock_double_array()
>
> From: Tom Lendacky <thomas.lendacky(a)amd.com>
>
> commit da8bf5daa5e55a6af2b285ecda460d6454712ff4 upstream.
>
> When increasing the array size in memblock_double_array() and the slab
> is not yet available, a call to memblock_find_in_range() is used to
> reserve/allocate memory. However, the range returned may not have been
> accepted, which can result in a crash when booting an SNP guest:
>
> RIP: 0010:memcpy_orig+0x68/0x130
> Code: ...
> RSP: 0000:ffffffff9cc03ce8 EFLAGS: 00010006
> RAX: ff11001ff83e5000 RBX: 0000000000000000 RCX: fffffffffffff000
> RDX: 0000000000000bc0 RSI: ffffffff9dba8860 RDI: ff11001ff83e5c00
> RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000002000
> R10: 000000207fffe000 R11: 0000040000000000 R12: ffffffff9d06ef78
> R13: ff11001ff83e5000 R14: ffffffff9dba7c60 R15: 0000000000000c00
> memblock_double_array+0xff/0x310
> memblock_add_range+0x1fb/0x2f0
> memblock_reserve+0x4f/0xa0
> memblock_alloc_range_nid+0xac/0x130
> memblock_alloc_internal+0x53/0xc0
> memblock_alloc_try_nid+0x3d/0xa0
> swiotlb_init_remap+0x149/0x2f0
> mem_init+0xb/0xb0
> mm_core_init+0x8f/0x350
> start_kernel+0x17e/0x5d0
> x86_64_start_reservations+0x14/0x30
> x86_64_start_kernel+0x92/0xa0
> secondary_startup_64_no_verify+0x194/0x19b
>
> Mitigate this by calling accept_memory() on the memory range returned
> before the slab is available.
>
> Prior to v6.12, the accept_memory() interface used a 'start' and 'end'
> parameter instead of 'start' and 'size', therefore the accept_memory()
> call must be adjusted to specify 'start + size' for 'end' when applying
> to kernels prior to v6.12.
>
> Cc: stable(a)vger.kernel.org # see patch description, needs adjustments for <= 6.11
> Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
> Link: https://lore.kernel.org/r/da1ac73bf4ded761e21b4e4bb5178382a580cd73.17467250…
> Signed-off-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> mm/memblock.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -460,7 +460,14 @@ static int __init_memblock memblock_doub
> min(new_area_start, memblock.current_limit),
> new_alloc_size, PAGE_SIZE);
>
> - new_array = addr ? __va(addr) : NULL;
> + if (addr) {
> + /* The memory may not have been accepted, yet. */
> + accept_memory(addr, new_alloc_size);
> +
> + new_array = __va(addr);
> + } else {
> + new_array = NULL;
> + }
> }
> if (!addr) {
> pr_err("memblock: Failed to double %s array from %ld to %ld entries !\n",
>
>
> Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
>
> queue-6.6/memblock-accept-allocated-memory-before-use-in-memblock_double_array.patch
From: Andrii Nakryiko <andrii(a)kernel.org>
commit 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce upstream.
BPF link for some program types is passed as a "context" which can be
used by those BPF programs to look up additional information. E.g., for
multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values.
Because of this runtime dependency, when bpf_link refcnt drops to zero
there could still be active BPF programs running accessing link data.
This patch adds generic support to defer bpf_link dealloc callback to
after RCU GP, if requested. This is done by exposing two different
deallocation callbacks, one synchronous and one deferred. If deferred
one is provided, bpf_link_free() will schedule dealloc_deferred()
callback to happen after RCU GP.
BPF is using two flavors of RCU: "classic" non-sleepable one and RCU
tasks trace one. The latter is used when sleepable BPF programs are
used. bpf_link_free() accommodates that by checking underlying BPF
program's sleepable flag, and goes either through normal RCU GP only for
non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP
(taking into account rcu_trace_implies_rcu_gp() optimization), if BPF
program is sleepable.
We use this for multi-kprobe and multi-uprobe links, which dereference
link during program run. We also preventively switch raw_tp link to use
deferred dealloc callback, as upcoming changes in bpf-next tree expose
raw_tp link data (specifically, cookie value) to BPF program at runtime
as well.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: syzbot+981935d9485a560bfbcb(a)syzkaller.appspotmail.com
Reported-by: syzbot+2cb5a6c573e98db598cc(a)syzkaller.appspotmail.com
Reported-by: syzbot+62d8b26793e8a2bd0516(a)syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Acked-by: Jiri Olsa <jolsa(a)kernel.org>
Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
[fixed conflicts due to missing commits 89ae89f53d20
("bpf: Add multi uprobe link")]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
include/linux/bpf.h | 16 +++++++++++++++-
kernel/bpf/syscall.c | 35 ++++++++++++++++++++++++++++++++---
kernel/trace/bpf_trace.c | 2 +-
3 files changed, 48 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e9c1338851e3..1cf8c7037289 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1293,12 +1293,26 @@ struct bpf_link {
enum bpf_link_type type;
const struct bpf_link_ops *ops;
struct bpf_prog *prog;
- struct work_struct work;
+ /* rcu is used before freeing, work can be used to schedule that
+ * RCU-based freeing before that, so they never overlap
+ */
+ union {
+ struct rcu_head rcu;
+ struct work_struct work;
+ };
};
struct bpf_link_ops {
void (*release)(struct bpf_link *link);
+ /* deallocate link resources callback, called without RCU grace period
+ * waiting
+ */
void (*dealloc)(struct bpf_link *link);
+ /* deallocate link resources callback, called after RCU grace period;
+ * if underlying BPF program is sleepable we go through tasks trace
+ * RCU GP and then "classic" RCU GP
+ */
+ void (*dealloc_deferred)(struct bpf_link *link);
int (*detach)(struct bpf_link *link);
int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
struct bpf_prog *old_prog);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 27fdf1b2fc46..1cc9b28b065a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2750,17 +2750,46 @@ void bpf_link_inc(struct bpf_link *link)
atomic64_inc(&link->refcnt);
}
+static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
+{
+ struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
+
+ /* free bpf_link and its containing memory */
+ link->ops->dealloc_deferred(link);
+}
+
+static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
+{
+ if (rcu_trace_implies_rcu_gp())
+ bpf_link_defer_dealloc_rcu_gp(rcu);
+ else
+ call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
+}
+
/* bpf_link_free is guaranteed to be called from process context */
static void bpf_link_free(struct bpf_link *link)
{
+ bool sleepable = false;
+
bpf_link_free_id(link->id);
if (link->prog) {
+ sleepable = link->prog->aux->sleepable;
/* detach BPF program, clean up used resources */
link->ops->release(link);
bpf_prog_put(link->prog);
}
- /* free bpf_link and its containing memory */
- link->ops->dealloc(link);
+ if (link->ops->dealloc_deferred) {
+ /* schedule BPF link deallocation; if underlying BPF program
+ * is sleepable, we need to first wait for RCU tasks trace
+ * sync, then go through "classic" RCU grace period
+ */
+ if (sleepable)
+ call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+ else
+ call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
+ }
+ if (link->ops->dealloc)
+ link->ops->dealloc(link);
}
static void bpf_link_put_deferred(struct work_struct *work)
@@ -3246,7 +3275,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_raw_tp_link_lops = {
.release = bpf_raw_tp_link_release,
- .dealloc = bpf_raw_tp_link_dealloc,
+ .dealloc_deferred = bpf_raw_tp_link_dealloc,
.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
.fill_link_info = bpf_raw_tp_link_fill_link_info,
};
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7254c808b27c..989b6843069e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2564,7 +2564,7 @@ static void bpf_kprobe_multi_link_dealloc(struct bpf_link *link)
static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
.release = bpf_kprobe_multi_link_release,
- .dealloc = bpf_kprobe_multi_link_dealloc,
+ .dealloc_deferred = bpf_kprobe_multi_link_dealloc,
};
static void bpf_kprobe_multi_cookie_swap(void *a, void *b, int size, const void *priv)
--
2.34.1
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x a39f3087092716f2bd531d6fdc20403c3dc2a879
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051210-devourer-antarctic-1ed3@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a39f3087092716f2bd531d6fdc20403c3dc2a879 Mon Sep 17 00:00:00 2001
From: Miguel Ojeda <ojeda(a)kernel.org>
Date: Fri, 2 May 2025 16:02:34 +0200
Subject: [PATCH] rust: allow Rust 1.87.0's `clippy::ptr_eq` lint
Starting with Rust 1.87.0 (expected 2025-05-15) [1], Clippy may expand
the `ptr_eq` lint, e.g.:
error: use `core::ptr::eq` when comparing raw pointers
--> rust/kernel/list.rs:438:12
|
438 | if self.first == item {
| ^^^^^^^^^^^^^^^^^^ help: try: `core::ptr::eq(self.first, item)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_eq
= note: `-D clippy::ptr-eq` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::ptr_eq)]`
It is expected that a PR to relax the lint will be backported [2] by
the time Rust 1.87.0 releases, since the lint was considered too eager
(at least by default) [3].
Thus allow the lint temporarily just in case.
Cc: stable(a)vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust-clippy/pull/14339 [1]
Link: https://github.com/rust-lang/rust-clippy/pull/14526 [2]
Link: https://github.com/rust-lang/rust-clippy/issues/14525 [3]
Link: https://lore.kernel.org/r/20250502140237.1659624-3-ojeda@kernel.org
[ Converted to `allow`s since backport was confirmed. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/rust/kernel/alloc/kvec.rs b/rust/kernel/alloc/kvec.rs
index ae9d072741ce..87a71fd40c3c 100644
--- a/rust/kernel/alloc/kvec.rs
+++ b/rust/kernel/alloc/kvec.rs
@@ -2,6 +2,9 @@
//! Implementation of [`Vec`].
+// May not be needed in Rust 1.87.0 (pending beta backport).
+#![allow(clippy::ptr_eq)]
+
use super::{
allocator::{KVmalloc, Kmalloc, Vmalloc},
layout::ArrayLayout,
diff --git a/rust/kernel/list.rs b/rust/kernel/list.rs
index a335c3b1ff5e..2054682c5724 100644
--- a/rust/kernel/list.rs
+++ b/rust/kernel/list.rs
@@ -4,6 +4,9 @@
//! A linked list implementation.
+// May not be needed in Rust 1.87.0 (pending beta backport).
+#![allow(clippy::ptr_eq)]
+
use crate::sync::ArcBorrow;
use crate::types::Opaque;
use core::iter::{DoubleEndedIterator, FusedIterator};
From: Jakub Kicinski <kuba(a)kernel.org>
[ Upstream commit 52f671db18823089a02f07efc04efdb2272ddc17 ]
The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.
The problem as previously described by Davide (see Link) is that
if we reverse flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet. And we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we started to nest mirred calls.
In the past there was a concern that the backlog indirection will
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.
Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
Suggested-by: Davide Caratti <dcaratti(a)redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.166…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Acked-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
net/sched/act_mirred.c | 14 +++++---------
.../testing/selftests/net/forwarding/tc_actions.sh | 3 ---
2 files changed, 5 insertions(+), 12 deletions(-)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index bbc34987bd09..896bffd50aa8 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -205,18 +205,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
return err;
}
-static bool is_mirred_nested(void)
-{
- return unlikely(__this_cpu_read(mirred_nest_level) > 1);
-}
-
-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int
+tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
{
int err;
if (!want_ingress)
err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
- else if (is_mirred_nested())
+ else if (!at_ingress)
err = netif_rx(skb);
else
err = netif_receive_skb(skb);
@@ -312,7 +308,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
/* let's the caller reinsert the packet, if possible */
if (use_reinsert) {
- err = tcf_mirred_forward(want_ingress, skb);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb);
if (err)
tcf_action_inc_overlimit_qstats(&m->common);
__this_cpu_dec(mirred_nest_level);
@@ -320,7 +316,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
}
}
- err = tcf_mirred_forward(want_ingress, skb2);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb2);
if (err)
tcf_action_inc_overlimit_qstats(&m->common);
__this_cpu_dec(mirred_nest_level);
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index b0f5e55d2d0b..589629636502 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
check_err $? "didn't mirred redirect ICMP"
tc_check_packets "dev $h1 ingress" 102 10
check_err $? "didn't drop mirred ICMP"
- local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
- test ${overlimits} = 10
- check_err $? "wrong overlimits, expected 10 got ${overlimits}"
tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
--
2.34.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x a39f3087092716f2bd531d6fdc20403c3dc2a879
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051211-unclothed-common-f6cf@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a39f3087092716f2bd531d6fdc20403c3dc2a879 Mon Sep 17 00:00:00 2001
From: Miguel Ojeda <ojeda(a)kernel.org>
Date: Fri, 2 May 2025 16:02:34 +0200
Subject: [PATCH] rust: allow Rust 1.87.0's `clippy::ptr_eq` lint
Starting with Rust 1.87.0 (expected 2025-05-15) [1], Clippy may expand
the `ptr_eq` lint, e.g.:
error: use `core::ptr::eq` when comparing raw pointers
--> rust/kernel/list.rs:438:12
|
438 | if self.first == item {
| ^^^^^^^^^^^^^^^^^^ help: try: `core::ptr::eq(self.first, item)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_eq
= note: `-D clippy::ptr-eq` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::ptr_eq)]`
It is expected that a PR to relax the lint will be backported [2] by
the time Rust 1.87.0 releases, since the lint was considered too eager
(at least by default) [3].
Thus allow the lint temporarily just in case.
Cc: stable(a)vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust-clippy/pull/14339 [1]
Link: https://github.com/rust-lang/rust-clippy/pull/14526 [2]
Link: https://github.com/rust-lang/rust-clippy/issues/14525 [3]
Link: https://lore.kernel.org/r/20250502140237.1659624-3-ojeda@kernel.org
[ Converted to `allow`s since backport was confirmed. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/rust/kernel/alloc/kvec.rs b/rust/kernel/alloc/kvec.rs
index ae9d072741ce..87a71fd40c3c 100644
--- a/rust/kernel/alloc/kvec.rs
+++ b/rust/kernel/alloc/kvec.rs
@@ -2,6 +2,9 @@
//! Implementation of [`Vec`].
+// May not be needed in Rust 1.87.0 (pending beta backport).
+#![allow(clippy::ptr_eq)]
+
use super::{
allocator::{KVmalloc, Kmalloc, Vmalloc},
layout::ArrayLayout,
diff --git a/rust/kernel/list.rs b/rust/kernel/list.rs
index a335c3b1ff5e..2054682c5724 100644
--- a/rust/kernel/list.rs
+++ b/rust/kernel/list.rs
@@ -4,6 +4,9 @@
//! A linked list implementation.
+// May not be needed in Rust 1.87.0 (pending beta backport).
+#![allow(clippy::ptr_eq)]
+
use crate::sync::ArcBorrow;
use crate::types::Opaque;
use core::iter::{DoubleEndedIterator, FusedIterator};
The patch titled
Subject: XArray: fix kmemleak false positive in xas_shrink()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
xarray-fix-kmemleak-false-positive-in-xas_shrink.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jared Kangas <jkangas(a)redhat.com>
Subject: XArray: fix kmemleak false positive in xas_shrink()
Date: Mon, 12 May 2025 12:17:07 -0700
Kmemleak periodically produces a false positive report that resembles
the following:
unreferenced object 0xffff0000c105ed08 (size 576):
comm "swapper/0", pid 1, jiffies 4294937478
hex dump (first 32 bytes):
00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
d8 e7 0a 8b 00 80 ff ff 20 ed 05 c1 00 00 ff ff ........ .......
backtrace (crc 69e99671):
kmemleak_alloc+0xb4/0xc4
kmem_cache_alloc_lru+0x1f0/0x244
xas_alloc+0x2a0/0x3a0
xas_expand.constprop.0+0x144/0x4dc
xas_create+0x2b0/0x484
xas_store+0x60/0xa00
__xa_alloc+0x194/0x280
__xa_alloc_cyclic+0x104/0x2e0
dev_index_reserve+0xd8/0x18c
register_netdevice+0x5e8/0xf90
register_netdev+0x28/0x50
loopback_net_init+0x68/0x114
ops_init+0x90/0x2c0
register_pernet_operations+0x20c/0x554
register_pernet_device+0x3c/0x8c
net_dev_init+0x5cc/0x7d8
This transient leak can be traced to xas_shrink(): when the xarray's
head is reassigned, kmemleak may have already started scanning the
xarray. When this happens, if kmemleak fails to scan the new xa_head
before it moves, kmemleak will see it as a leak until the xarray is
scanned again.
The report can be reproduced by running the xdp_bonding BPF selftest,
although it doesn't appear consistently due to the bug's transience.
In my testing, the following script has reliably triggered the report in
under an hour on a debug kernel with kmemleak enabled, where KSELFTESTS
is set to the install path for the kernel selftests:
#!/bin/sh
set -eu
echo 1 >/sys/module/kmemleak/parameters/verbose
echo scan=1 >/sys/kernel/debug/kmemleak
while :; do
$KSELFTESTS/bpf/test_progs -t xdp_bonding
done
To prevent this false positive report, mark the new xa_head in
xas_shrink() as a transient leak.
Link: https://lkml.kernel.org/r/20250512191707.245153-1-jkangas@redhat.com
Signed-off-by: Jared Kangas <jkangas(a)redhat.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Jared Kangas <jkangas(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/xarray.c | 2 ++
1 file changed, 2 insertions(+)
--- a/lib/xarray.c~xarray-fix-kmemleak-false-positive-in-xas_shrink
+++ a/lib/xarray.c
@@ -8,6 +8,7 @@
#include <linux/bitmap.h>
#include <linux/export.h>
+#include <linux/kmemleak.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/xarray.h>
@@ -476,6 +477,7 @@ static void xas_shrink(struct xa_state *
break;
node = xa_to_node(entry);
node->parent = NULL;
+ kmemleak_transient_leak(node);
}
}
_
Patches currently in -mm which might be from jkangas(a)redhat.com are
xarray-fix-kmemleak-false-positive-in-xas_shrink.patch
When increasing the array size in memblock_double_array() and the slab
is not yet available, a call to memblock_find_in_range() is used to
reserve/allocate memory. However, the range returned may not have been
accepted, which can result in a crash when booting an SNP guest:
RIP: 0010:memcpy_orig+0x68/0x130
Code: ...
RSP: 0000:ffffffff9cc03ce8 EFLAGS: 00010006
RAX: ff11001ff83e5000 RBX: 0000000000000000 RCX: fffffffffffff000
RDX: 0000000000000bc0 RSI: ffffffff9dba8860 RDI: ff11001ff83e5c00
RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000002000
R10: 000000207fffe000 R11: 0000040000000000 R12: ffffffff9d06ef78
R13: ff11001ff83e5000 R14: ffffffff9dba7c60 R15: 0000000000000c00
memblock_double_array+0xff/0x310
memblock_add_range+0x1fb/0x2f0
memblock_reserve+0x4f/0xa0
memblock_alloc_range_nid+0xac/0x130
memblock_alloc_internal+0x53/0xc0
memblock_alloc_try_nid+0x3d/0xa0
swiotlb_init_remap+0x149/0x2f0
mem_init+0xb/0xb0
mm_core_init+0x8f/0x350
start_kernel+0x17e/0x5d0
x86_64_start_reservations+0x14/0x30
x86_64_start_kernel+0x92/0xa0
secondary_startup_64_no_verify+0x194/0x19b
Mitigate this by calling accept_memory() on the memory range returned
before the slab is available.
Prior to v6.12, the accept_memory() interface used a 'start' and 'end'
parameter instead of 'start' and 'size', therefore the accept_memory()
call must be adjusted to specify 'start + size' for 'end' when applying
to kernels prior to v6.12.
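For stable trees that need this adjustment, a minimal sketch of the two call
forms (illustrative only; the diff below contains only the v6.12+ form, and
addr/new_alloc_size are the local variables already used there):
  /* v6.12 and later: accept_memory(start, size) */
  accept_memory(addr, new_alloc_size);
  /* v6.11 and earlier: accept_memory(start, end), so the backport becomes: */
  accept_memory(addr, addr + new_alloc_size);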
Cc: <stable(a)vger.kernel.org> # see patch description, needs adjustments for <= 6.11
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
---
mm/memblock.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 0a53db4d9f7b..30fa553e9634 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -457,7 +457,14 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
min(new_area_start, memblock.current_limit),
new_alloc_size, PAGE_SIZE);
- new_array = addr ? __va(addr) : NULL;
+ if (addr) {
+ /* The memory may not have been accepted, yet. */
+ accept_memory(addr, new_alloc_size);
+
+ new_array = __va(addr);
+ } else {
+ new_array = NULL;
+ }
}
if (!addr) {
pr_err("memblock: Failed to double %s array from %ld to %ld entries !\n",
base-commit: fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2
--
2.46.2
From: Zack Rusin <zack.rusin(a)broadcom.com>
commit e58337100721f3cc0c7424a18730e4f39844934f upstream.
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.
vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status, to do that it holds a lock to prevent
the list modifcations from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.
dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.
Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those, in particular this fixes KDE stalls/deadlock.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: 2298e804e96e ("drm/vmwgfx: rework to new fence interface, v2")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.2+
Reviewed-by: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240722184313.181318-2-zack.…
[Minor context change fixed]
Signed-off-by: Zhi Yang <Zhi.Yang(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 6bacdb7583df..0505f87d13c0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -32,7 +32,6 @@
#define VMW_FENCE_WRAP (1 << 31)
struct vmw_fence_manager {
- int num_fence_objects;
struct vmw_private *dev_priv;
spinlock_t lock;
struct list_head fence_list;
@@ -113,13 +112,13 @@ static void vmw_fence_obj_destroy(struct dma_fence *f)
{
struct vmw_fence_obj *fence =
container_of(f, struct vmw_fence_obj, base);
-
struct vmw_fence_manager *fman = fman_from_fence(fence);
- spin_lock(&fman->lock);
- list_del_init(&fence->head);
- --fman->num_fence_objects;
- spin_unlock(&fman->lock);
+ if (!list_empty(&fence->head)) {
+ spin_lock(&fman->lock);
+ list_del_init(&fence->head);
+ spin_unlock(&fman->lock);
+ }
fence->destroy(fence);
}
@@ -250,7 +249,6 @@ static const struct dma_fence_ops vmw_fence_ops = {
.release = vmw_fence_obj_destroy,
};
-
/**
* Execute signal actions on fences recently signaled.
* This is done from a workqueue so we don't have to execute
@@ -353,7 +351,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
goto out_unlock;
}
list_add_tail(&fence->head, &fman->fence_list);
- ++fman->num_fence_objects;
out_unlock:
spin_unlock(&fman->lock);
@@ -402,7 +399,7 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman,
{
u32 goal_seqno;
u32 *fifo_mem;
- struct vmw_fence_obj *fence;
+ struct vmw_fence_obj *fence, *next_fence;
if (likely(!fman->seqno_valid))
return false;
@@ -413,7 +410,7 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman,
return false;
fman->seqno_valid = false;
- list_for_each_entry(fence, &fman->fence_list, head) {
+ list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
vmw_mmio_write(fence->base.seqno,
--
2.34.1
From: Dan Carpenter <dan.carpenter(a)linaro.org>
[ Upstream commit e56aac6e5a25630645607b6856d4b2a17b2311a5 ]
The "command" variable can be controlled by the user via debugfs. The
worry is that if con_index is zero then "&uc->ucsi->connector[con_index
- 1]" would be an array underflow.
Fixes: 170a6726d0e2 ("usb: typec: ucsi: add support for separate DP altmode devices")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Link: https://lore.kernel.org/r/c69ef0b3-61b0-4dde-98dd-97b97f81d912@stanley.moun…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[The function ucsi_ccg_sync_write() was renamed to ucsi_ccg_sync_control()
in commit 13f2ec3115c8 ("usb: typec: ucsi: simplify command sending API").
Apply this patch to ucsi_ccg_sync_write() in 5.15.y accordingly.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
drivers/usb/typec/ucsi/ucsi_ccg.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c b/drivers/usb/typec/ucsi/ucsi_ccg.c
index fb6211efb5d8..3983bf21a580 100644
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -573,6 +573,10 @@ static int ucsi_ccg_sync_write(struct ucsi *ucsi, unsigned int offset,
uc->has_multiple_dp) {
con_index = (uc->last_cmd_sent >> 16) &
UCSI_CMD_CONNECTOR_MASK;
+ if (con_index == 0) {
+ ret = -EINVAL;
+ goto unlock;
+ }
con = &uc->ucsi->connector[con_index - 1];
ucsi_ccg_update_set_new_cam_cmd(uc, con, (u64 *)val);
}
@@ -588,6 +592,7 @@ static int ucsi_ccg_sync_write(struct ucsi *ucsi, unsigned int offset,
err_clear_bit:
clear_bit(DEV_CMD_PENDING, &uc->flags);
pm_runtime_put_sync(uc->dev);
+unlock:
mutex_unlock(&uc->lock);
return ret;
--
2.34.1
This reverts commit 887c5c12e80c ("um: work around sched_yield not yielding in time-travel mode")
added with v6.12.25.
The reason is that the patch depends on at least
commit 0b8b2668f998 ("um: insert scheduler ticks when userspace does not yield")
in order to build. Otherwise it fails with:
| /usr/bin/ld: arch/um/kernel/skas/syscall.o: in function `handle_syscall':
| linux-6.12.27/arch/um/kernel/skas/syscall.c:43:(.text+0xa2): undefined
| reference to `tt_extra_sched_jiffies'
| collect2: error: ld returned 1 exit status
The author Benjamin Berg commented: "I think it is better to just not backport
commit 0b8b2668f998 ("um: insert scheduler ticks when userspace does not yield")".
Link: https://lore.kernel.org/linux-um/8ce0b6056a9726e540f61bce77311278654219eb.c…
Cc: <stable(a)vger.kernel.org> # 6.12.y
Cc: Benjamin Berg <benjamin.berg(a)intel.com>
Signed-off-by: Christian Lamparter <chunkeey(a)gmail.com>
---
Just in case this throws someone else for a space-time loop:
What's interesting/very strange about this time-travel stuff:
|commit 0b8b2668f998 ("um: insert scheduler ticks when userspace does not yield")
$ git describe 0b8b2668f998
=> v6.12-rc2-43-g0b8b2668f998
(from what I know this is 43 patches on top of v6.12-rc2 as per the man page:
"The command finds the most recent tag that is reachable from a commit. [...]
it suffixes the tag name with the number of additional commits on top of the tagged
object and the abbreviated object name of the most recent commit.")
But it was merged as part of: uml-for-linus-6.13-rc1 :
https://lore.kernel.org/lkml/1155823186.11802667.1732921581257.JavaMail.zim…
---
arch/um/include/linux/time-internal.h | 2 --
arch/um/kernel/skas/syscall.c | 11 -----------
2 files changed, 13 deletions(-)
diff --git a/arch/um/include/linux/time-internal.h b/arch/um/include/linux/time-internal.h
index 138908b999d7..b22226634ff6 100644
--- a/arch/um/include/linux/time-internal.h
+++ b/arch/um/include/linux/time-internal.h
@@ -83,8 +83,6 @@ extern void time_travel_not_configured(void);
#define time_travel_del_event(...) time_travel_not_configured()
#endif /* CONFIG_UML_TIME_TRAVEL_SUPPORT */
-extern unsigned long tt_extra_sched_jiffies;
-
/*
* Without CONFIG_UML_TIME_TRAVEL_SUPPORT this is a linker error if used,
* which is intentional since we really shouldn't link it in that case.
diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
index a5beaea2967e..b09e85279d2b 100644
--- a/arch/um/kernel/skas/syscall.c
+++ b/arch/um/kernel/skas/syscall.c
@@ -31,17 +31,6 @@ void handle_syscall(struct uml_pt_regs *r)
goto out;
syscall = UPT_SYSCALL_NR(r);
-
- /*
- * If no time passes, then sched_yield may not actually yield, causing
- * broken spinlock implementations in userspace (ASAN) to hang for long
- * periods of time.
- */
- if ((time_travel_mode == TT_MODE_INFCPU ||
- time_travel_mode == TT_MODE_EXTERNAL) &&
- syscall == __NR_sched_yield)
- tt_extra_sched_jiffies += 1;
-
if (syscall >= 0 && syscall < __NR_syscalls) {
unsigned long ret = EXECUTE_SYSCALL(syscall, regs);
--
2.49.0
From: Linus Torvalds <torvalds(a)linux-foundation.org>
[ Upstream commit 6bd23e0c2bb6c65d4f5754d1456bc9a4427fc59b ]
... and use it to limit the virtual terminals to just N_TTY. They are
kind of special, and in particular, the "con_write()" routine violates
the "writes cannot sleep" rule that some ldiscs rely on.
This avoids the
BUG: sleeping function called from invalid context at kernel/printk/printk.c:2659
when N_GSM has been attached to a virtual console, and gsmld_write()
calls con_write() while holding a spinlock, and con_write() then tries
to get the console lock.
Tested-by: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Daniel Starke <daniel.starke(a)siemens.com>
Reported-by: syzbot <syzbot+dbac96d8e73b61aa559c(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=dbac96d8e73b61aa559c
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20240423163339.59780-1-torvalds@linux-foundation.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Minor conflict resolved due to code context change. Also backported the
kernel-doc description comments for struct tty_operations.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
drivers/tty/tty_ldisc.c | 6 +
drivers/tty/vt/vt.c | 10 ++
include/linux/tty_driver.h | 339 +++++++++++++++++++++++++++++++++++++
3 files changed, 355 insertions(+)
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 7262f45b513b..0dae579efdd9 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -579,6 +579,12 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
goto out;
}
+ if (tty->ops->ldisc_ok) {
+ retval = tty->ops->ldisc_ok(tty, disc);
+ if (retval)
+ goto out;
+ }
+
old_ldisc = tty->ldisc;
/* Shutdown the old discipline. */
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 5d9de3a53548..a772c614a878 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3448,6 +3448,15 @@ static void con_cleanup(struct tty_struct *tty)
tty_port_put(&vc->port);
}
+/*
+ * We can't deal with anything but the N_TTY ldisc,
+ * because we can sleep in our write() routine.
+ */
+static int con_ldisc_ok(struct tty_struct *tty, int ldisc)
+{
+ return ldisc == N_TTY ? 0 : -EINVAL;
+}
+
static int default_color = 7; /* white */
static int default_italic_color = 2; // green (ASCII)
static int default_underline_color = 3; // cyan (ASCII)
@@ -3576,6 +3585,7 @@ static const struct tty_operations con_ops = {
.resize = vt_resize,
.shutdown = con_shutdown,
.cleanup = con_cleanup,
+ .ldisc_ok = con_ldisc_ok,
};
static struct cdev vc0_cdev;
diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h
index 2f719b471d52..3412eb7280da 100644
--- a/include/linux/tty_driver.h
+++ b/include/linux/tty_driver.h
@@ -243,6 +243,344 @@ struct tty_driver;
struct serial_icounter_struct;
struct serial_struct;
+/**
+ * struct tty_operations -- interface between driver and tty
+ *
+ * @lookup: ``struct tty_struct *()(struct tty_driver *self, struct file *,
+ * int idx)``
+ *
+ * Return the tty device corresponding to @idx, %NULL if there is not
+ * one currently in use and an %ERR_PTR value on error. Called under
+ * %tty_mutex (for now!)
+ *
+ * Optional method. Default behaviour is to use the @self->ttys array.
+ *
+ * @install: ``int ()(struct tty_driver *self, struct tty_struct *tty)``
+ *
+ * Install a new @tty into the @self's internal tables. Used in
+ * conjunction with @lookup and @remove methods.
+ *
+ * Optional method. Default behaviour is to use the @self->ttys array.
+ *
+ * @remove: ``void ()(struct tty_driver *self, struct tty_struct *tty)``
+ *
+ * Remove a closed @tty from the @self's internal tables. Used in
+ * conjunction with @lookup and @remove methods.
+ *
+ * Optional method. Default behaviour is to use the @self->ttys array.
+ *
+ * @open: ``int ()(struct tty_struct *tty, struct file *)``
+ *
+ * This routine is called when a particular @tty device is opened. This
+ * routine is mandatory; if this routine is not filled in, the attempted
+ * open will fail with %ENODEV.
+ *
+ * Required method. Called with tty lock held. May sleep.
+ *
+ * @close: ``void ()(struct tty_struct *tty, struct file *)``
+ *
+ * This routine is called when a particular @tty device is closed. At the
+ * point of return from this call the driver must make no further ldisc
+ * calls of any kind.
+ *
+ * Remark: called even if the corresponding @open() failed.
+ *
+ * Required method. Called with tty lock held. May sleep.
+ *
+ * @shutdown: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine is called under the tty lock when a particular @tty device
+ * is closed for the last time. It executes before the @tty resources
+ * are freed so may execute while another function holds a @tty kref.
+ *
+ * @cleanup: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine is called asynchronously when a particular @tty device
+ * is closed for the last time freeing up the resources. This is
+ * actually the second part of shutdown for routines that might sleep.
+ *
+ * @write: ``int ()(struct tty_struct *tty, const unsigned char *buf,
+ * int count)``
+ *
+ * This routine is called by the kernel to write a series (@count) of
+ * characters (@buf) to the @tty device. The characters may come from
+ * user space or kernel space. This routine will return the
+ * number of characters actually accepted for writing.
+ *
+ * May occur in parallel in special cases. Because this includes panic
+ * paths drivers generally shouldn't try and do clever locking here.
+ *
+ * Optional: Required for writable devices. May not sleep.
+ *
+ * @put_char: ``int ()(struct tty_struct *tty, unsigned char ch)``
+ *
+ * This routine is called by the kernel to write a single character @ch to
+ * the @tty device. If the kernel uses this routine, it must call the
+ * @flush_chars() routine (if defined) when it is done stuffing characters
+ * into the driver. If there is no room in the queue, the character is
+ * ignored.
+ *
+ * Optional: Kernel will use the @write method if not provided. Do not
+ * call this function directly, call tty_put_char().
+ *
+ * @flush_chars: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine is called by the kernel after it has written a
+ * series of characters to the tty device using @put_char().
+ *
+ * Optional. Do not call this function directly, call
+ * tty_driver_flush_chars().
+ *
+ * @write_room: ``int ()(struct tty_struct *tty)``
+ *
+ * This routine returns the numbers of characters the @tty driver
+ * will accept for queuing to be written. This number is subject
+ * to change as output buffers get emptied, or if the output flow
+ * control is acted.
+ *
+ * The ldisc is responsible for being intelligent about multi-threading of
+ * write_room/write calls
+ *
+ * Required if @write method is provided else not needed. Do not call this
+ * function directly, call tty_write_room()
+ *
+ * @chars_in_buffer: ``int ()(struct tty_struct *tty)``
+ *
+ * This routine returns the number of characters in the device private
+ * output queue. Used in tty_wait_until_sent() and for poll()
+ * implementation.
+ *
+ * Optional: if not provided, it is assumed there is no queue on the
+ * device. Do not call this function directly, call tty_chars_in_buffer().
+ *
+ * @ioctl: ``int ()(struct tty_struct *tty, unsigned int cmd,
+ * unsigned long arg)``
+ *
+ * This routine allows the @tty driver to implement device-specific
+ * ioctls. If the ioctl number passed in @cmd is not recognized by the
+ * driver, it should return %ENOIOCTLCMD.
+ *
+ * Optional.
+ *
+ * @compat_ioctl: ``long ()(struct tty_struct *tty, unsigned int cmd,
+ * unsigned long arg)``
+ *
+ * Implement ioctl processing for 32 bit process on 64 bit system.
+ *
+ * Optional.
+ *
+ * @set_termios: ``void ()(struct tty_struct *tty, struct ktermios *old)``
+ *
+ * This routine allows the @tty driver to be notified when device's
+ * termios settings have changed. New settings are in @tty->termios.
+ * Previous settings are passed in the @old argument.
+ *
+ * The API is defined such that the driver should return the actual modes
+ * selected. This means that the driver is responsible for modifying any
+ * bits in @tty->termios it cannot fulfill to indicate the actual modes
+ * being used.
+ *
+ * Optional. Called under the @tty->termios_rwsem. May sleep.
+ *
+ * @ldisc_ok: ``int ()(struct tty_struct *tty, int ldisc)``
+ *
+ * This routine allows the @tty driver to decide if it can deal
+ * with a particular @ldisc.
+ *
+ * Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
+ * @set_ldisc: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine allows the @tty driver to be notified when the device's
+ * line discipline is being changed. At the point this is done the
+ * discipline is not yet usable.
+ *
+ * Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
+ * @throttle: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that input buffers for the line
+ * discipline are close to full, and it should somehow signal that no more
+ * characters should be sent to the @tty.
+ *
+ * Serialization including with @unthrottle() is the job of the ldisc
+ * layer.
+ *
+ * Optional: Always invoke via tty_throttle_safe(). Called under the
+ * @tty->termios_rwsem.
+ *
+ * @unthrottle: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it should signal that
+ * characters can now be sent to the @tty without fear of overrunning the
+ * input buffers of the line disciplines.
+ *
+ * Optional. Always invoke via tty_unthrottle(). Called under the
+ * @tty->termios_rwsem.
+ *
+ * @stop: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it should stop outputting
+ * characters to the tty device.
+ *
+ * Called with @tty->flow.lock held. Serialized with @start() method.
+ *
+ * Optional. Always invoke via stop_tty().
+ *
+ * @start: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it resumed sending
+ * characters to the @tty device.
+ *
+ * Called with @tty->flow.lock held. Serialized with stop() method.
+ *
+ * Optional. Always invoke via start_tty().
+ *
+ * @hangup: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine notifies the @tty driver that it should hang up the @tty
+ * device.
+ *
+ * Optional. Called with tty lock held.
+ *
+ * @break_ctl: ``int ()(struct tty_struct *tty, int state)``
+ *
+ * This optional routine requests the @tty driver to turn on or off BREAK
+ * status on the RS-232 port. If @state is -1, then the BREAK status
+ * should be turned on; if @state is 0, then BREAK should be turned off.
+ *
+ * If this routine is implemented, the high-level tty driver will handle
+ * the following ioctls: %TCSBRK, %TCSBRKP, %TIOCSBRK, %TIOCCBRK.
+ *
+ * If the driver sets %TTY_DRIVER_HARDWARE_BREAK in tty_alloc_driver(),
+ * then the interface will also be called with actual times and the
+ * hardware is expected to do the delay work itself. 0 and -1 are still
+ * used for on/off.
+ *
+ * Optional: Required for %TCSBRK/%BRKP/etc. handling. May sleep.
+ *
+ * @flush_buffer: ``void ()(struct tty_struct *tty)``
+ *
+ * This routine discards device private output buffer. Invoked on close,
+ * hangup, to implement %TCOFLUSH ioctl and similar.
+ *
+ * Optional: if not provided, it is assumed there is no queue on the
+ * device. Do not call this function directly, call
+ * tty_driver_flush_buffer().
+ *
+ * @wait_until_sent: ``void ()(struct tty_struct *tty, int timeout)``
+ *
+ * This routine waits until the device has written out all of the
+ * characters in its transmitter FIFO. Or until @timeout (in jiffies) is
+ * reached.
+ *
+ * Optional: If not provided, the device is assumed to have no FIFO.
+ * Usually correct to invoke via tty_wait_until_sent(). May sleep.
+ *
+ * @send_xchar: ``void ()(struct tty_struct *tty, char ch)``
+ *
+ * This routine is used to send a high-priority XON/XOFF character (@ch)
+ * to the @tty device.
+ *
+ * Optional: If not provided, then the @write method is called under
+ * the @tty->atomic_write_lock to keep it serialized with the ldisc.
+ *
+ * @tiocmget: ``int ()(struct tty_struct *tty)``
+ *
+ * This routine is used to obtain the modem status bits from the @tty
+ * driver.
+ *
+ * Optional: If not provided, then %ENOTTY is returned from the %TIOCMGET
+ * ioctl. Do not call this function directly, call tty_tiocmget().
+ *
+ * @tiocmset: ``int ()(struct tty_struct *tty,
+ * unsigned int set, unsigned int clear)``
+ *
+ * This routine is used to set the modem status bits to the @tty driver.
+ * First, @clear bits should be cleared, then @set bits set.
+ *
+ * Optional: If not provided, then %ENOTTY is returned from the %TIOCMSET
+ * ioctl. Do not call this function directly, call tty_tiocmset().
+ *
+ * @resize: ``int ()(struct tty_struct *tty, struct winsize *ws)``
+ *
+ * Called when a termios request is issued which changes the requested
+ * terminal geometry to @ws.
+ *
+ * Optional: the default action is to update the termios structure
+ * without error. This is usually the correct behaviour. Drivers should
+ * not force errors here if they are not resizable objects (e.g. a serial
+ * line). See tty_do_resize() if you need to wrap the standard method
+ * in your own logic -- the usual case.
+ *
+ * @get_icount: ``int ()(struct tty_struct *tty,
+ * struct serial_icounter *icount)``
+ *
+ * Called when the @tty device receives a %TIOCGICOUNT ioctl. Passed a
+ * kernel structure @icount to complete.
+ *
+ * Optional: called only if provided, otherwise %ENOTTY will be returned.
+ *
+ * @get_serial: ``int ()(struct tty_struct *tty, struct serial_struct *p)``
+ *
+ * Called when the @tty device receives a %TIOCGSERIAL ioctl. Passed a
+ * kernel structure @p (&struct serial_struct) to complete.
+ *
+ * Optional: called only if provided, otherwise %ENOTTY will be returned.
+ * Do not call this function directly, call tty_tiocgserial().
+ *
+ * @set_serial: ``int ()(struct tty_struct *tty, struct serial_struct *p)``
+ *
+ * Called when the @tty device receives a %TIOCSSERIAL ioctl. Passed a
+ * kernel structure @p (&struct serial_struct) to set the values from.
+ *
+ * Optional: called only if provided, otherwise %ENOTTY will be returned.
+ * Do not call this function directly, call tty_tiocsserial().
+ *
+ * @show_fdinfo: ``void ()(struct tty_struct *tty, struct seq_file *m)``
+ *
+ * Called when the @tty device file descriptor receives a fdinfo request
+ * from VFS (to show in /proc/<pid>/fdinfo/). @m should be filled with
+ * information.
+ *
+ * Optional: called only if provided, otherwise nothing is written to @m.
+ * Do not call this function directly, call tty_show_fdinfo().
+ *
+ * @poll_init: ``int ()(struct tty_driver *driver, int line, char *options)``
+ *
+ * kgdboc support (Documentation/dev-tools/kgdb.rst). This routine is
+ * called to initialize the HW for later use by calling @poll_get_char or
+ * @poll_put_char.
+ *
+ * Optional: called only if provided, otherwise skipped as a non-polling
+ * driver.
+ *
+ * @poll_get_char: ``int ()(struct tty_driver *driver, int line)``
+ *
+ * kgdboc support (see @poll_init). @driver should read a character from a
+ * tty identified by @line and return it.
+ *
+ * Optional: called only if @poll_init provided.
+ *
+ * @poll_put_char: ``void ()(struct tty_driver *driver, int line, char ch)``
+ *
+ * kgdboc support (see @poll_init). @driver should write character @ch to
+ * a tty identified by @line.
+ *
+ * Optional: called only if @poll_init provided.
+ *
+ * @proc_show: ``int ()(struct seq_file *m, void *driver)``
+ *
+ * Driver @driver (cast to &struct tty_driver) can show additional info in
+ * /proc/tty/driver/<driver_name>. It is enough to fill in the information
+ * into @m.
+ *
+ * Optional: called only if provided, otherwise no /proc entry created.
+ *
+ * This structure defines the interface between the low-level tty driver and
+ * the tty routines. These routines can be defined. Unless noted otherwise,
+ * they are optional, and can be filled in with a %NULL pointer.
+ */
struct tty_operations {
struct tty_struct * (*lookup)(struct tty_driver *driver,
struct file *filp, int idx);
@@ -270,6 +608,7 @@ struct tty_operations {
void (*hangup)(struct tty_struct *tty);
int (*break_ctl)(struct tty_struct *tty, int state);
void (*flush_buffer)(struct tty_struct *tty);
+ int (*ldisc_ok)(struct tty_struct *tty, int ldisc);
void (*set_ldisc)(struct tty_struct *tty);
void (*wait_until_sent)(struct tty_struct *tty, int timeout);
void (*send_xchar)(struct tty_struct *tty, char ch);
--
2.34.1
From: Dan Carpenter <dan.carpenter(a)linaro.org>
[ Upstream commit e56aac6e5a25630645607b6856d4b2a17b2311a5 ]
The "command" variable can be controlled by the user via debugfs. The
worry is that if con_index is zero then "&uc->ucsi->connector[con_index
- 1]" would be an array underflow.
Fixes: 170a6726d0e2 ("usb: typec: ucsi: add support for separate DP altmode devices")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Link: https://lore.kernel.org/r/c69ef0b3-61b0-4dde-98dd-97b97f81d912@stanley.moun…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[The function ucsi_ccg_sync_write() was renamed to ucsi_ccg_sync_control()
in commit 13f2ec3115c8 ("usb: typec: ucsi: simplify command sending API").
Apply this patch to ucsi_ccg_sync_write() in 6.1.y accordingly.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
drivers/usb/typec/ucsi/ucsi_ccg.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c b/drivers/usb/typec/ucsi/ucsi_ccg.c
index 8e500fe41e78..4801d783bd0c 100644
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -585,6 +585,10 @@ static int ucsi_ccg_sync_write(struct ucsi *ucsi, unsigned int offset,
uc->has_multiple_dp) {
con_index = (uc->last_cmd_sent >> 16) &
UCSI_CMD_CONNECTOR_MASK;
+ if (con_index == 0) {
+ ret = -EINVAL;
+ goto unlock;
+ }
con = &uc->ucsi->connector[con_index - 1];
ucsi_ccg_update_set_new_cam_cmd(uc, con, (u64 *)val);
}
@@ -600,6 +604,7 @@ static int ucsi_ccg_sync_write(struct ucsi *ucsi, unsigned int offset,
err_clear_bit:
clear_bit(DEV_CMD_PENDING, &uc->flags);
pm_runtime_put_sync(uc->dev);
+unlock:
mutex_unlock(&uc->lock);
return ret;
--
2.34.1
From: Dan Carpenter <dan.carpenter(a)linaro.org>
[ Upstream commit e56aac6e5a25630645607b6856d4b2a17b2311a5 ]
The "command" variable can be controlled by the user via debugfs. The
worry is that if con_index is zero then "&uc->ucsi->connector[con_index
- 1]" would be an array underflow.
Fixes: 170a6726d0e2 ("usb: typec: ucsi: add support for separate DP altmode devices")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Link: https://lore.kernel.org/r/c69ef0b3-61b0-4dde-98dd-97b97f81d912@stanley.moun…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[The function ucsi_ccg_sync_write() was renamed to ucsi_ccg_sync_control()
in commit 13f2ec3115c8 ("usb: typec: ucsi: simplify command sending API").
Apply this patch to ucsi_ccg_sync_write() in 5.10.y accordingly.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
drivers/usb/typec/ucsi/ucsi_ccg.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c b/drivers/usb/typec/ucsi/ucsi_ccg.c
index fb6211efb5d8..3983bf21a580 100644
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -573,6 +573,10 @@ static int ucsi_ccg_sync_write(struct ucsi *ucsi, unsigned int offset,
uc->has_multiple_dp) {
con_index = (uc->last_cmd_sent >> 16) &
UCSI_CMD_CONNECTOR_MASK;
+ if (con_index == 0) {
+ ret = -EINVAL;
+ goto unlock;
+ }
con = &uc->ucsi->connector[con_index - 1];
ucsi_ccg_update_set_new_cam_cmd(uc, con, (u64 *)val);
}
@@ -588,6 +592,7 @@ static int ucsi_ccg_sync_write(struct ucsi *ucsi, unsigned int offset,
err_clear_bit:
clear_bit(DEV_CMD_PENDING, &uc->flags);
pm_runtime_put_sync(uc->dev);
+unlock:
mutex_unlock(&uc->lock);
return ret;
--
2.34.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c23c03bf1faa1e76be1eba35bad6da6a2a7c95ee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050930-scuba-spending-0eb9@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c23c03bf1faa1e76be1eba35bad6da6a2a7c95ee Mon Sep 17 00:00:00 2001
From: Cristian Marussi <cristian.marussi(a)arm.com>
Date: Mon, 10 Mar 2025 17:58:00 +0000
Subject: [PATCH] firmware: arm_scmi: Fix timeout checks on polling path
Polling mode transactions wait for a reply busy-looping without holding a
spinlock, but currently the timeout checks are based only on elapsed time:
as a result we could hit a false positive whenever our busy-looping thread
is pre-empted and scheduled out for a time greater than the polling
timeout.
Change the checks at the end of the busy-loop to make sure that the polling
wasn't indeed successful or an out-of-order reply caused the polling to be
forcibly terminated.
Fixes: 31d2f803c19c ("firmware: arm_scmi: Add sync_cmds_completed_on_ret transport flag")
Reported-by: Huangjie <huangjie1663(a)phytium.com.cn>
Closes: https://lore.kernel.org/arm-scmi/20250123083323.2363749-1-jackhuang021@gmai…
Signed-off-by: Cristian Marussi <cristian.marussi(a)arm.com>
Cc: stable(a)vger.kernel.org # 5.18.x
Message-Id: <20250310175800.1444293-1-cristian.marussi(a)arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla(a)arm.com>
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 1c75a4c9c371..0390d5ff195e 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -1248,7 +1248,8 @@ static void xfer_put(const struct scmi_protocol_handle *ph,
}
static bool scmi_xfer_done_no_timeout(struct scmi_chan_info *cinfo,
- struct scmi_xfer *xfer, ktime_t stop)
+ struct scmi_xfer *xfer, ktime_t stop,
+ bool *ooo)
{
struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
@@ -1257,7 +1258,7 @@ static bool scmi_xfer_done_no_timeout(struct scmi_chan_info *cinfo,
* in case of out-of-order receptions of delayed responses
*/
return info->desc->ops->poll_done(cinfo, xfer) ||
- try_wait_for_completion(&xfer->done) ||
+ (*ooo = try_wait_for_completion(&xfer->done)) ||
ktime_after(ktime_get(), stop);
}
@@ -1274,15 +1275,17 @@ static int scmi_wait_for_reply(struct device *dev, const struct scmi_desc *desc,
* itself to support synchronous commands replies.
*/
if (!desc->sync_cmds_completed_on_ret) {
+ bool ooo = false;
+
/*
* Poll on xfer using transport provided .poll_done();
* assumes no completion interrupt was available.
*/
ktime_t stop = ktime_add_ms(ktime_get(), timeout_ms);
- spin_until_cond(scmi_xfer_done_no_timeout(cinfo,
- xfer, stop));
- if (ktime_after(ktime_get(), stop)) {
+ spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer,
+ stop, &ooo));
+ if (!ooo && !info->desc->ops->poll_done(cinfo, xfer)) {
dev_err(dev,
"timed out in resp(caller: %pS) - polling\n",
(void *)_RET_IP_);
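As an aside, the fix above boils down to re-checking the wait condition itself
after the deadline instead of trusting elapsed time alone, since the polling
thread may have been preempted past the timeout while the reply arrived. A
generic kernel-style sketch of that pattern (hypothetical helper, not the SCMI
driver code; cond()/data are placeholders, needs <linux/ktime.h>):
  static bool wait_cond_with_recheck(bool (*cond)(void *data), void *data,
                                     ktime_t stop)
  {
          while (!cond(data) && ktime_before(ktime_get(), stop))
                  cpu_relax();
          /*
           * Preempted past the deadline? The condition may still have become
           * true, so report success based on the condition, not the clock.
           */
          return cond(data);
  }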
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x dc7eb8755797ed41a0d1b5c0c39df3c8f401b3d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024012617-overlap-reborn-e124@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
dc7eb8755797 ("arm64/sme: Always exit sme_alloc() early with existing storage")
5d0a8d2fba50 ("arm64/ptrace: Ensure that SME is set up for target when writing SSVE state")
f90b529bcbe5 ("arm64/sme: Implement ZT0 ptrace support")
ce514000da4f ("arm64/sme: Rename za_state to sme_state")
1192b93ba352 ("arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu()")
deeb8f9a80fd ("arm64/fpsimd: Have KVM explicitly say which FP registers to save")
baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE")
93ae6b01bafe ("KVM: arm64: Discard any SVE state when entering KVM guests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dc7eb8755797ed41a0d1b5c0c39df3c8f401b3d9 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Mon, 15 Jan 2024 20:15:46 +0000
Subject: [PATCH] arm64/sme: Always exit sme_alloc() early with existing
storage
When sme_alloc() is called with existing storage and we are not flushing we
will always allocate new storage, both leaking the existing storage and
corrupting the state. Fix this by separating the checks for flushing and
for existing storage as we do for SVE.
Callers that reallocate (eg, due to changing the vector length) should
call sme_free() themselves.
Fixes: 5d0a8d2fba50 ("arm64/ptrace: Ensure that SME is set up for target when writing SSVE state")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20240115-arm64-sme-flush-v1-1-7472bd3459b7@kernel…
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 0983be2b1b61..a5dc6f764195 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1217,8 +1217,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
*/
void sme_alloc(struct task_struct *task, bool flush)
{
- if (task->thread.sme_state && flush) {
- memset(task->thread.sme_state, 0, sme_state_size(task));
+ if (task->thread.sme_state) {
+ if (flush)
+ memset(task->thread.sme_state, 0,
+ sme_state_size(task));
return;
}