When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>", the physical range that contains the carried
over IMA measurement list may fall outside the truncated RAM leading to
a kernel panic.
BUG: unable to handle page fault for address: ffff97793ff47000
RIP: ima_restore_measurement_list+0xdc/0x45a
#PF: error_code(0x0000) – not-present page
Other architectures already validate the range with page_is_ram(), as
done in commit: cbf9c4b9617b ("of: check previous kernel's
ima-kexec-buffer against memory bounds") do a similar check on x86.
Cc: stable(a)vger.kernel.org
Fixes: b69a2afd5afc ("x86/kexec: Carry forward IMA measurement log on kexec")
Reported-by: Paul Webb <paul.x.webb(a)oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
---
Have tested the kexec for x86 kernel with IMA_KEXEC enabled and the
above patch works good. Paul initially reported this on 6.12 kernel but
I was able to reproduce this on 6.18, so I tried replicating how this
was fixed in drivers/of/kexec.c
---
arch/x86/kernel/setup.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1b2edd07a3e1..fcef197d180e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -439,9 +439,23 @@ int __init ima_free_kexec_buffer(void)
int __init ima_get_kexec_buffer(void **addr, size_t *size)
{
+ unsigned long start_pfn, end_pfn;
+
if (!ima_kexec_buffer_size)
return -ENOENT;
+ /*
+ * Calculate the PFNs for the buffer and ensure
+ * they are with in addressable memory.
+ */
+ start_pfn = PFN_DOWN(ima_kexec_buffer_phys);
+ end_pfn = PFN_DOWN(ima_kexec_buffer_phys + ima_kexec_buffer_size - 1);
+ if (!pfn_range_is_mapped(start_pfn, end_pfn)) {
+ pr_warn("IMA buffer at 0x%llx, size = 0x%zx beyond memory\n",
+ ima_kexec_buffer_phys, ima_kexec_buffer_size);
+ return -EINVAL;
+ }
+
*addr = __va(ima_kexec_buffer_phys);
*size = ima_kexec_buffer_size;
--
2.50.1
The local variable 'sensitivity' was never clamped to 0 or
POWERSAVE_BIAS_MAX because the return value of clamp() was not used. Fix
this by assigning the clamped value back to 'sensitivity'.
Cc: stable(a)vger.kernel.org
Fixes: 9c5320c8ea8b ("cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/cpufreq/amd_freq_sensitivity.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/amd_freq_sensitivity.c b/drivers/cpufreq/amd_freq_sensitivity.c
index 13fed4b9e02b..713ccf24c97d 100644
--- a/drivers/cpufreq/amd_freq_sensitivity.c
+++ b/drivers/cpufreq/amd_freq_sensitivity.c
@@ -76,7 +76,7 @@ static unsigned int amd_powersave_bias_target(struct cpufreq_policy *policy,
sensitivity = POWERSAVE_BIAS_MAX -
(POWERSAVE_BIAS_MAX * (d_reference - d_actual) / d_reference);
- clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
+ sensitivity = clamp(sensitivity, 0, POWERSAVE_BIAS_MAX);
/* this workload is not CPU bound, so choose a lower freq */
if (sensitivity < od_tuners->powersave_bias) {
--
Thorsten Blum <thorsten.blum(a)linux.dev>
GPG: 1D60 735E 8AEF 3BE4 73B6 9D84 7336 78FD 8DFE EAD4
The local variable 'curr_energy' was never clamped to
PAC_193X_MIN_POWER_ACC or PAC_193X_MAX_POWER_ACC because the return
value of clamp() was not used. Fix this by assigning the clamped value
back to 'curr_energy'.
Cc: stable(a)vger.kernel.org
Fixes: 0fb528c8255b ("iio: adc: adding support for PAC193x")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/iio/adc/pac1934.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/adc/pac1934.c b/drivers/iio/adc/pac1934.c
index 48df16509260..256488d3936b 100644
--- a/drivers/iio/adc/pac1934.c
+++ b/drivers/iio/adc/pac1934.c
@@ -665,7 +665,8 @@ static int pac1934_reg_snapshot(struct pac1934_chip_info *info,
/* add the power_acc field */
curr_energy += inc;
- clamp(curr_energy, PAC_193X_MIN_POWER_ACC, PAC_193X_MAX_POWER_ACC);
+ curr_energy = clamp(curr_energy, PAC_193X_MIN_POWER_ACC,
+ PAC_193X_MAX_POWER_ACC);
reg_data->energy_sec_acc[cnt] = curr_energy;
}
--
Thorsten Blum <thorsten.blum(a)linux.dev>
GPG: 1D60 735E 8AEF 3BE4 73B6 9D84 7336 78FD 8DFE EAD4
The local variable 'val' was never clamped to -75000 or 180000 because
the return value of clamp_val() was not used. Fix this by assigning the
clamped value back to 'val', and use clamp() instead of clamp_val().
Cc: stable(a)vger.kernel.org
Fixes: a557a92e6881 ("net: phy: marvell-88q2xxx: add support for temperature sensor")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/net/phy/marvell-88q2xxx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/marvell-88q2xxx.c b/drivers/net/phy/marvell-88q2xxx.c
index f3d83b04c953..201dee1a1698 100644
--- a/drivers/net/phy/marvell-88q2xxx.c
+++ b/drivers/net/phy/marvell-88q2xxx.c
@@ -698,7 +698,7 @@ static int mv88q2xxx_hwmon_write(struct device *dev,
switch (attr) {
case hwmon_temp_max:
- clamp_val(val, -75000, 180000);
+ val = clamp(val, -75000, 180000);
val = (val / 1000) + 75;
val = FIELD_PREP(MDIO_MMD_PCS_MV_TEMP_SENSOR3_INT_THRESH_MASK,
val);
--
Thorsten Blum <thorsten.blum(a)linux.dev>
GPG: 1D60 735E 8AEF 3BE4 73B6 9D84 7336 78FD 8DFE EAD4
Hi,
This Linux kernel patch series introduces support for error recovery for
passthrough PCI devices on System Z (s390x).
Background
----------
For PCI devices on s390x an operating system receives platform specific
error events from firmware rather than through AER.Today for
passthrough/userspace devices, we don't attempt any error recovery and
ignore any error events for the devices. The passthrough/userspace devices
are managed by the vfio-pci driver. The driver does register error handling
callbacks (error_detected), and on an error trigger an eventfd to
userspace. But we need a mechanism to notify userspace
(QEMU/guest/userspace drivers) about the error event.
Proposal
--------
We can expose this error information (currently only the PCI Error Code)
via a device feature. Userspace can then obtain the error information
via VFIO_DEVICE_FEATURE ioctl and take appropriate actions such as driving
a device reset.
This is how a typical flow for passthrough devices to a VM would work:
For passthrough devices to a VM, the driver bound to the device on the host
is vfio-pci. vfio-pci driver does support the error_detected() callback
(vfio_pci_core_aer_err_detected()), and on an PCI error s390x recovery
code on the host will call the vfio-pci error_detected() callback. The
vfio-pci error_detected() callback will notify userspace/QEMU via an
eventfd, and return PCI_ERS_RESULT_CAN_RECOVER. At this point the s390x
error recovery on the host will skip any further action(see patch 6) and
let userspace drive the error recovery.
Once userspace/QEMU is notified, it then injects this error into the VM
so device drivers in the VM can take recovery actions. For example for a
passthrough NVMe device, the VM's OS NVMe driver will access the device.
At this point the VM's NVMe driver's error_detected() will drive the
recovery by returning PCI_ERS_RESULT_NEED_RESET, and the s390x error
recovery in the VM's OS will try to do a reset. Resets are privileged
operations and so the VM will need intervention from QEMU to perform the
reset. QEMU will invoke the VFIO_DEVICE_RESET ioctl to now notify the
host that the VM is requesting a reset of the device. The vfio-pci driver
on the host will then perform the reset on the device to recover it.
Thanks
Farhan
ChangeLog
---------
v5 series https://lore.kernel.org/all/20251113183502.2388-1-alifm@linux.ibm.com/
v5 -> v6
- Rebase on 6.18 + Lukas's PCI: Universal error recoverability of
devices series (https://lore.kernel.org/all/cover.1763483367.git.lukas@wunner.de/)
- Re-work config space accessibility check to pci_dev_save_and_disable() (patch 3).
This avoids saving the config space, in the reset path, if the device's config space is
corrupted or inaccessible.
v4 series https://lore.kernel.org/all/20250924171628.826-1-alifm@linux.ibm.com/
v4 -> v5
- Rebase on 6.18-rc5
- Move bug fixes to the beginning of the series (patch 1 and 2). These patches
were posted as a separate fixes series
https://lore.kernel.org/all/a14936ac-47d6-461b-816f-0fd66f869b0f@linux.ibm.…
- Add matching pci_put_dev() for pci_get_slot() (patch 6).
v3 series https://lore.kernel.org/all/20250911183307.1910-1-alifm@linux.ibm.com/
v3 -> v4
- Remove warn messages for each PCI capability not restored (patch 1)
- Check PCI_COMMAND and PCI_STATUS register for error value instead of device id
(patch 1)
- Fix kernel crash in patch 3
- Added reviewed by tags
- Address comments from Niklas's (patches 4, 5, 7)
- Fix compilation error non s390x system (patch 8)
- Explicitly align struct vfio_device_feature_zpci_err (patch 8)
v2 series https://lore.kernel.org/all/20250825171226.1602-1-alifm@linux.ibm.com/
v2 -> v3
- Patch 1 avoids saving any config space state if the device is in error
(suggested by Alex)
- Patch 2 adds additional check only for FLR reset to try other function
reset method (suggested by Alex).
- Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
functions. Creates a new flag pci_slot to allow per function slot.
- Patch 4 fixes a bug in s390 for resource to bus address translation.
- Rebase on 6.17-rc5
v1 series https://lore.kernel.org/all/20250813170821.1115-1-alifm@linux.ibm.com/
v1 - > v2
- Patches 1 and 2 adds some additional checks for FLR/PM reset to
try other function reset method (suggested by Alex).
- Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
functions.
- Patch 7 adds a new device feature for zPCI devices for the VFIO_DEVICE_FEATURE
ioctl. The ioctl is used by userspace to retriece any PCI error
information for the device (suggested by Alex).
- Patch 8 adds a reset_done() callback for the vfio-pci driver, to
restore the state of the device after a reset.
- Patch 9 removes the pcie check for triggering VFIO_PCI_ERR_IRQ_INDEX.
Farhan Ali (9):
PCI: Allow per function PCI slots
s390/pci: Add architecture specific resource/bus address translation
PCI: Avoid saving config space state if inaccessible
PCI: Add additional checks for flr reset
s390/pci: Update the logic for detecting passthrough device
s390/pci: Store PCI error information for passthrough devices
vfio-pci/zdev: Add a device feature for error information
vfio: Add a reset_done callback for vfio-pci driver
vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
arch/s390/include/asm/pci.h | 29 ++++++++
arch/s390/pci/pci.c | 75 +++++++++++++++++++++
arch/s390/pci/pci_event.c | 107 +++++++++++++++++-------------
drivers/pci/host-bridge.c | 4 +-
drivers/pci/pci.c | 19 +++++-
drivers/pci/slot.c | 25 ++++++-
drivers/vfio/pci/vfio_pci_core.c | 20 ++++--
drivers/vfio/pci/vfio_pci_intrs.c | 3 +-
drivers/vfio/pci/vfio_pci_priv.h | 9 +++
drivers/vfio/pci/vfio_pci_zdev.c | 45 ++++++++++++-
include/linux/pci.h | 1 +
include/uapi/linux/vfio.h | 15 +++++
12 files changed, 291 insertions(+), 61 deletions(-)
--
2.43.0
The macro FAN_FROM_REG evaluates its arguments multiple times. When used
in lockless contexts involving shared driver data, this leads to
Time-of-Check to Time-of-Use (TOCTOU) race conditions, potentially
causing divide-by-zero errors.
Convert the macro to a static function. This guarantees that arguments
are evaluated only once (pass-by-value), preventing the race
conditions.
Additionally, in store_fan_div, move the calculation of the minimum
limit inside the update lock. This ensures that the read-modify-write
sequence operates on consistent data.
Adhere to the principle of minimal changes by only converting macros
that evaluate arguments multiple times and are used in lockless
contexts.
Link: https://lore.kernel.org/all/CALbr=LYJ_ehtp53HXEVkSpYoub+XYSTU8Rg=o1xxMJ8=5z…
Fixes: 9873964d6eb2 ("[PATCH] HWMON: w83791d: New hardware monitoring driver for the Winbond W83791D")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
Based on the discussion in the link, I will submit a series of patches to
address TOCTOU issues in the hwmon subsystem by converting macros to
functions or adjusting locking where appropriate.
---
drivers/hwmon/w83791d.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/hwmon/w83791d.c b/drivers/hwmon/w83791d.c
index ace854b370a0..996e36951f9d 100644
--- a/drivers/hwmon/w83791d.c
+++ b/drivers/hwmon/w83791d.c
@@ -218,9 +218,14 @@ static u8 fan_to_reg(long rpm, int div)
return clamp_val((1350000 + rpm * div / 2) / (rpm * div), 1, 254);
}
-#define FAN_FROM_REG(val, div) ((val) == 0 ? -1 : \
- ((val) == 255 ? 0 : \
- 1350000 / ((val) * (div))))
+static int fan_from_reg(int val, int div)
+{
+ if (val == 0)
+ return -1;
+ if (val == 255)
+ return 0;
+ return 1350000 / (val * div);
+}
/* for temp1 which is 8-bit resolution, LSB = 1 degree Celsius */
#define TEMP1_FROM_REG(val) ((val) * 1000)
@@ -521,7 +526,7 @@ static ssize_t show_##reg(struct device *dev, struct device_attribute *attr, \
struct w83791d_data *data = w83791d_update_device(dev); \
int nr = sensor_attr->index; \
return sprintf(buf, "%d\n", \
- FAN_FROM_REG(data->reg[nr], DIV_FROM_REG(data->fan_div[nr]))); \
+ fan_from_reg(data->reg[nr], DIV_FROM_REG(data->fan_div[nr]))); \
}
show_fan_reg(fan);
@@ -585,10 +590,10 @@ static ssize_t store_fan_div(struct device *dev, struct device_attribute *attr,
if (err)
return err;
+ mutex_lock(&data->update_lock);
/* Save fan_min */
- min = FAN_FROM_REG(data->fan_min[nr], DIV_FROM_REG(data->fan_div[nr]));
+ min = fan_from_reg(data->fan_min[nr], DIV_FROM_REG(data->fan_div[nr]));
- mutex_lock(&data->update_lock);
data->fan_div[nr] = div_to_reg(nr, val);
switch (nr) {
--
2.43.0
The driver_override_show() function reads the driver_override string
without holding the device_lock. However, driver_override_store() uses
driver_set_override(), which modifies and frees the string while holding
the device_lock.
This can result in a concurrent use-after-free if the string is freed
by the store function while being read by the show function.
Fix this by holding the device_lock around the read operation.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
I verified this with a stress test that continuously writes/reads the
attribute. It triggered KASAN and leaked bytes like a0 f4 81 9f a3 ff ff
(likely kernel pointers). Since driver_override is world-readable (0644),
this allows unprivileged users to leak kernel pointers and bypass KASLR.
Similar races were fixed in other buses (e.g., commits 9561475db680 and
91d44c1afc61). Currently, 9 of 11 buses handle this correctly; this patch
fixes one of the remaining two.
---
drivers/bus/fsl-mc/fsl-mc-bus.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index 25845c04e562..a97baf2cbcdd 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -202,8 +202,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", mc_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", mc_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
--
2.43.0
Hi,
Ahead of LDI Show 2025 (Dec 3–9, Las Vegas), we have access to a verified audience of 16,594 and a strong exhibitor roster of 312 top suppliers and vendors.
This includes lighting & audio equipment manufacturers, staging/rigging vendors, media-server / projection / video providers, production houses, rental companies, sound & lighting engineers, stage managers , other live-event technology stakeholders and many.
If interested, reply “Send Pricing” to receive the details.
Best regards,
Megan Castillio
Head of Client Relations
To opt-out, reply “Not Interested”.