Hi Trond and Greg:
LTS 4.19 reported null-ptr-deref BUG as follows:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
Call Trace:
nfs_inode_add_request+0x1cc/0x5b8
nfs_setup_write_request+0x1fa/0x1fc
nfs_writepage_setup+0x2d/0x7d
nfs_updatepage+0x8b8/0x936
nfs_write_end+0x61d/0xd45
generic_perform_write+0x19a/0x3f0
nfs_file_write+0x2cc/0x6e5
new_sync_write+0x442/0x560
__vfs_write+0xda/0xef
vfs_write+0x176/0x48b
ksys_write+0x10a/0x1e9
__se_sys_write+0x24/0x29
__x64_sys_write+0x79/0x93
do_syscall_64+0x16d/0x4bb
entry_SYSCALL_64_after_hwframe+0x5c/0xc1
The reason is: generic_error_remove_page set page->mapping to NULL when
nfs server have a fatal error:
nfs_updatepage
nfs_writepage_setup
nfs_setup_write_request
nfs_try_to_update_request // return NULL
nfs_wb_page // return 0
nfs_writepage_locked // return 0
nfs_do_writepage // return 0
nfs_page_async_flush // return 0
nfs_error_is_fatal_on_server
generic_error_remove_page
truncate_inode_page
delete_from_page_cache
__delete_from_page_cache
page_cache_tree_delete
page->mapping = NULL // this is point
nfs_create_request
req->wb_page = page // the page is freed
nfs_inode_add_request
mapping = page_file_mapping(req->wb_page)
return page->mapping
spin_lock(&mapping->private_lock) // mapping is NULL
It is reasonable by reverting the patch "89047634f5ce NFS: Don't
interrupt file writeout due to fatal errors" to fix this bug?
This patch is one patch of patchset [Fix up soft mounts for
NFSv4.x](https://lore.kernel.org/all/20190407175912.23528-1-trond.myklebust…,
the patchset replace custom error reporting mechanism. it seams that we
should merge all the patchset to LTS 4.19, or all patchs should not be
merged. And the "Fixes:" label is not correct, this patch is a
refactoring patch, not for fixing bugs.
519fabc7aaba ("psi: remove 500ms min window size limitation for
triggers") breaks unprivileged psi polling on cgroups.
Historically, we had a privilege check for polling in the open() of a
pressure file in /proc, but were erroneously missing it for the open()
of cgroup pressure files.
When unprivileged polling was introduced in d82caa273565 ("sched/psi:
Allow unprivileged polling of N*2s period"), it needed to filter
privileges depending on the exact polling parameters, and as such
moved the CAP_SYS_RESOURCE check from the proc open() callback to
psi_trigger_create(). Both the proc files as well as cgroup files go
through this during write(). This implicitly added the missing check
for privileges required for HT polling for cgroups.
When 519fabc7aaba ("psi: remove 500ms min window size limitation for
triggers") followed right after to remove further restrictions on the
RT polling window, it incorrectly assumed the cgroup privilege check
was still missing and added it to the cgroup open(), mirroring what we
used to do for proc files in the past.
As a result, unprivileged poll requests that would be supported now
get rejected when opening the cgroup pressure file for writing.
Remove the cgroup open() check. psi_trigger_create() handles it.
Fixes: 519fabc7aaba ("psi: remove 500ms min window size limitation for triggers")
Cc: stable(a)vger.kernel.org # 6.5+
Reported-by: Luca Boccassi <bluca(a)debian.org>
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
---
kernel/cgroup/cgroup.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f11488b18ceb..2069ee98da60 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3879,14 +3879,6 @@ static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
return psi_trigger_poll(&ctx->psi.trigger, of->file, pt);
}
-static int cgroup_pressure_open(struct kernfs_open_file *of)
-{
- if (of->file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE))
- return -EPERM;
-
- return 0;
-}
-
static void cgroup_pressure_release(struct kernfs_open_file *of)
{
struct cgroup_file_ctx *ctx = of->priv;
@@ -5287,7 +5279,6 @@ static struct cftype cgroup_psi_files[] = {
{
.name = "io.pressure",
.file_offset = offsetof(struct cgroup, psi_files[PSI_IO]),
- .open = cgroup_pressure_open,
.seq_show = cgroup_io_pressure_show,
.write = cgroup_io_pressure_write,
.poll = cgroup_pressure_poll,
@@ -5296,7 +5287,6 @@ static struct cftype cgroup_psi_files[] = {
{
.name = "memory.pressure",
.file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]),
- .open = cgroup_pressure_open,
.seq_show = cgroup_memory_pressure_show,
.write = cgroup_memory_pressure_write,
.poll = cgroup_pressure_poll,
@@ -5305,7 +5295,6 @@ static struct cftype cgroup_psi_files[] = {
{
.name = "cpu.pressure",
.file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]),
- .open = cgroup_pressure_open,
.seq_show = cgroup_cpu_pressure_show,
.write = cgroup_cpu_pressure_write,
.poll = cgroup_pressure_poll,
@@ -5315,7 +5304,6 @@ static struct cftype cgroup_psi_files[] = {
{
.name = "irq.pressure",
.file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]),
- .open = cgroup_pressure_open,
.seq_show = cgroup_irq_pressure_show,
.write = cgroup_irq_pressure_write,
.poll = cgroup_pressure_poll,
--
2.42.0
cros_ec_sensors_push_data() reads `indio_dev->active_scan_mask` and
calls iio_push_to_buffers_with_timestamp() without making sure the
`indio_dev` stays in buffer mode. There is a race if `indio_dev` exits
buffer mode right before cros_ec_sensors_push_data() accesses them.
An use-after-free on `indio_dev->active_scan_mask` was observed. The
call trace:
[...]
_find_next_bit
cros_ec_sensors_push_data
cros_ec_sensorhub_event
blocking_notifier_call_chain
cros_ec_irq_thread
It was caused by a race condition: one thread just freed
`active_scan_mask` at [1]; while another thread tried to access the
memory at [2].
Fix it by calling iio_device_claim_buffer_mode() to ensure the
`indio_dev` can't exit buffer mode during cros_ec_sensors_push_data().
[1]: https://elixir.bootlin.com/linux/v6.5/source/drivers/iio/industrialio-buffe…
[2]: https://elixir.bootlin.com/linux/v6.5/source/drivers/iio/common/cros_ec_sen…
Cc: stable(a)vger.kernel.org
Fixes: aa984f1ba4a4 ("iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO")
Signed-off-by: Tzung-Bi Shih <tzungbi(a)kernel.org>
---
Changes from v1(https://patchwork.kernel.org/project/linux-iio/patch/20230828094339.1248…:
- Use iio_device_{claim|release}_buffer_mode() instead of accessing `mlock`.
drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
index b72d39fc2434..6bfe5d6847e7 100644
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
+++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c
@@ -190,8 +190,11 @@ int cros_ec_sensors_push_data(struct iio_dev *indio_dev,
/*
* Ignore samples if the buffer is not set: it is needed if the ODR is
* set but the buffer is not enabled yet.
+ *
+ * Note: iio_device_claim_buffer_mode() returns -EBUSY if the buffer
+ * is not enabled.
*/
- if (!iio_buffer_enabled(indio_dev))
+ if (iio_device_claim_buffer_mode(indio_dev) < 0)
return 0;
out = (s16 *)st->samples;
@@ -210,6 +213,7 @@ int cros_ec_sensors_push_data(struct iio_dev *indio_dev,
iio_push_to_buffers_with_timestamp(indio_dev, st->samples,
timestamp + delta);
+ iio_device_release_buffer_mode(indio_dev);
return 0;
}
EXPORT_SYMBOL_GPL(cros_ec_sensors_push_data);
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Though we do check the event ring read pointer by "is_valid_ring_ptr"
to make sure it is in the buffer range, but there is another risk the
pointer may be not aligned. Since we are expecting event ring elements
are 128 bits(struct mhi_ring_element) aligned, an unaligned read pointer
could lead to multiple issues like DoS or ring buffer memory corruption.
So add a alignment check for event ring read pointer.
Fixes: ec32332df764 ("bus: mhi: core: Sanity check values from remote device before use")
cc: stable(a)vger.kernel.org
Signed-off-by: Krishna chaitanya chundru <quic_krichai(a)quicinc.com>
---
Changes in v2:
- Change the modulus operation to bit-wise & operation as suggested by Jeff.
- Link to v1: https://lore.kernel.org/r/20231023-alignment_check-v1-1-2ca5716d5c15@quicin…
---
drivers/bus/mhi/host/main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index 499590437e9b..e765c16a99d1 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -268,7 +268,8 @@ static void mhi_del_ring_element(struct mhi_controller *mhi_cntrl,
static bool is_valid_ring_ptr(struct mhi_ring *ring, dma_addr_t addr)
{
- return addr >= ring->iommu_base && addr < ring->iommu_base + ring->len;
+ return addr >= ring->iommu_base && addr < ring->iommu_base + ring->len &&
+ !(addr & (sizeof(struct mhi_ring_element) - 1));
}
int mhi_destroy_device(struct device *dev, void *data)
---
base-commit: 71e68e182e382e951d6248bccc3c960dcec5a718
change-id: 20231013-alignment_check-c013f509d24a
Best regards,
--
Krishna chaitanya chundru <quic_krichai(a)quicinc.com>
The 'status' attribute for AP queue devices bound to the vfio_ap device
driver displays incorrect status when the mediated device is attached to a
guest, but the queue device is not passed through. In the current
implementation, the status displayed is 'in_use' which is not correct; it
should be 'assigned'. This can happen if one of the queue devices
associated with a given adapter is not bound to the vfio_ap device driver.
For example:
Queues listed in /sys/bus/ap/drivers/vfio_ap:
14.0005
14.0006
14.000d
16.0006
16.000d
Queues listed in /sys/devices/vfio_ap/matrix/$UUID/matrix
14.0005
14.0006
14.000d
16.0005
16.0006
16.000d
Queues listed in /sys/devices/vfio_ap/matrix/$UUID/guest_matrix
14.0005
14.0006
14.000d
The reason no queues for adapter 0x16 are listed in the guest_matrix is
because queue 16.0005 is not bound to the vfio_ap device driver, so no
queue associated with the adapter is passed through to the guest;
therefore, each queue device for adapter 0x16 should display 'assigned'
instead of 'in_use', because those queues are not in use by a guest, but
only assigned to the mediated device.
Let's check the AP configuration for the guest to determine whether a
queue device is passed through before displaying a status of 'in_use'.
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Fixes: f139862b92cf ("s390/vfio-ap: add status attribute to AP queue device's sysfs dir")
Cc: stable(a)vger.kernel.org
---
drivers/s390/crypto/vfio_ap_ops.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 4db538a55192..871c14a6921f 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1976,6 +1976,7 @@ static ssize_t status_show(struct device *dev,
{
ssize_t nchars = 0;
struct vfio_ap_queue *q;
+ unsigned long apid, apqi;
struct ap_matrix_mdev *matrix_mdev;
struct ap_device *apdev = to_ap_dev(dev);
@@ -1984,7 +1985,11 @@ static ssize_t status_show(struct device *dev,
matrix_mdev = vfio_ap_mdev_for_queue(q);
if (matrix_mdev) {
- if (matrix_mdev->kvm)
+ apid = AP_QID_CARD(q->apqn);
+ apqi = AP_QID_QUEUE(q->apqn);
+ if (matrix_mdev->kvm &&
+ test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) &&
+ test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
nchars = scnprintf(buf, PAGE_SIZE, "%s\n",
AP_QUEUE_IN_USE);
else
--
2.41.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b65235f6e102354ccafda601eaa1c5bef5284d21
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102017-human-marine-7125@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b65235f6e102354ccafda601eaa1c5bef5284d21 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 28 Sep 2023 20:33:51 +0300
Subject: [PATCH] x86: KVM: SVM: always update the x2avic msr interception
The following problem exists since x2avic was enabled in the KVM:
svm_set_x2apic_msr_interception is called to enable the interception of
the x2apic msrs.
In particular it is called at the moment the guest resets its apic.
Assuming that the guest's apic was in x2apic mode, the reset will bring
it back to the xapic mode.
The svm_set_x2apic_msr_interception however has an erroneous check for
'!apic_x2apic_mode()' which prevents it from doing anything in this case.
As a result of this, all x2apic msrs are left unintercepted, and that
exposes the bare metal x2apic (if enabled) to the guest.
Oops.
Remove the erroneous '!apic_x2apic_mode()' check to fix that.
This fixes CVE-2023-5090
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230928173354.217464-2-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9507df93f410..acdd0b89e471 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -913,8 +913,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;
- if (!x2avic_enabled ||
- !apic_x2apic_mode(svm->vcpu.arch.apic))
+ if (!x2avic_enabled)
return;
for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {