From: Octavian Purdila <tavip(a)google.com>
commit 3fff5da4ca2164bb4d0f1e6cd33f6eb8a0e73e50 upstream.
Prevent adding a device which is already a team device lower,
e.g. adding veth0 if vlan1 was already added and veth0 is a lower of
vlan1.
This is not useful in practice and can lead to recursive locking:
$ ip link add veth0 type veth peer name veth1
$ ip link set veth0 up
$ ip link set veth1 up
$ ip link add link veth0 name veth0.1 type vlan protocol 802.1Q id 1
$ ip link add team0 type team
$ ip link set veth0.1 down
$ ip link set veth0.1 master team0
team0: Port device veth0.1 added
$ ip link set veth0 down
$ ip link set veth0 master team0
============================================
WARNING: possible recursive locking detected
6.13.0-rc2-virtme-00441-ga14a429069bb #46 Not tainted
--------------------------------------------
ip/7684 is trying to acquire lock:
ffff888016848e00 (team->team_lock_key){+.+.}-{4:4}, at: team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
but task is already holding lock:
ffff888016848e00 (team->team_lock_key){+.+.}-{4:4}, at: team_add_slave (drivers/net/team/team_core.c:1147 drivers/net/team/team_core.c:1977)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(team->team_lock_key);
lock(team->team_lock_key);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by ip/7684:
stack backtrace:
CPU: 3 UID: 0 PID: 7684 Comm: ip Not tainted 6.13.0-rc2-virtme-00441-ga14a429069bb #46
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:122)
print_deadlock_bug.cold (kernel/locking/lockdep.c:3040)
__lock_acquire (kernel/locking/lockdep.c:3893 kernel/locking/lockdep.c:5226)
? netlink_broadcast_filtered (net/netlink/af_netlink.c:1548)
lock_acquire.part.0 (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? trace_lock_acquire (./include/trace/events/lock.h:24 (discriminator 2))
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? lock_acquire (kernel/locking/lockdep.c:5822)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
__mutex_lock (kernel/locking/mutex.c:587 kernel/locking/mutex.c:735)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? fib_sync_up (net/ipv4/fib_semantics.c:2167)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
notifier_call_chain (kernel/notifier.c:85)
call_netdevice_notifiers_info (net/core/dev.c:1996)
__dev_notify_flags (net/core/dev.c:8993)
? __dev_change_flags (net/core/dev.c:8975)
dev_change_flags (net/core/dev.c:9027)
vlan_device_event (net/8021q/vlan.c:85 net/8021q/vlan.c:470)
? br_device_event (net/bridge/br.c:143)
notifier_call_chain (kernel/notifier.c:85)
call_netdevice_notifiers_info (net/core/dev.c:1996)
dev_open (net/core/dev.c:1519 net/core/dev.c:1505)
team_add_slave (drivers/net/team/team_core.c:1219 drivers/net/team/team_core.c:1977)
? __pfx_team_add_slave (drivers/net/team/team_core.c:1972)
do_set_master (net/core/rtnetlink.c:2917)
do_setlink.isra.0 (net/core/rtnetlink.c:3117)
Reported-by: syzbot+3c47b5843403a45aef57(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3c47b5843403a45aef57
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Octavian Purdila <tavip(a)google.com>
Reviewed-by: Hangbin Liu <liuhangbin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
[Alexey: fixed path from team_core.c to team.c to resolve merge conflict]
Signed-off-by: Alexey Panov <apanov(a)astralinux.ru>
---
v2: fixed Cc
drivers/net/team/team.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 5e5af71a85ac..015151cd2222 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1166,6 +1166,13 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
return -EBUSY;
}
+ if (netdev_has_upper_dev(port_dev, dev)) {
+ NL_SET_ERR_MSG(extack, "Device is already a lower device of the team interface");
+ netdev_err(dev, "Device %s is already a lower device of the team interface\n",
+ portname);
+ return -EBUSY;
+ }
+
if (port_dev->features & NETIF_F_VLAN_CHALLENGED &&
vlan_uses_dev(dev)) {
NL_SET_ERR_MSG(extack, "Device is VLAN challenged and team device has VLAN set up");
--
2.30.2
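The new check can be illustrated with a small userspace model. This is only a sketch, not the kernel implementation: `toy_dev`, `toy_link()`, `toy_has_upper()` and `toy_team_port_add()` are hypothetical names, and `toy_has_upper()` is assumed to behave like netdev_has_upper_dev() in that it follows upper links through intermediate devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UPPERS 4

/* Toy stand-in for struct net_device: each device lists its direct uppers. */
struct toy_dev {
	const char *name;
	struct toy_dev *uppers[MAX_UPPERS];
	size_t n_uppers;
};

/* Record that 'upper' is stacked directly on top of 'lower'. */
static void toy_link(struct toy_dev *lower, struct toy_dev *upper)
{
	assert(lower->n_uppers < MAX_UPPERS);
	lower->uppers[lower->n_uppers++] = upper;
}

/* True if 'upper' is reachable from 'dev' by following upper links. */
static bool toy_has_upper(const struct toy_dev *dev, const struct toy_dev *upper)
{
	for (size_t i = 0; i < dev->n_uppers; i++) {
		if (dev->uppers[i] == upper ||
		    toy_has_upper(dev->uppers[i], upper))
			return true;
	}
	return false;
}

/* Returns 0 on success, -1 if 'port' is already below 'team' (the -EBUSY case). */
static int toy_team_port_add(struct toy_dev *team, struct toy_dev *port)
{
	if (toy_has_upper(port, team))
		return -1;
	toy_link(port, team);
	return 0;
}
```

With veth0 linked below veth0.1, adding veth0.1 to team0 succeeds, and a later attempt to add veth0 is refused because team0 is already reachable from veth0 through veth0.1.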
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 221cd51efe4565501a3dbf04cc011b537dcce7fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021036-footwork-entryway-f39c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 221cd51efe4565501a3dbf04cc011b537dcce7fb Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Tue, 3 Dec 2024 21:20:10 +0000
Subject: [PATCH] media: uvcvideo: Remove dangling pointers
When an async control is written, we copy a pointer to the file handle
that started the operation. That pointer will be used when the device is
done, which could be at any time in the future.
If the user closes that file descriptor, its structure will be freed,
and there will be one dangling pointer per pending async control, that
the driver will try to use.
Clean all the dangling pointers during release().
To avoid adding a performance penalty in the most common case (no async
operation), a counter has been introduced with some logic to make sure
that it is properly handled.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-3-26c867231118@chromium…
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index b05b84887e51..4837d8df9c03 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1579,6 +1579,40 @@ static void uvc_ctrl_send_slave_event(struct uvc_video_chain *chain,
uvc_ctrl_send_event(chain, handle, ctrl, mapping, val, changes);
}
+static void uvc_ctrl_set_handle(struct uvc_fh *handle, struct uvc_control *ctrl,
+ struct uvc_fh *new_handle)
+{
+ lockdep_assert_held(&handle->chain->ctrl_mutex);
+
+ if (new_handle) {
+ if (ctrl->handle)
+ dev_warn_ratelimited(&handle->stream->dev->udev->dev,
+ "UVC non compliance: Setting an async control with a pending operation.");
+
+ if (new_handle == ctrl->handle)
+ return;
+
+ if (ctrl->handle) {
+ WARN_ON(!ctrl->handle->pending_async_ctrls);
+ if (ctrl->handle->pending_async_ctrls)
+ ctrl->handle->pending_async_ctrls--;
+ }
+
+ ctrl->handle = new_handle;
+ handle->pending_async_ctrls++;
+ return;
+ }
+
+ /* Cannot clear the handle for a control not owned by us.*/
+ if (WARN_ON(ctrl->handle != handle))
+ return;
+
+ ctrl->handle = NULL;
+ if (WARN_ON(!handle->pending_async_ctrls))
+ return;
+ handle->pending_async_ctrls--;
+}
+
void uvc_ctrl_status_event(struct uvc_video_chain *chain,
struct uvc_control *ctrl, const u8 *data)
{
@@ -1589,7 +1623,8 @@ void uvc_ctrl_status_event(struct uvc_video_chain *chain,
mutex_lock(&chain->ctrl_mutex);
handle = ctrl->handle;
- ctrl->handle = NULL;
+ if (handle)
+ uvc_ctrl_set_handle(handle, ctrl, NULL);
list_for_each_entry(mapping, &ctrl->info.mappings, list) {
s32 value = __uvc_ctrl_get_value(mapping, data);
@@ -1863,7 +1898,7 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev,
if (!rollback && handle &&
ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
- ctrl->handle = handle;
+ uvc_ctrl_set_handle(handle, ctrl, handle);
}
return 0;
@@ -2772,6 +2807,26 @@ int uvc_ctrl_init_device(struct uvc_device *dev)
return 0;
}
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle)
+{
+ struct uvc_entity *entity;
+
+ guard(mutex)(&handle->chain->ctrl_mutex);
+
+ if (!handle->pending_async_ctrls)
+ return;
+
+ list_for_each_entry(entity, &handle->chain->dev->entities, list) {
+ for (unsigned int i = 0; i < entity->ncontrols; ++i) {
+ if (entity->controls[i].handle != handle)
+ continue;
+ uvc_ctrl_set_handle(handle, &entity->controls[i], NULL);
+ }
+ }
+
+ WARN_ON(handle->pending_async_ctrls);
+}
+
/*
* Cleanup device controls.
*/
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index dee6feeba274..93c6cdb23881 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -671,6 +671,8 @@ static int uvc_v4l2_release(struct file *file)
uvc_dbg(stream->dev, CALLS, "%s\n", __func__);
+ uvc_ctrl_cleanup_fh(handle);
+
/* Only free resources if this is a privileged handle. */
if (uvc_has_privileges(handle))
uvc_queue_release(&stream->queue);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 965a789ed03e..5690cfd61e23 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -338,7 +338,11 @@ struct uvc_video_chain {
struct uvc_entity *processing; /* Processing unit */
struct uvc_entity *selector; /* Selector unit */
- struct mutex ctrl_mutex; /* Protects ctrl.info */
+ struct mutex ctrl_mutex; /*
+ * Protects ctrl.info,
+ * ctrl.handle and
+ * uvc_fh.pending_async_ctrls
+ */
struct v4l2_prio_state prio; /* V4L2 priority state */
u32 caps; /* V4L2 chain-wide caps */
@@ -613,6 +617,7 @@ struct uvc_fh {
struct uvc_video_chain *chain;
struct uvc_streaming *stream;
enum uvc_handle_state state;
+ unsigned int pending_async_ctrls;
};
struct uvc_driver {
@@ -798,6 +803,8 @@ int uvc_ctrl_is_accessible(struct uvc_video_chain *chain, u32 v4l2_id,
int uvc_xu_ctrl_query(struct uvc_video_chain *chain,
struct uvc_xu_control_query *xqry);
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle);
+
/* Utility functions */
struct usb_host_endpoint *uvc_find_endpoint(struct usb_host_interface *alts,
u8 epaddr);
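The counter logic described in the commit message can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the driver code: `toy_fh`, `toy_ctrl`, `toy_set_handle()` and `toy_cleanup_fh()` are hypothetical names, locking is omitted, and the helper takes a simpler argument list than uvc_ctrl_set_handle().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct uvc_fh. */
struct toy_fh {
	unsigned int pending_async_ctrls;
};

/* Toy stand-in for struct uvc_control. */
struct toy_ctrl {
	struct toy_fh *handle;	/* owner of the pending async operation, or NULL */
};

/* Make 'fh' the owner of 'ctrl' (NULL clears it), keeping counters in step. */
static void toy_set_handle(struct toy_ctrl *ctrl, struct toy_fh *fh)
{
	if (ctrl->handle == fh)
		return;
	if (ctrl->handle)
		ctrl->handle->pending_async_ctrls--;
	ctrl->handle = fh;
	if (fh)
		fh->pending_async_ctrls++;
}

/* Called when 'fh' is closed: drop every reference the controls hold to it. */
static void toy_cleanup_fh(struct toy_fh *fh, struct toy_ctrl *ctrls, size_t n)
{
	if (!fh->pending_async_ctrls)
		return;		/* fast path: nothing points at this handle */

	for (size_t i = 0; i < n; i++) {
		if (ctrls[i].handle == fh)
			toy_set_handle(&ctrls[i], NULL);
	}
	assert(fh->pending_async_ctrls == 0);
}
```

The early return in `toy_cleanup_fh()` corresponds to the common case mentioned above: a handle with no pending async control is released without scanning any controls.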
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d9fecd096f67a4469536e040a8a10bbfb665918b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021008-virus-pampered-abf4@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d9fecd096f67a4469536e040a8a10bbfb665918b Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Tue, 3 Dec 2024 21:20:08 +0000
Subject: [PATCH] media: uvcvideo: Only save async fh if success
Currently we keep a reference to the active fh for any call to uvc_ctrl_set,
regardless of whether it is an actual set, just a try, or an operation that
the device refused.
We should only keep the file handle if the device actually accepted
applying the operation.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Suggested-by: Hans de Goede <hdegoede(a)redhat.com>
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-1-26c867231118@chromium…
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index bab9fdac98e6..e0806641a8d0 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1811,7 +1811,10 @@ int uvc_ctrl_begin(struct uvc_video_chain *chain)
}
static int uvc_ctrl_commit_entity(struct uvc_device *dev,
- struct uvc_entity *entity, int rollback, struct uvc_control **err_ctrl)
+ struct uvc_fh *handle,
+ struct uvc_entity *entity,
+ int rollback,
+ struct uvc_control **err_ctrl)
{
struct uvc_control *ctrl;
unsigned int i;
@@ -1859,6 +1862,10 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev,
*err_ctrl = ctrl;
return ret;
}
+
+ if (!rollback && handle &&
+ ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
+ ctrl->handle = handle;
}
return 0;
@@ -1895,8 +1902,8 @@ int __uvc_ctrl_commit(struct uvc_fh *handle, int rollback,
/* Find the control. */
list_for_each_entry(entity, &chain->entities, chain) {
- ret = uvc_ctrl_commit_entity(chain->dev, entity, rollback,
- &err_ctrl);
+ ret = uvc_ctrl_commit_entity(chain->dev, handle, entity,
+ rollback, &err_ctrl);
if (ret < 0) {
if (ctrls)
ctrls->error_idx =
@@ -2046,9 +2053,6 @@ int uvc_ctrl_set(struct uvc_fh *handle,
mapping->set(mapping, value,
uvc_ctrl_data(ctrl, UVC_CTRL_DATA_CURRENT));
- if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
- ctrl->handle = handle;
-
ctrl->dirty = 1;
ctrl->modified = 1;
return 0;
@@ -2377,7 +2381,7 @@ int uvc_ctrl_restore_values(struct uvc_device *dev)
ctrl->dirty = 1;
}
- ret = uvc_ctrl_commit_entity(dev, entity, 0, NULL);
+ ret = uvc_ctrl_commit_entity(dev, NULL, entity, 0, NULL);
if (ret < 0)
return ret;
}
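The rule that the file handle is only recorded once the device has accepted the write can be sketched like this. It is a userspace model with hypothetical names (`toy_fh`, `toy_ctrl`, `toy_commit()`), and the `device_rejects` flag is an assumption standing in for a failed control write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fh {
	int id;
};

struct toy_ctrl {
	bool async;		/* models UVC_CTRL_FLAG_ASYNCHRONOUS */
	bool device_rejects;	/* pretend the device refuses the write */
	struct toy_fh *handle;	/* fh to notify when the device is done */
};

/* Returns 0 on success, -1 if the device refused the control write. */
static int toy_commit(struct toy_ctrl *ctrl, struct toy_fh *fh, bool rollback)
{
	/* A rollback (try) never reaches the device in this model. */
	if (!rollback && ctrl->device_rejects)
		return -1;

	/* Only remember the fh for a real, accepted, asynchronous set. */
	if (!rollback && fh && ctrl->async)
		ctrl->handle = fh;
	return 0;
}
```

A try, a refused write, and a write to a synchronous control all leave `handle` untouched; only an accepted asynchronous write stores it.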
Hi Juergen, hi all,
Radoslav Bodó reported in Debian an issue after updating our kernel
from 6.1.112 to 6.1.115. His report in full is at:
https://bugs.debian.org/1088159
He reports that after switching to 6.1.115 (the problem is also present
in all later 6.1.y releases), when booting under Xen, the mptsas devices
are no longer accessible. The boot log shows:
mpt3sas version 43.100.00.00 loaded
mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8086116 kB)
mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
mpt3sas_cm0: MSI-X vectors supported: 96
mpt3sas_cm0: 0 40 40
mpt3sas_cm0: High IOPs queues : disabled
mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 447
mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 448
mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 449
mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 450
mpt3sas0-msix4: PCI-MSI-X enabled: IRQ 451
mpt3sas0-msix5: PCI-MSI-X enabled: IRQ 452
mpt3sas0-msix6: PCI-MSI-X enabled: IRQ 453
mpt3sas0-msix7: PCI-MSI-X enabled: IRQ 454
mpt3sas0-msix8: PCI-MSI-X enabled: IRQ 455
mpt3sas0-msix9: PCI-MSI-X enabled: IRQ 456
mpt3sas0-msix10: PCI-MSI-X enabled: IRQ 457
mpt3sas0-msix11: PCI-MSI-X enabled: IRQ 458
mpt3sas0-msix12: PCI-MSI-X enabled: IRQ 459
mpt3sas0-msix13: PCI-MSI-X enabled: IRQ 460
mpt3sas0-msix14: PCI-MSI-X enabled: IRQ 461
mpt3sas0-msix15: PCI-MSI-X enabled: IRQ 462
mpt3sas0-msix16: PCI-MSI-X enabled: IRQ 463
mpt3sas0-msix17: PCI-MSI-X enabled: IRQ 464
mpt3sas0-msix18: PCI-MSI-X enabled: IRQ 465
mpt3sas0-msix19: PCI-MSI-X enabled: IRQ 466
mpt3sas0-msix20: PCI-MSI-X enabled: IRQ 467
mpt3sas0-msix21: PCI-MSI-X enabled: IRQ 468
mpt3sas0-msix22: PCI-MSI-X enabled: IRQ 469
mpt3sas0-msix23: PCI-MSI-X enabled: IRQ 470
mpt3sas0-msix24: PCI-MSI-X enabled: IRQ 471
mpt3sas0-msix25: PCI-MSI-X enabled: IRQ 472
mpt3sas0-msix26: PCI-MSI-X enabled: IRQ 473
mpt3sas0-msix27: PCI-MSI-X enabled: IRQ 474
mpt3sas0-msix28: PCI-MSI-X enabled: IRQ 475
mpt3sas0-msix29: PCI-MSI-X enabled: IRQ 476
mpt3sas0-msix30: PCI-MSI-X enabled: IRQ 477
mpt3sas0-msix31: PCI-MSI-X enabled: IRQ 478
mpt3sas0-msix32: PCI-MSI-X enabled: IRQ 479
mpt3sas0-msix33: PCI-MSI-X enabled: IRQ 480
mpt3sas0-msix34: PCI-MSI-X enabled: IRQ 481
mpt3sas0-msix35: PCI-MSI-X enabled: IRQ 482
mpt3sas0-msix36: PCI-MSI-X enabled: IRQ 483
mpt3sas0-msix37: PCI-MSI-X enabled: IRQ 484
mpt3sas0-msix38: PCI-MSI-X enabled: IRQ 485
mpt3sas0-msix39: PCI-MSI-X enabled: IRQ 486
mpt3sas_cm0: iomem(0x00000000ac400000), mapped(0x00000000d9f45f61), size(65536)
mpt3sas_cm0: ioport(0x0000000000006000), size(256)
mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
mpt3sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
mpt3sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:12348/_scsih_probe()!
We were able to bisect the changes (see https://bugs.debian.org/1088159#64) down to
b1e6e80a1b42 ("xen/swiotlb: add alignment check for dma buffers")
#regzbot introduced: b1e6e80a1b42
#regzbot link: https://bugs.debian.org/1088159
Reverting the commit resolves the issue.
Does that ring some bells?
In fact, we have two more bugs reported with similar symptoms. They are
not yet confirmed to be the same issue, but I'm referencing them here as
well in case we can trace them to a common root cause:
https://bugs.debian.org/1093371 (megaraid_sas didn't work anymore with
Xen)
and
https://bugs.debian.org/1087807 (Unable to boot: i40e swiotlb buffer
is full)
(but again, these are not yet confirmed to have the same root
cause).
Thanks in advance,
Regards,
Salvatore
If an inactive rsb is no longer hashed, which can occur because we
released and reacquired locks, we need to signal to the following code
that the lookup failed. Since the lookup was successful but the rsb is no
longer part of the rsb hash, we signal this by setting error to -EBADR,
as dlm_search_rsb_tree() does.
Cc: stable(a)vger.kernel.org
Fixes: 01fdeca1cc2d ("dlm: use rcu to avoid an extra rsb struct lookup")
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
---
fs/dlm/lock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index c8ff88f1cdcf..499fa999ae83 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -784,6 +784,7 @@ static int find_rsb_dir(struct dlm_ls *ls, const void *name, int len,
}
} else {
write_unlock_bh(&ls->ls_rsbtbl_lock);
+ error = -EBADR;
goto do_new;
}
--
2.43.0
The pfns that we obtain from hmm_range_fault() point to pages that
we don't have a reference on, and the guarantee that they are still
in the cpu page-tables is that the notifier lock is held and the
notifier seqno is still valid.
So while building the sg table and marking the pages accessed / dirty
we need to hold this lock with a validated seqno.
However, the lock is reclaim tainted which makes
sg_alloc_table_from_pages_segment() unusable, since it internally
allocates memory.
Instead build the sg-table manually. For the non-iommu case
this might lead to fewer coalesces, but if that's a problem it can
be fixed up later in the resource cursor code. For the iommu case,
the whole sg-table may still be coalesced to a single contiguous
device va region.
This avoids marking pages that we don't own dirty and accessed, and
it also avoids dereferencing struct pages that we don't own.
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Cc: Oak Zeng <oak.zeng(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.10+
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
---
drivers/gpu/drm/xe/xe_hmm.c | 115 ++++++++++++++++++++++++++----------
1 file changed, 85 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
index c56738fa713b..d3b5551496d0 100644
--- a/drivers/gpu/drm/xe/xe_hmm.c
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -42,6 +42,36 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
}
}
+static int xe_alloc_sg(struct sg_table *st, struct hmm_range *range,
+ struct rw_semaphore *notifier_sem)
+{
+ unsigned long i, npages, hmm_pfn;
+ unsigned long num_chunks = 0;
+ int ret;
+
+ /* HMM docs says this is needed. */
+ ret = down_read_interruptible(notifier_sem);
+ if (ret)
+ return ret;
+
+ if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
+ up_read(notifier_sem);
+ return -EAGAIN;
+ }
+
+ npages = xe_npages_in_range(range->start, range->end);
+ for (i = 0; i < npages;) {
+ hmm_pfn = range->hmm_pfns[i];
+ if (!(hmm_pfn & HMM_PFN_VALID)) {
+ up_read(notifier_sem);
+ return -EFAULT;
+ }
+ num_chunks++;
+ i += 1UL << hmm_pfn_to_map_order(hmm_pfn);
+ }
+ up_read(notifier_sem);
+
+ return sg_alloc_table(st, num_chunks, GFP_KERNEL);
+}
+
/**
* xe_build_sg() - build a scatter gather table for all the physical pages/pfn
* in a hmm_range. dma-map pages if necessary. dma-address is save in sg table
@@ -50,6 +80,7 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
* @range: the hmm range that we build the sg table from. range->hmm_pfns[]
* has the pfn numbers of pages that back up this hmm address range.
* @st: pointer to the sg table.
+ * @notifier_sem: The xe notifier lock.
* @write: whether we write to this range. This decides dma map direction
* for system pages. If write we map it bi-diretional; otherwise
* DMA_TO_DEVICE
@@ -76,38 +107,33 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
* Returns 0 if successful; -ENOMEM if fails to allocate memory
*/
static int xe_build_sg(struct xe_device *xe, struct hmm_range *range,
- struct sg_table *st, bool write)
+ struct sg_table *st,
+ struct rw_semaphore *notifier_sem,
+ bool write)
{
struct device *dev = xe->drm.dev;
- struct page **pages;
- u64 i, npages;
- int ret;
-
- npages = xe_npages_in_range(range->start, range->end);
- pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
- if (!pages)
- return -ENOMEM;
-
- for (i = 0; i < npages; i++) {
- pages[i] = hmm_pfn_to_page(range->hmm_pfns[i]);
- xe_assert(xe, !is_device_private_page(pages[i]));
- }
-
- ret = sg_alloc_table_from_pages_segment(st, pages, npages, 0, npages << PAGE_SHIFT,
- xe_sg_segment_size(dev), GFP_KERNEL);
- if (ret)
- goto free_pages;
-
- ret = dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE,
- DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
- if (ret) {
- sg_free_table(st);
- st = NULL;
+ unsigned long hmm_pfn, size;
+ struct scatterlist *sgl;
+ struct page *page;
+ unsigned long i, j;
+
+ lockdep_assert_held(notifier_sem);
+
+ i = 0;
+ for_each_sg(st->sgl, sgl, st->nents, j) {
+ hmm_pfn = range->hmm_pfns[i];
+ page = hmm_pfn_to_page(hmm_pfn);
+ xe_assert(xe, !is_device_private_page(page));
+ size = 1UL << hmm_pfn_to_map_order(hmm_pfn);
+ sg_set_page(sgl, page, size << PAGE_SHIFT, 0);
+ if (unlikely(j == st->nents - 1))
+ sg_mark_end(sgl);
+ i += size;
}
+ xe_assert(xe, i == xe_npages_in_range(range->start, range->end));
-free_pages:
- kvfree(pages);
- return ret;
+ return dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
}
/**
@@ -235,16 +261,45 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
if (ret)
goto free_pfns;
- ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
+ if (unlikely(userptr->sg)) {
+ ret = down_write_killable(&vm->userptr.notifier_lock);
+ if (ret)
+ goto free_pfns;
+
+ xe_hmm_userptr_free_sg(uvma);
+ up_write(&vm->userptr.notifier_lock);
+ }
+
+ ret = xe_alloc_sg(&userptr->sgt, &hmm_range, &vm->userptr.notifier_lock);
if (ret)
goto free_pfns;
+ ret = down_read_interruptible(&vm->userptr.notifier_lock);
+ if (ret)
+ goto free_st;
+
+ if (mmu_interval_read_retry(hmm_range.notifier, hmm_range.notifier_seq)) {
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+
+ ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt,
+ &vm->userptr.notifier_lock, write);
+ if (ret)
+ goto out_unlock;
+
xe_mark_range_accessed(&hmm_range, write);
userptr->sg = &userptr->sgt;
userptr->notifier_seq = hmm_range.notifier_seq;
+ up_read(&vm->userptr.notifier_lock);
+ kvfree(pfns);
+ return 0;
+out_unlock:
+ up_read(&vm->userptr.notifier_lock);
+free_st:
+ sg_free_table(&userptr->sgt);
free_pfns:
kvfree(pfns);
return ret;
}
-
--
2.48.1
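The chunk counting that sizes the manually built sg-table can be sketched in userspace as follows. This is only a model: `toy_pfn` and `toy_count_chunks()` are hypothetical names, the `valid` and `order` fields are assumptions standing in for HMM_PFN_VALID and hmm_pfn_to_map_order(), and the notifier locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one hmm_pfns[] entry. */
struct toy_pfn {
	int valid;		/* models HMM_PFN_VALID */
	unsigned int order;	/* models hmm_pfn_to_map_order() */
};

/*
 * Count the sg entries needed for 'npages' pages: one entry per mapping,
 * where a mapping of order k covers 1 << k consecutive pages.
 * Returns -1 if an invalid entry is hit (the -EFAULT case).
 */
static long toy_count_chunks(const struct toy_pfn *pfns, size_t npages)
{
	long num_chunks = 0;

	assert(pfns != NULL);
	for (size_t i = 0; i < npages;) {
		if (!pfns[i].valid)
			return -1;
		num_chunks++;
		/* Skip the remaining pages covered by this mapping. */
		i += (size_t)1 << pfns[i].order;
	}
	return num_chunks;
}
```

Only the first entry of each mapping is inspected, so eight pages made of one order-2 mapping, two order-0 pages and one order-1 mapping need four sg entries rather than eight.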
Filtering decisions are made in filter evaluation order. Once a
decision is made by a filter, filters that are scheduled to be evaluated
after the decision-making filter should just respect it. This is the
intended and documented behavior. Since core layer-handled filters are
evaluated before operations layer-handled filters, decisions made on
the core layer should be respected by the ops layer.
In case of reject filters, the decision is respected, since core
layer-rejected regions are not passed to the ops layer. But in case of
allow filters, ops layer filters don't know if the region has been passed
to them because it was allowed by core filters or just because it didn't
match any core layer filter. The current wrong implementation assumes it
was because no core filter matched. As a result, the decision
is not respected. Pass the missing information to the ops layer using a
new field in 'struct damos', and make the ops layer filters respect it.
Fixes: 491fee286e56 ("mm/damon/core: support damos_filter->allow")
Cc: <stable(a)vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
include/linux/damon.h | 5 +++++
mm/damon/core.c | 6 +++++-
mm/damon/paddr.c | 3 +++
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 795ca09b1107..242910b190c9 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -496,6 +496,11 @@ struct damos {
unsigned long next_apply_sis;
/* informs if ongoing DAMOS walk for this scheme is finished */
bool walk_completed;
+ /*
+ * If the current region in the filtering stage is allowed by core
+ * layer-handled filters. If true, operations layer allows it, too.
+ */
+ bool core_filters_allowed;
/* public: */
struct damos_quota quota;
struct damos_watermarks wmarks;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index cfa105ee9610..b1ce072b56f2 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1433,9 +1433,13 @@ static bool damos_filter_out(struct damon_ctx *ctx, struct damon_target *t,
{
struct damos_filter *filter;
+ s->core_filters_allowed = false;
damos_for_each_filter(filter, s) {
- if (damos_filter_match(ctx, t, r, filter))
+ if (damos_filter_match(ctx, t, r, filter)) {
+ if (filter->allow)
+ s->core_filters_allowed = true;
return !filter->allow;
+ }
}
return false;
}
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 25090230da17..d5db313ca717 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -253,6 +253,9 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
{
struct damos_filter *filter;
+ if (scheme->core_filters_allowed)
+ return false;
+
damos_for_each_filter(filter, scheme) {
if (damos_pa_filter_match(filter, folio))
return !filter->allow;
base-commit: c8f5534db6574708eee17fcd416f0a3fb3b45dbd
--
2.39.5
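The two-layer decision flow can be modeled in userspace as below. This is a sketch with hypothetical names (`toy_filter`, `toy_scheme`, `toy_core_filter_out()`, `toy_ops_filter_out()`, `toy_filtered_out()`); the `matches` flag is an assumption that replaces the real matching functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_filter {
	bool matches;	/* would this filter match the region/folio? */
	bool allow;	/* allow filter (true) or reject filter (false) */
};

struct toy_scheme {
	const struct toy_filter *core_filters;
	size_t n_core;
	const struct toy_filter *ops_filters;
	size_t n_ops;
	bool core_filters_allowed;
};

/* Core layer: first matching filter decides; remember an explicit allow. */
static bool toy_core_filter_out(struct toy_scheme *s)
{
	s->core_filters_allowed = false;
	for (size_t i = 0; i < s->n_core; i++) {
		if (s->core_filters[i].matches) {
			if (s->core_filters[i].allow)
				s->core_filters_allowed = true;
			return !s->core_filters[i].allow;
		}
	}
	return false;
}

/* Ops layer: respect an allow decision already made by the core layer. */
static bool toy_ops_filter_out(const struct toy_scheme *s)
{
	if (s->core_filters_allowed)
		return false;
	for (size_t i = 0; i < s->n_ops; i++) {
		if (s->ops_filters[i].matches)
			return !s->ops_filters[i].allow;
	}
	return false;
}

/* True if the region is filtered out by either layer. */
static bool toy_filtered_out(struct toy_scheme *s)
{
	return toy_core_filter_out(s) || toy_ops_filter_out(s);
}
```

With a matching core allow filter followed by a matching ops reject filter, the region is kept. Without the `core_filters_allowed` check, the ops reject filter would override the earlier allow decision.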
The function for allocating and initializing a 'struct damos' object,
damon_new_scheme(), is not initializing the damos->walk_completed field.
Only damos_walk_complete() is setting the field. Hence the field will
eventually be set and used correctly from the second damos_walk() call for
the scheme. But the first damos_walk() could mistakenly not walk the
regions. In fact, a common usage of DAMOS for taking an access pattern
snapshot is installing a monitoring-purpose DAMOS scheme, doing
damos_walk() to retrieve the snapshot, and then removing the scheme.
The DAMON user-space tool (damo) also gets runtime snapshots this way.
Hence the problem can happen repeatedly in such use cases. Initialize
the field properly in the allocation function.
Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()")
Cc: <stable(a)vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 38f545fea585..cfa105ee9610 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -373,6 +373,7 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
* or damon_attrs are updated.
*/
scheme->next_apply_sis = 0;
+ scheme->walk_completed = false;
INIT_LIST_HEAD(&scheme->filters);
scheme->stat = (struct damos_stat){};
INIT_LIST_HEAD(&scheme->list);
base-commit: 3880bbe477938a3b30ff7bf2ef316adf98876671
--
2.39.5
Hello everyone,
On the Arch Linux bug tracker[1], Benjamin (also added in CC) reported
that the throughput of his MT7925 wifi card halved when updating from
the v6.13.1 to the v6.13.2 stable kernel. The problem is still present
in the 6.13.5 stable kernel.
We have bisected this issue together and found that the backport of the
following commit is responsible:
4cf9f08632c0 ("wifi: mt76: mt7925: Update mt7925_mcu_uni_[tx,rx]_ba for MLO")
We unfortunately didn't have a chance to test the mainline releases, as
the reporter uses the (out-of-tree) nvidia modules, which were not
compatible with the mainline release at the time of testing. We will soon
test against mainline as well.
I have attached dmesg outputs of a good and a bad boot as well as his
other hardware specs, and will be available to debug this further.
Cheers,
Christian
[1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/112