The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x aea62c744a9ae2a8247c54ec42138405216414da
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024090852-importer-unadorned-f55b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
aea62c744a9a ("mmc: cqhci: Fix checking of CQHCI_HALT state")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aea62c744a9ae2a8247c54ec42138405216414da Mon Sep 17 00:00:00 2001
From: Seunghwan Baek <sh8267.baek@samsung.com>
Date: Thu, 29 Aug 2024 15:18:22 +0900
Subject: [PATCH] mmc: cqhci: Fix checking of CQHCI_HALT state
To check whether the mmc CQE is in the halt state, the CQHCI_HALT bit must
be tested. That test must use the bitwise & operator, not the logical &&.
Fixes: a4080225f51d ("mmc: cqhci: support for command queue enabled host")
Cc: stable@vger.kernel.org
Signed-off-by: Seunghwan Baek <sh8267.baek@samsung.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240829061823.3718-2-sh8267.baek@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
index c14d7251d0bb..a02da26a1efd 100644
--- a/drivers/mmc/host/cqhci-core.c
+++ b/drivers/mmc/host/cqhci-core.c
@@ -617,7 +617,7 @@ static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
cqhci_writel(cq_host, 0, CQHCI_CTL);
mmc->cqe_on = true;
pr_debug("%s: cqhci: CQE on\n", mmc_hostname(mmc));
- if (cqhci_readl(cq_host, CQHCI_CTL) && CQHCI_HALT) {
+ if (cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT) {
pr_err("%s: cqhci: CQE failed to exit halt state\n",
mmc_hostname(mmc));
}
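The difference between the two operators can be illustrated with a minimal userspace sketch. The function names and the mask value used in the test are hypothetical and chosen only for illustration; the real CQHCI_HALT definition lives in the driver's header. The logical form is true for any non-zero register value, while the bitwise form isolates the halt bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Buggy form (illustrative): logical AND only asks whether both operands
 * are non-zero, so any set bit in ctl makes the result true.
 */
bool example_halted_logical(uint32_t ctl, uint32_t halt_mask)
{
	return ctl && halt_mask;
}

/* Fixed form (illustrative): bitwise AND isolates the bits in halt_mask. */
bool example_halted_bitwise(uint32_t ctl, uint32_t halt_mask)
{
	return (ctl & halt_mask) != 0;
}
```

With a mask of 0x1 and a register value of 0x2 (some other bit set, halt bit clear), the logical form reports "halted" while the bitwise form does not.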
From: Willem de Bruijn <willemb@google.com>
Backport the following commit, because it fixes an existing backport
that has caused multiple reports of breakage on 5.15 based kernels:
net: drop bad gso csum_start and offset in virtio_net_hdr
To backport without conflicts, also backport its two dependencies:
net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation
gso: fix dodgy bit handling for GSO_UDP_L4
Also backport the one patch in netdev-net/main that references one
of the above in its Fixes tag:
net: change maximum number of UDP segments to 128
All four patches also exist in 6.1.109
include/linux/udp.h | 2 +-
include/linux/virtio_net.h | 35 +++++++++++++++++-----------
net/ipv4/tcp_offload.c | 3 +++
net/ipv4/udp_offload.c | 17 +++++++++++---
tools/testing/selftests/net/udpgso.c | 2 +-
5 files changed, 40 insertions(+), 19 deletions(-)
--
2.46.0.598.g6f2099f65c-goog
read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is
bigger than the variable hv_sched_clock_offset, which is cached during
early boot, but depending on the timing this assumption may be false
when a hibernated VM starts again (the clock counter starts from 0
again) and is resuming back (Note: hv_init_tsc_clocksource() is not
called during hibernation/resume); consequently,
read_hv_sched_clock_tsc() may return a negative integer (which is
interpreted as a huge positive integer since the return type is u64)
and new kernel messages are prefixed with huge timestamps before
read_hv_sched_clock_tsc() grows big enough (which typically takes
several seconds).
Fix the issue by saving the Hyper-V clock counter just before the
suspend, and using it to correct the hv_sched_clock_offset in
resume. Override x86_platform.save_sched_clock_state and
x86_platform.restore_sched_clock_state so that we don't
have to touch the common x86 code.
Note: if Invariant TSC is available, the issue doesn't happen because
1) we don't register read_hv_sched_clock_tsc() for sched clock:
See commit e5313f1c5404 ("clocksource/drivers/hyper-v: Rework
clocksource and sched clock setup");
2) the common x86 code adjusts TSC similarly: see
__restore_processor_state() -> tsc_verify_tsc_adjust(true) and
x86_platform.restore_sched_clock_state().
Cc: stable@vger.kernel.org
Fixes: 1349401ff1aa ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V clocksource for hibernation")
Co-developed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
drivers/clocksource/hyperv_timer.c | 64 +++++++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 1 deletion(-)
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index b2a080647e41..7aa44b8aae2e 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -27,7 +27,10 @@
#include <asm/mshyperv.h>
static struct clock_event_device __percpu *hv_clock_event;
-static u64 hv_sched_clock_offset __ro_after_init;
+
+/* Can have negative values, after resume from hibernation, so keep them s64 */
+static s64 hv_sched_clock_offset __read_mostly;
+static s64 hv_sched_clock_offset_saved;
/*
* If false, we're using the old mechanism for stimer0 interrupts
@@ -51,6 +54,9 @@ static int stimer0_irq = -1;
static int stimer0_message_sint;
static __maybe_unused DEFINE_PER_CPU(long, stimer0_evt);
+static void (*old_save_sched_clock_state)(void);
+static void (*old_restore_sched_clock_state)(void);
+
/*
* Common code for stimer0 interrupts coming via Direct Mode or
* as a VMbus message.
@@ -434,6 +440,39 @@ static u64 noinstr read_hv_sched_clock_tsc(void)
(NSEC_PER_SEC / HV_CLOCK_HZ);
}
+/*
+ * Hyper-V clock counter resets during hibernation. Save and restore clock
+ * offset during suspend/resume, while also considering the time passed
+ * before suspend. This is to make sure that sched_clock using hv tsc page
+ * based clocksource, proceeds from where it left off during suspend and
+ * it shows correct time for the timestamps of kernel messages after resume.
+ */
+static void save_hv_clock_tsc_state(void)
+{
+ hv_sched_clock_offset_saved = hv_read_reference_counter();
+}
+
+static void restore_hv_clock_tsc_state(void)
+{
+ /*
+ * Time passed before suspend = hv_sched_clock_offset_saved
+ * - hv_sched_clock_offset (old)
+ *
+ * After Hyper-V clock counter resets, hv_sched_clock_offset needs a correction.
+ *
+ * New time = hv_read_reference_counter() (future) - hv_sched_clock_offset (new)
+ * New time = Time passed before suspend + hv_read_reference_counter() (future)
+ * - hv_read_reference_counter() (now)
+ *
+ * Solving the above two equations gives:
+ *
+ * hv_sched_clock_offset (new) = hv_sched_clock_offset (old)
+ * - hv_sched_clock_offset_saved
+ * + hv_read_reference_counter() (now)
+ */
+ hv_sched_clock_offset -= hv_sched_clock_offset_saved - hv_read_reference_counter();
+}
+
static void suspend_hv_clock_tsc(struct clocksource *arg)
{
union hv_reference_tsc_msr tsc_msr;
@@ -456,6 +495,24 @@ static void resume_hv_clock_tsc(struct clocksource *arg)
hv_set_msr(HV_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
}
+/*
+ * Functions to override save_sched_clock_state and restore_sched_clock_state
+ * functions of x86_platform. The Hyper-V clock counter is reset during
+ * suspend-resume and the offset used to measure time needs to be
+ * corrected, post resume.
+ */
+static void hv_save_sched_clock_state(void)
+{
+ save_hv_clock_tsc_state();
+ old_save_sched_clock_state();
+}
+
+static void hv_restore_sched_clock_state(void)
+{
+ restore_hv_clock_tsc_state();
+ old_restore_sched_clock_state();
+}
+
#ifdef HAVE_VDSO_CLOCKMODE_HVCLOCK
static int hv_cs_enable(struct clocksource *cs)
{
@@ -539,6 +596,11 @@ static void __init hv_init_tsc_clocksource(void)
hv_read_reference_counter = read_hv_clock_tsc;
+ old_save_sched_clock_state = x86_platform.save_sched_clock_state;
+ x86_platform.save_sched_clock_state = hv_save_sched_clock_state;
+ old_restore_sched_clock_state = x86_platform.restore_sched_clock_state;
+ x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state;
+
/*
* TSC page mapping works differently in root compared to guest.
* - In guest partition the guest PFN has to be passed to the
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
--
2.25.1
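The offset arithmetic in the patch comment can be checked with a small userspace model. This is only a sketch under stated assumptions: the struct and function names are hypothetical, counter values are passed in explicitly instead of being read from Hyper-V, and the result is left in counter ticks rather than scaled to nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two variables touched by the patch. */
struct example_sched_clock {
	int64_t offset; /* models hv_sched_clock_offset */
	int64_t saved;  /* models hv_sched_clock_offset_saved */
};

/* Models the subtraction in read_hv_sched_clock_tsc(), in ticks. */
int64_t example_read(const struct example_sched_clock *c, int64_t counter)
{
	return counter - c->offset;
}

/* Models save_hv_clock_tsc_state(): remember the counter at suspend. */
void example_save(struct example_sched_clock *c, int64_t counter_at_suspend)
{
	c->saved = counter_at_suspend;
}

/*
 * Models restore_hv_clock_tsc_state(): shift the offset so that the
 * clock continues from the value it had at suspend.
 */
void example_restore(struct example_sched_clock *c, int64_t counter_at_resume)
{
	c->offset -= c->saved - counter_at_resume;
}
```

For example, with an offset of 1000 cached at boot and a counter of 51000 at suspend, 50000 ticks have elapsed. If the counter restarts and reads 200 at resume, the uncorrected clock would be 200 - 1000 = -800 (a huge value once interpreted as u64). After the correction the offset becomes -49800, and the clock reads 50000 again at counter 200.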
If the net_conf pointer is NULL and the code attempts to access its
fields without a check, it will lead to a null pointer dereference.
Add a NULL check before dereferencing the pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru>
---
drivers/block/drbd/drbd_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 287a8d1d3f70..87cf5883078f 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -876,7 +876,7 @@ is_valid_state(struct drbd_device *device, union drbd_state ns)
ns.disk == D_OUTDATED)
rv = SS_CONNECTED_OUTDATES;
- else if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) &&
+ else if (nc && (ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) &&
(nc->verify_alg[0] == 0))
rv = SS_NO_VERIFY_ALG;
--
2.43.0
From: Willem de Bruijn <willemb@google.com>
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.
The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.
Syzkaller demonstrated that this warning in skb_checksum_help can be reached:
offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
__ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4850 [inline]
netdev_start_xmit include/linux/netdevice.h:4864 [inline]
xmit_one net/core/dev.c:3595 [inline]
dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
__dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)
Mitigate with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.
csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.
GSO packets are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d35 ("net: missing check virtio")
Cc: stable@vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
v1->v2
- skb_transport_header instead of skb->transport_header (edumazet@)
- typo: migitate -> mitigate
---
include/linux/virtio_net.h | 16 +++++-----------
net/ipv4/tcp_offload.c | 3 +++
net/ipv4/udp_offload.c | 4 ++++
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index d1d7825318c32..6c395a2600e8d 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -56,7 +56,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
unsigned int thlen = 0;
unsigned int p_off = 0;
unsigned int ip_proto;
- u64 ret, remainder, gso_size;
if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -99,16 +98,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
- if (hdr->gso_size) {
- gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
- ret = div64_u64_rem(skb->len, gso_size, &remainder);
- if (!(ret && (hdr->gso_size > needed) &&
- ((remainder > needed) || (remainder == 0)))) {
- return -EINVAL;
- }
- skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
- }
-
if (!pskb_may_pull(skb, needed))
return -EINVAL;
@@ -182,6 +171,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
if (gso_type != SKB_GSO_UDP_L4)
return -EINVAL;
break;
+ case SKB_GSO_TCPV4:
+ case SKB_GSO_TCPV6:
+ if (skb->csum_offset != offsetof(struct tcphdr, check))
+ return -EINVAL;
+ break;
}
/* Kernel has a special handling for GSO_BY_FRAGS. */
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 4b791e74529e1..e4ad3311e1489 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -140,6 +140,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
if (thlen < sizeof(*th))
goto out;
+ if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb)))
+ goto out;
+
if (!pskb_may_pull(skb, thlen))
goto out;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index aa2e0a28ca613..bc8a9da750fed 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -278,6 +278,10 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
if (gso_skb->len <= sizeof(*uh) + mss)
return ERR_PTR(-EINVAL);
+ if (unlikely(skb_checksum_start(gso_skb) !=
+ skb_transport_header(gso_skb)))
+ return ERR_PTR(-EINVAL);
+
if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
/* Packet is from an untrusted source, reset gso_segs. */
skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
--
2.46.0.rc1.232.g9752f9e123-goog
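The two checks described above (csum_offset deduced from the GSO type, csum_start equal to the transport header offset) can be sketched in userspace as follows. The header structs are simplified stand-ins rather than the kernel's struct tcphdr and struct udphdr, the enum and function names are hypothetical, and the two checks are combined into one function here although the patch places them in virtio_net_hdr_to_skb() and in the segmentation functions respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a TCP header; only the field layout matters. */
struct example_tcphdr {
	uint16_t source, dest;
	uint32_t seq, ack_seq;
	uint16_t flags;   /* data offset and flag bits, collapsed */
	uint16_t window;
	uint16_t check;
	uint16_t urg_ptr;
};

/* Simplified stand-in for a UDP header. */
struct example_udphdr {
	uint16_t source, dest, len, check;
};

enum example_gso { EX_GSO_TCP, EX_GSO_UDP_L4 };

/*
 * Returns true when the checksum fields match what the GSO type implies:
 * csum_offset must point at the protocol's checksum field, and
 * csum_start must coincide with the transport header offset.
 */
bool example_csum_fields_ok(enum example_gso type, size_t transport_off,
			    size_t csum_start, size_t csum_offset)
{
	size_t expected;

	switch (type) {
	case EX_GSO_TCP:
		expected = offsetof(struct example_tcphdr, check);
		break;
	case EX_GSO_UDP_L4:
		expected = offsetof(struct example_udphdr, check);
		break;
	default:
		return false;
	}
	return csum_offset == expected && csum_start == transport_off;
}
```

Fed with the values from the dump above (trans=244, csum start=199, offset=1536), the TCP case fails both comparisons, which is the kind of packet the stricter validation is meant to drop.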
The quilt patch titled
Subject: mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu read lock
has been removed from the -mm tree. Its filename was
mm-damon-vaddr-protect-vma-traversal-in-__damon_va_thre_regions-with-rcu-read-lock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: mm/damon/vaddr: protect vma traversal in __damon_va_thre_regions() with rcu read lock
Date: Wed, 4 Sep 2024 17:12:04 -0700
Traversing VMAs of a given maple tree should be protected by rcu read
lock. However, __damon_va_three_regions() is not doing the protection.
Hold the lock.
Link: https://lkml.kernel.org/r/20240905001204.1481-1-sj@kernel.org
Fixes: d0cf3dd47f0d ("damon: convert __damon_va_three_regions to use the VMA iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/b83651a0-5b24-4206-b860-cb54ffdf209b@roeck-us.net
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/vaddr.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/damon/vaddr.c~mm-damon-vaddr-protect-vma-traversal-in-__damon_va_thre_regions-with-rcu-read-lock
+++ a/mm/damon/vaddr.c
@@ -126,6 +126,7 @@ static int __damon_va_three_regions(stru
* If this is too slow, it can be optimised to examine the maple
* tree gaps.
*/
+ rcu_read_lock();
for_each_vma(vmi, vma) {
unsigned long gap;
@@ -146,6 +147,7 @@ static int __damon_va_three_regions(stru
next:
prev = vma;
}
+ rcu_read_unlock();
if (!sz_range(&second_gap) || !sz_range(&first_gap))
return -EINVAL;
_
Patches currently in -mm which might be from Liam.Howlett@oracle.com are
maple_tree-mark-three-functions-as-__maybe_unused.patch
The quilt patch titled
Subject: ocfs2: cancel dqi_sync_work before freeing oinfo
has been removed from the -mm tree. Its filename was
ocfs2-cancel-dqi_sync_work-before-freeing-oinfo.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Joseph Qi <joseph.qi@linux.alibaba.com>
Subject: ocfs2: cancel dqi_sync_work before freeing oinfo
Date: Wed, 4 Sep 2024 15:10:03 +0800
ocfs2_global_read_info() initializes and schedules dqi_sync_work at the
end. If an error occurs after successfully reading the global quota, it will
trigger the following warning with CONFIG_DEBUG_OBJECTS_* enabled:
ODEBUG: free active (active state 0) object: 00000000d8b0ce28 object type: timer_list hint: qsync_work_fn+0x0/0x16c
This reports that there is an active delayed work when freeing oinfo in
error handling, so cancel dqi_sync_work first. BTW, return status instead
of -1 when .read_file_info fails.
Link: https://syzkaller.appspot.com/bug?extid=f7af59df5d6b25f0febd
Link: https://lkml.kernel.org/r/20240904071004.2067695-1-joseph.qi@linux.alibaba.…
Fixes: 171bf93ce11f ("ocfs2: Periodic quota syncing")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reported-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com
Tested-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/quota_local.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/quota_local.c~ocfs2-cancel-dqi_sync_work-before-freeing-oinfo
+++ a/fs/ocfs2/quota_local.c
@@ -692,7 +692,7 @@ static int ocfs2_local_read_info(struct
int status;
struct buffer_head *bh = NULL;
struct ocfs2_quota_recovery *rec;
- int locked = 0;
+ int locked = 0, global_read = 0;
info->dqi_max_spc_limit = 0x7fffffffffffffffLL;
info->dqi_max_ino_limit = 0x7fffffffffffffffLL;
@@ -700,6 +700,7 @@ static int ocfs2_local_read_info(struct
if (!oinfo) {
mlog(ML_ERROR, "failed to allocate memory for ocfs2 quota"
" info.");
+ status = -ENOMEM;
goto out_err;
}
info->dqi_priv = oinfo;
@@ -712,6 +713,7 @@ static int ocfs2_local_read_info(struct
status = ocfs2_global_read_info(sb, type);
if (status < 0)
goto out_err;
+ global_read = 1;
status = ocfs2_inode_lock(lqinode, &oinfo->dqi_lqi_bh, 1);
if (status < 0) {
@@ -782,10 +784,12 @@ out_err:
if (locked)
ocfs2_inode_unlock(lqinode, 1);
ocfs2_release_local_quota_bitmaps(&oinfo->dqi_chunk);
+ if (global_read)
+ cancel_delayed_work_sync(&oinfo->dqi_sync_work);
kfree(oinfo);
}
brelse(bh);
- return -1;
+ return status;
}
/* Write local info to quota file */
_
Patches currently in -mm which might be from joseph.qi@linux.alibaba.com are
ocfs2-cleanup-return-value-and-mlog-in-ocfs2_global_read_info.patch
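The error-path shape of the dqi_sync_work fix above (track whether the global read succeeded, cancel the pending work only in that case, and return the real status instead of -1) can be sketched in userspace. Everything here is a hypothetical stand-in: there is no real delayed work, the flag and counter only model it, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct example_info {
	bool work_pending; /* models a scheduled dqi_sync_work */
};

/* Counts cancellations so the behaviour can be observed from outside. */
int example_cancel_calls;

/* Models ocfs2_global_read_info(): schedules work only on success. */
static int example_global_read(struct example_info *info, bool fail)
{
	if (fail)
		return -EIO;
	info->work_pending = true;
	return 0;
}

/* Models cancel_delayed_work_sync() on the pending work. */
static void example_cancel_work(struct example_info *info)
{
	info->work_pending = false;
	example_cancel_calls++;
}

int example_read_info(bool fail_global, bool fail_later)
{
	struct example_info *info;
	bool global_read = false;
	int status;

	info = calloc(1, sizeof(*info));
	if (!info)
		return -ENOMEM;

	status = example_global_read(info, fail_global);
	if (status < 0)
		goto out_err;
	global_read = true;

	if (fail_later) {
		status = -EINVAL;
		goto out_err;
	}

	/* Success: the real code keeps the object; freed here to avoid a leak. */
	free(info);
	return 0;

out_err:
	/* Cancel only if the work was actually scheduled. */
	if (global_read)
		example_cancel_work(info);
	free(info);
	return status; /* propagate the real error instead of -1 */
}
```

A failure in the global read returns its own error without cancelling anything, while a later failure cancels the work exactly once before the object is freed.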
The quilt patch titled
Subject: ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate
has been removed from the -mm tree. Its filename was
ocfs2-fix-possible-null-ptr-deref-in-ocfs2_set_buffer_uptodate.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lizhi Xu <lizhi.xu@windriver.com>
Subject: ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate
Date: Mon, 2 Sep 2024 10:36:36 +0800
During cleanup, if flags does not include OCFS2_BH_READAHEAD, the
subsequent ocfs2_set_buffer_uptodate() call may dereference a NULL pointer
if bh is NULL.
Link: https://lkml.kernel.org/r/20240902023636.1843422-3-joseph.qi@linux.alibaba.…
Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Heming Zhao <heming.zhao@suse.com>
Suggested-by: Heming Zhao <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org> [4.20+]
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/buffer_head_io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/buffer_head_io.c~ocfs2-fix-possible-null-ptr-deref-in-ocfs2_set_buffer_uptodate
+++ a/fs/ocfs2/buffer_head_io.c
@@ -388,7 +388,8 @@ read_failure:
/* Always set the buffer in the cache, even if it was
* a forced read, or read-ahead which hasn't yet
* completed. */
- ocfs2_set_buffer_uptodate(ci, bh);
+ if (bh)
+ ocfs2_set_buffer_uptodate(ci, bh);
}
ocfs2_metadata_cache_io_unlock(ci);
_
Patches currently in -mm which might be from lizhi.xu@windriver.com are
The quilt patch titled
Subject: ocfs2: remove unreasonable unlock in ocfs2_read_blocks
has been removed from the -mm tree. Its filename was
ocfs2-remove-unreasonable-unlock-in-ocfs2_read_blocks.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lizhi Xu <lizhi.xu@windriver.com>
Subject: ocfs2: remove unreasonable unlock in ocfs2_read_blocks
Date: Mon, 2 Sep 2024 10:36:35 +0800
Patch series "Misc fixes for ocfs2_read_blocks", v5.
This series contains 2 fixes for ocfs2_read_blocks(). The first patch fixes
the issue reported by syzbot, which detects bad unlock balance in
ocfs2_read_blocks(). The second patch fixes an issue reported by Heming
Zhao when reviewing the above fix.
This patch (of 2):
There was a lock release before exiting, so remove the unreasonable unlock.
Link: https://lkml.kernel.org/r/20240902023636.1843422-1-joseph.qi@linux.alibaba.…
Link: https://lkml.kernel.org/r/20240902023636.1843422-2-joseph.qi@linux.alibaba.…
Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ab134185af9ef88dfed5
Tested-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org> [4.20+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/buffer_head_io.c | 1 -
1 file changed, 1 deletion(-)
--- a/fs/ocfs2/buffer_head_io.c~ocfs2-remove-unreasonable-unlock-in-ocfs2_read_blocks
+++ a/fs/ocfs2/buffer_head_io.c
@@ -235,7 +235,6 @@ int ocfs2_read_blocks(struct ocfs2_cachi
if (bhs[i] == NULL) {
bhs[i] = sb_getblk(sb, block++);
if (bhs[i] == NULL) {
- ocfs2_metadata_cache_io_unlock(ci);
status = -ENOMEM;
mlog_errno(status);
/* Don't forget to put previous bh! */
_
Patches currently in -mm which might be from lizhi.xu@windriver.com are
The quilt patch titled
Subject: ocfs2: fix null-ptr-deref when journal load failed.
has been removed from the -mm tree. Its filename was
ocfs2-fix-null-ptr-deref-when-journal-load-failed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Julian Sun <sunjunchao2870@gmail.com>
Subject: ocfs2: fix null-ptr-deref when journal load failed.
Date: Mon, 2 Sep 2024 11:08:44 +0800
During the mounting process, if journal_reset() fails because the journal is
too short, jbd2_journal_load() fails with a NULL j_sb_buffer.
Subsequently, ocfs2_journal_shutdown() calls
jbd2_journal_flush()->jbd2_cleanup_journal_tail()->
__jbd2_update_log_tail()->jbd2_journal_update_sb_log_tail()
->lock_buffer(journal->j_sb_buffer), resulting in a null-pointer
dereference error.
To resolve this issue, we should check the JBD2_LOADED flag to ensure the
journal was properly loaded. Additionally, use journal instead of
osb->journal directly to simplify the code.
Link: https://syzkaller.appspot.com/bug?extid=05b9b39d8bdfe1a0861f
Link: https://lkml.kernel.org/r/20240902030844.422725-1-sunjunchao2870@gmail.com
Fixes: f6f50e28f0cb ("jbd2: Fail to load a journal if it is too short")
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reported-by: syzbot+05b9b39d8bdfe1a0861f@syzkaller.appspotmail.com
Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/journal.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/journal.c~ocfs2-fix-null-ptr-deref-when-journal-load-failed
+++ a/fs/ocfs2/journal.c
@@ -1055,7 +1055,7 @@ void ocfs2_journal_shutdown(struct ocfs2
if (!igrab(inode))
BUG();
- num_running_trans = atomic_read(&(osb->journal->j_num_trans));
+ num_running_trans = atomic_read(&(journal->j_num_trans));
trace_ocfs2_journal_shutdown(num_running_trans);
/* Do a commit_cache here. It will flush our journal, *and*
@@ -1074,9 +1074,10 @@ void ocfs2_journal_shutdown(struct ocfs2
osb->commit_task = NULL;
}
- BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0);
+ BUG_ON(atomic_read(&(journal->j_num_trans)) != 0);
- if (ocfs2_mount_local(osb)) {
+ if (ocfs2_mount_local(osb) &&
+ (journal->j_journal->j_flags & JBD2_LOADED)) {
jbd2_journal_lock_updates(journal->j_journal);
status = jbd2_journal_flush(journal->j_journal, 0);
jbd2_journal_unlock_updates(journal->j_journal);
_
Patches currently in -mm which might be from sunjunchao2870@gmail.com are
The mwifiex chips support simultaneous access point and station mode,
but this only works when all interfaces are using the same channel. The
downstream driver uses ECSA, which makes the access point automatically
switch to the channel the station is going to use. Until this is
implemented in the mwifiex driver, at least catch this situation and bail
out with an error.
Userspace doesn't have a meaningful way to figure out what went wrong,
so print an error message to give the user a clue.
Without this patch the driver would time out on the
HostCmd_CMD_802_11_ASSOCIATE command when creating a station with a
channel different from the one that an existing access point uses.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Cc: stable@vger.kernel.org
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 52 ++++++++++++++++++++++++
drivers/net/wireless/marvell/mwifiex/main.h | 1 +
drivers/net/wireless/marvell/mwifiex/sta_ioctl.c | 3 ++
3 files changed, 56 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 5697a02e6b8d3..0d3bf624cd3de 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -2054,6 +2054,55 @@ static int mwifiex_cfg80211_stop_ap(struct wiphy *wiphy, struct net_device *dev,
return 0;
}
+bool mwifiex_channel_conflict(struct mwifiex_private *priv, struct ieee80211_channel *ch)
+{
+ struct mwifiex_adapter *adapter = priv->adapter;
+ struct mwifiex_current_bss_params *bss_params;
+ u8 band;
+ int freq, i;
+
+ for (i = 0; i < adapter->priv_num; i++) {
+ struct mwifiex_private *p = adapter->priv[i];
+ struct ieee80211_channel *used = NULL;
+
+ if (p == priv)
+ continue;
+
+ switch (GET_BSS_ROLE(p)) {
+ case MWIFIEX_BSS_ROLE_UAP:
+ if (!netif_carrier_ok(p->netdev))
+ break;
+
+ if (!cfg80211_chandef_valid(&p->bss_chandef))
+ break;
+
+ used = p->bss_chandef.chan;
+
+ break;
+ case MWIFIEX_BSS_ROLE_STA:
+ if (!p->media_connected)
+ break;
+
+ bss_params = &p->curr_bss_params;
+ band = mwifiex_band_to_radio_type(bss_params->band);
+ freq = ieee80211_channel_to_frequency(bss_params->bss_descriptor.channel,
+ band);
+
+ used = ieee80211_get_channel(priv->wdev.wiphy, freq);
+
+ break;
+ }
+
+ if (used && !ieee80211_channel_equal(used, ch)) {
+ mwifiex_dbg(priv->adapter, MSG,
+ "all AP and STA must operate on same channel\n");
+ return false;
+ }
+ }
+
+ return true;
+}
+
/* cfg80211 operation handler for start_ap.
* Function sets beacon period, DTIM period, SSID and security into
* AP config structure.
@@ -2069,6 +2118,9 @@ static int mwifiex_cfg80211_start_ap(struct wiphy *wiphy,
if (GET_BSS_ROLE(priv) != MWIFIEX_BSS_ROLE_UAP)
return -1;
+ if (!mwifiex_channel_conflict(priv, params->chandef.chan))
+ return -EBUSY;
+
bss_cfg = kzalloc(sizeof(struct mwifiex_uap_bss_param), GFP_KERNEL);
if (!bss_cfg)
return -ENOMEM;
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 529863edd7a25..b68dbf884156b 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -1697,6 +1697,7 @@ int mwifiex_set_mac_address(struct mwifiex_private *priv,
struct net_device *dev,
bool external, u8 *new_mac);
void mwifiex_devdump_tmo_func(unsigned long function_context);
+bool mwifiex_channel_conflict(struct mwifiex_private *priv, struct ieee80211_channel *ch);
#ifdef CONFIG_DEBUG_FS
void mwifiex_debugfs_init(void);
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
index d3cba6895f8ce..9794816d8a0c6 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
@@ -291,6 +291,9 @@ int mwifiex_bss_start(struct mwifiex_private *priv, struct cfg80211_bss *bss,
if (!bss_desc)
return -1;
+ if (!mwifiex_channel_conflict(priv, bss->channel))
+ return -EBUSY;
+
if (mwifiex_band_to_radio_type(bss_desc->bss_band) ==
HostCmd_SCAN_RADIO_TYPE_BG) {
config_bands = BAND_B | BAND_G | BAND_GN;
---
base-commit: 67a72043aa2e6f60f7bbe7bfa598ba168f16d04f
change-id: 20240830-mwifiex-check-channel-f411a156bbe0
Best regards,
--
Sascha Hauer <s.hauer@pengutronix.de>
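The same-channel rule enforced by the mwifiex patch above can be sketched as a plain loop over interfaces. This is a userspace illustration only: the struct and function names are hypothetical, channels are compared by frequency in MHz rather than through cfg80211 channel objects, and "active" collapses the patch's separate checks for a started access point and a connected station. Unlike the helper in the patch, this sketch is named so that a true return means the channel may be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one virtual interface. */
struct example_iface {
	bool active;  /* AP started or station connected */
	int freq_mhz; /* channel currently in use, if active */
};

/*
 * Returns true when interface `self` may start using `freq_mhz`, that
 * is, when every other active interface already uses the same channel.
 */
bool example_channel_allowed(const struct example_iface *ifaces, size_t n,
			     size_t self, int freq_mhz)
{
	for (size_t i = 0; i < n; i++) {
		if (i == self)
			continue; /* skip the requesting interface */
		if (!ifaces[i].active)
			continue; /* idle interfaces impose no constraint */
		if (ifaces[i].freq_mhz != freq_mhz)
			return false;
	}
	return true;
}
```

With an access point active on 2437 MHz, a station asking for 2437 MHz is allowed and one asking for 5180 MHz is refused, which corresponds to the -EBUSY path in the patch.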
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9149c9b0c7e046273141e41eebd8a517416144ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090942-clunky-disobey-80a9@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
9149c9b0c7e0 ("usb: dwc3: core: update LC timer as per USB Spec V3.2")
63d7f9810a38 ("usb: dwc3: core: Enable GUCTL1 bit 10 for fixing termination error after resume bug")
843714bb37d9 ("usb: dwc3: Decouple USB 2.0 L1 & L2 events")
9af21dd6faeb ("usb: dwc3: Add support for DWC_usb32 IP")
8bb14308a869 ("usb: dwc3: core: Use role-switch default dr_mode")
d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit")
586f4335700f ("usb: dwc3: Fix GTXFIFOSIZ.TXFDEP macro name")
7ba6b09fda5e ("usb: dwc3: core: add support for disabling SS instances in park mode")
5eb5afb07853 ("usb: dwc3: use proper initializers for property entries")
9ba3aca8fe82 ("usb: dwc3: Disable phy suspend after power-on reset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9149c9b0c7e046273141e41eebd8a517416144ac Mon Sep 17 00:00:00 2001
From: Faisal Hassan <quic_faisalh(a)quicinc.com>
Date: Thu, 29 Aug 2024 15:15:02 +0530
Subject: [PATCH] usb: dwc3: core: update LC timer as per USB Spec V3.2
This fix addresses STAR 9001285599, which only affects DWC_usb3 version
3.20a. The timer value for PM_LC_TIMER in DWC_usb3 3.20a for the Link
ECN changes is incorrect. If the PM TIMER ECN is enabled via GUCTL2[19],
the link compliance test (TD7.21) may fail. If the ECN is not enabled
(GUCTL2[19] = 0), the controller will use the old timer value (5us),
which is still acceptable for the link compliance test. Therefore, clear
GUCTL2[19] to pass the USB link compliance test: TD 7.21.
Cc: stable(a)vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240829094502.26502-1-quic_faisalh@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index ccc3895dbd7f..9eb085f359ce 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1386,6 +1386,21 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
}
+ /*
+ * STAR 9001285599: This issue affects DWC_usb3 version 3.20a
+ * only. If the PM TIMER ECM is enabled through GUCTL2[19], the
+ * link compliance test (TD7.21) may fail. If the ECN is not
+ * enabled (GUCTL2[19] = 0), the controller will use the old timer
+ * value (5us), which is still acceptable for the link compliance
+ * test. Therefore, do not enable PM TIMER ECM in 3.20a by
+ * setting GUCTL2[19] by default; instead, use GUCTL2[19] = 0.
+ */
+ if (DWC3_VER_IS(DWC3, 320A)) {
+ reg = dwc3_readl(dwc->regs, DWC3_GUCTL2);
+ reg &= ~DWC3_GUCTL2_LC_TIMER;
+ dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
+ }
+
/*
* When configured in HOST mode, after issuing U3/L2 exit controller
* fails to send proper CRC checksum in CRC5 feild. Because of this
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..c71240e8f7c7 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -421,6 +421,7 @@
/* Global User Control Register 2 */
#define DWC3_GUCTL2_RST_ACTBITLATER BIT(14)
+#define DWC3_GUCTL2_LC_TIMER BIT(19)
/* Global User Control Register 3 */
#define DWC3_GUCTL3_SPLITDISABLE BIT(14)
@@ -1269,6 +1270,7 @@ struct dwc3 {
#define DWC3_REVISION_290A 0x5533290a
#define DWC3_REVISION_300A 0x5533300a
#define DWC3_REVISION_310A 0x5533310a
+#define DWC3_REVISION_320A 0x5533320a
#define DWC3_REVISION_330A 0x5533330a
#define DWC31_REVISION_ANY 0x0
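For readers following along, the read-modify-write that the fix performs can be sketched in user-space C. This is an illustration only: fake_regs, fake_readl(), fake_writel() and the register index are hypothetical stand-ins for the MMIO accessors, and only the two bit positions (14 and 19) are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)                  (UINT32_C(1) << (n))
#define GUCTL2_RST_ACTBITLATER  BIT(14)  /* bit position as in the patch */
#define GUCTL2_LC_TIMER         BIT(19)  /* bit position as in the patch */

#define NUM_FAKE_REGS 4

/* Hypothetical stand-in for a memory-mapped register file. */
static uint32_t fake_regs[NUM_FAKE_REGS];

static uint32_t fake_readl(unsigned int idx)
{
	assert(idx < NUM_FAKE_REGS);
	return fake_regs[idx];
}

static void fake_writel(unsigned int idx, uint32_t val)
{
	assert(idx < NUM_FAKE_REGS);
	fake_regs[idx] = val;
}

/* Clear only the LC timer bit; every other bit keeps its value. */
static void clear_lc_timer(unsigned int idx)
{
	uint32_t reg = fake_readl(idx);

	reg &= ~GUCTL2_LC_TIMER;
	fake_writel(idx, reg);
}
```

Masking with the complement of the bit, rather than writing a constant, is what leaves unrelated settings in the same register untouched.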
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9149c9b0c7e046273141e41eebd8a517416144ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090943-justness-geologist-75e9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
9149c9b0c7e0 ("usb: dwc3: core: update LC timer as per USB Spec V3.2")
63d7f9810a38 ("usb: dwc3: core: Enable GUCTL1 bit 10 for fixing termination error after resume bug")
843714bb37d9 ("usb: dwc3: Decouple USB 2.0 L1 & L2 events")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9149c9b0c7e046273141e41eebd8a517416144ac Mon Sep 17 00:00:00 2001
From: Faisal Hassan <quic_faisalh(a)quicinc.com>
Date: Thu, 29 Aug 2024 15:15:02 +0530
Subject: [PATCH] usb: dwc3: core: update LC timer as per USB Spec V3.2
This fix addresses STAR 9001285599, which only affects DWC_usb3 version
3.20a. The timer value for PM_LC_TIMER in DWC_usb3 3.20a for the Link
ECN changes is incorrect. If the PM TIMER ECN is enabled via GUCTL2[19],
the link compliance test (TD7.21) may fail. If the ECN is not enabled
(GUCTL2[19] = 0), the controller will use the old timer value (5us),
which is still acceptable for the link compliance test. Therefore, clear
GUCTL2[19] to pass the USB link compliance test: TD 7.21.
Cc: stable(a)vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240829094502.26502-1-quic_faisalh@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index ccc3895dbd7f..9eb085f359ce 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1386,6 +1386,21 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
}
+ /*
+ * STAR 9001285599: This issue affects DWC_usb3 version 3.20a
+ * only. If the PM TIMER ECM is enabled through GUCTL2[19], the
+ * link compliance test (TD7.21) may fail. If the ECN is not
+ * enabled (GUCTL2[19] = 0), the controller will use the old timer
+ * value (5us), which is still acceptable for the link compliance
+ * test. Therefore, do not enable PM TIMER ECM in 3.20a by
+ * setting GUCTL2[19] by default; instead, use GUCTL2[19] = 0.
+ */
+ if (DWC3_VER_IS(DWC3, 320A)) {
+ reg = dwc3_readl(dwc->regs, DWC3_GUCTL2);
+ reg &= ~DWC3_GUCTL2_LC_TIMER;
+ dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
+ }
+
/*
* When configured in HOST mode, after issuing U3/L2 exit controller
* fails to send proper CRC checksum in CRC5 feild. Because of this
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..c71240e8f7c7 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -421,6 +421,7 @@
/* Global User Control Register 2 */
#define DWC3_GUCTL2_RST_ACTBITLATER BIT(14)
+#define DWC3_GUCTL2_LC_TIMER BIT(19)
/* Global User Control Register 3 */
#define DWC3_GUCTL3_SPLITDISABLE BIT(14)
@@ -1269,6 +1270,7 @@ struct dwc3 {
#define DWC3_REVISION_290A 0x5533290a
#define DWC3_REVISION_300A 0x5533300a
#define DWC3_REVISION_310A 0x5533310a
+#define DWC3_REVISION_320A 0x5533320a
#define DWC3_REVISION_330A 0x5533330a
#define DWC31_REVISION_ANY 0x0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 96f9ab0d5933c1c00142dd052f259fce0bc3ced2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090907-pancreas-remodeler-f80d@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
96f9ab0d5933 ("iio: adc: ad7124: fix chip ID mismatch")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 96f9ab0d5933c1c00142dd052f259fce0bc3ced2 Mon Sep 17 00:00:00 2001
From: Dumitru Ceclan <mitrutzceclan(a)gmail.com>
Date: Wed, 31 Jul 2024 15:37:22 +0300
Subject: [PATCH] iio: adc: ad7124: fix chip ID mismatch
The ad7124_soft_reset() function assumes that the chip will
assert the "power-on reset" bit in the STATUS register after a software
reset without any delay. A POR bit value of 0 is used to check whether the
chip initialization is done.
A chip ID mismatch probe error appears intermittently when the probe
continues too soon and the ID register does not contain the expected
value.
Fix by adding a 200us delay after the software reset command is issued.
Fixes: b3af341bbd96 ("iio: adc: Add ad7124 support")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240731-ad7124-fix-v1-1-46a76aa4b9be@analog.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index 3beed78496c5..c0b82f64c976 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -764,6 +764,7 @@ static int ad7124_soft_reset(struct ad7124_state *st)
if (ret < 0)
return ret;
+ fsleep(200);
timeout = 100;
do {
ret = ad_sd_read_reg(&st->sd, AD7124_STATUS, 1, &readval);
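The ordering problem described in the commit message can be sketched with a toy model in user-space C. Everything here is hypothetical: struct fake_chip, its fields, the STATUS_POR_FLAG bit position and the helper names are made up for illustration, and setting `settled` merely stands in for the fsleep(200) added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_POR_FLAG 0x10u  /* assumed bit position, illustration only */

/* Toy model of a chip that needs some time before it asserts POR. */
struct fake_chip {
	bool settled;             /* true once the post-reset delay has elapsed */
	unsigned int busy_reads;  /* reads that still report POR after settling */
};

static uint8_t fake_read_status(struct fake_chip *chip)
{
	if (!chip->settled)
		return 0;         /* POR not asserted yet: looks "done" too early */
	if (chip->busy_reads > 0) {
		chip->busy_reads--;
		return STATUS_POR_FLAG;
	}
	return 0;
}

static bool fake_chip_ready(const struct fake_chip *chip)
{
	return chip->settled && chip->busy_reads == 0;
}

/* Poll until the POR flag reads back as 0; false means timeout. */
static bool wait_reset_done(struct fake_chip *chip, bool settle_first,
			    unsigned int max_tries)
{
	assert(chip != NULL);

	if (settle_first)
		chip->settled = true;  /* stands in for the added delay */

	while (max_tries--) {
		if (!(fake_read_status(chip) & STATUS_POR_FLAG))
			return true;
	}
	return false;
}
```

In this model, polling without the delay reports success while the chip is not actually ready, which matches the intermittent ID mismatch the commit message describes.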
On 9/9/24 16:36, Charles Keepax wrote:
> On Wed, Sep 04, 2024 at 04:52:28PM +0200, Krzysztof Kozlowski wrote:
>> This reverts commit ab8d66d132bc8f1992d3eb6cab8d32dda6733c84 because it
>> breaks codecs using non-continuous masks in source and sink ports. The
>> commit missed the point that port numbers are not used as indices for
>> iterating over prop.sink_ports or prop.source_ports.
>>
>> Soundwire core and existing codecs expect that the array passed as
>> prop.sink_ports and prop.source_ports is continuous. The port mask still
>> might be non-continuous, but that's unrelated.
>>
>> Reported-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
>> Closes: https://lore.kernel.org/all/b6c75eee-761d-44c8-8413-2a5b34ee2f98@linux.inte…
>> Fixes: ab8d66d132bc ("soundwire: stream: fix programming slave ports for non-continous port maps")
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
>>
>> ---
>
> Would be good to merge this as soon as we can, this is causing
> soundwire regressions from rc6 onwards.
the revert also needs to happen in -stable. 6.10.8 is broken as well.
https://github.com/thesofproject/linux/issues/5168
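The point made in the revert message, that the property array is continuous even when the port mask is not, can be sketched as follows. This is a simplified user-space illustration and not the SoundWire code: struct port_prop, popcount32() and find_port() are hypothetical names, with popcount32() playing the role that hweight32() plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct port_prop {
	unsigned int num;  /* port number this entry describes */
};

/* Count set bits in a 32-bit mask. */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * props is a packed array with one entry per set bit in mask, so the
 * valid indices are 0 .. popcount32(mask) - 1, regardless of which
 * bit positions are set.
 */
static const struct port_prop *find_port(const struct port_prop *props,
					 uint32_t mask, unsigned int port_num)
{
	unsigned int i, n = popcount32(mask);

	assert(props != NULL);

	for (i = 0; i < n; i++) {
		if (props[i].num == port_num)
			return &props[i];
	}
	return NULL;
}
```

With a mask of bits 2 and 5 the array has two entries; using the bit positions 2 and 5 as array indices would read past the end of it, which is the breakage the revert is about.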
These are a few patches broken out from [1]. Kalle requested to limit
the number of patches per series to approximately 12 and Francesco to
move the fixes to the front of the series, so here we go.
First two patches are fixes. First one is for host mlme support which
currently is in wireless-next, so no stable tag needed, second one has a
stable tag.
The remaining patches except the last one I have chosen to upstream
first. I'll continue with the other patches after having this series
in shape and merged.
The last one is a new patch not included in [1].
Sascha
[1] https://lore.kernel.org/all/20240820-mwifiex-cleanup-v1-0-320d8de4a4b7@peng…
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
---
Sascha Hauer (12):
wifi: mwifiex: add missing locking
wifi: mwifiex: fix MAC address handling
wifi: mwifiex: deduplicate code in mwifiex_cmd_tx_rate_cfg()
wifi: mwifiex: use adapter as context pointer for mwifiex_hs_activated_event()
wifi: mwifiex: drop unnecessary initialization
wifi: mwifiex: make region_code_mapping_t const
wifi: mwifiex: pass adapter to mwifiex_dnld_cmd_to_fw()
wifi: mwifiex: simplify mwifiex_setup_ht_caps()
wifi: mwifiex: fix indention
wifi: mwifiex: make locally used function static
wifi: mwifiex: move common settings out of switch/case
wifi: mwifiex: drop asynchronous init waiting code
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 38 ++++------
drivers/net/wireless/marvell/mwifiex/cfp.c | 4 +-
drivers/net/wireless/marvell/mwifiex/cmdevt.c | 76 +++++++-------------
drivers/net/wireless/marvell/mwifiex/init.c | 19 ++---
drivers/net/wireless/marvell/mwifiex/main.c | 94 +++++++++----------------
drivers/net/wireless/marvell/mwifiex/main.h | 16 ++---
drivers/net/wireless/marvell/mwifiex/sta_cmd.c | 49 ++++---------
drivers/net/wireless/marvell/mwifiex/txrx.c | 3 +-
drivers/net/wireless/marvell/mwifiex/util.c | 22 +-----
drivers/net/wireless/marvell/mwifiex/wmm.c | 12 ++--
10 files changed, 105 insertions(+), 228 deletions(-)
---
base-commit: 67a72043aa2e6f60f7bbe7bfa598ba168f16d04f
change-id: 20240826-mwifiex-cleanup-1-b5035c7faff6
Best regards,
--
Sascha Hauer <s.hauer(a)pengutronix.de>
It's incorrect to assume that LBR can/should only be used with sampling
events. The BPF subsystem provides the bpf_get_branch_snapshot() BPF helper,
which expects a properly set up and activated perf event that allows the
kernel to capture LBR data.
For instance, the retsnoop tool ([0]) makes extensive use of this
functionality and sets up a perf event as follows:
struct perf_event_attr attr;
memset(&attr, 0, sizeof(attr));
attr.size = sizeof(attr);
attr.type = PERF_TYPE_HARDWARE;
attr.config = PERF_COUNT_HW_CPU_CYCLES;
attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL;
The commit referenced in the Fixes tag broke this setup by making the invalid
assumption that LBR is useful only for sampling events. Remove that assumption.
Note, earlier we removed a similar assumption on the AMD side of LBR support,
see [1] for details.
[0] https://github.com/anakryiko/retsnoop
[1] 9794563d4d05 ("perf/x86/amd: Don't reject non-sampling events with configured LBR")
Cc: stable(a)vger.kernel.org # 6.8+
Fixes: 85846b27072d ("perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag")
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
---
arch/x86/events/intel/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 9e519d8a810a..f82a342b8852 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3972,7 +3972,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
x86_pmu.pebs_aliases(event);
}
- if (needs_branch_stack(event) && is_sampling_event(event))
+ if (needs_branch_stack(event))
event->hw.flags |= PERF_X86_EVENT_NEEDS_BRANCH_STACK;
if (branch_sample_counters(event)) {
--
2.43.5
On Mon, Sep 9, 2024 at 2:48 PM Sasha Levin <sashal(a)kernel.org> wrote:
> This is a note to let you know that I've just added the patch titled
>
> userfaultfd: fix checks for huge PMDs
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
Thanks for the backport!
Are you also doing the backport for older trees, or should someone
else take care of that?
The following commit has been merged into the core/debugobjects branch of tip:
Commit-ID: 684d28feb8546d1e9597aa363c3bfcf52fe250b7
Gitweb: https://git.kernel.org/tip/684d28feb8546d1e9597aa363c3bfcf52fe250b7
Author: Zhen Lei <thunder.leizhen(a)huawei.com>
AuthorDate: Wed, 04 Sep 2024 21:39:40 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 09 Sep 2024 16:40:25 +02:00
debugobjects: Fix conditions in fill_pool()
fill_pool() uses 'obj_pool_min_free' to decide whether objects should be
handed back to the kmem cache. But 'obj_pool_min_free' records the lowest
historical value of the number of objects in the object pool and not the
minimum number of objects which should be kept in the pool.
Use 'debug_objects_pool_min_level' instead, which holds the minimum number
which was scaled to the number of CPUs at boot time.
[ tglx: Massage change log ]
Fixes: d26bf5056fc0 ("debugobjects: Reduce number of pool_lock acquisitions in fill_pool()")
Fixes: 36c4ead6f6df ("debugobjects: Add global free list and the counter")
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240904133944.2124-3-thunder.leizhen@huawei.com
---
lib/debugobjects.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 7226fdb..6329a86 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -142,13 +142,14 @@ static void fill_pool(void)
* READ_ONCE()s pair with the WRITE_ONCE()s in pool_lock critical
* sections.
*/
- while (READ_ONCE(obj_nr_tofree) && (READ_ONCE(obj_pool_free) < obj_pool_min_free)) {
+ while (READ_ONCE(obj_nr_tofree) &&
+ READ_ONCE(obj_pool_free) < debug_objects_pool_min_level) {
raw_spin_lock_irqsave(&pool_lock, flags);
/*
* Recheck with the lock held as the worker thread might have
* won the race and freed the global free list already.
*/
- while (obj_nr_tofree && (obj_pool_free < obj_pool_min_free)) {
+ while (obj_nr_tofree && (obj_pool_free < debug_objects_pool_min_level)) {
obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
hlist_del(&obj->node);
WRITE_ONCE(obj_nr_tofree, obj_nr_tofree - 1);
Two bitmasks in 'struct sdw_slave_prop' - 'source_ports' and
'sink_ports' - define which ports to program in
sdw_program_slave_port_params(). The masks are used to get the
appropriate data port properties ('struct sdw_dpn_prop') from
an array.
Bitmasks can be non-continuous or can start from an index other than 0,
thus when looking for the matching port property for a given port, we must
iterate over the mask bits, not from 0 up to the number of ports.
This fixes allocation and programming of slave ports when the source or
sink mask starts from a higher index.
Fixes: f8101c74aa54 ("soundwire: Add Master and Slave port programming")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
drivers/soundwire/stream.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c
index 7aa4900dcf31..f275143d7b18 100644
--- a/drivers/soundwire/stream.c
+++ b/drivers/soundwire/stream.c
@@ -1291,18 +1291,18 @@ struct sdw_dpn_prop *sdw_get_slave_dpn_prop(struct sdw_slave *slave,
unsigned int port_num)
{
struct sdw_dpn_prop *dpn_prop;
- u8 num_ports;
+ unsigned long mask;
int i;
if (direction == SDW_DATA_DIR_TX) {
- num_ports = hweight32(slave->prop.source_ports);
+ mask = slave->prop.source_ports;
dpn_prop = slave->prop.src_dpn_prop;
} else {
- num_ports = hweight32(slave->prop.sink_ports);
+ mask = slave->prop.sink_ports;
dpn_prop = slave->prop.sink_dpn_prop;
}
- for (i = 0; i < num_ports; i++) {
+ for_each_set_bit(i, &mask, 32) {
if (dpn_prop[i].num == port_num)
return &dpn_prop[i];
}
--
2.43.0
This patch addresses an issue with improper reference count handling in the
ice_sriov_set_msix_vec_count() function.
First, the function calls ice_get_vf_by_id(), which increments the
reference count of the vf pointer. If the subsequent call to
ice_get_vf_vsi() fails, the function currently returns an error without
decrementing the reference count of the vf pointer, leading to a reference
count leak. The correct behavior, as implemented in this patch, is to
decrement the reference count using ice_put_vf(vf) before returning an
error when vsi is NULL.
Second, the function calls ice_sriov_get_irqs(), which sets
vf->first_vector_idx. If this call returns a negative value, indicating an
error, the function returns an error without decrementing the reference
count of the vf pointer, resulting in another reference count leak. The
patch addresses this by adding a call to ice_put_vf(vf) before returning
an error when vf->first_vector_idx < 0.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and identifying potential mismanagement of reference counts. In this case,
the tool flagged the missing decrement operation as a potential issue,
leading to this patch.
Fixes: 4035c72dc1ba ("ice: reconfig host after changing MSI-X on VF")
Fixes: 4d38cb44bd32 ("ice: manage VFs MSI-X using resource tracking")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)outlook.com>
---
v2:
* In this patch v2, an additional resource leak was addressed when
vf->first_vector_idx < 0. The issue is now fixed by adding ice_put_vf(vf)
before returning an error.
Thanks to Simon Horman for identifying this additional leak scenario.
---
drivers/net/ethernet/intel/ice/ice_sriov.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 55ef33208456..fbf18ac97875 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1096,8 +1096,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
return -ENOENT;
vsi = ice_get_vf_vsi(vf);
- if (!vsi)
+ if (!vsi) {
+ ice_put_vf(vf);
return -ENOENT;
+ }
prev_msix = vf->num_msix;
prev_queues = vf->num_vf_qs;
@@ -1142,8 +1144,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
vf->num_msix = prev_msix;
vf->num_vf_qs = prev_queues;
vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
- if (vf->first_vector_idx < 0)
+ if (vf->first_vector_idx < 0) {
+ ice_put_vf(vf);
return -EINVAL;
+ }
if (needs_rebuild) {
ice_vf_reconfig_vsi(vf);
--
2.25.1
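The leak pattern fixed above (a reference taken at the top of a function and not dropped on an early error return) can be sketched generically. This is a user-space illustration with hypothetical names (struct obj, obj_get(), obj_put(), configure()), not the ice driver code, and it uses a single exit label where the patch instead adds a put before each return.

```c
#include <assert.h>
#include <stddef.h>

struct obj {
	int refcount;
};

/* Take a reference; returns NULL if there is no object. */
static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcount++;
	return o;
}

static void obj_put(struct obj *o)
{
	assert(o != NULL && o->refcount > 0);
	o->refcount--;
}

/*
 * Hypothetical operation with two steps that can fail.
 * fail_step selects which step fails (0 means none).
 */
static int configure(struct obj *o, int fail_step)
{
	struct obj *ref = obj_get(o);
	int ret = 0;

	if (!ref)
		return -1;  /* nothing acquired, nothing to release */

	if (fail_step == 1) {
		ret = -2;
		goto out_put;
	}
	if (fail_step == 2) {
		ret = -3;
		goto out_put;
	}

out_put:
	obj_put(ref);  /* every path after the get drops the reference */
	return ret;
}
```

Whichever shape is used, the invariant to check is the same: the reference count after the call equals the count before it on every return path.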
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Without the locking amdgpu currently can race
amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(), leading to the
latter accessing a potentially inconsistent entity->sched_list and
entity->num_sched_list pair.
The comment on drm_sched_entity_modify_sched() however says:
"""
* Note that this must be called under the same common lock for @entity as
* drm_sched_job_arm() and drm_sched_entity_push_job(), or the driver needs to
* guarantee through some other means that this is never called while new jobs
* can be pushed to @entity.
"""
It is unclear if that is referring to this race or something else.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
---
drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..ae8be30472cd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
{
WARN_ON(!num_sched_list || !sched_list);
+ spin_lock(&entity->rq_lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
+ spin_unlock(&entity->rq_lock);
}
EXPORT_SYMBOL(drm_sched_entity_modify_sched);
--
2.46.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 9972605a238339b85bd16b084eed5f18414d22db
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081211-owl-snowdrop-d2aa@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
9972605a2383 ("memcg: protect concurrent access to mem_cgroup_idr")
6f0df8e16eb5 ("memcontrol: ensure memcg acquired by id is properly set up")
e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
7348cc91821b ("mm: multi-gen LRU: remove aging fairness safeguard")
a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
adb8213014b2 ("mm: memcg: fix stale protection of reclaim target memcg")
57e9cc50f4dd ("mm: vmscan: split khugepaged stats from direct reclaim stats")
e4fea72b1438 ("mglru: mm/vmscan.c: fix imprecise comments")
d396def5d86d ("memcg: rearrange code")
410f8e82689e ("memcg: extract memcg_vmstats from struct mem_cgroup")
d6c3af7d8a2b ("mm: multi-gen LRU: debugfs interface")
1332a809d95a ("mm: multi-gen LRU: thrashing prevention")
354ed5974429 ("mm: multi-gen LRU: kill switch")
f76c83378851 ("mm: multi-gen LRU: optimize multiple memcgs")
bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap")
ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
ec1c86b25f4b ("mm: multi-gen LRU: groundwork")
f1e1a7be4718 ("mm/vmscan.c: refactor shrink_node()")
d3629af59f41 ("mm/vmscan: make the annotations of refaults code at the right place")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9972605a238339b85bd16b084eed5f18414d22db Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Date: Fri, 2 Aug 2024 16:58:22 -0700
Subject: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different parts of the
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence was found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.
Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 960371788687..f29157288b7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3386,11 +3386,28 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg)
#define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1)
static DEFINE_IDR(mem_cgroup_idr);
+static DEFINE_SPINLOCK(memcg_idr_lock);
+
+static int mem_cgroup_alloc_id(void)
+{
+ int ret;
+
+ idr_preload(GFP_KERNEL);
+ spin_lock(&memcg_idr_lock);
+ ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1,
+ GFP_NOWAIT);
+ spin_unlock(&memcg_idr_lock);
+ idr_preload_end();
+ return ret;
+}
static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
if (memcg->id.id > 0) {
+ spin_lock(&memcg_idr_lock);
idr_remove(&mem_cgroup_idr, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
+
memcg->id.id = 0;
}
}
@@ -3524,8 +3541,7 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
if (!memcg)
return ERR_PTR(error);
- memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
+ memcg->id.id = mem_cgroup_alloc_id();
if (memcg->id.id < 0) {
error = memcg->id.id;
goto fail;
@@ -3667,7 +3683,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
* publish it here at the end of onlining. This matches the
* regular ID destruction during offlining.
*/
+ spin_lock(&memcg_idr_lock);
idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
return 0;
offline_kmem:
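The idea of the fix, that allocation, replacement and removal in a shared ID space all take the same lock, can be sketched in user-space C. This is not the kernel IDR: the fixed-size table, the C11 atomic_flag spinlock and all names (id_lock, id_alloc(), id_replace(), id_remove(), id_find()) are hypothetical and chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define ID_MAX 8  /* hypothetical small ID space; ID 0 means "none" */

static atomic_flag id_lock = ATOMIC_FLAG_INIT;
static void *id_table[ID_MAX + 1];
static unsigned char id_used[ID_MAX + 1];

static void id_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&id_lock,
						 memory_order_acquire))
		;  /* spin until the flag was previously clear */
}

static void id_lock_release(void)
{
	atomic_flag_clear_explicit(&id_lock, memory_order_release);
}

/* Reserve the lowest free ID with a NULL payload; -1 when full. */
static int id_alloc(void)
{
	int id, ret = -1;

	id_lock_acquire();
	for (id = 1; id <= ID_MAX; id++) {
		if (!id_used[id]) {
			id_used[id] = 1;
			id_table[id] = NULL;
			ret = id;
			break;
		}
	}
	id_lock_release();
	return ret;
}

/* Publish the payload for an already reserved ID. */
static void id_replace(int id, void *ptr)
{
	assert(id >= 1 && id <= ID_MAX);
	id_lock_acquire();
	if (id_used[id])
		id_table[id] = ptr;
	id_lock_release();
}

/* Release an ID so it can be handed out again. */
static void id_remove(int id)
{
	assert(id >= 1 && id <= ID_MAX);
	id_lock_acquire();
	id_used[id] = 0;
	id_table[id] = NULL;
	id_lock_release();
}

static void *id_find(int id)
{
	void *ptr;

	assert(id >= 1 && id <= ID_MAX);
	id_lock_acquire();
	ptr = id_table[id];
	id_lock_release();
	return ptr;
}
```

If id_remove() skipped the lock while id_alloc() held it, two callers could observe the same slot as free, which is the kind of duplicate-ID outcome the commit message reports.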
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 9972605a238339b85bd16b084eed5f18414d22db
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081259-plow-freezing-a93e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
9972605a2383 ("memcg: protect concurrent access to mem_cgroup_idr")
6f0df8e16eb5 ("memcontrol: ensure memcg acquired by id is properly set up")
e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
7348cc91821b ("mm: multi-gen LRU: remove aging fairness safeguard")
a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
adb8213014b2 ("mm: memcg: fix stale protection of reclaim target memcg")
57e9cc50f4dd ("mm: vmscan: split khugepaged stats from direct reclaim stats")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9972605a238339b85bd16b084eed5f18414d22db Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Date: Fri, 2 Aug 2024 16:58:22 -0700
Subject: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However, idr_remove() for mem_cgroup_idr
was not protected against concurrency and can run concurrently for
different memcgs when their refcnt drops to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different parts of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence was found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have the same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would clean up list_lrus on the system for all of them. Later access from
other memcgs to the list_lru causes crashes due to missing list_lru_one.
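As a rough userspace illustration of the locking rule described above (not
the kernel code itself), the sketch below serializes allocation, publishing
and removal of IDs behind one lock. Every name in it is hypothetical: a
pthread mutex stands in for the spinlock, a fixed array stands in for the
IDR, and ID_MAX is an arbitrary small limit. The lookup also takes the lock
here purely to keep the sketch simple.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ID_MAX 64	/* hypothetical limit, stands in for MEM_CGROUP_ID_MAX */

/* Hypothetical stand-in for the IDR; slot 0 is never handed out. */
static void *id_table[ID_MAX + 1];
static unsigned char id_used[ID_MAX + 1];
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reserve the lowest free ID with a NULL pointer; returns -1 when full. */
static int id_alloc(void)
{
	int id, ret = -1;

	pthread_mutex_lock(&id_lock);
	for (id = 1; id <= ID_MAX; id++) {
		if (!id_used[id]) {
			id_used[id] = 1;
			id_table[id] = NULL;
			ret = id;
			break;
		}
	}
	pthread_mutex_unlock(&id_lock);
	return ret;
}

/* Publish the object for an already reserved ID (the idr_replace() step). */
static void id_publish(int id, void *obj)
{
	assert(id >= 1 && id <= ID_MAX);
	pthread_mutex_lock(&id_lock);
	id_table[id] = obj;
	pthread_mutex_unlock(&id_lock);
}

/* Release an ID; takes the same lock as allocation and publishing. */
static void id_remove(int id)
{
	if (id < 1 || id > ID_MAX)
		return;
	pthread_mutex_lock(&id_lock);
	id_used[id] = 0;
	id_table[id] = NULL;
	pthread_mutex_unlock(&id_lock);
}

/* Return the published object for an ID, or NULL if none is published. */
static void *id_lookup(int id)
{
	void *obj = NULL;

	if (id >= 1 && id <= ID_MAX) {
		pthread_mutex_lock(&id_lock);
		obj = id_table[id];
		pthread_mutex_unlock(&id_lock);
	}
	return obj;
}
```

Because every modification goes through id_lock, two removals, or a removal
racing with an allocation, cannot leave two live objects holding the same ID,
which is the failure mode suspected above.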
Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 960371788687..f29157288b7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3386,11 +3386,28 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg)
#define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1)
static DEFINE_IDR(mem_cgroup_idr);
+static DEFINE_SPINLOCK(memcg_idr_lock);
+
+static int mem_cgroup_alloc_id(void)
+{
+ int ret;
+
+ idr_preload(GFP_KERNEL);
+ spin_lock(&memcg_idr_lock);
+ ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1,
+ GFP_NOWAIT);
+ spin_unlock(&memcg_idr_lock);
+ idr_preload_end();
+ return ret;
+}
static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
if (memcg->id.id > 0) {
+ spin_lock(&memcg_idr_lock);
idr_remove(&mem_cgroup_idr, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
+
memcg->id.id = 0;
}
}
@@ -3524,8 +3541,7 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
if (!memcg)
return ERR_PTR(error);
- memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
+ memcg->id.id = mem_cgroup_alloc_id();
if (memcg->id.id < 0) {
error = memcg->id.id;
goto fail;
@@ -3667,7 +3683,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
* publish it here at the end of onlining. This matches the
* regular ID destruction during offlining.
*/
+ spin_lock(&memcg_idr_lock);
idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
+ spin_unlock(&memcg_idr_lock);
return 0;
offline_kmem:
From: Filipe Manana <fdmanana(a)suse.com>
commit cd9253c23aedd61eb5ff11f37a36247cd46faf86 upstream.
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failure when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's vfs lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's vfs lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's vfs lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
Aug 21 11:46:43 kerberos kernel: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
Aug 21 11:46:43 kerberos kernel: ------------[ cut here ]------------
Aug 21 11:46:43 kerberos kernel: kernel BUG at fs/btrfs/ordered-data.c:983!
Aug 21 11:46:43 kerberos kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Aug 21 11:46:43 kerberos kernel: CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Aug 21 11:46:43 kerberos kernel: Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
Aug 21 11:46:43 kerberos kernel: RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Aug 21 11:46:43 kerberos kernel: Code: 50 d6 86 c0 e8 (...)
Aug 21 11:46:43 kerberos kernel: RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
Aug 21 11:46:43 kerberos kernel: RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
Aug 21 11:46:43 kerberos kernel: RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
Aug 21 11:46:43 kerberos kernel: RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
Aug 21 11:46:43 kerberos kernel: R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
Aug 21 11:46:43 kerberos kernel: R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
Aug 21 11:46:43 kerberos kernel: FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
Aug 21 11:46:43 kerberos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 11:46:43 kerberos kernel: CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Aug 21 11:46:43 kerberos kernel: Call Trace:
Aug 21 11:46:43 kerberos kernel: <TASK>
Aug 21 11:46:43 kerberos kernel: ? __die_body.cold+0x14/0x24
Aug 21 11:46:43 kerberos kernel: ? die+0x2e/0x50
Aug 21 11:46:43 kerberos kernel: ? do_trap+0xca/0x110
Aug 21 11:46:43 kerberos kernel: ? do_error_trap+0x6a/0x90
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? exc_invalid_op+0x50/0x70
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? __seccomp_filter+0x31d/0x4f0
Aug 21 11:46:43 kerberos kernel: __x64_sys_fdatasync+0x4f/0x90
Aug 21 11:46:43 kerberos kernel: do_syscall_64+0x82/0x160
Aug 21 11:46:43 kerberos kernel: ? do_futex+0xcb/0x190
Aug 21 11:46:43 kerberos kernel: ? __x64_sys_futex+0x10e/0x1d0
Aug 21 11:46:43 kerberos kernel: ? switch_fpu_return+0x4f/0xd0
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of task A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
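The idea can be sketched in userspace C, under loud assumptions: a
thread-local pointer stands in for current->journal_info, and the names
task_journal_info, DIO_WRITE_STUB, sync_file_skips_lock() and
direct_write_then_sync() are hypothetical and not part of btrfs. The point
shown is only that a per-task marker is invisible to other tasks, unlike
state hung off the shared file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel, mirroring the "special value" idea above. */
#define DIO_WRITE_STUB ((void *)1)

/* Per-thread slot; stands in for current->journal_info in this sketch. */
static _Thread_local void *task_journal_info;

/*
 * fsync side: skip taking the lock only if *this* thread set the marker.
 * The marker is consumed so nested code sees a clean slot.
 */
static bool sync_file_skips_lock(void)
{
	bool skip = false;

	if (task_journal_info == DIO_WRITE_STUB) {
		skip = true;
		task_journal_info = NULL;
	}
	return skip;
}

/*
 * Write side: set the marker around the call that may trigger an fsync
 * in the same thread, then clear it again. Returns what the fsync saw.
 */
static bool direct_write_then_sync(void)
{
	bool skipped;

	assert(task_journal_info == NULL);
	task_journal_info = DIO_WRITE_STUB;
	skipped = sync_file_skips_lock();
	task_journal_info = NULL;
	return skipped;
}
```

An fsync issued from any other thread reads its own, untouched slot and so
takes the lock as usual, and no pointer to another thread's stack is ever
shared.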
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create writer thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/file.c | 25 ++++++++++---------------
fs/btrfs/transaction.h | 6 ++++++
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 853b1f96b1fd..cca1acf2e037 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1553,7 +1553,6 @@ struct btrfs_drop_extents_args {
struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
- bool fsync_skip_inode_lock;
};
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e23d178f9778..c8231677c79e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1534,13 +1534,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchoronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -1549,13 +1542,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
err = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
@@ -1795,7 +1785,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1805,7 +1794,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 970ff316069d..8b88446df36d 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -11,6 +11,12 @@
#include "delayed-ref.h"
#include "ctree.h"
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
enum btrfs_trans_state {
TRANS_STATE_RUNNING,
TRANS_STATE_COMMIT_START,
--
2.43.0
From: Filipe Manana <fdmanana(a)suse.com>
commit cd9253c23aedd61eb5ff11f37a36247cd46faf86 upstream.
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failure when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's vfs lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's vfs lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's vfs lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
Aug 21 11:46:43 kerberos kernel: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
Aug 21 11:46:43 kerberos kernel: ------------[ cut here ]------------
Aug 21 11:46:43 kerberos kernel: kernel BUG at fs/btrfs/ordered-data.c:983!
Aug 21 11:46:43 kerberos kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Aug 21 11:46:43 kerberos kernel: CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Aug 21 11:46:43 kerberos kernel: Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
Aug 21 11:46:43 kerberos kernel: RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Aug 21 11:46:43 kerberos kernel: Code: 50 d6 86 c0 e8 (...)
Aug 21 11:46:43 kerberos kernel: RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
Aug 21 11:46:43 kerberos kernel: RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
Aug 21 11:46:43 kerberos kernel: RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
Aug 21 11:46:43 kerberos kernel: RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
Aug 21 11:46:43 kerberos kernel: R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
Aug 21 11:46:43 kerberos kernel: R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
Aug 21 11:46:43 kerberos kernel: FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
Aug 21 11:46:43 kerberos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 11:46:43 kerberos kernel: CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Aug 21 11:46:43 kerberos kernel: Call Trace:
Aug 21 11:46:43 kerberos kernel: <TASK>
Aug 21 11:46:43 kerberos kernel: ? __die_body.cold+0x14/0x24
Aug 21 11:46:43 kerberos kernel: ? die+0x2e/0x50
Aug 21 11:46:43 kerberos kernel: ? do_trap+0xca/0x110
Aug 21 11:46:43 kerberos kernel: ? do_error_trap+0x6a/0x90
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? exc_invalid_op+0x50/0x70
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? __seccomp_filter+0x31d/0x4f0
Aug 21 11:46:43 kerberos kernel: __x64_sys_fdatasync+0x4f/0x90
Aug 21 11:46:43 kerberos kernel: do_syscall_64+0x82/0x160
Aug 21 11:46:43 kerberos kernel: ? do_futex+0xcb/0x190
Aug 21 11:46:43 kerberos kernel: ? __x64_sys_futex+0x10e/0x1d0
Aug 21 11:46:43 kerberos kernel: ? switch_fpu_return+0x4f/0xd0
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of task A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create writer thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/file.c | 25 ++++++++++---------------
fs/btrfs/transaction.h | 6 ++++++
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 86c7f8ce1715..06333a74d6c4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -445,7 +445,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 952cf145c629..15fd8c00f4c0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1543,13 +1543,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
err = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchoronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -1558,13 +1551,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
err = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
@@ -1796,7 +1786,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1806,7 +1795,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 238a0ab85df9..7623db359881 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -12,6 +12,12 @@
#include "ctree.h"
#include "misc.h"
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
--
2.43.0
From: Filipe Manana <fdmanana(a)suse.com>
commit cd9253c23aedd61eb5ff11f37a36247cd46faf86 upstream.
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failure when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's vfs lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's vfs lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's vfs lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
Aug 21 11:46:43 kerberos kernel: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
Aug 21 11:46:43 kerberos kernel: ------------[ cut here ]------------
Aug 21 11:46:43 kerberos kernel: kernel BUG at fs/btrfs/ordered-data.c:983!
Aug 21 11:46:43 kerberos kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Aug 21 11:46:43 kerberos kernel: CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Aug 21 11:46:43 kerberos kernel: Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
Aug 21 11:46:43 kerberos kernel: RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Aug 21 11:46:43 kerberos kernel: Code: 50 d6 86 c0 e8 (...)
Aug 21 11:46:43 kerberos kernel: RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
Aug 21 11:46:43 kerberos kernel: RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
Aug 21 11:46:43 kerberos kernel: RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
Aug 21 11:46:43 kerberos kernel: RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
Aug 21 11:46:43 kerberos kernel: R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
Aug 21 11:46:43 kerberos kernel: R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
Aug 21 11:46:43 kerberos kernel: FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
Aug 21 11:46:43 kerberos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 11:46:43 kerberos kernel: CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Aug 21 11:46:43 kerberos kernel: Call Trace:
Aug 21 11:46:43 kerberos kernel: <TASK>
Aug 21 11:46:43 kerberos kernel: ? __die_body.cold+0x14/0x24
Aug 21 11:46:43 kerberos kernel: ? die+0x2e/0x50
Aug 21 11:46:43 kerberos kernel: ? do_trap+0xca/0x110
Aug 21 11:46:43 kerberos kernel: ? do_error_trap+0x6a/0x90
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? exc_invalid_op+0x50/0x70
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? asm_exc_invalid_op+0x1a/0x20
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
Aug 21 11:46:43 kerberos kernel: ? __seccomp_filter+0x31d/0x4f0
Aug 21 11:46:43 kerberos kernel: __x64_sys_fdatasync+0x4f/0x90
Aug 21 11:46:43 kerberos kernel: do_syscall_64+0x82/0x160
Aug 21 11:46:43 kerberos kernel: ? do_futex+0xcb/0x190
Aug 21 11:46:43 kerberos kernel: ? __x64_sys_futex+0x10e/0x1d0
Aug 21 11:46:43 kerberos kernel: ? switch_fpu_return+0x4f/0xd0
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: ? syscall_exit_to_user_mode+0x72/0x220
Aug 21 11:46:43 kerberos kernel: ? do_syscall_64+0x8e/0x160
Aug 21 11:46:43 kerberos kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished: since the private was allocated on the stack
of task A, this results in an invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
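As a rough userspace illustration of the approach described above (not the
kernel code itself), the sketch below passes the "skip the inode lock" hint
through a per-thread slot instead of through shared per-file state. The
names journal_info, DIO_WRITE_STUB, sync_file() and dio_complete() are
hypothetical stand-ins for current->journal_info,
BTRFS_TRANS_DIO_WRITE_STUB, btrfs_sync_file() and the direct IO completion
path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for current->journal_info: one slot per thread. */
static _Thread_local void *journal_info;

/* Marker value that can never be a valid pointer. */
#define DIO_WRITE_STUB ((void *)1)

/* Stand-in for fsync: reports whether it would skip taking the inode lock. */
static bool sync_file(void)
{
	bool skip_ilock = false;

	if (journal_info == DIO_WRITE_STUB) {
		skip_ilock = true;
		/* Clear the marker so code called from here never sees it. */
		journal_info = NULL;
	}
	return skip_ilock;
}

/* Stand-in for the direct IO completion path, which already holds the lock. */
static bool dio_complete(void)
{
	bool skipped;

	/* The slot must be free, i.e. no transaction handle is held. */
	assert(journal_info == NULL);
	journal_info = DIO_WRITE_STUB;
	skipped = sync_file();
	journal_info = NULL;
	return skipped;
}
```

Because the slot belongs to the calling thread, a concurrent fsync from
another thread on the same file never observes the marker, which is the
property the shared file private lacked.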
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf += ret;
offset += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create fsync thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/ctree.h | 1 -
fs/btrfs/file.c | 25 ++++++++++---------------
fs/btrfs/transaction.h | 6 ++++++
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a56209d275c1..b2e4b30b8fae 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -457,7 +457,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ca434f0cd27f..66dfee873906 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1558,13 +1558,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
ret = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchoronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -1573,13 +1566,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
ret = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
@@ -1811,7 +1801,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_inode(dentry);
struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
@@ -1821,7 +1810,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 4e451ab173b1..62ec85f4b777 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -27,6 +27,12 @@ struct btrfs_root_item;
struct btrfs_root;
struct btrfs_path;
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
--
2.43.0
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.
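A minimal userspace sketch of the idea, assuming simplified stand-in types
(struct sched, struct run_queue, struct entity and push_first_job() are
hypothetical and are not the DRM scheduler API): the run queue pointer is
read once while the lock is held, and that snapshot is used for both the
add and the wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sched {
	int wakeups;
};

struct run_queue {
	struct sched *sched;
	int entities;
};

struct entity {
	pthread_mutex_t rq_lock;
	struct run_queue *rq;	/* may be replaced by another thread under rq_lock */
};

/* Returns the run queue that was used for both the add and the wakeup. */
static struct run_queue *push_first_job(struct entity *entity)
{
	struct run_queue *rq;

	assert(entity != NULL);

	pthread_mutex_lock(&entity->rq_lock);
	rq = entity->rq;	/* single read, taken under the lock */
	rq->entities++;		/* "add the entity" to that run queue */
	pthread_mutex_unlock(&entity->rq_lock);

	/*
	 * entity->rq may change from here on, so wake the scheduler that
	 * owns the snapshot rather than re-reading entity->rq.
	 */
	rq->sched->wakeups++;
	return rq;
}
```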
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
---
drivers/gpu/drm/scheduler/sched_entity.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index ae8be30472cd..62b07ef7630a 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
+ struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +610,15 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
return;
}
- drm_sched_rq_add_entity(entity->rq, entity);
+ rq = entity->rq;
+
+ drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(rq->sched, entity);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
--
2.46.0
Memory access #VEs are hard for Linux to handle in contexts like the
entry code or NMIs. But other OSes need them for functionality.
There's a static (pre-guest-boot) way for a VMM to choose one or the
other. But VMMs don't always know which OS they are booting, so they
choose to deliver those #VEs so the "other" OSes will work. That,
unfortunately, has left us in the lurch and exposed to these
hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static
configuration is set to "send nasty #VEs", the kernel can dynamically
request that they be disabled. Once they are disabled, access to private
memory that is not in the Mapped state in the Secure-EPT (SEPT) will
result in an exit to the VMM rather than injecting a #VE.
Check if the feature is available and disable SEPT #VE if possible.
If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
attribute is no longer reliable. It reflects the initial state of the
control for the TD, but it will not be updated if someone (e.g. bootloader)
changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
determine if SEPT #VEs are enabled or disabled.
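The decision flow described above can be sketched as a pure function. This
is only an illustration: struct td_state, enum sept_ve_action and
decide_sept_ve() are hypothetical names, and the booleans stand in for the
ATTR_DEBUG, ATTR_SEPT_VE_DISABLE, TDCS_CONFIG_FLEXIBLE_PENDING_VE and
TD_CTLS_PENDING_VE_DISABLE bits rather than reading real TD metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sept_ve_action {
	SEPT_VE_NOTHING,	/* already safe, or deliberately left alone */
	SEPT_VE_WARN,		/* debug TD with a bad static configuration */
	SEPT_VE_PANIC,		/* bad static configuration, cannot be fixed */
	SEPT_VE_DISABLE,	/* ask the TDX module to disable SEPT #VEs */
};

struct td_state {
	bool debug;			/* stands in for ATTR_DEBUG */
	bool attr_ve_disable;		/* stands in for ATTR_SEPT_VE_DISABLE */
	bool flexible_pending_ve;	/* guest may change the control */
	bool ctls_ve_disabled;		/* control already set to "disabled" */
};

static enum sept_ve_action decide_sept_ve(const struct td_state *td)
{
	assert(td != NULL);

	if (!td->flexible_pending_ve) {
		/* No guest control: only the static attribute matters. */
		if (td->attr_ve_disable)
			return SEPT_VE_NOTHING;
		return td->debug ? SEPT_VE_WARN : SEPT_VE_PANIC;
	}

	/* Guest control exists: the attribute is no longer reliable. */
	if (td->ctls_ve_disabled)
		return SEPT_VE_NOTHING;

	/* Keep #VEs enabled in debug TDs so they can produce splats. */
	if (td->debug)
		return SEPT_VE_NOTHING;

	return SEPT_VE_DISABLE;
}
```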
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
Cc: stable(a)vger.kernel.org
Acked-by: Kai Huang <kai.huang(a)intel.com>
---
arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++-------
arch/x86/include/asm/shared/tdx.h | 10 +++-
2 files changed, 69 insertions(+), 17 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 08ce488b54d0..f969f4f5ebf8 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -78,7 +78,7 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
}
/* Read TD-scoped metadata */
-static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
+static inline u64 tdg_vm_rd(u64 field, u64 *value)
{
struct tdx_module_args args = {
.rdx = field,
@@ -193,6 +193,62 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
+/*
+ * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure
+ * that no #VE will be delivered for accesses to TD-private memory.
+ *
+ * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM
+ * controls if the guest will receive such #VE with TD attribute
+ * ATTR_SEPT_VE_DISABLE.
+ *
+ * Newer TDX modules allow the guest to control if it wants to receive SEPT
+ * violation #VEs.
+ *
+ * Check if the feature is available and disable SEPT #VE if possible.
+ *
+ * If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
+ * attribute is no longer reliable. It reflects the initial state of the
+ * control for the TD, but it will not be updated if someone (e.g. bootloader)
+ * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
+ * determine if SEPT #VEs are enabled or disabled.
+ */
+static void disable_sept_ve(u64 td_attr)
+{
+ const char *msg = "TD misconfiguration: SEPT #VE has to be disabled";
+ bool debug = td_attr & ATTR_DEBUG;
+ u64 config, controls;
+
+ /* Is this TD allowed to disable SEPT #VE */
+ tdg_vm_rd(TDCS_CONFIG_FLAGS, &config);
+ if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) {
+ /* No SEPT #VE controls for the guest: check the attribute */
+ if (td_attr & ATTR_SEPT_VE_DISABLE)
+ return;
+
+ /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */
+ if (debug)
+ pr_warn("%s\n", msg);
+ else
+ tdx_panic(msg);
+ return;
+ }
+
+ /* Check if SEPT #VE has been disabled before us */
+ tdg_vm_rd(TDCS_TD_CTLS, &controls);
+ if (controls & TD_CTLS_PENDING_VE_DISABLE)
+ return;
+
+ /* Keep #VEs enabled for splats in debugging environments */
+ if (debug)
+ return;
+
+ /* Disable SEPT #VEs */
+ tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE,
+ TD_CTLS_PENDING_VE_DISABLE);
+
+ return;
+}
+
static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
@@ -218,24 +274,12 @@ static void tdx_setup(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ td_attr = args.rdx;
+
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
- /*
- * The kernel can not handle #VE's when accessing normal kernel
- * memory. Ensure that no #VE will be delivered for accesses to
- * TD-private memory. Only VMM-shared memory (MMIO) will #VE.
- */
- td_attr = args.rdx;
- if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
- const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";
-
- /* Relax SEPT_VE_DISABLE check for debug TD. */
- if (td_attr & ATTR_DEBUG)
- pr_warn("%s\n", msg);
- else
- tdx_panic(msg);
- }
+ disable_sept_ve(td_attr);
}
/*
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 7e12cfa28bec..fecb2a6e864b 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -19,9 +19,17 @@
#define TDG_VM_RD 7
#define TDG_VM_WR 8
-/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
+/* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */
+#define TDCS_CONFIG_FLAGS 0x1110000300000016
+#define TDCS_TD_CTLS 0x1110000300000017
#define TDCS_NOTIFY_ENABLES 0x9100000000000010
+/* TDCS_CONFIG_FLAGS bits */
+#define TDCS_CONFIG_FLEXIBLE_PENDING_VE BIT_ULL(1)
+
+/* TDCS_TD_CTLS bits */
+#define TD_CTLS_PENDING_VE_DISABLE BIT_ULL(0)
+
/* TDX hypercall Leaf IDs */
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_GET_QUOTE 0x10002
--
2.45.2
Rename tdx_parse_tdinfo() to tdx_setup() and move setting NOTIFY_ENABLES
there.
The function will be extended to adjust TD configuration.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 64717a96a936..08ce488b54d0 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -193,7 +193,7 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
-static void tdx_parse_tdinfo(u64 *cc_mask)
+static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
unsigned int gpa_width;
@@ -218,6 +218,9 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
+ tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
+
/*
* The kernel can not handle #VE's when accessing normal kernel
* memory. Ensure that no #VE will be delivered for accesses to
@@ -964,11 +967,11 @@ void __init tdx_early_init(void)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
cc_vendor = CC_VENDOR_INTEL;
- tdx_parse_tdinfo(&cc_mask);
- cc_set_mask(cc_mask);
- /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
+ /* Configure the TD */
+ tdx_setup(&cc_mask);
+
+ cc_set_mask(cc_mask);
/*
* All bits above GPA width are reserved and kernel treats shared bit
--
2.45.2
From: Jason Andryuk <jason.andryuk(a)amd.com>
Hi Arthur,
Can you give the patch below a try? If it works, please respond with a
Tested-by. I'll then submit it with your Reported-by and Tested-by.
Thanks,
Jason
[PATCH] fbdev/xen-fbfront: Assign fb_info->device
Probing xen-fbfront faults in video_is_primary_device(). The passed-in
struct device is NULL since xen-fbfront doesn't assign it and the
memory is kzalloc()-ed. Assign fb_info->device to avoid this.
This was exposed by the conversion of fb_is_primary_device() to
video_is_primary_device() which dropped a NULL check for struct device.
Fixes: f178e96de7f0 ("arch: Remove struct fb_info from video helpers")
CC: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
The other option would be to re-instate the NULL check in
video_is_primary_device()
---
drivers/video/fbdev/xen-fbfront.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/video/fbdev/xen-fbfront.c b/drivers/video/fbdev/xen-fbfront.c
index 66d4628a96ae..c90f48ebb15e 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -407,6 +407,7 @@ static int xenfb_probe(struct xenbus_device *dev,
/* complete the abuse: */
fb_info->pseudo_palette = fb_info->par;
fb_info->par = info;
+ fb_info->device = &dev->dev;
fb_info->screen_buffer = info->fb;
--
2.43.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2ab9d830262c132ab5db2f571003d80850d56b2a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090922-directed-majorette-f8ad@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
2ab9d830262c ("perf/aux: Fix AUX buffer serialization")
c1e8d7c6a7a6 ("mmap locking API: convert mmap_sem comments")
d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem rwsem call sites")
5a36f0f3f518 ("Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ab9d830262c132ab5db2f571003d80850d56b2a Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 2 Sep 2024 10:14:24 +0200
Subject: [PATCH] perf/aux: Fix AUX buffer serialization
Ole reported that event->mmap_mutex is strictly insufficient to
serialize the AUX buffer, add a per RB mutex to fully serialize it.
Note that in the lock order comment the perf_event::mmap_mutex order
was already wrong, that is, its nesting under mmap_lock is not new with
this patch.
Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams")
Reported-by: Ole <ole(a)binarygecko.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c973e3c11e03..8a6c6bbcd658 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1255,8 +1255,9 @@ static void put_ctx(struct perf_event_context *ctx)
* perf_event_context::mutex
* perf_event::child_mutex;
* perf_event_context::lock
- * perf_event::mmap_mutex
* mmap_lock
+ * perf_event::mmap_mutex
+ * perf_buffer::aux_mutex
* perf_addr_filters_head::lock
*
* cpu_hotplug_lock
@@ -6373,12 +6374,11 @@ static void perf_mmap_close(struct vm_area_struct *vma)
event->pmu->event_unmapped(event, vma->vm_mm);
/*
- * rb->aux_mmap_count will always drop before rb->mmap_count and
- * event->mmap_count, so it is ok to use event->mmap_mutex to
- * serialize with perf_mmap here.
+ * The AUX buffer is strictly a sub-buffer, serialize using aux_mutex
+ * to avoid complications.
*/
if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
- atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
+ atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &rb->aux_mutex)) {
/*
* Stop all AUX events that are writing to this buffer,
* so that we can free its AUX pages and corresponding PMU
@@ -6395,7 +6395,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
rb_free_aux(rb);
WARN_ON_ONCE(refcount_read(&rb->aux_refcount));
- mutex_unlock(&event->mmap_mutex);
+ mutex_unlock(&rb->aux_mutex);
}
if (atomic_dec_and_test(&rb->mmap_count))
@@ -6483,6 +6483,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
struct perf_event *event = file->private_data;
unsigned long user_locked, user_lock_limit;
struct user_struct *user = current_user();
+ struct mutex *aux_mutex = NULL;
struct perf_buffer *rb = NULL;
unsigned long locked, lock_limit;
unsigned long vma_size;
@@ -6531,6 +6532,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (!rb)
goto aux_unlock;
+ aux_mutex = &rb->aux_mutex;
+ mutex_lock(aux_mutex);
+
aux_offset = READ_ONCE(rb->user_page->aux_offset);
aux_size = READ_ONCE(rb->user_page->aux_size);
@@ -6681,6 +6685,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
atomic_dec(&rb->mmap_count);
}
aux_unlock:
+ if (aux_mutex)
+ mutex_unlock(aux_mutex);
mutex_unlock(&event->mmap_mutex);
/*
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 451514442a1b..e072d995d670 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -40,6 +40,7 @@ struct perf_buffer {
struct user_struct *mmap_user;
/* AUX area */
+ struct mutex aux_mutex;
long aux_head;
unsigned int aux_nest;
long aux_wakeup; /* last aux_watermark boundary crossed by aux_head */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 8cadf97bc290..4f46f688d0d4 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -337,6 +337,8 @@ ring_buffer_init(struct perf_buffer *rb, long watermark, int flags)
*/
if (!rb->nr_pages)
rb->paused = 1;
+
+ mutex_init(&rb->aux_mutex);
}
void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2ab9d830262c132ab5db2f571003d80850d56b2a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090922-footbath-barrack-c29b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2ab9d830262c ("perf/aux: Fix AUX buffer serialization")
c1e8d7c6a7a6 ("mmap locking API: convert mmap_sem comments")
d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem rwsem call sites")
5a36f0f3f518 ("Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ab9d830262c132ab5db2f571003d80850d56b2a Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 2 Sep 2024 10:14:24 +0200
Subject: [PATCH] perf/aux: Fix AUX buffer serialization
Ole reported that event->mmap_mutex is strictly insufficient to
serialize the AUX buffer, add a per RB mutex to fully serialize it.
Note that in the lock order comment the perf_event::mmap_mutex order
was already wrong, that is, its nesting under mmap_lock is not new with
this patch.
Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams")
Reported-by: Ole <ole(a)binarygecko.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c973e3c11e03..8a6c6bbcd658 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1255,8 +1255,9 @@ static void put_ctx(struct perf_event_context *ctx)
* perf_event_context::mutex
* perf_event::child_mutex;
* perf_event_context::lock
- * perf_event::mmap_mutex
* mmap_lock
+ * perf_event::mmap_mutex
+ * perf_buffer::aux_mutex
* perf_addr_filters_head::lock
*
* cpu_hotplug_lock
@@ -6373,12 +6374,11 @@ static void perf_mmap_close(struct vm_area_struct *vma)
event->pmu->event_unmapped(event, vma->vm_mm);
/*
- * rb->aux_mmap_count will always drop before rb->mmap_count and
- * event->mmap_count, so it is ok to use event->mmap_mutex to
- * serialize with perf_mmap here.
+ * The AUX buffer is strictly a sub-buffer, serialize using aux_mutex
+ * to avoid complications.
*/
if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
- atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
+ atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &rb->aux_mutex)) {
/*
* Stop all AUX events that are writing to this buffer,
* so that we can free its AUX pages and corresponding PMU
@@ -6395,7 +6395,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
rb_free_aux(rb);
WARN_ON_ONCE(refcount_read(&rb->aux_refcount));
- mutex_unlock(&event->mmap_mutex);
+ mutex_unlock(&rb->aux_mutex);
}
if (atomic_dec_and_test(&rb->mmap_count))
@@ -6483,6 +6483,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
struct perf_event *event = file->private_data;
unsigned long user_locked, user_lock_limit;
struct user_struct *user = current_user();
+ struct mutex *aux_mutex = NULL;
struct perf_buffer *rb = NULL;
unsigned long locked, lock_limit;
unsigned long vma_size;
@@ -6531,6 +6532,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (!rb)
goto aux_unlock;
+ aux_mutex = &rb->aux_mutex;
+ mutex_lock(aux_mutex);
+
aux_offset = READ_ONCE(rb->user_page->aux_offset);
aux_size = READ_ONCE(rb->user_page->aux_size);
@@ -6681,6 +6685,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
atomic_dec(&rb->mmap_count);
}
aux_unlock:
+ if (aux_mutex)
+ mutex_unlock(aux_mutex);
mutex_unlock(&event->mmap_mutex);
/*
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 451514442a1b..e072d995d670 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -40,6 +40,7 @@ struct perf_buffer {
struct user_struct *mmap_user;
/* AUX area */
+ struct mutex aux_mutex;
long aux_head;
unsigned int aux_nest;
long aux_wakeup; /* last aux_watermark boundary crossed by aux_head */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 8cadf97bc290..4f46f688d0d4 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -337,6 +337,8 @@ ring_buffer_init(struct perf_buffer *rb, long watermark, int flags)
*/
if (!rb->nr_pages)
rb->paused = 1;
+
+ mutex_init(&rb->aux_mutex);
}
void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 471ef0b5a8aaca4296108e756b970acfc499ede4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090938-resale-impose-3699@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
471ef0b5a8aa ("clocksource/drivers/timer-of: Remove percpu irq related code")
0f1a7b3fac05 ("timer-of: don't use conditional expression with mixed 'void' types")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 471ef0b5a8aaca4296108e756b970acfc499ede4 Mon Sep 17 00:00:00 2001
From: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Date: Mon, 19 Aug 2024 12:03:35 +0200
Subject: [PATCH] clocksource/drivers/timer-of: Remove percpu irq related code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
GCC's named address space check errors out with:
drivers/clocksource/timer-of.c: In function ‘timer_of_irq_exit’:
drivers/clocksource/timer-of.c:29:46: error: passing argument 2 of
‘free_percpu_irq’ from pointer to non-enclosed address space
29 | free_percpu_irq(of_irq->irq, clkevt);
| ^~~~~~
In file included from drivers/clocksource/timer-of.c:8:
./include/linux/interrupt.h:201:43: note: expected ‘__seg_gs void *’
but argument is of type ‘struct clock_event_device *’
201 | extern void free_percpu_irq(unsigned int, void __percpu *);
| ^~~~~~~~~~~~~~~
drivers/clocksource/timer-of.c: In function ‘timer_of_irq_init’:
drivers/clocksource/timer-of.c:74:51: error: passing argument 4 of
‘request_percpu_irq’ from pointer to non-enclosed address space
74 | np->full_name, clkevt) :
| ^~~~~~
./include/linux/interrupt.h:190:56: note: expected ‘__seg_gs void *’
but argument is of type ‘struct clock_event_device *’
190 | const char *devname, void __percpu *percpu_dev_id)
Sparse warns about:
timer-of.c:29:46: warning: incorrect type in argument 2 (different address spaces)
timer-of.c:29:46: expected void [noderef] __percpu *
timer-of.c:29:46: got struct clock_event_device *clkevt
timer-of.c:74:51: warning: incorrect type in argument 4 (different address spaces)
timer-of.c:74:51: expected void [noderef] __percpu *percpu_dev_id
timer-of.c:74:51: got struct clock_event_device *clkevt
It appears the code is incorrect as reported by Uros Bizjak:
"The referred code is questionable as it tries to reuse
the clkevent pointer once as percpu pointer and once as generic
pointer, which should be avoided."
This change removes the percpu related code as no driver is using it.
[Daniel: Fixed the description]
Fixes: dc11bae785295 ("clocksource/drivers: Add timer-of common init routine")
Reported-by: Uros Bizjak <ubizjak(a)gmail.com>
Tested-by: Uros Bizjak <ubizjak(a)gmail.com>
Link: https://lore.kernel.org/r/20240819100335.2394751-1-daniel.lezcano@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
diff --git a/drivers/clocksource/timer-of.c b/drivers/clocksource/timer-of.c
index c3f54d9912be..420202bf76e4 100644
--- a/drivers/clocksource/timer-of.c
+++ b/drivers/clocksource/timer-of.c
@@ -25,10 +25,7 @@ static __init void timer_of_irq_exit(struct of_timer_irq *of_irq)
struct clock_event_device *clkevt = &to->clkevt;
- if (of_irq->percpu)
- free_percpu_irq(of_irq->irq, clkevt);
- else
- free_irq(of_irq->irq, clkevt);
+ free_irq(of_irq->irq, clkevt);
}
/**
@@ -42,9 +39,6 @@ static __init void timer_of_irq_exit(struct of_timer_irq *of_irq)
* - Get interrupt number by name
* - Get interrupt number by index
*
- * When the interrupt is per CPU, 'request_percpu_irq()' is called,
- * otherwise 'request_irq()' is used.
- *
* Returns 0 on success, < 0 otherwise
*/
static __init int timer_of_irq_init(struct device_node *np,
@@ -69,12 +63,9 @@ static __init int timer_of_irq_init(struct device_node *np,
return -EINVAL;
}
- ret = of_irq->percpu ?
- request_percpu_irq(of_irq->irq, of_irq->handler,
- np->full_name, clkevt) :
- request_irq(of_irq->irq, of_irq->handler,
- of_irq->flags ? of_irq->flags : IRQF_TIMER,
- np->full_name, clkevt);
+ ret = request_irq(of_irq->irq, of_irq->handler,
+ of_irq->flags ? of_irq->flags : IRQF_TIMER,
+ np->full_name, clkevt);
if (ret) {
pr_err("Failed to request irq %d for %pOF\n", of_irq->irq, np);
return ret;
diff --git a/drivers/clocksource/timer-of.h b/drivers/clocksource/timer-of.h
index a5478f3e8589..01a2c6b7db06 100644
--- a/drivers/clocksource/timer-of.h
+++ b/drivers/clocksource/timer-of.h
@@ -11,7 +11,6 @@
struct of_timer_irq {
int irq;
int index;
- int percpu;
const char *name;
unsigned long flags;
irq_handler_t handler;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8679e8b4a1ebdb40c4429e49368d29353e07b601
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090922-marathon-revival-7771@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8679e8b4a1eb ("nvmem: u-boot-env: error if NVMEM device is too small")
6bafe07c9306 ("nvmem: u-boot-env: improve coding style")
a832556d23c5 ("nvmem: u-boot-env: use nvmem device helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8679e8b4a1ebdb40c4429e49368d29353e07b601 Mon Sep 17 00:00:00 2001
From: John Thomson <git(a)johnthomson.fastmail.com.au>
Date: Mon, 2 Sep 2024 15:25:08 +0100
Subject: [PATCH] nvmem: u-boot-env: error if NVMEM device is too small
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Verify the data size before trying to parse it, to avoid reading past the
end of the buffer. This could happen in case of problems at the MTD level
or invalid DT bindings.
Signed-off-by: John Thomson <git(a)johnthomson.fastmail.com.au>
Cc: stable <stable(a)kernel.org>
Fixes: d5542923f200 ("nvmem: add driver handling U-Boot environment variables")
[rmilecki: simplify commit description & rebase]
Signed-off-by: Rafał Miłecki <rafal(a)milecki.pl>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20240902142510.71096-2-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/nvmem/u-boot-env.c b/drivers/nvmem/u-boot-env.c
index 936e39b20b38..593f0bf4a395 100644
--- a/drivers/nvmem/u-boot-env.c
+++ b/drivers/nvmem/u-boot-env.c
@@ -176,6 +176,13 @@ static int u_boot_env_parse(struct u_boot_env *priv)
data_offset = offsetof(struct u_boot_env_image_broadcom, data);
break;
}
+
+ if (dev_size < data_offset) {
+ dev_err(dev, "Device too small for u-boot-env\n");
+ err = -EIO;
+ goto err_kfree;
+ }
+
crc32_addr = (__le32 *)(buf + crc32_offset);
crc32 = le32_to_cpu(*crc32_addr);
crc32_data_len = dev_size - crc32_data_offset;
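
For readers following the backport, the check added above can be sketched
in isolation as below. This is only an illustration under stated
assumptions: struct env_image and env_crc_data_len() are hypothetical
names, not the driver's, and the sketch uses a single offset where the
driver tracks data_offset and crc32_data_offset separately. It shows why
the comparison has to come before the subtraction: with unsigned sizes, an
undersized device would otherwise make the length wrap around.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration: a CRC32 followed by the data area. */
struct env_image {
	uint32_t crc32;
	uint8_t data[];
};

/*
 * Compute how many bytes follow the header. Without the size check, the
 * unsigned subtraction below would wrap around for an undersized device.
 * Returns 0 and stores the length on success, -EIO otherwise.
 */
int env_crc_data_len(size_t dev_size, size_t *len)
{
	size_t data_offset = offsetof(struct env_image, data);

	if (dev_size < data_offset)
		return -EIO;	/* device too small to hold the header */

	*len = dev_size - data_offset;
	return 0;
}
```

A caller would treat a nonzero return as "do not parse", which mirrors the
goto err_kfree path in the patch.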
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 8679e8b4a1ebdb40c4429e49368d29353e07b601
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090921-cycling-overfed-49be@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
8679e8b4a1eb ("nvmem: u-boot-env: error if NVMEM device is too small")
6bafe07c9306 ("nvmem: u-boot-env: improve coding style")
a832556d23c5 ("nvmem: u-boot-env: use nvmem device helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8679e8b4a1ebdb40c4429e49368d29353e07b601 Mon Sep 17 00:00:00 2001
From: John Thomson <git(a)johnthomson.fastmail.com.au>
Date: Mon, 2 Sep 2024 15:25:08 +0100
Subject: [PATCH] nvmem: u-boot-env: error if NVMEM device is too small
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Verify the data size before trying to parse it, to avoid reading past the
end of the buffer. This could happen in case of problems at the MTD level
or invalid DT bindings.
Signed-off-by: John Thomson <git(a)johnthomson.fastmail.com.au>
Cc: stable <stable(a)kernel.org>
Fixes: d5542923f200 ("nvmem: add driver handling U-Boot environment variables")
[rmilecki: simplify commit description & rebase]
Signed-off-by: Rafał Miłecki <rafal(a)milecki.pl>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20240902142510.71096-2-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/nvmem/u-boot-env.c b/drivers/nvmem/u-boot-env.c
index 936e39b20b38..593f0bf4a395 100644
--- a/drivers/nvmem/u-boot-env.c
+++ b/drivers/nvmem/u-boot-env.c
@@ -176,6 +176,13 @@ static int u_boot_env_parse(struct u_boot_env *priv)
data_offset = offsetof(struct u_boot_env_image_broadcom, data);
break;
}
+
+ if (dev_size < data_offset) {
+ dev_err(dev, "Device too small for u-boot-env\n");
+ err = -EIO;
+ goto err_kfree;
+ }
+
crc32_addr = (__le32 *)(buf + crc32_offset);
crc32 = le32_to_cpu(*crc32_addr);
crc32_data_len = dev_size - crc32_data_offset;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 00dcf2fa449f23a263343d7fe051741bdde65d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090925-chokehold-enhance-1d11@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
00dcf2fa449f ("usb: dwc3: Avoid waking up gadget during startxfer")
047161686b81 ("usb: dwc3: Add remote wakeup handling")
a02a26eb0aea ("usb: dwc3: gadget: Ignore Update Transfer cmd params")
63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer")
40edb52298df ("usb: dwc3: avoid NULL access of usb_gadget_driver")
c560e76319a9 ("usb: dwc3: gadget: Fix START_TRANSFER link state check")
475e8be53d04 ("usb: dwc3: gadget: Check for disabled LPM quirk")
82c46b8ed9dc ("usb: dwc3: gadget: Introduce a DWC3 VBUS draw callback")
f580170f135a ("usb: dwc3: Add splitdisable quirk for Hisilicon Kirin Soc")
e81a7018d93a ("usb: dwc3: allocate gadget structure dynamically")
c5a7092f4015 ("usb: dwc3: gadget: make starting isoc transfers more robust")
9af21dd6faeb ("usb: dwc3: Add support for DWC_usb32 IP")
8bb14308a869 ("usb: dwc3: core: Use role-switch default dr_mode")
d0550cd20e52 ("usb: dwc3: gadget: Do link recovery for SS and SSP")
d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit")
586f4335700f ("usb: dwc3: Fix GTXFIFOSIZ.TXFDEP macro name")
5eb5afb07853 ("usb: dwc3: use proper initializers for property entries")
9ba3aca8fe82 ("usb: dwc3: Disable phy suspend after power-on reset")
a0a465569b45 ("usb: dwc3: remove generic PHY calibrate() calls")
c09b73cfac2a ("usb: dwc3: don't set gadget->is_otg flag")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00dcf2fa449f23a263343d7fe051741bdde65d0b Mon Sep 17 00:00:00 2001
From: Prashanth K <quic_prashk(a)quicinc.com>
Date: Wed, 28 Aug 2024 12:13:02 +0530
Subject: [PATCH] usb: dwc3: Avoid waking up gadget during startxfer
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[21:18] wasn't updated to U0 yet; it's observed that the latency of
DSTS can be on the order of milliseconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: stable(a)vger.kernel.org
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240828064302.3796315-1-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..291bc549935b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,23 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to the programming guide, if the link state is in L1/L2/U3,
+ * then sending the Start Transfer command may not complete. The
+ * programming guide suggested to bring the link state back to ON/U0 by
+ * performing remote wakeup prior to sending the command. However, don't
+ * initiate remote wakeup when the user/function does not send wakeup
+ * request via wakeup ops. Send the command when it's allowed.
+ *
+ * Notes:
+ * For L1 link state, issuing a command requires the clearing of
+ * GUSB2PHYCFG.SUSPENDUSB2, which turns on the signal required to complete
+ * the given command (usually within 50us). This should happen within the
+ * command timeout set by driver. No additional step is needed.
+ *
+ * For L2 or U3 link state, the gadget is in USB suspend. Care should be
+ * taken when sending Start Transfer command to ensure that it's done after
+ * USB resume.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +344,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
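
As a rough model of the hunk removed above, the old Start Transfer path can
be viewed as a pure decision on the reported link state and speed. The
sketch below is an assumption-laden illustration, not driver code: the enum
names and old_startxfer_would_wake() are hypothetical, and the enum values
are not the register encodings. It shows that a stale "suspended" reading
from DSTS, taken shortly after the wakeup interrupt, is enough for the old
logic to request another remote wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the link states and speeds in the patch. */
enum link_state { LINK_U0, LINK_U1, LINK_U2, LINK_U3 };
enum usb_speed { SPEED_FULL, SPEED_HIGH, SPEED_SUPER, SPEED_SUPER_PLUS };

/*
 * Mirrors the removed switch: U3 always triggers a wakeup, and U2 does
 * so unless the controller runs at SuperSpeed or faster.
 */
bool old_startxfer_would_wake(enum link_state reported, enum usb_speed speed)
{
	switch (reported) {
	case LINK_U2:
		if (speed >= SPEED_SUPER)
			return false;
		return true;	/* the original fell through to the U3 case */
	case LINK_U3:
		return true;
	default:
		return false;
	}
}
```

With the patch applied this decision is gone from the command path, so a
lagging DSTS value can no longer cause an unrequested remote wakeup there.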
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 00dcf2fa449f23a263343d7fe051741bdde65d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090924-crescent-edging-00be@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
00dcf2fa449f ("usb: dwc3: Avoid waking up gadget during startxfer")
047161686b81 ("usb: dwc3: Add remote wakeup handling")
a02a26eb0aea ("usb: dwc3: gadget: Ignore Update Transfer cmd params")
63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer")
40edb52298df ("usb: dwc3: avoid NULL access of usb_gadget_driver")
c560e76319a9 ("usb: dwc3: gadget: Fix START_TRANSFER link state check")
475e8be53d04 ("usb: dwc3: gadget: Check for disabled LPM quirk")
82c46b8ed9dc ("usb: dwc3: gadget: Introduce a DWC3 VBUS draw callback")
f580170f135a ("usb: dwc3: Add splitdisable quirk for Hisilicon Kirin Soc")
e81a7018d93a ("usb: dwc3: allocate gadget structure dynamically")
c5a7092f4015 ("usb: dwc3: gadget: make starting isoc transfers more robust")
9af21dd6faeb ("usb: dwc3: Add support for DWC_usb32 IP")
8bb14308a869 ("usb: dwc3: core: Use role-switch default dr_mode")
d0550cd20e52 ("usb: dwc3: gadget: Do link recovery for SS and SSP")
d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit")
586f4335700f ("usb: dwc3: Fix GTXFIFOSIZ.TXFDEP macro name")
5eb5afb07853 ("usb: dwc3: use proper initializers for property entries")
9ba3aca8fe82 ("usb: dwc3: Disable phy suspend after power-on reset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00dcf2fa449f23a263343d7fe051741bdde65d0b Mon Sep 17 00:00:00 2001
From: Prashanth K <quic_prashk(a)quicinc.com>
Date: Wed, 28 Aug 2024 12:13:02 +0530
Subject: [PATCH] usb: dwc3: Avoid waking up gadget during startxfer
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[21:18] wasn't updated to U0 yet; it's observed that the latency of
DSTS can be on the order of milliseconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: stable(a)vger.kernel.org
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240828064302.3796315-1-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..291bc549935b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,23 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to the programming guide, if the link state is in L1/L2/U3,
+ * then sending the Start Transfer command may not complete. The
+ * programming guide suggested to bring the link state back to ON/U0 by
+ * performing remote wakeup prior to sending the command. However, don't
+ * initiate remote wakeup when the user/function does not send wakeup
+ * request via wakeup ops. Send the command when it's allowed.
+ *
+ * Notes:
+ * For L1 link state, issuing a command requires the clearing of
+ * GUSB2PHYCFG.SUSPENDUSB2, which turns on the signal required to complete
+ * the given command (usually within 50us). This should happen within the
+ * command timeout set by driver. No additional step is needed.
+ *
+ * For L2 or U3 link state, the gadget is in USB suspend. Care should be
+ * taken when sending Start Transfer command to ensure that it's done after
+ * USB resume.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +344,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 00dcf2fa449f23a263343d7fe051741bdde65d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090923-trustful-helium-3f09@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
00dcf2fa449f ("usb: dwc3: Avoid waking up gadget during startxfer")
047161686b81 ("usb: dwc3: Add remote wakeup handling")
a02a26eb0aea ("usb: dwc3: gadget: Ignore Update Transfer cmd params")
63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer")
40edb52298df ("usb: dwc3: avoid NULL access of usb_gadget_driver")
c560e76319a9 ("usb: dwc3: gadget: Fix START_TRANSFER link state check")
475e8be53d04 ("usb: dwc3: gadget: Check for disabled LPM quirk")
82c46b8ed9dc ("usb: dwc3: gadget: Introduce a DWC3 VBUS draw callback")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00dcf2fa449f23a263343d7fe051741bdde65d0b Mon Sep 17 00:00:00 2001
From: Prashanth K <quic_prashk(a)quicinc.com>
Date: Wed, 28 Aug 2024 12:13:02 +0530
Subject: [PATCH] usb: dwc3: Avoid waking up gadget during startxfer
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[21:18] wasn't updated to U0 yet; it's observed that the latency of
DSTS can be on the order of milliseconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: stable(a)vger.kernel.org
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240828064302.3796315-1-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..291bc549935b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,23 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to the programming guide, if the link state is in L1/L2/U3,
+ * then sending the Start Transfer command may not complete. The
+ * programming guide suggested to bring the link state back to ON/U0 by
+ * performing remote wakeup prior to sending the command. However, don't
+ * initiate remote wakeup when the user/function does not send wakeup
+ * request via wakeup ops. Send the command when it's allowed.
+ *
+ * Notes:
+ * For L1 link state, issuing a command requires the clearing of
+ * GUSB2PHYCFG.SUSPENDUSB2, which turns on the signal required to complete
+ * the given command (usually within 50us). This should happen within the
+ * command timeout set by driver. No additional step is needed.
+ *
+ * For L2 or U3 link state, the gadget is in USB suspend. Care should be
+ * taken when sending Start Transfer command to ensure that it's done after
+ * USB resume.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +344,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 00dcf2fa449f23a263343d7fe051741bdde65d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090922-shrewdly-bright-07e5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
00dcf2fa449f ("usb: dwc3: Avoid waking up gadget during startxfer")
047161686b81 ("usb: dwc3: Add remote wakeup handling")
a02a26eb0aea ("usb: dwc3: gadget: Ignore Update Transfer cmd params")
63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00dcf2fa449f23a263343d7fe051741bdde65d0b Mon Sep 17 00:00:00 2001
From: Prashanth K <quic_prashk(a)quicinc.com>
Date: Wed, 28 Aug 2024 12:13:02 +0530
Subject: [PATCH] usb: dwc3: Avoid waking up gadget during startxfer
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[21:18] wasn't updated to U0 yet; it's observed that the latency of
DSTS can be on the order of milliseconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: stable(a)vger.kernel.org
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240828064302.3796315-1-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..291bc549935b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,23 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to the programming guide, if the link state is in L1/L2/U3,
+ * then sending the Start Transfer command may not complete. The
+ * programming guide suggested to bring the link state back to ON/U0 by
+ * performing remote wakeup prior to sending the command. However, don't
+ * initiate remote wakeup when the user/function does not send wakeup
+ * request via wakeup ops. Send the command when it's allowed.
+ *
+ * Notes:
+ * For L1 link state, issuing a command requires the clearing of
+ * GUSB2PHYCFG.SUSPENDUSB2, which turns on the signal required to complete
+ * the given command (usually within 50us). This should happen within the
+ * command timeout set by driver. No additional step is needed.
+ *
+ * For L2 or U3 link state, the gadget is in USB suspend. Care should be
+ * taken when sending Start Transfer command to ensure that it's done after
+ * USB resume.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +344,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 00dcf2fa449f23a263343d7fe051741bdde65d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090921-chastity-calzone-c81f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
00dcf2fa449f ("usb: dwc3: Avoid waking up gadget during startxfer")
047161686b81 ("usb: dwc3: Add remote wakeup handling")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00dcf2fa449f23a263343d7fe051741bdde65d0b Mon Sep 17 00:00:00 2001
From: Prashanth K <quic_prashk(a)quicinc.com>
Date: Wed, 28 Aug 2024 12:13:02 +0530
Subject: [PATCH] usb: dwc3: Avoid waking up gadget during startxfer
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[21:18] wasn't updated to U0 yet; it's observed that the latency of
DSTS can be on the order of milliseconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: stable(a)vger.kernel.org
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/20240828064302.3796315-1-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..291bc549935b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,23 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to the programming guide, if the link state is in L1/L2/U3,
+ * then sending the Start Transfer command may not complete. The
+ * programming guide suggested to bring the link state back to ON/U0 by
+ * performing remote wakeup prior to sending the command. However, don't
+ * initiate remote wakeup when the user/function does not send wakeup
+ * request via wakeup ops. Send the command when it's allowed.
+ *
+ * Notes:
+ * For L1 link state, issuing a command requires the clearing of
+ * GUSB2PHYCFG.SUSPENDUSB2, which turns on the signal required to complete
+ * the given command (usually within 50us). This should happen within the
+ * command timeout set by driver. No additional step is needed.
+ *
+ * For L2 or U3 link state, the gadget is in USB suspend. Care should be
+ * taken when sending Start Transfer command to ensure that it's done after
+ * USB resume.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +344,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 61cbfb5368dd50ed0d65ce21d305aa923581db2b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090916-dazzling-showman-af36@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
61cbfb5368dd ("iio: adc: ad7124: fix DT configuration parsing")
a6eaf02b8274 ("iio: adc: ad7124: Switch from of specific to fwnode based property handling")
1240c94ce819 ("iio: adc: Explicitly include correct DT includes")
4c077429b422 ("iio: mlx90614: Sort headers")
a99544c6c883 ("iio: adc: palmas: add support for iio threshold events")
2d48dbdfc7d4 ("iio: adc: palmas: move eventX_enable into palmas_adc_event")
7501a3a97e4f ("iio: adc: palmas: use iio_event_direction for threshold polarity")
d2ab4eea732d ("iio: adc: palmas: replace "wakeup" with "event"")
79d9622d622d ("iio: adc: palmas: remove adc_wakeupX_data")
6d52b0e70698 ("iio: adc: palmas: Take probe fully device managed.")
49f76c499d38 ("iio: adc: palmas_gpadc: fix NULL dereference on rmmod")
3a258747a01f ("iio: adc: ad7124: Silence no spi_device_id warnings")
5cfe8a1c2577 ("iio: adc: ad7124: add sequencer support")
fd5ba89e4cd4 ("iio: adc: ad7124: Add update_scan_mode")
3f1a9c392d69 ("iio:adc:palmas_gpadc: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 61cbfb5368dd50ed0d65ce21d305aa923581db2b Mon Sep 17 00:00:00 2001
From: Dumitru Ceclan <mitrutzceclan(a)gmail.com>
Date: Tue, 6 Aug 2024 11:51:33 +0300
Subject: [PATCH] iio: adc: ad7124: fix DT configuration parsing
The cfg pointer is set before reading the channel number that the
configuration should point to. This causes configurations to be shifted
by one channel.
For example, setting bipolar on the first channel defined in the DT will
cause bipolar mode to be active on the second defined channel.
Fix by moving the cfg pointer setting after reading the channel number.
Fixes: 7b8d045e497a ("iio: adc: ad7124: allow more than 8 channels")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240806085133.114547-1-dumitru.ceclan@analog.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index afb5f4d741e6..108e9ccab1ef 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -844,8 +844,6 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels = channels;
device_for_each_child_node_scoped(dev, child) {
- cfg = &st->channels[channel].cfg;
-
ret = fwnode_property_read_u32(child, "reg", &channel);
if (ret)
return ret;
@@ -863,6 +861,7 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels[channel].ain = AD7124_CHANNEL_AINP(ain[0]) |
AD7124_CHANNEL_AINM(ain[1]);
+ cfg = &st->channels[channel].cfg;
cfg->bipolar = fwnode_property_read_bool(child, "bipolar");
ret = fwnode_property_read_u32(child, "adi,reference-select", &tmp);
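The ordering problem described in the commit message above can be reproduced in a small userspace sketch. Everything here is a simplified assumption rather than the driver's real code: struct node stands in for a DT child node, and parse_buggy() and parse_fixed() are hypothetical names. The sketch only shows that taking the cfg pointer before the channel index is read writes the settings to whichever channel was parsed previously; the exact direction of the shift in the real driver depends on code outside the hunk.

```c
#include <stddef.h>

#define MAX_CHANNELS 4

struct chan_cfg { int bipolar; };
struct chan { struct chan_cfg cfg; };

/* Hypothetical stand-in for one DT child node: "reg" and "bipolar". */
struct node { unsigned int reg; int bipolar; };

/* Buggy order: cfg is derived from the previous iteration's index. */
void parse_buggy(struct chan *channels, const struct node *nodes, size_t n)
{
	unsigned int channel = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct chan_cfg *cfg = &channels[channel].cfg; /* stale index */

		channel = nodes[i].reg;
		cfg->bipolar = nodes[i].bipolar;
	}
}

/* Fixed order: read the channel number first, then take the pointer. */
void parse_fixed(struct chan *channels, const struct node *nodes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int channel = nodes[i].reg;
		struct chan_cfg *cfg = &channels[channel].cfg;

		cfg->bipolar = nodes[i].bipolar;
	}
}
```

With two nodes using reg values 1 and 2, and only the first marked bipolar, parse_fixed() sets the flag on channel 1, while parse_buggy() sets it on channel 0 and leaves channel 1 cleared.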
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 61cbfb5368dd50ed0d65ce21d305aa923581db2b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090915-luridness-parameter-3447@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
61cbfb5368dd ("iio: adc: ad7124: fix DT configuration parsing")
a6eaf02b8274 ("iio: adc: ad7124: Switch from of specific to fwnode based property handling")
1240c94ce819 ("iio: adc: Explicitly include correct DT includes")
4c077429b422 ("iio: mlx90614: Sort headers")
a99544c6c883 ("iio: adc: palmas: add support for iio threshold events")
2d48dbdfc7d4 ("iio: adc: palmas: move eventX_enable into palmas_adc_event")
7501a3a97e4f ("iio: adc: palmas: use iio_event_direction for threshold polarity")
d2ab4eea732d ("iio: adc: palmas: replace "wakeup" with "event"")
79d9622d622d ("iio: adc: palmas: remove adc_wakeupX_data")
6d52b0e70698 ("iio: adc: palmas: Take probe fully device managed.")
49f76c499d38 ("iio: adc: palmas_gpadc: fix NULL dereference on rmmod")
3a258747a01f ("iio: adc: ad7124: Silence no spi_device_id warnings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 61cbfb5368dd50ed0d65ce21d305aa923581db2b Mon Sep 17 00:00:00 2001
From: Dumitru Ceclan <mitrutzceclan(a)gmail.com>
Date: Tue, 6 Aug 2024 11:51:33 +0300
Subject: [PATCH] iio: adc: ad7124: fix DT configuration parsing
The cfg pointer is set before reading the channel number that the
configuration should point to. This causes configurations to be shifted
by one channel.
For example setting bipolar to the first channel defined in the DT will
cause bipolar mode to be active on the second defined channel.
Fix by moving the cfg pointer setting after reading the channel number.
Fixes: 7b8d045e497a ("iio: adc: ad7124: allow more than 8 channels")
Signed-off-by: Dumitru Ceclan <dumitru.ceclan(a)analog.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240806085133.114547-1-dumitru.ceclan@analog.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c
index afb5f4d741e6..108e9ccab1ef 100644
--- a/drivers/iio/adc/ad7124.c
+++ b/drivers/iio/adc/ad7124.c
@@ -844,8 +844,6 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels = channels;
device_for_each_child_node_scoped(dev, child) {
- cfg = &st->channels[channel].cfg;
-
ret = fwnode_property_read_u32(child, "reg", &channel);
if (ret)
return ret;
@@ -863,6 +861,7 @@ static int ad7124_parse_channel_config(struct iio_dev *indio_dev,
st->channels[channel].ain = AD7124_CHANNEL_AINP(ain[0]) |
AD7124_CHANNEL_AINM(ain[1]);
+ cfg = &st->channels[channel].cfg;
cfg->bipolar = fwnode_property_read_bool(child, "bipolar");
ret = fwnode_property_read_u32(child, "adi,reference-select", &tmp);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 90826e08468ba7fb35d8b39645b22d9e80004afe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090946-heading-mortality-97cb@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
90826e08468b ("iio: adc: ad7606: remove frstdata check for serial mode")
7989b4bb23fe ("iio: adc: ad7616: Add support for AD7616 ADC")
6bf229abce75 ("iio: adc: ad7606: Move oversampling options in chip info and rework *_avail attributes")
2985a5d88455 ("staging: iio: adc: ad7606: Move out of staging")
54160ae3b2d3 ("staging: iio: adc: ad7606: Misc style fixes (no functional change)")
cc49bd1652a4 ("staging: iio: adc: ad7606: Add support for threaded irq")
2bbf53e3e506 ("staging: iio: adc: ad7606: Simplify the Kconfing menu")
43f9b204edf0 ("staging: iio: adc: ad7606: Add OF device ID table")
41f71e5e7daf ("staging: iio: adc: ad7606: Use find_closest() macro")
c0683bfd3772 ("staging: iio: adc: ad7606: Use devm functions in probe")
557e585c3fdb ("staging: iio: adc: ad7606: Use wait-for-completion handler")
7c0bc65c8403 ("Merge tag 'iio-for-4.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-testing")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 90826e08468ba7fb35d8b39645b22d9e80004afe Mon Sep 17 00:00:00 2001
From: Guillaume Stols <gstols(a)baylibre.com>
Date: Tue, 2 Jul 2024 12:52:51 +0000
Subject: [PATCH] iio: adc: ad7606: remove frstdata check for serial mode
The current implementation attempts to recover from an eventual glitch
in the clock by checking frstdata state after reading the first
channel's sample: If frstdata is low, it will reset the chip and
return -EIO.
This will only work in parallel mode, where frstdata pin is set low
after the 2nd sample read starts.
For the serial mode, according to the datasheet, "The FRSTDATA output
returns to a logic low following the 16th SCLK falling edge.", thus
after the Xth pulse, X being the number of bits in a sample, the check
will always be true, and the driver will not work at all in serial
mode if frstdata(optional) is defined in the devicetree as it will
reset the chip, and return -EIO every time read_sample is called.
Hence, this check must be removed for serial mode.
Fixes: b9618c0cacd7 ("staging: IIO: ADC: New driver for AD7606/AD7606-6/AD7606-4")
Signed-off-by: Guillaume Stols <gstols(a)baylibre.com>
Reviewed-by: Nuno Sa <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20240702-cleanup-ad7606-v3-1-18d5ea18770e@baylibre…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/adc/ad7606.c b/drivers/iio/adc/ad7606.c
index 3a417595294f..c321c6ef48df 100644
--- a/drivers/iio/adc/ad7606.c
+++ b/drivers/iio/adc/ad7606.c
@@ -49,7 +49,7 @@ static const unsigned int ad7616_oversampling_avail[8] = {
1, 2, 4, 8, 16, 32, 64, 128,
};
-static int ad7606_reset(struct ad7606_state *st)
+int ad7606_reset(struct ad7606_state *st)
{
if (st->gpio_reset) {
gpiod_set_value(st->gpio_reset, 1);
@@ -60,6 +60,7 @@ static int ad7606_reset(struct ad7606_state *st)
return -ENODEV;
}
+EXPORT_SYMBOL_NS_GPL(ad7606_reset, IIO_AD7606);
static int ad7606_reg_access(struct iio_dev *indio_dev,
unsigned int reg,
@@ -88,31 +89,6 @@ static int ad7606_read_samples(struct ad7606_state *st)
{
unsigned int num = st->chip_info->num_channels - 1;
u16 *data = st->data;
- int ret;
-
- /*
- * The frstdata signal is set to high while and after reading the sample
- * of the first channel and low for all other channels. This can be used
- * to check that the incoming data is correctly aligned. During normal
- * operation the data should never become unaligned, but some glitch or
- * electrostatic discharge might cause an extra read or clock cycle.
- * Monitoring the frstdata signal allows to recover from such failure
- * situations.
- */
-
- if (st->gpio_frstdata) {
- ret = st->bops->read_block(st->dev, 1, data);
- if (ret)
- return ret;
-
- if (!gpiod_get_value(st->gpio_frstdata)) {
- ad7606_reset(st);
- return -EIO;
- }
-
- data++;
- num--;
- }
return st->bops->read_block(st->dev, num, data);
}
diff --git a/drivers/iio/adc/ad7606.h b/drivers/iio/adc/ad7606.h
index 0c6a88cc4695..6649e84d25de 100644
--- a/drivers/iio/adc/ad7606.h
+++ b/drivers/iio/adc/ad7606.h
@@ -151,6 +151,8 @@ int ad7606_probe(struct device *dev, int irq, void __iomem *base_address,
const char *name, unsigned int id,
const struct ad7606_bus_ops *bops);
+int ad7606_reset(struct ad7606_state *st);
+
enum ad7606_supported_device_ids {
ID_AD7605_4,
ID_AD7606_8,
diff --git a/drivers/iio/adc/ad7606_par.c b/drivers/iio/adc/ad7606_par.c
index d8408052262e..6bc587b20f05 100644
--- a/drivers/iio/adc/ad7606_par.c
+++ b/drivers/iio/adc/ad7606_par.c
@@ -7,6 +7,7 @@
#include <linux/mod_devicetable.h>
#include <linux/module.h>
+#include <linux/gpio/consumer.h>
#include <linux/platform_device.h>
#include <linux/types.h>
#include <linux/err.h>
@@ -21,8 +22,29 @@ static int ad7606_par16_read_block(struct device *dev,
struct iio_dev *indio_dev = dev_get_drvdata(dev);
struct ad7606_state *st = iio_priv(indio_dev);
- insw((unsigned long)st->base_address, buf, count);
+ /*
+ * On the parallel interface, the frstdata signal is set to high while
+ * and after reading the sample of the first channel and low for all
+ * other channels. This can be used to check that the incoming data is
+ * correctly aligned. During normal operation the data should never
+ * become unaligned, but some glitch or electrostatic discharge might
+ * cause an extra read or clock cycle. Monitoring the frstdata signal
+ * allows to recover from such failure situations.
+ */
+ int num = count;
+ u16 *_buf = buf;
+
+ if (st->gpio_frstdata) {
+ insw((unsigned long)st->base_address, _buf, 1);
+ if (!gpiod_get_value(st->gpio_frstdata)) {
+ ad7606_reset(st);
+ return -EIO;
+ }
+ _buf++;
+ num--;
+ }
+ insw((unsigned long)st->base_address, _buf, num);
return 0;
}
@@ -35,8 +57,28 @@ static int ad7606_par8_read_block(struct device *dev,
{
struct iio_dev *indio_dev = dev_get_drvdata(dev);
struct ad7606_state *st = iio_priv(indio_dev);
+ /*
+ * On the parallel interface, the frstdata signal is set to high while
+ * and after reading the sample of the first channel and low for all
+ * other channels. This can be used to check that the incoming data is
+ * correctly aligned. During normal operation the data should never
+ * become unaligned, but some glitch or electrostatic discharge might
+ * cause an extra read or clock cycle. Monitoring the frstdata signal
+ * allows to recover from such failure situations.
+ */
+ int num = count;
+ u16 *_buf = buf;
- insb((unsigned long)st->base_address, buf, count * 2);
+ if (st->gpio_frstdata) {
+ insb((unsigned long)st->base_address, _buf, 2);
+ if (!gpiod_get_value(st->gpio_frstdata)) {
+ ad7606_reset(st);
+ return -EIO;
+ }
+ _buf++;
+ num--;
+ }
+ insb((unsigned long)st->base_address, _buf, num * 2);
return 0;
}
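As a rough illustration of the alignment check that the patch above moves into the parallel bus code, here is a userspace sketch against a fake device. The struct fake_adc type and the fake_* helpers are hypothetical and exist only for this example; they model FRSTDATA as being high only right after the first channel's sample has been read, as the comment in the patch describes for the parallel interface. The count > 0 guard is an addition of this sketch and is not in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fake device; not part of the ad7606 driver. */
struct fake_adc {
	const uint16_t *samples; /* conversion results, first channel at index 0 */
	size_t pos;              /* index of the next sample the bus returns */
	int resets;              /* number of times the chip was reset */
};

uint16_t fake_read_word(struct fake_adc *adc)
{
	return adc->samples[adc->pos++];
}

/* Models FRSTDATA on the parallel bus: high only after the first sample. */
int fake_frstdata(const struct fake_adc *adc)
{
	return adc->pos == 1;
}

void fake_reset(struct fake_adc *adc)
{
	adc->pos = 0;
	adc->resets++;
}

/* Same shape as the patched read_block: check alignment after one word. */
int par_read_block(struct fake_adc *adc, uint16_t *buf, size_t count,
		   int have_frstdata)
{
	size_t i;

	if (have_frstdata && count > 0) {
		*buf = fake_read_word(adc);
		if (!fake_frstdata(adc)) {
			/* Data is misaligned: reset and report an I/O error. */
			fake_reset(adc);
			return -EIO;
		}
		buf++;
		count--;
	}

	for (i = 0; i < count; i++)
		buf[i] = fake_read_word(adc);

	return 0;
}
```

If the fake device starts one sample ahead (pos set to 1, as an extra clock cycle would cause), the first read leaves FRSTDATA low and the function resets the device and returns -EIO instead of handing back shifted data.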
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x b48aa991758999d4e8f9296c5bbe388f293ef465
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090940-shale-handcart-eb5d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
b48aa9917589 ("staging: iio: frequency: ad9834: Validate frequency parameter value")
8e8040c52e63 ("staging: iio: frequency: ad9833: Load clock using clock framework")
80109c32348d ("staging: iio: frequency: ad9833: Get frequency value statically")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b48aa991758999d4e8f9296c5bbe388f293ef465 Mon Sep 17 00:00:00 2001
From: Aleksandr Mishin <amishin(a)t-argos.ru>
Date: Wed, 3 Jul 2024 18:45:06 +0300
Subject: [PATCH] staging: iio: frequency: ad9834: Validate frequency parameter
value
In ad9834_write_frequency() clk_get_rate() can return 0. In such case
ad9834_calc_freqreg() call will lead to division by zero. Checking
'if (fout > (clk_freq / 2))' doesn't protect in case of 'fout' is 0.
ad9834_write_frequency() is called from ad9834_write(), where fout is
taken from text buffer, which can contain any value.
Modify parameters checking.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 12b9d5bf76bf ("Staging: IIO: DDS: AD9833 / AD9834 driver")
Suggested-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Link: https://patch.msgid.link/20240703154506.25584-1-amishin@t-argos.ru
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/staging/iio/frequency/ad9834.c b/drivers/staging/iio/frequency/ad9834.c
index a7a5cdcc6590..47e7d7e6d920 100644
--- a/drivers/staging/iio/frequency/ad9834.c
+++ b/drivers/staging/iio/frequency/ad9834.c
@@ -114,7 +114,7 @@ static int ad9834_write_frequency(struct ad9834_state *st,
clk_freq = clk_get_rate(st->mclk);
- if (fout > (clk_freq / 2))
+ if (!clk_freq || fout > (clk_freq / 2))
return -EINVAL;
regval = ad9834_calc_freqreg(clk_freq, fout);
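To see why the extra !clk_freq test matters, the check can be sketched in userspace. The names calc_freqreg() and write_frequency() are hypothetical stand-ins for the driver functions, and the 28-bit register width is an assumption based on the AD9833/AD9834 frequency register size; the body of ad9834_calc_freqreg() is not shown in the patch, so the formula below is also an assumption. With clk_freq == 0 and fout == 0, the old test 'fout > (clk_freq / 2)' evaluates to 0 > 0, which is false, so the division would still be reached.

```c
#include <errno.h>
#include <stdint.h>

#define FREQ_BITS 28 /* assumption: 28-bit frequency register */

/* Hypothetical stand-in for ad9834_calc_freqreg(): fout * 2^28 / mclk. */
uint32_t calc_freqreg(unsigned long mclk, unsigned long fout)
{
	uint64_t freqreg = (uint64_t)fout << FREQ_BITS;

	return (uint32_t)(freqreg / mclk); /* undefined if mclk == 0 */
}

/* Validation in the same order as the patched driver code. */
int write_frequency(unsigned long clk_freq, unsigned long fout,
		    uint32_t *regval)
{
	if (!clk_freq || fout > (clk_freq / 2))
		return -EINVAL;

	*regval = calc_freqreg(clk_freq, fout);
	return 0;
}
```

Because the zero check is the first operand of ||, a zero clock rate is rejected before any division takes place.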
On Sun, 08 Sep 2024 14:36:37 +0100,
Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> irqchip-gic-v4-make-sure-a-vpe-is-locked-when-vmapp-.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 1a232324773145ff7ce59b6a1b52b3247223f9d4
> Author: Marc Zyngier <maz(a)kernel.org>
> Date: Fri Jul 5 10:31:55 2024 +0100
>
> irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
>
> [ Upstream commit a84a07fa3100d7ad46a3d6882af25a3df9c9e7e3 ]
>
> In order to make sure that vpe->col_idx is correctly sampled when a VMAPP
> command is issued, the vpe_lock must be held for the VPE. This is now
> possible since the introduction of the per-VM vmapp_lock, which can be
> taken before vpe_lock in the correct locking order.
>
> Signed-off-by: Marc Zyngier <maz(a)kernel.org>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Tested-by: Nianyao Tang <tangnianyao(a)huawei.com>
> Link: https://lore.kernel.org/r/20240705093155.871070-4-maz@kernel.org
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index e25dea0e50c7..1e0f0e1bf481 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1804,7 +1804,9 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
> for (i = 0; i < vm->nr_vpes; i++) {
> struct its_vpe *vpe = vm->vpes[i];
>
> - its_send_vmapp(its, vpe, true);
> + scoped_guard(raw_spinlock, &vpe->vpe_lock)
> + its_send_vmapp(its, vpe, true);
> +
> its_send_vinvall(its, vpe);
> }
> }
> @@ -1825,8 +1827,10 @@ static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
> if (!--vm->vlpi_count[its->list_nr]) {
> int i;
>
> - for (i = 0; i < vm->nr_vpes; i++)
> + for (i = 0; i < vm->nr_vpes; i++) {
> + guard(raw_spinlock)(&vm->vpes[i]->vpe_lock);
> its_send_vmapp(its, vm->vpes[i], false);
> + }
> }
>
> raw_spin_unlock_irqrestore(&vmovp_lock, flags);
>
No please.
Not only are you missing the essential part of the series (the patch
introducing the per-VM lock that this change relies on), you are also
missing the fixes that followed.
So please drop this patch from the 6.6 and 6.1 queues.
M.
--
Without deviation from the norm, progress is not possible.
On Sun, Sep 08, 2024 at 09:25:58AM GMT, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> bcachefs: Add error code to defer option parsing
???
Sasha, this and the other patch aren't bugfixes at all, they're prep
work for the new mount API, i.e. feature work.
Please just drop the bcachefs patches from stable entirely; the lockless
IO patch revert is a fix but I'll be sending that with a couple other
fixes in a day or so.
On Sunday, 08.09.2024 at 09:37 -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> wifi: mt76: mt7921: fix NULL pointer access in mt7921_ipv6_addr_change
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> wifi-mt76-mt7921-fix-null-pointer-access-in-mt7921_i.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 857d7854c40324bfc70a6d32c9eb0792bc7c0b56
> Author: Bert Karwatzki <spasswolf(a)web.de>
> Date: Mon Aug 12 12:45:41 2024 +0200
>
> wifi: mt76: mt7921: fix NULL pointer access in mt7921_ipv6_addr_change
>
> [ Upstream commit 479ffee68d59c599f8aed8fa2dcc8e13e7bd13c3 ]
>
> When disabling wifi mt7921_ipv6_addr_change() is called as a notifier.
> At this point mvif->phy is already NULL so we cannot use it here.
>
> Signed-off-by: Bert Karwatzki <spasswolf(a)web.de>
> Signed-off-by: Felix Fietkau <nbd(a)nbd.name>
> Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
> Link: https://patch.msgid.link/20240812104542.80760-1-spasswolf@web.de
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/main.c b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> index 6a5c2cae087d..6dec54431312 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
> @@ -1095,7 +1095,7 @@ static void mt7921_ipv6_addr_change(struct ieee80211_hw *hw,
> struct inet6_dev *idev)
> {
> struct mt792x_vif *mvif = (struct mt792x_vif *)vif->drv_priv;
> - struct mt792x_dev *dev = mvif->phy->dev;
> + struct mt792x_dev *dev = mt792x_hw_dev(hw);
> struct inet6_ifaddr *ifa;
> struct in6_addr ns_addrs[IEEE80211_BSS_ARP_ADDR_LIST_LEN];
> struct sk_buff *skb;
The patch only fixes a NULL pointer dereference if the tree also contains this commit:
commit 574e609c4e6a0843a9ed53de79e00da8fb3e7437
Author: Felix Fietkau <nbd(a)nbd.name>
Date: Thu Jul 4 15:09:47 2024 +0200
wifi: mac80211: clear vif drv_priv after remove_interface when stopping
Avoid reusing stale driver data when an interface is brought down and up
again. In order to avoid having to duplicate the memset in every single
driver, do it here.
Signed-off-by: Felix Fietkau <nbd(a)nbd.name>
Link: https://patch.msgid.link/20240704130947.48609-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
In trees that do not contain this commit, the patch is not necessary.
Bert Karwatzki
The patch "intel: legacy: Partial revert of field get conversion"
(commit ba54b1a276a6b69d80649942fe5334d19851443e in mainline) fixes a
broken refactoring that prevents Wake-on-LAN from working on some e1000e
devices.
v6.10 already includes that fix and v6.1 and earlier did not yet contain
the offending refactoring, so it should only be necessary to apply this
to 6.6.
Thanks!
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x e399257349098bf7c84343f99efb2bc9c22eb9fd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090839-crimp-posted-6a31@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
e39925734909 ("mm/memcontrol: respect zswap.writeback setting from parent cg too")
2b33a97c94bc ("mm: zswap: rename is_zswap_enabled() to zswap_is_enabled()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e399257349098bf7c84343f99efb2bc9c22eb9fd Mon Sep 17 00:00:00 2001
From: Mike Yuan <me(a)yhndnzj.com>
Date: Fri, 23 Aug 2024 16:27:06 +0000
Subject: [PATCH] mm/memcontrol: respect zswap.writeback setting from parent cg
too
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, the behavior of zswap.writeback wrt. the cgroup hierarchy
seems a bit odd. Unlike zswap.max, it doesn't honor the value from parent
cgroups. This surfaced when people tried to globally disable zswap
writeback, i.e. reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results in subcgroups
with zswap.writeback=1 still performing writeback.
The inconsistency became more noticeable after I introduced the
MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob.
The patch assumed that the kernel would enforce the value of parent
cgroups. It could probably be workarounded from systemd's side, by going
up the slice unit tree and inheriting the value. Yet I think it's more
sensible to make it behave consistently with zswap.max and friends.
[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Dis…
[2] https://github.com/systemd/systemd/pull/31734
Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com
Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling")
Signed-off-by: Mike Yuan <me(a)yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs(a)gmail.com>
Acked-by: Yosry Ahmed <yosryahmed(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Michal Koutný <mkoutny(a)suse.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 86311c2907cd..95c18bc17083 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1717,9 +1717,10 @@ The following nested keys are defined.
entries fault back in or are written out to disk.
memory.zswap.writeback
- A read-write single value file. The default value is "1". The
- initial value of the root cgroup is 1, and when a new cgroup is
- created, it inherits the current value of its parent.
+ A read-write single value file. The default value is "1".
+ Note that this setting is hierarchical, i.e. the writeback would be
+ implicitly disabled for child cgroups if the upper hierarchy
+ does so.
When this is set to 0, all swapping attempts to swapping devices
are disabled. This included both zswap writebacks, and swapping due
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f29157288b7d..d563fb515766 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3613,8 +3613,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
memcg1_soft_limit_reset(memcg);
#ifdef CONFIG_ZSWAP
memcg->zswap_max = PAGE_COUNTER_MAX;
- WRITE_ONCE(memcg->zswap_writeback,
- !parent || READ_ONCE(parent->zswap_writeback));
+ WRITE_ONCE(memcg->zswap_writeback, true);
#endif
page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
if (parent) {
@@ -5320,7 +5319,14 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
{
/* if zswap is disabled, do not block pages going to the swapping device */
- return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
+ if (!zswap_is_enabled())
+ return true;
+
+ for (; memcg; memcg = parent_mem_cgroup(memcg))
+ if (!READ_ONCE(memcg->zswap_writeback))
+ return false;
+
+ return true;
}
static u64 zswap_current_read(struct cgroup_subsys_state *css,
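The hierarchical lookup introduced by the patch above can be sketched in userspace as follows. struct cg is a hypothetical, much simplified stand-in for struct mem_cgroup, and zswap_enabled is passed in as a plain parameter instead of calling zswap_is_enabled(); READ_ONCE() is omitted because the sketch is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mem_cgroup. */
struct cg {
	struct cg *parent;    /* NULL for the root cgroup */
	bool zswap_writeback; /* this cgroup's own memory.zswap.writeback */
};

/*
 * Writeback is allowed only if no cgroup on the path from cg up to
 * the root has disabled it. When zswap itself is disabled, pages are
 * never blocked from going to the swap device.
 */
bool writeback_enabled(const struct cg *cg, bool zswap_enabled)
{
	if (!zswap_enabled)
		return true;

	for (; cg; cg = cg->parent)
		if (!cg->zswap_writeback)
			return false;

	return true;
}
```

With a root that has writeback set to 0 and a child left at the default of 1, the walk reaches the root and returns false for the child, which is the behavior the commit message asks for.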
On Tue, 2024-08-13 at 08:25 -0700, Darrick J. Wong wrote:
> On Tue, Aug 13, 2024 at 04:59:25PM +0200, Christoph Hellwig wrote:
> > On Tue, Aug 13, 2024 at 07:19:28PM +1000, Dave Chinner wrote:
> > > In hindsight, this was a wholly avoidable bug - a single patch made
> > > two different API modifications that only differed by a single
> > > letter, and one of the 23 conversions missed a single letter. If
> > > that was two patches - one for the finobt conversion, the second for
> > > the inobt conversion, the bug would have been plainly obvious during
> > > review....
> >
> > Maybe we should avoid identifiers that close anyway :)
> >
> > The change looks good:
> >
> > Reviewed-by: Christoph Hellwig <hch(a)lst.de>
>
> Looks good to me too
> Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Is it (now commit 95179935bea) going to -stable too?
Setting the PF_NO_SETAFFINITY flag creates problems in combination with
cpuset operations (see commit messages for details). To mitigate this, fixes have
been written to remove the flag from the poller threads, which landed in v6.3. We
need them in v6.1 as well.
Best regards,
Felix Moessbauer
Siemens AG
Jens Axboe (1):
io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers
Michal Koutný (1):
io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threads
io_uring/io-wq.c | 16 +++++++++++-----
io_uring/sqpoll.c | 1 -
2 files changed, 11 insertions(+), 6 deletions(-)
--
2.39.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ef34a6ea0cab1800f4b3c9c3c2cefd5091e03379
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024082642-google-strongman-27a7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ef34a6ea0cab ("mptcp: pm: re-using ID of unused flushed subflows")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ef34a6ea0cab1800f4b3c9c3c2cefd5091e03379 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Mon, 19 Aug 2024 21:45:23 +0200
Subject: [PATCH] mptcp: pm: re-using ID of unused flushed subflows
If no subflows are attached to the 'subflow' endpoints that are being
flushed, the corresponding addr IDs will not be marked as available
again.
Mark all ID as being available when flushing all the 'subflow'
endpoints, and reset local_addr_used counter to cover these cases.
Note that mptcp_pm_remove_addrs_and_subflows() helper is only called for
flushing operations, not to remove a specific set of addresses and
subflows.
Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-5-38035d40de5b…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 8b232a210a06..2c26696b820e 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1623,8 +1623,15 @@ static void mptcp_pm_remove_addrs_and_subflows(struct mptcp_sock *msk,
mptcp_pm_remove_addr(msk, &alist);
spin_unlock_bh(&msk->pm.lock);
}
+
if (slist.nr)
mptcp_pm_remove_subflow(msk, &slist);
+
+ /* Reset counters: maybe some subflows have been removed before */
+ spin_lock_bh(&msk->pm.lock);
+ bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
+ msk->pm.local_addr_used = 0;
+ spin_unlock_bh(&msk->pm.lock);
}
static void mptcp_nl_remove_addrs_list(struct net *net,
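The counter reset added by the patch above amounts to marking every address ID as available again and zeroing the usage counter. A minimal userspace sketch, assuming a maximum ID of 255 and using hypothetical names throughout (struct pm_state, id_is_avail(), id_mark_used(), flush_reset()); the kernel version uses bitmap_fill() under msk->pm.lock, and locking is left out here.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ADDR_ID 255 /* assumption for this sketch */

/* Hypothetical stand-in for the relevant path manager fields. */
struct pm_state {
	unsigned char id_avail[(MAX_ADDR_ID + 1) / 8]; /* 1 bit per ID */
	unsigned int local_addr_used;
};

bool id_is_avail(const struct pm_state *pm, unsigned int id)
{
	return (pm->id_avail[id / 8] & (1u << (id % 8))) != 0;
}

void id_mark_used(struct pm_state *pm, unsigned int id)
{
	pm->id_avail[id / 8] &= (unsigned char)~(1u << (id % 8));
	pm->local_addr_used++;
}

/* After a flush: every ID is available again and nothing is in use. */
void flush_reset(struct pm_state *pm)
{
	memset(pm->id_avail, 0xff, sizeof(pm->id_avail));
	pm->local_addr_used = 0;
}
```

Resetting unconditionally covers the case from the commit message where no subflow was attached to a flushed endpoint, so its ID would otherwise never be released.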
From: Chuck Lever <chuck.lever(a)oracle.com>
Following up on
https://lore.kernel.org/linux-nfs/d4b235df-4ee5-4824-9d48-e3b3c1f1f4d1@orac…
Here is a backport series targeting origin/linux-5.10.y that closes
the information leak described in the above thread.
Review comments welcome.
Chuck Lever (6):
NFSD: Refactor nfsd_reply_cache_free_locked()
NFSD: Rename nfsd_reply_cache_alloc()
NFSD: Replace nfsd_prune_bucket()
NFSD: Refactor the duplicate reply cache shrinker
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Jeff Layton (2):
nfsd: move reply cache initialization into nfsd startup
nfsd: move init of percpu reply_cache_stats counters back to
nfsd_init_net
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
NeilBrown (1):
NFSD: simplify error paths in nfsd_svc()
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/export.c | 32 ++++--
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfscache.c | 202 ++++++++++++++++++++++---------------
fs/nfsd/nfsctl.c | 24 ++---
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 38 ++++---
fs/nfsd/stats.c | 52 ++++------
fs/nfsd/stats.h | 83 ++++++---------
fs/nfsd/trace.h | 22 ++++
fs/nfsd/vfs.c | 6 +-
include/linux/sunrpc/svc.h | 5 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 36 ++++---
18 files changed, 306 insertions(+), 241 deletions(-)
--
2.45.1
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 562755501d44cfbbe82703a62cb41502bd067bd1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090812-ample-stowaway-5c06@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
562755501d44 ("ALSA: hda/realtek: extend quirks for Clevo V5[46]0")
03c5c350e38d ("ALSA: hda/realtek: Add support for new HP G12 laptops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 562755501d44cfbbe82703a62cb41502bd067bd1 Mon Sep 17 00:00:00 2001
From: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Date: Tue, 3 Sep 2024 14:49:31 +0200
Subject: [PATCH] ALSA: hda/realtek: extend quirks for Clevo V5[46]0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The mic in those laptops suffers from too high gain, resulting in mostly
(fan or other) noise being recorded. In addition to the existing fixup for
mic detection, also limit its boost. While at it, extend the quirk to the
V5[46]0TNE models as well, which have the same issue.
Signed-off-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Cc: <stable(a)vger.kernel.org>
Link: https://patch.msgid.link/20240903124939.6213-1-marmarek@invisiblethingslab.…
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index ff62702a8226..fd7711d69823 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7638,6 +7638,7 @@ enum {
ALC287_FIXUP_LENOVO_14ARP8_LEGION_IAH7,
ALC287_FIXUP_LENOVO_SSID_17AA3820,
ALCXXX_FIXUP_CS35LXX,
+ ALC245_FIXUP_CLEVO_NOISY_MIC,
};
/* A special fixup for Lenovo C940 and Yoga Duet 7;
@@ -9977,6 +9978,12 @@ static const struct hda_fixup alc269_fixups[] = {
.type = HDA_FIXUP_FUNC,
.v.func = cs35lxx_autodet_fixup,
},
+ [ALC245_FIXUP_CLEVO_NOISY_MIC] = {
+ .type = HDA_FIXUP_FUNC,
+ .v.func = alc269_fixup_limit_int_mic_boost,
+ .chained = true,
+ .chain_id = ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE,
+ },
};
static const struct snd_pci_quirk alc269_fixup_tbl[] = {
@@ -10626,7 +10633,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1558, 0xa600, "Clevo NL50NU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0xa650, "Clevo NP[567]0SN[CD]", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0xa671, "Clevo NP70SN[CDE]", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
- SND_PCI_QUIRK(0x1558, 0xa763, "Clevo V54x_6x_TU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1558, 0xa741, "Clevo V54x_6x_TNE", ALC245_FIXUP_CLEVO_NOISY_MIC),
+ SND_PCI_QUIRK(0x1558, 0xa763, "Clevo V54x_6x_TU", ALC245_FIXUP_CLEVO_NOISY_MIC),
SND_PCI_QUIRK(0x1558, 0xb018, "Clevo NP50D[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0xb019, "Clevo NH77D[BE]Q", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1558, 0xb022, "Clevo NH77D[DC][QW]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE),
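Note on the ALSA quirk above: the new ALC245_FIXUP_CLEVO_NOISY_MIC entry sets .chained and .chain_id, so the boost limit is applied together with the existing mic-presence fixup. The following is a minimal standalone sketch of how such a chain can be walked. It is a simplified model, not the ALSA implementation: the struct, the enum values, the callbacks and the depth limit are all hypothetical, and it assumes the current entry runs before the one it chains to.

```c
#include <stddef.h>

/* Hypothetical fixup ids, standing in for the ALC*_FIXUP_* enum. */
enum { FIX_MIC_NO_PRESENCE, FIX_NOISY_MIC, FIX_COUNT };

/* Simplified model of a fixup table entry (not struct hda_fixup). */
struct fixup {
	void (*func)(int *state);	/* callback that applies the fixup */
	int chained;			/* non-zero: continue with chain_id */
	int chain_id;			/* next entry to apply */
};

/* Hypothetical callbacks; each one just records a bit in *state. */
static void mic_no_presence(int *state) { *state |= 0x1; }
static void limit_mic_boost(int *state) { *state |= 0x2; }

static const struct fixup fixups[FIX_COUNT] = {
	[FIX_MIC_NO_PRESENCE] = { .func = mic_no_presence },
	[FIX_NOISY_MIC] = {
		.func = limit_mic_boost,
		.chained = 1,
		.chain_id = FIX_MIC_NO_PRESENCE,
	},
};

/* Apply the entry at id, then follow the chain until it ends. */
static int apply_fixups(int id)
{
	int state = 0;
	int depth = 0;

	/* The depth limit guards against accidental cycles in the table. */
	while (id >= 0 && id < FIX_COUNT && depth++ < 10) {
		const struct fixup *fix = &fixups[id];

		if (fix->func)
			fix->func(&state);
		if (!fix->chained)
			break;
		id = fix->chain_id;
	}
	return state;
}
```

In this model, selecting FIX_NOISY_MIC applies both callbacks, while selecting FIX_MIC_NO_PRESENCE applies only the first one, which mirrors why the V54x_6x_TU entry could be switched to the new id without losing the older fixup.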
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 65444581a4aecf0e96b4691bb20fc75c602f5863
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090809-wrongly-repulsive-5a71@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
65444581a4ae ("drm/amd/display: Determine IPS mode by ASIC and PMFW versions")
234e94555800 ("drm/amd/display: Enable copying of bounding box data from VBIOS DMUB")
afca033f10d3 ("drm/amd/display: Add periodic detection for IPS")
05c5ffaac770 ("drm/amd/display: gpuvm handling in DML21")
9ba971b25316 ("drm/amd/display: Re-enable IPS2 for static screen")
70839da63605 ("drm/amd/display: Add new DCN401 sources")
14813934b629 ("drm/amd/display: Allow RCG for Static Screen + LVP for DCN35")
e779f4587f61 ("drm/amd/display: Add handling for DC power mode")
cc263c3a0c9f ("drm/amd/display: remove context->dml2 dependency from DML21 wrapper")
d62d5551dd61 ("drm/amd/display: Backup and restore only on full updates")
2d5bb791e24f ("drm/amd/display: Implement update_planes_and_stream_v3 sequence")
4f5b8d78ca43 ("drm/amd/display: Init DPPCLK from SMU on dcn32")
2728e9c7c842 ("drm/amd/display: add DC changes for DCN351")
d2dea1f14038 ("drm/amd/display: Generalize new minimal transition path")
0701117efd1e ("Revert "drm/amd/display: For FPO and SubVP/DRR configs program vmin/max sel"")
a9b1a4f684b3 ("drm/amd/display: Add more checks for exiting idle in DC")
13b3d6bdbeb4 ("drm/amd/display: add debugfs disallow edp psr")
dcbf438d4834 ("drm/amd/display: Unify optimize_required flags and VRR adjustments")
1630c6ded587 ("drm/amd/display: "Enable IPS by default"")
8457bddc266c ("drm/amd/display: Revert "Rework DC Z10 restore"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 65444581a4aecf0e96b4691bb20fc75c602f5863 Mon Sep 17 00:00:00 2001
From: Leo Li <sunpeng.li(a)amd.com>
Date: Tue, 27 Aug 2024 11:29:53 -0400
Subject: [PATCH] drm/amd/display: Determine IPS mode by ASIC and PMFW versions
[Why]
DCN IPS interoperates with other system idle power features, such as
Zstates.
On DCN35, there is a known issue where system Z8 + DCN IPS2 causes a
hard hang. We observe this on systems where the SBIOS allows Z8.
Though there is a SBIOS fix, there's no guarantee that users will get it
any time soon, or even install it. A workaround is needed to prevent
this from rearing its head in the wild.
[How]
For DCN35, check the pmfw version to determine whether the SBIOS has the
fix. If not, set IPS1+RCG as the deepest possible state in all cases
except for s0ix and display off (DPMS). Otherwise, enable all IPS
Signed-off-by: Leo Li <sunpeng.li(a)amd.com>
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 28d43d0895896f84c038d906d244e0a95eb243ec)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 983a977632ff..e6cea5b9bdb3 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1752,6 +1752,30 @@ static struct dml2_soc_bb *dm_dmub_get_vbios_bounding_box(struct amdgpu_device *
return bb;
}
+static enum dmub_ips_disable_type dm_get_default_ips_mode(
+ struct amdgpu_device *adev)
+{
+ /*
+ * On DCN35 systems with Z8 enabled, it's possible for IPS2 + Z8 to
+ * cause a hard hang. A fix exists for newer PMFW.
+ *
+ * As a workaround, for non-fixed PMFW, force IPS1+RCG as the deepest
+ * IPS state in all cases, except for s0ix and all displays off (DPMS),
+ * where IPS2 is allowed.
+ *
+ * When checking pmfw version, use the major and minor only.
+ */
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) == IP_VERSION(3, 5, 0) &&
+ (adev->pm.fw_version & 0x00FFFF00) < 0x005D6300)
+ return DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) >= IP_VERSION(3, 5, 0))
+ return DMUB_IPS_ENABLE;
+
+ /* ASICs older than DCN35 do not have IPSs */
+ return DMUB_IPS_DISABLE_ALL;
+}
+
static int amdgpu_dm_init(struct amdgpu_device *adev)
{
struct dc_init_data init_data;
@@ -1863,7 +1887,7 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
if (amdgpu_dc_debug_mask & DC_DISABLE_IPS)
init_data.flags.disable_ips = DMUB_IPS_DISABLE_ALL;
else
- init_data.flags.disable_ips = DMUB_IPS_ENABLE;
+ init_data.flags.disable_ips = dm_get_default_ips_mode(adev);
init_data.flags.disable_ips_in_vpb = 0;
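Note on the version check in dm_get_default_ips_mode() above: masking fw_version with 0x00FFFF00 drops the top and bottom bytes, so only the two middle bytes take part in the comparison against the threshold. The following is a minimal standalone sketch of that comparison. It assumes, as the patch comment suggests, that those two bytes carry the major and minor version; the macro and function names are hypothetical and do not exist in amdgpu.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout, inferred from the 0x00FFFF00 mask in the patch:
 * bits 23:16 = major, bits 15:8 = minor, bits 7:0 = ignored.
 */
#define PMFW_MAJOR_MINOR_MASK	0x00FFFF00u
#define PMFW_FIXED_VERSION	0x005D6300u	/* threshold taken from the patch */

/* Hypothetical helper: true when the firmware is older than the fixed one. */
static bool pmfw_needs_ips_workaround(uint32_t fw_version)
{
	return (fw_version & PMFW_MAJOR_MINOR_MASK) < PMFW_FIXED_VERSION;
}
```

Because the low byte is masked off, two firmware builds that differ only in that byte fall on the same side of the threshold.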
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 65444581a4aecf0e96b4691bb20fc75c602f5863
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090805-visa-hankering-f46e@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
65444581a4ae ("drm/amd/display: Determine IPS mode by ASIC and PMFW versions")
234e94555800 ("drm/amd/display: Enable copying of bounding box data from VBIOS DMUB")
afca033f10d3 ("drm/amd/display: Add periodic detection for IPS")
05c5ffaac770 ("drm/amd/display: gpuvm handling in DML21")
9ba971b25316 ("drm/amd/display: Re-enable IPS2 for static screen")
70839da63605 ("drm/amd/display: Add new DCN401 sources")
14813934b629 ("drm/amd/display: Allow RCG for Static Screen + LVP for DCN35")
e779f4587f61 ("drm/amd/display: Add handling for DC power mode")
cc263c3a0c9f ("drm/amd/display: remove context->dml2 dependency from DML21 wrapper")
d62d5551dd61 ("drm/amd/display: Backup and restore only on full updates")
2d5bb791e24f ("drm/amd/display: Implement update_planes_and_stream_v3 sequence")
4f5b8d78ca43 ("drm/amd/display: Init DPPCLK from SMU on dcn32")
2728e9c7c842 ("drm/amd/display: add DC changes for DCN351")
d2dea1f14038 ("drm/amd/display: Generalize new minimal transition path")
0701117efd1e ("Revert "drm/amd/display: For FPO and SubVP/DRR configs program vmin/max sel"")
a9b1a4f684b3 ("drm/amd/display: Add more checks for exiting idle in DC")
13b3d6bdbeb4 ("drm/amd/display: add debugfs disallow edp psr")
dcbf438d4834 ("drm/amd/display: Unify optimize_required flags and VRR adjustments")
1630c6ded587 ("drm/amd/display: "Enable IPS by default"")
8457bddc266c ("drm/amd/display: Revert "Rework DC Z10 restore"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 65444581a4aecf0e96b4691bb20fc75c602f5863 Mon Sep 17 00:00:00 2001
From: Leo Li <sunpeng.li(a)amd.com>
Date: Tue, 27 Aug 2024 11:29:53 -0400
Subject: [PATCH] drm/amd/display: Determine IPS mode by ASIC and PMFW versions
[Why]
DCN IPS interoperates with other system idle power features, such as
Zstates.
On DCN35, there is a known issue where system Z8 + DCN IPS2 causes a
hard hang. We observe this on systems where the SBIOS allows Z8.
Though there is a SBIOS fix, there's no guarantee that users will get it
any time soon, or even install it. A workaround is needed to prevent
this from rearing its head in the wild.
[How]
For DCN35, check the pmfw version to determine whether the SBIOS has the
fix. If not, set IPS1+RCG as the deepest possible state in all cases
except for s0ix and display off (DPMS). Otherwise, enable all IPS
Signed-off-by: Leo Li <sunpeng.li(a)amd.com>
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 28d43d0895896f84c038d906d244e0a95eb243ec)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 983a977632ff..e6cea5b9bdb3 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1752,6 +1752,30 @@ static struct dml2_soc_bb *dm_dmub_get_vbios_bounding_box(struct amdgpu_device *
return bb;
}
+static enum dmub_ips_disable_type dm_get_default_ips_mode(
+ struct amdgpu_device *adev)
+{
+ /*
+ * On DCN35 systems with Z8 enabled, it's possible for IPS2 + Z8 to
+ * cause a hard hang. A fix exists for newer PMFW.
+ *
+ * As a workaround, for non-fixed PMFW, force IPS1+RCG as the deepest
+ * IPS state in all cases, except for s0ix and all displays off (DPMS),
+ * where IPS2 is allowed.
+ *
+ * When checking pmfw version, use the major and minor only.
+ */
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) == IP_VERSION(3, 5, 0) &&
+ (adev->pm.fw_version & 0x00FFFF00) < 0x005D6300)
+ return DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) >= IP_VERSION(3, 5, 0))
+ return DMUB_IPS_ENABLE;
+
+ /* ASICs older than DCN35 do not have IPSs */
+ return DMUB_IPS_DISABLE_ALL;
+}
+
static int amdgpu_dm_init(struct amdgpu_device *adev)
{
struct dc_init_data init_data;
@@ -1863,7 +1887,7 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
if (amdgpu_dc_debug_mask & DC_DISABLE_IPS)
init_data.flags.disable_ips = DMUB_IPS_DISABLE_ALL;
else
- init_data.flags.disable_ips = DMUB_IPS_ENABLE;
+ init_data.flags.disable_ips = dm_get_default_ips_mode(adev);
init_data.flags.disable_ips_in_vpb = 0;
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 65444581a4aecf0e96b4691bb20fc75c602f5863
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090802-spill-spooky-94c0@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
65444581a4ae ("drm/amd/display: Determine IPS mode by ASIC and PMFW versions")
234e94555800 ("drm/amd/display: Enable copying of bounding box data from VBIOS DMUB")
afca033f10d3 ("drm/amd/display: Add periodic detection for IPS")
05c5ffaac770 ("drm/amd/display: gpuvm handling in DML21")
9ba971b25316 ("drm/amd/display: Re-enable IPS2 for static screen")
70839da63605 ("drm/amd/display: Add new DCN401 sources")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 65444581a4aecf0e96b4691bb20fc75c602f5863 Mon Sep 17 00:00:00 2001
From: Leo Li <sunpeng.li(a)amd.com>
Date: Tue, 27 Aug 2024 11:29:53 -0400
Subject: [PATCH] drm/amd/display: Determine IPS mode by ASIC and PMFW versions
[Why]
DCN IPS interoperates with other system idle power features, such as
Zstates.
On DCN35, there is a known issue where system Z8 + DCN IPS2 causes a
hard hang. We observe this on systems where the SBIOS allows Z8.
Though there is a SBIOS fix, there's no guarantee that users will get it
any time soon, or even install it. A workaround is needed to prevent
this from rearing its head in the wild.
[How]
For DCN35, check the pmfw version to determine whether the SBIOS has the
fix. If not, set IPS1+RCG as the deepest possible state in all cases
except for s0ix and display off (DPMS). Otherwise, enable all IPS
Signed-off-by: Leo Li <sunpeng.li(a)amd.com>
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 28d43d0895896f84c038d906d244e0a95eb243ec)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 983a977632ff..e6cea5b9bdb3 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1752,6 +1752,30 @@ static struct dml2_soc_bb *dm_dmub_get_vbios_bounding_box(struct amdgpu_device *
return bb;
}
+static enum dmub_ips_disable_type dm_get_default_ips_mode(
+ struct amdgpu_device *adev)
+{
+ /*
+ * On DCN35 systems with Z8 enabled, it's possible for IPS2 + Z8 to
+ * cause a hard hang. A fix exists for newer PMFW.
+ *
+ * As a workaround, for non-fixed PMFW, force IPS1+RCG as the deepest
+ * IPS state in all cases, except for s0ix and all displays off (DPMS),
+ * where IPS2 is allowed.
+ *
+ * When checking pmfw version, use the major and minor only.
+ */
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) == IP_VERSION(3, 5, 0) &&
+ (adev->pm.fw_version & 0x00FFFF00) < 0x005D6300)
+ return DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF;
+
+ if (amdgpu_ip_version(adev, DCE_HWIP, 0) >= IP_VERSION(3, 5, 0))
+ return DMUB_IPS_ENABLE;
+
+ /* ASICs older than DCN35 do not have IPSs */
+ return DMUB_IPS_DISABLE_ALL;
+}
+
static int amdgpu_dm_init(struct amdgpu_device *adev)
{
struct dc_init_data init_data;
@@ -1863,7 +1887,7 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
if (amdgpu_dc_debug_mask & DC_DISABLE_IPS)
init_data.flags.disable_ips = DMUB_IPS_DISABLE_ALL;
else
- init_data.flags.disable_ips = DMUB_IPS_ENABLE;
+ init_data.flags.disable_ips = dm_get_default_ips_mode(adev);
init_data.flags.disable_ips_in_vpb = 0;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 38e3285dbd07db44487bbaca8c383a5d7f3c11f3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090844-speech-subzero-1de7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
38e3285dbd07 ("drm/amd/display: Block timing sync for different signals in PMO")
70839da63605 ("drm/amd/display: Add new DCN401 sources")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38e3285dbd07db44487bbaca8c383a5d7f3c11f3 Mon Sep 17 00:00:00 2001
From: Dillon Varone <dillon.varone(a)amd.com>
Date: Thu, 22 Aug 2024 17:52:57 -0400
Subject: [PATCH] drm/amd/display: Block timing sync for different signals in
PMO
PMO assumes that like timings can be synchronized, but DC only allows
this if the signal types match.
Reviewed-by: Austin Zheng <austin.zheng(a)amd.com>
Signed-off-by: Dillon Varone <dillon.varone(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 29d3d6af43135de7bec677f334292ca8dab53d67)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
index 603036df68ba..6547cc2c2a77 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
@@ -811,7 +811,8 @@ static void build_synchronized_timing_groups(
for (j = i + 1; j < display_config->display_config.num_streams; j++) {
if (memcmp(master_timing,
&display_config->display_config.stream_descriptors[j].timing,
- sizeof(struct dml2_timing_cfg)) == 0) {
+ sizeof(struct dml2_timing_cfg)) == 0 &&
+ display_config->display_config.stream_descriptors[i].output.output_encoder == display_config->display_config.stream_descriptors[j].output.output_encoder) {
set_bit_in_bitfield(&pmo->scratch.pmo_dcn4.synchronized_timing_group_masks[timing_group_idx], j);
set_bit_in_bitfield(&stream_mapped_mask, j);
}
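Note on the PMO change above: two streams now land in the same synchronized timing group only if their timing structs compare equal and their output encoders match. The following is a minimal standalone sketch of that rule. The struct layouts, field names and the helper are hypothetical simplifications, not the DML2 types, and the timing struct uses only same-sized integer fields so that memcmp() does not compare padding bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, padding-free stand-in for struct dml2_timing_cfg. */
struct timing {
	uint32_t h_total;
	uint32_t v_total;
	uint32_t pixel_clock_khz;
};

/* Hypothetical stand-in for a stream descriptor. */
struct stream {
	struct timing timing;
	int output_encoder;
};

/*
 * Return a bitmask of the streams j > i that may be synchronized with
 * stream i: identical timing AND identical output encoder.
 */
static uint32_t sync_mask_for(const struct stream *s, unsigned int n,
			      unsigned int i)
{
	uint32_t mask = 0;
	unsigned int j;

	for (j = i + 1; j < n && j < 32; j++) {
		if (memcmp(&s[i].timing, &s[j].timing,
			   sizeof(struct timing)) == 0 &&
		    s[i].output_encoder == s[j].output_encoder)
			mask |= 1u << j;
	}
	return mask;
}
```

With three streams that share one timing but where only the first and third use the same encoder, only the third stream's bit is set for the first stream.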
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 38e3285dbd07db44487bbaca8c383a5d7f3c11f3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090842-uncounted-lustrous-8af4@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
38e3285dbd07 ("drm/amd/display: Block timing sync for different signals in PMO")
70839da63605 ("drm/amd/display: Add new DCN401 sources")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38e3285dbd07db44487bbaca8c383a5d7f3c11f3 Mon Sep 17 00:00:00 2001
From: Dillon Varone <dillon.varone(a)amd.com>
Date: Thu, 22 Aug 2024 17:52:57 -0400
Subject: [PATCH] drm/amd/display: Block timing sync for different signals in
PMO
PMO assumes that like timings can be synchronized, but DC only allows
this if the signal types match.
Reviewed-by: Austin Zheng <austin.zheng(a)amd.com>
Signed-off-by: Dillon Varone <dillon.varone(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 29d3d6af43135de7bec677f334292ca8dab53d67)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
index 603036df68ba..6547cc2c2a77 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
@@ -811,7 +811,8 @@ static void build_synchronized_timing_groups(
for (j = i + 1; j < display_config->display_config.num_streams; j++) {
if (memcmp(master_timing,
&display_config->display_config.stream_descriptors[j].timing,
- sizeof(struct dml2_timing_cfg)) == 0) {
+ sizeof(struct dml2_timing_cfg)) == 0 &&
+ display_config->display_config.stream_descriptors[i].output.output_encoder == display_config->display_config.stream_descriptors[j].output.output_encoder) {
set_bit_in_bitfield(&pmo->scratch.pmo_dcn4.synchronized_timing_group_masks[timing_group_idx], j);
set_bit_in_bitfield(&stream_mapped_mask, j);
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 38e3285dbd07db44487bbaca8c383a5d7f3c11f3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090839-curry-shallot-c638@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
38e3285dbd07 ("drm/amd/display: Block timing sync for different signals in PMO")
70839da63605 ("drm/amd/display: Add new DCN401 sources")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38e3285dbd07db44487bbaca8c383a5d7f3c11f3 Mon Sep 17 00:00:00 2001
From: Dillon Varone <dillon.varone(a)amd.com>
Date: Thu, 22 Aug 2024 17:52:57 -0400
Subject: [PATCH] drm/amd/display: Block timing sync for different signals in
PMO
PMO assumes that like timings can be synchronized, but DC only allows
this if the signal types match.
Reviewed-by: Austin Zheng <austin.zheng(a)amd.com>
Signed-off-by: Dillon Varone <dillon.varone(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 29d3d6af43135de7bec677f334292ca8dab53d67)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
index 603036df68ba..6547cc2c2a77 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c
@@ -811,7 +811,8 @@ static void build_synchronized_timing_groups(
for (j = i + 1; j < display_config->display_config.num_streams; j++) {
if (memcmp(master_timing,
&display_config->display_config.stream_descriptors[j].timing,
- sizeof(struct dml2_timing_cfg)) == 0) {
+ sizeof(struct dml2_timing_cfg)) == 0 &&
+ display_config->display_config.stream_descriptors[i].output.output_encoder == display_config->display_config.stream_descriptors[j].output.output_encoder) {
set_bit_in_bitfield(&pmo->scratch.pmo_dcn4.synchronized_timing_group_masks[timing_group_idx], j);
set_bit_in_bitfield(&stream_mapped_mask, j);
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x e8705632435ae2f2253b65d3786da389982e8813
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090848-sharpness-hunk-c88f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
e8705632435a ("drm/i915: Fix readout degamma_lut mismatch on ilk/snb")
da8c3cdb016c ("drm/i915: Rename bigjoiner master/slave to bigjoiner primary/secondary")
fb4943574f92 ("drm/i915: Rename all bigjoiner to joiner")
578ff98403ce ("drm/i915: Allow bigjoiner for MST")
3607b30836ae ("drm/i915: Handle joined pipes inside hsw_crtc_enable()")
e16bcbb01186 ("drm/i915: Handle joined pipes inside hsw_crtc_disable()")
2b8ad19d3ed6 ("drm/i915: Introduce intel_crtc_joined_pipe_mask()")
e43b4f7980f8 ("drm/i915: Pass connector to intel_dp_need_bigjoiner()")
5a1527ed8b43 ("drm/i915/mst: Check intel_dp_joiner_needs_dsc()")
aa099402f98b ("drm/i915: Extract intel_dp_joiner_needs_dsc()")
c0b8afc3a777 ("drm/i915: s/intel_dp_can_bigjoiner()/intel_dp_has_bigjoiner()/")
e02ef5553d9b ("drm/i915: Update pipes in reverse order for bigjoiner")
3a5e09d82f97 ("drm/i915: Fix intel_modeset_pipe_config_late() for bigjoiner")
f9d5e51db656 ("drm/i915/vrr: Disable VRR when using bigjoiner")
ef79820db723 ("drm/i915: Disable live M/N updates when using bigjoiner")
b37e1347b991 ("drm/i915: Disable port sync when bigjoiner is used")
372fa0c79d3f ("drm/i915/psr: Disable PSR when bigjoiner is used")
7a3f171c8f6a ("drm/i915: Extract glk_need_scaler_clock_gating_wa()")
c922a47913f9 ("drm/i915: Clean up glk_pipe_scaler_clock_gating_wa()")
e9fa99dd47a4 ("drm/i915: Shuffle DP .mode_valid() checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8705632435ae2f2253b65d3786da389982e8813 Mon Sep 17 00:00:00 2001
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Date: Wed, 10 Jul 2024 15:41:37 +0300
Subject: [PATCH] drm/i915: Fix readout degamma_lut mismatch on ilk/snb
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On ilk/snb the pipe may be configured to place the LUT before or
after the CSC depending on various factors, but as there is only
one LUT (no split mode like on IVB+) we only advertise a gamma_lut
and no degamma_lut in the uapi to avoid confusing userspace.
This can cause a problem during readout if the VBIOS/GOP enabled
the LUT in the pre CSC configuration. The current code blindly
assigns the results of the readout to the degamma_lut, which will
cause a failure during the next atomic_check() as we aren't expecting
anything to be in degamma_lut since it's not visible to userspace.
Fix the problem by assigning whatever LUT we read out from the
hardware into gamma_lut.
Cc: stable(a)vger.kernel.org
Fixes: d2559299d339 ("drm/i915: Make ilk_read_luts() capable of degamma readout")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11608
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240710124137.16773-1-ville.…
Reviewed-by: Uma Shankar <uma.shankar(a)intel.com>
(cherry picked from commit 33eca84db6e31091cef63584158ab64704f78462)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_modeset_setup.c b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
index 7602cb30ebf1..e1213f3d93cc 100644
--- a/drivers/gpu/drm/i915/display/intel_modeset_setup.c
+++ b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
@@ -326,6 +326,8 @@ static void intel_modeset_update_connector_atomic_state(struct drm_i915_private
static void intel_crtc_copy_hw_to_uapi_state(struct intel_crtc_state *crtc_state)
{
+ struct drm_i915_private *i915 = to_i915(crtc_state->uapi.crtc->dev);
+
if (intel_crtc_is_joiner_secondary(crtc_state))
return;
@@ -337,11 +339,30 @@ static void intel_crtc_copy_hw_to_uapi_state(struct intel_crtc_state *crtc_state
crtc_state->uapi.adjusted_mode = crtc_state->hw.adjusted_mode;
crtc_state->uapi.scaling_filter = crtc_state->hw.scaling_filter;
- /* assume 1:1 mapping */
- drm_property_replace_blob(&crtc_state->hw.degamma_lut,
- crtc_state->pre_csc_lut);
- drm_property_replace_blob(&crtc_state->hw.gamma_lut,
- crtc_state->post_csc_lut);
+ if (DISPLAY_INFO(i915)->color.degamma_lut_size) {
+ /* assume 1:1 mapping */
+ drm_property_replace_blob(&crtc_state->hw.degamma_lut,
+ crtc_state->pre_csc_lut);
+ drm_property_replace_blob(&crtc_state->hw.gamma_lut,
+ crtc_state->post_csc_lut);
+ } else {
+ /*
+ * ilk/snb hw may be configured for either pre_csc_lut
+ * or post_csc_lut, but we don't advertise degamma_lut as
+ * being available in the uapi since there is only one
+ * hardware LUT. Always assign the result of the readout
+ * to gamma_lut as that is the only valid source of LUTs
+ * in the uapi.
+ */
+ drm_WARN_ON(&i915->drm, crtc_state->post_csc_lut &&
+ crtc_state->pre_csc_lut);
+
+ drm_property_replace_blob(&crtc_state->hw.degamma_lut,
+ NULL);
+ drm_property_replace_blob(&crtc_state->hw.gamma_lut,
+ crtc_state->post_csc_lut ?:
+ crtc_state->pre_csc_lut);
+ }
drm_property_replace_blob(&crtc_state->uapi.degamma_lut,
crtc_state->hw.degamma_lut);
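Note on the i915 readout change above: on hardware that advertises no degamma LUT, whichever LUT was read out is reported as gamma_lut, and degamma_lut is cleared. The following is a minimal standalone sketch of that selection. The struct and function are hypothetical and use plain pointers instead of DRM property blobs; the kernel code uses the GNU "a ?: b" shorthand, which is written here as a portable conditional expression.

```c
#include <stddef.h>

/* Hypothetical stand-in for the LUT pointers in the CRTC state. */
struct lut_state {
	const void *pre_csc_lut;	/* read out from hardware */
	const void *post_csc_lut;	/* read out from hardware */
	const void *degamma_lut;	/* exposed to userspace */
	const void *gamma_lut;		/* exposed to userspace */
};

/*
 * has_degamma models a non-zero degamma_lut_size: when set, the mapping
 * is 1:1; otherwise the single hardware LUT always ends up in gamma_lut.
 */
static void copy_luts(struct lut_state *st, int has_degamma)
{
	if (has_degamma) {
		st->degamma_lut = st->pre_csc_lut;
		st->gamma_lut = st->post_csc_lut;
	} else {
		st->degamma_lut = NULL;
		st->gamma_lut = st->post_csc_lut ? st->post_csc_lut
						 : st->pre_csc_lut;
	}
}
```

In this model, a LUT that firmware programmed in the pre-CSC position shows up as gamma_lut when has_degamma is zero, so degamma_lut stays empty as the later check expects.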
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x e8705632435ae2f2253b65d3786da389982e8813
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090844-result-caucasian-a9e5@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
e8705632435a ("drm/i915: Fix readout degamma_lut mismatch on ilk/snb")
da8c3cdb016c ("drm/i915: Rename bigjoiner master/slave to bigjoiner primary/secondary")
fb4943574f92 ("drm/i915: Rename all bigjoiner to joiner")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8705632435ae2f2253b65d3786da389982e8813 Mon Sep 17 00:00:00 2001
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Date: Wed, 10 Jul 2024 15:41:37 +0300
Subject: [PATCH] drm/i915: Fix readout degamma_lut mismatch on ilk/snb
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On ilk/snb the pipe may be configured to place the LUT before or
after the CSC depending on various factors, but as there is only
one LUT (no split mode like on IVB+) we only advertise a gamma_lut
and no degamma_lut in the uapi to avoid confusing userspace.
This can cause a problem during readout if the VBIOS/GOP enabled
the LUT in the pre CSC configuration. The current code blindly
assigns the results of the readout to the degamma_lut, which will
cause a failure during the next atomic_check() as we aren't expecting
anything to be in degamma_lut since it's not visible to userspace.
Fix the problem by assigning whatever LUT we read out from the
hardware into gamma_lut.
Cc: stable(a)vger.kernel.org
Fixes: d2559299d339 ("drm/i915: Make ilk_read_luts() capable of degamma readout")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11608
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240710124137.16773-1-ville.…
Reviewed-by: Uma Shankar <uma.shankar(a)intel.com>
(cherry picked from commit 33eca84db6e31091cef63584158ab64704f78462)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_modeset_setup.c b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
index 7602cb30ebf1..e1213f3d93cc 100644
--- a/drivers/gpu/drm/i915/display/intel_modeset_setup.c
+++ b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
@@ -326,6 +326,8 @@ static void intel_modeset_update_connector_atomic_state(struct drm_i915_private
static void intel_crtc_copy_hw_to_uapi_state(struct intel_crtc_state *crtc_state)
{
+ struct drm_i915_private *i915 = to_i915(crtc_state->uapi.crtc->dev);
+
if (intel_crtc_is_joiner_secondary(crtc_state))
return;
@@ -337,11 +339,30 @@ static void intel_crtc_copy_hw_to_uapi_state(struct intel_crtc_state *crtc_state
crtc_state->uapi.adjusted_mode = crtc_state->hw.adjusted_mode;
crtc_state->uapi.scaling_filter = crtc_state->hw.scaling_filter;
- /* assume 1:1 mapping */
- drm_property_replace_blob(&crtc_state->hw.degamma_lut,
- crtc_state->pre_csc_lut);
- drm_property_replace_blob(&crtc_state->hw.gamma_lut,
- crtc_state->post_csc_lut);
+ if (DISPLAY_INFO(i915)->color.degamma_lut_size) {
+ /* assume 1:1 mapping */
+ drm_property_replace_blob(&crtc_state->hw.degamma_lut,
+ crtc_state->pre_csc_lut);
+ drm_property_replace_blob(&crtc_state->hw.gamma_lut,
+ crtc_state->post_csc_lut);
+ } else {
+ /*
+ * ilk/snb hw may be configured for either pre_csc_lut
+ * or post_csc_lut, but we don't advertise degamma_lut as
+ * being available in the uapi since there is only one
+ * hardware LUT. Always assign the result of the readout
+ * to gamma_lut as that is the only valid source of LUTs
+ * in the uapi.
+ */
+ drm_WARN_ON(&i915->drm, crtc_state->post_csc_lut &&
+ crtc_state->pre_csc_lut);
+
+ drm_property_replace_blob(&crtc_state->hw.degamma_lut,
+ NULL);
+ drm_property_replace_blob(&crtc_state->hw.gamma_lut,
+ crtc_state->post_csc_lut ?:
+ crtc_state->pre_csc_lut);
+ }
drm_property_replace_blob(&crtc_state->uapi.degamma_lut,
crtc_state->hw.degamma_lut);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 72a6e22c604c95ddb3b10b5d3bb85b6ff4dbc34f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090838-thus-fiftieth-f4f7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
72a6e22c604c ("fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 72a6e22c604c95ddb3b10b5d3bb85b6ff4dbc34f Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 26 Aug 2024 19:20:56 +0800
Subject: [PATCH] fscache: delete fscache_cookie_lru_timer when fscache exits
to avoid UAF
The fscache_cookie_lru_timer is initialized when the fscache module
is inserted, but is not deleted when the fscache module is removed.
If timer_reduce() is called before removing the fscache module,
the fscache_cookie_lru_timer will be added to the timer list of
the current cpu. Afterwards, a use-after-free will be triggered
in the softIRQ after removing the fscache module, as follows:
==================================================================
BUG: unable to handle page fault for address: fffffbfff803c9e9
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 21ffea067 P4D 21ffea067 PUD 21ffe6067 PMD 110a7c067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W 6.11.0-rc3 #855
Tainted: [W]=WARN
RIP: 0010:__run_timer_base.part.0+0x254/0x8a0
Call Trace:
<IRQ>
tmigr_handle_remote_up+0x627/0x810
__walk_groups.isra.0+0x47/0x140
tmigr_handle_remote+0x1fa/0x2f0
handle_softirqs+0x180/0x590
irq_exit_rcu+0x84/0xb0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:default_idle+0xf/0x20
default_idle_call+0x38/0x60
do_idle+0x2b5/0x300
cpu_startup_entry+0x54/0x60
start_secondary+0x20d/0x280
common_startup_64+0x13e/0x148
</TASK>
Modules linked in: [last unloaded: netfs]
==================================================================
Therefore delete fscache_cookie_lru_timer when removing the fscache module.
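As a rough illustration of why the timer has to be shut down before the module goes away, here is a toy userspace model. It is a sketch, not kernel code: `struct timer_model` and the three helpers are hypothetical names, and the only property borrowed from timer_shutdown_sync() is that the timer is dequeued and cannot be armed again afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a timer; field names are hypothetical, not struct timer_list. */
struct timer_model {
	bool pending;   /* queued on some CPU's timer list */
	bool shutdown;  /* set once the owner is going away */
};

/* Models arming the timer (e.g. via timer_reduce()); refused after shutdown. */
bool timer_arm(struct timer_model *t)
{
	if (t->shutdown)
		return false;
	t->pending = true;
	return true;
}

/* Models timer_shutdown_sync(): dequeue and forbid any later re-arming. */
void timer_shutdown(struct timer_model *t)
{
	t->pending = false;
	t->shutdown = true;
}

/* Unloading is only safe if no timer list still points at module memory. */
bool module_exit_is_safe(const struct timer_model *t)
{
	return !t->pending;
}
```

In this model, arming the timer and then unloading without calling `timer_shutdown()` leaves `module_exit_is_safe()` false, which corresponds to the softirq later walking a timer that lives in freed module memory.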
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Cc: stable(a)kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Link: https://lore.kernel.org/r/20240826112056.2458299-1-libaokun@huaweicloud.com
Acked-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/netfs/fscache_main.c b/fs/netfs/fscache_main.c
index 42e98bb523e3..49849005eb7c 100644
--- a/fs/netfs/fscache_main.c
+++ b/fs/netfs/fscache_main.c
@@ -103,6 +103,7 @@ void __exit fscache_exit(void)
kmem_cache_destroy(fscache_cookie_jar);
fscache_proc_cleanup();
+ timer_shutdown_sync(&fscache_cookie_lru_timer);
destroy_workqueue(fscache_wq);
pr_notice("FS-Cache unloaded\n");
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x cd9253c23aedd61eb5ff11f37a36247cd46faf86
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090853-untagged-gravy-ccd3@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
cd9253c23aed ("btrfs: fix race between direct IO write and fsync when using same fd")
939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
9aa29a20b700 ("btrfs: move the direct IO code into its own file")
04ef7631bfa5 ("btrfs: cleanup duplicated parameters related to btrfs_create_dio_extent()")
9fec848b3a33 ("btrfs: cleanup duplicated parameters related to create_io_em()")
e9ea31fb5c1f ("btrfs: cleanup duplicated parameters related to btrfs_alloc_ordered_extent")
cdc627e65c7e ("btrfs: cleanup duplicated parameters related to can_nocow_file_extent_args")
c77a8c61002e ("btrfs: remove extent_map::block_start member")
e28b851ed9b2 ("btrfs: remove extent_map::block_len member")
4aa7b5d1784f ("btrfs: remove extent_map::orig_start member")
3f255ece2f1e ("btrfs: introduce extra sanity checks for extent maps")
3d2ac9922465 ("btrfs: introduce new members for extent_map")
87a6962f73b1 ("btrfs: export the expected file extent through can_nocow_extent()")
e8fe524da027 ("btrfs: rename extent_map::orig_block_len to disk_num_bytes")
8996f61ab9ff ("btrfs: move fiemap code into its own file")
56b7169f691c ("btrfs: use a btrfs_inode local variable at btrfs_sync_file()")
e641e323abb3 ("btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()")
cef2daba4268 ("btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()")
4e660ca3a98d ("btrfs: use a regular rb_root instead of cached rb_root for extent_map_tree")
7f5830bc964d ("btrfs: rename rb_root member of extent_map_tree from map to root")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd9253c23aedd61eb5ff11f37a36247cd46faf86 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 29 Aug 2024 18:25:49 +0100
Subject: [PATCH] btrfs: fix race between direct IO write and fsync when using
same fd
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failure when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's VFS lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's VFS lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's VFS lock is held fails, since task B
never locked it and task A has already unlocked it.
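The interleaving in steps 5) to 8) can be replayed deterministically in a small userspace model. This is a sketch under assumptions: `struct file_model`, `struct file_private_model` and both functions are hypothetical simplifications, the two tasks are replayed sequentially rather than run as real threads, and the inode lock is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel objects involved. */
struct file_private_model { bool fsync_skip_inode_lock; };

struct file_model {
	struct file_private_model *private_data; /* shared by every user of the fd */
	bool inode_locked;                        /* models the inode's VFS lock */
};

/* Step 6: fsync decides whether to lock from state shared through the file. */
bool fsync_skips_lock(const struct file_model *f)
{
	return f->private_data && f->private_data->fsync_skip_inode_lock;
}

/*
 * Replays steps 5) to 8) in order. Returns true when task B reaches the
 * logging path without the inode lock held, i.e. when the assertion fires.
 */
bool replay_race(void)
{
	struct file_model f = { NULL, false };
	struct file_private_model stack_private = { false };
	bool b_skipped;

	/* 5) Task A: holds the lock, publishes its on-stack private, sets the flag. */
	f.inode_locked = true;
	f.private_data = &stack_private;
	stack_private.fsync_skip_inode_lock = true;

	/* 6) Task B: sees A's flag and skips locking. */
	b_skipped = fsync_skips_lock(&f);

	/* 7) Task A: finishes, resets the private, unlocks. */
	stack_private.fsync_skip_inode_lock = false;
	f.private_data = NULL;
	f.inode_locked = false;

	/* 8) Task B: the lock it relied on is gone. */
	return b_skipped && !f.inode_locked;
}
```

The point of the model is that the flag is reachable through the file, so any task using the same fd can observe a decision that was only meant for task A.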
The stack trace produced is the following:
assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
------------[ cut here ]------------
kernel BUG at fs/btrfs/ordered-data.c:983!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Code: 50 d6 86 c0 e8 (...)
RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x24
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x6a/0x90
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? exc_invalid_op+0x50/0x70
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? asm_exc_invalid_op+0x1a/0x20
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? __seccomp_filter+0x31d/0x4f0
__x64_sys_fdatasync+0x4f/0x90
do_syscall_64+0x82/0x160
? do_futex+0xcb/0x190
? __x64_sys_futex+0x10e/0x1d0
? switch_fpu_return+0x4f/0xd0
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of task A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
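The effect of moving the marker from the file to the task can be sketched the same way. Everything below is a hypothetical model: `struct task_model` stands in for the single task field that matters, `DIO_WRITE_STUB` mirrors the value used by the patch, and the two functions only imitate the shape of the changes to the sync and direct IO paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as in the patch; everything else here is a hypothetical model. */
#define DIO_WRITE_STUB ((void *) 1)

/* Models the one per-task field that carries the marker. */
struct task_model { void *journal_info; };

/* Models the check added to the sync path: consult only the calling task. */
bool sync_file_skips_lock(struct task_model *current_task)
{
	if (current_task->journal_info == DIO_WRITE_STUB) {
		current_task->journal_info = NULL;
		return true;
	}
	return false;
}

/* Models the direct IO completion path; reports what the nested fsync decided. */
bool dio_complete(struct task_model *current_task)
{
	bool skipped;

	assert(current_task->journal_info == NULL);
	current_task->journal_info = DIO_WRITE_STUB;
	skipped = sync_file_skips_lock(current_task); /* fsync issued by the writer itself */
	current_task->journal_info = NULL;
	return skipped;
}
```

Because the marker lives in the writer's own task state, a second task calling `sync_file_skips_lock()` with its own `struct task_model` never sees it and takes the lock as usual.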
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf = (const char *)buf + ret;
offset += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
fprintf(stderr, "Failed to allocate buffer: %s\n", strerror(ret));
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create fsync thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 75fa563e4cac..c8568b1a61c4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -459,7 +459,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index 67adbe9d294a..364bce34f034 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -864,13 +864,6 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
ret = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -879,13 +872,10 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
ret = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9914419f3b7d..2aeb8116549c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1603,7 +1603,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct btrfs_inode *inode = BTRFS_I(d_inode(dentry));
struct btrfs_root *root = inode->root;
@@ -1613,7 +1612,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->vfs_inode.i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 98c03ddc760b..dd9ce9b9f69e 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -27,6 +27,12 @@ struct btrfs_root_item;
struct btrfs_root;
struct btrfs_path;
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x cd9253c23aedd61eb5ff11f37a36247cd46faf86
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090850-naturist-deafness-b924@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
cd9253c23aed ("btrfs: fix race between direct IO write and fsync when using same fd")
939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
9aa29a20b700 ("btrfs: move the direct IO code into its own file")
04ef7631bfa5 ("btrfs: cleanup duplicated parameters related to btrfs_create_dio_extent()")
9fec848b3a33 ("btrfs: cleanup duplicated parameters related to create_io_em()")
e9ea31fb5c1f ("btrfs: cleanup duplicated parameters related to btrfs_alloc_ordered_extent")
cdc627e65c7e ("btrfs: cleanup duplicated parameters related to can_nocow_file_extent_args")
c77a8c61002e ("btrfs: remove extent_map::block_start member")
e28b851ed9b2 ("btrfs: remove extent_map::block_len member")
4aa7b5d1784f ("btrfs: remove extent_map::orig_start member")
3f255ece2f1e ("btrfs: introduce extra sanity checks for extent maps")
3d2ac9922465 ("btrfs: introduce new members for extent_map")
87a6962f73b1 ("btrfs: export the expected file extent through can_nocow_extent()")
e8fe524da027 ("btrfs: rename extent_map::orig_block_len to disk_num_bytes")
8996f61ab9ff ("btrfs: move fiemap code into its own file")
56b7169f691c ("btrfs: use a btrfs_inode local variable at btrfs_sync_file()")
e641e323abb3 ("btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()")
cef2daba4268 ("btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()")
4e660ca3a98d ("btrfs: use a regular rb_root instead of cached rb_root for extent_map_tree")
7f5830bc964d ("btrfs: rename rb_root member of extent_map_tree from map to root")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd9253c23aedd61eb5ff11f37a36247cd46faf86 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 29 Aug 2024 18:25:49 +0100
Subject: [PATCH] btrfs: fix race between direct IO write and fsync when using
same fd
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failure when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's VFS lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's VFS lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's VFS lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
------------[ cut here ]------------
kernel BUG at fs/btrfs/ordered-data.c:983!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Code: 50 d6 86 c0 e8 (...)
RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x24
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x6a/0x90
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? exc_invalid_op+0x50/0x70
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? asm_exc_invalid_op+0x1a/0x20
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? __seccomp_filter+0x31d/0x4f0
__x64_sys_fdatasync+0x4f/0x90
do_syscall_64+0x82/0x160
? do_futex+0xcb/0x190
? __x64_sys_futex+0x10e/0x1d0
? switch_fpu_return+0x4f/0xd0
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of task A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
buf = (const char *)buf + ret;
offset += ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
fprintf(stderr, "Failed to allocate buffer: %s\n", strerror(ret));
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create fsync thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 75fa563e4cac..c8568b1a61c4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -459,7 +459,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index 67adbe9d294a..364bce34f034 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -864,13 +864,6 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
ret = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -879,13 +872,10 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
ret = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9914419f3b7d..2aeb8116549c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1603,7 +1603,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct btrfs_inode *inode = BTRFS_I(d_inode(dentry));
struct btrfs_root *root = inode->root;
@@ -1613,7 +1612,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->vfs_inode.i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 98c03ddc760b..dd9ce9b9f69e 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -27,6 +27,12 @@ struct btrfs_root_item;
struct btrfs_root;
struct btrfs_path;
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x cd9253c23aedd61eb5ff11f37a36247cd46faf86
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090847-flashing-dimmed-2c7f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
cd9253c23aed ("btrfs: fix race between direct IO write and fsync when using same fd")
939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
9aa29a20b700 ("btrfs: move the direct IO code into its own file")
04ef7631bfa5 ("btrfs: cleanup duplicated parameters related to btrfs_create_dio_extent()")
9fec848b3a33 ("btrfs: cleanup duplicated parameters related to create_io_em()")
e9ea31fb5c1f ("btrfs: cleanup duplicated parameters related to btrfs_alloc_ordered_extent")
cdc627e65c7e ("btrfs: cleanup duplicated parameters related to can_nocow_file_extent_args")
c77a8c61002e ("btrfs: remove extent_map::block_start member")
e28b851ed9b2 ("btrfs: remove extent_map::block_len member")
4aa7b5d1784f ("btrfs: remove extent_map::orig_start member")
3f255ece2f1e ("btrfs: introduce extra sanity checks for extent maps")
3d2ac9922465 ("btrfs: introduce new members for extent_map")
87a6962f73b1 ("btrfs: export the expected file extent through can_nocow_extent()")
e8fe524da027 ("btrfs: rename extent_map::orig_block_len to disk_num_bytes")
8996f61ab9ff ("btrfs: move fiemap code into its own file")
56b7169f691c ("btrfs: use a btrfs_inode local variable at btrfs_sync_file()")
e641e323abb3 ("btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()")
cef2daba4268 ("btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()")
4e660ca3a98d ("btrfs: use a regular rb_root instead of cached rb_root for extent_map_tree")
7f5830bc964d ("btrfs: rename rb_root member of extent_map_tree from map to root")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd9253c23aedd61eb5ff11f37a36247cd46faf86 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 29 Aug 2024 18:25:49 +0100
Subject: [PATCH] btrfs: fix race between direct IO write and fsync when using
same fd
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failure when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's VFS lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's VFS lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's VFS lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
------------[ cut here ]------------
kernel BUG at fs/btrfs/ordered-data.c:983!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Code: 50 d6 86 c0 e8 (...)
RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x24
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x6a/0x90
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? exc_invalid_op+0x50/0x70
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? asm_exc_invalid_op+0x1a/0x20
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? __seccomp_filter+0x31d/0x4f0
__x64_sys_fdatasync+0x4f/0x90
do_syscall_64+0x82/0x160
? do_futex+0xcb/0x190
? __x64_sys_futex+0x10e/0x1d0
? switch_fpu_return+0x4f/0xd0
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of task A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
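The idea behind the fix can be sketched in user space. This is only a model, not the kernel code: task_journal_info is a hypothetical thread-local stand-in for current->journal_info, DIO_WRITE_STUB mirrors BTRFS_TRANS_DIO_WRITE_STUB, and both function names are made up for illustration. Because the marker is per task, a second thread calling the fsync path never observes it, unlike the shared file private.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-task current->journal_info. */
static _Thread_local void *task_journal_info;

/* Sentinel value, mirroring BTRFS_TRANS_DIO_WRITE_STUB from the patch. */
#define DIO_WRITE_STUB ((void *)1)

/*
 * Models the entry of btrfs_sync_file(): returns true when the inode
 * lock would be skipped because the caller is inside a direct IO write.
 */
static bool sync_file_skips_lock(void)
{
	bool skip_ilock = false;

	if (task_journal_info == DIO_WRITE_STUB) {
		skip_ilock = true;
		task_journal_info = NULL;
	}
	return skip_ilock;
}

/*
 * Models the tail of btrfs_direct_write(): the nested fsync runs in the
 * same task, so it is the only caller that can see the sentinel.
 */
static bool direct_write_then_sync(void)
{
	bool skipped;

	assert(task_journal_info == NULL);
	task_journal_info = DIO_WRITE_STUB;
	skipped = sync_file_skips_lock(); /* stands in for iomap_dio_complete() */
	task_journal_info = NULL;
	return skipped;
}
```

In this model, a plain call to sync_file_skips_lock() returns false, while the call nested inside direct_write_then_sync() returns true.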
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
offset += ret;
buf = (const char *)buf + ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create fsync thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 75fa563e4cac..c8568b1a61c4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -459,7 +459,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index 67adbe9d294a..364bce34f034 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -864,13 +864,6 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
ret = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -879,13 +872,10 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
ret = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9914419f3b7d..2aeb8116549c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1603,7 +1603,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct btrfs_inode *inode = BTRFS_I(d_inode(dentry));
struct btrfs_root *root = inode->root;
@@ -1613,7 +1612,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->vfs_inode.i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 98c03ddc760b..dd9ce9b9f69e 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -27,6 +27,12 @@ struct btrfs_root_item;
struct btrfs_root;
struct btrfs_path;
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x cd9253c23aedd61eb5ff11f37a36247cd46faf86
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090844-flattered-badass-d13c@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
cd9253c23aed ("btrfs: fix race between direct IO write and fsync when using same fd")
939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
9aa29a20b700 ("btrfs: move the direct IO code into its own file")
04ef7631bfa5 ("btrfs: cleanup duplicated parameters related to btrfs_create_dio_extent()")
9fec848b3a33 ("btrfs: cleanup duplicated parameters related to create_io_em()")
e9ea31fb5c1f ("btrfs: cleanup duplicated parameters related to btrfs_alloc_ordered_extent")
cdc627e65c7e ("btrfs: cleanup duplicated parameters related to can_nocow_file_extent_args")
c77a8c61002e ("btrfs: remove extent_map::block_start member")
e28b851ed9b2 ("btrfs: remove extent_map::block_len member")
4aa7b5d1784f ("btrfs: remove extent_map::orig_start member")
3f255ece2f1e ("btrfs: introduce extra sanity checks for extent maps")
3d2ac9922465 ("btrfs: introduce new members for extent_map")
87a6962f73b1 ("btrfs: export the expected file extent through can_nocow_extent()")
e8fe524da027 ("btrfs: rename extent_map::orig_block_len to disk_num_bytes")
8996f61ab9ff ("btrfs: move fiemap code into its own file")
56b7169f691c ("btrfs: use a btrfs_inode local variable at btrfs_sync_file()")
e641e323abb3 ("btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()")
cef2daba4268 ("btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()")
4e660ca3a98d ("btrfs: use a regular rb_root instead of cached rb_root for extent_map_tree")
7f5830bc964d ("btrfs: rename rb_root member of extent_map_tree from map to root")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd9253c23aedd61eb5ff11f37a36247cd46faf86 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 29 Aug 2024 18:25:49 +0100
Subject: [PATCH] btrfs: fix race between direct IO write and fsync when using
same fd
If we have 2 threads that are using the same file descriptor and one of
them is doing direct IO writes while the other is doing fsync, we have a
race where we can end up either:
1) Attempt a fsync without holding the inode's lock, triggering an
assertion failure when assertions are enabled;
2) Do an invalid memory access from the fsync task because the file private
points to memory allocated on stack by the direct IO task and it may be
used by the fsync task after the stack was destroyed.
The race happens like this:
1) A user space program opens a file descriptor with O_DIRECT;
2) The program spawns 2 threads using libpthread for example;
3) One of the threads uses the file descriptor to do direct IO writes,
while the other calls fsync using the same file descriptor.
4) Call task A the thread doing direct IO writes and task B the thread
doing fsyncs;
5) Task A does a direct IO write, and at btrfs_direct_write() sets the
file's private to an on stack allocated private with the member
'fsync_skip_inode_lock' set to true;
6) Task B enters btrfs_sync_file() and sees that there's a private
structure associated to the file which has 'fsync_skip_inode_lock' set
to true, so it skips locking the inode's VFS lock;
7) Task A completes the direct IO write, and resets the file's private to
NULL since it had no prior private and our private was stack allocated.
Then it unlocks the inode's VFS lock;
8) Task B enters btrfs_get_ordered_extents_for_logging(), then the
assertion that checks the inode's VFS lock is held fails, since task B
never locked it and task A has already unlocked it.
The stack trace produced is the following:
assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983
------------[ cut here ]------------
kernel BUG at fs/btrfs/ordered-data.c:983!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8
Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020
RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs]
Code: 50 d6 86 c0 e8 (...)
RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246
RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800
RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38
R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800
R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000
FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0
Call Trace:
<TASK>
? __die_body.cold+0x14/0x24
? die+0x2e/0x50
? do_trap+0xca/0x110
? do_error_trap+0x6a/0x90
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? exc_invalid_op+0x50/0x70
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? asm_exc_invalid_op+0x1a/0x20
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a]
? __seccomp_filter+0x31d/0x4f0
__x64_sys_fdatasync+0x4f/0x90
do_syscall_64+0x82/0x160
? do_futex+0xcb/0x190
? __x64_sys_futex+0x10e/0x1d0
? switch_fpu_return+0x4f/0xd0
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
? syscall_exit_to_user_mode+0x72/0x220
? do_syscall_64+0x8e/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Another problem here is if task B grabs the private pointer and then uses
it after task A has finished, since the private was allocated in the stack
of task A, it results in some invalid memory access with a hard to predict
result.
This issue, triggering the assertion, was observed with QEMU workloads by
two users in the Link tags below.
Fix this by not relying on a file's private to pass information to fsync
that it should skip locking the inode and instead pass this information
through a special value stored in current->journal_info. This is safe
because in the relevant section of the direct IO write path we are not
holding a transaction handle, so current->journal_info is NULL.
The following C program triggers the issue:
$ cat repro.c
/* Get the O_DIRECT definition. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
static int fd;
static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset)
{
while (count > 0) {
ssize_t ret;
ret = pwrite(fd, buf, count, offset);
if (ret < 0) {
if (errno == EINTR)
continue;
return ret;
}
count -= ret;
offset += ret;
buf = (const char *)buf + ret;
}
return 0;
}
static void *fsync_loop(void *arg)
{
while (1) {
int ret;
ret = fsync(fd);
if (ret != 0) {
perror("Fsync failed");
exit(6);
}
}
}
int main(int argc, char *argv[])
{
long pagesize;
void *write_buf;
pthread_t fsyncer;
int ret;
if (argc != 2) {
fprintf(stderr, "Use: %s <file path>\n", argv[0]);
return 1;
}
fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666);
if (fd == -1) {
perror("Failed to open/create file");
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) {
perror("Failed to get page size");
return 2;
}
ret = posix_memalign(&write_buf, pagesize, pagesize);
if (ret) {
perror("Failed to allocate buffer");
return 3;
}
ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL);
if (ret != 0) {
fprintf(stderr, "Failed to create fsync thread: %d\n", ret);
return 4;
}
while (1) {
ret = do_write(fd, write_buf, pagesize, 0);
if (ret != 0) {
perror("Write failed");
exit(5);
}
}
return 0;
}
$ mkfs.btrfs -f /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ timeout 10 ./repro /mnt/sdi/foo
Usually the race is triggered within less than 1 second. A test case for
fstests will follow soon.
Reported-by: Paulo Dias <paulo.miguel.dias(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187
Reported-by: Andreas Jahn <jahn-andi(a)web.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199
Reported-by: syzbot+4704b3cc972bd76024f1(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/
Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 75fa563e4cac..c8568b1a61c4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -459,7 +459,6 @@ struct btrfs_file_private {
void *filldir_buf;
u64 last_index;
struct extent_state *llseek_cached_state;
- bool fsync_skip_inode_lock;
};
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index 67adbe9d294a..364bce34f034 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -864,13 +864,6 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (IS_ERR_OR_NULL(dio)) {
ret = PTR_ERR_OR_ZERO(dio);
} else {
- struct btrfs_file_private stack_private = { 0 };
- struct btrfs_file_private *private;
- const bool have_private = (file->private_data != NULL);
-
- if (!have_private)
- file->private_data = &stack_private;
-
/*
* If we have a synchronous write, we must make sure the fsync
* triggered by the iomap_dio_complete() call below doesn't
@@ -879,13 +872,10 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
* partial writes due to the input buffer (or parts of it) not
* being already faulted in.
*/
- private = file->private_data;
- private->fsync_skip_inode_lock = true;
+ ASSERT(current->journal_info == NULL);
+ current->journal_info = BTRFS_TRANS_DIO_WRITE_STUB;
ret = iomap_dio_complete(dio);
- private->fsync_skip_inode_lock = false;
-
- if (!have_private)
- file->private_data = NULL;
+ current->journal_info = NULL;
}
/* No increment (+=) because iomap returns a cumulative value. */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9914419f3b7d..2aeb8116549c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1603,7 +1603,6 @@ static inline bool skip_inode_logging(const struct btrfs_log_ctx *ctx)
*/
int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct btrfs_file_private *private = file->private_data;
struct dentry *dentry = file_dentry(file);
struct btrfs_inode *inode = BTRFS_I(d_inode(dentry));
struct btrfs_root *root = inode->root;
@@ -1613,7 +1612,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
int ret = 0, err;
u64 len;
bool full_sync;
- const bool skip_ilock = (private ? private->fsync_skip_inode_lock : false);
+ bool skip_ilock = false;
+
+ if (current->journal_info == BTRFS_TRANS_DIO_WRITE_STUB) {
+ skip_ilock = true;
+ current->journal_info = NULL;
+ lockdep_assert_held(&inode->vfs_inode.i_rwsem);
+ }
trace_btrfs_sync_file(file, datasync);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 98c03ddc760b..dd9ce9b9f69e 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -27,6 +27,12 @@ struct btrfs_root_item;
struct btrfs_root;
struct btrfs_path;
+/*
+ * Signal that a direct IO write is in progress, to avoid deadlock for sync
+ * direct IO writes when fsync is called during the direct IO write path.
+ */
+#define BTRFS_TRANS_DIO_WRITE_STUB ((void *) 1)
+
/* Radix-tree tag for roots that are part of the trasaction. */
#define BTRFS_ROOT_TRANS_TAG 0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 0ecc5be200c84e67114f3640064ba2bae3ba2f5a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090830-kangaroo-hassle-e959@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
0ecc5be200c8 ("x86/apic: Make x2apic_disable() work correctly")
720a22fd6c1c ("x86/apic: Don't access the APIC when disabling x2APIC")
5a88f354dcd8 ("x86/apic: Split register_apic_address()")
d10a904435fa ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites")
49062454a3eb ("x86/apic: Rename disable_apic")
bea629d57d00 ("x86/apic: Save the APIC virtual base address")
3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
94a855111ed9 ("Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ecc5be200c84e67114f3640064ba2bae3ba2f5a Mon Sep 17 00:00:00 2001
From: Yuntao Wang <yuntao.wang(a)linux.dev>
Date: Tue, 13 Aug 2024 09:48:27 +0800
Subject: [PATCH] x86/apic: Make x2apic_disable() work correctly
x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even
when the state is X2APIC_ON_LOCKED, which prevents the kernel from disabling
it, thereby creating an inconsistent state.
Due to the early state check for X2APIC_ON, the code path which warns about
a locked X2APIC cannot be reached.
Test for state < X2APIC_ON instead and move the clearing of the state and
mode variables to the place which actually disables X2APIC.
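The control-flow change can be modelled in plain C. This is a simplified sketch, not the kernel function: the enum ordering (every state from X2APIC_ON upward means "enabled") is an assumption, the hardware lock check is reduced to a comparison against X2APIC_ON_LOCKED, and old_disable()/new_disable() are hypothetical names. Each function returns the state the software would record afterwards.

```c
#include <stdbool.h>

/* Assumed ordering: states at or above X2APIC_ON mean x2APIC is enabled. */
enum x2apic_state_model {
	X2APIC_OFF,
	X2APIC_DISABLED,
	X2APIC_ON,
	X2APIC_ON_LOCKED,
};

/* Old flow: clear the state first, then bail out unless it was exactly ON. */
static int old_disable(int state, bool *warned)
{
	int saved = state;

	*warned = false;
	state = X2APIC_DISABLED;
	if (saved != X2APIC_ON)
		return state; /* locked case leaves here, the warning is unreachable */
	return state;
}

/* New flow: test for "< ON" and clear the state only on a real disable. */
static int new_disable(int state, bool *warned)
{
	*warned = false;
	if (state < X2APIC_ON)
		return state;
	if (state == X2APIC_ON_LOCKED) { /* stands in for the hardware lock check */
		*warned = true;
		return state;
	}
	return X2APIC_DISABLED;
}
```

With a locked x2APIC, the old flow records X2APIC_DISABLED without warning, while the new flow warns and keeps X2APIC_ON_LOCKED.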
[ tglx: Massaged change log. Added Fixes tag. Moved clearing so it's at the
right place for back ports ]
Fixes: a57e456a7b28 ("x86/apic: Fix fallout from x2apic cleanup")
Signed-off-by: Yuntao Wang <yuntao.wang(a)linux.dev>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240813014827.895381-1-yuntao.wang@linux.dev
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 66fd4b2a37a3..373638691cd4 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1775,12 +1775,9 @@ static __init void apic_set_fixmap(bool read_apic);
static __init void x2apic_disable(void)
{
- u32 x2apic_id, state = x2apic_state;
+ u32 x2apic_id;
- x2apic_mode = 0;
- x2apic_state = X2APIC_DISABLED;
-
- if (state != X2APIC_ON)
+ if (x2apic_state < X2APIC_ON)
return;
x2apic_id = read_apic_id();
@@ -1793,6 +1790,10 @@ static __init void x2apic_disable(void)
}
__x2apic_disable();
+
+ x2apic_mode = 0;
+ x2apic_state = X2APIC_DISABLED;
+
/*
* Don't reread the APIC ID as it was already done from
* check_x2apic() and the APIC driver still is a x2APIC variant,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 0ecc5be200c84e67114f3640064ba2bae3ba2f5a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090830-reactive-jokester-6061@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
0ecc5be200c8 ("x86/apic: Make x2apic_disable() work correctly")
720a22fd6c1c ("x86/apic: Don't access the APIC when disabling x2APIC")
5a88f354dcd8 ("x86/apic: Split register_apic_address()")
d10a904435fa ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites")
49062454a3eb ("x86/apic: Rename disable_apic")
bea629d57d00 ("x86/apic: Save the APIC virtual base address")
3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
94a855111ed9 ("Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ecc5be200c84e67114f3640064ba2bae3ba2f5a Mon Sep 17 00:00:00 2001
From: Yuntao Wang <yuntao.wang(a)linux.dev>
Date: Tue, 13 Aug 2024 09:48:27 +0800
Subject: [PATCH] x86/apic: Make x2apic_disable() work correctly
x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even
when the state is X2APIC_ON_LOCKED, which prevents the kernel from disabling
it, thereby creating an inconsistent state.
Due to the early state check for X2APIC_ON, the code path which warns about
a locked X2APIC cannot be reached.
Test for state < X2APIC_ON instead and move the clearing of the state and
mode variables to the place which actually disables X2APIC.
[ tglx: Massaged change log. Added Fixes tag. Moved clearing so it's at the
right place for back ports ]
Fixes: a57e456a7b28 ("x86/apic: Fix fallout from x2apic cleanup")
Signed-off-by: Yuntao Wang <yuntao.wang(a)linux.dev>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240813014827.895381-1-yuntao.wang@linux.dev
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 66fd4b2a37a3..373638691cd4 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1775,12 +1775,9 @@ static __init void apic_set_fixmap(bool read_apic);
static __init void x2apic_disable(void)
{
- u32 x2apic_id, state = x2apic_state;
+ u32 x2apic_id;
- x2apic_mode = 0;
- x2apic_state = X2APIC_DISABLED;
-
- if (state != X2APIC_ON)
+ if (x2apic_state < X2APIC_ON)
return;
x2apic_id = read_apic_id();
@@ -1793,6 +1790,10 @@ static __init void x2apic_disable(void)
}
__x2apic_disable();
+
+ x2apic_mode = 0;
+ x2apic_state = X2APIC_DISABLED;
+
/*
* Don't reread the APIC ID as it was already done from
* check_x2apic() and the APIC driver still is a x2APIC variant,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0ecc5be200c84e67114f3640064ba2bae3ba2f5a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090829-clench-kinfolk-03d9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
0ecc5be200c8 ("x86/apic: Make x2apic_disable() work correctly")
720a22fd6c1c ("x86/apic: Don't access the APIC when disabling x2APIC")
5a88f354dcd8 ("x86/apic: Split register_apic_address()")
d10a904435fa ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites")
49062454a3eb ("x86/apic: Rename disable_apic")
bea629d57d00 ("x86/apic: Save the APIC virtual base address")
3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
94a855111ed9 ("Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ecc5be200c84e67114f3640064ba2bae3ba2f5a Mon Sep 17 00:00:00 2001
From: Yuntao Wang <yuntao.wang(a)linux.dev>
Date: Tue, 13 Aug 2024 09:48:27 +0800
Subject: [PATCH] x86/apic: Make x2apic_disable() work correctly
x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even
when the state is X2APIC_ON_LOCKED, which prevents the kernel from disabling
it, thereby creating an inconsistent state.
Due to the early state check for X2APIC_ON, the code path which warns about
a locked X2APIC cannot be reached.
Test for state < X2APIC_ON instead and move the clearing of the state and
mode variables to the place which actually disables X2APIC.
[ tglx: Massaged change log. Added Fixes tag. Moved clearing so it's at the
right place for back ports ]
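The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch, not the kernel code: struct
x2apic_model, its hw_locked and hw_enabled fields, and the model_* helpers are
hypothetical names introduced for illustration, and the enum ordering is
assumed from the "state < X2APIC_ON" test in the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed ordering: every state >= X2APIC_ON means x2APIC is enabled. */
enum { X2APIC_OFF, X2APIC_DISABLED, X2APIC_ON, X2APIC_ON_LOCKED };

/* Hypothetical stand-in for the kernel globals plus the hardware side. */
struct x2apic_model {
	int state;       /* models x2apic_state */
	int mode;        /* models x2apic_mode */
	bool hw_locked;  /* hypothetical: hardware refuses to leave x2APIC mode */
	bool hw_enabled; /* hypothetical: what the hardware is actually doing */
};

/* Pre-patch ordering: software state is cleared before any check runs. */
void model_disable_old(struct x2apic_model *m)
{
	int state = m->state;

	m->mode = 0;
	m->state = X2APIC_DISABLED;

	/* X2APIC_ON_LOCKED returns here, so the lock check below is never reached. */
	if (state != X2APIC_ON)
		return;

	if (m->hw_locked)
		return;

	m->hw_enabled = false;
}

/* Post-patch ordering: clear software state only after the hardware disable. */
void model_disable_new(struct x2apic_model *m)
{
	if (m->state < X2APIC_ON)
		return;

	if (m->hw_locked) {
		fprintf(stderr, "model: cannot disable locked x2apic\n");
		return;
	}

	m->hw_enabled = false;
	m->mode = 0;
	m->state = X2APIC_DISABLED;
}

/* True when the software variables agree with the modelled hardware. */
bool model_consistent(const struct x2apic_model *m)
{
	return ((m->state >= X2APIC_ON) == m->hw_enabled) &&
	       ((m->mode != 0) == m->hw_enabled);
}
```

Starting from X2APIC_ON_LOCKED with the lock set, model_disable_old() leaves
the state at X2APIC_DISABLED while the modelled hardware is still enabled.
model_disable_new() prints the warning and leaves state and mode untouched.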
Fixes: a57e456a7b28 ("x86/apic: Fix fallout from x2apic cleanup")
Signed-off-by: Yuntao Wang <yuntao.wang(a)linux.dev>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240813014827.895381-1-yuntao.wang@linux.dev
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 66fd4b2a37a3..373638691cd4 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1775,12 +1775,9 @@ static __init void apic_set_fixmap(bool read_apic);
static __init void x2apic_disable(void)
{
- u32 x2apic_id, state = x2apic_state;
+ u32 x2apic_id;
- x2apic_mode = 0;
- x2apic_state = X2APIC_DISABLED;
-
- if (state != X2APIC_ON)
+ if (x2apic_state < X2APIC_ON)
return;
x2apic_id = read_apic_id();
@@ -1793,6 +1790,10 @@ static __init void x2apic_disable(void)
}
__x2apic_disable();
+
+ x2apic_mode = 0;
+ x2apic_state = X2APIC_DISABLED;
+
/*
* Don't reread the APIC ID as it was already done from
* check_x2apic() and the APIC driver still is a x2APIC variant,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0ecc5be200c84e67114f3640064ba2bae3ba2f5a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024090828-tiny-boggle-b7b4@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
0ecc5be200c8 ("x86/apic: Make x2apic_disable() work correctly")
720a22fd6c1c ("x86/apic: Don't access the APIC when disabling x2APIC")
5a88f354dcd8 ("x86/apic: Split register_apic_address()")
d10a904435fa ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites")
49062454a3eb ("x86/apic: Rename disable_apic")
bea629d57d00 ("x86/apic: Save the APIC virtual base address")
3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
94a855111ed9 ("Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ecc5be200c84e67114f3640064ba2bae3ba2f5a Mon Sep 17 00:00:00 2001
From: Yuntao Wang <yuntao.wang(a)linux.dev>
Date: Tue, 13 Aug 2024 09:48:27 +0800
Subject: [PATCH] x86/apic: Make x2apic_disable() work correctly
x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even
when the state is X2APIC_ON_LOCKED. In that state the kernel cannot disable
x2APIC, so clearing the variables leaves them inconsistent with the hardware.
Due to the early state check for X2APIC_ON, the code path which warns about
a locked X2APIC cannot be reached.
Test for state < X2APIC_ON instead and move the clearing of the state and
mode variables to the place which actually disables X2APIC.
[ tglx: Massaged change log. Added Fixes tag. Moved clearing so it's at the
right place for back ports ]
Fixes: a57e456a7b28 ("x86/apic: Fix fallout from x2apic cleanup")
Signed-off-by: Yuntao Wang <yuntao.wang(a)linux.dev>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240813014827.895381-1-yuntao.wang@linux.dev
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 66fd4b2a37a3..373638691cd4 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1775,12 +1775,9 @@ static __init void apic_set_fixmap(bool read_apic);
static __init void x2apic_disable(void)
{
- u32 x2apic_id, state = x2apic_state;
+ u32 x2apic_id;
- x2apic_mode = 0;
- x2apic_state = X2APIC_DISABLED;
-
- if (state != X2APIC_ON)
+ if (x2apic_state < X2APIC_ON)
return;
x2apic_id = read_apic_id();
@@ -1793,6 +1790,10 @@ static __init void x2apic_disable(void)
}
__x2apic_disable();
+
+ x2apic_mode = 0;
+ x2apic_state = X2APIC_DISABLED;
+
/*
* Don't reread the APIC ID as it was already done from
* check_x2apic() and the APIC driver still is a x2APIC variant,