The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ffe3986fece696cf65e0ef99e74c75f848be8e30
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024042718-handset-kitty-0a0e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ffe3986fece6 ("ring-buffer: Only update pages_touched when a new page is touched")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ffe3986fece696cf65e0ef99e74c75f848be8e30 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 9 Apr 2024 15:13:09 -0400
Subject: [PATCH] ring-buffer: Only update pages_touched when a new page is
touched
The "buffer_percent" logic that is used by the ring buffer splice code to
only wake up the tasks when there's no data after the buffer is filled to
the percentage of the "buffer_percent" file is dependent on three
variables that determine the amount of data that is in the ring buffer:
1) pages_read - incremented whenever a new sub-buffer is consumed
2) pages_lost - incremented every time a writer overwrites a sub-buffer
3) pages_touched - incremented when a write goes to a new sub-buffer
The percentage is the calculation of:
(pages_touched - (pages_lost + pages_read)) / nr_pages
Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.
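For reference, the wake-up condition boils down to an integer comparison
along these lines (a simplified sketch of the full_hit() check in
kernel/trace/ring_buffer.c, with guard conditions abridged, not the
verbatim source):

static bool buffer_full_enough(size_t pages_touched, size_t pages_lost,
			       size_t pages_read, size_t nr_pages,
			       int buffer_percent)
{
	size_t dirty;

	if (!nr_pages || !buffer_percent)
		return true;

	/* sub-buffers holding data = touched - (lost + consumed) */
	dirty = pages_touched - (pages_lost + pages_read);

	/* dirty / nr_pages > buffer_percent / 100, in integer arithmetic */
	return dirty * 100 > (size_t)buffer_percent * nr_pages;
}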
It was observed that the amount read from the splice kept decreasing the
longer the trace ran. That is, if one asked for 60%, the reader initially
woke up with over 60% of the buffer filled, but over time it would be
woken at under 60%, and the amount of data available at each wake-up
slowly shrank to far less than the requested buffer percentage.
This was caused by incorrect accounting of the pages_touched increment.
The value is meant to be incremented whenever a writer moves to a new
sub-buffer, but it was incremented in the wrong place. If a writer
overflowed the current sub-buffer it would move on to the next one. If it
was preempted by an interrupt at that moment, and the interrupt performed
a trace, the interrupt would also move on to the next sub-buffer. Only one
of them should increment the counter; unfortunately, both did.
Change the cmpxchg() that performs the real switch of the tail page into a
try_cmpxchg(), and perform the increment of pages_touched only on success.
The counter is then incremented exactly once, when a writer actually moves
to a new sub-buffer, and not a second time when a writer and its
preempting writer race to switch to the same new sub-buffer.
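For readers less familiar with the atomics API: cmpxchg() returns the
value that was previously in *ptr (which the old code discarded), while
try_cmpxchg() reports whether this caller's swap actually took place.
Roughly, the kernel's generic fallback defines it like this (written here
as a function for clarity; the real definition is a macro):

static inline bool try_cmpxchg_sketch(unsigned long *ptr,
				      unsigned long *old, unsigned long new)
{
	unsigned long cur = cmpxchg(ptr, *old, new);

	if (cur != *old) {
		*old = cur;	/* report the value actually observed */
		return false;
	}
	return true;		/* this caller performed the switch */
}

That boolean is exactly the "did I win the race" information the
pages_touched accounting needs.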
Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.…
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 25476ead681b..6511dc3a00da 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1393,7 +1393,6 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
- local_inc(&cpu_buffer->pages_touched);
/*
* Just make sure we have seen our old_write and synchronize
* with any interrupts that come in.
@@ -1430,8 +1429,9 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
*/
local_set(&next_page->page->commit, 0);
- /* Again, either we update tail_page or an interrupt does */
- (void)cmpxchg(&cpu_buffer->tail_page, tail_page, next_page);
+ /* Either we update tail_page or an interrupt does */
+ if (try_cmpxchg(&cpu_buffer->tail_page, &tail_page, next_page))
+ local_inc(&cpu_buffer->pages_touched);
}
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ffe3986fece696cf65e0ef99e74c75f848be8e30
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024042707--b5a0@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ffe3986fece6 ("ring-buffer: Only update pages_touched when a new page is touched")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ffe3986fece696cf65e0ef99e74c75f848be8e30 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 9 Apr 2024 15:13:09 -0400
Subject: [PATCH] ring-buffer: Only update pages_touched when a new page is
touched
The "buffer_percent" logic that is used by the ring buffer splice code to
only wake up the tasks when there's no data after the buffer is filled to
the percentage of the "buffer_percent" file is dependent on three
variables that determine the amount of data that is in the ring buffer:
1) pages_read - incremented whenever a new sub-buffer is consumed
2) pages_lost - incremented every time a writer overwrites a sub-buffer
3) pages_touched - incremented when a write goes to a new sub-buffer
The percentage is the calculation of:
(pages_touched - (pages_lost + pages_read)) / nr_pages
Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.
It was observed that the amount read from the splice kept decreasing the
longer the trace ran. That is, if one asked for 60%, the reader initially
woke up with over 60% of the buffer filled, but over time it would be
woken at under 60%, and the amount of data available at each wake-up
slowly shrank to far less than the requested buffer percentage.
This was caused by incorrect accounting of the pages_touched increment.
The value is meant to be incremented whenever a writer moves to a new
sub-buffer, but it was incremented in the wrong place. If a writer
overflowed the current sub-buffer it would move on to the next one. If it
was preempted by an interrupt at that moment, and the interrupt performed
a trace, the interrupt would also move on to the next sub-buffer. Only one
of them should increment the counter; unfortunately, both did.
Change the cmpxchg() that performs the real switch of the tail page into a
try_cmpxchg(), and perform the increment of pages_touched only on success.
The counter is then incremented exactly once, when a writer actually moves
to a new sub-buffer, and not a second time when a writer and its
preempting writer race to switch to the same new sub-buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.…
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 25476ead681b..6511dc3a00da 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1393,7 +1393,6 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
- local_inc(&cpu_buffer->pages_touched);
/*
* Just make sure we have seen our old_write and synchronize
* with any interrupts that come in.
@@ -1430,8 +1429,9 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
*/
local_set(&next_page->page->commit, 0);
- /* Again, either we update tail_page or an interrupt does */
- (void)cmpxchg(&cpu_buffer->tail_page, tail_page, next_page);
+ /* Either we update tail_page or an interrupt does */
+ if (try_cmpxchg(&cpu_buffer->tail_page, &tail_page, next_page))
+ local_inc(&cpu_buffer->pages_touched);
}
}
The EXT_RGMII_OOB_CTRL register can be written from different contexts. It
is predominantly written from the adjust_link handler, which is serialized
by the phydev->lock, but it can also be written from a different context
when configuring the MII in bcmgenet_mii_config().
The chances of contention are quite low, but it is conceivable that
adjust_link could run during resume when WoL is enabled, so take the
phydev->lock in bcmgenet_mii_config() as well to be safe.
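The underlying hazard is a classic lost update on a read-modify-write
sequence; schematically (illustrative interleaving, not from the patch):

/*
 *   adjust_link (holds phydev->lock)     bcmgenet_mii_config (unlocked)
 *
 *   reg = bcmgenet_ext_readl(...);
 *                                        reg = bcmgenet_ext_readl(...);
 *   reg |= <link settings>;
 *   bcmgenet_ext_writel(reg, ...);
 *                                        reg &= ~OOB_DISABLE;
 *                                        bcmgenet_ext_writel(reg, ...);
 *                                        ... the stale value clobbers the
 *                                        link settings just written.
 */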
Fixes: afe3f907d20f ("net: bcmgenet: power on MII block for all MII modes")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
drivers/net/ethernet/broadcom/genet/bcmmii.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index 9ada89355747..86a4aa72b3d4 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -2,7 +2,7 @@
/*
* Broadcom GENET MDIO routines
*
- * Copyright (c) 2014-2017 Broadcom
+ * Copyright (c) 2014-2024 Broadcom
*/
#include <linux/acpi.h>
@@ -275,6 +275,7 @@ int bcmgenet_mii_config(struct net_device *dev, bool init)
* block for the interface to work, unconditionally clear the
* Out-of-band disable since we do not need it.
*/
+ mutex_lock(&phydev->lock);
reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
reg &= ~OOB_DISABLE;
if (priv->ext_phy) {
@@ -286,6 +287,7 @@ int bcmgenet_mii_config(struct net_device *dev, bool init)
reg |= RGMII_MODE_EN;
}
bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
+ mutex_unlock(&phydev->lock);
if (init)
dev_info(kdev, "configuring instance for %s\n", phy_name);
--
2.34.1
The patch titled
Subject: kmsan: compiler_types: declare __no_sanitize_or_inline
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kmsan-compiler_types-declare-__no_sanitize_or_inline.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alexander Potapenko <glider@google.com>
Subject: kmsan: compiler_types: declare __no_sanitize_or_inline
Date: Fri, 26 Apr 2024 11:16:22 +0200
It turned out that KMSAN instruments READ_ONCE_NOCHECK(), resulting in
false positive reports, because __no_sanitize_or_inline enforced inlining.
Properly declare __no_sanitize_or_inline under __SANITIZE_MEMORY__, so
that it does not __always_inline the annotated function.
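For context, READ_ONCE_NOCHECK() funnels through a helper that depends on
this attribute (from include/asm-generic/rwonce.h, lightly abridged):

static __no_sanitize_or_inline
unsigned long __read_once_word_nocheck(const void *addr)
{
	return __READ_ONCE(*(unsigned long *)addr);
}

With the old definition, __no_sanitize_or_inline expanded to
__always_inline under KMSAN, so this helper was inlined into instrumented
callers and its load got checked after all.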
Link: https://lkml.kernel.org/r/20240426091622.3846771-1-glider@google.com
Fixes: 5de0ce85f5a4 ("kmsan: mark noinstr as __no_sanitize_memory")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: syzbot+355c5bb8c1445c871ee8@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/000000000000826ac1061675b0e3@google.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/compiler_types.h | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/include/linux/compiler_types.h~kmsan-compiler_types-declare-__no_sanitize_or_inline
+++ a/include/linux/compiler_types.h
@@ -278,6 +278,17 @@ struct ftrace_likely_data {
# define __no_kcsan
#endif
+#ifdef __SANITIZE_MEMORY__
+/*
+ * Similarly to KASAN and KCSAN, KMSAN loses function attributes of inlined
+ * functions, therefore disabling KMSAN checks also requires disabling inlining.
+ *
+ * __no_sanitize_or_inline effectively prevents KMSAN from reporting errors
+ * within the function and marks all its outputs as initialized.
+ */
+# define __no_sanitize_or_inline __no_kmsan_checks notrace __maybe_unused
+#endif
+
#ifndef __no_sanitize_or_inline
#define __no_sanitize_or_inline __always_inline
#endif
_
Patches currently in -mm which might be from glider@google.com are
kmsan-compiler_types-declare-__no_sanitize_or_inline.patch
The patch titled
Subject: mm: use memalloc_nofs_save() in page_cache_ra_order()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-use-memalloc_nofs_save-in-page_cache_ra_order.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: mm: use memalloc_nofs_save() in page_cache_ra_order()
Date: Fri, 26 Apr 2024 19:29:38 +0800
As in commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"),
ensure that page_cache_ra_order() does not attempt to reclaim file-backed
pages either, or it can lead to a deadlock. The issue was found when
testing ext4 large folios.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200
Call trace:
__switch_to+0x14c/0x240
__schedule+0x82c/0xdd0
schedule+0x58/0xf0
io_schedule+0x24/0xa0
__folio_lock+0x130/0x300
migrate_pages_batch+0x378/0x918
migrate_pages+0x350/0x700
compact_zone+0x63c/0xb38
compact_zone_order+0xc0/0x118
try_to_compact_pages+0xb0/0x280
__alloc_pages_direct_compact+0x98/0x248
__alloc_pages+0x510/0x1110
alloc_pages+0x9c/0x130
folio_alloc+0x20/0x78
filemap_alloc_folio+0x8c/0x1b0
page_cache_ra_order+0x174/0x308
ondemand_readahead+0x1c8/0x2b8
page_cache_async_ra+0x68/0xb8
filemap_readahead.isra.0+0x64/0xa8
filemap_get_pages+0x3fc/0x5b0
filemap_splice_read+0xf4/0x280
ext4_file_splice_read+0x2c/0x48 [ext4]
vfs_splice_read.part.0+0xa8/0x118
splice_direct_to_actor+0xbc/0x288
do_splice_direct+0x9c/0x108
do_sendfile+0x328/0x468
__arm64_sys_sendfile64+0x8c/0x148
invoke_syscall+0x4c/0x118
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x4c/0x1f8
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x188/0x190
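The fix applies the usual scoped-NOFS pattern, mirroring what the diff
below does:

	unsigned int nofs;

	/* allocations in this scope implicitly behave as GFP_NOFS */
	nofs = memalloc_nofs_save();
	/* ... allocate readahead folios; reclaim cannot re-enter the fs ... */
	memalloc_nofs_restore(nofs);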
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/readahead.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/readahead.c~mm-use-memalloc_nofs_save-in-page_cache_ra_order
+++ a/mm/readahead.c
@@ -490,6 +490,7 @@ void page_cache_ra_order(struct readahea
pgoff_t index = readahead_index(ractl);
pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
pgoff_t mark = index + ra->size - ra->async_size;
+ unsigned int nofs;
int err = 0;
gfp_t gfp = readahead_gfp_mask(mapping);
@@ -504,6 +505,8 @@ void page_cache_ra_order(struct readahea
new_order = min_t(unsigned int, new_order, ilog2(ra->size));
}
+ /* See comment in page_cache_ra_unbounded() */
+ nofs = memalloc_nofs_save();
filemap_invalidate_lock_shared(mapping);
while (index <= limit) {
unsigned int order = new_order;
@@ -527,6 +530,7 @@ void page_cache_ra_order(struct readahea
read_pages(ractl);
filemap_invalidate_unlock_shared(mapping);
+ memalloc_nofs_restore(nofs);
/*
* If there were already pages in the page cache, then we may have
_
Patches currently in -mm which might be from wangkefeng.wang@huawei.com are
mm-use-memalloc_nofs_save-in-page_cache_ra_order.patch
arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch
arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch
mm-move-mm-counter-updating-out-of-set_pte_range.patch
mm-filemap-batch-mm-counter-updating-in-filemap_map_pages.patch
mm-swapfile-check-usable-swap-device-in-__folio_throttle_swaprate.patch
mm-memory-check-userfaultfd_wp-in-vmf_orig_pte_uffd_wp.patch
Currently, when kdb is compiled with keyboard support, we use
schedule_work() to reset the keyboard state after a debug session.
Unfortunately, schedule_work() gets called from the kgdboc
post-debug-exception handler. That risks deadlock, since schedule_work()
is not NMI-safe and, even on platforms where the NMI is not directly used
for debugging, the debug trap can have NMI-like behaviour depending on
where breakpoints are placed.
Fix this by using the irq_work system, which is NMI-safe, to defer the
call to schedule_work() until it is safe to make.
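The resulting deferral chain follows this pattern (a condensed sketch; the
identifier names differ from the patch below):

static void restore_worker(struct work_struct *work)
{
	/* task context: safe to register/unregister input handlers */
}
static DECLARE_WORK(restore_work, restore_worker);

static void restore_irq_worker(struct irq_work *work)
{
	/* hardirq context: schedule_work() is safe from here */
	schedule_work(&restore_work);
}
static DEFINE_IRQ_WORK(restore_irq_work, restore_irq_worker);

/* debug-trap exit (NMI-like context): only irq_work_queue() is safe */
irq_work_queue(&restore_irq_work);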
Reported-by: Liuye <liu.yeC@h3c.com>
Closes: https://lore.kernel.org/all/20240228025602.3087748-1-liu.yeC@h3c.com/
Cc: stable@vger.kernel.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
@Greg: I'm assuming this could/should go via your tree but feel free
to share an ack if you want me to hoover it up instead.
Changes in v2:
- Fix typo in the big comment (thanks Doug)
- Link to v1: https://lore.kernel.org/r/20240419-kgdboc_fix_schedule_work-v1-1-ff19881677…
---
drivers/tty/serial/kgdboc.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb1640054..58ea1e1391cee 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -19,6 +19,7 @@
#include <linux/console.h>
#include <linux/vt_kern.h>
#include <linux/input.h>
+#include <linux/irq_work.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/serial_core.h>
@@ -48,6 +49,25 @@ static struct kgdb_io kgdboc_earlycon_io_ops;
static int (*earlycon_orig_exit)(struct console *con);
#endif /* IS_BUILTIN(CONFIG_KGDB_SERIAL_CONSOLE) */
+/*
+ * When we leave the debug trap handler we need to reset the keyboard status
+ * (since the original keyboard state gets partially clobbered by kdb use of
+ * the keyboard).
+ *
+ * The path to deliver the reset is somewhat circuitous.
+ *
+ * To deliver the reset we register an input handler, reset the keyboard and
+ * then deregister the input handler. However, to get this done right, we do
+ * have to carefully manage the calling context because we can only register
+ * input handlers from task context.
+ *
+ * In particular we need to trigger the action from the debug trap handler with
+ * all its NMI and/or NMI-like oddities. To solve this the kgdboc trap exit code
+ * (the "post_exception" callback) uses irq_work_queue(), which is NMI-safe, to
+ * schedule a callback from a hardirq context. From there we have to defer the
+ * work again, this time using schedule_work(), to get a callback using the
+ * system workqueue, which runs in task context.
+ */
#ifdef CONFIG_KDB_KEYBOARD
static int kgdboc_reset_connect(struct input_handler *handler,
struct input_dev *dev,
@@ -99,10 +119,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+ schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
static void kgdboc_restore_input(void)
{
if (likely(system_state == SYSTEM_RUNNING))
- schedule_work(&kgdboc_restore_input_work);
+ irq_work_queue(&kgdboc_restore_input_irq_work);
}
static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +160,7 @@ static void kgdboc_unregister_kbd(void)
i--;
}
}
+ irq_work_sync(&kgdboc_restore_input_irq_work);
flush_work(&kgdboc_restore_input_work);
}
#else /* ! CONFIG_KDB_KEYBOARD */
---
base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
change-id: 20240419-kgdboc_fix_schedule_work-f0cb44b8a354
Best regards,
--
Daniel Thompson <daniel.thompson@linaro.org>
Inspired by a patch from [Justin][1], I took a closer look at kdb_read().
Despite Justin's patch being a (correct) one-line manipulation, it was a
tough patch to review because the surrounding code was hard to read and it
looked like there were unfixed problems.
This series isn't enough to make kdb_read() beautiful, but it does make it
shorter and easier to reason about, and it fixes two buffer overflows and
a screen redraw problem!
[1]: https://lore.kernel.org/all/20240403-strncpy-kernel-debug-kdb-kdb_io-c-v1-1…
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
Changes in v3:
- Collected tags from v2
- Added comment to describe the hidden depths of kdb_position_cursor()
(thanks Doug)
- Fixed a couple of typos in the patch descriptions (thanks Doug)
- Link to v2: https://lore.kernel.org/r/20240422-kgdb_read_refactor-v2-0-ed51f7d145fe@lin…
Changes in v2:
- No code changes!
- I belatedly realized that one of the cleanups actually fixed a buffer
overflow so there are changes to Cc: (to add stable@...) and to one
of the patch descriptions.
- Link to v1: https://lore.kernel.org/r/20240416-kgdb_read_refactor-v1-0-b18c2d01076d@lin…
---
Daniel Thompson (7):
kdb: Fix buffer overflow during tab-complete
kdb: Use format-strings rather than '\0' injection in kdb_read()
kdb: Fix console handling when editing and tab-completing commands
kdb: Merge identical case statements in kdb_read()
kdb: Use format-specifiers rather than memset() for padding in kdb_read()
kdb: Replace double memcpy() with memmove() in kdb_read()
kdb: Simplify management of tmpbuffer in kdb_read()
kernel/debug/kdb/kdb_io.c | 153 +++++++++++++++++++++++-----------------------
1 file changed, 78 insertions(+), 75 deletions(-)
---
base-commit: dccce9b8780618986962ba37c373668bcf426866
change-id: 20240415-kgdb_read_refactor-2ea2dfc15dbb
Best regards,
--
Daniel Thompson <daniel.thompson@linaro.org>
From: Kan Liang <kan.liang@linux.intel.com>
Currently, the Sapphire Rapids and Granite Rapids share the same PMU name,
sapphire_rapids, because from the kernel's perspective GNR is similar to
SPR. The only key difference is that they support different extra MSRs, so
the code path and the PMU name are shared.
However, from the end users' perspective they are quite different. Besides
the extra MSRs, GNR has a newer PEBS format, supports Retire Latency,
supports the new CPUID enumeration architecture, doesn't require the
load-latency AUX event, has additional TMA Level 1 Architectural Events,
etc. These differences can be enumerated by CPUID or the PERF_CAPABILITIES
MSR, but they weren't reflected in the model-specific kernel setup.
It is worth having a distinct PMU name for GNR.
Fixes: a6742cb90b56 ("perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL")
Suggested-by: Ahmad Yasin <ahmad.yasin@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
---
arch/x86/events/intel/core.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f3315f13f920..da38a16b2cbc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6768,12 +6768,17 @@ __init int intel_pmu_init(void)
case INTEL_FAM6_EMERALDRAPIDS_X:
x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
x86_pmu.extra_regs = intel_glc_extra_regs;
+ pr_cont("Sapphire Rapids events, ");
+ name = "sapphire_rapids";
fallthrough;
case INTEL_FAM6_GRANITERAPIDS_X:
case INTEL_FAM6_GRANITERAPIDS_D:
intel_pmu_init_glc(NULL);
- if (!x86_pmu.extra_regs)
+ if (!x86_pmu.extra_regs) {
x86_pmu.extra_regs = intel_rwc_extra_regs;
+ pr_cont("Granite Rapids events, ");
+ name = "granite_rapids";
+ }
x86_pmu.pebs_ept = 1;
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = glc_get_event_constraints;
@@ -6784,8 +6789,6 @@ __init int intel_pmu_init(void)
td_attr = glc_td_events_attrs;
tsx_attr = glc_tsx_events_attrs;
intel_pmu_pebs_data_source_skl(true);
- pr_cont("Sapphire Rapids events, ");
- name = "sapphire_rapids";
break;
case INTEL_FAM6_ALDERLAKE:
--
2.35.1
Qualcomm Bluetooth controllers may not have been provisioned with a
valid device address and instead end up using the default address
00:00:00:00:5a:ad.
This was previously believed to be due to lack of persistent storage for
the address but it may also be due to integrators opting to not use the
on-chip OTP memory and instead store the address elsewhere (e.g. in
storage managed by secure world firmware).
According to Qualcomm, at least WCN6750, WCN6855 and WCN7850 have
on-chip OTP storage for the address.
As the device type alone cannot be used to determine when the address is
valid, instead read back the address during setup() and only set the
HCI_QUIRK_USE_BDADDR_PROPERTY flag when needed.
This specifically makes sure that controllers that have been provisioned
with an address do not start as unconfigured.
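A note on the default-address constant introduced below: bdaddr_t stores
the address least-significant byte first, which is why 00:00:00:00:5a:ad
appears byte-reversed in the initializer:

/* 00:00:00:00:5a:ad, stored LSB first as Bluetooth addresses are */
#define QCA_BDADDR_DEFAULT (&(bdaddr_t) {{ 0xad, 0x5a, 0x00, 0x00, 0x00, 0x00 }})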
Reported-by: Janaki Ramaiah Thota <quic_janathot@quicinc.com>
Link: https://lore.kernel.org/r/124a7d54-5a18-4be7-9a76-a12017f6cce5@quicinc.com/
Fixes: 5971752de44c ("Bluetooth: hci_qca: Set HCI_QUIRK_USE_BDADDR_PROPERTY for wcn3990")
Fixes: e668eb1e1578 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts")
Fixes: 6945795bc81a ("Bluetooth: fix use-bdaddr-property quirk")
Cc: stable@vger.kernel.org # 6.5
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
drivers/bluetooth/btqca.c | 38 +++++++++++++++++++++++++++++++++++++
drivers/bluetooth/hci_qca.c | 2 --
2 files changed, 38 insertions(+), 2 deletions(-)
Matthias and Doug,
As Chromium is the only known user of the 'local-bd-address' property,
could you please confirm that your controllers use the 00:00:00:00:5a:ad
address by default so that the quirk continues to be set as intended?
Johan
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 19cfc342fc7b..216826c31ee3 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -15,6 +15,8 @@
#define VERSION "0.1"
+#define QCA_BDADDR_DEFAULT (&(bdaddr_t) {{ 0xad, 0x5a, 0x00, 0x00, 0x00, 0x00 }})
+
int qca_read_soc_version(struct hci_dev *hdev, struct qca_btsoc_version *ver,
enum qca_btsoc_type soc_type)
{
@@ -612,6 +614,38 @@ int qca_set_bdaddr_rome(struct hci_dev *hdev, const bdaddr_t *bdaddr)
}
EXPORT_SYMBOL_GPL(qca_set_bdaddr_rome);
+static int qca_check_bdaddr(struct hci_dev *hdev)
+{
+ struct hci_rp_read_bd_addr *bda;
+ struct sk_buff *skb;
+ int err;
+
+ if (bacmp(&hdev->public_addr, BDADDR_ANY))
+ return 0;
+
+ skb = __hci_cmd_sync(hdev, HCI_OP_READ_BD_ADDR, 0, NULL,
+ HCI_INIT_TIMEOUT);
+ if (IS_ERR(skb)) {
+ err = PTR_ERR(skb);
+ bt_dev_err(hdev, "Failed to read device address (%d)", err);
+ return err;
+ }
+
+ if (skb->len != sizeof(*bda)) {
+ bt_dev_err(hdev, "Device address length mismatch");
+ kfree_skb(skb);
+ return -EIO;
+ }
+
+ bda = (struct hci_rp_read_bd_addr *)skb->data;
+ if (!bacmp(&bda->bdaddr, QCA_BDADDR_DEFAULT))
+ set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
+
+ kfree_skb(skb);
+
+ return 0;
+}
+
static void qca_generate_hsp_nvm_name(char *fwname, size_t max_size,
struct qca_btsoc_version ver, u8 rom_ver, u16 bid)
{
@@ -818,6 +852,10 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
break;
}
+ err = qca_check_bdaddr(hdev);
+ if (err)
+ return err;
+
bt_dev_info(hdev, "QCA setup on UART is completed");
return 0;
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index ecbc52eaf101..92fa20f5ac7d 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1905,8 +1905,6 @@ static int qca_setup(struct hci_uart *hu)
case QCA_WCN6750:
case QCA_WCN6855:
case QCA_WCN7850:
- set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
-
qcadev = serdev_device_get_drvdata(hu->serdev);
if (qcadev->bdaddr_property_broken)
set_bit(HCI_QUIRK_BDADDR_PROPERTY_BROKEN, &hdev->quirks);
--
2.43.2
The Elan eKTH5015M touch controller found on the Lenovo ThinkPad X13s
shares the VCC33 supply with other peripherals that may remain powered
during suspend (e.g. when enabled as wakeup sources).
The reset line is also wired so that it can be left deasserted when the
supply is off.
This is important as it avoids holding the controller in reset for
extended periods of time when it remains powered, which can lead to
increased power consumption, and also avoids leaking current through the
X13s reset circuitry during suspend (and after driver unbind).
Use the new 'no-reset-on-power-off' devicetree property to determine
when reset needs to be asserted on power down.
Notably this also avoids wasting power on machine variants without a
touchscreen for which the driver would otherwise exit probe with reset
asserted.
Fixes: bd3cba00dcc6 ("HID: i2c-hid: elan: Add support for Elan eKTH6915 i2c-hid touchscreens")
Cc: stable@vger.kernel.org # 6.0
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
drivers/hid/i2c-hid/i2c-hid-of-elan.c | 37 ++++++++++++++++++++-------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/drivers/hid/i2c-hid/i2c-hid-of-elan.c b/drivers/hid/i2c-hid/i2c-hid-of-elan.c
index 5b91fb106cfc..8a905027d5e9 100644
--- a/drivers/hid/i2c-hid/i2c-hid-of-elan.c
+++ b/drivers/hid/i2c-hid/i2c-hid-of-elan.c
@@ -31,6 +31,7 @@ struct i2c_hid_of_elan {
struct regulator *vcc33;
struct regulator *vccio;
struct gpio_desc *reset_gpio;
+ bool no_reset_on_power_off;
const struct elan_i2c_hid_chip_data *chip_data;
};
@@ -40,17 +41,17 @@ static int elan_i2c_hid_power_up(struct i2chid_ops *ops)
container_of(ops, struct i2c_hid_of_elan, ops);
int ret;
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+
if (ihid_elan->vcc33) {
ret = regulator_enable(ihid_elan->vcc33);
if (ret)
- return ret;
+ goto err_deassert_reset;
}
ret = regulator_enable(ihid_elan->vccio);
- if (ret) {
- regulator_disable(ihid_elan->vcc33);
- return ret;
- }
+ if (ret)
+ goto err_disable_vcc33;
if (ihid_elan->chip_data->post_power_delay_ms)
msleep(ihid_elan->chip_data->post_power_delay_ms);
@@ -60,6 +61,15 @@ static int elan_i2c_hid_power_up(struct i2chid_ops *ops)
msleep(ihid_elan->chip_data->post_gpio_reset_on_delay_ms);
return 0;
+
+err_disable_vcc33:
+ if (ihid_elan->vcc33)
+ regulator_disable(ihid_elan->vcc33);
+err_deassert_reset:
+ if (ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 0);
+
+ return ret;
}
static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
@@ -67,7 +77,14 @@ static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
struct i2c_hid_of_elan *ihid_elan =
container_of(ops, struct i2c_hid_of_elan, ops);
- gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+ /*
+ * Do not assert reset when the hardware allows for it to remain
+ * deasserted regardless of the state of the (shared) power supply to
+ * avoid wasting power when the supply is left on.
+ */
+ if (!ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+
if (ihid_elan->chip_data->post_gpio_reset_off_delay_ms)
msleep(ihid_elan->chip_data->post_gpio_reset_off_delay_ms);
@@ -87,12 +104,14 @@ static int i2c_hid_of_elan_probe(struct i2c_client *client)
ihid_elan->ops.power_up = elan_i2c_hid_power_up;
ihid_elan->ops.power_down = elan_i2c_hid_power_down;
- /* Start out with reset asserted */
- ihid_elan->reset_gpio =
- devm_gpiod_get_optional(&client->dev, "reset", GPIOD_OUT_HIGH);
+ ihid_elan->reset_gpio = devm_gpiod_get_optional(&client->dev, "reset",
+ GPIOD_ASIS);
if (IS_ERR(ihid_elan->reset_gpio))
return PTR_ERR(ihid_elan->reset_gpio);
+ ihid_elan->no_reset_on_power_off = of_property_read_bool(client->dev.of_node,
+ "no-reset-on-power-off");
+
ihid_elan->vccio = devm_regulator_get(&client->dev, "vccio");
if (IS_ERR(ihid_elan->vccio))
return PTR_ERR(ihid_elan->vccio);
--
2.43.2
It turned out that KMSAN instruments READ_ONCE_NOCHECK(), resulting in
false positive reports, because __no_sanitize_or_inline enforced inlining.
Properly declare __no_sanitize_or_inline under __SANITIZE_MEMORY__,
so that it does not __always_inline the annotated function.
Reported-by: syzbot+355c5bb8c1445c871ee8@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/000000000000826ac1061675b0e3@google.com
Fixes: 5de0ce85f5a4 ("kmsan: mark noinstr as __no_sanitize_memory")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Marco Elver <elver@google.com>
---
include/linux/compiler_types.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 0caf354cb94b5..a6a28952836cb 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -278,6 +278,17 @@ struct ftrace_likely_data {
# define __no_kcsan
#endif
+#ifdef __SANITIZE_MEMORY__
+/*
+ * Similarly to KASAN and KCSAN, KMSAN loses function attributes of inlined
+ * functions, therefore disabling KMSAN checks also requires disabling inlining.
+ *
+ * __no_sanitize_or_inline effectively prevents KMSAN from reporting errors
+ * within the function and marks all its outputs as initialized.
+ */
+# define __no_sanitize_or_inline __no_kmsan_checks notrace __maybe_unused
+#endif
+
#ifndef __no_sanitize_or_inline
#define __no_sanitize_or_inline __always_inline
#endif
--
2.44.0.769.g3c40516874-goog
Correctly set the length of the drm_event to the size of the structure
that's actually used.
The length of the drm_event was set to the size of the parent structure
instead of that of the drm_vmw_event_fence which is what is supposed to
be read. drm_read uses the length parameter to copy the event to user
space, thus resulting in out-of-bounds reads.
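To make the out-of-bounds mechanics concrete (a sketch of the struct
shapes as I read vmwgfx_fence.c, simplified):

struct vmw_event_fence_pending {
	struct drm_pending_event base;	  /* kernel-internal bookkeeping */
	struct drm_vmw_event_fence event; /* the part userspace should get */
};

/*
 * drm_read() effectively does:
 *     copy_to_user(buf, &pending->event, pending->event.base.length);
 * With length = sizeof(*pending), the copy runs past the end of the
 * allocation; sizeof(pending->event) is the correct length.
 */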
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: 8b7de6aa8468 ("vmwgfx: Rework fence event action")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-23566
Cc: David Airlie <airlied@gmail.com>
CC: Daniel Vetter <daniel@ffwll.ch>
Cc: Zack Rusin <zack.rusin@broadcom.com>
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v3.4+
---
drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 2a0cda324703..5efc6a766f64 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -991,7 +991,7 @@ static int vmw_event_fence_action_create(struct drm_file *file_priv,
}
event->event.base.type = DRM_VMW_EVENT_FENCE_SIGNALED;
- event->event.base.length = sizeof(*event);
+ event->event.base.length = sizeof(event->event);
event->event.user_data = user_data;
ret = drm_event_reserve_init(dev, file_priv, &event->base, &event->event.base);
--
2.40.1
Legacy DU was broken by the referenced fixes commit because the placement
and the busy_placement no longer pointed to the same object. This was later
fixed indirectly by commit a78a8da51b36c7a0c0c16233f91d60aac03a5a49
("drm/ttm: replace busy placement with flags v6") in v6.9.
Fixes: 39985eea5a6d ("drm/vmwgfx: Abstract placement selection")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Cc: <stable@vger.kernel.org> # v6.4+
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index 2bfac3aad7b7..98e73eb0ccf1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -204,6 +204,7 @@ int vmw_bo_pin_in_start_of_vram(struct vmw_private *dev_priv,
VMW_BO_DOMAIN_VRAM,
VMW_BO_DOMAIN_VRAM);
buf->places[0].lpfn = PFN_UP(bo->resource->size);
+ buf->busy_places[0].lpfn = PFN_UP(bo->resource->size);
ret = ttm_bo_validate(bo, &buf->placement, &ctx);
/* For some reason we didn't end up at the start of vram */
--
2.34.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 35e351780fa9d8240dd6f7e4f245f9ea37e96c19
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024042319-freeing-degree-ca0d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
35e351780fa9 ("fork: defer linking file vma until vma is fully initialized")
d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35e351780fa9d8240dd6f7e4f245f9ea37e96c19 Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe@huawei.com>
Date: Wed, 10 Apr 2024 17:14:41 +0800
Subject: [PATCH] fork: defer linking file vma until vma is fully initialized
Thorvald reported a WARNING [1]. The root cause is the race below:
CPU 1                                 CPU 2
fork                                  hugetlbfs_fallocate
 dup_mmap                              hugetlbfs_punch_hole
  i_mmap_lock_write(mapping);
  vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
  i_mmap_unlock_write(mapping);
  hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
                                       i_mmap_lock_write(mapping);
                                       hugetlb_vmdelete_list
                                        vma_interval_tree_foreach
                                         hugetlb_vma_trylock_write -- Vma_lock is cleared.
  tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
  hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
                                       i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside the
i_mmap_rwsem lock while the vma lock can be used at the same time. Fix
this by deferring linking the file vma until the vma is fully initialized.
Those vmas should be initialized first before they can be used.
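The reordering in the diff below boils down to the following per-vma
sequence (condensed; comments mine):

	if (is_vm_hugetlb_page(tmp))
		hugetlb_dup_vma_private(tmp);	/* 1. init vma-private state */
	vma_iter_bulk_store(&vmi, tmp);		/* 2. link into the maple tree */
	mm->map_count++;
	if (tmp->vm_ops && tmp->vm_ops->open)
		tmp->vm_ops->open(tmp);		/* 3. ->open() sees initialized state */
	/* 4. only now is the vma published in the file's i_mmap tree */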
Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Thorvald Natvig <thorvald@google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Tycho Andersen <tandersen@netflix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index 39a5046c2f0b..aebb3e6c96dc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
+ /*
+ * Copy/update hugetlb private vma information.
+ */
+ if (is_vm_hugetlb_page(tmp))
+ hugetlb_dup_vma_private(tmp);
+
+ /*
+ * Link the vma into the MT. After using __mt_dup(), memory
+ * allocation is not necessary here, so it cannot fail.
+ */
+ vma_iter_bulk_store(&vmi, tmp);
+
+ mm->map_count++;
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
+
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
@@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
i_mmap_unlock_write(mapping);
}
- /*
- * Copy/update hugetlb private vma information.
- */
- if (is_vm_hugetlb_page(tmp))
- hugetlb_dup_vma_private(tmp);
-
- /*
- * Link the vma into the MT. After using __mt_dup(), memory
- * allocation is not necessary here, so it cannot fail.
- */
- vma_iter_bulk_store(&vmi, tmp);
-
- mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
retval = copy_page_range(tmp, mpnt);
- if (tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
if (retval) {
mpnt = vma_next(&vmi);
goto loop_out;
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x ca7c52ac7ad384bcf299d89482c45fec7cd00da9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024042358-esteemed-fastball-c2d8@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
ca7c52ac7ad3 ("drm/xe/vm: prevent UAF with asid based lookup")
0eb2a18a8fad ("drm/xe: Implement VM snapshot support for BO's and userptr")
be7d51c5b468 ("drm/xe: Add batch buffer addresses to devcoredump")
4376cee62092 ("drm/xe: Print more device information in devcoredump")
98fefec8c381 ("drm/xe: Change devcoredump functions parameters to xe_sched_job")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ca7c52ac7ad384bcf299d89482c45fec7cd00da9 Mon Sep 17 00:00:00 2001
From: Matthew Auld <matthew.auld@intel.com>
Date: Fri, 12 Apr 2024 12:31:45 +0100
Subject: [PATCH] drm/xe/vm: prevent UAF with asid based lookup
The asid is only erased from the xarray when the vm refcount reaches
zero; however, this leads to a potential UAF since xe_vm_get() only works
on a vm with refcount != 0. Since the asid is allocated in the vm create
ioctl, instead erase it when closing the vm, prior to dropping the
potential last ref. This also works when the user closes the driver fd
without an explicit vm destroy.
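For context, the lookup side this protects looks roughly like the
following (a sketch of the GPU fault-handler path, not verbatim):

	mutex_lock(&xe->usm.lock);
	vm = xa_load(&xe->usm.asid_to_vm, asid);
	if (vm)
		xe_vm_get(vm);	/* UAF if the last reference is already gone */
	mutex_unlock(&xe->usm.lock);

Erasing the asid in xe_vm_close_and_put(), under the same usm.lock,
guarantees the entry can no longer be found once the vm may be on its way
to destruction.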
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1594
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-4-matth…
(cherry picked from commit 83967c57320d0d01ae512f10e79213f81e4bf594)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 62d1ef8867a8..3d4c8f342e21 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1577,6 +1577,16 @@ void xe_vm_close_and_put(struct xe_vm *vm)
xe->usm.num_vm_in_fault_mode--;
else if (!(vm->flags & XE_VM_FLAG_MIGRATION))
xe->usm.num_vm_in_non_fault_mode--;
+
+ if (vm->usm.asid) {
+ void *lookup;
+
+ xe_assert(xe, xe->info.has_asid);
+ xe_assert(xe, !(vm->flags & XE_VM_FLAG_MIGRATION));
+
+ lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
+ xe_assert(xe, lookup == vm);
+ }
mutex_unlock(&xe->usm.lock);
for_each_tile(tile, xe, id)
@@ -1592,24 +1602,15 @@ static void vm_destroy_work_func(struct work_struct *w)
struct xe_device *xe = vm->xe;
struct xe_tile *tile;
u8 id;
- void *lookup;
/* xe_vm_close_and_put was not called? */
xe_assert(xe, !vm->size);
mutex_destroy(&vm->snap_mutex);
- if (!(vm->flags & XE_VM_FLAG_MIGRATION)) {
+ if (!(vm->flags & XE_VM_FLAG_MIGRATION))
xe_device_mem_access_put(xe);
- if (xe->info.has_asid && vm->usm.asid) {
- mutex_lock(&xe->usm.lock);
- lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
- xe_assert(xe, lookup == vm);
- mutex_unlock(&xe->usm.lock);
- }
- }
-
for_each_tile(tile, xe, id)
XE_WARN_ON(vm->pt_root[id]);