The patch titled
Subject: mm/mempolicy: failed to disable numa balancing
has been added to the -mm mm-unstable branch. Its filename is
mm-mempolicy-failed-to-disable-numa-balancing.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: tzm <tcm1030(a)163.com>
Subject: mm/mempolicy: failed to disable numa balancing
Date: Fri, 2 Dec 2022 22:16:30 +0800
The kernel fails to permanently disable the NUMA balancing policy when the
user passes <numa_balancing=disable> on the boot cmdline. The
numabalancing_override variable is 1 for enable and -1 for disable. So,
!numabalancing_override will always be true, which causes this bug.
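For context, the cmdline parsing that sets numabalancing_override looks
roughly like this (paraphrased from mm/mempolicy.c; the in-tree code may
differ in detail):

	static int __init setup_numabalancing(char *str)
	{
		int ret = 0;

		if (!str)
			goto out;

		if (!strcmp(str, "enable")) {
			numabalancing_override = 1;	/* force-enable */
			ret = 1;
		} else if (!strcmp(str, "disable")) {
			numabalancing_override = -1;	/* force-disable */
			ret = 1;
		}
	out:
		if (!ret)
			pr_warn("Unable to parse numa_balancing=\n");

		return ret;
	}
	__setup("numa_balancing=", setup_numabalancing);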
Link: https://lkml.kernel.org/r/20221202141630.41220-1-tcm1030@163.com
Signed-off-by: tzm <tcm1030(a)163.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mempolicy.c~mm-mempolicy-failed-to-disable-numa-balancing
+++ a/mm/mempolicy.c
@@ -2865,7 +2865,7 @@ static void __init check_numabalancing_e
if (numabalancing_override)
set_numabalancing_state(numabalancing_override == 1);
- if (num_online_nodes() > 1 && !numabalancing_override) {
+ if (num_online_nodes() > 1 && (numabalancing_override == 1)) {
pr_info("%s automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl\n",
numabalancing_default ? "Enabling" : "Disabling");
set_numabalancing_state(numabalancing_default);
_
Patches currently in -mm which might be from tcm1030(a)163.com are
mm-mempolicy-failed-to-disable-numa-balancing.patch
Ives van Hoorne from codesandbox.io reported an issue regarding possible
data loss of uffd-wp when applied to memfds on heavily loaded systems. The
symptom is that some pages read back from the snapshot child VMs show a data mismatch.
Here I can also reproduce it with a Rust reproducer provided by Ives
that keeps taking snapshots of a 256MB VM; on a 32G system, when I start
80 instances I can trigger the issue within ten minutes.
It turns out that some pages get written through even if uffd-wp is
applied to the pte.
The problem is that, when removing migration entries, we didn't really worry
about the write bit as long as we knew it wasn't a write migration entry.
That assumption may not hold: for some memory types (e.g. writable shmem),
mk_pte() can return a pte with the write bit set, so to recover the migration
entry to its original state we need to explicitly wr-protect the pte,
otherwise a read migration entry ends up with the write bit set. For uffd
this can cause writes to go through the protection.
The relevant uffd code was introduced with the anon support, in
commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
2020-04-07). However, anon shouldn't suffer from this problem because anon
should always have the write bit cleared already, so that may not be a
proper Fixes target; I'm pointing the Fixes tag at the uffd shmem support instead.
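To make the bug concrete, the relevant logic in remove_migration_pte()
before this patch (reconstructed from the diff below, simplified) was:

	if (is_writable_migration_entry(entry))
		pte = maybe_mkwrite(pte, vma);
	else if (pte_swp_uffd_wp(*pvmw.pte))
		pte = pte_mkuffd_wp(pte);

i.e. for a read migration entry the pte kept whatever write bit mk_pte()
produced, even when the uffd-wp bit was being restored, so writes could
slip through the protection.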
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Reported-by: Ives van Hoorne <ives(a)codesandbox.io>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Tested-by: Ives van Hoorne <ives(a)codesandbox.io>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/migrate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..8b6351c08c78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
pte = pte_mkdirty(pte);
if (is_writable_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
- else if (pte_swp_uffd_wp(*pvmw.pte))
+ else
+ /* NOTE: mk_pte can have write bit set */
+ pte = pte_wrprotect(pte);
+
+ if (pte_swp_uffd_wp(*pvmw.pte)) {
+ WARN_ON_ONCE(pte_write(pte));
pte = pte_mkuffd_wp(pte);
+ }
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
--
2.37.3
On Wed, Nov 30 2022 at 23:36, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work. NMI shootdown is a one-time thing; the handler leaves NMIs
> blocked and enters halt. At best, a second (or third...) shootdown is an
> expensive nop, at worst it can hang the kernel and prevent kexec'ing into
> a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
> double shootdown resulted in a double list_add(), which is fatal when running
> with CONFIG_BUG_ON_DATA_CORRUPTION=y.
>
> With the "right" kexec/kdump configuration, emergency_vmx_disable_all() can
> be reached after kdump_nmi_shootdown_cpus() (currently the only two users
> of nmi_shootdown_cpus()).
>
> To fix, move the disabling of virtualization into crash_nmi_callback(),
> remove emergency_vmx_disable_all()'s callback, and do a shootdown for
> emergency_vmx_disable_all() if and only if a shootdown hasn't yet occurred.
> The only thing emergency_vmx_disable_all() cares about is disabling VMX/SVM
> (obviously), and since I can't envision a use case for an NMI shootdown that
> doesn't want to disable virtualization, doing that in the core handler means
> emergency_vmx_disable_all() only needs to ensure _a_ shootdown occurs, it
> doesn't care when that shootdown happened or what callback may have run.
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
The LTP test pty03 is causing a crash in slcan:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 348 Comm: kworker/0:3 Not tainted 6.0.8-1-default #1 openSUSE Tumbleweed 9d20364b934f5aab0a9bdf84e8f45cfdfae39dab
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Workqueue: 0x0 (events)
RIP: 0010:process_one_work (/home/rich/kernel/linux/kernel/workqueue.c:706 /home/rich/kernel/linux/kernel/workqueue.c:2185)
Code: 49 89 ff 41 56 41 55 41 54 55 53 48 89 f3 48 83 ec 10 48 8b 06 48 8b 6f 48 49 89 c4 45 30 e4 a8 04 b8 00 00 00 00 4c 0f 44 e0 <49> 8b 44 24 08 44 8b a8 00 01 00 00 41 83 e5 20 f6 45 10 04 75 0e
RSP: 0018:ffffaf7b40f47e98 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff9d644e1b8b48 RCX: ffff9d649e439968
RDX: 00000000ffff8455 RSI: ffff9d644e1b8b48 RDI: ffff9d64764aa6c0
RBP: ffff9d649e4335c0 R08: 0000000000000c00 R09: ffff9d64764aa734
R10: 0000000000000007 R11: 0000000000000001 R12: 0000000000000000
R13: ffff9d649e4335e8 R14: ffff9d64490da780 R15: ffff9d64764aa6c0
FS: 0000000000000000(0000) GS:ffff9d649e400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000036424000 CR4: 00000000000006f0
Call Trace:
<TASK>
worker_thread (/home/rich/kernel/linux/kernel/workqueue.c:2436)
kthread (/home/rich/kernel/linux/kernel/kthread.c:376)
ret_from_fork (/home/rich/kernel/linux/arch/x86/entry/entry_64.S:312)
Apparently, slcan's tx_work is freed while still scheduled. While
slcan_netdev_close() (netdev side) calls flush_work(&sl->tx_work),
slcan_close() (tty side) does not. So when the netdev is never set UP,
but the tty is stuffed with bytes and forced to wake up for writing, the
work is scheduled but never flushed.
So add an additional flush_work() to slcan_close() to be sure the work
is flushed under all circumstances.
The Fixes commit below moved flush_work() from slcan_close() to
slcan_netdev_close(). What was the rationale behind it? Maybe we can
drop the one in slcan_netdev_close()?
I see the same pattern in can327. So it perhaps needs the very same fix.
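For context, the work is queued from the tty write-wakeup path, roughly
like this (simplified sketch based on the description above; see
slcan-core.c for the exact code):

	/* TTY layer reports more room -> schedule the TX work */
	static void slcan_write_wakeup(struct tty_struct *tty)
	{
		struct slcan *sl = tty->disc_data;

		schedule_work(&sl->tx_work);
	}

If the netdev is never brought UP, nothing flushes sl->tx_work before
slcan_close() tears the channel down, which is the window the crash above
falls into.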
Fixes: cfcb4465e992 ("can: slcan: remove legacy infrastructure")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1205597
Reported-by: Richard Palethorpe <richard.palethorpe(a)suse.com>
Tested-by: Petr Vorel <petr.vorel(a)suse.com>
Cc: Dario Binacchi <dario.binacchi(a)amarulasolutions.com>
Cc: Wolfgang Grandegger <wg(a)grandegger.com>
Cc: Marc Kleine-Budde <mkl(a)pengutronix.de>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: linux-can(a)vger.kernel.org
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Cc: Max Staudt <max(a)enpas.org>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby(a)kernel.org>
---
drivers/net/can/slcan/slcan-core.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/can/slcan/slcan-core.c b/drivers/net/can/slcan/slcan-core.c
index fbb34139daa1..f4db77007c13 100644
--- a/drivers/net/can/slcan/slcan-core.c
+++ b/drivers/net/can/slcan/slcan-core.c
@@ -864,12 +864,14 @@ static void slcan_close(struct tty_struct *tty)
{
struct slcan *sl = (struct slcan *)tty->disc_data;
- /* unregister_netdev() calls .ndo_stop() so we don't have to.
- * Our .ndo_stop() also flushes the TTY write wakeup handler,
- * so we can safely set sl->tty = NULL after this.
- */
unregister_candev(sl->dev);
+ /*
+ * The netdev needn't be UP (so .ndo_stop() is not called). Hence make
+ * sure this is not running before freeing it up.
+ */
+ flush_work(&sl->tx_work);
+
/* Mark channel as dead */
spin_lock_bh(&sl->lock);
tty->disc_data = NULL;
--
2.38.1
--
An email was sent to you sometime last week with the expectation of
receiving a reply from you, but to my surprise you did not bother to respond.
Please reply for further clarification.
With respect,
Ken G. Richardson.
This series fixes the following issues:
Patch 1:
This patch provides a fix to correctly report encapsulated LRO'ed
packets.
Patch 2:
This patch provides a fix to use the correct intrConf reference.
Changes in v2:
- declare generic descriptor to be used
- remove white spaces
- remove single quote around commit reference in patch 2
- remove if check for encap_lro
Ronak Doshi (2):
vmxnet3: correctly report encapsulated LRO packet
vmxnet3: use correct intrConf reference when using extended queues
drivers/net/vmxnet3/vmxnet3_drv.c | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
--
2.11.0
[please ignore if it is already reported]
The stable-rc 5.10 arm64 allmodconfig builds failed with gcc-12.
List of build warnings and errors with gcc-12 are listed below.
aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
aarch64-linux-gnu-ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function
`__kvm_nvhe___kvm_tlb_flush_vmid_ipa':
(.hyp.text+0x1a4c): undefined reference to `__kvm_nvhe_memset'
steps to reproduce:
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.
# Original tuxmake command with fragments listed below.
# tuxmake --runtime podman --target-arch arm64 --toolchain gcc-12
--kconfig allmodconfig CROSS_COMPILE_COMPAT=arm-linux-gnueabihf-
build log:
---------
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/build
CROSS_COMPILE_COMPAT=arm-linux-gnueabihf- ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- 'CC=sccache aarch64-linux-gnu-gcc'
'HOSTCC=sccache gcc' allmodconfig
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/build
CROSS_COMPILE_COMPAT=arm-linux-gnueabihf- ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- 'CC=sccache aarch64-linux-gnu-gcc'
'HOSTCC=sccache gcc'
/builds/linux/drivers/acpi/acpica/utdebug.c: In function
'acpi_ut_init_stack_ptr_trace':
/builds/linux/drivers/acpi/acpica/utdebug.c:40:38: warning: storing
the address of local variable 'current_sp' in
'acpi_gbl_entry_stack_pointer' [-Wdangling-pointer=]
40 | acpi_gbl_entry_stack_pointer = &current_sp;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
/builds/linux/drivers/acpi/acpica/utdebug.c:38:19: note: 'current_sp'
declared here
38 | acpi_size current_sp;
| ^~~~~~~~~~
In file included from /builds/linux/include/acpi/acpi.h:31,
from /builds/linux/drivers/acpi/acpica/utdebug.c:12:
/builds/linux/drivers/acpi/acpica/acglobal.h:196:26: note:
'acpi_gbl_entry_stack_pointer' declared here
196 | ACPI_GLOBAL(acpi_size *, acpi_gbl_entry_stack_pointer);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/builds/linux/include/acpi/acpixf.h:45:21: note: in definition of
macro 'ACPI_GLOBAL'
45 | extern type name
| ^~~~
/builds/linux/fs/xfs/libxfs/xfs_attr_remote.c: In function
'__xfs_attr3_rmt_read_verify':
/builds/linux/fs/xfs/libxfs/xfs_attr_remote.c:140:35: warning: storing
the address of local variable '__here' in '*failaddr'
[-Wdangling-pointer=]
140 | *failaddr = __this_address;
In file included from /builds/linux/fs/xfs/xfs.h:22,
from /builds/linux/fs/xfs/libxfs/xfs_attr_remote.c:7:
/builds/linux/fs/xfs/xfs_linux.h:133:46: note: '__here' declared here
133 | #define __this_address ({ __label__ __here; __here:
barrier(); &&__here; })
| ^~~~~~
/builds/linux/fs/xfs/libxfs/xfs_attr_remote.c:140:37: note: in
expansion of macro '__this_address'
140 | *failaddr = __this_address;
| ^~~~~~~~~~~~~~
/builds/linux/fs/xfs/xfs_linux.h:133:46: note: 'failaddr' declared here
133 | #define __this_address ({ __label__ __here; __here:
barrier(); &&__here; })
| ^~~~~~
/builds/linux/fs/xfs/libxfs/xfs_attr_remote.c:140:37: note: in
expansion of macro '__this_address'
140 | *failaddr = __this_address;
| ^~~~~~~~~~~~~~
/builds/linux/drivers/acpi/thermal.c: In function 'acpi_thermal_resume':
/builds/linux/drivers/acpi/thermal.c:1123:21: warning: the comparison
will always evaluate as 'true' for the address of 'active' will never
be NULL [-Waddress]
1123 | if (!(&tz->trips.active[i]))
| ^
/builds/linux/drivers/acpi/thermal.c:154:36: note: 'active' declared here
154 | struct acpi_thermal_active active[ACPI_THERMAL_MAX_ACTIVE];
| ^~~~~~
In file included from /builds/linux/include/linux/preempt.h:11,
from /builds/linux/include/linux/percpu.h:6,
from /builds/linux/include/linux/context_tracking_state.h:5,
from /builds/linux/include/linux/hardirq.h:5,
from /builds/linux/include/linux/interrupt.h:11,
from /builds/linux/drivers/scsi/lpfc/lpfc_bsg.c:23:
In function '__list_add',
inlined from 'list_add_tail' at /builds/linux/include/linux/list.h:100:2,
inlined from 'diag_cmd_data_free.isra' at
/builds/linux/drivers/scsi/lpfc/lpfc_bsg.c:891:2:
/builds/linux/include/linux/list.h:70:20: warning: storing the address
of local variable 'head' in '*&mlist_1(D)->dma.list.prev'
[-Wdangling-pointer=]
70 | next->prev = new;
| ~~~~~~~~~~~^~~~~
/builds/linux/drivers/scsi/lpfc/lpfc_bsg.c: In function
'diag_cmd_data_free.isra':
/builds/linux/drivers/scsi/lpfc/lpfc_bsg.c:883:26: note: 'head' declared here
883 | struct list_head head, *curr, *next;
| ^~~~
/builds/linux/drivers/scsi/lpfc/lpfc_bsg.c:883:26: note: 'mlist' declared here
In file included from /builds/linux/include/linux/smp.h:12,
from /builds/linux/arch/arm64/include/asm/arch_timer.h:18,
from /builds/linux/arch/arm64/include/asm/timex.h:8,
from /builds/linux/include/linux/timex.h:67,
from /builds/linux/include/linux/time32.h:13,
from /builds/linux/include/linux/time.h:73,
from /builds/linux/include/linux/skbuff.h:15,
from /builds/linux/include/linux/if_ether.h:19,
from /builds/linux/include/linux/etherdevice.h:20,
from /builds/linux/drivers/net/wireless/ath/ath6kl/core.h:21,
from
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c:18:
In function '__list_add',
inlined from 'list_add' at /builds/linux/include/linux/list.h:86:2,
inlined from 'ath6kl_htc_mbox_tx' at
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c:1142:3:
/builds/linux/include/linux/list.h:72:19: warning: storing the address
of local variable 'queue' in '*&packet_15(D)->list.prev'
[-Wdangling-pointer=]
72 | new->prev = prev;
| ~~~~~~~~~~^~~~~~
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c: In function
'ath6kl_htc_mbox_tx':
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c:1125:26:
note: 'queue' declared here
1125 | struct list_head queue;
| ^~~~~
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c:1125:26:
note: 'packet' declared here
In function '__list_add',
inlined from 'list_add_tail' at /builds/linux/include/linux/list.h:100:2,
inlined from 'htc_tx_comp_handler' at
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c:462:2:
/builds/linux/include/linux/list.h:72:19: warning: storing the address
of local variable 'container' in '*&packet_5(D)->list.prev'
[-Wdangling-pointer=]
72 | new->prev = prev;
| ~~~~~~~~~~^~~~~~
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c: In function
'htc_tx_comp_handler':
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c:455:26: note:
'container' declared here
455 | struct list_head container;
| ^~~~~~~~~
/builds/linux/drivers/net/wireless/ath/ath6kl/htc_mbox.c:455:26: note:
'packet' declared here
/builds/linux/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c: In
function 'rtl92d_phy_reload_iqk_setting':
/builds/linux/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c:2389:39:
warning: the comparison will always evaluate as 'true' for the address
of 'value' will never be NULL [-Waddress]
2389 | value[0] != NULL)
| ^~
In file included from
/builds/linux/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c:4:
/builds/linux/drivers/net/wireless/realtek/rtlwifi/rtl8192de/../wifi.h:1293:14:
note: 'value' declared here
1293 | long value[1][IQK_MATRIX_REG_NUM];
| ^~~~~
/builds/linux/drivers/net/ethernet/sun/cassini.c: In function 'cas_init_rx_dma':
/builds/linux/drivers/net/ethernet/sun/cassini.c:1328:29: warning:
comparison between two arrays [-Warray-compare]
1328 | if (CAS_HP_FIRMWARE == cas_prog_null)
| ^~
/builds/linux/drivers/net/ethernet/sun/cassini.c:1328:29: note: use
'&cas_prog_workaroundtab[0] == &cas_prog_null[0]' to compare the
addresses
/builds/linux/drivers/net/ethernet/sun/cassini.c: In function 'cas_reset':
/builds/linux/drivers/net/ethernet/sun/cassini.c:3796:34: warning:
comparison between two arrays [-Warray-compare]
3796 | (CAS_HP_ALT_FIRMWARE == cas_prog_null)) {
| ^~
/builds/linux/drivers/net/ethernet/sun/cassini.c:3796:34: note: use
'&cas_prog_null[0] == &cas_prog_null[0]' to compare the addresses
aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
aarch64-linux-gnu-ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function
`__kvm_nvhe___kvm_tlb_flush_vmid_ipa':
(.hyp.text+0x1a4c): undefined reference to `__kvm_nvhe_memset'
aarch64-linux-gnu-ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function
`__kvm_nvhe___kvm_tlb_flush_vmid':
(.hyp.text+0x1b20): undefined reference to `__kvm_nvhe_memset'
aarch64-linux-gnu-ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function
`__kvm_nvhe___kvm_flush_cpu_context':
(.hyp.text+0x1b80): undefined reference to `__kvm_nvhe_memset'
make[1]: *** [/builds/linux/Makefile:1194: vmlinux] Error 1
Build link,
- https://builds.tuxbuild.com/2IHivEkKmuryHjt6Xv8xUn9RLy5/
Build comparison link,
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y-sanity/buil…
--
Linaro LKFT
https://lkft.linaro.org
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
50bcceb7724e ("x86/pm: Add enumeration check before spec MSRs save/restore setup")
2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume")
e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
444e2ff34df8 ("tools arch x86: Grab a copy of the file containing the MSR numbers")
87a682a7c4e7 ("perf build: Ignore intentional differences for the x86 insn decoder")
00a263902ac3 ("perf intel-pt: Use shared x86 insn decoder")
f1da0a6c1365 ("perf intel-pt: Remove inat.c from build dependency list")
8520a98dbab6 ("perf debug: Remove needless include directives from debug.h")
0ac25fd0a04d ("perf tools: Remove perf.h from source files not needing it")
c1a604dff486 ("perf tools: Remove needless perf.h include directive from headers")
91854f9a077e ("perf tools: Move everything related to sys_perf_event_open() to perf-sys.h")
0ac1dd5b4a70 ("perf timechart: Refactor svg_build_topology_map()")
2da39f1cc36b ("perf evlist: Remove needless util.h from evlist.h")
efa73d37c11a ("perf tools: Remove needless util.h include from builtin.h")
185bcb92c80e ("perf sort: Remove needless headers from sort.h, provide fwd struct decls")
97b9d866a66c ("perf srcline: Add missing srcline.h header to files needing its defs")
125009026bfc ("perf cacheline: Move cacheline related routines to separate files")
aeb00b1aeab6 ("perf record: Move record_opts and other record decls out of perf.h")
8db5957bc736 ("Merge tag 'v5.3-rc6' into perf/core, to pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50bcceb7724e471d9b591803889df45dcbb584bc Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Date: Tue, 15 Nov 2022 11:17:06 -0800
Subject: [PATCH] x86/pm: Add enumeration check before spec MSRs save/restore
setup
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
to be saved and restored at hibernate and resume. However, it has
zero awareness of CPU support for these MSRs. It mostly works by
unconditionally attempting to manipulate these MSRs and relying on
rdmsrl_safe() being able to handle a #GP on CPUs where the support is
unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR
while writes (WRMSR) are not. In this case, msr_build_context() sees
a successful read (RDMSR) and marks the MSR as valid. Then, later, a
write (WRMSR) fails, producing a nasty (but harmless) error message.
This causes restore_processor_state() to try and restore it, but writing
this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
Call Trace:
<TASK>
restore_processor_state
x86_acpi_suspend_lowlevel
acpi_suspend_enter
suspend_devices_and_enter
pm_suspend.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
? do_syscall_64
? up_read
? lock_is_held_type
? asm_exc_page_fault
? lockdep_hardirqs_on
entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid
trying to manipulate the MSR when the feature bit is clear. This
required adding a X86_FEATURE bit for MSRs that do not have one already,
but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup")
Reported-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.16685397…
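To illustrate the RDMSR/WRMSR asymmetry the commit describes (a hedged
sketch, not code from the patch):

	u64 val;

	/*
	 * The read may succeed even though the MSR is not properly supported,
	 * so msr_build_context() marks it valid ...
	 */
	if (!rdmsrl_safe(MSR_IA32_TSX_CTRL, &val)) {
		/*
		 * ... and on resume the value is written back with a plain
		 * WRMSR, which can fault and produce the "unchecked MSR
		 * access error" splat shown above.
		 */
		wrmsrl(MSR_IA32_TSX_CTRL, val);
	}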
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 4cd39f304e20..93ae33248f42 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -513,16 +513,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void)
{
- u32 spec_msr_id[] = {
- MSR_IA32_SPEC_CTRL,
- MSR_IA32_TSX_CTRL,
- MSR_TSX_FORCE_ABORT,
- MSR_IA32_MCU_OPT_CTRL,
- MSR_AMD64_LS_CFG,
- MSR_AMD64_DE_CFG,
+ struct msr_enumeration {
+ u32 msr_no;
+ u32 feature;
+ } msr_enum[] = {
+ { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL },
+ { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL },
+ { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT },
+ { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL },
+ { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD },
+ { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC },
};
+ int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id));
+ for (i = 0; i < ARRAY_SIZE(msr_enum); i++) {
+ if (boot_cpu_has(msr_enum[i].feature))
+ msr_build_context(&msr_enum[i].msr_no, 1);
+ }
}
static int pm_check_save_msr(void)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
50bcceb7724e ("x86/pm: Add enumeration check before spec MSRs save/restore setup")
2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume")
e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
444e2ff34df8 ("tools arch x86: Grab a copy of the file containing the MSR numbers")
87a682a7c4e7 ("perf build: Ignore intentional differences for the x86 insn decoder")
00a263902ac3 ("perf intel-pt: Use shared x86 insn decoder")
f1da0a6c1365 ("perf intel-pt: Remove inat.c from build dependency list")
8520a98dbab6 ("perf debug: Remove needless include directives from debug.h")
0ac25fd0a04d ("perf tools: Remove perf.h from source files not needing it")
c1a604dff486 ("perf tools: Remove needless perf.h include directive from headers")
91854f9a077e ("perf tools: Move everything related to sys_perf_event_open() to perf-sys.h")
0ac1dd5b4a70 ("perf timechart: Refactor svg_build_topology_map()")
2da39f1cc36b ("perf evlist: Remove needless util.h from evlist.h")
efa73d37c11a ("perf tools: Remove needless util.h include from builtin.h")
185bcb92c80e ("perf sort: Remove needless headers from sort.h, provide fwd struct decls")
97b9d866a66c ("perf srcline: Add missing srcline.h header to files needing its defs")
125009026bfc ("perf cacheline: Move cacheline related routines to separate files")
aeb00b1aeab6 ("perf record: Move record_opts and other record decls out of perf.h")
8db5957bc736 ("Merge tag 'v5.3-rc6' into perf/core, to pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50bcceb7724e471d9b591803889df45dcbb584bc Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Date: Tue, 15 Nov 2022 11:17:06 -0800
Subject: [PATCH] x86/pm: Add enumeration check before spec MSRs save/restore
setup
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
to be saved and restored at hibernate and resume. However, it has
zero awareness of CPU support for these MSRs. It mostly works by
unconditionally attempting to manipulate these MSRs and relying on
rdmsrl_safe() being able to handle a #GP on CPUs where the support is
unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR
while writes (WRMSR) are not. In this case, msr_build_context() sees
a successful read (RDMSR) and marks the MSR as valid. Then, later, a
write (WRMSR) fails, producing a nasty (but harmless) error message.
This causes restore_processor_state() to try and restore it, but writing
this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
Call Trace:
<TASK>
restore_processor_state
x86_acpi_suspend_lowlevel
acpi_suspend_enter
suspend_devices_and_enter
pm_suspend.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
? do_syscall_64
? up_read
? lock_is_held_type
? asm_exc_page_fault
? lockdep_hardirqs_on
entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid
trying to manipulate the MSR when the feature bit is clear. This
required adding a X86_FEATURE bit for MSRs that do not have one already,
but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup")
Reported-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.16685397…
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 4cd39f304e20..93ae33248f42 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -513,16 +513,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void)
{
- u32 spec_msr_id[] = {
- MSR_IA32_SPEC_CTRL,
- MSR_IA32_TSX_CTRL,
- MSR_TSX_FORCE_ABORT,
- MSR_IA32_MCU_OPT_CTRL,
- MSR_AMD64_LS_CFG,
- MSR_AMD64_DE_CFG,
+ struct msr_enumeration {
+ u32 msr_no;
+ u32 feature;
+ } msr_enum[] = {
+ { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL },
+ { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL },
+ { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT },
+ { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL },
+ { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD },
+ { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC },
};
+ int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id));
+ for (i = 0; i < ARRAY_SIZE(msr_enum); i++) {
+ if (boot_cpu_has(msr_enum[i].feature))
+ msr_build_context(&msr_enum[i].msr_no, 1);
+ }
}
static int pm_check_save_msr(void)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
50bcceb7724e ("x86/pm: Add enumeration check before spec MSRs save/restore setup")
2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume")
e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
444e2ff34df8 ("tools arch x86: Grab a copy of the file containing the MSR numbers")
87a682a7c4e7 ("perf build: Ignore intentional differences for the x86 insn decoder")
00a263902ac3 ("perf intel-pt: Use shared x86 insn decoder")
f1da0a6c1365 ("perf intel-pt: Remove inat.c from build dependency list")
8520a98dbab6 ("perf debug: Remove needless include directives from debug.h")
0ac25fd0a04d ("perf tools: Remove perf.h from source files not needing it")
c1a604dff486 ("perf tools: Remove needless perf.h include directive from headers")
91854f9a077e ("perf tools: Move everything related to sys_perf_event_open() to perf-sys.h")
0ac1dd5b4a70 ("perf timechart: Refactor svg_build_topology_map()")
2da39f1cc36b ("perf evlist: Remove needless util.h from evlist.h")
efa73d37c11a ("perf tools: Remove needless util.h include from builtin.h")
185bcb92c80e ("perf sort: Remove needless headers from sort.h, provide fwd struct decls")
97b9d866a66c ("perf srcline: Add missing srcline.h header to files needing its defs")
125009026bfc ("perf cacheline: Move cacheline related routines to separate files")
aeb00b1aeab6 ("perf record: Move record_opts and other record decls out of perf.h")
8db5957bc736 ("Merge tag 'v5.3-rc6' into perf/core, to pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50bcceb7724e471d9b591803889df45dcbb584bc Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Date: Tue, 15 Nov 2022 11:17:06 -0800
Subject: [PATCH] x86/pm: Add enumeration check before spec MSRs save/restore
setup
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
to be saved and restored at hibernate and resume. However, it has
zero awareness of CPU support for these MSRs. It mostly works by
unconditionally attempting to manipulate these MSRs and relying on
rdmsrl_safe() being able to handle a #GP on CPUs where the support is
unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR
while writes (WRMSR) are not. In this case, msr_build_context() sees
a successful read (RDMSR) and marks the MSR as valid. Then, later, a
write (WRMSR) fails, producing a nasty (but harmless) error message.
This causes restore_processor_state() to try and restore it, but writing
this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
Call Trace:
<TASK>
restore_processor_state
x86_acpi_suspend_lowlevel
acpi_suspend_enter
suspend_devices_and_enter
pm_suspend.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
? do_syscall_64
? up_read
? lock_is_held_type
? asm_exc_page_fault
? lockdep_hardirqs_on
entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid
trying to manipulate the MSR when the feature bit is clear. This
required adding a X86_FEATURE bit for MSRs that do not have one already,
but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup")
Reported-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.16685397…
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 4cd39f304e20..93ae33248f42 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -513,16 +513,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void)
{
- u32 spec_msr_id[] = {
- MSR_IA32_SPEC_CTRL,
- MSR_IA32_TSX_CTRL,
- MSR_TSX_FORCE_ABORT,
- MSR_IA32_MCU_OPT_CTRL,
- MSR_AMD64_LS_CFG,
- MSR_AMD64_DE_CFG,
+ struct msr_enumeration {
+ u32 msr_no;
+ u32 feature;
+ } msr_enum[] = {
+ { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL },
+ { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL },
+ { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT },
+ { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL },
+ { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD },
+ { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC },
};
+ int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id));
+ for (i = 0; i < ARRAY_SIZE(msr_enum); i++) {
+ if (boot_cpu_has(msr_enum[i].feature))
+ msr_build_context(&msr_enum[i].msr_no, 1);
+ }
}
static int pm_check_save_msr(void)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
50bcceb7724e ("x86/pm: Add enumeration check before spec MSRs save/restore setup")
2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume")
e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
444e2ff34df8 ("tools arch x86: Grab a copy of the file containing the MSR numbers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50bcceb7724e471d9b591803889df45dcbb584bc Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Date: Tue, 15 Nov 2022 11:17:06 -0800
Subject: [PATCH] x86/pm: Add enumeration check before spec MSRs save/restore
setup
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
to be saved and restored at hibernate and resume. However, it has
zero awareness of CPU support for these MSRs. It mostly works by
unconditionally attempting to manipulate these MSRs and relying on
rdmsrl_safe() being able to handle a #GP on CPUs where the support is
unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR
while writes (WRMSR) are not. In this case, msr_build_context() sees
a successful read (RDMSR) and marks the MSR as valid. Then, later, a
write (WRMSR) fails, producing a nasty (but harmless) error message.
This causes restore_processor_state() to try and restore it, but writing
this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
Call Trace:
<TASK>
restore_processor_state
x86_acpi_suspend_lowlevel
acpi_suspend_enter
suspend_devices_and_enter
pm_suspend.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
? do_syscall_64
? up_read
? lock_is_held_type
? asm_exc_page_fault
? lockdep_hardirqs_on
entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid
trying to manipulate the MSR when the feature bit is clear. This
required adding a X86_FEATURE bit for MSRs that do not have one already,
but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup")
Reported-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.16685397…
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 4cd39f304e20..93ae33248f42 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -513,16 +513,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void)
{
- u32 spec_msr_id[] = {
- MSR_IA32_SPEC_CTRL,
- MSR_IA32_TSX_CTRL,
- MSR_TSX_FORCE_ABORT,
- MSR_IA32_MCU_OPT_CTRL,
- MSR_AMD64_LS_CFG,
- MSR_AMD64_DE_CFG,
+ struct msr_enumeration {
+ u32 msr_no;
+ u32 feature;
+ } msr_enum[] = {
+ { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL },
+ { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL },
+ { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT },
+ { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL },
+ { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD },
+ { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC },
};
+ int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id));
+ for (i = 0; i < ARRAY_SIZE(msr_enum); i++) {
+ if (boot_cpu_has(msr_enum[i].feature))
+ msr_build_context(&msr_enum[i].msr_no, 1);
+ }
}
static int pm_check_save_msr(void)
Hello Marek,
it looks like commit 753395ea1e45 ("ARM: dts: imx7: Fix NAND controller
size-cells"), which was backported to stable 6.0.10, introduces a boot
regression, at least on colibri-imx7.
What I get is
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 6.0.10 (francesco@francesco-nb) (arm-linux-gnueabihf-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.
4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #36 SMP Wed Nov 30 14:07:15 CET 2022
...
[ 4.407499] gpmi-nand: error parsing ofpart partition /soc/nand-controller@33002000/partition@0 (/soc/nand-controller
@33002000)
[ 4.438401] gpmi-nand 33002000.nand-controller: driver registered.
...
[ 5.933906] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0): error -19
[ 5.946504] Please append a correct "root=" boot option; here are the available partitions:
...
Any idea? I'm not familiar with the gpmi-nand driver and I would just revert it, but
maybe you have a better idea.
Francesco
The quilt patch titled
Subject: error-injection: add prompt for function error injection
has been removed from the -mm tree. Its filename was
error-injection-add-prompt-for-function-error-injection.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Subject: error-injection: add prompt for function error injection
Date: Mon, 21 Nov 2022 10:44:03 -0500
The config to be able to inject error codes into any function annotated
with ALLOW_ERROR_INJECTION() is enabled when
CONFIG_FUNCTION_ERROR_INJECTION is enabled. But unfortunately, this is
always enabled on x86 when KPROBES is enabled, and there's no way to turn
it off.
As kprobes is useful for observability of the kernel, it is useful to have
it enabled in production environments. But error injection should be
avoided. Add a prompt to the config to allow it to be disabled even when
kprobes is enabled, and get rid of the "def_bool y".
This is a kernel debug feature (it's in Kconfig.debug), and should have
never been something enabled by default.
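For readers unfamiliar with the mechanism, functions opt in roughly like
this (a minimal usage sketch; my_hw_init and its context are made up for
illustration):

	#include <linux/error-injection.h>

	static int my_hw_init(void)
	{
		/* ... real initialization ... */
		return 0;
	}
	/*
	 * With CONFIG_FUNCTION_ERROR_INJECTION=y the return value of this
	 * function can be overridden (e.g. via the fail_function fault
	 * injection interface or BPF) to exercise its error paths.
	 */
	ALLOW_ERROR_INJECTION(my_hw_init, ERRNO);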
Link: https://lkml.kernel.org/r/20221121104403.1545f9b5@gandalf.local.home
Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov(a)gmail.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Florent Revest <revest(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.debug | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/lib/Kconfig.debug~error-injection-add-prompt-for-function-error-injection
+++ a/lib/Kconfig.debug
@@ -1875,8 +1875,14 @@ config NETDEV_NOTIFIER_ERROR_INJECT
If unsure, say N.
config FUNCTION_ERROR_INJECTION
- def_bool y
+ bool "Fault-injections of functions"
depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
+ help
+ Add fault injections into various functions that are annotated with
+ ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
+ value of these functions. This is useful to test error paths of code.
+
+ If unsure, say N
config FAULT_INJECTION
bool "Fault-injection framework"
_
Patches currently in -mm which might be from rostedt(a)goodmis.org are
Since commit 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
we wait for all workloads to complete during shutdown. This was done to
avoid a stall once the device is started again.
Unfortunately, this has the side effect of stalling kexec() if userspace
is frozen. Let's handle that case.
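A rough idea of the helper patch 1 introduces (my sketch based on the
changelog below, not the actual patch):

	/* kernel/kexec_core.c */
	bool kexec_with_frozen_processes(void)
	{
	#ifdef CONFIG_FREEZER
		return kexec_in_progress && pm_freezing;
	#else
		return false;
	#endif
	}

so that the SOF .shutdown path can skip waiting for workloads when
userspace cannot make progress anyway.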
To: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
To: Liam Girdwood <lgirdwood(a)gmail.com>
To: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
To: Bard Liao <yung-chuan.liao(a)linux.intel.com>
To: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
To: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
To: Daniel Baluta <daniel.baluta(a)nxp.com>
To: Mark Brown <broonie(a)kernel.org>
To: Jaroslav Kysela <perex(a)perex.cz>
To: Takashi Iwai <tiwai(a)suse.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Chromeos Kdump <chromeos-kdump(a)google.com>
To: Steven Rostedt <rostedt(a)goodmis.org>
Cc: stable(a)vger.kernel.org
Cc: sound-open-firmware(a)alsa-project.org
Cc: alsa-devel(a)alsa-project.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kexec(a)lists.infradead.org
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v7:
- Fix commit message (Thanks Pierre-Louis).
- Link to v6: https://lore.kernel.org/r/20221127-snd-freeze-v6-0-3e90553f64a5@chromium.org
Changes in v6:
- Check if we are in kexec with the userspace frozen.
- Link to v5: https://lore.kernel.org/r/20221127-snd-freeze-v5-0-4ededeb08ba0@chromium.org
Changes in v5:
- Edit subject prefix.
- Link to v4: https://lore.kernel.org/r/20221127-snd-freeze-v4-0-51ca64b7f2ab@chromium.org
Changes in v4:
- Do not call snd_sof_machine_unregister from shutdown.
- Link to v3: https://lore.kernel.org/r/20221127-snd-freeze-v3-0-a2eda731ca14@chromium.org
Changes in v3:
- Wrap pm_freezing in a function.
- Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@chromium.org
Changes in v2:
- Only use pm_freezing if CONFIG_FREEZER .
- Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@chromium.org
---
Ricardo Ribalda (2):
kexec: Introduce kexec_with_frozen_processes
ASoC: SOF: Fix deadlock when shutdown a frozen userspace
include/linux/kexec.h | 3 +++
kernel/kexec_core.c | 5 +++++
sound/soc/sof/core.c | 4 +++-
3 files changed, 11 insertions(+), 1 deletion(-)
---
base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
change-id: 20221127-snd-freeze-1ee143228326
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
50bcceb7724e ("x86/pm: Add enumeration check before spec MSRs save/restore setup")
2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume")
e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50bcceb7724e471d9b591803889df45dcbb584bc Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Date: Tue, 15 Nov 2022 11:17:06 -0800
Subject: [PATCH] x86/pm: Add enumeration check before spec MSRs save/restore
setup
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
to be saved and restored at hibernate and resume. However, it has
zero awareness of CPU support for these MSRs. It mostly works by
unconditionally attempting to manipulate these MSRs and relying on
rdmsrl_safe() being able to handle a #GP on CPUs where the support is
unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR
while writes (WRMSR) are not. In this case, msr_build_context() sees
a successful read (RDMSR) and marks the MSR as valid. Then, later, a
write (WRMSR) fails, producing a nasty (but harmless) error message.
This causes restore_processor_state() to try and restore it, but writing
this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
Call Trace:
<TASK>
restore_processor_state
x86_acpi_suspend_lowlevel
acpi_suspend_enter
suspend_devices_and_enter
pm_suspend.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
? do_syscall_64
? up_read
? lock_is_held_type
? asm_exc_page_fault
? lockdep_hardirqs_on
entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid
trying to manipulate the MSR when the feature bit is clear. This
required adding a X86_FEATURE bit for MSRs that do not have one already,
but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup")
Reported-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.16685397…
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 4cd39f304e20..93ae33248f42 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -513,16 +513,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void)
{
- u32 spec_msr_id[] = {
- MSR_IA32_SPEC_CTRL,
- MSR_IA32_TSX_CTRL,
- MSR_TSX_FORCE_ABORT,
- MSR_IA32_MCU_OPT_CTRL,
- MSR_AMD64_LS_CFG,
- MSR_AMD64_DE_CFG,
+ struct msr_enumeration {
+ u32 msr_no;
+ u32 feature;
+ } msr_enum[] = {
+ { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL },
+ { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL },
+ { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT },
+ { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL },
+ { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD },
+ { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC },
};
+ int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id));
+ for (i = 0; i < ARRAY_SIZE(msr_enum); i++) {
+ if (boot_cpu_has(msr_enum[i].feature))
+ msr_build_context(&msr_enum[i].msr_no, 1);
+ }
}
static int pm_check_save_msr(void)
On Thu, Nov 24, 2022 at 01:48:08PM -0500, John Aron wrote:
> Hello -
>
>
>
> I have an idea of where to begin: our kernel code compiles and works on Red
> Hat, CentOS, and Fedora. In Ubuntu 20.04, I have an error.
>
>
>
> root@form:/home/john/thor-linux/Kernel/ubuntu20.04# make
>
> rmmod: ERROR: Module thor is not currently loaded
>
> make: [Makefile:7: all] Error 1 (ignored)
>
> make[1]: Entering directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> CC [M] /home/john/thor-linux/Kernel/ubuntu22.04/thor.o
>
> /home/john/thor-linux/Kernel/ubuntu22.04/thor.o: warning: objtool:
> _Controller_process_response_map()+0x1b3: unreachable instruction
>
> Building modules, stage 2.
>
> MODPOST 1 modules
>
> CC [M] /home/john/thor-linux/Kernel/ubuntu22.04/thor.mod.o
>
> LD [M] /home/john/thor-linux/Kernel/ubuntu22.04/thor.ko
>
> make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> make[1]: Entering directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> CLEAN /home/john/thor-linux/Kernel/ubuntu22.04/Module.symvers
>
> make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> #@sudo dmesg -C
>
> #@sudo insmod /usr/local/etc/thor.ko
>
> filename: /usr/local/etc/thor.ko
>
> version: 0.1
>
> description: THOR KMOD
>
> author: Aronetics
>
> license: GPL
>
> srcversion: BC856FA85DB2FEFD38A1B2A
>
> depends:
>
> retpoline: Y
>
> name: thor
>
> vermagic: 5.4.0-131-generic SMP mod_unload modversions
>
> #@sudo dmesg
>
> root@form:/home/john/thor-linux/Kernel/ubuntu20.04#
> <mailto:root@form:/home/john/thor-linux/Kernel/ubuntu20.04#>
>
>
>
> Every 2.0s: tail -n30 /var/lib/dkms/thor/1.0.1/build/make.log
>
>
>
> DKMS make.log for thor-1.0.1 for kernel 5.4.0-131-generic (x86_64)
>
> Thu 24 Nov 2022 01:10:33 PM EST
>
> make: Entering directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> CC [M] /var/lib/dkms/thor/1.0.1/build/thor.o
>
> /var/lib/dkms/thor/1.0.1/build/thor.o: warning: objtool:
> _Controller_process_response_map()+0x1b3: unreachable instruction
>
> Building modules, stage 2.
>
> MODPOST 1 modules
>
> CC [M] /var/lib/dkms/thor/1.0.1/build/thor.mod.o
>
> LD [M] /var/lib/dkms/thor/1.0.1/build/thor.ko
>
> make: Leaving directory '/usr/src/linux-headers-5.4.0-131-generic'
>
>
>
> Is this an error in objtool on Ubuntu within
> /usr/src/linux-headers-5.4.0-${26-130}/tools/objtool ?
Do you have a pointer to your code anywhere? Do you have .S files in
it, or is it all C files?
And did you ask the Canonical developers about this? You should have a
support contract you are paying for with them, so why not use that?
thanks,
greg k-h
This bug is marked as fixed by commit:
ext4: block range must be validated before use in ext4_mb_clear_bb()
But I can't find it in any tested tree for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and
new crashes with the same signature are ignored.
The condition detecting whether somebody else has the device exclusively
open in disk_scan_partitions() has a brown-paper-bag bug. It also triggers
when nobody has the device exclusively open and we are coming from the
BLKRRPART path. Interestingly, this didn't have any adverse effects
during testing because tools update the kernel's notion of the partition
table using ioctls and don't rely on BLKRRPART. Fix the bug before
somebody trips over it.
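For reference, the BLKRRPART path mentioned above is what e.g.
"blockdev --rereadpt" uses; from userspace it is roughly (device path is
just an example):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>

	int main(void)
	{
		int fd = open("/dev/sda", O_RDONLY);	/* non-exclusive open */

		/* Ends up in disk_scan_partitions(), where the check below lives */
		if (fd < 0 || ioctl(fd, BLKRRPART) < 0)
			perror("BLKRRPART");
		return 0;
	}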
Fixes: 8d67fc20caf8 ("block: Do not reread partition table on exclusively open device")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
block/genhd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/genhd.c b/block/genhd.c
index 012529d36f5b..29fb2c98b401 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -367,7 +367,7 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner)
if (disk->open_partitions)
return -EBUSY;
/* Someone else has bdev exclusively open? */
- if (disk->part0->bd_holder != owner)
+ if (disk->part0->bd_holder && disk->part0->bd_holder != owner)
return -EBUSY;
set_bit(GD_NEED_PART_SCAN, &disk->state);
--
2.35.3
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
04aa64375f48 ("drm/i915: fix TLB invalidation for Gen12 video and compute engines")
33da97894758 ("drm/i915/gt: Serialize TLB invalidates with GT resets")
7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04aa64375f48a5d430b5550d9271f8428883e550 Mon Sep 17 00:00:00 2001
From: Andrzej Hajda <andrzej.hajda(a)intel.com>
Date: Mon, 14 Nov 2022 11:38:24 +0100
Subject: [PATCH] drm/i915: fix TLB invalidation for Gen12 video and compute
engines
In case of Gen12 video and compute engines, TLB_INV registers are masked -
to modify one bit, corresponding bit in upper half of the register must
be enabled, otherwise nothing happens.
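For context, "masked" here means the upper 16 bits of the register select
which of the lower 16 bits the write actually affects; i915 handles that
with helpers along these lines (simplified restatement of the existing
macro, not new code from this patch):

	/* Bit N only takes effect if bit N+16 is set in the same write */
	#define _MASKED_BIT_ENABLE(a)	({ typeof(a) _a = (a); ((_a) << 16) | (_a); })

	/* e.g. for a masked TLB_INV register: */
	intel_uncore_write_fw(uncore, rb.reg, _MASKED_BIT_ENABLE(rb.bit));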
CVE: CVE-2022-4139
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d0b03a928b9a..5c931b6696c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1017,6 +1017,11 @@ static void mmio_invalidate_full(struct intel_gt *gt)
if (!i915_mmio_reg_offset(rb.reg))
continue;
+ if (GRAPHICS_VER(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS ||
+ engine->class == VIDEO_ENHANCEMENT_CLASS ||
+ engine->class == COMPUTE_CLASS))
+ rb.bit = _MASKED_BIT_ENABLE(rb.bit);
+
intel_uncore_write_fw(uncore, rb.reg, rb.bit);
awake |= engine->mask;
}
I'm looking to use sendfile(2) with the Xilinx XDMA kernel driver in order to move data from a PCIe board with a Xilinx FPGA to the network card with "zero-copy".
Currently I'm getting an EINVAL return status from sendfile(2) when providing the opened XDMA device file descriptor as the input fd.
The device driver provides a character device that can be mmap'ed.
There seem to be other restrictions. Can anyone provide insight into what would be needed to make this work?
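For reference, the call I'm attempting looks roughly like this (device
path and sizes are placeholders):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/sendfile.h>

	/* sock_fd: connected TCP socket, len: bytes to push per call */
	static ssize_t push_from_xdma(int sock_fd, size_t len)
	{
		int in_fd = open("/dev/xdma0_c2h_0", O_RDONLY);	/* XDMA char device */
		off_t off = 0;
		ssize_t n;

		n = sendfile(sock_fd, in_fd, &off, len);
		if (n < 0)
			perror("sendfile");	/* currently fails with EINVAL */
		return n;
	}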
Thanks! //hinko
> On 29 Nov 2022, at 17:01, gregkh(a)linuxfoundation.org wrote:
>
>
> This is a note to let you know that I've just added the patch titled
>
> kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
>
> to the 6.0-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> kbuild-fix-wimplicit-function-declaration-in-license_is_gpl_compatible.patch
> and it can be found in the queue-6.0 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
> From 50c697215a8cc22f0e58c88f06f2716c05a26e85 Mon Sep 17 00:00:00 2001
> From: Sam James <sam(a)gentoo.org>
> Date: Wed, 16 Nov 2022 18:26:34 +0000
> Subject: kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
>
> From: Sam James <sam(a)gentoo.org>
>
> commit 50c697215a8cc22f0e58c88f06f2716c05a26e85 upstream.
Hi Greg,
Please yank this commit from all the stable queues -- it needs
some further baking, and a revert is queued in Andrew's tree.
Thanks,
sam
>
> Add missing <linux/string.h> include for strcmp.
>
> Clang 16 makes -Wimplicit-function-declaration an error by default.
> Unfortunately, out of tree modules may use this in configure scripts,
> which means failure might cause silent miscompilation or misconfiguration.
>
> For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
> or the (new) c-std-porting mailing list [3].
>
> [0] https://lwn.net/Articles/913505/
> [1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-…
> [2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6…
> [3] hosted at lists.linux.dev.
>
> [akpm(a)linux-foundation.org: remember "linux/"]
> Link: https://lkml.kernel.org/r/20221116182634.2823136-1-sam@gentoo.org
> Signed-off-by: Sam James <sam(a)gentoo.org>
> Cc: <stable(a)vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> include/linux/license.h | 2 ++
test_bpf tail call tests end up as:
test_bpf: #0 Tail call leaf jited:1 85 PASS
test_bpf: #1 Tail call 2 jited:1 111 PASS
test_bpf: #2 Tail call 3 jited:1 145 PASS
test_bpf: #3 Tail call 4 jited:1 170 PASS
test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
test_bpf: #5 Tail call load/store jited:1
BUG: Unable to handle kernel data access on write at 0xf1b4e000
Faulting instruction address: 0xbe86b710
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash PowerMac
Modules linked in: test_bpf(+)
CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
NIP: be86b710 LR: be857e88 CTR: be86b704
REGS: f1b4df20 TRAP: 0300 Not tainted (6.1.0-rc4+)
MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28008242 XER: 00000000
DAR: f1b4e000 DSISR: 42000000
GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000
GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
NIP [be86b710] 0xbe86b710
LR [be857e88] __run_one+0xec/0x264 [test_bpf]
Call Trace:
[f1b4dfe0] [00000002] 0x2 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
This is an attempt to write above the stack. The problem is encountered
with tests added by commit 38608ee7b690 ("bpf, tests: Add load store
test case for tail call")
This happens because the tail call is done to a BPF prog with a different
stack_depth. At the moment, the stack is kept as is when the caller
tail calls its callee. But at exit, the callee restores the stack based
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.
This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary to maintain the stack as is for the tail
call. But it was not anticipated that the BPF frame size could be
different.
Let's take a new approach. Use register r4 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. This means the input parameter must be in r3, which is more
correct as it is a 32-bit parameter; tail calls then better match a
normal BPF function entry, the downside being that we move that input
parameter back and forth between r3 and r4. That can be optimised later.
Doing that also has the advantage of maximising the common parts between
tail calls and a normal function exit.
With the fix, tail call tests are now successful:
test_bpf: #0 Tail call leaf jited:1 53 PASS
test_bpf: #1 Tail call 2 jited:1 115 PASS
test_bpf: #2 Tail call 3 jited:1 154 PASS
test_bpf: #3 Tail call 4 jited:1 165 PASS
test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
test_bpf: #5 Tail call load/store jited:1 141 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
Suggested-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Fixes: 51c66ad849a7 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Tested-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.16692788…
---
arch/powerpc/net/bpf_jit_comp32.c | 52 +++++++++++++------------------
1 file changed, 21 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 43f1c76d48ce..a379b0ce19ff 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -113,23 +113,19 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
{
int i;
- /* First arg comes in as a 32 bits pointer. */
- EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_1), _R3));
- EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
+ /* Initialize tail_call_cnt, to be skipped if we do tail calls. */
+ EMIT(PPC_RAW_LI(_R4, 0));
+
+#define BPF_TAILCALL_PROLOGUE_SIZE 4
+
EMIT(PPC_RAW_STWU(_R1, _R1, -BPF_PPC_STACKFRAME(ctx)));
- /*
- * Initialize tail_call_cnt in stack frame if we do tail calls.
- * Otherwise, put in NOPs so that it can be skipped when we are
- * invoked through a tail call.
- */
if (ctx->seen & SEEN_TAILCALL)
- EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_1) - 1, _R1,
- bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
- else
- EMIT(PPC_RAW_NOP());
+ EMIT(PPC_RAW_STW(_R4, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
-#define BPF_TAILCALL_PROLOGUE_SIZE 16
+ /* First arg comes in as a 32 bits pointer. */
+ EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_1), _R3));
+ EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
/*
* We need a stack frame, but we don't necessarily need to
@@ -170,24 +166,24 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
for (i = BPF_PPC_NVR_MIN; i <= 31; i++)
if (bpf_is_seen_register(ctx, i))
EMIT(PPC_RAW_LWZ(i, _R1, bpf_jit_stack_offsetof(ctx, i)));
-}
-
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
-{
- EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
-
- bpf_jit_emit_common_epilogue(image, ctx);
-
- /* Tear down our stack frame */
if (ctx->seen & SEEN_FUNC)
EMIT(PPC_RAW_LWZ(_R0, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
+ /* Tear down our stack frame */
EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_PPC_STACKFRAME(ctx)));
if (ctx->seen & SEEN_FUNC)
EMIT(PPC_RAW_MTLR(_R0));
+}
+
+void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+{
+ EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
+
+ bpf_jit_emit_common_epilogue(image, ctx);
+
EMIT(PPC_RAW_BLR());
}
@@ -244,7 +240,6 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
EMIT(PPC_RAW_RLWINM(_R3, b2p_index, 2, 0, 29));
EMIT(PPC_RAW_ADD(_R3, _R3, b2p_bpf_array));
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_array, ptrs)));
- EMIT(PPC_RAW_STW(_R0, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
/*
* if (prog == NULL)
@@ -255,19 +250,14 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
/* goto *(prog->bpf_func + prologue_size); */
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_prog, bpf_func)));
-
- if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_LWZ(_R0, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
-
EMIT(PPC_RAW_ADDIC(_R3, _R3, BPF_TAILCALL_PROLOGUE_SIZE));
-
- if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_MTLR(_R0));
-
EMIT(PPC_RAW_MTCTR(_R3));
EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_1)));
+ /* Put tail_call_cnt in r4 */
+ EMIT(PPC_RAW_MR(_R4, _R0));
+
/* tear restore NVRs, ... */
bpf_jit_emit_common_epilogue(image, ctx);
--
2.38.1
Hi Greg
After upgrading to 5.4.211 we started seeing some nodes getting
stuck in our Kubernetes cluster. All nodes are running this kernel
version. After taking a closer look it seems that the runc command was getting
stuck. Looking at the stack it appears the thread is stuck in epoll wait for
some time.
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] ep_poll+0x48d/0x4e0
[<0>] do_epoll_wait+0xab/0xc0
[<0>] __x64_sys_epoll_pwait+0x4d/0xa0
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] futex_wait_queue_me+0xb6/0x110
[<0>] futex_wait+0xe2/0x260
[<0>] do_futex+0x372/0x4f0
[<0>] __x64_sys_futex+0x134/0x180
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
I noticed there are other discussions going on as well
regarding this.
https://lore.kernel.org/all/Y1pY2n6E1Xa58MXv@kroah.com/
Reverting the below patch does fix the issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
We don't see this issue in the latest upstream kernel or even the latest 5.10
stable tree. Looking at the patches that went in for 5.10 stable, there's
one that stands out that seems to be missing in 5.4.
289caf5d8f6c61c6d2b7fd752a7f483cd153f182 (epoll: check for events when removing
a timed out thread from the wait queue)
After backporting this patch to 5.4 we don't see the hangups anymore. Looks like
this patch fixes timeout scenarios which might cause missed wake-ups.
The other patch in the patch series also fixes a race and is needed for
the second patch to apply.
Roman Penyaev (1):
epoll: call final ep_events_available() check under the lock
Soheil Hassas Yeganeh (1):
epoll: check for events when removing a timed out thread from the wait
queue
fs/eventpoll.c | 68 ++++++++++++++++++++++++++++++--------------------
1 file changed, 41 insertions(+), 27 deletions(-)
--
2.37.1
The quilt patch titled
Subject: mm, compaction: fix fast_isolate_around() to stay within boundaries
has been removed from the -mm tree. Its filename was
mm-compaction-fix-fast_isolate_around-to-stay-within-boundaries.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: NARIBAYASHI Akira <a.naribayashi(a)fujitsu.com>
Subject: mm, compaction: fix fast_isolate_around() to stay within boundaries
Date: Wed, 26 Oct 2022 20:24:38 +0900
Depending on the memory configuration, isolate_freepages_block() may scan
pages out of the target range and cause a panic.
Panic can occur on systems with multiple zones in a single pageblock.
The reason it is rare is that it only happens in special
configurations. Depending on how many similar systems there are, it
may be a good idea to fix this problem for older kernels as well.
The problem is that pfn as argument of fast_isolate_around() could be out
of the target range. Therefore we should consider the case where pfn <
start_pfn, and also the case where end_pfn < pfn.
This problem should have been addressed by the commit 6e2b7044c199 ("mm,
compaction: make fast_isolate_freepages() stay within zone") but there was
an oversight.
Case1: pfn < start_pfn
                <at memory compaction for node Y>
 |  node X's zone  |  node Y's zone
 +-----------------+------------------------------...
  pageblock     ^  ^     ^
 +-----------+-----------+-----------+-----------+...
                ^  ^     ^
                ^  ^     end_pfn
                ^  start_pfn = cc->zone->zone_start_pfn
                pfn
                <---------> scanned range by "Scan After"
Case2: end_pfn < pfn
                <at memory compaction for node X>
 |  node X's zone  |  node Y's zone
 +-----------------+------------------------------...
  pageblock  ^     ^  ^
 +-----------+-----------+-----------+-----------+...
             ^     ^  ^
             ^     ^  pfn
             ^     end_pfn
             start_pfn
            <---------> scanned range by "Scan Before"
It seems that there is no good reason to skip nr_isolated pages just after
the given pfn. So let's perform a simple scan from start to end instead of
dividing the scan into "Before" and "After".
Link: https://lkml.kernel.org/r/20221026112438.236336-1-a.naribayashi@fujitsu.com
Fixes: 6e2b7044c199 ("mm, compaction: make fast_isolate_freepages() stay within zone").
Signed-off-by: NARIBAYASHI Akira <a.naribayashi(a)fujitsu.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)
--- a/mm/compaction.c~mm-compaction-fix-fast_isolate_around-to-stay-within-boundaries
+++ a/mm/compaction.c
@@ -1344,7 +1344,7 @@ move_freelist_tail(struct list_head *fre
}
static void
-fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
+fast_isolate_around(struct compact_control *cc, unsigned long pfn)
{
unsigned long start_pfn, end_pfn;
struct page *page;
@@ -1365,21 +1365,13 @@ fast_isolate_around(struct compact_contr
if (!page)
return;
- /* Scan before */
- if (start_pfn != pfn) {
- isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false);
- if (cc->nr_freepages >= cc->nr_migratepages)
- return;
- }
-
- /* Scan after */
- start_pfn = pfn + nr_isolated;
- if (start_pfn < end_pfn)
- isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
+ isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
/* Skip this pageblock in the future as it's full or nearly full */
if (cc->nr_freepages < cc->nr_migratepages)
set_pageblock_skip(page);
+
+ return;
}
/* Search orders in round-robin fashion */
@@ -1556,7 +1548,7 @@ fast_isolate_freepages(struct compact_co
return cc->free_pfn;
low_pfn = page_to_pfn(page);
- fast_isolate_around(cc, low_pfn, nr_isolated);
+ fast_isolate_around(cc, low_pfn);
return low_pfn;
}
_
Patches currently in -mm which might be from a.naribayashi(a)fujitsu.com are
The quilt patch titled
Subject: mm/gup: disallow FOLL_FORCE|FOLL_WRITE on hugetlb mappings
has been removed from the -mm tree. Its filename was
mm-gup-disallow-foll_forcefoll_write-on-hugetlb-mappings.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: disallow FOLL_FORCE|FOLL_WRITE on hugetlb mappings
Date: Mon, 31 Oct 2022 16:25:24 +0100
hugetlb does not support fake write-faults (write faults without write
permissions). However, we are currently able to trigger a
FAULT_FLAG_WRITE fault on a VMA without VM_WRITE.
If we'd ever want to support FOLL_FORCE|FOLL_WRITE, we'd have to teach
hugetlb to:
(1) Leave the page mapped R/O after the fake write-fault, like
maybe_mkwrite() does.
(2) Allow writing to an exclusive anon page that's mapped R/O when
FOLL_FORCE is set, like can_follow_write_pte(). E.g.,
__follow_hugetlb_must_fault() needs adjustment.
For now, it's not clear if that added complexity is really required.
History tells us that FOLL_FORCE is dangerous and that we had better limit its
use to a bare minimum.
--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <linux/mman.h>
int main(int argc, char **argv)
{
	char *map;
	int mem_fd;

	map = mmap(NULL, 2 * 1024 * 1024u, PROT_READ,
		   MAP_PRIVATE|MAP_ANON|MAP_HUGETLB|MAP_HUGE_2MB, -1, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %d\n", errno);
		return 1;
	}

	mem_fd = open("/proc/self/mem", O_RDWR);
	if (mem_fd < 0) {
		fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
		return 1;
	}

	if (pwrite(mem_fd, "0", 1, (uintptr_t) map) == 1) {
		fprintf(stderr, "write() succeeded, which is unexpected\n");
		return 1;
	}

	printf("write() failed as expected: %d\n", errno);
	return 0;
}
--------------------------------------------------------------------------
Fortunately, we have a sanity check in hugetlb_wp() in place ever since
commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared
mappings"), that bails out instead of silently mapping a page writable in
a !PROT_WRITE VMA.
Consequently, the above reproducer triggers a warning, similar to the one
reported by syzbot:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 3612 at mm/hugetlb.c:5313 hugetlb_wp+0x20a/0x1af0 mm/hugetlb.c:5313
Modules linked in:
CPU: 1 PID: 3612 Comm: syz-executor250 Not tainted 6.1.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
RIP: 0010:hugetlb_wp+0x20a/0x1af0 mm/hugetlb.c:5313
Code: ea 03 80 3c 02 00 0f 85 31 14 00 00 49 8b 5f 20 31 ff 48 89 dd 83 e5 02 48 89 ee e8 70 ab b7 ff 48 85 ed 75 5b e8 76 ae b7 ff <0f> 0b 41 bd 40 00 00 00 e8 69 ae b7 ff 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc90003caf620 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000008640070 RCX: 0000000000000000
RDX: ffff88807b963a80 RSI: ffffffff81c4ed2a RDI: 0000000000000007
RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000008c07e R12: ffff888023805800
R13: 0000000000000000 R14: ffffffff91217f38 R15: ffff88801d4b0360
FS: 0000555555bba300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff7a47a1b8 CR3: 000000002378d000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
hugetlb_no_page mm/hugetlb.c:5755 [inline]
hugetlb_fault+0x19cc/0x2060 mm/hugetlb.c:5874
follow_hugetlb_page+0x3f3/0x1850 mm/hugetlb.c:6301
__get_user_pages+0x2cb/0xf10 mm/gup.c:1202
__get_user_pages_locked mm/gup.c:1434 [inline]
__get_user_pages_remote+0x18f/0x830 mm/gup.c:2187
get_user_pages_remote+0x84/0xc0 mm/gup.c:2260
__access_remote_vm+0x287/0x6b0 mm/memory.c:5517
ptrace_access_vm+0x181/0x1d0 kernel/ptrace.c:61
generic_ptrace_pokedata kernel/ptrace.c:1323 [inline]
ptrace_request+0xb46/0x10c0 kernel/ptrace.c:1046
arch_ptrace+0x36/0x510 arch/x86/kernel/ptrace.c:828
__do_sys_ptrace kernel/ptrace.c:1296 [inline]
__se_sys_ptrace kernel/ptrace.c:1269 [inline]
__x64_sys_ptrace+0x178/0x2a0 kernel/ptrace.c:1269
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
So let's silence that warning by teaching GUP code that FOLL_FORCE -- so
far -- does not apply to hugetlb.
Note that FOLL_FORCE for read access seems to be working as expected. The
assumption is that this has been broken forever; only since the above
commit do we actually detect the wrong handling and WARN_ON_ONCE().
I assume this has been broken at least since 2014, when mm/gup.c came to
life. I failed to come up with a suitable Fixes tag quickly.
Link: https://lkml.kernel.org/r/20221031152524.173644-1-david@redhat.com
Fixes: 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared mappings")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: <syzbot+f0b97304ef90f0d0b1dc(a)syzkaller.appspotmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/gup.c~mm-gup-disallow-foll_forcefoll_write-on-hugetlb-mappings
+++ a/mm/gup.c
@@ -1009,6 +1009,9 @@ static int check_vma_flags(struct vm_are
if (!(vm_flags & VM_WRITE)) {
if (!(gup_flags & FOLL_FORCE))
return -EFAULT;
+ /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
+ if (is_vm_hugetlb_page(vma))
+ return -EFAULT;
/*
* We used to let the write,force case do COW in a
* VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
_
Patches currently in -mm which might be from david(a)redhat.com are
selftests-vm-add-ksm-unmerge-tests.patch
mm-pagewalk-dont-trigger-test_walk-in-walk_page_vma.patch
selftests-vm-add-test-to-measure-madv_unmergeable-performance.patch
mm-ksm-simplify-break_ksm-to-not-rely-on-vm_fault_write.patch
mm-remove-vm_fault_write.patch
mm-ksm-fix-ksm-cow-breaking-with-userfaultfd-wp-via-fault_flag_unshare.patch
mm-pagewalk-add-walk_page_range_vma.patch
mm-ksm-convert-break_ksm-to-use-walk_page_range_vma.patch
mm-gup-remove-foll_migration.patch
Commit be36f9e7517e ("efi: READ_ONCE rng seed size before munmap")
added a READ_ONCE() and also changed the call to
add_bootloader_randomness() to use the local size variable. Neither
of these changes was actually needed and this was not backported to
the 4.19 stable branch.
Commit 161a438d730d ("efi: random: reduce seed size to 32 bytes")
reverted the addition of READ_ONCE() and added a limit to the value of
size. This depends on the earlier commit, because size can now differ
from seed->size, but it was wrongly backported to the 4.19 stable
branch by itself.
Apply the missing change to the add_bootloader_randomness() parameter
(except that here we are still using add_device_randomness()).
Fixes: 0513592520ae ("efi: random: reduce seed size to 32 bytes")
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
---
drivers/firmware/efi/efi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index f0ef2643b70e..2bbc2289fe09 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -566,7 +566,7 @@ int __init efi_config_parse_tables(void *config_tables, int count, int sz,
sizeof(*seed) + size);
if (seed != NULL) {
pr_notice("seeding entropy pool\n");
- add_device_randomness(seed->bits, seed->size);
+ add_device_randomness(seed->bits, size);
early_memunmap(seed, sizeof(*seed) + size);
} else {
pr_err("Could not map UEFI random seed!\n");
The quilt patch titled
Subject: Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
has been removed from the -mm tree. Its filename was
kconfigdebug-provide-a-little-extra-frame_warn-leeway-when-kasan-is-enabled.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lee Jones <lee(a)kernel.org>
Subject: Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
Date: Fri, 25 Nov 2022 12:07:50 +0000
When enabled, KASAN enlarges functions' stack frames, pushing quite a few
over the current threshold. This can mainly be seen on 32-bit
architectures, where the present limit (when !GCC) is a lowly 1024 bytes.
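As an illustration only (the function below is hypothetical and not part of
the patch): its ~896 bytes of locals fit well under a 1024-byte FRAME_WARN
when built normally, but KASAN places redzones around each on-stack object,
so the instrumented frame can trip the warning:
static int kasan_frame_demo(void)
{
	char a[256], b[256], c[384];	/* ~896 bytes of locals */
	int i, sum = 0;

	for (i = 0; i < (int)sizeof(a); i++)
		a[i] = i;
	for (i = 0; i < (int)sizeof(b); i++)
		b[i] = a[i] ^ 0x5a;
	for (i = 0; i < (int)sizeof(c); i++)
		c[i] = b[i % sizeof(b)] + 1;
	for (i = 0; i < (int)sizeof(c); i++)
		sum += c[i];
	return sum;
}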
Link: https://lkml.kernel.org/r/20221125120750.3537134-3-lee@kernel.org
Signed-off-by: Lee Jones <lee(a)kernel.org>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: "Christian K��nig" <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Leo Li <sunpeng.li(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: "Pan, Xinhui" <Xinhui.Pan(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Tom Rix <trix(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.debug | 1 +
1 file changed, 1 insertion(+)
--- a/lib/Kconfig.debug~kconfigdebug-provide-a-little-extra-frame_warn-leeway-when-kasan-is-enabled
+++ a/lib/Kconfig.debug
@@ -399,6 +399,7 @@ config FRAME_WARN
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 2048 if PARISC
default 1536 if (!64BIT && XTENSA)
+ default 1280 if KASAN && !64BIT
default 1024 if !64BIT
default 2048 if 64BIT
help
_
Patches currently in -mm which might be from lee(a)kernel.org are
The quilt patch titled
Subject: drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
has been removed from the -mm tree. Its filename was
drm-amdgpu-temporarily-disable-broken-clang-builds-due-to-blown-stack-frame.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lee Jones <lee(a)kernel.org>
Subject: drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
Date: Fri, 25 Nov 2022 12:07:49 +0000
Patch series "Fix a bunch of allmodconfig errors", v2.
Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST"),
WERROR now defaults to COMPILE_TEST, meaning that it's enabled for
allmodconfig builds. This leads to some interesting build failures when
using Clang, each resolved in this set.
With this set applied, I am able to obtain a successful allmodconfig Arm
build.
This patch (of 2):
calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 ||
ARM64) architectures built with Clang (all released versions), whereby the
stack frame gets blown up to well over 5k. This would cause an immediate
kernel panic on most architectures. We'll revert this when the following
bug report has been resolved:
https://github.com/llvm/llvm-project/issues/41896.
Link: https://lkml.kernel.org/r/20221125120750.3537134-1-lee@kernel.org
Link: https://lkml.kernel.org/r/20221125120750.3537134-2-lee@kernel.org
Signed-off-by: Lee Jones <lee(a)kernel.org>
Suggested-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: "Christian K��nig" <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Lee Jones <lee(a)kernel.org>
Cc: Leo Li <sunpeng.li(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: "Pan, Xinhui" <Xinhui.Pan(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Tom Rix <trix(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/gpu/drm/amd/display/Kconfig | 7 +++++++
1 file changed, 7 insertions(+)
--- a/drivers/gpu/drm/amd/display/Kconfig~drm-amdgpu-temporarily-disable-broken-clang-builds-due-to-blown-stack-frame
+++ a/drivers/gpu/drm/amd/display/Kconfig
@@ -5,6 +5,7 @@ menu "Display Engine Configuration"
config DRM_AMD_DC
bool "AMD DC - Enable new display engine"
default y
+ depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
select SND_HDA_COMPONENT if SND_HDA_CORE
select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128)
help
@@ -12,6 +13,12 @@ config DRM_AMD_DC
support for AMDGPU. This adds required support for Vega and
Raven ASICs.
+ calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64)
+ architectures built with Clang (all released versions), whereby the stack
+ frame gets blown up to well over 5k. This would cause an immediate kernel
+ panic on most architectures. We'll revert this when the following bug report
+ has been resolved: https://github.com/llvm/llvm-project/issues/41896.
+
config DRM_AMD_DC_DCN
def_bool n
help
_
Patches currently in -mm which might be from lee(a)kernel.org are
The quilt patch titled
Subject: mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
has been removed from the -mm tree. Its filename was
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
Date: Fri, 25 Nov 2022 22:37:14 +0100
Any codepath that zaps page table entries must invoke MMU notifiers to
ensure that secondary MMUs (like KVM) don't keep accessing pages which
aren't mapped anymore. Secondary MMUs don't hold their own references to
pages that are mirrored over, so failing to notify them can lead to page
use-after-free.
I'm marking this as addressing an issue introduced in commit f3f0e1d2150b
("khugepaged: add support of collapse for tmpfs/shmem pages"), but most of
the security impact of this only came in commit 27e1f8273113 ("khugepaged:
enable collapse pmd for pte-mapped THP"), which actually omitted flushes
for the removal of present PTEs, not just for the removal of empty page
tables.
Link: https://lkml.kernel.org/r/20221129154730.2274278-3-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-3-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-3-jannh@google.com
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths
+++ a/mm/khugepaged.c
@@ -1399,6 +1399,7 @@ static void collapse_and_free_pmd(struct
unsigned long addr, pmd_t *pmdp)
{
pmd_t pmd;
+ struct mmu_notifier_range range;
mmap_assert_write_locked(mm);
if (vma->vm_file)
@@ -1410,8 +1411,12 @@ static void collapse_and_free_pmd(struct
if (vma->anon_vma)
lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, addr,
+ addr + HPAGE_PMD_SIZE);
+ mmu_notifier_invalidate_range_start(&range);
pmd = pmdp_collapse_flush(vma, addr, pmdp);
tlb_remove_table_sync_one();
+ mmu_notifier_invalidate_range_end(&range);
mm_dec_nr_ptes(mm);
page_table_check_pte_clear_range(mm, addr, pmd);
pte_free(mm, pmd_pgtable(pmd));
_
Patches currently in -mm which might be from jannh(a)google.com are
The quilt patch titled
Subject: mm/khugepaged: take the right locks for page table retraction
has been removed from the -mm tree. Its filename was
mm-khugepaged-take-the-right-locks-for-page-table-retraction.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/khugepaged: take the right locks for page table retraction
Date: Fri, 25 Nov 2022 22:37:12 +0100
pagetable walks on address ranges mapped by VMAs can be done under the
mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
VMA's address_space. Only one of these needs to be held, and it does not
need to be held in exclusive mode.
Under those circumstances, the rules for concurrent access to page table
entries are:
- Terminal page table entries (entries that don't point to another page
table) can be arbitrarily changed under the page table lock, with the
exception that they always need to be consistent for
hardware page table walks and lockless_pages_from_mm().
This includes that they can be changed into non-terminal entries.
- Non-terminal page table entries (which point to another page table)
can not be modified; readers are allowed to READ_ONCE() an entry, verify
that it is non-terminal, and then assume that its value will stay as-is.
Retracting a page table involves modifying a non-terminal entry, so
page-table-level locks are insufficient to protect against concurrent page
table traversal; it requires taking all the higher-level locks under which
it is possible to start a page walk in the relevant range in exclusive
mode.
The collapse_huge_page() path for anonymous THP already follows this rule,
but the shmem/file THP path was getting it wrong, making it possible for
concurrent rmap-based operations to cause corruption.
Link: https://lkml.kernel.org/r/20221129154730.2274278-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-1-jannh@google.com
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 55 ++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 51 insertions(+), 4 deletions(-)
--- a/mm/khugepaged.c~mm-khugepaged-take-the-right-locks-for-page-table-retraction
+++ a/mm/khugepaged.c
@@ -1379,16 +1379,37 @@ static int set_huge_pmd(struct vm_area_s
return SCAN_SUCCEED;
}
+/*
+ * A note about locking:
+ * Trying to take the page table spinlocks would be useless here because those
+ * are only used to synchronize:
+ *
+ * - modifying terminal entries (ones that point to a data page, not to another
+ * page table)
+ * - installing *new* non-terminal entries
+ *
+ * Instead, we need roughly the same kind of protection as free_pgtables() or
+ * mm_take_all_locks() (but only for a single VMA):
+ * The mmap lock together with this VMA's rmap locks covers all paths towards
+ * the page table entries we're messing with here, except for hardware page
+ * table walks and lockless_pages_from_mm().
+ */
static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
- spinlock_t *ptl;
pmd_t pmd;
mmap_assert_write_locked(mm);
- ptl = pmd_lock(vma->vm_mm, pmdp);
+ if (vma->vm_file)
+ lockdep_assert_held_write(&vma->vm_file->f_mapping->i_mmap_rwsem);
+ /*
+ * All anon_vmas attached to the VMA have the same root and are
+ * therefore locked by the same lock.
+ */
+ if (vma->anon_vma)
+ lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
+
pmd = pmdp_collapse_flush(vma, addr, pmdp);
- spin_unlock(ptl);
mm_dec_nr_ptes(mm);
page_table_check_pte_clear_range(mm, addr, pmd);
pte_free(mm, pmd_pgtable(pmd));
@@ -1439,6 +1460,14 @@ int collapse_pte_mapped_thp(struct mm_st
if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
return SCAN_VMA_CHECK;
+ /*
+ * Symmetry with retract_page_tables(): Exclude MAP_PRIVATE mappings
+ * that got written to. Without this, we'd have to also lock the
+ * anon_vma if one exists.
+ */
+ if (vma->anon_vma)
+ return SCAN_VMA_CHECK;
+
/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
if (userfaultfd_wp(vma))
return SCAN_PTE_UFFD_WP;
@@ -1472,6 +1501,20 @@ int collapse_pte_mapped_thp(struct mm_st
goto drop_hpage;
}
+ /*
+ * We need to lock the mapping so that from here on, only GUP-fast and
+ * hardware page walks can access the parts of the page tables that
+ * we're operating on.
+ * See collapse_and_free_pmd().
+ */
+ i_mmap_lock_write(vma->vm_file->f_mapping);
+
+ /*
+ * This spinlock should be unnecessary: Nobody else should be accessing
+ * the page tables under spinlock protection here, only
+ * lockless_pages_from_mm() and the hardware page walker can access page
+ * tables while all the high-level locks are held in write mode.
+ */
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
result = SCAN_FAIL;
@@ -1526,6 +1569,8 @@ int collapse_pte_mapped_thp(struct mm_st
/* step 4: remove pte entries */
collapse_and_free_pmd(mm, vma, haddr, pmd);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+
maybe_install_pmd:
/* step 5: install pmd entry */
result = install_pmd
@@ -1539,6 +1584,7 @@ drop_hpage:
abort:
pte_unmap_unlock(start_pte, ptl);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
goto drop_hpage;
}
@@ -1595,7 +1641,8 @@ static int retract_page_tables(struct ad
* An alternative would be drop the check, but check that page
* table is clear before calling pmdp_collapse_flush() under
* ptl. It has higher chance to recover THP for the VMA, but
- * has higher cost too.
+ * has higher cost too. It would also probably require locking
+ * the anon_vma.
*/
if (vma->anon_vma) {
result = SCAN_PAGE_ANON;
_
Patches currently in -mm which might be from jannh(a)google.com are
The quilt patch titled
Subject: mm: migrate: fix THP's mapcount on isolation
has been removed from the -mm tree. Its filename was
mm-migrate-fix-thps-mapcount-on-isolation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gavin Shan <gshan(a)redhat.com>
Subject: mm: migrate: fix THP's mapcount on isolation
Date: Thu, 24 Nov 2022 17:55:23 +0800
The issue is reported when removing memory through the virtio_mem device. A
transparent huge page that experienced a copy-on-write fault is wrongly
regarded as pinned. The transparent huge page escapes being isolated in
isolate_migratepages_block(), so it can't be migrated and the corresponding
memory block can't be put into the offline state.
Fix it by replacing page_mapcount() with total_mapcount(). With this, the
transparent huge page can be isolated and migrated, and the memory block
can be put into the offline state. Besides, the page's refcount is increased
a bit earlier to avoid the page being released while the check is executed.
Link: https://lkml.kernel.org/r/20221124095523.31061-1-gshan@redhat.com
Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org> [5.7+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
--- a/mm/compaction.c~mm-migrate-fix-thps-mapcount-on-isolation
+++ a/mm/compaction.c
@@ -985,28 +985,28 @@ isolate_migratepages_block(struct compac
}
/*
+ * Be careful not to clear PageLRU until after we're
+ * sure the page is not being freed elsewhere -- the
+ * page release code relies on it.
+ */
+ if (unlikely(!get_page_unless_zero(page)))
+ goto isolate_fail;
+
+ /*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
* admittedly racy check.
*/
mapping = page_mapping(page);
- if (!mapping && page_count(page) > page_mapcount(page))
- goto isolate_fail;
+ if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+ goto isolate_fail_put;
/*
* Only allow to migrate anonymous pages in GFP_NOFS context
* because those do not depend on fs locks.
*/
if (!(cc->gfp_mask & __GFP_FS) && mapping)
- goto isolate_fail;
-
- /*
- * Be careful not to clear PageLRU until after we're
- * sure the page is not being freed elsewhere -- the
- * page release code relies on it.
- */
- if (unlikely(!get_page_unless_zero(page)))
- goto isolate_fail;
+ goto isolate_fail_put;
/* Only take pages on LRU: a check now makes later tests safe */
if (!PageLRU(page))
_
Patches currently in -mm which might be from gshan(a)redhat.com are
The quilt patch titled
Subject: mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
has been removed from the -mm tree. Its filename was
mm-damon-sysfs-fix-wrong-empty-schemes-assumption-under-online-tuning-in-damon_sysfs_set_schemes.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
Date: Tue, 22 Nov 2022 19:48:31 +0000
Commit da87878010e5 ("mm/damon/sysfs: support online inputs update") made
'damon_sysfs_set_schemes()' be called for a running DAMON context, which
could have schemes. In that case, the DAMON sysfs interface is supposed to
update, remove, or add schemes to reflect the sysfs files. However, the
code assumes the DAMON context wouldn't have any schemes at all, and
therefore creates and adds new schemes. As a result, the code doesn't
work as intended for online scheme tuning and could have a larger than
expected memory footprint. The schemes are all in the DAMON context, so
it doesn't leak the memory, though.
Remove the wrong assumption (that the DAMON context wouldn't have schemes) in
'damon_sysfs_set_schemes()' to fix the bug.
Link: https://lkml.kernel.org/r/20221122194831.3472-1-sj@kernel.org
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 46 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-fix-wrong-empty-schemes-assumption-under-online-tuning-in-damon_sysfs_set_schemes
+++ a/mm/damon/sysfs.c
@@ -2283,12 +2283,54 @@ static struct damos *damon_sysfs_mk_sche
&wmarks);
}
+static void damon_sysfs_update_scheme(struct damos *scheme,
+ struct damon_sysfs_scheme *sysfs_scheme)
+{
+ struct damon_sysfs_access_pattern *access_pattern =
+ sysfs_scheme->access_pattern;
+ struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas;
+ struct damon_sysfs_weights *sysfs_weights = sysfs_quotas->weights;
+ struct damon_sysfs_watermarks *sysfs_wmarks = sysfs_scheme->watermarks;
+
+ scheme->pattern.min_sz_region = access_pattern->sz->min;
+ scheme->pattern.max_sz_region = access_pattern->sz->max;
+ scheme->pattern.min_nr_accesses = access_pattern->nr_accesses->min;
+ scheme->pattern.max_nr_accesses = access_pattern->nr_accesses->max;
+ scheme->pattern.min_age_region = access_pattern->age->min;
+ scheme->pattern.max_age_region = access_pattern->age->max;
+
+ scheme->action = sysfs_scheme->action;
+
+ scheme->quota.ms = sysfs_quotas->ms;
+ scheme->quota.sz = sysfs_quotas->sz;
+ scheme->quota.reset_interval = sysfs_quotas->reset_interval_ms;
+ scheme->quota.weight_sz = sysfs_weights->sz;
+ scheme->quota.weight_nr_accesses = sysfs_weights->nr_accesses;
+ scheme->quota.weight_age = sysfs_weights->age;
+
+ scheme->wmarks.metric = sysfs_wmarks->metric;
+ scheme->wmarks.interval = sysfs_wmarks->interval_us;
+ scheme->wmarks.high = sysfs_wmarks->high;
+ scheme->wmarks.mid = sysfs_wmarks->mid;
+ scheme->wmarks.low = sysfs_wmarks->low;
+}
+
static int damon_sysfs_set_schemes(struct damon_ctx *ctx,
struct damon_sysfs_schemes *sysfs_schemes)
{
- int i;
+ struct damos *scheme, *next;
+ int i = 0;
+
+ damon_for_each_scheme_safe(scheme, next, ctx) {
+ if (i < sysfs_schemes->nr)
+ damon_sysfs_update_scheme(scheme,
+ sysfs_schemes->schemes_arr[i]);
+ else
+ damon_destroy_scheme(scheme);
+ i++;
+ }
- for (i = 0; i < sysfs_schemes->nr; i++) {
+ for (; i < sysfs_schemes->nr; i++) {
struct damos *scheme, *next;
scheme = damon_sysfs_mk_scheme(sysfs_schemes->schemes_arr[i]);
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-split-out-damos-charged-region-skip-logic-into-a-new-function.patch
mm-damon-core-split-damos-application-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-stat-update-logic-into-a-new-function.patch
mm-damon-core-split-out-scheme-quota-adjustment-logic-into-a-new-function.patch
mm-damon-sysfs-use-damon_addr_range-for-regions-start-and-end-values.patch
mm-damon-sysfs-remove-parameters-of-damon_sysfs_region_alloc.patch
mm-damon-sysfs-move-sysfs_lock-to-common-module.patch
mm-damon-sysfs-move-unsigned-long-range-directory-to-common-module.patch
mm-damon-sysfs-split-out-kdamond-independent-schemes-stats-update-logic-into-a-new-function.patch
mm-damon-sysfs-split-out-schemes-directory-implementation-to-separate-file.patch
mm-damon-modules-deduplicate-init-steps-for-damon-context-setup.patch
mm-damon-reclaimlru_sort-remove-unnecessarily-included-headers.patch
mm-damon-reclaim-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_reclaims-enabled-parameter.patch
mm-damon-lru_sort-enable-and-disable-synchronously.patch
selftests-damon-add-tests-for-damon_lru_sorts-enabled-parameter.patch
docs-admin-guide-mm-damon-usage-describe-the-rules-of-sysfs-region-directories.patch
docs-admin-guide-mm-damon-usage-fix-wrong-usage-example-of-init_regions-file.patch
mm-damon-core-add-a-callback-for-scheme-target-regions-check.patch
mm-damon-sysfs-schemes-implement-schemes-tried_regions-directory.patch
mm-damon-sysfs-schemes-implement-scheme-region-directory.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command-fix.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command-fix.patch
tools-selftets-damon-sysfs-test-tried_regions-directory-existence.patch
docs-admin-guide-mm-damon-usage-document-schemes-s-tried_regions-sysfs-directory.patch
docs-abi-damon-document-schemes-s-tried_regions-sysfs-directory.patch
selftests-damon-test-non-context-inputs-to-rm_contexts-file.patch
The quilt patch titled
Subject: tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
has been removed from the -mm tree. Its filename was
tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
Date: Sat, 19 Nov 2022 10:36:59 +0800
The latest version of grep claims that egrep is now obsolete, so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/vm`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Link: https://lkml.kernel.org/r/1668825419-30584-1-git-send-email-yangtiezhu@loon…
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/vm/slabinfo-gnuplot.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/vm/slabinfo-gnuplot.sh~tools-vm-slabinfo-gnuplot-use-grep-e-instead-of-egrep
+++ a/tools/vm/slabinfo-gnuplot.sh
@@ -150,7 +150,7 @@ do_preprocess()
let lines=3
out=`basename "$in"`"-slabs-by-loss"
`cat "$in" | grep -A "$lines" 'Slabs sorted by loss' |\
- egrep -iv '\-\-|Name|Slabs'\
+ grep -E -iv '\-\-|Name|Slabs'\
| awk '{print $1" "$4+$2*$3" "$4}' > "$out"`
if [ $? -eq 0 ]; then
do_slabs_plotting "$out"
@@ -159,7 +159,7 @@ do_preprocess()
let lines=3
out=`basename "$in"`"-slabs-by-size"
`cat "$in" | grep -A "$lines" 'Slabs sorted by size' |\
- egrep -iv '\-\-|Name|Slabs'\
+ grep -E -iv '\-\-|Name|Slabs'\
| awk '{print $1" "$4" "$4-$2*$3}' > "$out"`
if [ $? -eq 0 ]; then
do_slabs_plotting "$out"
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
The quilt patch titled
Subject: nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
has been removed from the -mm tree. Its filename was
nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: ZhangPeng <zhangpeng362(a)huawei.com>
Subject: nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
Date: Sat, 19 Nov 2022 21:05:42 +0900
Syzbot reported a null-ptr-deref bug:
NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
frequency < 30 seconds
general protection fault, probably for non-canonical address
0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 PID: 3603 Comm: segctord Not tainted
6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google
10/11/2022
RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0
fs/nilfs2/alloc.c:608
Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00
00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02
00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7
RSP: 0018:ffffc90003dff830 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d
RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010
RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f
R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158
R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0
Call Trace:
<TASK>
nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline]
nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193
nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236
nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940
nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline]
nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline]
nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088
nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568
nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018
nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline]
nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045
nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline]
nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
...
If DAT metadata file is corrupted on disk, there is a case where
req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during
a b-tree operation that cascadingly updates ancestor nodes of the b-tree,
because nilfs_dat_commit_alloc() for a lower level block can initialize
the blocknr on the same DAT entry between nilfs_dat_prepare_end() and
nilfs_dat_commit_end().
If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free()
without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and
causes the NULL pointer dereference above in
nilfs_palloc_commit_free_entry() function, which leads to a crash.
Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh
before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free().
This also calls nilfs_error() in that case to notify that there is a fatal
flaw in the filesystem metadata and prevent further operations.
Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com
Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com
Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com
Signed-off-by: ZhangPeng <zhangpeng362(a)huawei.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+ebe05ee8e98f755f61d0(a)syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/dat.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/fs/nilfs2/dat.c~nilfs2-fix-null-pointer-dereference-in-nilfs_palloc_commit_free_entry
+++ a/fs/nilfs2/dat.c
@@ -111,6 +111,13 @@ static void nilfs_dat_commit_free(struct
kunmap_atomic(kaddr);
nilfs_dat_commit_entry(dat, req);
+
+ if (unlikely(req->pr_desc_bh == NULL || req->pr_bitmap_bh == NULL)) {
+ nilfs_error(dat->i_sb,
+ "state inconsistency probably due to duplicate use of vblocknr = %llu",
+ (unsigned long long)req->pr_entry_nr);
+ return;
+ }
nilfs_palloc_commit_free_entry(dat, req);
}
_
Patches currently in -mm which might be from zhangpeng362(a)huawei.com are
hfs-fix-oob-read-in-__hfs_brec_find.patch
The quilt patch titled
Subject: hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
has been removed from the -mm tree. Its filename was
hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
Date: Mon, 14 Nov 2022 15:55:06 -0800
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
tables associated with the address range. For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final. However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way
out. In the case of madvise(MADV_DONTNEED) the vma remains, but the
missing vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.
This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap. Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing. Subsequent faults on this vma were confused as
VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping was
not set in new pages added to the page table. This resulted in pages that
appeared anonymous in a VM_SHARED vma and triggered the BUG.
Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap
call from unmap_vmas(). This is used to indicate the 'final' unmapping of
a hugetlb vma. When called via MADV_DONTNEED, this flag is not set and
the vm_lock is not deleted.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 2 ++
mm/hugetlb.c | 27 ++++++++++++++++-----------
mm/memory.c | 2 +-
3 files changed, 19 insertions(+), 12 deletions(-)
--- a/include/linux/mm.h~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/include/linux/mm.h
@@ -1868,6 +1868,8 @@ struct zap_details {
* default, the flag is not set.
*/
#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+/* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */
+#define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1))
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
--- a/mm/hugetlb.c~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/mm/hugetlb.c
@@ -5206,17 +5206,22 @@ void __unmap_hugepage_range_final(struct
__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
- /*
- * Unlock and free the vma lock before releasing i_mmap_rwsem. When
- * the vma_lock is freed, this makes the vma ineligible for pmd
- * sharing. And, i_mmap_rwsem is required to set up pmd sharing.
- * This is important as page tables for this unmapped range will
- * be asynchrously deleted. If the page tables are shared, there
- * will be issues when accessed by someone else.
- */
- __hugetlb_vma_unlock_write_free(vma);
-
- i_mmap_unlock_write(vma->vm_file->f_mapping);
+ if (zap_flags & ZAP_FLAG_UNMAP) { /* final unmap */
+ /*
+ * Unlock and free the vma lock before releasing i_mmap_rwsem.
+ * When the vma_lock is freed, this makes the vma ineligible
+ * for pmd sharing. And, i_mmap_rwsem is required to set up
+ * pmd sharing. This is important as page tables for this
+ * unmapped range will be asynchrously deleted. If the page
+ * tables are shared, there will be issues when accessed by
+ * someone else.
+ */
+ __hugetlb_vma_unlock_write_free(vma);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ } else {
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ hugetlb_vma_unlock_write(vma);
+ }
}
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
--- a/mm/memory.c~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/mm/memory.c
@@ -1711,7 +1711,7 @@ void unmap_vmas(struct mmu_gather *tlb,
{
struct mmu_notifier_range range;
struct zap_details details = {
- .zap_flags = ZAP_FLAG_DROP_MARKER,
+ .zap_flags = ZAP_FLAG_DROP_MARKER | ZAP_FLAG_UNMAP,
/* Careful - we need to zap private pages too! */
.even_cows = true,
};
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
selftests-vm-update-hugetlb-madvise.patch
hugetlb-remove-duplicate-mmu-notifications.patch
The quilt patch titled
Subject: madvise: use zap_page_range_single for madvise dontneed
has been removed from the -mm tree. Its filename was
madvise-use-zap_page_range_single-for-madvise-dontneed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: madvise: use zap_page_range_single for madvise dontneed
Date: Mon, 14 Nov 2022 15:55:05 -0800
This series addresses the issue first reported in [1], and fully described
in patch 2. Patches 1 and 2 address the user visible issue and are tagged
for stable backports.
While exploring solutions to this issue, related problems with mmu
notification calls were discovered. These are addressed in the patch
"hugetlb: remove duplicate mmu notifications". Since there are no user
visible effects, this third patch is not tagged for stable backports.
Previous discussions suggested further cleanup by removing the
routine zap_page_range. This is possible because zap_page_range_single
is now exported, and all callers of zap_page_range pass ranges entirely
within a single vma. This work will be done in a later patch so as not
to distract from this bug fix.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
This patch (of 2):
Expose the routine zap_page_range_single to zap a range within a single
vma. The madvise routine madvise_dontneed_single_vma can use this routine
as it explicitly operates on a single vma. Also, update the mmu
notification range in zap_page_range_single to take hugetlb pmd sharing
into account. This is required as MADV_DONTNEED supports hugetlb vmas.
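As a conceptual sketch of why the notification range can be wider than the
zapped range (this is not the kernel helper itself; adjust_range_if_pmd_sharing_possible()
also checks whether the vma is actually eligible for sharing): when hugetlb PMD
sharing is possible, other page tables may reference the surrounding PUD-aligned
window, so the mmu notifier range is rounded out to that window while the unmap
itself still covers only the requested range.

/* Conceptual only: widen a notifier range to PUD-aligned boundaries. */
static void widen_range_for_pmd_sharing(unsigned long *start, unsigned long *end,
					unsigned long pud_size)
{
	*start &= ~(pud_size - 1);			/* round start down */
	*end = (*end + pud_size - 1) & ~(pud_size - 1);	/* round end up */
}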
Link: https://lkml.kernel.org/r/20221114235507.294320-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20221114235507.294320-2-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 27 +++++++++++++++++++--------
mm/madvise.c | 6 +++---
mm/memory.c | 23 +++++++++++------------
3 files changed, 33 insertions(+), 23 deletions(-)
--- a/include/linux/mm.h~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/include/linux/mm.h
@@ -1852,6 +1852,23 @@ static void __maybe_unused show_free_are
__show_free_areas(flags, nodemask, MAX_NR_ZONES - 1);
}
+/*
+ * Parameter block passed down to zap_pte_range in exceptional cases.
+ */
+struct zap_details {
+ struct folio *single_folio; /* Locked folio to be unmapped */
+ bool even_cows; /* Zap COWed private pages too? */
+ zap_flags_t zap_flags; /* Extra flags for zapping */
+};
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory. This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma. By
+ * default, the flag is not set.
+ */
+#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
#else
@@ -1869,6 +1886,8 @@ void zap_vma_ptes(struct vm_area_struct
unsigned long size);
void zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+ unsigned long size, struct zap_details *details);
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
unsigned long end);
@@ -3467,12 +3486,4 @@ madvise_set_anon_name(struct mm_struct *
}
#endif
-/*
- * Whether to drop the pte markers, for example, the uffd-wp information for
- * file-backed memory. This should only be specified when we will completely
- * drop the page in the mm, either by truncation or unmapping of the vma. By
- * default, the flag is not set.
- */
-#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
-
#endif /* _LINUX_MM_H */
--- a/mm/madvise.c~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/mm/madvise.c
@@ -772,8 +772,8 @@ static int madvise_free_single_vma(struc
* Application no longer needs these pages. If the pages are dirty,
* it's OK to just throw them away. The app will be more careful about
* data it wants to keep. Be sure to free swap resources too. The
- * zap_page_range call sets things up for shrink_active_list to actually free
- * these pages later if no one else has touched them in the meantime,
+ * zap_page_range_single call sets things up for shrink_active_list to actually
+ * free these pages later if no one else has touched them in the meantime,
* although we could add these pages to a global reuse list for
* shrink_active_list to pick up before reclaiming other pages.
*
@@ -790,7 +790,7 @@ static int madvise_free_single_vma(struc
static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- zap_page_range(vma, start, end - start);
+ zap_page_range_single(vma, start, end - start, NULL);
return 0;
}
--- a/mm/memory.c~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/mm/memory.c
@@ -1341,15 +1341,6 @@ copy_page_range(struct vm_area_struct *d
return ret;
}
-/*
- * Parameter block passed down to zap_pte_range in exceptional cases.
- */
-struct zap_details {
- struct folio *single_folio; /* Locked folio to be unmapped */
- bool even_cows; /* Zap COWed private pages too? */
- zap_flags_t zap_flags; /* Extra flags for zapping */
-};
-
/* Whether we should zap all COWed (private) pages too */
static inline bool should_zap_cows(struct zap_details *details)
{
@@ -1774,19 +1765,27 @@ void zap_page_range(struct vm_area_struc
*
* The range must fit into one VMA.
*/
-static void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
{
+ const unsigned long end = address + size;
struct mmu_notifier_range range;
struct mmu_gather tlb;
lru_add_drain();
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address, address + size);
+ address, end);
+ if (is_vm_hugetlb_page(vma))
+ adjust_range_if_pmd_sharing_possible(vma, &range.start,
+ &range.end);
tlb_gather_mmu(&tlb, vma->vm_mm);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
- unmap_single_vma(&tlb, vma, address, range.end, details);
+ /*
+ * unmap 'address-end' not 'range.start-range.end' as range
+ * could have been expanded for hugetlb pmd sharing.
+ */
+ unmap_single_vma(&tlb, vma, address, end, details);
mmu_notifier_invalidate_range_end(&range);
tlb_finish_mmu(&tlb);
}
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
selftests-vm-update-hugetlb-madvise.patch
hugetlb-remove-duplicate-mmu-notifications.patch
VM_SOFTDIRTY should already be set in the vma flags when they are tested to
decide whether a new allocation can be merged into the previous vma. With this
patch, new allocations are merged into the previous VMAs.
I've tested this by reverting commit 34228d473efe ("mm: ignore VM_SOFTDIRTY
on VMA merging") and then applying the following patch: all new allocations
done through mmap() are merged into the previous VMAs, and the number of VMAs
no longer increases drastically (that growth had contributed to the crash of
gimp). If I run the same test with the revert but without this patch, the
number of VMAs keeps increasing with every mmap() syscall, which validates
this patch.
Commit 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging") seems like a
workaround: it lets soft-dirty and non-soft-dirty VMAs get merged, which helps
avoid the creation of too many VMAs, but it becomes a problem when adding the
feature of clearing the soft-dirty status of only a part of a memory region.
Cc: <stable(a)vger.kernel.org>
Fixes: d9104d1ca966 ("mm: track vma changes with VM_SOFTDIRTY bit")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
We need more testing of this patch.
While implementing clearing of the soft-dirty bit for a range of the address
space, I'm facing an issue: a non-soft-dirty VMA sometimes gets merged with a
soft-dirty VMA, and the non-soft-dirty VMA then appears dirty, which is
undesirable. When I discussed this with some other developers, they considered
it a regression: why should a page appear soft-dirty when it isn't soft-dirty
in reality? I agree with them. Should we revert 34228d473efe or find a
workaround in the IOCTL?
* A revert may cause the number of VMAs to grow uncontrollably in situations
where the soft-dirty bit of many memory regions, or of the whole address space,
is cleared again and again. AFAIK a normal process only clears a few memory
regions, so applications should be okay. There is still a chance of regressions
if some applications are already using the soft-dirty bit; I'm not sure how to
test for that.
* Add a flag to the IOCTL to ignore the dirtiness of the VMA. The user would
lose the ability to detect reused memory regions, but the extraneous soft-dirty
pages would not appear. I'm trying to do this in the patch series [1]. There is
ongoing discussion that this fails with some mprotect use case [2]; I still
need to look at the mprotect selftest to see how and why it fails. I think this
can be implemented after some more work, probably on the mprotect side.
[1] https://lore.kernel.org/all/20221109102303.851281-1-usama.anjum@collabora.c…
[2] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com/
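For reference, the user-space soft-dirty tracking cycle this discussion
revolves around looks roughly like the sketch below (a minimal example, not
from this series; it assumes the documented procfs interfaces: writing "4" to
/proc/<pid>/clear_refs clears the soft-dirty bits and bit 55 of a pagemap
entry reports them):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int page_soft_dirty(int pagemap_fd, unsigned long vaddr)
{
	uint64_t entry;
	off_t off = (vaddr / sysconf(_SC_PAGESIZE)) * sizeof(entry);

	if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
		return -1;
	return (entry >> 55) & 1;	/* bit 55: page is soft-dirty */
}

int main(void)
{
	static char buf[4096];
	int clear_fd = open("/proc/self/clear_refs", O_WRONLY);
	int pagemap_fd = open("/proc/self/pagemap", O_RDONLY);

	write(clear_fd, "4", 1);	/* start a new tracking interval */
	buf[0] = 1;			/* dirty the page backing buf */
	printf("soft-dirty: %d\n", page_soft_dirty(pagemap_fd, (unsigned long)buf));

	close(clear_fd);
	close(pagemap_fd);
	return 0;
}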
Changes in v2:
- Rebase on top of next-20221122
---
mm/mmap.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index f4e2989be5ff..031d23bc43c4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2552,6 +2552,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags |= VM_ACCOUNT;
}
+ /*
+ * New (or expanded) vma always get soft dirty status.
+ * Otherwise user-space soft-dirty page tracker won't
+ * be able to distinguish situation when vma area unmapped,
+ * then new mapped in-place (which must be aimed as
+ * a completely new data area).
+ */
+ vm_flags |= VM_SOFTDIRTY;
+
next = mas_next(&mas, ULONG_MAX);
prev = mas_prev(&mas, 0);
if (vm_flags & VM_SPECIAL)
@@ -2724,15 +2733,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (file)
uprobe_mmap(vma);
- /*
- * New (or expanded) vma always get soft dirty status.
- * Otherwise user-space soft-dirty page tracker won't
- * be able to distinguish situation when vma area unmapped,
- * then new mapped in-place (which must be aimed as
- * a completely new data area).
- */
- vma->vm_flags |= VM_SOFTDIRTY;
-
vma_set_page_prot(vma);
validate_mm(mm);
@@ -2974,7 +2974,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
vma->vm_start = addr;
vma->vm_end = addr + len;
vma->vm_pgoff = addr >> PAGE_SHIFT;
- vma->vm_flags = flags;
+ vma->vm_flags = flags | VM_SOFTDIRTY;
vma->vm_page_prot = vm_get_page_prot(flags);
mas_set_range(mas, vma->vm_start, addr + len - 1);
if (mas_store_gfp(mas, vma, GFP_KERNEL))
@@ -2987,7 +2987,6 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
mm->data_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
- vma->vm_flags |= VM_SOFTDIRTY;
validate_mm(mm);
return 0;
@@ -3021,6 +3020,8 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
if ((flags & (~VM_EXEC)) != 0)
return -EINVAL;
+ flags |= VM_SOFTDIRTY;
+
ret = check_brk_limits(addr, len);
if (ret)
goto limits_failed;
--
2.30.2
Hi,
I have started to look at igt for testing and want to use CRC tests. To
implement support for this I need to move away from the simple kms
helper.
When looking around for examples I came across Thomas' nice shadow
helper and thought, yes this is perfect for drm/gud. So I'll switch to
that before I move away from the simple kms helper.
The async framebuffer flushing code path now uses a shadow buffer and
doesn't touch the framebuffer when it shouldn't. I have also taken the
opportunity to inline the synchronous flush code path and make this the
default flushing strategy.
Noralf.
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Signed-off-by: Noralf Trønnes <noralf(a)tronnes.org>
---
Changes in v2:
- Drop patch (Thomas):
drm/gem: shadow_fb_access: Prepare imported buffers for CPU access
- Use src as variable name for iosys_map (Thomas)
- Prepare imported buffer for CPU access in the driver (Thomas)
- New patch: make sync flushing the default (Thomas)
- Link to v1: https://lore.kernel.org/r/20221122-gud-shadow-plane-v1-0-9de3afa3383e@tronn…
---
Noralf Trønnes (6):
drm/gud: Fix UBSAN warning
drm/gud: Don't retry a failed framebuffer flush
drm/gud: Split up gud_flush_work()
drm/gud: Prepare buffer for CPU access in gud_flush_work()
drm/gud: Use the shadow plane helper
drm/gud: Enable synchronous flushing by default
drivers/gpu/drm/gud/gud_drv.c | 1 +
drivers/gpu/drm/gud/gud_internal.h | 1 +
drivers/gpu/drm/gud/gud_pipe.c | 222 ++++++++++++++++++-------------------
3 files changed, 112 insertions(+), 112 deletions(-)
---
base-commit: 7257702951305b1f0259c3468c39fc59d1ad4d8b
change-id: 20221122-gud-shadow-plane-ae37a95d4d8d
Best regards,
--
Noralf Trønnes <noralf(a)tronnes.org>
From: Rob Clark <robdclark(a)chromium.org>
vm_open() is not allowed to fail. Fortunately we are guaranteed that
the pages are already pinned, thanks to the initial mmap which is now
being cloned into a forked process, and only need to increment the
refcnt. So just increment it directly. Previously if a signal was
delivered at the wrong time to the forking process, the
mutex_lock_interruptible() could fail resulting in the pages_use_count
not being incremented.
Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 3b7b71391a4c..b602cd72a120 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -571,12 +571,20 @@ static void drm_gem_shmem_vm_open(struct vm_area_struct *vma)
{
struct drm_gem_object *obj = vma->vm_private_data;
struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
- int ret;
WARN_ON(shmem->base.import_attach);
- ret = drm_gem_shmem_get_pages(shmem);
- WARN_ON_ONCE(ret != 0);
+ mutex_lock(&shmem->pages_lock);
+
+ /*
+ * We should have already pinned the pages when the buffer was first
+ * mmap'd, vm_open() just grabs an additional reference for the new
+ * mm the vma is getting copied into (ie. on fork()).
+ */
+ if (!WARN_ON_ONCE(!shmem->pages_use_count))
+ shmem->pages_use_count++;
+
+ mutex_unlock(&shmem->pages_lock);
drm_gem_vm_open(vma);
}
--
2.38.1
>
> On Thu, Nov 24, 2022 at 01:08:57AM +0000, Dominic Jones wrote:
> > > On Fri, Oct 28, 2022 at 02:51:43PM +0000, Dominic Jones wrote:
> > > > Updating the machine's kernel from v5.19.x to v6.0.x causes the machine to not
> > > > successfully boot. The machine boots successfully (and exhibits stable operation)
> > > > with version v5.19.17 and multiple earlier releases in the 5.19 line. Multiple releases
> > > > from the 6.0 line (including 6.0.0, 6.0.3, and 6.0.5), with no other changes to the
> > > > software environment, do not boot. Instead, the machine hangs after loading services
> > > > but before presenting a display manager; the machine instead shows repetitive hard
> > > > drive activity at this point and then no apparent activity.
> > > >
> > > > ''uname'' output for the machine successfully running v5.19.17 is:
> > > >
> > > > Linux [MACHINE_NAME] 5.19.17 #1 SMP PREEMPT_DYNAMIC Mon Oct 24 13:32:29 2022 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz GenuineIntel GNU/Linux
> > > >
> > > > The machine is an OCZ Neutrino netbook, running a custom OS build largely similar to
> > > > LFS development. The kernel update uses ''make olddefconfig''.
> > >
> > > Can you use 'git bisect' to find the offending change that causes this
> > > to happen?
> >
> > Bisection is complete. Here's what it returned.
> >
> > ---
> >
> > 3a194f3f8ad01bce00bd7174aaba1563bcc827eb is the first bad commit
> > commit 3a194f3f8ad01bce00bd7174aaba1563bcc827eb
> > Author: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
> > Date: Thu Jul 14 13:24:14 2022 +0900
> >
> > mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
> >
> > follow_pud_mask() does not support non-present pud entry now. As long as
> > I tested on x86_64 server, follow_pud_mask() still simply returns
> > no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe
> > user-visible effect should happen. But generally we should call
> > follow_huge_pud() for non-present pud entry for 1GB hugetlb page.
> >
> > Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
> > The changes are similar to previous works for pud entries commit
> > e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") and
> > commit cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present
> > hugepage").
> >
> > Link: https://lkml.kernel.org/r/20220714042420.1847125-3-naoya.horiguchi@linux.dev
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
> > Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
> > Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
> > Cc: David Hildenbrand <david(a)redhat.com>
> > Cc: kernel test robot <lkp(a)intel.com>
> > Cc: Liu Shixin <liushixin2(a)huawei.com>
> > Cc: Muchun Song <songmuchun(a)bytedance.com>
> > Cc: Oscar Salvador <osalvador(a)suse.de>
> > Cc: Yang Shi <shy828301(a)gmail.com>
> > Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> >
> > arch/x86/mm/hugetlbpage.c | 8 +++++++-
> > mm/hugetlb.c | 32 ++++++++++++++++++++++++++++++--
> > 2 files changed, 37 insertions(+), 3 deletions(-)
> >
I got two replies here, so I'm responding to both for visibility.
From Greg K H:
> Great! Please work with those developers to figure out why this is
> causing a problem for your system.
From Thorsten L:
> Many thx for this. A fix for that particular commit was recently
> committed to 6.0.y:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
>
> That thus bears the question: does your problem still happen with the
> latest 6.0.y version?
Version 6.0.9 appears to fix the issue, with no regression as of 6.0.10.
(The issue appeared in 6.0.7. I didn't test 6.0.8 since 6.0.9 had already
appeared by the time bisection was complete.)
Thanks!
Dominic Jones
jonesd(a)xmission.com
Hello!
The first patch fixes an issue reported by Sami, where linux panic()s
when bringing secondary CPUs online. The problem was the Spectre
workarounds trying to allocate a new slot for mitigating KVM when
those pages are no longer writeable.
While debugging that issue, I spotted the Spectre-BHB KVM mitigation was
over-riding the Spectre-v2 KVM Mitigation. It's supposed to happen the
other way round.
The backports aren't the same as mainline because the spectre mitigation code
was totally rewritten for v5.10, and prior to that the KVM infrastructure
is very different.
Thanks,
James Morse (2):
arm64: Fix panic() when Spectre-v2 causes Spectre-BHB to re-allocate
KVM vectors
arm64: errata: Fix KVM Spectre-v2 mitigation selection for
Cortex-A57/A72
arch/arm64/kernel/cpu_errata.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
--
2.30.2
Hello!
The first patch fixes an issue reported by Sami, where linux panic()s
when bringing secondary CPUs online. The problem was the Spectre
workarounds trying to allocate a new slot for mitigating KVM when
those pages are no longer writeable.
While debugging that issue, I spotted the Spectre-BHB KVM mitigation was
over-riding the Spectre-v2 KVM Mitigation. It's supposed to happen the
other way round.
The backports aren't the same as mainline because the spectre mitigation code
was totally rewritten for v5.10, and prior to that the KVM infrastructure
is very different.
Thanks,
James Morse (2):
arm64: Fix panic() when Spectre-v2 causes Spectre-BHB to re-allocate
KVM vectors
arm64: errata: Fix KVM Spectre-v2 mitigation selection for
Cortex-A57/A72
arch/arm64/kernel/cpu_errata.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
04aa64375f48 ("drm/i915: fix TLB invalidation for Gen12 video and compute engines")
33da97894758 ("drm/i915/gt: Serialize TLB invalidates with GT resets")
7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04aa64375f48a5d430b5550d9271f8428883e550 Mon Sep 17 00:00:00 2001
From: Andrzej Hajda <andrzej.hajda(a)intel.com>
Date: Mon, 14 Nov 2022 11:38:24 +0100
Subject: [PATCH] drm/i915: fix TLB invalidation for Gen12 video and compute
engines
In case of Gen12 video and compute engines, TLB_INV registers are masked -
to modify one bit, corresponding bit in upper half of the register must
be enabled, otherwise nothing happens.
CVE: CVE-2022-4139
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d0b03a928b9a..5c931b6696c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1017,6 +1017,11 @@ static void mmio_invalidate_full(struct intel_gt *gt)
if (!i915_mmio_reg_offset(rb.reg))
continue;
+ if (GRAPHICS_VER(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS ||
+ engine->class == VIDEO_ENHANCEMENT_CLASS ||
+ engine->class == COMPUTE_CLASS))
+ rb.bit = _MASKED_BIT_ENABLE(rb.bit);
+
intel_uncore_write_fw(uncore, rb.reg, rb.bit);
awake |= engine->mask;
}
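As background on the masked-register convention the fix above relies on, here
is a minimal sketch of the idea (the macro layout is an assumption based on the
i915 convention; see _MASKED_BIT_ENABLE in the target tree for the real
definition):

/*
 * A "masked" register keeps a write-enable mask in its upper 16 bits: a write
 * only changes the bits whose corresponding mask bits are also set. Writing
 * just the value half is silently ignored, which is why the Gen12 TLB_INV
 * write above needs the mask half enabled as well.
 */
#define MASKED_BIT_ENABLE(bit)	(((bit) << 16) | (bit))	/* set 'bit'   */
#define MASKED_BIT_DISABLE(bit)	((bit) << 16)		/* clear 'bit' */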
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
04aa64375f48 ("drm/i915: fix TLB invalidation for Gen12 video and compute engines")
33da97894758 ("drm/i915/gt: Serialize TLB invalidates with GT resets")
7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04aa64375f48a5d430b5550d9271f8428883e550 Mon Sep 17 00:00:00 2001
From: Andrzej Hajda <andrzej.hajda(a)intel.com>
Date: Mon, 14 Nov 2022 11:38:24 +0100
Subject: [PATCH] drm/i915: fix TLB invalidation for Gen12 video and compute
engines
In case of Gen12 video and compute engines, TLB_INV registers are masked -
to modify one bit, corresponding bit in upper half of the register must
be enabled, otherwise nothing happens.
CVE: CVE-2022-4139
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d0b03a928b9a..5c931b6696c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1017,6 +1017,11 @@ static void mmio_invalidate_full(struct intel_gt *gt)
if (!i915_mmio_reg_offset(rb.reg))
continue;
+ if (GRAPHICS_VER(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS ||
+ engine->class == VIDEO_ENHANCEMENT_CLASS ||
+ engine->class == COMPUTE_CLASS))
+ rb.bit = _MASKED_BIT_ENABLE(rb.bit);
+
intel_uncore_write_fw(uncore, rb.reg, rb.bit);
awake |= engine->mask;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
04aa64375f48 ("drm/i915: fix TLB invalidation for Gen12 video and compute engines")
33da97894758 ("drm/i915/gt: Serialize TLB invalidates with GT resets")
7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04aa64375f48a5d430b5550d9271f8428883e550 Mon Sep 17 00:00:00 2001
From: Andrzej Hajda <andrzej.hajda(a)intel.com>
Date: Mon, 14 Nov 2022 11:38:24 +0100
Subject: [PATCH] drm/i915: fix TLB invalidation for Gen12 video and compute
engines
In case of Gen12 video and compute engines, TLB_INV registers are masked -
to modify one bit, corresponding bit in upper half of the register must
be enabled, otherwise nothing happens.
CVE: CVE-2022-4139
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d0b03a928b9a..5c931b6696c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1017,6 +1017,11 @@ static void mmio_invalidate_full(struct intel_gt *gt)
if (!i915_mmio_reg_offset(rb.reg))
continue;
+ if (GRAPHICS_VER(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS ||
+ engine->class == VIDEO_ENHANCEMENT_CLASS ||
+ engine->class == COMPUTE_CLASS))
+ rb.bit = _MASKED_BIT_ENABLE(rb.bit);
+
intel_uncore_write_fw(uncore, rb.reg, rb.bit);
awake |= engine->mask;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
04aa64375f48 ("drm/i915: fix TLB invalidation for Gen12 video and compute engines")
33da97894758 ("drm/i915/gt: Serialize TLB invalidates with GT resets")
7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04aa64375f48a5d430b5550d9271f8428883e550 Mon Sep 17 00:00:00 2001
From: Andrzej Hajda <andrzej.hajda(a)intel.com>
Date: Mon, 14 Nov 2022 11:38:24 +0100
Subject: [PATCH] drm/i915: fix TLB invalidation for Gen12 video and compute
engines
In case of Gen12 video and compute engines, TLB_INV registers are masked -
to modify one bit, corresponding bit in upper half of the register must
be enabled, otherwise nothing happens.
CVE: CVE-2022-4139
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d0b03a928b9a..5c931b6696c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1017,6 +1017,11 @@ static void mmio_invalidate_full(struct intel_gt *gt)
if (!i915_mmio_reg_offset(rb.reg))
continue;
+ if (GRAPHICS_VER(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS ||
+ engine->class == VIDEO_ENHANCEMENT_CLASS ||
+ engine->class == COMPUTE_CLASS))
+ rb.bit = _MASKED_BIT_ENABLE(rb.bit);
+
intel_uncore_write_fw(uncore, rb.reg, rb.bit);
awake |= engine->mask;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
04aa64375f48 ("drm/i915: fix TLB invalidation for Gen12 video and compute engines")
33da97894758 ("drm/i915/gt: Serialize TLB invalidates with GT resets")
7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04aa64375f48a5d430b5550d9271f8428883e550 Mon Sep 17 00:00:00 2001
From: Andrzej Hajda <andrzej.hajda(a)intel.com>
Date: Mon, 14 Nov 2022 11:38:24 +0100
Subject: [PATCH] drm/i915: fix TLB invalidation for Gen12 video and compute
engines
In case of Gen12 video and compute engines, TLB_INV registers are masked -
to modify one bit, corresponding bit in upper half of the register must
be enabled, otherwise nothing happens.
CVE: CVE-2022-4139
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d0b03a928b9a..5c931b6696c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1017,6 +1017,11 @@ static void mmio_invalidate_full(struct intel_gt *gt)
if (!i915_mmio_reg_offset(rb.reg))
continue;
+ if (GRAPHICS_VER(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS ||
+ engine->class == VIDEO_ENHANCEMENT_CLASS ||
+ engine->class == COMPUTE_CLASS))
+ rb.bit = _MASKED_BIT_ENABLE(rb.bit);
+
intel_uncore_write_fw(uncore, rb.reg, rb.bit);
awake |= engine->mask;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
85ef1679a190 ("drm/amdgpu/dm/mst: Fix uninitialized var in pre_compute_mst_dsc_configs_for_state()")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
6366fc70deb9 ("drm/display/dp_mst: Maintain time slot allocations when deleting payloads")
a5c2c0d164e9 ("drm/display/dp_mst: Add nonblocking helpers for DP MST")
0b4e477e08a1 ("drm/display/dp_mst: Add helper for finding payloads in atomic MST state")
0bee2ae29eb4 ("drm/display/dp_mst: Add some missing kdocs for atomic MST structs")
df78f7f660cd ("drm/display/dp_mst: Call them time slots, not VCPI slots")
48b6b3726fb7 ("drm/display/dp_mst: Rename drm_dp_mst_vcpi_allocation")
dbaadb3cebaa ("drm/amdgpu/dm/mst: Rename get_payload_table()")
8c5e9bbb3662 ("drm/amdgpu/dc/mst: Rename dp_mst_stream_allocation(_table)")
25f7cde8bad9 ("drm/amd/display: Add tags for indicating mst progress status")
8b076fa7c5be ("drm/amd/display: Add is_mst_connector debugfs entry")
922e7ee31def ("drm/amd/display: Clear edid when unplug mst connector")
990cad0e4a9d ("drm/amd/display: extract update stream allocation to link_hwss")
84a8b3908285 ("drm/amd/display: Release remote dc_sink under mst scenario")
71be4b16d39a ("drm/amd/display: dsc validate fail not pass to atomic check")
453b0016a054 ("drm/amd/display: Detect dpcd_rev when hotplug mst monitor")
00df0514ab13 ("Merge tag 'amd-drm-next-5.19-2022-05-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 85ef1679a190a9740f6b72217cb139a0d9c58706 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Fri, 18 Nov 2022 14:54:05 -0500
Subject: [PATCH] drm/amdgpu/dm/mst: Fix uninitialized var in
pre_compute_mst_dsc_configs_for_state()
Coverity noticed this one, so let's fix it.
Fixes: ba891436c2d2b2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Cc: stable(a)vger.kernel.org # v5.6+
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 59648f5ffb59..6483ba266893 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1180,7 +1180,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct amdgpu_dm_connector *aconnector;
struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
- int ret;
+ int ret = 0;
for (i = 0; i < dc_state->stream_count; i++)
computed_streams[i] = false;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
85ef1679a190 ("drm/amdgpu/dm/mst: Fix uninitialized var in pre_compute_mst_dsc_configs_for_state()")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
6366fc70deb9 ("drm/display/dp_mst: Maintain time slot allocations when deleting payloads")
a5c2c0d164e9 ("drm/display/dp_mst: Add nonblocking helpers for DP MST")
0b4e477e08a1 ("drm/display/dp_mst: Add helper for finding payloads in atomic MST state")
0bee2ae29eb4 ("drm/display/dp_mst: Add some missing kdocs for atomic MST structs")
df78f7f660cd ("drm/display/dp_mst: Call them time slots, not VCPI slots")
48b6b3726fb7 ("drm/display/dp_mst: Rename drm_dp_mst_vcpi_allocation")
dbaadb3cebaa ("drm/amdgpu/dm/mst: Rename get_payload_table()")
8c5e9bbb3662 ("drm/amdgpu/dc/mst: Rename dp_mst_stream_allocation(_table)")
25f7cde8bad9 ("drm/amd/display: Add tags for indicating mst progress status")
8b076fa7c5be ("drm/amd/display: Add is_mst_connector debugfs entry")
922e7ee31def ("drm/amd/display: Clear edid when unplug mst connector")
990cad0e4a9d ("drm/amd/display: extract update stream allocation to link_hwss")
84a8b3908285 ("drm/amd/display: Release remote dc_sink under mst scenario")
71be4b16d39a ("drm/amd/display: dsc validate fail not pass to atomic check")
453b0016a054 ("drm/amd/display: Detect dpcd_rev when hotplug mst monitor")
00df0514ab13 ("Merge tag 'amd-drm-next-5.19-2022-05-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 85ef1679a190a9740f6b72217cb139a0d9c58706 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Fri, 18 Nov 2022 14:54:05 -0500
Subject: [PATCH] drm/amdgpu/dm/mst: Fix uninitialized var in
pre_compute_mst_dsc_configs_for_state()
Coverity noticed this one, so let's fix it.
Fixes: ba891436c2d2b2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Cc: stable(a)vger.kernel.org # v5.6+
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 59648f5ffb59..6483ba266893 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1180,7 +1180,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct amdgpu_dm_connector *aconnector;
struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
- int ret;
+ int ret = 0;
for (i = 0; i < dc_state->stream_count; i++)
computed_streams[i] = false;
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
85ef1679a190 ("drm/amdgpu/dm/mst: Fix uninitialized var in pre_compute_mst_dsc_configs_for_state()")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
6366fc70deb9 ("drm/display/dp_mst: Maintain time slot allocations when deleting payloads")
a5c2c0d164e9 ("drm/display/dp_mst: Add nonblocking helpers for DP MST")
0b4e477e08a1 ("drm/display/dp_mst: Add helper for finding payloads in atomic MST state")
0bee2ae29eb4 ("drm/display/dp_mst: Add some missing kdocs for atomic MST structs")
df78f7f660cd ("drm/display/dp_mst: Call them time slots, not VCPI slots")
48b6b3726fb7 ("drm/display/dp_mst: Rename drm_dp_mst_vcpi_allocation")
dbaadb3cebaa ("drm/amdgpu/dm/mst: Rename get_payload_table()")
8c5e9bbb3662 ("drm/amdgpu/dc/mst: Rename dp_mst_stream_allocation(_table)")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 85ef1679a190a9740f6b72217cb139a0d9c58706 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Fri, 18 Nov 2022 14:54:05 -0500
Subject: [PATCH] drm/amdgpu/dm/mst: Fix uninitialized var in
pre_compute_mst_dsc_configs_for_state()
Coverity noticed this one, so let's fix it.
Fixes: ba891436c2d2b2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Cc: stable(a)vger.kernel.org # v5.6+
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 59648f5ffb59..6483ba266893 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1180,7 +1180,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct amdgpu_dm_connector *aconnector;
struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
- int ret;
+ int ret = 0;
for (i = 0; i < dc_state->stream_count; i++)
computed_streams[i] = false;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d60b82aa4d67 ("drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing DSC state")
dfbc00410c48 ("drm/amdgpu/dm/mst: Use the correct topology mgr pointer in amdgpu_dm_connector")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
47519d8224ba ("Merge tag 'amd-drm-next-6.1-2022-09-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d60b82aa4d67b2e6cf0364947a008bb7255ca4da Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:55 -0500
Subject: [PATCH] drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing
DSC state
Now that we've fixed the issue with using the incorrect topology manager,
we're actually grabbing the topology manager's lock - and consequently
deadlocking. Luckily for us though, there's actually nothing in AMD's DSC
state computation code that really should need this lock. The one exception
is the mutex_lock() in dm_dp_mst_is_port_support_mode(), however we grab no
locks beneath &mgr->lock there so that should be fine to leave be.
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Wayne Lin <Wayne.Lin(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 5196c9a0e432..59648f5ffb59 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1148,10 +1148,8 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
@@ -1208,10 +1206,8 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d60b82aa4d67 ("drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing DSC state")
dfbc00410c48 ("drm/amdgpu/dm/mst: Use the correct topology mgr pointer in amdgpu_dm_connector")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
47519d8224ba ("Merge tag 'amd-drm-next-6.1-2022-09-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d60b82aa4d67b2e6cf0364947a008bb7255ca4da Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:55 -0500
Subject: [PATCH] drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing
DSC state
Now that we've fixed the issue with using the incorrect topology manager,
we're actually grabbing the topology manager's lock - and consequently
deadlocking. Luckily for us though, there's actually nothing in AMD's DSC
state computation code that really should need this lock. The one exception
is the mutex_lock() in dm_dp_mst_is_port_support_mode(), however we grab no
locks beneath &mgr->lock there so that should be fine to leave be.
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Wayne Lin <Wayne.Lin(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 5196c9a0e432..59648f5ffb59 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1148,10 +1148,8 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
@@ -1208,10 +1206,8 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d60b82aa4d67 ("drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing DSC state")
dfbc00410c48 ("drm/amdgpu/dm/mst: Use the correct topology mgr pointer in amdgpu_dm_connector")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
47519d8224ba ("Merge tag 'amd-drm-next-6.1-2022-09-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d60b82aa4d67b2e6cf0364947a008bb7255ca4da Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:55 -0500
Subject: [PATCH] drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing
DSC state
Now that we've fixed the issue with using the incorrect topology manager,
we're actually grabbing the topology manager's lock - and consequently
deadlocking. Luckily for us though, there's actually nothing in AMD's DSC
state computation code that really should need this lock. The one exception
is the mutex_lock() in dm_dp_mst_is_port_support_mode(), however we grab no
locks beneath &mgr->lock there so that should be fine to leave be.
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Wayne Lin <Wayne.Lin(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 5196c9a0e432..59648f5ffb59 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1148,10 +1148,8 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
@@ -1208,10 +1206,8 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
dfbc00410c48 ("drm/amdgpu/dm/mst: Use the correct topology mgr pointer in amdgpu_dm_connector")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
47519d8224ba ("Merge tag 'amd-drm-next-6.1-2022-09-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfbc00410c48a9896d4a65600be7137202517780 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:54 -0500
Subject: [PATCH] drm/amdgpu/dm/mst: Use the correct topology mgr pointer in
amdgpu_dm_connector
This bug hurt me. Basically, it appears that we've been grabbing the
entirely wrong mutex in the MST DSC computation code for amdgpu! While
we've been grabbing:
amdgpu_dm_connector->mst_mgr
That's zero-initialized memory, because the only connectors we'll ever
actually be doing DSC computations for are MST ports. Which have mst_mgr
zero-initialized, and instead have the correct topology mgr pointer located
at:
amdgpu_dm_connector->mst_port->mgr;
I'm a bit impressed that until now, this code has managed not to crash
anyone's systems! It does seem to cause a warning in LOCKDEP though:
[ 66.637670] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
This was causing the problems that appeared to have been introduced by:
commit 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
This wasn't actually where they came from though. Presumably, before the
only thing we were doing with the topology mgr pointer was attempting to
grab mst_mgr->lock. Since the above commit however, we grab much more
information from mst_mgr including the atomic MST state and respective
modesetting locks.
This patch also implies that up until now, it's quite likely we could be
susceptible to race conditions when going through the MST topology state
for DSC computations since we technically will not have grabbed any lock
when going through it.
So, let's fix this by adjusting all the respective code paths to look at
the right pointer and skip things that aren't actual MST connectors from a
topology.
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Wayne Lin <Wayne.Lin(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index bba2e8aaa2c2..5196c9a0e432 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1117,6 +1117,7 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct dc_stream_state *stream;
bool computed_streams[MAX_PIPES];
struct amdgpu_dm_connector *aconnector;
+ struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
int ret = 0;
@@ -1131,7 +1132,7 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
- if (!aconnector || !aconnector->dc_sink)
+ if (!aconnector || !aconnector->dc_sink || !aconnector->port)
continue;
if (!aconnector->dc_sink->dsc_caps.dsc_dec_caps.is_dsc_supported)
@@ -1146,16 +1147,13 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
if (!is_dsc_need_re_compute(state, dc_state, stream->link))
continue;
- mutex_lock(&aconnector->mst_mgr.lock);
-
- ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars,
- &aconnector->mst_mgr,
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
+ ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- if (ret != 0) {
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
+ if (ret != 0)
return ret;
- }
- mutex_unlock(&aconnector->mst_mgr.lock);
for (j = 0; j < dc_state->stream_count; j++) {
if (dc_state->streams[j]->link == stream->link)
@@ -1182,6 +1180,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct dc_stream_state *stream;
bool computed_streams[MAX_PIPES];
struct amdgpu_dm_connector *aconnector;
+ struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
int ret;
@@ -1196,7 +1195,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
- if (!aconnector || !aconnector->dc_sink)
+ if (!aconnector || !aconnector->dc_sink || !aconnector->port)
continue;
if (!aconnector->dc_sink->dsc_caps.dsc_dec_caps.is_dsc_supported)
@@ -1208,15 +1207,13 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
if (!is_dsc_need_re_compute(state, dc_state, stream->link))
continue;
- mutex_lock(&aconnector->mst_mgr.lock);
- ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars,
- &aconnector->mst_mgr,
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
+ ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- if (ret != 0) {
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
+ if (ret != 0)
return ret;
- }
- mutex_unlock(&aconnector->mst_mgr.lock);
for (j = 0; j < dc_state->stream_count; j++) {
if (dc_state->streams[j]->link == stream->link)
@@ -1419,6 +1416,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
unsigned int upper_link_bw_in_kbps = 0, down_link_bw_in_kbps = 0;
unsigned int max_compressed_bw_in_kbps = 0;
struct dc_dsc_bw_range bw_range = {0};
+ struct drm_dp_mst_topology_mgr *mst_mgr;
/*
* check if the mode could be supported if DSC pass-through is supported
@@ -1427,7 +1425,8 @@ enum dc_status dm_dp_mst_is_port_support_mode(
*/
if (is_dsc_common_config_possible(stream, &bw_range) &&
aconnector->port->passthrough_aux) {
- mutex_lock(&aconnector->mst_mgr.lock);
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
cur_link_settings = stream->link->verified_link_cap;
@@ -1440,7 +1439,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
end_to_end_bw_in_kbps = min(upper_link_bw_in_kbps,
down_link_bw_in_kbps);
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
/*
* use the maximum dsc compression bandwidth as the required
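To make the pointer mix-up above easier to see outside the driver, here is a
small stand-alone C sketch of the bug class; the struct and field names are
hypothetical stand-ins for amdgpu_dm_connector, its embedded mst_mgr and
port->mgr, not real driver code:
/* Toy model: the connector embeds a manager object that is never set up,
 * while the real, initialized manager is reachable only via port->mgr.
 * In the driver, taking the embedded one means locking a mutex that was
 * never initialized, which is what lockdep's magic check complains about. */
#include <stdio.h>
#include <string.h>

struct toy_mgr { int initialized; };
struct toy_port { struct toy_mgr *mgr; };
struct toy_connector {
	struct toy_mgr mst_mgr;   /* embedded, stays zero-initialized */
	struct toy_port *port;    /* only set for actual MST connectors */
};

int main(void)
{
	static struct toy_mgr real_mgr = { .initialized = 1 };
	static struct toy_port mst_port = { .mgr = &real_mgr };
	struct toy_connector aconnector;

	memset(&aconnector, 0, sizeof(aconnector));
	aconnector.port = &mst_port;

	/* Wrong object: the zero-initialized embedded manager. */
	printf("embedded mst_mgr initialized: %d\n", aconnector.mst_mgr.initialized);

	/* Right object: go through the MST port, and skip connectors
	 * whose ->port is NULL since they are not MST connectors. */
	if (aconnector.port)
		printf("port->mgr initialized:        %d\n",
		       aconnector.port->mgr->initialized);
	return 0;
}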
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
dfbc00410c48 ("drm/amdgpu/dm/mst: Use the correct topology mgr pointer in amdgpu_dm_connector")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
47519d8224ba ("Merge tag 'amd-drm-next-6.1-2022-09-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfbc00410c48a9896d4a65600be7137202517780 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:54 -0500
Subject: [PATCH] drm/amdgpu/dm/mst: Use the correct topology mgr pointer in
amdgpu_dm_connector
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
dfbc00410c48 ("drm/amdgpu/dm/mst: Use the correct topology mgr pointer in amdgpu_dm_connector")
ba891436c2d2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
876fcc4222e1 ("drm/amd/display: Validate DSC After Enable All New CRTCs")
47519d8224ba ("Merge tag 'amd-drm-next-6.1-2022-09-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dfbc00410c48a9896d4a65600be7137202517780 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:54 -0500
Subject: [PATCH] drm/amdgpu/dm/mst: Use the correct topology mgr pointer in
amdgpu_dm_connector
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2f3a1273862c ("drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs() return code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2f3a1273862cb82cca227630cc7f04ce0c94b6bb Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:53 -0500
Subject: [PATCH] drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs()
return code
It looks like we're accidentally dropping a pretty important return code
here. For some reason, we just return -EINVAL if we fail to get the MST
topology state. This is wrong: error codes are important and should never
be squashed without being handled, which here seems to have the potential
to cause a deadlock.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Reviewed-by: Wayne Lin <Wayne.Lin(a)amd.com>
Fixes: 8ec046716ca8 ("drm/dp_mst: Add helper to trigger modeset on affected DSC MST CRTCs")
Cc: <stable(a)vger.kernel.org> # v5.6+
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index ecd22c038c8c..51a46689cda7 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -5186,7 +5186,7 @@ int drm_dp_mst_add_affected_dsc_crtcs(struct drm_atomic_state *state, struct drm
mst_state = drm_atomic_get_mst_topology_state(state, mgr);
if (IS_ERR(mst_state))
- return -EINVAL;
+ return PTR_ERR(mst_state);
list_for_each_entry(pos, &mst_state->payloads, next) {
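The point about not squashing error codes can be shown with a small
stand-alone C sketch of the ERR_PTR/PTR_ERR idiom; the helpers below are
simplified copies of the kernel macros, and get_topology_state() is a
made-up stand-in for drm_atomic_get_mst_topology_state() failing with
-EDEADLK (the value the atomic framework relies on to back off and retry):
#include <errno.h>
#include <stdio.h>

#define MAX_ERRNO 4095UL

/* Simplified versions of the kernel's ERR_PTR/PTR_ERR/IS_ERR helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for the topology-state lookup failing. */
static void *get_topology_state(void)
{
	return ERR_PTR(-EDEADLK);
}

int main(void)
{
	void *mst_state = get_topology_state();

	if (IS_ERR(mst_state)) {
		/* Propagating PTR_ERR() keeps -EDEADLK visible to the caller;
		 * returning a hard-coded -EINVAL here would hide the backoff
		 * signal, which is how the deadlock potential arises. */
		printf("propagated error: %ld\n", PTR_ERR(mst_state));
		return 1;
	}
	return 0;
}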
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2f3a1273862c ("drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs() return code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2f3a1273862cb82cca227630cc7f04ce0c94b6bb Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Mon, 14 Nov 2022 17:17:53 -0500
Subject: [PATCH] drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs()
return code
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
00a6c36cca76 ("drm/i915/ttm: never purge busy objects")
ab4911b7d411 ("drm/i915/ttm: ensure we unmap when purging")
ffa3fe080c77 ("drm/i915: clean up shrinker_release_pages")
9354417750e5 ("drm/i915: remove writeback hook")
004746e4b119 ("drm/i915/ttm: Correctly handle waiting for gpu when shrinking")
3589fdbd3b20 ("drm/i915/ttm: Reorganize the ttm move code")
cad7109a2b5e ("drm/i915: Introduce refcounted sg-tables")
ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into adjust_lru")
e25d1ea4b1dc ("drm/i915: add some kernel-doc for shrink_pin and friends")
7ae034590cea ("drm/i915/ttm: add tt shmem backend")
f05b985e6f76 ("drm/i915/gem: Break out some shmem backend utils")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00a6c36cca760d0b659f894dee728555b193c5e1 Mon Sep 17 00:00:00 2001
From: Matthew Auld <matthew.auld(a)intel.com>
Date: Tue, 15 Nov 2022 10:46:20 +0000
Subject: [PATCH] drm/i915/ttm: never purge busy objects
In i915_gem_madvise_ioctl() we immediately purge the object if it is not
currently used, like when the mm.pages are NULL. With shmem the pages
might still be hanging around or are perhaps swapped out. Similarly with
ttm we might still have the pages hanging around on the ttm resource,
like with lmem or shmem, but here we need to be extra careful since
async unbinds are possible as well as in-progress kernel moves. In
i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource
for us; however, if it's busy the memory is only moved to a ghost object,
which then leads to broken behaviour when, for example, clearing the
i915_tt->filp, since the actual ttm_tt is still alive and populated,
even though it's been moved to the ghost object. When we later destroy
the ghost object we hit the following, since the filp is now NULL:
[ +0.006982] #PF: supervisor read access in kernel mode
[ +0.005149] #PF: error_code(0x0000) - not-present page
[ +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0
[ +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm]
[ +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915]
[ +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68
8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70
30 e9 42 b2 ff ff 4c 89
[ +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202
[ +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000
[ +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0
[ +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001
[ +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0
[ +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248
[ +0.007154] FS: 0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000
[ +0.008108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0
[ +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.007154] Call Trace:
[ +0.002459] <TASK>
[ +0.002126] ttm_tt_unpopulate.part.0+0x17/0x70 [ttm]
[ +0.005068] ttm_bo_tt_destroy+0x1c/0x50 [ttm]
[ +0.004464] ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm]
[ +0.005244] ttm_bo_cleanup_refs+0x90/0x2c0 [ttm]
[ +0.004721] ttm_bo_delayed_delete+0x235/0x250 [ttm]
[ +0.004981] ttm_device_delayed_workqueue+0x13/0x40 [ttm]
[ +0.005422] process_one_work+0x248/0x560
[ +0.004028] worker_thread+0x4b/0x390
[ +0.003682] ? process_one_work+0x560/0x560
[ +0.004199] kthread+0xeb/0x120
[ +0.003163] ? kthread_complete_and_exit+0x20/0x20
[ +0.004815] ret_from_fork+0x1f/0x30
v2:
- Just use ttm_bo_wait() directly (Niranjana)
- Add testcase reference
Testcase: igt@gem_madvise@dontneed-evict-race
Fixes: 213d50927763 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura(a)intel.com>
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Andrzej Hajda <andrzej.hajda(a)intel.com>
Cc: Nirmoy Das <nirmoy.das(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v5.15+
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura(a)intel.com>
Acked-by: Nirmoy Das <Nirmoy.Das(a)intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221115104620.120432-1-matth…
(cherry picked from commit 5524b5e52e08f675116a93296fe5bee60bc43c03)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 3d4305eea1aa..0d6d640225fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -612,6 +612,10 @@ static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
+ err = ttm_bo_wait(bo, true, false);
+ if (err)
+ return err;
+
err = i915_ttm_move_notify(bo);
if (err)
return err;
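As a rough userspace analogue of the fix (pthreads, hypothetical names;
kernel TTM objects and ttm_bo_wait() behave differently in detail), the
pattern is: before purging the backing store, wait for any in-flight
asynchronous move that may still be using it:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct toy_object {
	pthread_t mover;        /* in-flight async "move", if any */
	int move_in_flight;
	char *backing;          /* what the purge tears down */
};

static void *async_move(void *arg)
{
	struct toy_object *obj = arg;

	usleep(10000);          /* pretend the move takes a while */
	obj->backing[0] = 1;    /* the worker still touches the backing store */
	return NULL;
}

/* Analogue of ttm_bo_wait(): block until the pending move has finished. */
static int toy_object_wait(struct toy_object *obj)
{
	if (obj->move_in_flight) {
		int err = pthread_join(obj->mover, NULL);

		if (err)
			return err;
		obj->move_in_flight = 0;
	}
	return 0;
}

static int toy_object_purge(struct toy_object *obj)
{
	int err = toy_object_wait(obj);   /* the added wait-before-purge step */

	if (err)
		return err;
	free(obj->backing);
	obj->backing = NULL;
	return 0;
}

int main(void)
{
	struct toy_object obj = { .backing = calloc(1, 4096) };

	obj.move_in_flight = 1;
	pthread_create(&obj.mover, NULL, async_move, &obj);
	toy_object_purge(&obj);   /* safe: waits for the mover first */
	printf("purged, backing=%p\n", (void *)obj.backing);
	return 0;
}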
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
e562a8bdf652 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN")
db5df2541200 ("btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker")
8949b9a11401 ("btrfs: fix lock inversion problem when doing qgroup extent tracing")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
cb13eea3b490 ("btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
bf77467a93bd ("btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT")
9631e4cc1a03 ("btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks")
fd7ba1c1202d ("btrfs: add nesting tags to the locking helpers")
51899412dd95 ("btrfs: introduce btrfs_path::recurse")
329ced799be8 ("btrfs: rename extent_buffer::lock_nested to extent_buffer::lock_recursed")
d16c702fe4f2 ("btrfs: ctree: check key order before merging tree blocks")
d3beaa253fd6 ("btrfs: set the lockdep class for log tree extent buffers")
ad24466588ab ("btrfs: set the correct lockdep class for new nodes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7e942b5bb35d8e3af54053d19a6bf04143a3955 Mon Sep 17 00:00:00 2001
From: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Date: Wed, 16 Nov 2022 22:23:54 +0800
Subject: [PATCH] btrfs: qgroup: fix sleep from invalid context bug in
btrfs_qgroup_inherit()
Syzkaller reported BUG as follows:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:274
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
__might_resched.cold+0x222/0x26b
kmem_cache_alloc+0x2e7/0x3c0
update_qgroup_limit_item+0xe1/0x390
btrfs_qgroup_inherit+0x147b/0x1ee0
create_subvol+0x4eb/0x1710
btrfs_mksubvol+0xfe5/0x13f0
__btrfs_ioctl_snap_create+0x2b0/0x430
btrfs_ioctl_snap_create_v2+0x25a/0x520
btrfs_ioctl+0x2a1c/0x5ce0
__x64_sys_ioctl+0x193/0x200
do_syscall_64+0x35/0x80
Fix this by calling qgroup_dirty() on @dstqgroup, and update limit item in
btrfs_run_qgroups() later outside of the spinlock context.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9334c3157c22..b74105a10f16 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2951,14 +2951,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
dstgroup->rsv_rfer = inherit->lim.rsv_rfer;
dstgroup->rsv_excl = inherit->lim.rsv_excl;
- ret = update_qgroup_limit_item(trans, dstgroup);
- if (ret) {
- qgroup_mark_inconsistent(fs_info);
- btrfs_info(fs_info,
- "unable to update quota limit for %llu",
- dstgroup->qgroupid);
- goto unlock;
- }
+ qgroup_dirty(fs_info, dstgroup);
}
if (srcid) {
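The shape of the fix above -- record that work is needed while holding the
lock, and do the part that may sleep later, outside it -- can be sketched in
stand-alone C. This uses a pthread mutex and made-up names, so it is only an
analogy of the deferral structure (a pthread mutex may sleep, unlike the
spinlock in the kernel code):
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_qgroup {
	int dirty;
	long rsv_rfer;
	long rsv_excl;
};

static pthread_mutex_t toy_qgroup_lock = PTHREAD_MUTEX_INITIALIZER;

/* Under the lock: only copy the limits and mark the group dirty.
 * No allocation or I/O happens here. */
static void toy_inherit_limits(struct toy_qgroup *dst, long rfer, long excl)
{
	pthread_mutex_lock(&toy_qgroup_lock);
	dst->rsv_rfer = rfer;
	dst->rsv_excl = excl;
	dst->dirty = 1;
	pthread_mutex_unlock(&toy_qgroup_lock);
}

/* Later, outside the lock: do the work that may block, the way the limit
 * item is written out from btrfs_run_qgroups() in the real fix. */
static void toy_run_qgroups(struct toy_qgroup *dst)
{
	char *item;

	if (!dst->dirty)
		return;
	item = malloc(64);
	if (!item)
		return;
	snprintf(item, 64, "limits rfer=%ld excl=%ld", dst->rsv_rfer, dst->rsv_excl);
	printf("writing deferred limit item: %s\n", item);
	free(item);
	dst->dirty = 0;
}

int main(void)
{
	struct toy_qgroup dst = { 0 };

	toy_inherit_limits(&dst, 100, 200);
	toy_run_qgroups(&dst);
	return 0;
}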
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
e562a8bdf652 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN")
db5df2541200 ("btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker")
8949b9a11401 ("btrfs: fix lock inversion problem when doing qgroup extent tracing")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
cb13eea3b490 ("btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
bf77467a93bd ("btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT")
9631e4cc1a03 ("btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks")
fd7ba1c1202d ("btrfs: add nesting tags to the locking helpers")
51899412dd95 ("btrfs: introduce btrfs_path::recurse")
329ced799be8 ("btrfs: rename extent_buffer::lock_nested to extent_buffer::lock_recursed")
d16c702fe4f2 ("btrfs: ctree: check key order before merging tree blocks")
d3beaa253fd6 ("btrfs: set the lockdep class for log tree extent buffers")
ad24466588ab ("btrfs: set the correct lockdep class for new nodes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7e942b5bb35d8e3af54053d19a6bf04143a3955 Mon Sep 17 00:00:00 2001
From: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Date: Wed, 16 Nov 2022 22:23:54 +0800
Subject: [PATCH] btrfs: qgroup: fix sleep from invalid context bug in
btrfs_qgroup_inherit()
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
e562a8bdf652 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN")
db5df2541200 ("btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker")
8949b9a11401 ("btrfs: fix lock inversion problem when doing qgroup extent tracing")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
cb13eea3b490 ("btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
bf77467a93bd ("btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT")
9631e4cc1a03 ("btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks")
fd7ba1c1202d ("btrfs: add nesting tags to the locking helpers")
51899412dd95 ("btrfs: introduce btrfs_path::recurse")
329ced799be8 ("btrfs: rename extent_buffer::lock_nested to extent_buffer::lock_recursed")
d16c702fe4f2 ("btrfs: ctree: check key order before merging tree blocks")
d3beaa253fd6 ("btrfs: set the lockdep class for log tree extent buffers")
ad24466588ab ("btrfs: set the correct lockdep class for new nodes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7e942b5bb35d8e3af54053d19a6bf04143a3955 Mon Sep 17 00:00:00 2001
From: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Date: Wed, 16 Nov 2022 22:23:54 +0800
Subject: [PATCH] btrfs: qgroup: fix sleep from invalid context bug in
btrfs_qgroup_inherit()
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
e562a8bdf652 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN")
db5df2541200 ("btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker")
8949b9a11401 ("btrfs: fix lock inversion problem when doing qgroup extent tracing")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
cb13eea3b490 ("btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
bf77467a93bd ("btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT")
9631e4cc1a03 ("btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks")
fd7ba1c1202d ("btrfs: add nesting tags to the locking helpers")
51899412dd95 ("btrfs: introduce btrfs_path::recurse")
329ced799be8 ("btrfs: rename extent_buffer::lock_nested to extent_buffer::lock_recursed")
d16c702fe4f2 ("btrfs: ctree: check key order before merging tree blocks")
d3beaa253fd6 ("btrfs: set the lockdep class for log tree extent buffers")
ad24466588ab ("btrfs: set the correct lockdep class for new nodes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7e942b5bb35d8e3af54053d19a6bf04143a3955 Mon Sep 17 00:00:00 2001
From: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Date: Wed, 16 Nov 2022 22:23:54 +0800
Subject: [PATCH] btrfs: qgroup: fix sleep from invalid context bug in
btrfs_qgroup_inherit()
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
e562a8bdf652 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN")
db5df2541200 ("btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker")
8949b9a11401 ("btrfs: fix lock inversion problem when doing qgroup extent tracing")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
cb13eea3b490 ("btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7e942b5bb35d8e3af54053d19a6bf04143a3955 Mon Sep 17 00:00:00 2001
From: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Date: Wed, 16 Nov 2022 22:23:54 +0800
Subject: [PATCH] btrfs: qgroup: fix sleep from invalid context bug in
btrfs_qgroup_inherit()
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
e562a8bdf652 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN")
db5df2541200 ("btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7e942b5bb35d8e3af54053d19a6bf04143a3955 Mon Sep 17 00:00:00 2001
From: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Date: Wed, 16 Nov 2022 22:23:54 +0800
Subject: [PATCH] btrfs: qgroup: fix sleep from invalid context bug in
btrfs_qgroup_inherit()
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
e562a8bdf652 ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7e942b5bb35d8e3af54053d19a6bf04143a3955 Mon Sep 17 00:00:00 2001
From: ChenXiaoSong <chenxiaosong2(a)huawei.com>
Date: Wed, 16 Nov 2022 22:23:54 +0800
Subject: [PATCH] btrfs: qgroup: fix sleep from invalid context bug in
btrfs_qgroup_inherit()
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
418ffb9e3cf6 ("btrfs: free btrfs_path before copying inodes to userspace")
e3059ec06b9f ("btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 418ffb9e3cf6c4e2574d3a732b724916684bd133 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 10 Nov 2022 11:36:28 +0530
Subject: [PATCH] btrfs: free btrfs_path before copying inodes to userspace
btrfs_ioctl_logical_to_ino() frees the search path after the userspace
copy from the temp buffer @inodes, which can potentially lead to a lock
splat.
Fix this by freeing the path before we copy @inodes to userspace.
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 89b8d14cb68c..b595f2c6dfc9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4282,21 +4282,20 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info,
size = min_t(u32, loi->size, SZ_16M);
}
- path = btrfs_alloc_path();
- if (!path) {
- ret = -ENOMEM;
- goto out;
- }
-
inodes = init_data_container(size);
if (IS_ERR(inodes)) {
ret = PTR_ERR(inodes);
- inodes = NULL;
- goto out;
+ goto out_loi;
}
+ path = btrfs_alloc_path();
+ if (!path) {
+ ret = -ENOMEM;
+ goto out;
+ }
ret = iterate_inodes_from_logical(loi->logical, fs_info, path,
inodes, ignore_offset);
+ btrfs_free_path(path);
if (ret == -EINVAL)
ret = -ENOENT;
if (ret < 0)
@@ -4308,7 +4307,6 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info,
ret = -EFAULT;
out:
- btrfs_free_path(path);
kvfree(inodes);
out_loi:
kfree(loi);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
418ffb9e3cf6 ("btrfs: free btrfs_path before copying inodes to userspace")
e3059ec06b9f ("btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 418ffb9e3cf6c4e2574d3a732b724916684bd133 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 10 Nov 2022 11:36:28 +0530
Subject: [PATCH] btrfs: free btrfs_path before copying inodes to userspace
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
418ffb9e3cf6 ("btrfs: free btrfs_path before copying inodes to userspace")
e3059ec06b9f ("btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 418ffb9e3cf6c4e2574d3a732b724916684bd133 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 10 Nov 2022 11:36:28 +0530
Subject: [PATCH] btrfs: free btrfs_path before copying inodes to userspace
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
418ffb9e3cf6 ("btrfs: free btrfs_path before copying inodes to userspace")
e3059ec06b9f ("btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 418ffb9e3cf6c4e2574d3a732b724916684bd133 Mon Sep 17 00:00:00 2001
From: Anand Jain <anand.jain(a)oracle.com>
Date: Thu, 10 Nov 2022 11:36:28 +0530
Subject: [PATCH] btrfs: free btrfs_path before copying inodes to userspace
Hi,
[ Marc, can you help reviewing? Esp. the first patch? ]
This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
we're seeing on AWS ARM instances: attaching an EBS volume (which is an
NVMe device) to the instance after offlining CPUs causes the device to take
several minutes to show up, and eventually nvme kworkers and other threads
start getting stuck.
This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.
An easy reproducer is:
1. Start an ARM instance with 32 CPUs
2. Once the instance is booted, offline all CPUs but CPU 0. Eg:
# for i in $(seq 1 32); do chcpu -d $i; done
3. Once the CPUs are offline, attach an EBS volume
4. Watch lsblk and dmesg in the instance
Eventually, you get this stack trace:
[ 71.842974] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
[ 71.843966] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 71.845149] pci 0000:00:1f.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 71.846694] pci 0000:00:1f.0: BAR 0: assigned [mem 0x8011c000-0x8011ffff]
[ 71.848458] ACPI: \_SB_.PCI0.GSI3: Enabled at IRQ 38
[ 71.850852] nvme nvme1: pci function 0000:00:1f.0
[ 71.851611] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[ 135.887787] nvme nvme1: I/O 22 QID 0 timeout, completion polled
[ 197.328276] nvme nvme1: I/O 23 QID 0 timeout, completion polled
[ 197.329221] nvme nvme1: 1/0/0 default/read/poll queues
[ 243.408619] INFO: task kworker/u64:2:275 blocked for more than 122 seconds.
[ 243.409674] Not tainted 5.15.79 #1
[ 243.410270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.411389] task:kworker/u64:2 state:D stack: 0 pid: 275 ppid: 2 flags:0x00000008
[ 243.412602] Workqueue: events_unbound async_run_entry_fn
[ 243.413417] Call trace:
[ 243.413797] __switch_to+0x15c/0x1a4
[ 243.414335] __schedule+0x2bc/0x990
[ 243.414849] schedule+0x68/0xf8
[ 243.415334] schedule_timeout+0x184/0x340
[ 243.415946] wait_for_completion+0xc8/0x220
[ 243.416543] __flush_work.isra.43+0x240/0x2f0
[ 243.417179] flush_work+0x20/0x2c
[ 243.417666] nvme_async_probe+0x20/0x3c
[ 243.418228] async_run_entry_fn+0x3c/0x1e0
[ 243.418858] process_one_work+0x1bc/0x460
[ 243.419437] worker_thread+0x164/0x528
[ 243.420030] kthread+0x118/0x124
[ 243.420517] ret_from_fork+0x10/0x20
[ 258.768771] nvme nvme1: I/O 20 QID 0 timeout, completion polled
[ 320.209266] nvme nvme1: I/O 21 QID 0 timeout, completion polled
For completeness, I tested the same test case on x86 with this series applied
on 5.15.79 and 5.10.155 as well. It works as expected.
Thanks,
Marc Zyngier (4):
genirq/msi: Shutdown managed interrupts with unsatifiable affinities
genirq: Always limit the affinity to online CPUs
irqchip/gic-v3: Always trust the managed affinity provided by the core
code
genirq: Take the proposed affinity at face value if force==true
drivers/irqchip/irq-gic-v3-its.c | 2 +-
kernel/irq/manage.c | 31 +++++++++++++++++++++++--------
kernel/irq/msi.c | 7 +++++++
3 files changed, 31 insertions(+), 9 deletions(-)
--
2.37.1
> On Fri, Oct 28, 2022 at 02:51:43PM +0000, Dominic Jones wrote:
> > Updating the machine's kernel from v5.19.x to v6.0.x causes the machine to not
> > successfully boot. The machine boots successfully (and exhibits stable operation)
> > with version v5.19.17 and multiple earlier releases in the 5.19 line. Multiple releases
> > from the 6.0 line (including 6.0.0, 6.0.3, and 6.0.5), with no other changes to the
> > software environment, do not boot. Instead, the machine hangs after loading services
> > but before presenting a display manager; the machine instead shows repetitive hard
> > drive activity at this point and then no apparent activity.
> >
> > ''uname'' output for the machine successfully running v5.19.17 is:
> >
> > Linux [MACHINE_NAME] 5.19.17 #1 SMP PREEMPT_DYNAMIC Mon Oct 24 13:32:29 2022 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz GenuineIntel GNU/Linux
> >
> > The machine is an OCZ Neutrino netbook, running a custom OS build largely similar to
> > LFS development. The kernel update uses ''make olddefconfig''.
>
> Can you use 'git bisect' to find the offending change that causes this
> to happen?
Bisection is complete. Here's what it returned.
---
3a194f3f8ad01bce00bd7174aaba1563bcc827eb is the first bad commit
commit 3a194f3f8ad01bce00bd7174aaba1563bcc827eb
Author: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Date: Thu Jul 14 13:24:14 2022 +0900
mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
follow_pud_mask() does not support non-present pud entries now. As far as
I tested on an x86_64 server, follow_pud_mask() still simply returns
no_page_table() for a non-present pud entry due to pud_bad(), so no severe
user-visible effect should happen. But generally we should call
follow_huge_pud() for a non-present pud entry of a 1GB hugetlb page.
Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
The changes are similar to the previous works for pmd entries, commit
e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") and
commit cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present
hugepage").
Link: https://lkml.kernel.org/r/20220714042420.1847125-3-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: kernel test robot <lkp(a)intel.com>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
arch/x86/mm/hugetlbpage.c | 8 +++++++-
mm/hugetlb.c | 32 ++++++++++++++++++++++++++++++--
2 files changed, 37 insertions(+), 3 deletions(-)
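The diff itself is not quoted above; for context, and by analogy with the
pmd-level commits referenced in the changelog, the x86 part presumably takes
roughly the following shape (a sketch, not the actual hunk from 3a194f3f8ad0):

  /*
   * Sketch only: a non-present pud entry (e.g. a migration or hwpoison
   * entry for a 1GB hugetlb page) has _PAGE_PRESENT clear, so it has to be
   * recognised by its overall shape rather than by _PAGE_PSE alone.
   */
  int pud_huge(pud_t pud)
  {
          return !pud_none(pud) &&
                  (pud_val(pud) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT;
  }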
---
Dominic
This is the start of the stable review cycle for the 4.9.328 release.
There are 42 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 15 Sep 2022 14:03:27 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.328-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.328-rc1
NeilBrown <neilb(a)suse.de>
SUNRPC: use _bh spinlocking on ->transport_lock
Yang Ling <gnaygnil(a)gmail.com>
MIPS: loongson32: ls1c: Fix hang during startup
Johan Hovold <johan+linaro(a)kernel.org>
usb: dwc3: fix PHY disable sequence
Toke Høiland-Jørgensen <toke(a)toke.dk>
sch_sfb: Also store skb len before calling child enqueue
Neal Cardwell <ncardwell(a)google.com>
tcp: fix early ETIMEDOUT after spurious non-SACK RTO
Dan Carpenter <dan.carpenter(a)oracle.com>
tipc: fix shift wrapping bug in map_get()
Toke Høiland-Jørgensen <toke(a)toke.dk>
sch_sfb: Don't assume the skb is still around after enqueueing to child
David Leadbeater <dgl(a)dgl.cx>
netfilter: nf_conntrack_irc: Fix forged IP logic
Harsh Modi <harshmodi(a)google.com>
netfilter: br_netfilter: Drop dst references before setting.
Isaac J. Manjarres <isaacmanjarres(a)google.com>
driver core: Don't probe devices after bus_type.match() probe deferral
Sreekanth Reddy <sreekanth.reddy(a)broadcom.com>
scsi: mpt3sas: Fix use-after-free warning
Dongxiang Ke <kdx.glider(a)gmail.com>
ALSA: usb-audio: Fix an out-of-bounds bug in __snd_usb_parse_audio_interface()
Pattara Teerapong <pteerapong(a)chromium.org>
ALSA: aloop: Fix random zeros in capture data when using jiffies timer
Tasos Sahanidis <tasos(a)tasossah.com>
ALSA: emu10k1: Fix out of bounds access in snd_emu10k1_pcm_channel_alloc()
Yang Yingliang <yangyingliang(a)huawei.com>
fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init()
Helge Deller <deller(a)gmx.de>
parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
Li Qiong <liqiong(a)nfschina.com>
parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
Zhenneng Li <lizhenneng(a)kylinos.cn>
drm/radeon: add a force flush to delay work when radeon
Yee Lee <yee.lee(a)mediatek.com>
Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()"
Linus Torvalds <torvalds(a)linux-foundation.org>
fs: only do a memory barrier for the first set_buffer_uptodate()
Takashi Iwai <tiwai(a)suse.de>
ALSA: seq: Fix data-race at module auto-loading
Takashi Iwai <tiwai(a)suse.de>
ALSA: seq: oss: Fix data-race for max_midi_devs access
Miquel Raynal <miquel.raynal(a)bootlin.com>
net: mac802154: Fix a condition in the receive path
Siddh Raman Pant <code(a)siddh.me>
wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected
Krishna Kurapati <quic_kriskura(a)quicinc.com>
usb: gadget: mass_storage: Fix cdrom data transfers on MAC-OS
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Prevent nested device-reset calls
Josh Poimboeuf <jpoimboe(a)kernel.org>
s390: fix nospec table alignments
Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
s390/hugetlb: fix prepare_hugepage_range() check for 2 GB hugepages
Witold Lipieta <witold.lipieta(a)thaumatec.com>
usb-storage: Add ignore-residue quirk for NXP PN7462AU
Thierry GUIBERT <thierry.guibert(a)croix-rouge.fr>
USB: cdc-acm: Add Icom PMR F3400 support (0c26:0020)
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add support for Cinterion MV32-WA/WB RmNet mode
Yan Xinyu <sdlyyxy(a)bupt.edu.cn>
USB: serial: option: add support for OPPO R11 diag port
Johan Hovold <johan(a)kernel.org>
USB: serial: cp210x: add Decagon UCA device id
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: Add grace period after xHC start to prevent premature runtime suspend.
Armin Wolf <W_Armin(a)gmx.de>
hwmon: (gpio-fan) Fix array out of bounds access
Niek Nooijens <niek.nooijens(a)omron.com>
USB: serial: ftdi_sio: add Omron CS1W-CIF31 device id
Helge Deller <deller(a)gmx.de>
vt: Clear selection before changing the font
Dan Carpenter <dan.carpenter(a)oracle.com>
staging: rtl8712: fix use after free bugs
Shenwei Wang <shenwei.wang(a)nxp.com>
serial: fsl_lpuart: RS485 RTS polariy is inverse
Dan Carpenter <dan.carpenter(a)oracle.com>
wifi: cfg80211: debugfs: fix return type in ht40allow_map_read()
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
platform/x86: pmc_atom: Fix SLP_TYPx bitfield mask
Letu Ren <fantasquex(a)gmail.com>
fbdev: fb_pm2fb: Avoid potential divide by zero error
-------------
Diffstat:
Makefile | 4 +--
arch/mips/loongson32/ls1c/board.c | 1 -
arch/parisc/kernel/head.S | 43 +++++++++++++++++++++++++++-
arch/s390/include/asm/hugetlb.h | 6 ++--
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/x86/include/asm/pmc_atom.h | 6 ++--
arch/x86/platform/atom/pmc_atom.c | 2 +-
drivers/base/dd.c | 10 +++++++
drivers/gpu/drm/radeon/radeon_device.c | 3 ++
drivers/hwmon/gpio-fan.c | 3 ++
drivers/parisc/ccio-dma.c | 11 +++++--
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
drivers/staging/rtl8712/rtl8712_cmd.c | 36 -----------------------
drivers/tty/serial/fsl_lpuart.c | 4 +--
drivers/tty/vt/vt.c | 12 +++++---
drivers/usb/class/cdc-acm.c | 3 ++
drivers/usb/core/hub.c | 10 +++++++
drivers/usb/dwc3/core.c | 20 ++++++-------
drivers/usb/gadget/function/storage_common.c | 6 ++--
drivers/usb/host/xhci-hub.c | 11 +++++++
drivers/usb/host/xhci.c | 4 ++-
drivers/usb/host/xhci.h | 2 +-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/ftdi_sio.c | 2 ++
drivers/usb/serial/ftdi_sio_ids.h | 6 ++++
drivers/usb/serial/option.c | 11 +++++++
drivers/usb/storage/unusual_devs.h | 7 +++++
drivers/video/fbdev/chipsfb.c | 1 +
drivers/video/fbdev/pm2fb.c | 5 ++++
include/linux/buffer_head.h | 11 +++++++
include/linux/usb.h | 2 ++
mm/kmemleak.c | 8 +++---
net/bridge/br_netfilter_hooks.c | 2 ++
net/bridge/br_netfilter_ipv6.c | 1 +
net/ipv4/tcp_input.c | 25 +++++++++++-----
net/mac80211/ibss.c | 4 +++
net/mac802154/rx.c | 2 +-
net/netfilter/nf_conntrack_irc.c | 5 ++--
net/sched/sch_sfb.c | 13 +++++----
net/sunrpc/xprt.c | 4 +--
net/tipc/monitor.c | 2 +-
net/wireless/debugfs.c | 3 +-
sound/core/seq/oss/seq_oss_midi.c | 2 ++
sound/core/seq/seq_clientmgr.c | 12 ++++----
sound/drivers/aloop.c | 7 +++--
sound/pci/emu10k1/emupcm.c | 2 +-
sound/usb/stream.c | 2 +-
47 files changed, 236 insertions(+), 104 deletions(-)
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Commit ac4e97abce9b8 ("scatterlist: sg_set_buf() argument must be in linear
mapping") requires that both the signature and the digest reside in the
linear mapping area.
However, more recently commit ba14a194a434c ("fork: Add generic vmalloced
stack support") made it possible to move the stack into the vmalloc area,
so the requirement of the first commit may no longer be satisfied.
If CONFIG_DEBUG_SG=y and CONFIG_VMAP_STACK=y, the following BUG() is triggered:
[ 467.077359] kernel BUG at include/linux/scatterlist.h:163!
[ 467.077939] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[...]
[ 467.095225] Call Trace:
[ 467.096088] <TASK>
[ 467.096928] ? rcu_read_lock_held_common+0xe/0x50
[ 467.097569] ? rcu_read_lock_sched_held+0x13/0x70
[ 467.098123] ? trace_hardirqs_on+0x2c/0xd0
[ 467.098647] ? public_key_verify_signature+0x470/0x470
[ 467.099237] asymmetric_verify+0x14c/0x300
[ 467.099869] evm_verify_hmac+0x245/0x360
[ 467.100391] evm_inode_setattr+0x43/0x190
The failure happens only for the digest, as the pointer comes from the
stack, and not for the signature, which instead was allocated by
vfs_getxattr_alloc().
Fix this by making a copy of both in asymmetric_verify(), so that the
linear mapping requirement is always satisfied, regardless of the caller.
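For context, the BUG() fires in the CONFIG_DEBUG_SG sanity check of
sg_set_buf(), which rejects any buffer that is not part of the linear
mapping. A paraphrased sketch of that helper (see include/linux/scatterlist.h
for the real definition):

  static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
                                unsigned int buflen)
  {
  #ifdef CONFIG_DEBUG_SG
          BUG_ON(!virt_addr_valid(buf)); /* a vmalloc'ed stack address trips here */
  #endif
          sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
  }

kmemdup() returns slab memory, which is always part of the linear mapping,
hence the copies made below.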
Cc: stable(a)vger.kernel.org # 4.9.x
Fixes: ba14a194a434 ("fork: Add generic vmalloced stack support")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
security/integrity/digsig_asymmetric.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/security/integrity/digsig_asymmetric.c b/security/integrity/digsig_asymmetric.c
index 895f4b9ce8c6..635238d5c7fe 100644
--- a/security/integrity/digsig_asymmetric.c
+++ b/security/integrity/digsig_asymmetric.c
@@ -122,11 +122,26 @@ int asymmetric_verify(struct key *keyring, const char *sig,
goto out;
}
- pks.digest = (u8 *)data;
+ pks.digest = kmemdup(data, datalen, GFP_KERNEL);
+ if (!pks.digest) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
pks.digest_size = datalen;
- pks.s = hdr->sig;
+
+ pks.s = kmemdup(hdr->sig, siglen, GFP_KERNEL);
+ if (!pks.s) {
+ kfree(pks.digest);
+ ret = -ENOMEM;
+ goto out;
+ }
+
pks.s_size = siglen;
+
ret = verify_signature(key, &pks);
+ kfree(pks.digest);
+ kfree(pks.s);
out:
key_put(key);
pr_debug("%s() = %d\n", __func__, ret);
--
2.25.1
Since commit 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown"),
we wait for all workloads to complete during shutdown. This was done to
avoid a stall once the device is started again.
Unfortunately, this has the side effect of stalling kexec() if userspace is
frozen. Let's handle that case.
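Patch 1 adds a small helper to detect this situation. Assuming it simply
combines the existing kexec_in_progress and pm_freezing flags (see the patch
itself for the real implementation), it boils down to something like:

  /* Sketch under the assumption stated above; not the actual patch. */
  bool kexec_with_frozen_processes(void)
  {
  #ifdef CONFIG_FREEZER
          return kexec_in_progress && pm_freezing;
  #else
          return false;
  #endif
  }

Patch 2 then uses it in the SOF shutdown path to avoid waiting on a userspace
that can never respond.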
To: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
To: Liam Girdwood <lgirdwood(a)gmail.com>
To: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
To: Bard Liao <yung-chuan.liao(a)linux.intel.com>
To: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
To: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
To: Daniel Baluta <daniel.baluta(a)nxp.com>
To: Mark Brown <broonie(a)kernel.org>
To: Jaroslav Kysela <perex(a)perex.cz>
To: Takashi Iwai <tiwai(a)suse.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Chromeos Kdump <chromeos-kdump(a)google.com>
To: Steven Rostedt <rostedt(a)goodmis.org>
Cc: stable(a)vger.kernel.org
Cc: sound-open-firmware(a)alsa-project.org
Cc: alsa-devel(a)alsa-project.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kexec(a)lists.infradead.org
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v6:
- Check if we are in kexec with the userspace frozen.
- Link to v5: https://lore.kernel.org/r/20221127-snd-freeze-v5-0-4ededeb08ba0@chromium.org
Changes in v5:
- Edit subject prefix.
- Link to v4: https://lore.kernel.org/r/20221127-snd-freeze-v4-0-51ca64b7f2ab@chromium.org
Changes in v4:
- Do not call snd_sof_machine_unregister from shutdown.
- Link to v3: https://lore.kernel.org/r/20221127-snd-freeze-v3-0-a2eda731ca14@chromium.org
Changes in v3:
- Wrap pm_freezing in a function.
- Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@chromium.org
Changes in v2:
- Only use pm_freezing if CONFIG_FREEZER .
- Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@chromium.org
---
Ricardo Ribalda (2):
kexec: Introduce kexec_with_frozen_processes
ASoC: SOF: Fix deadlock when shutdown a frozen userspace
include/linux/kexec.h | 3 +++
kernel/kexec_core.c | 5 +++++
sound/soc/sof/core.c | 4 +++-
3 files changed, 11 insertions(+), 1 deletion(-)
---
base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
change-id: 20221127-snd-freeze-1ee143228326
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
During kexec(), userspace is frozen, so we cannot wait for it to
complete.
Avoid running snd_sof_machine_unregister() during shutdown.
This fixes:
[ 84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done.
[ 246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds.
[ 246.819035] Call Trace:
[ 246.821782] <TASK>
[ 246.824186] __schedule+0x5f9/0x1263
[ 246.828231] schedule+0x87/0xc5
[ 246.831779] snd_card_disconnect_sync+0xb5/0x127
...
[ 246.889249] snd_sof_device_shutdown+0xb4/0x150
[ 246.899317] pci_device_shutdown+0x37/0x61
[ 246.903990] device_shutdown+0x14c/0x1d6
[ 246.908391] kernel_kexec+0x45/0xb9
And:
[ 246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds.
[ 246.927709] Call Trace:
[ 246.930461] <TASK>
[ 246.932819] __schedule+0x5f9/0x1263
[ 246.936855] ? fsnotify_grab_connector+0x5c/0x70
[ 246.942045] schedule+0x87/0xc5
[ 246.945567] schedule_timeout+0x49/0xf3
[ 246.949877] wait_for_completion+0x86/0xe8
[ 246.954463] snd_card_free+0x68/0x89
...
[ 247.001080] platform_device_unregister+0x12/0x35
Cc: stable(a)vger.kernel.org
Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
To: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
To: Liam Girdwood <lgirdwood(a)gmail.com>
To: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
To: Bard Liao <yung-chuan.liao(a)linux.intel.com>
To: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
To: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
To: Daniel Baluta <daniel.baluta(a)nxp.com>
To: Mark Brown <broonie(a)kernel.org>
To: Jaroslav Kysela <perex(a)perex.cz>
To: Takashi Iwai <tiwai(a)suse.com>
Cc: sound-open-firmware(a)alsa-project.org
Cc: alsa-devel(a)alsa-project.org
Cc: linux-kernel(a)vger.kernel.org
---
Changes in v4:
- Do not call snd_sof_machine_unregister from shutdown.
- Link to v3: https://lore.kernel.org/r/20221127-snd-freeze-v3-0-a2eda731ca14@chromium.org
Changes in v3:
- Wrap pm_freezing in a function
- Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@chromium.org
Changes in v2:
- Only use pm_freezing if CONFIG_FREEZER
- Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@chromium.org
---
sound/soc/sof/core.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c
index 3e6141d03770..9616ba607ded 100644
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -475,19 +475,16 @@ EXPORT_SYMBOL(snd_sof_device_remove);
int snd_sof_device_shutdown(struct device *dev)
{
struct snd_sof_dev *sdev = dev_get_drvdata(dev);
- struct snd_sof_pdata *pdata = sdev->pdata;
if (IS_ENABLED(CONFIG_SND_SOC_SOF_PROBE_WORK_QUEUE))
cancel_work_sync(&sdev->probe_work);
/*
- * make sure clients and machine driver(s) are unregistered to force
- * all userspace devices to be closed prior to the DSP shutdown sequence
+ * make sure clients are unregistered prior to the DSP shutdown
+ * sequence.
*/
sof_unregister_clients(sdev);
- snd_sof_machine_unregister(sdev, pdata);
-
if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)
return snd_sof_shutdown(sdev);
---
base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
change-id: 20221127-snd-freeze-1ee143228326
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
For Lexicon I-ONIX FW810S, a call to ioctl(2) with
SNDRV_PCM_IOCTL_HW_PARAMS can return -ETIMEDOUT. This is a regression due
to commit 41319eb56e19 ("ALSA: dice: wait just for
NOTIFY_CLOCK_ACCEPTED after GLOBAL_CLOCK_SELECT operation"). The device
does not emit the NOTIFY_CLOCK_ACCEPTED notification when it accepts a
GLOBAL_CLOCK_SELECT operation with the same parameters as the current ones.
This commit fixes the regression: when no notification arrives, return
-ETIMEDOUT only if the operation actually requested a change.
Fixes: 41319eb56e19 ("ALSA: dice: wait just for NOTIFY_CLOCK_ACCEPTED after GLOBAL_CLOCK_SELECT operation")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
---
sound/firewire/dice/dice-stream.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/sound/firewire/dice/dice-stream.c b/sound/firewire/dice/dice-stream.c
index f99e00083141..4c677c8546c7 100644
--- a/sound/firewire/dice/dice-stream.c
+++ b/sound/firewire/dice/dice-stream.c
@@ -59,7 +59,7 @@ int snd_dice_stream_get_rate_mode(struct snd_dice *dice, unsigned int rate,
static int select_clock(struct snd_dice *dice, unsigned int rate)
{
- __be32 reg;
+ __be32 reg, new;
u32 data;
int i;
int err;
@@ -83,15 +83,17 @@ static int select_clock(struct snd_dice *dice, unsigned int rate)
if (completion_done(&dice->clock_accepted))
reinit_completion(&dice->clock_accepted);
- reg = cpu_to_be32(data);
+ new = cpu_to_be32(data);
err = snd_dice_transaction_write_global(dice, GLOBAL_CLOCK_SELECT,
- &reg, sizeof(reg));
+ &new, sizeof(new));
if (err < 0)
return err;
if (wait_for_completion_timeout(&dice->clock_accepted,
- msecs_to_jiffies(NOTIFICATION_TIMEOUT_MS)) == 0)
- return -ETIMEDOUT;
+ msecs_to_jiffies(NOTIFICATION_TIMEOUT_MS)) == 0) {
+ if (reg != new)
+ return -ETIMEDOUT;
+ }
return 0;
}
--
2.37.2
commit 3ce00bb7e91cf57d723905371507af57182c37ef upstream.
Since commit 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr
dereference") binder caches a pointer to current->mm during open().
This fixes a null-ptr dereference reported by syzkaller. Unfortunately,
it also opens the door for a process to update its mm after the open()
(e.g. via execve), making the cached alloc->mm pointer invalid.
Things get worse when the process goes on to mmap() a vma. From this
point forward, binder will attempt to find this vma using an obsolete
alloc->mm reference, such as in binder_update_page_range(), where the
wrong vma is obtained via vma_lookup(), yet binder proceeds to happily
insert new pages into it.
To avoid this issue fail the ->mmap() callback if we detect a mismatch
between the vma->vm_mm and the original alloc->mm pointer. This prevents
alloc->vm_addr from getting set, so that any subsequent vma_lookup()
calls fail as expected.
Fixes: 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.15+
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20221104231235.348958-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[cmllamas: renamed alloc->mm since missing e66b77e50522]
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
drivers/android/binder_alloc.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 8ed450125c92..6acfb896b2e5 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -753,6 +753,12 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
const char *failure_string;
struct binder_buffer *buffer;
+ if (unlikely(vma->vm_mm != alloc->vma_vm_mm)) {
+ ret = -EINVAL;
+ failure_string = "invalid vma->vm_mm";
+ goto err_invalid_mm;
+ }
+
mutex_lock(&binder_alloc_mmap_lock);
if (alloc->buffer_size) {
ret = -EBUSY;
@@ -799,6 +805,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
alloc->buffer_size = 0;
err_already_mapped:
mutex_unlock(&binder_alloc_mmap_lock);
+err_invalid_mm:
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%s: %d %lx-%lx %s failed %d\n", __func__,
alloc->pid, vma->vm_start, vma->vm_end,
--
2.38.1.584.g0f3c55d4c2-goog
commit 3ce00bb7e91cf57d723905371507af57182c37ef upstream.
Since commit 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr
dereference") binder caches a pointer to current->mm during open().
This fixes a null-ptr dereference reported by syzkaller. Unfortunately,
it also opens the door for a process to update its mm after the open()
(e.g. via execve), making the cached alloc->mm pointer invalid.
Things get worse when the process goes on to mmap() a vma. From this
point forward, binder will attempt to find this vma using an obsolete
alloc->mm reference, such as in binder_update_page_range(), where the
wrong vma is obtained via vma_lookup(), yet binder proceeds to happily
insert new pages into it.
To avoid this issue fail the ->mmap() callback if we detect a mismatch
between the vma->vm_mm and the original alloc->mm pointer. This prevents
alloc->vm_addr from getting set, so that any subsequent vma_lookup()
calls fail as expected.
Fixes: 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.15+
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20221104231235.348958-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[cmllamas: renamed alloc->mm since missing e66b77e50522]
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
drivers/android/binder_alloc.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 9b1778c00610..64999777e0bf 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -760,6 +760,12 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
const char *failure_string;
struct binder_buffer *buffer;
+ if (unlikely(vma->vm_mm != alloc->vma_vm_mm)) {
+ ret = -EINVAL;
+ failure_string = "invalid vma->vm_mm";
+ goto err_invalid_mm;
+ }
+
mutex_lock(&binder_alloc_mmap_lock);
if (alloc->buffer_size) {
ret = -EBUSY;
@@ -806,6 +812,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
alloc->buffer_size = 0;
err_already_mapped:
mutex_unlock(&binder_alloc_mmap_lock);
+err_invalid_mm:
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%s: %d %lx-%lx %s failed %d\n", __func__,
alloc->pid, vma->vm_start, vma->vm_end,
--
2.38.1.584.g0f3c55d4c2-goog
From: Rob Clark <robdclark(a)chromium.org>
vm_open() is not allowed to fail. Fortunately we are guaranteed that
the pages are already pinned, and only need to increment the refcnt. So
just increment it directly.
Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 110a9eac2af8..9885ba64127f 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -571,12 +571,20 @@ static void drm_gem_shmem_vm_open(struct vm_area_struct *vma)
{
struct drm_gem_object *obj = vma->vm_private_data;
struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
- int ret;
WARN_ON(shmem->base.import_attach);
- ret = drm_gem_shmem_get_pages(shmem);
- WARN_ON_ONCE(ret != 0);
+ mutex_lock(&shmem->pages_lock);
+
+ /*
+ * We should have already pinned the pages, vm_open() just grabs
+ * an additional reference for the new mm the vma is getting
+ * copied into.
+ */
+ WARN_ON_ONCE(!shmem->pages_use_count);
+
+ shmem->pages_use_count++;
+ mutex_unlock(&shmem->pages_lock);
drm_gem_vm_open(vma);
}
--
2.38.1
During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded, which causes the init function of the new module to
return an error.
Since commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for
modules that have finished loading"), the kernel waits for modules in
MODULE_STATE_GOING state to finish unloading before making another
attempt to load the same module.
This creates unnecessary work in the described scenario and delays the
boot. In the worst case, it can prevent udev from loading drivers for
other devices and might cause timeouts of services waiting on them and
subsequently a failed boot.
This patch attempts a different solution for the problem 6e6de3dee51a
was trying to solve. Rather than waiting for the unloading to complete,
it returns a different error code (-EBUSY) for modules in the GOING
state. This should avoid the error situation that was described in
6e6de3dee51a (user space attempting to load a dependent module because
the -EEXIST error code would suggest to user space that the first module
had been loaded successfully), while avoiding the delay situation too.
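To illustrate the intended effect on callers -- this is not part of the patch,
and how a loader reacts to the new code is only an assumption -- user space can
now tell "already loaded" (-EEXIST) apart from "still unloading" (-EBUSY) and
retry or give up instead of blocking inside the kernel:

  /* Hypothetical user-space sketch, not part of this patch. */
  #include <errno.h>
  #include <fcntl.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int try_load(const char *path)
  {
          int fd = open(path, O_RDONLY | O_CLOEXEC);
          int err = 0;

          if (fd < 0)
                  return -errno;
          if (syscall(SYS_finit_module, fd, "", 0) != 0)
                  err = errno;
          close(fd);

          if (err == 0 || err == EEXIST)
                  return 0;       /* loaded now, or already live */
          if (err == EBUSY)
                  return -EBUSY;  /* same module still going away; retry later */
          return -err;
  }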
Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Co-developed-by: Martin Wilck <mwilck(a)suse.com>
Signed-off-by: Martin Wilck <mwilck(a)suse.com>
Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com>
Cc: stable(a)vger.kernel.org
---
Notes:
Sending this alternative patch per the discussion in
https://lore.kernel.org/linux-modules/20220919123233.8538-1-petr.pavlu@suse….
The initial version comes internally from Martin, hence the co-developed tag.
kernel/module/main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d02d39c7174e..b7e08d1edc27 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2386,7 +2386,8 @@ static bool finished_loading(const char *name)
sched_annotate_sleep();
mutex_lock(&module_mutex);
mod = find_module_all(name, strlen(name), true);
- ret = !mod || mod->state == MODULE_STATE_LIVE;
+ ret = !mod || mod->state == MODULE_STATE_LIVE
+ || mod->state == MODULE_STATE_GOING;
mutex_unlock(&module_mutex);
return ret;
@@ -2566,7 +2567,8 @@ static int add_unformed_module(struct module *mod)
mutex_lock(&module_mutex);
old = find_module_all(mod->name, strlen(mod->name), true);
if (old != NULL) {
- if (old->state != MODULE_STATE_LIVE) {
+ if (old->state == MODULE_STATE_COMING
+ || old->state == MODULE_STATE_UNFORMED) {
/* Wait in case it fails to load. */
mutex_unlock(&module_mutex);
err = wait_event_interruptible(module_wq,
@@ -2575,7 +2577,7 @@ static int add_unformed_module(struct module *mod)
goto out_unlocked;
goto again;
}
- err = -EEXIST;
+ err = old->state != MODULE_STATE_LIVE ? -EBUSY : -EEXIST;
goto out;
}
mod_update_bounds(mod);
--
2.35.3
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4dbd6a3e90e0 ("x86/ioremap: Fix page aligned size calculation in __ioremap_caller()")
ecdd6ee77b73 ("x86/mm/pat: Standardize on memtype_*() prefix for APIs")
f9b57cf80c8b ("x86/mm/pat: Move the memtype related files to arch/x86/mm/pat/")
baf65855baac ("x86/mm/pat: Harmonize 'struct memtype *' local variable and function parameter use")
ef35b0fcee23 ("x86/mm/pat: Create fixed width output in /sys/kernel/debug/x86/pat_memtype_list, similar to the E820 debug printouts")
aee7f91369a8 ("x86/mm/pat: Update the comments in pat.c and pat_interval.c and refresh the code a bit")
91298f1a302d ("x86/mm/pat: Fix off-by-one bugs in interval tree search")
1c134b198daa ("Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4dbd6a3e90e03130973688fd79e19425f720d999 Mon Sep 17 00:00:00 2001
From: Michael Kelley <mikelley(a)microsoft.com>
Date: Wed, 16 Nov 2022 10:41:24 -0800
Subject: [PATCH] x86/ioremap: Fix page aligned size calculation in
__ioremap_caller()
Current code re-calculates the size after aligning the starting and
ending physical addresses on a page boundary. But the re-calculation
also embeds the masking of high order bits that exceed the size of
the physical address space (via PHYSICAL_PAGE_MASK). If the masking
removes any high order bits, the size calculation results in a huge
value that is likely to immediately fail.
Fix this by re-calculating the page-aligned size first. Then mask any
high order bits using PHYSICAL_PAGE_MASK.
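To make the failure mode concrete, here is a worked example with made-up
numbers (assuming, purely for illustration, that PHYSICAL_PAGE_MASK clears
bit 51, e.g. a memory-encryption bit):

  /* Illustrative values only. */
  phys_addr = (1UL << 51) | 0x1000;             /* caller passes encrypted address */
  last_addr = phys_addr + 0x2000 - 1;           /* requested size: 0x2000 */

  /* Old code: mask first, then compute the size. */
  phys_addr &= PHYSICAL_PAGE_MASK;              /* -> 0x1000, bit 51 dropped */
  size = PAGE_ALIGN(last_addr + 1) - phys_addr; /* -> ~2^51, absurdly large  */

  /*
   * New code: compute the size while both values still carry bit 51 so it
   * cancels out, then strip the extra bit from phys_addr.
   */
  phys_addr &= PAGE_MASK;                       /* bit 51 kept               */
  size = PAGE_ALIGN(last_addr + 1) - phys_addr; /* -> 0x2000                 */
  phys_addr &= PHYSICAL_PAGE_MASK;              /* -> 0x1000                 */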
Fixes: ffa71f33a820 ("x86, ioremap: Fix incorrect physical address handling in PAE mode")
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/1668624097-14884-2-git-send-email-mikelley@micros…
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 78c5bc654cff..6453fbaedb08 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -217,9 +217,15 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size,
* Mappings have to be page-aligned
*/
offset = phys_addr & ~PAGE_MASK;
- phys_addr &= PHYSICAL_PAGE_MASK;
+ phys_addr &= PAGE_MASK;
size = PAGE_ALIGN(last_addr+1) - phys_addr;
+ /*
+ * Mask out any bits not part of the actual physical
+ * address, like memory encryption bits.
+ */
+ phys_addr &= PHYSICAL_PAGE_MASK;
+
retval = memtype_reserve(phys_addr, (u64)phys_addr + size,
pcm, &new_pcm);
if (retval) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4dbd6a3e90e0 ("x86/ioremap: Fix page aligned size calculation in __ioremap_caller()")
ecdd6ee77b73 ("x86/mm/pat: Standardize on memtype_*() prefix for APIs")
f9b57cf80c8b ("x86/mm/pat: Move the memtype related files to arch/x86/mm/pat/")
baf65855baac ("x86/mm/pat: Harmonize 'struct memtype *' local variable and function parameter use")
ef35b0fcee23 ("x86/mm/pat: Create fixed width output in /sys/kernel/debug/x86/pat_memtype_list, similar to the E820 debug printouts")
aee7f91369a8 ("x86/mm/pat: Update the comments in pat.c and pat_interval.c and refresh the code a bit")
91298f1a302d ("x86/mm/pat: Fix off-by-one bugs in interval tree search")
1c134b198daa ("Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4dbd6a3e90e03130973688fd79e19425f720d999 Mon Sep 17 00:00:00 2001
From: Michael Kelley <mikelley(a)microsoft.com>
Date: Wed, 16 Nov 2022 10:41:24 -0800
Subject: [PATCH] x86/ioremap: Fix page aligned size calculation in
__ioremap_caller()
Current code re-calculates the size after aligning the starting and
ending physical addresses on a page boundary. But the re-calculation
also embeds the masking of high order bits that exceed the size of
the physical address space (via PHYSICAL_PAGE_MASK). If the masking
removes any high order bits, the size calculation results in a huge
value that is likely to immediately fail.
Fix this by re-calculating the page-aligned size first. Then mask any
high order bits using PHYSICAL_PAGE_MASK.
Fixes: ffa71f33a820 ("x86, ioremap: Fix incorrect physical address handling in PAE mode")
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/1668624097-14884-2-git-send-email-mikelley@micros…
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 78c5bc654cff..6453fbaedb08 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -217,9 +217,15 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size,
* Mappings have to be page-aligned
*/
offset = phys_addr & ~PAGE_MASK;
- phys_addr &= PHYSICAL_PAGE_MASK;
+ phys_addr &= PAGE_MASK;
size = PAGE_ALIGN(last_addr+1) - phys_addr;
+ /*
+ * Mask out any bits not part of the actual physical
+ * address, like memory encryption bits.
+ */
+ phys_addr &= PHYSICAL_PAGE_MASK;
+
retval = memtype_reserve(phys_addr, (u64)phys_addr + size,
pcm, &new_pcm);
if (retval) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4dbd6a3e90e0 ("x86/ioremap: Fix page aligned size calculation in __ioremap_caller()")
ecdd6ee77b73 ("x86/mm/pat: Standardize on memtype_*() prefix for APIs")
f9b57cf80c8b ("x86/mm/pat: Move the memtype related files to arch/x86/mm/pat/")
baf65855baac ("x86/mm/pat: Harmonize 'struct memtype *' local variable and function parameter use")
ef35b0fcee23 ("x86/mm/pat: Create fixed width output in /sys/kernel/debug/x86/pat_memtype_list, similar to the E820 debug printouts")
aee7f91369a8 ("x86/mm/pat: Update the comments in pat.c and pat_interval.c and refresh the code a bit")
91298f1a302d ("x86/mm/pat: Fix off-by-one bugs in interval tree search")
1c134b198daa ("Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4dbd6a3e90e03130973688fd79e19425f720d999 Mon Sep 17 00:00:00 2001
From: Michael Kelley <mikelley(a)microsoft.com>
Date: Wed, 16 Nov 2022 10:41:24 -0800
Subject: [PATCH] x86/ioremap: Fix page aligned size calculation in
__ioremap_caller()
Current code re-calculates the size after aligning the starting and
ending physical addresses on a page boundary. But the re-calculation
also embeds the masking of high order bits that exceed the size of
the physical address space (via PHYSICAL_PAGE_MASK). If the masking
removes any high order bits, the size calculation results in a huge
value that is likely to immediately fail.
Fix this by re-calculating the page-aligned size first. Then mask any
high order bits using PHYSICAL_PAGE_MASK.
Fixes: ffa71f33a820 ("x86, ioremap: Fix incorrect physical address handling in PAE mode")
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/1668624097-14884-2-git-send-email-mikelley@micros…
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 78c5bc654cff..6453fbaedb08 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -217,9 +217,15 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size,
* Mappings have to be page-aligned
*/
offset = phys_addr & ~PAGE_MASK;
- phys_addr &= PHYSICAL_PAGE_MASK;
+ phys_addr &= PAGE_MASK;
size = PAGE_ALIGN(last_addr+1) - phys_addr;
+ /*
+ * Mask out any bits not part of the actual physical
+ * address, like memory encryption bits.
+ */
+ phys_addr &= PHYSICAL_PAGE_MASK;
+
retval = memtype_reserve(phys_addr, (u64)phys_addr + size,
pcm, &new_pcm);
if (retval) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4dbd6a3e90e0 ("x86/ioremap: Fix page aligned size calculation in __ioremap_caller()")
ecdd6ee77b73 ("x86/mm/pat: Standardize on memtype_*() prefix for APIs")
f9b57cf80c8b ("x86/mm/pat: Move the memtype related files to arch/x86/mm/pat/")
baf65855baac ("x86/mm/pat: Harmonize 'struct memtype *' local variable and function parameter use")
ef35b0fcee23 ("x86/mm/pat: Create fixed width output in /sys/kernel/debug/x86/pat_memtype_list, similar to the E820 debug printouts")
aee7f91369a8 ("x86/mm/pat: Update the comments in pat.c and pat_interval.c and refresh the code a bit")
91298f1a302d ("x86/mm/pat: Fix off-by-one bugs in interval tree search")
1c134b198daa ("Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4dbd6a3e90e03130973688fd79e19425f720d999 Mon Sep 17 00:00:00 2001
From: Michael Kelley <mikelley(a)microsoft.com>
Date: Wed, 16 Nov 2022 10:41:24 -0800
Subject: [PATCH] x86/ioremap: Fix page aligned size calculation in
__ioremap_caller()
Current code re-calculates the size after aligning the starting and
ending physical addresses on a page boundary. But the re-calculation
also embeds the masking of high order bits that exceed the size of
the physical address space (via PHYSICAL_PAGE_MASK). If the masking
removes any high order bits, the size calculation results in a huge
value that is likely to immediately fail.
Fix this by re-calculating the page-aligned size first. Then mask any
high order bits using PHYSICAL_PAGE_MASK.
Fixes: ffa71f33a820 ("x86, ioremap: Fix incorrect physical address handling in PAE mode")
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/1668624097-14884-2-git-send-email-mikelley@micros…
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 78c5bc654cff..6453fbaedb08 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -217,9 +217,15 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size,
* Mappings have to be page-aligned
*/
offset = phys_addr & ~PAGE_MASK;
- phys_addr &= PHYSICAL_PAGE_MASK;
+ phys_addr &= PAGE_MASK;
size = PAGE_ALIGN(last_addr+1) - phys_addr;
+ /*
+ * Mask out any bits not part of the actual physical
+ * address, like memory encryption bits.
+ */
+ phys_addr &= PHYSICAL_PAGE_MASK;
+
retval = memtype_reserve(phys_addr, (u64)phys_addr + size,
pcm, &new_pcm);
if (retval) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
16ae56d7e052 ("KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 16ae56d7e0528559bf8dc9070e3bfd8ba3de80df Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:44 +0200
Subject: [PATCH] KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02
while still in use
Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.
This is a minimal fix for CVE-2022-3344, making the kernel print a warning
instead of a kernel panic.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-3-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c620999d230..b02a3a1792f1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,6 +1125,9 @@ void svm_free_nested(struct vcpu_svm *svm)
if (!svm->nested.initialized)
return;
+ if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
+ svm_switch_vmcb(svm, &svm->vmcb01);
+
svm_vcpu_free_msrpm(svm->nested.msrpm);
svm->nested.msrpm = NULL;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
05311ce954ae ("KVM: x86: remove exit_int_info warning in svm_handle_exit")
404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
dcf068da7eb2 ("KVM: VMX: Introduce generic fastpath handler")
a9ab13ff6e84 ("KVM: X86: Improve latency for single target IPI fastpath")
873e1da16918 ("KVM: VMX: Optimize handling of VM-Entry failures in vmx_vcpu_run()")
e64419d991ea ("KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook")
56a87e5d997b ("KVM: SVM: Fix __svm_vcpu_run declaration.")
199cd1d7b534 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
eaf78265a4ab ("KVM: SVM: Move SEV code to separate file")
ef0f64960d01 ("KVM: SVM: Move AVIC code to separate file")
883b0a91f41a ("KVM: SVM: Move Nested SVM Implementation to nested.c")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
8c1b724ddb21 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 05311ce954aebe75935d9ae7d38ac82b5b796e33 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:51 +0200
Subject: [PATCH] KVM: x86: remove exit_int_info warning in svm_handle_exit
It is valid to receive an external interrupt and have a broken IDT entry,
which will lead to a #GP with exit_int_info that will contain the index of
the IDT entry (e.g. any value).
Other exceptions can happen as well, like #NP or #SS
(if stack switch fails).
Thus this warning can be user triggered and has very little value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-10-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 098f04bec8ef..c0950ae86b2b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -346,12 +346,6 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
return 0;
}
-static int is_external_interrupt(u32 info)
-{
- info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
- return info == (SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR);
-}
-
static u32 svm_get_interrupt_shadow(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3426,15 +3420,6 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return 0;
}
- if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
- exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
- exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
- exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
- printk(KERN_ERR "%s: unexpected exit_int_info 0x%x "
- "exit_code 0x%x\n",
- __func__, svm->vmcb->control.exit_int_info,
- exit_code);
-
if (exit_fastpath != EXIT_FASTPATH_NONE)
return 1;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
05311ce954ae ("KVM: x86: remove exit_int_info warning in svm_handle_exit")
404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
dcf068da7eb2 ("KVM: VMX: Introduce generic fastpath handler")
a9ab13ff6e84 ("KVM: X86: Improve latency for single target IPI fastpath")
873e1da16918 ("KVM: VMX: Optimize handling of VM-Entry failures in vmx_vcpu_run()")
e64419d991ea ("KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook")
56a87e5d997b ("KVM: SVM: Fix __svm_vcpu_run declaration.")
199cd1d7b534 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
eaf78265a4ab ("KVM: SVM: Move SEV code to separate file")
ef0f64960d01 ("KVM: SVM: Move AVIC code to separate file")
883b0a91f41a ("KVM: SVM: Move Nested SVM Implementation to nested.c")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
8c1b724ddb21 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 05311ce954aebe75935d9ae7d38ac82b5b796e33 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:51 +0200
Subject: [PATCH] KVM: x86: remove exit_int_info warning in svm_handle_exit
It is valid to receive an external interrupt and have a broken IDT entry,
which will lead to a #GP with exit_int_info that will contain the index of
the IDT entry (e.g. any value).
Other exceptions can happen as well, like #NP or #SS
(if stack switch fails).
Thus this warning can be user triggered and has very little value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-10-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 098f04bec8ef..c0950ae86b2b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -346,12 +346,6 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
return 0;
}
-static int is_external_interrupt(u32 info)
-{
- info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
- return info == (SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR);
-}
-
static u32 svm_get_interrupt_shadow(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3426,15 +3420,6 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return 0;
}
- if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
- exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
- exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
- exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
- printk(KERN_ERR "%s: unexpected exit_int_info 0x%x "
- "exit_code 0x%x\n",
- __func__, svm->vmcb->control.exit_int_info,
- exit_code);
-
if (exit_fastpath != EXIT_FASTPATH_NONE)
return 1;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
05311ce954ae ("KVM: x86: remove exit_int_info warning in svm_handle_exit")
404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
dcf068da7eb2 ("KVM: VMX: Introduce generic fastpath handler")
a9ab13ff6e84 ("KVM: X86: Improve latency for single target IPI fastpath")
873e1da16918 ("KVM: VMX: Optimize handling of VM-Entry failures in vmx_vcpu_run()")
e64419d991ea ("KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook")
56a87e5d997b ("KVM: SVM: Fix __svm_vcpu_run declaration.")
199cd1d7b534 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
eaf78265a4ab ("KVM: SVM: Move SEV code to separate file")
ef0f64960d01 ("KVM: SVM: Move AVIC code to separate file")
883b0a91f41a ("KVM: SVM: Move Nested SVM Implementation to nested.c")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
8c1b724ddb21 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 05311ce954aebe75935d9ae7d38ac82b5b796e33 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:51 +0200
Subject: [PATCH] KVM: x86: remove exit_int_info warning in svm_handle_exit
It is valid to receive an external interrupt and have a broken IDT entry,
which will lead to a #GP with exit_int_info that will contain the index of
the IDT entry (e.g. any value).
Other exceptions can happen as well, like #NP or #SS
(if stack switch fails).
Thus this warning can be user triggered and has very little value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-10-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 098f04bec8ef..c0950ae86b2b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -346,12 +346,6 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
return 0;
}
-static int is_external_interrupt(u32 info)
-{
- info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
- return info == (SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR);
-}
-
static u32 svm_get_interrupt_shadow(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3426,15 +3420,6 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return 0;
}
- if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
- exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
- exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
- exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
- printk(KERN_ERR "%s: unexpected exit_int_info 0x%x "
- "exit_code 0x%x\n",
- __func__, svm->vmcb->control.exit_int_info,
- exit_code);
-
if (exit_fastpath != EXIT_FASTPATH_NONE)
return 1;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
05311ce954ae ("KVM: x86: remove exit_int_info warning in svm_handle_exit")
404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
dcf068da7eb2 ("KVM: VMX: Introduce generic fastpath handler")
a9ab13ff6e84 ("KVM: X86: Improve latency for single target IPI fastpath")
873e1da16918 ("KVM: VMX: Optimize handling of VM-Entry failures in vmx_vcpu_run()")
e64419d991ea ("KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook")
56a87e5d997b ("KVM: SVM: Fix __svm_vcpu_run declaration.")
199cd1d7b534 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
eaf78265a4ab ("KVM: SVM: Move SEV code to separate file")
ef0f64960d01 ("KVM: SVM: Move AVIC code to separate file")
883b0a91f41a ("KVM: SVM: Move Nested SVM Implementation to nested.c")
46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory")
8c1b724ddb21 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 05311ce954aebe75935d9ae7d38ac82b5b796e33 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:51 +0200
Subject: [PATCH] KVM: x86: remove exit_int_info warning in svm_handle_exit
It is valid to receive an external interrupt and have a broken IDT entry,
which will lead to a #GP with exit_int_info that will contain the index of
the IDT entry (e.g. any value).
Other exceptions can happen as well, like #NP or #SS
(if stack switch fails).
Thus this warning can be user triggered and has very little value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-10-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 098f04bec8ef..c0950ae86b2b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -346,12 +346,6 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
return 0;
}
-static int is_external_interrupt(u32 info)
-{
- info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
- return info == (SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR);
-}
-
static u32 svm_get_interrupt_shadow(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3426,15 +3420,6 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return 0;
}
- if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
- exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
- exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
- exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
- printk(KERN_ERR "%s: unexpected exit_int_info 0x%x "
- "exit_code 0x%x\n",
- __func__, svm->vmcb->control.exit_int_info,
- exit_code);
-
if (exit_fastpath != EXIT_FASTPATH_NONE)
return 1;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9697df25143 ("KVM: x86: add kvm_leave_nested")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
b699da3dc279 ("Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9697df251438b0798780900e8b43bdb12a56d64 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:45 +0200
Subject: [PATCH] KVM: x86: add kvm_leave_nested
Add kvm_leave_nested(), which wraps the call to nested_ops->leave_nested
in a function.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-4-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b02a3a1792f1..7354f0035a69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1146,9 +1146,6 @@ void svm_free_nested(struct vcpu_svm *svm)
svm->nested.initialized = false;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void svm_leave_nested(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c62352dda6a..f7333b9cdfbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6440,9 +6440,6 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
return kvm_state.size;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void vmx_leave_nested(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ecea83f0da49..ff5be7189237 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -628,6 +628,12 @@ static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vecto
ex->payload = payload;
}
+/* Forcibly leave the nested mode in cases like a vCPU reset */
+static void kvm_leave_nested(struct kvm_vcpu *vcpu)
+{
+ kvm_x86_ops.nested_ops->leave_nested(vcpu);
+}
+
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
unsigned nr, bool has_error, u32 error_code,
bool has_payload, unsigned long payload, bool reinject)
@@ -5195,7 +5201,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) {
- kvm_x86_ops.nested_ops->leave_nested(vcpu);
+ kvm_leave_nested(vcpu);
kvm_smm_changed(vcpu, events->smi.smm);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9697df25143 ("KVM: x86: add kvm_leave_nested")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
b699da3dc279 ("Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9697df251438b0798780900e8b43bdb12a56d64 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:45 +0200
Subject: [PATCH] KVM: x86: add kvm_leave_nested
Add kvm_leave_nested(), which wraps the call to
nested_ops->leave_nested() in a function.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-4-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b02a3a1792f1..7354f0035a69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1146,9 +1146,6 @@ void svm_free_nested(struct vcpu_svm *svm)
svm->nested.initialized = false;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void svm_leave_nested(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c62352dda6a..f7333b9cdfbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6440,9 +6440,6 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
return kvm_state.size;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void vmx_leave_nested(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ecea83f0da49..ff5be7189237 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -628,6 +628,12 @@ static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vecto
ex->payload = payload;
}
+/* Forcibly leave the nested mode in cases like a vCPU reset */
+static void kvm_leave_nested(struct kvm_vcpu *vcpu)
+{
+ kvm_x86_ops.nested_ops->leave_nested(vcpu);
+}
+
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
unsigned nr, bool has_error, u32 error_code,
bool has_payload, unsigned long payload, bool reinject)
@@ -5195,7 +5201,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) {
- kvm_x86_ops.nested_ops->leave_nested(vcpu);
+ kvm_leave_nested(vcpu);
kvm_smm_changed(vcpu, events->smi.smm);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9697df25143 ("KVM: x86: add kvm_leave_nested")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
b699da3dc279 ("Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9697df251438b0798780900e8b43bdb12a56d64 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:45 +0200
Subject: [PATCH] KVM: x86: add kvm_leave_nested
Add kvm_leave_nested(), which wraps the call to
nested_ops->leave_nested() in a function.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-4-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b02a3a1792f1..7354f0035a69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1146,9 +1146,6 @@ void svm_free_nested(struct vcpu_svm *svm)
svm->nested.initialized = false;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void svm_leave_nested(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c62352dda6a..f7333b9cdfbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6440,9 +6440,6 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
return kvm_state.size;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void vmx_leave_nested(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ecea83f0da49..ff5be7189237 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -628,6 +628,12 @@ static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vecto
ex->payload = payload;
}
+/* Forcibly leave the nested mode in cases like a vCPU reset */
+static void kvm_leave_nested(struct kvm_vcpu *vcpu)
+{
+ kvm_x86_ops.nested_ops->leave_nested(vcpu);
+}
+
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
unsigned nr, bool has_error, u32 error_code,
bool has_payload, unsigned long payload, bool reinject)
@@ -5195,7 +5201,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) {
- kvm_x86_ops.nested_ops->leave_nested(vcpu);
+ kvm_leave_nested(vcpu);
kvm_smm_changed(vcpu, events->smi.smm);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9697df25143 ("KVM: x86: add kvm_leave_nested")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
b699da3dc279 ("Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9697df251438b0798780900e8b43bdb12a56d64 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:45 +0200
Subject: [PATCH] KVM: x86: add kvm_leave_nested
Add kvm_leave_nested(), which wraps the call to
nested_ops->leave_nested() in a function.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-4-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b02a3a1792f1..7354f0035a69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1146,9 +1146,6 @@ void svm_free_nested(struct vcpu_svm *svm)
svm->nested.initialized = false;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void svm_leave_nested(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c62352dda6a..f7333b9cdfbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6440,9 +6440,6 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
return kvm_state.size;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void vmx_leave_nested(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ecea83f0da49..ff5be7189237 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -628,6 +628,12 @@ static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vecto
ex->payload = payload;
}
+/* Forcibly leave the nested mode in cases like a vCPU reset */
+static void kvm_leave_nested(struct kvm_vcpu *vcpu)
+{
+ kvm_x86_ops.nested_ops->leave_nested(vcpu);
+}
+
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
unsigned nr, bool has_error, u32 error_code,
bool has_payload, unsigned long payload, bool reinject)
@@ -5195,7 +5201,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) {
- kvm_x86_ops.nested_ops->leave_nested(vcpu);
+ kvm_leave_nested(vcpu);
kvm_smm_changed(vcpu, events->smi.smm);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9697df25143 ("KVM: x86: add kvm_leave_nested")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
b699da3dc279 ("Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9697df251438b0798780900e8b43bdb12a56d64 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:45 +0200
Subject: [PATCH] KVM: x86: add kvm_leave_nested
Add kvm_leave_nested(), which wraps the call to
nested_ops->leave_nested() in a function.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-4-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b02a3a1792f1..7354f0035a69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1146,9 +1146,6 @@ void svm_free_nested(struct vcpu_svm *svm)
svm->nested.initialized = false;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void svm_leave_nested(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c62352dda6a..f7333b9cdfbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6440,9 +6440,6 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
return kvm_state.size;
}
-/*
- * Forcibly leave nested mode in order to be able to reset the VCPU later on.
- */
void vmx_leave_nested(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ecea83f0da49..ff5be7189237 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -628,6 +628,12 @@ static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vecto
ex->payload = payload;
}
+/* Forcibly leave the nested mode in cases like a vCPU reset */
+static void kvm_leave_nested(struct kvm_vcpu *vcpu)
+{
+ kvm_x86_ops.nested_ops->leave_nested(vcpu);
+}
+
static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
unsigned nr, bool has_error, u32 error_code,
bool has_payload, unsigned long payload, bool reinject)
@@ -5195,7 +5201,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) {
- kvm_x86_ops.nested_ops->leave_nested(vcpu);
+ kvm_leave_nested(vcpu);
kvm_smm_changed(vcpu, events->smi.smm);
}
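All of the rejects above are for the same mechanical refactor: the open-coded indirect call through nested_ops is wrapped in a named helper so that later fixes in this series can extend the "leave nested mode" path in one place. As a rough aid for anyone reworking the backport by hand, here is a minimal, self-contained C model of that refactor; the structures and names below are illustrative stand-ins, not the real KVM types.

#include <stdio.h>

/* Stand-in types; illustrative only, not the real KVM structures. */
struct kvm_vcpu { int in_guest_mode; };

struct kvm_nested_ops {
        void (*leave_nested)(struct kvm_vcpu *vcpu);
};

static void svm_leave_nested_impl(struct kvm_vcpu *vcpu)
{
        vcpu->in_guest_mode = 0;
        puts("left nested mode");
}

static struct kvm_nested_ops nested_ops = {
        .leave_nested = svm_leave_nested_impl,
};

/* The refactor: one named helper instead of an open-coded indirect call,
 * so follow-up fixes only have to touch this function. */
static void kvm_leave_nested(struct kvm_vcpu *vcpu)
{
        nested_ops.leave_nested(vcpu);
}

int main(void)
{
        struct kvm_vcpu vcpu = { .in_guest_mode = 1 };

        kvm_leave_nested(&vcpu);        /* was: nested_ops.leave_nested(&vcpu); */
        return vcpu.in_guest_mode;      /* 0 once nested mode has been left */
}

The functional requirement when redoing the hunk for an older tree is simply that the call site shown in the quoted hunk goes through the helper.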
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
917401f26a6a ("KVM: x86: nSVM: leave nested mode on vCPU free")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 917401f26a6af5756d89b550a8e1bd50cf42b07e Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:43 +0200
Subject: [PATCH] KVM: x86: nSVM: leave nested mode on vCPU free
If the VM was terminated while nested, we free the nested state
while the vCPU is still in nested mode.
Soon a warning will be added for this condition.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-2-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9f88c8e6766e..098f04bec8ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1438,6 +1438,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
*/
svm_clear_current_vmcb(svm->vmcb);
+ svm_leave_nested(vcpu);
svm_free_nested(svm);
sev_free_vcpu(vcpu);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
917401f26a6a ("KVM: x86: nSVM: leave nested mode on vCPU free")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 917401f26a6af5756d89b550a8e1bd50cf42b07e Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:43 +0200
Subject: [PATCH] KVM: x86: nSVM: leave nested mode on vCPU free
If the VM was terminated while nested, we free the nested state
while the vCPU is still in nested mode.
Soon a warning will be added for this condition.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-2-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9f88c8e6766e..098f04bec8ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1438,6 +1438,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
*/
svm_clear_current_vmcb(svm->vmcb);
+ svm_leave_nested(vcpu);
svm_free_nested(svm);
sev_free_vcpu(vcpu);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
917401f26a6a ("KVM: x86: nSVM: leave nested mode on vCPU free")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 917401f26a6af5756d89b550a8e1bd50cf42b07e Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:43 +0200
Subject: [PATCH] KVM: x86: nSVM: leave nested mode on vCPU free
If the VM was terminated while nested, we free the nested state
while the vCPU is still in nested mode.
Soon a warning will be added for this condition.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-2-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9f88c8e6766e..098f04bec8ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1438,6 +1438,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
*/
svm_clear_current_vmcb(svm->vmcb);
+ svm_leave_nested(vcpu);
svm_free_nested(svm);
sev_free_vcpu(vcpu);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
917401f26a6a ("KVM: x86: nSVM: leave nested mode on vCPU free")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 917401f26a6af5756d89b550a8e1bd50cf42b07e Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:43 +0200
Subject: [PATCH] KVM: x86: nSVM: leave nested mode on vCPU free
If the VM was terminated while nested, we free the nested state
while the vCPU is still in nested mode.
Soon a warning will be added for this condition.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-2-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9f88c8e6766e..098f04bec8ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1438,6 +1438,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
*/
svm_clear_current_vmcb(svm->vmcb);
+ svm_leave_nested(vcpu);
svm_free_nested(svm);
sev_free_vcpu(vcpu);
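The one-line change above hinges on ordering: nested mode has to be exited (switching the vCPU back to vmcb01) before the nested state, including vmcb02, is freed. A toy, self-contained C model of that ordering is below; the types are simplified stand-ins, not the real SVM structures.

#include <stdlib.h>

/* Illustrative stand-ins, not the real SVM structures. */
struct vmcb { int dummy; };

struct vcpu_svm {
        struct vmcb *vmcb01;            /* L1's control block              */
        struct vmcb *vmcb02;            /* nested (L2) control block       */
        struct vmcb *current_vmcb;      /* what the vCPU is running on now */
};

static void svm_leave_nested(struct vcpu_svm *svm)
{
        /* Leaving nested mode switches back to vmcb01 first of all. */
        svm->current_vmcb = svm->vmcb01;
}

static void svm_free_nested(struct vcpu_svm *svm)
{
        free(svm->vmcb02);
        svm->vmcb02 = NULL;
}

static void svm_vcpu_free(struct vcpu_svm *svm)
{
        /* The ordering the patch enforces: exit nested mode before the
         * nested state is freed, so current_vmcb never points at a freed
         * vmcb02 when the VM is torn down while still in L2. */
        svm_leave_nested(svm);
        svm_free_nested(svm);
        free(svm->vmcb01);
}

int main(void)
{
        struct vcpu_svm svm = {
                .vmcb01 = malloc(sizeof(struct vmcb)),
                .vmcb02 = malloc(sizeof(struct vmcb)),
        };

        svm.current_vmcb = svm.vmcb02;  /* VM torn down while running L2 */
        svm_vcpu_free(&svm);
        return 0;
}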
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
16ae56d7e052 ("KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 16ae56d7e0528559bf8dc9070e3bfd8ba3de80df Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:44 +0200
Subject: [PATCH] KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02
while still in use
Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.
This is a minimal fix for CVE-2022-3344, making the kernel print a warning
instead of a kernel panic.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-3-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c620999d230..b02a3a1792f1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,6 +1125,9 @@ void svm_free_nested(struct vcpu_svm *svm)
if (!svm->nested.initialized)
return;
+ if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
+ svm_switch_vmcb(svm, &svm->vmcb01);
+
svm_vcpu_free_msrpm(svm->nested.msrpm);
svm->nested.msrpm = NULL;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
16ae56d7e052 ("KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 16ae56d7e0528559bf8dc9070e3bfd8ba3de80df Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:44 +0200
Subject: [PATCH] KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02
while still in use
Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.
This is a minimal fix for CVE-2022-3344, making the kernel print a warning
instead of a kernel panic.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-3-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c620999d230..b02a3a1792f1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,6 +1125,9 @@ void svm_free_nested(struct vcpu_svm *svm)
if (!svm->nested.initialized)
return;
+ if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
+ svm_switch_vmcb(svm, &svm->vmcb01);
+
svm_vcpu_free_msrpm(svm->nested.msrpm);
svm->nested.msrpm = NULL;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
16ae56d7e052 ("KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 16ae56d7e0528559bf8dc9070e3bfd8ba3de80df Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:44 +0200
Subject: [PATCH] KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02
while still in use
Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.
This is a minimal fix for CVE-2022-3344, making the kernel print a warning
instead of a kernel panic.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-3-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c620999d230..b02a3a1792f1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,6 +1125,9 @@ void svm_free_nested(struct vcpu_svm *svm)
if (!svm->nested.initialized)
return;
+ if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
+ svm_switch_vmcb(svm, &svm->vmcb01);
+
svm_vcpu_free_msrpm(svm->nested.msrpm);
svm->nested.msrpm = NULL;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
16ae56d7e052 ("KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use")
2fcf4876ada8 ("KVM: nSVM: implement on demand allocation of the nested state")
72f211ecaa80 ("KVM: x86: allow kvm_x86_ops.set_efer to return an error value")
fd6fa73d1337 ("KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied")
476c9bd8e997 ("KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs")
d85a8034c016 ("KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"")
eb3db1b13788 ("KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"")
ce833b2324ba ("KVM: VMX: Prepend "MAX_" to MSR array size defines")
7e34fbd05c63 ("KVM: x86: Rename "shared_msrs" to "user_return_msrs"")
8d22b90e942c ("KVM: SVM: refactor exit labels in svm_create_vcpu")
0681de1b8369 ("KVM: SVM: use __GFP_ZERO instead of clear_page")
f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation")
0dd16b5b0c9b ("KVM: nSVM: rename nested vmcb to vmcb12")
1feaba144cd3 ("KVM: SVM: rename a variable in the svm_create_vcpu")
bf3c0e5e7102 ("Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 16ae56d7e0528559bf8dc9070e3bfd8ba3de80df Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:44 +0200
Subject: [PATCH] KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02
while still in use
Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.
This is a minimal fix for CVE-2022-3344, making the kernel print a warning
instead of a kernel panic.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-3-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c620999d230..b02a3a1792f1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,6 +1125,9 @@ void svm_free_nested(struct vcpu_svm *svm)
if (!svm->nested.initialized)
return;
+ if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
+ svm_switch_vmcb(svm, &svm->vmcb01);
+
svm_vcpu_free_msrpm(svm->nested.msrpm);
svm->nested.msrpm = NULL;
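The hardening above is deliberately defensive: even if a caller forgets to leave nested mode first, svm_free_nested() now switches back to vmcb01 itself and reports the bug once rather than crashing later on a freed vmcb02. The sketch below models that warn-once-and-recover pattern in plain C; the WARN_ON_ONCE() stand-in and the structures are simplified assumptions, not the kernel implementation.

#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON_ONCE(): report the condition
 * at most once, but always return it so the caller can recover every time. */
#define WARN_ON_ONCE(cond) ({                                           \
        static bool __warned;                                           \
        bool __cond = (cond);                                           \
        if (__cond && !__warned) {                                      \
                __warned = true;                                        \
                fprintf(stderr, "WARN: %s (%s:%d)\n", #cond,            \
                        __FILE__, __LINE__);                            \
        }                                                               \
        __cond;                                                         \
})

/* Illustrative stand-ins, not the real SVM structures. */
struct vmcb { int dummy; };

struct vcpu_svm {
        struct vmcb *vmcb;      /* currently-active control block */
        struct vmcb vmcb01;     /* L1's control block             */
        struct vmcb vmcb02;     /* nested (L2) control block      */
};

static void svm_free_nested(struct vcpu_svm *svm)
{
        /* Switch back to vmcb01 even on the buggy path, but make the bug
         * visible instead of crashing later on a freed vmcb02. */
        if (WARN_ON_ONCE(svm->vmcb != &svm->vmcb01))
                svm->vmcb = &svm->vmcb01;

        /* ... the nested state would be freed here ... */
}

int main(void)
{
        struct vcpu_svm svm = { 0 };

        svm.vmcb = &svm.vmcb02;         /* simulate a buggy caller       */
        svm_free_nested(&svm);          /* warns and switches to vmcb01  */

        svm.vmcb = &svm.vmcb02;         /* same bug again                */
        svm_free_nested(&svm);          /* recovers silently (warn once) */
        return 0;
}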
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ed129ec9057f ("KVM: x86: forcibly leave nested mode on vCPU reset")
0aa1837533e5 ("KVM: x86: Properly reset MMU context at vCPU RESET/INIT")
b3646477d458 ("KVM: x86: use static calls to reduce kvm_x86_ops overhead")
15b51dc08a34 ("KVM: x86: Take KVM's SRCU lock only if steal time update is needed")
19979fba9bfa ("KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put()")
5719455fbd95 ("KVM: SVM: Do not report support for SMM for an SEV-ES guest")
f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
f9a4d621761a ("KVM: x86: introduce complete_emulated_msr callback")
8b474427cbee ("KVM: x86: use kvm_complete_insn_gp in emulating RDMSR/WRMSR")
2259c17f0188 ("kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions")
fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
c21d54f0307f ("KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl")
ee69c92bac61 ("KVM: x86: Return bool instead of int for CR4 and SREGS validity checks")
c2fe3cd4604a ("KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook")
a447e38a7fad ("KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()")
d3a9e4146a6f ("KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()")
a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
1d8dd6b3f12b ("kvm: x86/mmu: Support changed pte notifier in tdp MMU")
f8e144971c68 ("kvm: x86/mmu: Add access tracking for tdp_mmu")
063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed129ec9057f89d615ba0c81a4984a90345a1684 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:46 +0200
Subject: [PATCH] KVM: x86: forcibly leave nested mode on vCPU reset
While not obvious, kvm_vcpu_reset() leaves nested mode by clearing
'vcpu->arch.hflags', but it does so without all the required housekeeping.
On SVM, it is possible to have a vCPU reset while in guest mode because,
unlike VMX, INITs are not latched in SVM non-root mode, and in addition
L1 doesn't have to intercept triple fault, which should also trigger
L1's reset if it happens in L2 while L1 didn't intercept it.
If either of the above conditions happens, KVM will continue to use vmcb02
while no longer being in guest mode.
Later, IA32_EFER will be cleared, which will lead to freeing of the nested
guest state; this will (correctly) free vmcb02, but since KVM still uses
it (incorrectly), the result is a use-after-free and a kernel crash.
This issue is assigned CVE-2022-3344
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-5-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff5be7189237..597d7f804d72 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12003,8 +12003,18 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
WARN_ON_ONCE(!init_event &&
(old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu)));
+ /*
+ * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
+ * possible to INIT the vCPU while L2 is active. Force the vCPU back
+ * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
+ * bits), i.e. virtualization is disabled.
+ */
+ if (is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+
kvm_lapic_reset(vcpu, init_event);
+ WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
vcpu->arch.hflags = 0;
vcpu->arch.smi_pending = 0;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ed129ec9057f ("KVM: x86: forcibly leave nested mode on vCPU reset")
0aa1837533e5 ("KVM: x86: Properly reset MMU context at vCPU RESET/INIT")
b3646477d458 ("KVM: x86: use static calls to reduce kvm_x86_ops overhead")
15b51dc08a34 ("KVM: x86: Take KVM's SRCU lock only if steal time update is needed")
19979fba9bfa ("KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put()")
5719455fbd95 ("KVM: SVM: Do not report support for SMM for an SEV-ES guest")
f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
f9a4d621761a ("KVM: x86: introduce complete_emulated_msr callback")
8b474427cbee ("KVM: x86: use kvm_complete_insn_gp in emulating RDMSR/WRMSR")
2259c17f0188 ("kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions")
fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
c21d54f0307f ("KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl")
ee69c92bac61 ("KVM: x86: Return bool instead of int for CR4 and SREGS validity checks")
c2fe3cd4604a ("KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook")
a447e38a7fad ("KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()")
d3a9e4146a6f ("KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()")
a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
1d8dd6b3f12b ("kvm: x86/mmu: Support changed pte notifier in tdp MMU")
f8e144971c68 ("kvm: x86/mmu: Add access tracking for tdp_mmu")
063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed129ec9057f89d615ba0c81a4984a90345a1684 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:46 +0200
Subject: [PATCH] KVM: x86: forcibly leave nested mode on vCPU reset
While not obvious, kvm_vcpu_reset() leaves nested mode by clearing
'vcpu->arch.hflags', but it does so without all the required housekeeping.
On SVM, it is possible to have a vCPU reset while in guest mode because,
unlike VMX, INITs are not latched in SVM non-root mode, and in addition
L1 doesn't have to intercept triple fault, which should also trigger
L1's reset if it happens in L2 while L1 didn't intercept it.
If either of the above conditions happens, KVM will continue to use vmcb02
while no longer being in guest mode.
Later, IA32_EFER will be cleared, which will lead to freeing of the nested
guest state; this will (correctly) free vmcb02, but since KVM still uses
it (incorrectly), the result is a use-after-free and a kernel crash.
This issue is assigned CVE-2022-3344
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-5-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff5be7189237..597d7f804d72 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12003,8 +12003,18 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
WARN_ON_ONCE(!init_event &&
(old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu)));
+ /*
+ * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
+ * possible to INIT the vCPU while L2 is active. Force the vCPU back
+ * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
+ * bits), i.e. virtualization is disabled.
+ */
+ if (is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+
kvm_lapic_reset(vcpu, init_event);
+ WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
vcpu->arch.hflags = 0;
vcpu->arch.smi_pending = 0;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ed129ec9057f ("KVM: x86: forcibly leave nested mode on vCPU reset")
0aa1837533e5 ("KVM: x86: Properly reset MMU context at vCPU RESET/INIT")
b3646477d458 ("KVM: x86: use static calls to reduce kvm_x86_ops overhead")
15b51dc08a34 ("KVM: x86: Take KVM's SRCU lock only if steal time update is needed")
19979fba9bfa ("KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put()")
5719455fbd95 ("KVM: SVM: Do not report support for SMM for an SEV-ES guest")
f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
f9a4d621761a ("KVM: x86: introduce complete_emulated_msr callback")
8b474427cbee ("KVM: x86: use kvm_complete_insn_gp in emulating RDMSR/WRMSR")
2259c17f0188 ("kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions")
fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
c21d54f0307f ("KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl")
ee69c92bac61 ("KVM: x86: Return bool instead of int for CR4 and SREGS validity checks")
c2fe3cd4604a ("KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook")
a447e38a7fad ("KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()")
d3a9e4146a6f ("KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()")
a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
1d8dd6b3f12b ("kvm: x86/mmu: Support changed pte notifier in tdp MMU")
f8e144971c68 ("kvm: x86/mmu: Add access tracking for tdp_mmu")
063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed129ec9057f89d615ba0c81a4984a90345a1684 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:46 +0200
Subject: [PATCH] KVM: x86: forcibly leave nested mode on vCPU reset
While not obvious, kvm_vcpu_reset() leaves nested mode by clearing
'vcpu->arch.hflags', but it does so without all the required housekeeping.
On SVM, it is possible to have a vCPU reset while in guest mode because,
unlike VMX, INITs are not latched in SVM non-root mode, and in addition
L1 doesn't have to intercept triple fault, which should also trigger
L1's reset if it happens in L2 while L1 didn't intercept it.
If either of the above conditions happens, KVM will continue to use vmcb02
while no longer being in guest mode.
Later, IA32_EFER will be cleared, which will lead to freeing of the nested
guest state; this will (correctly) free vmcb02, but since KVM still uses
it (incorrectly), the result is a use-after-free and a kernel crash.
This issue is assigned CVE-2022-3344
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-5-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff5be7189237..597d7f804d72 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12003,8 +12003,18 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
WARN_ON_ONCE(!init_event &&
(old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu)));
+ /*
+ * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
+ * possible to INIT the vCPU while L2 is active. Force the vCPU back
+ * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
+ * bits), i.e. virtualization is disabled.
+ */
+ if (is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+
kvm_lapic_reset(vcpu, init_event);
+ WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
vcpu->arch.hflags = 0;
vcpu->arch.smi_pending = 0;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ed129ec9057f ("KVM: x86: forcibly leave nested mode on vCPU reset")
0aa1837533e5 ("KVM: x86: Properly reset MMU context at vCPU RESET/INIT")
b3646477d458 ("KVM: x86: use static calls to reduce kvm_x86_ops overhead")
15b51dc08a34 ("KVM: x86: Take KVM's SRCU lock only if steal time update is needed")
19979fba9bfa ("KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put()")
5719455fbd95 ("KVM: SVM: Do not report support for SMM for an SEV-ES guest")
f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
f9a4d621761a ("KVM: x86: introduce complete_emulated_msr callback")
8b474427cbee ("KVM: x86: use kvm_complete_insn_gp in emulating RDMSR/WRMSR")
2259c17f0188 ("kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions")
fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
c21d54f0307f ("KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl")
ee69c92bac61 ("KVM: x86: Return bool instead of int for CR4 and SREGS validity checks")
c2fe3cd4604a ("KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook")
a447e38a7fad ("KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()")
d3a9e4146a6f ("KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()")
a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
1d8dd6b3f12b ("kvm: x86/mmu: Support changed pte notifier in tdp MMU")
f8e144971c68 ("kvm: x86/mmu: Add access tracking for tdp_mmu")
063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed129ec9057f89d615ba0c81a4984a90345a1684 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:46 +0200
Subject: [PATCH] KVM: x86: forcibly leave nested mode on vCPU reset
While not obvious, kvm_vcpu_reset() leaves nested mode by clearing
'vcpu->arch.hflags', but it does so without all the required housekeeping.
On SVM, it is possible to have a vCPU reset while in guest mode because,
unlike VMX, INITs are not latched in SVM non-root mode, and in addition
L1 doesn't have to intercept triple fault, which should also trigger
L1's reset if it happens in L2 while L1 didn't intercept it.
If either of the above conditions happens, KVM will continue to use vmcb02
while no longer being in guest mode.
Later, IA32_EFER will be cleared, which will lead to freeing of the nested
guest state; this will (correctly) free vmcb02, but since KVM still uses
it (incorrectly), the result is a use-after-free and a kernel crash.
This issue is assigned CVE-2022-3344
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-5-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff5be7189237..597d7f804d72 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12003,8 +12003,18 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
WARN_ON_ONCE(!init_event &&
(old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu)));
+ /*
+ * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
+ * possible to INIT the vCPU while L2 is active. Force the vCPU back
+ * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
+ * bits), i.e. virtualization is disabled.
+ */
+ if (is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+
kvm_lapic_reset(vcpu, init_event);
+ WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
vcpu->arch.hflags = 0;
vcpu->arch.smi_pending = 0;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ed129ec9057f ("KVM: x86: forcibly leave nested mode on vCPU reset")
0aa1837533e5 ("KVM: x86: Properly reset MMU context at vCPU RESET/INIT")
b3646477d458 ("KVM: x86: use static calls to reduce kvm_x86_ops overhead")
15b51dc08a34 ("KVM: x86: Take KVM's SRCU lock only if steal time update is needed")
19979fba9bfa ("KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put()")
5719455fbd95 ("KVM: SVM: Do not report support for SMM for an SEV-ES guest")
f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
f9a4d621761a ("KVM: x86: introduce complete_emulated_msr callback")
8b474427cbee ("KVM: x86: use kvm_complete_insn_gp in emulating RDMSR/WRMSR")
2259c17f0188 ("kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions")
fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
c21d54f0307f ("KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl")
ee69c92bac61 ("KVM: x86: Return bool instead of int for CR4 and SREGS validity checks")
c2fe3cd4604a ("KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook")
a447e38a7fad ("KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()")
d3a9e4146a6f ("KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed129ec9057f89d615ba0c81a4984a90345a1684 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 3 Nov 2022 16:13:46 +0200
Subject: [PATCH] KVM: x86: forcibly leave nested mode on vCPU reset
While not obvious, kvm_vcpu_reset() leaves nested mode by clearing
'vcpu->arch.hflags', but it does so without all the required housekeeping.
On SVM, it is possible to have a vCPU reset while in guest mode because,
unlike VMX, INITs are not latched in SVM non-root mode, and in addition
L1 doesn't have to intercept triple fault, which should also trigger
L1's reset if it happens in L2 while L1 didn't intercept it.
If either of the above conditions happens, KVM will continue to use vmcb02
while no longer being in guest mode.
Later, IA32_EFER will be cleared, which will lead to freeing of the nested
guest state; this will (correctly) free vmcb02, but since KVM still uses
it (incorrectly), the result is a use-after-free and a kernel crash.
This issue is assigned CVE-2022-3344
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20221103141351.50662-5-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff5be7189237..597d7f804d72 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12003,8 +12003,18 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
WARN_ON_ONCE(!init_event &&
(old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu)));
+ /*
+ * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
+ * possible to INIT the vCPU while L2 is active. Force the vCPU back
+ * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
+ * bits), i.e. virtualization is disabled.
+ */
+ if (is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+
kvm_lapic_reset(vcpu, init_event);
+ WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
vcpu->arch.hflags = 0;
vcpu->arch.smi_pending = 0;
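To make the failure mode in the quoted commit message concrete, the toy program below models the ordering the fix enforces: an INIT that arrives while L2 is active must first fall back to L1 (vmcb01) before the rest of the reset clears EFER and frees the nested state. All names and types are illustrative stand-ins, not KVM code.

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of the ordering the fix enforces; names are stand-ins. */
struct vcpu {
        bool guest_mode;        /* models is_guest_mode()                   */
        int *vmcb02;            /* nested state, freed once EFER is cleared */
        int *current_vmcb;      /* what the vCPU actually runs on           */
        int vmcb01;             /* L1's control block                       */
};

static void leave_nested(struct vcpu *v)
{
        v->guest_mode = false;
        v->current_vmcb = &v->vmcb01;   /* back to L1's control block */
}

static void clear_efer_and_free_nested(struct vcpu *v)
{
        free(v->vmcb02);                /* nested state goes away */
        v->vmcb02 = NULL;
}

static void vcpu_reset(struct vcpu *v)
{
        /* The fix: an INIT that arrives while L2 is active must first fall
         * back to L1.  Merely clearing guest_mode (the old behaviour) would
         * leave current_vmcb pointing at the vmcb02 that is freed below. */
        if (v->guest_mode)
                leave_nested(v);

        v->guest_mode = false;          /* rest of the reset housekeeping */
        clear_efer_and_free_nested(v);  /* EFER is cleared on INIT        */
        assert(v->current_vmcb == &v->vmcb01);  /* no dangling pointer    */
}

int main(void)
{
        struct vcpu v = { .guest_mode = true };

        v.vmcb02 = malloc(sizeof(int));
        v.current_vmcb = v.vmcb02;      /* INIT arrives while in L2 */
        vcpu_reset(&v);
        return 0;
}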
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
47b0c2e4c220 ("KVM: x86/mmu: Fix race condition in direct_page_fault")
a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
e710c5f6be0e ("KVM: x86/mmu: Pass the memslot around via struct kvm_page_fault")
73a3c659478a ("KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault")
3c8ad5a675d9 ("KVM: MMU: change fast_page_fault() arguments to kvm_page_fault")
cdc47767a039 ("KVM: MMU: change tdp_mmu_map_handle_target_level() arguments to kvm_page_fault")
2f6305dd5676 ("KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault")
9c03b1821a89 ("KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault")
43b74355ef8b ("KVM: MMU: change __direct_map() arguments to kvm_page_fault")
3a13f4fea3c1 ("KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault")
3647cd04b7d0 ("KVM: MMU: change kvm_faultin_pfn() arguments to kvm_page_fault")
b8a5d5511515 ("KVM: MMU: change page_fault_handle_page_track() arguments to kvm_page_fault")
4326e57ef40a ("KVM: MMU: change direct_page_fault() arguments to kvm_page_fault")
c501040abc42 ("KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault")
6defd9bb178c ("KVM: MMU: Introduce struct kvm_page_fault")
d055f028a533 ("KVM: MMU: pass unadulterated gpa to direct_page_fault")
f1c4a88c41ea ("KVM: X86: Don't unsync pagetables when speculative")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 47b0c2e4c220f2251fd8dcfbb44479819c715e15 Mon Sep 17 00:00:00 2001
From: Kazuki Takiguchi <takiguchi.kazuki171(a)gmail.com>
Date: Wed, 23 Nov 2022 14:36:00 -0500
Subject: [PATCH] KVM: x86/mmu: Fix race condition in direct_page_fault
make_mmu_pages_available() must be called with mmu_lock held for write.
However, if the TDP MMU is used, it will be called with mmu_lock held for
read.
This function does nothing unless shadow pages are used, so there is no
race unless nested TDP is used.
Since nested TDP uses shadow pages, old shadow pages may be zapped by this
function even when the TDP MMU is enabled.
Since shadow pages are never allocated by kvm_tdp_mmu_map(), a race
condition can be avoided by not calling make_mmu_pages_available() if the
TDP MMU is currently in use.
I encountered this when repeatedly starting and stopping a nested VM.
It can be artificially caused by allocating a large number of nested TDP
SPTEs.
For example, the following BUG and general protection fault are caused in
the host kernel.
pte_list_remove: 00000000cd54fc10 many->many
------------[ cut here ]------------
kernel BUG at arch/x86/kvm/mmu/mmu.c:963!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:pte_list_remove.cold+0x16/0x48 [kvm]
Call Trace:
<TASK>
drop_spte+0xe0/0x180 [kvm]
mmu_page_zap_pte+0x4f/0x140 [kvm]
__kvm_mmu_prepare_zap_page+0x62/0x3e0 [kvm]
kvm_mmu_zap_oldest_mmu_pages+0x7d/0xf0 [kvm]
direct_page_fault+0x3cb/0x9b0 [kvm]
kvm_tdp_page_fault+0x2c/0xa0 [kvm]
kvm_mmu_page_fault+0x207/0x930 [kvm]
npf_interception+0x47/0xb0 [kvm_amd]
svm_invoke_exit_handler+0x13c/0x1a0 [kvm_amd]
svm_handle_exit+0xfc/0x2c0 [kvm_amd]
kvm_arch_vcpu_ioctl_run+0xa79/0x1780 [kvm]
kvm_vcpu_ioctl+0x29b/0x6f0 [kvm]
__x64_sys_ioctl+0x95/0xd0
do_syscall_64+0x5c/0x90
general protection fault, probably for non-canonical address
0xdead000000000122: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:kvm_mmu_commit_zap_page.part.0+0x4b/0xe0 [kvm]
Call Trace:
<TASK>
kvm_mmu_zap_oldest_mmu_pages+0xae/0xf0 [kvm]
direct_page_fault+0x3cb/0x9b0 [kvm]
kvm_tdp_page_fault+0x2c/0xa0 [kvm]
kvm_mmu_page_fault+0x207/0x930 [kvm]
npf_interception+0x47/0xb0 [kvm_amd]
CVE: CVE-2022-45869
Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Signed-off-by: Kazuki Takiguchi <takiguchi.kazuki171(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1ccb769f62af..b6f96d47e596 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2443,6 +2443,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
{
bool list_unstable, zapped_root = false;
+ lockdep_assert_held_write(&kvm->mmu_lock);
trace_kvm_mmu_prepare_zap_page(sp);
++kvm->stat.mmu_shadow_zapped;
*nr_zapped = mmu_zap_unsync_children(kvm, sp, invalid_list);
@@ -4262,14 +4263,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (is_page_fault_stale(vcpu, fault, mmu_seq))
goto out_unlock;
- r = make_mmu_pages_available(vcpu);
- if (r)
- goto out_unlock;
-
- if (is_tdp_mmu_fault)
+ if (is_tdp_mmu_fault) {
r = kvm_tdp_mmu_map(vcpu, fault);
- else
+ } else {
+ r = make_mmu_pages_available(vcpu);
+ if (r)
+ goto out_unlock;
r = __direct_map(vcpu, fault);
+ }
out_unlock:
if (is_tdp_mmu_fault)
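For context on why the reordering matters: direct_page_fault() takes
mmu_lock for read when the TDP MMU services the fault and for write
otherwise, so the old unconditional call let make_mmu_pages_available()
run under only the read lock. A simplified sketch of that locking split
(illustrative, not the verbatim kernel code):
/* Sketch of the locking in direct_page_fault(). */
if (is_tdp_mmu_fault)
	read_lock(&vcpu->kvm->mmu_lock);	/* TDP MMU faults run in parallel */
else
	write_lock(&vcpu->kvm->mmu_lock);	/* shadow MMU needs exclusive access */
/*
 * make_mmu_pages_available() can zap shadow pages and so requires the
 * write lock -- hence the lockdep_assert_held_write() added above and the
 * move of the call into the !is_tdp_mmu_fault branch.
 */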
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a26a35e9019f ("io_uring: make poll refs more robust")
539bcb57da2f ("io_uring: fix tw losing poll events")
0638cd7be212 ("io_uring: clean poll ->private flagging")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
a9c210cebe13 ("io_uring: move epoll handler to its own file")
4cf90495281b ("io_uring: add a dummy -EOPNOTSUPP prep handler")
99f15d8d6136 ("io_uring: move uring_cmd handling to its own file")
cd40cae29ef8 ("io_uring: split out open/close operations")
453b329be5ea ("io_uring: separate out file table handling code")
f4c163dd7d4b ("io_uring: split out fadvise/madvise operations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a26a35e9019fd70bf3cf647dcfdae87abc7bacea Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Sun, 20 Nov 2022 16:57:42 +0000
Subject: [PATCH] io_uring: make poll refs more robust
poll_refs serves two purposes: the first is ownership over the request,
and the second is notifying io_poll_check_events() that there was an
event but the wake-up couldn't grab ownership, so io_poll_check_events()
should retry.
We want to make poll_refs more robust against overflows. Instead of
always incrementing it, which covers both purposes with one atomic, check
whether poll_refs is already elevated enough and, if so, set a retry flag
without attempting to grab ownership. The gap between the bias check and
the following atomics may seem racy, but we don't need it to be strict.
Moreover, there can be at most 4 parallel updaters: the first and second
poll entries, __io_arm_poll_handler() and cancellation. Of those four,
only poll wake-ups may execute multiple times, and they're protected by a
spinlock.
Cc: stable(a)vger.kernel.org
Reported-by: Lin Ma <linma(a)zju.edu.cn>
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/c762bc31f8683b3270f3587691348a7119ef9c9d.16689630…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 1b78b527075d..b444b7d87697 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -40,7 +40,14 @@ struct io_poll_table {
};
#define IO_POLL_CANCEL_FLAG BIT(31)
-#define IO_POLL_REF_MASK GENMASK(30, 0)
+#define IO_POLL_RETRY_FLAG BIT(30)
+#define IO_POLL_REF_MASK GENMASK(29, 0)
+
+/*
+ * We usually have 1-2 refs taken, 128 is more than enough and we want to
+ * maximise the margin between this amount and the moment when it overflows.
+ */
+#define IO_POLL_REF_BIAS 128
#define IO_WQE_F_DOUBLE 1
@@ -58,6 +65,21 @@ static inline bool wqe_is_double(struct wait_queue_entry *wqe)
return priv & IO_WQE_F_DOUBLE;
}
+static bool io_poll_get_ownership_slowpath(struct io_kiocb *req)
+{
+ int v;
+
+ /*
+ * poll_refs are already elevated and we don't have much hope for
+ * grabbing the ownership. Instead of incrementing set a retry flag
+ * to notify the loop that there might have been some change.
+ */
+ v = atomic_fetch_or(IO_POLL_RETRY_FLAG, &req->poll_refs);
+ if (v & IO_POLL_REF_MASK)
+ return false;
+ return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
+}
+
/*
* If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can
* bump it and acquire ownership. It's disallowed to modify requests while not
@@ -66,6 +88,8 @@ static inline bool wqe_is_double(struct wait_queue_entry *wqe)
*/
static inline bool io_poll_get_ownership(struct io_kiocb *req)
{
+ if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS))
+ return io_poll_get_ownership_slowpath(req);
return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
}
@@ -235,6 +259,16 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
*/
if ((v & IO_POLL_REF_MASK) != 1)
req->cqe.res = 0;
+ if (v & IO_POLL_RETRY_FLAG) {
+ req->cqe.res = 0;
+ /*
+ * We won't find new events that came in between
+ * vfs_poll and the ref put unless we clear the flag
+ * in advance.
+ */
+ atomic_andnot(IO_POLL_RETRY_FLAG, &req->poll_refs);
+ v &= ~IO_POLL_RETRY_FLAG;
+ }
/* the mask was stashed in __io_poll_execute */
if (!req->cqe.res) {
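For reference, the resulting bit layout of ->poll_refs, derived from the
defines in the hunk above (sketch, not quoted from the source):
/*
 *  bit 31      IO_POLL_CANCEL_FLAG
 *  bit 30      IO_POLL_RETRY_FLAG   (new)
 *  bits 29..0  reference count      (IO_POLL_REF_MASK)
 *
 * The slowpath kicks in once the count reaches IO_POLL_REF_BIAS (128),
 * far below the point where a 30-bit counter could overflow into the
 * flag bits.
 */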
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2f3893437a4e ("io_uring: cmpxchg for poll arm refs release")
49f1c68e048f ("io_uring: optimise submission side poll_refs")
de08356f4858 ("io_uring: refactor poll arm error handling")
063a007996bf ("io_uring: change arm poll return values")
13a99017ff19 ("io_uring: remove events caching atavisms")
0638cd7be212 ("io_uring: clean poll ->private flagging")
48863ffd3e81 ("io_uring: clean up tracing events")
9ca9fb24d5fe ("io_uring: mutex locked poll hashing")
e6f89be61410 ("io_uring: introduce a struct for hash table")
a2cdd5193218 ("io_uring: pass hash table into poll_find")
0ec6dca22319 ("io_uring: use state completion infra for poll reqs")
8b1dfd343ae6 ("io_uring: clean up io_ring_ctx_alloc")
4a07723fb4bb ("io_uring: limit the number of cancellation buckets")
1ab1edb0a104 ("io_uring: pass poll_find lock back")
38513c464d3d ("io_uring: switch cancel_hash to use per entry spinlock")
aff5b2df9e8b ("io_uring: better caching for ctx timeout fields")
b9ba8a4463cd ("io_uring: add support for level triggered poll")
735729844819 ("io_uring: move rsrc related data, core, and commands")
3b77495a9723 ("io_uring: split provided buffers handling into its own file")
7aaff708a768 ("io_uring: move cancelation into its own file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2f3893437a4ebf2e892ca172e9e122841319d675 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Sun, 20 Nov 2022 16:57:41 +0000
Subject: [PATCH] io_uring: cmpxchg for poll arm refs release
Replace atomically subtracting the ownership reference at the end of
arming a poll with a cmpxchg. We try to release ownership by setting the
refs to 0, assuming that poll_refs didn't change while we were arming. If
it did change, we keep the ownership and use it to queue task_work, which
is fully capable of processing all events and even tolerates spurious
wake-ups.
It's a bit more elegant, as this release reduces races between setting
the cancellation flag and grabbing refs, and with that we don't have to
worry about any kind of underflow. This isn't the fastest path for
polling anyway, and the performance difference between cmpxchg and an
atomic dec there is usually negligible.
Cc: stable(a)vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/0c95251624397ea6def568ff040cad2d7926fd51.16689630…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 055632e9092a..1b78b527075d 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -518,7 +518,6 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
unsigned issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
- int v;
INIT_HLIST_NODE(&req->hash_node);
req->work.cancel_seq = atomic_read(&ctx->cancel_seq);
@@ -586,11 +585,10 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
if (ipt->owning) {
/*
- * Release ownership. If someone tried to queue a tw while it was
- * locked, kick it off for them.
+ * Try to release ownership. If we see a change of state, e.g.
+ * poll was waken up, queue up a tw, it'll deal with it.
*/
- v = atomic_dec_return(&req->poll_refs);
- if (unlikely(v & IO_POLL_REF_MASK))
+ if (atomic_cmpxchg(&req->poll_refs, 1, 0) != 1)
__io_poll_execute(req, 0);
}
return 0;
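A worked example of the window the cmpxchg closes, assuming a single
concurrent wake-up (sketch, not the io_uring source):
/*
 *   armer                                wake-up path
 *   -----                                ------------
 *   poll_refs == 1 (ownership held)
 *                                        atomic_fetch_inc() -> 2, sees a
 *                                        ref already held, backs off
 *   atomic_cmpxchg(&poll_refs, 1, 0)
 *     fails (value is 2): keep
 *     ownership, __io_poll_execute()
 *     queues task_work for the event
 *
 * The cmpxchg either releases cleanly or keeps ownership; there is no
 * blind decrement, so the count can never underflow.
 */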