From: Josip Pavic <Josip.Pavic(a)amd.com>
[ Upstream commit 8de297dc046c180651c0500f8611663ae1c3828a ]
[why]
In some cases an MPC tree bottom pipe ends up pointing to itself. Iterating
from top to bottom then hangs the system in an infinite loop.
[how]
When advancing to the next MPC bottom pipe, check that the pointer is not
the same as the current one, to avoid an infinite loop.
Reviewed-by: Josip Pavic <Josip.Pavic(a)amd.com>
Reviewed-by: Jun Lei <Jun.Lei(a)amd.com>
Acked-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Aric Cyr <aric.cyr(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c | 6 ++++++
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c
index 3fcd408e9103..855682590c1b 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c
@@ -125,6 +125,12 @@ struct mpcc *mpc1_get_mpcc_for_dpp(struct mpc_tree *tree, int dpp_id)
 	while (tmp_mpcc != NULL) {
 		if (tmp_mpcc->dpp_id == dpp_id)
 			return tmp_mpcc;
+
+		/* avoid circular linked list */
+		ASSERT(tmp_mpcc != tmp_mpcc->mpcc_bot);
+		if (tmp_mpcc == tmp_mpcc->mpcc_bot)
+			break;
+
 		tmp_mpcc = tmp_mpcc->mpcc_bot;
 	}
 	return NULL;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
index 99cc095dc33c..a701ea56c0aa 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
@@ -533,6 +533,12 @@ struct mpcc *mpc2_get_mpcc_for_dpp(struct mpc_tree *tree, int dpp_id)
 	while (tmp_mpcc != NULL) {
 		if (tmp_mpcc->dpp_id == 0xf || tmp_mpcc->dpp_id == dpp_id)
 			return tmp_mpcc;
+
+		/* avoid circular linked list */
+		ASSERT(tmp_mpcc != tmp_mpcc->mpcc_bot);
+		if (tmp_mpcc == tmp_mpcc->mpcc_bot)
+			break;
+
 		tmp_mpcc = tmp_mpcc->mpcc_bot;
 	}
 	return NULL;
--
2.35.1
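For readers outside the driver, the self-loop guard in the patch above can be
sketched as a small standalone C fragment. This is only an illustration under
stated assumptions: struct mpcc_node and find_mpcc_for_dpp() are hypothetical
stand-ins, not the driver's types or functions, and the non-negative id check
is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's struct mpcc; only the two
 * fields the lookup touches are modelled. */
struct mpcc_node {
	int dpp_id;
	struct mpcc_node *mpcc_bot;
};

/* Walk from the top of the tree towards the bottom and return the node
 * that matches dpp_id, or NULL if there is none. */
static struct mpcc_node *find_mpcc_for_dpp(struct mpcc_node *top, int dpp_id)
{
	struct mpcc_node *cur = top;

	/* Assumption of this sketch: ids are non-negative. */
	assert(dpp_id >= 0);

	while (cur != NULL) {
		if (cur->dpp_id == dpp_id)
			return cur;

		/* Self-loop guard, mirroring the patch: stop when a node's
		 * bottom pointer refers back to the node itself. */
		if (cur == cur->mpcc_bot)
			break;

		cur = cur->mpcc_bot;
	}
	return NULL;
}
```

Note that this check only catches a node pointing directly at itself. A
longer cycle (A -> B -> A) would still loop forever and would need something
like a visited count or a two-pointer cycle check.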
This is the start of the stable review cycle for the 4.9.326 release.
There are 98 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 26 Aug 2022 07:24:55 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.326-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.326-rc2
Nathan Chancellor <nathan(a)kernel.org>
MIPS: tlbex: Explicitly compare _PAGE_NO_EXEC against 0
Zheyu Ma <zheyuma97(a)gmail.com>
video: fbdev: i740fb: Check the argument of i740_calc_vclk()
Zhouyi Zhou <zhouzhouyi(a)gmail.com>
powerpc/64: Init jump labels before parse_early_param()
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Use deferred fasync helper
Takashi Iwai <tiwai(a)suse.de>
ALSA: core: Add async signal helpers
Liang He <windhl(a)126.com>
mips: cavium-octeon: Fix missing of_node_put() in octeon2_usb_clocks_start
Schspa Shi <schspa(a)gmail.com>
vfio: Clear the caps->buf to NULL after free
Liang He <windhl(a)126.com>
tty: serial: Fix refcount leak bug in ucc_uart.c
Kiselev, Oleg <okiselev(a)amazon.com>
ext4: avoid resizing to a partial cluster size
Ye Bin <yebin10(a)huawei.com>
ext4: avoid remove directory when directory is corrupted
Wentao_Liang <Wentao_Liang_g(a)163.com>
drivers:md:fix a potential use-after-free bug
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
cxl: Fix a memory leak in an error handling path
Jozef Martiniak <jomajm(a)gmail.com>
gadgetfs: ep_io - wait until IRQ finishes
Liang He <windhl(a)126.com>
usb: host: ohci-ppc-of: Fix refcount leak bug
Sai Prakash Ranjan <quic_saipraka(a)quicinc.com>
irqchip/tegra: Fix overflow implicit truncation warnings
Csókás Bence <csokas.bence(a)prolan.hu>
fec: Fix timer capture timing in `fec_ptp_enable_pps()`
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: really skip inactive sets when allocating name
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: add force_successful_syscall_return()
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: restarts apply only to the first sigframe we build...
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: fix syscall restart checks
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: traced syscall does need to check the syscall number
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: don't leave NULLs in sys_call_table[]
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: page fault et.al. are *not* restartable syscalls...
Duoming Zhou <duoming(a)zju.edu.cn>
atm: idt77252: fix use-after-free bugs caused by tst_timer
Dan Carpenter <dan.carpenter(a)oracle.com>
xen/xenbus: fix return type in xenbus_file_read()
Peilin Ye <peilin.ye(a)bytedance.com>
vsock: Fix memory leak in vsock_connect()
Nikita Travkin <nikita(a)trvn.ru>
pinctrl: qcom: msm8916: Allow CAMSS GP clocks to be muxed
Miaoqian Lin <linmq006(a)gmail.com>
pinctrl: nomadik: Fix refcount leak in nmk_pinctrl_dt_subnode_to_map
Trond Myklebust <trond.myklebust(a)hammerspace.com>
SUNRPC: Reinitialise the backchannel request buffers before reuse
Zhang Xianwei <zhang.xianwei8(a)zte.com.cn>
NFSv4.1: RECLAIM_COMPLETE must handle EACCES
Marc Kleine-Budde <mkl(a)pengutronix.de>
can: ems_usb: fix clang's -Wunaligned-access warning
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix lost error handling when looking up extended ref on log replay
Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
ata: libata-eh: Add missing command name
Mikulas Patocka <mpatocka(a)redhat.com>
rds: add missing barrier to release_refill
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ALSA: info: Fix llseek return value when using callback
Jamal Hadi Salim <jhs(a)mojatatu.com>
net_sched: cls_route: disallow handle of 0
Tyler Hicks <tyhicks(a)linux.microsoft.com>
net/9p: Initialize the iounit field during fid creation
Guenter Roeck <linux(a)roeck-us.net>
nios2: time: Read timer in get_cycles only if initialized
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression
Jose Alonso <joalonsof(a)gmail.com>
Revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP"
Tony Battersby <tonyb(a)cybernetics.com>
scsi: sg: Allow waiting for commands to complete on removed device
Eric Dumazet <edumazet(a)google.com>
tcp: fix over estimation in sk_forced_mem_schedule()
Qu Wenruo <wqu(a)suse.com>
btrfs: reject log replay if there is unsupported RO compat flag
Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
net_sched: cls_route: remove from list when handle is 0
Mikulas Patocka <mpatocka(a)redhat.com>
dm raid: fix address sanitizer warning in raid_status
Baokun Li <libaokun1(a)huawei.com>
ext4: correct max_inline_xattr_value_size computing
Eric Whitney <enwlinux(a)gmail.com>
ext4: fix extent status tree race in writeback error recovery path
Theodore Ts'o <tytso(a)mit.edu>
ext4: update s_overhead_clusters in the superblock during an on-line resize
Baokun Li <libaokun1(a)huawei.com>
ext4: fix use-after-free in ext4_xattr_set_entry
Lukas Czerner <lczerner(a)redhat.com>
ext4: make sure ext4_append() always allocates new block
Baokun Li <libaokun1(a)huawei.com>
ext4: add EXT4_INODE_HAS_XATTR_SPACE macro in xattr.h
David Collins <quic_collinsd(a)quicinc.com>
spmi: trace: fix stack-out-of-bound access in SPMI tracing functions
Alexander Lobakin <alexandr.lobakin(a)intel.com>
x86/olpc: fix 'logical not is only applied to the left hand side'
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: Fix missing auto port scan and thus missing target ports
Florian Westphal <fw(a)strlen.de>
netfilter: nf_tables: fix null deref due to zeroed list head
Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
USB: HCD: Fix URB giveback issue in tasklet function
Huacai Chen <chenhuacai(a)loongson.cn>
MIPS: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/powernv: Avoid crashing if rng is NULL
Pali Rohár <pali(a)kernel.org>
powerpc/fsl-pci: Fix Class Code of PCIe Root Port
Pali Rohár <pali(a)kernel.org>
PCI: Add defines for normal and subtractive PCI bridges
Alexander Lobakin <alexandr.lobakin(a)intel.com>
ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
Mikulas Patocka <mpatocka(a)redhat.com>
md-raid10: fix KASAN warning
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: limit nsec
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: fix overflow in prog accounting
Timur Tabi <ttabi(a)nvidia.com>
drm/nouveau: fix another off-by-one in nvbios_addr
Helge Deller <deller(a)gmx.de>
parisc: Fix device names in /proc/iomem
Lukas Wunner <lukas(a)wunner.de>
usbnet: Fix linkwatch use-after-free on disconnect
David Howells <dhowells(a)redhat.com>
vfs: Check the truncate maximum size in inode_newsize_ok()
Allen Ballway <ballway(a)chromium.org>
ALSA: hda/cirrus - support for iMac 12,1 model
Meng Tang <tangmeng(a)uniontech.com>
ALSA: hda/conexant: Add quirk for LENOVO 20149 Notebook model
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks
Maciej S. Szmigiero <maciej.szmigiero(a)oracle.com>
KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0
Mikulas Patocka <mpatocka(a)redhat.com>
add barriers to buffer_uptodate and set_buffer_uptodate
Zheyu Ma <zheyuma97(a)gmail.com>
ALSA: bcd2000: Fix a UAF bug on the error path of probing
Ning Qiang <sohu0106(a)126.com>
macintosh/adb: fix oob read in do_adb_query() function
Hans-Christian Noren Egtvedt <hegtvedt(a)cisco.com>
random: only call boot_init_stack_canary() once
Werner Sembach <wse(a)tuxedocomputers.com>
ACPI: video: Shortening quirk list by identifying Clevo by board_name only
Werner Sembach <wse(a)tuxedocomputers.com>
ACPI: video: Force backlight native for some TongFang devices
Daniel Micay <danielmicay(a)gmail.com>
init/main.c: extract early boot entropy from the passed cmdline
Laura Abbott <lauraa(a)codeaurora.org>
init: move stack canary initialization after setup_arch
Viresh Kumar <viresh.kumar(a)linaro.org>
init/main: properly align the multi-line comment
Viresh Kumar <viresh.kumar(a)linaro.org>
init/main: Fix double "the" in comment
Christian Borntraeger <borntraeger(a)de.ibm.com>
include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap
Paul Moore <paul(a)paul-moore.com>
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
Tianyue Ren <rentianyue(a)kylinos.cn>
selinux: fix error initialization in inode_doinit_with_dentry()
Andreas Gruenbacher <agruenba(a)redhat.com>
selinux: Convert isec->lock into a spinlock
Andreas Gruenbacher <agruenba(a)redhat.com>
selinux: Clean up initialization of isec->sclass
Andreas Gruenbacher <agruenba(a)redhat.com>
proc: Pass file mode to proc_pid_make_inode
Andreas Gruenbacher <agruenba(a)redhat.com>
selinux: Minor cleanups
Nathan Chancellor <nathan(a)kernel.org>
ion: Make user_ion_handle_put_nolock() a void function
Wei Mingzhi <whistler(a)member.fsf.org>
mt7601u: add USB device ID for some versions of XiaoDu WiFi Dongle.
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
ARM: crypto: comment out gcc warning that breaks clang builds
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: do not allow packet truncation below transport header offset
Liang He <windhl(a)126.com>
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
Kuniyuki Iwashima <kuniyu(a)amazon.com>
net: ping6: Fix memleak in ipv6_renew_options().
Liang He <windhl(a)126.com>
scsi: ufs: host: Hold reference returned by of_parse_phandle()
ChenXiaoSong <chenxiaosong2(a)huawei.com>
ntfs: fix use-after-free in ntfs_ucsncmp()
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
-------------
Diffstat:
Makefile | 4 +-
arch/arm/lib/xor-neon.c | 3 +-
arch/ia64/include/asm/processor.h | 2 +-
arch/mips/cavium-octeon/octeon-platform.c | 3 +-
arch/mips/kernel/proc.c | 2 +-
arch/mips/mm/tlbex.c | 4 +-
arch/nios2/include/asm/entry.h | 3 +-
arch/nios2/include/asm/ptrace.h | 2 +
arch/nios2/kernel/entry.S | 22 +++--
arch/nios2/kernel/signal.c | 3 +-
arch/nios2/kernel/syscall_table.c | 1 +
arch/nios2/kernel/time.c | 5 +-
arch/parisc/kernel/drivers.c | 9 +-
arch/powerpc/kernel/prom.c | 7 ++
arch/powerpc/platforms/powernv/rng.c | 2 +
arch/powerpc/sysdev/fsl_pci.c | 8 ++
arch/powerpc/sysdev/fsl_pci.h | 1 +
arch/x86/kvm/emulate.c | 19 ++--
arch/x86/kvm/svm.c | 2 -
arch/x86/platform/olpc/olpc-xo1-sci.c | 2 +-
drivers/acpi/video_detect.c | 55 +++++++----
drivers/ata/libata-eh.c | 1 +
drivers/atm/idt77252.c | 1 +
drivers/gpu/drm/nouveau/nvkm/subdev/bios/base.c | 2 +-
drivers/irqchip/irq-tegra.c | 10 +-
drivers/macintosh/adb.c | 2 +-
drivers/md/dm-raid.c | 2 +-
drivers/md/raid10.c | 5 +-
drivers/md/raid5.c | 2 +-
drivers/misc/cxl/irq.c | 1 +
drivers/net/can/usb/ems_usb.c | 2 +-
drivers/net/ethernet/freescale/fec_ptp.c | 6 +-
drivers/net/sungem_phy.c | 1 +
drivers/net/usb/ax88179_178a.c | 14 +--
drivers/net/usb/usbnet.c | 8 +-
drivers/net/wireless/mediatek/mt7601u/usb.c | 1 +
drivers/pinctrl/nomadik/pinctrl-nomadik.c | 4 +-
drivers/pinctrl/qcom/pinctrl-msm8916.c | 4 +-
drivers/s390/scsi/zfcp_fc.c | 29 ++++--
drivers/s390/scsi/zfcp_fc.h | 6 +-
drivers/s390/scsi/zfcp_fsf.c | 4 +-
drivers/scsi/sg.c | 57 ++++++-----
drivers/scsi/ufs/ufshcd-pltfrm.c | 15 ++-
drivers/staging/android/ion/ion-ioctl.c | 8 +-
drivers/tty/serial/ucc_uart.c | 2 +
drivers/usb/core/hcd.c | 26 ++---
drivers/usb/gadget/legacy/inode.c | 1 +
drivers/usb/host/ohci-ppc-of.c | 1 +
drivers/vfio/vfio.c | 1 +
drivers/video/fbdev/i740fb.c | 9 +-
drivers/xen/xenbus/xenbus_dev_frontend.c | 4 +-
fs/attr.c | 2 +
fs/btrfs/disk-io.c | 14 +++
fs/btrfs/tree-log.c | 4 +-
fs/ext4/inline.c | 3 +
fs/ext4/inode.c | 7 ++
fs/ext4/namei.c | 23 ++++-
fs/ext4/resize.c | 11 +++
fs/ext4/xattr.c | 6 +-
fs/ext4/xattr.h | 13 +++
fs/fuse/inode.c | 6 ++
fs/nfs/nfs4proc.c | 3 +
fs/ntfs/attrib.c | 8 +-
fs/proc/base.c | 23 ++---
fs/proc/fd.c | 6 +-
fs/proc/internal.h | 2 +-
fs/proc/namespaces.c | 3 +-
include/linux/bpf.h | 11 +++
include/linux/buffer_head.h | 25 ++++-
include/linux/pci_ids.h | 2 +
include/linux/usb/hcd.h | 1 +
include/net/bluetooth/l2cap.h | 1 +
include/sound/core.h | 8 ++
include/trace/events/spmi.h | 12 +--
include/uapi/linux/swab.h | 4 +-
init/main.c | 14 +--
kernel/bpf/core.c | 16 ++-
kernel/bpf/syscall.c | 36 +++++--
net/9p/client.c | 4 +-
net/bluetooth/l2cap_core.c | 68 +++++++++----
net/ipv4/tcp_output.c | 7 +-
net/ipv6/ping.c | 6 ++
net/netfilter/nf_tables_api.c | 3 +-
net/netfilter/nfnetlink_queue.c | 7 +-
net/rds/ib_recv.c | 1 +
net/sched/cls_route.c | 8 +-
net/sunrpc/backchannel_rqst.c | 14 +++
net/vmw_vsock/af_vsock.c | 9 +-
security/selinux/hooks.c | 123 +++++++++++++++---------
security/selinux/include/objsec.h | 5 +-
security/selinux/selinuxfs.c | 4 +-
sound/core/info.c | 6 +-
sound/core/misc.c | 94 ++++++++++++++++++
sound/core/timer.c | 11 ++-
sound/pci/hda/patch_cirrus.c | 1 +
sound/pci/hda/patch_conexant.c | 11 ++-
sound/usb/bcd2000/bcd2000.c | 3 +-
97 files changed, 743 insertions(+), 294 deletions(-)
commit 5535be3099717646781ce1540cf725965d680e7b upstream.
Ever since the Dirty COW (CVE-2016-5195) security issue happened, we know
that FOLL_FORCE can be possibly dangerous, especially if there are races
that can be exploited by user space.
Right now, it would be sufficient to have some code that sets a PTE of a
R/O-mapped shared page dirty, in order for it to erroneously become
writable by FOLL_FORCE. The implications of setting a write-protected PTE
dirty might not be immediately obvious to everyone.
And in fact ever since commit 9ae0f87d009c ("mm/shmem: unconditionally set
pte dirty in mfill_atomic_install_pte"), we can use UFFDIO_CONTINUE to map
a shmem page R/O while marking the pte dirty. This can be used by
unprivileged user space to modify tmpfs/shmem file content even if the
user does not have write permissions to the file, and to bypass memfd
write sealing -- Dirty COW restricted to tmpfs/shmem (CVE-2022-2590).
To fix such security issues for good, the insight is that we really only
need that fancy retry logic (FOLL_COW) for COW mappings that are not
writable (!VM_WRITE). And in a COW mapping, we really only broke COW if
we have an exclusive anonymous page mapped. If we have something else
mapped, or the mapped anonymous page might be shared (!PageAnonExclusive),
we have to trigger a write fault to break COW. If we don't find an
exclusive anonymous page when we retry, we have to trigger COW breaking
once again because something intervened.
Let's move away from this mandatory-retry + dirty handling and rely on our
PageAnonExclusive() flag for making a similar decision, to use the same
COW logic as in other kernel parts here as well. In case we stumble over
a PTE in a COW mapping that does not map an exclusive anonymous page, COW
was not properly broken and we have to trigger a fake write-fault to break
COW.
Just like we do in can_change_pte_writable() added via commit 64fe24a3e05e
("mm/mprotect: try avoiding write faults for exclusive anonymous pages
when changing protection") and commit 76aefad628aa ("mm/mprotect: fix
soft-dirty check in can_change_pte_writable()"), take care of softdirty
and uffd-wp manually.
For example, a write() via /proc/self/mem to a uffd-wp-protected range has
to fail instead of silently granting write access and bypassing the
userspace fault handler. Note that FOLL_FORCE is not only used for debug
access, but also triggered by applications without debug intentions, for
example, when pinning pages via RDMA.
This fixes CVE-2022-2590. Note that only x86_64 and aarch64 are
affected, because only those support CONFIG_HAVE_ARCH_USERFAULTFD_MINOR.
Fortunately, FOLL_COW is no longer required to handle FOLL_FORCE. So
let's just get rid of it.
Thanks to Nadav Amit for pointing out that the pte_dirty() check in
FOLL_FORCE code is problematic and might be exploitable.
Note 1: We don't check for the PTE being dirty because it doesn't matter
for making a "was COWed" decision anymore, and whoever modifies the
page has to set the page dirty either way.
Note 2: Kernels before extended uffd-wp support and before
PageAnonExclusive (< 5.19) can simply revert the problematic
commit instead and be safe regarding UFFDIO_CONTINUE. A backport to
v5.19 requires minor adjustments due to lack of
vma_soft_dirty_enabled().
Link: https://lkml.kernel.org/r/20220809205640.70916-1-david@redhat.com
Fixes: 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: <stable(a)vger.kernel.org> [5.16]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
include/linux/mm.h | 1 -
mm/gup.c | 69 +++++++++++++++++++++++++++++++---------------
mm/huge_memory.c | 65 +++++++++++++++++++++++++++++--------------
3 files changed, 91 insertions(+), 44 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7898e29bcfb5..25b8860f47cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2939,7 +2939,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
-#define FOLL_COW	0x4000	/* internal GUP flag */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
diff --git a/mm/gup.c b/mm/gup.c
index e2a39e30756d..d2fd46b50102 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -478,14 +478,43 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 	return -EEXIST;
 }

-/*
- * FOLL_FORCE can write to even unwritable pte's, but only
- * after we've gone through a COW cycle and they are dirty.
- */
-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
+/* FOLL_FORCE can write to even unwritable PTEs in COW mappings. */
+static inline bool can_follow_write_pte(pte_t pte, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
 {
-	return pte_write(pte) ||
-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+	/* If the pte is writable, we can write to the page. */
+	if (pte_write(pte))
+		return true;
+
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+		return false;
+
+	/* ... and a write-fault isn't required for other reasons. */
+	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) &&
+	    !(vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
+		return false;
+	return !userfaultfd_pte_wp(vma, pte);
 }

 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -528,12 +557,19 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
 		goto no_page;
-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
-		pte_unmap_unlock(ptep, ptl);
-		return NULL;
-	}

 	page = vm_normal_page(vma, address, pte);
+
+	/*
+	 * We only care about anon pages in can_follow_write_pte() and don't
+	 * have to worry about pte_devmap() because they are never anon.
+	 */
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pte(pte, page, vma, flags)) {
+		page = NULL;
+		goto out;
+	}
+
 	if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
 		/*
 		 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
@@ -967,17 +1003,6 @@ static int faultin_page(struct vm_area_struct *vma,
 		return -EBUSY;
 	}

-	/*
-	 * The VM_FAULT_WRITE bit tells us that do_wp_page has broken COW when
-	 * necessary, even if maybe_mkwrite decided not to set pte_write. We
-	 * can thus safely do subsequent page lookups as if they were reads.
-	 * But only do so when looping for pte_write is futile: in some cases
-	 * userspace may also be wanting to write to the gotten user page,
-	 * which a read fault here might prevent (a readonly page might get
-	 * reCOWed by userspace write).
-	 */
-	if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
-		*flags |= FOLL_COW;
 	return 0;
 }

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 834f288b3769..164d13b62079 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -977,12 +977,6 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,

 	assert_spin_locked(pmd_lockptr(mm, pmd));

-	/*
-	 * When we COW a devmap PMD entry, we split it into PTEs, so we should
-	 * not be in this function with `flags & FOLL_COW` set.
-	 */
-	WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
-
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))
@@ -1348,14 +1342,43 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 	return VM_FAULT_FALLBACK;
 }

-/*
- * FOLL_FORCE can write to even unwritable pmd's, but only
- * after we've gone through a COW cycle and they are dirty.
- */
-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
+/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
+static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
 {
-	return pmd_write(pmd) ||
-	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+	/* If the pmd is writable, we can write to the page. */
+	if (pmd_write(pmd))
+		return true;
+
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+		return false;
+
+	/* ... and a write-fault isn't required for other reasons. */
+	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) &&
+	    !(vma->vm_flags & VM_SOFTDIRTY) && !pmd_soft_dirty(pmd))
+		return false;
+	return !userfaultfd_huge_pmd_wp(vma, pmd);
 }

 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
@@ -1364,12 +1387,16 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned int flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	struct page *page = NULL;
+	struct page *page;

 	assert_spin_locked(pmd_lockptr(mm, pmd));

-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
-		goto out;
+	page = pmd_page(*pmd);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pmd(*pmd, page, vma, flags))
+		return NULL;

 	/* Avoid dumping huge zero page */
 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
@@ -1377,10 +1404,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,

 	/* Full NUMA hinting faults to serialise migration in fault paths */
 	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
-		goto out;
-
-	page = pmd_page(*pmd);
-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+		return NULL;

 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
 		return ERR_PTR(-EMLINK);
@@ -1397,7 +1421,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
 	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);

-out:
 	return page;
 }

--
2.37.1
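The order of checks in the new can_follow_write_pte() and
can_follow_write_pmd() can be easier to follow as a userspace model. The
fragment below is a sketch only: the MODEL_* flag values, struct pte_model
and model_can_follow_write() are hypothetical names invented for
illustration, and the page, soft-dirty and uffd-wp state is collapsed into
plain booleans rather than read from real kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; they are not the
 * kernel's VM_* definitions. */
enum {
	MODEL_VM_WRITE    = 1 << 0,
	MODEL_VM_SHARED   = 1 << 1,
	MODEL_VM_MAYWRITE = 1 << 2,
	MODEL_VM_MAYSHARE = 1 << 3,
};

/* Collapsed view of what the kernel reads from the PTE and its page. */
struct pte_model {
	bool writable;          /* stands in for pte_write() */
	bool anon_exclusive;    /* page && PageAnon() && PageAnonExclusive() */
	bool needs_write_fault; /* soft-dirty tracking or uffd-wp wants a fault */
};

/* Same decision order as the patched helper. */
static bool model_can_follow_write(const struct pte_model *pte,
				   unsigned int vm_flags, bool foll_force)
{
	assert(pte != NULL);

	if (pte->writable)
		return true;
	if (!foll_force)
		return false;
	/* FOLL_FORCE has no effect on shared mappings ... */
	if (vm_flags & (MODEL_VM_MAYSHARE | MODEL_VM_SHARED))
		return false;
	/* ... or read-only private ones ... */
	if (!(vm_flags & MODEL_VM_MAYWRITE))
		return false;
	/* ... or writable ones that just need an ordinary write fault. */
	if (vm_flags & MODEL_VM_WRITE)
		return false;
	/* COW counts as broken only for an exclusive anonymous page. */
	if (!pte->anon_exclusive)
		return false;
	return !pte->needs_write_fault;
}
```

In this model the dirty bit plays no role at all, which is the point of the
change: a read-only PTE that maps anything other than an exclusive anonymous
page always goes back through the write-fault path.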
When running identity mapped and depending on the kernel configuration,
it is possible that cc_platform_has() can have compiler generated code
that uses jump tables. This causes a boot failure because the jump table
uses un-mapped kernel virtual addresses, not identity mapped addresses.
This has been seen with CONFIG_RETPOLINE=n.
Similar to sme_encrypt_kernel(), use an open-coded direct check for the
status of SNP rather than trying to eliminate the jump table. This
preserves any code optimization in cc_platform_has() that can be useful
post boot. It also limits the changes to SEV-specific files so that
future compiler features won't necessarily require possible build changes
just because they are not compatible with running identity mapped.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216385
Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/
Cc: <stable(a)vger.kernel.org> # 5.19.x
Reported-by: Sean Christopherson <seanjc(a)google.com>
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
---
arch/x86/kernel/sev.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 63dc626627a0..4f84c3f11af5 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -701,7 +701,13 @@ static void __init early_set_pages_state(unsigned long paddr, unsigned int npage
 void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
 					 unsigned int npages)
 {
-	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+	/*
+	 * This can be invoked in early boot while running identity mapped, so
+	 * use an open coded check for SNP instead of using cc_platform_has().
+	 * This eliminates worries about jump tables or checking boot_cpu_data
+	 * in the cc_platform_has() function.
+	 */
+	if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
 		return;

 	/*
@@ -717,7 +723,13 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
 void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
 					unsigned int npages)
 {
-	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+	/*
+	 * This can be invoked in early boot while running identity mapped, so
+	 * use an open coded check for SNP instead of using cc_platform_has().
+	 * This eliminates worries about jump tables or checking boot_cpu_data
+	 * in the cc_platform_has() function.
+	 */
+	if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
 		return;

 	/* Invalidate the memory pages before they are marked shared in the RMP table. */
--
2.37.2
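The difference between a generic attribute query and the open-coded check
used in the patch above can be sketched in plain C. Everything here is
hypothetical: the MODEL_* bit positions, enum model_attr and both functions
are invented for illustration and are not the kernel's definitions. The
sketch only shows why a multi-way switch is the kind of code a compiler may
lower to a jump table, while a single mask test needs no table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout of a status word, for this sketch only. */
#define MODEL_SEV_ENABLED     (UINT64_C(1) << 0)
#define MODEL_SEV_ES_ENABLED  (UINT64_C(1) << 1)
#define MODEL_SEV_SNP_ENABLED (UINT64_C(1) << 2)

enum model_attr {
	MODEL_ATTR_SEV,
	MODEL_ATTR_SEV_ES,
	MODEL_ATTR_SEV_SNP,
};

/* Generic query: a switch over many attributes like this one may be
 * compiled into a jump table indexed by attr. */
static bool model_platform_has(uint64_t status, enum model_attr attr)
{
	switch (attr) {
	case MODEL_ATTR_SEV:
		return (status & MODEL_SEV_ENABLED) != 0;
	case MODEL_ATTR_SEV_ES:
		return (status & MODEL_SEV_ES_ENABLED) != 0;
	case MODEL_ATTR_SEV_SNP:
		return (status & MODEL_SEV_SNP_ENABLED) != 0;
	}
	assert(!"unknown attribute");
	return false;
}

/* Open-coded check: one mask test on the status word, no dispatch. */
static bool model_snp_enabled(uint64_t status)
{
	return (status & MODEL_SEV_SNP_ENABLED) != 0;
}
```

Both functions return the same answer for the SNP attribute. The open-coded
form simply avoids depending on how the compiler chooses to implement the
switch, which matters when the code runs before the normal kernel mappings
exist.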
Do changesets that already included the "Fixes:" tag in the commit
description also need to include the "Cc: stable(a)vger.kernel.org" in
order to be included in stable?
--
Thanks,
Steve