A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger
a double fetch race condition vulnerability and invoke the VMGEXIT
handler recursively.
sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then
fetches the exit code using ghcb_get_sw_exit_code(). Soon after,
sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB
page is shared with the guest, the guest is able to quickly swap the
values with another vCPU and hence bypass the validation. One vmexit code
that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT;
if sev_handle_vmgexit() observes it in the second fetch, the call
to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again
recursively.
To avoid the race, always fetch the GHCB data from the places where
sev_es_sync_from_ghcb stores it.
Exploiting recursions on linux kernel has been proven feasible
in the past, but the impact is mitigated by stack guard pages
(CONFIG_VMAP_STACK). Still, if an attacker manages to call the handler
multiple times, they can theoretically trigger a stack overflow and
cause a denial-of-service, or potentially guest-to-host escape in kernel
configurations without stack guard pages.
Note that winning the race reliably in every iteration is very tricky
due to the very tight window of the fetches; depending on the compiler
settings, they are often consecutive because of optimization and inlining.
Tested by booting an SEV-ES RHEL9 guest.
Fixes: CVE-2023-4155
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable(a)vger.kernel.org
Reported-by: Andy Nguyen <theflow(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/svm/sev.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e898f0b2b0ba..ca4ba5fe9a01 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2445,9 +2445,15 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}
+static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control)
+{
+ return (((u64)control->exit_code_hi) << 32) | control->exit_code;
+}
+
static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
{
- struct kvm_vcpu *vcpu;
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
struct ghcb *ghcb;
u64 exit_code;
u64 reason;
@@ -2458,7 +2464,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
* Retrieve the exit code now even though it may not be marked valid
* as it could help with debugging.
*/
- exit_code = ghcb_get_sw_exit_code(ghcb);
+ exit_code = kvm_ghcb_get_sw_exit_code(control);
/* Only GHCB Usage code 0 is supported */
if (ghcb->ghcb_usage) {
@@ -2473,7 +2479,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
!kvm_ghcb_sw_exit_info_2_is_valid(svm))
goto vmgexit_err;
- switch (ghcb_get_sw_exit_code(ghcb)) {
+ switch (exit_code) {
case SVM_EXIT_READ_DR7:
break;
case SVM_EXIT_WRITE_DR7:
@@ -2490,18 +2496,18 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!kvm_ghcb_rax_is_valid(svm) ||
!kvm_ghcb_rcx_is_valid(svm))
goto vmgexit_err;
- if (ghcb_get_rax(ghcb) == 0xd)
+ if (vcpu->arch.regs[VCPU_REGS_RAX] == 0xd)
if (!kvm_ghcb_xcr0_is_valid(svm))
goto vmgexit_err;
break;
case SVM_EXIT_INVD:
break;
case SVM_EXIT_IOIO:
- if (ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_STR_MASK) {
+ if (control->exit_info_1 & SVM_IOIO_STR_MASK) {
if (!kvm_ghcb_sw_scratch_is_valid(svm))
goto vmgexit_err;
} else {
- if (!(ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_TYPE_MASK))
+ if (!(control->exit_info_1 & SVM_IOIO_TYPE_MASK))
if (!kvm_ghcb_rax_is_valid(svm))
goto vmgexit_err;
}
@@ -2509,7 +2515,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_EXIT_MSR:
if (!kvm_ghcb_rcx_is_valid(svm))
goto vmgexit_err;
- if (ghcb_get_sw_exit_info_1(ghcb)) {
+ if (control->exit_info_1) {
if (!kvm_ghcb_rax_is_valid(svm) ||
!kvm_ghcb_rdx_is_valid(svm))
goto vmgexit_err;
@@ -2553,8 +2559,6 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
return 0;
vmgexit_err:
- vcpu = &svm->vcpu;
-
if (reason == GHCB_ERR_INVALID_USAGE) {
vcpu_unimpl(vcpu, "vmgexit: ghcb usage %#x is not valid\n",
ghcb->ghcb_usage);
@@ -2852,8 +2856,6 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
- exit_code = ghcb_get_sw_exit_code(ghcb);
-
sev_es_sync_from_ghcb(svm);
ret = sev_es_validate_vmgexit(svm);
if (ret)
@@ -2862,6 +2864,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ghcb_set_sw_exit_info_1(ghcb, 0);
ghcb_set_sw_exit_info_2(ghcb, 0);
+ exit_code = kvm_ghcb_get_sw_exit_code(control);
switch (exit_code) {
case SVM_VMGEXIT_MMIO_READ:
ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
--
2.39.0
From: Laszlo Ersek <lersek(a)redhat.com>
stable inclusion
from stable-v5.10.189
commit 5ea23f1cb67e4468db7ff651627892c9217fec24
category: bugfix
bugzilla: 189104, https://gitee.com/src-openeuler/kernel/issues/I7QXHX
CVE: CVE-2023-4194
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
---------------------------
commit 9bc3047374d5bec163e83e743709e23753376f0c upstream.
Commit a096ccca6e50 initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/net/tun" device node's owner UID. Per
original commit 86741ec25462 ("net: core: Add a UID field to struct
sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
userspace process that creates the socket. Commit 86741ec25462 mentions
socket() and accept(); with "tun", the action that creates the socket is
open("/dev/net/tun").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e50
has no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Pietro Borrello <borrello(a)diag.uniroma1.it>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: a096ccca6e50 ("tun: tun_chr_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
---
drivers/net/tun.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f8feec522b32..50c2ce392cd1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3456,7 +3456,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
tfile->socket.file = file;
tfile->socket.ops = &tun_socket_ops;
- sock_init_data_uid(&tfile->socket, &tfile->sk, inode->i_uid);
+ sock_init_data_uid(&tfile->socket, &tfile->sk, current_fsuid());
tfile->sk.sk_write_space = tun_sock_write_space;
tfile->sk.sk_sndbuf = INT_MAX;
--
2.25.1
From: Laszlo Ersek <lersek(a)redhat.com>
stable inclusion
from stable-v5.10.189
commit 33a339e717be2c88b7ad11375165168d5b40e38e
category: bugfix
bugzilla: 189104, https://gitee.com/src-openeuler/kernel/issues/I7QXHX
CVE: CVE-2023-4194
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
---------------------------
commit 5c9241f3ceab3257abe2923a59950db0dc8bb737 upstream.
Commit 66b2c338adce initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/tapX" device node's owner UID. Per original
commit 86741ec25462 ("net: core: Add a UID field to struct sock.",
2016-11-04), that's wrong: the idea is to cache the UID of the userspace
process that creates the socket. Commit 86741ec25462 mentions socket() and
accept(); with "tap", the action that creates the socket is
open("/dev/tapX").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/tapX" will be owned by root, so in practice, commit 66b2c338adce has
no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Pietro Borrello <borrello(a)diag.uniroma1.it>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 66b2c338adce ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
---
drivers/net/tap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index d9018d9fe310..2c9ae02ada3e 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -523,7 +523,7 @@ static int tap_open(struct inode *inode, struct file *file)
q->sock.state = SS_CONNECTED;
q->sock.file = file;
q->sock.ops = &tap_socket_ops;
- sock_init_data_uid(&q->sock, &q->sk, inode->i_uid);
+ sock_init_data_uid(&q->sock, &q->sk, current_fsuid());
q->sk.sk_write_space = tap_sock_write_space;
q->sk.sk_destruct = tap_sock_destruct;
q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
--
2.25.1
All small, fairly safe changes.
The following changes since commit 52a93d39b17dc7eb98b6aa3edb93943248e03b2f:
Linux 6.5-rc5 (2023-08-06 15:07:51 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to f55484fd7be923b740e8e1fc304070ba53675cb4:
virtio-mem: check if the config changed before fake offlining memory (2023-08-10 15:51:46 -0400)
----------------------------------------------------------------
virtio: bugfixes
just a bunch of bugfixes all over the place.
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Allen Hubbe (2):
pds_vdpa: reset to vdpa specified mac
pds_vdpa: alloc irq vectors on DRIVER_OK
David Hildenbrand (4):
virtio-mem: remove unsafe unplug in Big Block Mode (BBM)
virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
virtio-mem: keep retrying on offline_and_remove_memory() errors in Sub Block Mode (SBM)
virtio-mem: check if the config changed before fake offlining memory
Dragos Tatulea (4):
vdpa: Enable strict validation for netlinks ops
vdpa/mlx5: Correct default number of queues when MQ is on
vdpa/mlx5: Fix mr->initialized semantics
vdpa/mlx5: Fix crash on shutdown for when no ndev exists
Eugenio Pérez (1):
vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
Feng Liu (1):
virtio-pci: Fix legacy device flag setting error in probe
Gal Pressman (1):
virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs()
Hawkins Jiawei (1):
virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case
Lin Ma (3):
vdpa: Add features attr to vdpa_nl_policy for nlattr length check
vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check
vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check
Maxime Coquelin (1):
vduse: Use proper spinlock for IRQ injection
Mike Christie (3):
vhost-scsi: Fix alignment handling with windows
vhost-scsi: Rename vhost_scsi_iov_to_sgl
MAINTAINERS: add vhost-scsi entry and myself as a co-maintainer
Shannon Nelson (4):
pds_vdpa: protect Makefile from unconfigured debugfs
pds_vdpa: always allow offering VIRTIO_NET_F_MAC
pds_vdpa: clean and reset vqs entries
pds_vdpa: fix up debugfs feature bit printing
Wolfram Sang (1):
virtio-mmio: don't break lifecycle of vm_dev
MAINTAINERS | 11 ++-
drivers/net/virtio_net.c | 2 +-
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 +
drivers/vdpa/mlx5/core/mr.c | 105 +++++++++++++++------
drivers/vdpa/mlx5/net/mlx5_vnet.c | 26 +++---
drivers/vdpa/pds/Makefile | 3 +-
drivers/vdpa/pds/debugfs.c | 15 ++-
drivers/vdpa/pds/vdpa_dev.c | 176 ++++++++++++++++++++++++----------
drivers/vdpa/pds/vdpa_dev.h | 5 +-
drivers/vdpa/vdpa.c | 9 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 8 +-
drivers/vhost/scsi.c | 187 ++++++++++++++++++++++++++++++++-----
drivers/virtio/virtio_mem.c | 168 ++++++++++++++++++++++-----------
drivers/virtio/virtio_mmio.c | 5 +-
drivers/virtio/virtio_pci_common.c | 2 -
drivers/virtio/virtio_pci_legacy.c | 1 +
drivers/virtio/virtio_vdpa.c | 2 +
17 files changed, 519 insertions(+), 208 deletions(-)
This is the start of the stable review cycle for the 4.14.323 release.
There are 26 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.323-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.323-rc1
Masahiro Yamada <masahiroy(a)kernel.org>
alpha: remove __init annotation from exported page_is_ram()
Zhu Wang <wangzhu9(a)huawei.com>
scsi: core: Fix possible memory leak if device_add() fails
Zhu Wang <wangzhu9(a)huawei.com>
scsi: snic: Fix possible memory leak if device_add() fails
Alexandra Diupina <adiupina(a)astralinux.ru>
scsi: 53c700: Check that command slot is not NULL
Michael Kelley <mikelley(a)microsoft.com>
scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
Tony Battersby <tonyb(a)cybernetics.com>
scsi: core: Fix legacy /proc parsing buffer overflow
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: report use refcount overflow
Christoph Hellwig <hch(a)lst.de>
btrfs: don't stop integrity writeback too early
Douglas Miller <doug.miller(a)cornelisnetworks.com>
IB/hfi1: Fix possible panic during hotplug remove
Andrew Kanner <andrew.kanner(a)gmail.com>
drivers: net: prevent tun_build_skb() to exceed the packet size limit
Eric Dumazet <edumazet(a)google.com>
dccp: fix data-race around dp->dccps_mss_cache
Ziyang Xuan <william.xuanziyang(a)huawei.com>
bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
Eric Dumazet <edumazet(a)google.com>
net/packet: annotate data-races around tp->status
Karol Herbst <kherbst(a)redhat.com>
drm/nouveau/disp: Revert a NULL check inside nouveau_connector_get_modes
Arnd Bergmann <arnd(a)arndb.de>
x86: Move gds_ucode_mitigated() declaration to header
Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
Elson Roy Serrao <quic_eserrao(a)quicinc.com>
usb: dwc3: Properly handle processing of pending events
Alan Stern <stern(a)rowland.harvard.edu>
usb-storage: alauda: Fix uninit-value in alauda_check_media()
Yiyuan Guo <yguoaz(a)gmail.com>
iio: cros_ec: Fix the allocation size for cros_ec_command
Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
Colin Ian King <colin.i.king(a)gmail.com>
radix tree test suite: fix incorrect allocation size for pthreads
Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
dmaengine: pl330: Return DMA_PAUSED when transaction is paused
Maciej Żenczykowski <maze(a)google.com>
ipv6: adjust ndisc_is_useropt() to also return true for PIO
Sergei Antonov <saproj(a)gmail.com>
mmc: moxart: read scr register without changing byte order
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
sparc: fix up arch_cpu_finalize_init() build breakage.
-------------
Diffstat:
Makefile | 4 +-
arch/alpha/kernel/setup.c | 3 +-
arch/sparc/Kconfig | 2 +-
arch/x86/entry/vdso/vma.c | 4 +-
arch/x86/include/asm/processor.h | 2 +
arch/x86/kvm/x86.c | 2 -
drivers/dma/pl330.c | 18 ++-
drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +-
.../common/cros_ec_sensors/cros_ec_sensors_core.c | 2 +-
drivers/infiniband/hw/hfi1/chip.c | 1 +
drivers/mmc/host/moxart-mmc.c | 8 +-
drivers/net/bonding/bond_main.c | 4 +-
drivers/net/tun.c | 2 +-
drivers/scsi/53c700.c | 2 +-
drivers/scsi/raid_class.c | 1 +
drivers/scsi/scsi_proc.c | 30 +++--
drivers/scsi/snic/snic_disc.c | 1 +
drivers/scsi/storvsc_drv.c | 4 -
drivers/usb/dwc3/gadget.c | 9 +-
drivers/usb/storage/alauda.c | 9 +-
fs/btrfs/extent_io.c | 7 +-
fs/nilfs2/inode.c | 8 ++
fs/nilfs2/segment.c | 2 +
fs/nilfs2/the_nilfs.h | 2 +
include/net/netfilter/nf_tables.h | 27 +++-
lib/test_firmware.c | 8 +-
net/dccp/output.c | 2 +-
net/dccp/proto.c | 10 +-
net/ipv6/ndisc.c | 3 +-
net/netfilter/nf_tables_api.c | 143 +++++++++++++--------
net/netfilter/nft_objref.c | 8 +-
net/packet/af_packet.c | 16 ++-
tools/testing/radix-tree/regression1.c | 2 +-
33 files changed, 228 insertions(+), 120 deletions(-)
Dear all,
I found in all versions of Linux (at least for kernel version 4/5/6),
the following bug exists:
When a user is granted full access to a file of which he is not the
owner, he can read/write/delete the file, but cannot "change only its
last modification date". In particular, `touch -m` fails and Python's
`os.utime()` also fails with "Operation not permitted", but `touch`
without -m works.
This applies to both FACL extended permission as well as basic Linux
file permission.
Thank you for fixing this in the future!
Cheers,
Xuancong