Hi all, I'm using vanilla kernels on a Gentoo-based laptop, and since 6.3.2 I'm getting the kernel log below when using a KVM VM on my box. I know the kernel is tainted, but avoiding loading the nvidia driver would make things complicated on my side; if needed for debugging, I can try to run without it.
I'm not sure what other info might be relevant in this context; if you need more details just let me know, happy to provide them.
[Fri May 26 09:16:35 2023] ------------[ cut here ]------------
[Fri May 26 09:16:35 2023] WARNING: CPU: 5 PID: 4684 at kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
[Fri May 26 09:16:35 2023] Modules linked in: vhost_net vhost vhost_iotlb tap tun tls rfcomm snd_hrtimer snd_seq xt_CHECKSUM algif_skcipher xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat iptable_filter ip_tables bpfilter bridge stp llc rmi_smbus rmi_core bnep squashfs sch_fq_codel nvidia_drm(POE) intel_rapl_msr vboxnetadp(OE) vboxnetflt(OE) nvidia_modeset(POE) mei_pxp mei_hdcp rtsx_pci_sdmmc vboxdrv(OE) mmc_core intel_rapl_common intel_pmc_core_pltdrv intel_pmc_core snd_ctl_led intel_tcc_cooling snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp btusb btrtl snd_usb_audio btbcm btmtk kvm_intel btintel snd_hda_intel snd_intel_dspcfg snd_usbmidi_lib snd_hda_codec snd_rawmidi snd_hwdep bluetooth snd_hda_core snd_seq_device kvm snd_pcm thinkpad_acpi iwlmvm mousedev ledtrig_audio uvcvideo snd_timer ecdh_generic irqbypass crct10dif_pclmul crc32_pclmul polyval_clmulni snd think_lmi joydev mei_me ecc uvc
[Fri May 26 09:16:35 2023] polyval_generic rtsx_pci iwlwifi firmware_attributes_class psmouse wmi_bmof soundcore intel_pch_thermal mei platform_profile input_leds evdev nvidia(POE) coretemp hwmon akvcam(OE) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc loop nfsd auth_rpcgss nfs_acl efivarfs dmi_sysfs dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_service_time dm_round_robin dm_queue_length dm_multipath dm_delay virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio_blk virtio_console virtio_balloon vxlan ip6_udp_tunnel udp_tunnel macvlan virtio_net net_failover failover virtio_ring virtio fuse overlay nfs lockd grace sunrpc linear raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod dm_snapshot dm_bufio dm_crypt trusted asn1_encoder tpm rng_core dm_mirror dm_region_hash dm_log firewire_core crc_itu_t hid_apple usb_storage ehci_pci ehci_hcd sr_mod cdrom ahci libahci libata
[Fri May 26 09:16:35 2023] CPU: 5 PID: 4684 Comm: kvm-nx-lpage-re Tainted: P U OE 6.3.4-cova #1
[Fri May 26 09:16:35 2023] Hardware name: LENOVO 20EQS58500/20EQS58500, BIOS N1EET98W (1.71 ) 12/06/2022
[Fri May 26 09:16:35 2023] RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
[Fri May 26 09:16:35 2023] Code: 48 8b 44 24 30 4c 39 e0 0f 85 1b fe ff ff 48 89 df e8 2e ab fb ff e9 23 fe ff ff 49 bc ff ff ff ff ff ff ff 7f e9 fb fc ff ff <0f> 0b e9 1b ff ff ff 48 8b 44 24 40 65 48 2b 04 25 28 00 00 00 75
[Fri May 26 09:16:35 2023] RSP: 0018:ffff8e1a4403fe68 EFLAGS: 00010246
[Fri May 26 09:16:35 2023] RAX: 0000000000000000 RBX: ffff8e1a42bbd000 RCX: 0000000000000000
[Fri May 26 09:16:35 2023] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[Fri May 26 09:16:35 2023] RBP: ffff8b4e9a56d930 R08: 0000000000000000 R09: ffff8b4e9a56d8a0
[Fri May 26 09:16:35 2023] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8e1a4403fe98
[Fri May 26 09:16:35 2023] R13: 0000000000000001 R14: ffff8b4d9c432e80 R15: 0000000000000010
[Fri May 26 09:16:35 2023] FS:  0000000000000000(0000) GS:ffff8b5cdf740000(0000) knlGS:0000000000000000
[Fri May 26 09:16:35 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri May 26 09:16:35 2023] CR2: 00007efeac53d000 CR3: 0000000978c2c003 CR4: 00000000003726e0
[Fri May 26 09:16:35 2023] Call Trace:
[Fri May 26 09:16:35 2023] <TASK>
[Fri May 26 09:16:35 2023] ? __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
[Fri May 26 09:16:35 2023] kvm_vm_worker_thread+0x106/0x1c0 [kvm]
[Fri May 26 09:16:35 2023] ? __pfx_kvm_vm_worker_thread+0x10/0x10 [kvm]
[Fri May 26 09:16:35 2023] kthread+0xd9/0x100
[Fri May 26 09:16:35 2023] ? __pfx_kthread+0x10/0x10
[Fri May 26 09:16:35 2023] ret_from_fork+0x2c/0x50
[Fri May 26 09:16:35 2023] </TASK>
[Fri May 26 09:16:35 2023] ---[ end trace 0000000000000000 ]---
On Fri, May 26, 2023 at 09:43:17AM +0200, Fabio Coatti wrote:
Hi all, I'm using vanilla kernels on a Gentoo-based laptop, and since 6.3.2 I'm getting the kernel log below when using a KVM VM on my box. I know the kernel is tainted, but avoiding loading the nvidia driver would make things complicated on my side; if needed for debugging, I can try to run without it.
Can you try uninstalling the nvidia driver (which should leave the kernel untainted) and reproducing this regression?
On Fri, May 26, 2023, Fabio Coatti wrote:
Hi all, I'm using vanilla kernels on a Gentoo-based laptop, and since 6.3.2
What was the last kernel you used that didn't trigger this WARN?
I'm getting the kernel log below when using a KVM VM on my box.
Are you doing anything "interesting" when the WARN fires, or are you just running the VM and it randomly fires? Either way, can you provide your QEMU command line?
I know the kernel is tainted, but avoiding loading the nvidia driver would make things complicated on my side; if needed for debugging, I can try to run without it.
Nah, don't worry about that at this point.
I'm not sure what other info might be relevant in this context; if you need more details just let me know, happy to provide them.
[Fri May 26 09:16:35 2023] ------------[ cut here ]------------
[Fri May 26 09:16:35 2023] WARNING: CPU: 5 PID: 4684 at kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
Do you have the actual line number for the WARN? There are a handful of sanity checks in kvm_recover_nx_huge_pages(); it would be helpful to pinpoint which one is firing. My builds generate quite different code, and the Code: stream doesn't appear to be useful for reverse engineering the location.
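If it helps, the kernel's scripts/faddr2line helper can usually translate the RIP offset into a source line, assuming kvm.ko was built with debug info, e.g. something like "./scripts/faddr2line arch/x86/kvm/kvm.ko kvm_nx_huge_page_recovery_worker+0x38c/0x3d0" run from the build tree (the exact path to kvm.ko may differ on your setup).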
On Fri, May 26, 2023 at 19:01, Sean Christopherson <seanjc@google.com> wrote:
I'm using vanilla kernels on a Gentoo-based laptop, and since 6.3.2
What was the last kernel you used that didn't trigger this WARN?
6.3.1
I'm getting the kernel log below when using a KVM VM on my box.
Are you doing anything "interesting" when the WARN fires, or are you just running the VM and it randomly fires? Either way, can you provide your QEMU command line?
I'm not able to spot a specific action that triggers the dump. This time it happened when I was "simply" opening a new Chrome page in the guest VM. I guess this can cause some work on the mm side, but it's not really an "interesting" action, I'd say. Basically, I fired up the guest machine (a very basic Ubuntu 22.04) on a freshly rebooted host, connected a USB device (a YubiKey), and started Chrome. There was no message right after starting Chrome, only when I opened a new page.
Anyway, this is the command line (a libvirt-managed VM):
/usr/sbin/qemu-system-x86_64 -name guest=ubuntu-u2204-kvm,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-ubuntu-u2204-kvm/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/edk2-ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/ubuntu-u2204-kvm_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine pc-q35-7.1,usb=off,vmport=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,hpet=off,acpi=on -accel kvm -cpu host,migratable=on -global driver=cfi.pflash01,property=secure,value=on -m 16384 -object {"qom-type":"memory-backend-ram","id":"pc.ram","size":17179869184} -overcommit mem-lock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 160141fc-ec2e-4d91-bc1c-3e597643bcfd -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=30,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device {"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"} -device {"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"} -device {"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"} -device {"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"} -device {"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"} -device {"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"} -device {"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"} -device {"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"} -device {"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"} -device {"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"} -device {"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"} -device {"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"} -device {"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"} -device {"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"} -device {"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"} -device {"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"} -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/ubuntu22.04.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null} -device 
{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1} -device {"driver":"ide-cd","bus":"ide.0","id":"sata0-0-0"} -netdev {"type":"tap","fd":"32","vhost":true,"vhostfd":"34","id":"hostnet0"} -device {"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:17:0a:44","bus":"pci.1","addr":"0x0"} -chardev pty,id=charserial0 -device {"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0} -chardev socket,id=charchannel0,fd=28,server=on,wait=off -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"} -chardev spicevmc,id=charchannel1,name=vdagent -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":2,"chardev":"charchannel1","id":"channel1","name":"com.redhat.spice.0"} -device {"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"} -audiodev {"id":"audio1","driver":"spice"} -spice port=0,disable-ticketing=on,image-compression=off,seamless-migration=on -device {"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"} -device {"driver":"ich9-intel-hda","id":"sound0","bus":"pcie.0","addr":"0x1b"} -device {"driver":"hda-duplex","id":"sound0-codec0","bus":"sound0.0","cad":0,"audiodev":"audio1"} -global ICH9-LPC.noreboot=off -watchdog-action reset -chardev spicevmc,id=charredir0,name=usbredir -device {"driver":"usb-redir","chardev":"charredir0","id":"redir0","bus":"usb.0","port":"2"} -chardev spicevmc,id=charredir1,name=usbredir -device {"driver":"usb-redir","chardev":"charredir1","id":"redir1","bus":"usb.0","port":"3"} -device {"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"} -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device {"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
ps output (from a different run than the one in the first message, of course):

57176 ?        I<     0:00 [kvm]
57178 ?        S      0:00 [kvm-nx-lpage-recovery-57159]
57189 ?        S      0:00 [kvm-pit/57159]
I'm not sure what other info might be relevant in this context; if you need more details just let me know, happy to provide them.
[Fri May 26 09:16:35 2023] ------------[ cut here ]------------
[Fri May 26 09:16:35 2023] WARNING: CPU: 5 PID: 4684 at kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
Do you have the actual line number for the WARN? There are a handful of sanity checks in kvm_recover_nx_huge_pages(); it would be helpful to pinpoint which one is firing. My builds generate quite different code, and the Code: stream doesn't appear to be useful for reverse engineering the location.
That's the full message I get. Maybe I should recompile the host kernel with some debugging enabled; any specific suggestions?
On Fri, May 26, 2023 at 19:01, Sean Christopherson <seanjc@google.com> wrote:
Do you have the actual line number for the WARN? There are a handful of sanity checks in kvm_recover_nx_huge_pages(); it would be helpful to pinpoint which one is firing. My builds generate quite different code, and the Code: stream doesn't appear to be useful for reverse engineering the location.
Just got the following: arch/x86/kvm/mmu/mmu.c:7015, so seemingly around here:
	if (atomic_read(&kvm->nr_memslots_dirty_logging)) {
		slot = gfn_to_memslot(kvm, sp->gfn);
		WARN_ON_ONCE(!slot);
	}
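For context, that check sits in the recovery loop that walks kvm->arch.possible_nx_huge_pages; a rough sketch of the surrounding code (simplified from arch/x86/kvm/mmu/mmu.c as of 6.3, with locking, batching and TLB-flush details elided, so not verbatim):

	for ( ; to_zap; --to_zap) {
		if (list_empty(&kvm->arch.possible_nx_huge_pages))
			break;

		/* Pull the oldest shadow page that NX forced to be non-huge. */
		sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
				      struct kvm_mmu_page,
				      possible_nx_huge_page_link);

		/*
		 * Skip zapping if dirty logging is enabled on the page's
		 * memslot; this is the lookup whose WARN_ON_ONCE() fires
		 * at mmu.c:7015.
		 */
		slot = NULL;
		if (atomic_read(&kvm->nr_memslots_dirty_logging)) {
			slot = gfn_to_memslot(kvm, sp->gfn);
			WARN_ON_ONCE(!slot);
		}

		if (slot && kvm_slot_dirty_track_enabled(slot))
			unaccount_nx_huge_page(kvm, sp);
		else if (is_tdp_mmu_page(sp))
			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
		else
			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
	}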
[Sun May 28 12:48:12 2023] ------------[ cut here ]------------
[Sun May 28 12:48:12 2023] WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015 kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
[Sun May 28 12:48:12 2023] Modules linked in: vhost_net vhost vhost_iotlb tap tun rfcomm snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat iptable_filter ip_tables bpfilter bridge stp llc algif_skcipher bnep rmi_smbus rmi_core squashfs sch_fq_codel vboxnetadp(OE) nvidia_drm(POE) vboxnetflt(OE) rtsx_pci_sdmmc intel_rapl_msr nvidia_modeset(POE) mmc_core mei_pxp mei_hdcp vboxdrv(OE) snd_ctl_led intel_rapl_common snd_hda_codec_realtek intel_pmc_core_pltdrv snd_hda_codec_generic intel_pmc_core intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp btusb snd_hda_intel btrtl btbcm snd_intel_dspcfg btmtk snd_usb_audio kvm_intel btintel snd_usbmidi_lib snd_hda_codec snd_hwdep kvm snd_rawmidi iwlmvm snd_hda_core snd_seq_device bluetooth snd_pcm thinkpad_acpi irqbypass crct10dif_pclmul crc32_pclmul snd_timer mei_me ledtrig_audio ecdh_generic psmouse joydev think_lmi uvcvideo polyval_clmulni snd polyval_generic wmi_bmof
[Sun May 28 12:48:12 2023] firmware_attributes_class iwlwifi rtsx_pci uvc ecc mousedev soundcore mei intel_pch_thermal platform_profile evdev input_leds nvidia(POE) coretemp hwmon akvcam(OE) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc loop nfsd auth_rpcgss nfs_acl efivarfs dmi_sysfs dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_service_time dm_round_robin dm_queue_length dm_multipath dm_delay virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio_blk virtio_console virtio_balloon vxlan ip6_udp_tunnel udp_tunnel macvlan virtio_net net_failover failover virtio_ring virtio fuse overlay nfs lockd grace sunrpc linear raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod dm_snapshot dm_bufio dm_crypt trusted asn1_encoder tpm rng_core dm_mirror dm_region_hash dm_log firewire_core crc_itu_t hid_apple usb_storage ehci_pci ehci_hcd sr_mod cdrom ahci libahci libata
[Sun May 28 12:48:12 2023] CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re Tainted: P U OE 6.3.4-cova #2
[Sun May 28 12:48:12 2023] Hardware name: LENOVO 20EQS58500/20EQS58500, BIOS N1EET98W (1.71 ) 12/06/2022
[Sun May 28 12:48:12 2023] RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
[Sun May 28 12:48:12 2023] Code: 48 8b 44 24 30 4c 39 e0 0f 85 1b fe ff ff 48 89 df e8 2e ab fb ff e9 23 fe ff ff 49 bc ff ff ff ff ff ff ff 7f e9 fb fc ff ff <0f> 0b e9 1b ff ff ff 48 8b 44 24 40 65 48 2b 04 25 28 00 00 00 75
[Sun May 28 12:48:12 2023] RSP: 0018:ffff99b284f0be68 EFLAGS: 00010246
[Sun May 28 12:48:12 2023] RAX: 0000000000000000 RBX: ffff99b284edd000 RCX: 0000000000000000
[Sun May 28 12:48:12 2023] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[Sun May 28 12:48:12 2023] RBP: ffff9271397024e0 R08: 0000000000000000 R09: ffff927139702450
[Sun May 28 12:48:12 2023] R10: 0000000000000000 R11: 0000000000000001 R12: ffff99b284f0be98
[Sun May 28 12:48:12 2023] R13: 0000000000000000 R14: ffff9270991fcd80 R15: 0000000000000003
[Sun May 28 12:48:12 2023] FS:  0000000000000000(0000) GS:ffff927f9f640000(0000) knlGS:0000000000000000
[Sun May 28 12:48:12 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun May 28 12:48:12 2023] CR2: 00007f0aacad3ae0 CR3: 000000088fc2c005 CR4: 00000000003726e0
[Sun May 28 12:48:12 2023] Call Trace:
[Sun May 28 12:48:12 2023] <TASK>
[Sun May 28 12:48:12 2023] ? __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
[Sun May 28 12:48:12 2023] kvm_vm_worker_thread+0x106/0x1c0 [kvm]
[Sun May 28 12:48:12 2023] ? __pfx_kvm_vm_worker_thread+0x10/0x10 [kvm]
[Sun May 28 12:48:12 2023] kthread+0xd9/0x100
[Sun May 28 12:48:12 2023] ? __pfx_kthread+0x10/0x10
[Sun May 28 12:48:12 2023] ret_from_fork+0x2c/0x50
[Sun May 28 12:48:12 2023] </TASK>
[Sun May 28 12:48:12 2023] ---[ end trace 0000000000000000 ]---
On Fri, May 26, 2023 at 09:43:17AM +0200, Fabio Coatti wrote:
Hi all, I'm using vanilla kernels on a Gentoo-based laptop, and since 6.3.2 I'm getting the kernel log below when using a KVM VM on my box. I know the kernel is tainted, but avoiding loading the nvidia driver would make things complicated on my side; if needed for debugging, I can try to run without it.
I'm not sure what other info might be relevant in this context; if you need more details just let me know, happy to provide them.
[dmesg splat from the initial report snipped; identical to the log quoted above]
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: v6.3.1..v6.3.2
#regzbot title: WARNING trace at kvm_nx_huge_page_recovery_worker when opening a new tab in Chrome
Fabio, can you also check the mainline (on guest)?
On Sun, May 28, 2023 at 14:44, Bagas Sanjaya <bagasdotme@gmail.com> wrote:
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: v6.3.1..v6.3.2
#regzbot title: WARNING trace at kvm_nx_huge_page_recovery_worker when opening a new tab in Chrome
Out of curiosity, I recompiled 6.3.4 after reverting the following commit mentioned in the 6.3.2 changelog:
commit 2ec1fe292d6edb3bd112f900692d9ef292b1fa8b
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Apr 26 15:03:23 2023 -0700

    KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated

    commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec upstream.
And the WARN message no longer appears in my host kernel logs, at least so far :)
Fabio, can you also check the mainline (on guest)?
Not sure I understand: you mean 6.4-rcX? I can do that, sure, but why on the guest? The WARN appears in the host logs, i.e. on the machine running the 6.3.4 kernel. The guest is a standard Ubuntu 22.04, currently running the 5.19.0-42-generic (Ubuntu) kernel.
On Tue, May 30, 2023, Fabio Coatti wrote:
On Sun, May 28, 2023 at 14:44, Bagas Sanjaya <bagasdotme@gmail.com> wrote:
#regzbot ^introduced: v6.3.1..v6.3.2
#regzbot title: WARNING trace at kvm_nx_huge_page_recovery_worker when opening a new tab in Chrome
Out of curiosity, I recompiled 6.3.4 after reverting the following commit mentioned in the 6.3.2 changelog:
commit 2ec1fe292d6edb3bd112f900692d9ef292b1fa8b
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Apr 26 15:03:23 2023 -0700

    KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated

    commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec upstream.
And the WARN message no longer appears in my host kernel logs, at least so far :)
Hmm, more than likely an NX shadow page is outliving a memslot update. I'll take another look at those flows to see if I can spot a race or leak.
Fabio, can you also check the mainline (on guest)?
Not sure I understand: you mean 6.4-rcX? I can do that, sure, but why on the guest?
Probably a misunderstanding? Please do test with 6.4-rcX on the host. I expect the WARN to reproduce there as well, but if it doesn't, then we'll have a very useful datapoint.
On Tue, May 30, 2023, Sean Christopherson wrote:
On Tue, May 30, 2023, Fabio Coatti wrote:
On Sun, May 28, 2023 at 14:44, Bagas Sanjaya <bagasdotme@gmail.com> wrote:
#regzbot ^introduced: v6.3.1..v6.3.2
#regzbot title: WARNING trace at kvm_nx_huge_page_recovery_worker when opening a new tab in Chrome
Out of curiosity, I recompiled 6.3.4 after reverting the following commit mentioned in the 6.3.2 changelog:
commit 2ec1fe292d6edb3bd112f900692d9ef292b1fa8b
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Apr 26 15:03:23 2023 -0700

    KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated

    commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec upstream.
And the WARN message no longer appears in my host kernel logs, at least so far :)
Hmm, more than likely an NX shadow page is outliving a memslot update. I'll take another look at those flows to see if I can spot a race or leak.
I didn't spot anything, and I couldn't reproduce the WARN even when dropping the dirty logging requirement and hacking KVM to periodically delete memslots.
printk debugging it is... Can you run with this and report back?
---
 arch/x86/kvm/mmu/mmu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d3812de54b02..89c2e5ee7d36 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -855,6 +855,8 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	if (!list_empty(&sp->possible_nx_huge_page_link))
 		return;
 
+	sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+
 	++kvm->stat.nx_lpage_splits;
 	list_add_tail(&sp->possible_nx_huge_page_link,
 		      &kvm->arch.possible_nx_huge_pages);
@@ -7012,7 +7014,9 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 		slot = NULL;
 		if (atomic_read(&kvm->nr_memslots_dirty_logging)) {
 			slot = gfn_to_memslot(kvm, sp->gfn);
-			WARN_ON_ONCE(!slot);
+			if (WARN_ON_ONCE(!slot))
+				pr_warn_ratelimited("No slot for gfn = %llx, role = %x, TDP MMU = %u, root count = %u, gen = %u vs %u\n",
+						    sp->gfn, sp->role.word, sp->tdp_mmu_page, sp->root_count, sp->mmu_valid_gen, kvm->arch.mmu_valid_gen);
 		}
 
 		if (slot && kvm_slot_dirty_track_enabled(slot))
base-commit: 17f2d782f18c9a49943ea723d7628da1837c9204
--
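(Reading the patch: the sp->mmu_valid_gen assignment in the first hunk just tags each shadow page with the current MMU generation when it is added to the possible-NX list, so that when the lookup fails the pr_warn can show whether the page has outlived a generation bump.)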
On Tue, May 30, 2023, Sean Christopherson wrote:
On Tue, May 30, 2023, Sean Christopherson wrote:
On Tue, May 30, 2023, Fabio Coatti wrote:
On Sun, May 28, 2023 at 14:44, Bagas Sanjaya <bagasdotme@gmail.com> wrote:
#regzbot ^introduced: v6.3.1..v6.3.2
#regzbot title: WARNING trace at kvm_nx_huge_page_recovery_worker when opening a new tab in Chrome
Out of curiosity, I recompiled 6.3.4 after reverting the following commit mentioned in the 6.3.2 changelog:
commit 2ec1fe292d6edb3bd112f900692d9ef292b1fa8b
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Apr 26 15:03:23 2023 -0700

    KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated

    commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec upstream.
And the WARN message no longer appears in my host kernel logs, at least so far :)
Hmm, more than likely an NX shadow page is outliving a memslot update. I'll take another look at those flows to see if I can spot a race or leak.
I didn't spot anything, and I couldn't reproduce the WARN even when dropping the dirty logging requirement and hacking KVM to periodically delete memslots.
Aha! Apparently my brain was just waiting until I sat down for dinner to have its lightbulb moment.
The memslot lookup isn't factoring in whether the shadow page is for non-SMM versus SMM. QEMU configures SMM to have memslots that do not exist in the non-SMM world, so if kvm_recover_nx_huge_pages() encounters an SMM shadow page, the memslot lookup can fail to find a memslot because it looks only in the set of non-SMM memslots.
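To make the mismatch concrete, here is a rough sketch of the helpers involved, paraphrased and simplified (the real definitions live in include/linux/kvm_host.h and KVM's x86 headers, with SRCU and config details elided, so not verbatim):

	/* On x86 each address space has its own slot set: 0 = non-SMM, 1 = SMM. */
	static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
	{
		return rcu_dereference(kvm->memslots[as_id]);	/* really SRCU-protected */
	}

	/* gfn_to_memslot() only ever searches address space 0 (non-SMM)... */
	struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
	{
		return __gfn_to_memslot(__kvm_memslots(kvm, 0), gfn);
	}

	/*
	 * ...whereas the role-aware lookup picks the slot set that matches
	 * the shadow page, i.e. the SMM set for an SMM shadow page.
	 */
	static inline struct kvm_memslots *
	kvm_memslots_for_spte_role(struct kvm *kvm, union kvm_mmu_page_role role)
	{
		return __kvm_memslots(kvm, role.smm);
	}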
Before commit 2ec1fe292d6e ("KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated"), KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages once all vCPUs exited SMM. That made the window where this bug could be encountered quite tiny, as the NX recovery thread would have to kick in while at least one vCPU was in SMM. QEMU VMs typically only use SMM during boot, and so the "bad" shadow pages were gone by the time the NX recovery thread ran.
Now that KVM preserves TDP MMU roots until they are explicitly invalidated (by a memslot deletion), the window to encounter the bug is effectively never closed, because QEMU doesn't delete memslots after boot (except in a handful of special scenarios).
Assuming I'm correct, this should fix the issue:
---
 arch/x86/kvm/mmu/mmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d3812de54b02..d5c03f14cdc7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7011,7 +7011,10 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 		 */
 		slot = NULL;
 		if (atomic_read(&kvm->nr_memslots_dirty_logging)) {
-			slot = gfn_to_memslot(kvm, sp->gfn);
+			struct kvm_memslots *slots;
+
+			slots = kvm_memslots_for_spte_role(kvm, sp->role);
+			slot = __gfn_to_memslot(slots, sp->gfn);
 			WARN_ON_ONCE(!slot);
 		}
base-commit: 17f2d782f18c9a49943ea723d7628da1837c9204
--
On Wed, May 31, 2023 at 04:04, Sean Christopherson <seanjc@google.com> wrote:
On Tue, May 30, 2023, Sean Christopherson wrote:
On Tue, May 30, 2023, Sean Christopherson wrote:
On Tue, May 30, 2023, Fabio Coatti wrote:
On Sun, May 28, 2023 at 14:44, Bagas Sanjaya <bagasdotme@gmail.com> wrote:
#regzbot ^introduced: v6.3.1..v6.3.2
#regzbot title: WARNING trace at kvm_nx_huge_page_recovery_worker when opening a new tab in Chrome
Out of curiosity, I recompiled 6.3.4 after reverting the following commit mentioned in the 6.3.2 changelog:
commit 2ec1fe292d6edb3bd112f900692d9ef292b1fa8b
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Apr 26 15:03:23 2023 -0700

    KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated

    commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec upstream.
And the WARN message no longer appears in my host kernel logs, at least so far :)
Hmm, more than likely an NX shadow page is outliving a memslot update. I'll take another look at those flows to see if I can spot a race or leak.
I didn't spot anything, and I couldn't reproduce the WARN even when dropping the dirty logging requirement and hacking KVM to periodically delete memslots.
Aha! Apparently my brain was just waiting until I sat down for dinner to have its lightbulb moment.
The memslot lookup isn't factoring in whether the shadow page is for non-SMM versus SMM. QEMU configures SMM to have memslots that do not exist in the non-SMM world, so if kvm_recover_nx_huge_pages() encounters an SMM shadow page, the memslot lookup can fail to find a memslot because it looks only in the set of non-SMM memslots.
Before commit 2ec1fe292d6e ("KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated"), KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages once all vCPUs exited SMM. That made the window where this bug could be encountered quite tiny, as the NX recovery thread would have to kick in while at least one vCPU was in SMM. QEMU VMs typically only use SMM during boot, and so the "bad" shadow pages were gone by the time the NX recovery thread ran.
Now that KVM preserves TDP MMU roots until they are explicitly invalidated (by a memslot deletion), the window to encounter the bug is effectively never closed, because QEMU doesn't delete memslots after boot (except in a handful of special scenarios).
Assuming I'm correct, this should fix the issue:
 arch/x86/kvm/mmu/mmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d3812de54b02..d5c03f14cdc7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7011,7 +7011,10 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 		 */
 		slot = NULL;
 		if (atomic_read(&kvm->nr_memslots_dirty_logging)) {
-			slot = gfn_to_memslot(kvm, sp->gfn);
+			struct kvm_memslots *slots;
+
+			slots = kvm_memslots_for_spte_role(kvm, sp->role);
+			slot = __gfn_to_memslot(slots, sp->gfn);
 			WARN_ON_ONCE(!slot);
 		}
base-commit: 17f2d782f18c9a49943ea723d7628da1837c9204
I applied this patch to the same kernel I was using for testing (6.3.4), and indeed I can no longer see the WARN message, so I assume you are indeed correct :) Many thanks, it seems to be fixed, at least on my machine!