On Sun, Feb 05, 2023 at 05:02:29PM -0800, Dan Williams wrote:
Summary:
CXL RAM support allows for the dynamic provisioning of new CXL RAM regions, and more routinely, assembling a region from an existing configuration established by platform-firmware. The latter is motivated by CXL memory RAS (Reliability, Availability and Serviceability) support, that requires associating device events with System Physical Address ranges and vice versa.
The 'Soft Reserved' policy rework arranges for performance differentiated memory like CXL attached DRAM, or high-bandwidth memory, to be designated for 'System RAM' by default, rather than the device-dax dedicated access mode. That current device-dax default is confusing and surprising for the Pareto of users that do not expect memory to be quarantined for dedicated access by default. Most users expect all 'System RAM'-capable memory to show up in FREE(1).
Using the same QEMU branch, machine, and configuration as my prior tests, I'm now experiencing a kernel panic on boot. I will debug a bit in the morning, but here is the stack trace I'm seeing.
I saw this in both the 1 and 2 root port configurations.
(note: I also have the region reset issue previously discussed on top of your branch).
QEMU configuration:
sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
-drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 2G,slots=4,maxmem=4G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
-object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
[ 13.936817] Call Trace:
[ 13.970691] <TASK>
[ 13.990690] device_add+0x39d/0x9a0
[ 14.024690] ? kobject_set_name_vargs+0x6d/0x90
[ 14.066690] ? dev_set_name+0x4b/0x60
[ 14.090691] devm_cxl_add_port+0x29a/0x4d0
[ 14.135946] cxl_acpi_probe+0xd9/0x2f0
[ 14.167691] ? device_pm_check_callbacks+0x36/0x100
[ 14.203691] platform_probe+0x44/0x90
[ 14.247691] really_probe+0xde/0x380
[ 14.277690] ? pm_runtime_barrier+0x50/0x90
[ 14.324693] __driver_probe_device+0x78/0x170
[ 14.356694] driver_probe_device+0x1f/0x90
[ 14.396692] __driver_attach+0xce/0x1c0
[ 14.435691] ? __pfx___driver_attach+0x10/0x10
[ 14.471692] bus_for_each_dev+0x73/0xa0
[ 14.508693] bus_add_driver+0x1ae/0x200
[ 14.551691] driver_register+0x89/0xe0
[ 14.587691] ? __pfx_cxl_acpi_init+0x10/0x10
[ 14.625690] do_one_initcall+0x59/0x230
[ 14.814691] kernel_init_freeable+0x204/0x24e
[ 14.846710] ? __pfx_kernel_init+0x10/0x10
[ 14.899692] kernel_init+0x16/0x140
[ 14.954691] ret_from_fork+0x2c/0x50
[ 14.986692] </TASK>
[ 15.023689] Modules linked in:
[ 15.057693] CR2: 0000000000000060
[ 15.105691] ---[ end trace 0000000000000000 ]---
[ 15.162837] RIP: 0010:bus_add_device+0x5b/0x150
[ 15.217693] Code: 49 8b 74 24 20 48 89 df e8 92 88 ff ff 89 c5 85 c0 75 3b 48 8b 53 50 48 85 d2 75 03 48 8b 13 49 8b 84 24 a8 00 00 00 48 89 0
[ 15.427859] RSP: 0000:ffffbd2a40013bc0 EFLAGS: 00010246
[ 15.475692] RAX: 0000000000000000 RBX: ffff955c419e1800 RCX: 0000000000000000
[ 15.591691] RDX: ffff955c41921778 RSI: ffff955c419e1800 RDI: ffff955c419e1800
[ 15.703693] RBP: 0000000000000000 R08: 0000000000000228 R09: ffff955c4119e550
[ 15.802711] R10: ffff955c414bedc8 R11: 0000000000000000 R12: ffffffff9e259d60
[ 15.907692] R13: ffffffff9d3cee40 R14: ffff955c419e1800 R15: ffff955c41f06010
[ 15.983693] FS: 0000000000000000(0000) GS:ffff955cbdd00000(0000) knlGS:0000000000000000
[ 16.126698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.206694] CR2: 0000000000000060 CR3: 0000000036010000 CR4: 00000000000006e0
[ 16.347694] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 16.348686] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---