Hi,
On recent kernels I get a kernel panic when starting a Xen PV domain with PCI devices assigned. This happens on 5.10.60 (worked on .54) and 5.4.142 (worked on .136):
[ 13.683009] pcifront pci-0: claiming resource 0000:00:00.0/0
[ 13.683042] pcifront pci-0: claiming resource 0000:00:00.0/1
[ 13.683049] pcifront pci-0: claiming resource 0000:00:00.0/2
[ 13.683055] pcifront pci-0: claiming resource 0000:00:00.0/3
[ 13.683061] pcifront pci-0: claiming resource 0000:00:00.0/6
[ 14.036142] e1000e: Intel(R) PRO/1000 Network Driver
[ 14.036179] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 14.036982] e1000e 0000:00:00.0: Xen PCI mapped GSI11 to IRQ13
[ 14.044561] e1000e 0000:00:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
[ 14.045197] #PF: supervisor write access in kernel mode
[ 14.045202] #PF: error_code(0x0003) - permissions violation
[ 14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
[ 14.045227] Oops: 0003 [#1] SMP NOPTI
[ 14.045234] CPU: 0 PID: 234 Comm: kworker/0:2 Tainted: G W 5.14.0-rc7-1.fc32.qubes.x86_64 #15
[ 14.045245] Workqueue: events work_for_cpu_fn
[ 14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
[ 14.045271] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
[ 14.045284] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
[ 14.045290] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
[ 14.045296] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
[ 14.045302] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
[ 14.045308] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
[ 14.045313] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
[ 14.045393] FS: 0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[ 14.045401] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.045407] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
[ 14.045420] Call Trace:
[ 14.045431] e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
[ 14.045479] e1000_probe+0x41f/0xdb0 [e1000e]
[ 14.045506] local_pci_probe+0x42/0x80
[ 14.045515] work_for_cpu_fn+0x16/0x20
[ 14.045522] process_one_work+0x1ec/0x390
[ 14.045529] worker_thread+0x53/0x3e0
[ 14.045534] ? process_one_work+0x390/0x390
[ 14.045540] kthread+0x127/0x150
[ 14.045548] ? set_kthread_struct+0x40/0x40
[ 14.045554] ret_from_fork+0x22/0x30
[ 14.045565] Modules linked in: e1000e(+) edac_mce_amd rfkill xen_pcifront pcspkr xt_REDIRECT ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack iptable_filter iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse drm bpf_preload ip_tables overlay xen_blkfront
[ 14.045620] CR2: ffffc9004069100c
[ 14.045627] ---[ end trace 307f5bb3bd9f30b4 ]---
[ 14.045632] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
[ 14.045640] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
[ 14.045652] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
[ 14.045657] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
[ 14.045663] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
[ 14.045668] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
[ 14.045674] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
[ 14.045679] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
[ 14.045698] FS: 0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[ 14.045706] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.045711] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
[ 14.045718] Kernel panic - not syncing: Fatal exception
[ 14.045726] Kernel Offset: disabled
I've bisected it down to this commit:
commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Jul 29 23:51:41 2021 +0200

    PCI/MSI: Mask all unused MSI-X entries
I can reliably reproduce it on Xen 4.14 and Xen 4.8, so I don't think the Xen version matters here.
Any idea how to fix it?
On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
On recent kernels I get a kernel panic when starting a Xen PV domain with PCI devices assigned. This happens on 5.10.60 (worked on .54) and 5.4.142 (worked on .136):
[ 14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
[ 14.045197] #PF: supervisor write access in kernel mode
[ 14.045202] #PF: error_code(0x0003) - permissions violation
[ 14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
I'm curious what lives at physical address FEBD4000. The maximum verbosity hypervisor log may also have a hint as to why this is a read-only PTE.
[ 14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
[ 14.045420] Call Trace:
[ 14.045431] e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
[ 14.045479] e1000_probe+0x41f/0xdb0 [e1000e]
OTOH, from this it's pretty clear it's not a device Xen may have found a need to access for its own purposes. If the aforementioned address covers (or is adjacent to) the MSI-X table of a device driven by this driver, then it would also be helpful to know how many MSI-X entries the device reports its table can have.
Jan
On Wed, Aug 25, 2021 at 05:33:54PM +0200, Jan Beulich wrote:
On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
On recent kernels I get a kernel panic when starting a Xen PV domain with PCI devices assigned. This happens on 5.10.60 (worked on .54) and 5.4.142 (worked on .136):
[ 14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
[ 14.045197] #PF: supervisor write access in kernel mode
[ 14.045202] #PF: error_code(0x0003) - permissions violation
[ 14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
I'm curious what lives at physical address FEBD4000.
This is BAR 3 (Region 3) of this device, which holds the MSI-X table and PBA:
00:04.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation Device 0000
    Physical Slot: 4
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at feb80000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at feba0000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at c080 [size=32]
    Region 3: Memory at febd4000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at feb40000 [disabled] [size=256K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
    Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Kernel driver in use: pciback
    Kernel modules: e1000e
The maximum verbosity hypervisor log may also have a hint as to why this is a read-only PTE.
I'll try, if that still makes sense.
[ 14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
[ 14.045420] Call Trace:
[ 14.045431] e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
[ 14.045479] e1000_probe+0x41f/0xdb0 [e1000e]
OTOH, from this it's pretty clear it's not a device Xen may have found a need to access for its own purposes. If the aforementioned address covers (or is adjacent to) the MSI-X table of a device driven by this driver, then it would also be helpful to know how many MSI-X entries the device reports its table can have.
See above.
Does PCI passthrough for PV support MSI-X at all? If so, I guess the issue is the kernel trying to write directly, instead of via some hypercall, right?
On 25.08.2021 17:47, Marek Marczykowski-Górecki wrote:
On Wed, Aug 25, 2021 at 05:33:54PM +0200, Jan Beulich wrote:
On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
On recent kernels I get a kernel panic when starting a Xen PV domain with PCI devices assigned. This happens on 5.10.60 (worked on .54) and 5.4.142 (worked on .136):
[ 14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
I'm curious what lives at physical address FEBD4000.
This is BAR 3 (Region 3) of this device, which holds the MSI-X table and PBA:
00:04.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Region 3: Memory at febd4000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
The maximum verbosity hypervisor log may also have a hint as to why this is a read-only PTE.
I'll try, if that still makes sense.
I think the above data clarifies it already.
[ 14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
[ 14.045420] Call Trace:
[ 14.045431] e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
[ 14.045479] e1000_probe+0x41f/0xdb0 [e1000e]
OTOH, from this it's pretty clear it's not a device Xen may have found a need to access for its own purposes. If the aforementioned address covers (or is adjacent to) the MSI-X table of a device driven by this driver, then it would also be helpful to know how many MSI-X entries the device reports its table can have.
See above.
Does PCI passthrough for PV support MSI-X at all?
It is supposed to work. The treatment by generic code shouldn't be overly different from how MSI-X works for Dom0 (Xen-specific code of course differs).
If so, I guess the issue is the kernel trying to write directly, instead of via some hypercall, right?
Indeed. Or to be precise - the kernel isn't supposed to be "writing" this at all. It is supposed to make hypercalls which may result in such writes. Such "mask everything" functionality imo is the job of the hypervisor anyway when talking about PV environments; HVM is a different thing here.
Jan
On Wed, Aug 25, 2021 at 05:55:09PM +0200, Jan Beulich wrote:
On 25.08.2021 17:47, Marek Marczykowski-Górecki wrote:
If so, I guess the issue is the kernel trying to write directly, instead of via some hypercall, right?
Indeed. Or to be precise - the kernel isn't supposed to be "writing" this at all. It is supposed to make hypercalls which may result in such writes. Such "mask everything" functionality imo is the job of the hypervisor anyway when talking about PV environments; HVM is a different thing here.
Ok, I dug a bit and found why it was working before: there is a pci_msi_ignore_mask variable, which is set to 1 for Xen PV (and only then). It bypasses __pci_msi{x,}_desc_mask_irq(), but it does not bypass the new msix_mask_all(). Adding that check back fixes the issue: no crash, and the device works, although the driver doesn't seem to enable MSI/MSI-X (but that wasn't the case before either).