The ndev was accessed on shutdown without a check if it actually exists. This triggered the crash pasted below. This patch simply adds a check before using ndev.
BUG: kernel NULL pointer dereference, address: 0000000000000300
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-rc2_for_upstream_min_debug_2023_07_17_15_05 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
FS: 00007f4969e0be40(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die+0x20/0x60
 ? page_fault_oops+0x14c/0x3c0
 ? exc_page_fault+0x75/0x140
 ? asm_exc_page_fault+0x22/0x30
 ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
 device_shutdown+0x13e/0x1e0
 kernel_restart+0x36/0x90
 __do_sys_reboot+0x141/0x210
 ? vfs_writev+0xcd/0x140
 ? handle_mm_fault+0x161/0x260
 ? do_writev+0x6b/0x110
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f496990fb56
RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
 </TASK>
CR2: 0000000000000300
---[ end trace 0000000000000000 ]---
Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 9138ef2fb2c8..e2e7ebd71798 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device *auxdev)
 	mgtdev = auxiliary_get_drvdata(auxdev);
 	ndev = mgtdev->ndev;

-	free_irqs(ndev);
+	if (ndev)
+		free_irqs(ndev);
 }

 static const struct auxiliary_device_id mlx5v_id_table[] = {
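For readers who want to see the guard in isolation, here is a minimal userspace sketch of the pattern the patch applies. The `model_*` types and functions are hypothetical stand-ins invented for illustration, not the real mlx5_vdpa structures; the `assert` in `model_free_irqs` only stands in for the NULL dereference that the oops above reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (not the real types). */
struct model_ndev {
	int irqs_allocated;		/* IRQ vectors currently held */
};

struct model_mgtdev {
	struct model_ndev *ndev;	/* NULL until a vdpa device is added */
};

/* Models free_irqs(): it dereferences ndev, so NULL must never reach it. */
static void model_free_irqs(struct model_ndev *ndev)
{
	assert(ndev != NULL);
	ndev->irqs_allocated = 0;
}

/* Models the patched shutdown path; returns true if IRQs were released. */
static bool model_shutdown(struct model_mgtdev *mgtdev)
{
	struct model_ndev *ndev = mgtdev->ndev;

	if (!ndev)
		return false;	/* no vdpa device exists: nothing to free */

	model_free_irqs(ndev);
	return true;
}
```

Calling `model_shutdown()` on a management device whose `ndev` is NULL returns false without touching `model_free_irqs()`, which is the case the crash report corresponds to.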
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
Rule: 'Cc: stable@vger.kernel.org' or 'commit <sha1> upstream.'
Subject: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists
Link: https://lore.kernel.org/stable/20230726190744.14143-1-dtatulea%40nvidia.com
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> -	free_irqs(ndev);
> +	if (ndev)
> +		free_irqs(ndev);
>  }
something I don't get: irqs are allocated in mlx5_vdpa_dev_add, so why are they not freed in mlx5_vdpa_dev_del?
this is what's creating all this mess.
On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> something I don't get: irqs are allocated in mlx5_vdpa_dev_add, so why are they not freed in mlx5_vdpa_dev_del?
That is a good point. I will try to find out. I also don't get why free_irq is called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can change that in a different refactoring.
> this is what's creating all this mess.
Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I see. Or am I missing something?
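The lifecycle being debated in this exchange can be sketched as a toy userspace model: IRQs are taken when a device is added, the shutdown path only releases IRQs, and shutdown may run whether or not a device currently exists. All `toy_*` names are hypothetical, and the assumption that deletion clears the management device's pointer is part of the model rather than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; they only mirror the roles discussed above. */
struct toy_ndev { int irqs; };
struct toy_mgtdev { struct toy_ndev *ndev; };

/* Models .dev_add: creates the net device and takes its IRQs. */
static int toy_dev_add(struct toy_mgtdev *m, int nirqs)
{
	if (m->ndev)
		return -1;	/* this model allows one device at a time */
	m->ndev = malloc(sizeof(*m->ndev));
	if (!m->ndev)
		return -1;
	m->ndev->irqs = nirqs;
	return 0;
}

/* Models device removal; ASSUMPTION: the pointer is cleared afterwards. */
static void toy_dev_del(struct toy_mgtdev *m)
{
	free(m->ndev);
	m->ndev = NULL;
}

/* Models shutdown: releases IRQs only, returns how many were released. */
static int toy_shutdown(struct toy_mgtdev *m)
{
	int released;

	assert(m != NULL);
	if (!m->ndev)
		return 0;	/* never added, or already deleted */
	released = m->ndev->irqs;
	m->ndev->irqs = 0;
	return released;
}
```

In this model `toy_shutdown()` is safe before `toy_dev_add()`, after it, and after `toy_dev_del()`, which are the three states a reboot could find the management device in.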
On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > something I don't get: irqs are allocated in mlx5_vdpa_dev_add, so why are they not freed in mlx5_vdpa_dev_del?
> That is a good point. I will try to find out. I also don't get why free_irq is called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can change that in a different refactoring.
as it is I have no idea whether e.g. ndev can change between these two call sites. that would make the check pointless.
> > this is what's creating all this mess.
> Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I see. Or am I missing something?
and why do we care whether irqs are freed on shutdown?
On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> and why do we care whether irqs are freed on shutdown?
Had to ask around a bit to find out the answer: there can be issues with kexec IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Thanks,
Dragos
On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> Had to ask around a bit to find out the answer: there can be issues with kexec IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
> Thanks,
> Dragos
It's quite weird.
	 * Some platforms requiring freeing the IRQ's in the shutdown
	 * flow. If they aren't freed they can't be allocated after
	 * kexec. There is no need to cleanup the mlx5_core software
	 * contexts.
but most drivers don't have a shutdown callback, so how do they work then? do you know which platforms these are?
I don't really know much about why a shutdown callback is even necessary. I guess this is to detect shutdown and do a faster cleanup than the slow, graceful removal, just cleaning hardware resources?
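To make the claim in the quoted comment concrete, here is a toy model of a platform whose interrupt vector table survives kexec. Everything in it is hypothetical (the `toy_*` names, the four-vector table, the idea that state persists in exactly this way); it only illustrates why vectors left allocated by the old kernel would be unavailable to the next one, and it makes no claim about which real platforms behave like this.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NVEC 4

/* Hypothetical platform state that is NOT reset by kexec in this model. */
struct toy_platform {
	bool vec_busy[TOY_NVEC];
};

/* Returns a free vector index, or -1 if the platform has none left. */
static int toy_alloc_vec(struct toy_platform *p)
{
	for (int i = 0; i < TOY_NVEC; i++) {
		if (!p->vec_busy[i]) {
			p->vec_busy[i] = true;
			return i;
		}
	}
	return -1;
}

static void toy_free_vec(struct toy_platform *p, int vec)
{
	assert(vec >= 0 && vec < TOY_NVEC);
	p->vec_busy[vec] = false;
}

/*
 * Models one kernel lifetime: the driver grabs every vector it can, then
 * the kernel kexecs. Returns how many vectors the driver obtained. When
 * free_on_shutdown is false the vectors stay marked busy for the next
 * kernel.
 */
static int toy_run_kernel(struct toy_platform *p, bool free_on_shutdown)
{
	int got[TOY_NVEC];
	int n = 0;

	for (int i = 0; i < TOY_NVEC; i++) {
		int v = toy_alloc_vec(p);

		if (v < 0)
			break;
		got[n++] = v;
	}

	if (free_on_shutdown) {
		for (int i = 0; i < n; i++)
			toy_free_vec(p, got[i]);
	}
	return n;
}
```

Running `toy_run_kernel()` twice on the same `struct toy_platform` models a kexec: without the shutdown-time free, the second run obtains no vectors at all.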
On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin mst@redhat.com wrote:
> It's quite weird.
> 	 * Some platforms requiring freeing the IRQ's in the shutdown
> 	 * flow. If they aren't freed they can't be allocated after
> 	 * kexec. There is no need to cleanup the mlx5_core software
> 	 * contexts.
> but most drivers don't have a shutdown callback, so how do they work then? do you know which platforms these are?
There used to be BZs that required virtio drivers to add a shutdown callback to fix kexec:
https://bugzilla.redhat.com/show_bug.cgi?id=2108406
Thanks
> I don't really know much about why a shutdown callback is even necessary. I guess this is to detect shutdown and do a faster cleanup than the slow, graceful removal, just cleaning hardware resources?
> --
> MST
On Tue, 2023-08-01 at 11:59 +0800, Jason Wang wrote:
On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin mst@redhat.com wrote:
> > It's quite weird.
> > 	 * Some platforms requiring freeing the IRQ's in the shutdown
> > 	 * flow. If they aren't freed they can't be allocated after
> > 	 * kexec. There is no need to cleanup the mlx5_core software
> > 	 * contexts.
> > but most drivers don't have a shutdown callback, so how do they work then? do you know which platforms these are?
I don't. x86_64 is not one of them though. I will do some more digging ...
> There used to be BZs that required virtio drivers to add a shutdown callback to fix kexec:
I don't have access to this. What is it about?
Thanks,
Dragos
> Thanks
> > I don't really know much about why a shutdown callback is even necessary. I guess this is to detect shutdown and do a faster cleanup than the slow, graceful removal, just cleaning hardware resources?
> > --
> > MST
On Tue, Aug 1, 2023 at 4:17 PM Dragos Tatulea dtatulea@nvidia.com wrote:
> I don't have access to this. What is it about?
This bug might be more accurate:
https://bugzilla.redhat.com/show_bug.cgi?id=1820521
It's about the kexec guys (cced relevant people) wanting to add a shutdown method for virtio to fix potential kexec issues.
Thanks
On Wed, 2023-08-02 at 10:51 +0800, Jason Wang wrote:
> > > > but most drivers don't have a shutdown callback how do they work then? do you know which platforms these are?
> > I don't. x86_64 is not one of them though. I will do some more digging ...
Turns out that this fix (releasing the irqs on .shutdown on mlx5_core) was required for PPC arch but only for certain mainframe systems. That's all the info I could find.
> > > > I don't really know much about why shutdown callback is even necessary. I guess this is to detect shutdown and do a faster cleanup than the slow, graceful removal, just cleaning hardware resources?
.shutdown could be removed in mlx5_vdpa. But I notice that mlx5_core's .shutdown kicks in from pci_device_shutdown to clean the irqs. So the irqs will still be freed but as a side effect. Which is not good.
Thanks, Dragos
On Wed, 2023-08-02 at 09:56 +0200, Dragos Tatulea wrote:
> Turns out that this fix (releasing the irqs on .shutdown on mlx5_core) was required for PPC arch but only for certain mainframe systems. That's all the info I could find.
I will send a v2 for this patch that removes the shutdown op. The irqs will be released by the mlx5_core shutdown handler which is responsible for the VF.
Thanks, Dragos
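Editorial sketch: the v2 plan relies on the parent's shutdown handler releasing the IRQs once the child's own shutdown op is gone. The userspace model below illustrates that idea only. It assumes that the shutdown walk visits devices in reverse registration order and skips any device that has no shutdown callback. The names (model_device, parent_shutdown, model_device_shutdown) and the shared IRQ counter are hypothetical and do not correspond to the driver core or to mlx5 code.

```c
#include <assert.h>
#include <stddef.h>

struct model_device;
typedef void (*shutdown_fn)(struct model_device *dev);

/* Hypothetical device entry; not a driver-core structure. */
struct model_device {
    const char *name;
    shutdown_fn shutdown;   /* NULL when the driver has no shutdown op */
    int *irqs;              /* shared count of vectors held for the function */
};

/* The parent's handler releases every vector, including the child's. */
void parent_shutdown(struct model_device *dev)
{
    assert(dev->irqs != NULL);
    *dev->irqs = 0;
}

/*
 * Walk the devices from last registered to first and invoke each
 * shutdown callback that exists. Returns the number of callbacks run.
 */
int model_device_shutdown(struct model_device *devs, size_t n)
{
    int called = 0;

    for (size_t i = n; i-- > 0; ) {
        if (devs[i].shutdown) {
            devs[i].shutdown(&devs[i]);
            called++;
        }
    }
    return called;
}
```

With a parent entry that has parent_shutdown and a child entry whose shutdown is NULL, only one callback runs and the shared counter still ends at zero, which is the outcome the v2 plan depends on.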
On Thu, Aug 03, 2023 at 03:02:59PM +0000, Dragos Tatulea wrote:
> I will send a v2 for this patch that removes the shutdown op. The irqs will be released by the mlx5_core shutdown handler which is responsible for the VF.
>
> Thanks, Dragos
Certainly seems cleaner. Thanks!
linux-stable-mirror@lists.linaro.org