Fix UBSAN warnings that occur when using a system with 32 physical cpu cores or more, or when the user defines a number of Ethernet queues greater than or equal to FP_SB_MAX_E1x using the num_queues module parameter.
The value of the maximum number of Ethernet queues should be limited to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if enabled to avoid out of bounds reads and writes.
Stack traces:
UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11 index 20 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 </TASK> ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11 index 28 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 </TASK> ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1895:8 index 29 is out of range for type 'stats_query_entry [19]' CPU: 13 PID: 163 Comm: kworker/u96:1 Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Workqueue: bnx2x bnx2x_sp_task [bnx2x] Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_iov_adjust_stats_req+0x3c4/0x3d0 [bnx2x] bnx2x_storm_stats_post.part.0+0x4a/0x330 [bnx2x] ? bnx2x_hw_stats_post+0x231/0x250 [bnx2x] bnx2x_stats_start+0x44/0x70 [bnx2x] bnx2x_stats_handle+0x149/0x350 [bnx2x] bnx2x_attn_int_asserted+0x998/0x9b0 [bnx2x] bnx2x_sp_task+0x491/0x5c0 [bnx2x] process_one_work+0x18d/0x3f0 </TASK> ---[ end trace ]---
Fixes: 7d0445d66a76 ("bnx2x: clamp num_queues to prevent passing a negative value") Signed-off-by: Ghadi Elie Rahme ghadi.rahme@canonical.com Cc: stable@vger.kernel.org --- Changes since v1: * Fix checkpatch complaints: - Wrapped commit message to comply with 75 character limit - Added space before ( in if condition
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index a8e07e51418f..c895dd680cf8 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -66,7 +66,12 @@ static int bnx2x_calc_num_queues(struct bnx2x *bp) if (is_kdump_kernel()) nq = 1;
- nq = clamp(nq, 1, BNX2X_MAX_QUEUES(bp)); + int max_nq = FP_SB_MAX_E1x - 1; + + if (NO_FCOE(bp)) + max_nq = FP_SB_MAX_E1x; + + nq = clamp(nq, 1, max_nq); return nq; }
Ghadi Elie Rahme ghadi.rahme@canonical.com writes:
Fix UBSAN warnings that occur when using a system with 32 physical cpu cores or more, or when the user defines a number of Ethernet queues greater than or equal to FP_SB_MAX_E1x using the num_queues module parameter.
The value of the maximum number of Ethernet queues should be limited to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if enabled to avoid out of bounds reads and writes.
Stack traces:
UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11 index 20 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace:
<TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 </TASK> ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11 index 28 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0
I also hit this one on powerpc:
https://lore.kernel.org/all/87pltc4rs8.fsf@mail.lhotse/
And confirm that this patch fixes it there too.
cheers
On Wed, 12 Jun 2024 18:44:49 +0300 Ghadi Elie Rahme wrote:
Fix UBSAN warnings that occur when using a system with 32 physical cpu cores or more, or when the user defines a number of Ethernet queues greater than or equal to FP_SB_MAX_E1x using the num_queues module parameter.
The value of the maximum number of Ethernet queues should be limited to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if enabled to avoid out of bounds reads and writes.
You're just describing what the code does, not providing extra context...
Fixes: 7d0445d66a76 ("bnx2x: clamp num_queues to prevent passing a negative value")
Sure this is not more recent, netif_get_num_default_rss_queues() used to always return 8.
Signed-off-by: Ghadi Elie Rahme ghadi.rahme@canonical.com Cc: stable@vger.kernel.org
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index a8e07e51418f..c895dd680cf8 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -66,7 +66,12 @@ static int bnx2x_calc_num_queues(struct bnx2x *bp) if (is_kdump_kernel()) nq = 1;
- nq = clamp(nq, 1, BNX2X_MAX_QUEUES(bp));
- int max_nq = FP_SB_MAX_E1x - 1;
please don't mix declarations and code
- if (NO_FCOE(bp))
max_nq = FP_SB_MAX_E1x;
you really need to explain somewhere why you're hardcoding E1x constants while at a glance the driver also supports E2. Also why is BNX2X_MAX_QUEUES() higher than the number of queues? Isn't that the bug?
On 13/06/2024 17:48, Jakub Kicinski wrote:
On Wed, 12 Jun 2024 18:44:49 +0300 Ghadi Elie Rahme wrote:
Fix UBSAN warnings that occur when using a system with 32 physical cpu cores or more, or when the user defines a number of Ethernet queues greater than or equal to FP_SB_MAX_E1x using the num_queues module parameter.
The value of the maximum number of Ethernet queues should be limited to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if enabled to avoid out of bounds reads and writes.
You're just describing what the code does, not providing extra context...
Apologies for the lack of explanation.
Currently there is a read/write out of bounds that occurs on the array "struct stats_query_entry query" present inside the "bnx2x_fw_stats_req" struct in "drivers/net/ethernet/broadcom/bnx2x/bnx2x.h". Looking at the definition of the "struct stats_query_entry query" array:
struct stats_query_entry query[FP_SB_MAX_E1x+ BNX2X_FIRST_QUEUE_QUERY_IDX];
FP_SB_MAX_E1x is defined as the maximum number of fast path interrupts and has a value of 16, while BNX2X_FIRST_QUEUE_QUERY_IDX has a value of 3 meaning the array has a total size of 19. Since accesses to "struct stats_query_entry query" are offset-ted by BNX2X_FIRST_QUEUE_QUERY_IDX, that means that the total number of Ethernet queues should not exceed FP_SB_MAX_E1x (16). However one of these queues is reserved for FCOE and thus the number of Ethernet queues should be set to [FP_SB_MAX_E1x -1] (15) if FCOE is enabled or [FP_SB_MAX_E1x] (16) if it is not.
This is also described in a comment in the source code in drivers/net/ethernet/broadcom/bnx2x/bnx2x.h just above the Macro definition of FP_SB_MAX_E1x. Below is the part of this explanation that it important for this patch
/* * The total number of L2 queues, MSIX vectors and HW contexts (CIDs) is * control by the number of fast-path status blocks supported by the * device (HW/FW). Each fast-path status block (FP-SB) aka non-default * status block represents an independent interrupts context that can * serve a regular L2 networking queue. However special L2 queues such * as the FCoE queue do not require a FP-SB and other components like * the CNIC may consume FP-SB reducing the number of possible L2 queues * * If the maximum number of FP-SB available is X then: * a. If CNIC is supported it consumes 1 FP-SB thus the max number of * regular L2 queues is Y=X-1 * b. In MF mode the actual number of L2 queues is Y= (X-1/MF_factor) * c. If the FCoE L2 queue is supported the actual number of L2 queues * is Y+1 * d. The number of irqs (MSIX vectors) is either Y+1 (one extra for * slow-path interrupts) or Y+2 if CNIC is supported (one additional * FP interrupt context for the CNIC). * e. The number of HW context (CID count) is always X or X+1 if FCoE * L2 queue is supported. The cid for the FCoE L2 queue is always X. */
Looking at the commits when the E2 support was added, it was originally using the E1x parameters [f2e0899f0f27 (bnx2x: Add 57712 support)]. Where FP_SB_MAX_E2 was set to 16 the same as E1x. Since I do not have access to the datasheets of these devices I had to guess based on the previous work done on the driver what would be the safest way to fix this array overflow. Thus I decided to go with how things were done before, which is to limit the E2 to using the same number of queues as E1x. This patch accomplishes that.
However I also had another solution which made more sense to me but I had no way to tell if it would be safe. The other solution was to increase the size of the stats_query_entry query array to be large enough to fit the number of queues supported by E2. This would mean that the new definition would look like the following:
struct stats_query_entry query[FP_SB_MAX_E2+ BNX2X_FIRST_QUEUE_QUERY_IDX];
I have tested this approach and it worked fine so I am more comfortable now changing the patch an sending in a v3 undoing the changes in v2 and simply increasing the array size. I believe now that using FP_SB_MAX_E1x instead of FP_SB_MAX_E2 to define the array size might have been an oversight when updating the driver to take full advantage of the E2 after it was just limiting itself to the capabilities of an E1x.
Fixes: 7d0445d66a76 ("bnx2x: clamp num_queues to prevent passing a negative value")
Sure this is not more recent, netif_get_num_default_rss_queues() used to always return 8.
The value of the number of queues can be defined by the kernel or the user, which is why I used the commit that I did for the Fixes tag because it is the job of the clamp to make sure both these values are in check. Setting the Fixes tag to when netif_get_num_default_rss_queues() was changed ignores the fact that the user value can be out of bounds.
Signed-off-by: Ghadi Elie Rahme ghadi.rahme@canonical.com Cc: stable@vger.kernel.org drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index a8e07e51418f..c895dd680cf8 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -66,7 +66,12 @@ static int bnx2x_calc_num_queues(struct bnx2x *bp) if (is_kdump_kernel()) nq = 1;
- nq = clamp(nq, 1, BNX2X_MAX_QUEUES(bp));
- int max_nq = FP_SB_MAX_E1x - 1;
please don't mix declarations and code
- if (NO_FCOE(bp))
max_nq = FP_SB_MAX_E1x;
you really need to explain somewhere why you're hardcoding E1x constants while at a glance the driver also supports E2. Also why is BNX2X_MAX_QUEUES() higher than the number of queues? Isn't that the bug?
The reason I did not patch BNX2X_MAX_QUEUES() is because the macro is working as expected by returning the actual number of queues that can be handled by a NIC using an E2/E1x chip. It was the driver that was not able to handle the maximum an E2 NIC can take.
On Thu, 20 Jun 2024 17:59:42 +0300 Ghadi Rahme wrote:
I have tested this approach and it worked fine so I am more comfortable now changing the patch an sending in a v3 undoing the changes in v2 and simply increasing the array size. I believe now that using FP_SB_MAX_E1x instead of FP_SB_MAX_E2 to define the array size might have been an oversight when updating the driver to take full advantage of the E2 after it was just limiting itself to the capabilities of an E1x.
Sounds good! Just to be clear, please do include the explanations you provided here in the commit message, with the necessary edits. There's no limit on the length of the commit message.
linux-stable-mirror@lists.linaro.org