Hi Levin,
On the stable 6.6.23 kernel, iwlwifi crashed with the following error:
[ 290.279712] ------------[ cut here ]------------
[ 290.279726] Invalid rxb from HW 0
[ 290.279816] WARNING: CPU: 19 PID: 477 at drivers/net/wireless/intel/iwlwifi/pcie/rx.c:1489 iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.279885] Modules linked in: snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_realtek snd_hda_codec_generic rfcomm nvme_fabrics ccm cmac algif_hash algif_skcipher af_alg bnep uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev btusb btrtl btintel btbcm btmtk videobuf2_common bluetooth mc ecdh_generic ecc joydev snd_soc_dmic intel_uncore_frequency intel_uncore_frequency_common snd_sof_pci_intel_mtl snd_sof_intel_hda_common x86_pkg_temp_thermal soundwire_intel intel_powerclamp soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp coretemp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core kvm_intel snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus kvm snd_soc_core irqbypass snd_compress crct10dif_pclmul ac97_bus crc32_pclmul snd_pcm_dmaengine polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 iwlmvm snd_hda_intel sha256_ssse3 binfmt_misc sha1_ssse3 i915
[ 290.279973] snd_intel_dspcfg drm_buddy aesni_intel snd_intel_sdw_acpi ttm mac80211 crypto_simd processor_thermal_device_pci snd_hda_codec drm_display_helper spi_nor hid_multitouch processor_thermal_device cryptd think_lmi pmt_telemetry hid_generic libarc4 mtd intel_rapl_msr pmt_class iwlwifi firmware_attributes_class wmi_bmof snd_hda_core cec processor_thermal_rfim rapl snd_hwdep psmouse thinkpad_acpi input_leds rc_core mei_me snd_seq_midi ucsi_acpi processor_thermal_mbox intel_cstate intel_lpss_pci snd_seq_midi_event typec_ucsi processor_thermal_rapl nvram i2c_i801 intel_lpss xhci_pci drm_kms_helper spi_intel_pci ledtrig_audio cfg80211 snd_pcm e1000e thunderbolt serio_raw platform_profile mei i2c_smbus spi_intel typec idma64 xhci_pci_renesas i2c_algo_bit intel_vsec intel_rapl_common snd_rawmidi mac_hid snd_seq snd_seq_device snd_timer i2c_hid_acpi i2c_hid hid snd soundcore video int3403_thermal int340x_thermal_zone wmi acpi_tad acpi_pad intel_pmc_core intel_hid int3400_thermal pinctrl_meteorlake sparse_keymap
[ 290.280076] acpi_thermal_rel sch_fq_codel msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables autofs4
[ 290.280097] CPU: 19 PID: 477 Comm: irq/182-iwlwifi Not tainted 6.6.23 #75
[ 290.280104] Hardware name: LENOVO 21ML0SIT12/21ML0SIT12, BIOS N47ET13W (1.02 ) 02/17/2024
[ 290.280108] RIP: 0010:iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280156] Code: 8b 8d 6c ff ff ff 4c 89 f2 4c 89 e6 4c 89 ef e8 4a f4 ff ff e9 08 fe ff ff 4d 89 ef 89 d6 48 c7 c7 c5 6c bf c0 e8 44 1d 36 fb <0f> 0b 4c 89 ff e8 1a 48 ff ff e9 9e fe ff ff 0f 1f 44 00 00 e9 f6
[ 290.280161] RSP: 0018:ffffc900004e0de8 EFLAGS: 00010246
[ 290.280167] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 290.280170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 290.280173] RBP: ffffc900004e0e98 R08: 0000000000000000 R09: 0000000000000000
[ 290.280176] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88811dd27f88
[ 290.280179] R13: ffff888119ae8028 R14: ffff88812c4a0000 R15: ffff888119ae8028
[ 290.280182] FS: 0000000000000000(0000) GS:ffff8882214c0000(0000) knlGS:0000000000000000
[ 290.280187] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 290.280191] CR2: 00007f914800ba8c CR3: 000000020e83a005 CR4: 0000000000770ee0
[ 290.280195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 290.280198] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 290.280201] PKRU: 55555554
[ 290.280204] Call Trace:
[ 290.280208] <IRQ>
[ 290.280214] ? show_regs+0x72/0x90
[ 290.280225] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280269] ? __warn+0x8d/0x160
[ 290.280278] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280324] ? report_bug+0x1bb/0x1d0
[ 290.280335] ? console_unlock+0x77/0x130
[ 290.280346] ? handle_bug+0x46/0x90
[ 290.280354] ? exc_invalid_op+0x19/0x80
[ 290.280360] ? asm_exc_invalid_op+0x1b/0x20
[ 290.280369] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280412] ? iwl_pcie_rx_handle+0x80c/0xad0 [iwlwifi]
[ 290.280457] iwl_pcie_napi_poll_msix+0x30/0x100 [iwlwifi]
[ 290.280500] ? try_to_wake_up+0x278/0x6c0
[ 290.280507] __napi_poll+0x30/0x1f0
[ 290.280515] net_rx_action+0x190/0x300
[ 290.280521] ? __irq_wake_thread+0x42/0x50
[ 290.280529] __do_softirq+0xda/0x330
[ 290.280533] ? handle_edge_irq+0xda/0x250
[ 290.280540] ? __pfx_irq_thread_fn+0x10/0x10
[ 290.280547] do_softirq.part.0+0x41/0x80
[ 290.280557] </IRQ>
[ 290.280559] <TASK>
[ 290.280562] __local_bh_enable_ip+0x72/0x80
[ 290.280570] iwl_pcie_irq_rx_msix_handler+0xd7/0x1c0 [iwlwifi]
[ 290.280644] irq_thread_fn+0x25/0x70
[ 290.280653] irq_thread+0xea/0x1c0
[ 290.280660] ? __pfx_irq_thread_dtor+0x10/0x10
[ 290.280668] ? __pfx_irq_thread+0x10/0x10
[ 290.280675] kthread+0xf4/0x130
[ 290.280683] ? __pfx_kthread+0x10/0x10
[ 290.280690] ret_from_fork+0x43/0x70
[ 290.280697] ? __pfx_kthread+0x10/0x10
[ 290.280704] ret_from_fork_asm+0x1b/0x30
[ 290.280712] </TASK>
[ 290.280715] ---[ end trace 0000000000000000 ]---
[ 290.281118] iwlwifi 0000:09:00.0: Microcode SW error detected. Restarting 0x0.
The first bad commit is c1c1039135c3 ("wifi: iwlwifi: increase number of RX buffers for EHT devices").
Another commit should be applied along with it: 9f9797c7de18 ("wifi: iwlwifi: pcie: fix RB status reading").
BugLink: https://bugs.launchpad.net/bugs/2058808
Johannes Berg (1):
  wifi: iwlwifi: pcie: fix RB status reading

 drivers/net/wireless/intel/iwlwifi/pcie/internal.h |  8 ++++----
 drivers/net/wireless/intel/iwlwifi/pcie/rx.c       |  2 +-
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    | 12 ++++--------
 3 files changed, 9 insertions(+), 13 deletions(-)
From: Johannes Berg <johannes.berg@intel.com>
[ Upstream commit 9f9797c7de18d2ec6be4ef6e0abbaea585040b39 ]
On newer hardware, a queue's RB status / write pointer can be bigger than 4095 (0xFFF), so we cannot mask the value by 0xFFF unconditionally. Since anyway that's only necessary on older hardware, move the masking to the helper function and apply it only for older HW. This also moves the endian conversion in to handle it more easily.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230830112059.7be2a3fff6f4.I94f11dee314a4f7c1941d...
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: stable@vger.kernel.org # 6.6.y
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
---
 drivers/net/wireless/intel/iwlwifi/pcie/internal.h |  8 ++++----
 drivers/net/wireless/intel/iwlwifi/pcie/rx.c       |  2 +-
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    | 12 ++++--------
 3 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
index 5602441df2b7e..8408e4ddddedd 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
@@ -190,17 +190,17 @@ struct iwl_rb_allocator {
  * iwl_get_closed_rb_stts - get closed rb stts from different structs
  * @rxq - the rxq to get the rb stts from
  */
-static inline __le16 iwl_get_closed_rb_stts(struct iwl_trans *trans,
-					    struct iwl_rxq *rxq)
+static inline u16 iwl_get_closed_rb_stts(struct iwl_trans *trans,
+					 struct iwl_rxq *rxq)
 {
 	if (trans->trans_cfg->device_family >= IWL_DEVICE_FAMILY_AX210) {
 		__le16 *rb_stts = rxq->rb_stts;
 
-		return READ_ONCE(*rb_stts);
+		return le16_to_cpu(READ_ONCE(*rb_stts));
 	} else {
 		struct iwl_rb_status *rb_stts = rxq->rb_stts;
 
-		return READ_ONCE(rb_stts->closed_rb_num);
+		return le16_to_cpu(READ_ONCE(rb_stts->closed_rb_num)) & 0xFFF;
 	}
 }
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
index 63091c45a576d..be9b5a19e2a7c 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
@@ -1510,7 +1510,7 @@ static int iwl_pcie_rx_handle(struct iwl_trans *trans, int queue, int budget)
 	spin_lock(&rxq->lock);
 	/* uCode's read index (stored in shared DRAM) indicates the last Rx
 	 * buffer that the driver may process (last buffer filled by ucode). */
-	r = le16_to_cpu(iwl_get_closed_rb_stts(trans, rxq)) & 0x0FFF;
+	r = iwl_get_closed_rb_stts(trans, rxq);
 	i = rxq->read;
 
 	/* W/A 9000 device step A0 wrap-around bug */
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 1bc4a0089c6ff..e9807fcca6ad1 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2714,11 +2714,9 @@ static ssize_t iwl_dbgfs_rx_queue_read(struct file *file,
 		pos += scnprintf(buf + pos, bufsz - pos, "\tfree_count: %u\n",
 				 rxq->free_count);
 		if (rxq->rb_stts) {
-			u32 r = __le16_to_cpu(iwl_get_closed_rb_stts(trans,
-								     rxq));
+			u32 r = iwl_get_closed_rb_stts(trans, rxq);
 			pos += scnprintf(buf + pos, bufsz - pos,
-					 "\tclosed_rb_num: %u\n",
-					 r & 0x0FFF);
+					 "\tclosed_rb_num: %u\n", r);
 		} else {
 			pos += scnprintf(buf + pos, bufsz - pos,
 					 "\tclosed_rb_num: Not Allocated\n");
@@ -3091,7 +3089,7 @@ static u32 iwl_trans_pcie_dump_rbs(struct iwl_trans *trans,
 
 	spin_lock_bh(&rxq->lock);
 
-	r = le16_to_cpu(iwl_get_closed_rb_stts(trans, rxq)) & 0x0FFF;
+	r = iwl_get_closed_rb_stts(trans, rxq);
 
 	for (i = rxq->read, j = 0;
 	     i != r && j < allocated_rb_nums;
@@ -3387,9 +3385,7 @@ iwl_trans_pcie_dump_data(struct iwl_trans *trans,
 		/* Dump RBs is supported only for pre-9000 devices (1 queue) */
 		struct iwl_rxq *rxq = &trans_pcie->rxq[0];
 		/* RBs */
-		num_rbs =
-			le16_to_cpu(iwl_get_closed_rb_stts(trans, rxq))
-			& 0x0FFF;
+		num_rbs = iwl_get_closed_rb_stts(trans, rxq);
 		num_rbs = (num_rbs - rxq->read) & RX_QUEUE_MASK;
 		len += num_rbs * (sizeof(*data) +
 				  sizeof(struct iwl_fw_error_dump_rb) +
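For readers who want to see the effect of the masking change outside the driver, here is a minimal userspace sketch. It is not driver code: the demo_* names and the two-value family enum are hypothetical stand-ins for the device_family check in iwl_get_closed_rb_stts(), and the little-endian decode is written by hand instead of using le16_to_cpu(). It only illustrates the point made in the commit message, that a write pointer above 4095 loses its upper bits when masked with 0xFFF.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's device family comparison. */
enum demo_family { DEMO_FAMILY_OLD, DEMO_FAMILY_NEW };

/* Decode a little-endian 16-bit value from two bytes, regardless of
 * host endianness (userspace substitute for le16_to_cpu()). */
static uint16_t demo_le16_to_cpu(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | (raw[1] << 8));
}

/* Behaviour before the patch: every caller masked with 0x0FFF,
 * whatever hardware produced the value. */
static uint16_t demo_closed_rb_always_masked(const uint8_t raw[2])
{
	return demo_le16_to_cpu(raw) & 0x0FFF;
}

/* Behaviour after the patch: convert once in the helper and mask
 * only for the older hardware that needs it. */
static uint16_t demo_closed_rb(enum demo_family family, const uint8_t raw[2])
{
	uint16_t val = demo_le16_to_cpu(raw);

	if (family == DEMO_FAMILY_NEW)
		return val;

	return val & 0xFFF;
}
```

With the bytes { 0x00, 0x10 } (the value 4096), the always-masked variant returns 0, while demo_closed_rb(DEMO_FAMILY_NEW, ...) returns 4096. Values up to 4095 come out the same from both.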
On Thu, Mar 28, 2024 at 09:54:02AM +0800, Aaron Ma wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> [ Upstream commit 9f9797c7de18d2ec6be4ef6e0abbaea585040b39 ]
>
> On newer hardware, a queue's RB status / write pointer can be bigger than 4095 (0xFFF), so we cannot mask the value by 0xFFF unconditionally. Since anyway that's only necessary on older hardware, move the masking to the helper function and apply it only for older HW. This also moves the endian conversion in to handle it more easily.
>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
> Link: https://lore.kernel.org/r/20230830112059.7be2a3fff6f4.I94f11dee314a4f7c1941d...
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> Cc: stable@vger.kernel.org # 6.6.y
> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>
>  drivers/net/wireless/intel/iwlwifi/pcie/internal.h |  8 ++++----
>  drivers/net/wireless/intel/iwlwifi/pcie/rx.c       |  2 +-
>  drivers/net/wireless/intel/iwlwifi/pcie/trans.c    | 12 ++++--------
>  3 files changed, 9 insertions(+), 13 deletions(-)
Now queued up, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org