After upgrading to 6.1.30, I got a crash message after resume, but the system still appears to run normally.
After reverting e16629c639d429e48c849808e59f1efcce886849 ("thunderbolt: Clear registers properly when auto clear isn't in use"), the error was gone.
Kernel config attached; the system is Slackware 15.0 on an XPS 9700.
May 27 13:55:39 devel kernel: ------------[ cut here ]------------
May 27 13:55:39 devel kernel: thunderbolt 0000:07:00.0: interrupt for TX ring 0 is already enabled
May 27 13:55:39 devel kernel: WARNING: CPU: 15 PID: 21394 at drivers/thunderbolt/nhi.c:137 ring_interrupt_active+0x1ff/0x250 [thunderbolt]
May 27 13:55:39 devel kernel: Modules linked in: squashfs nls_iso8859_1 nls_cp437 tun fuse 8021q garp mrp iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables efivarfs binfmt_misc snd_ctl_led snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_soc_intel_sof_maxim_common snd_sof_probes snd_soc_rt715 snd_soc_rt711 snd_soc_rt1308_sdw regmap_sdw snd_soc_dmic snd_sof_pci_intel_cnl snd_sof_intel_hda_common snd_sof_pci soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_hdac_hda soundwire_bus snd_hda_ext_core snd_hda_codec_hdmi snd_soc_core coretemp snd_compress ac97_bus nouveau intel_tcc_cooling snd_hda_intel x86_pkg_temp_thermal dell_smm_hwmon hid_multitouch iwlmvm hwmon intel_powerclamp snd_intel_dspcfg mxm_wmi i915 i2c_designware_platform snd_intel_sdw_acpi rtsx_pci_sdmmc drm_ttm_helper i2c_designware_core mac80211 drm_buddy i2c_algo_bit dell_laptop snd_hda_codec
May 27 13:55:39 devel kernel: ucsi_ccg dell_wmi mmc_core hid_generic drm_display_helper ledtrig_audio sparse_keymap libarc4 snd_hwdep intel_rapl_msr dell_smbios uvcvideo ttm snd_hda_core dell_wmi_sysman kvm_intel videobuf2_vmalloc firmware_attributes_class dell_wmi_descriptor wmi_bmof intel_wmi_thunderbolt dcdbas processor_thermal_device_pci_legacy drm_kms_helper videobuf2_memops iwlwifi intel_soc_dts_iosf kvm btusb r8153_ecm btrtl videobuf2_v4l2 snd_pcm syscopyarea processor_thermal_device irqbypass cdc_ether btbcm evdev usbnet psmouse intel_lpss_pci btintel processor_thermal_rfim snd_timer videobuf2_common crc32c_intel ucsi_acpi sysfillrect ghash_clmulni_intel serio_raw cfg80211 efi_pstore r8152 typec_ucsi bluetooth sysimgblt videodev processor_thermal_mbox intel_gtt intel_lpss fb_sys_fops processor_thermal_rapl i2c_i801 roles snd i2c_nvidia_gpu drm i2c_smbus ecdh_generic idma64 i2c_hid_acpi mii usbhid thunderbolt mc soundcore rtsx_pci ecc agpgart i2c_ccgx_ucsi rfkill intel_rapl_common mfd_core
May 27 13:55:39 devel kernel: intel_pch_thermal i2c_hid typec video button battery hid int3403_thermal int340x_thermal_zone pinctrl_cannonlake pinctrl_intel wmi int3400_thermal intel_pmc_core acpi_pad acpi_thermal_rel acpi_tad ac usb_storage
May 27 13:55:39 devel kernel: CPU: 15 PID: 21394 Comm: kworker/u32:15 Tainted: G W 6.1.30-dell-2 #1
May 27 13:55:39 devel kernel: Hardware name: Dell Inc. XPS 17 9700/0P1CHN, BIOS 1.11.1 11/18/2021
May 27 13:55:39 devel kernel: Workqueue: events_unbound async_run_entry_fn
May 27 13:55:39 devel kernel: RIP: 0010:ring_interrupt_active+0x1ff/0x250 [thunderbolt]
May 27 13:55:39 devel kernel: Code: 24 04 e8 24 2b 3c e1 4c 8b 4c 24 08 44 8b 44 24 04 48 c7 c7 50 c7 29 a0 48 8b 4c 24 10 48 8b 54 24 18 48 89 c6 e8 71 34 e4 e0 <0f> 0b 45 84 ed 0f 85 09 ff ff ff 48 8b 43 08 f6 40 70 01 0f 85 38
May 27 13:55:39 devel kernel: RSP: 0018:ffffc90000517c48 EFLAGS: 00010082
May 27 13:55:39 devel kernel: RAX: 0000000000000000 RBX: ffff888101dab800 RCX: 0000000000000000
May 27 13:55:39 devel kernel: RDX: 0000000000000004 RSI: 0000000000000086 RDI: 00000000ffffffff
May 27 13:55:39 devel kernel: RBP: 0000000000000000 R08: 80000000ffffe7b4 R09: 0000000082999bac
May 27 13:55:39 devel kernel: R10: ffffffffffffffff R11: ffffffff82999ba1 R12: 0000000000001001
May 27 13:55:39 devel kernel: R13: 0000000000000001 R14: 0000000000038200 R15: 0000000000000001
May 27 13:55:39 devel kernel: FS: 0000000000000000(0000) GS:ffff88887d7c0000(0000) knlGS:0000000000000000
May 27 13:55:39 devel kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 13:55:39 devel kernel: CR2: 00007f745c010b00 CR3: 000000000220a005 CR4: 00000000007706e0
May 27 13:55:39 devel kernel: PKRU: 55555554
May 27 13:55:39 devel kernel: Call Trace:
May 27 13:55:39 devel kernel: <TASK>
May 27 13:55:39 devel kernel: tb_ring_start+0x141/0x230 [thunderbolt]
May 27 13:55:39 devel kernel: tb_ctl_start+0x1f/0x70 [thunderbolt]
May 27 13:55:39 devel kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 27 13:55:39 devel kernel: tb_domain_runtime_resume+0x15/0x30 [thunderbolt]
May 27 13:55:39 devel kernel: __rpm_callback+0x41/0x110
May 27 13:55:39 devel kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 27 13:55:39 devel kernel: rpm_callback+0x59/0x70
May 27 13:55:39 devel kernel: rpm_resume+0x4b3/0x7f0
May 27 13:55:39 devel kernel: ? _raw_spin_unlock_irq+0x13/0x30
May 27 13:55:39 devel kernel: ? __wait_for_common+0x171/0x1a0
May 27 13:55:39 devel kernel: ? usleep_range_state+0x90/0x90
May 27 13:55:39 devel kernel: ? preempt_count_add+0x68/0xa0
May 27 13:55:39 devel kernel: __pm_runtime_resume+0x4a/0x80
May 27 13:55:39 devel kernel: pci_pm_suspend+0x60/0x170
May 27 13:55:39 devel kernel: ? pci_pm_freeze+0xb0/0xb0
May 27 13:55:39 devel kernel: dpm_run_callback+0x3f/0x150
May 27 13:55:39 devel kernel: ? _raw_spin_lock_irqsave+0x19/0x40
May 27 13:55:39 devel kernel: __device_suspend+0x130/0x4d0
May 27 13:55:39 devel kernel: async_suspend+0x1b/0x90
May 27 13:55:39 devel kernel: async_run_entry_fn+0x1a/0xa0
May 27 13:55:39 devel kernel: process_one_work+0x1bd/0x3c0
May 27 13:55:39 devel kernel: worker_thread+0x4d/0x3c0
May 27 13:55:39 devel kernel: ? process_one_work+0x3c0/0x3c0
May 27 13:55:39 devel kernel: kthread+0xe5/0x110
May 27 13:55:39 devel kernel: ? kthread_complete_and_exit+0x20/0x20
May 27 13:55:39 devel kernel: ret_from_fork+0x1f/0x30
May 27 13:55:39 devel kernel: </TASK>
May 27 13:55:39 devel kernel: ---[ end trace 0000000000000000 ]---
On Sat, May 27, 2023 at 04:15:51PM -0400, beld zhang wrote:
> After upgrading to 6.1.30, I got a crash message after resume, but the system still appears to run normally.
> After reverting e16629c639d429e48c849808e59f1efcce886849 ("thunderbolt: Clear registers properly when auto clear isn't in use"), the error was gone.
Can you check latest mainline to see if this regression still happens?
> Kernel config attached; the system is Slackware 15.0 on an XPS 9700.
> [...]
Anyway, I'm adding it to regzbot (as stable-specific regression for now):
#regzbot ^introduced: e16629c639d429
#regzbot title: Properly clearing Thunderbolt registers when not autoclearing triggers ring_interrupt_active crash on resume
Thanks.
On 5/27/23 18:48, Bagas Sanjaya wrote:
> On Sat, May 27, 2023 at 04:15:51PM -0400, beld zhang wrote:
>> After upgrading to 6.1.30, I got a crash message after resume, but the system still appears to run normally.
This is specific to resuming from s2idle, and doesn't happen at boot?
Does it also happen when hot-plugging or hot-unplugging a TBT3 or USB4 dock?
>> After reverting e16629c639d429e48c849808e59f1efcce886849 ("thunderbolt: Clear registers properly when auto clear isn't in use"), the error was gone.
> Can you check latest mainline to see if this regression still happens?
In addition to checking mainline, can you please attach a full dmesg to somewhere ephemeral like a kernel bugzilla, with thunderbolt.dyndbg='+p' set on the kernel command line?
On Sun, May 28, 2023 at 07:55:39AM -0500, Mario Limonciello wrote:
> On 5/27/23 18:48, Bagas Sanjaya wrote:
>> On Sat, May 27, 2023 at 04:15:51PM -0400, beld zhang wrote:
>>> After upgrading to 6.1.30, I got a crash message after resume, but the system still appears to run normally.
> This is specific to resuming from s2idle, and doesn't happen at boot?
> Does it also happen when hot-plugging or hot-unplugging a TBT3 or USB4 dock?
It also happens when a device is connected and you do
# rmmod thunderbolt
# modprobe thunderbolt
I think it is because nhi_mask_interrupt() does not mask interrupt on Intel now.
Can you try the patch below? I'm unable to try myself because my test system has some booting issues at the moment.
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 4c9f2811d20d..a11650da40f9 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -60,9 +60,12 @@ static int ring_interrupt_index(const struct tb_ring *ring)
 
 static void nhi_mask_interrupt(struct tb_nhi *nhi, int mask, int ring)
 {
-	if (nhi->quirks & QUIRK_AUTO_CLEAR_INT)
-		return;
-	iowrite32(mask, nhi->iobase + REG_RING_INTERRUPT_MASK_CLEAR_BASE + ring);
+	if (nhi->quirks & QUIRK_AUTO_CLEAR_INT) {
+		u32 val = ioread32(nhi->iobase + REG_RING_INTERRUPT_BASE + ring);
+		iowrite32(val & ~mask, nhi->iobase + REG_RING_INTERRUPT_BASE + ring);
+	} else {
+		iowrite32(mask, nhi->iobase + REG_RING_INTERRUPT_MASK_CLEAR_BASE + ring);
+	}
 }
 
 static void nhi_clear_interrupt(struct tb_nhi *nhi, int ring)
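For readers following along, here is a rough userspace model of why the early return in nhi_mask_interrupt() could lead to the "already enabled" warning, and why the read-modify-write in the patch above would avoid it. This is only a sketch and not kernel code: struct fake_nhi, the helper names, and the idea that a write to the mask-clear register simply clears bits in the enable word are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the NHI: one 32-bit word models the ring
 * interrupt enable register, one flag models QUIRK_AUTO_CLEAR_INT. */
struct fake_nhi {
	uint32_t int_enable;
	bool auto_clear_quirk;
};

/* Modelled behaviour before the patch: nothing is masked on quirk hardware. */
static void mask_before_patch(struct fake_nhi *nhi, uint32_t mask)
{
	if (nhi->auto_clear_quirk)
		return;
	/* Assumption: the mask-clear register write clears the enable bits. */
	nhi->int_enable &= ~mask;
}

/* Modelled behaviour after the patch: quirk hardware gets a read-modify-write. */
static void mask_after_patch(struct fake_nhi *nhi, uint32_t mask)
{
	if (nhi->auto_clear_quirk) {
		uint32_t val = nhi->int_enable;

		nhi->int_enable = val & ~mask;
	} else {
		nhi->int_enable &= ~mask;
	}
}

/* Enables the bits in mask. Returns true if they were already set, which is
 * the condition the "interrupt for TX ring 0 is already enabled" warning
 * reports in the log. */
static bool enable_ring_interrupt(struct fake_nhi *nhi, uint32_t mask)
{
	bool already_enabled;

	assert(mask != 0);
	already_enabled = (nhi->int_enable & mask) == mask;
	nhi->int_enable |= mask;
	return already_enabled;
}

/* One enable, mask, enable cycle on quirk hardware, loosely mirroring a
 * suspend/resume or rmmod/modprobe round trip. Returns true if the second
 * enable would warn. */
bool cycle_warns(bool patched)
{
	struct fake_nhi nhi = { .int_enable = 0, .auto_clear_quirk = true };
	const uint32_t tx_ring0 = 1u << 0;

	enable_ring_interrupt(&nhi, tx_ring0);
	if (patched)
		mask_after_patch(&nhi, tx_ring0);
	else
		mask_before_patch(&nhi, tx_ring0);
	return enable_ring_interrupt(&nhi, tx_ring0);
}
```

In this model cycle_warns(false) reports the stale enable bit and cycle_warns(true) does not. The real driver talks to hardware registers through ioread32()/iowrite32(), so treat this only as an aid for reading the diff.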
Running both
# rmmod thunderbolt
# modprobe thunderbolt
produces many crash logs on my hardware.
I tried this patch on 6.1.30 and 6.4-rc4, but both failed.
How about continuing this on the kernel bugzilla and posting a patch here after it is resolved, as Greg said?
Hi,
On Mon, May 29, 2023 at 02:40:26PM -0400, beld zhang wrote:
> Running both
> # rmmod thunderbolt
> # modprobe thunderbolt
> produces many crash logs on my hardware.
> I tried this patch on 6.1.30 and 6.4-rc4, but both failed.
You mean patching fails or the patch does not solve the issue at hand?
I managed to boot my test system today and with the patch I don't see the issue anymore.
> How about continuing this on the kernel bugzilla and posting a patch here after it is resolved, as Greg said?
Sure, works for me.
On 5/29/23 06:38, Mika Westerberg wrote:
> [...]
> I think it is because nhi_mask_interrupt() does not mask interrupt on Intel now.
> Can you try the patch below? I'm unable to try myself because my test system has some booting issues at the moment.
> [...]
Mika, that looks good for the issue, thanks!
You can add:
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
when you submit it.
Tests passed on both 6.1.30 and 6.4-rc4; comments are at bugzilla.
On 5/30/23 11:27, beld zhang wrote:
> Tests passed on both 6.1.30 and 6.4-rc4; comments are at bugzilla.
tl;dr:
A: http://en.wikipedia.org/wiki/Top_post
Q: Where do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?
Please don't top-post your reply in the future. Reply inline with appropriate context instead.
On Mon, May 29, 2023 at 11:12:45PM -0500, Mario Limonciello wrote:
> [...]
> Mika, that looks good for the issue, thanks!
> You can add:
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> when you submit it.
Thanks, submitted formal patch now here:
https://lore.kernel.org/linux-usb/20230530075555.35239-1-mika.westerberg@lin...
beld zhang, can you try it and see if it works on your system? It should apply on top of thunderbolt.git/fixes [1]. Thanks!
[1] git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git
On Tue, May 30, 2023 at 4:03 AM Mika Westerberg wrote:
> Thanks, submitted formal patch now here:
> https://lore.kernel.org/linux-usb/20230530075555.35239-1-mika.westerberg@lin...
> beld zhang, can you try it and see if it works on your system? It should apply on top of thunderbolt.git/fixes [1]. Thanks!
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git
Tested the fixes branch with the patch applied:
1) boot with mouse..............: good
2) remove mouse then put on.....: good
3) rmmod / modprobe thunderbolt.: good
4) suspend / resume.............: good
On Tue, May 30, 2023 at 10:38:01AM -0400, beld zhang wrote:
> Tested the fixes branch with the patch applied:
> 1) boot with mouse..............: good
> 2) remove mouse then put on.....: good
> 3) rmmod / modprobe thunderbolt.: good
> 4) suspend / resume.............: good
Thanks for testing! I took the liberty to add your tested-by to the patch. Let me know if that's not OK.
[TLDR: This mail is primarily relevant for Linux kernel regression tracking. See link in footer if these mails annoy you.]
On 28.05.23 01:48, Bagas Sanjaya wrote:
> On Sat, May 27, 2023 at 04:15:51PM -0400, beld zhang wrote:
>> After upgrading to 6.1.30, I got a crash message after resume, but the system still appears to run normally.
>> After reverting e16629c639d429e48c849808e59f1efcce886849 ("thunderbolt: Clear registers properly when auto clear isn't in use"), the error was gone.
> Can you check latest mainline to see if this regression still happens?
> [...]
> #regzbot ^introduced: e16629c639d429
> #regzbot title: Properly clearing Thunderbolt registers when not autoclearing triggers ring_interrupt_active crash on resume
#regzbot fix: 5532962c9ed259daf6824041aa923452cfca6b
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
linux-stable-mirror@lists.linaro.org