I have been having issues with the 6.x series of kernels resuming from suspend with one of my drives. Far as I can tell it has trouble with the cache on the drive when coming out of s3 sleep. Tried a few different distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same error message. It appears to work fine on the 5.15 kernel just fine however.
This is the error or errors that I have been getting and assume has been holding up the system from resuming from suspend. Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5
The full suspend log. Jul 20 04:12:50 rageworks systemd-logind[869]: The system will suspend now! Jul 20 04:12:50 rageworks ModemManager[902]: <info> [sleep-monitor-systemd] system is about to suspend Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.4923] manager: sleep: sleep requested (sleeping: no enabled: yes) Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.4924] manager: NetworkManager state is now ASLEEP Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.4926] device (enp9s0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'managed') Jul 20 04:12:50 rageworks dbus-daemon[866]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.6' (uid=0 pid=894 comm="/usr/bin/NetworkManager --no-daemon") Jul 20 04:12:50 rageworks systemd[1]: Starting Network Manager Script Dispatcher Service... Jul 20 04:12:50 rageworks dbus-daemon[866]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher' Jul 20 04:12:50 rageworks systemd[1]: Started Network Manager Script Dispatcher Service. Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.6544] device (enp9s0): state change: deactivating -> disconnected (reason 'sleeping', sys-iface-state: 'managed') Jul 20 04:12:50 rageworks avahi-daemon[864]: Withdrawing address record for fe80::881:c55d:1583:20f5 on enp9s0. Jul 20 04:12:50 rageworks avahi-daemon[864]: Leaving mDNS multicast group on interface enp9s0.IPv6 with address fe80::881:c55d:1583:20f5. Jul 20 04:12:50 rageworks avahi-daemon[864]: Interface enp9s0.IPv6 no longer relevant for mDNS. Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.6726] dhcp4 (enp9s0): canceled DHCP transaction Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.6726] dhcp4 (enp9s0): activation: beginning transaction (timeout in 45 seconds) Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.6727] dhcp4 (enp9s0): state changed no lease Jul 20 04:12:50 rageworks avahi-daemon[864]: Withdrawing address record for 192.168.1.3 on enp9s0. Jul 20 04:12:50 rageworks avahi-daemon[864]: Leaving mDNS multicast group on interface enp9s0.IPv4 with address 192.168.1.3. Jul 20 04:12:50 rageworks avahi-daemon[864]: Interface enp9s0.IPv4 no longer relevant for mDNS. Jul 20 04:12:50 rageworks NetworkManager[894]: <info> [1689847970.7576] device (enp9s0): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'managed') Jul 20 04:12:50 rageworks kernel: r8169 0000:09:00.0 enp9s0: Link is Down Jul 20 04:12:50 rageworks systemd[1]: Reached target Sleep. Jul 20 04:12:50 rageworks systemd[1]: Starting NVIDIA system suspend actions... Jul 20 04:12:50 rageworks suspend[2051]: nvidia-suspend.service Jul 20 04:12:50 rageworks logger[2051]: <13>Jul 20 04:12:50 suspend: nvidia-suspend.service Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully. Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions. Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend... Jul 20 04:12:51 rageworks systemd-sleep[2059]: Entering sleep state 'suspend'... Jul 20 04:12:51 rageworks kernel: PM: suspend entry (deep) Jul 20 04:13:41 rageworks kernel: Filesystems sync: 0.284 seconds Jul 20 04:13:41 rageworks kernel: Freezing user space processes Jul 20 04:13:41 rageworks kernel: Freezing user space processes completed (elapsed 0.001 seconds) Jul 20 04:13:41 rageworks kernel: OOM killer disabled. Jul 20 04:13:41 rageworks kernel: Freezing remaining freezable tasks Jul 20 04:13:41 rageworks kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds) Jul 20 04:13:41 rageworks kernel: printk: Suspending console(s) (use no_console_suspend to debug) Jul 20 04:13:41 rageworks kernel: serial 00:05: disabled Jul 20 04:13:41 rageworks kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache Jul 20 04:13:41 rageworks kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Synchronizing SCSI cache Jul 20 04:13:41 rageworks kernel: sd 1:0:0:0: [sdb] Stopping disk Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Stopping disk Jul 20 04:13:41 rageworks kernel: sd 0:0:0:0: [sda] Stopping disk Jul 20 04:13:41 rageworks kernel: ACPI: PM: Preparing to enter system sleep state S3 Jul 20 04:13:41 rageworks kernel: ACPI: PM: Saving platform NVS memory Jul 20 04:13:41 rageworks kernel: Disabling non-boot CPUs ... Jul 20 04:13:41 rageworks kernel: smpboot: CPU 1 is now offline Jul 20 04:13:41 rageworks kernel: smpboot: CPU 2 is now offline Jul 20 04:13:41 rageworks kernel: smpboot: CPU 3 is now offline Jul 20 04:13:41 rageworks kernel: ACPI: PM: Low-level resume complete Jul 20 04:13:41 rageworks kernel: ACPI: PM: Restoring platform NVS memory Jul 20 04:13:41 rageworks kernel: Enabling non-boot CPUs ... Jul 20 04:13:41 rageworks kernel: x86: Booting SMP configuration: Jul 20 04:13:41 rageworks kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1 Jul 20 04:13:41 rageworks kernel: microcode: CPU1: patch_level=0x08101016 Jul 20 04:13:41 rageworks kernel: CPU1 is up Jul 20 04:13:41 rageworks kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2 Jul 20 04:13:41 rageworks kernel: microcode: CPU2: patch_level=0x08101016 Jul 20 04:13:41 rageworks kernel: CPU2 is up Jul 20 04:13:41 rageworks kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3 Jul 20 04:13:41 rageworks kernel: microcode: CPU3: patch_level=0x08101016 Jul 20 04:13:41 rageworks kernel: CPU3 is up Jul 20 04:13:41 rageworks kernel: ACPI: PM: Waking up from system sleep state S3 Jul 20 04:13:41 rageworks kernel: xhci_hcd 0000:02:00.0: xHC error in resume, USBSTS 0x401, Reinit Jul 20 04:13:41 rageworks kernel: usb usb1: root hub lost power or was reset Jul 20 04:13:41 rageworks kernel: usb usb2: root hub lost power or was reset Jul 20 04:13:41 rageworks kernel: serial 00:05: activated Jul 20 04:13:41 rageworks kernel: sd 0:0:0:0: [sda] Starting disk Jul 20 04:13:41 rageworks kernel: sd 1:0:0:0: [sdb] Starting disk Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Starting disk Jul 20 04:13:41 rageworks kernel: ata5: SATA link down (SStatus 0 SControl 330) Jul 20 04:13:41 rageworks kernel: ata9: SATA link down (SStatus 0 SControl 300) Jul 20 04:13:41 rageworks kernel: ata6: SATA link down (SStatus 0 SControl 330) Jul 20 04:13:41 rageworks kernel: usb 1-7: reset full-speed USB device number 2 using xhci_hcd Jul 20 04:13:41 rageworks kernel: usb 1-10: reset full-speed USB device number 4 using xhci_hcd Jul 20 04:13:41 rageworks kernel: usb 1-8: reset full-speed USB device number 3 using xhci_hcd Jul 20 04:13:41 rageworks kernel: ata2: found unknown device (class 0) Jul 20 04:13:41 rageworks kernel: ata1: found unknown device (class 0) Jul 20 04:13:41 rageworks kernel: ata10: found unknown device (class 0) Jul 20 04:13:41 rageworks kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jul 20 04:13:41 rageworks kernel: ata2.00: configured for UDMA/133 Jul 20 04:13:41 rageworks kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jul 20 04:13:41 rageworks kernel: ata1.00: configured for UDMA/133 Jul 20 04:13:41 rageworks kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jul 20 04:13:41 rageworks kernel: ata10.00: configured for UDMA/133 Jul 20 04:13:41 rageworks kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jul 20 04:13:41 rageworks kernel: ata10.00: configured for UDMA/133 Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5 Jul 20 04:13:41 rageworks kernel: OOM killer enabled. Jul 20 04:13:41 rageworks kernel: Restarting tasks ... done. Jul 20 04:13:41 rageworks kernel: random: crng reseeded on system resumption Jul 20 04:13:41 rageworks kernel: PM: suspend exit Jul 20 04:13:41 rageworks rtkit-daemon[1349]: The canary thread is apparently starving. Taking action. Jul 20 04:13:41 rageworks systemd-sleep[2059]: System returned from sleep state. Jul 20 04:13:41 rageworks rtkit-daemon[1349]: Demoting known real-time threads. Jul 20 04:13:41 rageworks systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully. Jul 20 04:13:41 rageworks rtkit-daemon[1349]: Successfully demoted thread 1597 of process 1342. Jul 20 04:13:41 rageworks rtkit-daemon[1349]: Successfully demoted thread 1342 of process 1342. Jul 20 04:13:41 rageworks rtkit-daemon[1349]: Demoted 2 threads. Jul 20 04:13:42 rageworks systemd[1]: systemd-suspend.service: Deactivated successfully. Jul 20 04:13:42 rageworks systemd[1]: Finished System Suspend. Jul 20 04:13:42 rageworks systemd[1]: Stopped target Sleep. Jul 20 04:13:42 rageworks systemd[1]: Reached target Suspend. Jul 20 04:13:42 rageworks systemd[1]: Stopped target Suspend. Jul 20 04:13:42 rageworks systemd-logind[869]: Operation 'sleep' finished.
Thanks
Hi, Thorsten here, the Linux kernel's regression tracker.
On 26.07.23 13:54, TW wrote:
I have been having issues with the 6.x series of kernels resuming from suspend with one of my drives. Far as I can tell it has trouble with the cache on the drive when coming out of s3 sleep. Tried a few different distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same error message. It appears to work fine on the 5.15 kernel just fine however.
This is the error or errors that I have been getting and assume has been holding up the system from resuming from suspend.
Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5
Thx for your report. I CCed a few people, with a bit of luck they have an idea. But I doubt it. If no one replies you likely will need a bisection to find the root of the problem. But before going down that route you want to check if latest mainline kernel (vanilla!) works better.
FWIW, this is not my area of expertise, so the following might be a misleading comment, but the problem looks somewhat similar to this one that iirc was never solved: https://bugzilla.kernel.org/show_bug.cgi?id=216087
Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully. Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions. Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend...
That sounds like you are using out-of tree drivers which can cause all sorts of issues. Please recheck if the problem happens without those as well and do not use them in all further tests to debug the issue.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On 7/26/23 22:47, Thorsten Leemhuis wrote:
Hi, Thorsten here, the Linux kernel's regression tracker.
On 26.07.23 13:54, TW wrote:
I have been having issues with the 6.x series of kernels resuming from suspend with one of my drives. Far as I can tell it has trouble with the cache on the drive when coming out of s3 sleep. Tried a few different distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same error message. It appears to work fine on the 5.15 kernel just fine however.
This is the error or errors that I have been getting and assume has been holding up the system from resuming from suspend.
Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
This sense is garbage. This issue was reported already, but it is hard to deal with as it seems to be due to drives/adapters not correctly reporting status bits. So for now, let's ignore this sense codes.
The start/stop unit failure is weird. On another case, I am suspecting that this command is causing a delay on resume, but not an error like this.
Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5
Thx for your report. I CCed a few people, with a bit of luck they have an idea. But I doubt it. If no one replies you likely will need a bisection to find the root of the problem. But before going down that route you want to check if latest mainline kernel (vanilla!) works better.
FWIW, this is not my area of expertise, so the following might be a misleading comment, but the problem looks somewhat similar to this one that iirc was never solved: https://bugzilla.kernel.org/show_bug.cgi?id=216087
Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully. Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions. Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend...
That sounds like you are using out-of tree drivers which can cause all sorts of issues. Please recheck if the problem happens without those as well and do not use them in all further tests to debug the issue.
Yes. Please retest with the latest 6.5-rc3.
And can you try this patch to see if it solves your issue ?
commit 29e81d11812ee924d19425343ec69acd34af9d35 Author: Damien Le Moal dlemoal@kernel.org Date: Mon Jul 24 13:23:14 2023 +0900
ata,scsi: do not issue START STOP UNIT on resume
Signed-off-by: Damien Le Moal dlemoal@kernel.org
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 370d18aca71e..6184c7bcc16c 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1100,7 +1100,13 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) } } else { sdev->sector_size = ata_id_logical_sector_size(dev->id); + /* + * Stop the drive on suspend but do not issue START STOP UNIT + * on resume as this is not necessary: the port is reset on + * resume, which wakes up the drive. + */ sdev->manage_start_stop = 1; + sdev->no_start_on_resume = 1; }
/* diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 68b12afa0721..b8584fe3123e 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev) static int sd_resume(struct device *dev) { struct scsi_disk *sdkp = dev_get_drvdata(dev); - int ret; + int ret = 0;
if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ return 0; @@ -3885,7 +3885,8 @@ static int sd_resume(struct device *dev) return 0;
sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); - ret = sd_start_stop_device(sdkp, 1); + if (!sdkp->device->no_start_on_resume) + ret = sd_start_stop_device(sdkp, 1); if (!ret) opal_unlock_from_suspend(sdkp->opal_dev); return ret; diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 75b2235b99e2..b9230b6add04 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -194,6 +194,7 @@ struct scsi_device { unsigned no_start_on_add:1; /* do not issue start on add */ unsigned allow_restart:1; /* issue START_UNIT in error handler */ unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */ + unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */ unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ unsigned select_no_atn:1;
I retried on 6.5 rc3 without the Nvidia drivers and still received the same error and going to try for the patch next but got a malformed patch error on line 6 for the first patch for libata-scsi.c. The other two seem to go through just fine however.
Also the bugzilla link is similar to what I have but the disk doesn't disappear, comes back but just takes awhile to come back out of sleep mode.
On 7/26/23 17:39, Damien Le Moal wrote:
On 7/26/23 22:47, Thorsten Leemhuis wrote:
Hi, Thorsten here, the Linux kernel's regression tracker.
On 26.07.23 13:54, TW wrote:
I have been having issues with the 6.x series of kernels resuming from suspend with one of my drives. Far as I can tell it has trouble with the cache on the drive when coming out of s3 sleep. Tried a few different distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same error message. It appears to work fine on the 5.15 kernel just fine however.
This is the error or errors that I have been getting and assume has been holding up the system from resuming from suspend.
Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current] Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
This sense is garbage. This issue was reported already, but it is hard to deal with as it seems to be due to drives/adapters not correctly reporting status bits. So for now, let's ignore this sense codes.
The start/stop unit failure is weird. On another case, I am suspecting that this command is causing a delay on resume, but not an error like this.
Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5 Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5
Thx for your report. I CCed a few people, with a bit of luck they have an idea. But I doubt it. If no one replies you likely will need a bisection to find the root of the problem. But before going down that route you want to check if latest mainline kernel (vanilla!) works better.
FWIW, this is not my area of expertise, so the following might be a misleading comment, but the problem looks somewhat similar to this one that iirc was never solved: https://bugzilla.kernel.org/show_bug.cgi?id=216087
Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully. Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions. Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend...
That sounds like you are using out-of tree drivers which can cause all sorts of issues. Please recheck if the problem happens without those as well and do not use them in all further tests to debug the issue.
Yes. Please retest with the latest 6.5-rc3.
And can you try this patch to see if it solves your issue ?
commit 29e81d11812ee924d19425343ec69acd34af9d35 Author: Damien Le Moal dlemoal@kernel.org Date: Mon Jul 24 13:23:14 2023 +0900
ata,scsi: do not issue START STOP UNIT on resume Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 370d18aca71e..6184c7bcc16c 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1100,7 +1100,13 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) } } else { sdev->sector_size = ata_id_logical_sector_size(dev->id);
/*
* Stop the drive on suspend but do not issue START STOP UNIT
* on resume as this is not necessary: the port is reset on
* resume, which wakes up the drive.
*/
sdev->manage_start_stop = 1;
sdev->no_start_on_resume = 1;
}
/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 68b12afa0721..b8584fe3123e 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev) static int sd_resume(struct device *dev) { struct scsi_disk *sdkp = dev_get_drvdata(dev);
- int ret;
int ret = 0;
if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ return 0;
@@ -3885,7 +3885,8 @@ static int sd_resume(struct device *dev) return 0;
sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
- ret = sd_start_stop_device(sdkp, 1);
- if (!sdkp->device->no_start_on_resume)
if (!ret) opal_unlock_from_suspend(sdkp->opal_dev); return ret;ret = sd_start_stop_device(sdkp, 1);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 75b2235b99e2..b9230b6add04 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -194,6 +194,7 @@ struct scsi_device { unsigned no_start_on_add:1; /* do not issue start on add */ unsigned allow_restart:1; /* issue START_UNIT in error handler */ unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */
- unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */ unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ unsigned select_no_atn:1;
I managed to fix the patch file, guess the formatting messed up a bit. So will try with those patches installed.
On 7/27/23 04:06, TW wrote:
I retried on 6.5 rc3 without the Nvidia drivers and still received the same error and going to try for the patch next but got a malformed patch error on line 6 for the first patch for libata-scsi.c. The other two seem to go through just fine however.
On 7/27/23 19:22, TW wrote:
I managed to fix the patch file, guess the formatting messed up a bit. So will try with those patches installed.
Please do not top post.
Again, I sent you a single patch. What other patches are you talking about ?
On 7/27/23 04:06, TW wrote:
I retried on 6.5 rc3 without the Nvidia drivers and still received the same error and going to try for the patch next but got a malformed patch error on line 6 for the first patch for libata-scsi.c. The other two seem to go through just fine however.
It was all 1 patch but the first change had a formatting issue from the email format I guess. So I fixed that and the patch went through and looks like the drive error message has stopped. Still a little slow coming back but that error is gone at least.
Jul 27 05:05:05 rageworks systemd[1]: Starting System Suspend... Jul 27 05:05:05 rageworks systemd-sleep[1624]: Entering sleep state 'suspend'... Jul 27 05:05:05 rageworks kernel: PM: suspend entry (deep) Jul 27 05:05:05 rageworks kernel: Filesystems sync: 0.246 seconds Jul 27 05:05:26 rageworks kernel: Freezing user space processes Jul 27 05:05:26 rageworks kernel: Freezing user space processes completed (elapsed 0.001 seconds) Jul 27 05:05:26 rageworks kernel: OOM killer disabled. Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds) Jul 27 05:05:26 rageworks kernel: printk: Suspending console(s) (use no_console_suspend to debug) Jul 27 05:05:26 rageworks kernel: serial 00:05: disabled Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Synchronizing SCSI cache Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Stopping disk Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Stopping disk Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Stopping disk Jul 27 05:05:26 rageworks kernel: ACPI: PM: Preparing to enter system sleep state S3 Jul 27 05:05:26 rageworks kernel: ACPI: PM: Saving platform NVS memory Jul 27 05:05:26 rageworks kernel: Disabling non-boot CPUs ... Jul 27 05:05:26 rageworks kernel: smpboot: CPU 1 is now offline Jul 27 05:05:26 rageworks kernel: smpboot: CPU 2 is now offline Jul 27 05:05:26 rageworks kernel: smpboot: CPU 3 is now offline Jul 27 05:05:26 rageworks kernel: ACPI: PM: Low-level resume complete Jul 27 05:05:26 rageworks kernel: ACPI: PM: Restoring platform NVS memory Jul 27 05:05:26 rageworks kernel: Enabling non-boot CPUs ... Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1 Jul 27 05:05:26 rageworks kernel: CPU1 is up Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2 Jul 27 05:05:26 rageworks kernel: CPU2 is up Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3 Jul 27 05:05:26 rageworks kernel: CPU3 is up Jul 27 05:05:26 rageworks kernel: ACPI: PM: Waking up from system sleep state S3 Jul 27 05:05:26 rageworks kernel: xhci_hcd 0000:02:00.0: xHC error in resume, USBSTS 0x401, Reinit Jul 27 05:05:26 rageworks kernel: usb usb1: root hub lost power or was reset Jul 27 05:05:26 rageworks kernel: usb usb2: root hub lost power or was reset Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Starting disk Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Starting disk Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Starting disk Jul 27 05:05:26 rageworks kernel: serial 00:05: activated Jul 27 05:05:26 rageworks kernel: ata6: SATA link down (SStatus 0 SControl 330) Jul 27 05:05:26 rageworks kernel: ata5: SATA link down (SStatus 0 SControl 330) Jul 27 05:05:26 rageworks kernel: ata9: SATA link down (SStatus 0 SControl 300) Jul 27 05:05:26 rageworks kernel: usb 1-10: reset full-speed USB device number 4 using xhci_hcd Jul 27 05:05:26 rageworks kernel: usb 1-8: reset full-speed USB device number 3 using xhci_hcd Jul 27 05:05:26 rageworks kernel: usb 1-7: reset full-speed USB device number 2 using xhci_hcd Jul 27 05:05:26 rageworks kernel: OOM killer enabled. Jul 27 05:05:26 rageworks kernel: Restarting tasks ... done. Jul 27 05:05:26 rageworks kernel: random: crng reseeded on system resumption Jul 27 05:05:26 rageworks kernel: PM: suspend exit
On 7/27/23 04:27, Damien Le Moal wrote:
On 7/27/23 19:22, TW wrote:
I managed to fix the patch file, guess the formatting messed up a bit. So will try with those patches installed.
Just in case, patch fil attached to avoid formatting issues.
On 7/27/23 21:25, TW wrote:
It was all 1 patch but the first change had a formatting issue from the email format I guess. So I fixed that and the patch went through and looks like the drive error message has stopped. Still a little slow coming back but that error is gone at least.
"Slow coming back" -> Compared to which version of the kernel ? Do you have numbers ?
If the devices are HDDs, resume will wait for these to spin up. That takes a while (about 10s normally).
Jul 27 05:05:05 rageworks systemd[1]: Starting System Suspend... Jul 27 05:05:05 rageworks systemd-sleep[1624]: Entering sleep state 'suspend'... Jul 27 05:05:05 rageworks kernel: PM: suspend entry (deep) Jul 27 05:05:05 rageworks kernel: Filesystems sync: 0.246 seconds Jul 27 05:05:26 rageworks kernel: Freezing user space processes Jul 27 05:05:26 rageworks kernel: Freezing user space processes completed (elapsed 0.001 seconds) Jul 27 05:05:26 rageworks kernel: OOM killer disabled. Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds) Jul 27 05:05:26 rageworks kernel: printk: Suspending console(s) (use no_console_suspend to debug) Jul 27 05:05:26 rageworks kernel: serial 00:05: disabled Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Synchronizing SCSI cache Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Stopping disk Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Stopping disk Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Stopping disk Jul 27 05:05:26 rageworks kernel: ACPI: PM: Preparing to enter system sleep state S3 Jul 27 05:05:26 rageworks kernel: ACPI: PM: Saving platform NVS memory Jul 27 05:05:26 rageworks kernel: Disabling non-boot CPUs ... Jul 27 05:05:26 rageworks kernel: smpboot: CPU 1 is now offline Jul 27 05:05:26 rageworks kernel: smpboot: CPU 2 is now offline Jul 27 05:05:26 rageworks kernel: smpboot: CPU 3 is now offline Jul 27 05:05:26 rageworks kernel: ACPI: PM: Low-level resume complete Jul 27 05:05:26 rageworks kernel: ACPI: PM: Restoring platform NVS memory Jul 27 05:05:26 rageworks kernel: Enabling non-boot CPUs ... Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1 Jul 27 05:05:26 rageworks kernel: CPU1 is up Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 2 APIC 0x2 Jul 27 05:05:26 rageworks kernel: CPU2 is up Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3 Jul 27 05:05:26 rageworks kernel: CPU3 is up Jul 27 05:05:26 rageworks kernel: ACPI: PM: Waking up from system sleep state S3 Jul 27 05:05:26 rageworks kernel: xhci_hcd 0000:02:00.0: xHC error in resume, USBSTS 0x401, Reinit Jul 27 05:05:26 rageworks kernel: usb usb1: root hub lost power or was reset Jul 27 05:05:26 rageworks kernel: usb usb2: root hub lost power or was reset Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Starting disk Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Starting disk Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Starting disk Jul 27 05:05:26 rageworks kernel: serial 00:05: activated Jul 27 05:05:26 rageworks kernel: ata6: SATA link down (SStatus 0 SControl 330) Jul 27 05:05:26 rageworks kernel: ata5: SATA link down (SStatus 0 SControl 330) Jul 27 05:05:26 rageworks kernel: ata9: SATA link down (SStatus 0 SControl 300) Jul 27 05:05:26 rageworks kernel: usb 1-10: reset full-speed USB device number 4 using xhci_hcd Jul 27 05:05:26 rageworks kernel: usb 1-8: reset full-speed USB device number 3 using xhci_hcd Jul 27 05:05:26 rageworks kernel: usb 1-7: reset full-speed USB device number 2 using xhci_hcd Jul 27 05:05:26 rageworks kernel: OOM killer enabled. Jul 27 05:05:26 rageworks kernel: Restarting tasks ... done. Jul 27 05:05:26 rageworks kernel: random: crng reseeded on system resumption Jul 27 05:05:26 rageworks kernel: PM: suspend exit
On 7/27/23 04:27, Damien Le Moal wrote:
On 7/27/23 19:22, TW wrote:
I managed to fix the patch file, guess the formatting messed up a bit. So will try with those patches installed.
Just in case, patch fil attached to avoid formatting issues.
Comparing to the 5.15 kernel which had almost no delay. They are HDDs though, it was working flawlessly earlier but didn't want to top post again. Tried a systemd suspend instead of from the Xfce4 logout menu and everything works as intended. So I'd say that it's fixed now.
On 7/27/23 20:33, Damien Le Moal wrote:
"Slow coming back" -> Compared to which version of the kernel ? Do you have numbers ? If the devices are HDDs, resume will wait for these to spin up. That takes a while (about 10s normally).
On 7/28/23 11:49, TW wrote:
Comparing to the 5.15 kernel which had almost no delay. They are HDDs though, it was working flawlessly earlier but didn't want to top post again. Tried a systemd suspend instead of from the Xfce4 logout menu and everything works as intended. So I'd say that it's fixed now.
I think that asynchronous scsi suspend/resume using the PM infrastructure, introduced in 5.16, is creating this additional delay. The reason is that now, ata and scsi drivers do their resume operations from within the PM resume process, resulting in some delays to reach "PM: suspend exit".
I am looking into improving this, but it may take some more time.
I will post officially the patch you tried and CC you. A "tested-by" tag from you would be appreciated.
Thanks.
On 7/27/23 20:33, Damien Le Moal wrote:
"Slow coming back" -> Compared to which version of the kernel ? Do you have numbers ? If the devices are HDDs, resume will wait for these to spin up. That takes a while (about 10s normally).
On 7/28/23 11:49, TW wrote:
Comparing to the 5.15 kernel which had almost no delay. They are HDDs though, it was working flawlessly earlier but didn't want to top post again. Tried a systemd suspend instead of from the Xfce4 logout menu and everything works as intended. So I'd say that it's fixed now.
I posted a proper patch and CC-ed you. A "Tested-by:" tag from you would be much appreciated (please use your full name in that case, instead of only "TW"). Thanks.
On 7/27/23 20:33, Damien Le Moal wrote:
"Slow coming back" -> Compared to which version of the kernel ? Do you have numbers ? If the devices are HDDs, resume will wait for these to spin up. That takes a while (about 10s normally).
On 7/27/23 19:06, TW wrote:
I retried on 6.5 rc3 without the Nvidia drivers and still received the same error and going to try for the patch next but got a malformed patch error on line 6 for the first patch for libata-scsi.c. The other two seem to go through just fine however.
Which other two patches are you talking about ?
Also the bugzilla link is similar to what I have but the disk doesn't disappear, comes back but just takes awhile to come back out of sleep mode.
The switch to async resume has revealed many issues with the way ata devices are resumed with regard to their scsi representation. The issues manifest in different form. The drive "gone" problem was fixed recently. The start-stop command seem to cause most of the time a delay in resume that several users noticed. In your case though, it is an outright failure.
Hence the *single* patch I asked you to test (another user with a delay issue is testing the same as well). Not sure what other patches you are talking about.
linux-stable-mirror@lists.linaro.org