On 7/17/24 20:25, Johan Hovold wrote:
On Wed, Jul 17, 2024 at 07:46:14PM +0900, Damien Le Moal wrote:
On 7/17/24 18:00, Johan Hovold wrote:
On Wed, Jul 17, 2024 at 07:48:26AM +0900, Damien Le Moal wrote:
On 7/17/24 01:11, Johan Hovold wrote:
This reverts commit 7a6bbc2829d4ab592c7e440a6f6f5deb3cd95db4.
The offending commit tried to suppress a double "Starting disk" message for some drivers, but instead started spamming the log with bogus messages every five seconds:
[ 311.798956] sd 0:0:0:0: [sda] Starting disk [ 316.919103] sd 0:0:0:0: [sda] Starting disk [ 322.040775] sd 0:0:0:0: [sda] Starting disk [ 327.161140] sd 0:0:0:0: [sda] Starting disk [ 332.281352] sd 0:0:0:0: [sda] Starting disk [ 337.401878] sd 0:0:0:0: [sda] Starting disk [ 342.521527] sd 0:0:0:0: [sda] Starting disk [ 345.850401] sd 0:0:0:0: [sda] Starting disk [ 350.967132] sd 0:0:0:0: [sda] Starting disk [ 356.090454] sd 0:0:0:0: [sda] Starting disk ...
on machines that do not actually stop the disk on runtime suspend (e.g. the Qualcomm sc8280xp CRD with UFS).
This is odd. If the disk is not being being suspended, why does the platform even enable runtime PM for it ?
This is clearly intended to be supported as sd_do_start_stop() returns false and that prevents sd_start_stop_device() from being called on resume (and similarly on suspend which is why there are no matching stopping disk messages above):
[ 32.822189] sd 0:0:0:0: sd_resume_common - runtime = 1, sd_do_start_stop = 0, manage_runtime_start_stop = 0
Yes, so we can suppress the "Starting disk" message for runtime resume, to match the runtime suspend not having the message.
No, the point is that the stopping disk message is also suppressed when sd_do_start_stop() returns false (i.e. when sd_start_stop_device() is never called). See sd_suspend_common().
Are you sure about this ? Or is it simply that the runtime pm timer is set to a very low interval ?
I haven't tried to determine why runtime pm is used this way, but your patch is clearly broken as it prints a message about starting the disk even when sd_do_start_stop() returns false.
The patch is not *that* broken, because sd_do_start_stop() returning false mean only that the disk will *not* be started using a START STOP UNIT command. But the underlying LLD must start the drive. So the message is not wrong, even though it is probably best to suppress it for the runtime case.
From a quick look at the code I interpret the (original) intention to be to only print these messages in cases were sd_start_stop_device() is actually called.
The point here is that sd_runtime_resume() should NOT be called every 5s unless there is also a runtime suspend in between the calls. As mentioned, this can happen if the autosuspend timer is set to a very low timeout to aggressively suspend the disk after a short idle time. That of course makes absolutely no sense for HDDs given the spinup time needed, but I guess that is a possiblity for UFS drives.
I don't see anything obviously wrong with this for things like UFS.
Here's what some printk reveal for the Qualcomm platform in question:
[ 50.659451] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume [ 50.669756] sd 0:0:0:0: sd_resume_runtime [ 52.911603] sd 0:0:0:0: sd_suspend_runtime [ 52.921707] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend [ 53.472894] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume [ 53.481464] sd 0:0:0:0: sd_resume_runtime [ 55.550493] sd 0:0:0:0: sd_suspend_runtime [ 55.559697] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend [ 58.595554] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume [ 58.607868] sd 0:0:0:0: sd_resume_runtime [ 60.667330] sd 0:0:0:0: sd_suspend_runtime [ 60.677623] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend [ 63.714149] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume [ 63.724498] sd 0:0:0:0: sd_resume_runtime [ 65.772893] sd 0:0:0:0: sd_suspend_runtime [ 65.784696] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend [ 68.836015] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume [ 68.849576] sd 0:0:0:0: sd_resume_runtime [ 71.359102] sd 0:0:0:0: sd_suspend_runtime [ 71.368928] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend [ 73.955031] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume [ 73.963040] sd 0:0:0:0: sd_resume_runtime [ 76.032153] sd 0:0:0:0: sd_suspend_runtime [ 76.042100] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend
Looks like a 2-second autosuspend timeout somewhere, and the controller stays suspended for 1-3 seconds in between.
OK. So all good and nothing suspicious with this. That is only very aggressive autosuspend. As I said, let's revert and I will rework the start/stop messages.
It almost sound like what we need to do here is suppress this message for the runtime resume case, so something like:
No, that would only make things worse as I assume you'd have a stopped disk message without a matching start message for driver that do end up stopping the disk here.
OK. so let's revert this patch and I will rework that message to be displayed only on device removal, system suspend and system shutdown.
Sounds good.
Johan