Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS), when executing the hdparm command for my WD-WCC7K4NLX884 drives to get the active or standby state:
$ hdparm -C /dev/sda /dev/sda: SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda /dev/sda: drive state is: active/idle
I did a bisection within the stable series and found the following commit to be the first bad one:
28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
According to kernel.dance the same commit was also backported to the v6.10.3 and v6.1.103 stable kernels and I could not find any commit or pending patch with a "Fixes:" tag for the offending commit.
So far I have not been able to test with the mainline kernel as this is a remote device which I couldn't rescue in case of a boot failure. Also just for transparency it does have the out of tree ZFS module loaded, but AFAIU this shouldn't be an issue here, as the commit seems clearly related to the error. If needed I can test with an untainted mainline kernel on Friday when I'm near the device.
I have attached the output of hdparm -I below and would be happy to provide further debug information or test patches.
Cheers, Christian
---
#regzbot introduced: 28ab9769117c #regzbot title: ata: libata-scsi: Sense data errors breaking hdparm with WD drives
---
$ pacman -Q hdparm hdparm 9.65-2
$ hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media Model Number: WDC WD40EFRX-68N32N0 Serial Number: WD-WCC7K4NLX884 Firmware Revision: 82.00A82 Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0 Standards: Used: unknown (minor revision code 0x006d) Supported: 10 9 8 7 6 5 Likely used: 10 Configuration: Logical max current cylinders 16383 0 heads 16 0 sectors/track 63 0 -- LBA user addressable sectors: 268435455 LBA48 user addressable sectors: 7814037168 Logical Sector size: 512 bytes Physical Sector size: 4096 bytes Logical Sector-0 offset: 0 bytes device size with M = 1024*1024: 3815447 MBytes device size with M = 1000*1000: 4000787 MBytes (4000 GB) cache/buffer size = unknown Form Factor: 3.5 inch Nominal Media Rotation Rate: 5400 Capabilities: LBA, IORDY(can be disabled) Queue depth: 32 Standby timer values: spec'd by Standard, with device specific minimum R/W multiple sector transfer: Max = 16 Current = 16 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * SMART feature set Security Mode feature set * Power Management feature set * Write cache * Look-ahead * Host Protected Area feature set * WRITE_BUFFER command * READ_BUFFER command * NOP cmd * DOWNLOAD_MICROCODE Power-Up In Standby feature set * SET_FEATURES required to spinup after power up SET_MAX security extension * 48-bit Address feature set * Device Configuration Overlay feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * SMART error logging * SMART self-test * General Purpose Logging feature set * 64-bit World wide name * IDLE_IMMEDIATE with UNLOAD * WRITE_UNCORRECTABLE_EXT command * {READ,WRITE}_DMA_EXT_GPL commands * Segmented DOWNLOAD_MICROCODE * Gen1 signaling speed (1.5Gb/s) * Gen2 signaling speed (3.0Gb/s) * Gen3 signaling speed (6.0Gb/s) * Native Command Queueing (NCQ) * Host-initiated interface power management * Phy event counters * Idle-Unload when NCQ is active * NCQ priority information * READ_LOG_DMA_EXT equivalent to READ_LOG_EXT * DMA Setup Auto-Activate optimization * Device-initiated interface power management * Software settings preservation * SMART Command Transport (SCT) feature set * SCT Write Same (AC2) * SCT Error Recovery Control (AC3) * SCT Features Control (AC4) * SCT Data Tables (AC5) unknown 206[12] (vendor specific) unknown 206[13] (vendor specific) * DOWNLOAD MICROCODE DMA command * WRITE BUFFER DMA command * READ BUFFER DMA command Security: Master password revision code = 65534 supported not enabled not locked frozen not expired: security count supported: enhanced erase 504min for SECURITY ERASE UNIT. 504min for ENHANCED SECURITY ERASE UNIT. Logical Unit WWN Device Identifier: 50014ee2647735a1 NAA : 5 IEEE OUI : 0014ee Unique ID : 2647735a1 Checksum: correct
On 2024/08/07 10:23, Christian Heusel wrote:
Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS), when executing the hdparm command for my WD-WCC7K4NLX884 drives to get the active or standby state:
$ hdparm -C /dev/sda /dev/sda: SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda /dev/sda: drive state is: active/idle
I did a bisection within the stable series and found the following commit to be the first bad one:
28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
According to kernel.dance the same commit was also backported to the v6.10.3 and v6.1.103 stable kernels and I could not find any commit or pending patch with a "Fixes:" tag for the offending commit.
So far I have not been able to test with the mainline kernel as this is a remote device which I couldn't rescue in case of a boot failure. Also just for transparency it does have the out of tree ZFS module loaded, but AFAIU this shouldn't be an issue here, as the commit seems clearly related to the error. If needed I can test with an untainted mainline kernel on Friday when I'm near the device.
I have attached the output of hdparm -I below and would be happy to provide further debug information or test patches.
I confirm this, using 6.11-rc2. The problem is actually hdparm code which assumes that the sense data is in descriptor format without ever looking at the D_SENSE bit to verify that. So commit 28ab9769117c reveals this issue because as its title explains, it (correctly) honors D_SENSE instead of always generating sense data in descriptor format.
Hmm... This is annoying. The kernel is fixed to be spec compliant but that breaks old/non-compliant applications... We definitely should fix hdparm code, but I think we still need to revert 28ab9769117c...
Niklas, Igor, thoughts ?
Cheers, Christian
#regzbot introduced: 28ab9769117c #regzbot title: ata: libata-scsi: Sense data errors breaking hdparm with WD drives
$ pacman -Q hdparm hdparm 9.65-2
$ hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media Model Number: WDC WD40EFRX-68N32N0 Serial Number: WD-WCC7K4NLX884 Firmware Revision: 82.00A82 Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0 Standards: Used: unknown (minor revision code 0x006d) Supported: 10 9 8 7 6 5 Likely used: 10 Configuration: Logical max current cylinders 16383 0 heads 16 0 sectors/track 63 0 -- LBA user addressable sectors: 268435455 LBA48 user addressable sectors: 7814037168 Logical Sector size: 512 bytes Physical Sector size: 4096 bytes Logical Sector-0 offset: 0 bytes device size with M = 1024*1024: 3815447 MBytes device size with M = 1000*1000: 4000787 MBytes (4000 GB) cache/buffer size = unknown Form Factor: 3.5 inch Nominal Media Rotation Rate: 5400 Capabilities: LBA, IORDY(can be disabled) Queue depth: 32 Standby timer values: spec'd by Standard, with device specific minimum R/W multiple sector transfer: Max = 16 Current = 16 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * SMART feature set Security Mode feature set * Power Management feature set * Write cache * Look-ahead * Host Protected Area feature set * WRITE_BUFFER command * READ_BUFFER command * NOP cmd * DOWNLOAD_MICROCODE Power-Up In Standby feature set * SET_FEATURES required to spinup after power up SET_MAX security extension * 48-bit Address feature set * Device Configuration Overlay feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * SMART error logging * SMART self-test * General Purpose Logging feature set * 64-bit World wide name * IDLE_IMMEDIATE with UNLOAD * WRITE_UNCORRECTABLE_EXT command * {READ,WRITE}_DMA_EXT_GPL commands * Segmented DOWNLOAD_MICROCODE * Gen1 signaling speed (1.5Gb/s) * Gen2 signaling speed (3.0Gb/s) * Gen3 signaling speed (6.0Gb/s) * Native Command Queueing (NCQ) * Host-initiated interface power management * Phy event counters * Idle-Unload when NCQ is active * NCQ priority information * READ_LOG_DMA_EXT equivalent to READ_LOG_EXT * DMA Setup Auto-Activate optimization * Device-initiated interface power management * Software settings preservation * SMART Command Transport (SCT) feature set * SCT Write Same (AC2) * SCT Error Recovery Control (AC3) * SCT Features Control (AC4) * SCT Data Tables (AC5) unknown 206[12] (vendor specific) unknown 206[13] (vendor specific) * DOWNLOAD MICROCODE DMA command * WRITE BUFFER DMA command * READ BUFFER DMA command Security: Master password revision code = 65534 supported not enabled not locked frozen not expired: security count supported: enhanced erase 504min for SECURITY ERASE UNIT. 504min for ENHANCED SECURITY ERASE UNIT. Logical Unit WWN Device Identifier: 50014ee2647735a1 NAA : 5 IEEE OUI : 0014ee Unique ID : 2647735a1 Checksum: correct
On Wed, Aug 07, 2024 at 11:26:46AM -0700, Damien Le Moal wrote:
On 2024/08/07 10:23, Christian Heusel wrote:
Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS), when executing the hdparm command for my WD-WCC7K4NLX884 drives to get the active or standby state:
$ hdparm -C /dev/sda /dev/sda: SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda /dev/sda: drive state is: active/idle
I did a bisection within the stable series and found the following commit to be the first bad one:
28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
According to kernel.dance the same commit was also backported to the v6.10.3 and v6.1.103 stable kernels and I could not find any commit or pending patch with a "Fixes:" tag for the offending commit.
So far I have not been able to test with the mainline kernel as this is a remote device which I couldn't rescue in case of a boot failure. Also just for transparency it does have the out of tree ZFS module loaded, but AFAIU this shouldn't be an issue here, as the commit seems clearly related to the error. If needed I can test with an untainted mainline kernel on Friday when I'm near the device.
I have attached the output of hdparm -I below and would be happy to provide further debug information or test patches.
I confirm this, using 6.11-rc2. The problem is actually hdparm code which assumes that the sense data is in descriptor format without ever looking at the D_SENSE bit to verify that. So commit 28ab9769117c reveals this issue because as its title explains, it (correctly) honors D_SENSE instead of always generating sense data in descriptor format.
You mean: the user space application is using the sense buffer without first checking if the returned sense buffer is in descriptor or fixed format.
This seems like a fundamentally flawed assumption by the user space program. If it doesn't even bother checking the first field in the sense buffer, sb[0], perhaps it shouldn't bother trying to use the sense buffer at all.
(Yes, the D_SENSE bit can be configured by the user, but that doesn't change the fact that a user space program must check the format of the returned buffer before trying to use it.)
Hmm... This is annoying. The kernel is fixed to be spec compliant but that breaks old/non-compliant applications... We definitely should fix hdparm code, but I think we still need to revert 28ab9769117c...
Well.. if we look at commit: 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense") https://github.com/torvalds/linux/commit/11093cb1ef56147fe33f5750b1eab347bde...
We can see that before that commit, the kernel used to call ata_scsi_set_sense().
Back then ata_scsi_set_sense() was defined as: https://github.com/torvalds/linux/blob/11093cb1ef56147fe33f5750b1eab347bdef3... scsi_build_sense_buffer(0, cmd->sense_buffer, sk, asc, ascq);
Where the first argument to scsi_build_sense_buffer() is if the generated sense buffer should be fixed or desc format (0 == fixed format), so we used to generate the sense buffer in fixed format: https://github.com/torvalds/linux/blob/11093cb1ef56147fe33f5750b1eab347bdef3...
However, as we can see, the kernel then used to incorrectly just change sb[0} to say that the buffer was in desc format, without updating the other fields, e.g. sb[2]: https://github.com/torvalds/linux/blob/11093cb1ef56147fe33f5750b1eab347bdef3... so the format was really in some franken format... following neither fixed or descriptor format.
11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense") did change so that successful ATA-passthrough commands always generated the sense data in descriptor format. However, that commit also managed to mess up the offsets for fixed format sense...
The commit that later changed ata_scsi_set_sense() to honor D_SENSE was commit: 06dbde5f3a44 ("libata: Implement control mode page to select sense format")
So basically: Before commit 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense"), we generated sense data in some franken format for both successful and failed ATA-passthrough commands.
After commit 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense") we generate sense data for sucessful ATA-passthrough commands in descriptor format unconditionally, but still in franken format for failed ATA-passthrough commands.
After commit 06dbde5f3a44 ("libata: Implement control mode page to select sense format") we generate sense data for sucessful ATA-passthrough commands in descriptor format unconditionally, but for failed commands we actually honored D_SENSE to generate it either in fixed format or descriptor format. (However, because of a bug in 11093cb1ef56, if using fixed format, the offsets were wrong...)
The incorrect offsets for fixed format was fixed recently, in commit 38dab832c3f4 ("ata: libata-scsi: Fix offsets for the fixed format sense data")
Commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error") fixed so that we actually honor D_SENSE not only for failed ATA-passthrough commands, but also for successfull ATA-passthrough commands.
TL;DR: it is very hard to say that we have introduced a regression, because this crap has basically been broken in one way or another since it was introduced... Personally, I would definitely want all the patches that are in mainline in the kernel running on my machine, since that is the only thing that is consistent.
However, that assumes that user space programs that are trying to parse the sense data actually bothers to check the first field in the sense data, to see which format the returned sense data is in... Applications that do not even both with that will have problems on a lot of (historic) kernel versions.
Kind regards, Niklas
Hello Damien,
If we want to no longer respect the D_SENSE bit for successful ATA-passthrough commands, e.g. by replacing the ata_scsi_set_sense() call with a scsi_build_sense() call in the else clause: https://github.com/torvalds/linux/blob/v6.11-rc2/drivers/ata/libata-scsi.c#L...
...then I think that we should also replace the ata_scsi_set_sense() call with a scsi_build_sense() call for failed ATA-passthrough commands too (in the non-else clause): https://github.com/torvalds/linux/blob/v6.11-rc2/drivers/ata/libata-scsi.c#L...
..however, that does not sound like a very nice solution IMO.
Another option, if there are a lot of user space programs that incorrectly assume that the sense data (for both successful and failed commands) is in descriptor format, without bothering to check the sense data type, one option might be to change the default value of D_SENSE in the control mode page to 1 in libata's SATL, i.e. set ATA_DFLAG_D_SENSE in dev->flags by default.
That way, we will still respect D_SENSE while generating the sense data (in case the user issues a mode select to modify the bit), and the default will be to generate the sense data in descriptor format, as it has been since 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense").
Kind regards, Niklas
On 2024/08/09 7:30, Niklas Cassel wrote:
Hello Damien,
If we want to no longer respect the D_SENSE bit for successful ATA-passthrough commands, e.g. by replacing the ata_scsi_set_sense() call with a scsi_build_sense() call in the else clause: https://github.com/torvalds/linux/blob/v6.11-rc2/drivers/ata/libata-scsi.c#L...
...then I think that we should also replace the ata_scsi_set_sense() call with a scsi_build_sense() call for failed ATA-passthrough commands too (in the non-else clause): https://github.com/torvalds/linux/blob/v6.11-rc2/drivers/ata/libata-scsi.c#L...
..however, that does not sound like a very nice solution IMO.
Another option, if there are a lot of user space programs that incorrectly assume that the sense data (for both successful and failed commands) is in descriptor format, without bothering to check the sense data type, one option might be to change the default value of D_SENSE in the control mode page to 1 in libata's SATL, i.e. set ATA_DFLAG_D_SENSE in dev->flags by default.
That way, we will still respect D_SENSE while generating the sense data (in case the user issues a mode select to modify the bit), and the default will be to generate the sense data in descriptor format, as it has been since 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense").
That indeed should be acceptable. And we should also patch hdparm to properly look at the sense format and not assume descriptor format by default.
On 2024/08/07 15:10, Niklas Cassel wrote:
On Wed, Aug 07, 2024 at 11:26:46AM -0700, Damien Le Moal wrote:
On 2024/08/07 10:23, Christian Heusel wrote:
Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS), when executing the hdparm command for my WD-WCC7K4NLX884 drives to get the active or standby state:
$ hdparm -C /dev/sda /dev/sda: SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda /dev/sda: drive state is: active/idle
I did a bisection within the stable series and found the following commit to be the first bad one:
28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
According to kernel.dance the same commit was also backported to the v6.10.3 and v6.1.103 stable kernels and I could not find any commit or pending patch with a "Fixes:" tag for the offending commit.
So far I have not been able to test with the mainline kernel as this is a remote device which I couldn't rescue in case of a boot failure. Also just for transparency it does have the out of tree ZFS module loaded, but AFAIU this shouldn't be an issue here, as the commit seems clearly related to the error. If needed I can test with an untainted mainline kernel on Friday when I'm near the device.
I have attached the output of hdparm -I below and would be happy to provide further debug information or test patches.
I confirm this, using 6.11-rc2. The problem is actually hdparm code which assumes that the sense data is in descriptor format without ever looking at the D_SENSE bit to verify that. So commit 28ab9769117c reveals this issue because as its title explains, it (correctly) honors D_SENSE instead of always generating sense data in descriptor format.
You mean: the user space application is using the sense buffer without first checking if the returned sense buffer is in descriptor or fixed format.
Yes. The code looks like:
desc = sb + 8; if (io_hdr.driver_status != SG_DRIVER_SENSE) { ... } else if (sb[0] != 0x72 || sb[7] < 14 || desc[0] != 0x09 || desc[1] < 0x0c) { if (verbose || tf->command != ATA_OP_IDENTIFY) dump_bytes("SG_IO: bad/missing sense data, sb[]", sb, sizeof(sb)); }
So clearly it assumes descrip@tor format.
This seems like a fundamentally flawed assumption by the user space program. If it doesn't even bother checking the first field in the sense buffer, sb[0], perhaps it shouldn't bother trying to use the sense buffer at all.
(Yes, the D_SENSE bit can be configured by the user, but that doesn't change the fact that a user space program must check the format of the returned buffer before trying to use it.)
Yep. I agree.
Hmm... This is annoying. The kernel is fixed to be spec compliant but that breaks old/non-compliant applications... We definitely should fix hdparm code, but I think we still need to revert 28ab9769117c...
Well.. if we look at commit: 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense") https://github.com/torvalds/linux/commit/11093cb1ef56147fe33f5750b1eab347bde...
We can see that before that commit, the kernel used to call ata_scsi_set_sense().
Back then ata_scsi_set_sense() was defined as: https://github.com/torvalds/linux/blob/11093cb1ef56147fe33f5750b1eab347bdef3... scsi_build_sense_buffer(0, cmd->sense_buffer, sk, asc, ascq);
Where the first argument to scsi_build_sense_buffer() is if the generated sense buffer should be fixed or desc format (0 == fixed format), so we used to generate the sense buffer in fixed format: https://github.com/torvalds/linux/blob/11093cb1ef56147fe33f5750b1eab347bdef3...
However, as we can see, the kernel then used to incorrectly just change sb[0} to say that the buffer was in desc format, without updating the other fields, e.g. sb[2]: https://github.com/torvalds/linux/blob/11093cb1ef56147fe33f5750b1eab347bdef3... so the format was really in some franken format... following neither fixed or descriptor format.
11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense") did change so that successful ATA-passthrough commands always generated the sense data in descriptor format. However, that commit also managed to mess up the offsets for fixed format sense...
The commit that later changed ata_scsi_set_sense() to honor D_SENSE was commit: 06dbde5f3a44 ("libata: Implement control mode page to select sense format")
So basically: Before commit 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense"), we generated sense data in some franken format for both successful and failed ATA-passthrough commands.
After commit 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense") we generate sense data for sucessful ATA-passthrough commands in descriptor format unconditionally, but still in franken format for failed ATA-passthrough commands.
After commit 06dbde5f3a44 ("libata: Implement control mode page to select sense format") we generate sense data for sucessful ATA-passthrough commands in descriptor format unconditionally, but for failed commands we actually honored D_SENSE to generate it either in fixed format or descriptor format. (However, because of a bug in 11093cb1ef56, if using fixed format, the offsets were wrong...)
The incorrect offsets for fixed format was fixed recently, in commit 38dab832c3f4 ("ata: libata-scsi: Fix offsets for the fixed format sense data")
Commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error") fixed so that we actually honor D_SENSE not only for failed ATA-passthrough commands, but also for successfull ATA-passthrough commands.
TL;DR: it is very hard to say that we have introduced a regression, because this crap has basically been broken in one way or another since it was introduced... Personally, I would definitely want all the patches that are in mainline in the kernel running on my machine, since that is the only thing that is consistent.
However, that assumes that user space programs that are trying to parse the sense data actually bothers to check the first field in the sense data, to see which format the returned sense data is in... Applications that do not even both with that will have problems on a lot of (historic) kernel versions.
Yes, indeed. I do not want to revert any of these recent patches, because as you rightly summarize here, these fix something that has been broken for a long time. We were just lucky that we did not see more application failures until now, or rather unlucky that we did not as that would have revealed these problems earlier.
So I think we will have some patching to do to hdparm at least to fix the problems there.
On 24/08/09 08:34AM, Damien Le Moal wrote:
On 2024/08/07 15:10, Niklas Cassel wrote:
On Wed, Aug 07, 2024 at 11:26:46AM -0700, Damien Le Moal wrote:
On 2024/08/07 10:23, Christian Heusel wrote:
Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS), when executing the hdparm command for my WD-WCC7K4NLX884 drives to get the active or standby state:
$ hdparm -C /dev/sda /dev/sda: SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda /dev/sda: drive state is: active/idle
Yes, indeed. I do not want to revert any of these recent patches, because as you rightly summarize here, these fix something that has been broken for a long time. We were just lucky that we did not see more application failures until now, or rather unlucky that we did not as that would have revealed these problems earlier.
So I think we will have some patching to do to hdparm at least to fix the problems there.
It seems like this does not only break hdparm but also hddtemp, which does not use hdparm as dep as far as I can tell:
# on bad kernel for the above issue $ hddtemp /dev/sda /dev/sda: WDC WD40EFRX-68N32N0 : drive is sleeping
# on good kernel for the above issue $ hddtemp /dev/sda /dev/sda: WDC WD40EFRX-68N32N0: 31°C
I didn't take the time to actually verify that this is the same issue, but it seems very likely from what we have gathered in this thread already.
So while I agree that it might have previously just worked by chance it seems like there is quite some stuff depending on the previous behavior.
This was first discovered in [this thread in the Arch Linux Forums][0] by user @GerBra.
~Chris
On 24/08/09 08:42PM, Christian Heusel wrote:
On 24/08/09 08:34AM, Damien Le Moal wrote:
On 2024/08/07 15:10, Niklas Cassel wrote:
On Wed, Aug 07, 2024 at 11:26:46AM -0700, Damien Le Moal wrote:
On 2024/08/07 10:23, Christian Heusel wrote:
Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS), when executing the hdparm command for my WD-WCC7K4NLX884 drives to get the active or standby state:
$ hdparm -C /dev/sda /dev/sda: SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda /dev/sda: drive state is: active/idle
Yes, indeed. I do not want to revert any of these recent patches, because as you rightly summarize here, these fix something that has been broken for a long time. We were just lucky that we did not see more application failures until now, or rather unlucky that we did not as that would have revealed these problems earlier.
So I think we will have some patching to do to hdparm at least to fix the problems there.
It seems like this does not only break hdparm but also hddtemp, which does not use hdparm as dep as far as I can tell:
# on bad kernel for the above issue $ hddtemp /dev/sda /dev/sda: WDC WD40EFRX-68N32N0 : drive is sleeping # on good kernel for the above issue $ hddtemp /dev/sda /dev/sda: WDC WD40EFRX-68N32N0: 31°C
I didn't take the time to actually verify that this is the same issue, but it seems very likely from what we have gathered in this thread already.
So while I agree that it might have previously just worked by chance it seems like there is quite some stuff depending on the previous behavior.
This was first discovered in [this thread in the Arch Linux Forums][0] by user @GerBra.
~Chris
As someone on the same thread has pointed out, this also seems to affect udiskd:
On 09.08.24 22:13, Christian Heusel wrote:
On 24/08/09 08:42PM, Christian Heusel wrote:
On 24/08/09 08:34AM, Damien Le Moal wrote:
On 2024/08/07 15:10, Niklas Cassel wrote:
On Wed, Aug 07, 2024 at 11:26:46AM -0700, Damien Le Moal wrote:
On 2024/08/07 10:23, Christian Heusel wrote:
on my NAS I am encountering the following issue since v6.6.44 (LTS), when executing the hdparm command for my WD-WCC7K4NLX884 drives to get the active or standby state:
[...] Yes, indeed. I do not want to revert any of these recent patches, because as you rightly summarize here, these fix something that has been broken for a long time. We were just lucky that we did not see more application failures until now, or rather unlucky that we did not as that would have revealed these problems earlier.
It seems like this does not only break hdparm but also hddtemp, which does not use hdparm as dep as far as I can tell:
As someone on the same thread has pointed out, this also seems to affect udiskd:
For the record, three more people reported similar symptoms in the past few days:
https://lore.kernel.org/all/e620f887-a674-f007-c17b-dc16f9a0a588@web.de/ https://bugzilla.kernel.org/show_bug.cgi?id=219144
Ciao, Thorsten
P.S.: I for the tracking for now assume those are indeed the same problem:
#regzbot dup: https://lore.kernel.org/all/e620f887-a674-f007-c17b-dc16f9a0a588@web.de/ #regzbot dup: https://bugzilla.kernel.org/show_bug.cgi?id=219144
Hi,
On Mon, Aug 12, 2024 at 11:26:53AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
On 09.08.24 22:13, Christian Heusel wrote:
On 24/08/09 08:42PM, Christian Heusel wrote:
On 24/08/09 08:34AM, Damien Le Moal wrote:
On 2024/08/07 15:10, Niklas Cassel wrote:
On Wed, Aug 07, 2024 at 11:26:46AM -0700, Damien Le Moal wrote:
On 2024/08/07 10:23, Christian Heusel wrote: > > on my NAS I am encountering the following issue since v6.6.44 (LTS), > when executing the hdparm command for my WD-WCC7K4NLX884 drives to get > the active or standby state:
[...] Yes, indeed. I do not want to revert any of these recent patches, because as you rightly summarize here, these fix something that has been broken for a long time. We were just lucky that we did not see more application failures until now, or rather unlucky that we did not as that would have revealed these problems earlier.
It seems like this does not only break hdparm but also hddtemp, which does not use hdparm as dep as far as I can tell:
As someone on the same thread has pointed out, this also seems to affect udiskd:
For the record, three more people reported similar symptoms in the past few days:
https://lore.kernel.org/all/e620f887-a674-f007-c17b-dc16f9a0a588@web.de/ https://bugzilla.kernel.org/show_bug.cgi?id=219144
Ciao, Thorsten
P.S.: I for the tracking for now assume those are indeed the same problem:
#regzbot dup: https://lore.kernel.org/all/e620f887-a674-f007-c17b-dc16f9a0a588@web.de/ #regzbot dup: https://bugzilla.kernel.org/show_bug.cgi?id=219144
AFAICS, this has now been reverted upstream with fa0db8e56878 ("Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"") in 6.11-rc4 and the revert was as well backported to stable releases (5.15.165, 6.1.106, 6.6.47 and 6.10.6)
Regards, Salvatore
linux-stable-mirror@lists.linaro.org