On Wed, 2018-11-07 at 11:34 -0800, Dan Williams wrote:
On Wed, Nov 7, 2018 at 10:52 AM Toshi Kani <toshi.kani@hpe.com> wrote:
ndctl zero-labels completes with a large number of zeroed nmems when it fails to do zeroing on a protected NVDIMM.
# ndctl zero-labels nmem1
zeroed 65504 nmems
When an ACPI call completes with an error, xlat_status(), called from acpi_nfit_ctl(), stores the error in *cmd_rc. __nd_ioctl(), however, does not check this error and returns success.
Fix __nd_ioctl() to check this error in cmd_rc.
So this arrangement is by design and the bug is in the ndctl utility.
A successful return code from the ioctl means that the command was successfully submitted to firmware. It's then up to userspace to parse if there was a command specific error returned in the response payload. Automatically returning cmd_rc removes the ability for userspace tooling to do its own command specific error handling. With this change userspace could no longer be sure if the failure is in the submission or the execution of the command, or determine if the command response payload is valid.
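To illustrate the split described above, here is a minimal userspace sketch of the two-stage check: first the ioctl return code (submission), then the status carried in the response payload (execution). The names cmd_outcome, fake_cmd_response, classify() and outcome_str(), and the payload layout with a leading status word where 0 means success, are all hypothetical and chosen only for illustration. This is not the ndctl or libnvdimm API.

```c
#include <stdint.h>

/* Hypothetical outcome categories; these are not ndctl or kernel names. */
enum cmd_outcome {
	CMD_OK,            /* submitted, and firmware reported success */
	CMD_SUBMIT_FAILED, /* the ioctl itself failed; payload is not valid */
	CMD_FW_ERROR,      /* submitted, but the payload carries an error status */
};

/* Hypothetical response payload: a leading status word, 0 == success. */
struct fake_cmd_response {
	uint32_t status;
	uint8_t data[16];
};

/*
 * Classify a completed command. ioctl_rc is the value returned by the
 * ioctl call; resp is the payload the kernel copied back to userspace.
 * The payload is only inspected when the submission succeeded.
 */
enum cmd_outcome classify(int ioctl_rc, const struct fake_cmd_response *resp)
{
	if (ioctl_rc < 0)
		return CMD_SUBMIT_FAILED;
	if (resp->status != 0)
		return CMD_FW_ERROR;
	return CMD_OK;
}

/* Human-readable name for an outcome, e.g. for an error message. */
const char *outcome_str(enum cmd_outcome outcome)
{
	switch (outcome) {
	case CMD_OK:
		return "ok";
	case CMD_SUBMIT_FAILED:
		return "submission failed";
	case CMD_FW_ERROR:
		return "firmware reported an error";
	}
	return "unknown";
}
```

With a check of this shape, a tool would count a device as zeroed only when classify() returns CMD_OK, and could still report the two failure cases differently.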
I see. I was wondering which side needed to be fixed, and decided to follow kernel-internal ACPI callers like nvdimm_clear_poison(). I agree that a command error code is necessary if a user space tool needs to deal with it. OK, I will look into fixing ndctl.
Thanks,
-Toshi