Quoting the documentation:
Some persistent memory devices run a firmware locally on the device / "DIMM" to perform tasks like media management, capacity provisioning, and health monitoring. The process of updating that firmware typically involves a reboot because it has implications for in-flight memory transactions. However, reboots are disruptive and at least the Intel persistent memory platform implementation, described by the Intel ACPI DSM specification [1], has added support for activating firmware at runtime.
[1]: https://docs.pmem.io/persistent-memory/
The approach taken is to abstract the Intel platform specific mechanism behind a libnvdimm-generic sysfs interface. The interface could support runtime-firmware-activation on another architecture without need to change userspace tooling.
The ACPI NFIT implementation involves a set of device-specific-methods (DSMs) to 'arm' individual devices for activation and a bus-level 'trigger' method to execute the activation. Informational / enumeration methods are also provided at the bus and device level.
One complicating aspect of memory device firmware activation is that the memory controller may need to be quiesced (no memory cycles) during the activation. While the platform has mechanisms to support holding off in-flight DMA during the activation, the device response to that delay is potentially undefined. The platform may reject a runtime firmware update if, for example, a PCI-E device does not support its completion timeout value being increased to meet the activation time. Outside of device timeouts, the quiesce period may also violate application timeouts.
Given the above device and application timeout considerations the implementation defaults to hooking into the suspend path to trigger the activation, i.e. that a suspend-resume cycle (at least up to the syscore suspend point) is required. That default policy ensures that the system is in a quiescent state before ceasing memory controller responses for the activate. However, if desired, runtime activation without suspend can be forced as an override.
The ndctl utility grows the following extensions / commands to drive this mechanism:
1/ The existing update-firmware command will 'arm' devices where the firmware image is staged by default.
ndctl update-firmware all -f firmware_image.bin
2/ The existing ability to enumerate firmware-update capabilities now includes firmware activate capabilities at the 'bus' and 'dimm/device' level:
ndctl list -BDF -b nfit_test.0
[
  {
    "provider":"nfit_test.0",
    "dev":"ndbus2",
    "scrub_state":"idle",
    "firmware":{
      "activate_method":"suspend",
      "activate_state":"idle"
    },
    "dimms":[
      {
        "dev":"nmem1",
        "id":"cdab-0a-07e0-ffffffff",
        "handle":0,
        "phys_id":0,
        "security":"disabled",
        "firmware":{
          "current_version":0,
          "can_update":true
        }
      },
...
3/ When the system can support activation without quiesce, or when the suspend-resume requirement is going to be suppressed, the new activate-firmware command wraps that functionality:
ndctl activate-firmware nfit_test.0 --force
One major open question for review is how users can trigger firmware-activation via suspend without doing a full trip through the BIOS. The activation currently requires CONFIG_PM_DEBUG to enable that flow. This seems an awkward dependency for something that is expected to be a production capability.
---
Dan Williams (12):
      libnvdimm: Validate command family indices
      ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
      ACPI: NFIT: Define runtime firmware activation commands
      tools/testing/nvdimm: Cleanup dimm index passing
      tools/testing/nvdimm: Add command debug messages
      tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
      tools/testing/nvdimm: Emulate firmware activation commands
      driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
      libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
      libnvdimm: Add runtime firmware activation sysfs interface
      PM, libnvdimm: Add syscore_quiesced() callback for firmware activation
      ACPI: NFIT: Add runtime firmware activate support
 Documentation/ABI/testing/sysfs-bus-nfit    |   35 ++
 Documentation/ABI/testing/sysfs-bus-nvdimm  |    2
 .../driver-api/nvdimm/firmware-activate.rst |   74 +++
 drivers/acpi/nfit/core.c                    |  146 +++++--
 drivers/acpi/nfit/intel.c                   |  426 ++++++++++++++++++++
 drivers/acpi/nfit/intel.h                   |   61 +++
 drivers/acpi/nfit/nfit.h                    |   39 ++
 drivers/base/syscore.c                      |   18 +
 drivers/nvdimm/bus.c                        |   46 ++
 drivers/nvdimm/core.c                       |  103 +++++
 drivers/nvdimm/dimm_devs.c                  |   99 +++++
 drivers/nvdimm/namespace_devs.c             |    2
 drivers/nvdimm/nd-core.h                    |    1
 drivers/nvdimm/pfn_devs.c                   |    2
 drivers/nvdimm/region_devs.c                |    2
 include/linux/device.h                      |    4
 include/linux/libnvdimm.h                   |   53 ++
 include/linux/syscore_ops.h                 |    2
 include/linux/sysfs.h                       |    7
 include/uapi/linux/ndctl.h                  |    5
 kernel/power/suspend.c                      |    2
 tools/testing/nvdimm/test/nfit.c            |  367 ++++++++++++++---
 22 files changed, 1382 insertions(+), 114 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-nvdimm
 create mode 100644 Documentation/driver-api/nvdimm/firmware-activate.rst
base-commit: 48778464bb7d346b47157d21ffde2af6b2d39110
The ND_CMD_CALL format allows for a general passthrough of whitelisted commands targeting a given command set. However, there is no validation of the family index relative to what the bus supports.
- Update the NFIT bus implementation (the only one that supports ND_CMD_CALL passthrough) to also whitelist the valid set of command family indices.
- Update the generic __nd_ioctl() path to validate that field on behalf of all implementations.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/nfit/core.c   |   11 +++++++++--
 drivers/acpi/nfit/nfit.h   |    1 -
 drivers/nvdimm/bus.c       |   16 ++++++++++++++++
 include/linux/libnvdimm.h  |    2 ++
 include/uapi/linux/ndctl.h |    4 ++++
 5 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 7c138a4edc03..1f72ce1a782b 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1823,6 +1823,7 @@ static void populate_shutdown_status(struct nfit_mem *nfit_mem)
 static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
 		struct nfit_mem *nfit_mem, u32 device_handle)
 {
+	struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;
 	struct acpi_device *adev, *adev_dimm;
 	struct device *dev = acpi_desc->dev;
 	unsigned long dsm_mask, label_mask;
@@ -1834,6 +1835,7 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
 		/* nfit test assumes 1:1 relationship between commands and dsms */
 		nfit_mem->dsm_mask = acpi_desc->dimm_cmd_force_en;
 		nfit_mem->family = NVDIMM_FAMILY_INTEL;
+		set_bit(NVDIMM_FAMILY_INTEL, &nd_desc->dimm_family_mask);

 		if (dcr->valid_fields & ACPI_NFIT_CONTROL_MFG_INFO_VALID)
 			sprintf(nfit_mem->id, "%04x-%02x-%04x-%08x",
@@ -1886,10 +1888,13 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
 	 * Note, that checking for function0 (bit0) tells us if any commands
 	 * are reachable through this GUID.
 	 */
+	clear_bit(NVDIMM_FAMILY_INTEL, &nd_desc->dimm_family_mask);
 	for (i = 0; i <= NVDIMM_FAMILY_MAX; i++)
-		if (acpi_check_dsm(adev_dimm->handle, to_nfit_uuid(i), 1, 1))
+		if (acpi_check_dsm(adev_dimm->handle, to_nfit_uuid(i), 1, 1)) {
+			set_bit(i, &nd_desc->dimm_family_mask);
 			if (family < 0 || i == default_dsm_family)
 				family = i;
+		}

 	/* limit the supported commands to those that are publicly documented */
 	nfit_mem->family = family;
@@ -2153,6 +2158,9 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)

 	nd_desc->cmd_mask = acpi_desc->bus_cmd_force_en;
 	nd_desc->bus_dsm_mask = acpi_desc->bus_nfit_cmd_force_en;
+	set_bit(ND_CMD_CALL, &nd_desc->cmd_mask);
+	set_bit(NVDIMM_BUS_FAMILY_NFIT, &nd_desc->bus_family_mask);
+
 	adev = to_acpi_dev(acpi_desc);
 	if (!adev)
 		return;
@@ -2160,7 +2168,6 @@ static void acpi_nfit_init_dsms(struct acpi_nfit_desc *acpi_desc)
 	for (i = ND_CMD_ARS_CAP; i <= ND_CMD_CLEAR_ERROR; i++)
 		if (acpi_check_dsm(adev->handle, guid, 1, 1ULL << i))
 			set_bit(i, &nd_desc->cmd_mask);
-	set_bit(ND_CMD_CALL, &nd_desc->cmd_mask);

 	dsm_mask =
 		(1 << ND_CMD_ARS_CAP) |
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index f5525f8bb770..5c5e7ebba8dc 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -33,7 +33,6 @@
 		| ACPI_NFIT_MEM_RESTORE_FAILED | ACPI_NFIT_MEM_FLUSH_FAILED \
 		| ACPI_NFIT_MEM_NOT_ARMED | ACPI_NFIT_MEM_MAP_FAILED)

-#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_HYPERV
 #define NVDIMM_CMD_MAX 31

 #define NVDIMM_STANDARD_CMDMASK \
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 09087c38fabd..955265656b96 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -1037,9 +1037,25 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
 		dimm_name = "bus";
 	}

+	/* Validate command family support against bus declared support */
 	if (cmd == ND_CMD_CALL) {
+		unsigned long *mask;
+
 		if (copy_from_user(&pkg, p, sizeof(pkg)))
 			return -EFAULT;
+
+		if (nvdimm) {
+			if (pkg.nd_family > NVDIMM_FAMILY_MAX)
+				return -EINVAL;
+			mask = &nd_desc->dimm_family_mask;
+		} else {
+			if (pkg.nd_family > NVDIMM_BUS_FAMILY_MAX)
+				return -EINVAL;
+			mask = &nd_desc->bus_family_mask;
+		}
+
+		if (!test_bit(pkg.nd_family, mask))
+			return -EINVAL;
 	}

 	if (!desc ||
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..bd39a2cf7972 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -78,6 +78,8 @@ struct nvdimm_bus_descriptor {
 	const struct attribute_group **attr_groups;
 	unsigned long bus_dsm_mask;
 	unsigned long cmd_mask;
+	unsigned long dimm_family_mask;
+	unsigned long bus_family_mask;
 	struct module *module;
 	char *provider_name;
 	struct device_node *of_node;
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index 0e09dc5cec19..e9468b9332bd 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -245,6 +245,10 @@ struct nd_cmd_pkg {
 #define NVDIMM_FAMILY_MSFT 3
 #define NVDIMM_FAMILY_HYPERV 4
 #define NVDIMM_FAMILY_PAPR 5
+#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_PAPR
+
+#define NVDIMM_BUS_FAMILY_NFIT 0
+#define NVDIMM_BUS_FAMILY_MAX NVDIMM_BUS_FAMILY_NFIT

 #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
 					struct nd_cmd_pkg)
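For readers who want to experiment with the check outside the kernel, the family validation that this patch adds to __nd_ioctl() can be approximated in plain userspace C. This is only a sketch under stated assumptions: struct bus_desc, FAMILY_MAX, BUS_FAMILY_MAX, family_test_bit() and validate_family() are hypothetical stand-ins for the kernel's nvdimm_bus_descriptor, the NVDIMM_*_MAX constants and test_bit(), not the kernel code itself.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-ins for the UAPI limits introduced by the patch. */
#define FAMILY_MAX	5	/* mirrors NVDIMM_FAMILY_MAX (PAPR) */
#define BUS_FAMILY_MAX	0	/* mirrors NVDIMM_BUS_FAMILY_MAX (NFIT) */

/* Reduced model of the two whitelist masks in the bus descriptor. */
struct bus_desc {
	unsigned long dimm_family_mask;
	unsigned long bus_family_mask;
};

/* Userspace substitute for test_bit() on a single unsigned long. */
int family_test_bit(unsigned int nr, unsigned long mask)
{
	assert(nr < sizeof(mask) * CHAR_BIT);
	return (int)((mask >> nr) & 1UL);
}

/*
 * Return 0 when the family index may be passed through, -EINVAL
 * otherwise. A nonzero is_dimm selects the per-DIMM mask, zero selects
 * the bus mask. The range check runs first so that the bit test never
 * sees an index beyond the known families.
 */
int validate_family(const struct bus_desc *desc, int is_dimm,
		    unsigned int family)
{
	unsigned int max = is_dimm ? FAMILY_MAX : BUS_FAMILY_MAX;
	unsigned long mask = is_dimm ? desc->dimm_family_mask
				     : desc->bus_family_mask;

	if (family > max)
		return -EINVAL;
	return family_test_bit(family, mask) ? 0 : -EINVAL;
}
```

With only bit 0 set in both masks, a DIMM request for family 0 passes while family 4 (in range but not whitelisted) and family 6 (out of range) are both rejected with -EINVAL.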
Hi
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag fixing commit: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism").
The bot has tested the following trees: v5.7.6, v5.4.49, v4.19.130, v4.14.186, v4.9.228.
v5.7.6: Failed to apply! Possible dependencies:
    f517f7925b7b4 ("ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods")

v5.4.49: Failed to apply! Possible dependencies:
    72c4ebbac476b ("powerpc/papr_scm: Mark papr_scm_ndctl() as static")
    f517f7925b7b4 ("ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods")

v4.19.130: Failed to apply! Possible dependencies:
    01091c496f920 ("acpi/nfit: improve bounds checking for 'func'")
    0ead11181fe0c ("acpi, nfit: Collect shutdown status")
    6f07f86c49407 ("acpi, nfit: Introduce nfit_mem flags")
    72c4ebbac476b ("powerpc/papr_scm: Mark papr_scm_ndctl() as static")
    b3ed2ce024c36 ("acpi/nfit: Add support for Intel DSM 1.8 commands")
    b5beae5e224f1 ("powerpc/pseries: Add driver for PAPR SCM regions")
    d6548ae4d16dc ("acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimm")
    f517f7925b7b4 ("ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods")

v4.14.186: Failed to apply! Possible dependencies:
    01091c496f920 ("acpi/nfit: improve bounds checking for 'func'")
    0e7f0741450b1 ("acpi, nfit: validate commands against the device type")
    1194c4133195d ("nfit: Add Hyper-V NVDIMM DSM command set to white list")
    11e1427016095 ("acpi, nfit: add support for NVDIMM_FAMILY_INTEL v1.6 DSMs")
    466d1493ea830 ("acpi, nfit: rework NVDIMM leaf method detection")
    4b27db7e26cdb ("acpi, nfit: add support for the _LSI, _LSR, and _LSW label methods")
    6f07f86c49407 ("acpi, nfit: Introduce nfit_mem flags")
    b37b3fd33d034 ("acpi nfit: Enable to show what feature is supported via ND_CMD_CALL for nfit_test")
    b9b1504d3c6d6 ("acpi, nfit: hide unknown commands from nmemX/commands")
    d6548ae4d16dc ("acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimm")

v4.9.228: Failed to apply! Possible dependencies:
    095ab4b39f91b ("acpi, nfit: allow override of built-in bitmasks for nvdimm DSMs")
    0f817ae696b04 ("usb: dwc3: pci: add a private driver structure")
    36daf3aa399c0 ("usb: dwc3: pci: avoid build warning")
    3f23df72dc351 ("mmc: sdhci-pci: Use ACPI to get max frequency for Intel NI byt sdio")
    41c8bdb3ab10c ("acpi, nfit: Switch to use new generic UUID API")
    42237e393f64d ("libnvdimm: allow a platform to force enable label support")
    42b06496407c0 ("mmc: sdhci-pci: Add PCI ID for Intel NI byt sdio")
    4b27db7e26cdb ("acpi, nfit: add support for the _LSI, _LSR, and _LSW label methods")
    6f07f86c49407 ("acpi, nfit: Introduce nfit_mem flags")
    8f078b38dd382 ("libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED")
    94116f8126de9 ("ACPI: Switch to use generic guid_t in acpi_evaluate_dsm()")
    9cecca75b5a0d ("usb: dwc3: pci: call _DSM for suspend/resume")
    9d62ed9651182 ("libnvdimm: handle locked label storage areas")
    b7fe92999a98a ("ACPI / extlog: Switch to use new generic UUID API")
    b917078c1c107 ("net: hns: Add ACPI support to check SFP present")
    ba650cfcf9409 ("acpi, nfit: allow specifying a default DSM family")
    c959a6b00ff58 ("mmc: sdhci-pci: Don't re-tune with runtime pm for some Intel devices")
    d2061f9cc32db ("usb: typec: add driver for Intel Whiskey Cove PMIC USB Type-C PHY")
    d6548ae4d16dc ("acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimm")
    fab9288428ec0 ("usb: USB Type-C connector class")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
On Fri, Jun 26, 2020 at 2:06 AM Dan Williams dan.j.williams@intel.com wrote:
Given the above device and application timeout considerations the implementation defaults to hooking into the suspend path to trigger the activation, i.e. that a suspend-resume cycle (at least up to the syscore suspend point) is required.
Well, that doesn't work if the suspend method for the system is set to suspend-to-idle (for example, via /sys/power/mem_sleep), because the syscore callbacks are not invoked in that case.
Also you probably don't need the device power state toggling that happens during regular suspend/resume (you may not want it even for some devices).
The hibernation freeze/thaw may be a better match and there is some test support in there already that may be kind of co-opted for your use case.
Cheers!
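As a rough illustration of the suspend-to-idle concern raised in this reply, a userspace tool could read /sys/power/mem_sleep before relying on suspend-triggered activation. That file lists the available modes with the selected one in square brackets (for example "s2idle [deep]"). The sketch below only parses such a string; the helper names selected_entry() and selects_s2idle() are hypothetical and are not part of ndctl or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (currently selected) entry of a sysfs selector
 * string into out, e.g. "s2idle [deep]" yields "deep".
 * Returns 0 on success, -1 if no bracketed entry fits in out.
 */
int selected_entry(const char *buf, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(buf && out);
	open = strchr(buf, '[');
	close = open ? strchr(open, ']') : NULL;
	if (!open || !close)
		return -1;

	len = (size_t)(close - open) - 1;
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Nonzero when the mem_sleep contents select suspend-to-idle. */
int selects_s2idle(const char *mem_sleep)
{
	char mode[16];

	if (selected_entry(mem_sleep, mode, sizeof(mode)))
		return 0;
	return strcmp(mode, "s2idle") == 0;
}
```

A caller would read the file into a buffer and, if selects_s2idle() reports true, either refuse the suspend-based flow or warn that the syscore callbacks will not run.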
On Fri, Jun 26, 2020 at 7:22 AM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jun 26, 2020 at 2:06 AM Dan Williams dan.j.williams@intel.com wrote:
Given the above device and application timeout considerations the implementation defaults to hooking into the suspend path to trigger the activation, i.e. that a suspend-resume cycle (at least up to the syscore suspend point) is required.
Well, that doesn't work if the suspend method for the system is set to suspend-to-idle (for example, via /sys/power/mem_sleep), because the syscore callbacks are not invoked in that case.
Also you probably don't need the device power state toggling that happens during regular suspend/resume (you may not want it even for some devices).
The hibernation freeze/thaw may be a better match and there is some test support in there already that may be kind of co-opted for your use case.
Hmm, yes I guess freeze should be sufficient to quiesce most device-DMA in the general case as applications will stop sending requests. I do expect some RDMA devices will happily keep on transmitting, but that likely will need explicit mitigation. It also appears the suspend callback for at least one RDMA device mlx5_suspend() is rather violent as it appears to fully teardown the device context, not just suspend operations.
To be clear, what debug interface were you thinking I could glom onto to just trigger firmware-activate at the end of the freeze phase?
On Fri, Jun 26, 2020 at 8:43 PM Dan Williams dan.j.williams@intel.com wrote:
On Fri, Jun 26, 2020 at 7:22 AM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jun 26, 2020 at 2:06 AM Dan Williams dan.j.williams@intel.com wrote:
Given the above device and application timeout considerations the implementation defaults to hooking into the suspend path to trigger the activation, i.e. that a suspend-resume cycle (at least up to the syscore suspend point) is required.
Well, that doesn't work if the suspend method for the system is set to suspend-to-idle (for example, via /sys/power/mem_sleep), because the syscore callbacks are not invoked in that case.
Also you probably don't need the device power state toggling that happens during regular suspend/resume (you may not want it even for some devices).
The hibernation freeze/thaw may be a better match and there is some test support in there already that may be kind of co-opted for your use case.
Hmm, yes I guess freeze should be sufficient to quiesce most device-DMA in the general case as applications will stop sending requests.
It is expected to be sufficient to quiesce all of them.
If that is not the case, the integrity of the hibernation image cannot be guaranteed on the system in question.
I do expect some RDMA devices will happily keep on transmitting, but that likely will need explicit mitigation. It also appears the suspend callback for at least one RDMA device mlx5_suspend() is rather violent as it appears to fully teardown the device context, not just suspend operations.
To be clear, what debug interface were you thinking I could glom onto to just trigger firmware-activate at the end of the freeze phase?
Functionally, the same as for suspend, but using the hibernation interface, so "echo platform > /sys/power/pm_test" followed by "echo disk > /sys/power/state".
But it might be cleaner to introduce a special "hibernation mode", i.e. one more item in /sys/power/disk, that will trigger what you need (in analogy with "test_resume").
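The two-step sequence described above (select a pm_test level, then request hibernation) could be driven from a small C helper instead of a shell. This is a sketch only: write_attr() and hibernation_test_cycle() are hypothetical names, the directory is passed as a parameter so that the sequence can be tried against a scratch directory rather than /sys/power, and on a real system it requires root and a kernel that exposes /sys/power/pm_test.

```c
#include <assert.h>
#include <stdio.h>

/* Write a string to a sysfs-style attribute file. Returns 0 on success. */
int write_attr(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Select a pm_test level, then request hibernation. power_dir would be
 * "/sys/power" on a real system. With a pm_test level set, the kernel
 * stops the transition at that level and resumes instead of completing
 * the hibernation.
 */
int hibernation_test_cycle(const char *power_dir, const char *level)
{
	char path[256];
	int n;

	n = snprintf(path, sizeof(path), "%s/pm_test", power_dir);
	if (n < 0 || (size_t)n >= sizeof(path) || write_attr(path, level))
		return -1;

	n = snprintf(path, sizeof(path), "%s/state", power_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return write_attr(path, "disk");
}
```

Calling hibernation_test_cycle("/sys/power", "platform") corresponds to the two echo commands quoted above.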
On Sun, Jun 28, 2020 at 10:23 AM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jun 26, 2020 at 8:43 PM Dan Williams dan.j.williams@intel.com wrote:
On Fri, Jun 26, 2020 at 7:22 AM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jun 26, 2020 at 2:06 AM Dan Williams dan.j.williams@intel.com wrote:
Given the above device and application timeout considerations the implementation defaults to hooking into the suspend path to trigger the activation, i.e. that a suspend-resume cycle (at least up to the syscore suspend point) is required.
Well, that doesn't work if the suspend method for the system is set to suspend-to-idle (for example, via /sys/power/mem_sleep), because the syscore callbacks are not invoked in that case.
Also you probably don't need the device power state toggling that happens during regular suspend/resume (you may not want it even for some devices).
The hibernation freeze/thaw may be a better match and there is some test support in there already that may be kind of co-opted for your use case.
Hmm, yes I guess freeze should be sufficient to quiesce most device-DMA in the general case as applications will stop sending requests.
It is expected to be sufficient to quiesce all of them.
If that is not the case, the integrity of the hibernation image cannot be guaranteed on the system in question.
Ah, indeed, I was overlooking that property.
I do expect some RDMA devices will happily keep on transmitting, but that likely will need explicit mitigation. It also appears the suspend callback for at least one RDMA device mlx5_suspend() is rather violent as it appears to fully teardown the device context, not just suspend operations.
To be clear, what debug interface were you thinking I could glom onto to just trigger firmware-activate at the end of the freeze phase?
Functionally, the same as for suspend, but using the hibernation interface, so "echo platform > /sys/power/pm_test" followed by "echo disk > /sys/power/state".
But it might be cleaner to introduce a special "hibernation mode", i.e. one more item in /sys/power/disk, that will trigger what you need (in analogy with "test_resume").
I'll move the trigger to be after process freeze, but I'll keep it tied to suspend-debug vs hibernate-debug. It appears the hibernate debug path still goes through the exercise of allocating memory for the hibernation image which is unnecessary if the goal is just to 'freeze', 'activate', and 'thaw'.
On Tue, Jun 30, 2020 at 1:37 AM Dan Williams dan.j.williams@intel.com wrote:
On Sun, Jun 28, 2020 at 10:23 AM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jun 26, 2020 at 8:43 PM Dan Williams dan.j.williams@intel.com wrote:
On Fri, Jun 26, 2020 at 7:22 AM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jun 26, 2020 at 2:06 AM Dan Williams dan.j.williams@intel.com wrote:
Given the above device and application timeout considerations the implementation defaults to hooking into the suspend path to trigger the activation, i.e. that a suspend-resume cycle (at least up to the syscore suspend point) is required.
Well, that doesn't work if the suspend method for the system is set to suspend-to-idle (for example, via /sys/power/mem_sleep), because the syscore callbacks are not invoked in that case.
Also you probably don't need the device power state toggling that happens during regular suspend/resume (you may not want it even for some devices).
The hibernation freeze/thaw may be a better match and there is some test support in there already that may be kind of co-opted for your use case.
Hmm, yes I guess freeze should be sufficient to quiesce most device-DMA in the general case as applications will stop sending requests.
It is expected to be sufficient to quiesce all of them.
If that is not the case, the integrity of the hibernation image cannot be guaranteed on the system in question.
Ah, indeed, I was overlooking that property.
I do expect some RDMA devices will happily keep on transmitting, but that likely will need explicit mitigation. It also appears the suspend callback for at least one RDMA device mlx5_suspend() is rather violent as it appears to fully teardown the device context, not just suspend operations.
To be clear, what debug interface were you thinking I could glom onto to just trigger firmware-activate at the end of the freeze phase?
Functionally, the same as for suspend, but using the hibernation interface, so "echo platform > /sys/power/pm_test" followed by "echo disk > /sys/power/state".
But it might be cleaner to introduce a special "hibernation mode", i.e. one more item in /sys/power/disk, that will trigger what you need (in analogy with "test_resume").
I'll move the trigger to be after process freeze, but I'll keep it tied to suspend-debug vs hibernate-debug. It appears the hibernate debug path still goes through the exercise of allocating memory for the hibernation image which is unnecessary if the goal is just to 'freeze', 'activate', and 'thaw'.
But you need the ->freeze and ->thaw callbacks to run which does not happen at the process freeze stage.
If you add a new hibernation mode dedicated to the NVDIMM firmware update, though, you can instrument the code to skip the memory allocation if this mode is selected.