On Wed, Sep 17, 2025 at 11:51:20AM +0800, Yicong Yang wrote:
On 2025/9/16 22:56, Catalin Marinas wrote:
On Mon, Sep 15, 2025 at 04:29:25PM +0800, Yicong Yang wrote:
in my understanding the hwcap only describes the capabilities of the CPU, not of the whole system. users should make sure the function works as expected if the CPU supports it and they're going to use it. specifically, LS64 is intended for device memory only, so the user should take responsibility for using it only on supported memory.
We have other cases like MTE where we avoid exposing the HWCAP to user if we know the memory system does not support MTE, though we intercepted this early and asked the (micro)architects to tie the CPU ID field to what the system supports.
but we lack the same identification mechanism for the memory system as we have for the CPU, so it's just a restriction on the hardware vendor: if a certain feature is not supported by the whole system (SoC), then do not advertise it in the CPU's ID field. otherwise i think what we currently do is that if capabilities mismatch or cannot work together as expected, an erratum/workaround is used to disable the feature or work around it on that particular platform.
this is also the case for LS64 but a bit more complex, since it involves the completer outside the SoC (the device), which could be a hotplugged one (PCIe). on the SoC side we can restrict advertising the feature to when it's fully supported (which is what we've already done on our hardware).
That's good to know. Hopefully other vendors do the same.
I think the ARM ARM would benefit from a note here that the system designers should not advertise this if the interconnect does not support it. I can raise this internally.
Arguably, the use of LD/ST64B* is fairly specialised and won't be used on the general purpose RAM and by random applications. It needs a device driver to create the NC/Device mapping and specific programs/libraries to access it. I'm not sure the LS64 properties are guaranteed by the device alone or the device together with the interconnect. I suspect the latter and neither the kernel driver nor user space can tell. In the best case, you get a fault and realise the system doesn't work as expected. Worse is the non-atomicity with potentially silent corruption.
it will be the latter: both the interconnect and the target device need to support it. but I think the driver developer (kernel driver or userspace driver) must know the support status, otherwise they should not use it.
[...]
my thought is that the driver developer should know whether their device supports it if they are going to use this. the information in the firmware table should be fine for platform devices, but it cannot describe hotpluggable ones like PCIe endpoint devices, which may not be listed in a firmware table.
There's a risk of such instructions ending up in more generic copy_to/from_io implementations but there's not much we can do about it other than not enabling the feature at all.
So, I think a HWCAP bit is useful but we need (a) clarification that the CPUID field won't be set if the system doesn't support it and (b) documentation for the Linux bit stating that it's a per-device capability even if the CPU/system supports it (the HWCAP is only a prerequisite to be able to use the instructions; the driver can fall back to non-atomic ops, maybe with a DGH if it helps performance).
An alternative would have been for the kernel driver to communicate to the user that the device supports the 64-byte atomic accesses but I'm not aware of any fairly generic way to do this.
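
To illustrate the "HWCAP is only a prerequisite" point above, here is a minimal sketch in C of how a userspace driver might pick its access path. Everything named here is hypothetical: EXAMPLE_HWCAP_LS64 is a placeholder bit rather than the real uapi definition, and device_supports_ls64 stands for whatever device-specific knowledge the driver developer has. The fallback shown is plain 64-bit stores, which are not atomic as a 64-byte unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit for illustration only; the real bit comes from the
 * kernel's uapi hwcap headers. */
#define EXAMPLE_HWCAP_LS64 (1UL << 0)

enum copy_path { COPY_PATH_LS64, COPY_PATH_FALLBACK };

/*
 * The CPU/system capability (hwcaps) is necessary but not sufficient:
 * the target device must also be known to support 64-byte accesses.
 */
static enum copy_path choose_copy_path(unsigned long hwcaps,
				       bool device_supports_ls64)
{
	if ((hwcaps & EXAMPLE_HWCAP_LS64) && device_supports_ls64)
		return COPY_PATH_LS64;
	return COPY_PATH_FALLBACK;
}

/*
 * Fallback path: eight separate 64-bit stores. This is NOT atomic as a
 * single 64-byte access, so the device must tolerate that.
 */
static void copy_64b_fallback(volatile uint64_t *dst, const uint64_t *src)
{
	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < 8; i++)
		dst[i] = src[i];
}
```

In a real program the hwcaps value would presumably come from getauxval() with the appropriate AT_HWCAP* type, and the per-device flag from the driver's own knowledge of the hardware, since neither the kernel nor user space can discover it generically.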