On 11/1/22 14:33, Mark Brown wrote:
As well as a number of simple features which only add new instructions and require corresponding hwcaps SME2 introduces a new register ZT0 for which we must define ABI. Fortunately this is a fixed size 512 bits and therefore much more straightforward than the base SME state, the only wrinkle is that it is only accessible when ZA is accessible.
While there is only a single register the architecture is written with a view to extensibility, including a number in the name, so follow this in the ABI.
Signed-off-by: Mark Brown <broonie@kernel.org>
Documentation/arm64/sme.rst | 52 ++++++++++++++++++++++++++++++-------
1 file changed, 43 insertions(+), 9 deletions(-)
diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
index 16d2db4c2e2e..5f7eabee4853 100644
--- a/Documentation/arm64/sme.rst
+++ b/Documentation/arm64/sme.rst
@@ -18,14 +18,19 @@ model features for SME is included in Appendix A.
- General
-* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA
- register state and TPIDR2_EL0 are tracked per thread.
+* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when
- present) ZT0 register state and TPIDR2_EL0 are tracked per thread.
- The presence of SME is reported to userspace via HWCAP2_SME in the aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of the SME instructions and registers, and the Linux-specific system interfaces described in this document. SME is reported in /proc/cpuinfo as "sme".
+* The presence of SME2 is reported to userspace via HWCAP2_SME in the
I suppose this should be HWCAP2_SME2 rather than HWCAP2_SME?
- aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of
- the SME2 instructions and ZT0, and the Linux-specific system interfaces
- described in this document. SME2 is reported in /proc/cpuinfo as "sme2".
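For reference, with the flag name corrected I would expect userspace detection to look roughly like the sketch below. The helper names are made up for illustration, and the fallback value for HWCAP2_SME2 is an assumption for toolchains whose headers predate SME2; the real value should come from <asm/hwcap.h>.

```c
#include <stdbool.h>
#include <sys/auxv.h>

#ifndef HWCAP2_SME2
/* Assumption: fallback bit for older headers, check <asm/hwcap.h>. */
#define HWCAP2_SME2 (1ULL << 37)
#endif

/* Hypothetical helper: test the SME2 bit in an AT_HWCAP2 value. */
static inline bool hwcap2_has_sme2(unsigned long long hwcap2)
{
	return (hwcap2 & HWCAP2_SME2) != 0;
}

/* Hypothetical helper: query the aux vector of the running process. */
static inline bool cpu_has_sme2(void)
{
	return hwcap2_has_sme2(getauxval(AT_HWCAP2));
}
```

A program would call cpu_has_sme2() once at startup and only touch ZT0 or the SME2 instructions if it returns true.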
- Support for the execution of SME instructions in userspace can also be detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS instruction, and checking that the value of the SME field is nonzero. [3]
@@ -44,6 +49,7 @@ model features for SME is included in Appendix A. HWCAP2_SME_B16F32 HWCAP2_SME_F32F32 HWCAP2_SME_FA64
HWCAP2_SME2
This list may be extended over time as the SME architecture evolves. @@ -52,8 +58,8 @@ model features for SME is included in Appendix A. cpu-feature-registers.txt for details.
- Debuggers should restrict themselves to interacting with the target via the
- NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets. The recommended way
- of detecting support for these regsets is to connect to a target process
- NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets. The recommended
- way of detecting support for these regsets is to connect to a target process first and then attempt a
ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov). @@ -89,13 +95,13 @@ be zeroed.
- On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
- ZA matrix are preserved.
- ZA matrix and ZT0 (if present) are preserved.
- On syscall PSTATE.SM will be cleared and the SVE registers will be handled as per the standard SVE ABI.
-* Neither the SVE registers nor ZA are used to pass arguments to or receive
- results from any syscall.
+* None of the SVE registers, ZA or ZT0 are used to pass arguments to
- or receive results from any syscall.
- On process creation (eg, clone()) the newly created process will have PSTATE.SM cleared.
@@ -134,6 +140,14 @@ be zeroed. __reserved[] referencing this space. za_context is then written in the extra space. Refer to [1] for further details about this mechanism. +* If ZT is supported and PSTATE.ZA==1 then a signal frame record for ZT will
- be generated.
I noticed we refer to ZT0 as ZT sometimes. Should we use ZT0 throughout? Or maybe ZT, if it makes more sense?
Otherwise it can get a bit confusing.
+* The signal record for ZT has magic ZT_MAGIC (0x73d4e827) and consists of a
- standard signal frame header followed by a struct zt_context specifying
- the number of ZT registers supported by the system, then zt_contxt.nregs
zt_contxt -> zt_context
- blocks of 64 bytes of data per register.
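To check my reading of the record layout, here is a rough sketch of how a signal handler might locate the ZT record. The struct and function names are local stand-ins invented for illustration (the real definitions would live in <asm/sigcontext.h>), the magic is the value quoted above, and the walk assumes the usual arm64 convention that the record list ends with a header whose magic and size are both 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ZT_MAGIC     0x73d4e827u /* value quoted in the text above */
#define SKETCH_ZT_REG_BYTES 64u         /* 512 bits per ZT register */

/* Hypothetical local mirrors of the signal frame structures. */
struct sketch_ctx_head {
	uint32_t magic;
	uint32_t size;
};

struct sketch_zt_context {
	struct sketch_ctx_head head;
	uint16_t nregs;
	uint16_t reserved[3];
};

/*
 * Walk the records in buf[0..len) and return the byte offset of the
 * ZT record, or -1 if there is none or the frame is malformed.
 */
static inline long sketch_find_zt(const unsigned char *buf, size_t len)
{
	size_t off = 0;

	while (off + sizeof(struct sketch_ctx_head) <= len) {
		struct sketch_ctx_head head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0 && head.size == 0)
			return -1; /* terminator reached, no ZT record */
		if (head.size < sizeof(head) || head.size > len - off)
			return -1; /* malformed record */
		if (head.magic == SKETCH_ZT_MAGIC) {
			if (head.size < sizeof(struct sketch_zt_context))
				return -1;
			return (long)off;
		}
		off += head.size;
	}
	return -1;
}

/* Bytes of register data that follow the zt_context header. */
static inline size_t sketch_zt_payload_bytes(uint16_t nregs)
{
	return (size_t)nregs * SKETCH_ZT_REG_BYTES;
}
```

With a single register (nregs == 1) the record would carry 64 bytes of data after the header, which matches the "blocks of 64 bytes of data per register" wording.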
5. Signal return
@@ -151,6 +165,9 @@ When returning from a signal handler: the signal frame does not match the current vector length, the signal return attempt is treated as illegal, resulting in a forced SIGSEGV. +* If ZT is not supported or PSTATE.ZA==0 then it is illegal to have a
- signal frame record for ZT, resulting in a forced SIGSEGV.
6. prctl extensions
@@ -214,8 +231,8 @@ prctl(PR_SME_SET_VL, unsigned long arg) vector length that will be applied at the next execve() by the calling thread.
- Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
- Changing the vector length causes all of ZA, ZT, P0..P15, FFR and all
bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become unspecified, including both streaming and non-streaming SVE state. Calling PR_SME_SET_VL with vl equal to the thread's current vector length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
@@ -317,6 +334,15 @@ The regset data starts with struct user_za_header, containing:
- The effect of writing a partial, incomplete payload is unspecified.
+* A new regset NT_ARM_ZT is defined for for access to ZT state via
Typo: "for for" should be a single "for".
- PTRACE_GETREGSET and PTRACE_SETREGSET.
+* The NT_ARM_ZT regset consists of a single 512 bit register.
+* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZT as 0.
+* Writes to NT_ARM_ZT will set PSTATE.ZA to 1.
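As a sanity check on the regset description, a debugger-side read could look something like the sketch below. The helper names are hypothetical, and the fallback value for NT_ARM_ZT is an assumption for headers that do not define it yet. Note that, per the rule above, an all-zero result cannot by itself distinguish PSTATE.ZA==0 from a ZT0 that really contains zeros.

```c
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_ZT
/* Assumption: fallback note type for older headers, check <linux/elf.h>. */
#define NT_ARM_ZT 0x40d
#endif

#define ZT0_BYTES 64 /* single 512 bit register, per the text above */

/*
 * Hypothetical helper: read ZT0 from a stopped tracee.
 * Returns 0 on success, -1 on failure (errno is set by ptrace).
 */
static inline int read_zt0(pid_t pid, uint8_t zt0[ZT0_BYTES])
{
	struct iovec iov = { .iov_base = zt0, .iov_len = ZT0_BYTES };

	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_ZT,
		   &iov) == -1)
		return -1;

	/* The kernel updates iov_len to the amount actually copied. */
	return iov.iov_len == ZT0_BYTES ? 0 : -1;
}

/* Hypothetical helper: true if every byte of the buffer is zero. */
static inline int zt0_all_zero(const uint8_t zt0[ZT0_BYTES])
{
	for (int i = 0; i < ZT0_BYTES; i++)
		if (zt0[i] != 0)
			return 0;
	return 1;
}
```

A debugger that needs to tell the two cases apart would presumably also read NT_ARM_ZA and inspect the state reported there before interpreting a zero ZT0.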
8. ELF coredump extensions
@@ -331,6 +357,11 @@ The regset data starts with struct user_za_header, containing: been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread when the coredump was generated. +* A NT_ARM_ZT note will be added to each coredump for each thread of the
- dumped process. The contents will be equivalent to the data that would have
- been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread
- when the coredump was generated.
- The NT_ARM_TLS note will be extended to two registers, the second register will contain TPIDR2_EL0 on systems that support SME and will be read as zero with writes ignored otherwise.
@@ -406,6 +437,9 @@ In A64 state, SME adds the following: For best system performance it is strongly encouraged for software to enable ZA only when it is actively being used. +* A new ZT0 register is introduced when SME2 is present. This is a 512 bit
- register which is accessible PSTATE.ZA is set, as ZA itself is.
Missing word here: "accessible when PSTATE.ZA is set"?
- Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and SMSTOP instructions or by access to the SVCR system register: