Re: SVE routines for cortex-strings

1 Jun 2018

On 06/01/2018 02:32 AM, Richard Sandiford wrote:
...
> Thanks for doing these.  One general comment is that the routines
> tend to use the FFR result even in the case where no potential
> fault is detected.  Although it's not as obvious as it could be
> from some of the published documentation, the architecturally-
> preferred approach is instead to have the "normal" case depend only
> on the flags set by RDFFRS, not on the FFR itself.

Clearly it would be interesting to read the microarch docs once they are
available.  This is not the result I would have imagined.
...
>     RDFFRS Pn.B, Pg/Z
>     B.NLAST recovery

So the takeaway is that, if the branch is predicted untaken, and we don't use
Pn in the predicted path, then FFR is speculatively unused.
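
To make that concrete for myself, here is a scalar C model of the control flow, using a strlen-style loop as the example.  It is only a sketch and not SVE code: VL, model_ldff1b, model_strlen and the "mapped" length are names I made up for illustration, and the first-fault load is simulated by stopping at the end of a buffer.  The point is that the predicted path tests one flag (standing in for the condition flags set by RDFFRS) and never reads the FFR mask, while the recovery path does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL ((size_t)16)                      /* modelled vector length in bytes (assumption) */
#define ALL_LANES ((UINT32_C(1) << VL) - 1)

/* Model of a first-fault byte load: lanes are filled in order until one
   would read past `mapped` bytes.  Returns the FFR as a lane bitmask.  */
static uint32_t model_ldff1b(const char *buf, size_t mapped, size_t off,
                             char lanes[VL])
{
    uint32_t ffr = 0;
    for (size_t i = 0; i < VL && off + i < mapped; i++) {
        lanes[i] = buf[off + i];
        ffr |= UINT32_C(1) << i;
    }
    return ffr;
}

/* strlen-style loop.  `buf` must hold a NUL within its first `mapped` bytes. */
size_t model_strlen(const char *buf, size_t mapped)
{
    size_t off = 0;
    char lanes[VL];

    for (;;) {
        uint32_t ffr = model_ldff1b(buf, mapped, off, lanes);
        int all_loaded = (ffr == ALL_LANES);  /* stands in for the RDFFRS flags */

        if (all_loaded) {
            /* Predicted path: every lane is valid, so the FFR mask is
               not consulted and the increment is a constant (cf. INCB).  */
            for (size_t i = 0; i < VL; i++)
                if (lanes[i] == 0)
                    return off + i;
            off += VL;
        } else {
            /* Recovery path: only the lanes named by the FFR are valid,
               and the increment is a count of those lanes (cf. INCP).  */
            size_t n = 0;
            while (ffr & (UINT32_C(1) << n)) {
                if (lanes[n] == 0)
                    return off + n;
                n++;
            }
            assert(n > 0);  /* a real first lane fault would trap instead */
            off += n;
        }
    }
}
```

Calling model_strlen on a 21-byte buffer holding a 20-character string takes the predicted path once and the recovery path once.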
...
> Also, using INCB, INCH, INCW and INCD is architecturally preferred over
> INCP in cases where either could be used.

This is much more directly understandable.  I guess it's the sort of thing
where the real cycle count for INCP is going to depend on the actual width of
the implementation, whereas INCB is always going to be a simple add.
...
> The idea is that the B.NLAST should be highly predictable,
> so it's usually not necessary to wait for the FFR value to become
> available.  And in practice, getting a precise FFR predicate is likely
> to be a slow operation (to the extent that this is an ISA-level
> principle rather than a uarch optimisation).

I'm surprised about FFR being quite so slow.  And I guess from the language
elsewhere in the manual -- more or less that first-fault reads can fail at any
time for any reason -- I expected them to fail more often than you are implying
that they should.
I suppose if they only fail on TLB misses, and the OS is using 64K pages, then
the failure rate must be below 0.5%.
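
One way to arrive at a bound of that size, as a back-of-envelope sketch: assume back-to-back loads of one full vector each, and that a load can only fail when it starts on or crosses into a new page.  Then at most one load in every page_bytes/vl_bytes is a candidate.  The function name and both assumptions are mine; the largest SVE vector is 2048 bits, i.e. 256 bytes.

```c
#include <assert.h>

/* Upper bound, in percent, on the share of sequential vector-sized loads
   that touch a new page: one such load per page_bytes / vl_bytes loads.  */
double new_page_load_percent(unsigned vl_bytes, unsigned page_bytes)
{
    assert(vl_bytes > 0 && page_bytes >= vl_bytes);
    return 100.0 * (double)vl_bytes / (double)page_bytes;
}
```

With 256-byte vectors and 64K pages this gives 0.390625%, and smaller vectors give proportionally less; with 4K pages the same vector length gives 6.25%.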
Thanks.  This is the sort of stuff that's missing from the public manual so far.
r~
