On 06/01/2018 02:32 AM, Richard Sandiford wrote:
Thanks for doing these. One general comment is that the routines tend to use the FFR result even in the case where no potential fault is detected. Although it's not as obvious as it could be from some of the published documentation, the architecturally preferred approach is instead to have the "normal" case depend only on the flags set by RDFFRS, not on the FFR itself.
Clearly it would be interesting to read the microarch docs once they are available. This is not the result I would have imagined.
    RDFFRS  Pn.B, Pg/Z
    B.NLAST recovery
So the takeaway is that, if the branch is predicted untaken, and we don't use Pn in the predicted path, then FFR is speculatively unused.
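To check my own understanding, here is a toy Python model of the flag that B.NLAST tests. It is only a sketch: rdffrs_nlast is a made-up helper, predicates are modelled as lists of booleans, and it assumes that RDFFRS sets the carry flag when the last element active in Pg is clear in the FFR.

```python
def rdffrs_nlast(ffr, pg):
    """Model of the B.NLAST condition after RDFFRS Pn.B, Pg/Z.

    Hypothetical helper: returns True when the last element that is
    active in pg is clear in ffr, i.e. when the recovery branch is taken.
    """
    active = [i for i, g in enumerate(pg) if g]
    if not active:
        # Assumption: with no active elements the "last" test fails.
        return True
    return not ffr[active[-1]]


full = [True] * 4
# No fault: every FFR element is still set, so the fast path continues
# without ever looking at the predicate itself.
print(rdffrs_nlast(full, full))
# A fault at element 2 clears the FFR from there on: recovery is taken.
print(rdffrs_nlast([True, True, False, False], full))
```

The point of the model is that the fast path consumes a single boolean; only the recovery path needs the predicate contents.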
Also, using INCB, INCH, INCW and INCD is architecturally preferred over INCP in cases where either could be used.
This is much more directly understandable. I guess it's the sort of thing where the real cycle count for INCP is going to depend on the actual width of the implementation, whereas INCB is always going to be a simple add.
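As a rough illustration of why the two are interchangeable on the no-fault path, here is a small Python model. The names incb and incp are just stand-ins for the instructions, predicates are lists of booleans, and this is not the architectural pseudocode.

```python
def incb(x, vl_bytes):
    """Model of INCB: add the vector length in bytes, a plain constant add."""
    return x + vl_bytes


def incp(x, pred):
    """Model of INCP: add the number of active predicate elements."""
    # The count has to look at every predicate element, so the work
    # grows with the vector length of the implementation.
    return x + sum(pred)


vl_bytes = 32                  # assumed 256-bit implementation
all_true = [True] * vl_bytes   # no fault: every byte element is valid
print(incb(100, vl_bytes) == incp(100, all_true))
```

When the predicate is all-true the two agree, so INCP is only needed on the recovery path where the FFR has been truncated.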
The idea is that the B.NLAST should be highly predictable, so it's usually not necessary to wait for the FFR value to become available. And in practice, getting a precise FFR predicate is likely to be a slow operation (to the extent that this is an ISA-level principle rather than a uarch optimisation).
I'm surprised about FFR being quite so slow. And I guess from the language elsewhere in the manual -- more or less that first-fault reads can fail at any time for any reason -- I expected them to fail more often than you are implying that they should.
I suppose if they only fail on TLB misses, and the OS is using 64k pages, then the failure rate must be below 0.5%.
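The back-of-envelope behind that figure, as a small Python sketch. It assumes the loop advances by one full vector per iteration and that at most one load per page can be the first to touch a new page; first_touch_rate is a name I made up.

```python
PAGE_BYTES = 64 * 1024  # 64k pages, as assumed above


def first_touch_rate(vl_bytes, page_bytes=PAGE_BYTES):
    """Fraction of vector loads that are the first to touch a new page."""
    # Stepping by vl_bytes gives page_bytes / vl_bytes loads per page,
    # and only one of those can be the first access to the next page.
    return vl_bytes / page_bytes


# SVE vector lengths run from 128 to 2048 bits (16 to 256 bytes).
rates = {vl: first_touch_rate(vl) for vl in (16, 32, 64, 128, 256)}
for vl, rate in rates.items():
    print(f"VL={vl:3d} bytes: {rate:.4%}")
```

Even at the maximum 2048-bit vector length the ratio is 256/65536, which is under 0.4%, and it is smaller still for shorter vectors.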
Thanks. This is the sort of stuff that's missing from the public manual so far.
r~