Re: Flaw in "random32: update the net random state on interrupt and activity"

7 Aug 2020


      ...
On Aug 7, 2020, at 12:21 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
On Fri, Aug 7, 2020 at 12:08 PM Andy Lutomirski <luto@amacapital.net> wrote:
...
4 cycles per byte on Core 2
I took the reference C implementation as-is, and just compiled it with
-O2, so my numbers may not be what some heavily optimized case does.
But it was way more than that, even when amortizing for "only need to
do it every 8 cases". I think the 4 cycles/byte might be some "zero
branch mispredicts" case when you've fully unrolled the thing, but
then you'll be taking I$ misses out of the wazoo, since by definition
this won't be in your L1 I$ at all (only called every 8 times).
Sure, it might look ok on microbenchmarks where it does stay hot in the
cache all the time, but that's not realistic. I
No one said we have to do only one ChaCha20 block per slow path hit. In fact, the more we reduce the number of rounds, the larger the share of time we spend on I$ misses, branch mispredictions, etc., so reducing rounds may be barking up the wrong tree entirely. We probably don’t want to have more than one page
I wonder if AES-NI adds any value here.  AES-CTR is almost a drop-in replacement for ChaCha20, and maybe the performance for a cache-cold short run is better.
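
To make the "more than one ChaCha20 block per slow path hit" idea concrete, here is a rough userspace sketch, not kernel code and not a proposal for the actual patch. Everything named here (struct batched_rng, BLOCKS_PER_REFILL, the batched_rng_* helpers) is made up for illustration, and the batch size of 4 is an arbitrary assumption. It leaves out everything a real generator needs (reseeding, per-CPU state, erasing used output). The point is only that the cache-cold refill cost gets spread over 64 outputs instead of 16:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {                    \
        a += b; d ^= a; d = ROTL32(d, 16);     \
        c += d; b ^= c; b = ROTL32(b, 12);     \
        a += b; d ^= a; d = ROTL32(d, 8);      \
        c += d; b ^= c; b = ROTL32(b, 7);      \
} while (0)

/* Hypothetical batch size: blocks generated per slow-path hit. */
#define BLOCKS_PER_REFILL 4
#define BUF_WORDS (16 * BLOCKS_PER_REFILL)

struct batched_rng {
        uint32_t state[16];       /* constants, key, counter, nonce */
        uint32_t buf[BUF_WORDS];  /* output of the last refill */
        size_t pos;               /* next unread word; BUF_WORDS = empty */
        unsigned long refills;    /* number of slow-path hits so far */
};

/* One ChaCha block: `rounds` rounds, then the feed-forward addition. */
static void chacha_block(const uint32_t in[16], uint32_t out[16], int rounds)
{
        uint32_t x[16];
        int i;

        for (i = 0; i < 16; i++)
                x[i] = in[i];
        for (i = 0; i < rounds; i += 2) {
                /* column round */
                QR(x[0], x[4], x[8],  x[12]);
                QR(x[1], x[5], x[9],  x[13]);
                QR(x[2], x[6], x[10], x[14]);
                QR(x[3], x[7], x[11], x[15]);
                /* diagonal round */
                QR(x[0], x[5], x[10], x[15]);
                QR(x[1], x[6], x[11], x[12]);
                QR(x[2], x[7], x[8],  x[13]);
                QR(x[3], x[4], x[9],  x[14]);
        }
        for (i = 0; i < 16; i++)
                out[i] = x[i] + in[i];
}

static void batched_rng_init(struct batched_rng *r, const uint32_t key[8],
                             const uint32_t nonce[3], uint32_t counter)
{
        int i;

        r->state[0] = 0x61707865; r->state[1] = 0x3320646e;
        r->state[2] = 0x79622d32; r->state[3] = 0x6b206574;
        for (i = 0; i < 8; i++)
                r->state[4 + i] = key[i];
        r->state[12] = counter;
        for (i = 0; i < 3; i++)
                r->state[13 + i] = nonce[i];
        r->pos = BUF_WORDS;       /* force a refill on first use */
        r->refills = 0;
}

/* Slow path: generate several consecutive blocks in one go. */
static void batched_rng_refill(struct batched_rng *r)
{
        int b;

        for (b = 0; b < BLOCKS_PER_REFILL; b++) {
                chacha_block(r->state, r->buf + 16 * b, 20);
                r->state[12]++;   /* advance the block counter */
        }
        r->pos = 0;
        r->refills++;
}

/* Fast path: hand out one buffered word; refill only when empty. */
static uint32_t batched_rng_u32(struct batched_rng *r)
{
        if (r->pos == BUF_WORDS)
                batched_rng_refill(r);
        assert(r->pos < BUF_WORDS);
        return r->buf[r->pos++];
}
```

With these assumed numbers the slow path runs once per 64 calls to batched_rng_u32() and the buffer is 256 bytes. Whether a bigger batch actually wins depends on the cache-cold cost being argued about above, which this sketch does not measure.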
