Hi Christophe,
On Wed, Aug 28, 2024 at 4:38 PM LEROY Christophe christophe.leroy2@cs-soprasteria.com wrote:
key[0], key[1], key[2], key[3], key[4], key[5], key[6], key[7]
le32toh(key[0]), le32toh(key[1]), le32toh(key[2]), le32toh(key[3]),
le32toh(key[4]), le32toh(key[5]), le32toh(key[6]), le32toh(key[7])
Are you sure about that?
From drivers/char/random.c:
static void _get_random_bytes(void *buf, size_t len) { u32 chacha_state[CHACHA_STATE_WORDS]; ... crng_make_state(chacha_state, buf, first_block_len); ... while (len) { if (len < CHACHA_BLOCK_SIZE) { chacha20_block(chacha_state, tmp);
static void crng_make_state(u32 chacha_state[CHACHA_STATE_WORDS], u8 *random_data, size_t random_data_len) { ... crng_fast_key_erasure(base_crng.key, chacha_state, random_data, random_data_len); ... crng_fast_key_erasure(base_crng.key, chacha_state, crng->key, sizeof(crng->key));
static void crng_fast_key_erasure(u8 key[CHACHA_KEY_SIZE], u32 chacha_state[CHACHA_STATE_WORDS], u8 *random_data, size_t random_data_len) { u8 first_block[CHACHA_BLOCK_SIZE];
BUG_ON(random_data_len > 32);
chacha_init_consts(chacha_state); memcpy(&chacha_state[4], key, CHACHA_KEY_SIZE); memset(&chacha_state[12], 0, sizeof(u32) * 4); chacha20_block(chacha_state, first_block);
memcpy(key, first_block, CHACHA_KEY_SIZE); memcpy(random_data, first_block + CHACHA_KEY_SIZE, random_data_len); memzero_explicit(first_block, sizeof(first_block)); }
And then if we look at chacha20_block->chacha_block_generic in lib/crypto/chacha.c:
void chacha_block_generic(u32 *state, u8 *stream, int nrounds) { u32 x[16]; int i;
memcpy(x, state, 64);
chacha_permute(x, nrounds);
for (i = 0; i < ARRAY_SIZE(x); i++) put_unaligned_le32(x[i] + state[i], &stream[i * sizeof(u32)]);
state[12]++; }
So I don't see any endianness conversion happening with the key bytes. They're memcpy'd from rng output bytes directly into native endian u32 words.
You may have an objection to this. But the goal of the vDSO code is to match the kernel's algorithm 1:1 without deviations. To that end, I suspect this patch actually improves the unit test to ensure that.
With regards to your objection, though, if you feel strongly enough about it, I suppose that's something you could attempt to change throughout, with one commit that touches random.c and the vDSO code. I'm not sure whether or not I'd go along with that yet, but if it were to happen, I think that's the way to do it. For now, though, the goal is for the vDSO algorithm to copy the kernel algorithm.
Do you agree that this patch helps the vDSO algorithm copy the kernel algorithm better? Genuinely asking, because maybe I got it all backwards somehow.
Jason