From: Ard Biesheuvel <ardb@kernel.org>
The bit-sliced implementation of AES-CTR operates on blocks of 128 bytes, and will fall back to the plain NEON version for tail blocks or inputs that are shorter than 128 bytes to begin with.
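One plausible way to express that split in C is sketched below. It is a userspace model, not the kernel code: `split_ctr_request()` is a hypothetical name, and the sketch assumes the whole request is visible in a single walk step. The bit-sliced part is a multiple of eight 16-byte AES blocks (128 bytes), and whatever is left over goes to the plain NEON fallback.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE 16

/*
 * Hypothetical helper: how many bytes go to the 8-way bit-sliced code and
 * how many are left for the plain NEON fallback, assuming the whole
 * request is visible at once (a single skcipher walk step).
 */
static void split_ctr_request(size_t total, size_t *bitsliced, size_t *tail)
{
	/* Round the block count down to a multiple of 8 (128 bytes). */
	size_t blocks = (total / AES_BLOCK_SIZE) & ~(size_t)7;

	*bitsliced = blocks * AES_BLOCK_SIZE;   /* whole 128-byte chunks */
	*tail = total - *bitsliced;             /* 0..127 bytes, may be < 16 */
}
```

For a 300-byte request this yields 256 bytes for the bit-sliced code and a 44-byte tail. A tail shorter than one AES block can reach the fallback either on its own (a 5-byte request) or after whole 128-byte chunks (143 bytes leave 15), and those short tails are the inputs the patch bounces through a temporary buffer.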
The fallback calls straight into the plain NEON asm helper, which performs all memory accesses in granules of 16 bytes (the size of a NEON register). For this reason, the associated plain NEON glue code copies inputs shorter than 16 bytes into a temporary buffer: this is a rare occurrence, and it is not worth the effort to work around it in the asm code.
The fallback from the bit-sliced NEON version fails to take this into account, potentially resulting in out-of-bounds accesses. So clone the same workaround, and use a temp buffer for short in/outputs.
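The workaround can be modelled in userspace C. `mock_ctr_helper()` is a hypothetical stand-in for the asm helper: it XORs a caller-supplied keystream into the data (no real AES is performed) and only loads and stores whole 16-byte granules. For a partial final granule it uses the 16 bytes that end at the end of the data; that detail is an assumption, suggested by the patch placing short inputs at the end of `buf`, and is not spelled out in the message. `ctr_final_chunk()` mirrors the tail handling the patch adds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE 16

/*
 * Hypothetical stand-in for the plain NEON helper: XORs ks[] into the data
 * but moves data only in whole 16-byte granules. A partial final granule is
 * assumed to be the 16 bytes ENDING at src/dst + nbytes, so for nbytes < 16
 * it would start before the caller's buffers.
 */
static void mock_ctr_helper(uint8_t *dst, const uint8_t *src, size_t nbytes,
			    const uint8_t *ks)
{
	uint8_t in[AES_BLOCK_SIZE], out[AES_BLOCK_SIZE];
	size_t off, i;

	for (off = 0; off + AES_BLOCK_SIZE <= nbytes; off += AES_BLOCK_SIZE) {
		memcpy(in, src + off, AES_BLOCK_SIZE);      /* 16-byte load */
		for (i = 0; i < AES_BLOCK_SIZE; i++)
			out[i] = (uint8_t)(in[i] ^ ks[off + i]);
		memcpy(dst + off, out, AES_BLOCK_SIZE);     /* 16-byte store */
	}

	if (off < nbytes) {
		size_t tail = nbytes - off;                 /* 1..15 bytes */
		const uint8_t *s = src + nbytes - AES_BLOCK_SIZE;
		uint8_t *d = dst + nbytes - AES_BLOCK_SIZE;

		memcpy(in, s, AES_BLOCK_SIZE);              /* granule load */
		memcpy(out, d, AES_BLOCK_SIZE);             /* keep leading bytes */
		for (i = AES_BLOCK_SIZE - tail; i < AES_BLOCK_SIZE; i++)
			out[i] = (uint8_t)(in[i] ^
				 ks[off + i - (AES_BLOCK_SIZE - tail)]);
		memcpy(d, out, AES_BLOCK_SIZE);             /* granule store */
	}
}

/* Mirrors the tail handling the patch adds to the bit-sliced fallback. */
static void ctr_final_chunk(uint8_t *dst, const uint8_t *src, size_t nbytes,
			    const uint8_t *ks)
{
	uint8_t buf[AES_BLOCK_SIZE] = { 0 }; /* zeroed only for this sketch */
	uint8_t *d = dst;

	/* Put a short input at the END of buf so the granule stays inside it. */
	if (nbytes < AES_BLOCK_SIZE)
		src = dst = memcpy(buf + sizeof(buf) - nbytes, src, nbytes);

	mock_ctr_helper(dst, src, nbytes, ks);

	/* Copy the result back out to the caller's real destination. */
	if (nbytes < AES_BLOCK_SIZE)
		memcpy(d, buf + sizeof(buf) - nbytes, nbytes);
}
```

Called directly with `nbytes = 5`, `mock_ctr_helper()` would form a granule starting 11 bytes before the data; routing the call through `ctr_final_chunk()` keeps every access inside `buf`, while inputs of 16 bytes or more go straight to the helper as before.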
Cc: stable@vger.kernel.org
Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com
Tested-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/aes-neonbs-glue.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index bac4cabef607..849dc41320db 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -227,8 +227,19 @@ static int ctr_encrypt(struct skcipher_request *req)
 			src += blocks * AES_BLOCK_SIZE;
 		}
 		if (nbytes && walk.nbytes == walk.total) {
+			u8 buf[AES_BLOCK_SIZE];
+			u8 *d = dst;
+
+			if (unlikely(nbytes < AES_BLOCK_SIZE))
+				src = dst = memcpy(buf + sizeof(buf) - nbytes,
+						   src, nbytes);
+
 			neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds,
 					     nbytes, walk.iv);
+
+			if (unlikely(nbytes < AES_BLOCK_SIZE))
+				memcpy(d, buf + sizeof(buf) - nbytes, nbytes);
+
 			nbytes = 0;
 		}
 		kernel_neon_end();
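The choice of `buf + sizeof(buf) - nbytes`, rather than the start of `buf`, can be checked with simple offset arithmetic. Assuming the helper's final 16-byte granule ends at the last data byte (an assumption, not stated in the patch), the hypothetical `granule_start_in_buf()` below computes where that granule begins relative to `buf` for either placement.

```c
#include <stdbool.h>

#define AES_BLOCK_SIZE 16

/*
 * Offset, relative to the start of a 16-byte bounce buffer, at which a
 * 16-byte granule ending at the last data byte would begin (assumed helper
 * behaviour). A negative result means the access starts before the buffer.
 */
static long granule_start_in_buf(long nbytes, bool place_at_end)
{
	/* Where the copied data starts inside buf. */
	long data_off = place_at_end ? AES_BLOCK_SIZE - nbytes : 0;

	/* The granule ends at data_off + nbytes and is 16 bytes long. */
	return data_off + nbytes - AES_BLOCK_SIZE;
}
```

Placed at the end, the granule covers exactly `buf[0]` to `buf[15]` for every `nbytes` from 1 to 15. Placed at the start, it would begin `16 - nbytes` bytes before `buf` (11 bytes for `nbytes = 5`), recreating the original problem on the stack.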
On Sat, 17 Feb 2024 at 17:12, Ard Biesheuvel <ardb+git@google.com> wrote:
Ping?
On Thu, Feb 22, 2024 at 12:37:45AM +0100, Ard Biesheuvel wrote:
On Sat, 17 Feb 2024 at 17:12, Ard Biesheuvel <ardb+git@google.com> wrote:
Ping?
It's in my queue. Thanks.
On Sat, Feb 17, 2024 at 05:11:52PM +0100, Ard Biesheuvel wrote:
Looks like this could use:
Fixes: fc074e130051 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk")
if (unlikely(nbytes < AES_BLOCK_SIZE))
src = dst = memcpy(buf + sizeof(buf) - nbytes,
src, nbytes);
neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds, nbytes, walk.iv);
if (unlikely(nbytes < AES_BLOCK_SIZE))
memcpy(d, buf + sizeof(buf) - nbytes, nbytes);
The second one could use 'dst' instead of 'buf + sizeof(buf) - nbytes', right?
Otherwise this looks good.
Reviewed-by: Eric Biggers <ebiggers@google.com>
- Eric
On Thu, 22 Feb 2024 at 07:34, Eric Biggers <ebiggers@kernel.org> wrote:
On Sat, Feb 17, 2024 at 05:11:52PM +0100, Ard Biesheuvel wrote:
Looks like this could use:
Fixes: fc074e130051 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk")
Indeed.
The second one could use 'dst' instead of 'buf + sizeof(buf) - nbytes', right?
Correct.
I'll respin with these changes. Thanks.
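Eric's simplification, which Ard confirms above, works because `dst` still holds `buf + sizeof(buf) - nbytes` when the helper returns: C passes the pointer to the helper by value, so the call cannot move it. The userspace sketch below applies that change. `fake_helper()` is a hypothetical byte-wise stand-in (not the real 16-byte-granule helper), since only the pointer bookkeeping matters here, and `short_tail()` is likewise an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical stand-in for neon_aes_ctr_encrypt(): it receives its pointer
 * arguments by value, so it cannot change the caller's dst. */
static void fake_helper(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		dst[i] = (uint8_t)(src[i] ^ 0xA5);
}

/* Short-input path with the review suggestion applied: copy back from dst. */
static void short_tail(uint8_t *out, const uint8_t *in, size_t nbytes)
{
	uint8_t buf[AES_BLOCK_SIZE];
	uint8_t *d = out, *dst = out;
	const uint8_t *src = in;

	if (nbytes < AES_BLOCK_SIZE)
		src = dst = memcpy(buf + sizeof(buf) - nbytes, src, nbytes);

	fake_helper(dst, src, nbytes);

	/* Here dst == buf + sizeof(buf) - nbytes, so no need to recompute it. */
	if (nbytes < AES_BLOCK_SIZE)
		memcpy(d, dst, nbytes);
}
```

The final `memcpy(d, dst, nbytes)` reads the same bytes as the patch's `memcpy(d, buf + sizeof(buf) - nbytes, nbytes)`; it only avoids repeating the offset arithmetic.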
linux-stable-mirror@lists.linaro.org