The upstream changes necessary to fix these CVEs rely on the presence of JMP32, which is not a small backport and brings its own potential set of necessary follow-ups.
Daniel Borkmann, John Fastabend and Alexei Starovoitov came up with a fix involving the use of the AX register.
This has been tested against the test_verifier in 4.19.y tree and some tests specific to the two referred CVEs.
Daniel Borkmann (3): bpf: Do not use ax register in interpreter on div/mod bpf: Fix 32 bit src register truncation on div/mod bpf: Fix truncation handling for mod32 dst reg wrt zero
include/linux/filter.h | 24 ++++++++++++++++++++++++ kernel/bpf/core.c | 32 +++++++++++++++----------------- kernel/bpf/verifier.c | 27 ++++++++++++++------------- 3 files changed, 53 insertions(+), 30 deletions(-)
From: Daniel Borkmann daniel@iogearbox.net
Partially undo old commit 144cd91c4c2b ("bpf: move tmp variable into ax register in interpreter"). The reason we need this here is because ax register will be used for holding temporary state for div/mod instruction which otherwise interpreter would corrupt. This will cause a small +8 byte stack increase for interpreter, but with the gain that we can use it from verifier rewrites as scratch register.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com [cascardo: This partial revert is needed in order to support using AX for the following two commits, as there is no JMP32 on 4.19.y] Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@canonical.com --- kernel/bpf/core.c | 32 +++++++++++++++----------------- 1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 36be400c3e65..d2b6d2459aad 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -705,9 +705,6 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from, * below. * * Constant blinding is only used by JITs, not in the interpreter. - * The interpreter uses AX in some occasions as a local temporary - * register e.g. in DIV or MOD instructions. - * * In restricted circumstances, the verifier can also use the AX * register for rewrites as long as they do not interfere with * the above cases! @@ -1057,6 +1054,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) #undef BPF_INSN_3_LBL #undef BPF_INSN_2_LBL u32 tail_call_cnt = 0; + u64 tmp;
#define CONT ({ insn++; goto select_insn; }) #define CONT_JMP ({ insn++; goto select_insn; }) @@ -1117,36 +1115,36 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) (*(s64 *) &DST) >>= IMM; CONT; ALU64_MOD_X: - div64_u64_rem(DST, SRC, &AX); - DST = AX; + div64_u64_rem(DST, SRC, &tmp); + DST = tmp; CONT; ALU_MOD_X: - AX = (u32) DST; - DST = do_div(AX, (u32) SRC); + tmp = (u32) DST; + DST = do_div(tmp, (u32) SRC); CONT; ALU64_MOD_K: - div64_u64_rem(DST, IMM, &AX); - DST = AX; + div64_u64_rem(DST, IMM, &tmp); + DST = tmp; CONT; ALU_MOD_K: - AX = (u32) DST; - DST = do_div(AX, (u32) IMM); + tmp = (u32) DST; + DST = do_div(tmp, (u32) IMM); CONT; ALU64_DIV_X: DST = div64_u64(DST, SRC); CONT; ALU_DIV_X: - AX = (u32) DST; - do_div(AX, (u32) SRC); - DST = (u32) AX; + tmp = (u32) DST; + do_div(tmp, (u32) SRC); + DST = (u32) tmp; CONT; ALU64_DIV_K: DST = div64_u64(DST, IMM); CONT; ALU_DIV_K: - AX = (u32) DST; - do_div(AX, (u32) IMM); - DST = (u32) AX; + tmp = (u32) DST; + do_div(tmp, (u32) IMM); + DST = (u32) tmp; CONT; ALU_END_TO_BE: switch (IMM) {
From: Daniel Borkmann daniel@iogearbox.net
Commit e88b2c6e5a4d9ce30d75391e4d950da74bb2bd90 upstream.
While reviewing a different fix, John and I noticed an oddity in one of the BPF program dumps that stood out, for example:
# bpftool p d x i 13 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 [...]
In line 2 we noticed that the mov32 would 32 bit truncate the original src register for the div/mod operation. While for the two operations the dst register is typically marked unknown e.g. from adjust_scalar_min_max_vals() the src register is not, and thus verifier keeps tracking original bounds, simplified:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = -1 1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=invP-1 R1_w=invP-1 R10=fp0 2: (3c) w0 /= w1 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0 3: (77) r1 >>= 32 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0 4: (bf) r0 = r1 5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0 5: (95) exit processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Runtime result of r0 at exit is 0 instead of expected -1. Remove the verifier mov32 src rewrite in div/mod and replace it with a jmp32 test instead. After the fix, we result in the following code generation when having dividend r1 and divisor r6:
div, 64 bit: div, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2 3: (ac) w1 ^= w1 3: (ac) w1 ^= w1 4: (05) goto pc+1 4: (05) goto pc+1 5: (3f) r1 /= r6 5: (3c) w1 /= w6 6: (b7) r0 = 0 6: (b7) r0 = 0 7: (95) exit 7: (95) exit
mod, 64 bit: mod, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1 3: (9f) r1 %= r6 3: (9c) w1 %= w6 4: (b7) r0 = 0 4: (b7) r0 = 0 5: (95) exit 5: (95) exit
x86 in particular can throw a 'divide error' exception for div instruction not only for divisor being zero, but also for the case when the quotient is too large for the designated register. For the edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT since we always zero edx (rdx). Hence really the only protection needed is against divisor being zero.
Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero") Co-developed-by: John Fastabend john.fastabend@gmail.com Signed-off-by: John Fastabend john.fastabend@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net [Salvatore Bonaccorso: This is an earlier version of the patch provided by Daniel Borkmann which does not rely on availability of the BPF_JMP32 instruction class. This means it is not even strictly a backport of the upstream commit mentioned but based on Daniel's and John's work to address the issue.] Tested-by: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@canonical.com --- include/linux/filter.h | 24 ++++++++++++++++++++++++ kernel/bpf/verifier.c | 22 +++++++++++----------- 2 files changed, 35 insertions(+), 11 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h index 117f9380069a..7c84762cb59e 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -77,6 +77,14 @@ struct sock_reuseport;
/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+#define BPF_ALU_REG(CLASS, OP, DST, SRC) \ + ((struct bpf_insn) { \ + .code = CLASS | BPF_OP(OP) | BPF_X, \ + .dst_reg = DST, \ + .src_reg = SRC, \ + .off = 0, \ + .imm = 0 }) + #define BPF_ALU64_REG(OP, DST, SRC) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \ @@ -123,6 +131,14 @@ struct sock_reuseport;
/* Short form of mov, dst_reg = src_reg */
+#define BPF_MOV_REG(CLASS, DST, SRC) \ + ((struct bpf_insn) { \ + .code = CLASS | BPF_MOV | BPF_X, \ + .dst_reg = DST, \ + .src_reg = SRC, \ + .off = 0, \ + .imm = 0 }) + #define BPF_MOV64_REG(DST, SRC) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_MOV | BPF_X, \ @@ -157,6 +173,14 @@ struct sock_reuseport; .off = 0, \ .imm = IMM })
+#define BPF_RAW_REG(insn, DST, SRC) \ + ((struct bpf_insn) { \ + .code = (insn).code, \ + .dst_reg = DST, \ + .src_reg = SRC, \ + .off = (insn).off, \ + .imm = (insn).imm }) + /* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */ #define BPF_LD_IMM64(DST, IMM) \ BPF_LD_IMM64_RAW(DST, 0, IMM) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2bf83305e5ab..a346ecfe6241 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6177,28 +6177,28 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env) insn->code == (BPF_ALU | BPF_DIV | BPF_X)) { bool is64 = BPF_CLASS(insn->code) == BPF_ALU64; struct bpf_insn mask_and_div[] = { - BPF_MOV32_REG(insn->src_reg, insn->src_reg), + BPF_MOV_REG(BPF_CLASS(insn->code), BPF_REG_AX, insn->src_reg), /* Rx div 0 -> 0 */ - BPF_JMP_IMM(BPF_JNE, insn->src_reg, 0, 2), - BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, 2), + BPF_RAW_REG(*insn, insn->dst_reg, BPF_REG_AX), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - *insn, + BPF_ALU_REG(BPF_CLASS(insn->code), BPF_XOR, insn->dst_reg, insn->dst_reg), }; struct bpf_insn mask_and_mod[] = { - BPF_MOV32_REG(insn->src_reg, insn->src_reg), + BPF_MOV_REG(BPF_CLASS(insn->code), BPF_REG_AX, insn->src_reg), /* Rx mod 0 -> Rx */ - BPF_JMP_IMM(BPF_JEQ, insn->src_reg, 0, 1), - *insn, + BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, 1), + BPF_RAW_REG(*insn, insn->dst_reg, BPF_REG_AX), }; struct bpf_insn *patchlet;
if (insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) || insn->code == (BPF_ALU | BPF_DIV | BPF_X)) { - patchlet = mask_and_div + (is64 ? 1 : 0); - cnt = ARRAY_SIZE(mask_and_div) - (is64 ? 1 : 0); + patchlet = mask_and_div; + cnt = ARRAY_SIZE(mask_and_div); } else { - patchlet = mask_and_mod + (is64 ? 1 : 0); - cnt = ARRAY_SIZE(mask_and_mod) - (is64 ? 1 : 0); + patchlet = mask_and_mod; + cnt = ARRAY_SIZE(mask_and_mod); }
new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
From: Daniel Borkmann daniel@iogearbox.net
Commit 9b00f1b78809309163dda2d044d9e94a3c0248a3 upstream.
Recently noticed that when mod32 with a known src reg of 0 is performed, then the dst register is 32-bit truncated in verifier:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = 0 1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=inv0 R1_w=inv-1 R10=fp0 2: (b4) w2 = -1 3: R0_w=inv0 R1_w=inv-1 R2_w=inv4294967295 R10=fp0 3: (9c) w1 %= w0 4: R0_w=inv0 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 4: (b7) r0 = 1 5: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 5: (1d) if r1 == r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 6: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 6: (b7) r0 = 2 7: R0_w=inv2 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0 7: (95) exit 7: R0=inv1 R1=inv(id=0,umin_value=4294967295,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2=inv4294967295 R10=fp0 7: (95) exit
However, as a runtime result, we get 2 instead of 1, meaning the dst register does not contain (u32)-1 in this case. The reason is fairly straight forward given the 0 test leaves the dst register as-is:
# ./bpftool p d x i 23 0: (b7) r0 = 0 1: (b7) r1 = -1 2: (b4) w2 = -1 3: (16) if w0 == 0x0 goto pc+1 4: (9c) w1 %= w0 5: (b7) r0 = 1 6: (1d) if r1 == r2 goto pc+1 7: (b7) r0 = 2 8: (95) exit
This was originally not an issue given the dst register was marked as completely unknown (aka 64 bit unknown). However, after 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification") the verifier casts the register output to 32 bit, and hence it becomes 32 bit unknown. Note that for the case where the src register is unknown, the dst register is marked 64 bit unknown. After the fix, the register is truncated by the runtime and the test passes:
# ./bpftool p d x i 23 0: (b7) r0 = 0 1: (b7) r1 = -1 2: (b4) w2 = -1 3: (16) if w0 == 0x0 goto pc+2 4: (9c) w1 %= w0 5: (05) goto pc+1 6: (bc) w1 = w1 7: (b7) r0 = 1 8: (1d) if r1 == r2 goto pc+1 9: (b7) r0 = 2 10: (95) exit
Semantics also match with {R,W}x mod{64,32} 0 -> {R,W}x. Invalid div has always been {R,W}x div{64,32} 0 -> 0. Rewrites are as follows:
mod32: mod64:
(16) if w0 == 0x0 goto pc+2 (15) if r0 == 0x0 goto pc+1 (9c) w1 %= w0 (9f) r1 %= r0 (05) goto pc+1 (bc) w1 = w1
Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com [Salvatore Bonaccorso: This is an earlier version based on work by Daniel and John which does not rely on availability of the BPF_JMP32 instruction class. This means it is not even strictly a backport of the upstream commit mentioned but based on Daniel's and John's work to address the issue and was finalized by Thadeu Lima de Souza Cascardo.] Tested-by: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@canonical.com --- kernel/bpf/verifier.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a346ecfe6241..abdc9eca463c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6178,7 +6178,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env) bool is64 = BPF_CLASS(insn->code) == BPF_ALU64; struct bpf_insn mask_and_div[] = { BPF_MOV_REG(BPF_CLASS(insn->code), BPF_REG_AX, insn->src_reg), - /* Rx div 0 -> 0 */ + /* [R,W]x div 0 -> 0 */ BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, 2), BPF_RAW_REG(*insn, insn->dst_reg, BPF_REG_AX), BPF_JMP_IMM(BPF_JA, 0, 0, 1), @@ -6186,9 +6186,10 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env) }; struct bpf_insn mask_and_mod[] = { BPF_MOV_REG(BPF_CLASS(insn->code), BPF_REG_AX, insn->src_reg), - /* Rx mod 0 -> Rx */ - BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, 1), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, 1 + (is64 ? 0 : 1)), BPF_RAW_REG(*insn, insn->dst_reg, BPF_REG_AX), + BPF_JMP_IMM(BPF_JA, 0, 0, 1), + BPF_MOV32_REG(insn->dst_reg, insn->dst_reg), }; struct bpf_insn *patchlet;
@@ -6198,7 +6199,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env) cnt = ARRAY_SIZE(mask_and_div); } else { patchlet = mask_and_mod; - cnt = ARRAY_SIZE(mask_and_mod); + cnt = ARRAY_SIZE(mask_and_mod) - (is64 ? 2 : 0); }
new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
On Fri, Aug 27, 2021 at 10:55:30AM -0300, Thadeu Lima de Souza Cascardo wrote:
The upstream changes necessary to fix these CVEs rely on the presence of JMP32, which is not a small backport and brings its own potential set of necessary follow-ups.
Daniel Borkmann, John Fastabend and Alexei Starovoitov came up with a fix involving the use of the AX register.
This has been tested against the test_verifier in 4.19.y tree and some tests specific to the two referred CVEs.
THanks for these, now queued up!
greg k-h
linux-stable-mirror@lists.linaro.org