This series improves the expressiveness of unprivileged BPF by inserting speculation barriers instead of rejecting programs.
The approach was previously presented at LPC'24 [1] and RAID'24 [2].
To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects potentially-dangerous unprivileged BPF programs as of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"). In [2], we have analyzed 364 object files from open source projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf Examples, Parca, and Prevail) and found that this affects 31% to 54% of programs.
To resolve this in the majority of cases, this patchset adds a fall-back for mitigating Spectre v1 using speculation barriers. The kernel still optimistically attempts to verify all speculative paths but uses speculation barriers against v1 when unsafe behavior is detected. This allows more programs to be accepted without disabling the BPF Spectre mitigations (e.g., by setting cpu_mitigations_off()).
For this, it relies on the fact that speculation barriers prevent all later instructions from executing if the speculation was not correct (see the sketch after the list below):
* On x86_64, lfence acts as a full speculation barrier, not only as a load fence [3]:
An LFENCE instruction or a serializing instruction will ensure that no later instructions execute, even speculatively, until all prior instructions complete locally. [...] Inserting an LFENCE instruction after a bounds check prevents later operations from executing before the bound check completes.
This was experimentally confirmed in [4].
* ARM's SB speculation barrier instruction also affects "any instruction that appears later in the program order than the barrier" [5].
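To illustrate where such a barrier takes effect, here is a minimal sketch of the classic bounds-check-bypass gadget in kernel-style C (hypothetical function and array names, assuming <linux/types.h> and <asm/barrier.h>; barrier_nospec() is the same macro the BPF interpreter uses for ST_NOSPEC):

	u8 array1[16];
	u8 array2[256 * 512];

	u8 gadget(u64 idx, u64 array1_len)
	{
		u8 v = 0;

		if (idx < array1_len) {
			/* Without a barrier, the loads below may already execute
			 * speculatively while the bounds check is still pending,
			 * even if idx was mispredicted to be in bounds.
			 */
			barrier_nospec(); /* e.g., lfence on x86_64 */
			/* No later instruction executes, even speculatively,
			 * until the bounds check has completed.
			 */
			v = array1[idx];
			v = array2[v * 512]; /* would otherwise leak array1[idx] via the cache */
		}
		return v;
	}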
In [1], we measured the overhead of this approach (together with the upstream Spectre v4 mitigations) relative to having mitigations off. For event tracing and stack-sampling profilers, we found that the mitigations increase BPF program execution time by 0% to 62%. For the Loxilb network load balancer, we measured a 14% slowdown in SCTP performance but no significant slowdown for TCP. This overhead only applies to programs that were previously rejected.
I reran the expressiveness-evaluation with v6.14 and made sure the main results still match those from [1] and [2] (which used v6.5).
Main design decisions are:
* Do not use separate bytecode insns for v1 and v4 barriers. This simplifies the verifier significantly; the only downside is that performance on PowerPC is not as high as it could be.
* Allow archs to still disable v1/v4 mitigations separately by setting bpf_jit_bypass_spec_v1/v4(). This allows archs to benefit from improved BPF expressiveness / performance if they are not vulnerable (e.g., ARM64 for v4 in the kernel).
* Do not remove the empty BPF_NOSPEC implementation for backends for which it is unknown whether they are vulnerable to Spectre v1.
[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/softwar... ("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a tool to analyze speculative execution attacks and mitigations" - Section 4.6 "Stopping Speculative Execution")
[5] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB... ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set Architecture (2020-12)")
Changes:
* v1 -> v2:
  - Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and 11 ("bpf: Fall back to nospec for spec path verification") as suggested by Alexei. This series therefore no longer changes push_stack() to return PTR_ERR.
  - Add detailed explanation of how lfence works internally and how it affects the algorithm.
  - Add tests checking that nospec instructions are inserted in expected locations using __xlated_unpriv as suggested by Eduard (also, include a fix for __xlated_unpriv).
  - Add a test for the mitigations from the description of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches").
  - Remove unused variables from do_check[_insn]() as suggested by Eduard.
  - Remove INSN_IDX_MODIFIED to improve readability as suggested by Eduard. This also causes the nospec_result-check to run (and fail) for jumping-ops. Add a warning to assert that this check must never succeed in that case.
  - Add details on the safety of patch 10 ("bpf: Allow nospec-protected var-offset stack access") based on the feedback on v1.
  - Rebase to bpf-next-250420
  - Link to v1: https://lore.kernel.org/all/20250313172127.1098195-1-luis.gerhorst@fau.de/
* RFC -> v1:
  - Rebase to bpf-next-250313
  - Tests: mark expected successes/new errors
  - Add bpf_jit_bypass_spec_v1/v4() to avoid #ifdef in bpf_bypass_spec_v1/v4()
  - Ensure that nospec with v1-support is implemented for archs for which GCC supports speculation barriers, except for MIPS
  - arm64: emit speculation barrier
  - powerpc: change nospec to include v1 barrier
  - Discuss potential security (archs that do not impl. BPF nospec) and performance (only PowerPC) regressions
  - Link to RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/
Luis Gerhorst (11):
  selftests/bpf: Fix caps for __xlated/jited_unpriv
  bpf: Move insn if/else into do_check_insn()
  bpf: Return -EFAULT on misconfigurations
  bpf: Return -EFAULT on internal errors
  bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()
  bpf, arm64, powerpc: Change nospec to include v1 barrier
  bpf: Rename sanitize_stack_spill to nospec_result
  bpf: Fall back to nospec for Spectre v1
  selftests/bpf: Add test for Spectre v1 mitigation
  bpf: Allow nospec-protected var-offset stack access
  bpf: Fall back to nospec for sanitization-failures
 arch/arm64/net/bpf_jit.h                      |   5 +
 arch/arm64/net/bpf_jit_comp.c                 |  28 +-
 arch/powerpc/net/bpf_jit_comp64.c             |  79 ++-
 include/linux/bpf.h                           |  11 +-
 include/linux/bpf_verifier.h                  |   3 +-
 include/linux/filter.h                        |   2 +-
 kernel/bpf/core.c                             |  32 +-
 kernel/bpf/verifier.c                         | 648 ++++++++++--------
 tools/testing/selftests/bpf/progs/bpf_misc.h  |   4 +
 .../selftests/bpf/progs/verifier_and.c        |   8 +-
 .../selftests/bpf/progs/verifier_bounds.c     |  66 +-
 .../bpf/progs/verifier_bounds_deduction.c     |  45 +-
 .../selftests/bpf/progs/verifier_map_ptr.c    |  20 +-
 .../selftests/bpf/progs/verifier_movsx.c      |  16 +-
 .../selftests/bpf/progs/verifier_unpriv.c     |  65 +-
 .../bpf/progs/verifier_value_ptr_arith.c      | 101 ++-
 tools/testing/selftests/bpf/test_loader.c     |  14 +-
 .../selftests/bpf/verifier/dead_code.c        |   3 +-
 tools/testing/selftests/bpf/verifier/jmp32.c  |  33 +-
 tools/testing/selftests/bpf/verifier/jset.c   |  10 +-
 20 files changed, 765 insertions(+), 428 deletions(-)
base-commit: 8582d9ab3efdebb88e0cd8beed8e0b9de76443e7
Currently, __xlated_unpriv and __jited_unpriv do not work because the BPF syscall will overwrite info.jited_prog_len and info.xlated_prog_len with 0 if the process is not bpf_capable(). This bug was not noticed before, because there is no test that actually uses __xlated_unpriv/__jited_unpriv.
To resolve this, simply restore the capabilities earlier (but still after loading the program). Adding this here unconditionally is fine because the function first checks that the capabilities were initialized before attempting to restore them.
This will be important later when we add tests that check whether a speculation barrier was inserted in the correct location.
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Fixes: 9c9f73391310 ("selftests/bpf: allow checking xlated programs in verifier_* tests")
Fixes: 7d743e4c759c ("selftests/bpf: __jited test tag to check disassembly after jit")
---
 tools/testing/selftests/bpf/test_loader.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c index 49f2fc61061f..9551d8d5f8f9 100644 --- a/tools/testing/selftests/bpf/test_loader.c +++ b/tools/testing/selftests/bpf/test_loader.c @@ -1042,6 +1042,14 @@ void run_subtest(struct test_loader *tester, emit_verifier_log(tester->log_buf, false /*force*/); validate_msgs(tester->log_buf, &subspec->expect_msgs, emit_verifier_log);
+ /* Restore capabilities because the kernel will silently ignore requests + * for program info (such as xlated program text) if we are not + * bpf-capable. Also, for some reason test_verifier executes programs + * with all capabilities restored. Do the same here. + */ + if (restore_capabilities(&caps)) + goto tobj_cleanup; + if (subspec->expect_xlated.cnt) { err = get_xlated_program_text(bpf_program__fd(tprog), tester->log_buf, tester->log_buf_sz); @@ -1067,12 +1075,6 @@ void run_subtest(struct test_loader *tester, }
if (should_do_test_run(spec, subspec)) { - /* For some reason test_verifier executes programs - * with all capabilities restored. Do the same here. - */ - if (restore_capabilities(&caps)) - goto tobj_cleanup; - /* Do bpf_map__attach_struct_ops() for each struct_ops map. * This should trigger bpf_struct_ops->reg callback on kernel side. */
This is required so that the errors can later be caught and the verifier can fall back to a nospec if it is on a speculative path.
Eliminate the regs variable as it is only used once and insn_idx is not modified in-between the definition and usage.
Still pass insn simply to match the other check_*() functions. As Eduard points out [1], insn is assumed to correspond to env->insn_idx in many places (e.g., __check_reg_arg()).
Move code into do_check_insn(), replace
* "continue" with "return 0" after modifying insn_idx
* "goto process_bpf_exit" with "return PROCESS_BPF_EXIT"
* "do_print_state = " with "*do_print_state = "
[1] https://lore.kernel.org/all/293dbe3950a782b8eb3b87b71d7a967e120191fd.camel@g...
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 kernel/bpf/verifier.c | 425 ++++++++++++++++++++++--------------------
 1 file changed, 219 insertions(+), 206 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 54c6953a8b84..c4f197ca6c45 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -19389,20 +19389,218 @@ static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type typ return 0; }
+enum { + PROCESS_BPF_EXIT = 1 +}; + +static int do_check_insn(struct bpf_verifier_env *env, struct bpf_insn *insn, + bool *do_print_state) +{ + int err; + u8 class = BPF_CLASS(insn->code); + bool exception_exit = false; + + if (class == BPF_ALU || class == BPF_ALU64) { + err = check_alu_op(env, insn); + if (err) + return err; + + } else if (class == BPF_LDX) { + bool is_ldsx = BPF_MODE(insn->code) == BPF_MEMSX; + + /* Check for reserved fields is already done in + * resolve_pseudo_ldimm64(). + */ + err = check_load_mem(env, insn, false, is_ldsx, true, "ldx"); + if (err) + return err; + } else if (class == BPF_STX) { + if (BPF_MODE(insn->code) == BPF_ATOMIC) { + err = check_atomic(env, insn); + if (err) + return err; + env->insn_idx++; + return 0; + } + + if (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0) { + verbose(env, "BPF_STX uses reserved fields\n"); + return -EINVAL; + } + + err = check_store_reg(env, insn, false); + if (err) + return err; + } else if (class == BPF_ST) { + enum bpf_reg_type dst_reg_type; + + if (BPF_MODE(insn->code) != BPF_MEM || + insn->src_reg != BPF_REG_0) { + verbose(env, "BPF_ST uses reserved fields\n"); + return -EINVAL; + } + /* check src operand */ + err = check_reg_arg(env, insn->dst_reg, SRC_OP); + if (err) + return err; + + dst_reg_type = cur_regs(env)[insn->dst_reg].type; + + /* check that memory (dst_reg + off) is writeable */ + err = check_mem_access(env, env->insn_idx, insn->dst_reg, + insn->off, BPF_SIZE(insn->code), + BPF_WRITE, -1, false, false); + if (err) + return err; + + err = save_aux_ptr_type(env, dst_reg_type, false); + if (err) + return err; + } else if (class == BPF_JMP || class == BPF_JMP32) { + u8 opcode = BPF_OP(insn->code); + + env->jmps_processed++; + if (opcode == BPF_CALL) { + if (BPF_SRC(insn->code) != BPF_K || + (insn->src_reg != BPF_PSEUDO_KFUNC_CALL && + insn->off != 0) || + (insn->src_reg != BPF_REG_0 && + insn->src_reg != BPF_PSEUDO_CALL && + insn->src_reg != BPF_PSEUDO_KFUNC_CALL) || + insn->dst_reg != BPF_REG_0 || class == BPF_JMP32) { + verbose(env, "BPF_CALL uses reserved fields\n"); + return -EINVAL; + } + + if (env->cur_state->active_locks) { + if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) || + (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && + (insn->off != 0 || !kfunc_spin_allowed(insn->imm)))) { + verbose(env, + "function calls are not allowed while holding a lock\n"); + return -EINVAL; + } + } + if (insn->src_reg == BPF_PSEUDO_CALL) { + err = check_func_call(env, insn, &env->insn_idx); + } else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { + err = check_kfunc_call(env, insn, &env->insn_idx); + if (!err && is_bpf_throw_kfunc(insn)) { + exception_exit = true; + goto process_bpf_exit_full; + } + } else { + err = check_helper_call(env, insn, &env->insn_idx); + } + if (err) + return err; + + mark_reg_scratched(env, BPF_REG_0); + } else if (opcode == BPF_JA) { + if (BPF_SRC(insn->code) != BPF_K || + insn->src_reg != BPF_REG_0 || + insn->dst_reg != BPF_REG_0 || + (class == BPF_JMP && insn->imm != 0) || + (class == BPF_JMP32 && insn->off != 0)) { + verbose(env, "BPF_JA uses reserved fields\n"); + return -EINVAL; + } + + if (class == BPF_JMP) + env->insn_idx += insn->off + 1; + else + env->insn_idx += insn->imm + 1; + return 0; + } else if (opcode == BPF_EXIT) { + if (BPF_SRC(insn->code) != BPF_K || + insn->imm != 0 || + insn->src_reg != BPF_REG_0 || + insn->dst_reg != BPF_REG_0 || + class == BPF_JMP32) { + verbose(env, "BPF_EXIT uses reserved fields\n"); + return -EINVAL; + } +process_bpf_exit_full: + 
/* We must do check_reference_leak here before + * prepare_func_exit to handle the case when + * state->curframe > 0, it may be a callback function, + * for which reference_state must match caller reference + * state when it exits. + */ + err = check_resource_leak(env, exception_exit, !env->cur_state->curframe, + "BPF_EXIT instruction in main prog"); + if (err) + return err; + + /* The side effect of the prepare_func_exit which is + * being skipped is that it frees bpf_func_state. + * Typically, process_bpf_exit will only be hit with + * outermost exit. copy_verifier_state in pop_stack will + * handle freeing of any extra bpf_func_state left over + * from not processing all nested function exits. We + * also skip return code checks as they are not needed + * for exceptional exits. + */ + if (exception_exit) + return PROCESS_BPF_EXIT; + + if (env->cur_state->curframe) { + /* exit from nested function */ + err = prepare_func_exit(env, &env->insn_idx); + if (err) + return err; + *do_print_state = true; + return 0; + } + + err = check_return_code(env, BPF_REG_0, "R0"); + if (err) + return err; + return PROCESS_BPF_EXIT; + } else { + err = check_cond_jmp_op(env, insn, &env->insn_idx); + if (err) + return err; + } + } else if (class == BPF_LD) { + u8 mode = BPF_MODE(insn->code); + + if (mode == BPF_ABS || mode == BPF_IND) { + err = check_ld_abs(env, insn); + if (err) + return err; + + } else if (mode == BPF_IMM) { + err = check_ld_imm(env, insn); + if (err) + return err; + + env->insn_idx++; + sanitize_mark_insn_seen(env); + } else { + verbose(env, "invalid BPF_LD mode\n"); + return -EINVAL; + } + } else { + verbose(env, "unknown insn class %d\n", class); + return -EINVAL; + } + + env->insn_idx++; + return 0; +} + static int do_check(struct bpf_verifier_env *env) { bool pop_log = !(env->log.level & BPF_LOG_LEVEL2); struct bpf_verifier_state *state = env->cur_state; struct bpf_insn *insns = env->prog->insnsi; - struct bpf_reg_state *regs; int insn_cnt = env->prog->len; bool do_print_state = false; int prev_insn_idx = -1;
for (;;) { - bool exception_exit = false; struct bpf_insn *insn; - u8 class; int err;
/* reset current history entry on each new instruction */ @@ -19416,7 +19614,6 @@ static int do_check(struct bpf_verifier_env *env) }
insn = &insns[env->insn_idx]; - class = BPF_CLASS(insn->code);
if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) { verbose(env, @@ -19486,216 +19683,32 @@ static int do_check(struct bpf_verifier_env *env) return err; }
- regs = cur_regs(env); sanitize_mark_insn_seen(env); prev_insn_idx = env->insn_idx;
- if (class == BPF_ALU || class == BPF_ALU64) { - err = check_alu_op(env, insn); - if (err) - return err; - - } else if (class == BPF_LDX) { - bool is_ldsx = BPF_MODE(insn->code) == BPF_MEMSX; - - /* Check for reserved fields is already done in - * resolve_pseudo_ldimm64(). - */ - err = check_load_mem(env, insn, false, is_ldsx, true, - "ldx"); - if (err) - return err; - } else if (class == BPF_STX) { - if (BPF_MODE(insn->code) == BPF_ATOMIC) { - err = check_atomic(env, insn); - if (err) - return err; - env->insn_idx++; - continue; - } - - if (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0) { - verbose(env, "BPF_STX uses reserved fields\n"); - return -EINVAL; - } - - err = check_store_reg(env, insn, false); - if (err) - return err; - } else if (class == BPF_ST) { - enum bpf_reg_type dst_reg_type; - - if (BPF_MODE(insn->code) != BPF_MEM || - insn->src_reg != BPF_REG_0) { - verbose(env, "BPF_ST uses reserved fields\n"); - return -EINVAL; - } - /* check src operand */ - err = check_reg_arg(env, insn->dst_reg, SRC_OP); - if (err) - return err; - - dst_reg_type = regs[insn->dst_reg].type; - - /* check that memory (dst_reg + off) is writeable */ - err = check_mem_access(env, env->insn_idx, insn->dst_reg, - insn->off, BPF_SIZE(insn->code), - BPF_WRITE, -1, false, false); - if (err) - return err; - - err = save_aux_ptr_type(env, dst_reg_type, false); - if (err) - return err; - } else if (class == BPF_JMP || class == BPF_JMP32) { - u8 opcode = BPF_OP(insn->code); - - env->jmps_processed++; - if (opcode == BPF_CALL) { - if (BPF_SRC(insn->code) != BPF_K || - (insn->src_reg != BPF_PSEUDO_KFUNC_CALL - && insn->off != 0) || - (insn->src_reg != BPF_REG_0 && - insn->src_reg != BPF_PSEUDO_CALL && - insn->src_reg != BPF_PSEUDO_KFUNC_CALL) || - insn->dst_reg != BPF_REG_0 || - class == BPF_JMP32) { - verbose(env, "BPF_CALL uses reserved fields\n"); - return -EINVAL; - } - - if (env->cur_state->active_locks) { - if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) || - (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && - (insn->off != 0 || !kfunc_spin_allowed(insn->imm)))) { - verbose(env, "function calls are not allowed while holding a lock\n"); - return -EINVAL; - } - } - if (insn->src_reg == BPF_PSEUDO_CALL) { - err = check_func_call(env, insn, &env->insn_idx); - } else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { - err = check_kfunc_call(env, insn, &env->insn_idx); - if (!err && is_bpf_throw_kfunc(insn)) { - exception_exit = true; - goto process_bpf_exit_full; - } - } else { - err = check_helper_call(env, insn, &env->insn_idx); - } - if (err) - return err; - - mark_reg_scratched(env, BPF_REG_0); - } else if (opcode == BPF_JA) { - if (BPF_SRC(insn->code) != BPF_K || - insn->src_reg != BPF_REG_0 || - insn->dst_reg != BPF_REG_0 || - (class == BPF_JMP && insn->imm != 0) || - (class == BPF_JMP32 && insn->off != 0)) { - verbose(env, "BPF_JA uses reserved fields\n"); - return -EINVAL; - } - - if (class == BPF_JMP) - env->insn_idx += insn->off + 1; - else - env->insn_idx += insn->imm + 1; - continue; - - } else if (opcode == BPF_EXIT) { - if (BPF_SRC(insn->code) != BPF_K || - insn->imm != 0 || - insn->src_reg != BPF_REG_0 || - insn->dst_reg != BPF_REG_0 || - class == BPF_JMP32) { - verbose(env, "BPF_EXIT uses reserved fields\n"); - return -EINVAL; - } -process_bpf_exit_full: - /* We must do check_reference_leak here before - * prepare_func_exit to handle the case when - * state->curframe > 0, it may be a callback - * function, for which reference_state must - * match caller reference state when it 
exits. - */ - err = check_resource_leak(env, exception_exit, !env->cur_state->curframe, - "BPF_EXIT instruction in main prog"); - if (err) - return err; - - /* The side effect of the prepare_func_exit - * which is being skipped is that it frees - * bpf_func_state. Typically, process_bpf_exit - * will only be hit with outermost exit. - * copy_verifier_state in pop_stack will handle - * freeing of any extra bpf_func_state left over - * from not processing all nested function - * exits. We also skip return code checks as - * they are not needed for exceptional exits. - */ - if (exception_exit) - goto process_bpf_exit; - - if (state->curframe) { - /* exit from nested function */ - err = prepare_func_exit(env, &env->insn_idx); - if (err) - return err; - do_print_state = true; - continue; - } - - err = check_return_code(env, BPF_REG_0, "R0"); - if (err) - return err; + err = do_check_insn(env, insn, &do_print_state); + if (err < 0) { + return err; + } else if (err == PROCESS_BPF_EXIT) { process_bpf_exit: - mark_verifier_state_scratched(env); - update_branch_counts(env, env->cur_state); - err = pop_stack(env, &prev_insn_idx, - &env->insn_idx, pop_log); - if (err < 0) { - if (err != -ENOENT) - return err; - break; - } else { - if (WARN_ON_ONCE(env->cur_state->loop_entry)) { - verbose(env, "verifier bug: env->cur_state->loop_entry != NULL\n"); - return -EFAULT; - } - do_print_state = true; - continue; - } - } else { - err = check_cond_jmp_op(env, insn, &env->insn_idx); - if (err) - return err; - } - } else if (class == BPF_LD) { - u8 mode = BPF_MODE(insn->code); - - if (mode == BPF_ABS || mode == BPF_IND) { - err = check_ld_abs(env, insn); - if (err) - return err; - - } else if (mode == BPF_IMM) { - err = check_ld_imm(env, insn); - if (err) + mark_verifier_state_scratched(env); + update_branch_counts(env, env->cur_state); + err = pop_stack(env, &prev_insn_idx, &env->insn_idx, + pop_log); + if (err < 0) { + if (err != -ENOENT) return err; - - env->insn_idx++; - sanitize_mark_insn_seen(env); + break; } else { - verbose(env, "invalid BPF_LD mode\n"); - return -EINVAL; + if (WARN_ON_ONCE(env->cur_state->loop_entry)) { + verbose(env, "verifier bug: env->cur_state->loop_entry != NULL\n"); + return -EFAULT; + } + do_print_state = true; + continue; } - } else { - verbose(env, "unknown insn class %d\n", class); - return -EINVAL; } - - env->insn_idx++; + WARN_ON_ONCE(err); }
return 0;
Mark these cases as non-recoverable to prevent them from later being caught when they occur during speculative path verification.
Eduard writes [1]:
The only place I'm aware of that might act upon specific error code from verifier syscall is libbpf. Looking through libbpf code, it seems that this change does not interfere with libbpf.
[1] https://lore.kernel.org/all/785b4531ce3b44a84059a4feb4ba458c68fce719.camel@g...
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 kernel/bpf/verifier.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c4f197ca6c45..55c1d7ada098 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -8965,7 +8965,7 @@ static int resolve_map_arg_type(struct bpf_verifier_env *env, if (!meta->map_ptr) { /* kernel subsystem misconfigured verifier */ verbose(env, "invalid map_ptr to access map->type\n"); - return -EACCES; + return -EFAULT; }
switch (meta->map_ptr->map_type) { @@ -9653,7 +9653,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, * that kernel subsystem misconfigured verifier */ verbose(env, "invalid map_ptr to access map->key\n"); - return -EACCES; + return -EFAULT; } key_size = meta->map_ptr->key_size; err = check_helper_mem_access(env, regno, key_size, BPF_READ, false, NULL); @@ -9680,7 +9680,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, if (!meta->map_ptr) { /* kernel subsystem misconfigured verifier */ verbose(env, "invalid map_ptr to access map->value\n"); - return -EACCES; + return -EFAULT; } meta->raw_mode = arg_type & MEM_UNINIT; err = check_helper_mem_access(env, regno, meta->map_ptr->value_size, @@ -10979,7 +10979,7 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
if (map == NULL) { verbose(env, "kernel subsystem misconfigured verifier\n"); - return -EINVAL; + return -EFAULT; }
/* In case of read-only, some additional restrictions @@ -11018,7 +11018,7 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta, return 0; if (!map || map->map_type != BPF_MAP_TYPE_PROG_ARRAY) { verbose(env, "kernel subsystem misconfigured verifier\n"); - return -EINVAL; + return -EFAULT; }
reg = ®s[BPF_REG_3]; @@ -11272,7 +11272,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn if (changes_data && fn->arg1_type != ARG_PTR_TO_CTX) { verbose(env, "kernel subsystem misconfigured func %s#%d: r1 != ctx\n", func_id_name(func_id), func_id); - return -EINVAL; + return -EFAULT; }
memset(&meta, 0, sizeof(meta)); @@ -11574,7 +11574,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn if (meta.map_ptr == NULL) { verbose(env, "kernel subsystem misconfigured verifier\n"); - return -EINVAL; + return -EFAULT; }
if (func_id == BPF_FUNC_map_lookup_elem && @@ -16697,7 +16697,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn) dst_reg->type = CONST_PTR_TO_MAP; } else { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; }
return 0; @@ -16744,7 +16744,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
if (!env->ops->gen_ld_abs) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; }
if (insn->dst_reg != BPF_REG_0 || insn->off != 0 || @@ -20781,7 +20781,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) -(subprogs[0].stack_depth + 8)); if (epilogue_cnt >= INSN_BUF_SIZE) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; } else if (epilogue_cnt) { /* Save the ARG_PTR_TO_CTX for the epilogue to use */ cnt = 0; @@ -20804,13 +20804,13 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) if (ops->gen_prologue || env->seen_direct_write) { if (!ops->gen_prologue) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; } cnt = ops->gen_prologue(insn_buf, env->seen_direct_write, env->prog); if (cnt >= INSN_BUF_SIZE) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; } else if (cnt) { new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt); if (!new_prog) @@ -20967,7 +20967,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
if (type == BPF_WRITE) { verbose(env, "bpf verifier narrow ctx access misconfigured\n"); - return -EINVAL; + return -EFAULT; }
size_code = BPF_H; @@ -20986,7 +20986,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) if (cnt == 0 || cnt >= INSN_BUF_SIZE || (ctx_field_size && !target_size)) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; }
if (is_narrower_load && size < target_size) { @@ -20994,7 +20994,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) off, size, size_default) * 8; if (shift && cnt + 1 >= INSN_BUF_SIZE) { verbose(env, "bpf verifier narrow ctx load misconfigured\n"); - return -EINVAL; + return -EFAULT; } if (ctx_field_size <= 4) { if (shift) @@ -21757,7 +21757,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env) cnt = env->ops->gen_ld_abs(insn, insn_buf); if (cnt == 0 || cnt >= INSN_BUF_SIZE) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; }
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); @@ -22093,7 +22093,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env) goto patch_map_ops_generic; if (cnt <= 0 || cnt >= INSN_BUF_SIZE) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; }
new_prog = bpf_patch_insn_data(env, i + delta, @@ -22453,7 +22453,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env) !map_ptr->ops->map_poke_untrack || !map_ptr->ops->map_poke_run) { verbose(env, "bpf verifier is misconfigured\n"); - return -EINVAL; + return -EFAULT; }
ret = map_ptr->ops->map_poke_track(map_ptr, prog->aux);
This prevents us from trying to recover from these on speculative paths in the future.
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 kernel/bpf/verifier.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 55c1d7ada098..27d3bc97a9e0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -11667,7 +11667,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn verbose(env, "verifier internal error:"); verbose(env, "func %s has non-overwritten BPF_PTR_POISON return type\n", func_id_name(func_id)); - return -EINVAL; + return -EFAULT; } ret_btf = btf_vmlinux; ret_btf_id = *fn->ret_btf_id; @@ -15261,12 +15261,12 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env, if (WARN_ON_ONCE(ptr_reg)) { print_verifier_state(env, vstate, vstate->curframe, true); verbose(env, "verifier internal error: unexpected ptr_reg\n"); - return -EINVAL; + return -EFAULT; } if (WARN_ON(!src_reg)) { print_verifier_state(env, vstate, vstate->curframe, true); verbose(env, "verifier internal error: no src_reg\n"); - return -EINVAL; + return -EFAULT; } err = adjust_scalar_min_max_vals(env, insn, dst_reg, *src_reg); if (err)
JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to skip analysis/patching for the respective vulnerability. For v4, this will reduce the number of barriers the verifier inserts. For v1, it allows more programs to be accepted.
The primary motivation for this is to not regress unpriv BPF's performance on ARM64 in a future commit where BPF_NOSPEC is also used against Spectre v1.
This has the user-visible change that v1-induced rejections on non-vulnerable PowerPC CPUs are avoided.
For now, this does not change the semantics of BPF_NOSPEC. It is still a v4-only barrier and must not be implemented if bypass_spec_v4 is always true for the arch. Changing it to a v1 AND v4-barrier is done in a future commit.
As an alternative to bypass_spec_v1/v4, one could introduce NOSPEC_V1 and NOSPEC_V4 instructions and allow backends to skip their lowering, as suggested by commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4"). Adding bpf_jit_bypass_spec_v1/v4() was found to be preferable for the following reasons:
* bypass_spec_v1/v4 benefits non-vulnerable CPUs: Always performing the same analysis (not taking into account whether the current CPU is vulnerable), needlessly restricts users of CPUs that are not vulnerable. The only use case for this would be portability-testing, but this can later be added easily when needed by allowing users to force bypass_spec_v1/v4 to false.
* Portability is still acceptable: Directly disabling the analysis instead of skipping the lowering of BPF_NOSPEC(_V1/V4) might allow programs on non-vulnerable CPUs to be accepted while the program will be rejected on vulnerable CPUs. With the fallback to speculation barriers for Spectre v1 implemented in a future commit, this will only affect programs that do variable stack-accesses or are very complex.
For PowerPC, the SEC_FTR checking in bpf_jit_bypass_spec_v4() is based on the check that was previously located in the BPF_NOSPEC case.
For LoongArch, it would likely be safe to set both bpf_jit_bypass_spec_v1() and _v4() according to commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode"). This is omitted here as I am unable to do any testing for LoongArch.
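As a usage sketch, a backend that knows its CPUs are not vulnerable could opt out of the verifier's v1 analysis and patching by overriding the weak default (hypothetical arch, illustration only):

	/* arch/foo/net/bpf_jit_comp.c (hypothetical) */
	bool bpf_jit_bypass_spec_v1(void)
	{
		/* CPUs of this arch do not speculate past bounds checks, so
		 * the verifier may skip Spectre v1 analysis and patching.
		 */
		return true;
	}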
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 arch/arm64/net/bpf_jit_comp.c     | 21 ++++++++++++---------
 arch/powerpc/net/bpf_jit_comp64.c | 21 +++++++++++++++++----
 include/linux/bpf.h               | 11 +++++++++--
 kernel/bpf/core.c                 | 15 +++++++++++++++
 4 files changed, 53 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 70d7c89d3ac9..0f617b55866e 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -1583,15 +1583,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
/* speculation barrier */ case BPF_ST | BPF_NOSPEC: - /* - * Nothing required here. - * - * In case of arm64, we rely on the firmware mitigation of - * Speculative Store Bypass as controlled via the ssbd kernel - * parameter. Whenever the mitigation is enabled, it works - * for all of the kernel code with no need to provide any - * additional instructions. - */ + /* See bpf_jit_bypass_spec_v4() */ break;
/* ST: *(size *)(dst + off) = imm */ @@ -2762,6 +2754,17 @@ bool bpf_jit_supports_percpu_insn(void) return true; }
+bool bpf_jit_bypass_spec_v4(void) +{ + /* In case of arm64, we rely on the firmware mitigation of Speculative + * Store Bypass as controlled via the ssbd kernel parameter. Whenever + * the mitigation is enabled, it works for all of the kernel code with + * no need to provide any additional instructions. Therefore, skip + * inserting nospec insns against Spectre v4. + */ + return true; +} + bool bpf_jit_inlines_helper_call(s32 imm) { switch (imm) { diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 233703b06d7c..b5339c541283 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -363,6 +363,23 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o return 0; }
+bool bpf_jit_bypass_spec_v1(void) +{ +#if defined(CONFIG_PPC_E500) || defined(CONFIG_PPC_BOOK3S_64) + return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)); +#else + return true; +#endif +} + +bool bpf_jit_bypass_spec_v4(void) +{ + return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_STF_BARRIER) && + stf_barrier_type_get() != STF_BARRIER_NONE); +} + /* * We spill into the redzone always, even if the bpf program has its own stackframe. * Offsets hardcoded based on BPF_PPC_STACK_SAVE -- see bpf_jit_stack_local() @@ -785,10 +802,6 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code * BPF_ST NOSPEC (speculation barrier) */ case BPF_ST | BPF_NOSPEC: - if (!security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) || - !security_ftr_enabled(SEC_FTR_STF_BARRIER)) - break; - switch (stf_barrier) { case STF_BARRIER_EIEIO: EMIT(PPC_RAW_EIEIO() | 0x02000000); diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3f0cc89c0622..2632fbf24654 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2453,14 +2453,21 @@ static inline bool bpf_allow_uninit_stack(const struct bpf_token *token) return bpf_token_capable(token, CAP_PERFMON); }
+bool bpf_jit_bypass_spec_v1(void); +bool bpf_jit_bypass_spec_v4(void); + static inline bool bpf_bypass_spec_v1(const struct bpf_token *token) { - return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON); + return bpf_jit_bypass_spec_v1() || + cpu_mitigations_off() || + bpf_token_capable(token, CAP_PERFMON); }
static inline bool bpf_bypass_spec_v4(const struct bpf_token *token) { - return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON); + return bpf_jit_bypass_spec_v4() || + cpu_mitigations_off() || + bpf_token_capable(token, CAP_PERFMON); }
int bpf_map_new_fd(struct bpf_map *map, int flags); diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ba6b6118cf50..804f1e52bfa3 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -3029,6 +3029,21 @@ bool __weak bpf_jit_needs_zext(void) return false; }
+/* By default, enable the verifier's mitigations against Spectre v1 and v4 for + * all archs. The value returned must not change at runtime as there is + * currently no support for reloading programs that were loaded without + * mitigations. + */ +bool __weak bpf_jit_bypass_spec_v1(void) +{ + return false; +} + +bool __weak bpf_jit_bypass_spec_v4(void) +{ + return false; +} + /* Return true if the JIT inlines the call to the helper corresponding to * the imm. *
This changes the semantics of BPF_NOSPEC (previously a v4-only barrier) to always emit a speculation barrier that works against both Spectre v1 AND v4. If mitigation is not needed on an architecture, the backend should set bpf_jit_bypass_spec_v4/v1().
As of now, this commit only has the user-visible implication that unpriv BPF's performance on PowerPC is reduced. This is the case because we have to emit additional v1 barrier instructions for BPF_NOSPEC now.
This commit is required for a future commit to allow us to rely on BPF_NOSPEC for Spectre v1 mitigation. As of this commit, the feature that nospec acts as a v1 barrier is unused.
Commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4") noted that mitigation instructions for v1 and v4 might be different on some archs. While this would potentially offer improved performance on PowerPC, it was dismissed after the following considerations:
* Only having one barrier simplifies the verifier and allows us to easily rely on v4-induced barriers for reducing the complexity of v1-induced speculative path verification.
* For the architectures that implemented BPF_NOSPEC, only PowerPC has distinct instructions for v1 and v4. Even there, some insns may be shared between the barriers for v1 and v4 (e.g., 'ori 31,31,0' and 'sync'). If this is still found to impact performance in an unacceptable way, BPF_NOSPEC can be split into BPF_NOSPEC_V1 and BPF_NOSPEC_V4 later. As an optimization, we can already skip v1/v4 insns from being emitted for PowerPC with this setup if bypass_spec_v1/v4 is set.
Vulnerability-status for BPF_NOSPEC-based Spectre mitigations (v4 as of this commit, v1 in the future) is therefore:
* x86 (32-bit and 64-bit), ARM64, and PowerPC (64-bit): Mitigated - This patch implements BPF_NOSPEC for these architectures. The previous v4-only version was supported since commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4") and commit b7540d625094 ("powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC").
* LoongArch: Not Vulnerable - Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode") is the only other past commit related to BPF_NOSPEC and indicates that the insn is not required there.
* MIPS: Vulnerable (if unprivileged BPF is enabled) - Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode") indicates that it is not vulnerable but this contradicts the kernel and Debian documentation. Therefore I assume that there exist vulnerable MIPS CPUs (but maybe not from Loongson?). In the future, BPF_NOSPEC could be implemented for MIPS based on the GCC speculation_barrier [1]. For now, we rely on unprivileged BPF being disabled by default.
* Other: Unknown - To the best of my knowledge there is no definitive information available that indicates that any other arch is vulnerable. They are therefore left untouched (BPF_NOSPEC is not implemented, but bypass_spec_v1/v4 is also not set).
I did the following testing to ensure the insn encoding is correct:
* ARM64:
  * 'dsb nsh; isb' was successfully tested with the BPF CI in [2]
  * 'sb' locally using QEMU v7.2.15 -cpu max (emitted sb insn is executed for example with './test_progs -t verifier_array_access')
* PowerPC: The following configs were tested locally with ppc64le QEMU v8.2 '-machine pseries -cpu POWER9':
  * STF_BARRIER_EIEIO + CONFIG_PPC_BOOK3S_64
  * STF_BARRIER_SYNC_ORI (forced on) + CONFIG_PPC_BOOK3S_64
  * STF_BARRIER_FALLBACK (forced on) + CONFIG_PPC_BOOK3S_64
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_EIEIO
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_SYNC_ORI (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_FALLBACK (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_NONE (forced on)
  Most of these combinations should not occur in practice, but I was not able to get a PPC e6500 rootfs (for testing PPC_E500 without forcing it on). In any case, this should ensure that there are no unexpected conflicts between the insns when combined like this. Individual v1/v4 barriers were already emitted elsewhere.
[1] https://gcc.gnu.org/git/?p=gcc.git%3Ba=commit%3Bh=29b74545531f6afbee9fc38c26... ("MIPS: Add speculation_barrier support")
[2] https://github.com/kernel-patches/bpf/pull/8576
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 arch/arm64/net/bpf_jit.h          |  5 +++
 arch/arm64/net/bpf_jit_comp.c     |  9 +++--
 arch/powerpc/net/bpf_jit_comp64.c | 58 ++++++++++++++++++++++---------
 include/linux/filter.h            |  2 +-
 kernel/bpf/core.c                 | 17 ++++-----
 5 files changed, 64 insertions(+), 27 deletions(-)
diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h index a3b0e693a125..bbea4f36f9f2 100644 --- a/arch/arm64/net/bpf_jit.h +++ b/arch/arm64/net/bpf_jit.h @@ -325,4 +325,9 @@ #define A64_MRS_SP_EL0(Rt) \ aarch64_insn_gen_mrs(Rt, AARCH64_INSN_SYSREG_SP_EL0)
+/* Barriers */ +#define A64_SB aarch64_insn_get_sb_value() +#define A64_DSB_NSH (aarch64_insn_get_dsb_base_value() | 0x7 << 8) +#define A64_ISB aarch64_insn_get_isb_value() + #endif /* _BPF_JIT_H */ diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 0f617b55866e..ccd6a2f31e35 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -1581,9 +1581,14 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, return ret; break;
- /* speculation barrier */ + /* speculation barrier against v1 and v4 */ case BPF_ST | BPF_NOSPEC: - /* See bpf_jit_bypass_spec_v4() */ + if (alternative_has_cap_likely(ARM64_HAS_SB)) { + emit(A64_SB, ctx); + } else { + emit(A64_DSB_NSH, ctx); + emit(A64_ISB, ctx); + } break;
/* ST: *(size *)(dst + off) = imm */ diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index b5339c541283..c00951e2a50e 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -800,26 +800,52 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
/* * BPF_ST NOSPEC (speculation barrier) + * + * The following must act as a barrier against both Spectre v1 + * and v4 if we requested both mitigations. Therefore, also emit + * 'isync; sync' on E500 or 'ori31' on BOOK3S_64 in addition to + * the insns needed for a Spectre v4 barrier. + * + * If we requested only !bypass_spec_v1 OR only !bypass_spec_v4, + * we can skip the respective other barrier type as an + * optimization. */ case BPF_ST | BPF_NOSPEC: - switch (stf_barrier) { - case STF_BARRIER_EIEIO: - EMIT(PPC_RAW_EIEIO() | 0x02000000); - break; - case STF_BARRIER_SYNC_ORI: + bool sync_emitted = false; + bool ori31_emitted = false; +#ifdef CONFIG_PPC_E500 + if (!bpf_jit_bypass_spec_v1()) { + EMIT(PPC_RAW_ISYNC()); EMIT(PPC_RAW_SYNC()); - EMIT(PPC_RAW_LD(tmp1_reg, _R13, 0)); - EMIT(PPC_RAW_ORI(_R31, _R31, 0)); - break; - case STF_BARRIER_FALLBACK: - ctx->seen |= SEEN_FUNC; - PPC_LI64(_R12, dereference_kernel_function_descriptor(bpf_stf_barrier)); - EMIT(PPC_RAW_MTCTR(_R12)); - EMIT(PPC_RAW_BCTRL()); - break; - case STF_BARRIER_NONE: - break; + sync_emitted = true; + } +#endif + if (!bpf_jit_bypass_spec_v4()) { + switch (stf_barrier) { + case STF_BARRIER_EIEIO: + EMIT(PPC_RAW_EIEIO() | 0x02000000); + break; + case STF_BARRIER_SYNC_ORI: + if (!sync_emitted) + EMIT(PPC_RAW_SYNC()); + EMIT(PPC_RAW_LD(tmp1_reg, _R13, 0)); + EMIT(PPC_RAW_ORI(_R31, _R31, 0)); + ori31_emitted = true; + break; + case STF_BARRIER_FALLBACK: + ctx->seen |= SEEN_FUNC; + PPC_LI64(_R12, dereference_kernel_function_descriptor(bpf_stf_barrier)); + EMIT(PPC_RAW_MTCTR(_R12)); + EMIT(PPC_RAW_BCTRL()); + break; + case STF_BARRIER_NONE: + break; + } } +#ifdef CONFIG_PPC_BOOK3S_64 + if (!bpf_jit_bypass_spec_v1() && !ori31_emitted) + EMIT(PPC_RAW_ORI(_R31, _R31, 0)); +#endif break;
/* diff --git a/include/linux/filter.h b/include/linux/filter.h index f5cf4d35d83e..eca229752cbe 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -82,7 +82,7 @@ struct ctl_table_header; #define BPF_CALL_ARGS 0xe0
/* unused opcode to mark speculation barrier for mitigating - * Speculative Store Bypass + * Spectre v1 and v4 */ #define BPF_NOSPEC 0xc0
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 804f1e52bfa3..fe16be379bf4 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2102,14 +2102,15 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn) #undef COND_JMP /* ST, STX and LDX*/ ST_NOSPEC: - /* Speculation barrier for mitigating Speculative Store Bypass. - * In case of arm64, we rely on the firmware mitigation as - * controlled via the ssbd kernel parameter. Whenever the - * mitigation is enabled, it works for all of the kernel code - * with no need to provide any additional instructions here. - * In case of x86, we use 'lfence' insn for mitigation. We - * reuse preexisting logic from Spectre v1 mitigation that - * happens to produce the required code on x86 for v4 as well. + /* Speculation barrier for mitigating Speculative Store Bypass, + * Bounds-Check Bypass and Type Confusion. In case of arm64, we + * rely on the firmware mitigation as controlled via the ssbd + * kernel parameter. Whenever the mitigation is enabled, it + * works for all of the kernel code with no need to provide any + * additional instructions here. In case of x86, we use 'lfence' + * insn for mitigation. We reuse preexisting logic from Spectre + * v1 mitigation that happens to produce the required code on + * x86 for v4 as well. */ barrier_nospec(); CONT;
This rename clarifies that the flag will cause a nospec to be added after this insn and can therefore be relied upon to reduce speculative path analysis.
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 include/linux/bpf_verifier.h | 2 +-
 kernel/bpf/verifier.c        | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 9734544b6957..cebb67becdad 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -576,7 +576,7 @@ struct bpf_insn_aux_data { u64 map_key_state; /* constant (32 bit) key tracking for maps */ int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ u32 seen; /* this insn was processed by the verifier at env->pass_cnt */ - bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */ + bool nospec_result; /* result is unsafe under speculation, nospec must follow */ bool zext_dst; /* this insn zero extends dst reg */ bool needs_zext; /* alu op needs to clear upper bits */ bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 27d3bc97a9e0..3d446c19bdf6 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5031,7 +5031,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, }
if (sanitize) - env->insn_aux_data[insn_idx].sanitize_stack_spill = true; + env->insn_aux_data[insn_idx].nospec_result = true; }
err = destroy_if_dynptr_stack_slot(env, state, spi); @@ -20886,7 +20886,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) }
if (type == BPF_WRITE && - env->insn_aux_data[i + delta].sanitize_stack_spill) { + env->insn_aux_data[i + delta].nospec_result) { struct bpf_insn patch[] = { *insn, BPF_ST_NOSPEC(),
This implements the core of the series and causes the verifier to fall back to mitigating Spectre v1 using speculation barriers. The approach was presented at LPC'24 [1] and RAID'24 [2].
If we find any forbidden behavior on a speculative path, we insert a nospec (e.g., lfence speculation barrier on x86) before the instruction and stop verifying the path. While verifying a speculative path, we can furthermore stop verification of that path whenever we encounter a nospec instruction.
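In simplified pseudocode, the resulting do_check() loop behaves as follows (a condensed sketch of the actual diff further below):

	if (state->speculative && cur_aux(env)->nospec)
		/* a barrier is already present, stop this speculative path */
		goto process_bpf_exit;

	err = do_check_insn(env, insn, &do_print_state);
	if (state->speculative && error_recoverable_with_nospec(err)) {
		/* Insert a nospec before the insn that would have been unsafe
		 * to execute and stop verifying this speculative path.
		 */
		cur_aux(env)->nospec = true;
		goto process_bpf_exit;
	} else if (err < 0) {
		return err;
	}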
A minimal example program would look as follows:
A = true
B = true
if A goto e
f()
if B goto e
unsafe()
e: exit
There are the following speculative and non-speculative paths (`cur->speculative` and `speculative` referring to the value of the push_stack() parameters):
- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()
If f() contains any unsafe behavior under Spectre v1 and the unsafe behavior matches `state->speculative && error_recoverable_with_nospec(err)`, do_check() will now add a nospec before f() instead of rejecting the program:
A = true
B = true
if A goto e
nospec
f()
if B goto e
unsafe()
e: exit
Alternatively, the algorithm also takes advantage of nospec instructions inserted for other reasons (e.g., Spectre v4). Taking the program above as an example, speculative path exploration can stop before f() if a nospec was inserted there because of Spectre v4 sanitization.
In this example, all instructions after the nospec are dead code (and with the nospec they are also dead code speculatively).
On x86_64, this depends on the following property of lfence [3]:
An LFENCE instruction or a serializing instruction will ensure that no later instructions execute, even speculatively, until all prior instructions complete locally. [...] Inserting an LFENCE instruction after a bounds check prevents later operations from executing before the bound check completes.
Regarding the example, this implies that `if B goto e` will not execute before `if A goto e` completes. Once `if A goto e` completes, the CPU should find that the speculation was wrong and continue with `exit`.
If there is any other path that leads to `if B goto e` (and therefore `unsafe()`) without going through `if A goto e`, then a nospec will still be needed there. However, this patch assumes this other path will be explored separately and therefore be discovered by the verifier even if the exploration discussed here stops at the nospec.
This patch furthermore has the unfortunate consequence that Spectre v1 mitigations now only support architectures which implement BPF_NOSPEC. Before this commit, Spectre v1 mitigations prevented exploits by rejecting the programs on all architectures. Because some JITs do not implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's security to a limited extent:
* The regression is limited to systems that are vulnerable to Spectre v1, have unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The latter is not the case for x86 64- and 32-bit, arm64, and powerpc 64-bit, and these are therefore not affected by the regression. According to commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode"), LoongArch is not vulnerable to Spectre v1 and therefore also not affected by the regression.
* To the best of my knowledge this regression may therefore only affect MIPS. This is deemed acceptable because unpriv BPF is still disabled there by default. As stated in a previous commit, BPF_NOSPEC could be implemented for MIPS based on GCC's speculation_barrier implementation.
* It is unclear which other architectures (besides x86 64- and 32-bit, ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel are vulnerable to Spectre v1. Also, it is not clear if barriers are available on these architectures. Implementing BPF_NOSPEC on these architectures therefore is non-trivial. Searching GCC and the kernel for speculation barrier implementations for these architectures yielded no result.
* If any of those regressed systems is also vulnerable to Spectre v4, the system was already vulnerable to Spectre v4 attacks based on unpriv BPF before this patch and the impact is therefore further limited.
As an alternative to regressing security, one could still reject programs if the architecture does not emit BPF_NOSPEC (e.g., by removing the empty BPF_NOSPEC-case from all JITs except for LoongArch where it appears justified). However, this will cause rejections on these archs that are likely unfounded in the vast majority of cases.
In the tests, some are now successful where we previously had a false-positive (i.e., rejection). Change them to reflect where the nospec should be inserted (using __xlated_unpriv) and modify the error message if the nospec is able to mitigate a problem that previously shadowed another problem (in that case __xlated_unpriv does not work, therefore just add a comment).
To avoid duplicating the ifdef whenever we check for nospec insns using __xlated_unpriv, define SPEC_V1 once here. This also improves readability. PowerPC can probably also be added here. However, omit it for now because the BPF CI currently does not include a test.
Briefly went through all the occurrences of EPERM, EINVAL, and EACCES in the verifier in order to validate that catching them like this makes sense.
[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/softwar... ("Managed Runtime Speculative Execution Side Channel Mitigations")
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 include/linux/bpf_verifier.h                  |  1 +
 kernel/bpf/verifier.c                         | 78 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/bpf_misc.h  |  4 +
 .../selftests/bpf/progs/verifier_and.c        |  8 +-
 .../selftests/bpf/progs/verifier_bounds.c     | 61 ++++++++++++---
 .../selftests/bpf/progs/verifier_movsx.c      | 16 +++-
 .../selftests/bpf/progs/verifier_unpriv.c     |  8 +-
 .../bpf/progs/verifier_value_ptr_arith.c      | 16 +++-
 .../selftests/bpf/verifier/dead_code.c        |  3 +-
 tools/testing/selftests/bpf/verifier/jmp32.c  | 33 +++-----
 tools/testing/selftests/bpf/verifier/jset.c   | 10 +--
 11 files changed, 184 insertions(+), 54 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index cebb67becdad..f1573e093120 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -576,6 +576,7 @@ struct bpf_insn_aux_data { u64 map_key_state; /* constant (32 bit) key tracking for maps */ int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ u32 seen; /* this insn was processed by the verifier at env->pass_cnt */ + bool nospec; /* do not execute this instruction speculatively */ bool nospec_result; /* result is unsafe under speculation, nospec must follow */ bool zext_dst; /* this insn zero extends dst reg */ bool needs_zext; /* alu op needs to clear upper bits */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3d446c19bdf6..92490964eb3b 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2014,6 +2014,18 @@ static int pop_stack(struct bpf_verifier_env *env, int *prev_insn_idx, return 0; }
+static bool error_recoverable_with_nospec(int err) +{ + /* Should only return true for non-fatal errors that are allowed to + * occur during speculative verification. For these we can insert a + * nospec and the program might still be accepted. Do not include + * something like ENOMEM because it is likely to re-occur for the next + * architectural path once it has been recovered-from in all speculative + * paths. + */ + return err == -EPERM || err == -EACCES || err == -EINVAL; +} + static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx, bool speculative) @@ -11160,7 +11172,7 @@ static int check_get_func_ip(struct bpf_verifier_env *env) return -ENOTSUPP; }
-static struct bpf_insn_aux_data *cur_aux(struct bpf_verifier_env *env) +static struct bpf_insn_aux_data *cur_aux(const struct bpf_verifier_env *env) { return &env->insn_aux_data[env->insn_idx]; } @@ -13997,7 +14009,9 @@ static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg, static bool can_skip_alu_sanitation(const struct bpf_verifier_env *env, const struct bpf_insn *insn) { - return env->bypass_spec_v1 || BPF_SRC(insn->code) == BPF_K; + return env->bypass_spec_v1 || + BPF_SRC(insn->code) == BPF_K || + cur_aux(env)->nospec; }
static int update_alu_sanitation_state(struct bpf_insn_aux_data *aux, @@ -19686,10 +19700,41 @@ static int do_check(struct bpf_verifier_env *env) sanitize_mark_insn_seen(env); prev_insn_idx = env->insn_idx;
+ /* Reduce verification complexity by stopping speculative path + * verification when a nospec is encountered. + */ + if (state->speculative && cur_aux(env)->nospec) + goto process_bpf_exit; + err = do_check_insn(env, insn, &do_print_state); - if (err < 0) { + if (state->speculative && error_recoverable_with_nospec(err)) { + /* Prevent this speculative path from ever reaching the + * insn that would have been unsafe to execute. + */ + cur_aux(env)->nospec = true; + /* If it was an ADD/SUB insn, potentially remove any + * markings for alu sanitization. + */ + cur_aux(env)->alu_state = 0; + goto process_bpf_exit; + } else if (err < 0) { return err; } else if (err == PROCESS_BPF_EXIT) { + goto process_bpf_exit; + } + WARN_ON_ONCE(err); + + if (state->speculative && cur_aux(env)->nospec_result) { + /* If we are on a path that performed a jump-op, this + * may skip a nospec patched-in after the jump. This can + * currently never happen because nospec_result is only + * used for the write-ops + * `*(size*)(dst_reg+off)=src_reg|imm32` which must + * never skip the following insn. Still, add a warning + * to document this in case nospec_result is used + * elsewhere in the future. + */ + WARN_ON_ONCE(env->insn_idx != prev_insn_idx + 1); process_bpf_exit: mark_verifier_state_scratched(env); update_branch_counts(env, env->cur_state); @@ -19708,7 +19753,6 @@ static int do_check(struct bpf_verifier_env *env) continue; } } - WARN_ON_ONCE(err); }
return 0; @@ -20837,6 +20881,29 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) bpf_convert_ctx_access_t convert_ctx_access; u8 mode;
+ if (env->insn_aux_data[i + delta].nospec) { + WARN_ON_ONCE(env->insn_aux_data[i + delta].alu_state); + struct bpf_insn patch[] = { + BPF_ST_NOSPEC(), + *insn, + }; + + cnt = ARRAY_SIZE(patch); + new_prog = bpf_patch_insn_data(env, i + delta, patch, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = new_prog; + insn = new_prog->insnsi + i + delta; + /* This can not be easily merged with the + * nospec_result-case, because an insn may require a + * nospec before and after itself. Therefore also do not + * 'continue' here but potentially apply further + * patching to insn. *insn should equal patch[1] now. + */ + } + if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) || insn->code == (BPF_LDX | BPF_MEM | BPF_H) || insn->code == (BPF_LDX | BPF_MEM | BPF_W) || @@ -20887,6 +20954,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
if (type == BPF_WRITE && env->insn_aux_data[i + delta].nospec_result) { + /* nospec_result is only used to mitigate Spectre v4 and + * to limit verification-time for Spectre v1. + */ struct bpf_insn patch[] = { *insn, BPF_ST_NOSPEC(), diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h index 13a2e22f5465..83658b3ba408 100644 --- a/tools/testing/selftests/bpf/progs/bpf_misc.h +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h @@ -230,4 +230,8 @@ #define CAN_USE_LOAD_ACQ_STORE_REL #endif
+#if defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) +#define SPEC_V1 +#endif + #endif diff --git a/tools/testing/selftests/bpf/progs/verifier_and.c b/tools/testing/selftests/bpf/progs/verifier_and.c index e97e518516b6..2b4fdca162be 100644 --- a/tools/testing/selftests/bpf/progs/verifier_and.c +++ b/tools/testing/selftests/bpf/progs/verifier_and.c @@ -85,8 +85,14 @@ l0_%=: r0 = r0; \
SEC("socket") __description("check known subreg with unknown reg") -__success __failure_unpriv __msg_unpriv("R1 !read_ok") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if w0 < 0x1 goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R1 !read_ok'` */ +__xlated_unpriv("goto pc-1") /* `r1 = *(u32*)(r1 + 512)`, sanitized dead code */ +__xlated_unpriv("r0 = 0") +#endif __naked void known_subreg_with_unknown_reg(void) { asm volatile (" \ diff --git a/tools/testing/selftests/bpf/progs/verifier_bounds.c b/tools/testing/selftests/bpf/progs/verifier_bounds.c index 0eb33bb801b5..30e16153fdf1 100644 --- a/tools/testing/selftests/bpf/progs/verifier_bounds.c +++ b/tools/testing/selftests/bpf/progs/verifier_bounds.c @@ -620,8 +620,14 @@ l1_%=: exit; \
SEC("socket") __description("bounds check mixed 32bit and 64bit arithmetic. test1") -__success __failure_unpriv __msg_unpriv("R0 invalid mem access 'scalar'") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R0 invalid mem access 'scalar'` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("exit") +#endif __naked void _32bit_and_64bit_arithmetic_test1(void) { asm volatile (" \ @@ -643,8 +649,14 @@ l1_%=: exit; \
SEC("socket") __description("bounds check mixed 32bit and 64bit arithmetic. test2") -__success __failure_unpriv __msg_unpriv("R0 invalid mem access 'scalar'") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R0 invalid mem access 'scalar'` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("exit") +#endif __naked void _32bit_and_64bit_arithmetic_test2(void) { asm volatile (" \ @@ -691,9 +703,14 @@ l0_%=: r0 = 0; \
SEC("socket") __description("bounds check for reg = 0, reg xor 1") -__success __failure_unpriv -__msg_unpriv("R0 min value is outside of the allowed memory range") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if r1 != 0x0 goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R0 min value is outside of the allowed memory range` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("r0 = 0") +#endif __naked void reg_0_reg_xor_1(void) { asm volatile (" \ @@ -719,9 +736,14 @@ l1_%=: r0 = 0; \
SEC("socket") __description("bounds check for reg32 = 0, reg32 xor 1") -__success __failure_unpriv -__msg_unpriv("R0 min value is outside of the allowed memory range") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if w1 != 0x0 goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R0 min value is outside of the allowed memory range` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("r0 = 0") +#endif __naked void reg32_0_reg32_xor_1(void) { asm volatile (" \ @@ -747,9 +769,14 @@ l1_%=: r0 = 0; \
SEC("socket") __description("bounds check for reg = 2, reg xor 3") -__success __failure_unpriv -__msg_unpriv("R0 min value is outside of the allowed memory range") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if r1 > 0x0 goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R0 min value is outside of the allowed memory range` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("r0 = 0") +#endif __naked void reg_2_reg_xor_3(void) { asm volatile (" \ @@ -829,9 +856,14 @@ l1_%=: r0 = 0; \
SEC("socket") __description("bounds check for reg > 0, reg xor 3") -__success __failure_unpriv -__msg_unpriv("R0 min value is outside of the allowed memory range") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if r1 >= 0x0 goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R0 min value is outside of the allowed memory range` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("r0 = 0") +#endif __naked void reg_0_reg_xor_3(void) { asm volatile (" \ @@ -858,9 +890,14 @@ l1_%=: r0 = 0; \
SEC("socket") __description("bounds check for reg32 > 0, reg32 xor 3") -__success __failure_unpriv -__msg_unpriv("R0 min value is outside of the allowed memory range") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if w1 >= 0x0 goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R0 min value is outside of the allowed memory range` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("r0 = 0") +#endif __naked void reg32_0_reg32_xor_3(void) { asm volatile (" \ diff --git a/tools/testing/selftests/bpf/progs/verifier_movsx.c b/tools/testing/selftests/bpf/progs/verifier_movsx.c index 994bbc346d25..a4d8814eb5ed 100644 --- a/tools/testing/selftests/bpf/progs/verifier_movsx.c +++ b/tools/testing/selftests/bpf/progs/verifier_movsx.c @@ -245,7 +245,13 @@ l0_%=: \ SEC("socket") __description("MOV32SX, S8, var_off not u32_max, positive after s8 extension") __success __retval(0) -__failure_unpriv __msg_unpriv("frame pointer is read only") +__success_unpriv +#ifdef SPEC_V1 +__xlated_unpriv("w0 = 0") +__xlated_unpriv("exit") +__xlated_unpriv("nospec") /* inserted to prevent `frame pointer is read only` */ +__xlated_unpriv("goto pc-1") +#endif __naked void mov64sx_s32_varoff_2(void) { asm volatile (" \ @@ -267,7 +273,13 @@ l0_%=: \ SEC("socket") __description("MOV32SX, S8, var_off not u32_max, negative after s8 extension") __success __retval(0) -__failure_unpriv __msg_unpriv("frame pointer is read only") +__success_unpriv +#ifdef SPEC_V1 +__xlated_unpriv("w0 = 0") +__xlated_unpriv("exit") +__xlated_unpriv("nospec") /* inserted to prevent `frame pointer is read only` */ +__xlated_unpriv("goto pc-1") +#endif __naked void mov64sx_s32_varoff_3(void) { asm volatile (" \ diff --git a/tools/testing/selftests/bpf/progs/verifier_unpriv.c b/tools/testing/selftests/bpf/progs/verifier_unpriv.c index a4a5e2071604..c42c3839b30c 100644 --- a/tools/testing/selftests/bpf/progs/verifier_unpriv.c +++ b/tools/testing/selftests/bpf/progs/verifier_unpriv.c @@ -572,8 +572,14 @@ l0_%=: exit; \
SEC("socket") __description("alu32: mov u32 const") -__success __failure_unpriv __msg_unpriv("R7 invalid mem access 'scalar'") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if r0 == 0x0 goto pc+2") +__xlated_unpriv("nospec") /* inserted to prevent `R7 invalid mem access 'scalar'` */ +__xlated_unpriv("goto pc-1") /* sanitized dead code */ +__xlated_unpriv("exit") +#endif __naked void alu32_mov_u32_const(void) { asm volatile (" \ diff --git a/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c b/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c index 5ba6e53571c8..fcea9819e359 100644 --- a/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c +++ b/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c @@ -398,7 +398,8 @@ l2_%=: r0 = 1; \
SEC("socket") __description("map access: mixing value pointer and scalar, 1") -__success __failure_unpriv __msg_unpriv("R2 pointer comparison prohibited") +__success __failure_unpriv +__msg_unpriv("R2 tried to add from different maps, paths or scalars, pointer arithmetic with it prohibited for !root") __retval(0) __naked void value_pointer_and_scalar_1(void) { @@ -433,6 +434,7 @@ l2_%=: /* common instruction */ \ l3_%=: /* branch B */ \ r0 = 0x13371337; \ /* verifier follows fall-through */ \ + /* unpriv: nospec (inserted to prevent `R2 pointer comparison prohibited`) */\ if r2 != 0x100000 goto l4_%=; \ r0 = 0; \ exit; \ @@ -450,7 +452,8 @@ l4_%=: /* fake-dead code; targeted from branch A to \
SEC("socket") __description("map access: mixing value pointer and scalar, 2") -__success __failure_unpriv __msg_unpriv("R0 invalid mem access 'scalar'") +__success __failure_unpriv +__msg_unpriv("R2 tried to add from different maps, paths or scalars, pointer arithmetic with it prohibited for !root") __retval(0) __naked void value_pointer_and_scalar_2(void) { @@ -492,6 +495,7 @@ l4_%=: /* fake-dead code; targeted from branch A to \ * prevent dead code sanitization, rejected \ * via branch B however \ */ \ + /* unpriv: nospec (inserted to prevent `R0 invalid mem access 'scalar'`) */\ r0 = *(u8*)(r0 + 0); \ r0 = 0; \ exit; \ @@ -1296,9 +1300,13 @@ l0_%=: r0 = 1; \
SEC("socket") __description("map access: value_ptr -= unknown scalar, 2") -__success __failure_unpriv -__msg_unpriv("R0 pointer arithmetic of map value goes out of range") +__success __success_unpriv __retval(1) +#ifdef SPEC_V1 +__xlated_unpriv("r1 &= 7") +__xlated_unpriv("nospec") /* inserted to prevent `R0 pointer arithmetic of map value goes out of range` */ +__xlated_unpriv("r0 -= r1") +#endif __naked void value_ptr_unknown_scalar_2_2(void) { asm volatile (" \ diff --git a/tools/testing/selftests/bpf/verifier/dead_code.c b/tools/testing/selftests/bpf/verifier/dead_code.c index ee454327e5c6..77207b498c6f 100644 --- a/tools/testing/selftests/bpf/verifier/dead_code.c +++ b/tools/testing/selftests/bpf/verifier/dead_code.c @@ -2,14 +2,13 @@ "dead code: start", .insns = { BPF_JMP_IMM(BPF_JA, 0, 0, 2), + /* unpriv: nospec (inserted to prevent "R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_JMP_IMM(BPF_JA, 0, 0, 2), BPF_MOV64_IMM(BPF_REG_0, 7), BPF_JMP_IMM(BPF_JGE, BPF_REG_0, 10, -4), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R9 !read_ok", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 7, }, diff --git a/tools/testing/selftests/bpf/verifier/jmp32.c b/tools/testing/selftests/bpf/verifier/jmp32.c index 43776f6f92f4..91d83e9cb148 100644 --- a/tools/testing/selftests/bpf/verifier/jmp32.c +++ b/tools/testing/selftests/bpf/verifier/jmp32.c @@ -84,11 +84,10 @@ BPF_JMP32_IMM(BPF_JSET, BPF_REG_7, 0x10, 1), BPF_EXIT_INSN(), BPF_JMP32_IMM(BPF_JGE, BPF_REG_7, 0x10, 1), + /* unpriv: nospec (inserted to prevent "R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R9 !read_ok", - .result_unpriv = REJECT, .result = ACCEPT, }, { @@ -149,11 +148,10 @@ BPF_JMP32_IMM(BPF_JEQ, BPF_REG_7, 0x10, 1), BPF_EXIT_INSN(), BPF_JMP32_IMM(BPF_JSGE, BPF_REG_7, 0xf, 1), + /* unpriv: nospec (inserted to prevent "R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R9 !read_ok", - .result_unpriv = REJECT, .result = ACCEPT, }, { @@ -214,11 +212,10 @@ BPF_JMP32_IMM(BPF_JNE, BPF_REG_7, 0x10, 1), BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x10, 1), BPF_EXIT_INSN(), + /* unpriv: nospec (inserted to prevent "R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R9 !read_ok", - .result_unpriv = REJECT, .result = ACCEPT, }, { @@ -283,11 +280,10 @@ BPF_JMP32_REG(BPF_JGE, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP32_IMM(BPF_JGE, BPF_REG_7, 0x7ffffff0, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, @@ -354,11 +350,10 @@ BPF_JMP32_REG(BPF_JGT, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JGT, BPF_REG_7, 0x7ffffff0, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, @@ -425,11 +420,10 @@ BPF_JMP32_REG(BPF_JLE, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP32_IMM(BPF_JLE, BPF_REG_7, 0x7ffffff0, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = 
"R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, @@ -496,11 +490,10 @@ BPF_JMP32_REG(BPF_JLT, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0x7ffffff0, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, @@ -567,11 +560,10 @@ BPF_JMP32_REG(BPF_JSGE, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0x7ffffff0, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, @@ -638,11 +630,10 @@ BPF_JMP32_REG(BPF_JSGT, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JSGT, BPF_REG_7, -2, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, @@ -709,11 +700,10 @@ BPF_JMP32_REG(BPF_JSLE, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JSLE, BPF_REG_7, 0x7ffffff0, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, @@ -780,11 +770,10 @@ BPF_JMP32_REG(BPF_JSLT, BPF_REG_7, BPF_REG_8, 1), BPF_EXIT_INSN(), BPF_JMP32_IMM(BPF_JSLT, BPF_REG_7, -1, 1), + /* unpriv: nospec (inserted to prevent "R0 invalid mem access 'scalar'") */ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 invalid mem access 'scalar'", - .result_unpriv = REJECT, .result = ACCEPT, .retval = 2, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, diff --git a/tools/testing/selftests/bpf/verifier/jset.c b/tools/testing/selftests/bpf/verifier/jset.c index 11fc68da735e..e901eefd774a 100644 --- a/tools/testing/selftests/bpf/verifier/jset.c +++ b/tools/testing/selftests/bpf/verifier/jset.c @@ -78,12 +78,11 @@ .insns = { BPF_MOV64_IMM(BPF_REG_0, 1), BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1), + /* unpriv: nospec (inserted to prevent "R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, - .errstr_unpriv = "R9 !read_ok", - .result_unpriv = REJECT, .retval = 1, .result = ACCEPT, }, @@ -136,13 +135,12 @@ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), BPF_ALU64_IMM(BPF_OR, BPF_REG_0, 2), BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 3, 1), + /* unpriv: nospec (inserted to prevent "R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, - .errstr_unpriv = "R9 !read_ok", - .result_unpriv = REJECT, .result = ACCEPT, }, { @@ -154,16 +152,16 @@ BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0xff), BPF_JMP_IMM(BPF_JSET, BPF_REG_1, 0xf0, 3), BPF_JMP_IMM(BPF_JLT, BPF_REG_1, 0x10, 1), + /* unpriv: nospec (inserted to prevent 
"R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JSET, BPF_REG_1, 0x10, 1), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JGE, BPF_REG_1, 0x10, 1), + /* unpriv: nospec (inserted to prevent "R9 !read_ok") */ BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, - .errstr_unpriv = "R9 !read_ok", - .result_unpriv = REJECT, .result = ACCEPT, },
This is based on the gadget from the description of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches").
Signed-off-by: Luis Gerhorst luis.gerhorst@fau.de --- .../selftests/bpf/progs/verifier_unpriv.c | 57 +++++++++++++++++++ 1 file changed, 57 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_unpriv.c b/tools/testing/selftests/bpf/progs/verifier_unpriv.c index c42c3839b30c..43236b93ebb5 100644 --- a/tools/testing/selftests/bpf/progs/verifier_unpriv.c +++ b/tools/testing/selftests/bpf/progs/verifier_unpriv.c @@ -729,4 +729,61 @@ l0_%=: r0 = 0; \ " ::: __clobber_all); }
+SEC("socket") +__description("unpriv: Spectre v1 path-based type confusion of scalar as stack-ptr") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("if r0 != 0x1 goto pc+2") +/* This nospec prevents the exploit because it forces the mispredicted (not + * taken) `if r0 != 0x0 goto l0_%=` to resolve before using r6 as a pointer. + * This causes the CPU to realize that `r6 = r9` should have never executed. It + * ensures that r6 always contains a readable stack slot ptr when the insn after + * the nospec executes. + */ +__xlated_unpriv("nospec") +__xlated_unpriv("r9 = *(u8 *)(r6 +0)") +#endif +__naked void unpriv_spec_v1_type_confusion(void) +{ + asm volatile (" \ + r1 = 0; \ + *(u64*)(r10 - 8) = r1; \ + r2 = r10; \ + r2 += -8; \ + r1 = %[map_hash_8b] ll; \ + call %[bpf_map_lookup_elem]; \ + if r0 == 0 goto l2_%=; \ + /* r0: pointer to a map array entry */ \ + r2 = r10; \ + r2 += -8; \ + r1 = %[map_hash_8b] ll; \ + /* r1, r2: prepared call args */ \ + r6 = r10; \ + r6 += -8; \ + /* r6: pointer to readable stack slot */ \ + r9 = 0xffffc900; \ + r9 <<= 32; \ + /* r9: scalar controlled by attacker */ \ + r0 = *(u64 *)(r0 + 0); /* cache miss */ \ + if r0 != 0x0 goto l0_%=; \ + r6 = r9; \ +l0_%=: if r0 != 0x1 goto l1_%=; \ + r9 = *(u8 *)(r6 + 0); \ +l1_%=: /* leak r9 */ \ + r9 &= 1; \ + r9 <<= 9; \ + *(u64*)(r10 - 8) = r9; \ + call %[bpf_map_lookup_elem]; \ + if r0 == 0 goto l2_%=; \ + /* leak secret into is_cached(map[0|512]): */ \ + r0 = *(u64 *)(r0 + 0); \ +l2_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_map_lookup_elem), + __imm_addr(map_hash_8b) + : __clobber_all); +} + char _license[] SEC("license") = "GPL";
Insert a nospec before the access to prevent it from ever using an index that is subject to speculative scalar confusion.
The access itself can either happen directly in the BPF program (reads only, check_stack_read_var_off()) or in a helper (read/write, check_helper_mem_access()).
This relies on the fact that the speculative scalar confusion that leads to the variable-offset stack access going OOB must stem from a prior speculative store bypass or branch bypass. Adding a nospec before the variable-offset stack access forces all previously bypassed stores/branches to complete and causes the stack access to only ever go to the stack slot that is accessed architecturally.
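As a rough sketch of the intended effect (registers, offsets, and bounds below are made up for illustration and not taken from the selftests; pointer-arithmetic and stack-initialization details are omitted), a variable-offset stack read in an unprivileged program would end up being patched roughly like this:

  r2 &= 7                  /* bounded, possibly attacker-influenced offset */
  r1 = r10
  r1 += -16
  r1 += r2                 /* variable-offset stack pointer */
  nospec                   /* patched in before the dereference: all prior
                            * bypassed stores/branches must resolve, so r1
                            * can only hold its architectural value here */
  r3 = *(u8 *)(r1 + 0)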
Alternatively, the variable-offset stack access might be a write that is itself subject to speculative store bypass (this can happen in theory even though this code adds a nospec /before/ the variable-offset write). Only indirect writes by helpers might be affected here (e.g., those taking ARG_PTR_TO_MAP_VALUE); because check_stack_write_var_off() does not use check_stack_range_initialized(), in-program variable-offset writes are not affected. If such an in-helper write can be subject to Spectre v4 and the helper writes/overwrites pointers on the BPF stack, it is already a problem for fixed-offset stack accesses and should be covered by the Spectre v4 sanitization.
Signed-off-by: Luis Gerhorst luis.gerhorst@fau.de Acked-by: Henriette Herzog henriette.herzog@rub.de Cc: Maximilian Ott ott@cs.fau.de Cc: Milan Stephan milan.stephan@fau.de --- kernel/bpf/verifier.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 92490964eb3b..2cd925b915e0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7894,6 +7894,11 @@ static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn) } }
+static struct bpf_insn_aux_data *cur_aux(const struct bpf_verifier_env *env) +{ + return &env->insn_aux_data[env->insn_idx]; +} + /* When register 'regno' is used to read the stack (either directly or through * a helper function) make sure that it's within stack boundary and, depending * on the access type and privileges, that all elements of the stack are @@ -7933,18 +7938,18 @@ static int check_stack_range_initialized( if (tnum_is_const(reg->var_off)) { min_off = max_off = reg->var_off.value + off; } else { - /* Variable offset is prohibited for unprivileged mode for + /* Variable offset requires a nospec for unprivileged mode for * simplicity since it requires corresponding support in * Spectre masking for stack ALU. * See also retrieve_ptr_limit(). */ if (!env->bypass_spec_v1) { - char tn_buf[48]; - - tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off); - verbose(env, "R%d variable offset stack access prohibited for !root, var_off=%s\n", - regno, tn_buf); - return -EACCES; + /* Allow the access, but prevent it from using a + * speculative offset using a nospec before the + * dereference op. + */ + cur_aux(env)->nospec = true; + WARN_ON_ONCE(cur_aux(env)->alu_state); } /* Only initialized buffer on stack is allowed to be accessed * with variable offset. With uninitialized buffer it's hard to @@ -11172,11 +11177,6 @@ static int check_get_func_ip(struct bpf_verifier_env *env) return -ENOTSUPP; }
-static struct bpf_insn_aux_data *cur_aux(const struct bpf_verifier_env *env) -{ - return &env->insn_aux_data[env->insn_idx]; -} - static bool loop_flag_is_zero(struct bpf_verifier_env *env) { struct bpf_reg_state *regs = cur_regs(env);
ALU sanitization was introduced to ensure that a subsequent ptr access can never go OOB, even under speculation. This is required because we currently allow speculative scalar confusion. Speculative scalar confusion is possible because Spectre v4 sanitization only adds a nospec after critical stores (e.g., a scalar overwritten with a pointer).
If we add a nospec before the ALU op, none of the operands can be subject to scalar confusion. As an ADD/SUB cannot introduce scalar confusion by itself, the result is also not subject to scalar confusion. Therefore, the subsequent ptr access is always safe.
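As a rough sketch (registers and bounds are made up for illustration and not taken from the selftests; the source of the confusion is only hinted at), the resulting code for an unprivileged pointer ALU op then looks roughly like this:

  /* r0: map value pointer; r1: scalar the verifier has bounded to [0, 7],
   * but whose speculative value may differ from the architectural one
   * (e.g., due to a bypassed store or a mispredicted bounds check)
   */
  nospec                   /* emitted instead of alu_limit masking: all
                            * prior stores/branches resolve, so r1 holds
                            * its architectural, bounded value */
  r0 += r1                 /* the ADD cannot re-introduce confusion */
  r2 = *(u8 *)(r0 + 0)     /* stays in bounds even under speculation */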
We directly fall back to nospec for the sanitization errors REASON_BOUNDS, _TYPE, _PATHS, and _LIMIT, even if we are not on a speculative path.
For REASON_STACK, we now return -ENOMEM directly. Previously, sanitize_err() returned -EACCES for this case; we change it to -ENOMEM because that prevents do_check() from falling back to a nospec if we are on a speculative path. This would not be a serious issue (the verifier would probably run into the -ENOMEM again shortly on the next non-speculative path and still abort verification), but -ENOMEM is more fitting here anyway. An alternative would be -EFAULT, which is also returned for some of the other cases where push_stack() fails, but that is more frequently used for verifier-internal bugs.
Signed-off-by: Luis Gerhorst luis.gerhorst@fau.de Acked-by: Henriette Herzog henriette.herzog@rub.de Cc: Maximilian Ott ott@cs.fau.de Cc: Milan Stephan milan.stephan@fau.de --- kernel/bpf/verifier.c | 85 +++++----------- .../selftests/bpf/progs/verifier_bounds.c | 5 +- .../bpf/progs/verifier_bounds_deduction.c | 45 ++++++--- .../selftests/bpf/progs/verifier_map_ptr.c | 20 +++- .../bpf/progs/verifier_value_ptr_arith.c | 97 ++++++++++++++++--- 5 files changed, 156 insertions(+), 96 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2cd925b915e0..180cab806199 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -13967,14 +13967,6 @@ static bool check_reg_sane_offset(struct bpf_verifier_env *env, return true; }
-enum { - REASON_BOUNDS = -1, - REASON_TYPE = -2, - REASON_PATHS = -3, - REASON_LIMIT = -4, - REASON_STACK = -5, -}; - static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg, u32 *alu_limit, bool mask_to_left) { @@ -13997,11 +13989,13 @@ static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg, ptr_reg->umax_value) + ptr_reg->off; break; default: - return REASON_TYPE; + /* Register has pointer with unsupported alu operation. */ + return -EOPNOTSUPP; }
+ /* Register tried access beyond pointer bounds. */ if (ptr_limit >= max) - return REASON_LIMIT; + return -EOPNOTSUPP; *alu_limit = ptr_limit; return 0; } @@ -14022,8 +14016,12 @@ static int update_alu_sanitation_state(struct bpf_insn_aux_data *aux, */ if (aux->alu_state && (aux->alu_state != alu_state || - aux->alu_limit != alu_limit)) - return REASON_PATHS; + aux->alu_limit != alu_limit)) { + /* Tried to perform alu op from different maps, paths or scalars */ + aux->nospec = true; + aux->alu_state = 0; + return 0; + }
/* Corresponding fixup done in do_misc_fixups(). */ aux->alu_state = alu_state; @@ -14104,16 +14102,24 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
if (!commit_window) { if (!tnum_is_const(off_reg->var_off) && - (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) - return REASON_BOUNDS; + (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) { + /* Register has unknown scalar with mixed signed bounds. */ + aux->nospec = true; + aux->alu_state = 0; + return 0; + }
info->mask_to_left = (opcode == BPF_ADD && off_is_neg) || (opcode == BPF_SUB && !off_is_neg); }
err = retrieve_ptr_limit(ptr_reg, &alu_limit, info->mask_to_left); - if (err < 0) - return err; + if (err) { + WARN_ON_ONCE(err != -EOPNOTSUPP); + aux->nospec = true; + aux->alu_state = 0; + return 0; + }
if (commit_window) { /* In commit phase we narrow the masking window based on @@ -14166,7 +14172,7 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env, env->insn_idx); if (!ptr_is_dst_reg && ret) *dst_reg = tmp; - return !ret ? REASON_STACK : 0; + return !ret ? -ENOMEM : 0; }
static void sanitize_mark_insn_seen(struct bpf_verifier_env *env) @@ -14182,45 +14188,6 @@ static void sanitize_mark_insn_seen(struct bpf_verifier_env *env) env->insn_aux_data[env->insn_idx].seen = env->pass_cnt; }
-static int sanitize_err(struct bpf_verifier_env *env, - const struct bpf_insn *insn, int reason, - const struct bpf_reg_state *off_reg, - const struct bpf_reg_state *dst_reg) -{ - static const char *err = "pointer arithmetic with it prohibited for !root"; - const char *op = BPF_OP(insn->code) == BPF_ADD ? "add" : "sub"; - u32 dst = insn->dst_reg, src = insn->src_reg; - - switch (reason) { - case REASON_BOUNDS: - verbose(env, "R%d has unknown scalar with mixed signed bounds, %s\n", - off_reg == dst_reg ? dst : src, err); - break; - case REASON_TYPE: - verbose(env, "R%d has pointer with unsupported alu operation, %s\n", - off_reg == dst_reg ? src : dst, err); - break; - case REASON_PATHS: - verbose(env, "R%d tried to %s from different maps, paths or scalars, %s\n", - dst, op, err); - break; - case REASON_LIMIT: - verbose(env, "R%d tried to %s beyond pointer bounds, %s\n", - dst, op, err); - break; - case REASON_STACK: - verbose(env, "R%d could not be pushed for speculative verification, %s\n", - dst, err); - break; - default: - verbose(env, "verifier internal error: unknown reason (%d)\n", - reason); - break; - } - - return -EACCES; -} - /* check that stack access falls within stack limits and that 'reg' doesn't * have a variable offset. * @@ -14386,7 +14353,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg, &info, false); if (ret < 0) - return sanitize_err(env, insn, ret, off_reg, dst_reg); + return ret; }
switch (opcode) { @@ -14514,7 +14481,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, ret = sanitize_ptr_alu(env, insn, dst_reg, off_reg, dst_reg, &info, true); if (ret < 0) - return sanitize_err(env, insn, ret, off_reg, dst_reg); + return ret; }
return 0; @@ -15108,7 +15075,7 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, if (sanitize_needed(opcode)) { ret = sanitize_val_alu(env, insn); if (ret < 0) - return sanitize_err(env, insn, ret, NULL, NULL); + return ret; }
/* Calculate sign/unsigned bounds and tnum for alu32 and alu64 bit ops. diff --git a/tools/testing/selftests/bpf/progs/verifier_bounds.c b/tools/testing/selftests/bpf/progs/verifier_bounds.c index 30e16153fdf1..f2ee6d7febda 100644 --- a/tools/testing/selftests/bpf/progs/verifier_bounds.c +++ b/tools/testing/selftests/bpf/progs/verifier_bounds.c @@ -47,9 +47,12 @@ SEC("socket") __description("subtraction bounds (map value) variant 2") __failure __msg("R0 min value is negative, either use unsigned index or do a if (index >=0) check.") -__msg_unpriv("R1 has unknown scalar with mixed signed bounds") +__msg_unpriv("R0 pointer arithmetic of map value goes out of range, prohibited for !root") __naked void bounds_map_value_variant_2(void) { + /* unpriv: nospec inserted to prevent "R1 has unknown scalar with mixed + * signed bounds". + */ asm volatile (" \ r1 = 0; \ *(u64*)(r10 - 8) = r1; \ diff --git a/tools/testing/selftests/bpf/progs/verifier_bounds_deduction.c b/tools/testing/selftests/bpf/progs/verifier_bounds_deduction.c index c506afbdd936..24ecaf89004e 100644 --- a/tools/testing/selftests/bpf/progs/verifier_bounds_deduction.c +++ b/tools/testing/selftests/bpf/progs/verifier_bounds_deduction.c @@ -8,22 +8,26 @@ SEC("socket") __description("check deducing bounds from const, 1") __failure __msg("R0 tried to subtract pointer from scalar") -__msg_unpriv("R1 has pointer with unsupported alu operation") +__failure_unpriv __naked void deducing_bounds_from_const_1(void) { asm volatile (" \ r0 = 1; \ if r0 s>= 1 goto l0_%=; \ -l0_%=: r0 -= r1; \ +l0_%=: /* unpriv: nospec (inserted to prevent `R1 has pointer with unsupported alu operation`) */\ + r0 -= r1; \ exit; \ " ::: __clobber_all); }
SEC("socket") __description("check deducing bounds from const, 2") -__success __failure_unpriv -__msg_unpriv("R1 has pointer with unsupported alu operation") +__success __success_unpriv __retval(1) +#ifdef SPEC_V1 +__xlated_unpriv("nospec") /* inserted to prevent `R1 has pointer with unsupported alu operation` */ +__xlated_unpriv("r1 -= r0") +#endif __naked void deducing_bounds_from_const_2(void) { asm volatile (" \ @@ -40,22 +44,26 @@ l1_%=: r1 -= r0; \ SEC("socket") __description("check deducing bounds from const, 3") __failure __msg("R0 tried to subtract pointer from scalar") -__msg_unpriv("R1 has pointer with unsupported alu operation") +__failure_unpriv __naked void deducing_bounds_from_const_3(void) { asm volatile (" \ r0 = 0; \ if r0 s<= 0 goto l0_%=; \ -l0_%=: r0 -= r1; \ +l0_%=: /* unpriv: nospec (inserted to prevent `R1 has pointer with unsupported alu operation`) */\ + r0 -= r1; \ exit; \ " ::: __clobber_all); }
SEC("socket") __description("check deducing bounds from const, 4") -__success __failure_unpriv -__msg_unpriv("R6 has pointer with unsupported alu operation") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("nospec") /* inserted to prevent `R6 has pointer with unsupported alu operation` */ +__xlated_unpriv("r6 -= r0") +#endif __naked void deducing_bounds_from_const_4(void) { asm volatile (" \ @@ -73,12 +81,13 @@ l1_%=: r6 -= r0; \ SEC("socket") __description("check deducing bounds from const, 5") __failure __msg("R0 tried to subtract pointer from scalar") -__msg_unpriv("R1 has pointer with unsupported alu operation") +__failure_unpriv __naked void deducing_bounds_from_const_5(void) { asm volatile (" \ r0 = 0; \ if r0 s>= 1 goto l0_%=; \ + /* unpriv: nospec (inserted to prevent `R1 has pointer with unsupported alu operation`) */\ r0 -= r1; \ l0_%=: exit; \ " ::: __clobber_all); @@ -87,14 +96,15 @@ l0_%=: exit; \ SEC("socket") __description("check deducing bounds from const, 6") __failure __msg("R0 tried to subtract pointer from scalar") -__msg_unpriv("R1 has pointer with unsupported alu operation") +__failure_unpriv __naked void deducing_bounds_from_const_6(void) { asm volatile (" \ r0 = 0; \ if r0 s>= 0 goto l0_%=; \ exit; \ -l0_%=: r0 -= r1; \ +l0_%=: /* unpriv: nospec (inserted to prevent `R1 has pointer with unsupported alu operation`) */\ + r0 -= r1; \ exit; \ " ::: __clobber_all); } @@ -102,14 +112,15 @@ l0_%=: r0 -= r1; \ SEC("socket") __description("check deducing bounds from const, 7") __failure __msg("dereference of modified ctx ptr") -__msg_unpriv("R1 has pointer with unsupported alu operation") +__failure_unpriv __flag(BPF_F_ANY_ALIGNMENT) __naked void deducing_bounds_from_const_7(void) { asm volatile (" \ r0 = %[__imm_0]; \ if r0 s>= 0 goto l0_%=; \ -l0_%=: r1 -= r0; \ +l0_%=: /* unpriv: nospec (inserted to prevent `R1 has pointer with unsupported alu operation`) */\ + r1 -= r0; \ r0 = *(u32*)(r1 + %[__sk_buff_mark]); \ exit; \ " : @@ -121,13 +132,14 @@ l0_%=: r1 -= r0; \ SEC("socket") __description("check deducing bounds from const, 8") __failure __msg("negative offset ctx ptr R1 off=-1 disallowed") -__msg_unpriv("R1 has pointer with unsupported alu operation") +__failure_unpriv __flag(BPF_F_ANY_ALIGNMENT) __naked void deducing_bounds_from_const_8(void) { asm volatile (" \ r0 = %[__imm_0]; \ if r0 s>= 0 goto l0_%=; \ + /* unpriv: nospec (inserted to prevent `R1 has pointer with unsupported alu operation`) */\ r1 += r0; \ l0_%=: r0 = *(u32*)(r1 + %[__sk_buff_mark]); \ exit; \ @@ -140,13 +152,14 @@ l0_%=: r0 = *(u32*)(r1 + %[__sk_buff_mark]); \ SEC("socket") __description("check deducing bounds from const, 9") __failure __msg("R0 tried to subtract pointer from scalar") -__msg_unpriv("R1 has pointer with unsupported alu operation") +__failure_unpriv __naked void deducing_bounds_from_const_9(void) { asm volatile (" \ r0 = 0; \ if r0 s>= 0 goto l0_%=; \ -l0_%=: r0 -= r1; \ +l0_%=: /* unpriv: nospec (inserted to prevent `R1 has pointer with unsupported alu operation`) */\ + r0 -= r1; \ exit; \ " ::: __clobber_all); } diff --git a/tools/testing/selftests/bpf/progs/verifier_map_ptr.c b/tools/testing/selftests/bpf/progs/verifier_map_ptr.c index 11a079145966..a540f29b7532 100644 --- a/tools/testing/selftests/bpf/progs/verifier_map_ptr.c +++ b/tools/testing/selftests/bpf/progs/verifier_map_ptr.c @@ -110,9 +110,17 @@ __naked void ptr_read_ops_field_accepted(void)
SEC("socket") __description("bpf_map_ptr: r = 0, map_ptr = map_ptr + r") -__success __failure_unpriv -__msg_unpriv("R1 has pointer with unsupported alu operation") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("r1 = 0x") /* r0 = %[map_hash_16b] ll */ +/* This nospec is not really needed here, because there is no bypassable + * branch/store that could lead to r0 not containing 0 when `r1 += r0` executes. + * We still keep it like this to not complicate the verifier. + */ +__xlated_unpriv("nospec") /* inserted to prevent `R1 has pointer with unsupported alu operation` */ +__xlated_unpriv("r1 += r0") +#endif __naked void map_ptr_map_ptr_r(void) { asm volatile (" \ @@ -134,9 +142,13 @@ __naked void map_ptr_map_ptr_r(void)
SEC("socket") __description("bpf_map_ptr: r = 0, r = r + map_ptr") -__success __failure_unpriv -__msg_unpriv("R0 has pointer with unsupported alu operation") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("r0 = 0x") /* r0 = %[map_hash_16b] ll */ +__xlated_unpriv("nospec") /* inserted to prevent `R0 has pointer with unsupported alu operation` */ +__xlated_unpriv("r1 += r0") +#endif __naked void _0_r_r_map_ptr(void) { asm volatile (" \ diff --git a/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c b/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c index fcea9819e359..a35090e3de2b 100644 --- a/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c +++ b/tools/testing/selftests/bpf/progs/verifier_value_ptr_arith.c @@ -41,9 +41,17 @@ struct {
SEC("socket") __description("map access: known scalar += value_ptr unknown vs const") -__success __failure_unpriv -__msg_unpriv("R1 tried to add from different maps, paths or scalars") +__success __success_unpriv __retval(1) +#ifdef SPEC_V1 +__xlated_unpriv("r1 &= 7") +__xlated_unpriv("goto pc+1") +/* l3_%=: */ +__xlated_unpriv("r1 = 3") +/* l4_%=: */ +__xlated_unpriv("nospec") /* inserted to prevent `R1 tried to add from different maps, paths or scalars` */ +__xlated_unpriv("r1 += r0") +#endif __naked void value_ptr_unknown_vs_const(void) { asm volatile (" \ @@ -79,9 +87,14 @@ l2_%=: r0 = 1; \
SEC("socket") __description("map access: known scalar += value_ptr const vs unknown") -__success __failure_unpriv -__msg_unpriv("R1 tried to add from different maps, paths or scalars") +__success __success_unpriv __retval(1) +#ifdef SPEC_V1 +__xlated_unpriv("r1 &= 7") +/* l4_%=: */ +__xlated_unpriv("nospec") /* inserted to prevent `R1 tried to add from different maps, paths or scalars` */ +__xlated_unpriv("r1 += r0") +#endif __naked void value_ptr_const_vs_unknown(void) { asm volatile (" \ @@ -117,9 +130,16 @@ l2_%=: r0 = 1; \
SEC("socket") __description("map access: known scalar += value_ptr const vs const (ne)") -__success __failure_unpriv -__msg_unpriv("R1 tried to add from different maps, paths or scalars") +__success __success_unpriv __retval(1) +#ifdef SPEC_V1 +__xlated_unpriv("goto pc+1") /* to l4, must not be pc+2 as this would skip nospec */ +/* l3_%=: */ +__xlated_unpriv("r1 = 5") +/* l4_%=: */ +__xlated_unpriv("nospec") /* inserted to prevent `R1 tried to add from different maps, paths or scalars` */ +__xlated_unpriv("r1 += r0") +#endif __naked void ptr_const_vs_const_ne(void) { asm volatile (" \ @@ -225,9 +245,18 @@ l2_%=: r0 = 1; \
SEC("socket") __description("map access: known scalar += value_ptr unknown vs unknown (lt)") -__success __failure_unpriv -__msg_unpriv("R1 tried to add from different maps, paths or scalars") +__success __success_unpriv __retval(1) +#ifdef SPEC_V1 +__xlated_unpriv("r1 &= 3") +__xlated_unpriv("goto pc+3") /* must go to l4 (nospec) */ +__xlated_unpriv("r1 = 6") +__xlated_unpriv("r1 = -r1") +__xlated_unpriv("r1 &= 7") +/* l4_%=: */ +__xlated_unpriv("nospec") /* inserted to prevent `R1 tried to add from different maps, paths or scalars` */ +__xlated_unpriv("r1 += r0") +#endif __naked void ptr_unknown_vs_unknown_lt(void) { asm volatile (" \ @@ -265,9 +294,14 @@ l2_%=: r0 = 1; \
SEC("socket") __description("map access: known scalar += value_ptr unknown vs unknown (gt)") -__success __failure_unpriv -__msg_unpriv("R1 tried to add from different maps, paths or scalars") +__success __success_unpriv __retval(1) +#ifdef SPEC_V1 +__xlated_unpriv("r1 &= 3") +/* l4_%=: */ +__xlated_unpriv("nospec") /* inserted to prevent `R1 tried to add from different maps, paths or scalars` */ +__xlated_unpriv("r1 += r0") +#endif __naked void ptr_unknown_vs_unknown_gt(void) { asm volatile (" \ @@ -398,9 +432,27 @@ l2_%=: r0 = 1; \
SEC("socket") __description("map access: mixing value pointer and scalar, 1") -__success __failure_unpriv -__msg_unpriv("R2 tried to add from different maps, paths or scalars, pointer arithmetic with it prohibited for !root") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("goto pc+2") /* to l2, must not be pc+3 as this would skip nospec */ +__xlated_unpriv("r2 = 0") +__xlated_unpriv("r3 = 1048576") +/* l2_%=: */ +__xlated_unpriv("nospec") /* inserted to prevent `R2 tried to add from different maps, paths or scalars, pointer arithmetic with it prohibited for !root` */ +__xlated_unpriv("r2 += r3") +__xlated_unpriv("if r1 != 0x0 goto pc+1") +__xlated_unpriv("goto pc+5") /* to l4 */ +/* l3_%=: */ +__xlated_unpriv("r0 = 322376503") +__xlated_unpriv("nospec") /* inserted to prevent `R2 pointer comparison prohibited` */ +__xlated_unpriv("if r2 != 0x100000 goto pc+2") /* to l4 */ +__xlated_unpriv("r0 = 0") +__xlated_unpriv("exit") +/* l4_%=: */ +__xlated_unpriv("nospec") +__xlated_unpriv("r0 = *(u8 *)(r0 +0)") +#endif __naked void value_pointer_and_scalar_1(void) { asm volatile (" \ @@ -434,7 +486,6 @@ l2_%=: /* common instruction */ \ l3_%=: /* branch B */ \ r0 = 0x13371337; \ /* verifier follows fall-through */ \ - /* unpriv: nospec (inserted to prevent `R2 pointer comparison prohibited`) */\ if r2 != 0x100000 goto l4_%=; \ r0 = 0; \ exit; \ @@ -452,9 +503,24 @@ l4_%=: /* fake-dead code; targeted from branch A to \
SEC("socket") __description("map access: mixing value pointer and scalar, 2") -__success __failure_unpriv -__msg_unpriv("R2 tried to add from different maps, paths or scalars, pointer arithmetic with it prohibited for !root") +__success __success_unpriv __retval(0) +#ifdef SPEC_V1 +__xlated_unpriv("goto pc+2") +__xlated_unpriv("r2 = r0") +__xlated_unpriv("r3 = 0") +__xlated_unpriv("nospec") +__xlated_unpriv("r2 += r3") +__xlated_unpriv("if r1 != 0x0 goto pc+1") +__xlated_unpriv("goto pc+5") +__xlated_unpriv("r0 = 322376503") +__xlated_unpriv("nospec") +__xlated_unpriv("if r2 != 0x100000 goto pc+2") +__xlated_unpriv("r0 = 0") +__xlated_unpriv("exit") +__xlated_unpriv("nospec") /* inserted to prevent `R0 invalid mem access 'scalar'` */ +__xlated_unpriv("r0 = *(u8 *)(r0 +0)") +#endif __naked void value_pointer_and_scalar_2(void) { asm volatile (" \ @@ -495,7 +561,6 @@ l4_%=: /* fake-dead code; targeted from branch A to \ * prevent dead code sanitization, rejected \ * via branch B however \ */ \ - /* unpriv: nospec (inserted to prevent `R0 invalid mem access 'scalar'`) */\ r0 = *(u8*)(r0 + 0); \ r0 = 0; \ exit; \