On Mon, 2023-07-17 at 10:59 -0400, Luiz Capitulino wrote:
On 2023-07-17 10:55, Eduard Zingerman wrote:
On Mon, 2023-07-17 at 09:04 -0400, Luiz Capitulino wrote:
Hi,
The upstream commit below is backported to 5.10.186, 5.15.120 and 6.1.36:
""" commit ecdf985d7615356b78241fdb159c091830ed0380 Author: Eduard Zingerman eddyz87@gmail.com Date: Wed Feb 15 01:20:27 2023 +0200
bpf: track immediate values written to stack by BPF_ST instruction
"""
This commit is causing the following bpf:test_verifier kselftest to fail:
""" # #760/p precise: ST insn causing spi > allocated_stack FAIL """
I can reproduce the error on 6.1.36 but don't understand what's causing it yet. The log is suspiciously different from master, will comment later today.
Thank you very much for the prompt reply, Eduard.
I'm available for further testing if needed.
The test case in question looks as follows:
; old log new log 0: R1=ctx(off=0,imm=0) 0: r3 = r10 ; R3=fp0 R10=fp0 R3=fp0 R10=fp0 1: if r3 != 0x7b goto pc+0 ; R3=fp0 R3=fp0 2: *(u64 *)(r3 -8) = 0 ; R3=fp0 fp-8_w=mmmmmmmm R3=fp0 fp-8_w=00000000 3: r4 = *(u64 *)(r10 -8) ; R4_w=scalar() R10=fp0 R4_w=P0 R10=fp0 fp-8_w=00000000 4: r0 = -1 ; R0=-1 R0=-1 5: if r4 > r0 goto pc+0 6: exit
The test checks the log generated by __mark_chain_precision() at instruction #5: registers `r4` and `r0` should be marked precise (in that order).
The `r4` and `r0` are marked precise by check_cond_jmp_op() function because jump #5 is predicted (because comparison is unsigned `r4` is never greater than -1/`r0`).
Failure happens because only log for `r0` markings is present.
Patch [5] changes processing of instruction #2 ( *(u64 *)(r3 -8) = 0; ): - before patch fp[-8] is marked as mmmmmmmm (any 64-bit scalar value); - after patch fp[-8] is marked as 00000000.
Which in turn changes processing of instruction #3 ( r4 = *(u64 *)(r10 -8); ): - before patch r4 is marked as unbound scalar - after patch r4 is marked as precise zero (check_stack_read_fixed_off() has a special case for zero loads, DST registers of such loads are always marked precise).
If __mark_chain_precision() is called for a register that is already marked as precise no log is generated. Thus the log looks different when check_cond_jmp_op() calls __mark_chain_precision() for `r4` and then for `r0`: - before patch `r4` is not precise and log for it is generated; - after patch `r4` is precise and log for it is not generated.
Now a question: why does current master has log for both `r4` and `r0`, given that commit [5] is present in master?
This happens because of commit [2] (not back-ported to 6.1.36).
Commit [2] adds function mark_all_scalars_imprecise() that removes precision marks from scalar registers when new state is created in a state chain (and for this test new states are created before #1 and #5 because of the BPF_F_TEST_STATE_FREQ flag).
Effectively, before #5 is processed by check_cond_jmp_op() mark_all_scalars_imprecise() undoes `r4`'s precision mark assigned by check_stack_read_fixed_off() => log for both `r4` and `r0` is present.
I need some time to convince myself that such interaction between check_stack_read_fixed_off() and mark_all_scalars_imprecise() is safe, but that's what we have in master.
Commit [2] is a part of a series [4] by Andrii Nakryiko. This series had been partially back-ported already. Commits [0,1,2,3] are not yet back-ported and have to be picked altogether, otherwise there are numerous test failures.
Still, when I cherry-pick [0,1,2,3] `./test_progs -a setget_sockopt` is failing. I'll investigate this failure but don't think I'll finish today.
---
Alternatively, if the goal is to minimize amount of changes, we can disable or modify the 'precise: ST insn causing spi > allocated_stack'.
---
Commits (in chronological order): [0] be2ef8161572 ("bpf: allow precision tracking for programs with subprogs") [1] f63181b6ae79 ("bpf: stop setting precise in current state") [2] 7a830b53c17b ("bpf: aggressively forget precise markings during state checkpointing") [3] 4f999b767769 ("selftests/bpf: make test_align selftest more robust") [4] 07d90c72efbe ("Merge branch 'BPF verifier precision tracking improvements'") [5] ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
- Luiz
Since this test didn't fail before ecdf985d76 backport, the question is if this is a test bug or if this commit introduced a regression.
I haven't checked if this failure is present in latest Linus tree because I was unable to build & run the bpf kselftests in an older distro.
Also, there some important details about running the bpf kselftests in 5.10 and 5.15:
- On 5.10, bpf kselftest build is broken. The following upstream
commit needs to be cherry-picked for it to build & run:
""" commit 4237e9f4a96228ccc8a7abe5e4b30834323cd353 Author: Gilad Reti gilad.reti@gmail.com Date: Wed Jan 13 07:38:08 2021 +0200
selftests/bpf: Add verifier test for PTR_TO_MEM spill
"""
- On 5.15.120 there's one additional test that's failing, but I didn't
debug this one:
""" #150/p calls: trigger reg2btf_ids[reg→type] for reg→type > __BPF_REG_TYPE_MAX FAIL FAIL """
- On 5.11 onwards, building and running bpf tests is disabled by
default by commit 7a6eb7c34a78498742b5f82543b7a68c1c443329, so I wonder if we want to backport this to 5.10 as well?
Thanks!
- Luiz