Re: [5.10, 5.15] New bpf kselftest failure

17 Jul 2023

      On Mon, 2023-07-17 at 10:59 -0400, Luiz Capitulino wrote:
...
On 2023-07-17 10:55, Eduard Zingerman wrote:
...
On Mon, 2023-07-17 at 09:04 -0400, Luiz Capitulino wrote:
...
Hi,
The upstream commit below is backported to 5.10.186, 5.15.120 and 6.1.36:
"""
commit ecdf985d7615356b78241fdb159c091830ed0380
Author: Eduard Zingerman eddyz87@gmail.com
Date:   Wed Feb 15 01:20:27 2023 +0200
  bpf: track immediate values written to stack by BPF_ST instruction

"""
This commit is causing the following bpf:test_verifier kselftest to fail:
"""
# #760/p precise: ST insn causing spi > allocated_stack FAIL
"""
I can reproduce the error on 6.1.36 but don't understand what's causing it yet.
The log is suspiciously different from master, will comment later today.
Thank you very much for the prompt reply, Eduard.
I'm available for further testing if needed.
The test case in question looks as follows:
; old log                 new log
0: R1=ctx(off=0,imm=0)
0: r3 = r10                ; R3=fp0 R10=fp0          R3=fp0 R10=fp0
1: if r3 != 0x7b goto pc+0 ; R3=fp0                  R3=fp0
2: *(u64 *)(r3 -8) = 0     ; R3=fp0 fp-8_w=mmmmmmmm  R3=fp0 fp-8_w=00000000
3: r4 = *(u64 *)(r10 -8)   ; R4_w=scalar() R10=fp0   R4_w=P0 R10=fp0 fp-8_w=00000000
4: r0 = -1                 ; R0=-1                   R0=-1
5: if r4 > r0 goto pc+0
6: exit
The test checks the log generated by __mark_chain_precision() at
instruction #5: registers `r4` and `r0` should be marked precise
(in that order).
The `r4` and `r0` are marked precise by check_cond_jmp_op() function
because jump #5 is predicted (because comparison is unsigned `r4` is
never greater than -1/`r0`).
Failure happens because only log for `r0` markings is present.
Patch [5] changes processing of instruction #2 ( *(u64 *)(r3 -8) = 0; ):
- before patch fp[-8] is marked as mmmmmmmm (any 64-bit scalar value);
- after patch fp[-8] is marked as 00000000.
Which in turn changes processing of instruction #3 ( r4 = *(u64 *)(r10 -8); ):
- before patch r4 is marked as unbound scalar
- after patch r4 is marked as precise zero
  (check_stack_read_fixed_off() has a special case for zero loads,
   DST registers of such loads are always marked precise).
If __mark_chain_precision() is called for a register that is already
marked as precise no log is generated. Thus the log looks different
when check_cond_jmp_op() calls __mark_chain_precision() for `r4` and
then for `r0`:
- before patch `r4` is not precise and log for it is generated;
- after patch `r4` is precise and log for it is not generated.
Now a question:
  why does current master has log for both `r4` and `r0`,
  given that commit [5] is present in master?
This happens because of commit [2] (not back-ported to 6.1.36).
Commit [2] adds function mark_all_scalars_imprecise() that removes
precision marks from scalar registers when new state is created in a
state chain (and for this test new states are created before #1 and #5
because of the BPF_F_TEST_STATE_FREQ flag).
Effectively, before #5 is processed by check_cond_jmp_op()
mark_all_scalars_imprecise() undoes `r4`'s precision mark assigned by
check_stack_read_fixed_off() => log for both `r4` and `r0` is present.
I need some time to convince myself that such interaction between
check_stack_read_fixed_off() and mark_all_scalars_imprecise()
is safe, but that's what we have in master.
Commit [2] is a part of a series [4] by Andrii Nakryiko.
This series had been partially back-ported already.
Commits [0,1,2,3] are not yet back-ported and have to be picked
altogether, otherwise there are numerous test failures.
Still, when I cherry-pick [0,1,2,3] `./test_progs -a setget_sockopt` is failing.
I'll investigate this failure but don't think I'll finish today.
---
Alternatively, if the goal is to minimize amount of changes, we can
disable or modify the 'precise: ST insn causing spi > allocated_stack'.
---
Commits (in chronological order):
[0] be2ef8161572 ("bpf: allow precision tracking for programs with subprogs")
[1] f63181b6ae79 ("bpf: stop setting precise in current state")
[2] 7a830b53c17b ("bpf: aggressively forget precise markings during state checkpointing")
[3] 4f999b767769 ("selftests/bpf: make test_align selftest more robust")
[4] 07d90c72efbe ("Merge branch 'BPF verifier precision tracking improvements'")
[5] ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
...

Luiz

...
...
Since this test didn't fail before ecdf985d76 backport, the question is
if this is a test bug or if this commit introduced a regression.
I haven't checked if this failure is present in latest Linus tree because
I was unable to build & run the bpf kselftests in an older distro.
Also, there some important details about running the bpf kselftests
in 5.10 and 5.15:

On 5.10, bpf kselftest build is broken. The following upstream

commit needs to be cherry-picked for it to build & run:
"""
commit 4237e9f4a96228ccc8a7abe5e4b30834323cd353
Author: Gilad Reti gilad.reti@gmail.com
Date:   Wed Jan 13 07:38:08 2021 +0200
  selftests/bpf: Add verifier test for PTR_TO_MEM spill

"""

On 5.15.120 there's one additional test that's failing, but I didn't

debug this one:
"""
#150/p calls: trigger reg2btf_ids[reg→type] for reg→type > __BPF_REG_TYPE_MAX FAIL
FAIL
"""

On 5.11 onwards, building and running bpf tests is disabled by

default by commit 7a6eb7c34a78498742b5f82543b7a68c1c443329, so I wonder
if we want to backport this to 5.10 as well?
Thanks!

Luiz

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [5.10, 5.15] New bpf kselftest failure