All for 4.9 backported and tested.
Thanks!
Daniel Borkmann (5):
      bpf: fix wrong exposure of map_flags into fdinfo for lpm
      bpf: fix mlock precharge on arraymaps
      bpf, x64: implement retpoline for tail call
      bpf, arm64: fix out of bounds access in tail call
      bpf, ppc64: fix out of bounds access in tail call

Eric Dumazet (1):
      bpf: add schedule points in percpu arrays management

 arch/arm64/net/bpf_jit_comp.c        |  5 +++--
 arch/powerpc/net/bpf_jit_comp64.c    |  1 +
 arch/x86/include/asm/nospec-branch.h | 37 ++++++++++++++++++++++++++++++++++++
 arch/x86/net/bpf_jit_comp.c          |  9 +++++----
 kernel/bpf/arraymap.c                | 35 ++++++++++++++++++++++------------
 kernel/bpf/stackmap.c                |  1 +
 6 files changed, 70 insertions(+), 18 deletions(-)
[ upstream commit a316338cb71a3260201490e615f2f6d5c0d8fb2c ]
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via attr->map_flags, since it does not support preallocation yet. We check the flag, but we never copy it into trie->map.map_flags, which is later exposed via fdinfo and used by loaders such as iproute2. The latter uses this in bpf_map_selfcheck_pinned() to test whether a pinned map has the same spec as the one from the BPF obj file and, if not, bails out. That is currently the case for lpm, since it always exposes 0 as its flags.

Also copy over the flags in array_map_alloc() and stack_map_alloc(). They always have to be 0 right now, but we should make sure not to miss copying them over at a later point in time when we add actual flags for them to use.
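As a rough illustration of the loader side, a minimal sketch of such a selfcheck (hypothetical helper, not the actual iproute2 code; the real check lives in bpf_map_selfcheck_pinned()). The kernel prints map_flags into /proc/self/fdinfo/<fd>, which is why the value must be copied into the map on creation:

	/* Sketch: read map_flags back from fdinfo, as a pinning selfcheck
	 * would do. Hypothetical helper, illustrative only. */
	#include <stdio.h>

	static int map_flags_from_fdinfo(int fd, unsigned int *flags)
	{
		char path[64], line[128];
		FILE *fp;

		snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
		fp = fopen(path, "r");
		if (!fp)
			return -1;
		while (fgets(line, sizeof(line), fp)) {
			/* kernel emits "map_flags:\t%#x" */
			if (sscanf(line, "map_flags: %x", flags) == 1) {
				fclose(fp);
				return 0; /* before this fix, lpm reported 0 here */
			}
		}
		fclose(fp);
		return -1;
	}

Comparing this value against the flags from the BPF obj file is how a pinned map with a mismatching spec gets rejected.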
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/arraymap.c | 1 +
 kernel/bpf/stackmap.c | 1 +
 2 files changed, 2 insertions(+)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 9a1e6ed..3be357a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -107,6 +107,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array->map.key_size = attr->key_size;
 	array->map.value_size = attr->value_size;
 	array->map.max_entries = attr->max_entries;
+	array->map.map_flags = attr->map_flags;
 	array->elem_size = elem_size;
 
 	if (!percpu)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index be85191..a2a232d 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -88,6 +88,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	smap->map.key_size = attr->key_size;
 	smap->map.value_size = value_size;
 	smap->map.max_entries = attr->max_entries;
+	smap->map.map_flags = attr->map_flags;
 	smap->n_buckets = n_buckets;
 	smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
This is a note to let you know that I've just added the patch titled
bpf: fix wrong exposure of map_flags into fdinfo for lpm
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bpf-fix-wrong-exposure-of-map_flags-into-fdinfo-for-lpm.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Fri Mar 9 14:20:51 PST 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 8 Mar 2018 16:17:32 +0100
Subject: bpf: fix wrong exposure of map_flags into fdinfo for lpm
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org, "David S . Miller" <davem@davemloft.net>
Message-ID: <d0c41d2614afbe10501cc3d96d998952694bab7f.1520521792.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit a316338cb71a3260201490e615f2f6d5c0d8fb2c ]
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via attr->map_flags, since it does not support preallocation yet. We check the flag, but we never copy it into trie->map.map_flags, which is later exposed via fdinfo and used by loaders such as iproute2. The latter uses this in bpf_map_selfcheck_pinned() to test whether a pinned map has the same spec as the one from the BPF obj file and, if not, bails out. That is currently the case for lpm, since it always exposes 0 as its flags.

Also copy over the flags in array_map_alloc() and stack_map_alloc(). They always have to be 0 right now, but we should make sure not to miss copying them over at a later point in time when we add actual flags for them to use.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/arraymap.c | 1 +
 kernel/bpf/stackmap.c | 1 +
 2 files changed, 2 insertions(+)
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -107,6 +107,7 @@ static struct bpf_map *array_map_alloc(u
 	array->map.key_size = attr->key_size;
 	array->map.value_size = attr->value_size;
 	array->map.max_entries = attr->max_entries;
+	array->map.map_flags = attr->map_flags;
 	array->elem_size = elem_size;
 
 	if (!percpu)
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -88,6 +88,7 @@ static struct bpf_map *stack_map_alloc(u
 	smap->map.key_size = attr->key_size;
 	smap->map.value_size = value_size;
 	smap->map.max_entries = attr->max_entries;
+	smap->map.map_flags = attr->map_flags;
 	smap->n_buckets = n_buckets;
 	smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.9/bpf-fix-mlock-precharge-on-arraymaps.patch
queue-4.9/bpf-x64-implement-retpoline-for-tail-call.patch
queue-4.9/bpf-arm64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-fix-wrong-exposure-of-map_flags-into-fdinfo-for-lpm.patch
queue-4.9/bpf-ppc64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-add-schedule-points-in-percpu-arrays-management.patch
[ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]
syzkaller recently triggered an OOM during percpu map allocation; while there is work in progress by Dennis Zhou to add __GFP_NORETRY semantics for the percpu allocator under pressure, there also seems to be a missing bpf_map_precharge_memlock() check in array map allocation.

Given that today the actual bpf_map_charge_memlock() happens after find_and_alloc_map() in the syscall path, the bpf_map_precharge_memlock() is there to bail out early, before we go and do the map setup work, when we find that we would hit the limits anyway. Therefore add this for array maps as well.
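To see the scale involved, here is a worked example with illustrative numbers (not from the report itself):

	/* Illustrative cost calculation for a BPF_MAP_TYPE_PERCPU_ARRAY with
	 * max_entries = 1 << 20, value_size = 8 and 64 possible CPUs:
	 *
	 *   cost = array_size + (1 << 20) * 8 * 64   (~ 520 MB, ~ 130k pages)
	 *
	 * Without the precharge, all of this is allocated first and only then
	 * checked against RLIMIT_MEMLOCK via bpf_map_charge_memlock(); with
	 * it, bpf_map_precharge_memlock(cost) can reject the request up front.
	 */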
Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/arraymap.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 3be357a..075ddb8 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -48,8 +48,9 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
 	u32 elem_size, index_mask, max_entries;
 	bool unpriv = !capable(CAP_SYS_ADMIN);
+	u64 cost, array_size, mask64;
 	struct bpf_array *array;
-	u64 array_size, mask64;
+	int ret;
 
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -92,8 +93,19 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array_size += (u64) max_entries * elem_size;
 
 	/* make sure there is no u32 overflow later in round_up() */
-	if (array_size >= U32_MAX - PAGE_SIZE)
+	cost = array_size;
+	if (cost >= U32_MAX - PAGE_SIZE)
 		return ERR_PTR(-ENOMEM);
+	if (percpu) {
+		cost += (u64)attr->max_entries * elem_size * num_possible_cpus();
+		if (cost >= U32_MAX - PAGE_SIZE)
+			return ERR_PTR(-ENOMEM);
+	}
+	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+
+	ret = bpf_map_precharge_memlock(cost);
+	if (ret < 0)
+		return ERR_PTR(ret);
 
 	/* allocate all map elements and zero-initialize them */
 	array = bpf_map_area_alloc(array_size);
@@ -108,20 +120,15 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array->map.value_size = attr->value_size;
 	array->map.max_entries = attr->max_entries;
 	array->map.map_flags = attr->map_flags;
+	array->map.pages = cost;
 	array->elem_size = elem_size;
 
-	if (!percpu)
-		goto out;
-
-	array_size += (u64) attr->max_entries * elem_size * num_possible_cpus();
-
-	if (array_size >= U32_MAX - PAGE_SIZE ||
-	    elem_size > PCPU_MIN_UNIT_SIZE || bpf_array_alloc_percpu(array)) {
+	if (percpu &&
+	    (elem_size > PCPU_MIN_UNIT_SIZE ||
+	     bpf_array_alloc_percpu(array))) {
 		bpf_map_area_free(array);
 		return ERR_PTR(-ENOMEM);
 	}
-out:
-	array->map.pages = round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;
 
 	return &array->map;
 }
This is a note to let you know that I've just added the patch titled
bpf: fix mlock precharge on arraymaps
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bpf-fix-mlock-precharge-on-arraymaps.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Fri Mar 9 14:20:51 PST 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 8 Mar 2018 16:17:33 +0100
Subject: bpf: fix mlock precharge on arraymaps
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org, Dennis Zhou <dennisszhou@gmail.com>
Message-ID: <56632230ccbd31f06f33af4c6ac9fcbc88506825.1520521792.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]
syzkaller recently triggered an OOM during percpu map allocation; while there is work in progress by Dennis Zhou to add __GFP_NORETRY semantics for the percpu allocator under pressure, there also seems to be a missing bpf_map_precharge_memlock() check in array map allocation.

Given that today the actual bpf_map_charge_memlock() happens after find_and_alloc_map() in the syscall path, the bpf_map_precharge_memlock() is there to bail out early, before we go and do the map setup work, when we find that we would hit the limits anyway. Therefore add this for array maps as well.
Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/arraymap.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -48,8 +48,9 @@ static struct bpf_map *array_map_alloc(u
 	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
 	u32 elem_size, index_mask, max_entries;
 	bool unpriv = !capable(CAP_SYS_ADMIN);
+	u64 cost, array_size, mask64;
 	struct bpf_array *array;
-	u64 array_size, mask64;
+	int ret;
 
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -92,8 +93,19 @@ static struct bpf_map *array_map_alloc(u
 	array_size += (u64) max_entries * elem_size;
 
 	/* make sure there is no u32 overflow later in round_up() */
-	if (array_size >= U32_MAX - PAGE_SIZE)
+	cost = array_size;
+	if (cost >= U32_MAX - PAGE_SIZE)
 		return ERR_PTR(-ENOMEM);
+	if (percpu) {
+		cost += (u64)attr->max_entries * elem_size * num_possible_cpus();
+		if (cost >= U32_MAX - PAGE_SIZE)
+			return ERR_PTR(-ENOMEM);
+	}
+	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+
+	ret = bpf_map_precharge_memlock(cost);
+	if (ret < 0)
+		return ERR_PTR(ret);
 
 	/* allocate all map elements and zero-initialize them */
 	array = bpf_map_area_alloc(array_size);
@@ -108,20 +120,15 @@ static struct bpf_map *array_map_alloc(u
 	array->map.value_size = attr->value_size;
 	array->map.max_entries = attr->max_entries;
 	array->map.map_flags = attr->map_flags;
+	array->map.pages = cost;
 	array->elem_size = elem_size;
 
-	if (!percpu)
-		goto out;
-
-	array_size += (u64) attr->max_entries * elem_size * num_possible_cpus();
-
-	if (array_size >= U32_MAX - PAGE_SIZE ||
-	    elem_size > PCPU_MIN_UNIT_SIZE || bpf_array_alloc_percpu(array)) {
+	if (percpu &&
+	    (elem_size > PCPU_MIN_UNIT_SIZE ||
+	     bpf_array_alloc_percpu(array))) {
 		bpf_map_area_free(array);
 		return ERR_PTR(-ENOMEM);
 	}
-out:
-	array->map.pages = round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;
 
 	return &array->map;
 }
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.9/bpf-fix-mlock-precharge-on-arraymaps.patch
queue-4.9/bpf-x64-implement-retpoline-for-tail-call.patch
queue-4.9/bpf-arm64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-fix-wrong-exposure-of-map_flags-into-fdinfo-for-lpm.patch
queue-4.9/bpf-ppc64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-add-schedule-points-in-percpu-arrays-management.patch
[ upstream commit a493a87f38cfa48caaa95c9347be2d914c6fdf29 ]
Implement a retpoline [0] for the BPF tail call JIT'ing, which converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter also aligns with what gcc / clang emits (e.g. [1]).
JIT dump after patch:
# bpftool p d x i 1
  0: (18) r2 = map[id:1]
  2: (b7) r3 = 0
  3: (85) call bpf_tail_call#12
  4: (b7) r0 = 2
  5: (95) exit
With CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000072  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000072  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000072  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	callq  0x000000000000006d  |+
  66:	pause                      |
  68:	lfence                     |
  6b:	jmp    0x0000000000000066  |
  6d:	mov    %rax,(%rsp)         |
  71:	retq                       |
  72:	mov    $0x2,%eax
[...]
* relative fall-through jumps in error case + retpoline for indirect jump
Without CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000063  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000063  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000063  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	jmpq   *%rax               |-
  63:	mov    $0x2,%eax
[...]
* relative fall-through jumps in error case - plain indirect jump as before
[0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2f...
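For reference, the C-level logic that emit_bpf_tail_call() JITs, roughly as sketched in the comments of arch/x86/net/bpf_jit_comp.c (a model, not compilable kernel code), is:

	/* Rough C model of the emitted tail call sequence; the final
	 * indirect jump is the one RETPOLINE_RAX_BPF_JIT() replaces. */
	if (index >= array->map.max_entries)
		goto out;			/* jbe out  (OFFSET1) */
	if (tail_call_cnt > MAX_TAIL_CALL_CNT)
		goto out;			/* ja out   (OFFSET2) */
	tail_call_cnt++;
	prog = array->ptrs[index];
	if (prog == NULL)
		goto out;			/* je out   (OFFSET3) */
	goto *(prog->bpf_func + prologue_size);	/* jmp *%rax -> retpoline */
out:
	/* continue with the rest of the program */

Since the retpoline is larger than the two-byte jmp *%rax, the three jump offsets to the out label must grow by the same delta, which is why OFFSET1/2/3 below are expressed relative to RETPOLINE_RAX_BPF_JIT_SIZE.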
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 arch/x86/include/asm/nospec-branch.h | 37 ++++++++++++++++++++++++++++++++++++
 arch/x86/net/bpf_jit_comp.c          |  9 +++++----
 2 files changed, 42 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 76b0585..81a1be3 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -177,4 +177,41 @@ static inline void indirect_branch_prediction_barrier(void)
 }
 
 #endif /* __ASSEMBLY__ */
+
+/*
+ * Below is used in the eBPF JIT compiler and emits the byte sequence
+ * for the following assembly:
+ *
+ * With retpolines configured:
+ *
+ *    callq do_rop
+ *  spec_trap:
+ *    pause
+ *    lfence
+ *    jmp spec_trap
+ *  do_rop:
+ *    mov %rax,(%rsp)
+ *    retq
+ *
+ * Without retpolines configured:
+ *
+ *    jmp *%rax
+ */
+#ifdef CONFIG_RETPOLINE
+# define RETPOLINE_RAX_BPF_JIT_SIZE	17
+# define RETPOLINE_RAX_BPF_JIT()				\
+	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
+	/* spec_trap: */					\
+	EMIT2(0xF3, 0x90);       /* pause */			\
+	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
+	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
+	/* do_rop: */						\
+	EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */	\
+	EMIT1(0xC3);             /* retq */
+#else
+# define RETPOLINE_RAX_BPF_JIT_SIZE	2
+# define RETPOLINE_RAX_BPF_JIT()				\
+	EMIT2(0xFF, 0xE0);	 /* jmp *%rax */
+#endif
+
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7840331..1f7ed2e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -12,6 +12,7 @@
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
 #include <asm/cacheflush.h>
+#include <asm/nospec-branch.h>
 #include <linux/bpf.h>
 
 int bpf_jit_enable __read_mostly;
@@ -281,7 +282,7 @@ static void emit_bpf_tail_call(u8 **pprog)
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 43 /* number of bytes to jump */
+#define OFFSET1 (41 + RETPOLINE_RAX_BPF_JIT_SIZE) /* number of bytes to jump */
 	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
 	label1 = cnt;
 
@@ -290,7 +291,7 @@ static void emit_bpf_tail_call(u8 **pprog)
 	 */
 	EMIT2_off32(0x8B, 0x85, -STACKSIZE + 36); /* mov eax, dword ptr [rbp - 516] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 32
+#define OFFSET2 (30 + RETPOLINE_RAX_BPF_JIT_SIZE)
 	EMIT2(X86_JA, OFFSET2);                   /* ja out */
 	label2 = cnt;
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
@@ -304,7 +305,7 @@ static void emit_bpf_tail_call(u8 **pprog)
 	 * goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC0);                  /* test rax,rax */
-#define OFFSET3 10
+#define OFFSET3 (8 + RETPOLINE_RAX_BPF_JIT_SIZE)
 	EMIT2(X86_JE, OFFSET3);                   /* je out */
 	label3 = cnt;
 
@@ -317,7 +318,7 @@ static void emit_bpf_tail_call(u8 **pprog)
 	 * rdi == ctx (1st arg)
 	 * rax == prog->bpf_func + prologue_size
 	 */
-	EMIT2(0xFF, 0xE0);                        /* jmp rax */
+	RETPOLINE_RAX_BPF_JIT();
 
 	/* out: */
 	BUILD_BUG_ON(cnt - label1 != OFFSET1);
This is a note to let you know that I've just added the patch titled
bpf, x64: implement retpoline for tail call
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bpf-x64-implement-retpoline-for-tail-call.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Fri Mar 9 14:20:51 PST 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 8 Mar 2018 16:17:34 +0100
Subject: bpf, x64: implement retpoline for tail call
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org
Message-ID: <cfd8963c4c57f676177fb2d3a516a4b63cdccde2.1520521792.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit a493a87f38cfa48caaa95c9347be2d914c6fdf29 ]
Implement a retpoline [0] for the BPF tail call JIT'ing, which converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter also aligns with what gcc / clang emits (e.g. [1]).
JIT dump after patch:
# bpftool p d x i 1
  0: (18) r2 = map[id:1]
  2: (b7) r3 = 0
  3: (85) call bpf_tail_call#12
  4: (b7) r0 = 2
  5: (95) exit
With CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000072  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000072  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000072  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	callq  0x000000000000006d  |+
  66:	pause                      |
  68:	lfence                     |
  6b:	jmp    0x0000000000000066  |
  6d:	mov    %rax,(%rsp)         |
  71:	retq                       |
  72:	mov    $0x2,%eax
[...]
* relative fall-through jumps in error case + retpoline for indirect jump
Without CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000063  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000063  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000063  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	jmpq   *%rax               |-
  63:	mov    $0x2,%eax
[...]
* relative fall-through jumps in error case - plain indirect jump as before
[0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2f...
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++++++++++
 arch/x86/net/bpf_jit_comp.c          |  9 ++++----
 2 files changed, 42 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -177,4 +177,41 @@ static inline void indirect_branch_predi
 }
 
 #endif /* __ASSEMBLY__ */
+
+/*
+ * Below is used in the eBPF JIT compiler and emits the byte sequence
+ * for the following assembly:
+ *
+ * With retpolines configured:
+ *
+ *    callq do_rop
+ *  spec_trap:
+ *    pause
+ *    lfence
+ *    jmp spec_trap
+ *  do_rop:
+ *    mov %rax,(%rsp)
+ *    retq
+ *
+ * Without retpolines configured:
+ *
+ *    jmp *%rax
+ */
+#ifdef CONFIG_RETPOLINE
+# define RETPOLINE_RAX_BPF_JIT_SIZE	17
+# define RETPOLINE_RAX_BPF_JIT()				\
+	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
+	/* spec_trap: */					\
+	EMIT2(0xF3, 0x90);       /* pause */			\
+	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
+	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
+	/* do_rop: */						\
+	EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */	\
+	EMIT1(0xC3);             /* retq */
+#else
+# define RETPOLINE_RAX_BPF_JIT_SIZE	2
+# define RETPOLINE_RAX_BPF_JIT()				\
+	EMIT2(0xFF, 0xE0);	 /* jmp *%rax */
+#endif
+
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -12,6 +12,7 @@
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
 #include <asm/cacheflush.h>
+#include <asm/nospec-branch.h>
 #include <linux/bpf.h>
 
 int bpf_jit_enable __read_mostly;
@@ -281,7 +282,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 43 /* number of bytes to jump */
+#define OFFSET1 (41 + RETPOLINE_RAX_BPF_JIT_SIZE) /* number of bytes to jump */
 	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
 	label1 = cnt;
 
@@ -290,7 +291,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 */
 	EMIT2_off32(0x8B, 0x85, -STACKSIZE + 36); /* mov eax, dword ptr [rbp - 516] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 32
+#define OFFSET2 (30 + RETPOLINE_RAX_BPF_JIT_SIZE)
 	EMIT2(X86_JA, OFFSET2);                   /* ja out */
 	label2 = cnt;
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
@@ -304,7 +305,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 * goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC0);                  /* test rax,rax */
-#define OFFSET3 10
+#define OFFSET3 (8 + RETPOLINE_RAX_BPF_JIT_SIZE)
 	EMIT2(X86_JE, OFFSET3);                   /* je out */
 	label3 = cnt;
 
@@ -317,7 +318,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 * rdi == ctx (1st arg)
 	 * rax == prog->bpf_func + prologue_size
 	 */
-	EMIT2(0xFF, 0xE0);                        /* jmp rax */
+	RETPOLINE_RAX_BPF_JIT();
 
 	/* out: */
 	BUILD_BUG_ON(cnt - label1 != OFFSET1);
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.9/bpf-fix-mlock-precharge-on-arraymaps.patch
queue-4.9/bpf-x64-implement-retpoline-for-tail-call.patch
queue-4.9/bpf-arm64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-fix-wrong-exposure-of-map_flags-into-fdinfo-for-lpm.patch
queue-4.9/bpf-ppc64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-add-schedule-points-in-percpu-arrays-management.patch
This is a note to let you know that I've just added the patch titled
bpf, x64: implement retpoline for tail call
to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bpf-x64-implement-retpoline-for-tail-call.patch and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Fri Mar 9 16:00:31 PST 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 8 Mar 2018 16:17:34 +0100
Subject: bpf, x64: implement retpoline for tail call
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org
Message-ID: <cfd8963c4c57f676177fb2d3a516a4b63cdccde2.1520521792.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit a493a87f38cfa48caaa95c9347be2d914c6fdf29 ]
Implement a retpoline [0] for the BPF tail call JIT'ing, which converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter also aligns with what gcc / clang emits (e.g. [1]).
JIT dump after patch:
# bpftool p d x i 1
  0: (18) r2 = map[id:1]
  2: (b7) r3 = 0
  3: (85) call bpf_tail_call#12
  4: (b7) r0 = 2
  5: (95) exit
With CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000072  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000072  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000072  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	callq  0x000000000000006d  |+
  66:	pause                      |
  68:	lfence                     |
  6b:	jmp    0x0000000000000066  |
  6d:	mov    %rax,(%rsp)         |
  71:	retq                       |
  72:	mov    $0x2,%eax
[...]
* relative fall-through jumps in error case + retpoline for indirect jump
Without CONFIG_RETPOLINE:
# bpftool p d j i 1
[...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000063  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000063  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000063  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	jmpq   *%rax               |-
  63:	mov    $0x2,%eax
[...]
* relative fall-through jumps in error case - plain indirect jump as before
[0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2f...
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++++++++++
 arch/x86/net/bpf_jit_comp.c          |  9 ++++----
 2 files changed, 42 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -195,4 +195,41 @@ static inline void vmexit_fill_RSB(void)
 }
 
 #endif /* __ASSEMBLY__ */
+
+/*
+ * Below is used in the eBPF JIT compiler and emits the byte sequence
+ * for the following assembly:
+ *
+ * With retpolines configured:
+ *
+ *    callq do_rop
+ *  spec_trap:
+ *    pause
+ *    lfence
+ *    jmp spec_trap
+ *  do_rop:
+ *    mov %rax,(%rsp)
+ *    retq
+ *
+ * Without retpolines configured:
+ *
+ *    jmp *%rax
+ */
+#ifdef CONFIG_RETPOLINE
+# define RETPOLINE_RAX_BPF_JIT_SIZE	17
+# define RETPOLINE_RAX_BPF_JIT()				\
+	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
+	/* spec_trap: */					\
+	EMIT2(0xF3, 0x90);       /* pause */			\
+	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
+	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
+	/* do_rop: */						\
+	EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */	\
+	EMIT1(0xC3);             /* retq */
+#else
+# define RETPOLINE_RAX_BPF_JIT_SIZE	2
+# define RETPOLINE_RAX_BPF_JIT()				\
+	EMIT2(0xFF, 0xE0);	 /* jmp *%rax */
+#endif
+
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -12,6 +12,7 @@
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
 #include <asm/cacheflush.h>
+#include <asm/nospec-branch.h>
 #include <linux/bpf.h>
 
 int bpf_jit_enable __read_mostly;
@@ -269,7 +270,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	EMIT2(0x89, 0xD2);                        /* mov edx, edx */
 	EMIT3(0x39, 0x56,                         /* cmp dword ptr [rsi + 16], edx */
 	      offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 43 /* number of bytes to jump */
+#define OFFSET1 (41 + RETPOLINE_RAX_BPF_JIT_SIZE) /* number of bytes to jump */
 	EMIT2(X86_JBE, OFFSET1);                  /* jbe out */
 	label1 = cnt;
 
@@ -278,7 +279,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 */
 	EMIT2_off32(0x8B, 0x85, -STACKSIZE + 36); /* mov eax, dword ptr [rbp - 516] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
-#define OFFSET2 32
+#define OFFSET2 (30 + RETPOLINE_RAX_BPF_JIT_SIZE)
 	EMIT2(X86_JA, OFFSET2);                   /* ja out */
 	label2 = cnt;
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
@@ -292,7 +293,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 * goto out;
 	 */
 	EMIT3(0x48, 0x85, 0xC0);                  /* test rax,rax */
-#define OFFSET3 10
+#define OFFSET3 (8 + RETPOLINE_RAX_BPF_JIT_SIZE)
 	EMIT2(X86_JE, OFFSET3);                   /* je out */
 	label3 = cnt;
 
@@ -305,7 +306,7 @@ static void emit_bpf_tail_call(u8 **ppro
 	 * rdi == ctx (1st arg)
 	 * rax == prog->bpf_func + prologue_size
 	 */
-	EMIT2(0xFF, 0xE0);                        /* jmp rax */
+	RETPOLINE_RAX_BPF_JIT();
 
 	/* out: */
 	BUILD_BUG_ON(cnt - label1 != OFFSET1);
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.4/bpf-x64-implement-retpoline-for-tail-call.patch
[ upstream commit 16338a9b3ac30740d49f5dfed81bac0ffa53b9c7 ]
I recently noticed a crash on arm64 when feeding a bogus index into the BPF tail call helper. The crash would not occur when the interpreter is used, but only in the JIT case. Output looks as follows:
[  347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
[...]
[  347.043065] [fffb850e96492510] address between user and kernel address ranges
[  347.050205] Internal error: Oops: 96000004 [#1] SMP
[...]
[  347.190829] x13: 0000000000000000 x12: 0000000000000000
[  347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
[  347.201427] x9 : 0000000000000000 x8 : 0000000000000000
[  347.206726] x7 : 0000000000000000 x6 : 001c991738000000
[  347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
[  347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
[  347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
[  347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
[  347.235221] Call trace:
[  347.237656]  0xffff000002f3a4fc
[  347.240784]  bpf_test_run+0x78/0xf8
[  347.244260]  bpf_prog_test_run_skb+0x148/0x230
[  347.248694]  SyS_bpf+0x77c/0x1110
[  347.251999]  el0_svc_naked+0x30/0x34
[  347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
[...]
In this case the index used in BPF r3 is the same as in r1 at the time of the call, meaning we fed a pointer as the index; here it had the value 0xffff808fd7cf0500, which sits in x2.
While I found tail calls to be working in general (also for hitting the error cases), I noticed the following in the code emission:
# bpftool p d j i 988
[...]
  38:	ldr	w10, [x1,x10]
  3c:	cmp	w2, w10
  40:	b.ge	0x000000000000007c	<-- signed cmp
  44:	mov	x10, #0x20		// #32
  48:	cmp	x26, x10
  4c:	b.gt	0x000000000000007c
  50:	add	x26, x26, #0x1
  54:	mov	x10, #0x110		// #272
  58:	add	x10, x1, x10
  5c:	lsl	x11, x2, #3
  60:	ldr	x11, [x10,x11]		<-- faulting insn (f86b694b)
  64:	cbz	x11, 0x000000000000007c
[...]
Meaning, the tests passed because commit ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper") was using signed compares instead of unsigned ones, which as a result had the test wrongly passing.

Change both this test and the tail call count test into unsigned compares, and cap the index as u32. We did the latter in 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT") as well, and it is needed here in addition, too. Tested on HiSilicon Hi1616.
Result after patch:
# bpftool p d j i 268
[...]
  38:	ldr	w10, [x1,x10]
  3c:	add	w2, w2, #0x0
  40:	cmp	w2, w10
  44:	b.cs	0x0000000000000080
  48:	mov	x10, #0x20		// #32
  4c:	cmp	x26, x10
  50:	b.hi	0x0000000000000080
  54:	add	x26, x26, #0x1
  58:	mov	x10, #0x110		// #272
  5c:	add	x10, x1, x10
  60:	lsl	x11, x2, #3
  64:	ldr	x11, [x10,x11]
  68:	cbz	x11, 0x0000000000000080
[...]
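In C terms, the fixed sequence corresponds to the following (sketch of the JITed logic, not actual source):

	/* The index is first capped to 32 bit -- the "add w2, w2, #0x0"
	 * above is a 32-bit mov of w2 onto itself -- then both bound
	 * checks are done as unsigned comparisons. */
	index = (u32)index;			/* cap index, as 90caccdd8cc0 did on x64 */
	if (index >= array->map.max_entries)	/* b.cs instead of signed b.ge */
		goto out;
	if (tail_call_cnt > MAX_TAIL_CALL_CNT)	/* b.hi instead of signed b.gt */
		goto out;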
Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 arch/arm64/net/bpf_jit_comp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index d8199e1..b47a26f 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -234,8 +234,9 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
 	off = offsetof(struct bpf_array, map.max_entries);
 	emit_a64_mov_i64(tmp, off, ctx);
 	emit(A64_LDR32(tmp, r2, tmp), ctx);
+	emit(A64_MOV(0, r3, r3), ctx);
 	emit(A64_CMP(0, r3, tmp), ctx);
-	emit(A64_B_(A64_COND_GE, jmp_offset), ctx);
+	emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
 
 	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
 	 *     goto out;
@@ -243,7 +244,7 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
 	 */
 	emit_a64_mov_i64(tmp, MAX_TAIL_CALL_CNT, ctx);
 	emit(A64_CMP(1, tcc, tmp), ctx);
-	emit(A64_B_(A64_COND_GT, jmp_offset), ctx);
+	emit(A64_B_(A64_COND_HI, jmp_offset), ctx);
 	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
 
 	/* prog = array->ptrs[index];
This is a note to let you know that I've just added the patch titled
bpf, arm64: fix out of bounds access in tail call
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bpf-arm64-fix-out-of-bounds-access-in-tail-call.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Fri Mar 9 14:20:51 PST 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 8 Mar 2018 16:17:35 +0100
Subject: bpf, arm64: fix out of bounds access in tail call
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org
Message-ID: <3e884789ba211b116935a7c05044b861aba2a30e.1520521792.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 16338a9b3ac30740d49f5dfed81bac0ffa53b9c7 ]
I recently noticed a crash on arm64 when feeding a bogus index into the BPF tail call helper. The crash would not occur when the interpreter is used, but only in the JIT case. Output looks as follows:
[  347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
[...]
[  347.043065] [fffb850e96492510] address between user and kernel address ranges
[  347.050205] Internal error: Oops: 96000004 [#1] SMP
[...]
[  347.190829] x13: 0000000000000000 x12: 0000000000000000
[  347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
[  347.201427] x9 : 0000000000000000 x8 : 0000000000000000
[  347.206726] x7 : 0000000000000000 x6 : 001c991738000000
[  347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
[  347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
[  347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
[  347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
[  347.235221] Call trace:
[  347.237656]  0xffff000002f3a4fc
[  347.240784]  bpf_test_run+0x78/0xf8
[  347.244260]  bpf_prog_test_run_skb+0x148/0x230
[  347.248694]  SyS_bpf+0x77c/0x1110
[  347.251999]  el0_svc_naked+0x30/0x34
[  347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
[...]
In this case the index used in BPF r3 is the same as in r1 at the time of the call, meaning we fed a pointer as the index; here it had the value 0xffff808fd7cf0500, which sits in x2.
While I found tail calls to be working in general (also for hitting the error cases), I noticed the following in the code emission:
# bpftool p d j i 988
[...]
  38:	ldr	w10, [x1,x10]
  3c:	cmp	w2, w10
  40:	b.ge	0x000000000000007c	<-- signed cmp
  44:	mov	x10, #0x20		// #32
  48:	cmp	x26, x10
  4c:	b.gt	0x000000000000007c
  50:	add	x26, x26, #0x1
  54:	mov	x10, #0x110		// #272
  58:	add	x10, x1, x10
  5c:	lsl	x11, x2, #3
  60:	ldr	x11, [x10,x11]		<-- faulting insn (f86b694b)
  64:	cbz	x11, 0x000000000000007c
[...]
Meaning, the tests passed because commit ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper") was using signed compares instead of unsigned ones, which as a result had the test wrongly passing.

Change both this test and the tail call count test into unsigned compares, and cap the index as u32. We did the latter in 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT") as well, and it is needed here in addition, too. Tested on HiSilicon Hi1616.
Result after patch:
# bpftool p d j i 268
[...]
  38:	ldr	w10, [x1,x10]
  3c:	add	w2, w2, #0x0
  40:	cmp	w2, w10
  44:	b.cs	0x0000000000000080
  48:	mov	x10, #0x20		// #32
  4c:	cmp	x26, x10
  50:	b.hi	0x0000000000000080
  54:	add	x26, x26, #0x1
  58:	mov	x10, #0x110		// #272
  5c:	add	x10, x1, x10
  60:	lsl	x11, x2, #3
  64:	ldr	x11, [x10,x11]
  68:	cbz	x11, 0x0000000000000080
[...]
Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm64/net/bpf_jit_comp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -234,8 +234,9 @@ static int emit_bpf_tail_call(struct jit
 	off = offsetof(struct bpf_array, map.max_entries);
 	emit_a64_mov_i64(tmp, off, ctx);
 	emit(A64_LDR32(tmp, r2, tmp), ctx);
+	emit(A64_MOV(0, r3, r3), ctx);
 	emit(A64_CMP(0, r3, tmp), ctx);
-	emit(A64_B_(A64_COND_GE, jmp_offset), ctx);
+	emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
 
 	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
 	 *     goto out;
@@ -243,7 +244,7 @@ static int emit_bpf_tail_call(struct jit
 	 */
 	emit_a64_mov_i64(tmp, MAX_TAIL_CALL_CNT, ctx);
 	emit(A64_CMP(1, tcc, tmp), ctx);
-	emit(A64_B_(A64_COND_GT, jmp_offset), ctx);
+	emit(A64_B_(A64_COND_HI, jmp_offset), ctx);
 	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
 
 	/* prog = array->ptrs[index];
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.9/bpf-fix-mlock-precharge-on-arraymaps.patch
queue-4.9/bpf-x64-implement-retpoline-for-tail-call.patch
queue-4.9/bpf-arm64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-fix-wrong-exposure-of-map_flags-into-fdinfo-for-lpm.patch
queue-4.9/bpf-ppc64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-add-schedule-points-in-percpu-arrays-management.patch
From: Eric Dumazet <edumazet@google.com>
[ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]
syzbot managed to trigger RCU-detected stalls in bpf_array_free_percpu().
It takes time to allocate a huge percpu map, but even more time to free it.
Since we run in process context, use cond_resched() to yield the CPU if needed.
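For context, a minimal user space trigger is simply a huge percpu array map; a sketch with illustrative sizes (not the exact syzkaller reproducer) is:

	/* Sketch of a trigger: creating (and closing) a huge percpu array
	 * walks max_entries percpu allocations/frees in one stretch, which
	 * is what the cond_resched() calls below break up. */
	#include <linux/bpf.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <string.h>

	int main(void)
	{
		union bpf_attr attr;
		int fd;

		memset(&attr, 0, sizeof(attr));
		attr.map_type    = BPF_MAP_TYPE_PERCPU_ARRAY;
		attr.key_size    = 4;
		attr.value_size  = 8;
		attr.max_entries = 1 << 20;	/* one percpu alloc per entry */

		fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
		if (fd >= 0)
			close(fd);		/* frees ~1M percpu regions back to back */
		return 0;
	}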
Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/arraymap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 075ddb8..a38119e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -20,8 +20,10 @@ static void bpf_array_free_percpu(struct bpf_array *array)
 {
 	int i;
 
-	for (i = 0; i < array->map.max_entries; i++)
+	for (i = 0; i < array->map.max_entries; i++) {
 		free_percpu(array->pptrs[i]);
+		cond_resched();
+	}
 }
 
 static int bpf_array_alloc_percpu(struct bpf_array *array)
@@ -37,6 +39,7 @@ static int bpf_array_alloc_percpu(struct bpf_array *array)
 			return -ENOMEM;
 		}
 		array->pptrs[i] = ptr;
+		cond_resched();
 	}
 
 	return 0;
This is a note to let you know that I've just added the patch titled
bpf: add schedule points in percpu arrays management
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bpf-add-schedule-points-in-percpu-arrays-management.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Fri Mar 9 14:20:51 PST 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 8 Mar 2018 16:17:36 +0100
Subject: bpf: add schedule points in percpu arrays management
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>
Message-ID: <2f5704af2bdf05e8eae92917e2aeaec49d5477c9.1520521792.git.daniel@iogearbox.net>

From: Eric Dumazet <edumazet@google.com>
[ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]
syzbot managed to trigger RCU-detected stalls in bpf_array_free_percpu().
It takes time to allocate a huge percpu map, but even more time to free it.
Since we run in process context, use cond_resched() to yield the CPU if needed.
Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/bpf/arraymap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -20,8 +20,10 @@ static void bpf_array_free_percpu(struct
 {
 	int i;
 
-	for (i = 0; i < array->map.max_entries; i++)
+	for (i = 0; i < array->map.max_entries; i++) {
 		free_percpu(array->pptrs[i]);
+		cond_resched();
+	}
 }
 
 static int bpf_array_alloc_percpu(struct bpf_array *array)
@@ -37,6 +39,7 @@ static int bpf_array_alloc_percpu(struct
 			return -ENOMEM;
 		}
 		array->pptrs[i] = ptr;
+		cond_resched();
 	}
 
 	return 0;
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.9/bpf-fix-mlock-precharge-on-arraymaps.patch
queue-4.9/bpf-x64-implement-retpoline-for-tail-call.patch
queue-4.9/bpf-arm64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-fix-wrong-exposure-of-map_flags-into-fdinfo-for-lpm.patch
queue-4.9/bpf-ppc64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-add-schedule-points-in-percpu-arrays-management.patch
[ upstream commit d269176e766c71c998cb75b4ea8cbc321cc0019d ]
While working on 16338a9b3ac3 ("bpf, arm64: fix out of bounds access in tail call") I noticed that the ppc64 JIT is partially affected as well. While the bound checking is correctly performed as an unsigned comparison, the register with the index value, however, is never truncated into 32 bit space, so e.g. an index value of 0x100000000ULL with a map of 1 element would pass the PPC_CMPLW() check, whereas we later on continue with the full 64 bit register value. Therefore, truncate the value to 32 bit initially, as we do in the interpreter and other JITs, in order to fix the access.
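In C terms, the added PPC_RLWINM() performs the same truncation as the other JITs (sketch):

	/* rlwinm rd,rs,0,0,31 -- rotate by 0 and keep mask bits 0..31 --
	 * retains the low 32-bit word and clears the upper half of the
	 * 64-bit register, i.e.: */
	index = (u32)index;	/* truncate before the unsigned PPC_CMPLW() bound check */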
Fixes: ce0761419fae ("powerpc/bpf: Implement support for tail calls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 arch/powerpc/net/bpf_jit_comp64.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 0fe98a5..be9d968 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -245,6 +245,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 	 * goto out;
 	 */
 	PPC_LWZ(b2p[TMP_REG_1], b2p_bpf_array, offsetof(struct bpf_array, map.max_entries));
+	PPC_RLWINM(b2p_index, b2p_index, 0, 0, 31);
 	PPC_CMPLW(b2p_index, b2p[TMP_REG_1]);
 	PPC_BCC(COND_GE, out);
This is a note to let you know that I've just added the patch titled
bpf, ppc64: fix out of bounds access in tail call
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bpf-ppc64-fix-out-of-bounds-access-in-tail-call.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Fri Mar 9 14:20:51 PST 2018
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 8 Mar 2018 16:17:37 +0100
Subject: bpf, ppc64: fix out of bounds access in tail call
To: gregkh@linuxfoundation.org
Cc: ast@kernel.org, daniel@iogearbox.net, stable@vger.kernel.org
Message-ID: <08bc9e5902de0e9e9b26194ba4ea219f053b7206.1520521792.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit d269176e766c71c998cb75b4ea8cbc321cc0019d ]
While working on 16338a9b3ac3 ("bpf, arm64: fix out of bounds access in tail call") I noticed that the ppc64 JIT is partially affected as well. While the bound checking is correctly performed as an unsigned comparison, the register with the index value, however, is never truncated into 32 bit space, so e.g. an index value of 0x100000000ULL with a map of 1 element would pass the PPC_CMPLW() check, whereas we later on continue with the full 64 bit register value. Therefore, truncate the value to 32 bit initially, as we do in the interpreter and other JITs, in order to fix the access.
Fixes: ce0761419fae ("powerpc/bpf: Implement support for tail calls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/powerpc/net/bpf_jit_comp64.c | 1 +
 1 file changed, 1 insertion(+)
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -245,6 +245,7 @@ static void bpf_jit_emit_tail_call(u32 *
 	 * goto out;
 	 */
 	PPC_LWZ(b2p[TMP_REG_1], b2p_bpf_array, offsetof(struct bpf_array, map.max_entries));
+	PPC_RLWINM(b2p_index, b2p_index, 0, 0, 31);
 	PPC_CMPLW(b2p_index, b2p[TMP_REG_1]);
 	PPC_BCC(COND_GE, out);
Patches currently in stable-queue which might be from daniel@iogearbox.net are
queue-4.9/bpf-fix-mlock-precharge-on-arraymaps.patch
queue-4.9/bpf-x64-implement-retpoline-for-tail-call.patch
queue-4.9/bpf-arm64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-fix-wrong-exposure-of-map_flags-into-fdinfo-for-lpm.patch
queue-4.9/bpf-ppc64-fix-out-of-bounds-access-in-tail-call.patch
queue-4.9/bpf-add-schedule-points-in-percpu-arrays-management.patch
On 03/09/2018 11:22 PM, Greg KH wrote:
> On Thu, Mar 08, 2018 at 04:17:31PM +0100, Daniel Borkmann wrote:
>> All for 4.9 backported and tested.
>
> All now applied. Should I work to backport these to the 4.4.y tree?

Yeah, would be great as I don't have a way to test them on 4.4.

Thanks Greg!
On Fri, Mar 09, 2018 at 11:36:10PM +0100, Daniel Borkmann wrote:
> On 03/09/2018 11:22 PM, Greg KH wrote:
>> On Thu, Mar 08, 2018 at 04:17:31PM +0100, Daniel Borkmann wrote:
>>> All for 4.9 backported and tested.
>>
>> All now applied. Should I work to backport these to the 4.4.y tree?
>
> Yeah, would be great as I don't have a way to test them on 4.4.

Looks like only the one patch that Ben wanted to have applied was relevant, so I've added that one now.

thanks,

greg k-h
linux-stable-mirror@lists.linaro.org