JIT'd bpf programs that have subprograms can have a positive value for
num_exentries but a NULL value for extable. This is problematic if one of
these bpf programs encounters a fault during its execution. The fault
handlers correctly identify that the faulting IP belongs to a bpf program.
However, performing a search_extable call on a NULL extable leads to a
second fault.
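
For reference, the fault handler reaches this code through the generic
exception-table lookup. Paraphrasing kernel/extable.c from memory (for
illustration only, not part of this patch), the chain looks roughly like:

/* approximate shape of kernel/extable.c, for illustration only */
const struct exception_table_entry *search_exception_tables(unsigned long addr)
{
	const struct exception_table_entry *e;

	e = search_kernel_exception_table(addr);
	if (!e)
		e = search_module_extables(addr);
	if (!e)
		e = search_bpf_extables(addr);
	return e;
}

so a NULL extable handed to search_extable() faults again while the kernel
is already handling the first fault.
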
Fix this by refusing to search a NULL extable, and by checking the
subprograms' extables when the umbrella program has subprograms
configured.
Once I realized what was going on, I was able to use the following bpf
program to get an oops from this failure:
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
char LICENSE[] SEC("license") = "Dual BSD/GPL";
#define PATH_MAX 4096
struct callback_ctx {
u8 match;
};
struct filter_value {
char prefix[PATH_MAX];
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 256);
__type(key, int);
__type(value, struct filter_value);
} test_filter SEC(".maps");
static __u64 test_filter_cb(struct bpf_map *map, __u32 *key,
struct filter_value *val,
struct callback_ctx *data)
{
return 1;
}
SEC("fentry/__sys_bind")
int BPF_PROG(__sys_bind, int fd, struct sockaddr *umyaddr, int addrlen)
{
pid_t pid;
struct callback_ctx cx = { .match = 0 };
pid = bpf_get_current_pid_tgid() >> 32;
bpf_for_each_map_elem(&test_filter, test_filter_cb, &cx, 0);
bpf_printk("fentry: pid = %d, family = %llx\n", pid, umyaddr->sa_family);
return 0;
}
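
Not part of the original reproducer, but for completeness: a minimal
libbpf loader along the following lines can be used to attach the program.
This is only a sketch; it assumes libbpf 1.x semantics and that the bpf
object above was built as repro.bpf.o (e.g. with
clang -O2 -g -target bpf -c repro.bpf.c -o repro.bpf.o).

/* Hypothetical loader sketch; file and object names are assumptions. */
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

int main(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	struct bpf_link *link;

	obj = bpf_object__open_file("repro.bpf.o", NULL);
	if (!obj) {
		fprintf(stderr, "failed to open bpf object\n");
		return 1;
	}
	if (bpf_object__load(obj)) {
		fprintf(stderr, "failed to load bpf object\n");
		return 1;
	}
	/* Attach every program in the object (here, just the fentry prog). */
	bpf_object__for_each_program(prog, obj) {
		link = bpf_program__attach(prog);
		if (!link) {
			fprintf(stderr, "failed to attach program\n");
			return 1;
		}
	}
	/* Keep the attachment alive while bind() is exercised elsewhere. */
	pause();
	return 0;
}
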
And then the following code to actually trigger a failure:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

int
main(int argc, char *argv[])
{
	int sfd, rc;
	struct sockaddr *sockptr = (struct sockaddr *)0x900000000000;

	sfd = socket(AF_INET, SOCK_STREAM, 0);
	if (sfd < 0) {
		perror("socket");
		exit(EXIT_FAILURE);
	}
	while (1) {
		rc = bind(sfd, (struct sockaddr *) sockptr, sizeof(struct sockaddr_in));
		if (rc < 0) {
			perror("bind");
			sleep(5);
		} else {
			break;
		}
	}
	return 0;
}

I was able to validate that this problem does not occur when subprograms
are not in use, or when the direct pointer accesses are replaced with
bpf_probe_read calls. I further validated that this did not break the
extable handling in existing bpf programs. The same program caused no
failures when subprograms were removed, but the exception was still
injected.
Cc: stable(a)vger.kernel.org
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com>
---
kernel/bpf/core.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 7421487422d4..0e12238e4340 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -736,15 +736,33 @@ const struct exception_table_entry *search_bpf_extables(unsigned long addr)
 {
 	const struct exception_table_entry *e = NULL;
 	struct bpf_prog *prog;
+	struct bpf_prog_aux *aux;
+	int i;
 
 	rcu_read_lock();
 	prog = bpf_prog_ksym_find(addr);
 	if (!prog)
 		goto out;
-	if (!prog->aux->num_exentries)
+	aux = prog->aux;
+	if (!aux->num_exentries)
 		goto out;
 
-	e = search_extable(prog->aux->extable, prog->aux->num_exentries, addr);
+	/* prog->aux->extable can be NULL if subprograms are in use. In that
+	 * case, check each sub-function's aux->extables to see if it has a
+	 * matching entry.
+	 */
+	if (aux->extable != NULL) {
+		e = search_extable(prog->aux->extable,
+				   prog->aux->num_exentries, addr);
+	} else {
+		for (i = 0; (i < aux->func_cnt) && (e == NULL); i++) {
+			if (!aux->func[i]->aux->num_exentries ||
+			    aux->func[i]->aux->extable == NULL)
+				continue;
+			e = search_extable(aux->func[i]->aux->extable,
+					   aux->func[i]->aux->num_exentries, addr);
+		}
+	}
 out:
 	rcu_read_unlock();
 	return e;
--
2.25.1

Recently we have been seeing a kernel panic in the cifs_reconnect function
while accessing tgt_list. It looks like tgt_list is not initialized
correctly. Fixes are already present in the 5.10 and later trees; this
series backports them to 5.4.
CIFS VFS: \\172.30.1.14 cifs_reconnect: no target servers for DFS
failover
BUG: unable to handle page fault for address: fffffffffffffff8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 260e067 P4D 260e067 PUD 2610067 PMD 0
Oops: 0000 [#1] SMP PTI
RIP: 0010:cifs_reconnect+0x51d/0xef0 [cifs]
RSP: 0018:ffffc90000693da0 EFLAGS: 00010282
RAX: fffffffffffffff8 RBX: ffff8887fa63b800 RCX: fffffffffffffff8
Call Trace:
cifs_handle_standard+0x18d/0x1b0 [cifs]
cifs_demultiplex_thread+0xa5c/0xc90 [cifs]
kthread+0x113/0x130
Paulo Alcantara (2):
cifs: get rid of unused parameter in reconn_setup_dfs_targets()
cifs: handle empty list of targets in cifs_reconnect()
fs/cifs/connect.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
--
2.39.2

As reported by Ackerley[1], the use of page_cache_next_miss() in
hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
the same offset fails with -EEXIST. Revert this change and go back to the
previous method of getting the folio from the page cache and dropping the
reference on success.

hugetlbfs_pagecache_present() was also refactored to use
page_cache_next_miss(); revert the usage there as well.
User visible impacts include hugetlb fallocate incorrectly returning
EEXIST if pages are already present in the file. In addition, hugetlb
pages will not be included in core dumps if they need to be brought in via
GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
already present in the cache. It may try to allocate a new page and
potentially return ENOMEM as opposed to EEXIST.
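
As a rough illustration of the EEXIST impact (my sketch, not taken from
Ackerley's report; it assumes hugetlbfs mounted at /dev/hugepages and 2MB
huge pages reserved), the second fallocate() below fails on an affected
kernel even though the pages are simply already present in the file:

/* Hypothetical reproducer sketch; path and page size are assumptions. */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	long sz = 2 * 1024 * 1024;	/* one 2MB huge page */
	int fd = open("/dev/hugepages/fallocate-test", O_CREAT | O_RDWR, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (fallocate(fd, 0, 0, sz))
		perror("first fallocate");
	if (fallocate(fd, 0, 0, sz))
		perror("second fallocate");	/* fails with EEXIST on affected kernels */
	close(fd);
	return 0;
}
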
Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Cc: <stable(a)vger.kernel.org> #v6.3
Reported-by: Ackerley Tng <ackerleytng(a)google.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.co…
---
This revert is the safest way to fix 6.3. The upstream fix will either
fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
new function to check if a page is present in the page cache. Both
directions are currently under review, so we can use this safe and simple
fix for 6.3.
fs/hugetlbfs/inode.c | 8 +++-----
mm/hugetlb.c | 11 +++++------
2 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9062da6da5675..586767afb4cdb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -821,7 +821,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		struct folio *folio;
 		unsigned long addr;
-		bool present;
 
 		cond_resched();
 
@@ -845,10 +844,9 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		/* See if already present in mapping to avoid alloc/free */
-		rcu_read_lock();
-		present = page_cache_next_miss(mapping, index, 1) != index;
-		rcu_read_unlock();
-		if (present) {
+		folio = filemap_get_folio(mapping, index);
+		if (folio) {
+			folio_put(folio);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_drop_vma_policy(&pseudo_vma);
 			continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 245038a9fe4ea..29ab27d2a3ef5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5666,13 +5666,12 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = vma_hugecache_offset(h, vma, address);
-	bool present;
-
-	rcu_read_lock();
-	present = page_cache_next_miss(mapping, idx, 1) != idx;
-	rcu_read_unlock();
+	struct folio *folio;
 
-	return present;
+	folio = filemap_get_folio(mapping, idx);
+	if (folio)
+		folio_put(folio);
+	return folio != NULL;
 }
 
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
--
2.40.1