commit dbd3e6eaf3d813939b28e8a66e29d81cdc836445 upstream.
The removal function is called regardless of whether
/proc/i8k was created successfully or not, the latter
causing a WARN() on module removal.
Fix that by only calling the removal function
if /proc/i8k was created successfully.
Since the original patch depends on the driver
registering a platform device, the backported patch
instead stores the return value of proc_create() and
only calls remove_proc_entry() on exit if proc_create()
was successful.
Tested on an Inspiron 3505 for kernel 5.10.
Cc: <stable(a)vger.kernel.org> # 5.10.x
Signed-off-by: Armin Wolf <W_Armin(a)gmx.de>
---
drivers/hwmon/dell-smm-hwmon.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
index 63b74e781c5d..87f401100466 100644
--- a/drivers/hwmon/dell-smm-hwmon.c
+++ b/drivers/hwmon/dell-smm-hwmon.c
@@ -603,15 +603,18 @@ static const struct proc_ops i8k_proc_ops = {
.proc_ioctl = i8k_ioctl,
};
+static struct proc_dir_entry *entry;
+
static void __init i8k_init_procfs(void)
{
/* Register the proc entry */
- proc_create("i8k", 0, NULL, &i8k_proc_ops);
+ entry = proc_create("i8k", 0, NULL, &i8k_proc_ops);
}
static void __exit i8k_exit_procfs(void)
{
- remove_proc_entry("i8k", NULL);
+ if (entry)
+ remove_proc_entry("i8k", NULL);
}
#else
--
2.30.2
From: Daniel Borkmann <daniel(a)iogearbox.net>
commit 6e6fddc78323533be570873abb728b7e0ba7e024 upstream.
syzkaller triggered several panics similar to the below:
[...]
[ 248.851531] BUG: KASAN: use-after-free in _copy_to_user+0x5c/0x90
[ 248.857656] Read of size 985 at addr ffff8808017ffff2 by task a.out/1425
[...]
[ 248.865902] CPU: 1 PID: 1425 Comm: a.out Not tainted 4.18.0-rc4+ #13
[ 248.865903] Hardware name: Supermicro SYS-5039MS-H12TRF/X11SSE-F, BIOS 2.1a 03/08/2018
[ 248.865905] Call Trace:
[ 248.865910] dump_stack+0xd6/0x185
[ 248.865911] ? show_regs_print_info+0xb/0xb
[ 248.865913] ? printk+0x9c/0xc3
[ 248.865915] ? kmsg_dump_rewind_nolock+0xe4/0xe4
[ 248.865919] print_address_description+0x6f/0x270
[ 248.865920] kasan_report+0x25b/0x380
[ 248.865922] ? _copy_to_user+0x5c/0x90
[ 248.865924] check_memory_region+0x137/0x190
[ 248.865925] kasan_check_read+0x11/0x20
[ 248.865927] _copy_to_user+0x5c/0x90
[ 248.865930] bpf_test_finish.isra.8+0x4f/0xc0
[ 248.865932] bpf_prog_test_run_skb+0x6a0/0xba0
[...]
After scrubbing the BPF prog a bit from the noise, turns out it called
bpf_skb_change_head() for the lwt_xmit prog with headroom of 2. Nothing
wrong in that, however, this was run with repeat >> 0 in bpf_prog_test_run_skb()
and the same skb thus keeps changing until the pskb_expand_head() called
from skb_cow() keeps bailing out in atomic alloc context with -ENOMEM.
So upon return we'll basically have 0 headroom left yet blindly do the
__skb_push() of 14 bytes and keep copying data from there in bpf_test_finish()
out of bounds. Fix to check if we have enough headroom and if pskb_expand_head()
fails, bail out with error.
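The headroom check described above can be illustrated outside the kernel with a minimal userspace sketch. Everything here is an assumption made for illustration: `struct pkt`, `pkt_expand_head()` and `pkt_push_zeroed()` are hypothetical stand-ins for the skb, pskb_expand_head() and __skb_push(), not kernel APIs. The point is only the ordering: verify (or create) headroom first, propagate the allocation failure, and only then prepend the header.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified packet buffer (not struct sk_buff). */
struct pkt {
	unsigned char *head;	/* start of the allocation */
	size_t headroom;	/* free bytes in front of the payload */
	size_t len;		/* payload bytes starting at head + headroom */
};

/* Reallocate so that at least `need` bytes of headroom exist. */
static int pkt_expand_head(struct pkt *p, size_t need)
{
	unsigned char *n;

	if (p->headroom >= need)
		return 0;

	n = malloc(need + p->len);
	if (!n)
		return -ENOMEM;	/* caller must bail out, not push blindly */

	memcpy(n + need, p->head + p->headroom, p->len);
	free(p->head);
	p->head = n;
	p->headroom = need;
	return 0;
}

/* Prepend a zeroed header of hdr_len bytes, or fail without touching memory. */
static int pkt_push_zeroed(struct pkt *p, size_t hdr_len)
{
	int ret = pkt_expand_head(p, hdr_len);

	if (ret)
		return ret;

	p->headroom -= hdr_len;
	p->len += hdr_len;
	memset(p->head + p->headroom, 0, hdr_len);
	return 0;
}
```

A caller that has consumed all headroom can still call `pkt_push_zeroed(&p, 14)`; it either gets a valid 14-byte zeroed header in front of the payload or a negative errno, never a write before the start of the buffer.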
Another bug independent of this fix (but related in triggering above) is
that BPF_PROG_TEST_RUN should be reworked to reset the skb/xdp buffer to
its original state from input, as otherwise repeating the same test in a
loop won't work for benchmarking when the underlying input buffer is
changed by the prog each time and reused for the next run, leading to
unexpected results.
Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
Reported-by: syzbot+709412e651e55ed96498(a)syzkaller.appspotmail.com
Reported-by: syzbot+54f39d6ab58f39720a55(a)syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
[connoro: drop test_verifier.c changes not applicable to 4.14]
Signed-off-by: Connor O'Brien <connoro(a)google.com>
---
Hello,
This is a backport for the 4.14 stable tree.
Thanks,
Connor
net/bpf/test_run.c | 17 ++++++++++++++---
tools/testing/selftests/bpf/test_verifier.c | 18 ++++++++++++++++++
2 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 6be41a44d688..4f3c08583d8c 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -96,6 +96,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
u32 size = kattr->test.data_size_in;
u32 repeat = kattr->test.repeat;
u32 retval, duration;
+ int hh_len = ETH_HLEN;
struct sk_buff *skb;
void *data;
int ret;
@@ -131,12 +132,22 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
skb_reset_network_header(skb);
if (is_l2)
- __skb_push(skb, ETH_HLEN);
+ __skb_push(skb, hh_len);
if (is_direct_pkt_access)
bpf_compute_data_end(skb);
retval = bpf_test_run(prog, skb, repeat, &duration);
- if (!is_l2)
- __skb_push(skb, ETH_HLEN);
+ if (!is_l2) {
+ if (skb_headroom(skb) < hh_len) {
+ int nhead = HH_DATA_ALIGN(hh_len - skb_headroom(skb));
+
+ if (pskb_expand_head(skb, nhead, 0, GFP_USER)) {
+ kfree_skb(skb);
+ return -ENOMEM;
+ }
+ }
+ memset(__skb_push(skb, hh_len), 0, hh_len);
+ }
+
size = skb->len;
/* bpf program can never convert linear skb to non-linear */
if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index d4f611546fc0..0846345fe1e5 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4334,6 +4334,24 @@ static struct bpf_test tests[] = {
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_LWT_XMIT,
},
+ {
+ "make headroom for LWT_XMIT",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_MOV64_IMM(BPF_REG_2, 34),
+ BPF_MOV64_IMM(BPF_REG_3, 0),
+ BPF_EMIT_CALL(BPF_FUNC_skb_change_head),
+ /* split for s390 to succeed */
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+ BPF_MOV64_IMM(BPF_REG_2, 42),
+ BPF_MOV64_IMM(BPF_REG_3, 0),
+ BPF_EMIT_CALL(BPF_FUNC_skb_change_head),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_LWT_XMIT,
+ },
{
"invalid access of tc_classid for LWT_IN",
.insns = {
--
2.34.1.173.g76aa8bc2d0-goog
From: Ondrej Mosnacek <omosnace(a)redhat.com>
commit cbfcd13be5cb2a07868afe67520ed181956579a7 upstream.
Current code contains a lot of racy patterns when converting an
ocontext's context structure to an SID. This is being done in a "lazy"
fashion, such that the SID is looked up in the SID table only when it's
first needed and then cached in the "sid" field of the ocontext
structure. However, this is done without any locking or memory barriers
and is thus unsafe.
Between commits 24ed7fdae669 ("selinux: use separate table for initial
SID lookup") and 66f8e2f03c02 ("selinux: sidtab reverse lookup hash
table"), this race condition led to an actual observable bug, because a
pointer to the shared sid field was passed directly to
sidtab_context_to_sid(), which was using this location to also store an
intermediate value, which could have been read by other threads and
interpreted as an SID. In practice this caused e.g. new mounts to get a
wrong (seemingly random) filesystem context, leading to strange denials.
This bug has been spotted in the wild at least twice, see [1] and [2].
Fix the race condition by making all the racy functions use a common
helper that ensures the ocontext::sid accesses are made safely using the
appropriate SMP constructs.
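As a rough userspace analogue of that helper, the following sketch uses C11 atomics in place of the kernel's smp_load_acquire()/smp_store_release(). The names `struct cached_sid`, `slow_lookup()` and `cached_sid_get()` are hypothetical, and `slow_lookup()` merely stands in for sidtab_context_to_sid(). Two properties matter: the lookup writes into a local variable, so no intermediate value is ever visible in the shared slot, and the final value is published with release semantics and read with acquire semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical cache slot; 0 means "not looked up yet". */
struct cached_sid {
	_Atomic uint32_t sid;
};

/* Hypothetical slow path standing in for the SID table lookup. */
static int lookup_calls;

static int slow_lookup(uint32_t *out)
{
	lookup_calls++;
	*out = 42;	/* arbitrary non-zero value for the sketch */
	return 0;
}

static int cached_sid_get(struct cached_sid *c, uint32_t *out_sid)
{
	/* Acquire: if we see a non-zero SID, we also see what it refers to. */
	uint32_t sid = atomic_load_explicit(&c->sid, memory_order_acquire);

	if (!sid) {
		/* The lookup fills a local, never the shared slot directly. */
		int rc = slow_lookup(&sid);

		if (rc)
			return rc;

		/* Release: publish only the final, fully valid value. */
		atomic_store_explicit(&c->sid, sid, memory_order_release);
	}
	*out_sid = sid;
	return 0;
}
```

Two threads may still both take the slow path and store the same value; that is harmless here because the lookup is assumed to be idempotent for a given context.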
Note that security_netif_sid() was populating the sid field of both
contexts stored in the ocontext, but only the first one was actually
used. The SELinux wiki's documentation on the "netifcon" policy
statement [3] suggests that using only the first context is intentional.
I kept only the handling of the first context here, as there is really
no point in doing the SID lookup for the unused one.
I wasn't able to reproduce the bug mentioned above on any kernel that
includes commit 66f8e2f03c02, even though it has been reported that the
issue occurs with that commit, too, just less frequently. Thus, I wasn't
able to verify that this patch fixes the issue, but it makes sense to
avoid the race condition regardless.
[1] https://github.com/containers/container-selinux/issues/89
[2] https://lists.fedoraproject.org/archives/list/selinux@lists.fedoraproject.o…
[3] https://selinuxproject.org/page/NetworkStatements#netifcon
Cc: stable(a)vger.kernel.org
Cc: Xinjie Zheng <xinjie(a)google.com>
Reported-by: Sujithra Periasamy <sujithra(a)google.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
(cherry picked from commit cbfcd13be5cb2a07868afe67520ed181956579a7)
[vijayb: Backport contextual differences exist because the v5.10
RCU-related changes are not in 5.4]
Signed-off-by: Vijay Balakrishna <vijayb(a)linux.microsoft.com>
---
We have kernel crashes with stack traces related to selinux security
context to sid in 5.4 --
https://lore.kernel.org/all/af058f59-ce8a-7648-25e8-f8b8a2dbb0ba@linux.micr…
Unfortunately we don't have an on-demand repro. We are hoping this
patch will help address a possible race in 5.4.
[ 6.222870] Unable to handle kernel access to user memory outside uaccess routines at virtual address 000000000000000c
[ 6.222875] Mem abort info:
[ 6.222876] ESR = 0x96000004
[ 6.222878] EC = 0x25: DABT (current EL), IL = 32 bits
[ 6.222879] SET = 0, FnV = 0
[ 6.222881] EA = 0, S1PTW = 0
[ 6.222881] Data abort info:
[ 6.222883] ISV = 0, ISS = 0x00000004
[ 6.222884] CM = 0, WnR = 0
[ 6.222887] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000965148000
[ 6.222888] [000000000000000c] pgd=0000000000000000
[ 6.222893] Internal error: Oops: 96000004 [#1] SMP
[ 6.227931] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
[ 6.235480] CPU: 6 PID: 1 Comm: systemd Tainted: G O 5.4.144-xx #1
[ 6.244632] Hardware name: Overlake (DT)
[ 6.248677] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 6.253629] pc : sidtab_context_to_sid+0x154/0x600
[ 6.258570] lr : sidtab_context_to_sid+0x150/0x600
[ 6.263510] sp : ffff80001005b7e0
[ 6.266928] x29: ffff80001005b7e0 x28: 0000000000000000
[ 6.272406] x27: 0000000000000000 x26: ffff80001005b8d8
[ 6.277884] x25: ffff80001005b8f0 x24: ffff250b25230000
[ 6.283362] x23: ffff80001005b9a4 x22: ffffd429fedb9808
[ 6.288841] x21: ffff80001005b8c0 x20: 0000000000000118
[ 6.294319] x19: 0000000000000000 x18: 0000000000000000
[ 6.299797] x17: 0000000000000000 x16: 0000000000000000
[ 6.305275] x15: 0000000000000000 x14: 0000000000000000
[ 6.310753] x13: 0000000000000000 x12: 0000000000000010
[ 6.316231] x11: 0000000000000010 x10: 0101010101010101
[ 6.321710] x9 : fffffffffffffffe x8 : 7f7f7f7f7f7f7f7f
[ 6.327188] x7 : fefefefefeff735e x6 : 0000808080808080
[ 6.332667] x5 : 0000000000000000 x4 : ffff250b25230000
[ 6.338144] x3 : ffff80001005b8c0 x2 : 0000000000000000
[ 6.343622] x1 : 0000000000000119 x0 : 0000000000000000
[ 6.349100] Call trace:
[ 6.351625] sidtab_context_to_sid+0x154/0x600
[ 6.356207] security_context_to_sid_core.isra.21+0x190/0x250
[ 6.362133] security_context_to_sid+0x54/0x68
[ 6.366715] selinux_kernfs_init_security+0xd0/0x210
[ 6.371838] security_kernfs_init_security+0x40/0x60
[ 6.376961] __kernfs_new_node+0x174/0x218
[ 6.381185] kernfs_new_node+0x60/0x90
[ 6.385051] __kernfs_create_file+0x60/0x300
[ 6.389457] cgroup_addrm_files+0x14c/0x308
[ 6.393770] css_populate_dir+0x7c/0x168
[ 6.397815] cgroup_apply_control_enable+0x100/0x348
[ 6.402934] cgroup_mkdir+0x380/0x520
[ 6.406710] kernfs_iop_mkdir+0x94/0xf0
[ 6.410666] vfs_mkdir+0xf4/0x1c0
[ 6.414084] do_mkdirat+0x98/0x110
[ 6.417590] __arm64_sys_mkdirat+0x28/0x38
[ 6.421817] el0_svc_handler+0x90/0x138
[ 6.425773] el0_svc+0x8/0x208
[ 6.428925] Code: 2a1403e1 aa1803e0 97fffd81 aa0003fc (b9400c00)
[ 6.435219] ---[ end trace bb81d12a8eb77133 ]---
---
security/selinux/ss/services.c | 159 ++++++++++++++++++---------------
1 file changed, 87 insertions(+), 72 deletions(-)
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index f62adf3cfce8..a0afe49309c8 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2250,6 +2250,43 @@ size_t security_policydb_len(struct selinux_state *state)
return len;
}
+/**
+ * ocontext_to_sid - Helper to safely get sid for an ocontext
+ * @sidtab: SID table
+ * @c: ocontext structure
+ * @index: index of the context entry (0 or 1)
+ * @out_sid: pointer to the resulting SID value
+ *
+ * For all ocontexts except OCON_ISID the SID fields are populated
+ * on-demand when needed. Since updating the SID value is an SMP-sensitive
+ * operation, this helper must be used to do that safely.
+ *
+ * WARNING: This function may return -ESTALE, indicating that the caller
+ * must retry the operation after re-acquiring the policy pointer!
+ */
+static int ocontext_to_sid(struct sidtab *sidtab, struct ocontext *c,
+ size_t index, u32 *out_sid)
+{
+ int rc;
+ u32 sid;
+
+ /* Ensure the associated sidtab entry is visible to this thread. */
+ sid = smp_load_acquire(&c->sid[index]);
+ if (!sid) {
+ rc = sidtab_context_to_sid(sidtab, &c->context[index], &sid);
+ if (rc)
+ return rc;
+
+ /*
+ * Ensure the new sidtab entry is visible to other threads
+ * when they see the SID.
+ */
+ smp_store_release(&c->sid[index], sid);
+ }
+ *out_sid = sid;
+ return 0;
+}
+
/**
* security_port_sid - Obtain the SID for a port.
* @protocol: protocol number
@@ -2262,10 +2299,12 @@ int security_port_sid(struct selinux_state *state,
struct policydb *policydb;
struct sidtab *sidtab;
struct ocontext *c;
- int rc = 0;
+ int rc;
read_lock(&state->ss->policy_rwlock);
+retry:
+ rc = 0;
policydb = &state->ss->policydb;
sidtab = state->ss->sidtab;
@@ -2279,14 +2318,11 @@ int security_port_sid(struct selinux_state *state,
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc)
- goto out;
- }
- *out_sid = c->sid[0];
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE)
+ goto retry;
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_PORT;
}
@@ -2308,10 +2344,12 @@ int security_ib_pkey_sid(struct selinux_state *state,
struct policydb *policydb;
struct sidtab *sidtab;
struct ocontext *c;
- int rc = 0;
+ int rc;
read_lock(&state->ss->policy_rwlock);
+retry:
+ rc = 0;
policydb = &state->ss->policydb;
sidtab = state->ss->sidtab;
@@ -2326,14 +2364,11 @@ int security_ib_pkey_sid(struct selinux_state *state,
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc)
- goto out;
- }
- *out_sid = c->sid[0];
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE)
+ goto retry;
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2354,10 +2389,12 @@ int security_ib_endport_sid(struct selinux_state *state,
struct policydb *policydb;
struct sidtab *sidtab;
struct ocontext *c;
- int rc = 0;
+ int rc;
read_lock(&state->ss->policy_rwlock);
+retry:
+ rc = 0;
policydb = &state->ss->policydb;
sidtab = state->ss->sidtab;
@@ -2373,14 +2410,11 @@ int security_ib_endport_sid(struct selinux_state *state,
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc)
- goto out;
- }
- *out_sid = c->sid[0];
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE)
+ goto retry;
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2399,11 +2433,13 @@ int security_netif_sid(struct selinux_state *state,
{
struct policydb *policydb;
struct sidtab *sidtab;
- int rc = 0;
+ int rc;
struct ocontext *c;
read_lock(&state->ss->policy_rwlock);
+retry:
+ rc = 0;
policydb = &state->ss->policydb;
sidtab = state->ss->sidtab;
@@ -2415,19 +2451,11 @@ int security_netif_sid(struct selinux_state *state,
}
if (c) {
- if (!c->sid[0] || !c->sid[1]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc)
- goto out;
- rc = sidtab_context_to_sid(sidtab,
- &c->context[1],
- &c->sid[1]);
- if (rc)
- goto out;
- }
- *if_sid = c->sid[0];
+ rc = ocontext_to_sid(sidtab, c, 0, if_sid);
+ if (rc == -ESTALE)
+ goto retry;
+ if (rc)
+ goto out;
} else
*if_sid = SECINITSID_NETIF;
@@ -2469,6 +2497,7 @@ int security_node_sid(struct selinux_state *state,
read_lock(&state->ss->policy_rwlock);
+retry:
policydb = &state->ss->policydb;
sidtab = state->ss->sidtab;
@@ -2511,14 +2540,11 @@ int security_node_sid(struct selinux_state *state,
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc)
- goto out;
- }
- *out_sid = c->sid[0];
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE)
+ goto retry;
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_NODE;
}
@@ -2677,7 +2703,7 @@ static inline int __security_genfs_sid(struct selinux_state *state,
u16 sclass;
struct genfs *genfs;
struct ocontext *c;
- int rc, cmp = 0;
+ int cmp = 0;
while (path[0] == '/' && path[1] == '/')
path++;
@@ -2691,9 +2717,8 @@ static inline int __security_genfs_sid(struct selinux_state *state,
break;
}
- rc = -ENOENT;
if (!genfs || cmp)
- goto out;
+ return -ENOENT;
for (c = genfs->head; c; c = c->next) {
len = strlen(c->u.name);
@@ -2702,20 +2727,10 @@ static inline int __security_genfs_sid(struct selinux_state *state,
break;
}
- rc = -ENOENT;
if (!c)
- goto out;
-
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
- if (rc)
- goto out;
- }
+ return -ENOENT;
- *sid = c->sid[0];
- rc = 0;
-out:
- return rc;
+ return ocontext_to_sid(sidtab, c, 0, sid);
}
/**
@@ -2750,13 +2765,15 @@ int security_fs_use(struct selinux_state *state, struct super_block *sb)
{
struct policydb *policydb;
struct sidtab *sidtab;
- int rc = 0;
+ int rc;
struct ocontext *c;
struct superblock_security_struct *sbsec = sb->s_security;
const char *fstype = sb->s_type->name;
read_lock(&state->ss->policy_rwlock);
+retry:
+ rc = 0;
policydb = &state->ss->policydb;
sidtab = state->ss->sidtab;
@@ -2769,13 +2786,11 @@ int security_fs_use(struct selinux_state *state, struct super_block *sb)
if (c) {
sbsec->behavior = c->v.behavior;
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc)
- goto out;
- }
- sbsec->sid = c->sid[0];
+ rc = ocontext_to_sid(sidtab, c, 0, &sbsec->sid);
+ if (rc == -ESTALE)
+ goto retry;
+ if (rc)
+ goto out;
} else {
rc = __security_genfs_sid(state, fstype, "/", SECCLASS_DIR,
&sbsec->sid);
--
2.30.2
From: Sean Christopherson <seanjc(a)google.com>
commit 3244867af8c065e51969f1bffe732d3ebfd9a7d2 upstream.
Do not bail early if there are no bits set in the sparse banks for a
non-sparse, a.k.a. "all CPUs", IPI request. Per the Hyper-V spec, it is
legal to have a variable length of '0', e.g. VP_SET's BankContents in
this case, if the request can be serviced without the extra info.
It is possible that for a given invocation of a hypercall that does
accept variable sized input headers that all the header input fits
entirely within the fixed size header. In such cases the variable sized
input header is zero-sized and the corresponding bits in the hypercall
input should be set to zero.
Bailing early results in KVM failing to send IPIs to all CPUs as expected
by the guest.
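The ordering problem can be shown with a small, self-contained sketch. The enum and `pick_ipi_target()` below are hypothetical names invented for illustration and are not part of KVM; the sketch only models the decision order, under the assumption that an "all CPUs" request needs no bank data at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of parsing the request (illustration only). */
enum ipi_target {
	IPI_TARGET_NONE,	/* nothing to do, report success */
	IPI_TARGET_ALL,		/* send to every vCPU */
	IPI_TARGET_SPARSE,	/* read the banks and send to the listed vCPUs */
};

static enum ipi_target pick_ipi_target(bool all_cpus,
				       unsigned int sparse_banks_len)
{
	/* Decide "all CPUs" first: a zero-length bank list is legal here. */
	if (all_cpus)
		return IPI_TARGET_ALL;

	/* Only a sparse request with no banks is a no-op. */
	if (sparse_banks_len == 0)
		return IPI_TARGET_NONE;

	return IPI_TARGET_SPARSE;
}
```

With the two checks in the opposite order, `pick_ipi_target(true, 0)` would yield `IPI_TARGET_NONE`, which corresponds to the early bail-out described above.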
Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Message-Id: <20211207220926.718794-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
---
arch/x86/kvm/hyperv.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index bb39f493447c..328f37e4fd3a 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1641,11 +1641,13 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *current_vcpu, u64 ingpa, u64 outgpa,
all_cpus = send_ipi_ex.vp_set.format == HV_GENERIC_SET_ALL;
+ if (all_cpus)
+ goto check_and_send_ipi;
+
if (!sparse_banks_len)
goto ret_success;
- if (!all_cpus &&
- kvm_read_guest(kvm,
+ if (kvm_read_guest(kvm,
ingpa + offsetof(struct hv_send_ipi_ex,
vp_set.bank_contents),
sparse_banks,
@@ -1653,6 +1655,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *current_vcpu, u64 ingpa, u64 outgpa,
return HV_STATUS_INVALID_HYPERCALL_INPUT;
}
+check_and_send_ipi:
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
--
2.33.1
On Mon, Dec 13, 2021 at 10:37 AM Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
>
> So I'll just apply the patch. Thanks for the report and the testing
Done, it's commit e386dfc56f83 ("fget: clarify and improve
__fget_files() implementation") in my tree now.
I didn't mark it as "Fixes:" or for stable, because I can't imagine
that it matters in real life.
But then it struck me that Greg has mentioned that he ends up getting
a lot of performance regression reports from people testing stable and
they can be distracting.
So I'm adding a stable cc here just so people are aware of this as a
"yeah, will-it-scale.poll2 performance regression has been reported,
has a fix available if somebody cares".
Linus
The patch titled
Subject: mm: fix panic in __alloc_pages
has been added to the -mm tree. Its filename is
mm-fix-panic-in-__alloc_pages.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-panic-in-__alloc_pages.pat…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-panic-in-__alloc_pages.pat…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alexey Makhalov <amakhalov(a)vmware.com>
Subject: mm: fix panic in __alloc_pages
There is a kernel panic caused by pcpu_alloc_pages() passing an offlined
and uninitialized node to alloc_pages_node(), which then dereferences the
NULL, uninitialized NODE_DATA(nid).
CPU2 has been hot-added
BUG: unable to handle page fault for address: 0000000000001608
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1 Comm: systemd Tainted: G E 5.15.0-rc7+ #11
Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW
RIP: 0010:__alloc_pages+0x127/0x290
Code: 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 e0 48 8b 55 b8 c1 e8 0c 83 e0 01 88 45 d0 4c 89 c8 48 85 d2 0f 85 1a 01 00 00 <45> 3b 41 08 0f 82 10 01 00 00 48 89 45 c0 48 8b 00 44 89 e2 81 e2
RSP: 0018:ffffc900006f3bc8 EFLAGS: 00010246
RAX: 0000000000001600 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000cc2
RBP: ffffc900006f3c18 R08: 0000000000000001 R09: 0000000000001600
R10: ffffc900006f3a40 R11: ffff88813c9fffe8 R12: 0000000000000cc2
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000cc2
FS: 00007f27ead70500(0000) GS:ffff88807ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001608 CR3: 000000000582c003 CR4: 00000000001706b0
Call Trace:
pcpu_alloc_pages.constprop.0+0xe4/0x1c0
pcpu_populate_chunk+0x33/0xb0
pcpu_alloc+0x4d3/0x6f0
__alloc_percpu_gfp+0xd/0x10
alloc_mem_cgroup_per_node_info+0x54/0xb0
mem_cgroup_alloc+0xed/0x2f0
mem_cgroup_css_alloc+0x33/0x2f0
css_create+0x3a/0x1f0
cgroup_apply_control_enable+0x12b/0x150
cgroup_mkdir+0xdd/0x110
kernfs_iop_mkdir+0x4f/0x80
vfs_mkdir+0x178/0x230
do_mkdirat+0xfd/0x120
__x64_sys_mkdir+0x47/0x70
? syscall_exit_to_user_mode+0x21/0x50
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The panic can be easily reproduced by disabling the udev rule for
automatic onlining of hot-added CPUs and then hot-adding a CPU with a
memoryless node (a NUMA node with CPUs only).
Hot-adding a CPU and a memoryless node does not bring the node online.
A memoryless node is onlined only when its CPU is onlined.
Node can be in one of the following states:
1. not present (nid == NUMA_NO_NODE)
2. present, but offline (nid > NUMA_NO_NODE, node_online(nid) == 0,
NODE_DATA(nid) == NULL)
3. present and online (nid > NUMA_NO_NODE, node_online(nid) > 0,
NODE_DATA(nid) != NULL)
Percpu code allocates for all possible CPUs. The issue happens when it
serves a hot-added but not yet onlined CPU whose node is in the 2nd
state. Such a node is not ready to use, so fall back to numa_mem_id().
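The fallback rule can be sketched in plain userspace C as follows. All names here are assumptions for illustration: `NO_NODE` stands in for NUMA_NO_NODE, the `node_is_online[]` array for node_online(), and `local_node` for the value numa_mem_id() would return; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_NODE		(-1)	/* stands in for NUMA_NO_NODE */
#define MAX_NODES	4	/* arbitrary size for the sketch */

/* Hypothetical online map, standing in for node_online(nid). */
static bool node_is_online[MAX_NODES];

/*
 * Pick the node to allocate from for a CPU whose node is cpu_node.
 * States 1 (not present) and 2 (present but offline) fall back to
 * local_node; only state 3 (present and online) is used directly.
 */
static int pick_alloc_node(int cpu_node, int local_node)
{
	if (cpu_node == NO_NODE)
		return local_node;
	if (cpu_node < 0 || cpu_node >= MAX_NODES)
		return local_node;	/* out of range for this sketch */
	if (!node_is_online[cpu_node])
		return local_node;
	return cpu_node;
}
```

Computing the node once per CPU, before looping over its pages, keeps the check out of the inner allocation loop, which matches the shape of the change below.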
Link: https://lkml.kernel.org/r/20211108202325.20304-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/percpu-vm.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/percpu-vm.c~mm-fix-panic-in-__alloc_pages
+++ a/mm/percpu-vm.c
@@ -84,15 +84,19 @@ static int pcpu_alloc_pages(struct pcpu_
gfp_t gfp)
{
unsigned int cpu, tcpu;
- int i;
+ int i, nid;
gfp |= __GFP_HIGHMEM;
for_each_possible_cpu(cpu) {
+ nid = cpu_to_node(cpu);
+ if (nid == NUMA_NO_NODE || !node_online(nid))
+ nid = numa_mem_id();
+
for (i = page_start; i < page_end; i++) {
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
- *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+ *pagep = alloc_pages_node(nid, gfp, 0);
if (!*pagep)
goto err;
}
_
Patches currently in -mm which might be from amakhalov(a)vmware.com are
mm-fix-panic-in-__alloc_pages.patch