We found an issue with null pointer access due to kprobe debug exception
error handling on 5.10, and I proposed a separate fix patch for 5.10,
see [1]. But as Greg gave advice, we always choose to backport relevant
patches from upstream to fix issues with stable kernels, so I made this
patch set.
The main one we need to backport is patch 5, which uses int3 instead of
debug trap for single-stepping, thus avoiding the problems we
encountered with kprobe debug exception error handling. Patches 1-4 are
pre-patches, and patches 6-9 are fixes for patch 5. The major
modifications are patch 2 and patch 5. Patch 2 optimizes
resume_execution() to avoid repeated instruction decoding, and patch 5
uses int3 instead of debug trap, and as Masami said in the commit
message this patch will change some behavior of kprobe, but it has
almost no effect on the actual usage.
Please let me know if there are any problems, thanks!
[1] https://lore.kernel.org/lkml/20230630020845.227939-1-lihuafei1@huawei.com/
Gustavo A. R. Silva (1):
kprobes/x86: Fix fall-through warnings for Clang
Masami Hiramatsu (5):
x86/kprobes: Do not decode opcode in resume_execution()
x86/kprobes: Retrieve correct opcode for group instruction
x86/kprobes: Identify far indirect JMP correctly
x86/kprobes: Use int3 instead of debug trap for single-step
x86/kprobes: Fix to identify indirect jmp and others using range case
Masami Hiramatsu (Google) (1):
x86/kprobes: Update kcb status flag after singlestepping
Nadav Amit (1):
x86/kprobes: Fix JNG/JNLE emulation
Wei Yongjun (1):
x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss()
declaration
arch/x86/include/asm/kprobes.h | 24 +-
arch/x86/kernel/kprobes/core.c | 639 ++++++++++++++++++++-------------
arch/x86/kernel/traps.c | 3 -
3 files changed, 409 insertions(+), 257 deletions(-)
--
2.17.1
NOTE: This patch is tested against 5.4 stable
NOTE: This is a patch for the 5.4 stable branch, not for the torvalds tree.
The torvalds tree, and stable tree 5.10, 5.15, 6.1 and 6.4 branches
were fixed in the separate
commit ID 4acfe3dfde68 ("test_firmware: prevent race conditions by a correct implementation of locking")
which was incompatible with 5.4
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
// NOTE: HERE is the race!!! Function can be preempted!
// test_fw_config->reqs can change between the release of
// the lock about and acquire of the lock in the
// test_dev_config_update_u8()
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
---
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 38553944e967..92d7195d5b5b 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -340,7 +350,7 @@ static ssize_t test_dev_config_show_int(char *buf, int cfg)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -352,14 +362,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -392,10 +411,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.34.1
During recent vma locking patch reviews Linus and Jann Horn noted a number
of issues with vma locking and suggested improvements:
1. walk_page_range() does not have ability to write-lock a vma during the
walk when it's done under mmap_write_lock. For example s390_reset_cmma().
2. Vma locking is hidden inside vm_flags modifiers and is hard to follow.
Suggestion is to change vm_flags_reset{_once} to assert that vma is
write-locked and require an explicit locking.
3. Same issue with vma_prepare() hiding vma locking.
4. In userfaultfd vm_flags are modified after vma->vm_userfaultfd_ctx and
page faults can operate on a context while it's changed.
5. do_brk_flags() and __install_special_mapping() not locking a newly
created vma before adding it into the mm. While not strictly a problem,
this is fragile if vma is modified after insertion, as in the
mmap_region() case which was recently fixed. Suggestion is to always lock
a new vma before inserting it and making it visible to page faults.
6. vma_assert_write_locked() for CONFIG_PER_VMA_LOCK=n would benefit from
being mmap_assert_write_locked() instead of no-op and then any place which
operates on a vma and calls mmap_assert_write_locked() can be converted
into vma_assert_write_locked().
I CC'ed stable only on the first patch because others are cleanups and the
bug in userfaultfd does not affect stable (lock_vma_under_rcu prevents
uffds from being handled under vma lock protection). However I would be
happy if the whole series is merged into stable 6.4 since it makes vma
locking more maintainable.
The patches apply cleanly over Linus' ToT and will conflict when applied
over mm-unstable due to missing [1]. The conflict can be easily resolved
by ignoring conflicting deletions but probably simpler to take [1] into
mm-unstable and avoid later conflict.
[1] commit 6c21e066f925 ("mm/mempolicy: Take VMA lock before replacing policy")
Changes since v1:
- replace walk_page_range() parameter with mm_walk_ops.walk_lock,
per Linus
- introduced page_walk_lock enum to allow different locking modes
during a walk, per Linus
- added Liam's Reviewed-by
Suren Baghdasaryan (6):
mm: enable page walking API to lock vmas during the walk
mm: for !CONFIG_PER_VMA_LOCK equate write lock assertion for vma and
mmap
mm: replace mmap with vma write lock assertions when operating on a
vma
mm: lock vma explicitly before doing vm_flags_reset and
vm_flags_reset_once
mm: always lock new vma before inserting into vma tree
mm: move vma locking out of vma_prepare
arch/powerpc/kvm/book3s_hv_uvmem.c | 1 +
arch/powerpc/mm/book3s64/subpage_prot.c | 1 +
arch/riscv/mm/pageattr.c | 1 +
arch/s390/mm/gmap.c | 5 ++++
drivers/infiniband/hw/hfi1/file_ops.c | 1 +
fs/proc/task_mmu.c | 5 ++++
fs/userfaultfd.c | 6 +++++
include/linux/mm.h | 13 ++++++---
include/linux/pagewalk.h | 11 ++++++++
mm/damon/vaddr.c | 2 ++
mm/hmm.c | 1 +
mm/hugetlb.c | 2 +-
mm/khugepaged.c | 5 ++--
mm/ksm.c | 25 ++++++++++-------
mm/madvise.c | 8 +++---
mm/memcontrol.c | 2 ++
mm/memory-failure.c | 1 +
mm/memory.c | 2 +-
mm/mempolicy.c | 22 +++++++++------
mm/migrate_device.c | 1 +
mm/mincore.c | 1 +
mm/mlock.c | 4 ++-
mm/mmap.c | 29 +++++++++++++-------
mm/mprotect.c | 2 ++
mm/pagewalk.c | 36 ++++++++++++++++++++++---
mm/vmscan.c | 1 +
26 files changed, 146 insertions(+), 42 deletions(-)
--
2.41.0.585.gd2178a4bd4-goog
When setting ZT0 via ptrace we do not currently force a reload of the
floating point register state from memory, do that to ensure that the newly
set value gets loaded into the registers on next task execution.
The function was templated off the function for FPSIMD which due to our
providing the option of embedding a FPSIMD regset within the SVE regset
does not directly include the flush.
Fixes: f90b529bcbe5 ("arm64/sme: Implement ZT0 ptrace support")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/kernel/ptrace.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index d7f4f0d1ae12..740e81e9db04 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1180,6 +1180,8 @@ static int zt_set(struct task_struct *target,
if (ret == 0)
target->thread.svcr |= SVCR_ZA_MASK;
+ fpsimd_flush_task_state(target);
+
return ret;
}
---
base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4
change-id: 20230802-arm64-fix-ptrace-zt0-flush-d6d71b9f8461
Best regards,
--
Mark Brown <broonie(a)kernel.org>
When setting SME vector lengths we clear TIF_SME to reenable SME traps,
doing a reallocation of the backing storage on next use. We do this using
clear_thread_flag() which operates on the current thread, meaning that when
setting the vector length via ptrace we may both not force traps for the
target task and force a spurious flush of any SME state that the tracing
task may have.
Clear the flag in the target task.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett(a)arm.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/kernel/fpsimd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 520b681a07bb..a61a1fd6492d 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -909,7 +909,7 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
*/
task->thread.svcr &= ~(SVCR_SM_MASK |
SVCR_ZA_MASK);
- clear_thread_flag(TIF_SME);
+ clear_tsk_thread_flag(task, TIF_SME);
free_sme = true;
}
}
---
base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4
change-id: 20230802-arm64-fix-ptrace-tif-sme-0bfd94c8266d
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The original patches fixing CVE-2023-1076 are incorrect in my opinion.
This small series fixes them up; see the individual commit messages for
explanation.
I have a very elaborate test procedure demonstrating the problem for
both tun and tap; it involves libvirt, qemu, and "crash". I can share
that procedure if necessary, but it's indeed quite long (I wrote it
originally for our QE team).
The patches in this series are supposed to "re-fix" CVE-2023-1076; given
that said CVE is classified as Low Impact (CVSSv3=5.5), I'm posting this
publicly, and not suggesting any embargo. Red Hat Product Security may
assign a new CVE number later.
I've tested the patches on top of v6.5-rc4, with "crash" built at commit
c74f375e0ef7.
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Pietro Borrello <borrello(a)diag.uniroma1.it>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Laszlo Ersek (2):
net: tun_chr_open(): set sk_uid from current_fsuid()
net: tap_open(): set sk_uid from current_fsuid()
drivers/net/tap.c | 2 +-
drivers/net/tun.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4