From: Oleg Nesterov oleg@redhat.com
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally; even if this addr is not valid, xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE; unsigned underflow is fine, so the check still works even if slot_addr < area->vaddr (see the sketch below).
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must be valid if offset < PAGE_SIZE.
The next patches will cleanup this function even more.
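As a quick aside on point 3, a single unsigned comparison covers both ends of the range. A minimal user-space sketch (slot_in_area(), the PAGE_SIZE value and the addresses are made up for illustration; this is not kernel code):

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096UL    /* illustrative value */

    /*
     * One unsigned comparison covers both bounds of
     * [vaddr, vaddr + PAGE_SIZE). If slot_addr < vaddr, the subtraction
     * wraps around to a huge value, so the check still fails.
     */
    static bool slot_in_area(unsigned long slot_addr, unsigned long vaddr)
    {
        unsigned long offset = slot_addr - vaddr;   /* may wrap, that is fine */

        return offset < PAGE_SIZE;
    }

    int main(void)
    {
        unsigned long vaddr = 0x7f0000000000UL;     /* made-up area->vaddr */

        printf("%d\n", slot_in_area(vaddr + 100, vaddr));         /* 1 */
        printf("%d\n", slot_in_area(vaddr + PAGE_SIZE, vaddr));   /* 0 */
        printf("%d\n", slot_in_area(vaddr - 1, vaddr));           /* 0: wrapped */
        return 0;
    }

The same wraparound argument is what lets the patch drop the explicit lower-bound and slot_nr checks.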
Signed-off-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/events/uprobes.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4b52cb2ae6d62..cc605df73d72f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1683,8 +1683,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 static void xol_free_insn_slot(struct task_struct *tsk)
 {
 	struct xol_area *area;
-	unsigned long vma_end;
 	unsigned long slot_addr;
+	unsigned long offset;
 
 	if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
 		return;
@@ -1693,24 +1693,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
 	if (unlikely(!slot_addr))
 		return;
 
+	tsk->utask->xol_vaddr = 0;
 	area = tsk->mm->uprobes_state.xol_area;
-	vma_end = area->vaddr + PAGE_SIZE;
-	if (area->vaddr <= slot_addr && slot_addr < vma_end) {
-		unsigned long offset;
-		int slot_nr;
-
-		offset = slot_addr - area->vaddr;
-		slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
-		if (slot_nr >= UINSNS_PER_PAGE)
-			return;
+	offset = slot_addr - area->vaddr;
+	/*
+	 * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+	 * This check can only fail if the "[uprobes]" vma was mremap'ed.
+	 */
+	if (offset < PAGE_SIZE) {
+		int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
 
 		clear_bit(slot_nr, area->bitmap);
 		atomic_dec(&area->slot_count);
 		smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
 		if (waitqueue_active(&area->wq))
 			wake_up(&area->wq);
-
-		tsk->utask->xol_vaddr = 0;
 	}
 }
From: Breno Leitao leitao@debian.org
[ Upstream commit de20037e1b3c2f2ca97b8c12b8c7bca8abd509a7 ]
Warning on every leaked bit can cause a flood of messages, triggering various stall-warning mechanisms to fire, including CSD locks, which makes the machine unusable.
Track the bits that are being leaked, and only warn when a new bit is set.
That said, this patch will help with the following issues:
1) It will tell us which bits are being set, so it is easy to communicate it back to the vendor and to do a root-cause analysis.
2) It avoids making the machine unusable because, in the worst-case scenario, the user gets fewer than 60 WARNs (one per unhandled bit).
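For reference, a user-space sketch of the warn-only-on-new-bits idea, using C11 atomics in place of the kernel's atomic64_t and WARN(); handle_status() and the status values are made up for illustration:

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    static _Atomic uint64_t status_warned;

    static void handle_status(uint64_t status)
    {
        uint64_t prev_bits, new_bits;

        if (!status)
            return;

        /* Record every bit ever seen; get the bits seen before this call. */
        prev_bits = atomic_fetch_or(&status_warned, status);
        new_bits = status & ~prev_bits;
        if (new_bits)
            fprintf(stderr, "New overflows for inactive PMCs: %llx\n",
                    (unsigned long long)new_bits);
    }

    int main(void)
    {
        handle_status(0x5);   /* warns: bits 0 and 2 are new */
        handle_status(0x5);   /* silent: nothing new */
        handle_status(0x7);   /* warns only about bit 1 */
        return 0;
    }

atomic_fetch_or() returns the bits that were already recorded, so only bits never seen before produce a message, which is what bounds the number of warnings to one per bit.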
Suggested-by: Paul E. McKenney paulmck@kernel.org
Signed-off-by: Breno Leitao leitao@debian.org
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Reviewed-by: Sandipan Das sandipan.das@amd.com
Reviewed-by: Paul E. McKenney paulmck@kernel.org
Link: https://lkml.kernel.org/r/20241001141020.2620361-1-leitao@debian.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/events/amd/core.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 920e3a640cadd..b4a1a2576510e 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -943,11 +943,12 @@ static int amd_pmu_v2_snapshot_branch_stack(struct perf_branch_entry *entries, u
 static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	static atomic64_t status_warned = ATOMIC64_INIT(0);
+	u64 reserved, status, mask, new_bits, prev_bits;
 	struct perf_sample_data data;
 	struct hw_perf_event *hwc;
 	struct perf_event *event;
 	int handled = 0, idx;
-	u64 reserved, status, mask;
 	bool pmu_enabled;
 
 	/*
@@ -1012,7 +1013,12 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 	 * the corresponding PMCs are expected to be inactive according to the
 	 * active_mask
 	 */
-	WARN_ON(status > 0);
+	if (status > 0) {
+		prev_bits = atomic64_fetch_or(status, &status_warned);
+		// A new bit was set for the very first time.
+		new_bits = status & ~prev_bits;
+		WARN(new_bits, "New overflows for inactive PMCs: %llx\n", new_bits);
+	}
 
 	/* Clear overflow and freeze bits */
 	amd_pmu_ack_global_status(~status);
From: Thomas Hellström thomas.hellstrom@linux.intel.com
[ Upstream commit 823a566221a5639f6c69424897218e5d6431a970 ]
When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the number of acquired lockdep_maps of mutexes of the same class, and also keeps a pointer to the first acquired lockdep_map of a class. That pointer is then used for various comparison-, printing- and checking purposes, but there is no mechanism to actively ensure that lockdep_map stays in memory. Instead, a warning is printed if the lockdep_map is freed and there are still held locks of the same lock class, even if the lockdep_map itself has been released.
In the context of WW/WD transactions that means that if a user unlocks and frees a ww_mutex from within an ongoing ww transaction, and that mutex happens to be the first ww_mutex grabbed in the transaction, such a warning is printed and there might be a risk of a UAF.
Note that this is only a problem when lockdep is enabled, and it affects only dereferences of struct lockdep_map.
Adjust to this by adding a fake lockdep_map to the acquired context and make sure it is the first acquired lockdep map of the associated ww_mutex class. Then hold it for the duration of the WW/WD transaction.
This has the side effect that trying to lock a ww mutex *without* a ww_acquire_context while such a context has been acquired now produces a lockdep splat. The test-ww_mutex.c selftest attempts to do that, so modify that particular test to not acquire a ww_acquire_context if it is not going to be used.
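For illustration, a sketch of the usage pattern the first paragraphs describe, as hypothetical driver code (struct obj, demo_ww_class and buggy_transaction() are made up; it assumes both ww_mutexes were set up with ww_mutex_init(..., &demo_ww_class), and proper -EDEADLK backoff is elided):

    #include <linux/slab.h>
    #include <linux/ww_mutex.h>

    struct obj {
        struct ww_mutex lock;
        /* ... payload ... */
    };

    static DEFINE_WW_CLASS(demo_ww_class);

    static void buggy_transaction(struct obj *a, struct obj *b)
    {
        struct ww_acquire_ctx ctx;
        int err;

        ww_acquire_init(&ctx, &demo_ww_class);

        err = ww_mutex_lock(&a->lock, &ctx);   /* first lock of the transaction */
        if (err)
            goto fini;                          /* -EDEADLK backoff elided */
        err = ww_mutex_lock(&b->lock, &ctx);
        if (err)
            goto unlock_a;

        /*
         * Unlock and free the *first* ww_mutex while the transaction is
         * still open: lockdep's pointer to its lockdep_map now points at
         * freed memory until ww_acquire_fini().
         */
        ww_mutex_unlock(&a->lock);
        ww_mutex_destroy(&a->lock);
        kfree(a);

        /* ... work protected by b ... */
        ww_mutex_unlock(&b->lock);
        ww_acquire_fini(&ctx);
        return;

    unlock_a:
        ww_mutex_unlock(&a->lock);
    fini:
        ww_acquire_fini(&ctx);
    }

With lockdep enabled, freeing 'a' here is what leaves lockdep's first-acquired lockdep_map pointer dangling until ww_acquire_fini(); the fake first_lock_dep_map added by this patch keeps a valid "first" map alive for the whole transaction.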
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Link: https://lkml.kernel.org/r/20241009092031.6356-1-thomas.hellstrom@linux.intel...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/ww_mutex.h       | 14 ++++++++++++++
 kernel/locking/test-ww_mutex.c |  8 +++++---
 2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index bb763085479af..a401a2f31a775 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -65,6 +65,16 @@ struct ww_acquire_ctx {
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map dep_map;
+	/**
+	 * @first_lock_dep_map: fake lockdep_map for first locked ww_mutex.
+	 *
+	 * lockdep requires the lockdep_map for the first locked ww_mutex
+	 * in a ww transaction to remain in memory until all ww_mutexes of
+	 * the transaction have been unlocked. Ensure this by keeping a
+	 * fake locked ww_mutex lockdep map between ww_acquire_init() and
+	 * ww_acquire_fini().
+	 */
+	struct lockdep_map first_lock_dep_map;
 #endif
 #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
 	unsigned int deadlock_inject_interval;
@@ -146,7 +156,10 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
 	debug_check_no_locks_freed((void *)ctx, sizeof(*ctx));
 	lockdep_init_map(&ctx->dep_map, ww_class->acquire_name,
 			 &ww_class->acquire_key, 0);
+	lockdep_init_map(&ctx->first_lock_dep_map, ww_class->mutex_name,
+			 &ww_class->mutex_key, 0);
 	mutex_acquire(&ctx->dep_map, 0, 0, _RET_IP_);
+	mutex_acquire_nest(&ctx->first_lock_dep_map, 0, 0, &ctx->dep_map, _RET_IP_);
 #endif
 #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
 	ctx->deadlock_inject_interval = 1;
@@ -185,6 +198,7 @@ static inline void ww_acquire_done(struct ww_acquire_ctx *ctx)
 static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
+	mutex_release(&ctx->first_lock_dep_map, _THIS_IP_);
 	mutex_release(&ctx->dep_map, _THIS_IP_);
 #endif
 #ifdef DEBUG_WW_MUTEXES
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 10a5736a21c22..5d58b2c0ef98b 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -62,7 +62,8 @@ static int __test_mutex(unsigned int flags)
 	int ret;
 
 	ww_mutex_init(&mtx.mutex, &ww_class);
-	ww_acquire_init(&ctx, &ww_class);
+	if (flags & TEST_MTX_CTX)
+		ww_acquire_init(&ctx, &ww_class);
 
 	INIT_WORK_ONSTACK(&mtx.work, test_mutex_work);
 	init_completion(&mtx.ready);
@@ -90,7 +91,8 @@ static int __test_mutex(unsigned int flags)
 		ret = wait_for_completion_timeout(&mtx.done, TIMEOUT);
 	}
 	ww_mutex_unlock(&mtx.mutex);
-	ww_acquire_fini(&ctx);
+	if (flags & TEST_MTX_CTX)
+		ww_acquire_fini(&ctx);
 
 	if (ret) {
 		pr_err("%s(flags=%x): mutual exclusion failure\n",
@@ -679,7 +681,7 @@ static int __init test_ww_mutex_init(void)
 	if (ret)
 		return ret;
 
-	ret = stress(2047, hweight32(STRESS_ALL)*ncpus, STRESS_ALL);
+	ret = stress(2046, hweight32(STRESS_ALL)*ncpus, STRESS_ALL);
 	if (ret)
 		return ret;
On Sun, 2024-11-24 at 07:46 -0500, Sasha Levin wrote:
From: Thomas Hellström thomas.hellstrom@linux.intel.com
[ Upstream commit 823a566221a5639f6c69424897218e5d6431a970 ]
When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the number of acquired lockdep_maps of mutexes of the same class, and also keeps a pointer to the first acquired lockdep_map of a class. That pointer is then used for various comparison-, printing- and checking purposes, but there is no mechanism to actively ensure that lockdep_map stays in memory. Instead, a warning is printed if the lockdep_map is freed and there are still held locks of the same lock class, even if the lockdep_map itself has been released.
In the context of WW/WD transactions that means that if a user unlocks and frees a ww_mutex from within an ongoing ww transaction, and that mutex happens to be the first ww_mutex grabbed in the transaction, such a warning is printed and there might be a risk of a UAF.
Note that this is only a problem when lockdep is enabled, and it affects only dereferences of struct lockdep_map.
Adjust to this by adding a fake lockdep_map to the acquired context and make sure it is the first acquired lockdep map of the associated ww_mutex class. Then hold it for the duration of the WW/WD transaction.
This has the side effect that trying to lock a ww mutex *without* a ww_acquire_context while such a context has been acquired now produces a lockdep splat. The test-ww_mutex.c selftest attempts to do that, so modify that particular test to not acquire a ww_acquire_context if it is not going to be used.
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Link: https://lkml.kernel.org/r/20241009092031.6356-1-thomas.hellstrom@linux.intel...
Signed-off-by: Sasha Levin sashal@kernel.org
It looks like the last version was not picked for this patch.
https://lore.kernel.org/all/172918113162.1279674.6570518059490493206@2413ebb...
The version in this autosel patch regresses the locking api selftests and should not be backported. Same for the corresponding backports for 6.11 and 6.6. Let me know if I should reply separately to those.
Thanks, Thomas
 include/linux/ww_mutex.h       | 14 ++++++++++++++
 kernel/locking/test-ww_mutex.c |  8 +++++---
 2 files changed, 19 insertions(+), 3 deletions(-)
On Mon, Nov 25, 2024 at 11:06:38AM +0100, Thomas Hellström wrote:
On Sun, 2024-11-24 at 07:46 -0500, Sasha Levin wrote:
From: Thomas Hellström thomas.hellstrom@linux.intel.com
[ Upstream commit 823a566221a5639f6c69424897218e5d6431a970 ]
When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the number of acquired lockdep_maps of mutexes of the same class, and also keeps a pointer to the first acquired lockdep_map of a class. That pointer is then used for various comparison-, printing- and checking purposes, but there is no mechanism to actively ensure that lockdep_map stays in memory. Instead, a warning is printed if the lockdep_map is freed and there are still held locks of the same lock class, even if the lockdep_map itself has been released.
In the context of WW/WD transactions that means that if a user unlocks and frees a ww_mutex from within an ongoing ww transaction, and that mutex happens to be the first ww_mutex grabbed in the transaction, such a warning is printed and there might be a risk of a UAF.
Note that this is only a problem when lockdep is enabled, and it affects only dereferences of struct lockdep_map.
Adjust to this by adding a fake lockdep_map to the acquired context and make sure it is the first acquired lockdep map of the associated ww_mutex class. Then hold it for the duration of the WW/WD transaction.
This has the side effect that trying to lock a ww mutex *without* a ww_acquire_context while such a context has been acquired now produces a lockdep splat. The test-ww_mutex.c selftest attempts to do that, so modify that particular test to not acquire a ww_acquire_context if it is not going to be used.
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Link: https://lkml.kernel.org/r/20241009092031.6356-1-thomas.hellstrom@linux.intel...
Signed-off-by: Sasha Levin sashal@kernel.org
It looks like the last version was not picked for this patch.
https://lore.kernel.org/all/172918113162.1279674.6570518059490493206@2413ebb...
The version in this autosel patch regresses the locking api selftests and should not be backported. Same for the corresponding backports for 6.11 and 6.6. Let me know if I should reply separately to those.
This is what ended up landing upstream...
I can drop it from the autosel queue, but if this has issues then you should also fix it up upstream.
From: Przemek Kitszel przemyslaw.kitszel@intel.com
[ Upstream commit fcc22ac5baf06dd17193de44b60dbceea6461983 ]
Change scoped_guard() and scoped_cond_guard() macros to make reasoning about them easier for static analysis tools (smatch, compiler diagnostics), especially to enable them to tell if the given usage of scoped_guard() is with a conditional lock class (interruptible-locks, try-locks) or not (like simple mutex_lock()).
Add a compile-time error if scoped_cond_guard() is used with a non-conditional lock class.
Beyond easier tooling and a little shrink reported by bloat-o-meter, this patch enables a developer to write code like:

    int foo(struct my_drv *adapter)
    {
        scoped_guard(spinlock, &adapter->some_spinlock)
            return adapter->spinlock_protected_var;
    }

The current scoped_guard() implementation does not support that, because the compiler complains:
error: control reaches end of non-void function [-Werror=return-type]
Technical details of the change: the scoped_guard() macro uses the common idiom of a "for" statement to declare a scoped variable. Unfortunately, the current logic is too hard for compiler diagnostics to be sure that there is exactly one loop step; fix that.
To make any loop trivial enough to avoid the above warning, it must not depend on any non-const variable to tell whether there are more steps. There is no obvious solution for that in C, but one could use a compound statement expression with a "goto" jumping past the "loop", effectively leaving only the subscope part of the loop semantics.
More implementation details: one more level of macro indirection is now needed to avoid duplicating label names; I didn't spot any other place using the "for (...; goto label) if (0) label: break;" idiom, so it is not packaged for reuse beyond the scoped_guard() family, which keeps the actual macro code cleaner.
There was also a need to introduce a const true/false variable per lock class; it is used to help compiler diagnostics reason about "exactly one step" loops (note that converting it to a function would undo the whole benefit).
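As a stand-alone illustration of that goto-past-the-loop trick (GNU C, gcc/clang only; RUN_ONCE() and demo() are made-up names for this sketch - the real macros also carry the guard class and generate a unique label via __UNIQUE_ID()):

    #include <stdio.h>

    /*
     * The loop body runs exactly once: the "increment" expression jumps to a
     * label inside the if (0) block, whose only statement breaks out of the
     * loop. The user-supplied code is glued onto the trailing "else".
     */
    #define RUN_ONCE(label)                 \
        for (;; ({ goto label; }))          \
            if (0) {                        \
    label:                                  \
                break;                      \
            } else

    static int demo(void)
    {
        RUN_ONCE(once)
            return 42;  /* body provably runs: no -Wreturn-type warning */
    }

    int main(void)
    {
        printf("%d\n", demo()); /* prints 42 */
        return 0;
    }

Because the loop condition is compile-time true and the only ways out of the construct are the return or the goto/break pair, the compiler can prove the body executes, so building with -Wall -Werror=return-type should stay quiet.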
Big thanks to Andy Shevchenko for help on this patch, both internal and public, ranging from whitespace/formatting, through commit message clarifications, general improvements, ending with presenting alternative approaches - all despite not even liking the idea.
Big thanks to Dmitry Torokhov for the idea of a compile-time check for scoped_cond_guard() (to use it only with conditional locks), and general improvements for the patch.
Big thanks to David Lechner for idea to cover also scoped_cond_guard().
Signed-off-by: Przemek Kitszel przemyslaw.kitszel@intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Reviewed-by: Dmitry Torokhov dmitry.torokhov@gmail.com
Link: https://lkml.kernel.org/r/20241018113823.171256-1-przemyslaw.kitszel@intel.c...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/cleanup.h | 52 +++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 10 deletions(-)
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 038b2d523bf88..9464724b99737 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -285,14 +285,20 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
  * similar to scoped_guard(), except it does fail when the lock
  * acquire fails.
  *
+ * Only for conditional locks.
  */
 
+#define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond)	\
+static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
+
 #define DEFINE_GUARD(_name, _type, _lock, _unlock) \
+	__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
 	DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
 	static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
 	{ return *_T; }
 
 #define DEFINE_GUARD_COND(_name, _ext, _condlock) \
+	__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
 	EXTEND_CLASS(_name, _ext, \
 		     ({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \
 		     class_##_name##_t _T) \
@@ -303,17 +309,40 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
 	CLASS(_name, __UNIQUE_ID(guard))
 
 #define __guard_ptr(_name) class_##_name##_lock_ptr
+#define __is_cond_ptr(_name) class_##_name##_is_conditional
 
-#define scoped_guard(_name, args...)					\
-	for (CLASS(_name, scope)(args),					\
-	     *done = NULL; __guard_ptr(_name)(&scope) && !done; done = (void *)1)
-
-#define scoped_cond_guard(_name, _fail, args...) \
-	for (CLASS(_name, scope)(args), \
-	     *done = NULL; !done; done = (void *)1) \
-		if (!__guard_ptr(_name)(&scope)) _fail; \
-		else
-
+/*
+ * Helper macro for scoped_guard().
+ *
+ * Note that the "!__is_cond_ptr(_name)" part of the condition ensures that
+ * compiler would be sure that for the unconditional locks the body of the
+ * loop (caller-provided code glued to the else clause) could not be skipped.
+ * It is needed because the other part - "__guard_ptr(_name)(&scope)" - is too
+ * hard to deduce (even if could be proven true for unconditional locks).
+ */
+#define __scoped_guard(_name, _label, args...)				\
+	for (CLASS(_name, scope)(args);					\
+	     __guard_ptr(_name)(&scope) || !__is_cond_ptr(_name);	\
+	     ({ goto _label; }))					\
+		if (0) {						\
+_label:									\
+			break;						\
+		} else
+
+#define scoped_guard(_name, args...)					\
+	__scoped_guard(_name, __UNIQUE_ID(label), args)
+
+#define __scoped_cond_guard(_name, _fail, _label, args...)		\
+	for (CLASS(_name, scope)(args); true; ({ goto _label; }))	\
+		if (!__guard_ptr(_name)(&scope)) {			\
+			BUILD_BUG_ON(!__is_cond_ptr(_name));		\
+			_fail;						\
+_label:									\
+			break;						\
+		} else
+
+#define scoped_cond_guard(_name, _fail, args...)			\
+	__scoped_cond_guard(_name, _fail, __UNIQUE_ID(label), args)
 /*
  * Additional helper macros for generating lock guards with types, either for
  * locks that don't have a native type (eg. RCU, preempt) or those that need a
@@ -369,14 +398,17 @@ static inline class_##_name##_t class_##_name##_constructor(void)	\
 }
 
 #define DEFINE_LOCK_GUARD_1(_name, _type, _lock, _unlock, ...)		\
+__DEFINE_CLASS_IS_CONDITIONAL(_name, false);				\
 __DEFINE_UNLOCK_GUARD(_name, _type, _unlock, __VA_ARGS__)		\
 __DEFINE_LOCK_GUARD_1(_name, _type, _lock)
 
 #define DEFINE_LOCK_GUARD_0(_name, _lock, _unlock, ...)			\
+__DEFINE_CLASS_IS_CONDITIONAL(_name, false);				\
 __DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__)		\
 __DEFINE_LOCK_GUARD_0(_name, _lock)
 
 #define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock)		\
+	__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true);		\
 	EXTEND_CLASS(_name, _ext,					\
 		     ({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
 			if (_T->lock && !(_condlock)) _T->lock = NULL;	\
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit c163e40af9b2331b2c629fd4ec8b703ed4d4ae39 ]
clocksource_delta() has two variants: one with a check for negative motion, which is only selected by x86. This is a historic leftover, as this function was previously used in the time getter hot paths.
Since commit 135225a363ae, timekeeping_cycles_to_ns() has unconditional protection against this as a by-product of the protection against 64-bit math overflow.
clocksource_delta() is only used in the clocksource watchdog and in timekeeping_advance(). The extra conditional there is not hurting anyone.
Remove the config option and unconditionally prevent negative motion of the readout.
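To make the retained check concrete, here is a user-space transcription of the clocksource_delta() kept by this patch (the mask and the sample values in main() are made up):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * If the clocksource moved backwards, (now - last) wraps and the masked
     * delta lands in the upper half of the counter range; such deltas are
     * treated as negative motion and clamped to 0.
     */
    static uint64_t clocksource_delta(uint64_t now, uint64_t last, uint64_t mask)
    {
        uint64_t ret = (now - last) & mask;

        return ret & ~(mask >> 1) ? 0 : ret;
    }

    int main(void)
    {
        uint64_t mask = 0xffffffffffffffffULL;  /* e.g. a full 64-bit counter */

        printf("%llu\n", (unsigned long long)clocksource_delta(1000, 900, mask)); /* 100 */
        printf("%llu\n", (unsigned long long)clocksource_delta(900, 1000, mask)); /* 0: backwards */
        return 0;
    }

Any masked delta with the top bit of the mask range set is clamped to 0, which is the protection x86 previously opted into via CLOCKSOURCE_VALIDATE_LAST_CYCLE.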
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: John Stultz jstultz@google.com
Link: https://lore.kernel.org/all/20241031120328.599430157@linutronix.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/Kconfig                   | 1 -
 kernel/time/Kconfig                | 5 -----
 kernel/time/timekeeping_internal.h | 7 -------
 3 files changed, 13 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7b9a7e8f39acc..171be04eca1f5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -145,7 +145,6 @@ config X86
 	select ARCH_HAS_PARANOID_L1D_FLUSH
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
-	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
 	select CLOCKSOURCE_WATCHDOG
 	# Word-size accesses may read uninitialized data past the trailing \0
 	# in strings and cause false KMSAN reports.
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 8ebb6d5a106be..b0b97a60aaa6f 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -17,11 +17,6 @@ config ARCH_CLOCKSOURCE_DATA
 config ARCH_CLOCKSOURCE_INIT
 	bool
 
-# Clocksources require validation of the clocksource against the last
-# cycle update - x86/TSC misfeature
-config CLOCKSOURCE_VALIDATE_LAST_CYCLE
-	bool
-
 # Timekeeping vsyscall support
 config GENERIC_TIME_VSYSCALL
 	bool
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 4ca2787d1642e..1d4854d5c386e 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -15,7 +15,6 @@ extern void tk_debug_account_sleep_time(const struct timespec64 *t);
 #define tk_debug_account_sleep_time(x)
 #endif
 
-#ifdef CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE
 static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
 {
 	u64 ret = (now - last) & mask;
@@ -26,12 +25,6 @@ static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
 	 */
 	return ret & ~(mask >> 1) ? 0 : ret;
 }
-#else
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
-{
-	return (now - last) & mask;
-}
-#endif
 
 /* Semi public for serialization of non timekeeper VDSO updates. */
 extern raw_spinlock_t timekeeper_lock;
Hi Sasha,
but why do you think it makes sense to backport these "uprobes" cleanups?
Oleg.
On 11/24, Sasha Levin wrote:
From: Oleg Nesterov oleg@redhat.com
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
Clear utask->xol_vaddr unconditionally; even if this addr is not valid, xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
Add a comment to explain why we need to validate slot_addr.
Simplify the validation above. We can simply check offset < PAGE_SIZE; unsigned underflow is fine, so the check still works even if slot_addr < area->vaddr.
Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must be valid if offset < PAGE_SIZE.
The next patches will cleanup this function even more.
Signed-off-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
 kernel/events/uprobes.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)
On Sun, Nov 24, 2024 at 02:13:57PM +0100, Oleg Nesterov wrote:
Hi Sasha,
but why do you think it makes sense to backport these "uprobes" cleanups?
If it doesn't, I'm happy to drop them. This commit was selected mostly because of:
- Clear utask->xol_vaddr unconditionally, even if this addr is not valid, xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
Which sounded to me like more than a cleanup.
On 11/24, Sasha Levin wrote:
On Sun, Nov 24, 2024 at 02:13:57PM +0100, Oleg Nesterov wrote:
Hi Sasha,
but why do you think it makes sense to backport these "uprobes" cleanups?
If it doesn't, I'm happy to drop them. This commit was selected mostly because of:
I'd suggest to drop.
- Clear utask->xol_vaddr unconditionally, even if this addr is not valid,
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
Which sounded to me like more than a cleanup.
Sorry for confusion. This patch doesn't make much sense without the next changes.
Oleg.
linux-stable-mirror@lists.linaro.org