On Wed, May 22, 2013 at 01:18:14PM +0200, Maarten Lankhorst wrote:
Lacking the actual msg atm, I'm going to paste in here...
Subject: [PATCH v3 2/3] mutex: add support for wound/wait style locks, v3 From: Maarten Lankhorst maarten.lankhorst@xxxxxxxxxxxxx
Changes since RFC patch v1:
- Updated to use atomic_long instead of atomic, since the reservation_id was a long.
- added mutex_reserve_lock_slow and mutex_reserve_lock_intr_slow
- removed mutex_locked_set_reservation_id (or w/e it was called)
Changes since RFC patch v2:
- remove use of __mutex_lock_retval_arg, add warnings when using wrong combination of mutex_(,reserve_)lock/unlock.
Changes since v1:
- Add __always_inline to __mutex_lock_common, otherwise reservation paths can be triggered from normal locks, because __builtin_constant_p might evaluate to false for the constant 0 in that case. Tests for this have been added in the next patch.
- Updated documentation slightly.
Changes since v2:
- Renamed everything to ww_mutex. (mlankhorst)
- Added ww_acquire_ctx and ww_class. (mlankhorst)
- Added a lot of checks for wrong api usage. (mlankhorst)
- Documentation updates. (danvet)
Signed-off-by: Maarten Lankhorst maarten.lankhorst@xxxxxxxxxxxxx Signed-off-by: Daniel Vetter daniel.vetter@xxxxxxxx
Documentation/ww-mutex-design.txt | 322 +++++++++++++++++++++++++++ include/linux/mutex-debug.h | 1 include/linux/mutex.h | 257 +++++++++++++++++++++ kernel/mutex.c | 445 ++++++++++++++++++++++++++++++++++++- lib/debug_locks.c | 2 5 files changed, 1010 insertions(+), 17 deletions(-) create mode 100644 Documentation/ww-mutex-design.txt
diff --git a/Documentation/ww-mutex-design.txt b/Documentation/ww-mutex-design.txt new file mode 100644 index 0000000..154bae3 --- /dev/null +++ b/Documentation/ww-mutex-design.txt @@ -0,0 +1,322 @@ +Wait/Wound Deadlock-Proof Mutex Design +======================================
+Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
+Motivation for WW-Mutexes +-------------------------
+GPU's do operations that commonly involve many buffers. Those buffers +can be shared across contexts/processes, exist in different memory +domains (for example VRAM vs system memory), and so on. And with +PRIME / dmabuf, they can even be shared across devices. So there are +a handful of situations where the driver needs to wait for buffers to +become ready. If you think about this in terms of waiting on a buffer +mutex for it to become available, this presents a problem because +there is no way to guarantee that buffers appear in a execbuf/batch in +the same order in all contexts. That is directly under control of +userspace, and a result of the sequence of GL calls that an application +makes. Which results in the potential for deadlock. The problem gets +more complex when you consider that the kernel may need to migrate the +buffer(s) into VRAM before the GPU operates on the buffer(s), which +may in turn require evicting some other buffers (and you don't want to +evict other buffers which are already queued up to the GPU), but for a +simplified understanding of the problem you can ignore this.
+The algorithm that TTM came up with for dealing with this problem is quite +simple. For each group of buffers (execbuf) that need to be locked, the caller +would be assigned a unique reservation id/ticket, from a global counter. In +case of deadlock while locking all the buffers associated with a execbuf, the +one with the lowest reservation ticket (i.e. the oldest task) wins, and the one +with the higher reservation id (i.e. the younger task) unlocks all of the +buffers that it has already locked, and then tries again.
+In the RDBMS literature this deadlock handling approach is called wait/wound: +The older tasks waits until it can acquire the contended lock. The younger tasks +needs to back off and drop all the locks it is currently holding, i.e. the +younger task is wounded.
+Concepts +--------
+Compared to normal mutexes two additional concepts/objects show up in the lock +interface for w/w mutexes:
+Acquire context: To ensure eventual forward progress it is important the a task +trying to acquire locks doesn't grab a new reservation id, but keeps the one it +acquired when starting the lock acquisition. This ticket is stored in the +acquire context. Furthermore the acquire context keeps track of debugging state +to catch w/w mutex interface abuse.
+W/w class: In contrast to normal mutexes the lock class needs to be explicit for +w/w mutexes, since it is required to initialize the acquire context.
+Furthermore there are three different classe of w/w lock acquire functions: +- Normal lock acquisition with a context, using ww_mutex_lock +- Slowpath lock acquisition on the contending lock, used by the wounded task
- after having dropped all already acquired locks. These functions have the
- _slow postfix.
See below, I don't see the need for this interface.
+- Functions to only acquire a single w/w mutex, which results in the exact same
- semantics as a normal mutex. These functions have the _single postfix.
This is missing rationale.
+Of course, all the usual variants for handling wake-ups due to signals are also +provided.
+Usage +-----
+Three different ways to acquire locks within the same w/w class. Common +definitions for methods 1&2.
+static DEFINE_WW_CLASS(ww_class);
+struct obj {
- sct ww_mutex lock;
- /* obj data */
+};
+struct obj_entry {
- struct list_head *list;
- struct obj *obj;
+};
+Method 1, using a list in execbuf->buffers that's not allowed to be reordered. +This is useful if a list of required objects is already tracked somewhere. +Furthermore the lock helper can use propagate the -EALREADY return code back to +the caller as a signal that an object is twice on the list. This is useful if +the list is constructed from userspace input and the ABI requires userspace to +no have duplicate entries (e.g. for a gpu commandbuffer submission ioctl).
+int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) +{
- struct obj *res_obj = NULL;
- struct obj_entry *contended_entry = NULL;
- struct obj_entry *entry;
- ww_acquire_init(ctx, &ww_class);
+retry:
- list_for_each_entry (list, entry) {
if (entry == res_obj) {
res_obj = NULL;
continue;
}
ret = ww_mutex_lock(&entry->obj->lock, ctx);
if (ret < 0) {
contended_obj = entry;
goto err;
}
- }
- ww_acquire_done(ctx);
- return 0;
+err:
- list_for_each_entry_continue_reverse (list, contended_entry, entry)
ww_mutex_unlock(&entry->obj->lock);
- if (res_obj)
ww_mutex_unlock(&res_obj->lock);
- if (ret == -EDEADLK) {
/* we lost out in a seqno race, lock and retry.. */
ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
I missing the need for ww_mutex_lock_slow(). AFAICT we should be able to tell its the first lock in the ctx and thus we cannot possibly deadlock.
res_obj = contended_entry->obj;
goto retry;
- }
- ww_acquire_fini(ctx);
- return ret;
+}
... you certainly went all out on documentation.
diff --git a/include/linux/mutex-debug.h b/include/linux/mutex-debug.h index 731d77d..4ac8b19 100644 --- a/include/linux/mutex-debug.h +++ b/include/linux/mutex-debug.h @@ -3,6 +3,7 @@ #include <linux/linkage.h> #include <linux/lockdep.h> +#include <linux/debug_locks.h> /*
- Mutexes - debugging helpers:
diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 9121595..004f863 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -74,6 +74,35 @@ struct mutex_waiter { #endif }; +struct ww_class {
- atomic_long_t stamp;
- struct lock_class_key acquire_key;
- struct lock_class_key mutex_key;
- const char *acquire_name;
- const char *mutex_name;
+};
+struct ww_acquire_ctx {
- struct task_struct *task;
- unsigned long stamp;
+#ifdef CONFIG_DEBUG_MUTEXES
- unsigned acquired, done_acquire;
- struct ww_class *ww_class;
- struct ww_mutex *contending_lock;
+#endif +#ifdef CONFIG_DEBUG_LOCK_ALLOC
- struct lockdep_map dep_map;
+#endif +};
+struct ww_mutex {
- struct mutex base;
- struct ww_acquire_ctx *ctx;
+#ifdef CONFIG_DEBUG_MUTEXES
- struct ww_class *ww_class;
+#endif +};
@@ -167,6 +236,192 @@ extern int __must_check mutex_lock_killable(struct mutex *lock); */ extern int mutex_trylock(struct mutex *lock); extern void mutex_unlock(struct mutex *lock);
+/**
- ww_acquire_init - initialize a w/w acquire context
- @ctx: w/w acquire context to initialize
- @ww_class: w/w class of the context
- Initializes an context to acquire multiple mutexes of the given w/w class.
- Context-based w/w mutex acquiring can be done in any order whatsoever within
- a given lock class. Deadlocks will be detected and handled with the
- wait/wound logic.
- Mixing of context-based w/w mutex acquiring and single w/w mutex locking can
- result in undetected deadlocks and is so forbidden. Mixing different contexts
- for the same w/w class when acquiring mutexes can also result in undetected
- deadlocks, and is hence also forbidden.
- Nesting of acquire contexts for _different_ w/w classes is possible, subject
- to the usual locking rules between different lock classes.
- An acquire context must be release by the same task before the memory is
- freed with ww_acquire_fini. It is recommended to allocate the context itself
- on the stack.
- */
+static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
struct ww_class *ww_class)
+{
- ctx->task = current;
- do {
ctx->stamp = atomic_long_inc_return(&ww_class->stamp);
- } while (unlikely(!ctx->stamp));
I suppose we'll figure something out when this becomes a bottleneck. Ideally we'd do something like:
ctx->stamp = local_clock();
but for now we cannot guarantee that's not jiffies, and I suppose that's a tad too coarse to work for this.
Also, why is 0 special?
+#ifdef CONFIG_DEBUG_MUTEXES
- ctx->ww_class = ww_class;
- ctx->acquired = ctx->done_acquire = 0;
- ctx->contending_lock = NULL;
+#endif +#ifdef CONFIG_DEBUG_LOCK_ALLOC
- debug_check_no_locks_freed((void *)ctx, sizeof(*ctx));
- lockdep_init_map(&ctx->dep_map, ww_class->acquire_name,
&ww_class->acquire_key, 0);
- mutex_acquire(&ctx->dep_map, 0, 0, _RET_IP_);
+#endif +}
+/**
- ww_mutex_trylock_single - tries to acquire the w/w mutex without acquire context
- @lock: mutex to lock
- Trylocks a mutex without acquire context, so no deadlock detection is
- possible. Returns 0 if the mutex has been acquired.
- Unlocking the mutex must happen with a call to ww_mutex_unlock_single.
- */
+static inline int __must_check ww_mutex_trylock_single(struct ww_mutex *lock) +{
- return mutex_trylock(&lock->base);
+}
trylocks can never deadlock they don't block per definition, I don't see the point of the _single() thing here.
+/**
- ww_mutex_lock_single - acquire the w/w mutex without acquire context
- @lock: mutex to lock
- Locks a mutex without acquire context, so no deadlock detection is
- possible.
- Unlocking the mutex must happen with a call to ww_mutex_unlock_single.
- */
+static inline void ww_mutex_lock_single(struct ww_mutex *lock) +{
- mutex_lock(&lock->base);
+}
as per the above, I'm missing the rationale for having this.
diff --git a/kernel/mutex.c b/kernel/mutex.c index 84a5f07..66807c7 100644 --- a/kernel/mutex.c +++ b/kernel/mutex.c @@ -127,16 +127,156 @@ void __sched mutex_unlock(struct mutex *lock) EXPORT_SYMBOL(mutex_unlock); +/**
- ww_mutex_unlock - release the w/w mutex
- @lock: the mutex to be released
- Unlock a mutex that has been locked by this task previously
- with ww_mutex_lock* using an acquire context. It is forbidden to release the
- locks after releasing the acquire context.
- This function must not be used in interrupt context. Unlocking
- of a unlocked mutex is not allowed.
- Note that locks acquired with one of the ww_mutex_lock*single variant must be
- unlocked with ww_mutex_unlock_single.
- */
+void __sched ww_mutex_unlock(struct ww_mutex *lock) +{
- /*
* The unlocking fastpath is the 0->1 transition from 'locked'
* into 'unlocked' state:
*/
+#ifdef CONFIG_DEBUG_MUTEXES
- DEBUG_LOCKS_WARN_ON(!lock->ctx);
- if (lock->ctx) {
DEBUG_LOCKS_WARN_ON(!lock->ctx->acquired);
if (lock->ctx->acquired > 0)
lock->ctx->acquired--;
- }
+#endif
- lock->ctx = NULL;
barriers should always have a comment explaining the exact ordering and the pairing barrier's location.
- smp_mb__before_atomic_inc();
+#ifndef CONFIG_DEBUG_MUTEXES
- /*
* When debugging is enabled we must not clear the owner before time,
* the slow path will always be taken, and that clears the owner field
* after verifying that it was indeed current.
*/
- mutex_clear_owner(&lock->base);
+#endif
- __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+} +EXPORT_SYMBOL(ww_mutex_unlock);
+static inline int __sched +__mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx) +{
- struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
- struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
- if (!hold_ctx)
return 0;
- if (unlikely(ctx->stamp == hold_ctx->stamp))
return -EALREADY;
Why compare stamps? I expected: ctx == hold_ctx here.
- if (unlikely(ctx->stamp - hold_ctx->stamp <= LONG_MAX)) {
Why not simply write: ctx->stamp > hold_ctx->stamp ?
If we need to deal with equal stamps from different contexts we could tie-break based on ctx address or so, but seeing its a global counter from the class, that shouldn't happen for now.
+#ifdef CONFIG_DEBUG_MUTEXES
DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
ctx->contending_lock = ww;
+#endif
return -EDEADLK;
- }
- return 0;
+}
+static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
struct ww_acquire_ctx *ww_ctx,
bool ww_slow)
+{ +#ifdef CONFIG_DEBUG_MUTEXES
- /*
* If this WARN_ON triggers, you used mutex_lock to acquire,
* but released with ww_mutex_unlock in this call.
*/
- DEBUG_LOCKS_WARN_ON(ww->ctx);
- /*
* Not quite done after ww_acquire_done() ?
*/
- DEBUG_LOCKS_WARN_ON(ww_ctx->done_acquire);
- if (ww_slow) {
s/ww_slow/!ww_ctx->acquired/
DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww);
ww_ctx->contending_lock = NULL;
- } else
DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
- /*
* Naughty, using a different class can lead to undefined behavior!
*/
- DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
- if (ww_slow)
DEBUG_LOCKS_WARN_ON(ww_ctx->acquired > 0);
- ww_ctx->acquired++;
+#endif +}
+/*
- after acquiring lock with fastpath or when we lost out in contested
- slowpath, set ctx and wake up any waiters so they can recheck.
- This function is never called when CONFIG_DEBUG_LOCK_ALLOC is set,
- as the fastpath and opportunistic spinning are disabled in that case.
- */
+static __always_inline void +ww_mutex_set_context_fastpath(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
+{
- unsigned long flags;
- struct mutex_waiter *cur;
- ww_mutex_lock_acquired(lock, ctx, false);
- lock->ctx = ctx;
missing comment
- smp_mb__after_atomic_dec();
- /*
* Check if lock is contended, if not there is nobody to wake up
*/
- if (likely(atomic_read(&lock->base.count) == 0))
return;
- /*
* Uh oh, we raced in fastpath, wake up everyone in this case,
* so they can see the new ctx
*/
- spin_lock_mutex(&lock->base.wait_lock, flags);
- list_for_each_entry(cur, &lock->base.wait_list, list) {
debug_mutex_wake_waiter(&lock->base, cur);
wake_up_process(cur->task);
- }
- spin_unlock_mutex(&lock->base.wait_lock, flags);
+}
/*
- Lock a mutex (possibly interruptible), slowpath:
*/ -static inline int __sched +static __always_inline int __sched __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
struct lockdep_map *nest_lock, unsigned long ip)
struct lockdep_map *nest_lock, unsigned long ip,
struct ww_acquire_ctx *ww_ctx, bool ww_slow)
{ struct task_struct *task = current; struct mutex_waiter waiter; unsigned long flags;
- int ret;
preempt_disable(); mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip); @@ -163,6 +303,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, for (;;) { struct task_struct *owner;
if (!__builtin_constant_p(ww_ctx == NULL) && !ww_slow) {
Since we _know_ ww_ctx isn't NULL, we can trivially do: s/ww_slow/!ww_ctx->acquired/
struct ww_mutex *ww;
ww = container_of(lock, struct ww_mutex, base);
if (ACCESS_ONCE(ww->ctx))
What's the point of this ACCESS_ONCE()?
break;
}
- /*
- If there's an owner, wait for it to either
- release the lock or go to sleep.
@@ -173,6 +321,13 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, if (atomic_cmpxchg(&lock->count, 1, 0) == 1) { lock_acquired(&lock->dep_map, ip);
Should this not also have a __builtin_constant_p(ww_ctx == NULL) ?
if (ww_slow) {
struct ww_mutex *ww;
ww = container_of(lock, struct ww_mutex, base);
ww_mutex_set_context_fastpath(ww, ww_ctx);
}
mutex_set_owner(lock); preempt_enable(); return 0;
@@ -228,15 +383,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, * TASK_UNINTERRUPTIBLE case.) */ if (unlikely(signal_pending_state(state, task))) {
mutex_remove_waiter(lock, &waiter,
task_thread_info(task));
mutex_release(&lock->dep_map, 1, ip);
spin_unlock_mutex(&lock->wait_lock, flags);
ret = -EINTR;
goto err;
}
debug_mutex_free_waiter(&waiter);
preempt_enable();
return -EINTR;
if (!__builtin_constant_p(ww_ctx == NULL) && !ww_slow) {
ret = __mutex_lock_check_stamp(lock, ww_ctx);
if (ret)
}goto err;
- __set_task_state(task, state);
/* didn't get the lock, go to sleep: */ @@ -251,6 +407,30 @@ done: mutex_remove_waiter(lock, &waiter, current_thread_info()); mutex_set_owner(lock);
- if (!__builtin_constant_p(ww_ctx == NULL)) {
struct ww_mutex *ww = container_of(lock,
struct ww_mutex,
base);
struct mutex_waiter *cur;
/*
* This branch gets optimized out for the common case,
* and is only important for ww_mutex_lock.
*/
ww_mutex_lock_acquired(ww, ww_ctx, ww_slow);
ww->ctx = ww_ctx;
/*
* Give any possible sleeping processes the chance to wake up,
* so they can recheck if they have to back off.
*/
list_for_each_entry(cur, &lock->wait_list, list) {
debug_mutex_wake_waiter(lock, cur);
wake_up_process(cur->task);
}
- }
- /* set it to 0 if there are no waiters left: */ if (likely(list_empty(&lock->wait_list))) atomic_set(&lock->count, 0);
@@ -261,6 +441,14 @@ done: preempt_enable(); return 0;
+err:
- mutex_remove_waiter(lock, &waiter, task_thread_info(task));
- spin_unlock_mutex(&lock->wait_lock, flags);
- debug_mutex_free_waiter(&waiter);
- mutex_release(&lock->dep_map, 1, ip);
- preempt_enable();
- return ret;
} #ifdef CONFIG_DEBUG_LOCK_ALLOC @@ -268,7 +456,8 @@ void __sched mutex_lock_nested(struct mutex *lock, unsigned int subclass) { might_sleep();
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subclass, NULL, _RET_IP_);
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
subclass, NULL, _RET_IP_, 0, 0);
}
The pendant in me has to tell you 4x that NULL != 0 :-)
+EXPORT_SYMBOL_GPL(ww_mutex_lock); +EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible); +EXPORT_SYMBOL_GPL(ww_mutex_lock_slow); +EXPORT_SYMBOL_GPL(ww_mutex_lock_slow_interruptible);
Now having to do the _slow stuff saves lines and interface complexity!
@@ -401,20 +738,39 @@ __mutex_lock_slowpath(atomic_t *lock_count) { struct mutex *lock = container_of(lock_count, struct mutex, count);
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0,
NULL, _RET_IP_, 0, 0);
} static noinline int __sched __mutex_lock_killable_slowpath(struct mutex *lock) {
- return __mutex_lock_common(lock, TASK_KILLABLE, 0, NULL, _RET_IP_);
- return __mutex_lock_common(lock, TASK_KILLABLE, 0,
NULL, _RET_IP_, 0, 0);
} static noinline int __sched __mutex_lock_interruptible_slowpath(struct mutex *lock) {
- return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
- return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0,
NULL, _RET_IP_, 0, 0);
}
A few more cases where NULL != 0 :-)