This is a small fallout from work to allow batching WW mutex locks and
unlocks.
Our Wound-Wait mutexes actually don't use the Wound-Wait algorithm but
the Wait-Die algorithm. One could perhaps rename those mutexes tree-wide to
"Wait-Die mutexes" or "Deadlock Avoidance mutexes". Another approach suggested
here is to implement also the "Wound-Wait" algorithm as a per-WW-class
choice, as it has advantages in some cases. See for example
http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/dead…
Now Wound-Wait is a preemptive algorithm, and the preemption is implemented
using a lazy scheme: if a wounded transaction is about to go to sleep on
a contended WW mutex, we return -EDEADLK. That is sufficient for deadlock
prevention. Since with WW mutexes we also require the aborted transaction to
sleep waiting to lock the WW mutex it was aborted on, this choice also provides
a suitable WW mutex to sleep on. If we were instead to return -EDEADLK on the
first WW mutex lock after the transaction was wounded, whether that WW mutex
was contended or not, the transaction might frequently be restarted without
a wait, which is far from optimal. Note also that with the lazy preemption
scheme, contrary to Wait-Die there will be no rollbacks on contention for
locks held by a transaction that has already completed its locking sequence.
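For context, the caller-side backoff pattern that both algorithms rely on looks
roughly like this (a minimal sketch against the existing ww_mutex API;
example_ww_class, struct example_obj and update_pair() are made-up names).
Under Wait-Die the -EDEADLK comes back to the younger waiter immediately; under
Wound-Wait it shows up in the wounded transaction the next time it blocks on a
contended ww_mutex:

#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(example_ww_class);	/* hypothetical WW class */

struct example_obj {
	struct ww_mutex lock;	/* ww_mutex_init(&obj->lock, &example_ww_class) */
};

/*
 * Lock two hypothetical objects, modify them, unlock. A complete
 * implementation would loop until no lock attempt returns -EDEADLK;
 * this sketch handles a single backoff only.
 */
static void update_pair(struct example_obj *obj_a, struct example_obj *obj_b)
{
	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &example_ww_class);

	ret = ww_mutex_lock(&obj_a->lock, &ctx);	/* fresh ctx: no backoff possible yet */
	ret = ww_mutex_lock(&obj_b->lock, &ctx);
	if (ret == -EDEADLK) {
		/*
		 * We were told to back off: killed under Wait-Die, or lazily
		 * preempted on a contended lock after being wounded under
		 * Wound-Wait. Drop everything we hold, sleep on the contended
		 * lock, then take the remaining locks again.
		 */
		ww_mutex_unlock(&obj_a->lock);
		ww_mutex_lock_slow(&obj_b->lock, &ctx);
		ret = ww_mutex_lock(&obj_a->lock, &ctx);
	}
	ww_acquire_done(&ctx);

	/* ... modify both objects ... */

	ww_mutex_unlock(&obj_b->lock);
	ww_mutex_unlock(&obj_a->lock);
	ww_acquire_fini(&ctx);
}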
The modeset locks are then changed from Wait-Die to Wound-Wait since the
typical locking pattern of those locks very well matches the criterion for
a substantial reduction in the number of rollbacks. For reservation objects,
the benefit is less clear at this point, so they keep using Wait-Die.
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: David Airlie <airlied(a)linux.ie>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Kate Stewart <kstewart(a)linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne(a)nexb.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-doc(a)vger.kernel.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
v2:
Updated the DEFINE_WW_CLASS() API; minor code and comment fixes as
detailed in each patch.
v3:
Included a cleanup patch from Peter Zijlstra. Documentation fixes and
a small correctness fix.
v4:
Reworked the correctness fix.
The issue:
Currently in ION, if you allocate uncached memory, it is possible that there
are still dirty lines in the cache, and often these dirty lines are the
zeros which were meant to clear out any sensitive kernel data.
What this means is that if you allocate uncached memory from ION, and then
subsequently write to that buffer (using the uncached mapping you are
provided by ION), then the data you have written could be corrupted at some
point in the future when a dirty line is evicted from the cache.
This also means there is a potential security issue. If an unprivileged
userspace user allocates uncached memory (for example from the system heap)
and then reads from that buffer (through the uncached mapping provided by
ION), and some of the zeros which were written to that memory are still
sitting in the cache, then this unprivileged user could read potentially
sensitive kernel data.
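To illustrate the exposure, a rough userspace sketch (assuming the post-4.12
staging ION UAPI, that the header is reachable as <linux/ion.h>, and that heap
id 0 is a system heap; all of that is illustrative only):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/ion.h>		/* staging ION UAPI header; path is an assumption */

int main(void)
{
	struct ion_allocation_data alloc = {
		.len = 4096,
		.heap_id_mask = 1 << 0,	/* assumed: heap id 0 is the system heap */
		.flags = 0,		/* no ION_FLAG_CACHED -> uncached mapping */
	};
	int ion = open("/dev/ion", O_RDWR);
	unsigned char *p;

	if (ion < 0 || ioctl(ion, ION_IOC_ALLOC, &alloc) < 0)
		return 1;

	/* The returned dma-buf fd can be mapped by any unprivileged caller. */
	p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, alloc.fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/*
	 * This read goes around the CPU cache. If the zeroing done at
	 * allocation time is still sitting in dirty cache lines, the value
	 * read here is the old memory content, i.e. potentially sensitive
	 * kernel data.
	 */
	printf("first byte: %02x\n", p[0]);
	return 0;
}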
An unacceptable fix:
I have included some sample code which should resolve this issue for the
system heap and the cma heap on some architectures; however, this code
would not be acceptable for upstreaming since it uses hacks to clean
the cache.
Similar changes would need to be made to the carveout heap and the chunk heap.
I would appreciate some feedback on the proper way for ION to clean the
caches for memory it has allocated that is intended for uncached access.
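Purely as a strawman for that discussion, and assuming a heap could get hold
of a real, DMA-capable struct device (which it cannot today), cleaning through
the regular streaming DMA API might look roughly like this:

/*
 * Strawman only: clean the CPU cache for a newly allocated uncached buffer
 * through the streaming DMA API. "dev" is assumed to be a real device whose
 * DMA ops perform cache maintenance; no such device exists in ION today.
 */
static int heap_clean_for_uncached(struct device *dev, struct page *page,
				   size_t size)
{
	struct scatterlist sg;

	sg_init_table(&sg, 1);
	sg_set_page(&sg, page, size, 0);

	/* dma_map_sg() performs the clean/invalidate for the device ... */
	if (dma_map_sg(dev, &sg, 1, DMA_BIDIRECTIONAL) != 1)
		return -ENOMEM;

	/* ... and unmapping afterwards keeps the API usage balanced. */
	dma_unmap_sg(dev, &sg, 1, DMA_BIDIRECTIONAL);
	return 0;
}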
I realize that it may be tempting, as a solution to this problem, to
simply strip uncached support from ION. I hope that we can try to find a
solution which preserves uncached memory support, as ION uncached memory
is often used (though perhaps not in upstream code) in multimedia use
cases where no CPU access is required, in secure heap allocations, and in
cases where CPU access is minimal and uncached memory therefore performs
better.
Signed-off-by: Liam Mark <lmark(a)codeaurora.org>
---
drivers/staging/android/ion/ion.c | 16 ++++++++++++++++
drivers/staging/android/ion/ion.h | 5 ++++-
drivers/staging/android/ion/ion_cma_heap.c | 3 +++
drivers/staging/android/ion/ion_page_pool.c | 8 ++++++--
drivers/staging/android/ion/ion_system_heap.c | 7 ++++++-
5 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 57e0d8035b2e..10e967b0a0f4 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -38,6 +38,22 @@ bool ion_buffer_cached(struct ion_buffer *buffer)
return !!(buffer->flags & ION_FLAG_CACHED);
}
+void ion_pages_sync_for_device(struct page *page, size_t size,
+ enum dma_data_direction dir)
+{
+ struct scatterlist sg;
+ struct device dev = {0};
+
+ /* hack, use dummy device */
+ arch_setup_dma_ops(&dev, 0, 0, NULL, false);
+
+ sg_init_table(&sg, 1);
+ sg_set_page(&sg, page, size, 0);
+ /* hack, use phys address for dma address */
+ sg_dma_address(&sg) = page_to_phys(page);
+ dma_sync_sg_for_device(&dev, &sg, 1, dir);
+}
+
/* this function should only be called while dev->lock is held */
static void ion_buffer_add(struct ion_device *dev,
struct ion_buffer *buffer)
diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index a238f23c9116..227b9928d185 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -192,6 +192,9 @@ struct ion_heap {
*/
bool ion_buffer_cached(struct ion_buffer *buffer);
+void ion_pages_sync_for_device(struct page *page, size_t size,
+ enum dma_data_direction dir);
+
/**
* ion_buffer_fault_user_mappings - fault in user mappings of this buffer
* @buffer: buffer
@@ -333,7 +336,7 @@ struct ion_page_pool {
struct ion_page_pool *ion_page_pool_create(gfp_t gfp_mask, unsigned int order,
bool cached);
void ion_page_pool_destroy(struct ion_page_pool *pool);
-struct page *ion_page_pool_alloc(struct ion_page_pool *pool);
+struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool);
void ion_page_pool_free(struct ion_page_pool *pool, struct page *page);
/** ion_page_pool_shrink - shrinks the size of the memory cached in the pool
diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c
index 49718c96bf9e..82e80621d114 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -59,6 +59,9 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,
memset(page_address(pages), 0, size);
}
+ if (!ion_buffer_cached(buffer))
+ ion_pages_sync_for_device(pages, size, DMA_BIDIRECTIONAL);
+
table = kmalloc(sizeof(*table), GFP_KERNEL);
if (!table)
goto err;
diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index b3017f12835f..169a321778ed 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -63,7 +63,7 @@ static struct page *ion_page_pool_remove(struct ion_page_pool *pool, bool high)
return page;
}
-struct page *ion_page_pool_alloc(struct ion_page_pool *pool)
+struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool)
{
struct page *page = NULL;
@@ -76,8 +76,12 @@ struct page *ion_page_pool_alloc(struct ion_page_pool *pool)
page = ion_page_pool_remove(pool, false);
mutex_unlock(&pool->mutex);
- if (!page)
+ if (!page) {
page = ion_page_pool_alloc_pages(pool);
+ *from_pool = false;
+ } else {
+ *from_pool = true;
+ }
return page;
}
diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
index bc19cdd30637..3bb4604e032b 100644
--- a/drivers/staging/android/ion/ion_system_heap.c
+++ b/drivers/staging/android/ion/ion_system_heap.c
@@ -57,13 +57,18 @@ static struct page *alloc_buffer_page(struct ion_system_heap *heap,
bool cached = ion_buffer_cached(buffer);
struct ion_page_pool *pool;
struct page *page;
+ bool from_pool;
if (!cached)
pool = heap->uncached_pools[order_to_index(order)];
else
pool = heap->cached_pools[order_to_index(order)];
- page = ion_page_pool_alloc(pool);
+ page = ion_page_pool_alloc(pool, &from_pool);
+
+ if (!from_pool && !ion_buffer_cached(buffer))
+ ion_pages_sync_for_device(page, PAGE_SIZE << order,
+ DMA_BIDIRECTIONAL);
return page;
}
--
1.8.5.2
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Dear All,
The CMA-related functions cma_alloc() and dma_alloc_from_contiguous()
have a gfp_mask parameter, but sadly they only support the __GFP_NOWARN flag.
This gave their users the misleading impression that any standard memory
allocation flags are supported, which resulted in a security issue when a
caller set the __GFP_ZERO flag and expected the buffer to be cleared.
This patchset changes the gfp_mask parameter to a simple boolean no_warn
argument, which covers everything the underlying code actually supports.
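As a hedged illustration (the exact prototypes are in the patches below;
cma, page, nr_pages and align belong to a hypothetical caller), a caller that
previously relied on __GFP_ZERO would be converted roughly like this:

	/* Before: looks like a normal allocator, but only __GFP_NOWARN was
	 * honoured, so the __GFP_ZERO below was silently ignored and the
	 * buffer stayed dirty. */
	page = cma_alloc(cma, nr_pages, align,
			 GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN);

	/* After: the argument says exactly what the code supports ... */
	page = cma_alloc(cma, nr_pages, align, true /* no_warn */);
	/* ... and clearing, if needed, is done explicitly by the caller
	 * (sketch; assumes the pages live in the kernel's linear map). */
	if (page)
		memset(page_address(page), 0, nr_pages << PAGE_SHIFT);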
This patchset is a result of the following discussion:
https://patchwork.kernel.org/patch/10461919/
Best regards
Marek Szyprowski
Samsung R&D Institute Poland
Patch summary:
Marek Szyprowski (2):
mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
dma: remove unsupported gfp_mask parameter from
dma_alloc_from_contiguous()
arch/arm/mm/dma-mapping.c | 5 +++--
arch/arm64/mm/dma-mapping.c | 4 ++--
arch/powerpc/kvm/book3s_hv_builtin.c | 2 +-
arch/xtensa/kernel/pci-dma.c | 2 +-
drivers/iommu/amd_iommu.c | 2 +-
drivers/iommu/intel-iommu.c | 3 ++-
drivers/s390/char/vmcp.c | 2 +-
drivers/staging/android/ion/ion_cma_heap.c | 2 +-
include/linux/cma.h | 2 +-
include/linux/dma-contiguous.h | 4 ++--
kernel/dma/contiguous.c | 6 +++---
kernel/dma/direct.c | 3 ++-
mm/cma.c | 8 ++++----
mm/cma_debug.c | 2 +-
14 files changed, 25 insertions(+), 22 deletions(-)
--
2.17.1
On Wed, Jul 04, 2018 at 09:26:39AM +0200, Tomeu Vizoso wrote:
> On 07/04/2018 07:53 AM, Gerd Hoffmann wrote:
> > On Tue, Jul 03, 2018 at 10:37:57AM +0200, Daniel Vetter wrote:
> > > On Tue, Jul 03, 2018 at 09:53:58AM +0200, Gerd Hoffmann wrote:
> > > > A driver to let userspace turn memfd regions into dma-bufs.
> > > >
> > > > Use case: Allows qemu to create dmabufs for the vga framebuffer or
> > > > virtio-gpu resources. Then they can be passed around to display
> > > > those guest things on the host. To spice client for classic full
> > > > framebuffer display, and hopefully some day to wayland server for
> > > > seamless guest window display.
> > > >
> > > > qemu test branch:
> > > > https://git.kraxel.org/cgit/qemu/log/?h=sirius/udmabuf
> > > >
> > > > Cc: David Airlie <airlied(a)linux.ie>
> > > > Cc: Tomeu Vizoso <tomeu.vizoso(a)collabora.com>
> > > > Cc: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
> > > > Cc: Daniel Vetter <daniel(a)ffwll.ch>
> > > > Signed-off-by: Gerd Hoffmann <kraxel(a)redhat.com>
> > >
> > > I think some ack for a 2nd use-case, like virtio-wl or whatever would be
> > > really cool. To give us some assurance that this is generically useful.
> >
> > Tomeu? Laurent?
>
> Sorry, but I think I will need some help to understand how this could help
> in the virtio-wl case [adding Zach Reizner to CC].
>
> Any graphics buffers that are allocated with memfd will be shared with the
> compositor via wl_shm, without need for dmabufs.
Within one machine, yes. Once virtualization is added to the mix things
become more complicated ...
When using virtio-gpu the guest will allocate graphics buffers from
normal (guest) ram, then register these buffers (which are allowed to be
scattered) with the host as a resource.
qemu can use memfd to allocate guest ram. Now, with the help of
udmabuf, qemu can create a *host* dma-buf for the *guest* graphics
buffer.
That dma-buf can be used by qemu internally (mmap it to get a linear
mapping of the resource, to avoid copying). It can be passed on to
spice-client, to display the guest framebuffer.
And I think it could also be quite useful to pass guest wayland windows
to the host compositor, without mapping host-allocated buffers into the
guest, so we don't have to deal with the "find some address space for
the mapping" issue in the first place. There are more things needed to
complete this of course, but it's a building block ...
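For reference, turning a memfd region into a dma-buf with the proposed driver
could look roughly like this from userspace (a sketch against the proposed
UAPI; the <linux/udmabuf.h> header name and the F_SEAL_SHRINK requirement are
assumptions on my side):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>	/* proposed UAPI header */

int main(void)
{
	const __u64 size = 4 << 20;			/* 4 MiB of "guest ram" */
	int memfd = memfd_create("guest-ram", MFD_ALLOW_SEALING);
	struct udmabuf_create create = { 0 };
	int devfd, buffd;

	if (memfd < 0)
		return 1;
	ftruncate(memfd, size);
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);	/* assumed: size must not shrink */

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return 1;

	create.memfd  = memfd;
	create.offset = 0;				/* page aligned offset into the memfd */
	create.size   = size;				/* page aligned size */

	buffd = ioctl(devfd, UDMABUF_CREATE, &create);
	if (buffd < 0)
		return 1;

	/* buffd is now a dma-buf fd backed by the memfd pages: qemu can mmap
	 * it for a linear view, or pass it on to spice / a compositor. */
	return 0;
}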
cheers,
Gerd
- Intro section that links to how this is exposed to userspace.
- Lots more hyperlinks.
- Minor clarifications and style polish
v2: Add misplaced hunk of kerneldoc from a different patch.
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
Documentation/driver-api/dma-buf.rst | 6 ++
drivers/dma-buf/dma-fence.c | 147 +++++++++++++++++++--------
2 files changed, 109 insertions(+), 44 deletions(-)
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index dc384f2f7f34..b541e97c7ab1 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -130,6 +130,12 @@ Reservation Objects
DMA Fences
----------
+.. kernel-doc:: drivers/dma-buf/dma-fence.c
+ :doc: DMA fences overview
+
+DMA Fences Functions Reference
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
.. kernel-doc:: drivers/dma-buf/dma-fence.c
:export:
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 7a92f85a4cec..1551ca7df394 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -38,12 +38,43 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
*/
static atomic64_t dma_fence_context_counter = ATOMIC64_INIT(0);
+/**
+ * DOC: DMA fences overview
+ *
+ * DMA fences, represented by &struct dma_fence, are the kernel internal
+ * synchronization primitive for DMA operations like GPU rendering, video
+ * encoding/decoding, or displaying buffers on a screen.
+ *
+ * A fence is initialized using dma_fence_init() and completed using
+ * dma_fence_signal(). Fences are associated with a context, allocated through
+ * dma_fence_context_alloc(), and all fences on the same context are
+ * fully ordered.
+ *
+ * Since the purposes of fences is to facilitate cross-device and
+ * cross-application synchronization, there's multiple ways to use one:
+ *
+ * - Individual fences can be exposed as a &sync_file, accessed as a file
+ * descriptor from userspace, created by calling sync_file_create(). This is
+ * called explicit fencing, since userspace passes around explicit
+ * synchronization points.
+ *
+ * - Some subsystems also have their own explicit fencing primitives, like
+ * &drm_syncobj. Compared to &sync_file, a &drm_syncobj allows the underlying
+ * fence to be updated.
+ *
+ * - Then there's also implicit fencing, where the synchronization points are
+ * implicitly passed around as part of shared &dma_buf instances. Such
+ * implicit fences are stored in &struct reservation_object through the
+ * &dma_buf.resv pointer.
+ */
+
/**
* dma_fence_context_alloc - allocate an array of fence contexts
- * @num: [in] amount of contexts to allocate
+ * @num: amount of contexts to allocate
*
- * This function will return the first index of the number of fences allocated.
- * The fence context is used for setting fence->context to a unique number.
+ * This function will return the first index of the number of fence contexts
+ * allocated. The fence context is used for setting &dma_fence.context to a
+ * unique number by passing the context to dma_fence_init().
*/
u64 dma_fence_context_alloc(unsigned num)
{
@@ -59,10 +90,14 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
* Signal completion for software callbacks on a fence, this will unblock
* dma_fence_wait() calls and run all the callbacks added with
* dma_fence_add_callback(). Can be called multiple times, but since a fence
- * can only go from unsignaled to signaled state, it will only be effective
- * the first time.
+ * can only go from the unsignaled to the signaled state and not back, it will
+ * only be effective the first time.
+ *
+ * Unlike dma_fence_signal(), this function must be called with &dma_fence.lock
+ * held.
*
- * Unlike dma_fence_signal, this function must be called with fence->lock held.
+ * Returns 0 on success and a negative error value when @fence has been
+ * signalled already.
*/
int dma_fence_signal_locked(struct dma_fence *fence)
{
@@ -102,8 +137,11 @@ EXPORT_SYMBOL(dma_fence_signal_locked);
* Signal completion for software callbacks on a fence, this will unblock
* dma_fence_wait() calls and run all the callbacks added with
* dma_fence_add_callback(). Can be called multiple times, but since a fence
- * can only go from unsignaled to signaled state, it will only be effective
- * the first time.
+ * can only go from the unsignaled to the signaled state and not back, it will
+ * only be effective the first time.
+ *
+ * Returns 0 on success and a negative error value when @fence has been
+ * signalled already.
*/
int dma_fence_signal(struct dma_fence *fence)
{
@@ -136,9 +174,9 @@ EXPORT_SYMBOL(dma_fence_signal);
/**
* dma_fence_wait_timeout - sleep until the fence gets signaled
* or until timeout elapses
- * @fence: [in] the fence to wait on
- * @intr: [in] if true, do an interruptible wait
- * @timeout: [in] timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
+ * @fence: the fence to wait on
+ * @intr: if true, do an interruptible wait
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
*
* Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the
* remaining timeout in jiffies on success. Other error values may be
@@ -148,6 +186,8 @@ EXPORT_SYMBOL(dma_fence_signal);
* directly or indirectly (buf-mgr between reservation and committing)
* holds a reference to the fence, otherwise the fence might be
* freed before return, resulting in undefined behavior.
+ *
+ * See also dma_fence_wait() and dma_fence_wait_any_timeout().
*/
signed long
dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
@@ -167,6 +207,13 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
}
EXPORT_SYMBOL(dma_fence_wait_timeout);
+/**
+ * dma_fence_release - default release function for fences
+ * @kref: &dma_fence.refcount
+ *
+ * This is the default release function for &dma_fence. Drivers shouldn't call
+ * this directly, but instead call dma_fence_put().
+ */
void dma_fence_release(struct kref *kref)
{
struct dma_fence *fence =
@@ -184,6 +231,13 @@ void dma_fence_release(struct kref *kref)
}
EXPORT_SYMBOL(dma_fence_release);
+/**
+ * dma_fence_free - default release function for &dma_fence.
+ * @fence: fence to release
+ *
+ * This is the default implementation for &dma_fence_ops.release. It calls
+ * kfree_rcu() on @fence.
+ */
void dma_fence_free(struct dma_fence *fence)
{
kfree_rcu(fence, rcu);
@@ -192,10 +246,11 @@ EXPORT_SYMBOL(dma_fence_free);
/**
* dma_fence_enable_sw_signaling - enable signaling on fence
- * @fence: [in] the fence to enable
+ * @fence: the fence to enable
*
- * this will request for sw signaling to be enabled, to make the fence
- * complete as soon as possible
+ * This will request for sw signaling to be enabled, to make the fence
+ * complete as soon as possible. This calls &dma_fence_ops.enable_signaling
+ * internally.
*/
void dma_fence_enable_sw_signaling(struct dma_fence *fence)
{
@@ -220,24 +275,24 @@ EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
/**
* dma_fence_add_callback - add a callback to be called when the fence
* is signaled
- * @fence: [in] the fence to wait on
- * @cb: [in] the callback to register
- * @func: [in] the function to call
+ * @fence: the fence to wait on
+ * @cb: the callback to register
+ * @func: the function to call
*
- * cb will be initialized by dma_fence_add_callback, no initialization
+ * @cb will be initialized by dma_fence_add_callback(), no initialization
* by the caller is required. Any number of callbacks can be registered
* to a fence, but a callback can only be registered to one fence at a time.
*
* Note that the callback can be called from an atomic context. If
* fence is already signaled, this function will return -ENOENT (and
- * *not* call the callback)
+ * *not* call the callback).
*
* Add a software callback to the fence. Same restrictions apply to
- * refcount as it does to dma_fence_wait, however the caller doesn't need to
- * keep a refcount to fence afterwards: when software access is enabled,
- * the creator of the fence is required to keep the fence alive until
- * after it signals with dma_fence_signal. The callback itself can be called
- * from irq context.
+ * refcount as it does to dma_fence_wait(), however the caller doesn't need to
+ * keep a refcount to fence after dma_fence_add_callback() has returned:
+ * when software access is enabled, the creator of the fence is required to keep
+ * the fence alive until after it signals with dma_fence_signal(). The callback
+ * itself can be called from irq context.
*
* Returns 0 in case of success, -ENOENT if the fence is already signaled
* and -EINVAL in case of error.
@@ -286,7 +341,7 @@ EXPORT_SYMBOL(dma_fence_add_callback);
/**
* dma_fence_get_status - returns the status upon completion
- * @fence: [in] the dma_fence to query
+ * @fence: the dma_fence to query
*
* This wraps dma_fence_get_status_locked() to return the error status
* condition on a signaled fence. See dma_fence_get_status_locked() for more
@@ -311,8 +366,8 @@ EXPORT_SYMBOL(dma_fence_get_status);
/**
* dma_fence_remove_callback - remove a callback from the signaling list
- * @fence: [in] the fence to wait on
- * @cb: [in] the callback to remove
+ * @fence: the fence to wait on
+ * @cb: the callback to remove
*
* Remove a previously queued callback from the fence. This function returns
* true if the callback is successfully removed, or false if the fence has
@@ -323,6 +378,9 @@ EXPORT_SYMBOL(dma_fence_get_status);
* doing, since deadlocks and race conditions could occur all too easily. For
* this reason, it should only ever be done on hardware lockup recovery,
* with a reference held to the fence.
+ *
+ * Behaviour is undefined if @cb has not been added to @fence using
+ * dma_fence_add_callback() beforehand.
*/
bool
dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
@@ -359,9 +417,9 @@ dma_fence_default_wait_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
/**
* dma_fence_default_wait - default sleep until the fence gets signaled
* or until timeout elapses
- * @fence: [in] the fence to wait on
- * @intr: [in] if true, do an interruptible wait
- * @timeout: [in] timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
+ * @fence: the fence to wait on
+ * @intr: if true, do an interruptible wait
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
*
* Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the
* remaining timeout in jiffies on success. If timeout is zero the value one is
@@ -454,12 +512,12 @@ dma_fence_test_signaled_any(struct dma_fence **fences, uint32_t count,
/**
* dma_fence_wait_any_timeout - sleep until any fence gets signaled
* or until timeout elapses
- * @fences: [in] array of fences to wait on
- * @count: [in] number of fences to wait on
- * @intr: [in] if true, do an interruptible wait
- * @timeout: [in] timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
- * @idx: [out] the first signaled fence index, meaningful only on
- * positive return
+ * @fences: array of fences to wait on
+ * @count: number of fences to wait on
+ * @intr: if true, do an interruptible wait
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
+ * @idx: used to store the first signaled fence index, meaningful only on
+ * positive return
*
* Returns -EINVAL on custom fence wait implementation, -ERESTARTSYS if
* interrupted, 0 if the wait timed out, or the remaining timeout in jiffies
@@ -468,6 +526,8 @@ dma_fence_test_signaled_any(struct dma_fence **fences, uint32_t count,
* Synchronous waits for the first fence in the array to be signaled. The
* caller needs to hold a reference to all fences in the array, otherwise a
* fence might be freed before return, resulting in undefined behavior.
+ *
+ * See also dma_fence_wait() and dma_fence_wait_timeout().
*/
signed long
dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
@@ -540,19 +600,18 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
/**
* dma_fence_init - Initialize a custom fence.
- * @fence: [in] the fence to initialize
- * @ops: [in] the dma_fence_ops for operations on this fence
- * @lock: [in] the irqsafe spinlock to use for locking this fence
- * @context: [in] the execution context this fence is run on
- * @seqno: [in] a linear increasing sequence number for this context
+ * @fence: the fence to initialize
+ * @ops: the dma_fence_ops for operations on this fence
+ * @lock: the irqsafe spinlock to use for locking this fence
+ * @context: the execution context this fence is run on
+ * @seqno: a linear increasing sequence number for this context
*
* Initializes an allocated fence, the caller doesn't have to keep its
* refcount after committing with this fence, but it will need to hold a
- * refcount again if dma_fence_ops.enable_signaling gets called. This can
- * be used for other implementing other types of fence.
+ * refcount again if &dma_fence_ops.enable_signaling gets called.
*
* context and seqno are used for easy comparison between fences, allowing
- * to check which fence is later by simply using dma_fence_later.
+ * to check which fence is later by simply using dma_fence_later().
*/
void
dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
--
2.18.0
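To make the "DMA fences overview" section added above a bit more concrete, a
minimal, hypothetical fence provider could look roughly like the sketch below;
none of this is part of the patch:

#include <linux/dma-fence.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_fence_lock);	/* lock shared by all example fences */

static const char *example_get_driver_name(struct dma_fence *f)
{
	return "example";
}

static const char *example_get_timeline_name(struct dma_fence *f)
{
	return "example-timeline";
}

static bool example_enable_signaling(struct dma_fence *f)
{
	/* Kick whatever eventually calls dma_fence_signal() on this fence. */
	return true;
}

static const struct dma_fence_ops example_fence_ops = {
	.get_driver_name = example_get_driver_name,
	.get_timeline_name = example_get_timeline_name,
	.enable_signaling = example_enable_signaling,
	.wait = dma_fence_default_wait,
	/* no .release: the default dma_fence_free() (kfree_rcu) is used */
};

static struct dma_fence *example_fence_create(u64 context, unsigned int seqno)
{
	struct dma_fence *fence = kzalloc(sizeof(*fence), GFP_KERNEL);

	if (!fence)
		return NULL;
	dma_fence_init(fence, &example_fence_ops, &example_fence_lock,
		       context, seqno);
	return fence;
}

/*
 * Typical usage: allocate one context per timeline, hand out fences on it,
 * signal them on completion, drop the reference when done:
 *
 *	u64 ctx = dma_fence_context_alloc(1);
 *	struct dma_fence *f = example_fence_create(ctx, 1);
 *	...
 *	dma_fence_signal(f);
 *	dma_fence_put(f);
 */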
Am 04.07.2018 um 11:09 schrieb Michel Dänzer:
> On 2018-07-04 10:31 AM, Christian König wrote:
>> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>>> From: Michel Dänzer <michel.daenzer(a)amd.com>
>>>
>>> Fixes the BUG_ON spuriously triggering under the following
>>> circumstances:
>>>
>>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>> the same reservation object, so it calls
>>> reservation_object_reserve_shared with that reservation object once
>>> for each such BO.
>>> * In reservation_object_reserve_shared, old->shared_count ==
>>> old->shared_max - 1, so obj->staged is freed in preparation of an
>>> in-place update.
>>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>> once for each of the BOs above, always with the same fence.
>>> * The first call adds the fence in the remaining free slot, after which
>>> old->shared_count == old->shared_max.
>> Well, the explanation here is not correct. For multiple BOs using the
>> same reservation object we won't call
>> reservation_object_add_shared_fence() multiple times because we move
>> those to the duplicates list in ttm_eu_reserve_buffers().
>>
>> But this bug can still happen because we call
>> reservation_object_add_shared_fence() manually with fences for the same
>> context in a couple of places.
>>
>> One prominent case which comes to my mind are for the VM BOs during
>> updates. Another possibility are VRAM BOs which need to be cleared.
> Thanks. How about the following:
>
> * ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
> * In reservation_object_reserve_shared, shared_count == shared_max - 1,
> so obj->staged is freed in preparation of an in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
> after which shared_count == shared_max.
> * The amdgpu driver also calls reservation_object_add_shared_fence for
> the same reservation object, and the BUG_ON triggers.
I would rather completely drop the reference to the ttm_eu_* functions,
because those wrappers are completely unrelated to the problem.
Instead let's just note something like the following:
* When reservation_object_reserve_shared is called with shared_count ==
shared_max - 1, obj->staged is freed in preparation for an in-place
update.
* Now reservation_object_add_shared_fence is called with the first fence,
and after that shared_count == shared_max.
* After that, reservation_object_add_shared_fence can be called with
follow-up fences from the same context, but since shared_count ==
shared_max we would run into this BUG_ON (roughly the sequence sketched
below).
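A rough sketch of that sequence (hypothetical caller; resv is assumed to be
locked, to currently have shared_count == shared_max - 1, and fence_a/fence_b
are assumed to come from the same fence context):

	/* Sketch of the problematic sequence, not real driver code. */
	reservation_object_reserve_shared(resv);	/* frees obj->staged, plans an
							 * in-place add */
	reservation_object_add_shared_fence(resv, fence_a);	/* fills the last slot:
								 * count == max */
	reservation_object_add_shared_fence(resv, fence_b);	/* same context: would just
								 * reuse fence_a's slot, but
								 * the old BUG_ON placement
								 * fired here anyway */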
> However, nothing bad would happen in
> reservation_object_add_shared_inplace, since all fences use the same
> context, so they can only occupy a single slot.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
>
> Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
> as I suspect this fix is necessary under the circumstances described
> there as well.
The rest sounds good to me.
Regards,
Christian.
Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
> From: Michel Dänzer <michel.daenzer(a)amd.com>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
> the same reservation object, so it calls
> reservation_object_reserve_shared with that reservation object once
> for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
> old->shared_max - 1, so obj->staged is freed in preparation of an
> in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
> once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
> old->shared_count == old->shared_max.
Well, the explanation here is not correct. For multiple BOs using the
same reservation object we won't call
reservation_object_add_shared_fence() multiple times because we move
those to the duplicates list in ttm_eu_reserve_buffers().
But this bug can still happen because we call
reservation_object_add_shared_fence() manually with fences for the same
context in a couple of places.
One prominent case which comes to my mind is the VM BOs during
updates. Another possibility is VRAM BOs which need to be cleared.
>
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Regards,
Christian.
> ---
> drivers/dma-buf/reservation.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> if (signaled) {
> RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
> } else {
> + BUG_ON(fobj->shared_count >= fobj->shared_max);
> RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
> fobj->shared_count++;
> }
> @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
> old = reservation_object_get_list(obj);
> obj->staged = NULL;
>
> - if (!fobj) {
> - BUG_ON(old->shared_count >= old->shared_max);
> + if (!fobj)
> reservation_object_add_shared_inplace(obj, old, fence);
> - } else
> + else
> reservation_object_add_shared_replace(obj, old, fobj, fence);
> }
> EXPORT_SYMBOL(reservation_object_add_shared_fence);