Hi Guangming,
Am 19.07.21 um 07:19 schrieb guangming.cao(a)mediatek.com:
> From: Guangming Cao <Guangming.Cao(a)mediatek.com>
>
> Dmabuf sysfs stat is used for dmabuf info track.
> but these file maybe still use after buffer release/detach,
> should clear it before buffer release/detach.
In general looks correct to me, but Hridya already send out a patch to
partially revert the attachment sysfs files since they caused a bunch of
more problems for some users.
Please wait for that to land in branch drm-misc-next and then rebase
yours on top of it. I will give you a ping when that is done.
Thanks,
Christian.
>
> Signed-off-by: Guangming Cao <Guangming.Cao(a)mediatek.com>
> ---
> drivers/dma-buf/dma-buf.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 510b42771974..9fa4620bd4bb 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -76,12 +76,12 @@ static void dma_buf_release(struct dentry *dentry)
> */
> BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
>
> + dma_buf_stats_teardown(dmabuf);
> dmabuf->ops->release(dmabuf);
>
> if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
> dma_resv_fini(dmabuf->resv);
>
> - dma_buf_stats_teardown(dmabuf);
> module_put(dmabuf->owner);
> kfree(dmabuf->name);
> kfree(dmabuf);
> @@ -875,10 +875,11 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> dma_resv_lock(dmabuf->resv, NULL);
> list_del(&attach->node);
> dma_resv_unlock(dmabuf->resv);
> +
> + dma_buf_attach_stats_teardown(attach);
> if (dmabuf->ops->detach)
> dmabuf->ops->detach(dmabuf, attach);
>
> - dma_buf_attach_stats_teardown(attach);
> kfree(attach);
> }
> EXPORT_SYMBOL_GPL(dma_buf_detach);
Am 14.07.21 um 14:03 schrieb guangming.cao(a)mediatek.com:
> From: Guangming.Cao <guangming.cao(a)mediatek.com>
>
> On Wed, 2021-07-14 at 12:43 +0200, Christian K鰊ig wrote:
>> Am 14.07.21 um 11:44 schrieb guangming.cao(a)mediatek.com:
>>> From: Guangming Cao <Guangming.Cao(a)mediatek.com>
>>>
>>> On Wed, 2021-07-14 at 10:46 +0200, Christian K鰊ig wrote:
>>>> Am 14.07.21 um 09:11 schrieb guangming.cao(a)mediatek.com:
>>>>> From: Guangming Cao <Guangming.Cao(a)mediatek.com>
>>>>>
>>>>> Add a refcount for kernel to prevent UAF(Use After Free) issue.
>>>> Well NAK on so many levels.
>>>>
>>>>> We can assume a case like below:
>>>>> 1. kernel space alloc dma_buf(file count = 1)
>>>>> 2. kernel use dma_buf to get fd(file count = 1)
>>>>> 3. userspace use fd to do mapping (file count = 2)
>>>> Creating an userspace mapping increases the reference count for
>>>> the
>>>> underlying file object.
>>>>
>>>> See the implementation of mmap_region():
>>>> ...
>>>> vma->vm_file = get_file(file);
>>>> error = call_mmap(file, vma);
>>>> ...
>>>>
>>>> What can happen is the the underlying exporter redirects the mmap
>>>> to
>>>> a
>>>> different file, e.g. TTM or GEM drivers do that all the time.
>>>>
>>>> But this is fine since then the VA mapping is independent of the
>>>> DMA-
>>>> buf.
>>>>
>>>>> 4. kernel call dma_buf_put (file count = 1)
>>>>> 5. userpsace close buffer fd(file count = 0)
>>>>> 6. at this time, buffer is released, but va is valid!!
>>>>> So we still can read/write buffer via mmap va,
>>>>> it maybe cause memory leak, or kernel exception.
>>>>> And also, if we use "ls -ll" to watch corresponding
>>>>> process
>>>>> fd link info, it also will cause kernel exception.
>>>>>
>>>>> Another case:
>>>>> Using dma_buf_fd to generate more than 1 fd, because
>>>>> dma_buf_fd will not increase file count, thus, when
>>>>> close
>>>>> the second fd, it maybe occurs error.
>>>> Each opened fd will increase the reference count so this is
>>>> certainly
>>>> not correct what you describe here.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>> Yes, mmap will increase file count by calling get_file, so step[2]
>>> ->
>>> step[3], file count increase 1.
>>>
>>> But, dma_buf_fd() will not increase file count.
>>> function "dma_buf_fd(struct dma_buf *dmabuf, int flags)" just get
>>> an
>>> unused fd, via call "get_unused_fd_flags(flags)", and call
>>> "fd_install(fd, dmabuf->file)", it will let associated "struct
>>> file*"
>>> in task's fdt->fd[fd] points to this dma_buf.file, not increase the
>>> file count of dma_buf.file.
>>> I think this is confusing, I can get more than 1 fds via
>>> dma_buf_fd,
>>> but they don't need to close it because they don't increase file
>>> count.
>>>
>>> However, dma_buf_put() can decrease file count at kernel side
>>> directly.
>>> If somebody write a ko to put file count of dma_buf.file many
>>> times, it
>>> will cause buffer freed earlier than except. At last on Android, I
>>> think this is a little bit dangerous.
>> dma_buf_fd() takes the dma_buf pointer and converts it into a fd. So
>> the
>> reference is consumed.
>>
>> That's why users of this interface make sure to get a separate
>> reference, see drm_gem_prime_handle_to_fd() for example:
>>
>> ...
>> out_have_handle:
>> ret = dma_buf_fd(dmabuf, flags);
>> /*
>> * We must _not_ remove the buffer from the handle cache since
>> the
>> newly
>> * created dma buf is already linked in the global obj->dma_buf
>> pointer,
>> * and that is invariant as long as a userspace gem handle
>> exists.
>> * Closing the handle will clean out the cache anyway, so we
>> don't
>> leak.
>> */
>> if (ret < 0) {
>> goto fail_put_dmabuf;
>> } else {
>> *prime_fd = ret;
>> ret = 0;
>> }
>>
>> goto out;
>>
>> fail_put_dmabuf:
>> dma_buf_put(dmabuf);
>> out:
>> ...
>>
>> You could submit a patch to improve the documentation and explicitly
>> note on dma_buf_fd() that the reference is consumed, but all of this
>> is
>> working perfectly fine.
>>
>> Regards,
>> Christian.
>>
> Thanks for your reply!
>
> Yes, drm works fine because it fully understand what dma-buf api will
> do. Improve the documentation is really good idea to prevent this case.
>
> But, what I can't understand is, for kernel api exported to
> corresponding users, we don't need to ensure all api is safe?
Well the API is perfectly safe, it is just not what you are expecting.
> And for general cases, dma-buf framework also need to prevent this
> case, isn't it, it will make dma-buf framework more strong?
What we could do is to move getting the reference into that function if
all users of that function does that anyway.
This would then be more defensive because new users of dma_buf_fd()
can't forget to grab a reference.
But this needs a complete audit of the kernel with all of the users of
dma_buf_fd().
Regards,
Christian.
>
>
> BRs!
> Guangming
>>>>> Solution:
>>>>> Add a kernel count for dma_buf, and make sure the file
>>>>> count
>>>>> of dma_buf.file hold by kernel is 1.
>>>>>
>>>>> Notes: For this solution, kref couldn't work because kernel ref
>>>>> maybe added from 0, but kref don't allow it.
>>>>>
>>>>> Signed-off-by: Guangming Cao <Guangming.Cao(a)mediatek.com>
>>>>> ---
>>>>> drivers/dma-buf/dma-buf.c | 23 +++++++++++++++++++----
>>>>> include/linux/dma-buf.h | 6 ++++--
>>>>> 2 files changed, 23 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-
>>>>> buf.c
>>>>> index 511fe0d217a0..04ee92aac8b9 100644
>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>> @@ -62,6 +62,7 @@ static void dma_buf_release(struct dentry
>>>>> *dentry)
>>>>> if (unlikely(!dmabuf))
>>>>> return;
>>>>>
>>>>> + WARN_ON(atomic64_read(&dmabuf->kernel_ref));
>>>>> BUG_ON(dmabuf->vmapping_counter);
>>>>>
>>>>> /*
>>>>> @@ -555,6 +556,7 @@ struct dma_buf *dma_buf_export(const struct
>>>>> dma_buf_export_info *exp_info)
>>>>> goto err_module;
>>>>> }
>>>>>
>>>>> + atomic64_set(&dmabuf->kernel_ref, 1);
>>>>> dmabuf->priv = exp_info->priv;
>>>>> dmabuf->ops = exp_info->ops;
>>>>> dmabuf->size = exp_info->size;
>>>>> @@ -617,6 +619,9 @@ int dma_buf_fd(struct dma_buf *dmabuf, int
>>>>> flags)
>>>>>
>>>>> fd_install(fd, dmabuf->file);
>>>>>
>>>>> + /* Add file cnt for each new fd */
>>>>> + get_file(dmabuf->file);
>>>>> +
>>>>> return fd;
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(dma_buf_fd);
>>>>> @@ -626,12 +631,13 @@ EXPORT_SYMBOL_GPL(dma_buf_fd);
>>>>> * @fd: [in] fd associated with the struct dma_buf to
>>>>> be
>>>>> returned
>>>>> *
>>>>> * On success, returns the struct dma_buf associated with an
>>>>> fd;
>>>>> uses
>>>>> - * file's refcounting done by fget to increase refcount.
>>>>> returns
>>>>> ERR_PTR
>>>>> - * otherwise.
>>>>> + * dmabuf's ref refcounting done by kref_get to increase
>>>>> refcount.
>>>>> + * Returns ERR_PTR otherwise.
>>>>> */
>>>>> struct dma_buf *dma_buf_get(int fd)
>>>>> {
>>>>> struct file *file;
>>>>> + struct dma_buf *dmabuf;
>>>>>
>>>>> file = fget(fd);
>>>>>
>>>>> @@ -643,7 +649,12 @@ struct dma_buf *dma_buf_get(int fd)
>>>>> return ERR_PTR(-EINVAL);
>>>>> }
>>>>>
>>>>> - return file->private_data;
>>>>> + dmabuf = file->private_data;
>>>>> + /* replace file count increase as ref increase for kernel
>>>>> user
>>>>> */
>>>>> + get_dma_buf(dmabuf);
>>>>> + fput(file);
>>>>> +
>>>>> + return dmabuf;
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(dma_buf_get);
>>>>>
>>>>> @@ -662,7 +673,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
>>>>> if (WARN_ON(!dmabuf || !dmabuf->file))
>>>>> return;
>>>>>
>>>>> - fput(dmabuf->file);
>>>>> + if (WARN_ON(!atomic64_read(&dmabuf->kernel_ref)))
>>>>> + return;
>>>>> +
>>>>> + if (!atomic64_dec_return(&dmabuf->kernel_ref))
>>>>> + fput(dmabuf->file);
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(dma_buf_put);
>>>>>
>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>>> index efdc56b9d95f..bc790cb028eb 100644
>>>>> --- a/include/linux/dma-buf.h
>>>>> +++ b/include/linux/dma-buf.h
>>>>> @@ -308,6 +308,7 @@ struct dma_buf_ops {
>>>>> struct dma_buf {
>>>>> size_t size;
>>>>> struct file *file;
>>>>> + atomic64_t kernel_ref;
>>>>> struct list_head attachments;
>>>>> const struct dma_buf_ops *ops;
>>>>> struct mutex lock;
>>>>> @@ -436,7 +437,7 @@ struct dma_buf_export_info {
>>>>> .owner = THIS_MODULE }
>>>>>
>>>>> /**
>>>>> - * get_dma_buf - convenience wrapper for get_file.
>>>>> + * get_dma_buf - increase a kernel ref of dma-buf
>>>>> * @dmabuf: [in] pointer to dma_buf
>>>>> *
>>>>> * Increments the reference count on the dma-buf, needed in
>>>>> case
>>>>> of drivers
>>>>> @@ -446,7 +447,8 @@ struct dma_buf_export_info {
>>>>> */
>>>>> static inline void get_dma_buf(struct dma_buf *dmabuf)
>>>>> {
>>>>> - get_file(dmabuf->file);
>>>>> + if (atomic64_inc_return(&dmabuf->kernel_ref) == 1)
>>>>> + get_file(dmabuf->file);
>>>>> }
>>>>>
>>>>> /**
>>
Am 14.07.21 um 09:11 schrieb guangming.cao(a)mediatek.com:
> From: Guangming Cao <Guangming.Cao(a)mediatek.com>
>
> Add a refcount for kernel to prevent UAF(Use After Free) issue.
Well NAK on so many levels.
>
> We can assume a case like below:
> 1. kernel space alloc dma_buf(file count = 1)
> 2. kernel use dma_buf to get fd(file count = 1)
> 3. userspace use fd to do mapping (file count = 2)
Creating an userspace mapping increases the reference count for the
underlying file object.
See the implementation of mmap_region():
...
vma->vm_file = get_file(file);
error = call_mmap(file, vma);
...
What can happen is the the underlying exporter redirects the mmap to a
different file, e.g. TTM or GEM drivers do that all the time.
But this is fine since then the VA mapping is independent of the DMA-buf.
> 4. kernel call dma_buf_put (file count = 1)
> 5. userpsace close buffer fd(file count = 0)
> 6. at this time, buffer is released, but va is valid!!
> So we still can read/write buffer via mmap va,
> it maybe cause memory leak, or kernel exception.
> And also, if we use "ls -ll" to watch corresponding process
> fd link info, it also will cause kernel exception.
>
> Another case:
> Using dma_buf_fd to generate more than 1 fd, because
> dma_buf_fd will not increase file count, thus, when close
> the second fd, it maybe occurs error.
Each opened fd will increase the reference count so this is certainly
not correct what you describe here.
Regards,
Christian.
>
> Solution:
> Add a kernel count for dma_buf, and make sure the file count
> of dma_buf.file hold by kernel is 1.
>
> Notes: For this solution, kref couldn't work because kernel ref
> maybe added from 0, but kref don't allow it.
>
> Signed-off-by: Guangming Cao <Guangming.Cao(a)mediatek.com>
> ---
> drivers/dma-buf/dma-buf.c | 23 +++++++++++++++++++----
> include/linux/dma-buf.h | 6 ++++--
> 2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 511fe0d217a0..04ee92aac8b9 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -62,6 +62,7 @@ static void dma_buf_release(struct dentry *dentry)
> if (unlikely(!dmabuf))
> return;
>
> + WARN_ON(atomic64_read(&dmabuf->kernel_ref));
> BUG_ON(dmabuf->vmapping_counter);
>
> /*
> @@ -555,6 +556,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> goto err_module;
> }
>
> + atomic64_set(&dmabuf->kernel_ref, 1);
> dmabuf->priv = exp_info->priv;
> dmabuf->ops = exp_info->ops;
> dmabuf->size = exp_info->size;
> @@ -617,6 +619,9 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags)
>
> fd_install(fd, dmabuf->file);
>
> + /* Add file cnt for each new fd */
> + get_file(dmabuf->file);
> +
> return fd;
> }
> EXPORT_SYMBOL_GPL(dma_buf_fd);
> @@ -626,12 +631,13 @@ EXPORT_SYMBOL_GPL(dma_buf_fd);
> * @fd: [in] fd associated with the struct dma_buf to be returned
> *
> * On success, returns the struct dma_buf associated with an fd; uses
> - * file's refcounting done by fget to increase refcount. returns ERR_PTR
> - * otherwise.
> + * dmabuf's ref refcounting done by kref_get to increase refcount.
> + * Returns ERR_PTR otherwise.
> */
> struct dma_buf *dma_buf_get(int fd)
> {
> struct file *file;
> + struct dma_buf *dmabuf;
>
> file = fget(fd);
>
> @@ -643,7 +649,12 @@ struct dma_buf *dma_buf_get(int fd)
> return ERR_PTR(-EINVAL);
> }
>
> - return file->private_data;
> + dmabuf = file->private_data;
> + /* replace file count increase as ref increase for kernel user */
> + get_dma_buf(dmabuf);
> + fput(file);
> +
> + return dmabuf;
> }
> EXPORT_SYMBOL_GPL(dma_buf_get);
>
> @@ -662,7 +673,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
> if (WARN_ON(!dmabuf || !dmabuf->file))
> return;
>
> - fput(dmabuf->file);
> + if (WARN_ON(!atomic64_read(&dmabuf->kernel_ref)))
> + return;
> +
> + if (!atomic64_dec_return(&dmabuf->kernel_ref))
> + fput(dmabuf->file);
> }
> EXPORT_SYMBOL_GPL(dma_buf_put);
>
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index efdc56b9d95f..bc790cb028eb 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -308,6 +308,7 @@ struct dma_buf_ops {
> struct dma_buf {
> size_t size;
> struct file *file;
> + atomic64_t kernel_ref;
> struct list_head attachments;
> const struct dma_buf_ops *ops;
> struct mutex lock;
> @@ -436,7 +437,7 @@ struct dma_buf_export_info {
> .owner = THIS_MODULE }
>
> /**
> - * get_dma_buf - convenience wrapper for get_file.
> + * get_dma_buf - increase a kernel ref of dma-buf
> * @dmabuf: [in] pointer to dma_buf
> *
> * Increments the reference count on the dma-buf, needed in case of drivers
> @@ -446,7 +447,8 @@ struct dma_buf_export_info {
> */
> static inline void get_dma_buf(struct dma_buf *dmabuf)
> {
> - get_file(dmabuf->file);
> + if (atomic64_inc_return(&dmabuf->kernel_ref) == 1)
> + get_file(dmabuf->file);
> }
>
> /**
tldr; DMA buffers aren't normal memory, expecting that you can use
them like that (like calling get_user_pages works, or that they're
accounting like any other normal memory) cannot be guaranteed.
Since some userspace only runs on integrated devices, where all
buffers are actually all resident system memory, there's a huge
temptation to assume that a struct page is always present and useable
like for any more pagecache backed mmap. This has the potential to
result in a uapi nightmare.
To stop this gap require that DMA buffer mmaps are VM_PFNMAP, which
blocks get_user_pages and all the other struct page based
infrastructure for everyone. In spirit this is the uapi counterpart to
the kernel-internal CONFIG_DMABUF_DEBUG.
Motivated by a recent patch which wanted to swich the system dma-buf
heap to vm_insert_page instead of vm_insert_pfn.
v2:
Jason brought up that we also want to guarantee that all ptes have the
pte_special flag set, to catch fast get_user_pages (on architectures
that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
>From auditing the various functions to insert pfn pte entires
(vm_insert_pfn_prot, remap_pfn_range and all it's callers like
dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
this should be the correct flag to check for.
References: https://lore.kernel.org/lkml/CAKMK7uHi+mG0z0HUmNt13QCCvutuRVjpcR0NjRL12k-Wb…
Acked-by: Christian König <christian.koenig(a)amd.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
--
Resending this so I can test the next two patches for vgem/shmem in
intel-gfx-ci. Last round failed somehow, but I can't repro that at all
locally here.
No immediate plans to merge this patch here since ttm isn't addressed
yet (and there we have the hugepte issue, for which I don't think we
have a clear consensus yet).
-Daniel
---
drivers/dma-buf/dma-buf.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 510b42771974..65cbd7f0f16a 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -130,6 +130,7 @@ static struct file_system_type dma_buf_fs_type = {
static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
{
struct dma_buf *dmabuf;
+ int ret;
if (!is_dma_buf_file(file))
return -EINVAL;
@@ -145,7 +146,11 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
dmabuf->size >> PAGE_SHIFT)
return -EINVAL;
- return dmabuf->ops->mmap(dmabuf, vma);
+ ret = dmabuf->ops->mmap(dmabuf, vma);
+
+ WARN_ON(!(vma->vm_flags & VM_PFNMAP));
+
+ return ret;
}
static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
@@ -1276,6 +1281,8 @@ EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
unsigned long pgoff)
{
+ int ret;
+
if (WARN_ON(!dmabuf || !vma))
return -EINVAL;
@@ -1296,7 +1303,11 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
vma_set_file(vma, dmabuf->file);
vma->vm_pgoff = pgoff;
- return dmabuf->ops->mmap(dmabuf, vma);
+ ret = dmabuf->ops->mmap(dmabuf, vma);
+
+ WARN_ON(!(vma->vm_flags & VM_PFNMAP));
+
+ return ret;
}
EXPORT_SYMBOL_GPL(dma_buf_mmap);
--
2.32.0
Specifically document the new/clarified rules around how the shared
fences do not have any ordering requirements against the exclusive
fence.
But also document all the things a bit better, given how central
struct dma_resv to dynamic buffer management the docs have been very
inadequat.
- Lots more links to other pieces of the puzzle. Unfortunately
ttm_buffer_object has no docs, so no links :-(
- Explain/complain a bit about dma_resv_locking_ctx(). I still don't
like that one, but fixing the ttm call chains is going to be
horrible. Plus we want to plug in real slowpath locking when we do
that anyway.
- Main part of the patch is some actual docs for struct dma_resv.
Overall I think we still have a lot of bad naming in this area (e.g.
dma_resv.fence is singular, but contains the multiple shared fences),
but I think that's more indicative of how the semantics and rules are
just not great.
Another thing that's real awkard is how chaining exclusive fences
right now means direct dma_resv.exclusive_fence pointer access with an
rcu_assign_pointer. Not so great either.
v2:
- Fix a pile of typos (Matt, Jason)
- Hammer it in that breaking the rules leads to use-after-free issues
around dma-buf sharing (Christian)
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Cc: Jason Ekstrand <jason(a)jlekstrand.net>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
drivers/dma-buf/dma-resv.c | 24 ++++++---
include/linux/dma-buf.h | 7 +++
include/linux/dma-resv.h | 104 +++++++++++++++++++++++++++++++++++--
3 files changed, 124 insertions(+), 11 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index e744fd87c63c..84fbe60629e3 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -48,6 +48,8 @@
* write operations) or N shared fences (read operations). The RCU
* mechanism is used to protect read access to fences from locked
* write-side updates.
+ *
+ * See struct dma_resv for more details.
*/
DEFINE_WD_CLASS(reservation_ww_class);
@@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
* @num_fences: number of fences we want to add
*
* Should be called before dma_resv_add_shared_fence(). Must
- * be called with obj->lock held.
+ * be called with @obj locked through dma_resv_lock().
+ *
+ * Note that the preallocated slots need to be re-reserved if @obj is unlocked
+ * at any time before calling dma_resv_add_shared_fence(). This is validated
+ * when CONFIG_DEBUG_MUTEXES is enabled.
*
* RETURNS
* Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
* @obj: the reservation object
* @fence: the shared fence to add
*
- * Add a fence to a shared slot, obj->lock must be held, and
+ * Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
* dma_resv_reserve_shared() has been called.
+ *
+ * See also &dma_resv.fence for a discussion of the semantics.
*/
void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
{
@@ -278,9 +286,11 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
/**
* dma_resv_add_excl_fence - Add an exclusive fence.
* @obj: the reservation object
- * @fence: the shared fence to add
+ * @fence: the exclusive fence to add
*
- * Add a fence to the exclusive slot. The obj->lock must be held.
+ * Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
+ * Note that this function replaces all fences attached to @obj, see also
+ * &dma_resv.fence_excl for a discussion of the semantics.
*/
void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
{
@@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
* fence
*
* Callers are not required to hold specific locks, but maybe hold
- * dma_resv_lock() already
+ * dma_resv_lock() already.
+ *
* RETURNS
- * true if all fences signaled, else false
+ *
+ * True if all fences signaled, else false.
*/
bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
{
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 2b814fde0d11..8cc0c55877a6 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -420,6 +420,13 @@ struct dma_buf {
* - Dynamic importers should set fences for any access that they can't
* disable immediately from their &dma_buf_attach_ops.move_notify
* callback.
+ *
+ * IMPORTANT:
+ *
+ * All drivers must obey the struct dma_resv rules, specifically the
+ * rules for updating fences, see &dma_resv.fence_excl and
+ * &dma_resv.fence. If these dependency rules are broken access tracking
+ * can be lost resulting in use after free issues.
*/
struct dma_resv *resv;
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index e1ca2080a1ff..9100dd3dc21f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -62,16 +62,90 @@ struct dma_resv_list {
/**
* struct dma_resv - a reservation object manages fences for a buffer
- * @lock: update side lock
- * @seq: sequence count for managing RCU read-side synchronization
- * @fence_excl: the exclusive fence, if there is one currently
- * @fence: list of current shared fences
+ *
+ * There are multiple uses for this, with sometimes slightly different rules in
+ * how the fence slots are used.
+ *
+ * One use is to synchronize cross-driver access to a struct dma_buf, either for
+ * dynamic buffer management or just to handle implicit synchronization between
+ * different users of the buffer in userspace. See &dma_buf.resv for a more
+ * in-depth discussion.
+ *
+ * The other major use is to manage access and locking within a driver in a
+ * buffer based memory manager. struct ttm_buffer_object is the canonical
+ * example here, since this is where reservation objects originated from. But
+ * use in drivers is spreading and some drivers also manage struct
+ * drm_gem_object with the same scheme.
*/
struct dma_resv {
+ /**
+ * @lock:
+ *
+ * Update side lock. Don't use directly, instead use the wrapper
+ * functions like dma_resv_lock() and dma_resv_unlock().
+ *
+ * Drivers which use the reservation object to manage memory dynamically
+ * also use this lock to protect buffer object state like placement,
+ * allocation policies or throughout command submission.
+ */
struct ww_mutex lock;
+
+ /**
+ * @seq:
+ *
+ * Sequence count for managing RCU read-side synchronization, allows
+ * read-only access to @fence_excl and @fence while ensuring we take a
+ * consistent snapshot.
+ */
seqcount_ww_mutex_t seq;
+ /**
+ * @fence_excl:
+ *
+ * The exclusive fence, if there is one currently.
+ *
+ * There are two ways to update this fence:
+ *
+ * - First by calling dma_resv_add_excl_fence(), which replaces all
+ * fences attached to the reservation object. To guarantee that no
+ * fences are lost, this new fence must signal only after all previous
+ * fences, both shared and exclusive, have signalled. In some cases it
+ * is convenient to achieve that by attaching a struct dma_fence_array
+ * with all the new and old fences.
+ *
+ * - Alternatively the fence can be set directly, which leaves the
+ * shared fences unchanged. To guarantee that no fences are lost, this
+ * new fence must signal only after the previous exclusive fence has
+ * signalled. Since the shared fences are staying intact, it is not
+ * necessary to maintain any ordering against those. If semantically
+ * only a new access is added without actually treating the previous
+ * one as a dependency the exclusive fences can be strung together
+ * using struct dma_fence_chain.
+ *
+ * Note that actual semantics of what an exclusive or shared fence mean
+ * is defined by the user, for reservation objects shared across drivers
+ * see &dma_buf.resv.
+ */
struct dma_fence __rcu *fence_excl;
+
+ /**
+ * @fence:
+ *
+ * List of current shared fences.
+ *
+ * There are no ordering constraints of shared fences against the
+ * exclusive fence slot. If a waiter needs to wait for all access, it
+ * has to wait for both sets of fences to signal.
+ *
+ * A new fence is added by calling dma_resv_add_shared_fence(). Since
+ * this often needs to be done past the point of no return in command
+ * submission it cannot fail, and therefore sufficient slots need to be
+ * reserved by calling dma_resv_reserve_shared().
+ *
+ * Note that actual semantics of what an exclusive or shared fence mean
+ * is defined by the user, for reservation objects shared across drivers
+ * see &dma_buf.resv.
+ */
struct dma_resv_list __rcu *fence;
};
@@ -98,6 +172,13 @@ static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {}
* undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
* is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
* object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow() called on @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
+ *
+ * See also dma_resv_lock_interruptible() for the interruptible variant.
*/
static inline int dma_resv_lock(struct dma_resv *obj,
struct ww_acquire_ctx *ctx)
@@ -119,6 +200,12 @@ static inline int dma_resv_lock(struct dma_resv *obj,
* undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
* is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
* object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on
+ * @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
*/
static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
struct ww_acquire_ctx *ctx)
@@ -134,6 +221,8 @@ static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
* Acquires the reservation object after a die case. This function
* will sleep until the lock becomes available. See dma_resv_lock() as
* well.
+ *
+ * See also dma_resv_lock_slow_interruptible() for the interruptible variant.
*/
static inline void dma_resv_lock_slow(struct dma_resv *obj,
struct ww_acquire_ctx *ctx)
@@ -167,7 +256,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj,
* if they overlap with a writer.
*
* Also note that since no context is provided, no deadlock protection is
- * possible.
+ * possible, which is also not needed for a trylock.
*
* Returns true if the lock was acquired, false otherwise.
*/
@@ -193,6 +282,11 @@ static inline bool dma_resv_is_locked(struct dma_resv *obj)
*
* Returns the context used to lock a reservation object or NULL if no context
* was used or the object is not locked at all.
+ *
+ * WARNING: This interface is pretty horrible, but TTM needs it because it
+ * doesn't pass the struct ww_acquire_ctx around in some very long callchains.
+ * Everyone else just uses it to check whether they're holding a reservation or
+ * not.
*/
static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)
{
--
2.32.0