Am 15.04.24 um 12:35 schrieb zhiguojiang:
> 在 2024/4/12 14:39, Christian König 写道:
>> [Some people who received this message don't often get email from
>> christian.koenig(a)amd.com. Learn why this is important at
>> https://aka.ms/LearnAboutSenderIdentification ]
>>
>> Am 12.04.24 um 08:19 schrieb zhiguojiang:
>>> [SNIP]
>>> -> Here task 2220 do epoll again where internally it will get/put then
>>> start to free twice and lead to final crash.
>>>
>>> Here is the basic flow:
>>>
>>> 1. Thread A install the dma_buf_fd via dma_buf_export, the fd refcount
>>> is 1
>>>
>>> 2. Thread A add the fd to epoll list via epoll_ctl(EPOLL_CTL_ADD)
>>>
>>> 3. After use the dma buf, Thread A close the fd, then here fd refcount
>>> is 0,
>>> and it will run __fput finally to release the file
>>
>> Stop, that isn't correct.
>>
>> The fs layer which calls dma_buf_poll() should make sure that the file
>> can't go away until the function returns.
>>
>> Then inside dma_buf_poll() we add another reference to the file while
>> installing the callback:
>>
>> /* Paired with fput in dma_buf_poll_cb */
>> get_file(dmabuf->file);
> Hi,
>
> The problem may just occurred here.
>
> Is it possible file reference count already decreased to 0 and fput
> already being in progressing just before calling
> get_file(dmabuf->file) in dma_buf_poll()?
No, exactly that isn't possible.
If a function gets a dma_buf pointer or even more general any reference
counted pointer which has already decreased to 0 then that is a major
bug in the caller of that function.
BTW: It's completely illegal to work around such issues by using
file_count() or RCU functions. So when you suggest stuff like that it
will immediately face rejection.
Regards,
Christian.
>
>>
>> This reference is only dropped after the callback is completed in
>> dma_buf_poll_cb():
>>
>> /* Paired with get_file in dma_buf_poll */
>> fput(dmabuf->file);
>>
>> So your explanation for the issue just seems to be incorrect.
>>
>>>
>>> 4. Here Thread A not do epoll_ctl(EPOLL_CTL_DEL) manunally, so it
>>> still resides in one epoll_list.
>>> Although __fput will call eventpoll_release to remove the file from
>>> binded epoll list,
>>> but it has small time window where Thread B jumps in.
>>
>> Well if that is really the case then that would then be a bug in
>> epoll_ctl().
>>
>>>
>>> 5. During the small window, Thread B do the poll action for
>>> dma_buf_fd, where it will fget/fput for the file,
>>> this means the fd refcount will be 0 -> 1 -> 0, and it will call
>>> __fput again.
>>> This will lead to __fput twice for the same file.
>>>
>>> 6. So the potenial fix is use get_file_rcu which to check if file
>>> refcount already zero which means under free.
>>> If so, we just return and no need to do the dma_buf_poll.
>>
>> Well to say it bluntly as far as I can see this suggestion is completely
>> nonsense.
>>
>> When the reference to the file goes away while dma_buf_poll() is
>> executed then that's a massive bug in the caller of that function.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Here is the race condition:
>>>
>>> Thread A Thread B
>>> dma_buf_export
>>> fd_refcount is 1
>>> epoll_ctl(EPOLL_ADD)
>>> add dma_buf_fd to epoll list
>>> close(dma_buf_fd)
>>> fd_refcount is 0
>>> __fput
>>> dma_buf_poll
>>> fget
>>> fput
>>> fd_refcount is zero again
>>>
>>> Thanks
>>>
>>
>
On 10/04/2024 11:29, Alexandre Mergnat wrote:
>
>
> On 09/04/2024 17:46, Krzysztof Kozlowski wrote:
>>> + soc {
>>> + #address-cells = <2>;
>>> + #size-cells = <2>;
>>> +
>>> + afe@11220000 {
>> Did you implement the comment or decided to keep afe?
>>
>
> Though it was clear according to [1]:
> "
> Audio Front End, this is the same name used for other MTK SoC, to be
> consistent.
>
> Cook a new patch serie to change "afe" by "audio-controller" for all MTK
> SoC would be great.
> "
>
> I want to keep it and fix it later with ALL other MTK SoC.
> You didn't answer after that, I though it was ok for you...
Then no, I don't agree. If you add code, which you already plan to fix,
it means the code is not correct somehow. Then just add correct code in
the beginning.
Best regards,
Krzysztof
Am 12.04.24 um 08:19 schrieb zhiguojiang:
> [SNIP]
> -> Here task 2220 do epoll again where internally it will get/put then
> start to free twice and lead to final crash.
>
> Here is the basic flow:
>
> 1. Thread A install the dma_buf_fd via dma_buf_export, the fd refcount
> is 1
>
> 2. Thread A add the fd to epoll list via epoll_ctl(EPOLL_CTL_ADD)
>
> 3. After use the dma buf, Thread A close the fd, then here fd refcount
> is 0,
> and it will run __fput finally to release the file
Stop, that isn't correct.
The fs layer which calls dma_buf_poll() should make sure that the file
can't go away until the function returns.
Then inside dma_buf_poll() we add another reference to the file while
installing the callback:
/* Paired with fput in dma_buf_poll_cb */
get_file(dmabuf->file);
This reference is only dropped after the callback is completed in
dma_buf_poll_cb():
/* Paired with get_file in dma_buf_poll */
fput(dmabuf->file);
So your explanation for the issue just seems to be incorrect.
>
> 4. Here Thread A not do epoll_ctl(EPOLL_CTL_DEL) manunally, so it
> still resides in one epoll_list.
> Although __fput will call eventpoll_release to remove the file from
> binded epoll list,
> but it has small time window where Thread B jumps in.
Well if that is really the case then that would then be a bug in
epoll_ctl().
>
> 5. During the small window, Thread B do the poll action for
> dma_buf_fd, where it will fget/fput for the file,
> this means the fd refcount will be 0 -> 1 -> 0, and it will call
> __fput again.
> This will lead to __fput twice for the same file.
>
> 6. So the potenial fix is use get_file_rcu which to check if file
> refcount already zero which means under free.
> If so, we just return and no need to do the dma_buf_poll.
Well to say it bluntly as far as I can see this suggestion is completely
nonsense.
When the reference to the file goes away while dma_buf_poll() is
executed then that's a massive bug in the caller of that function.
Regards,
Christian.
>
> Here is the race condition:
>
> Thread A Thread B
> dma_buf_export
> fd_refcount is 1
> epoll_ctl(EPOLL_ADD)
> add dma_buf_fd to epoll list
> close(dma_buf_fd)
> fd_refcount is 0
> __fput
> dma_buf_poll
> fget
> fput
> fd_refcount is zero again
>
> Thanks
>
On Thu, Apr 11, 2024 at 1:21 AM Rong Qianfeng <11065417(a)vivo.com> wrote:
>
>
> 在 2024/4/10 0:37, T.J. Mercier 写道:
> > [You don't often get email from tjmercier(a)google.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065417(a)vivo.com> wrote:
> >>
> >> 在 2024/4/8 15:58, Christian König 写道:
> >>> Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
> >>>> [SNIP]
> >>>>> Am 13.11.21 um 07:22 schrieb Jianqun Xu:
> >>>>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >>>>>> offset and len.
> >>>>> You have not given an use case for this so it is a bit hard to
> >>>>> review. And from the existing use cases I don't see why this should
> >>>>> be necessary.
> >>>>>
> >>>>> Even worse from the existing backend implementation I don't even see
> >>>>> how drivers should be able to fulfill this semantics.
> >>>>>
> >>>>> Please explain further,
> >>>>> Christian.
> >>>> Here is a practical case:
> >>>> The user space can allocate a large chunk of dma-buf for
> >>>> self-management, used as a shared memory pool.
> >>>> Small dma-buf can be allocated from this shared memory pool and
> >>>> released back to it after use, thus improving the speed of dma-buf
> >>>> allocation and release.
> >>>> Additionally, custom functionalities such as memory statistics and
> >>>> boundary checking can be implemented in the user space.
> >>>> Of course, the above-mentioned functionalities require the
> >>>> implementation of a partial cache sync interface.
> >>> Well that is obvious, but where is the code doing that?
> >>>
> >>> You can't send out code without an actual user of it. That will
> >>> obviously be rejected.
> >>>
> >>> Regards,
> >>> Christian.
> >> In fact, we have already used the user-level dma-buf memory pool in the
> >> camera shooting scenario on the phone.
> >>
> >> From the test results, The execution time of the photo shooting
> >> algorithm has been reduced from 3.8s to 3s.
> >>
> > For phones, the (out of tree) Android version of the system heap has a
> > page pool connected to a shrinker. That allows you to skip page
> > allocation without fully pinning the memory like you get when
> > allocating a dma-buf that's way larger than necessary. If it's for a
> > camera maybe you need physically contiguous memory, but it's also
> > possible to set that up.
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/d…
> >
> Thank you for the reminder.
>
> The page pool of the system heap can meet the needs of most scenarios,
> but the camera shooting scenario is quite special.
>
> Why the camera shooting scenario needs to have a dma-buf memory pool in
> the user-level?
>
> (1) The memory demand is extremely high and time requirements are
> stringent.
Preallocating for this makes sense to me, whether it is individual
buffers or one large one.
> (2) The memory in the page pool(system heap) is easily reclaimed or used
> by other apps.
Yeah, by design I guess. I have seen an implementation where the page
pool is proactively refilled after it has been empty for some time
which would help in scenarios where the pool is often reclaimed and
low/empty. You could add that in a vendor heap.
> (3) High concurrent allocation and release (with deferred-free) lead to
> high memory usage peaks.
Hopefully this is not every frame? I saw enough complaints about the
deferred free of pool pages that it hasn't been carried forward since
Android 13, so this should be less of a problem on recent kernels.
> Additionally, after the camera exits, the shared memory pool can be
> released, with minimal impact.
Why do you care about the difference here? In one case it's when the
buffer refcount goes to 0 and memory is freed immediately, and in the
other it's the next time reclaim runs the shrinker.
I don't want to add UAPI for DMA_BUF_IOCTL_SYNC_PARTIAL to Android
without it being in the upstream kernel. I don't think we can get that
without an in-kernel user of dma_buf_begin_cpu_access_partial first,
even though your use case wouldn't rely on that in-kernel usage. :\ So
if you want to add this to phones for your camera app, then I think
your best option is to add a vendor driver which implements this IOCTL
and calls the dma_buf_begin_cpu_access_partial functions which are
already exported.
Best,
T.J.
On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065417(a)vivo.com> wrote:
>
>
> 在 2024/4/8 15:58, Christian König 写道:
> > Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
> >> [SNIP]
> >>> Am 13.11.21 um 07:22 schrieb Jianqun Xu:
> >>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >>>> offset and len.
> >>>
> >>> You have not given an use case for this so it is a bit hard to
> >>> review. And from the existing use cases I don't see why this should
> >>> be necessary.
> >>>
> >>> Even worse from the existing backend implementation I don't even see
> >>> how drivers should be able to fulfill this semantics.
> >>>
> >>> Please explain further,
> >>> Christian.
> >> Here is a practical case:
> >> The user space can allocate a large chunk of dma-buf for
> >> self-management, used as a shared memory pool.
> >> Small dma-buf can be allocated from this shared memory pool and
> >> released back to it after use, thus improving the speed of dma-buf
> >> allocation and release.
> >> Additionally, custom functionalities such as memory statistics and
> >> boundary checking can be implemented in the user space.
> >> Of course, the above-mentioned functionalities require the
> >> implementation of a partial cache sync interface.
> >
> > Well that is obvious, but where is the code doing that?
> >
> > You can't send out code without an actual user of it. That will
> > obviously be rejected.
> >
> > Regards,
> > Christian.
>
> In fact, we have already used the user-level dma-buf memory pool in the
> camera shooting scenario on the phone.
>
> From the test results, The execution time of the photo shooting
> algorithm has been reduced from 3.8s to 3s.
>
For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker. That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.
https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/d…
> To be honest, I didn't understand your concern "...how drivers should be
> able to fulfill this semantics." Can you please help explain it in more
> detail?
>
> Thanks,
>
> Rong Qianfeng.
>
> >
> >>
> >> Thanks
> >> Rong Qianfeng.
> >
>
On Wed, 10 Apr 2024 16:02:00 +0200 Julien Panis wrote:
> > You shouldn't build the skb upfront any more. Give the page to the HW,
> > once HW sends you a completion - build the skbs. If build fails
> > (allocation failure) just give the page back to HW. If it succeeds,
> > however, you'll get a skb which is far more likely to be cache hot.
>
> Not sure I get this point.
>
> "Give the page to the HW" = Do you mean that I should dma_map_single()
> a full page (|PAGE_SIZE|) in am65_cpsw_nuss_rx_push() ?
Yes, I think so. I think that's what you effectively do now anyway,
you just limit the len and wrap it in an skb. But
am65_cpsw_nuss_rx_push() will effectively get that page back from
skb->data and map it.
On Wed, 10 Apr 2024 10:36:16 +0200 Julien Panis wrote:
> Also, about mem alloc failures, shouldn't we free 'pool' on kstrdup_const()
> error at the beginning of k3_cppi_desc_pool_create_name() ?
> I mean, it's not visible in my patch but I now wonder if this was done
> properly even before I modify the file.
Yes, it uses managed memory (devm_*) but prueth (I didn't check other
callers) calls it from .ndo_open. So while not technically a full leak
we can accumulate infinite memory by repeatedly failing here :S
On Mon, 08 Apr 2024 11:38:04 +0200 Julien Panis wrote:
> +static struct sk_buff *am65_cpsw_alloc_skb(struct am65_cpsw_rx_chn *rx_chn,
> + struct net_device *ndev,
> + unsigned int len,
> + int desc_idx,
> + bool allow_direct)
> +{
> + struct sk_buff *skb;
> + struct page *page;
> +
> + page = page_pool_dev_alloc_pages(rx_chn->page_pool);
> + if (unlikely(!page))
> + return NULL;
> +
> + len += AM65_CPSW_HEADROOM;
> +
> + skb = build_skb(page_address(page), len);
You shouldn't build the skb upfront any more. Give the page to the HW,
once HW sends you a completion - build the skbs. If build fails
(allocation failure) just give the page back to HW. If it succeeds,
however, you'll get a skb which is far more likely to be cache hot.
> + if (unlikely(!skb)) {
> + page_pool_put_full_page(rx_chn->page_pool, page, allow_direct);
> + rx_chn->pages[desc_idx] = NULL;
> + return NULL;
> + }
On Mon, 08 Apr 2024 11:38:03 +0200 Julien Panis wrote:
> goto gen_pool_create_fail;
> }
>
> + pool->desc_infos = kcalloc(pool->num_desc,
> + sizeof(*pool->desc_infos), GFP_KERNEL);
> + if (!pool->desc_infos) {
> + ret = -ENOMEM;
> + dev_err(pool->dev,
> + "pool descriptor infos alloc failed %d\n", ret);
Please don't add errors on mem alloc failures. They just bloat the
kernel, there will be a rather large OOM splat in the logs if GFP_KERNEL
allocation fails.
> + kfree_const(pool_name);
> + goto gen_pool_desc_infos_alloc_fail;
> + }
> +
> pool->gen_pool->name = pool_name;
If you add the new allocation after this line, I think you wouldn't
have to free pool_name under the if () explicitly.
--
pw-bot: cr
On 09/04/2024 15:42, Alexandre Mergnat wrote:
> Add the audio codec sub-device. This sub-device is used to set required
> and optional voltage properties between the codec and the board.
>
> Signed-off-by: Alexandre Mergnat <amergnat(a)baylibre.com>
> ---
> Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml b/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml
> index 37423c2e0fdf..7c6a4a587b5f 100644
> --- a/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml
> +++ b/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml
> @@ -37,6 +37,11 @@ properties:
> "#interrupt-cells":
> const: 2
>
> + codec:
> + type: object
> + $ref: /schemas/sound/mt6357.yaml
> + unevaluatedProperties: false
Just put the properties here, without the codec node.
Also, your example is now incomplete.
Best regards,
Krzysztof
On 09/04/2024 15:42, Alexandre Mergnat wrote:
> Add soundcard bindings for the MT8365 SoC with the MT6357 audio codec.
>
> Signed-off-by: Alexandre Mergnat <amergnat(a)baylibre.com>
> +patternProperties:
> + "^dai-link-[0-9]+$":
> + type: object
> + description:
> + Container for dai-link level properties and CODEC sub-nodes.
> +
> + properties:
> + codec:
> + type: object
> + description: Holds subnode which indicates codec dai.
> +
> + properties:
> + sound-dai:
> + maxItems: 1
> + description: phandle of the codec DAI
> +
> + additionalProperties: false
> +
> + link-name:
> + description:
> + This property corresponds to the name of the BE dai-link to which
> + we are going to update parameters in this node.
> + items:
> + const: 2ND_I2S_BE
What is the type of link-name? Why is it fixed? How can you have here
multiple dai links if all of them must have the same name?
Best regards,
Krzysztof
Am 09.04.24 um 09:32 schrieb Rong Qianfeng:
>
> 在 2024/4/8 15:58, Christian König 写道:
>> Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
>>> [SNIP]
>>>> Am 13.11.21 um 07:22 schrieb Jianqun Xu:
>>>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
>>>>> offset and len.
>>>>
>>>> You have not given an use case for this so it is a bit hard to
>>>> review. And from the existing use cases I don't see why this should
>>>> be necessary.
>>>>
>>>> Even worse from the existing backend implementation I don't even
>>>> see how drivers should be able to fulfill this semantics.
>>>>
>>>> Please explain further,
>>>> Christian.
>>> Here is a practical case:
>>> The user space can allocate a large chunk of dma-buf for
>>> self-management, used as a shared memory pool.
>>> Small dma-buf can be allocated from this shared memory pool and
>>> released back to it after use, thus improving the speed of dma-buf
>>> allocation and release.
>>> Additionally, custom functionalities such as memory statistics and
>>> boundary checking can be implemented in the user space.
>>> Of course, the above-mentioned functionalities require the
>>> implementation of a partial cache sync interface.
>>
>> Well that is obvious, but where is the code doing that?
>>
>> You can't send out code without an actual user of it. That will
>> obviously be rejected.
>>
>> Regards,
>> Christian.
>
> In fact, we have already used the user-level dma-buf memory pool in
> the camera shooting scenario on the phone.
>
> From the test results, The execution time of the photo shooting
> algorithm has been reduced from 3.8s to 3s.
>
> To be honest, I didn't understand your concern "...how drivers should
> be able to fulfill this semantics." Can you please help explain it in
> more detail?
Well you don't give any upstream driver code which actually uses this
interface.
If you want to suggest some changes to the core Linux kernel your driver
actually needs to be upstream.
As long as that isn't the case this approach here is a completely no-go.
Regards,
Christian.
>
> Thanks,
>
> Rong Qianfeng.
>
>>
>>>
>>> Thanks
>>> Rong Qianfeng.
>>
On Tue, 09 Apr 2024 12:13:25 +0200, Alexandre Mergnat wrote:
> Add MT8365 audio codec bindings to set required
> and optional voltage properties between the codec and the board.
> The properties are:
> - phandle of the requiered power supply.
> - Setup of microphone bias voltage.
> - Setup of the speaker pin pull-down.
>
> Signed-off-by: Alexandre Mergnat <amergnat(a)baylibre.com>
> ---
> .../devicetree/bindings/sound/mt6357.yaml | 54 ++++++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/sound/mt6357.yaml: properties:vaud28-supply: '$ref' is not one of ['description', 'deprecated']
from schema $id: http://devicetree.org/meta-schemas/core.yaml#
doc reference errors (make refcheckdocs):
See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20240226-aud…
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
On Tue, 09 Apr 2024 12:13:24 +0200, Alexandre Mergnat wrote:
> Add the audio codec sub-device. This sub-device is used to set required
> and optional voltage properties between the codec and the board.
>
> Signed-off-by: Alexandre Mergnat <amergnat(a)baylibre.com>
> ---
> Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml | 5 +++++
> 1 file changed, 5 insertions(+)
>
My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml:
Error in referenced schema matching $id: http://devicetree.org/schemas/sound/mt6357.yaml
doc reference errors (make refcheckdocs):
See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20240226-aud…
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
> [SNIP]
>> Am 13.11.21 um 07:22 schrieb Jianqun Xu:
>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
>>> offset and len.
>>
>> You have not given an use case for this so it is a bit hard to
>> review. And from the existing use cases I don't see why this should
>> be necessary.
>>
>> Even worse from the existing backend implementation I don't even see
>> how drivers should be able to fulfill this semantics.
>>
>> Please explain further,
>> Christian.
> Here is a practical case:
> The user space can allocate a large chunk of dma-buf for
> self-management, used as a shared memory pool.
> Small dma-buf can be allocated from this shared memory pool and
> released back to it after use, thus improving the speed of dma-buf
> allocation and release.
> Additionally, custom functionalities such as memory statistics and
> boundary checking can be implemented in the user space.
> Of course, the above-mentioned functionalities require the
> implementation of a partial cache sync interface.
Well that is obvious, but where is the code doing that?
You can't send out code without an actual user of it. That will
obviously be rejected.
Regards,
Christian.
>
> Thanks
> Rong Qianfeng.
Am 02.04.24 um 08:49 schrieb zhiguojiang:
>> As far as I can see that's not because of the DMA-buf code, but
>> because you are somehow using this interface incorrectly.
>>
>> When dma_buf_poll() is called it is mandatory for the caller to hold
>> a reference to the file descriptor on which the poll operation is
>> executed.
>>
>> So adding code like "if (!file_count(file))" in the beginning of
>> dma_buf_poll() is certainly broken.
>>
>> My best guess is that you have some unbalanced
>> dma_buf_get()/dma_buf_put() somewhere instead.
>>
>>
> Hi Christian,
>
> The kernel dma_buf_poll() code shound not cause system crashes due to
> the user mode usage logical issues ?
What user mode logical issues are you talking about? Closing a file
while polling on it is perfectly valid.
dma_buf_poll() is called by the filesystem layer and it's the filesystem
layer which should make sure that a file can't be closed while polling
for an event.
If that doesn't work then you have stumbled over a massive bug in the fs
layer. And I have some doubts that this is actually the case.
What is more likely is that some driver messes up the reference count
and because of this you see an UAF.
Anyway as far as I can see the DMA-buf code is correct regarding this.
Regards,
Christian.
>
> Thanks
>
>
> 在 2024/4/1 20:22, Christian König 写道:
>> Am 27.03.24 um 03:29 schrieb Zhiguo Jiang:
>>> The issue is a UAF issue of dmabuf file fd. Throght debugging, we found
>>> that the dmabuf file fd is added to the epoll event listener list, and
>>> when it is released, it is not removed from the epoll list, which leads
>>> to the UAF(Use-After-Free) issue.
>>
>> As far as I can see that's not because of the DMA-buf code, but
>> because you are somehow using this interface incorrectly.
>>
>> When dma_buf_poll() is called it is mandatory for the caller to hold
>> a reference to the file descriptor on which the poll operation is
>> executed.
>>
>> So adding code like "if (!file_count(file))" in the beginning of
>> dma_buf_poll() is certainly broken.
>>
>> My best guess is that you have some unbalanced
>> dma_buf_get()/dma_buf_put() somewhere instead.
>>
>> Regards,
>> Christian.
>>
>>>
>>> The UAF issue can be solved by checking dmabuf file->f_count value and
>>> skipping the poll operation for the closed dmabuf file in the
>>> dma_buf_poll(). We have tested this solved patch multiple times and
>>> have not reproduced the uaf issue.
>>>
>>> crash dump:
>>> list_del corruption, ffffff8a6f143a90->next is LIST_POISON1
>>> (dead000000000100)
>>> ------------[ cut here ]------------
>>> kernel BUG at lib/list_debug.c:55!
>>> Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>> pc : __list_del_entry_valid+0x98/0xd4
>>> lr : __list_del_entry_valid+0x98/0xd4
>>> sp : ffffffc01d413d00
>>> x29: ffffffc01d413d00 x28: 00000000000000c0 x27: 0000000000000020
>>> x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000080007
>>> x23: ffffff8b22e5dcc0 x22: ffffff88a6be12d0 x21: ffffff8b22e572b0
>>> x20: ffffff80254ed0a0 x19: ffffff8a6f143a00 x18: ffffffda5efed3c0
>>> x17: 6165642820314e4f x16: 53494f505f545349 x15: 4c20736920747865
>>> x14: 6e3e2d3039613334 x13: 2930303130303030 x12: 0000000000000018
>>> x11: ffffff8b6c188000 x10: 00000000ffffffff x9 : 6c8413a194897b00
>>> x8 : 6c8413a194897b00 x7 : 74707572726f6320 x6 : 6c65645f7473696c
>>> x5 : ffffff8b6c3b2a3e x4 : ffffff8b6c3b2a40 x3 : ffff103000001005
>>> x2 : 0000000000000001 x1 : 00000000000000c0 x0 : 000000000000004e
>>> Call trace:
>>> __list_del_entry_valid+0x98/0xd4
>>> dma_buf_file_release+0x48/0x90
>>> __fput+0xf4/0x280
>>> ____fput+0x10/0x20
>>> task_work_run+0xcc/0xf4
>>> do_notify_resume+0x2a0/0x33c
>>> el0_svc+0x5c/0xa4
>>> el0t_64_sync_handler+0x68/0xb4
>>> el0t_64_sync+0x1a0/0x1a4
>>> Code: d0006fe0 912c5000 f2fbd5a2 94231a01 (d4210000)
>>> ---[ end trace 0000000000000000 ]---
>>> Kernel panic - not syncing: Oops - BUG: Fatal exception
>>> SMP: stopping secondary CPUs
>>>
>>> Signed-off-by: Zhiguo Jiang <justinjiang(a)vivo.com>
>>> ---
>>> drivers/dma-buf/dma-buf.c | 28 ++++++++++++++++++++++++----
>>> 1 file changed, 24 insertions(+), 4 deletions(-)
>>> mode change 100644 => 100755 drivers/dma-buf/dma-buf.c
>>>
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index 8fe5aa67b167..e469dd9288cc
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -240,6 +240,10 @@ static __poll_t dma_buf_poll(struct file *file,
>>> poll_table *poll)
>>> struct dma_resv *resv;
>>> __poll_t events;
>>> + /* Check if the file exists */
>>> + if (!file_count(file))
>>> + return EPOLLERR;
>>> +
>>> dmabuf = file->private_data;
>>> if (!dmabuf || !dmabuf->resv)
>>> return EPOLLERR;
>>> @@ -266,8 +270,15 @@ static __poll_t dma_buf_poll(struct file *file,
>>> poll_table *poll)
>>> spin_unlock_irq(&dmabuf->poll.lock);
>>> if (events & EPOLLOUT) {
>>> - /* Paired with fput in dma_buf_poll_cb */
>>> - get_file(dmabuf->file);
>>> + /*
>>> + * Paired with fput in dma_buf_poll_cb,
>>> + * Skip poll for the closed file.
>>> + */
>>> + if (!get_file_rcu(&dmabuf->file)) {
>>> + events &= ~EPOLLOUT;
>>> + dcb->active = 0;
>>> + goto clear_out_event;
>>> + }
>>> if (!dma_buf_poll_add_cb(resv, true, dcb))
>>> /* No callback queued, wake up any other waiters */
>>> @@ -277,6 +288,7 @@ static __poll_t dma_buf_poll(struct file *file,
>>> poll_table *poll)
>>> }
>>> }
>>> +clear_out_event:
>>> if (events & EPOLLIN) {
>>> struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_in;
>>> @@ -289,8 +301,15 @@ static __poll_t dma_buf_poll(struct file
>>> *file, poll_table *poll)
>>> spin_unlock_irq(&dmabuf->poll.lock);
>>> if (events & EPOLLIN) {
>>> - /* Paired with fput in dma_buf_poll_cb */
>>> - get_file(dmabuf->file);
>>> + /*
>>> + * Paired with fput in dma_buf_poll_cb,
>>> + * Skip poll for the closed file.
>>> + */
>>> + if (!get_file_rcu(&dmabuf->file)) {
>>> + events &= ~EPOLLIN;
>>> + dcb->active = 0;
>>> + goto clear_in_event;
>>> + }
>>> if (!dma_buf_poll_add_cb(resv, false, dcb))
>>> /* No callback queued, wake up any other waiters */
>>> @@ -300,6 +319,7 @@ static __poll_t dma_buf_poll(struct file *file,
>>> poll_table *poll)
>>> }
>>> }
>>> +clear_in_event:
>>> dma_resv_unlock(resv);
>>> return events;
>>> }
>>
>
Am 01.04.24 um 14:39 schrieb Tvrtko Ursulin:
>
> On 29/03/2024 00:00, T.J. Mercier wrote:
>> On Thu, Mar 28, 2024 at 7:53 AM Tvrtko Ursulin <tursulin(a)igalia.com>
>> wrote:
>>>
>>> From: Tvrtko Ursulin <tursulin(a)ursulin.net>
>>>
>>> There is no point in compiling in the list and mutex operations
>>> which are
>>> only used from the dma-buf debugfs code, if debugfs is not compiled in.
>>>
>>> Put the code in questions behind some kconfig guards and so save
>>> some text
>>> and maybe even a pointer per object at runtime when not enabled.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tursulin(a)ursulin.net>
>>
>> Reviewed-by: T.J. Mercier <tjmercier(a)google.com>
>
> Thanks!
>
> How would patches to dma-buf be typically landed? Via what tree I
> mean? drm-misc-next?
That should go through drm-misc-next.
And feel free to add Reviewed-by: Christian König
<christian.koenig(a)amd.com> as well.
Regards,
Christian.
>
> Regards,
>
> Tvrtko