Linaro-mm-sig October 2025

linaro-mm-sig@lists.linaro.org

49 participants
57 discussions

Re: [PATCH 0/2] optimization of dma-buf system_heap allocation

by Matthew Wilcox

On Tue, Oct 14, 2025 at 04:32:28PM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
>
> This series of patches would like to introduce alloc_pages_bulk_list in
> dma-buf which need to call back the API for page allocation.

Start with the problem you're trying to solve.

8 months, 2 weeks

Re: [PATCH 2/2] driver: dma-buf: use alloc_pages_bulk_list for order-0 allocation

by Christian König

On 14.10.25 14:44, Zhaoyang Huang wrote:
> On Tue, Oct 14, 2025 at 7:59 PM Christian König
> <christian.koenig(a)amd.com> wrote:
>>
>> On 14.10.25 10:32, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
>>>
>>> The size of once dma-buf allocation could be dozens MB or much more
>>> which introduce a loop of allocating several thousands of order-0 pages.
>>> Furthermore, the concurrent allocation could have dma-buf allocation enter
>>> direct-reclaim during the loop. This commit would like to eliminate the
>>> above two affections by introducing alloc_pages_bulk_list in dma-buf's
>>> order-0 allocation. This patch is proved to be conditionally helpful
>>> in 18MB allocation as decreasing the time from 24604us to 6555us and no
>>> harm when bulk allocation can't be done(fallback to single page
>>> allocation)
>>
>> Well that sounds like an absolutely horrible idea.
>>
>> See the handling of allocating only from specific order is *exactly* there to avoid the behavior of bulk allocation.
>>
>> What you seem to do with this patch here is to add on top of the behavior to avoid allocating large chunks from the buddy the behavior to allocate large chunks from the buddy because that is faster.
> emm, this patch doesn't change order-8 and order-4's allocation
> behaviour but just to replace the loop of order-0 allocations into
> once bulk allocation in the fallback way. What is your concern about
> this?

As far as I know the bulk allocation favors splitting large pages into
smaller ones instead of allocating smaller pages first. That's where the
performance benefit comes from.

But that is exactly what we try to avoid here by allocating only certain
order of pages.

Regards,
Christian.

>>
>> So this change here doesn't looks like it will fly very high. Please explain what you're actually trying to do, just optimize allocation time?
>>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
>>> ---
>>>  drivers/dma-buf/heaps/system_heap.c | 36 +++++++++++++++++++----------
>>>  1 file changed, 24 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>>> index bbe7881f1360..71b028c63bd8 100644
>>> --- a/drivers/dma-buf/heaps/system_heap.c
>>> +++ b/drivers/dma-buf/heaps/system_heap.c
>>> @@ -300,8 +300,8 @@ static const struct dma_buf_ops system_heap_buf_ops = {
>>>  	.release = system_heap_dma_buf_release,
>>>  };
>>>
>>> -static struct page *alloc_largest_available(unsigned long size,
>>> -					    unsigned int max_order)
>>> +static void alloc_largest_available(unsigned long size,
>>> +		unsigned int max_order, unsigned int *num_pages, struct list_head *list)
>>>  {
>>>  	struct page *page;
>>>  	int i;
>>> @@ -312,12 +312,19 @@ static struct page *alloc_largest_available(unsigned long size,
>>>  		if (max_order < orders[i])
>>>  			continue;
>>>
>>> -		page = alloc_pages(order_flags[i], orders[i]);
>>> -		if (!page)
>>> +		if (orders[i]) {
>>> +			page = alloc_pages(order_flags[i], orders[i]);
>>> +			if (page) {
>>> +				list_add(&page->lru, list);
>>> +				*num_pages = 1;
>>> +			}
>>> +		} else
>>> +			*num_pages = alloc_pages_bulk_list(LOW_ORDER_GFP, size / PAGE_SIZE, list);
>>> +
>>> +		if (list_empty(list))
>>>  			continue;
>>> -		return page;
>>> +		return;
>>>  	}
>>> -	return NULL;
>>>  }
>>>
>>>  static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>>> @@ -335,6 +342,8 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>>>  	struct list_head pages;
>>>  	struct page *page, *tmp_page;
>>>  	int i, ret = -ENOMEM;
>>> +	unsigned int num_pages;
>>> +	LIST_HEAD(head);
>>>
>>>  	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
>>>  	if (!buffer)
>>> @@ -348,6 +357,8 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>>>  	INIT_LIST_HEAD(&pages);
>>>  	i = 0;
>>>  	while (size_remaining > 0) {
>>> +		num_pages = 0;
>>> +		INIT_LIST_HEAD(&head);
>>>  		/*
>>>  		 * Avoid trying to allocate memory if the process
>>>  		 * has been killed by SIGKILL
>>> @@ -357,14 +368,15 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>>>  			goto free_buffer;
>>>  		}
>>>
>>> -		page = alloc_largest_available(size_remaining, max_order);
>>> -		if (!page)
>>> +		alloc_largest_available(size_remaining, max_order, &num_pages, &head);
>>> +		if (!num_pages)
>>>  			goto free_buffer;
>>>
>>> -		list_add_tail(&page->lru, &pages);
>>> -		size_remaining -= page_size(page);
>>> -		max_order = compound_order(page);
>>> -		i++;
>>> +		list_splice_tail(&head, &pages);
>>> +		max_order = folio_order(lru_to_folio(&head));
>>> +		size_remaining -= PAGE_SIZE * (num_pages << max_order);
>>> +		i += num_pages;
>>> +
>>>  	}
>>>
>>>  	table = &buffer->sg_table;
>>

8 months, 2 weeks

Re: [PATCH 2/2] driver: dma-buf: use alloc_pages_bulk_list for order-0 allocation

by Christian König

On 14.10.25 10:32, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
>
> The size of once dma-buf allocation could be dozens MB or much more
> which introduce a loop of allocating several thousands of order-0 pages.
> Furthermore, the concurrent allocation could have dma-buf allocation enter
> direct-reclaim during the loop. This commit would like to eliminate the
> above two affections by introducing alloc_pages_bulk_list in dma-buf's
> order-0 allocation. This patch is proved to be conditionally helpful
> in 18MB allocation as decreasing the time from 24604us to 6555us and no
> harm when bulk allocation can't be done(fallback to single page
> allocation)

Well that sounds like an absolutely horrible idea.

See the handling of allocating only from specific order is *exactly*
there to avoid the behavior of bulk allocation.

What you seem to do with this patch here is to add on top of the
behavior to avoid allocating large chunks from the buddy the behavior to
allocate large chunks from the buddy because that is faster.

So this change here doesn't looks like it will fly very high. Please
explain what you're actually trying to do, just optimize allocation
time?

Regards,
Christian.

> Signed-off-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
> ---
>  drivers/dma-buf/heaps/system_heap.c | 36 +++++++++++++++++++----------
>  1 file changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index bbe7881f1360..71b028c63bd8 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -300,8 +300,8 @@ static const struct dma_buf_ops system_heap_buf_ops = {
>  	.release = system_heap_dma_buf_release,
>  };
>
> -static struct page *alloc_largest_available(unsigned long size,
> -					    unsigned int max_order)
> +static void alloc_largest_available(unsigned long size,
> +		unsigned int max_order, unsigned int *num_pages, struct list_head *list)
>  {
>  	struct page *page;
>  	int i;
> @@ -312,12 +312,19 @@ static struct page *alloc_largest_available(unsigned long size,
>  		if (max_order < orders[i])
>  			continue;
>
> -		page = alloc_pages(order_flags[i], orders[i]);
> -		if (!page)
> +		if (orders[i]) {
> +			page = alloc_pages(order_flags[i], orders[i]);
> +			if (page) {
> +				list_add(&page->lru, list);
> +				*num_pages = 1;
> +			}
> +		} else
> +			*num_pages = alloc_pages_bulk_list(LOW_ORDER_GFP, size / PAGE_SIZE, list);
> +
> +		if (list_empty(list))
>  			continue;
> -		return page;
> +		return;
>  	}
> -	return NULL;
>  }
>
>  static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
> @@ -335,6 +342,8 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>  	struct list_head pages;
>  	struct page *page, *tmp_page;
>  	int i, ret = -ENOMEM;
> +	unsigned int num_pages;
> +	LIST_HEAD(head);
>
>  	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
>  	if (!buffer)
> @@ -348,6 +357,8 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>  	INIT_LIST_HEAD(&pages);
>  	i = 0;
>  	while (size_remaining > 0) {
> +		num_pages = 0;
> +		INIT_LIST_HEAD(&head);
>  		/*
>  		 * Avoid trying to allocate memory if the process
>  		 * has been killed by SIGKILL
> @@ -357,14 +368,15 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>  			goto free_buffer;
>  		}
>
> -		page = alloc_largest_available(size_remaining, max_order);
> -		if (!page)
> +		alloc_largest_available(size_remaining, max_order, &num_pages, &head);
> +		if (!num_pages)
>  			goto free_buffer;
>
> -		list_add_tail(&page->lru, &pages);
> -		size_remaining -= page_size(page);
> -		max_order = compound_order(page);
> -		i++;
> +		list_splice_tail(&head, &pages);
> +		max_order = folio_order(lru_to_folio(&head));
> +		size_remaining -= PAGE_SIZE * (num_pages << max_order);
> +		i += num_pages;
> +
>  	}
>
>  	table = &buffer->sg_table;
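
To make the "several thousands of order-0 pages" figure in this thread concrete, here is a minimal userspace sketch of the largest-available-order fallback loop. It is not the kernel code: it assumes 4 KiB pages, reuses the 8/4/0 order list from system_heap.c, and replaces the page allocator with a hypothetical callback (order_available_fn) that only says whether an order can currently be served. All sketch_ and SKETCH_ names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/* Orders tried from largest to smallest, mirroring system_heap.c. */
static const unsigned int sketch_orders[] = { 8, 4, 0 };
#define SKETCH_NUM_ORDERS (sizeof(sketch_orders) / sizeof(sketch_orders[0]))

/* Hypothetical stand-in for the page allocator: can this order be served? */
typedef bool (*order_available_fn)(unsigned int order);

/*
 * Count the allocator calls needed for a buffer of `size` bytes when each
 * call takes the largest available order that still fits and does not
 * exceed the order of the previous successful call.
 * Returns 0 if an iteration cannot allocate anything.
 */
static size_t count_alloc_calls(unsigned long size, order_available_fn avail)
{
	unsigned long remaining = size;
	unsigned int max_order = sketch_orders[0];
	size_t calls = 0;

	assert(size % SKETCH_PAGE_SIZE == 0);

	while (remaining > 0) {
		bool got = false;

		for (size_t i = 0; i < SKETCH_NUM_ORDERS; i++) {
			unsigned long chunk = SKETCH_PAGE_SIZE << sketch_orders[i];

			if (remaining < chunk || max_order < sketch_orders[i])
				continue;
			if (!avail(sketch_orders[i]))
				continue;
			remaining -= chunk;
			max_order = sketch_orders[i];
			calls++;
			got = true;
			break;
		}
		if (!got)
			return 0;
	}
	return calls;
}

/* Two illustrative memory states: unfragmented and fully fragmented. */
static bool all_orders_available(unsigned int order)
{
	(void)order;
	return true;
}

static bool only_order0_available(unsigned int order)
{
	return order == 0;
}
```

Under these assumptions an 18 MiB buffer takes 18 calls when order-8 blocks are available and 4608 calls when only order-0 pages are left. The second case is the loop the patch tries to shorten, and it is also the case where Christian's concern about splitting larger blocks applies.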

8 months, 2 weeks

[PATCH 1/2] dma-buf: replace "#if" with just "if"

by Christian König

No need to conditional compile that code, let the compilers dead code
elimination handle it instead.

Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
 drivers/dma-buf/dma-buf.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 2bcf9ceca997..2305bb2cc1f1 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1141,8 +1141,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 	}
 	mangle_sg_table(sg_table);
 
-#ifdef CONFIG_DMA_API_DEBUG
-	{
+	if (IS_ENABLED(CONFIG_DMA_API_DEBUG)) {
 		struct scatterlist *sg;
 		u64 addr;
 		int len;
@@ -1154,10 +1153,10 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 			if (!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(len)) {
 				pr_debug("%s: addr %llx or len %x is not page aligned!\n",
 					 __func__, addr, len);
+				break;
 			}
 		}
 	}
-#endif /* CONFIG_DMA_API_DEBUG */
 	return sg_table;
 
 error_unmap:
-- 
2.43.0
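
The pattern in this patch can be tried outside the kernel. The sketch below is a userspace approximation, not the dma-buf code: SKETCH_DEBUG_CHECKS is a hypothetical 0/1 macro standing in for IS_ENABLED(CONFIG_DMA_API_DEBUG), and struct sketch_seg stands in for a scatterlist entry. The branch is ordinary C, so it is parsed and type-checked in every configuration, and an optimising compiler can drop it when the constant is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Kconfig option: 1 = enabled, 0 = disabled. */
#ifndef SKETCH_DEBUG_CHECKS
#define SKETCH_DEBUG_CHECKS 1
#endif

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_ALIGNED(x) (((x) & (SKETCH_PAGE_SIZE - 1)) == 0)

/* Illustrative replacement for one scatterlist entry. */
struct sketch_seg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Return the index of the first segment whose address or length is not
 * page aligned, or -1 if every segment is aligned or the check is disabled.
 */
static int first_misaligned(const struct sketch_seg *segs, size_t n)
{
	int bad = -1;

	assert(segs != NULL || n == 0);

	/* A plain `if` on a compile-time constant instead of #ifdef. */
	if (SKETCH_DEBUG_CHECKS) {
		for (size_t i = 0; i < n; i++) {
			if (!SKETCH_PAGE_ALIGNED(segs[i].addr) ||
			    !SKETCH_PAGE_ALIGNED(segs[i].len)) {
				bad = (int)i;
				break;	/* stop at the first hit, like the added break */
			}
		}
	}
	return bad;
}
```

Compared with #ifdef, a typo inside the block is caught even in builds where the option is off, which is the usual argument for this form.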

8 months, 2 weeks

Re: [PATCH v3 2/2] accel: Add Arm Ethos-U NPU driver

by Rob Herring

On Mon, Sep 29, 2025 at 2:22 PM Frank Li <Frank.li(a)nxp.com> wrote:
>
> On Fri, Sep 26, 2025 at 03:00:49PM -0500, Rob Herring (Arm) wrote:
> > Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a
> > relatively simple interface with single command stream to describe
> > buffers, operation settings, and network operations. It supports up to 8
> > memory regions (though no h/w bounds on a region). The Ethos NPUs
> > are designed to use an SRAM for scratch memory. Region 2 is reserved
> > for SRAM (like the downstream driver stack and compiler). Userspace
> > doesn't need access to the SRAM.
> >
> ...
> > +
> > +static int ethosu_init(struct ethosu_device *ethosudev)
> > +{
> > +	int ret;
> > +	u32 id, config;
> > +
> > +	ret = devm_pm_runtime_enable(ethosudev->base.dev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = pm_runtime_resume_and_get(ethosudev->base.dev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	pm_runtime_set_autosuspend_delay(ethosudev->base.dev, 50);
> > +	pm_runtime_use_autosuspend(ethosudev->base.dev);
>
> pm_runtime_use_autosuspend() should be after last register read
> readl_relaxed(ethosudev->regs + NPU_REG_CONFIG);
>
> incase schedule happen between pm_runtime_use_autosuspend(ethosudev->base.dev);
> and readl().

All the call does is enable autosuspend. I don't think it matters
exactly when we enable it. We already did a get preventing autosuspend
until we do a put.

> > +	/* If PM is disabled, we need to call ethosu_device_resume() manually. */
> > +	if (!IS_ENABLED(CONFIG_PM)) {
> > +		ret = ethosu_device_resume(ethosudev->base.dev);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	ethosudev->npu_info.id = id = readl_relaxed(ethosudev->regs + NPU_REG_ID);
> > +	ethosudev->npu_info.config = config = readl_relaxed(ethosudev->regs + NPU_REG_CONFIG);
> ...
> > +
> > +/**
> > + * ethosu_gem_create_with_handle() - Create a GEM object and attach it to a handle.
> > + * @file: DRM file.
> > + * @ddev: DRM device.
> > + * @size: Size of the GEM object to allocate.
> > + * @flags: Combination of drm_ethosu_bo_flags flags.
> > + * @handle: Pointer holding the handle pointing to the new GEM object.
> > + *
> > + * Return: Zero on success
> > + */
> > +int ethosu_gem_create_with_handle(struct drm_file *file,
> > +				  struct drm_device *ddev,
> > +				  u64 *size, u32 flags, u32 *handle)
> > +{
> > +	int ret;
> > +	struct drm_gem_dma_object *mem;
> > +	struct ethosu_gem_object *bo;
>
> move 'ret' here to keep reverise christmas tree order.

Is that the order DRM likes? It's got to be the dumbest coding standard
we have.

> > +
> > +	mem = drm_gem_dma_create(ddev, *size);
> > +	if (IS_ERR(mem))
> > +		return PTR_ERR(mem);
> > +
> > +	bo = to_ethosu_bo(&mem->base);
> > +	bo->flags = flags;
> > +
> > +	/*
> > +	 * Allocate an id of idr table where the obj is registered
> > +	 * and handle has the id what user can see.
> > +	 */
> > +	ret = drm_gem_handle_create(file, &mem->base, handle);
> > +	if (!ret)
> > +		*size = bo->base.base.size;
> > +
> > +	/* drop reference from allocate - handle holds it now. */
> > +	drm_gem_object_put(&mem->base);
> > +
> > +	return ret;
> > +}
> > +
> ...
> > +
> > +static void cmd_state_init(struct cmd_state *st)
> > +{
> > +	/* Initialize to all 1s to detect missing setup */
> > +	memset(st, 0xff, sizeof(*st));
> > +}
> > +
> > +static u64 cmd_to_addr(u32 *cmd)
> > +{
> > +	return ((u64)((cmd[0] & 0xff0000) << 16)) | cmd[1];
>
> will FIELD_PREP helpful?

Like this?:

return ((u64)FIELD_PREP(GENMASK(23, 16), cmd[0]) << 32) | cmd[1];

Questionable to me if that's better...

>
> > +}
> > +
> > +static u64 dma_length(struct ethosu_validated_cmdstream_info *info,
> > +		      struct dma_state *dma_st, struct dma *dma)
> > +{
> > +	s8 mode = dma_st->mode;
> > +	u64 len = dma->len;
> > +
> > +	if (mode >= 1) {
> > +		len += dma->stride[0];
> > +		len *= dma_st->size0;
> > +	}
> > +	if (mode == 2) {
> > +		len += dma->stride[1];
> > +		len *= dma_st->size1;
> > +	}
> > +	if (dma->region >= 0)
> > +		info->region_size[dma->region] = max(info->region_size[dma->region],
> > +						     len + dma->offset);
> > +
> > +	return len;
> > +}
> > +
> ...
>
> > +
> > +static void ethosu_job_handle_irq(struct ethosu_device *dev)
> > +{
> > +	u32 status;
> > +
> > +	pm_runtime_mark_last_busy(dev->base.dev);
>
> I think don't need pm_runtime_mark_last_busy() here because
> pm_runtime_put_autosuspend() already call pm_runtime_mark_last_busy().
>
> only mark last busy without pm_runtime_put() can't affect run time pm
> state, still in active state.

Yes, agreed. Copied from rocket, so Tomeu, you may want to look at that.

>
> > +
> > +	status = readl_relaxed(dev->regs + NPU_REG_STATUS);
> > +
> > +	if (status & (STATUS_BUS_STATUS | STATUS_CMD_PARSE_ERR)) {
> > +		dev_err(dev->base.dev, "Error IRQ - %x\n", status);
> > +		drm_sched_fault(&dev->sched);
> > +		return;
> > +	}
> > +
> > +	scoped_guard(mutex, &dev->job_lock) {
> > +		if (dev->in_flight_job) {
> > +			dma_fence_signal(dev->in_flight_job->done_fence);
> > +			pm_runtime_put_autosuspend(dev->base.dev);
> > +			dev->in_flight_job = NULL;
> > +		}
> > +	}
> > +}
> > +
> ...
> > +
> > +int ethosu_job_init(struct ethosu_device *dev)
> > +{
> > +	struct drm_sched_init_args args = {
> > +		.ops = &ethosu_sched_ops,
> > +		.num_rqs = DRM_SCHED_PRIORITY_COUNT,
> > +		.credit_limit = 1,
> > +		.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
> > +		.name = dev_name(dev->base.dev),
> > +		.dev = dev->base.dev,
> > +	};
> > +	int ret;
> > +
> > +	spin_lock_init(&dev->fence_lock);
> > +	mutex_init(&dev->job_lock);
> > +	mutex_init(&dev->sched_lock);
>
> now perfer use dev_mutex_init().

$ git grep dev_mutex_init
(END)

Huh??
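
On the cmd_to_addr() exchange in this message, a small userspace sketch may help show what the two shift orders do. It assumes the intended layout is that bits 23:16 of cmd[0] become address bits 39:32, which is what the `<< 32` in the FIELD_PREP variant suggests but is not confirmed in the thread. The function names are illustrative and uint32_t/uint64_t stand in for u32/u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same expression as the quoted patch line: the mask and the shift are
 * both evaluated in 32 bits, and only the result is widened to 64 bits.
 */
static uint64_t cmd_to_addr_as_posted(const uint32_t *cmd)
{
	assert(cmd != NULL);
	return ((uint64_t)((cmd[0] & 0xff0000) << 16)) | cmd[1];
}

/*
 * Sketch under the stated assumption: extract bits 23:16 first, widen to
 * 64 bits, then shift the field up to bits 39:32.
 */
static uint64_t cmd_to_addr_widened(const uint32_t *cmd)
{
	uint64_t hi;

	assert(cmd != NULL);
	hi = (cmd[0] >> 16) & 0xff;
	return (hi << 32) | cmd[1];
}
```

With cmd[0] = 0x00ab0000 and cmd[1] = 0x12345678, the first form yields 0x12345678 because the shifted field falls off the top of the 32-bit intermediate, while the second yields 0xab12345678. In kernel terms the extract step corresponds to FIELD_GET(); FIELD_PREP() goes the other way and places a value into a field.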

8 months, 3 weeks

Re: [PATCH v3 1/2] dt-bindings: npu: Add Arm Ethos-U65/U85

by Rob Herring

On Mon, Sep 29, 2025 at 1:54 PM Frank Li <Frank.li(a)nxp.com> wrote:
>
> On Fri, Sep 26, 2025 at 03:00:48PM -0500, Rob Herring (Arm) wrote:
> > Add a binding schema for Arm Ethos-U65/U85 NPU. The Arm Ethos-U NPUs are
> > designed for edge AI inference applications.
> >
> > Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
> > ---
> >  .../devicetree/bindings/npu/arm,ethos.yaml | 79 ++++++++++++++++++++++
> >  1 file changed, 79 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/npu/arm,ethos.yaml b/Documentation/devicetree/bindings/npu/arm,ethos.yaml
> > new file mode 100644
> > index 000000000000..716c4997f976
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/npu/arm,ethos.yaml
> > @@ -0,0 +1,79 @@
> > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/npu/arm,ethos.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Arm Ethos U65/U85
> > +
> > +maintainers:
> > +  - Rob Herring <robh(a)kernel.org>
> > +
> > +description: >
> > +  The Arm Ethos-U NPUs are designed for IoT inference applications. The NPUs
> > +  can accelerate 8-bit and 16-bit integer quantized networks:
> > +
> > +    Transformer networks (U85 only)
> > +    Convolutional Neural Networks (CNN)
> > +    Recurrent Neural Networks (RNN)
> > +
> > +  Further documentation is available here:
> > +
> > +    U65 TRM: https://developer.arm.com/documentation/102023/
> > +    U85 TRM: https://developer.arm.com/documentation/102685/
> > +
> > +properties:
> > +  compatible:
> > +    oneOf:
> > +      - items:
> > +          - enum:
> > +              - fsl,imx93-npu
> > +          - const: arm,ethos-u65
> > +      - items:
> > +          - {}
>
> what's means {} here ?, just not allow arm,ethos-u85 alone?

Yes, u85 support is currently on a FVP model. The naming for it isn't
really clear yet nor is it clear if it ever will be. So really just a
placeholder until there is a chip using it. It keeps folks from using
just the fallback.

>
> Reviewed-by: Frank Li <Frank.Li(a)nxp.com>

Thanks,
Rob

8 months, 3 weeks

Re: [PATCH] dma-buf: use SB_I_NOEXEC and SB_I_NODEV

by Christoph Hellwig

On Tue, Oct 07, 2025 at 11:10:32PM -0700, Kees Cook wrote:
> The dma-buf pseudo-filesystem should never have executable mappings nor
> device nodes. Set SB_I_NOEXEC and SB_I_NODEV on the superblock to enforce
> this at the filesystem level, similar to secretmem, commit 98f99394a104
> ("secretmem: use SB_I_NOEXEC").
>
> Fix the syzbot-reported warning from the exec code to enforce this
> requirement:

Can you please just enforce this in init_pseudo? If a file system
really wants to support devices or executable it can clear them, but
a quick grep suggests that none of them should.

8 months, 3 weeks

Re: [PATCH v17 35/47] i2c: rename wait_for_completion callback to wait_for_completion_cb

by Wolfram Sang

On Thu, Oct 02, 2025 at 05:12:35PM +0900, Byungchul Park wrote:
> Functionally no change. This patch is a preparation for DEPT(DEPendency
> Tracker) to track dependencies related to a scheduler API,
> wait_for_completion().
>
> Unfortunately, struct i2c_algo_pca_data has a callback member named
> wait_for_completion, that is the same as the scheduler API, which makes
> it hard to change the scheduler API to a macro form because of the
> ambiguity.
>
> Add a postfix _cb to the callback member to remove the ambiguity.
>
> Signed-off-by: Byungchul Park <byungchul(a)sk.com>

This patch seems reasonable in any case. I'll pick it, so you have one
dependency less. Good luck with the series!

Applied to for-next, thanks!
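
The ambiguity described in the commit message comes from how the C preprocessor treats function-like macros: any identifier with the macro's name that is followed by an opening parenthesis is expanded, including a struct member call such as `data->wait_for_completion(...)`. The sketch below is a userspace illustration only. The macro body, the sketch_ helpers and struct sketch_algo_data are hypothetical and do not reflect how DEPT or the i2c-algo-pca code is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracker state: counts waits that went through the macro. */
static int sketch_tracked_waits;

/* Stand-in for the real blocking wait. */
static void sketch_do_wait(int *done)
{
	*done = 1;
}

/*
 * Function-like macro wrapping the API. The preprocessor would also expand
 * `obj->wait_for_completion(x)`, so a member with this exact name collides.
 */
#define wait_for_completion(x) (sketch_tracked_waits++, sketch_do_wait(x))

struct sketch_algo_data {
	/* With the _cb suffix the member no longer matches the macro name. */
	void (*wait_for_completion_cb)(int *done);
};

/* Driver-specific wait; it is not routed through the macro. */
static void sketch_board_wait(int *done)
{
	*done = 2;
}

/* Calls the callback and the wrapped API once each and encodes both results. */
static int sketch_run(struct sketch_algo_data *adap)
{
	int cb_done = 0, api_done = 0;

	assert(adap != NULL && adap->wait_for_completion_cb != NULL);

	adap->wait_for_completion_cb(&cb_done);	/* plain member call */
	wait_for_completion(&api_done);		/* expands to the wrapper */
	return cb_done * 10 + api_done;
}
```

Here the callback and the wrapped API can coexist because their names differ. Renaming the member back to wait_for_completion would make the member call expand through the macro as well and fail to compile.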

8 months, 3 weeks

Re: [PATCH v17 09/47] arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64

by Mark Brown

On Fri, Oct 03, 2025 at 10:46:41AM +0900, Byungchul Park wrote:
> On Thu, Oct 02, 2025 at 12:39:31PM +0100, Mark Brown wrote:
> > On Thu, Oct 02, 2025 at 05:12:09PM +0900, Byungchul Park wrote:
> > > dept needs to notice every entrance from user to kernel mode to treat
> > > every kernel context independently when tracking wait-event dependencies.
> > > Roughly, system call and user oriented fault are the cases.
> > > Make dept aware of the entrances of arm64 and add support
> > > CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64.
> >
> > The description of what needs to be tracked probably needs some
> > tightening up here, it's not clear to me for example why exceptions for
> > mops or the vector extensions aren't included here, or what the
> > distinction is with error faults like BTI or GCS not being tracked?
>
> Thanks for the feedback but I'm afraid I don't get you. Can you explain
> in more detail with example?

Your commit log says we need to track every entrance from user mode to
kernel mode but the code only adds tracking to syscalls and some memory
faults. The exception types listed above (and some others) also result
in entries to the kernel from userspace.

> JFYI, pairs of wait and its event need to be tracked to see if each
> event can be prevented from being reachable by other waits like:
>    context X                          context Y
>
>    lock L
>    ...
>    initiate event A context           start toward event A
>    ...                                ...
>    wait A // wait for event A and     lock L // wait for unlock L and
>           // prevent unlock L                // prevent event A
>    ...                                ...
>    unlock L                           unlock L
>                                       ...
>                                       event A
> I meant things like this need to be tracked.

I don't think that's at all clear from the above context, and the
handling for some of the above exception types (eg, the vector
extensions) includes taking locks.

8 months, 4 weeks

Re: [PATCH v17 28/47] dept: add documentation for dept

by Bagas Sanjaya

On Thu, Oct 02, 2025 at 05:12:28PM +0900, Byungchul Park wrote: > This document describes the concept and APIs of dept. > > Signed-off-by: Byungchul Park <byungchul(a)sk.com> > --- > Documentation/dependency/dept.txt | 735 ++++++++++++++++++++++++++ > Documentation/dependency/dept_api.txt | 117 ++++ > 2 files changed, 852 insertions(+) > create mode 100644 Documentation/dependency/dept.txt > create mode 100644 Documentation/dependency/dept_api.txt What about writing dept docs in reST (like the rest of kernel documentation)? ---- >8 ---- diff --git a/Documentation/dependency/dept.txt b/Documentation/locking/dept.rst similarity index 92% rename from Documentation/dependency/dept.txt rename to Documentation/locking/dept.rst index 5dd358b96734e6..7b90a0d95f0876 100644 --- a/Documentation/dependency/dept.txt +++ b/Documentation/locking/dept.rst @@ -8,7 +8,7 @@ How lockdep works Lockdep detects a deadlock by checking lock acquisition order. For example, a graph to track acquisition order built by lockdep might look -like: +like:: A -> B - \ @@ -16,12 +16,12 @@ like: / C -> D - - where 'A -> B' means that acquisition A is prior to acquisition B - with A still held. +where 'A -> B' means that acquisition A is prior to acquisition B +with A still held. Lockdep keeps adding each new acquisition order into the graph in runtime. For example, 'E -> C' will be added when the two locks have -been acquired in the order, E and then C. The graph will look like: +been acquired in the order, E and then C. The graph will look like:: A -> B - \ @@ -32,10 +32,10 @@ been acquired in the order, E and then C. The graph will look like: \ / ------------------ - where 'A -> B' means that acquisition A is prior to acquisition B - with A still held. +where 'A -> B' means that acquisition A is prior to acquisition B +with A still held. 
-This graph contains a subgraph that demonstrates a loop like: +This graph contains a subgraph that demonstrates a loop like:: -> E - / \ @@ -67,6 +67,8 @@ mechanisms, lockdep doesn't work. Can lockdep detect the following deadlock? +:: + context X context Y context Z mutex_lock A @@ -80,6 +82,8 @@ Can lockdep detect the following deadlock? No. What about the following? +:: + context X context Y mutex_lock A @@ -101,7 +105,7 @@ What leads a deadlock --------------------- A deadlock occurs when one or multi contexts are waiting for events that -will never happen. For example: +will never happen. For example:: context X context Y context Z @@ -121,24 +125,24 @@ We call this *deadlock*. If an event occurrence is a prerequisite to reaching another event, we call it *dependency*. In this example: - Event A occurrence is a prerequisite to reaching event C. - Event C occurrence is a prerequisite to reaching event B. - Event B occurrence is a prerequisite to reaching event A. + * Event A occurrence is a prerequisite to reaching event C. + * Event C occurrence is a prerequisite to reaching event B. + * Event B occurrence is a prerequisite to reaching event A. In terms of dependency: - Event C depends on event A. - Event B depends on event C. - Event A depends on event B. + * Event C depends on event A. + * Event B depends on event C. + * Event A depends on event B. -Dependency graph reflecting this example will look like: +Dependency graph reflecting this example will look like:: -> C -> A -> B - / \ \ / ---------------- - where 'A -> B' means that event A depends on event B. +where 'A -> B' means that event A depends on event B. A circular dependency exists. Such a circular dependency leads a deadlock since no waiters can have desired events triggered. 
@@ -152,7 +156,7 @@ Introduce DEPT -------------- DEPT(DEPendency Tracker) tracks wait and event instead of lock -acquisition order so as to recognize the following situation: +acquisition order so as to recognize the following situation:: context X context Y context Z @@ -165,18 +169,18 @@ acquisition order so as to recognize the following situation: event A and builds up a dependency graph in runtime that is similar to lockdep. -The graph might look like: +The graph might look like:: -> C -> A -> B - / \ \ / ---------------- - where 'A -> B' means that event A depends on event B. +where 'A -> B' means that event A depends on event B. DEPT keeps adding each new dependency into the graph in runtime. For example, 'B -> D' will be added when event D occurrence is a -prerequisite to reaching event B like: +prerequisite to reaching event B like:: | v @@ -184,7 +188,7 @@ prerequisite to reaching event B like: . event B -After the addition, the graph will look like: +After the addition, the graph will look like:: -> D / @@ -209,6 +213,8 @@ How DEPT works Let's take a look how DEPT works with the 1st example in the section 'Limitation of lockdep'. +:: + context X context Y context Z mutex_lock A @@ -220,7 +226,7 @@ Let's take a look how DEPT works with the 1st example in the section mutex_unlock A mutex_unlock A -Adding comments to describe DEPT's view in terms of wait and event: +Adding comments to describe DEPT's view in terms of wait and event:: context X context Y context Z @@ -248,7 +254,7 @@ Adding comments to describe DEPT's view in terms of wait and event: mutex_unlock A /* event A */ -Adding more supplementary comments to describe DEPT's view in detail: +Adding more supplementary comments to describe DEPT's view in detail:: context X context Y context Z @@ -283,7 +289,7 @@ Adding more supplementary comments to describe DEPT's view in detail: mutex_unlock A /* event A that's been valid since 4 */ -Let's build up dependency graph with this example. 
Firstly, context X: +Let's build up dependency graph with this example. Firstly, context X:: context X @@ -292,7 +298,7 @@ Let's build up dependency graph with this example. Firstly, context X: /* start to take into account event B's context */ /* 2 */ -There are no events to create dependency. Next, context Y: +There are no events to create dependency. Next, context Y:: context Y @@ -317,13 +323,13 @@ waits between 3 and the event, event B does not create dependency. For event A, there is a wait, folio_lock B, between 1 and the event. Which means event A cannot be triggered if event B does not wake up the wait. Therefore, we can say event A depends on event B, say, 'A -> B'. The -graph will look like after adding the dependency: +graph will look like after adding the dependency:: A -> B - where 'A -> B' means that event A depends on event B. +where 'A -> B' means that event A depends on event B. -Lastly, context Z: +Lastly, context Z:: context Z @@ -343,7 +349,7 @@ wait, mutex_lock A, between 2 and the event - remind 2 is at a very start and before the wait in timeline. Which means event B cannot be triggered if event A does not wake up the wait. Therefore, we can say event B depends on event A, say, 'B -> A'. The graph will look like -after adding the dependency: +after adding the dependency:: -> A -> B - / \ @@ -367,6 +373,8 @@ Interpret DEPT report The following is the example in the section 'How DEPT works'. +:: + context X context Y context Z mutex_lock A @@ -402,7 +410,7 @@ The following is the example in the section 'How DEPT works'. We can Simplify this by replacing each waiting point with [W], each point where its event's context starts with [S] and each event with [E]. -This example will look like after the replacement: +This example will look like after the replacement:: context X context Y context Z @@ -419,6 +427,8 @@ This example will look like after the replacement: DEPT uses the symbols [W], [S] and [E] in its report as described above. 
The following is an example reported by DEPT for a real problem. +:: + Link: https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SA… Link: https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.pa… @@ -620,6 +630,8 @@ The following is an example reported by DEPT for a real problem. Let's take a look at the summary that is the most important part. +:: + --------------------------------------------------- summary --------------------------------------------------- @@ -639,7 +651,7 @@ Let's take a look at the summary that is the most important part. [W]: the wait blocked [E]: the event not reachable -The summary shows the following scenario: +The summary shows the following scenario:: context A context B context ?(unknown) @@ -652,7 +664,7 @@ The summary shows the following scenario: [E] unlock(&ni->ni_lock:0) -Adding supplementary comments to describe DEPT's view in detail: +Adding supplementary comments to describe DEPT's view in detail:: context A context B context ?(unknown) @@ -677,7 +689,7 @@ Adding supplementary comments to describe DEPT's view in detail: [E] unlock(&ni->ni_lock:0) /* event that's been valid since 2 */ -Let's build up dependency graph with this report. Firstly, context A: +Let's build up dependency graph with this report. Firstly, context A:: context A @@ -697,13 +709,13 @@ wait, folio_lock(&f1), between 2 and the event. Which means unlock(&ni->ni_lock:0) is not reachable if folio_unlock(&f1) does not wake up the wait. Therefore, we can say unlock(&ni->ni_lock:0) depends on folio_unlock(&f1), say, 'unlock(&ni->ni_lock:0) -> folio_unlock(&f1)'. -The graph will look like after adding the dependency: +The graph will look like after adding the dependency:: unlock(&ni->ni_lock:0) -> folio_unlock(&f1) - where 'A -> B' means that event A depends on event B. +where 'A -> B' means that event A depends on event B. 
-Secondly, context B: +Secondly, context B:: context B @@ -719,14 +731,14 @@ very start and before the wait in timeline. Which means folio_unlock(&f1) is not reachable if unlock(&ni->ni_lock:0) does not wake up the wait. Therefore, we can say folio_unlock(&f1) depends on unlock(&ni->ni_lock:0), say, 'folio_unlock(&f1) -> unlock(&ni->ni_lock:0)'. The graph will look -like after adding the dependency: +like after adding the dependency:: -> unlock(&ni->ni_lock:0) -> folio_unlock(&f1) - / \ \ / ------------------------------------------------ - where 'A -> B' means that event A depends on event B. +where 'A -> B' means that event A depends on event B. A new loop has been created. So DEPT can report it as a deadlock! Cool! diff --git a/Documentation/dependency/dept_api.txt b/Documentation/locking/dept_api.rst similarity index 97% rename from Documentation/dependency/dept_api.txt rename to Documentation/locking/dept_api.rst index 8e0d5a118a460e..96c4d65f4a9a2d 100644 --- a/Documentation/dependency/dept_api.txt +++ b/Documentation/locking/dept_api.rst @@ -10,6 +10,8 @@ already applied into the existing synchronization primitives e.g. waitqueue, swait, wait_for_completion(), dma fence and so on. The basic APIs of SDT are: +.. code-block:: c + /* * After defining 'struct dept_map map', initialize the instance. */ @@ -27,6 +29,8 @@ APIs of SDT are: The advanced APIs of SDT are: +.. code-block:: c + /* * After defining 'struct dept_map map', initialize the instance * using an external key. @@ -83,6 +87,8 @@ Do not use these APIs directly. These are the wrappers for typical locks, that have been already applied into major locks internally e.g. spin lock, mutex, rwlock and so on. The APIs of LDT are: +.. code-block:: c + ldt_init(map, key, sub, name); ldt_lock(map, sub_local, try, nest, ip); ldt_rlock(map, sub_local, try, nest, ip, queued); @@ -96,6 +102,8 @@ Raw APIs -------- Do not use these APIs directly. The raw APIs of dept are: +.. 
code-block:: c + dept_free_range(start, size); dept_map_init(map, key, sub, name); dept_map_reinit(map, key, sub, name); diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst index 6a9ea96c8bcb70..7ec3dce7fee425 100644 --- a/Documentation/locking/index.rst +++ b/Documentation/locking/index.rst @@ -24,6 +24,8 @@ Locking percpu-rw-semaphore robust-futexes robust-futex-ABI + dept + dept_api .. only:: subproject and html > +Can lockdep detect the following deadlock? > + > + context X context Y context Z > + > + mutex_lock A > + folio_lock B > + folio_lock B <- DEADLOCK > + mutex_lock A <- DEADLOCK > + folio_unlock B > + folio_unlock B > + mutex_unlock A > + mutex_unlock A > + > +No. What about the following? > + > + context X context Y > + > + mutex_lock A > + mutex_lock A <- DEADLOCK > + wait_for_complete B <- DEADLOCK > + complete B > + mutex_unlock A > + mutex_unlock A Can you explain how DEPT detects deadlock on the second example above (like the first one being described in "How DEPT works" section)? Confused... -- An old man doll... just what I always wanted! - Clara
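
The quoted documentation reasons about deadlocks as loops in a graph of "event X depends on event Y" edges. The following userspace sketch shows only that idea. It is not DEPT's implementation: the fixed-size adjacency matrix, the integer event ids and all sketch_ names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_EVENTS 8

struct sketch_graph {
	/* dep[a][b] is true when event a depends on event b ("a -> b"). */
	bool dep[SKETCH_MAX_EVENTS][SKETCH_MAX_EVENTS];
};

static void sketch_graph_init(struct sketch_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is `target` reachable from `from` along edges? */
static bool sketch_reaches(const struct sketch_graph *g, int from, int target,
			   bool *visited)
{
	if (from == target)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;
	for (int next = 0; next < SKETCH_MAX_EVENTS; next++)
		if (g->dep[from][next] &&
		    sketch_reaches(g, next, target, visited))
			return true;
	return false;
}

/*
 * Record "a depends on b". Returns true when the new edge closes a loop,
 * that is, when b already depends on a directly or transitively.
 */
static bool sketch_add_dep(struct sketch_graph *g, int a, int b)
{
	bool visited[SKETCH_MAX_EVENTS] = { false };
	bool closes_loop;

	assert(a >= 0 && a < SKETCH_MAX_EVENTS);
	assert(b >= 0 && b < SKETCH_MAX_EVENTS);

	closes_loop = sketch_reaches(g, b, a, visited);
	g->dep[a][b] = true;
	return closes_loop;
}
```

Feeding in the three dependencies from the document's example (C depends on A, B depends on C, A depends on B) reports a loop on the third edge, which matches the circular dependency the text describes.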

8 months, 4 weeks
