- Linaro-mm-sig - lists.linaro.org

Re: [PATCH v1 1/5] dt-bindings: dmaengine: qcom: gpi: Add additional arg to dma-cell property

by Rob Herring (Arm)

On Tue, 15 Oct 2024 17:37:46 +0530, Jyothi Kumar Seerapu wrote: > When high performance with multiple i2c messages in a single transfer > is required, employ Block Event Interrupt (BEI) to trigger interrupts > after specific messages transfer and the last message transfer, > thereby reducing interrupts. > > For each i2c message transfer, a series of Transfer Request Elements(TREs) > must be programmed, including config tre for frequency configuration, > go tre for holding i2c address and dma tre for holding dma buffer address, > length as per the hardware programming guide. For transfer using BEI, > multiple I2C messages may necessitate the preparation of config, go, > and tx DMA TREs. However, a channel TRE size of 64 is often insufficient, > potentially leading to failures due to inadequate memory space. > > Add additional argument to dma-cell property for channel TRE size. > With this, adjust the channel TRE size via the device tree. > The default size is 64, but clients can modify this value based on > their specific requirements. > > Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu(a)quicinc.com> > --- > Documentation/devicetree/bindings/dma/qcom,gpi.yaml | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > My bot found errors running 'make dt_binding_check' on your patch: yamllint warnings/errors: dtschema/dtc warnings/errors: /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/dma/qcom,gpi.yaml: properties:#dma-cells: 'minItems' is not one of ['description', 'deprecated', 'const', 'enum', 'minimum', 'maximum', 'multipleOf', 'default', '$ref', 'oneOf'] from schema $id: http://devicetree.org/meta-schemas/core.yaml# /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/dma/qcom,gpi.yaml: properties:#dma-cells: 'maxItems' is not one of ['description', 'deprecated', 'const', 'enum', 'minimum', 'maximum', 'multipleOf', 'default', '$ref', 'oneOf'] from schema $id: http://devicetree.org/meta-schemas/core.yaml# doc reference errors (make refcheckdocs): See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/202410151207… The base for the series is generally the latest rc1. A different dependency should be noted in *this* patch. If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure 'yamllint' is installed and dt-schema is up to date: pip3 install dtschema --upgrade Please check and re-submit after running the above command yourself. Note that DT_SCHEMA_FILES can be set to your schema file to speed up checking your schema. However, it must be unset to test all examples with your schema.

1 year, 4 months

1
0
0 0

Re: [PATCH v2] dma-buf: fix S_IRUGO to 0444, block comments, func declaration

by John Stultz

On Sat, Oct 5, 2024 at 11:10 AM Pintu Kumar <quic_pintu(a)quicinc.com> wrote: > > These warnings/errors are reported by checkpatch. > Fix them with minor changes to make it clean. > No other functional changes. > > WARNING: Block comments use * on subsequent lines > + /* only support discovering the end of the buffer, > + but also allow SEEK_SET to maintain the idiomatic > > WARNING: Block comments use a trailing */ on a separate line > + SEEK_END(0), SEEK_CUR(0) pattern */ > > WARNING: Block comments use a trailing */ on a separate line > + * before passing the sgt back to the exporter. */ > > ERROR: "foo * bar" should be "foo *bar" > +static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach, > > WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. > + d = debugfs_create_file("bufinfo", S_IRUGO, dma_buf_debugfs_dir, > > total: 1 errors, 4 warnings, 1746 lines checked > > Signed-off-by: Pintu Kumar <quic_pintu(a)quicinc.com> Looks ok to me. Thanks for sending these cleanups! Acked-by: John Stultz <jstultz(a)google.com>

1 year, 4 months

1
0
0 0

Notes from the XDC 2024 "universal buffer allocator" workshop

by Laurent Pinchart

Hello, We held a workshop at XDC 2024 titled "Towards a universal buffer allocator for Linux", whose abstract was as follows: Buffer allocation for media contents, despite being required for any framework or application dealing with image capture, processing, decoding, encoding, rendering and display, remains an area plagued by many unsolved problems. Over time improvements have been made to APIs for buffer allocation, both on the kernel side (standardization of the DRM dumb buffer API, or DMA heaps, to name a few) and in userspace (most notably with GBM, and the buffer management API in Vulkan), or for specific use cases (e.g. gralloc in Android). Unfortunately, no universal solution exists to allocate buffers shared by multiple devices. This is hindering interoperability and forces userspace to pile hacks and workarounds. (https://indico.freedesktop.org/event/6/contributions/395/). Here are the raw notes from the workshop. XDC 2024 - Buffer allocation workshop ===================================== Attendees: - Erico Nunes <ernunes(a)redhat.com> - James Jones <jajones(a)nvidia.com> - Laurent Pinchart <laurent.pinchart(a)ideasonboard.com> - Nicolas Dufresne <nicolas(a)collabora.com> - Yunxiang (Teddy) Li <yunxiang.li(a)amd.com> Relevant content: - XDC 2016: https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation… - XDC 2020: https://lpc.events/event/9/contributions/615/attachments/704/1301/XDC_2020_… Discussions ----------- NICs are relevant to the discussion, but likely only on large servers. The proposal needs to accomodate that. Nicolas asked about support for the CPU as a device. A lot of pipelines are hybrid, with CPU processing and dedicated hardware processing. James said previous proposals were able to support the CPU as a device. The last proposal was focussed on usages, so could support CPUs. Demi things this could be viewed as in a similar way as type checking in compilers, which have to resolve constraints. Nicolas is annoyed that we force applications to make a decision, requiring them to go the hard route of reallocation if they get it wrong. Some of the parameters are the MMU configuration, which can't be changed afterwards in Linux. Demi said that there could be cases where the fastest thing to do is to use shadow memory and copy. Nicolas said the first question that would come back is if this is something userspace should decide. Yunxiang asked if we're looking at designing a kernel interface to convey constraints. Nicolas said some kernel APIs expose some constraints. For instance, V4L2 allows discovering some stride constraints. Yunxiang thinks constraints won't be able to scale. Capabilities would be better. The difference is that the hardware device can have lots of constraints, but could more easily say some of the things it can do. Adding a new capability later wouldn't break anything, but adding a new constraint would require all components to understand it. James said the latest proposal handled constraints in a forward-compatible way with versioned constraints. Demi prefers the capabilities approach. Yunxiang said devices could expose which memory they can access. Nicolas said that in GStreamer, the current approach is trial and error. Capabilities would reduce trial and error. Everybody has hacks on top. James is instead bringing a formula that solves this. As soon as each of the nodes involved can be identified (which GStreamer can do now), we can calculate and combine but can't sustain adding capabilities. Liviu said GStreamer could tell which capabilities it supports, and push hardware vendors to comply with that. If the hardware is not compatible, GStreamer would provide converters. Nicolas said buffer sharing is a trade-off. We will find things that won't break it but will make it slower. Demi said that cache-incoherent DMA will usually have higher bandwidth than cache-coherent DMA, unless all the components share the same cache. Discussion followed on capabilities vs. constraints. It's partly a vocabulary matter. The first proposal from James was using the word capability. Constraints in the last proposal describe what a device can do. Nicolas has only stride and number of planes as constraints in GStreamer. James' mechanism has support for much more. Yunxiang asked if we can include the pixel format as constraints. James said we first have to resolve formats and modifiers, and then handle allocation constraints. Yunxiang thinks the allocator should also take care of the format. Nicolas said that when using a tiling modifier, we already reduce the scope of incompatibilities. When using e.g. an Intel tiling modifier, there's a bigger guarantee that the devices that support it will be able to inter-operate as they're designed for that purpose. Demi said it should be possible to validate parameters for buffers without being Mesa. The format modifiers are way too opaque. James said that for each format we need to query possible modifiers, and for each format+modifier we need to query constraints. Modifiers are mostly considered opaque to applications, constraints are semi-opaque, and formats are not opaque. The problem that remains hard to James is when it comes to locality, how do we say "local to a device" ? There's no serializable representation of a device. Nicolas says that as soon as it stays in the graphics stack, the stack will hide the problem from applications. A sysfs path could be used on Linux to identify a device. James asked how an application would be allowed to allocate buffers from e.g. a DRI device, if the allocator library told it that the allocation needs to be done there. This starts becoming a permission issue or a logind issue. Passing fds could help, but when sitting on top of GL, we have no way to get an fd. Nicolas said Vulkan should be OK, but GL is dead. Nicolas asked why it would be useful to get access to fds. James said that at the bottom of the problem we have the question of which fd we send an ioctl to to allocate memory. We could start small to have something and break the chicken and egg problem. James thought about using GBM to start with, but this isn't good for GStreamer. James and Nicolas thought that kernel drivers would register dmabuf heaps. One issue with dmabuf heaps today is that allocation isn't tracked, so we bypass cgroups. DRM and V4L2 have the same issue, offering an infinite amount of memory. logind has a responsibility of not making it worse, so didn't allow access to dmabuf heaps. The problem needs to be fixed in the kernel first, adding accounting. This would then be an incentive to abandon device-specific allocation APIs, as memory accounting would come for free when using dmabuf heaps. Nicolas wonders why the obvious path forward of implementing heaps for devices that have specific constraints didn't happen. James thinks people may just have been too busy. Demi brought up the VFIO case. The hardware device may not be managed by the kernel, it may be managed by userspace, or passed to a guest. Capabilities for those devices can't come from a kernel device. James said the key is communicating the constraints in a serializable way. Allocation would involve passing dmabufs between guests and hosts. Allocation from DMA heaps and from subsystem-specific APIs will coexist for a long time. This needs to be abstracted in a userspace library, so that the underlying allocators can evolve. Demi asked if GBM is needed when using Vulkan. Internally in mesa, everything uses GBM for allocation. James said it's a good argument to ask why not using Vulkan only. If we decide to use GBM to bootstrap this evolution, and GBM allocates for a specific device, then it doesn't seem to be a good fit. We don't want to add support for system memory allocation to GBM. Erico said that there's a use case for GBM even when using Vulkan when there's a separate display device that has no Vulkan implementation. We need an iterative approach. For instance a simple useful improvement would be to expose stride constraints on DRM devices. Conclusions ----------- - We need some library (existing or new one) - We need some API - We need someone to bootstrap this - We need a more iterative approach Action points ------------- - Add memory accounting to DMA heaps - Push drivers to implement their own heaps to replace subsystem allocation APIs -- Regards, Laurent Pinchart

1 year, 4 months

1
0
0 0

Re: [PATCH v1 04/10] media: platform: mediatek: add isp_7x cam-raw unit

by AngeloGioacchino Del Regno

Il 09/10/24 13:15, Shu-hsiang Yang ha scritto: > Introduces the ISP pipeline driver for the MediaTek ISP raw and yuv > modules. Key functionalities include data processing, V4L2 integration, > resource management, debug support, and various control operations. > Additionally, IRQ handling, platform device management, and MediaTek > ISP DMA format support are also included. > > Signed-off-by: Shu-hsiang Yang <Shu-hsiang.Yang(a)mediatek.com> > --- > .../mediatek/isp/isp_7x/camsys/mtk_cam-raw.c | 5359 +++++++++++++++++ > .../mediatek/isp/isp_7x/camsys/mtk_cam-raw.h | 325 + > .../isp/isp_7x/camsys/mtk_cam-raw_debug.c | 403 ++ > .../isp/isp_7x/camsys/mtk_cam-raw_debug.h | 39 + > .../isp_7x/camsys/mtk_camera-v4l2-controls.h | 65 + > 5 files changed, 6191 insertions(+) > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_cam-raw.c > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_cam-raw.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_cam-raw_debug.c > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_cam-raw_debug.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_camera-v4l2-controls.h > > diff --git a/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_cam-raw.c b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_cam-raw.c > new file mode 100644 > index 000000000000..c025f53c952d > --- /dev/null > +++ b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_cam-raw.c > @@ -0,0 +1,5359 @@ > +// SPDX-License-Identifier: GPL-2.0 > +// > +// Copyright (c) 2022 MediaTek Inc. > + > +#include <linux/clk.h> > +#include <linux/interrupt.h> > +#include <linux/iopoll.h> > +#include <linux/module.h> > +#include <linux/of.h> > +#include <linux/of_platform.h> > +#include <linux/platform_device.h> > +#include <linux/pm_runtime.h> > +#include <linux/vmalloc.h> > +#include <linux/videodev2.h> > +#include <linux/suspend.h> > +#include <linux/rtc.h> > + > +#include <media/v4l2-device.h> > +#include <media/v4l2-event.h> > +#include <media/v4l2-ioctl.h> > +#include <media/v4l2-subdev.h> > + > +#include <soc/mediatek/smi.h> > + > +#include "mtk_cam.h" > +#include "mtk_cam-feature.h" > +#include "mtk_cam-raw.h" > + > +#include "mtk_cam-regs-mt8188.h" > + > +#include "mtk_cam-video.h" > +#include "mtk_cam-seninf-if.h" > +#include "mtk_camera-v4l2-controls.h" > + > +#include "mtk_cam-dmadbg.h" > +#include "mtk_cam-raw_debug.h" > + > +static unsigned int debug_raw; > +module_param(debug_raw, uint, 0644); > +MODULE_PARM_DESC(debug_raw, "activates debug info"); > + > +static int debug_raw_num = -1; > +module_param(debug_raw_num, int, 0644); > +MODULE_PARM_DESC(debug_raw_num, "debug: num of used raw devices"); > + > +static int debug_pixel_mode = -1; > +module_param(debug_pixel_mode, int, 0644); > +MODULE_PARM_DESC(debug_pixel_mode, "debug: pixel mode"); > + > +static int debug_clk_idx = -1; > +module_param(debug_clk_idx, int, 0644); > +MODULE_PARM_DESC(debug_clk_idx, "debug: clk idx"); > + > +static int debug_dump_fbc; > +module_param(debug_dump_fbc, int, 0644); > +MODULE_PARM_DESC(debug_dump_fbc, "debug: dump fbc"); > + In addition to the first review that I gave you on patch [02/10]: please drop all those module parameters. If you want debug switches, use debugfs instead. Regards, Angelo

1 year, 4 months

1
0
0 0

Re: [PATCH v2] dma-buf: fix S_IRUGO to 0444, block comments, func declaration

by Pintu Agarwal

Hi, On Sat, 5 Oct 2024 at 23:40, Pintu Kumar <quic_pintu(a)quicinc.com> wrote: > > These warnings/errors are reported by checkpatch. > Fix them with minor changes to make it clean. > No other functional changes. > > WARNING: Block comments use * on subsequent lines > + /* only support discovering the end of the buffer, > + but also allow SEEK_SET to maintain the idiomatic > > WARNING: Block comments use a trailing */ on a separate line > + SEEK_END(0), SEEK_CUR(0) pattern */ > > WARNING: Block comments use a trailing */ on a separate line > + * before passing the sgt back to the exporter. */ > > ERROR: "foo * bar" should be "foo *bar" > +static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach, > > WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. > + d = debugfs_create_file("bufinfo", S_IRUGO, dma_buf_debugfs_dir, > > total: 1 errors, 4 warnings, 1746 lines checked > > Signed-off-by: Pintu Kumar <quic_pintu(a)quicinc.com> > > --- > Changes in V1 suggested by Sumit Semwal: > Change commit title, and mention exact reason of fix in commit log. > V1: https://lore.kernel.org/all/CAOuPNLg1=YCUFXW-76A_gZm_PE1MFSugNvg3dEdkfujXV_… > --- > drivers/dma-buf/dma-buf.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c > index 8892bc701a66..2e63d50e46d3 100644 > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -176,8 +176,9 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence) > dmabuf = file->private_data; > > /* only support discovering the end of the buffer, > - but also allow SEEK_SET to maintain the idiomatic > - SEEK_END(0), SEEK_CUR(0) pattern */ > + * but also allow SEEK_SET to maintain the idiomatic > + * SEEK_END(0), SEEK_CUR(0) pattern. > + */ > if (whence == SEEK_END) > base = dmabuf->size; > else if (whence == SEEK_SET) > @@ -782,13 +783,14 @@ static void mangle_sg_table(struct sg_table *sg_table) > /* To catch abuse of the underlying struct page by importers mix > * up the bits, but take care to preserve the low SG_ bits to > * not corrupt the sgt. The mixing is undone in __unmap_dma_buf > - * before passing the sgt back to the exporter. */ > + * before passing the sgt back to the exporter. > + */ > for_each_sgtable_sg(sg_table, sg, i) > sg->page_link ^= ~0xffUL; > #endif > > } > -static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach, > +static struct sg_table *__map_dma_buf(struct dma_buf_attachment *attach, > enum dma_data_direction direction) > { > struct sg_table *sg_table; > @@ -1694,7 +1696,7 @@ static int dma_buf_init_debugfs(void) > > dma_buf_debugfs_dir = d; > > - d = debugfs_create_file("bufinfo", S_IRUGO, dma_buf_debugfs_dir, > + d = debugfs_create_file("bufinfo", 0444, dma_buf_debugfs_dir, > NULL, &dma_buf_debug_fops); > if (IS_ERR(d)) { > pr_debug("dma_buf: debugfs: failed to create node bufinfo\n"); > -- Pushed V2 here. Any further comment on this ? Thanks, Pintu

1 year, 4 months

1
0
0 0

Re: [PATCH 2/3] dma-buf/heaps: replace kmap_atomic with kmap_local_page

by T.J. Mercier

On Tue, Oct 1, 2024 at 7:51 PM Pintu Kumar <quic_pintu(a)quicinc.com> wrote: > > Use of kmap_atomic/kunmap_atomic is deprecated, use > kmap_local_page/kunmap_local instead. > > This is reported by checkpatch. > Also fix repeated word issue. > > WARNING: Deprecated use of 'kmap_atomic', prefer 'kmap_local_page' instead > + void *vaddr = kmap_atomic(page); > > WARNING: Deprecated use of 'kunmap_atomic', prefer 'kunmap_local' instead > + kunmap_atomic(vaddr); > > WARNING: Possible repeated word: 'by' > + * has been killed by by SIGKILL > > total: 0 errors, 3 warnings, 405 lines checked > > Signed-off-by: Pintu Kumar <quic_pintu(a)quicinc.com> Reviewed-by: T.J. Mercier <tjmercier(a)google.com> The Android kernels have been doing this for over a year, so should be pretty well tested at this point: https://r.android.com/c/kernel/common/+/2500840 > --- > drivers/dma-buf/heaps/cma_heap.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c > index 93be88b805fe..8c55431cc16c 100644 > --- a/drivers/dma-buf/heaps/cma_heap.c > +++ b/drivers/dma-buf/heaps/cma_heap.c > @@ -309,13 +309,13 @@ static struct dma_buf *cma_heap_allocate(struct dma_heap *heap, > struct page *page = cma_pages; > > while (nr_clear_pages > 0) { > - void *vaddr = kmap_atomic(page); > + void *vaddr = kmap_local_page(page); > > memset(vaddr, 0, PAGE_SIZE); > - kunmap_atomic(vaddr); > + kunmap_local(vaddr); > /* > * Avoid wasting time zeroing memory if the process > - * has been killed by by SIGKILL > + * has been killed by SIGKILL. > */ > if (fatal_signal_pending(current)) > goto free_cma; > -- > 2.17.1 >

1 year, 4 months

2
1
0 0

Re: [PATCH v1 01/10] dt-bindings: media: mediatek: add camsys device

by Rob Herring

On Wed, Oct 09, 2024 at 07:15:42PM +0800, Shu-hsiang Yang wrote: > 1. Add camera isp7x module device document > 2. Add camera interface device document > > Signed-off-by: Shu-hsiang Yang <Shu-hsiang.Yang(a)mediatek.com> > --- > .../media/mediatek/mediatek,cam-raw.yaml | 169 ++++++++++++++++++ > .../media/mediatek/mediatek,cam-yuv.yaml | 148 +++++++++++++++ > .../media/mediatek/mediatek,camisp.yaml | 71 ++++++++ > .../media/mediatek/mediatek,seninf-core.yaml | 106 +++++++++++ > .../media/mediatek/mediatek,seninf.yaml | 88 +++++++++ > 5 files changed, 582 insertions(+) > create mode 100644 Documentation/devicetree/bindings/media/mediatek/mediatek,cam-raw.yaml > create mode 100644 Documentation/devicetree/bindings/media/mediatek/mediatek,cam-yuv.yaml > create mode 100644 Documentation/devicetree/bindings/media/mediatek/mediatek,camisp.yaml > create mode 100644 Documentation/devicetree/bindings/media/mediatek/mediatek,seninf-core.yaml > create mode 100644 Documentation/devicetree/bindings/media/mediatek/mediatek,seninf.yaml > > diff --git a/Documentation/devicetree/bindings/media/mediatek/mediatek,cam-raw.yaml b/Documentation/devicetree/bindings/media/mediatek/mediatek,cam-raw.yaml > new file mode 100644 > index 000000000000..c709e4bf0a18 > --- /dev/null > +++ b/Documentation/devicetree/bindings/media/mediatek/mediatek,cam-raw.yaml > @@ -0,0 +1,169 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +# Copyright (c) 2024 MediaTek Inc. > + > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/media/mediatek/mediatek,cam-raw.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: The cam-raw unit of MediaTek ISP system > + > +maintainers: > + - Shu-hsiang Yang <shu-hsiang.yang(a)mediatek.com> > + - Shun-yi Wang <shun-yi.wang(a)mediatek.com> > + - Teddy Chen <teddy.chen(a)mediatek.com> > + > +description: > + MediaTek cam-raw is the camera RAW processing unit in MediaTek SoC. > + > +properties: > + compatible: > + const: mediatek,cam-raw Compatibles should be SoC specific. > + > + "#address-cells": > + const: 2 > + > + "#size-cells": > + const: 2 Don't need these. You don't have any child nodes with addresses. > + > + reg: > + items: > + minItems: 2 > + maxItems: 4 Drop items. > + minItems: 1 > + maxItems: 2 You have to specify what each region is. Why does it vary? A h/w block either has register region or it doesn't. > + > + reg-names: > + minItems: 1 > + maxItems: 2 You must define the names. > + > + mediatek,cam-id: > + description: > + Describes the index of MediaTek cam-raw unit for ISP > + $ref: /schemas/types.yaml#/definitions/uint32 > + enum: [0, 1, 2] No, we don't put module indices in DT. > + > + mediatek,larbs: > + description: > + Describes MediaTek bus infrastructure unit for ISP system. > + List of phandle to the local arbiters in the current SoCs. > + Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. > + $ref: /schemas/types.yaml#/definitions/phandle-array > + minItems: 1 > + maxItems: 32 > + > + interrupts: > + minItems: 1 > + > + dma-ranges: > + description: > + Describes the address information of IOMMU mapping to memory. > + Defines six fields for the MediaTek IOMMU extended iova, pa, and size. > + minItems: 1 > + > + power-domains: > + minItems: 1 > + > + clocks: > + minItems: 4 > + maxItems: 16 You have to define what the clocks are. > + > + clock-names: > + minItems: 4 > + maxItems: 16 > + > + assigned-clocks: > + maxItems: 1 > + > + assigned-clock-parents: > + maxItems: 1 Drop. You don't need to document assigned-clocks. > + > + iommus: > + description: > + Points to the respective IOMMU block with master port as argument, see > + Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml for details. > + Ports are according to the HW. > + minItems: 1 > + maxItems: 32 > + > +required: > + - compatible > + - reg > + - reg-names > + - interrupts > + - power-domains > + - clocks > + - clock-names > + - iommus > + > +additionalProperties: false > + > +examples: > + - | > + #include <dt-bindings/interrupt-controller/arm-gic.h> > + #include <dt-bindings/interrupt-controller/irq.h> > + #include <dt-bindings/power/mediatek,mt8188-power.h> > + #include <dt-bindings/clock/mediatek,mt8188-clk.h> > + #include <dt-bindings/memory/mediatek,mt8188-memory-port.h> > + > + soc { > + #address-cells = <2>; > + #size-cells = <2>; > + > + cam_raw_a@16030000 { > + compatible = "mediatek,cam-raw"; > + reg = <0 0x16030000 0 0x8000>, > + <0 0x16038000 0 0x8000>; > + reg-names = "base", "inner_base"; > + mediatek,cam-id = <0>; > + mediatek,larbs = <&larb16a>; > + interrupts = <GIC_SPI 300 IRQ_TYPE_LEVEL_HIGH 0>; > + #address-cells = <2>; > + #size-cells = <2>; > + dma-ranges = <0x2 0x0 0x0 0x40000000 0x1 0x0>; > + power-domains = <&spm MT8188_POWER_DOMAIN_CAM_SUBA>; > + clocks = <&camsys CLK_CAM_MAIN_CAM2MM0_GALS>, > + <&camsys CLK_CAM_MAIN_CAM2MM1_GALS>, > + <&camsys CLK_CAM_MAIN_CAM2SYS_GALS>, > + <&camsys CLK_CAM_MAIN_CAM>, > + <&camsys CLK_CAM_MAIN_CAMTG>, > + <&camsys_rawa CLK_CAM_RAWA_LARBX>, > + <&camsys_rawa CLK_CAM_RAWA_CAM>, > + <&camsys_rawa CLK_CAM_RAWA_CAMTG>, > + <&topckgen CLK_TOP_CAM>, > + <&topckgen CLK_TOP_CAMTG>, > + <&topckgen CLK_TOP_CAMTM>; > + clock-names = "camsys_cam2mm0_cgpdn", > + "camsys_cam2mm1_cgpdn", > + "camsys_cam2sys_cgpdn", > + "camsys_cam_cgpdn", > + "camsys_camtg_cgpdn", > + "camsys_rawa_larbx_cgpdn", > + "camsys_rawa_cam_cgpdn", > + "camsys_rawa_camtg_cgpdn", > + "topckgen_top_cam", > + "topckgen_top_camtg", > + "topckgen_top_camtm"; > + assigned-clocks = <&topckgen CLK_TOP_CAM>; > + assigned-clock-parents = <&topckgen CLK_TOP_UNIVPLL_D5>; > + iommus = <&vpp_iommu M4U_PORT_L16A_IMGO_R1>, > + <&vpp_iommu M4U_PORT_L16A_CQI_R1>, > + <&vpp_iommu M4U_PORT_L16A_CQI_R2>, > + <&vpp_iommu M4U_PORT_L16A_BPCI_R1>, > + <&vpp_iommu M4U_PORT_L16A_LSCI_R1>, > + <&vpp_iommu M4U_PORT_L16A_RAWI_R2>, > + <&vpp_iommu M4U_PORT_L16A_RAWI_R3>, > + <&vpp_iommu M4U_PORT_L16A_UFDI_R2>, > + <&vpp_iommu M4U_PORT_L16A_UFDI_R3>, > + <&vpp_iommu M4U_PORT_L16A_RAWI_R4>, > + <&vpp_iommu M4U_PORT_L16A_RAWI_R5>, > + <&vpp_iommu M4U_PORT_L16A_AAI_R1>, > + <&vpp_iommu M4U_PORT_L16A_UFDI_R5>, > + <&vpp_iommu M4U_PORT_L16A_FHO_R1>, > + <&vpp_iommu M4U_PORT_L16A_AAO_R1>, > + <&vpp_iommu M4U_PORT_L16A_TSFSO_R1>, > + <&vpp_iommu M4U_PORT_L16A_FLKO_R1>; > + }; > + }; > + > +... > diff --git a/Documentation/devicetree/bindings/media/mediatek/mediatek,cam-yuv.yaml b/Documentation/devicetree/bindings/media/mediatek/mediatek,cam-yuv.yaml > new file mode 100644 > index 000000000000..30dfd5e5ecb1 > --- /dev/null > +++ b/Documentation/devicetree/bindings/media/mediatek/mediatek,cam-yuv.yaml Similar comments on the rest. Rob

1 year, 4 months

1
0
0 0

Re: [PATCH v1 02/10] media: platform: mediatek: add seninf controller

by AngeloGioacchino Del Regno

Il 09/10/24 13:15, Shu-hsiang Yang ha scritto: > Introduces support for the sensor interface in the MediaTek SoC, > with the focus on CSI and stream control. The key functionalities > include parameter control, metering and maintaining status information, > interrupt handling, and debugging. These features ensure effective > management and debugging of the camera sensor interface hardware. > > Signed-off-by: Shu-hsiang Yang <Shu-hsiang.Yang(a)mediatek.com> > --- > .../isp_7x/camsys/mtk_csi_phy_2_0/Makefile | 5 + > .../mtk_csi_phy_2_0/mtk_cam-seninf-cammux.h | 911 ++++++ > .../mtk_cam-seninf-csi0-cphy.h | 69 + > .../mtk_cam-seninf-csi0-dphy.h | 139 + > .../mtk_cam-seninf-hw_phy_2_0.c | 2879 +++++++++++++++++ > .../mtk_cam-seninf-mipi-rx-ana-cdphy-csi0a.h | 257 ++ > .../mtk_cam-seninf-seninf1-csi2.h | 415 +++ > .../mtk_cam-seninf-seninf1-mux.h | 147 + > .../mtk_csi_phy_2_0/mtk_cam-seninf-seninf1.h | 47 + > .../mtk_csi_phy_2_0/mtk_cam-seninf-tg1.h | 49 + > .../mtk_csi_phy_2_0/mtk_cam-seninf-top-ctrl.h | 99 + > 11 files changed, 5017 insertions(+) > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/Makefile The PHY driver should go to the PHY directory as a PHY driver - or at least part of this mtk_csi_phy_2_0 driver should go there. I see that this is tightly integrated with the rest of the code in seninf, but there seems to be many functions that are just handling a "real" PHY. > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-cammux.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-csi0-cphy.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-csi0-dphy.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-hw_phy_2_0.c > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-mipi-rx-ana-cdphy-csi0a.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-seninf1-csi2.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-seninf1-mux.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-seninf1.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-tg1.h > create mode 100644 drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-top-ctrl.h > > diff --git a/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/Makefile b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/Makefile > new file mode 100644 > index 000000000000..e00b8d3904a9 > --- /dev/null > +++ b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/Makefile > @@ -0,0 +1,5 @@ > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2022 MediaTek Inc. > + > +mtk-cam-isp-objs += \ > + mtk_csi_phy_2_0/mtk_cam-seninf-hw_phy_2_0.o > diff --git a/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-cammux.h b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-cammux.h > new file mode 100644 > index 000000000000..ec3c621d742a > --- /dev/null > +++ b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-cammux.h > @@ -0,0 +1,911 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (c) 2022 MediaTek Inc. > + */ > + > +#ifndef __SENINF_CAM_MUX_H__ > +#define __SENINF_CAM_MUX_H__ > + > +#define SENINF_CAM_MUX_CTRL_0 0x0000 > +#define RG_SENINF_CAM_MUX0_SRC_SEL_SHIFT 0 > +#define RG_SENINF_CAM_MUX0_SRC_SEL_MASK (0xf << 0) For all definitions: drop _SHIFT and use the GENMASK(x,y) macro. #define RG_SENINF_CAM_MUX0_SRC_SEL GENMASK(7, 0) #define RG_SENINF_CAM_MUX1_SRC_SEL GENMASK(11, 8) etc. > +#define RG_SENINF_CAM_MUX1_SRC_SEL_SHIFT 8 > +#define RG_SENINF_CAM_MUX1_SRC_SEL_MASK (0xf << 8) > +#define RG_SENINF_CAM_MUX2_SRC_SEL_SHIFT 16 > +#define RG_SENINF_CAM_MUX2_SRC_SEL_MASK (0xf << 16) > +#define RG_SENINF_CAM_MUX3_SRC_SEL_SHIFT 24 > +#define RG_SENINF_CAM_MUX3_SRC_SEL_MASK (0xf << 24) ..snip.. > diff --git a/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-hw_phy_2_0.c b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-hw_phy_2_0.c > new file mode 100644 > index 000000000000..f24d8a056d0e > --- /dev/null > +++ b/drivers/media/platform/mediatek/isp/isp_7x/camsys/mtk_csi_phy_2_0/mtk_cam-seninf-hw_phy_2_0.c > @@ -0,0 +1,2879 @@ > +// SPDX-License-Identifier: GPL-2.0 > +// Copyright (c) 2022 MediaTek Inc. > + > +#include <linux/module.h> > +#include <linux/delay.h> > + > +#include "../mtk_cam-seninf.h" > +#include "../mtk_cam-seninf-hw.h" > +#include "../mtk_cam-seninf-regs.h" > +#include "mtk_cam-seninf-top-ctrl.h" > +#include "mtk_cam-seninf-seninf1-mux.h" > +#include "mtk_cam-seninf-seninf1.h" > +#include "mtk_cam-seninf-seninf1-csi2.h" > +#include "mtk_cam-seninf-tg1.h" > +#include "mtk_cam-seninf-cammux.h" > +#include "mtk_cam-seninf-mipi-rx-ana-cdphy-csi0a.h" > +#include "mtk_cam-seninf-csi0-cphy.h" > +#include "mtk_cam-seninf-csi0-dphy.h" > +#include "../kd_imgsensor_define_v4l2.h" > + > +/* seninf cfg default, dts may override */ dts may override: for which reason? Why should DT override that? For which usecase? > +static struct mtk_cam_seninf_cfg _seninf_cfg = { > + .mux_num = 8, > + .seninf_num = 4, > + .cam_mux_num = 11, > + .pref_mux_num = 11, > +}; > + > +struct mtk_cam_seninf_cfg *g_seninf_cfg = &_seninf_cfg; That's unused. Drop. > + > +static inline void mtk_cam_seninf_set_di_ch_ctrl(void __iomem *pseninf, > + unsigned int stream_id, > + struct seninf_vc *vc) > +{ > + if (stream_id > 7) No magic numbers, please. #define SOMETHING 7 > + return; > + > + SENINF_BITS(pseninf, SENINF_CSI2_S0_DI_CTRL + (stream_id << 0x2), Is it me, or is SENINF_BITS() not defined?! > + RG_CSI2_DT_SEL, vc->dt); > + SENINF_BITS(pseninf, SENINF_CSI2_S0_DI_CTRL + (stream_id << 0x2), > + RG_CSI2_VC_SEL, vc->vc); > + SENINF_BITS(pseninf, SENINF_CSI2_S0_DI_CTRL + (stream_id << 0x2), > + RG_CSI2_DT_INTERLEAVE_MODE, 1); > + SENINF_BITS(pseninf, SENINF_CSI2_S0_DI_CTRL + (stream_id << 0x2), > + RG_CSI2_VC_INTERLEAVE_EN, 1); > + ..snip.. > +} > + > +int mtk_cam_seninf_init_iomem(struct seninf_ctx *ctx, void __iomem *if_base, > + void __iomem *ana_base) > +{ > + u32 i; > + > + ctx->reg_ana_csi_rx[CSI_PORT_0] = Please don't use "a = b = c" assignments. In this case, that impacts on human readability. > + ctx->reg_ana_csi_rx[CSI_PORT_0A] = ana_base + 0; > + ctx->reg_ana_csi_rx[CSI_PORT_0B] = ana_base + 0x1000; Again, no magic numbers please. #define SOMETHING 0x1000 > + > + ctx->reg_ana_csi_rx[CSI_PORT_1] = > + ctx->reg_ana_csi_rx[CSI_PORT_1A] = ana_base + 0x4000; > + ctx->reg_ana_csi_rx[CSI_PORT_1B] = ana_base + 0x5000; ..snip.. > + return 0; > +} > + > +int mtk_cam_seninf_init_port(struct seninf_ctx *ctx, int port) > +{ > + u32 port_num; > + > + if (port >= CSI_PORT_0A) > + port_num = (port - CSI_PORT_0) >> 1; I think that you really want to use bitfield.h macros to simplify this. > + else > + port_num = port; > + > + ctx->port = port; > + ctx->port_num = port_num; > + ctx->port_a = CSI_PORT_0A + (port_num << 1); > + ctx->port_b = ctx->port_a + 1; > + ctx->is_4d1c = (port == port_num); > + > + switch (port) { > + case CSI_PORT_0: > + ctx->seninf_idx = SENINF_1; Is the CSI port to SENINF mapping supposed to be static and unchangeable? If yes, then you want to add that into an array indexed by CSI_PORT_xx, so that you end up doing ctx->seninf_idx = seninf_csi[port]; > + break; > + case CSI_PORT_0A: > + ctx->seninf_idx = SENINF_1; > + break; > + case CSI_PORT_0B: > + ctx->seninf_idx = SENINF_2; > + break; > + case CSI_PORT_1: > + ctx->seninf_idx = SENINF_3; > + break; > + case CSI_PORT_1A: > + ctx->seninf_idx = SENINF_3; > + break; > + case CSI_PORT_1B: > + ctx->seninf_idx = SENINF_4; > + break; > + case CSI_PORT_2: > + ctx->seninf_idx = SENINF_5; > + break; > + case CSI_PORT_2A: > + ctx->seninf_idx = SENINF_5; > + break; > + case CSI_PORT_2B: > + ctx->seninf_idx = SENINF_6; > + break; > + case CSI_PORT_3: > + ctx->seninf_idx = SENINF_7; > + break; > + case CSI_PORT_3A: > + ctx->seninf_idx = SENINF_7; > + break; > + case CSI_PORT_3B: > + ctx->seninf_idx = SENINF_8; > + break; > + default: > + dev_dbg(ctx->dev, "invalid port %d\n", port); > + return -EINVAL; > + } > + > + return 0; > +} > + > +int mtk_cam_seninf_is_cammux_used(struct seninf_ctx *ctx, int cam_mux) > +{ > + void __iomem *seninf_cam_mux = ctx->reg_if_cam_mux; > + u32 temp = SENINF_READ_REG(seninf_cam_mux, SENINF_CAM_MUX_EN); Please, just use readl() or regmap_read() (preferrably the latter, and preferrably also define regmap reg fields). > + > + if (cam_mux >= _seninf_cfg.cam_mux_num) { > + dev_dbg(ctx->dev, > + "%s err cam_mux %d >= SENINF_CAM_MUX_NUM %d\n", > + __func__, cam_mux, _seninf_cfg.cam_mux_num); That shouldn't ever happen, should it. In that case, it's dev_err(), or even a WARN(), if necessary. > + > + return 0; > + } > + > + return !!(temp & (1 << cam_mux)); > +} > + > +int mtk_cam_seninf_cammux(struct seninf_ctx *ctx, int cam_mux) > +{ > + void __iomem *seninf_cam_mux = ctx->reg_if_cam_mux; > + u32 temp; > + > + if (cam_mux >= _seninf_cfg.cam_mux_num) { This check is duplicated across many functions... you want to do something such that it either never happens or it gets commonized. > + dev_dbg(ctx->dev, > + "%s err cam_mux %d >= SENINF_CAM_MUX_NUM %d\n", > + __func__, cam_mux, _seninf_cfg.cam_mux_num); > + > + return 0; You return 0, so there's no problem if this is the wrong cam_mux?!?!? > + } > + > + temp = SENINF_READ_REG(seninf_cam_mux, SENINF_CAM_MUX_EN); > + SENINF_WRITE_REG(seninf_cam_mux, SENINF_CAM_MUX_EN, > + temp | (1 << cam_mux)); BIT(cam_mux) > + > + SENINF_WRITE_REG(seninf_cam_mux, SENINF_CAM_MUX_IRQ_STATUS, > + 3 << (cam_mux * 2)); /* clr irq */ > + > + dev_dbg(ctx->dev, "cam_mux %d EN 0x%x IRQ_EN 0x%x IRQ_STATUS 0x%x\n", > + cam_mux, SENINF_READ_REG(seninf_cam_mux, SENINF_CAM_MUX_EN), > + SENINF_READ_REG(seninf_cam_mux, SENINF_CAM_MUX_IRQ_EN), > + SENINF_READ_REG(seninf_cam_mux, SENINF_CAM_MUX_IRQ_STATUS)); > + > + return 0; > +} > + > + > +int mtk_cam_seninf_set_vc(struct seninf_ctx *ctx, u32 seninf_idx, > + struct seninf_vcinfo *vcinfo) > +{ > + void __iomem *seninf_csi2 = ctx->reg_if_csi2[seninf_idx]; > + int i; > + struct seninf_vc *vc; > + > + if (!vcinfo || !vcinfo->cnt) > + return 0; Does !vcinfo mean that Virtual Channel should not be set? Please add a comment explaining that. > + > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S0_DI_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S1_DI_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S2_DI_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S3_DI_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S4_DI_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S5_DI_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S6_DI_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_S7_DI_CTRL, 0); > + > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_CH0_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_CH1_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_CH2_CTRL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_CH3_CTRL, 0); > + > + for (i = 0; i < vcinfo->cnt; i++) { > + vc = &vcinfo->vc[i]; > + > + /* General Long Packet Data Types: 0x10-0x17 */ > + if (vc->dt >= 0x10 && vc->dt <= 0x17) { > + SENINF_BITS(seninf_csi2, SENINF_CSI2_OPT, > + RG_CSI2_GENERIC_LONG_PACKET_EN, 1); > + } > + > + mtk_cam_seninf_set_di_ch_ctrl(seninf_csi2, i, vc); > + } > + > + dev_dbg(ctx->dev, "DI_CTRL 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x\n", > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S0_DI_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S1_DI_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S2_DI_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S3_DI_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S4_DI_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S5_DI_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S6_DI_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_S7_DI_CTRL)); > + > + dev_dbg(ctx->dev, "CH_CTRL 0x%x 0x%x 0x%x 0x%x\n", > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_CH0_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_CH1_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_CH2_CTRL), > + SENINF_READ_REG(seninf_csi2, SENINF_CSI2_CH3_CTRL)); > + > + return 0; > +} > + ..snip.. > + > +static int csirx_phyA_power_on(struct seninf_ctx *ctx, u32 port_idx, int en) Please, lower case on all functions. csirx_phy_a_power_on(). Also, you want to split this function in two: one should be csirx_phy_a_power_off() and one for power_on(). static void csirx_phy_a_power_off(struct seninf_ctx *ctx, u32 port_idx) { > +{ > + void __iomem *base = ctx->reg_ana_csi_rx[port_idx]; > + > + SENINF_BITS(base, CDPHY_RX_ANA_8, RG_CSI0_L0_T0AB_EQ_OS_CAL_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_8, RG_CSI0_L1_T1AB_EQ_OS_CAL_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_8, RG_CSI0_L2_T1BC_EQ_OS_CAL_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_8, RG_CSI0_XX_T0BC_EQ_OS_CAL_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_8, RG_CSI0_XX_T0CA_EQ_OS_CAL_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_8, RG_CSI0_XX_T1CA_EQ_OS_CAL_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, RG_CSI0_BG_LPF_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, RG_CSI0_BG_CORE_EN, 0); > + usleep_range(200, 300); > + } static int csirx_phy_a_power_on(struct seninf_ctx *ctx, u32 port_idx) <-- NO int en { void *iomem base = ..... /* Power it off first to ... reset, I believe? */ csirx_phy_a_power_off(); power on sequence here return 0; } > + if (en) { > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_BG_CORE_EN, 1); > + usleep_range(30, 40); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_BG_LPF_EN, 1); > + udelay(1); > + SENINF_BITS(base, CDPHY_RX_ANA_8, > + RG_CSI0_L0_T0AB_EQ_OS_CAL_EN, 1); > + SENINF_BITS(base, CDPHY_RX_ANA_8, > + RG_CSI0_L1_T1AB_EQ_OS_CAL_EN, 1); > + SENINF_BITS(base, CDPHY_RX_ANA_8, > + RG_CSI0_L2_T1BC_EQ_OS_CAL_EN, 1); > + SENINF_BITS(base, CDPHY_RX_ANA_8, > + RG_CSI0_XX_T0BC_EQ_OS_CAL_EN, 1); > + SENINF_BITS(base, CDPHY_RX_ANA_8, > + RG_CSI0_XX_T0CA_EQ_OS_CAL_EN, 1); > + SENINF_BITS(base, CDPHY_RX_ANA_8, > + RG_CSI0_XX_T1CA_EQ_OS_CAL_EN, 1); > + udelay(1); > + } > + > + return 0; > +} > + > +static int apply_efuse_data(struct seninf_ctx *ctx) > +{ void __iomem *base; u32 m_csi_efuse = ctx->m_csi_efuse; u32 port; int ret; > + int ret = 0; > + void __iomem *base; > + u32 port; > + u32 m_csi_efuse = ctx->m_csi_efuse; > + > + if (m_csi_efuse == 0) { > + dev_dbg(ctx->dev, "No efuse data. Returned.\n"); > + return -1; > + } ..snip.. > + > + return ret; return 0; > +} > + > +static int csirx_phyA_init(struct seninf_ctx *ctx) > +{ > + u32 i, port; > + void __iomem *base; > + > + port = ctx->port; > + for (i = 0; i <= ctx->is_4d1c; i++) { Move the setup in a different function, then call it with something(port); if (is_4d1c) something(port_b); > + port = i ? ctx->port_b : ctx->port; > + base = ctx->reg_ana_csi_rx[port]; > + SENINF_BITS(base, CDPHY_RX_ANA_1, > + RG_CSI0_BG_LPRX_VTL_SEL, 0x4); > + SENINF_BITS(base, CDPHY_RX_ANA_1, > + RG_CSI0_BG_LPRX_VTH_SEL, 0x4); > + SENINF_BITS(base, CDPHY_RX_ANA_2, > + RG_CSI0_BG_ALP_RX_VTL_SEL, 0x4); > + SENINF_BITS(base, CDPHY_RX_ANA_2, > + RG_CSI0_BG_ALP_RX_VTH_SEL, 0x4); > + SENINF_BITS(base, CDPHY_RX_ANA_1, > + RG_CSI0_BG_VREF_SEL, 0x8); > + SENINF_BITS(base, CDPHY_RX_ANA_1, > + RG_CSI0_CDPHY_EQ_DES_VREF_SEL, 0x2); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_BW, 0x3); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_IS, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_LATCH_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG0_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG1_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR0, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR1, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_9, > + RG_CSI0_RESERVE, 0x3003); > + SENINF_BITS(base, CDPHY_RX_ANA_SETTING_0, > + CSR_CSI_RST_MODE, 0x2); > + > + SENINF_BITS(base, CDPHY_RX_ANA_2, > + RG_CSI0_L0P_T0A_HSRT_CODE, 0x10); > + SENINF_BITS(base, CDPHY_RX_ANA_2, > + RG_CSI0_L0N_T0B_HSRT_CODE, 0x10); > + SENINF_BITS(base, CDPHY_RX_ANA_3, > + RG_CSI0_L1P_T0C_HSRT_CODE, 0x10); > + SENINF_BITS(base, CDPHY_RX_ANA_3, > + RG_CSI0_L1N_T1A_HSRT_CODE, 0x10); > + SENINF_BITS(base, CDPHY_RX_ANA_4, > + RG_CSI0_L2P_T1B_HSRT_CODE, 0x10); > + SENINF_BITS(base, CDPHY_RX_ANA_4, > + RG_CSI0_L2N_T1C_HSRT_CODE, 0x10); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_T0_CDR_FIRST_EDGE_EN, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_T1_CDR_FIRST_EDGE_EN, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_2, > + RG_CSI0_CPHY_T0_CDR_SELF_CAL_EN, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_2, > + RG_CSI0_CPHY_T1_CDR_SELF_CAL_EN, 0x0); > + > + SENINF_BITS(base, CDPHY_RX_ANA_6, > + RG_CSI0_CPHY_T0_CDR_CK_DELAY, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_7, > + RG_CSI0_CPHY_T1_CDR_CK_DELAY, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_6, > + RG_CSI0_CPHY_T0_CDR_AB_WIDTH, 0x9); > + SENINF_BITS(base, CDPHY_RX_ANA_6, > + RG_CSI0_CPHY_T0_CDR_BC_WIDTH, 0x9); > + SENINF_BITS(base, CDPHY_RX_ANA_6, > + RG_CSI0_CPHY_T0_CDR_CA_WIDTH, 0x9); > + SENINF_BITS(base, CDPHY_RX_ANA_7, > + RG_CSI0_CPHY_T1_CDR_AB_WIDTH, 0x9); > + SENINF_BITS(base, CDPHY_RX_ANA_7, > + RG_CSI0_CPHY_T1_CDR_BC_WIDTH, 0x9); > + SENINF_BITS(base, CDPHY_RX_ANA_7, > + RG_CSI0_CPHY_T1_CDR_CA_WIDTH, 0x9); > + > + dev_dbg(ctx->dev, "port:%d CDPHY_RX_ANA_0(0x%x)\n", > + port, SENINF_READ_REG(base, CDPHY_RX_ANA_0)); > + } > + > + apply_efuse_data(ctx); > + > + return 0; > +} > + > +static int csirx_dphy_init(struct seninf_ctx *ctx) > +{ > + void __iomem *base = ctx->reg_ana_dphy_top[ctx->port]; > + int settle_delay_dt, settle_delay_ck, hs_trail, hs_trail_en; > + int bit_per_pixel; u8 bits_per_pixel; > + u64 data_rate; > + > + settle_delay_dt = ctx->is_cphy ? ctx->core->cphy_settle_delay_dt : > + ctx->core->dphy_settle_delay_dt; Please set settle_delay_ck and hs_trail here. > + > + SENINF_BITS(base, DPHY_RX_DATA_LANE0_HS_PARAMETER, > + RG_CDPHY_RX_LD0_TRIO0_HS_SETTLE_PARAMETER, > + settle_delay_dt); > + SENINF_BITS(base, DPHY_RX_DATA_LANE1_HS_PARAMETER, > + RG_CDPHY_RX_LD1_TRIO1_HS_SETTLE_PARAMETER, > + settle_delay_dt); > + SENINF_BITS(base, DPHY_RX_DATA_LANE2_HS_PARAMETER, > + RG_CDPHY_RX_LD2_TRIO2_HS_SETTLE_PARAMETER, > + settle_delay_dt); > + SENINF_BITS(base, DPHY_RX_DATA_LANE3_HS_PARAMETER, > + RG_CDPHY_RX_LD3_TRIO3_HS_SETTLE_PARAMETER, > + settle_delay_dt); > + > + settle_delay_ck = ctx->core->settle_delay_ck; > + > + SENINF_BITS(base, DPHY_RX_CLOCK_LANE0_HS_PARAMETER, > + RG_DPHY_RX_LC0_HS_SETTLE_PARAMETER, > + settle_delay_ck); > + SENINF_BITS(base, DPHY_RX_CLOCK_LANE1_HS_PARAMETER, > + RG_DPHY_RX_LC1_HS_SETTLE_PARAMETER, > + settle_delay_ck); > + > + SENINF_BITS(base, DPHY_RX_DATA_LANE0_HS_PARAMETER, > + RG_CDPHY_RX_LD0_TRIO0_HS_PREPARE_PARAMETER, 2); > + SENINF_BITS(base, DPHY_RX_DATA_LANE1_HS_PARAMETER, > + RG_CDPHY_RX_LD1_TRIO1_HS_PREPARE_PARAMETER, 2); > + SENINF_BITS(base, DPHY_RX_DATA_LANE2_HS_PARAMETER, > + RG_CDPHY_RX_LD2_TRIO2_HS_PREPARE_PARAMETER, 2); > + SENINF_BITS(base, DPHY_RX_DATA_LANE3_HS_PARAMETER, > + RG_CDPHY_RX_LD3_TRIO3_HS_PREPARE_PARAMETER, 2); > + > + hs_trail = ctx->hs_trail_parameter; > + > + SENINF_BITS(base, DPHY_RX_DATA_LANE0_HS_PARAMETER, > + RG_DPHY_RX_LD0_HS_TRAIL_PARAMETER, hs_trail); > + SENINF_BITS(base, DPHY_RX_DATA_LANE1_HS_PARAMETER, > + RG_DPHY_RX_LD1_HS_TRAIL_PARAMETER, hs_trail); > + SENINF_BITS(base, DPHY_RX_DATA_LANE2_HS_PARAMETER, > + RG_DPHY_RX_LD2_HS_TRAIL_PARAMETER, hs_trail); > + SENINF_BITS(base, DPHY_RX_DATA_LANE3_HS_PARAMETER, > + RG_DPHY_RX_LD3_HS_TRAIL_PARAMETER, hs_trail); > + > + if (!ctx->is_cphy) { > + bit_per_pixel = 10; data_rate = ctx->customized_pixel_rate ? ctx->customized_pixel_rate : ctx->mipi_pixel_rate; data_rate *= bits_per_pixel; > + if (ctx->customized_pixel_rate != 0) > + data_rate = ctx->customized_pixel_rate * bit_per_pixel; > + else > + data_rate = ctx->mipi_pixel_rate * bit_per_pixel; > + > + do_div(data_rate, ctx->num_data_lanes); > + hs_trail_en = data_rate < 1400000000; > + SENINF_BITS(base, DPHY_RX_DATA_LANE0_HS_PARAMETER, > + RG_DPHY_RX_LD0_HS_TRAIL_EN, hs_trail_en); > + SENINF_BITS(base, DPHY_RX_DATA_LANE1_HS_PARAMETER, > + RG_DPHY_RX_LD1_HS_TRAIL_EN, hs_trail_en); > + SENINF_BITS(base, DPHY_RX_DATA_LANE2_HS_PARAMETER, > + RG_DPHY_RX_LD2_HS_TRAIL_EN, hs_trail_en); > + SENINF_BITS(base, DPHY_RX_DATA_LANE3_HS_PARAMETER, > + RG_DPHY_RX_LD3_HS_TRAIL_EN, hs_trail_en); > + } > + > + return 0; > +} > + > +static int csirx_cphy_init(struct seninf_ctx *ctx) > +{ > + void __iomem *base = ctx->reg_ana_cphy_top[ctx->port]; > + > + SENINF_BITS(base, CPHY_RX_DETECT_CTRL_POST, > + RG_CPHY_RX_DATA_VALID_POST_EN, 1); > + > + return 0; > +} > + > +static int csirx_phy_init(struct seninf_ctx *ctx) > +{ > + csirx_phyA_init(ctx); > + > + csirx_dphy_init(ctx); > + csirx_cphy_init(ctx); > + > + return 0; > +} > + > +static int csirx_seninf_csi2_setting(struct seninf_ctx *ctx) > +{ > + void __iomem *seninf_csi2 = ctx->reg_if_csi2[ctx->seninf_idx]; > + int csi_en; > + > + SENINF_BITS(seninf_csi2, SENINF_CSI2_DBG_CTRL, > + RG_CSI2_DBG_PACKET_CNT_EN, 1); > + > + /* lane/trio count */ > + SENINF_BITS(seninf_csi2, SENINF_CSI2_RESYNC_MERGE_CTRL, > + RG_CSI2_RESYNC_CYCLE_CNT_OPT, 1); > + > + csi_en = (1 << ctx->num_data_lanes) - 1; csi_en = BIT(ctx->num_data_lanes) - 1; or csi_en = GENMASK(ctx->num_data_lanes - 1, 0); > + > + if (!ctx->is_cphy) { > + SENINF_BITS(seninf_csi2, SENINF_CSI2_OPT, RG_CSI2_CPHY_SEL, 0); > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_EN, csi_en); > + SENINF_BITS(seninf_csi2, SENINF_CSI2_HDR_MODE_0, > + RG_CSI2_HEADER_MODE, 0); > + SENINF_BITS(seninf_csi2, SENINF_CSI2_HDR_MODE_0, > + RG_CSI2_HEADER_LEN, 0); > + } else { > + u8 map_hdr_len[] = {0, 1, 2, 4, 5}; u8 map_hdr_len[] = { 0, 1, 2, 4, 5 }; > + > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_EN, csi_en); > + SENINF_BITS(seninf_csi2, SENINF_CSI2_OPT, > + RG_CSI2_CPHY_SEL, 1); > + SENINF_BITS(seninf_csi2, SENINF_CSI2_HDR_MODE_0, > + RG_CSI2_HEADER_MODE, 2); > + SENINF_BITS(seninf_csi2, SENINF_CSI2_HDR_MODE_0, > + RG_CSI2_HEADER_LEN, > + map_hdr_len[ctx->num_data_lanes]); > + } > + > + return 0; > +} > + > +static int csirx_seninf_setting(struct seninf_ctx *ctx) > +{ > + void __iomem *seninf = ctx->reg_if_ctrl[ctx->seninf_idx]; > + > + /* enable/disable seninf csi2 */ > + SENINF_BITS(seninf, SENINF_CSI2_CTRL, RG_SENINF_CSI2_EN, 1); > + > + /* enable/disable seninf, enable after csi2, testmdl is done */ > + SENINF_BITS(seninf, SENINF_CTRL, SENINF_EN, 1); > + > + return 0; > +} > + > +static int csirx_seninf_top_setting(struct seninf_ctx *ctx) > +{ ..snip.. > + > + /* port operation mode */ > + switch (ctx->port) { > + case CSI_PORT_0: > + case CSI_PORT_0A: > + case CSI_PORT_0B: reg = TOP_PHY_CTRL_CSI0; field_cphy = PHY_SENINF_MUX0_CPHY_EN; field_dphy = PHY_SENINF_MUX0_DPHY_EN; break; case CSI_PORT_1: case CSI_PORT_1A: case CSI_PORT_1B: reg = TOP_PHY_CTRL_CSI1; field_cphy = PHY_SENINF_MUX1_CPHY_EN; field_dphy = PHY_SENINF_MUX1_DPHY_EN; break; .... etc } regmap_write_bits(regmap, reg, field_cphy, ctx->is_cphy); regmap_write_bits(regmap, reg, field_dphy, !ctx->is_cphy); > + if (!ctx->is_cphy) { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI0, > + PHY_SENINF_MUX0_CPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI0, > + PHY_SENINF_MUX0_DPHY_EN, 1); > + } else { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI0, > + PHY_SENINF_MUX0_DPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI0, > + PHY_SENINF_MUX0_CPHY_EN, 1); > + } > + break; > + case CSI_PORT_1: > + case CSI_PORT_1A: > + case CSI_PORT_1B: > + if (!ctx->is_cphy) { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI1, > + PHY_SENINF_MUX1_CPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI1, > + PHY_SENINF_MUX1_DPHY_EN, 1); > + } else { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI1, > + PHY_SENINF_MUX1_DPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI1, > + PHY_SENINF_MUX1_CPHY_EN, 1); > + } > + break; > + case CSI_PORT_2: > + case CSI_PORT_2A: > + case CSI_PORT_2B: > + if (!ctx->is_cphy) { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI2, > + PHY_SENINF_MUX2_CPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI2, > + PHY_SENINF_MUX2_DPHY_EN, 1); > + } else { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI2, > + PHY_SENINF_MUX2_DPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI2, > + PHY_SENINF_MUX2_CPHY_EN, 1); > + } > + break; > + case CSI_PORT_3: > + case CSI_PORT_3A: > + case CSI_PORT_3B: > + if (!ctx->is_cphy) { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI3, > + PHY_SENINF_MUX3_CPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI3, > + PHY_SENINF_MUX3_DPHY_EN, 1); > + } else { > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI3, > + PHY_SENINF_MUX3_DPHY_EN, 0); > + SENINF_BITS(seninf_top, SENINF_TOP_PHY_CTRL_CSI3, > + PHY_SENINF_MUX3_CPHY_EN, 1); > + } > + break; > + default: > + break; So if this is called with a CSI_PORT that is out of range, we're not erroring out?!?! > + } > + > + return 0; > +} > + > +static int csirx_phyA_setting(struct seninf_ctx *ctx) > +{ > + void __iomem *base, *baseA, *baseB; > + > + base = ctx->reg_ana_csi_rx[ctx->port]; > + baseA = ctx->reg_ana_csi_rx[ctx->port_a]; > + baseB = ctx->reg_ana_csi_rx[ctx->port_b]; > + > + if (!ctx->is_cphy) { /* dphy */ For this huge register write sequence, you want to split it in at least two functions to improve human readability and reduce indentation. > + if (ctx->is_4d1c) { > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_EN, 0); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_EN, 0); > + /* clear clk sel first */ > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKMODE_EN, 0); > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKMODE_EN, 0); > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKMODE_EN, 0); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKMODE_EN, 0); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKMODE_EN, 0); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKMODE_EN, 0); > + > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKSEL, 1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKSEL, 1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKSEL, 1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKSEL, 1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKSEL, 1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKSEL, 1); > + > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKMODE_EN, 0); > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKMODE_EN, 0); > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKMODE_EN, 1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKMODE_EN, 0); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKMODE_EN, 0); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKMODE_EN, 0); > + > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_BW, 0x3); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_IS, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_LATCH_EN, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG0_EN, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG1_EN, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR0, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR1, 0x0); > + > + SENINF_BITS(baseB, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_BW, 0x3); > + SENINF_BITS(baseB, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_IS, 0x1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_LATCH_EN, 0x1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG0_EN, 0x1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG1_EN, 0x1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR0, 0x1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR1, 0x0); > + } else { > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_EN, 0); > + /* clear clk sel first */ > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKMODE_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKMODE_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKMODE_EN, 0); > + > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKSEL, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKSEL, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKSEL, 0); > + > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L0_CKMODE_EN, 0); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L1_CKMODE_EN, 1); > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_DPHY_L2_CKMODE_EN, 0); > + > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_BW, 0x3); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_IS, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_LATCH_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG0_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG1_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR0, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR1, 0x0); > + } > + } else { /* cphy */ > + if (ctx->is_4d1c) { > + SENINF_BITS(baseA, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_EN, 1); > + SENINF_BITS(baseB, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_EN, 1); > + > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_BW, 0x3); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_IS, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_LATCH_EN, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG0_EN, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG1_EN, 0x0); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR0, 0x3); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR1, 0x0); > + > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_BW, 0x3); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_IS, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_LATCH_EN, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG0_EN, 0x1); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG1_EN, 0x0); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR0, 0x3); > + SENINF_BITS(baseA, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR1, 0x0); > + } else { > + SENINF_BITS(base, CDPHY_RX_ANA_0, > + RG_CSI0_CPHY_EN, 1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_BW, 0x3); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_IS, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_LATCH_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG0_EN, 0x1); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_DG1_EN, 0x0); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR0, 0x3); > + SENINF_BITS(base, CDPHY_RX_ANA_5, > + RG_CSI0_CDPHY_EQ_SR1, 0x0); > + } > + } > + > + /* phyA power on */ > + if (ctx->is_4d1c) { > + csirx_phyA_power_on(ctx, ctx->port_a, 1); > + csirx_phyA_power_on(ctx, ctx->port_b, 1); > + } else { > + csirx_phyA_power_on(ctx, ctx->port, 1); > + } > + > + return 0; > +} > + > +static int csirx_dphy_setting(struct seninf_ctx *ctx) > +{ > + void __iomem *base = ctx->reg_ana_dphy_top[ctx->port]; > + > + if (ctx->is_4d1c) { > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD3_SEL, 4); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD2_SEL, 0); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD1_SEL, 3); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD0_SEL, 1); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LC0_SEL, 2); > + > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD0_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD1_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD2_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD3_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LC0_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LC1_EN, 0); > + } else { > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD3_SEL, 5); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD2_SEL, 3); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD1_SEL, 2); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LD0_SEL, 0); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LC1_SEL, 4); > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, RG_DPHY_RX_LC0_SEL, 1); > + > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD0_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD1_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD2_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LD3_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LC0_EN, 1); > + SENINF_BITS(base, DPHY_RX_LANE_EN, DPHY_RX_LC1_EN, 1); > + } > + > + SENINF_BITS(base, DPHY_RX_LANE_SELECT, DPHY_RX_CK_DATA_MUX_EN, 1); > + > + return 0; > +} > + > +static int csirx_cphy_setting(struct seninf_ctx *ctx) > +{ > + void __iomem *base = ctx->reg_ana_cphy_top[ctx->port]; u32 reg_lprx[] = { CPHY_RX_TR0_LPRX_EN, CPHY_RX_TR1_LPRX_EN, ...TR2, ...TR3 }; > + > + switch (ctx->port) { > + case CSI_PORT_0: > + case CSI_PORT_1: > + case CSI_PORT_2: > + case CSI_PORT_3: > + case CSI_PORT_0A: > + case CSI_PORT_1A: > + case CSI_PORT_2A: > + case CSI_PORT_3A: for (i = 0; i < ctx->num_data_lanes; i++) regmap_write_bits(regmap, CPHY_RX_CTRL, reg_lprx[i], 1); for (; i < ARRAY_SIZE(reg_lprx); i++) regmap_write_bits(regmap, CPHY_RX_CTRL, reg_lprx[i], 0); > + if (ctx->num_data_lanes == 3) { > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR0_LPRX_EN, 1); > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR1_LPRX_EN, 1); > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR2_LPRX_EN, 1); > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR3_LPRX_EN, 0); > + } else if (ctx->num_data_lanes == 2) { > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR0_LPRX_EN, 1); > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR1_LPRX_EN, 1); > + } else { > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR0_LPRX_EN, 1); > + } > + break; > + case CSI_PORT_0B: > + case CSI_PORT_1B: > + case CSI_PORT_2B: > + case CSI_PORT_3B: for (i = 0; i < ctx->num_data_lanes; i++) { regmap_write_bits(regmap, CPHY_RX_CTRL, reg_lprx[i], 0); regmap_write_bits(regmap, CPHY_RX_CTRL, reg_lprx[i+2], 1); } > + if (ctx->num_data_lanes == 2) { > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR2_LPRX_EN, 1); > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR3_LPRX_EN, 1); > + } else { > + SENINF_BITS(base, CPHY_RX_CTRL, CPHY_RX_TR2_LPRX_EN, 1); > + } > + break; > + default: > + break; > + } > + > + return 0; > +} > + > +static int csirx_phy_setting(struct seninf_ctx *ctx) > +{ > + csirx_phyA_setting(ctx); > + Please, if (ctx->is_cphy) x() else y() > + if (!ctx->is_cphy) > + csirx_dphy_setting(ctx); > + else > + csirx_cphy_setting(ctx); > + > + return 0; > +} > + > +int mtk_cam_seninf_set_csi_mipi(struct seninf_ctx *ctx) > +{ > + csirx_phy_init(ctx); > + > + /* seninf csi2 */ > + csirx_seninf_csi2_setting(ctx); > + > + /* seninf */ > + csirx_seninf_setting(ctx); > + > + /* seninf top */ > + csirx_seninf_top_setting(ctx); > + > + /* phy */ > + csirx_phy_setting(ctx); > + > + return 0; > +} > + > +int mtk_cam_seninf_poweroff(struct seninf_ctx *ctx) > +{ > + void __iomem *seninf_csi2; > + > + seninf_csi2 = ctx->reg_if_csi2[ctx->seninf_idx]; > + > + SENINF_WRITE_REG(seninf_csi2, SENINF_CSI2_EN, 0x0); > + > + if (ctx->is_4d1c) { > + csirx_phyA_power_on(ctx, ctx->port_a, 0); > + csirx_phyA_power_on(ctx, ctx->port_b, 0); > + } else { > + csirx_phyA_power_on(ctx, ctx->port, 0); > + } > + > + return 0; > +} > + > +int mtk_cam_seninf_reset(struct seninf_ctx *ctx, u32 seninf_idx) > +{ > + int i; > + void __iomem *seninf_mux; > + void __iomem *seninf = ctx->reg_if_ctrl[seninf_idx]; > + > + SENINF_BITS(seninf, SENINF_CSI2_CTRL, SENINF_CSI2_SW_RST, 1); What about adding a reset controller that includes SENINF_CSI(x)_SW_RST? Also, why are you resetting only CSI2 and not the others? > + udelay(1); > + SENINF_BITS(seninf, SENINF_CSI2_CTRL, SENINF_CSI2_SW_RST, 0); > + > + dev_dbg(ctx->dev, "reset seninf %d\n", seninf_idx); > + > + for (i = SENINF_MUX1; i < _seninf_cfg.mux_num; i++) > + if (mtk_cam_seninf_get_top_mux_ctrl(ctx, i) == seninf_idx && > + mtk_cam_seninf_is_mux_used(ctx, i)) { > + seninf_mux = ctx->reg_if_mux[i]; > + SENINF_BITS(seninf_mux, SENINF_MUX_CTRL_0, > + SENINF_MUX_SW_RST, 1); > + udelay(1); > + SENINF_BITS(seninf_mux, SENINF_MUX_CTRL_0, > + SENINF_MUX_SW_RST, 0); > + dev_dbg(ctx->dev, "reset mux %d\n", i); > + } > + > + return 0; > +} > + > +int mtk_cam_seninf_set_idle(struct seninf_ctx *ctx) > +{ > + int i; > + struct seninf_vcinfo *vcinfo = &ctx->vcinfo; > + struct seninf_vc *vc; > + > + for (i = 0; i < vcinfo->cnt; i++) { > + vc = &vcinfo->vc[i]; > + if (vc->enable) { > + mtk_cam_seninf_disable_mux(ctx, vc->mux); > + mtk_cam_seninf_disable_cammux(ctx, vc->cam); > + ctx->pad2cam[vc->out_pad] = 0xff; > + } > + } > + > + return 0; > +} > + > +int mtk_cam_seninf_get_mux_meter(struct seninf_ctx *ctx, u32 mux, > + struct mtk_cam_seninf_mux_meter *meter) > +{ > + void __iomem *seninf_mux; > + s64 hv, hb, vv, vb, w, h; > + u64 mipi_pixel_rate, vb_in_us, hb_in_us, line_time_in_us; > + u32 res; > + > + seninf_mux = ctx->reg_if_mux[mux]; > + > + SENINF_BITS(seninf_mux, SENINF_MUX_FRAME_SIZE_MON_CTRL, > + RG_SENINF_MUX_FRAME_SIZE_MON_EN, 1); > + > + hv = SENINF_READ_REG(seninf_mux, SENINF_MUX_FRAME_SIZE_MON_H_VALID); > + hb = SENINF_READ_REG(seninf_mux, SENINF_MUX_FRAME_SIZE_MON_H_BLANK); > + vv = SENINF_READ_REG(seninf_mux, SENINF_MUX_FRAME_SIZE_MON_V_VALID); > + vb = SENINF_READ_REG(seninf_mux, SENINF_MUX_FRAME_SIZE_MON_V_BLANK); > + res = SENINF_READ_REG(seninf_mux, SENINF_MUX_SIZE); > + > + w = res & 0xffff; Another case of bitfield.h macros open-coded, or missed opportunity to just use regmap to simplify this. > + h = res >> 16; And same here, of course. > + > + if (ctx->fps_n && ctx->fps_d) { > + mipi_pixel_rate = w * ctx->fps_n * (vv + vb); > + do_div(mipi_pixel_rate, ctx->fps_d); Use div_s64() for safe 64-bits division. > + do_div(mipi_pixel_rate, hv); > + > + vb_in_us = vb * ctx->fps_d * 1000000; > + do_div(vb_in_us, vv + vb); > + do_div(vb_in_us, ctx->fps_n); > + > + hb_in_us = hb * ctx->fps_d * 1000000; > + do_div(hb_in_us, vv + vb); > + do_div(hb_in_us, ctx->fps_n); > + > + line_time_in_us = (hv + hb) * ctx->fps_d * 1000000; > + do_div(line_time_in_us, vv + vb); > + do_div(line_time_in_us, ctx->fps_n); > + > + meter->mipi_pixel_rate = mipi_pixel_rate; > + meter->vb_in_us = vb_in_us; > + meter->hb_in_us = hb_in_us; > + meter->line_time_in_us = line_time_in_us; > + } else { > + meter->mipi_pixel_rate = -1; > + meter->vb_in_us = -1; > + meter->hb_in_us = -1; > + meter->line_time_in_us = -1; > + } > + > + meter->width = w; > + meter->height = h; > + > + meter->h_valid = hv; > + meter->h_blank = hb; > + meter->v_valid = vv; > + meter->v_blank = vb; > + > + return 0; > +} > + > +ssize_t mtk_cam_seninf_show_status(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + int i, len; > + struct seninf_core *core; struct seninf_core *core = dev_get_drvdata(dev); > + struct seninf_ctx *ctx; > + struct seninf_vc *vc; > + struct media_link *link; > + struct media_pad *pad; > + struct mtk_cam_seninf_mux_meter meter; > + void __iomem *csi2, *pmux; > + void __iomem *rx, *pcammux; int i, len = 0; > + > + core = dev_get_drvdata(dev); > + len = 0; > + > + mutex_lock(&core->mutex); > + > + list_for_each_entry(ctx, &core->list, list) { > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\n[%s] port %d intf %d test %d cphy %d lanes %d\n", > + ctx->subdev.name, ctx->port, ctx->seninf_idx, > + ctx->is_test_model, ctx->is_cphy, > + ctx->num_data_lanes); > + > + pad = &ctx->pads[PAD_SINK]; > + list_for_each_entry(link, &pad->entity->links, list) { > + if (link->sink == pad) { > + len += snprintf(buf + len, PAGE_SIZE - len, > + "source %s flags 0x%lx\n", > + link->source->entity->name, > + link->flags); > + } > + } > + > + if (!ctx->streaming) > + continue; > + > + csi2 = ctx->reg_if_csi2[ctx->seninf_idx]; > + rx = ctx->reg_ana_dphy_top[ctx->port]; > + > + len += snprintf(buf + len, PAGE_SIZE - len, > + "csi2 irq_stat 0x%08x\n", > + SENINF_READ_REG(csi2, SENINF_CSI2_IRQ_STATUS)); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "csi2 line_frame_num 0x%08x\n", > + SENINF_READ_REG(csi2, SENINF_CSI2_LINE_FRAME_NUM)); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "csi2 packet_status 0x%08x\n", > + SENINF_READ_REG(csi2, SENINF_CSI2_PACKET_STATUS)); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "csi2 packet_cnt_status 0x%08x\n", > + SENINF_READ_REG(csi2, SENINF_CSI2_PACKET_CNT_STATUS)); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "rx-ana settle ck 0x%02x dt 0x%02x\n", > + SENINF_READ_BITS(rx, DPHY_RX_CLOCK_LANE0_HS_PARAMETER, > + RG_DPHY_RX_LC0_HS_SETTLE_PARAMETER), > + SENINF_READ_BITS(rx, > + DPHY_RX_DATA_LANE0_HS_PARAMETER, > + RG_CDPHY_RX_LD0_TRIO0_HS_SETTLE_PARAMETER)); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "rx-ana trail en %u param 0x%02x\n", > + SENINF_READ_BITS(rx, DPHY_RX_DATA_LANE0_HS_PARAMETER, > + RG_DPHY_RX_LD0_HS_TRAIL_EN), > + SENINF_READ_BITS(rx, DPHY_RX_DATA_LANE0_HS_PARAMETER, > + RG_DPHY_RX_LD0_HS_TRAIL_PARAMETER)); > + > + len += snprintf(buf + len, PAGE_SIZE - len, > + "data_not_enough_cnt : <%d>", > + ctx->data_not_enough_cnt); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "err_lane_resync_cnt : <%d>", > + ctx->err_lane_resync_cnt); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "crc_err_cnt : <%d>", ctx->crc_err_flag); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "ecc_err_double_cnt : <%d>", > + ctx->ecc_err_double_cnt); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "ecc_err_corrected_cnt : <%d>\n", > + ctx->ecc_err_corrected_cnt); > + > + for (i = 0; i < ctx->vcinfo.cnt; i++) { > + vc = &ctx->vcinfo.vc[i]; > + pmux = ctx->reg_if_mux[vc->mux]; > + pcammux = ctx->reg_if_cam_mux; > + > + len += snprintf(buf + len, PAGE_SIZE - len, > + "[%d] vc 0x%x dt 0x%x mux %d cam %d\n", > + i, vc->vc, vc->dt, vc->mux, vc->cam); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\tmux[%d] en %d src %d irq_stat 0x%x\n", > + vc->mux, > + mtk_cam_seninf_is_mux_used(ctx, vc->mux), > + mtk_cam_seninf_get_top_mux_ctrl(ctx, vc->mux), > + SENINF_READ_REG(pmux, SENINF_MUX_IRQ_STATUS)); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\t\tfifo_overrun_cnt : <%d>\n", > + ctx->fifo_overrun_cnt); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\tcam[%d] en %d src %d exp 0x%x res 0x%x irq_stat 0x%x\n", > + vc->cam, > + mtk_cam_seninf_is_cammux_used(ctx, vc->cam), > + mtk_cam_seninf_get_cammux_ctrl(ctx, vc->cam), > + mtk_cam_seninf_get_cammux_exp(ctx, vc->cam), > + mtk_cam_seninf_get_cammux_res(ctx, vc->cam), > + SENINF_READ_REG(pcammux, > + SENINF_CAM_MUX_IRQ_STATUS)); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\t\tsize_err_cnt : <%d>\n", > + ctx->size_err_cnt); > + > + if (vc->feature == VC_RAW_DATA || > + vc->feature == VC_STAGGER_NE || > + vc->feature == VC_STAGGER_ME || > + vc->feature == VC_STAGGER_SE) { > + mtk_cam_seninf_get_mux_meter(ctx, vc->mux, > + &meter); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\t--- mux meter ---\n"); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\twidth %d height %d\n", > + meter.width, meter.height); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\th_valid %d, h_blank %d\n", > + meter.h_valid, meter.h_blank); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\tv_valid %d, v_blank %d\n", > + meter.v_valid, meter.v_blank); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\tmipi_pixel_rate %lld\n", > + meter.mipi_pixel_rate); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\tv_blank %lld us\n", > + meter.vb_in_us); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\th_blank %lld us\n", > + meter.hb_in_us); > + len += snprintf(buf + len, PAGE_SIZE - len, > + "\tline_time %lld us\n", > + meter.line_time_in_us); > + } > + } > + } > + > + mutex_unlock(&core->mutex); > + > + return len; > +} > + > +#define SENINF_DRV_DEBUG_MAX_DELAY 400 > + > +static inline void > +mtk_cam_seninf_clear_matched_cam_mux_irq(struct seninf_ctx *ctx, > + u32 cam_mux_idx, Why is this a u32 param?.... > + u32 vc_idx, > + s32 enabled) > +{ > + u8 used_cammux; ...if this is a u8 local variable? (cam_mux_idx should be u8 instead - same for vc_idx) > + > + if (cam_mux_idx >= SENINF_CAM_MUX_NUM) { > + dev_info(ctx->dev, "unsupport cam_mux(%u)", cam_mux_idx); > + return; > + } > + if (vc_idx >= SENINF_VC_MAXCNT) { > + dev_info(ctx->dev, "unsupport vc_idx(%u)", vc_idx); > + return; > + } > + > + used_cammux = ctx->vcinfo.vc[vc_idx].cam; > + if (used_cammux == cam_mux_idx && > + enabled & (1 << cam_mux_idx)) { > + dev_dbg(ctx->dev, > + "before clear cam mux%u recSize = 0x%x, irq = 0x%x", > + cam_mux_idx, > + SENINF_READ_REG(ctx->reg_if_cam_mux, > + SENINF_CAM_MUX0_CHK_RES + (0x10 * cam_mux_idx)), > + SENINF_READ_REG(ctx->reg_if_cam_mux, > + SENINF_CAM_MUX_IRQ_STATUS)); > + > + SENINF_WRITE_REG(ctx->reg_if_cam_mux, > + SENINF_CAM_MUX_IRQ_STATUS, > + 3 << (cam_mux_idx * 2)); > + } > +} > + > +static inline void mtk_cam_seninf_check_matched_cam_mux(struct seninf_ctx *ctx, > + u32 cam_mux_idx, > + u32 vc_idx, > + s32 enabled, > + s32 irq_status) > +{ > + u8 used_cammux; > + > + if (cam_mux_idx >= SENINF_CAM_MUX_NUM) { > + dev_info(ctx->dev, "unsupport cam_mux(%u)", cam_mux_idx); > + return; > + } > + if (vc_idx >= SENINF_VC_MAXCNT) { > + dev_info(ctx->dev, "unsupport vc_idx(%u)", vc_idx); > + return; > + } > + > + used_cammux = ctx->vcinfo.vc[vc_idx].cam; > + > + if (used_cammux == cam_mux_idx && enabled & (1 << cam_mux_idx)) { > + int rec_size = SENINF_READ_REG(ctx->reg_if_cam_mux, > + SENINF_CAM_MUX0_CHK_RES + (0x10 * cam_mux_idx)); > + int exp_size = SENINF_READ_REG(ctx->reg_if_cam_mux, > + SENINF_CAM_MUX0_CHK_CTL_1 + (0x10 * cam_mux_idx)); > + if (rec_size != exp_size) { > + dev_dbg(ctx->dev, > + "cam mux%u size mismatch, (rec, exp) = (0x%x, 0x%x)", > + cam_mux_idx, rec_size, exp_size); > + } > + if ((irq_status & > + (3 << (cam_mux_idx * 2))) != 0) { > + dev_dbg(ctx->dev, > + "cam mux%u size mismatch!, irq = 0x%x", > + cam_mux_idx, irq_status); > + } > + } > +} > + ..snip.. > + > +#define SBUF 256 > +int mtk_cam_seninf_irq_handler(int irq, void *data) > +{ > + struct seninf_core *core = (struct seninf_core *)data; > + unsigned long flags; /* for mipi err detection */ > + struct seninf_ctx *ctx; > + struct seninf_vc *vc; > + void __iomem *csi2, *pmux, *seninf_cam_mux; > + int i; > + unsigned int csi_irq_ro; > + unsigned int mux_irq_ro; > + unsigned int cam_irq_exp_ro; > + unsigned int cam_irq_res_ro; > + char seninf_log[SBUF]; > + unsigned int wcnt = 0; > + > + spin_lock_irqsave(&core->spinlock_irq, flags); > + > + /* debug for set_reg case: REG_KEY_CSI_IRQ_EN */ > + if (core->csi_irq_en_flag) { > + list_for_each_entry(ctx, &core->list, list) { > + csi2 = ctx->reg_if_csi2[ctx->seninf_idx]; > + csi_irq_ro = > + SENINF_READ_REG(csi2, SENINF_CSI2_IRQ_STATUS); > + > + if (csi_irq_ro) { > + SENINF_WRITE_REG(csi2, SENINF_CSI2_IRQ_STATUS, > + 0xFFFFFFFF); > + } > + > + if (csi_irq_ro & (0x1 << RO_CSI2_ECC_ERR_CORRECTED_IRQ_SHIFT)) > + ctx->ecc_err_corrected_cnt++; > + if (csi_irq_ro & (0x1 << RO_CSI2_ECC_ERR_DOUBLE_IRQ_SHIFT)) > + ctx->ecc_err_double_cnt++; > + if (csi_irq_ro & (0x1 << RO_CSI2_CRC_ERR_IRQ_SHIFT)) > + ctx->crc_err_cnt++; > + if (csi_irq_ro & (0x1 << RO_CSI2_ERR_LANE_RESYNC_IRQ_SHIFT)) > + ctx->err_lane_resync_cnt++; > + if (csi_irq_ro & (0x1 << RO_CSI2_RECEIVE_DATA_NOT_ENOUGH_IRQ_SHIFT)) > + ctx->data_not_enough_cnt++; > + > + for (i = 0; i < ctx->vcinfo.cnt; i++) { > + vc = &ctx->vcinfo.vc[i]; > + pmux = ctx->reg_if_mux[vc->mux]; > + seninf_cam_mux = ctx->reg_if_cam_mux; > + > + mux_irq_ro = SENINF_READ_REG(pmux, > + SENINF_MUX_IRQ_STATUS); > + > + cam_irq_exp_ro = SENINF_READ_REG(seninf_cam_mux, > + SENINF_CAM_MUX0_CHK_CTL_1 + > + (0x10 * (vc->cam))); > + > + cam_irq_res_ro = SENINF_READ_REG(seninf_cam_mux, > + SENINF_CAM_MUX0_CHK_RES + > + (0x10 * (vc->cam))); > + > + if (mux_irq_ro) > + SENINF_WRITE_REG(pmux, > + SENINF_MUX_IRQ_STATUS, > + 0xFFFFFFFF); > + > + if (cam_irq_res_ro != cam_irq_exp_ro) > + SENINF_WRITE_REG(seninf_cam_mux, > + SENINF_CAM_MUX0_CHK_RES + > + (0x10 * (vc->cam)), > + 0xFFFFFFFF); > + > + if (mux_irq_ro & (0x1 << 0)) > + ctx->fifo_overrun_cnt++; > + > + if (cam_irq_res_ro != cam_irq_exp_ro) > + ctx->size_err_cnt++; > + } > + > + /* dump status counter: debug for electrical signal */ > + if (ctx->data_not_enough_cnt >= core->detection_cnt || > + ctx->err_lane_resync_cnt >= core->detection_cnt || > + ctx->crc_err_cnt >= core->detection_cnt || > + ctx->ecc_err_double_cnt >= core->detection_cnt || > + ctx->ecc_err_corrected_cnt >= core->detection_cnt || > + ctx->fifo_overrun_cnt >= core->detection_cnt || > + ctx->size_err_cnt >= core->detection_cnt) { > + /* disable all interrupts */ > + SENINF_WRITE_REG(csi2, SENINF_CSI2_IRQ_EN, 0x80000000); > + > + if (ctx->data_not_enough_cnt >= core->detection_cnt) > + ctx->data_not_enough_flag = 1; > + if (ctx->err_lane_resync_cnt >= core->detection_cnt) > + ctx->err_lane_resync_flag = 1; > + if (ctx->crc_err_cnt >= core->detection_cnt) > + ctx->crc_err_flag = 1; > + if (ctx->ecc_err_double_cnt >= core->detection_cnt) > + ctx->ecc_err_double_flag = 1; > + if (ctx->ecc_err_corrected_cnt >= core->detection_cnt) > + ctx->ecc_err_corrected_flag = 1; > + if (ctx->fifo_overrun_cnt >= core->detection_cnt) > + ctx->fifo_overrun_flag = 1; > + if (ctx->size_err_cnt >= core->detection_cnt) > + ctx->size_err_flag = 1; > + > + wcnt = snprintf(seninf_log, SBUF, "info: %s", __func__); You're not even printing seninf_log, so either print it (no, please don't) or just drop it. > + wcnt += snprintf(seninf_log + wcnt, SBUF - wcnt, > + " data_not_enough_count: %d", > + ctx->data_not_enough_cnt); > + wcnt += snprintf(seninf_log + wcnt, SBUF - wcnt, > + " err_lane_resync_count: %d", > + ctx->err_lane_resync_cnt); > + wcnt += snprintf(seninf_log + wcnt, SBUF - wcnt, > + " crc_err_count: %d", > + ctx->crc_err_cnt); > + wcnt += snprintf(seninf_log + wcnt, SBUF - wcnt, > + " ecc_err_double_count: %d", > + ctx->ecc_err_double_cnt); > + wcnt += snprintf(seninf_log + wcnt, SBUF - wcnt, > + " ecc_err_corrected_count: %d", > + ctx->ecc_err_corrected_cnt); > + wcnt += snprintf(seninf_log + wcnt, SBUF - wcnt, > + " fifo_overrun_count: %d", > + ctx->fifo_overrun_cnt); > + wcnt += snprintf(seninf_log + wcnt, SBUF - wcnt, > + " size_err_count: %d", > + ctx->size_err_cnt); > + } > + } > + } > + > + spin_unlock_irqrestore(&core->spinlock_irq, flags); > + > + return 0; > +} > + > +int mtk_cam_seninf_set_sw_cfg_busy(struct seninf_ctx *ctx, bool enable, > + int index) > +{ > + void __iomem *seninf_cam_mux = ctx->reg_if_cam_mux; mask = index ? RG_SENINF_CAM_MUX_DYN_SWITCH_BSY1 : RG_SENINF_CAM_MUX_DYN_SWITCH_BSY0 > + > + if (index == 0) > + SENINF_BITS(seninf_cam_mux, SENINF_CAM_MUX_DYN_CTRL, > + RG_SENINF_CAM_MUX_DYN_SWITCH_BSY0, enable); > + else > + SENINF_BITS(seninf_cam_mux, SENINF_CAM_MUX_DYN_CTRL, > + RG_SENINF_CAM_MUX_DYN_SWITCH_BSY1, enable); > + return 0; > +} > + > +int mtk_cam_seninf_set_cam_mux_dyn_en(struct seninf_ctx *ctx, bool enable, > + int cam_mux, int index) > +{ > + void __iomem *seninf_cam_mux = ctx->reg_if_cam_mux; > + u32 tmp = 0; > + > + if (index == 0) { > + tmp = SENINF_READ_BITS(seninf_cam_mux, SENINF_CAM_MUX_DYN_EN, > + RG_SENINF_CAM_MUX_DYN_SWITCH_EN0); regmap_set_bits() will definitely help here. > + if (enable) > + tmp = tmp | (1 << cam_mux); > + else > + tmp = tmp & ~(1 << cam_mux); > + > + SENINF_BITS(seninf_cam_mux, SENINF_CAM_MUX_DYN_EN, > + RG_SENINF_CAM_MUX_DYN_SWITCH_EN0, tmp); > + } else { > + tmp = SENINF_READ_BITS(seninf_cam_mux, SENINF_CAM_MUX_DYN_EN, > + RG_SENINF_CAM_MUX_DYN_SWITCH_EN1); > + if (enable) > + tmp = tmp | (1 << cam_mux); > + else > + tmp = tmp & ~(1 << cam_mux); > + > + SENINF_BITS(seninf_cam_mux, SENINF_CAM_MUX_DYN_EN, > + RG_SENINF_CAM_MUX_DYN_SWITCH_EN1, tmp); > + } > + > + return 0; > +} > + There is surely more to say on this driver, and it's far from being near to upstream quality. Please start with addressing these comments on the entire series, then we can go on with further reviews. Regards, Angelo

1 year, 4 months

1
0
0 0

[PATCH AUTOSEL 6.11 72/76] drm/amdgpu: nuke the VM PD/PT shadow handling

by Sasha Levin

From: Christian König <christian.koenig(a)amd.com> [ Upstream commit 7181faaa4703705939580abffaf9cb5d6b50dbb7 ] This was only used as workaround for recovering the page tables after VRAM was lost and is no longer necessary after the function amdgpu_vm_bo_reset_state_machine() started to do the same. Compute never used shadows either, so the only proplematic case left is SVM and that is most likely not recoverable in any way when VRAM is lost. Signed-off-by: Christian König <christian.koenig(a)amd.com> Acked-by: Lijo Lazar <lijo.lazar(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 87 +-------------------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 67 +--------------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 21 ----- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 ---- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 56 +------------ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 19 +---- 7 files changed, 6 insertions(+), 265 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 137a88b8de453..a1b2bf3db55b8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1091,10 +1091,6 @@ struct amdgpu_device { struct amdgpu_virt virt; - /* link all shadow bo */ - struct list_head shadow_list; - struct mutex shadow_list_lock; - /* record hw reset is performed */ bool has_hw_reset; u8 reset_magic[AMDGPU_RESET_MAGIC_NUM]; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index bcacf2e35eba0..dfd468729d52b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4073,9 +4073,6 @@ int amdgpu_device_init(struct amdgpu_device *adev, spin_lock_init(&adev->mm_stats.lock); spin_lock_init(&adev->wb.lock); - INIT_LIST_HEAD(&adev->shadow_list); - mutex_init(&adev->shadow_list_lock); - INIT_LIST_HEAD(&adev->reset_list); INIT_LIST_HEAD(&adev->ras_list); @@ -4980,80 +4977,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev) return 0; } -/** - * amdgpu_device_recover_vram - Recover some VRAM contents - * - * @adev: amdgpu_device pointer - * - * Restores the contents of VRAM buffers from the shadows in GTT. Used to - * restore things like GPUVM page tables after a GPU reset where - * the contents of VRAM might be lost. - * - * Returns: - * 0 on success, negative error code on failure. - */ -static int amdgpu_device_recover_vram(struct amdgpu_device *adev) -{ - struct dma_fence *fence = NULL, *next = NULL; - struct amdgpu_bo *shadow; - struct amdgpu_bo_vm *vmbo; - long r = 1, tmo; - - if (amdgpu_sriov_runtime(adev)) - tmo = msecs_to_jiffies(8000); - else - tmo = msecs_to_jiffies(100); - - dev_info(adev->dev, "recover vram bo from shadow start\n"); - mutex_lock(&adev->shadow_list_lock); - list_for_each_entry(vmbo, &adev->shadow_list, shadow_list) { - /* If vm is compute context or adev is APU, shadow will be NULL */ - if (!vmbo->shadow) - continue; - shadow = vmbo->shadow; - - /* No need to recover an evicted BO */ - if (!shadow->tbo.resource || - shadow->tbo.resource->mem_type != TTM_PL_TT || - shadow->tbo.resource->start == AMDGPU_BO_INVALID_OFFSET || - shadow->parent->tbo.resource->mem_type != TTM_PL_VRAM) - continue; - - r = amdgpu_bo_restore_shadow(shadow, &next); - if (r) - break; - - if (fence) { - tmo = dma_fence_wait_timeout(fence, false, tmo); - dma_fence_put(fence); - fence = next; - if (tmo == 0) { - r = -ETIMEDOUT; - break; - } else if (tmo < 0) { - r = tmo; - break; - } - } else { - fence = next; - } - } - mutex_unlock(&adev->shadow_list_lock); - - if (fence) - tmo = dma_fence_wait_timeout(fence, false, tmo); - dma_fence_put(fence); - - if (r < 0 || tmo <= 0) { - dev_err(adev->dev, "recover vram bo from shadow failed, r is %ld, tmo is %ld\n", r, tmo); - return -EIO; - } - - dev_info(adev->dev, "recover vram bo from shadow done\n"); - return 0; -} - - /** * amdgpu_device_reset_sriov - reset ASIC for SR-IOV vf * @@ -5116,12 +5039,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, if (r) return r; - if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) { + if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) amdgpu_inc_vram_lost(adev); - r = amdgpu_device_recover_vram(adev); - } - if (r) - return r; /* need to be called during full access so we can't do it later like * bare-metal does. @@ -5541,9 +5460,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, } } - if (!r) - r = amdgpu_device_recover_vram(tmp_adev); - else + if (r) tmp_adev->asic_reset_res = r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index e32161f6b67a3..a987f671b1d53 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -77,24 +77,6 @@ static void amdgpu_bo_user_destroy(struct ttm_buffer_object *tbo) amdgpu_bo_destroy(tbo); } -static void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo) -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev); - struct amdgpu_bo *shadow_bo = ttm_to_amdgpu_bo(tbo), *bo; - struct amdgpu_bo_vm *vmbo; - - bo = shadow_bo->parent; - vmbo = to_amdgpu_bo_vm(bo); - /* in case amdgpu_device_recover_vram got NULL of bo->parent */ - if (!list_empty(&vmbo->shadow_list)) { - mutex_lock(&adev->shadow_list_lock); - list_del_init(&vmbo->shadow_list); - mutex_unlock(&adev->shadow_list_lock); - } - - amdgpu_bo_destroy(tbo); -} - /** * amdgpu_bo_is_amdgpu_bo - check if the buffer object is an &amdgpu_bo * @bo: buffer object to be checked @@ -108,8 +90,7 @@ static void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo) bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo) { if (bo->destroy == &amdgpu_bo_destroy || - bo->destroy == &amdgpu_bo_user_destroy || - bo->destroy == &amdgpu_bo_vm_destroy) + bo->destroy == &amdgpu_bo_user_destroy) return true; return false; @@ -722,52 +703,6 @@ int amdgpu_bo_create_vm(struct amdgpu_device *adev, return r; } -/** - * amdgpu_bo_add_to_shadow_list - add a BO to the shadow list - * - * @vmbo: BO that will be inserted into the shadow list - * - * Insert a BO to the shadow list. - */ -void amdgpu_bo_add_to_shadow_list(struct amdgpu_bo_vm *vmbo) -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(vmbo->bo.tbo.bdev); - - mutex_lock(&adev->shadow_list_lock); - list_add_tail(&vmbo->shadow_list, &adev->shadow_list); - vmbo->shadow->parent = amdgpu_bo_ref(&vmbo->bo); - vmbo->shadow->tbo.destroy = &amdgpu_bo_vm_destroy; - mutex_unlock(&adev->shadow_list_lock); -} - -/** - * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow - * - * @shadow: &amdgpu_bo shadow to be restored - * @fence: dma_fence associated with the operation - * - * Copies a buffer object's shadow content back to the object. - * This is used for recovering a buffer from its shadow in case of a gpu - * reset where vram context may be lost. - * - * Returns: - * 0 for success or a negative error code on failure. - */ -int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence) - -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev); - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; - uint64_t shadow_addr, parent_addr; - - shadow_addr = amdgpu_bo_gpu_offset(shadow); - parent_addr = amdgpu_bo_gpu_offset(shadow->parent); - - return amdgpu_copy_buffer(ring, shadow_addr, parent_addr, - amdgpu_bo_size(shadow), NULL, fence, - true, false, 0); -} - /** * amdgpu_bo_kmap - map an &amdgpu_bo buffer object * @bo: &amdgpu_bo buffer object to be mapped diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index bc42ccbde659a..a4fa1f296daec 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -130,8 +130,6 @@ struct amdgpu_bo_user { struct amdgpu_bo_vm { struct amdgpu_bo bo; - struct amdgpu_bo *shadow; - struct list_head shadow_list; struct amdgpu_vm_bo_base entries[]; }; @@ -269,22 +267,6 @@ static inline bool amdgpu_bo_encrypted(struct amdgpu_bo *bo) return bo->flags & AMDGPU_GEM_CREATE_ENCRYPTED; } -/** - * amdgpu_bo_shadowed - check if the BO is shadowed - * - * @bo: BO to be tested. - * - * Returns: - * NULL if not shadowed or else return a BO pointer. - */ -static inline struct amdgpu_bo *amdgpu_bo_shadowed(struct amdgpu_bo *bo) -{ - if (bo->tbo.type == ttm_bo_type_kernel) - return to_amdgpu_bo_vm(bo)->shadow; - - return NULL; -} - bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo); void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain); @@ -343,9 +325,6 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo); u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo); void amdgpu_bo_get_memory(struct amdgpu_bo *bo, struct amdgpu_mem_stats *stats); -void amdgpu_bo_add_to_shadow_list(struct amdgpu_bo_vm *vmbo); -int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, - struct dma_fence **fence); uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev, uint32_t domain); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index a060c28f0877c..8cda1d02dade3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -465,7 +465,6 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm, { uint64_t new_vm_generation = amdgpu_vm_generation(adev, vm); struct amdgpu_vm_bo_base *bo_base; - struct amdgpu_bo *shadow; struct amdgpu_bo *bo; int r; @@ -486,16 +485,10 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm, spin_unlock(&vm->status_lock); bo = bo_base->bo; - shadow = amdgpu_bo_shadowed(bo); r = validate(param, bo); if (r) return r; - if (shadow) { - r = validate(param, shadow); - if (r) - return r; - } if (bo->tbo.type != ttm_bo_type_kernel) { amdgpu_vm_bo_moved(bo_base); @@ -2123,10 +2116,6 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev, { struct amdgpu_vm_bo_base *bo_base; - /* shadow bo doesn't have bo base, its validation needs its parent */ - if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo)) - bo = bo->parent; - for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) { struct amdgpu_vm *vm = bo_base->vm; @@ -2454,7 +2443,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, root_bo = amdgpu_bo_ref(&root->bo); r = amdgpu_bo_reserve(root_bo, true); if (r) { - amdgpu_bo_unref(&root->shadow); amdgpu_bo_unref(&root_bo); goto error_free_delayed; } @@ -2546,11 +2534,6 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) vm->last_update = dma_fence_get_stub(); vm->is_compute_context = true; - /* Free the shadow bo for compute VM */ - amdgpu_bo_unref(&to_amdgpu_bo_vm(vm->root.bo)->shadow); - - goto unreserve_bo; - unreserve_bo: amdgpu_bo_unreserve(vm->root.bo); return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index e39d6e7643bfb..c8e0b8cfd3363 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -383,14 +383,6 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (r) return r; - if (vmbo->shadow) { - struct amdgpu_bo *shadow = vmbo->shadow; - - r = ttm_bo_validate(&shadow->tbo, &shadow->placement, &ctx); - if (r) - return r; - } - if (!drm_dev_enter(adev_to_drm(adev), &idx)) return -ENODEV; @@ -448,10 +440,7 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t xcp_id) { struct amdgpu_bo_param bp; - struct amdgpu_bo *bo; - struct dma_resv *resv; unsigned int num_entries; - int r; memset(&bp, 0, sizeof(bp)); @@ -484,42 +473,7 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (vm->root.bo) bp.resv = vm->root.bo->tbo.base.resv; - r = amdgpu_bo_create_vm(adev, &bp, vmbo); - if (r) - return r; - - bo = &(*vmbo)->bo; - if (vm->is_compute_context || (adev->flags & AMD_IS_APU)) { - (*vmbo)->shadow = NULL; - return 0; - } - - if (!bp.resv) - WARN_ON(dma_resv_lock(bo->tbo.base.resv, - NULL)); - resv = bp.resv; - memset(&bp, 0, sizeof(bp)); - bp.size = amdgpu_vm_pt_size(adev, level); - bp.domain = AMDGPU_GEM_DOMAIN_GTT; - bp.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC; - bp.type = ttm_bo_type_kernel; - bp.resv = bo->tbo.base.resv; - bp.bo_ptr_size = sizeof(struct amdgpu_bo); - bp.xcp_id_plus1 = xcp_id + 1; - - r = amdgpu_bo_create(adev, &bp, &(*vmbo)->shadow); - - if (!resv) - dma_resv_unlock(bo->tbo.base.resv); - - if (r) { - amdgpu_bo_unref(&bo); - return r; - } - - amdgpu_bo_add_to_shadow_list(*vmbo); - - return 0; + return amdgpu_bo_create_vm(adev, &bp, vmbo); } /** @@ -569,7 +523,6 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev, return 0; error_free_pt: - amdgpu_bo_unref(&pt->shadow); amdgpu_bo_unref(&pt_bo); return r; } @@ -581,17 +534,10 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev, */ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) { - struct amdgpu_bo *shadow; - if (!entry->bo) return; entry->bo->vm_bo = NULL; - shadow = amdgpu_bo_shadowed(entry->bo); - if (shadow) { - ttm_bo_set_bulk_move(&shadow->tbo, NULL); - amdgpu_bo_unref(&shadow); - } ttm_bo_set_bulk_move(&entry->bo->tbo, NULL); spin_lock(&entry->vm->status_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c index 9b748d7058b5c..390432a22ddd5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c @@ -35,16 +35,7 @@ */ static int amdgpu_vm_sdma_map_table(struct amdgpu_bo_vm *table) { - int r; - - r = amdgpu_ttm_alloc_gart(&table->bo.tbo); - if (r) - return r; - - if (table->shadow) - r = amdgpu_ttm_alloc_gart(&table->shadow->tbo); - - return r; + return amdgpu_ttm_alloc_gart(&table->bo.tbo); } /* Allocate a new job for @count PTE updates */ @@ -273,17 +264,13 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p, if (!p->pages_addr) { /* set page commands needed */ - if (vmbo->shadow) - amdgpu_vm_sdma_set_ptes(p, vmbo->shadow, pe, addr, - count, incr, flags); amdgpu_vm_sdma_set_ptes(p, bo, pe, addr, count, incr, flags); return 0; } /* copy commands needed */ - ndw -= p->adev->vm_manager.vm_pte_funcs->copy_pte_num_dw * - (vmbo->shadow ? 2 : 1); + ndw -= p->adev->vm_manager.vm_pte_funcs->copy_pte_num_dw; /* for padding */ ndw -= 7; @@ -298,8 +285,6 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p, pte[i] |= flags; } - if (vmbo->shadow) - amdgpu_vm_sdma_copy_ptes(p, vmbo->shadow, pe, nptes); amdgpu_vm_sdma_copy_ptes(p, bo, pe, nptes); pe += nptes * 8; -- 2.43.0

1 year, 4 months

2
2
0 0

[PATCH AUTOSEL 6.10 66/70] drm/amdgpu: nuke the VM PD/PT shadow handling

by Sasha Levin

From: Christian König <christian.koenig(a)amd.com> [ Upstream commit 7181faaa4703705939580abffaf9cb5d6b50dbb7 ] This was only used as workaround for recovering the page tables after VRAM was lost and is no longer necessary after the function amdgpu_vm_bo_reset_state_machine() started to do the same. Compute never used shadows either, so the only proplematic case left is SVM and that is most likely not recoverable in any way when VRAM is lost. Signed-off-by: Christian König <christian.koenig(a)amd.com> Acked-by: Lijo Lazar <lijo.lazar(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 87 +-------------------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 67 +--------------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 21 ----- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 ---- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 56 +------------ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 19 +---- 7 files changed, 6 insertions(+), 265 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index f87d53e183c3d..b48e8066c01cc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1078,10 +1078,6 @@ struct amdgpu_device { struct amdgpu_virt virt; - /* link all shadow bo */ - struct list_head shadow_list; - struct mutex shadow_list_lock; - /* record hw reset is performed */ bool has_hw_reset; u8 reset_magic[AMDGPU_RESET_MAGIC_NUM]; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e66546df0bc19..1df57a02a7598 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4072,9 +4072,6 @@ int amdgpu_device_init(struct amdgpu_device *adev, spin_lock_init(&adev->mm_stats.lock); spin_lock_init(&adev->wb.lock); - INIT_LIST_HEAD(&adev->shadow_list); - mutex_init(&adev->shadow_list_lock); - INIT_LIST_HEAD(&adev->reset_list); INIT_LIST_HEAD(&adev->ras_list); @@ -4979,80 +4976,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev) return 0; } -/** - * amdgpu_device_recover_vram - Recover some VRAM contents - * - * @adev: amdgpu_device pointer - * - * Restores the contents of VRAM buffers from the shadows in GTT. Used to - * restore things like GPUVM page tables after a GPU reset where - * the contents of VRAM might be lost. - * - * Returns: - * 0 on success, negative error code on failure. - */ -static int amdgpu_device_recover_vram(struct amdgpu_device *adev) -{ - struct dma_fence *fence = NULL, *next = NULL; - struct amdgpu_bo *shadow; - struct amdgpu_bo_vm *vmbo; - long r = 1, tmo; - - if (amdgpu_sriov_runtime(adev)) - tmo = msecs_to_jiffies(8000); - else - tmo = msecs_to_jiffies(100); - - dev_info(adev->dev, "recover vram bo from shadow start\n"); - mutex_lock(&adev->shadow_list_lock); - list_for_each_entry(vmbo, &adev->shadow_list, shadow_list) { - /* If vm is compute context or adev is APU, shadow will be NULL */ - if (!vmbo->shadow) - continue; - shadow = vmbo->shadow; - - /* No need to recover an evicted BO */ - if (!shadow->tbo.resource || - shadow->tbo.resource->mem_type != TTM_PL_TT || - shadow->tbo.resource->start == AMDGPU_BO_INVALID_OFFSET || - shadow->parent->tbo.resource->mem_type != TTM_PL_VRAM) - continue; - - r = amdgpu_bo_restore_shadow(shadow, &next); - if (r) - break; - - if (fence) { - tmo = dma_fence_wait_timeout(fence, false, tmo); - dma_fence_put(fence); - fence = next; - if (tmo == 0) { - r = -ETIMEDOUT; - break; - } else if (tmo < 0) { - r = tmo; - break; - } - } else { - fence = next; - } - } - mutex_unlock(&adev->shadow_list_lock); - - if (fence) - tmo = dma_fence_wait_timeout(fence, false, tmo); - dma_fence_put(fence); - - if (r < 0 || tmo <= 0) { - dev_err(adev->dev, "recover vram bo from shadow failed, r is %ld, tmo is %ld\n", r, tmo); - return -EIO; - } - - dev_info(adev->dev, "recover vram bo from shadow done\n"); - return 0; -} - - /** * amdgpu_device_reset_sriov - reset ASIC for SR-IOV vf * @@ -5112,12 +5035,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, if (r) return r; - if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) { + if (adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) amdgpu_inc_vram_lost(adev); - r = amdgpu_device_recover_vram(adev); - } - if (r) - return r; /* need to be called during full access so we can't do it later like * bare-metal does. @@ -5529,9 +5448,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, } } - if (!r) - r = amdgpu_device_recover_vram(tmp_adev); - else + if (r) tmp_adev->asic_reset_res = r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index c556c8b653fa4..5f3b2c157824e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -77,24 +77,6 @@ static void amdgpu_bo_user_destroy(struct ttm_buffer_object *tbo) amdgpu_bo_destroy(tbo); } -static void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo) -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev); - struct amdgpu_bo *shadow_bo = ttm_to_amdgpu_bo(tbo), *bo; - struct amdgpu_bo_vm *vmbo; - - bo = shadow_bo->parent; - vmbo = to_amdgpu_bo_vm(bo); - /* in case amdgpu_device_recover_vram got NULL of bo->parent */ - if (!list_empty(&vmbo->shadow_list)) { - mutex_lock(&adev->shadow_list_lock); - list_del_init(&vmbo->shadow_list); - mutex_unlock(&adev->shadow_list_lock); - } - - amdgpu_bo_destroy(tbo); -} - /** * amdgpu_bo_is_amdgpu_bo - check if the buffer object is an &amdgpu_bo * @bo: buffer object to be checked @@ -108,8 +90,7 @@ static void amdgpu_bo_vm_destroy(struct ttm_buffer_object *tbo) bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo) { if (bo->destroy == &amdgpu_bo_destroy || - bo->destroy == &amdgpu_bo_user_destroy || - bo->destroy == &amdgpu_bo_vm_destroy) + bo->destroy == &amdgpu_bo_user_destroy) return true; return false; @@ -722,52 +703,6 @@ int amdgpu_bo_create_vm(struct amdgpu_device *adev, return r; } -/** - * amdgpu_bo_add_to_shadow_list - add a BO to the shadow list - * - * @vmbo: BO that will be inserted into the shadow list - * - * Insert a BO to the shadow list. - */ -void amdgpu_bo_add_to_shadow_list(struct amdgpu_bo_vm *vmbo) -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(vmbo->bo.tbo.bdev); - - mutex_lock(&adev->shadow_list_lock); - list_add_tail(&vmbo->shadow_list, &adev->shadow_list); - vmbo->shadow->parent = amdgpu_bo_ref(&vmbo->bo); - vmbo->shadow->tbo.destroy = &amdgpu_bo_vm_destroy; - mutex_unlock(&adev->shadow_list_lock); -} - -/** - * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow - * - * @shadow: &amdgpu_bo shadow to be restored - * @fence: dma_fence associated with the operation - * - * Copies a buffer object's shadow content back to the object. - * This is used for recovering a buffer from its shadow in case of a gpu - * reset where vram context may be lost. - * - * Returns: - * 0 for success or a negative error code on failure. - */ -int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence) - -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev); - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; - uint64_t shadow_addr, parent_addr; - - shadow_addr = amdgpu_bo_gpu_offset(shadow); - parent_addr = amdgpu_bo_gpu_offset(shadow->parent); - - return amdgpu_copy_buffer(ring, shadow_addr, parent_addr, - amdgpu_bo_size(shadow), NULL, fence, - true, false, 0); -} - /** * amdgpu_bo_kmap - map an &amdgpu_bo buffer object * @bo: &amdgpu_bo buffer object to be mapped diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index bc42ccbde659a..a4fa1f296daec 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -130,8 +130,6 @@ struct amdgpu_bo_user { struct amdgpu_bo_vm { struct amdgpu_bo bo; - struct amdgpu_bo *shadow; - struct list_head shadow_list; struct amdgpu_vm_bo_base entries[]; }; @@ -269,22 +267,6 @@ static inline bool amdgpu_bo_encrypted(struct amdgpu_bo *bo) return bo->flags & AMDGPU_GEM_CREATE_ENCRYPTED; } -/** - * amdgpu_bo_shadowed - check if the BO is shadowed - * - * @bo: BO to be tested. - * - * Returns: - * NULL if not shadowed or else return a BO pointer. - */ -static inline struct amdgpu_bo *amdgpu_bo_shadowed(struct amdgpu_bo *bo) -{ - if (bo->tbo.type == ttm_bo_type_kernel) - return to_amdgpu_bo_vm(bo)->shadow; - - return NULL; -} - bool amdgpu_bo_is_amdgpu_bo(struct ttm_buffer_object *bo); void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain); @@ -343,9 +325,6 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo); u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo); void amdgpu_bo_get_memory(struct amdgpu_bo *bo, struct amdgpu_mem_stats *stats); -void amdgpu_bo_add_to_shadow_list(struct amdgpu_bo_vm *vmbo); -int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, - struct dma_fence **fence); uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev, uint32_t domain); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 0f7106066480e..4820277e3c550 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -465,7 +465,6 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm, { uint64_t new_vm_generation = amdgpu_vm_generation(adev, vm); struct amdgpu_vm_bo_base *bo_base; - struct amdgpu_bo *shadow; struct amdgpu_bo *bo; int r; @@ -486,16 +485,10 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm, spin_unlock(&vm->status_lock); bo = bo_base->bo; - shadow = amdgpu_bo_shadowed(bo); r = validate(param, bo); if (r) return r; - if (shadow) { - r = validate(param, shadow); - if (r) - return r; - } if (bo->tbo.type != ttm_bo_type_kernel) { amdgpu_vm_bo_moved(bo_base); @@ -2125,10 +2118,6 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev, { struct amdgpu_vm_bo_base *bo_base; - /* shadow bo doesn't have bo base, its validation needs its parent */ - if (bo->parent && (amdgpu_bo_shadowed(bo->parent) == bo)) - bo = bo->parent; - for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) { struct amdgpu_vm *vm = bo_base->vm; @@ -2456,7 +2445,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, root_bo = amdgpu_bo_ref(&root->bo); r = amdgpu_bo_reserve(root_bo, true); if (r) { - amdgpu_bo_unref(&root->shadow); amdgpu_bo_unref(&root_bo); goto error_free_delayed; } @@ -2548,11 +2536,6 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) vm->last_update = dma_fence_get_stub(); vm->is_compute_context = true; - /* Free the shadow bo for compute VM */ - amdgpu_bo_unref(&to_amdgpu_bo_vm(vm->root.bo)->shadow); - - goto unreserve_bo; - unreserve_bo: amdgpu_bo_unreserve(vm->root.bo); return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index f07647a9a9d97..ad3a9e594a406 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -383,14 +383,6 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (r) return r; - if (vmbo->shadow) { - struct amdgpu_bo *shadow = vmbo->shadow; - - r = ttm_bo_validate(&shadow->tbo, &shadow->placement, &ctx); - if (r) - return r; - } - if (!drm_dev_enter(adev_to_drm(adev), &idx)) return -ENODEV; @@ -448,10 +440,7 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t xcp_id) { struct amdgpu_bo_param bp; - struct amdgpu_bo *bo; - struct dma_resv *resv; unsigned int num_entries; - int r; memset(&bp, 0, sizeof(bp)); @@ -484,42 +473,7 @@ int amdgpu_vm_pt_create(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (vm->root.bo) bp.resv = vm->root.bo->tbo.base.resv; - r = amdgpu_bo_create_vm(adev, &bp, vmbo); - if (r) - return r; - - bo = &(*vmbo)->bo; - if (vm->is_compute_context || (adev->flags & AMD_IS_APU)) { - (*vmbo)->shadow = NULL; - return 0; - } - - if (!bp.resv) - WARN_ON(dma_resv_lock(bo->tbo.base.resv, - NULL)); - resv = bp.resv; - memset(&bp, 0, sizeof(bp)); - bp.size = amdgpu_vm_pt_size(adev, level); - bp.domain = AMDGPU_GEM_DOMAIN_GTT; - bp.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC; - bp.type = ttm_bo_type_kernel; - bp.resv = bo->tbo.base.resv; - bp.bo_ptr_size = sizeof(struct amdgpu_bo); - bp.xcp_id_plus1 = xcp_id + 1; - - r = amdgpu_bo_create(adev, &bp, &(*vmbo)->shadow); - - if (!resv) - dma_resv_unlock(bo->tbo.base.resv); - - if (r) { - amdgpu_bo_unref(&bo); - return r; - } - - amdgpu_bo_add_to_shadow_list(*vmbo); - - return 0; + return amdgpu_bo_create_vm(adev, &bp, vmbo); } /** @@ -569,7 +523,6 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev, return 0; error_free_pt: - amdgpu_bo_unref(&pt->shadow); amdgpu_bo_unref(&pt_bo); return r; } @@ -581,17 +534,10 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev, */ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) { - struct amdgpu_bo *shadow; - if (!entry->bo) return; entry->bo->vm_bo = NULL; - shadow = amdgpu_bo_shadowed(entry->bo); - if (shadow) { - ttm_bo_set_bulk_move(&shadow->tbo, NULL); - amdgpu_bo_unref(&shadow); - } ttm_bo_set_bulk_move(&entry->bo->tbo, NULL); spin_lock(&entry->vm->status_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c index 9b748d7058b5c..390432a22ddd5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c @@ -35,16 +35,7 @@ */ static int amdgpu_vm_sdma_map_table(struct amdgpu_bo_vm *table) { - int r; - - r = amdgpu_ttm_alloc_gart(&table->bo.tbo); - if (r) - return r; - - if (table->shadow) - r = amdgpu_ttm_alloc_gart(&table->shadow->tbo); - - return r; + return amdgpu_ttm_alloc_gart(&table->bo.tbo); } /* Allocate a new job for @count PTE updates */ @@ -273,17 +264,13 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p, if (!p->pages_addr) { /* set page commands needed */ - if (vmbo->shadow) - amdgpu_vm_sdma_set_ptes(p, vmbo->shadow, pe, addr, - count, incr, flags); amdgpu_vm_sdma_set_ptes(p, bo, pe, addr, count, incr, flags); return 0; } /* copy commands needed */ - ndw -= p->adev->vm_manager.vm_pte_funcs->copy_pte_num_dw * - (vmbo->shadow ? 2 : 1); + ndw -= p->adev->vm_manager.vm_pte_funcs->copy_pte_num_dw; /* for padding */ ndw -= 7; @@ -298,8 +285,6 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p, pte[i] |= flags; } - if (vmbo->shadow) - amdgpu_vm_sdma_copy_ptes(p, vmbo->shadow, pe, nptes); amdgpu_vm_sdma_copy_ptes(p, bo, pe, nptes); pe += nptes * 8; -- 2.43.0

1 year, 4 months

1
0
0 0

Re: [PATCH 1/3] dma-buf: replace symbolic permission S_IRUGO with octal 0444

by Sumit Semwal

Hello Pintu, On Tue, 1 Oct 2024 at 23:16, Pintu Kumar <quic_pintu(a)quicinc.com> wrote: > > Symbolic permissions are not preferred, instead use the octal. > Also, fix other warnings/errors as well for cleanup. > > WARNING: Block comments use * on subsequent lines > + /* only support discovering the end of the buffer, > + but also allow SEEK_SET to maintain the idiomatic > > WARNING: Block comments use a trailing */ on a separate line > + SEEK_END(0), SEEK_CUR(0) pattern */ > > WARNING: Block comments use a trailing */ on a separate line > + * before passing the sgt back to the exporter. */ > > ERROR: "foo * bar" should be "foo *bar" > +static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach, > > WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. > + d = debugfs_create_file("bufinfo", S_IRUGO, dma_buf_debugfs_dir, > > total: 1 errors, 4 warnings, 1746 lines checked > > Signed-off-by: Pintu Kumar <quic_pintu(a)quicinc.com> Thanks for this patch - could you please also mention in the commit log how did you find this? It looks like you ran checkpatch, but it's not clear from the commit log. Since this patch does multiple things related to checkpatch warnings (change S_IRUGO to 0444, comments correction, function declaration correction), can I please ask you to change the commit title to also reflect that? > --- > drivers/dma-buf/dma-buf.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c > index 8892bc701a66..2e63d50e46d3 100644 > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -176,8 +176,9 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence) > dmabuf = file->private_data; > > /* only support discovering the end of the buffer, > - but also allow SEEK_SET to maintain the idiomatic > - SEEK_END(0), SEEK_CUR(0) pattern */ > + * but also allow SEEK_SET to maintain the idiomatic > + * SEEK_END(0), SEEK_CUR(0) pattern. > + */ > if (whence == SEEK_END) > base = dmabuf->size; > else if (whence == SEEK_SET) > @@ -782,13 +783,14 @@ static void mangle_sg_table(struct sg_table *sg_table) > /* To catch abuse of the underlying struct page by importers mix > * up the bits, but take care to preserve the low SG_ bits to > * not corrupt the sgt. The mixing is undone in __unmap_dma_buf > - * before passing the sgt back to the exporter. */ > + * before passing the sgt back to the exporter. > + */ > for_each_sgtable_sg(sg_table, sg, i) > sg->page_link ^= ~0xffUL; > #endif > > } > -static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach, > +static struct sg_table *__map_dma_buf(struct dma_buf_attachment *attach, > enum dma_data_direction direction) > { > struct sg_table *sg_table; > @@ -1694,7 +1696,7 @@ static int dma_buf_init_debugfs(void) > > dma_buf_debugfs_dir = d; > > - d = debugfs_create_file("bufinfo", S_IRUGO, dma_buf_debugfs_dir, > + d = debugfs_create_file("bufinfo", 0444, dma_buf_debugfs_dir, > NULL, &dma_buf_debug_fops); > if (IS_ERR(d)) { > pr_debug("dma_buf: debugfs: failed to create node bufinfo\n"); > -- > 2.17.1 > Best, Sumit.

1 year, 4 months

2
1
0 0

Re: [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor

by Jani Nikula

On Wed, 02 Oct 2024, Boris Brezillon <boris.brezillon(a)collabora.com> wrote: > Queued to drm-misc-next after applying the few modifications I > mentioned. Also added Steve's ack (given on IRC) on the first patch. Can we have the drm-tip rebuild conflict resolution too, please? diff --cc drivers/gpu/drm/panthor/panthor_drv.c index c520f156e2d7,f9b93f84d611..000000000000 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@@ -1383,7 -1476,7 +1476,11 @@@ static const struct file_operations pan .read = drm_read, .llseek = noop_llseek, .mmap = panthor_mmap, ++<<<<<<< HEAD + .fop_flags = FOP_UNSIGNED_OFFSET, ++======= + .show_fdinfo = drm_show_fdinfo, ++>>>>>>> drm-misc/drm-misc-next }; #ifdef CONFIG_DEBUG_FS > >> >> .../testing/sysfs-driver-panthor-profiling | 10 + >> Documentation/gpu/panthor.rst | 46 +++ >> drivers/gpu/drm/panthor/panthor_devfreq.c | 18 +- >> drivers/gpu/drm/panthor/panthor_device.h | 36 ++ >> drivers/gpu/drm/panthor/panthor_drv.c | 73 ++++ >> drivers/gpu/drm/panthor/panthor_gem.c | 12 + >> drivers/gpu/drm/panthor/panthor_sched.c | 384 +++++++++++++++--- >> drivers/gpu/drm/panthor/panthor_sched.h | 2 + >> 8 files changed, 531 insertions(+), 50 deletions(-) >> create mode 100644 Documentation/ABI/testing/sysfs-driver-panthor-profiling >> create mode 100644 Documentation/gpu/panthor.rst >> > -- Jani Nikula, Intel

1 year, 4 months

1
0
0 0

Re: [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 02/10/2024 09:38, Boris Brezillon wrote: > On Tue, 24 Sep 2024 00:06:21 +0100 > Adrián Larumbe <adrian.larumbe(a)collabora.com> wrote: > >> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, >> + u32 cs_ringbuf_size) >> +{ >> + u32 min_profiled_job_instrs = U32_MAX; >> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); >> + >> + /* >> + * We want to calculate the minimum size of a profiled job's CS, >> + * because since they need additional instructions for the sampling >> + * of performance metrics, they might take up further slots in >> + * the queue's ringbuffer. This means we might not need as many job >> + * slots for keeping track of their profiling information. What we >> + * need is the maximum number of slots we should allocate to this end, >> + * which matches the maximum number of profiled jobs we can place >> + * simultaneously in the queue's ring buffer. >> + * That has to be calculated separately for every single job profiling >> + * flag, but not in the case job profiling is disabled, since unprofiled >> + * jobs don't need to keep track of this at all. >> + */ >> + for (u32 i = 0; i < last_flag; i++) { >> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) > > I'll get rid of this check when applying, as suggested by Steve. Steve, > with this modification do you want me to add your R-b? Yes, please do. Thanks, Steve > BTW, I've also fixed a bunch of checkpatch errors/warnings, so you > might want to run checkpatch --strict next time. > >> + min_profiled_job_instrs = >> + min(min_profiled_job_instrs, calc_job_credits(BIT(i))); >> + } >> + >> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); >> +}

1 year, 4 months

1
0
0 0

Re: Question about 'dma_resv_get_fences'

by Christian König

Hi, Am 30.09.24 um 21:38 schrieb Zichen Xie: > Dear Linux Developers for DMA BUFFER SHARING FRAMEWORK, > > We are curious about the function 'dma_resv_get_fences' here: > https://elixir.bootlin.com/linux/v6.11/source/drivers/dma-buf/dma-resv.c#L5…, > and the logic below: > ``` > dma_resv_for_each_fence_unlocked(&cursor, fence) { > > if (dma_resv_iter_is_restarted(&cursor)) { > struct dma_fence **new_fences; > unsigned int count; > > while (*num_fences) > dma_fence_put((*fences)[--(*num_fences)]); > > count = cursor.num_fences + 1; > > /* Eventually re-allocate the array */ > new_fences = krealloc_array(*fences, count, > sizeof(void *), > GFP_KERNEL); > if (count && !new_fences) { > kfree(*fences); > *fences = NULL; > *num_fences = 0; > dma_resv_iter_end(&cursor); > return -ENOMEM; > } > *fences = new_fences; > } > > (*fences)[(*num_fences)++] = dma_fence_get(fence); > } > ``` > The existing check 'if (count && !new_fences)' may fail if count==0, > and 'krealloc_array' with count==0 is an undefined behavior. The > realloc may fail and return a NULL pointer, leading to a NULL Pointer > Dereference in '(*fences)[(*num_fences)++] = dma_fence_get(fence);' You already answered the question yourself "count = cursor.num_fences + 1;". So count can never be 0. What could theoretically be possible is that num_fences overflows, but this value isn't userspace controllable and we would run into memory allocation failures long before that happened. But we could potentially remove this whole handling since if there are no fences in the dma_resv object we don't enter the loop in the first place. Regards, Christian. > > Please correct us if we miss some key prerequisites for this function! > Thank you very much!

1 year, 4 months

1
0
0 0

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 27/09/2024 15:53, Adrián Larumbe wrote: > On 25.09.2024 10:56, Steven Price wrote: >> On 23/09/2024 21:43, Adrián Larumbe wrote: >>> Hi Steve, >>> >>> On 23.09.2024 09:55, Steven Price wrote: >>>> On 20/09/2024 23:36, Adrián Larumbe wrote: >>>>> Hi Steve, thanks for the review. >>>> >>>> Hi Adrián, >>>> >>>>> I've applied all of your suggestions for the next patch series revision, so I'll >>>>> only be answering to your question about the calc_profiling_ringbuf_num_slots >>>>> function further down below. >>>>> >>>> >>>> [...] >>>> >>>>>>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = { >>>>>>> .free_job = queue_free_job, >>>>>>> }; >>>>>>> >>>>>>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, >>>>>>> + u32 cs_ringbuf_size) >>>>>>> +{ >>>>>>> + u32 min_profiled_job_instrs = U32_MAX; >>>>>>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); >>>>>>> + >>>>>>> + /* >>>>>>> + * We want to calculate the minimum size of a profiled job's CS, >>>>>>> + * because since they need additional instructions for the sampling >>>>>>> + * of performance metrics, they might take up further slots in >>>>>>> + * the queue's ringbuffer. This means we might not need as many job >>>>>>> + * slots for keeping track of their profiling information. What we >>>>>>> + * need is the maximum number of slots we should allocate to this end, >>>>>>> + * which matches the maximum number of profiled jobs we can place >>>>>>> + * simultaneously in the queue's ring buffer. >>>>>>> + * That has to be calculated separately for every single job profiling >>>>>>> + * flag, but not in the case job profiling is disabled, since unprofiled >>>>>>> + * jobs don't need to keep track of this at all. >>>>>>> + */ >>>>>>> + for (u32 i = 0; i < last_flag; i++) { >>>>>>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) >>>>>>> + min_profiled_job_instrs = >>>>>>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i))); >>>>>>> + } >>>>>>> + >>>>>>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); >>>>>>> +} >>>>>> >>>>>> I may be missing something, but is there a situation where this is >>>>>> different to calc_job_credits(0)? AFAICT the infrastructure you've added >>>>>> can only add extra instructions to the no-flags case - whereas this >>>>>> implies you're thinking that instructions may also be removed (or replaced). >>>>>> >>>>>> Steve >>>>> >>>>> Since we create a separate kernel BO to hold the profiling information slot, we >>>>> need one that would be able to accomodate as many slots as the maximum number of >>>>> profiled jobs we can insert simultaneously into the queue's ring buffer. Because >>>>> profiled jobs always take more instructions than unprofiled ones, then we would >>>>> usually need fewer slots than the number of unprofiled jobs we could insert at >>>>> once in the ring buffer. >>>>> >>>>> Because we represent profiling metrics with a bit mask, then we need to test the >>>>> size of the CS for every single metric enabled in isolation, since enabling more >>>>> than one will always mean a bigger CS, and therefore fewer jobs tracked at once >>>>> in the queue's ring buffer. >>>>> >>>>> In our case, calling calc_job_credits(0) would simply tell us the number of >>>>> instructions we need for a normal job with no profiled features enabled, which >>>>> would always requiere less instructions than profiled ones, and therefore more >>>>> slots in the profiling info kernel BO. But we don't need to keep track of >>>>> profiling numbers for unprofiled jobs, so there's no point in calculating this >>>>> number. >>>>> >>>>> At first I was simply allocating a profiling info kernel BO as big as the number >>>>> of simultaneous unprofiled job slots in the ring queue, but Boris pointed out >>>>> that since queue ringbuffers can be as big as 2GiB, a lot of this memory would >>>>> be wasted, since profiled jobs always require more slots because they hold more >>>>> instructions, so fewer profiling slots in said kernel BO. >>>>> >>>>> The value of this approach will eventually manifest if we decided to keep track of >>>>> more profiling metrics, since this code won't have to change at all, other than >>>>> adding new profiling flags in the panthor_device_profiling_flags enum. >>>> >>>> Thanks for the detailed explanation. I think what I was missing is that >>>> the loop is checking each bit flag independently and *not* checking >>>> calc_job_credits(0). >>>> >>>> The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what >>>> confused me - that should be completely redundant. Or at least we need >>>> something more intelligent if we have profiling bits which are not >>>> mutually compatible. >>> >>> I thought of an alternative that would only test bits that are actually part of >>> the mask: >>> >>> static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, >>> u32 cs_ringbuf_size) >>> { >>> u32 min_profiled_job_instrs = U32_MAX; >>> u32 profiling_mask = PANTHOR_DEVICE_PROFILING_ALL; >>> >>> while (profiling_mask) { >>> u32 i = ffs(profiling_mask) - 1; >>> profiling_mask &= ~BIT(i); >>> min_profiled_job_instrs = >>> min(min_profiled_job_instrs, calc_job_credits(BIT(i))); >>> } >>> >>> return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); >>> } >>> >>> However, I don't think this would be more efficient, because ffs() is probably >>> fetching the first set bit by performing register shifts, and I guess this would >>> take somewhat longer than iterating over every single bit from the last one, >>> even if also matching them against the whole mask, just in case in future >>> additions of performance metrics we decide to leave some of the lower >>> significance bits untouched. >> >> Efficiency isn't very important here - we're not on a fast path, so it's >> more about ensuring the code is readable. I don't think the above is >> more readable then the original for loop. >> >>> Regarding your question about mutual compatibility, I don't think that is an >>> issue here, because we're testing bits in isolation. If in the future we find >>> out that some of the values we're profiling cannot be sampled at once, we can >>> add that logic to the sysfs knob handler, to make sure UM cannot set forbidden >>> profiling masks. >> >> My comment about compatibility is because in the original above you were >> calculating the top bit of PANTHOR_DEVICE_PROFILING_ALL: >> >>> u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); >> >> then looping between 0 and that bit: >> >>> for (u32 i = 0; i < last_flag; i++) { >> >> So the test: >> >>> if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) >> >> would only fail if PANTHOR_DEVICE_PROFILING_ALL had gaps in the bits >> that it set. The only reason I can think for that to be true in the >> future is if there is some sort of incompatibility - e.g. maybe there's >> an old and new way of doing some form of profiling with the old way >> being kept for backwards compatibility. But I suspect if/when that is >> required we'll need to revisit this function anyway. So that 'if' >> statement seems completely redundant (it's trivially always true). > > I think you're right about this. Would you be fine with the rest of the patch > as it is in revision 8 if I also deleted this bitmask check? Yes the rest of it looks fine. Thanks, Steve >> Steve >> >>>> I'm also not entirely sure that the amount of RAM saved is significant, >>>> but you've already written the code so we might as well have the saving ;) >>> >>> I think this was more evident before Boris suggested we reduce the basic slot >>> size to that of a single cache line, because then the minimum profiled job >>> might've taken twice as many ringbuffer slots as a nonprofiled one. In that >>> case, we would need a half as big BO for holding the sampled data (in case the >>> least size profiled job CS would extend over the 16 instruction boundary). >>> I still think this is a good idea so that in the future we don't need to worry >>> about adjusting the code that deals with preparing the right boilerplate CS, >>> since it'll only be a matter of adding new instructions inside prepare_job_instrs(). >>> >>>> Thanks, >>>> Steve >>>> >>>>> Regards, >>>>> Adrian >>>>> >>>>>>> + >>>>>>> static struct panthor_queue * >>>>>>> group_create_queue(struct panthor_group *group, >>>>>>> const struct drm_panthor_queue_create *args) >>>>>>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group, >>>>>>> goto err_free_queue; >>>>>>> } >>>>>>> >>>>>>> + queue->profiling.slot_count = >>>>>>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size); >>>>>>> + >>>>>>> + queue->profiling.slots = >>>>>>> + panthor_kernel_bo_create(group->ptdev, group->vm, >>>>>>> + queue->profiling.slot_count * >>>>>>> + sizeof(struct panthor_job_profiling_data), >>>>>>> + DRM_PANTHOR_BO_NO_MMAP, >>>>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | >>>>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED, >>>>>>> + PANTHOR_VM_KERNEL_AUTO_VA); >>>>>>> + >>>>>>> + if (IS_ERR(queue->profiling.slots)) { >>>>>>> + ret = PTR_ERR(queue->profiling.slots); >>>>>>> + goto err_free_queue; >>>>>>> + } >>>>>>> + >>>>>>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots); >>>>>>> + if (ret) >>>>>>> + goto err_free_queue; >>>>>>> + >>>>>>> + /* >>>>>>> + * Credit limit argument tells us the total number of instructions >>>>>>> + * across all CS slots in the ringbuffer, with some jobs requiring >>>>>>> + * twice as many as others, depending on their profiling status. >>>>>>> + */ >>>>>>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops, >>>>>>> group->ptdev->scheduler->wq, 1, >>>>>>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)), >>>>>>> + args->ringbuf_size / sizeof(u64), >>>>>>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS), >>>>>>> group->ptdev->reset.wq, >>>>>>> NULL, "panthor-queue", group->ptdev->base.dev); >>>>>>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile, >>>>>>> { >>>>>>> struct panthor_group_pool *gpool = pfile->groups; >>>>>>> struct panthor_job *job; >>>>>>> + u32 credits; >>>>>>> int ret; >>>>>>> >>>>>>> if (qsubmit->pad) >>>>>>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile, >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> + job->profiling.mask = pfile->ptdev->profile_mask; >>>>>>> + credits = calc_job_credits(job->profiling.mask); >>>>>>> + if (credits == 0) { >>>>>>> + ret = -EINVAL; >>>>>>> + goto err_put_job; >>>>>>> + } >>>>>>> + >>>>>>> ret = drm_sched_job_init(&job->base, >>>>>>> &job->group->queues[job->queue_idx]->entity, >>>>>>> - 1, job->group); >>>>>>> + credits, job->group); >>>>>>> if (ret) >>>>>>> goto err_put_job; >>>>>>> >>>>>

1 year, 4 months

1
0
0 0

[RFC PATCH 0/4] Linaro restricted heap

by Jens Wiklander

Hi, This patch set is based on top of Yong Wu's restricted heap patch set [1]. It's also a continuation on Olivier's Add dma-buf secure-heap patch set [2]. The Linaro restricted heap uses genalloc in the kernel to manage the heap carvout. This is a difference from the Mediatek restricted heap which relies on the secure world to manage the carveout. I've tried to adress the comments on [2], but [1] introduces changes so I'm afraid I've had to skip some comments. This can be tested on QEMU with the following steps: repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ -b prototype/sdp-v1 repo sync -j8 cd build make toolchains -j4 make all -j$(nproc) make run-only # login and at the prompt: xtest --sdp-basic https://optee.readthedocs.io/en/latest/building/prerequisites.html list dependencies needed to build the above. The tests are pretty basic, mostly checking that a Trusted Application in the secure world can access and manipulate the memory. Cheers, Jens [1] https://lore.kernel.org/dri-devel/20240515112308.10171-1-yong.wu@mediatek.c… [2] https://lore.kernel.org/lkml/20220805135330.970-1-olivier.masse@nxp.com/ Changes since Olivier's post [2]: * Based on Yong Wu's post [1] where much of dma-buf handling is done in the generic restricted heap * Simplifications and cleanup * New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap support" * Replaced the word "secure" with "restricted" where applicable Etienne Carriere (1): tee: new ioctl to a register tee_shm from a dmabuf file descriptor Jens Wiklander (2): dma-buf: heaps: restricted_heap: add no_map attribute dma-buf: heaps: add Linaro restricted dmabuf heap support Olivier Masse (1): dt-bindings: reserved-memory: add linaro,restricted-heap .../linaro,restricted-heap.yaml | 56 ++++++ drivers/dma-buf/heaps/Kconfig | 10 ++ drivers/dma-buf/heaps/Makefile | 1 + drivers/dma-buf/heaps/restricted_heap.c | 17 +- drivers/dma-buf/heaps/restricted_heap.h | 2 + .../dma-buf/heaps/restricted_heap_linaro.c | 165 ++++++++++++++++++ drivers/tee/tee_core.c | 38 ++++ drivers/tee/tee_shm.c | 104 ++++++++++- include/linux/tee_drv.h | 11 ++ include/uapi/linux/tee.h | 29 +++ 10 files changed, 426 insertions(+), 7 deletions(-) create mode 100644 Documentation/devicetree/bindings/reserved-memory/linaro,restricted-heap.yaml create mode 100644 drivers/dma-buf/heaps/restricted_heap_linaro.c -- 2.34.1

1 year, 4 months

10
35
0 0

Re: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support

by kernel test robot

Hi Adrián, kernel test robot noticed the following build errors: [auto build test ERROR on linus/master] [also build test ERROR on v6.11 next-20240927] [cannot apply to drm-misc/drm-misc-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i… base: linus/master patch link: https://lore.kernel.org/r/20240923230912.2207320-4-adrian.larumbe%40collabo… patch subject: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support config: arm-randconfig-002-20240929 (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@…) compiler: arm-linux-gnueabi-gcc (GCC) 14.1.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@…) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp(a)intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409291048.zLqDeqpO-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from include/linux/math64.h:6, from include/linux/time.h:6, from include/linux/stat.h:19, from include/linux/module.h:13, from drivers/gpu/drm/panthor/panthor_drv.c:7: drivers/gpu/drm/panthor/panthor_drv.c: In function 'panthor_gpu_show_fdinfo': >> drivers/gpu/drm/panthor/panthor_drv.c:1389:45: error: implicit declaration of function 'arch_timer_get_cntfrq' [-Wimplicit-function-declaration] 1389 | arch_timer_get_cntfrq())); | ^~~~~~~~~~~~~~~~~~~~~ include/linux/math.h:40:39: note: in definition of macro 'DIV_ROUND_DOWN_ULL' 40 | ({ unsigned long long _tmp = (ll); do_div(_tmp, d); _tmp; }) | ^~ drivers/gpu/drm/panthor/panthor_drv.c:1388:28: note: in expansion of macro 'DIV_ROUND_UP_ULL' 1388 | DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), | ^~~~~~~~~~~~~~~~ vim +/arch_timer_get_cntfrq +1389 drivers/gpu/drm/panthor/panthor_drv.c 1377 1378 static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, 1379 struct panthor_file *pfile, 1380 struct drm_printer *p) 1381 { 1382 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_ALL) 1383 panthor_fdinfo_gather_group_samples(pfile); 1384 1385 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP) { 1386 #ifdef CONFIG_ARM_ARCH_TIMER 1387 drm_printf(p, "drm-engine-panthor:\t%llu ns\n", 1388 DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), > 1389 arch_timer_get_cntfrq())); 1390 #endif 1391 } 1392 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_CYCLES) 1393 drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); 1394 1395 drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); 1396 drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); 1397 } 1398 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

1 year, 4 months

1
0
0 0

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 23/09/2024 21:43, Adrián Larumbe wrote: > Hi Steve, > > On 23.09.2024 09:55, Steven Price wrote: >> On 20/09/2024 23:36, Adrián Larumbe wrote: >>> Hi Steve, thanks for the review. >> >> Hi Adrián, >> >>> I've applied all of your suggestions for the next patch series revision, so I'll >>> only be answering to your question about the calc_profiling_ringbuf_num_slots >>> function further down below. >>> >> >> [...] >> >>>>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = { >>>>> .free_job = queue_free_job, >>>>> }; >>>>> >>>>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, >>>>> + u32 cs_ringbuf_size) >>>>> +{ >>>>> + u32 min_profiled_job_instrs = U32_MAX; >>>>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); >>>>> + >>>>> + /* >>>>> + * We want to calculate the minimum size of a profiled job's CS, >>>>> + * because since they need additional instructions for the sampling >>>>> + * of performance metrics, they might take up further slots in >>>>> + * the queue's ringbuffer. This means we might not need as many job >>>>> + * slots for keeping track of their profiling information. What we >>>>> + * need is the maximum number of slots we should allocate to this end, >>>>> + * which matches the maximum number of profiled jobs we can place >>>>> + * simultaneously in the queue's ring buffer. >>>>> + * That has to be calculated separately for every single job profiling >>>>> + * flag, but not in the case job profiling is disabled, since unprofiled >>>>> + * jobs don't need to keep track of this at all. >>>>> + */ >>>>> + for (u32 i = 0; i < last_flag; i++) { >>>>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) >>>>> + min_profiled_job_instrs = >>>>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i))); >>>>> + } >>>>> + >>>>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); >>>>> +} >>>> >>>> I may be missing something, but is there a situation where this is >>>> different to calc_job_credits(0)? AFAICT the infrastructure you've added >>>> can only add extra instructions to the no-flags case - whereas this >>>> implies you're thinking that instructions may also be removed (or replaced). >>>> >>>> Steve >>> >>> Since we create a separate kernel BO to hold the profiling information slot, we >>> need one that would be able to accomodate as many slots as the maximum number of >>> profiled jobs we can insert simultaneously into the queue's ring buffer. Because >>> profiled jobs always take more instructions than unprofiled ones, then we would >>> usually need fewer slots than the number of unprofiled jobs we could insert at >>> once in the ring buffer. >>> >>> Because we represent profiling metrics with a bit mask, then we need to test the >>> size of the CS for every single metric enabled in isolation, since enabling more >>> than one will always mean a bigger CS, and therefore fewer jobs tracked at once >>> in the queue's ring buffer. >>> >>> In our case, calling calc_job_credits(0) would simply tell us the number of >>> instructions we need for a normal job with no profiled features enabled, which >>> would always requiere less instructions than profiled ones, and therefore more >>> slots in the profiling info kernel BO. But we don't need to keep track of >>> profiling numbers for unprofiled jobs, so there's no point in calculating this >>> number. >>> >>> At first I was simply allocating a profiling info kernel BO as big as the number >>> of simultaneous unprofiled job slots in the ring queue, but Boris pointed out >>> that since queue ringbuffers can be as big as 2GiB, a lot of this memory would >>> be wasted, since profiled jobs always require more slots because they hold more >>> instructions, so fewer profiling slots in said kernel BO. >>> >>> The value of this approach will eventually manifest if we decided to keep track of >>> more profiling metrics, since this code won't have to change at all, other than >>> adding new profiling flags in the panthor_device_profiling_flags enum. >> >> Thanks for the detailed explanation. I think what I was missing is that >> the loop is checking each bit flag independently and *not* checking >> calc_job_credits(0). >> >> The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what >> confused me - that should be completely redundant. Or at least we need >> something more intelligent if we have profiling bits which are not >> mutually compatible. > > I thought of an alternative that would only test bits that are actually part of > the mask: > > static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, > u32 cs_ringbuf_size) > { > u32 min_profiled_job_instrs = U32_MAX; > u32 profiling_mask = PANTHOR_DEVICE_PROFILING_ALL; > > while (profiling_mask) { > u32 i = ffs(profiling_mask) - 1; > profiling_mask &= ~BIT(i); > min_profiled_job_instrs = > min(min_profiled_job_instrs, calc_job_credits(BIT(i))); > } > > return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); > } > > However, I don't think this would be more efficient, because ffs() is probably > fetching the first set bit by performing register shifts, and I guess this would > take somewhat longer than iterating over every single bit from the last one, > even if also matching them against the whole mask, just in case in future > additions of performance metrics we decide to leave some of the lower > significance bits untouched. Efficiency isn't very important here - we're not on a fast path, so it's more about ensuring the code is readable. I don't think the above is more readable then the original for loop. > Regarding your question about mutual compatibility, I don't think that is an > issue here, because we're testing bits in isolation. If in the future we find > out that some of the values we're profiling cannot be sampled at once, we can > add that logic to the sysfs knob handler, to make sure UM cannot set forbidden > profiling masks. My comment about compatibility is because in the original above you were calculating the top bit of PANTHOR_DEVICE_PROFILING_ALL: > u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); then looping between 0 and that bit: > for (u32 i = 0; i < last_flag; i++) { So the test: > if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) would only fail if PANTHOR_DEVICE_PROFILING_ALL had gaps in the bits that it set. The only reason I can think for that to be true in the future is if there is some sort of incompatibility - e.g. maybe there's an old and new way of doing some form of profiling with the old way being kept for backwards compatibility. But I suspect if/when that is required we'll need to revisit this function anyway. So that 'if' statement seems completely redundant (it's trivially always true). Steve >> I'm also not entirely sure that the amount of RAM saved is significant, >> but you've already written the code so we might as well have the saving ;) > > I think this was more evident before Boris suggested we reduce the basic slot > size to that of a single cache line, because then the minimum profiled job > might've taken twice as many ringbuffer slots as a nonprofiled one. In that > case, we would need a half as big BO for holding the sampled data (in case the > least size profiled job CS would extend over the 16 instruction boundary). > I still think this is a good idea so that in the future we don't need to worry > about adjusting the code that deals with preparing the right boilerplate CS, > since it'll only be a matter of adding new instructions inside prepare_job_instrs(). > >> Thanks, >> Steve >> >>> Regards, >>> Adrian >>> >>>>> + >>>>> static struct panthor_queue * >>>>> group_create_queue(struct panthor_group *group, >>>>> const struct drm_panthor_queue_create *args) >>>>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group, >>>>> goto err_free_queue; >>>>> } >>>>> >>>>> + queue->profiling.slot_count = >>>>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size); >>>>> + >>>>> + queue->profiling.slots = >>>>> + panthor_kernel_bo_create(group->ptdev, group->vm, >>>>> + queue->profiling.slot_count * >>>>> + sizeof(struct panthor_job_profiling_data), >>>>> + DRM_PANTHOR_BO_NO_MMAP, >>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | >>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED, >>>>> + PANTHOR_VM_KERNEL_AUTO_VA); >>>>> + >>>>> + if (IS_ERR(queue->profiling.slots)) { >>>>> + ret = PTR_ERR(queue->profiling.slots); >>>>> + goto err_free_queue; >>>>> + } >>>>> + >>>>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots); >>>>> + if (ret) >>>>> + goto err_free_queue; >>>>> + >>>>> + /* >>>>> + * Credit limit argument tells us the total number of instructions >>>>> + * across all CS slots in the ringbuffer, with some jobs requiring >>>>> + * twice as many as others, depending on their profiling status. >>>>> + */ >>>>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops, >>>>> group->ptdev->scheduler->wq, 1, >>>>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)), >>>>> + args->ringbuf_size / sizeof(u64), >>>>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS), >>>>> group->ptdev->reset.wq, >>>>> NULL, "panthor-queue", group->ptdev->base.dev); >>>>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile, >>>>> { >>>>> struct panthor_group_pool *gpool = pfile->groups; >>>>> struct panthor_job *job; >>>>> + u32 credits; >>>>> int ret; >>>>> >>>>> if (qsubmit->pad) >>>>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile, >>>>> } >>>>> } >>>>> >>>>> + job->profiling.mask = pfile->ptdev->profile_mask; >>>>> + credits = calc_job_credits(job->profiling.mask); >>>>> + if (credits == 0) { >>>>> + ret = -EINVAL; >>>>> + goto err_put_job; >>>>> + } >>>>> + >>>>> ret = drm_sched_job_init(&job->base, >>>>> &job->group->queues[job->queue_idx]->entity, >>>>> - 1, job->group); >>>>> + credits, job->group); >>>>> if (ret) >>>>> goto err_put_job; >>>>> >>> > > > Adrian Larumbe

1 year, 4 months

1
0
0 0

Re: [PATCH] dma-buf: Add syntax highlighting to code listings in the document

by Christian König

Nothing wrong with this, I just didn't had time to double check it myself and then forgotten about it. Going to push it to drm-misc-next. Regards, Christian. Am 23.09.24 um 11:22 schrieb Tommy Chiang: > Ping. > Please let me know if I'm doing something wrong. > > On Mon, Feb 19, 2024 at 11:00 AM Tommy Chiang <ototot(a)chromium.org> wrote: >> Kindly ping :) >> >> On Fri, Jan 19, 2024 at 11:33 AM Tommy Chiang <ototot(a)chromium.org> wrote: >>> This patch tries to improve the display of the code listing >>> on The Linux Kernel documentation website for dma-buf [1] . >>> >>> Originally, it appears that it was attempting to escape >>> the '*' character, but looks like it's not necessary (now), >>> so we are seeing something like '\*' on the webite. >>> >>> This patch removes these unnecessary backslashes and adds syntax >>> highlighting to improve the readability of the code listing. >>> >>> [1] https://docs.kernel.org/driver-api/dma-buf.html >>> >>> Signed-off-by: Tommy Chiang <ototot(a)chromium.org> >>> --- >>> drivers/dma-buf/dma-buf.c | 15 +++++++++------ >>> 1 file changed, 9 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c >>> index 8fe5aa67b167..e083a0ab06d7 100644 >>> --- a/drivers/dma-buf/dma-buf.c >>> +++ b/drivers/dma-buf/dma-buf.c >>> @@ -1282,10 +1282,12 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF); >>> * vmap interface is introduced. Note that on very old 32-bit architectures >>> * vmalloc space might be limited and result in vmap calls failing. >>> * >>> - * Interfaces:: >>> + * Interfaces: >>> * >>> - * void \*dma_buf_vmap(struct dma_buf \*dmabuf, struct iosys_map \*map) >>> - * void dma_buf_vunmap(struct dma_buf \*dmabuf, struct iosys_map \*map) >>> + * .. code-block:: c >>> + * >>> + * void *dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map) >>> + * void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map) >>> * >>> * The vmap call can fail if there is no vmap support in the exporter, or if >>> * it runs out of vmalloc space. Note that the dma-buf layer keeps a reference >>> @@ -1342,10 +1344,11 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF); >>> * enough, since adding interfaces to intercept pagefaults and allow pte >>> * shootdowns would increase the complexity quite a bit. >>> * >>> - * Interface:: >>> + * Interface: >>> + * >>> + * .. code-block:: c >>> * >>> - * int dma_buf_mmap(struct dma_buf \*, struct vm_area_struct \*, >>> - * unsigned long); >>> + * int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long); >>> * >>> * If the importing subsystem simply provides a special-purpose mmap call to >>> * set up a mapping in userspace, calling do_mmap with &dma_buf.file will >>> -- >>> 2.43.0.381.gb435a96ce8-goog >>>

1 year, 4 months

1
0
0 0

Re: [PATCH v7 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 23/09/2024 11:18, Boris Brezillon wrote: > On Mon, 23 Sep 2024 10:07:14 +0100 > Steven Price <steven.price(a)arm.com> wrote: > >>> +static struct dma_fence * >>> +queue_run_job(struct drm_sched_job *sched_job) >>> +{ >>> + struct panthor_job *job = container_of(sched_job, struct panthor_job, base); >>> + struct panthor_group *group = job->group; >>> + struct panthor_queue *queue = group->queues[job->queue_idx]; >>> + struct panthor_device *ptdev = group->ptdev; >>> + struct panthor_scheduler *sched = ptdev->scheduler; >>> + struct panthor_job_ringbuf_instrs instrs; >> >> instrs isn't initialised... >> >>> + struct panthor_job_cs_params cs_params; >>> + struct dma_fence *done_fence; >>> + int ret; >>> >>> /* Stream size is zero, nothing to do except making sure all previously >>> * submitted jobs are done before we signal the >>> @@ -2900,17 +3062,23 @@ queue_run_job(struct drm_sched_job *sched_job) >>> queue->fence_ctx.id, >>> atomic64_inc_return(&queue->fence_ctx.seqno)); >>> >>> - memcpy(queue->ringbuf->kmap + ringbuf_insert, >>> - call_instrs, sizeof(call_instrs)); >>> + job->profiling.slot = queue->profiling.seqno++; >>> + if (queue->profiling.seqno == queue->profiling.slot_count) >>> + queue->profiling.seqno = 0; >>> + >>> + job->ringbuf.start = queue->iface.input->insert; >>> + >>> + get_job_cs_params(job, &cs_params); >>> + prepare_job_instrs(&cs_params, &instrs); >> >> ...but it's passed into prepare_job_instrs() which depends on >> instrs.count (same bug as was in calc_job_credits()) - sorry I didn't >> spot it last review. > > Hm, can't we initialize instr::count to zero in prepare_job_instrs() > instead? Indeed that would probably be better! I hadn't noticed there were two places in the previous review. Steve

1 year, 4 months

1
0
0 0

Re: [PATCH v7 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 21/09/2024 00:43, Adrián Larumbe wrote: > Enable calculations of job submission times in clock cycles and wall > time. This is done by expanding the boilerplate command stream when running > a job to include instructions that compute said times right before and > after a user CS. > > A separate kernel BO is created per queue to store those values. Jobs can > access their sampled data through an index different from that of the > queue's ringbuffer. The reason for this is saving memory on the profiling > information kernel BO, since the amount of simultaneous profiled jobs we > can write into the queue's ringbuffer might be much smaller than for > regular jobs, as the former take more CSF instructions. > > This commit is done in preparation for enabling DRM fdinfo support in the > Panthor driver, which depends on the numbers calculated herein. > > A profile mode mask has been added that will in a future commit allow UM to > toggle performance metric sampling behaviour, which is disabled by default > to save power. When a ringbuffer CS is constructed, timestamp and cycling > sampling instructions are added depending on the enabled flags in the > profiling mask. > > A helper was provided that calculates the number of instructions for a > given set of enablement mask, and these are passed as the number of credits > when initialising a DRM scheduler job. > > Signed-off-by: Adrián Larumbe <adrian.larumbe(a)collabora.com> > Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com> > Reviewed-by: Liviu Dudau <liviu.dudau(a)arm.com> I think just one bug remaining - see below... > --- > drivers/gpu/drm/panthor/panthor_device.h | 22 ++ > drivers/gpu/drm/panthor/panthor_sched.c | 328 +++++++++++++++++++---- > 2 files changed, 301 insertions(+), 49 deletions(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > index e388c0472ba7..a48e30d0af30 100644 > --- a/drivers/gpu/drm/panthor/panthor_device.h > +++ b/drivers/gpu/drm/panthor/panthor_device.h > @@ -66,6 +66,25 @@ struct panthor_irq { > atomic_t suspended; > }; > > +/** > + * enum panthor_device_profiling_mode - Profiling state > + */ > +enum panthor_device_profiling_flags { > + /** @PANTHOR_DEVICE_PROFILING_DISABLED: Profiling is disabled. */ > + PANTHOR_DEVICE_PROFILING_DISABLED = 0, > + > + /** @PANTHOR_DEVICE_PROFILING_CYCLES: Sampling job cycles. */ > + PANTHOR_DEVICE_PROFILING_CYCLES = BIT(0), > + > + /** @PANTHOR_DEVICE_PROFILING_TIMESTAMP: Sampling job timestamp. */ > + PANTHOR_DEVICE_PROFILING_TIMESTAMP = BIT(1), > + > + /** @PANTHOR_DEVICE_PROFILING_ALL: Sampling everything. */ > + PANTHOR_DEVICE_PROFILING_ALL = > + PANTHOR_DEVICE_PROFILING_CYCLES | > + PANTHOR_DEVICE_PROFILING_TIMESTAMP, > +}; > + > /** > * struct panthor_device - Panthor device > */ > @@ -162,6 +181,9 @@ struct panthor_device { > */ > struct page *dummy_latest_flush; > } pm; > + > + /** @profile_mask: User-set profiling flags for job accounting. */ > + u32 profile_mask; > }; > > /** > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c > index 42afdf0ddb7e..6da5c3d0015e 100644 > --- a/drivers/gpu/drm/panthor/panthor_sched.c > +++ b/drivers/gpu/drm/panthor/panthor_sched.c > @@ -93,6 +93,9 @@ > #define MIN_CSGS 3 > #define MAX_CSG_PRIO 0xf > > +#define NUM_INSTRS_PER_CACHE_LINE (64 / sizeof(u64)) > +#define MAX_INSTRS_PER_JOB 24 > + > struct panthor_group; > > /** > @@ -476,6 +479,18 @@ struct panthor_queue { > */ > struct list_head in_flight_jobs; > } fence_ctx; > + > + /** @profiling: Job profiling data slots and access information. */ > + struct { > + /** @slots: Kernel BO holding the slots. */ > + struct panthor_kernel_bo *slots; > + > + /** @slot_count: Number of jobs ringbuffer can hold at once. */ > + u32 slot_count; > + > + /** @seqno: Index of the next available profiling information slot. */ > + u32 seqno; > + } profiling; > }; > > /** > @@ -661,6 +676,18 @@ struct panthor_group { > struct list_head wait_node; > }; > > +struct panthor_job_profiling_data { > + struct { > + u64 before; > + u64 after; > + } cycles; > + > + struct { > + u64 before; > + u64 after; > + } time; > +}; > + > /** > * group_queue_work() - Queue a group work > * @group: Group to queue the work for. > @@ -774,6 +801,15 @@ struct panthor_job { > > /** @done_fence: Fence signaled when the job is finished or cancelled. */ > struct dma_fence *done_fence; > + > + /** @profiling: Job profiling information. */ > + struct { > + /** @mask: Current device job profiling enablement bitmask. */ > + u32 mask; > + > + /** @slot: Job index in the profiling slots BO. */ > + u32 slot; > + } profiling; > }; > > static void > @@ -838,6 +874,7 @@ static void group_free_queue(struct panthor_group *group, struct panthor_queue * > > panthor_kernel_bo_destroy(queue->ringbuf); > panthor_kernel_bo_destroy(queue->iface.mem); > + panthor_kernel_bo_destroy(queue->profiling.slots); > > /* Release the last_fence we were holding, if any. */ > dma_fence_put(queue->fence_ctx.last_fence); > @@ -1982,8 +2019,6 @@ tick_ctx_init(struct panthor_scheduler *sched, > } > } > > -#define NUM_INSTRS_PER_SLOT 16 > - > static void > group_term_post_processing(struct panthor_group *group) > { > @@ -2815,65 +2850,192 @@ static void group_sync_upd_work(struct work_struct *work) > group_put(group); > } > > -static struct dma_fence * > -queue_run_job(struct drm_sched_job *sched_job) > +struct panthor_job_ringbuf_instrs { > + u64 buffer[MAX_INSTRS_PER_JOB]; > + u32 count; > +}; > + > +struct panthor_job_instr { > + u32 profile_mask; > + u64 instr; > +}; > + > +#define JOB_INSTR(__prof, __instr) \ > + { \ > + .profile_mask = __prof, \ > + .instr = __instr, \ > + } > + > +static void > +copy_instrs_to_ringbuf(struct panthor_queue *queue, > + struct panthor_job *job, > + struct panthor_job_ringbuf_instrs *instrs) > +{ > + u64 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf); > + u64 start = job->ringbuf.start & (ringbuf_size - 1); > + u64 size, written; > + > + /* > + * We need to write a whole slot, including any trailing zeroes > + * that may come at the end of it. Also, because instrs.buffer has > + * been zero-initialised, there's no need to pad it with 0's > + */ > + instrs->count = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE); > + size = instrs->count * sizeof(u64); > + WARN_ON(size > ringbuf_size); > + written = min(ringbuf_size - start, size); > + > + memcpy(queue->ringbuf->kmap + start, instrs->buffer, written); > + > + if (written < size) > + memcpy(queue->ringbuf->kmap, > + &instrs->buffer[written/sizeof(u64)], > + size - written); > +} > + > +struct panthor_job_cs_params { > + u32 profile_mask; > + u64 addr_reg; u64 val_reg; > + u64 cycle_reg; u64 time_reg; > + u64 sync_addr; u64 times_addr; > + u64 cs_start; u64 cs_size; > + u32 last_flush; u32 waitall_mask; > +}; > + > +static void > +get_job_cs_params(struct panthor_job *job, struct panthor_job_cs_params *params) > { > - struct panthor_job *job = container_of(sched_job, struct panthor_job, base); > struct panthor_group *group = job->group; > struct panthor_queue *queue = group->queues[job->queue_idx]; > struct panthor_device *ptdev = group->ptdev; > struct panthor_scheduler *sched = ptdev->scheduler; > - u32 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf); > - u32 ringbuf_insert = queue->iface.input->insert & (ringbuf_size - 1); > - u64 addr_reg = ptdev->csif_info.cs_reg_count - > - ptdev->csif_info.unpreserved_cs_reg_count; > - u64 val_reg = addr_reg + 2; > - u64 sync_addr = panthor_kernel_bo_gpuva(group->syncobjs) + > - job->queue_idx * sizeof(struct panthor_syncobj_64b); > - u32 waitall_mask = GENMASK(sched->sb_slot_count - 1, 0); > - struct dma_fence *done_fence; > - int ret; > > - u64 call_instrs[NUM_INSTRS_PER_SLOT] = { > - /* MOV32 rX+2, cs.latest_flush */ > - (2ull << 56) | (val_reg << 48) | job->call_info.latest_flush, > + params->addr_reg = ptdev->csif_info.cs_reg_count - > + ptdev->csif_info.unpreserved_cs_reg_count; > + params->val_reg = params->addr_reg + 2; > + params->cycle_reg = params->addr_reg; > + params->time_reg = params->val_reg; > > - /* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */ > - (36ull << 56) | (0ull << 48) | (val_reg << 40) | (0 << 16) | 0x233, > + params->sync_addr = panthor_kernel_bo_gpuva(group->syncobjs) + > + job->queue_idx * sizeof(struct panthor_syncobj_64b); > + params->times_addr = panthor_kernel_bo_gpuva(queue->profiling.slots) + > + (job->profiling.slot * sizeof(struct panthor_job_profiling_data)); > + params->waitall_mask = GENMASK(sched->sb_slot_count - 1, 0); > > - /* MOV48 rX:rX+1, cs.start */ > - (1ull << 56) | (addr_reg << 48) | job->call_info.start, > + params->cs_start = job->call_info.start; > + params->cs_size = job->call_info.size; > + params->last_flush = job->call_info.latest_flush; > > - /* MOV32 rX+2, cs.size */ > - (2ull << 56) | (val_reg << 48) | job->call_info.size, > + params->profile_mask = job->profiling.mask; > +} > > - /* WAIT(0) => waits for FLUSH_CACHE2 instruction */ > - (3ull << 56) | (1 << 16), > +#define JOB_INSTR_ALWAYS(instr) \ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, (instr)) > +#define JOB_INSTR_TIMESTAMP(instr) \ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_TIMESTAMP, (instr)) > +#define JOB_INSTR_CYCLES(instr) \ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_CYCLES, (instr)) > > +static void > +prepare_job_instrs(const struct panthor_job_cs_params *params, > + struct panthor_job_ringbuf_instrs *instrs) > +{ > + const struct panthor_job_instr instr_seq[] = { > + /* MOV32 rX+2, cs.latest_flush */ > + JOB_INSTR_ALWAYS((2ull << 56) | (params->val_reg << 48) | params->last_flush), > + /* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */ > + JOB_INSTR_ALWAYS((36ull << 56) | (0ull << 48) | (params->val_reg << 40) | (0 << 16) | 0x233), > + /* MOV48 rX:rX+1, cycles_offset */ > + JOB_INSTR_CYCLES((1ull << 56) | (params->cycle_reg << 48) | > + (params->times_addr + offsetof(struct panthor_job_profiling_data, cycles.before))), > + /* STORE_STATE cycles */ > + JOB_INSTR_CYCLES((40ull << 56) | (params->cycle_reg << 40) | (1ll << 32)), > + /* MOV48 rX:rX+1, time_offset */ > + JOB_INSTR_TIMESTAMP((1ull << 56) | (params->time_reg << 48) | (params->times_addr + > + offsetof(struct panthor_job_profiling_data, time.before))), > + /* STORE_STATE timer */ > + JOB_INSTR_TIMESTAMP((40ull << 56) | (params->time_reg << 40) | (0ll << 32)), > + /* MOV48 rX:rX+1, cs.start */ > + JOB_INSTR_ALWAYS((1ull << 56) | (params->addr_reg << 48) | params->cs_start), > + /* MOV32 rX+2, cs.size */ > + JOB_INSTR_ALWAYS((2ull << 56) | (params->val_reg << 48) | params->cs_size), > + /* WAIT(0) => waits for FLUSH_CACHE2 instruction */ > + JOB_INSTR_ALWAYS((3ull << 56) | (1 << 16)), > /* CALL rX:rX+1, rX+2 */ > - (32ull << 56) | (addr_reg << 40) | (val_reg << 32), > - > + JOB_INSTR_ALWAYS((32ull << 56) | (params->addr_reg << 40) | (params->val_reg << 32)), > + /* MOV48 rX:rX+1, cycles_offset */ > + JOB_INSTR_CYCLES((1ull << 56) | (params->cycle_reg << 48) | > + (params->times_addr + offsetof(struct panthor_job_profiling_data, cycles.after))), > + /* STORE_STATE cycles */ > + JOB_INSTR_CYCLES((40ull << 56) | (params->cycle_reg << 40) | (1ll << 32)), > + /* MOV48 rX:rX+1, time_offset */ > + JOB_INSTR_TIMESTAMP((1ull << 56) | (params->time_reg << 48) | > + (params->times_addr + offsetof(struct panthor_job_profiling_data, time.after))), > + /* STORE_STATE timer */ > + JOB_INSTR_TIMESTAMP((40ull << 56) | (params->time_reg << 40) | (0ll << 32)), > /* MOV48 rX:rX+1, sync_addr */ > - (1ull << 56) | (addr_reg << 48) | sync_addr, > - > + JOB_INSTR_ALWAYS((1ull << 56) | (params->addr_reg << 48) | params->sync_addr), > /* MOV48 rX+2, #1 */ > - (1ull << 56) | (val_reg << 48) | 1, > - > + JOB_INSTR_ALWAYS((1ull << 56) | (params->val_reg << 48) | 1), > /* WAIT(all) */ > - (3ull << 56) | (waitall_mask << 16), > - > + JOB_INSTR_ALWAYS((3ull << 56) | (params->waitall_mask << 16)), > /* SYNC_ADD64.system_scope.propage_err.nowait rX:rX+1, rX+2*/ > - (51ull << 56) | (0ull << 48) | (addr_reg << 40) | (val_reg << 32) | (0 << 16) | 1, > + JOB_INSTR_ALWAYS((51ull << 56) | (0ull << 48) | (params->addr_reg << 40) | > + (params->val_reg << 32) | (0 << 16) | 1), > + /* ERROR_BARRIER, so we can recover from faults at job boundaries. */ > + JOB_INSTR_ALWAYS((47ull << 56)), > + }; > + u32 pad; > > - /* ERROR_BARRIER, so we can recover from faults at job > - * boundaries. > - */ > - (47ull << 56), > + /* NEED to be cacheline aligned to please the prefetcher. */ > + static_assert(sizeof(instrs->buffer) % 64 == 0, > + "panthor_job_ringbuf_instrs::buffer is not aligned on a cacheline"); > + > + /* Make sure we have enough storage to store the whole sequence. */ > + static_assert(ALIGN(ARRAY_SIZE(instr_seq), NUM_INSTRS_PER_CACHE_LINE) == > + ARRAY_SIZE(instrs->buffer), > + "instr_seq vs panthor_job_ringbuf_instrs::buffer size mismatch"); > + > + for (u32 i = 0; i < ARRAY_SIZE(instr_seq); i++) { > + /* If the profile mask of this instruction is not enabled, skip it. */ > + if (instr_seq[i].profile_mask && > + !(instr_seq[i].profile_mask & params->profile_mask)) > + continue; > + > + instrs->buffer[instrs->count++] = instr_seq[i].instr; > + } > + > + pad = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE); > + memset(&instrs->buffer[instrs->count], 0, > + (pad - instrs->count) * sizeof(instrs->buffer[0])); > + instrs->count = pad; > +} > + > +static u32 calc_job_credits(u32 profile_mask) > +{ > + struct panthor_job_ringbuf_instrs instrs = { > + .count = 0, > + }; > + struct panthor_job_cs_params params = { > + .profile_mask = profile_mask, > }; > > - /* Need to be cacheline aligned to please the prefetcher. */ > - static_assert(sizeof(call_instrs) % 64 == 0, > - "call_instrs is not aligned on a cacheline"); > + prepare_job_instrs(&params, &instrs); > + return instrs.count; > +} > + > +static struct dma_fence * > +queue_run_job(struct drm_sched_job *sched_job) > +{ > + struct panthor_job *job = container_of(sched_job, struct panthor_job, base); > + struct panthor_group *group = job->group; > + struct panthor_queue *queue = group->queues[job->queue_idx]; > + struct panthor_device *ptdev = group->ptdev; > + struct panthor_scheduler *sched = ptdev->scheduler; > + struct panthor_job_ringbuf_instrs instrs; instrs isn't initialised... > + struct panthor_job_cs_params cs_params; > + struct dma_fence *done_fence; > + int ret; > > /* Stream size is zero, nothing to do except making sure all previously > * submitted jobs are done before we signal the > @@ -2900,17 +3062,23 @@ queue_run_job(struct drm_sched_job *sched_job) > queue->fence_ctx.id, > atomic64_inc_return(&queue->fence_ctx.seqno)); > > - memcpy(queue->ringbuf->kmap + ringbuf_insert, > - call_instrs, sizeof(call_instrs)); > + job->profiling.slot = queue->profiling.seqno++; > + if (queue->profiling.seqno == queue->profiling.slot_count) > + queue->profiling.seqno = 0; > + > + job->ringbuf.start = queue->iface.input->insert; > + > + get_job_cs_params(job, &cs_params); > + prepare_job_instrs(&cs_params, &instrs); ...but it's passed into prepare_job_instrs() which depends on instrs.count (same bug as was in calc_job_credits()) - sorry I didn't spot it last review. Initializing instrs makes everything work for me. I'm not sure quite what kernel configuration you are using but I wonder if you've got a 'hardening' option enabled which is causing the stack to be zero-initialised. It's worth turning it off for testing purposes ;) Steve > + copy_instrs_to_ringbuf(queue, job, &instrs); > + > + job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64)); > > panthor_job_get(&job->base); > spin_lock(&queue->fence_ctx.lock); > list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs); > spin_unlock(&queue->fence_ctx.lock); > > - job->ringbuf.start = queue->iface.input->insert; > - job->ringbuf.end = job->ringbuf.start + sizeof(call_instrs); > - > /* Make sure the ring buffer is updated before the INSERT > * register. > */ > @@ -3003,6 +3171,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = { > .free_job = queue_free_job, > }; > > +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, > + u32 cs_ringbuf_size) > +{ > + u32 min_profiled_job_instrs = U32_MAX; > + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); > + > + /* > + * We want to calculate the minimum size of a profiled job's CS, > + * because since they need additional instructions for the sampling > + * of performance metrics, they might take up further slots in > + * the queue's ringbuffer. This means we might not need as many job > + * slots for keeping track of their profiling information. What we > + * need is the maximum number of slots we should allocate to this end, > + * which matches the maximum number of profiled jobs we can place > + * simultaneously in the queue's ring buffer. > + * That has to be calculated separately for every single job profiling > + * flag, but not in the case job profiling is disabled, since unprofiled > + * jobs don't need to keep track of this at all. > + */ > + for (u32 i = 0; i < last_flag; i++) { > + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) > + min_profiled_job_instrs = > + min(min_profiled_job_instrs, calc_job_credits(BIT(i))); > + } > + > + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); > +} > + > static struct panthor_queue * > group_create_queue(struct panthor_group *group, > const struct drm_panthor_queue_create *args) > @@ -3056,9 +3252,35 @@ group_create_queue(struct panthor_group *group, > goto err_free_queue; > } > > + queue->profiling.slot_count = > + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size); > + > + queue->profiling.slots = > + panthor_kernel_bo_create(group->ptdev, group->vm, > + queue->profiling.slot_count * > + sizeof(struct panthor_job_profiling_data), > + DRM_PANTHOR_BO_NO_MMAP, > + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | > + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED, > + PANTHOR_VM_KERNEL_AUTO_VA); > + > + if (IS_ERR(queue->profiling.slots)) { > + ret = PTR_ERR(queue->profiling.slots); > + goto err_free_queue; > + } > + > + ret = panthor_kernel_bo_vmap(queue->profiling.slots); > + if (ret) > + goto err_free_queue; > + > + /* > + * Credit limit argument tells us the total number of instructions > + * across all CS slots in the ringbuffer, with some jobs requiring > + * twice as many as others, depending on their profiling status. > + */ > ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops, > group->ptdev->scheduler->wq, 1, > - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)), > + args->ringbuf_size / sizeof(u64), > 0, msecs_to_jiffies(JOB_TIMEOUT_MS), > group->ptdev->reset.wq, > NULL, "panthor-queue", group->ptdev->base.dev); > @@ -3354,6 +3576,7 @@ panthor_job_create(struct panthor_file *pfile, > { > struct panthor_group_pool *gpool = pfile->groups; > struct panthor_job *job; > + u32 credits; > int ret; > > if (qsubmit->pad) > @@ -3407,9 +3630,16 @@ panthor_job_create(struct panthor_file *pfile, > } > } > > + job->profiling.mask = pfile->ptdev->profile_mask; > + credits = calc_job_credits(job->profiling.mask); > + if (credits == 0) { > + ret = -EINVAL; > + goto err_put_job; > + } > + > ret = drm_sched_job_init(&job->base, > &job->group->queues[job->queue_idx]->entity, > - 1, job->group); > + credits, job->group); > if (ret) > goto err_put_job; >

1 year, 4 months

1
0
0 0

Re: [PATCH v7 2/5] drm/panthor: record current and maximum device clock frequencies

by Steven Price

On 21/09/2024 00:43, Adrián Larumbe wrote: > In order to support UM in calculating rates of GPU utilisation, the current > operating and maximum GPU clock frequencies must be recorded during device > initialisation, and also during OPP state transitions. > > Signed-off-by: Adrián Larumbe <adrian.larumbe(a)collabora.com> I thought I gave my r-b on v6 and I can't actually see any change: Reviewed-by: Steven Price <steven.price(a)arm.com> > --- > drivers/gpu/drm/panthor/panthor_devfreq.c | 18 +++++++++++++++++- > drivers/gpu/drm/panthor/panthor_device.h | 6 ++++++ > 2 files changed, 23 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c > index c6d3c327cc24..9d0f891b9b53 100644 > --- a/drivers/gpu/drm/panthor/panthor_devfreq.c > +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c > @@ -62,14 +62,20 @@ static void panthor_devfreq_update_utilization(struct panthor_devfreq *pdevfreq) > static int panthor_devfreq_target(struct device *dev, unsigned long *freq, > u32 flags) > { > + struct panthor_device *ptdev = dev_get_drvdata(dev); > struct dev_pm_opp *opp; > + int err; > > opp = devfreq_recommended_opp(dev, freq, flags); > if (IS_ERR(opp)) > return PTR_ERR(opp); > dev_pm_opp_put(opp); > > - return dev_pm_opp_set_rate(dev, *freq); > + err = dev_pm_opp_set_rate(dev, *freq); > + if (!err) > + ptdev->current_frequency = *freq; > + > + return err; > } > > static void panthor_devfreq_reset(struct panthor_devfreq *pdevfreq) > @@ -130,6 +136,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > struct panthor_devfreq *pdevfreq; > struct dev_pm_opp *opp; > unsigned long cur_freq; > + unsigned long freq = ULONG_MAX; > int ret; > > pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); > @@ -161,6 +168,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > return PTR_ERR(opp); > > panthor_devfreq_profile.initial_freq = cur_freq; > + ptdev->current_frequency = cur_freq; > > /* Regulator coupling only takes care of synchronizing/balancing voltage > * updates, but the coupled regulator needs to be enabled manually. > @@ -204,6 +212,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > > dev_pm_opp_put(opp); > > + /* Find the fastest defined rate */ > + opp = dev_pm_opp_find_freq_floor(dev, &freq); > + if (IS_ERR(opp)) > + return PTR_ERR(opp); > + ptdev->fast_rate = freq; > + > + dev_pm_opp_put(opp); > + > /* > * Setup default thresholds for the simple_ondemand governor. > * The values are chosen based on experiments. > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > index a48e30d0af30..2109905813e8 100644 > --- a/drivers/gpu/drm/panthor/panthor_device.h > +++ b/drivers/gpu/drm/panthor/panthor_device.h > @@ -184,6 +184,12 @@ struct panthor_device { > > /** @profile_mask: User-set profiling flags for job accounting. */ > u32 profile_mask; > + > + /** @current_frequency: Device clock frequency at present. Set by DVFS*/ > + unsigned long current_frequency; > + > + /** @fast_rate: Maximum device clock frequency. Set by DVFS */ > + unsigned long fast_rate; > }; > > /**

1 year, 4 months

1
0
0 0

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 20/09/2024 23:36, Adrián Larumbe wrote: > Hi Steve, thanks for the review. Hi Adrián, > I've applied all of your suggestions for the next patch series revision, so I'll > only be answering to your question about the calc_profiling_ringbuf_num_slots > function further down below. > [...] >>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = { >>> .free_job = queue_free_job, >>> }; >>> >>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, >>> + u32 cs_ringbuf_size) >>> +{ >>> + u32 min_profiled_job_instrs = U32_MAX; >>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); >>> + >>> + /* >>> + * We want to calculate the minimum size of a profiled job's CS, >>> + * because since they need additional instructions for the sampling >>> + * of performance metrics, they might take up further slots in >>> + * the queue's ringbuffer. This means we might not need as many job >>> + * slots for keeping track of their profiling information. What we >>> + * need is the maximum number of slots we should allocate to this end, >>> + * which matches the maximum number of profiled jobs we can place >>> + * simultaneously in the queue's ring buffer. >>> + * That has to be calculated separately for every single job profiling >>> + * flag, but not in the case job profiling is disabled, since unprofiled >>> + * jobs don't need to keep track of this at all. >>> + */ >>> + for (u32 i = 0; i < last_flag; i++) { >>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) >>> + min_profiled_job_instrs = >>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i))); >>> + } >>> + >>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); >>> +} >> >> I may be missing something, but is there a situation where this is >> different to calc_job_credits(0)? AFAICT the infrastructure you've added >> can only add extra instructions to the no-flags case - whereas this >> implies you're thinking that instructions may also be removed (or replaced). >> >> Steve > > Since we create a separate kernel BO to hold the profiling information slot, we > need one that would be able to accomodate as many slots as the maximum number of > profiled jobs we can insert simultaneously into the queue's ring buffer. Because > profiled jobs always take more instructions than unprofiled ones, then we would > usually need fewer slots than the number of unprofiled jobs we could insert at > once in the ring buffer. > > Because we represent profiling metrics with a bit mask, then we need to test the > size of the CS for every single metric enabled in isolation, since enabling more > than one will always mean a bigger CS, and therefore fewer jobs tracked at once > in the queue's ring buffer. > > In our case, calling calc_job_credits(0) would simply tell us the number of > instructions we need for a normal job with no profiled features enabled, which > would always requiere less instructions than profiled ones, and therefore more > slots in the profiling info kernel BO. But we don't need to keep track of > profiling numbers for unprofiled jobs, so there's no point in calculating this > number. > > At first I was simply allocating a profiling info kernel BO as big as the number > of simultaneous unprofiled job slots in the ring queue, but Boris pointed out > that since queue ringbuffers can be as big as 2GiB, a lot of this memory would > be wasted, since profiled jobs always require more slots because they hold more > instructions, so fewer profiling slots in said kernel BO. > > The value of this approach will eventually manifest if we decided to keep track of > more profiling metrics, since this code won't have to change at all, other than > adding new profiling flags in the panthor_device_profiling_flags enum. Thanks for the detailed explanation. I think what I was missing is that the loop is checking each bit flag independently and *not* checking calc_job_credits(0). The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what confused me - that should be completely redundant. Or at least we need something more intelligent if we have profiling bits which are not mutually compatible. I'm also not entirely sure that the amount of RAM saved is significant, but you've already written the code so we might as well have the saving ;) Thanks, Steve > Regards, > Adrian > >>> + >>> static struct panthor_queue * >>> group_create_queue(struct panthor_group *group, >>> const struct drm_panthor_queue_create *args) >>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group, >>> goto err_free_queue; >>> } >>> >>> + queue->profiling.slot_count = >>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size); >>> + >>> + queue->profiling.slots = >>> + panthor_kernel_bo_create(group->ptdev, group->vm, >>> + queue->profiling.slot_count * >>> + sizeof(struct panthor_job_profiling_data), >>> + DRM_PANTHOR_BO_NO_MMAP, >>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | >>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED, >>> + PANTHOR_VM_KERNEL_AUTO_VA); >>> + >>> + if (IS_ERR(queue->profiling.slots)) { >>> + ret = PTR_ERR(queue->profiling.slots); >>> + goto err_free_queue; >>> + } >>> + >>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots); >>> + if (ret) >>> + goto err_free_queue; >>> + >>> + /* >>> + * Credit limit argument tells us the total number of instructions >>> + * across all CS slots in the ringbuffer, with some jobs requiring >>> + * twice as many as others, depending on their profiling status. >>> + */ >>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops, >>> group->ptdev->scheduler->wq, 1, >>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)), >>> + args->ringbuf_size / sizeof(u64), >>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS), >>> group->ptdev->reset.wq, >>> NULL, "panthor-queue", group->ptdev->base.dev); >>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile, >>> { >>> struct panthor_group_pool *gpool = pfile->groups; >>> struct panthor_job *job; >>> + u32 credits; >>> int ret; >>> >>> if (qsubmit->pad) >>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile, >>> } >>> } >>> >>> + job->profiling.mask = pfile->ptdev->profile_mask; >>> + credits = calc_job_credits(job->profiling.mask); >>> + if (credits == 0) { >>> + ret = -EINVAL; >>> + goto err_put_job; >>> + } >>> + >>> ret = drm_sched_job_init(&job->base, >>> &job->group->queues[job->queue_idx]->entity, >>> - 1, job->group); >>> + credits, job->group); >>> if (ret) >>> goto err_put_job; >>> >

1 year, 4 months

1
0
0 0

[RFC PATCH] dma-buf/dma-fence: Use a successful read_trylock() annotation for dma_fence_begin_signalling()

by Thomas Hellström

Condsider the following call sequence: /* Upper layer */ dma_fence_begin_signalling(); lock(tainted_shared_lock); /* Driver callback */ dma_fence_begin_signalling(); ... The driver might here use a utility that is annotated as intended for the dma-fence signalling critical path. Now if the upper layer isn't correctly annotated yet for whatever reason, resulting in /* Upper layer */ lock(tainted_shared_lock); /* Driver callback */ dma_fence_begin_signalling(); We will receive a false lockdep locking order violation notification from dma_fence_begin_signalling(). However entering a dma-fence signalling critical section itself doesn't block and could not cause a deadlock. So use a successful read_trylock() annotation instead for dma_fence_begin_signalling(). That will make sure that the locking order is correctly registered in the first case, and doesn't register any locking order in the second case. The alternative is of course to make sure that the "Upper layer" is always correctly annotated. But experience shows that's not easily achievable in all cases. Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com> --- drivers/dma-buf/dma-fence.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index f177c56269bb..17f632768ef9 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -308,8 +308,8 @@ bool dma_fence_begin_signalling(void) if (in_atomic()) return true; - /* ... and non-recursive readlock */ - lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_); + /* ... and non-recursive successful read_trylock */ + lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_); return false; } @@ -340,7 +340,7 @@ void __dma_fence_might_wait(void) lock_map_acquire(&dma_fence_lockdep_map); lock_map_release(&dma_fence_lockdep_map); if (tmp) - lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_); + lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _THIS_IP_); } #endif -- 2.39.2

1 year, 4 months

3
7
0 0

[PATCH 1/2] dma-buf/dma-fence: remove unnecessary callbacks

by Christian König

The fence_value_str and timeline_value_str callbacks were just an unnecessary abstraction in the SW sync implementation. The only caller of those callbacks already knew that the fence in questions is a timeline_fence. So print the values directly instead of using a redirection. Additional to that remove the implementations from virtgpu and vgem. As far as I can see those were never used in the first place. Signed-off-by: Christian König <christian.koenig(a)amd.com> --- drivers/dma-buf/sw_sync.c | 16 ---------------- drivers/dma-buf/sync_debug.c | 21 ++------------------- drivers/gpu/drm/vgem/vgem_fence.c | 15 --------------- drivers/gpu/drm/virtio/virtgpu_fence.c | 16 ---------------- include/linux/dma-fence.h | 21 --------------------- 5 files changed, 2 insertions(+), 87 deletions(-) diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index c353029789cf..f7ce4c6b8b8e 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -178,20 +178,6 @@ static bool timeline_fence_enable_signaling(struct dma_fence *fence) return true; } -static void timeline_fence_value_str(struct dma_fence *fence, - char *str, int size) -{ - snprintf(str, size, "%lld", fence->seqno); -} - -static void timeline_fence_timeline_value_str(struct dma_fence *fence, - char *str, int size) -{ - struct sync_timeline *parent = dma_fence_parent(fence); - - snprintf(str, size, "%d", parent->value); -} - static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) { struct sync_pt *pt = dma_fence_to_sync_pt(fence); @@ -214,8 +200,6 @@ static const struct dma_fence_ops timeline_fence_ops = { .enable_signaling = timeline_fence_enable_signaling, .signaled = timeline_fence_signaled, .release = timeline_fence_release, - .fence_value_str = timeline_fence_value_str, - .timeline_value_str = timeline_fence_timeline_value_str, .set_deadline = timeline_fence_set_deadline, }; diff --git a/drivers/dma-buf/sync_debug.c b/drivers/dma-buf/sync_debug.c index 237bce21d1e7..270daae7d89a 100644 --- a/drivers/dma-buf/sync_debug.c +++ b/drivers/dma-buf/sync_debug.c @@ -82,25 +82,8 @@ static void sync_print_fence(struct seq_file *s, seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec); } - if (fence->ops->timeline_value_str && - fence->ops->fence_value_str) { - char value[64]; - bool success; - - fence->ops->fence_value_str(fence, value, sizeof(value)); - success = strlen(value); - - if (success) { - seq_printf(s, ": %s", value); - - fence->ops->timeline_value_str(fence, value, - sizeof(value)); - - if (strlen(value)) - seq_printf(s, " / %s", value); - } - } - + seq_printf(s, ": %lld", fence->seqno); + seq_printf(s, " / %d", parent->value); seq_putc(s, '\n'); } diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c index e15754178395..5298d995faa7 100644 --- a/drivers/gpu/drm/vgem/vgem_fence.c +++ b/drivers/gpu/drm/vgem/vgem_fence.c @@ -53,25 +53,10 @@ static void vgem_fence_release(struct dma_fence *base) dma_fence_free(&fence->base); } -static void vgem_fence_value_str(struct dma_fence *fence, char *str, int size) -{ - snprintf(str, size, "%llu", fence->seqno); -} - -static void vgem_fence_timeline_value_str(struct dma_fence *fence, char *str, - int size) -{ - snprintf(str, size, "%llu", - dma_fence_is_signaled(fence) ? fence->seqno : 0); -} - static const struct dma_fence_ops vgem_fence_ops = { .get_driver_name = vgem_fence_get_driver_name, .get_timeline_name = vgem_fence_get_timeline_name, .release = vgem_fence_release, - - .fence_value_str = vgem_fence_value_str, - .timeline_value_str = vgem_fence_timeline_value_str, }; static void vgem_fence_timeout(struct timer_list *t) diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c b/drivers/gpu/drm/virtio/virtgpu_fence.c index f28357dbde35..44c1d8ef3c4d 100644 --- a/drivers/gpu/drm/virtio/virtgpu_fence.c +++ b/drivers/gpu/drm/virtio/virtgpu_fence.c @@ -49,26 +49,10 @@ static bool virtio_gpu_fence_signaled(struct dma_fence *f) return false; } -static void virtio_gpu_fence_value_str(struct dma_fence *f, char *str, int size) -{ - snprintf(str, size, "[%llu, %llu]", f->context, f->seqno); -} - -static void virtio_gpu_timeline_value_str(struct dma_fence *f, char *str, - int size) -{ - struct virtio_gpu_fence *fence = to_virtio_gpu_fence(f); - - snprintf(str, size, "%llu", - (u64)atomic64_read(&fence->drv->last_fence_id)); -} - static const struct dma_fence_ops virtio_gpu_fence_ops = { .get_driver_name = virtio_gpu_get_driver_name, .get_timeline_name = virtio_gpu_get_timeline_name, .signaled = virtio_gpu_fence_signaled, - .fence_value_str = virtio_gpu_fence_value_str, - .timeline_value_str = virtio_gpu_timeline_value_str, }; struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device *vgdev, diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index e7ad819962e3..cf91cae6e30f 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -238,27 +238,6 @@ struct dma_fence_ops { */ void (*release)(struct dma_fence *fence); - /** - * @fence_value_str: - * - * Callback to fill in free-form debug info specific to this fence, like - * the sequence number. - * - * This callback is optional. - */ - void (*fence_value_str)(struct dma_fence *fence, char *str, int size); - - /** - * @timeline_value_str: - * - * Fills in the current value of the timeline as a string, like the - * sequence number. Note that the specific fence passed to this function - * should not matter, drivers should only use it to look up the - * corresponding timeline structures. - */ - void (*timeline_value_str)(struct dma_fence *fence, - char *str, int size); - /** * @set_deadline: * -- 2.34.1

1 year, 4 months

1
1
0 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig