On 11/27/21 5:05 PM, Jonathan Cameron wrote:
>> Non-coherent mapping with no cache sync:
>> - fileio:
>> read: 156 MiB/s
>> write: 123 MiB/s
>> - dmabuf:
>> read: 234 MiB/s (capped by sample rate)
>> write: 182 MiB/s
>>
>> Non-coherent reads with no cache sync + write-combine writes:
>> - fileio:
>> read: 156 MiB/s
>> write: 140 MiB/s
>> - dmabuf:
>> read: 234 MiB/s (capped by sample rate)
>> write: 210 MiB/s
>>
>>
>> A few things we can deduce from this:
>>
>> * Write-combine is not available on Zynq/ARM? If it was working, it
>> should give a better performance than the coherent mapping, but it
>> doesn't seem to do anything at all. At least it doesn't harm
>> performance.
> I'm not sure it's very relevant to this sort of streaming write.
> If you write a sequence of addresses then nothing stops them getting combined
> into a single write whether or not it is write-combining.
There is a difference at which point they can get combined. With
write-combine they can be coalesced into a single transaction anywhere
in the interconnect, as early as the CPU itself. Without write-cobmine
the DDR controller might decide to combine them, but not earlier. This
can make a difference especially if the write is a narrow write, i.e.
the access size is smaller than the buswidth.
Lets say you do 32-bit writes, but your bus is 64 bits wide. With WC two
32-bits can be combined into a 64-bit write. Without WC that is not
possible and you are potentially not using the bus to its fullest
capacity. This is especially true if the memory bus is wider than the
widest access size of the CPU.
Hi guys,
as discussed before this set of patches completely rework the dma_resv semantic
and spreads the new handling over all the existing drivers and users.
First of all this drops the DAG approach because it requires that every single
driver implements those relatively complicated rules correctly and any
violation of that immediately leads to either corruption of freed memory or
even more severe security problems.
Instead we just keep all fences around all the time until they are signaled.
Only fences with the same context are assumed to be signaled in the correct
order since this is exercised elsewhere as well. Replacing fences is now only
supported for hardware mechanism like VM page table updates where the hardware
can guarantee that the resource can't be accessed any more.
Then the concept of a single exclusive fence and multiple shared fences is
dropped as well.
Instead the dma_resv object is now just a container for dma_fence objects where
each fence has associated usage flags. Those use flags describe how the
operation represented by the dma_fence object is using the resource protected
by the dma_resv object. This allows us to add multiple fences for each usage
type.
Additionally to the existing WRITE/READ usages this patch set also adds the new
KERNEL and OTHER usages. The KERNEL usages is used in cases where the kernel
needs to do some operation with the resource protected by the dma_resv object,
like copies or clears. Those are mandatory to wait for when dynamic memory
management is used.
The OTHER usage is for cases where we don't want that the operation represented
by the dma_fence object participate in any implicit sync but needs to be
respected by the kernel memory management. Examples for those are VM page table
updates and preemption fences.
While doing this the new implementation cleans up existing workarounds all over
the place, but especially amdgpu and TTM. Surprisingly I also found two use
cases for the KERNEL/OTHER usage in i915 and Nouveau, those might need more
thoughts.
In general the existing functionality should been preserved, the only downside
is that we now always need to reserve a slot before adding a fence. The newly
added call to the reservation function can probably use some more cleanup.
TODOs: Testing, testing, testing, doublechecking the newly added
kerneldoc for any typos.
Please review and/or comment,
Christian.
Am 26.11.21 um 08:49 schrieb guangming.cao(a)mediatek.com:
> From: Guangming <Guangming.Cao(a)mediatek.com>
>
> For previous version, it uses 'sg_table.nent's to traverse sg_table in pages
> free flow.
> However, 'sg_table.nents' is reassigned in 'dma_map_sg', it means the number of
> created entries in the DMA adderess space.
> So, use 'sg_table.nents' in pages free flow will case some pages can't be freed.
>
> Here we should use sg_table.orig_nents to free pages memory, but use the
> sgtable helper 'for each_sgtable_sg'(, instead of the previous rather common
> helper 'for_each_sg' which maybe cause memory leak) is much better.
>
> Fixes: d963ab0f15fb0 ("dma-buf: system_heap: Allocate higher order pages if available")
> Signed-off-by: Guangming <Guangming.Cao(a)mediatek.com>
> Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
> Cc: <stable(a)vger.kernel.org> # 5.11.*
> ---
> v4: Correct commit message
> 1. Cc stable(a)vger.kernel.org in commit message and add required kernel version.
> 2. Add reviewed-by since patch V2 and V4 are same and V2 is reviewed by Robin.
> 3. There is no new code change in V4.
> V3: Cc stable(a)vger.kernel.org
> 1. This patch needs to be merged stable branch, add stable(a)vger.kernel.org
> in mail list.
> 2. Correct some spelling mistake.
> 3. There is No new code change in V3.
> V2: use 'for_each_sgtable_sg' to 'replece for_each_sg' as suggested by Robin.
>
> ---
> drivers/dma-buf/heaps/system_heap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 23a7e74ef966..8660508f3684 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
> int i;
>
> table = &buffer->sg_table;
> - for_each_sg(table->sgl, sg, table->nents, i) {
> + for_each_sgtable_sg(table, sg, i) {
> struct page *page = sg_page(sg);
>
> __free_pages(page, compound_order(page));
On Fri, Nov 26, 2021 at 11:16:05AM +0800, guangming.cao(a)mediatek.com wrote:
> From: Guangming <Guangming.Cao(a)mediatek.com>
>
> For previous version, it uses 'sg_table.nent's to traverse sg_table in pages
> free flow.
> However, 'sg_table.nents' is reassigned in 'dma_map_sg', it means the number of
> created entries in the DMA adderess space.
> So, use 'sg_table.nents' in pages free flow will case some pages can't be freed.
>
> Here we should use sg_table.orig_nents to free pages memory, but use the
> sgtable helper 'for each_sgtable_sg'(, instead of the previous rather common
> helper 'for_each_sg' which maybe cause memory leak) is much better.
>
> Fixes: d963ab0f15fb0 ("dma-buf: system_heap: Allocate higher order pages if available")
>
> Signed-off-by: Guangming <Guangming.Cao(a)mediatek.com>
> ---
> drivers/dma-buf/heaps/system_heap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 23a7e74ef966..8660508f3684 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
> int i;
>
> table = &buffer->sg_table;
> - for_each_sg(table->sgl, sg, table->nents, i) {
> + for_each_sgtable_sg(table, sg, i) {
> struct page *page = sg_page(sg);
>
> __free_pages(page, compound_order(page));
> --
> 2.17.1
>
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
25.11.2021 19:53, Akhil R пишет:
> Add support for the ACPI based device registration so that the driver
> can be also enabled through ACPI table.
>
> This does not include the ACPI support for Tegra VI and DVC I2C.
>
> Signed-off-by: Akhil R <akhilrajeev(a)nvidia.com>
> ---
> drivers/i2c/busses/i2c-tegra.c | 52 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 40 insertions(+), 12 deletions(-)
>
> v4 changes:
> * changed has_acpi_companion to ACPI_HANDLE for consistency across
> all functions
> v3 - https://lkml.org/lkml/2021/11/25/173
> v2 - https://lkml.org/lkml/2021/11/23/82
> v1 - https://lkml.org/lkml/2021/11/19/393
Andy suggested to make v5 with extra patch that will make use of the
generic i2c_timings, but it should be a separate change.
This patch is good to me. Please provide the full changelog for each
version next time, instead of the links. Thanks!
If you'll make v5, then feel free to add my r-b:
Reviewed-by: Dmitry Osipenko <digetx(a)gmail.com>
25.11.2021 22:28, Andy Shevchenko пишет:
>> - err = reset_control_reset(i2c_dev->rst);
>> + if (handle)
>> + err = acpi_evaluate_object(handle, "_RST", NULL, NULL);
> Does it compile for CONFIG_ACPI=n case?
>
It compiles and works fine with CONFIG_ACPI=n.
On 2021-11-25 13:49, guangming.cao(a)mediatek.com wrote:
> From: Guangming <Guangming.Cao(a)mediatek.com>
>
> Use (for_each_sgtable_sg) rather than (for_each_sg) to traverse
> sg_table to free sg_table.
> Use (for_each_sg) maybe will casuse some pages can't be freed
> when send wrong nents number.
It's still worth spelling out that this is fixing a bug where the
current code should have been using table->orig_nents - it's just that
switching to the sgtable helper is the best way to make the fix, since
it almost entirely removes the possibility of making that (previously
rather common) mistake.
If it helps, for the change itself:
Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
Thanks,
Robin.
> Signed-off-by: Guangming <Guangming.Cao(a)mediatek.com>
> ---
> drivers/dma-buf/heaps/system_heap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 23a7e74ef966..8660508f3684 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
> int i;
>
> table = &buffer->sg_table;
> - for_each_sg(table->sgl, sg, table->nents, i) {
> + for_each_sgtable_sg(table, sg, i) {
> struct page *page = sg_page(sg);
>
> __free_pages(page, compound_order(page));
>
25.11.2021 12:07, Akhil R пишет:
> Add support for the ACPI based device registration so that the driver
> can be also enabled through ACPI table.
>
> This does not include the ACPI support for Tegra VI and DVC I2C.
>
> Signed-off-by: Akhil R <akhilrajeev(a)nvidia.com>
> ---
> drivers/i2c/busses/i2c-tegra.c | 52 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 40 insertions(+), 12 deletions(-)
>
> v3 changes
> * removed acpi_has_method check.
> * moved dev_err_probe to init_reset function to make it consistent with
> init_clocks.
> * Updates in commit message as suggested.
>
> v2 - https://lkml.org/lkml/2021/11/23/82
> v1 - https://lkml.org/lkml/2021/11/19/393
Akhil, the patch looks almost good, thank you. Please see one minor
question below.
> diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
> index c883044..b889eb3 100644
> --- a/drivers/i2c/busses/i2c-tegra.c
> +++ b/drivers/i2c/busses/i2c-tegra.c
> @@ -6,6 +6,7 @@
> * Author: Colin Cross <ccross(a)android.com>
> */
>
> +#include <linux/acpi.h>
> #include <linux/bitfield.h>
> #include <linux/clk.h>
> #include <linux/delay.h>
> @@ -608,6 +609,7 @@ static int tegra_i2c_wait_for_config_load(struct tegra_i2c_dev *i2c_dev)
> static int tegra_i2c_init(struct tegra_i2c_dev *i2c_dev)
> {
> u32 val, clk_divisor, clk_multiplier, tsu_thd, tlow, thigh, non_hs_mode;
> + acpi_handle handle = ACPI_HANDLE(i2c_dev->dev);
...
> +static int tegra_i2c_init_reset(struct tegra_i2c_dev *i2c_dev)
> +{
> + if (has_acpi_companion(i2c_dev->dev))
> + return 0;
Can we use ACPI_HANDLE() everywhere instead of has_acpi_companion()? For
consistency. I guess that's what Andy was asking about in v1?
24.11.2021 19:40, Akhil R пишет:
>> 24.11.2021 10:18, Akhil R пишет:
>>>> *i2c_dev)
>>>>> i2c_dev->is_vi = true; }
>>>> How are you going to differentiate the VI I2C from a non-VI? This
>>>> doesn't look right.
>>> This patch adds the ACPI support to only non-VI I2C. The device_ids in
>>> match table are added accordingly. I suppose, of_device_is_compatible
>>> always returns false as there is no device tree.
>>> Agree with the other comments.
>>
>> Will the VI I2C have a different ACPI ID or how it's going to work?
> As there is a different compatible for VI I2C in device tree, I suppose the ACPI
> would have a different ID as well. I think the logic would also need an update
> if to have VI I2C using the ACPI. But that wasn't actually considered in this patch.
Thanks, you could reflected it in the commit message.