When build perf tool with passing option 'CORESIGHT=1' explicitly, if
the feature test fails for library libopencsd, the build doesn't
complain the feature failure and continue to build the tool with
disabling the CoreSight feature insteadly.
This patch changes the building behaviour, when build perf tool with the
option 'CORESIGHT=1' and detect the failure for testing feature
libopencsd, the build process will be aborted and it shows the complaint
info.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
Changes from v1:
Fixed a typo in the error message.
tools/perf/Makefile.config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 4a0d9a6defc7..5df79538486b 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -489,6 +489,8 @@ ifdef CORESIGHT
CFLAGS += -DCS_RAW_PACKED
endif
endif
+ else
+ dummy := $(error Error: No libopencsd library found or the version is not up-to-date. Please install recent libopencsd to build with CORESIGHT=1)
endif
endif
--
2.25.1
When build perf tool with passing option 'CORESIGHT=1' explicitly, if
the feature test fails for library libopencsd, the build doesn't
complain the feature failure and continue to build the tool with
disabling the CoreSight feature insteadly.
This patch changes the building behaviour, when build perf tool with the
option 'CORESIGHT=1' and detect the failure for testing feature
libopencsd, the build process will be aborted and it shows the complaint
info.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tools/perf/Makefile.config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 4a0d9a6defc7..5df79538486b 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -489,6 +489,8 @@ ifdef CORESIGHT
CFLAGS += -DCS_RAW_PACKED
endif
endif
+ else
+ dummy := $(error Error: No libopencsd library found or the version is not up-to-date. Please install recent libopencsd to build with COREISGHT=1)
endif
endif
--
2.25.1
Hi Tao,
On Thu, 19 Aug 2021 at 03:29, Tao Zhang <quic_taozha(a)quicinc.com> wrote:
>
> This series adds Coresight support for SM8250 Soc on RB5 board.
> It is composed of two elements.
> a) Add ETM PID for Kryo-5XX.
> b) Add coresight support to DTS for RB5.
>
> This series applies to coresight/next
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
> Tao Zhang (2):
> coresight: etm4x: Add ETM PID for Kryo-5XX
> arm64: dts: qcom: sm8250: Add Coresight support
>
> arch/arm64/boot/dts/qcom/qrb5165-rb5.dts | 442 +++++++++++++++++-
> .../coresight/coresight-etm4x-core.c | 1 +
> 2 files changed, 439 insertions(+), 4 deletions(-)
>
I have added your work to my patchset queue. On the other hand I have
a lot of patches to review these days and as such won't be able to
look at it for 4 to 5 weeks.
Thanks,
Mathieu
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
Hi Leo,
We are interested in adding support in Coresight driver to collect trace data at
the time of kernel panic. Would like to know the current status of this
patch series you posted some time back [1]. If no one is working on this,
we would like to revive and resubmit this.
Appreciate your guidance on this.
Also please discard the earlier duplicate emails which i have send to
you by mistake.
Thanks.
1. https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1652258.html.
This patch set is to address the comments in the patch set v1 [1] and
updates the patches.
Changes from v1:
- Added Adrian's review tag for patch 01;
- Refined comment in patch 01 (James);
- Added James' review and test tag for patch 02.
[1] https://lore.kernel.org/patchwork/cover/1473936/
Leo Yan (2):
perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
perf auxtrace arm: Support
compat_auxtrace_mmap__{read_head|write_tail}
tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++
tools/perf/util/auxtrace.c | 88 +++++++++++++++++++++++++++--
tools/perf/util/auxtrace.h | 22 +++++++-
3 files changed, 135 insertions(+), 7 deletions(-)
--
2.25.1
Hi Tao,
On Thu, Aug 19, 2021 at 05:29:37PM +0800, Tao Zhang wrote:
> The input parameter of the function pm_runtime_put should be the
> same in the function cti_enable_hw and cti_disable_hw. The correct
> parameter to use here should be dev->parent.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-cti-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index e2a3620..8988b2e 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -175,7 +175,7 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
> coresight_disclaim_device_unlocked(csdev);
> CS_LOCK(drvdata->base);
> spin_unlock(&drvdata->spinlock);
> - pm_runtime_put(dev);
> + pm_runtime_put(dev->parent);
coresight_register() allocates data structure 'coresight_device' and
assigns probed device to the field 'coresight_device::dev::parent';
thus afterwards we need to pass 'coresight_device::dev::parent' for
pm_runtime_xxx() functions. It's not directive for understanding, so
log the info at here.
For the patch:
Reviewed-by: Leo Yan <leo.yan(a)linaro.org>
We could wait for Mike to review as well.
Thanks,
Leo
> return 0;
>
> /* not disabled this call */
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
The AUX bounce buffer is allocated with API dma_alloc_coherent(), in the
low level's architecture code, e.g. for Arm64, it maps the memory with
the attribution "Normal non-cacheable"; this can be concluded from the
definition for pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h.
Later when access the AUX bounce buffer, since the memory mapping is
non-cacheable, it's low efficiency due to every load instruction must
reach out DRAM.
This patch changes to allocate pages with alloc_pages_node(), thus the
driver can access the memory with cacheable mapping in the kernel linear
virtual address; therefore, because load instructions can fetch data
from cache lines rather than always read data from DRAM, the driver can
boost memory coping performance. After using the cacheable mapping, the
driver uses dma_sync_single_for_cpu() to invalidate cacheline prior to
read bounce buffer so can avoid read stale trace data.
By measurement the duration for function tmc_update_etr_buffer() with
ftrace function_graph tracer, it shows the performance significant
improvement for copying 4MiB data from bounce buffer:
# echo tmc_etr_get_data_flat_buf > set_graph_notrace // avoid noise
# echo tmc_update_etr_buffer > set_graph_function
# echo function_graph > current_tracer
before:
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) | tmc_update_etr_buffer() {
...
2) # 8148.320 us | }
after:
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) | tmc_update_etr_buffer() {
...
2) # 2463.980 us | }
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
Changes from v2:
Sync the entire buffer in one go when the tracing is wrap around
(Suzuki);
Add Suzuki's review tage.
Changes from v1:
Set "flat_buf->daddr" to 0 when fails to map DMA region; and dropped the
unexpected if condition change in tmc_etr_free_flat_buf().
.../hwtracing/coresight/coresight-tmc-etr.c | 47 ++++++++++++++++---
1 file changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 13fd1fc730ed..ac37e9376d2b 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -21,6 +21,7 @@
struct etr_flat_buf {
struct device *dev;
+ struct page *pages;
dma_addr_t daddr;
void *vaddr;
size_t size;
@@ -600,6 +601,7 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
{
struct etr_flat_buf *flat_buf;
struct device *real_dev = drvdata->csdev->dev.parent;
+ ssize_t aligned_size;
/* We cannot reuse existing pages for flat buf */
if (pages)
@@ -609,11 +611,18 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
if (!flat_buf)
return -ENOMEM;
- flat_buf->vaddr = dma_alloc_coherent(real_dev, etr_buf->size,
- &flat_buf->daddr, GFP_KERNEL);
- if (!flat_buf->vaddr) {
- kfree(flat_buf);
- return -ENOMEM;
+ aligned_size = PAGE_ALIGN(etr_buf->size);
+ flat_buf->pages = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO,
+ get_order(aligned_size));
+ if (!flat_buf->pages)
+ goto fail_alloc_pages;
+
+ flat_buf->vaddr = page_address(flat_buf->pages);
+ flat_buf->daddr = dma_map_page(real_dev, flat_buf->pages, 0,
+ aligned_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(real_dev, flat_buf->daddr)) {
+ flat_buf->daddr = 0;
+ goto fail_dma_map_page;
}
flat_buf->size = etr_buf->size;
@@ -622,6 +631,12 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
etr_buf->mode = ETR_MODE_FLAT;
etr_buf->private = flat_buf;
return 0;
+
+fail_dma_map_page:
+ __free_pages(flat_buf->pages, get_order(aligned_size));
+fail_alloc_pages:
+ kfree(flat_buf);
+ return -ENOMEM;
}
static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
@@ -630,15 +645,20 @@ static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
if (flat_buf && flat_buf->daddr) {
struct device *real_dev = flat_buf->dev->parent;
+ ssize_t aligned_size = PAGE_ALIGN(etr_buf->size);
- dma_free_coherent(real_dev, flat_buf->size,
- flat_buf->vaddr, flat_buf->daddr);
+ dma_unmap_page(real_dev, flat_buf->daddr, aligned_size,
+ DMA_FROM_DEVICE);
+ __free_pages(flat_buf->pages, get_order(aligned_size));
}
kfree(flat_buf);
}
static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
{
+ struct etr_flat_buf *flat_buf = etr_buf->private;
+ struct device *real_dev = flat_buf->dev->parent;
+
/*
* Adjust the buffer to point to the beginning of the trace data
* and update the available trace data.
@@ -648,6 +668,19 @@ static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
etr_buf->len = etr_buf->size;
else
etr_buf->len = rwp - rrp;
+
+ /*
+ * The driver always starts tracing at the beginning of the buffer,
+ * the only reason why we would get a wrap around is when the buffer
+ * is full. Sync the entire buffer in one go for this case.
+ */
+ if (etr_buf->offset + etr_buf->len > etr_buf->size)
+ dma_sync_single_for_cpu(real_dev, flat_buf->daddr,
+ etr_buf->size, DMA_FROM_DEVICE);
+ else
+ dma_sync_single_for_cpu(real_dev,
+ flat_buf->daddr + etr_buf->offset,
+ etr_buf->len, DMA_FROM_DEVICE);
}
static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
--
2.25.1