If a profiling program runs in a non-root PID namespace, if CoreSight
driver enables PID tracing (with contextID), it can lead to mismatching
issue between the context ID traced in hardware (from the root
namespace) and the PIDs gathered by profiling tool (e.g. perf) in its
non-root namespace.
CoreSight driver has tried to address this issue for the contextID
related interfaces under sysfs, but it misses to prevent user to set
VMID (virtual contextID) for kernel runs in EL2 with VHE; furthermore,
it misses to handle the case when the profiling tool runs in the
non-root PID namespace.
For this reason, this patch series is to correct contextID tracing for
non-root namespace. After applied this patchset, patch 02 doesn't
permit users to access virtual contextID via sysfs nodes in the non-root
PID namespace, patch 03 and 04 stop to trace PID packet for non-root PID
namespace.
This patch is dependent on the patchset "pid: Introduce helper
task_is_in_root_ns()" [1].
[1] https://lore.kernel.org/lkml/20211208083320.472503-1-leo.yan@linaro.org/
Leo Yan (4):
coresight: etm4x: Add lock for reading virtual context ID comparator
coresight: etm4x: Don't use virtual contextID for non-root PID
namespace
coresight: etm4x: Don't trace PID for non-root PID namespace
coresight: etm3x: Don't trace PID for non-root PID namespace
.../coresight/coresight-etm3x-core.c | 4 +++
.../coresight/coresight-etm4x-core.c | 10 +++++--
.../coresight/coresight-etm4x-sysfs.c | 30 +++++++++++++++++++
3 files changed, 42 insertions(+), 2 deletions(-)
--
2.25.1
On Tue, Dec 14, 2021 at 04:58:32PM +1100, Balbir Singh wrote:
> On Wed, Dec 08, 2021 at 04:33:17PM +0800, Leo Yan wrote:
> > This patch replaces open code with task_is_in_init_pid_ns() to check if
> > a task is in root PID namespace.
> >
> > Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
> > ---
> > drivers/connector/cn_proc.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
> > index 646ad385e490..ccac1c453080 100644
> > --- a/drivers/connector/cn_proc.c
> > +++ b/drivers/connector/cn_proc.c
> > @@ -358,7 +358,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
> > * other namespaces.
> > */
> > if ((current_user_ns() != &init_user_ns) ||
> > - (task_active_pid_ns(current) != &init_pid_ns))
> > + !task_is_in_init_pid_ns(current))
> > return;
> >
>
> Sounds like there might scope for other wrappers - is_current_in_user_init_ns()
Indeed, for new wrapper is_current_in_user_init_ns(), I searched kernel
and located there have multiple places use open code. If no one works
on this refactoring, I will send a new patchset for it separately.
> Acked-by: Balbir Singh <bsingharora(a)gmail.com>
Thank you for review, Balbir! Also thanks review from Leon and
Suzuki.
Leo
Hi,
On Tue, Dec 14, 2021 at 03:16:42AM +0800, kernel test robot wrote:
> Hi Leo,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v5.16-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Leo-Yan/coresight-etm-Correct-PID-…
> base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2585cf9dfaaddf00b069673f27bb3f8530e2039c
> config: arm-buildonly-randconfig-r003-20211213 (https://download.01.org/0day-ci/archive/20211214/202112140344.viPmOWp6-lkp@…)
> compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project b6a2ddb6c8ac29412b1361810972e15221fa021c)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install arm cross compiling tool for clang build
> # apt-get install binutils-arm-linux-gnueabi
> # https://github.com/0day-ci/linux/commit/81d5f47788c40d34c8159d64d4505eb4852…
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Leo-Yan/coresight-etm-Correct-PID-tracing-for-non-root-namespace/20211213-201632
> git checkout 81d5f47788c40d34c8159d64d4505eb485254e8f
> # save the config file to linux build tree
> mkdir build_dir
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/hwtracing/coresight/
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp(a)intel.com>
>
> All errors (new ones prefixed by >>):
>
> >> drivers/hwtracing/coresight/coresight-etm3x-core.c:344:7: error: implicit declaration of function 'task_is_in_init_pid_ns' [-Werror,-Wimplicit-function-declaration]
> if (!task_is_in_init_pid_ns(current))
> ^
> 1 error generated.
>
>
> vim +/task_is_in_init_pid_ns +344 drivers/hwtracing/coresight/coresight-etm3x-core.c
>
> 301
> 302 #define ETM3X_SUPPORTED_OPTIONS (ETMCR_CYC_ACC | \
> 303 ETMCR_TIMESTAMP_EN | \
> 304 ETMCR_RETURN_STACK)
> 305
> 306 static int etm_parse_event_config(struct etm_drvdata *drvdata,
> 307 struct perf_event *event)
> 308 {
> 309 struct etm_config *config = &drvdata->config;
> 310 struct perf_event_attr *attr = &event->attr;
> 311
> 312 if (!attr)
> 313 return -EINVAL;
> 314
> 315 /* Clear configuration from previous run */
> 316 memset(config, 0, sizeof(struct etm_config));
> 317
> 318 if (attr->exclude_kernel)
> 319 config->mode = ETM_MODE_EXCL_KERN;
> 320
> 321 if (attr->exclude_user)
> 322 config->mode = ETM_MODE_EXCL_USER;
> 323
> 324 /* Always start from the default config */
> 325 etm_set_default(config);
> 326
> 327 /*
> 328 * By default the tracers are configured to trace the whole address
> 329 * range. Narrow the field only if requested by user space.
> 330 */
> 331 if (config->mode)
> 332 etm_config_trace_mode(config);
> 333
> 334 /*
> 335 * At this time only cycle accurate, return stack and timestamp
> 336 * options are available.
> 337 */
> 338 if (attr->config & ~ETM3X_SUPPORTED_OPTIONS)
> 339 return -EINVAL;
> 340
> 341 config->ctrl = attr->config;
> 342
> 343 /* Don't trace contextID when runs in non-root PID namespace */
> > 344 if (!task_is_in_init_pid_ns(current))
This patchset is based on another patchset [1], as described on the
cover letter patch. This is why here reports for building failure.
To avoid the false positive reporting, if any better practice I can
follow up to resolve the dependency between two patchsets, please let
me know and I will do in next time.
Thanks,
Leo
[1] https://lore.kernel.org/lkml/20211208083320.472503-1-leo.yan@linaro.org/
> 345 config->ctrl &= ~ETMCR_CTXID_SIZE;
> 346
> 347 /*
> 348 * Possible to have cores with PTM (supports ret stack) and ETM
> 349 * (never has ret stack) on the same SoC. So if we have a request
> 350 * for return stack that can't be honoured on this core then
> 351 * clear the bit - trace will still continue normally
> 352 */
> 353 if ((config->ctrl & ETMCR_RETURN_STACK) &&
> 354 !(drvdata->etmccer & ETMCCER_RETSTACK))
> 355 config->ctrl &= ~ETMCR_RETURN_STACK;
> 356
> 357 return 0;
> 358 }
> 359
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Sat, Dec 11, 2021 at 05:02:21PM +0800, Jason Wang wrote:
> The double `the' in the comment in line 732 is repeated. Remove one
> of them from the comment.
>
> Signed-off-by: Jason Wang <wangborong(a)cdjrlc.com>
> ---
> drivers/hwtracing/coresight/coresight-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 8a18c71df37a..88653d1c06a4 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -729,7 +729,7 @@ static inline void coresight_put_ref(struct coresight_device *csdev)
> * coresight_grab_device - Power up this device and any of the helper
> * devices connected to it for trace operation. Since the helper devices
> * don't appear on the trace path, they should be handled along with the
> - * the master device.
> + * master device.
Applied.
Thanks,
Mathieu
> */
> static int coresight_grab_device(struct coresight_device *csdev)
> {
> --
> 2.34.1
>
On 08/12/2021 02:45, Ian Rogers wrote:
> Perf cpu map has various functions where a cpumap and index are passed
> in order to load the cpu. A problem with this is that the wrong index
> may be passed for the cpumap, causing problems like aggregation on the
> wrong CPU:
> https://lore.kernel.org/lkml/20211204023409.969668-1-irogers@google.com/
>
> This patch set refactors the cpu map API, greatly reducing it and
> explicitly passing the cpu (rather than the pair) to functions that
> need it. Comments are added at the same time.
>
> Ian Rogers (22):
> libperf: Add comments to perf_cpu_map.
> perf stat: Add aggr creators that are passed a cpu.
> perf stat: Switch aggregation to use for_each loop
> perf stat: Switch to cpu version of cpu_map__get
> perf cpumap: Switch cpu_map__build_map to cpu function
> perf cpumap: Remove map+index get_socket
> perf cpumap: Remove map+index get_die
> perf cpumap: Remove map+index get_core
> perf cpumap: Remove map+index get_node
> perf cpumap: Add comments to aggr_cpu_id
> perf cpumap: Remove unused cpu_map__socket
> perf cpumap: Simplify equal function name.
> perf cpumap: Rename empty functions.
> perf cpumap: Document cpu__get_node and remove redundant function
> perf cpumap: Remove map from function names that don't use a map.
> perf cpumap: Remove cpu_map__cpu, use libperf function.
> perf cpumap: Refactor cpu_map__build_map
> perf cpumap: Rename cpu_map__get_X_aggr_by_cpu functions
> perf cpumap: Move 'has' function to libperf
> perf cpumap: Add some comments to cpu_aggr_map
> perf cpumap: Trim the cpu_aggr_map
> perf stat: Fix memory leak in check_per_pkg
>
> tools/lib/perf/cpumap.c | 7 +-
> tools/lib/perf/include/internal/cpumap.h | 9 +-
> tools/lib/perf/include/perf/cpumap.h | 1 +
> tools/perf/arch/arm/util/cs-etm.c | 16 +-
> tools/perf/builtin-ftrace.c | 2 +-
> tools/perf/builtin-sched.c | 6 +-
> tools/perf/builtin-stat.c | 273 ++++++++++++-----------
> tools/perf/tests/topology.c | 10 +-
> tools/perf/util/cpumap.c | 182 ++++++---------
> tools/perf/util/cpumap.h | 102 ++++++---
> tools/perf/util/cputopo.c | 2 +-
> tools/perf/util/env.c | 6 +-
> tools/perf/util/stat-display.c | 69 +++---
> tools/perf/util/stat.c | 9 +-
> tools/perf/util/stat.h | 3 +-
> 15 files changed, 361 insertions(+), 336 deletions(-)
>
For the whole set:
Reviewed-by: James Clark <james.clark(a)arm.com>
I didn't see any obvious issues with mixing up aggregation modes or CPU/idx types. Also
gave perf stat a test in the different modes and didn't see an issue.
But I'm wondering if it's possible to go further and add a struct around the CPU int so that the
compiler checks for correctness instead. It still seems quite easy to mix up index and
CPU, for example these functions are subtly different, but both use int:
LIBPERF_API int perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
LIBPERF_API bool perf_cpu_map__has(const struct perf_cpu_map *map, int cpu);
Something like this would make it impossible to make a mistake:
struct cpu { int cpu };
I mean it's more of a coincidence that CPUs can be identified by an integer, but they are more
of an object than an integer, so it could make sense to wrap it. But maybe it could be quite
cumbersome to use and be overkill.
Em Wed, Dec 08, 2021 at 06:34:14AM -0800, Ian Rogers escreveu:
> On Wed, Dec 8, 2021 at 4:06 AM John Garry <john.garry(a)huawei.com> wrote:
> >
> > On 08/12/2021 02:45, Ian Rogers wrote:
> > > diff --git a/tools/lib/perf/include/internal/cpumap.h b/tools/lib/perf/include/internal/cpumap.h
> > > index 840d4032587b..1c1726f4a04e 100644
> > > --- a/tools/lib/perf/include/internal/cpumap.h
> > > +++ b/tools/lib/perf/include/internal/cpumap.h
> > > @@ -4,9 +4,16 @@
> > >
> > > #include <linux/refcount.h>
> > >
> > > +/**
> > > + * A sized, reference counted, sorted array of integers representing CPU
> > > + * numbers. This is commonly used to capture which CPUs a PMU is associated
> > > + * with.
> > > + */
> > > struct perf_cpu_map {
> > > refcount_t refcnt;
> > > + /** Length of the map array. */
> > > int nr;
> > > + /** The CPU values. */
> > > int map[];
> >
> > would simply more distinct names for the variables help instead of or in
> > addition to comments?
Well, in this case the typical usage doesn't help, as 'struct
perf_cpu_map' are being used simply as "map" where it should be cpu_map,
so we would have:
cpu_map->nr
And all should be obvious, no? Otherwise we would have redundant 'cpu',
like:
cpu_map->nr_cpus
And 'map' should really be entries, so:
cpu_map->entries[index];
Would be clear enough, o?
> Thanks John! I agree. The phrase that is often used is intention
> revealing names. The kernel style for naming is to be brief:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#naming
> These names are both brief. nr is a little unusual, of course an
> integer is a number - size and length are common names in situations
> like these. In this case number makes sense as it is the number of
> CPUs in the array, and there is a certain readability in saying number
> of CPUs and not length or size of CPUs. The name map I have issue
> with, it is always a smell if you are calling a variable a data type.
> Given the convention in the context of this code I decided to leave
> it. Something like array_of_cpu_values would be more intention
> revealing but when run through the variable name shrinkifier could end
> up as just being array, which would be little better than map.
>
> The guidance on comments is that they are good and to focus on the
> what of what the code is doing:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#commenting
> refcnt was intention revealing enough and so I didn't add a comment to it.
>
> > Generally developers don't always check comments where the struct is
> > defined when the meaning could be judged intuitively
>
> Agreed. I think there could be a follow up to change to better names.
> As I was lacking a better suggestion I think for the time being, and
> in this patch set, we can keep things as they are.
>
> Thanks,
> Ian
>
> > Thanks,
> > John
> >
--
- Arnaldo
There are two checks, one is for size when running without admin, but
this one is covered by the driver and reported on in more detail here
(builtin-record.c):
pr_err("Permission error mapping pages.\n"
"Consider increasing "
"/proc/sys/kernel/perf_event_mlock_kb,\n"
"or try again with a smaller value of -m/--mmap_pages.\n"
"(current value: %u,%u)\n",
This had the effect of artificially limiting the aux buffer size to a
value smaller than what was allowed because perf_event_mlock_kb wasn't
taken into account.
The second is to check for a power of two, but this is covered here
(evlist.c):
pr_info("rounding mmap pages size to %s (%lu pages)\n",
buf, pages);
Signed-off-by: James Clark <james.clark(a)arm.com>
---
tools/perf/arch/arm/util/cs-etm.c | 19 -------------------
1 file changed, 19 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 293a23bf8be3..8a3d54a86c9c 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -407,25 +407,6 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
}
- /* Validate auxtrace_mmap_pages provided by user */
- if (opts->auxtrace_mmap_pages) {
- unsigned int max_page = (KiB(128) / page_size);
- size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
-
- if (!privileged &&
- opts->auxtrace_mmap_pages > max_page) {
- opts->auxtrace_mmap_pages = max_page;
- pr_err("auxtrace too big, truncating to %d\n",
- max_page);
- }
-
- if (!is_power_of_2(sz)) {
- pr_err("Invalid mmap size for %s: must be a power of 2\n",
- CORESIGHT_ETM_PMU_NAME);
- return -EINVAL;
- }
- }
-
if (opts->auxtrace_snapshot_mode)
pr_debug2("%s snapshot size: %zu\n", CORESIGHT_ETM_PMU_NAME,
opts->auxtrace_snapshot_size);
--
2.28.0