Hello Prateek,
Thanks for your response.
Looking at the commit logs, it seems these commits solve other problems around load balancing and might not be trivial to revert without evaluating the side effects.
It's definitely not a productizable workaround!
The processor you are running on, the AMD EPYC 7702P based on the Zen 2 architecture, contains 4 cores / 8 threads per CCX (LLC domain), which is perhaps why reducing the thread count below this limit helps your workload.
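For reference, the CCX/LLC span can be double-checked directly from the standard Linux sysfs cache topology (on Zen 2, cpu0's L3 is typically cache index3; exact index may vary):

```shell
# CPUs sharing cpu0's last-level cache; on a Zen 2 EPYC 7702P this
# should list 8 threads (4 cores), i.e. one CCX:
cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list

# Confirm index3 really is the L3 and see its size:
cat /sys/devices/system/cpu/cpu0/cache/index3/level
cat /sys/devices/system/cpu/cpu0/cache/index3/size
```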
What we suspect is that, when running the workload, the threads that regularly sleep trigger newidle balancing, which moves them to another CCX and leads to a higher number of L3 misses.
To confirm this, would it be possible to run the workload with the not-yet-upstream perf sched stats tool [1] and share the output of perf sched stats diff for the data from v6.12.17 and v6.12.17 + patch, to rule out any other second-order effect?
[1] https://lore.kernel.org/all/20250311120230.61774-1-swapnil.sapkal@amd.com/
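In case it helps reproduce, the rough workflow is sketched below (subcommand names per the posted series; since the tool is not yet upstream, exact options may differ in the final version):

```shell
# Build perf from a tree with the series applied:
cd tools/perf && make

# On each kernel under test, record schedstats while the workload runs
# (assumed invocation based on the posted series):
./perf sched stats record -- <workload>
mv perf.data perf.data.6.12.17        # rename per kernel, repeat on the patched one

# Compare the two runs:
./perf sched stats diff perf.data.6.12.17 perf.data.6.12.17patched > perf.diff
```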
I had to patch tools/perf/util/session.c (static int open_file_read(struct perf_data *data)) due to "failed to open perf.data: File exists" (it looked more like a compiler issue than a tools/perf issue).
$ ./perf sched stats diff perf.data.6.12.17 perf.data.6.12.17patched > perf.diff (see perf.diff attached)
Assuming you control these deployments, would it be possible to run the workload on a kernel booted with "relax_domain_level=2" on the cmdline, which restricts newidle balancing to within the CCX? As a side effect it also limits task wakeups to the same LLC domain, but I would still like to know whether this makes a difference to the workload you are running.
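For reference, a typical way to set this persistently on a GRUB-based distro (adjust for your bootloader; the "..." stands for whatever options are already there):

```shell
# /etc/default/grub: append relax_domain_level=2 to the existing cmdline
GRUB_CMDLINE_LINUX_DEFAULT="... relax_domain_level=2"

# Regenerate the config and reboot:
sudo update-grub        # or: sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# Verify after boot:
cat /proc/cmdline
```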
On vanilla 6.12.17, setting relax_domain_level=2 gives the IPC we expected:
+--------------------+--------------------------+----------------------+
|                    | relax_domain_level unset | relax_domain_level=2 |
+--------------------+--------------------------+----------------------+
| Threads            | 210                      | 210                  |
| Utilization (%)    | 65.86                    | 52.01                |
| CPU effective freq | 1 622.93                 | 1 294.12             |
| IPC                | 1.14                     | 1.42                 |
| L2 access (pti)    | 34.36                    | 38.18                |
| L2 miss (pti)      | 7.34                     | 7.78                 |
| L3 miss (abs)      | 39 711 971 741           | 33 929 609 924       |
| Mem (GB/s)         | 70.68                    | 49.10                |
| Context switches   | 109 281 524              | 107 896 729          |
+--------------------+--------------------------+----------------------+
Kind regards,
JB