Re: [PATCH] arm64: smp: Skip MC domain for SoCs without shared cache

15 Feb 2022

On Tue, 15 Feb 2022 at 17:46, Will Deacon will@kernel.org wrote:
...
On Tue, Feb 15, 2022 at 08:44:23AM -0800, Darren Hart wrote:
...
On Tue, Feb 15, 2022 at 04:38:59PM +0000, Will Decon wrote:
...
On Fri, Feb 11, 2022 at 03:20:51AM +0000, Song Bao Hua (Barry Song) wrote:
...
...
-----Original Message-----
From: Darren Hart [mailto:darren@os.amperecomputing.com]
Sent: Friday, February 11, 2022 2:43 PM
To: LKML linux-kernel@vger.kernel.org; Linux Arm
linux-arm-kernel@lists.infradead.org
Cc: Catalin Marinas catalin.marinas@arm.com; Will Deacon will@kernel.org;
Peter Zijlstra peterz@infradead.org; Vincent Guittot
vincent.guittot@linaro.org; Song Bao Hua (Barry Song)
song.bao.hua@hisilicon.com; Valentin Schneider
valentin.schneider@arm.com; D . Scott Phillips
scott@os.amperecomputing.com; Ilkka Koskinen
ilkka@os.amperecomputing.com; stable@vger.kernel.org
Subject: [PATCH] arm64: smp: Skip MC domain for SoCs without shared cache
SoCs such as the Ampere Altra define clusters but have no shared
processor-side cache. As of v5.16 with CONFIG_SCHED_CLUSTER and
CONFIG_SCHED_MC, build_sched_domain() will BUG() with:
BUG: arch topology borken
     the CLS domain not a subset of the MC domain
for each CPU (160 times for a 2 socket 80 core Altra system). The MC
level cpu mask is then extended to that of the CLS child, and is later
removed entirely as redundant.
This change detects when all cpu_coregroup_mask weights=1 and uses an
alternative sched_domain_topology equivalent to the default if
CONFIG_SCHED_MC were disabled.
The final resulting sched domain topology is unchanged with or without
CONFIG_SCHED_CLUSTER, and the BUG is avoided:
For CPU0:
With CLS:
CLS  [0-1]
DIE  [0-79]
NUMA [0-159]
Without CLS:
DIE  [0-79]
NUMA [0-159]
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Vincent Guittot vincent.guittot@linaro.org
Cc: Barry Song song.bao.hua@hisilicon.com
Cc: Valentin Schneider valentin.schneider@arm.com
Cc: D. Scott Phillips scott@os.amperecomputing.com
Cc: Ilkka Koskinen ilkka@os.amperecomputing.com
Cc: stable@vger.kernel.org # 5.16.x
Signed-off-by: Darren Hart darren@os.amperecomputing.com
Hi Darrent,
What kind of resources are clusters sharing on Ampere Altra?
So on Altra, cpus are not sharing LLC? Each LLC is separate
for each cpu?
...

arch/arm64/kernel/smp.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 27df5c1e6baa..0a78ac5c8830 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -715,9 +715,22 @@ void __init smp_init_cpus(void)
        }
 }
+static struct sched_domain_topology_level arm64_no_mc_topology[] = {
+#ifdef CONFIG_SCHED_SMT

  { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },



+#endif



+#ifdef CONFIG_SCHED_CLUSTER

  { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },



+#endif

  { cpu_cpu_mask, SD_INIT_NAME(DIE) },


  { NULL, },



+};



void __init smp_prepare_cpus(unsigned int max_cpus)
 {
        const struct cpu_operations *ops;

  bool use_no_mc_topology = true;
  int err;
  unsigned int cpu;
  unsigned int this_cpu;



@@ -758,6 +771,25 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
            set_cpu_present(cpu, true);
            numa_store_cpu_info(cpu);



          /*


           * Only use no_mc topology if all cpu_coregroup_mask weights=1


           */


          if (cpumask_weight(cpu_coregroup_mask(cpu)) > 1)


                  use_no_mc_topology = false;



This seems to be wrong? If you have 5 cpus,
Cpu0 has cpu_coregroup_mask(cpu)== 1, cpu1-4
has cpu_coregroup_mask(cpu)== 4, for cpu0, you still
need to remove MC, but for cpu1-4, you will need
CLS and MC both?
What is the *current* behaviour on such a system?
As I understand it, any system that uses the default topology which has
a cpus_coregroup weight of 1 and a child (cluster, smt, ...) weight > 1
will behave as described above by printing the following for each CPU
matching this criteria:
BUG: arch topology borken
        the [CLS,SMT,...] domain not a subset of the MC domain
And then extend the MC domain cpumask to match that of the child and continue
on.
That would still be the behavior for this type of system after this
patch is applied.
That's what I thought, but in that case applying your patch is a net
improvement: systems either get current or better behaviour.
CLUSTER level is normally defined as a intermediate group of the MC
level and both levels have the scheduler flag SD_SHARE_PKG_RESOURCES
flag
In the case of Ampere altra, they consider that CPUA have a CLUSTER
level which SD_SHARE_PKG_RESOURCES with another CPUB but the next and
larger MC level then says that CPUA doesn't SD_SHARE_PKG_RESOURCES
with CPUB which seems to be odd because the SD_SHARE_PKG_RESOURCES has
not disappeared.
Looks like there is a mismatch in topology description
...
Barry -- why shouldn't we take this as-is?
Will

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] arm64: smp: Skip MC domain for SoCs without shared cache