On Mon, Mar 14, 2022 at 05:35:05PM +0100, Dietmar Eggemann wrote:
On 09/03/2022 19:26, Darren Hart wrote:
On Wed, Mar 09, 2022 at 01:50:07PM +0100, Dietmar Eggemann wrote:
On 08/03/2022 18:49, Darren Hart wrote:
On Tue, Mar 08, 2022 at 05:03:07PM +0100, Dietmar Eggemann wrote:
On 08/03/2022 12:04, Vincent Guittot wrote:
On Tue, 8 Mar 2022 at 11:30, Will Deacon <will@kernel.org> wrote:
[...]
I do not have any better idea than this tweak here either in case the platform can't provide a cleaner setup.
I'd argue the platform is describing itself accurately in ACPI PPTT terms. The topology doesn't fit nicely within the kernel abstractions today. This is an area where I hope to continue to improve things going forward.
I see. And I assume lying about SCU/LLC boundaries in ACPI is not an option since it messes up /sys/devices/system/cpu/cpu0/cache/index*/.
[...]
I'm not aware of a way to accurately describe the SCU topology in the PPTT, and the risk we run by lying about LLC topology is that the lie has to be comprehended by all OSes and must not conflict with other lies people may ask for. In general, I think it is preferable and more maintainable to describe the topology as accurately and honestly as we can within the existing platform mechanisms (PPTT, HMAT, etc.) and work on the higher-level abstractions to accommodate a broader set of topologies as they emerge (as well as working to describe the topology more fully with new platform-level mechanisms as needed).
As I mentioned, I intend to continue looking into how to improve the current abstractions. For now, it sounds like we have agreement that this patch can be merged to address the BUG?
What about swapping the CLS and MC cpumasks for such a machine? This would spare the task scheduler from dealing with a system which has CLS but no MC. We essentially promote the CLS cpumask up to MC in this case.
cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
MC
^^
DIE
NUMA

cat /sys/kernel/debug/sched/domains/cpu0# cat domain*/flags
SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
                                                                  ^^^^^^^^^^^^^^^^^^^^^^
SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_PREFER_SIBLING
SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SERIALIZE SD_OVERLAP SD_NUMA
Only very lightly tested on Altra and Juno-r0 (DT).
--->8---
From 54bef59e7f50fa41b7ae39190fd71af57209c27d Mon Sep 17 00:00:00 2001
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
Date: Mon, 14 Mar 2022 15:08:23 +0000
Subject: [PATCH] arch_topology: Swap MC & CLS SD mask if MC weight==1 & subset(MC,CLS)

This avoids the issue of having a system with a CLS SD but no MC SD. CLS should be sub-SD of MC.
Hi Dietmar,
Ultimately, this delivers the same result. I do think it imposes more complexity on everyone to address what, as far as I'm aware, only affects the one system.
I don't think the term "Cluster" has a clear and universally understood definition, so I don't think it's a given that "CLS should be sub-SD of MC". I think this has been assumed, and that assumption has mostly held up, but this is an abstraction, and the abstraction should follow the physical topologies rather than the other way around in my opinion. If that's the primary motivation for this approach, I don't think it justifies the additional complexity.
All told, I prefer the 2-line change contained within cpu_coregroup_mask(), which handles the one known exception with minimal impact. It's easy enough to come back to this and address more cases with a more complex solution if needed in the future, but I prefer to introduce the least complexity possible to address the known issues, especially if the end result is the same and the cost is paid by the affected systems.
Thanks,
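
For readers following along, here is a rough user-space sketch of the kind of small change to cpu_coregroup_mask() referred to above. The change itself is not quoted in this message, so the logic is an assumption: widen the core (MC) mask to the cluster siblings whenever the LLC-derived core mask is contained in them. mask_t, model_coregroup_mask() and model_check() are hypothetical names for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, up to 64 CPUs. */
typedef uint64_t mask_t;

/*
 * Hypothetical model (assumption, not the actual kernel code): if the
 * core mask is a subset of the cluster siblings, report the cluster
 * siblings as the core group instead.
 */
static mask_t model_coregroup_mask(mask_t core, mask_t cluster)
{
	if ((core & ~cluster) == 0)	/* core is a subset of cluster */
		core = cluster;
	return core;
}

static void model_check(void)
{
	/* No shared LLC: core = {0}, cluster = {0,1} -> MC becomes {0,1}. */
	assert(model_coregroup_mask(0x1, 0x3) == 0x3);
	/* LLC spanning several clusters: core = {0-3} stays untouched. */
	assert(model_coregroup_mask(0xf, 0x3) == 0xf);
}
```

In this model the MC and CLS levels end up spanning the same CPUs on the affected machine, so only systems whose core mask sits inside the cluster mask see any difference.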
The cpumask under /sys/devices/system/cpu/cpu*/cache/index* and /sys/devices/system/cpu/cpu*/topology are not changed by this.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

 drivers/base/arch_topology.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 976154140f0b..9af90a5625c7 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -614,7 +614,7 @@ static int __init parse_dt_topology(void)
 struct cpu_topology cpu_topology[NR_CPUS];
 EXPORT_SYMBOL_GPL(cpu_topology);
 
-const struct cpumask *cpu_coregroup_mask(int cpu)
+const struct cpumask *_cpu_coregroup_mask(int cpu)
 {
 	const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
 
@@ -631,11 +631,37 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	return core_mask;
 }
 
-const struct cpumask *cpu_clustergroup_mask(int cpu)
+const struct cpumask *_cpu_clustergroup_mask(int cpu)
 {
 	return &cpu_topology[cpu].cluster_sibling;
 }
 
+static int
+swap_masks(const cpumask_t *core_mask, const cpumask_t *cluster_mask)
+{
+	if (cpumask_weight(core_mask) == 1 &&
+	    cpumask_subset(core_mask, cluster_mask))
+		return 1;
+
+	return 0;
+}
+
+const struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	const cpumask_t *cluster_mask = _cpu_clustergroup_mask(cpu);
+	const cpumask_t *core_mask = _cpu_coregroup_mask(cpu);
+
+	return swap_masks(core_mask, cluster_mask) ? cluster_mask : core_mask;
+}
+
+const struct cpumask *cpu_clustergroup_mask(int cpu)
+{
+	const cpumask_t *cluster_mask = _cpu_clustergroup_mask(cpu);
+	const cpumask_t *core_mask = _cpu_coregroup_mask(cpu);
+
+	return swap_masks(core_mask, cluster_mask) ? core_mask : cluster_mask;
+}
+
 void update_siblings_masks(unsigned int cpuid)
 {
 	struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
-- 
2.25.1
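
To make the swap condition easier to reason about outside the kernel, here is a small user-space model of the same predicate. It is only a sketch: cpumasks are approximated by a 64-bit word, and toy_cpumask and all toy_*() helpers are made-up names standing in for cpumask_t, cpumask_weight() and cpumask_subset().

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, up to 64 CPUs. */
typedef uint64_t toy_cpumask;

/* Number of CPUs set in the mask (models cpumask_weight()). */
static int toy_weight(toy_cpumask m)
{
	int n = 0;

	for (; m; m &= m - 1)	/* clear the lowest set bit each pass */
		n++;
	return n;
}

/* Nonzero if every CPU in a is also in b (models cpumask_subset()). */
static int toy_subset(toy_cpumask a, toy_cpumask b)
{
	return (a & ~b) == 0;
}

/* Same condition as swap_masks() in the patch. */
static int toy_swap_masks(toy_cpumask core, toy_cpumask cluster)
{
	return toy_weight(core) == 1 && toy_subset(core, cluster);
}

static toy_cpumask toy_coregroup_mask(toy_cpumask core, toy_cpumask cluster)
{
	return toy_swap_masks(core, cluster) ? cluster : core;
}

static toy_cpumask toy_clustergroup_mask(toy_cpumask core, toy_cpumask cluster)
{
	return toy_swap_masks(core, cluster) ? core : cluster;
}

static void toy_selfcheck(void)
{
	/* Single-CPU MC inside a 2-CPU cluster: the masks are swapped. */
	assert(toy_coregroup_mask(0x1, 0x3) == 0x3);
	assert(toy_clustergroup_mask(0x1, 0x3) == 0x1);
	/* MC spanning 4 CPUs: nothing is swapped. */
	assert(toy_coregroup_mask(0xf, 0x3) == 0xf);
	assert(toy_clustergroup_mask(0xf, 0x3) == 0x3);
}
```

The first case in toy_selfcheck() corresponds to the situation the patch targets (MC weight 1, MC a subset of CLS); the second shows that a machine with a wider MC mask is left alone.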