On 22/10/2019 12:43, Dietmar Eggemann wrote:
First I thought we could do with a little less drama by only preventing arch_scale_cpu_capacity() from consuming a CPU id >= nr_cpu_ids.
@@ -1894,6 +1894,9 @@ static struct sched_domain_topology_level
 	struct sched_domain_topology_level *tl, *asym_tl = NULL;
 	unsigned long cap;
+	if (cpumask_empty(cpu_map))
+		return NULL;
Until I tried to hp CPU4 back in after CPU4/5 had been hp'ed out (your example further below) and got another:
[ 68.014564] Unable to handle kernel paging request at virtual address fffe8009903d8ee0
...
[ 68.191293] Call trace:
[ 68.193712] partition_sched_domains_locked+0x1a4/0x4a0
[ 68.198882] rebuild_sched_domains_locked+0x4d0/0x7b0
[ 68.203880] rebuild_sched_domains+0x24/0x40
[ 68.208104] cpuset_hotplug_workfn+0xe0/0x5f8
...
@@ -2213,6 +2216,11 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
 		 * will be recomputed in function
 		 * update_tasks_root_domain().
 		 */
+		if (cpumask_empty(doms_cur[i]))
+			printk("doms_cur[%d] empty\n", i);
 		rd = cpu_rq(cpumask_any(doms_cur[i]))->rd;
doms_cur[i] is empty when hp'ing in CPU4 again.
Your patch fixes this as well.
Thanks for giving it a spin!
Might be worth noting that this is not only about asym CPU capacity handling but also about missing checks after cpumask operations in case the cpuset is empty.
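To make the missing check concrete, here is a minimal user-space sketch, not kernel code. It assumes a toy bitmask type and a toy runqueue array; every toy_* name and NR_CPU_IDS are made up for illustration. It mirrors the one property that matters here: picking a CPU out of an empty mask yields a value >= the number of CPU ids, so the result must be checked before it is used as an index.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPU_IDS 6u	/* hypothetical: six possible CPUs, ids 0..5 */

/* Toy stand-in for struct cpumask: bit n set means CPU n is in the mask. */
typedef unsigned long toy_cpumask;

/* Only bits below NR_CPU_IDS are considered part of the mask. */
static int toy_cpumask_empty(toy_cpumask m)
{
	return (m & ((1UL << NR_CPU_IDS) - 1)) == 0;
}

/* Return the first set CPU, or NR_CPU_IDS when no CPU is set. */
static unsigned int toy_cpumask_any(toy_cpumask m)
{
	for (unsigned int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (m & (1UL << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Toy stand-in for the per-CPU runqueues. */
struct toy_rq {
	int rd_id;
};

static struct toy_rq toy_runqueues[NR_CPU_IDS];

/*
 * Guarded lookup: bail out on an empty mask instead of indexing the
 * array with an out-of-range CPU id.
 */
static struct toy_rq *toy_rq_of_any(toy_cpumask m)
{
	unsigned int cpu;

	if (toy_cpumask_empty(m))
		return NULL;

	cpu = toy_cpumask_any(m);
	assert(cpu < NR_CPU_IDS);
	return &toy_runqueues[cpu];
}
```

With this model, toy_rq_of_any(0) returns NULL, whereas an unguarded toy_runqueues[toy_cpumask_any(0)] would read past the end of the array.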
Aye, we end up saving whatever we're given (doms_cur = doms_new at the end of the rebuild). As you pointed out this is also an issue for the operation done by
f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
but it was introduced after the asymmetry check, which is why I'm tagging the latter for stable.
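The carry-over can be modelled in a few lines of user-space C. This is only a sketch under assumptions: the toy_* names, the fixed-size array and the counting return value are hypothetical and do not reflect the real partition_sched_domains_locked() signature. It shows how an empty mask saved by one rebuild is only tripped over by the next one.

```c
#include <stddef.h>

#define TOY_MAX_DOMS 4	/* hypothetical upper bound on saved domains */

typedef unsigned long toy_mask;	/* bit n set means CPU n is in the domain */

/* Domains saved by the previous rebuild (the doms_cur of this model). */
static toy_mask toy_doms_cur[TOY_MAX_DOMS];
static size_t toy_ndoms_cur;

/*
 * Toy rebuild: first walk the domains saved last time, then save the
 * new ones as-is. Returns how many saved domains were empty when this
 * rebuild walked them, i.e. how many times real code would have picked
 * a CPU out of an empty mask.
 */
static size_t toy_rebuild(const toy_mask *doms_new, size_t ndoms_new)
{
	size_t empty_seen = 0;

	for (size_t i = 0; i < toy_ndoms_cur; i++)
		if (toy_doms_cur[i] == 0)
			empty_seen++;

	/* Save whatever we are given, empty masks included. */
	if (ndoms_new > TOY_MAX_DOMS)
		ndoms_new = TOY_MAX_DOMS;
	for (size_t i = 0; i < ndoms_new; i++)
		toy_doms_cur[i] = doms_new[i];
	toy_ndoms_cur = ndoms_new;

	return empty_seen;
}
```

Calling toy_rebuild() with { 0x0f, 0x00 } (second domain emptied by hotplug) reports nothing wrong; the following call, for the hp-in of CPU4, is the one that finds the empty saved mask.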