While the static key is correctly initialized as being disabled, it will remain forever enabled once turned on. This means that if we start with an asymmetric system and hotplug out enough CPUs to end up with an SMP system, the static key will remain set - which is obviously wrong. We should detect this and turn off things like misfit migration and EAS wakeups.
Having that key enabled should also mandate
per_cpu(sd_asym_cpucapacity, cpu) != NULL
for all CPUs, but this is obviously not true with the above.
On top of that, sched domain rebuilds first lead to attaching the NULL domain to the affected CPUs, which means there will be a window where the static key is set but the sd_asym_cpucapacity shortcut points to NULL even if asymmetry hasn't been hotplugged out.
Disable the static key when destroying domains, and let build_sched_domains() (re)enable it as needed.
Cc: stable@vger.kernel.org
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b5667a273bf6..c49ae57a0611 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2123,7 +2123,9 @@ static void detach_destroy_domains(const struct cpumask *cpu_map)
 {
 	int i;
 
+	static_branch_disable_cpuslocked(&sched_asym_cpucapacity);
+
 	rcu_read_lock();
 	for_each_cpu(i, cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);
--
2.22.0
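For context, the (re)enable side lives in build_sched_domains(), which already detects the asymmetric topology level via asym_cpu_capacity_level(). A rough sketch of how the key can be gated on each rebuild (the has_asym flag is illustrative, not part of this diff):

	bool has_asym = false;

	tl_asym = asym_cpu_capacity_level(cpu_map);

	for_each_cpu(i, cpu_map) {
		for_each_sd_topology(tl) {
			int dflags = 0;

			if (tl == tl_asym) {
				dflags |= SD_ASYM_CPUCAPACITY;
				/* This rebuild actually sees capacity asymmetry. */
				has_asym = true;
			}
			/* sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i); */
		}
	}

	/* Only (re)enable the key if asymmetry was seen during this rebuild. */
	if (has_asym)
		static_branch_enable_cpuslocked(&sched_asym_cpucapacity);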
On Mon, 14 Oct 2019 at 13:47, Valentin Schneider <valentin.schneider@arm.com> wrote:
While the static key is correctly initialized as being disabled, it will remain forever enabled once turned on. This means that if we start with an asymmetric system and hotplug out enough CPUs to end up with an SMP system, the static key will remain set - which is obviously wrong. We should detect this and turn off things like misfit migration and EAS wakeups.
Having that key enabled should also mandate
per_cpu(sd_asym_cpucapacity, cpu) != NULL
for all CPUs, but this is obviously not true with the above.
On top of that, sched domain rebuilds first lead to attaching the NULL domain to the affected CPUs, which means there will be a window where the static key is set but the sd_asym_cpucapacity shortcut points to NULL even if asymmetry hasn't been hotplugged out.
Disable the static key when destroying domains, and let build_sched_domains() (re)enable it as needed.
Cc: stable@vger.kernel.org
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
 kernel/sched/topology.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b5667a273bf6..c49ae57a0611 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2123,7 +2123,9 @@ static void detach_destroy_domains(const struct cpumask *cpu_map)
 {
 	int i;
 
+	static_branch_disable_cpuslocked(&sched_asym_cpucapacity);
+
 	rcu_read_lock();
 	for_each_cpu(i, cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);
--
2.22.0
Hi Valentin,
On Monday 14 Oct 2019 at 12:47:10 (+0100), Valentin Schneider wrote:
While the static key is correctly initialized as being disabled, it will remain forever enabled once turned on. This means that if we start with an asymmetric system and hotplug out enough CPUs to end up with an SMP system, the static key will remain set - which is obviously wrong. We should detect this and turn off things like misfit migration and EAS wakeups.
FWIW we already clear the EAS static key properly (based on the sd pointer, not the static key), so this is really only for the capacity-aware stuff.
Having that key enabled should also mandate
per_cpu(sd_asym_cpucapacity, cpu) != NULL
for all CPUs, but this is obviously not true with the above.
On top of that, sched domain rebuilds first lead to attaching the NULL domain to the affected CPUs, which means there will be a window where the static key is set but the sd_asym_cpucapacity shortcut points to NULL even if asymmetry hasn't been hotplugged out.
Disable the static key when destroying domains, and let build_sched_domains() (re)enable it as needed.
Cc: stable@vger.kernel.org
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
 kernel/sched/topology.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b5667a273bf6..c49ae57a0611 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2123,7 +2123,9 @@ static void detach_destroy_domains(const struct cpumask *cpu_map)
 {
 	int i;
 
+	static_branch_disable_cpuslocked(&sched_asym_cpucapacity);
+
 	rcu_read_lock();
 	for_each_cpu(i, cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);
So what happens if you have multiple root domains? You might skip build_sched_domains() for one of them and end up not setting the static key when you should, no?
I suppose an alternative would be to play with static_branch_inc() / static_branch_dec() from build_sched_domains() or something along those lines.
Thanks, Quentin
On Mon, 14 Oct 2019 at 14:16, Quentin Perret <qperret@google.com> wrote:
Hi Valentin,
On Monday 14 Oct 2019 at 12:47:10 (+0100), Valentin Schneider wrote:
While the static key is correctly initialized as being disabled, it will remain forever enabled once turned on. This means that if we start with an asymmetric system and hotplug out enough CPUs to end up with an SMP system, the static key will remain set - which is obviously wrong. We should detect this and turn off things like misfit migration and EAS wakeups.
FWIW we already clear the EAS static key properly (based on the sd pointer, not the static key), so this is really only for the capacity-aware stuff.
Having that key enabled should also mandate
per_cpu(sd_asym_cpucapacity, cpu) != NULL
for all CPUs, but this is obviously not true with the above.
On top of that, sched domain rebuilds first lead to attaching the NULL domain to the affected CPUs, which means there will be a window where the static key is set but the sd_asym_cpucapacity shortcut points to NULL even if asymmetry hasn't been hotplugged out.
Disable the static key when destroying domains, and let build_sched_domains() (re)enable it as needed.
Cc: stable@vger.kernel.org
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
 kernel/sched/topology.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b5667a273bf6..c49ae57a0611 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2123,7 +2123,9 @@ static void detach_destroy_domains(const struct cpumask *cpu_map)
 {
 	int i;
 
+	static_branch_disable_cpuslocked(&sched_asym_cpucapacity);
+
 	rcu_read_lock();
 	for_each_cpu(i, cpu_map)
 		cpu_attach_domain(NULL, &def_root_domain, i);
So what happens if you have multiple root domains? You might skip build_sched_domains() for one of them and end up not setting the static key when you should, no?
good point
I suppose an alternative would be to play with static_branch_inc() / static_branch_dec() from build_sched_domains() or something along those lines.
Thanks, Quentin
(Replying to the reply because for some reason my mail client never got your reply?!)
On 14/10/2019 14:29, Vincent Guittot wrote:
On Mon, 14 Oct 2019 at 14:16, Quentin Perret <qperret@google.com> wrote:
FWIW we already clear the EAS static key properly (based on the sd pointer, not the static key), so this is really only for the capacity-aware stuff.
Ah, right.
So what happens if you have multiple root domains? You might skip build_sched_domains() for one of them and end up not setting the static key when you should, no?
I suppose an alternative would be to play with static_branch_inc() / static_branch_dec() from build_sched_domains() or something along those lines.
Hmph, so I went with the concept that having the key set should mandate having a non-NULL sd_asym_cpucapacity domain, which is why I unset it as soon as one CPU gets attached to a NULL domain.
Sadly, as you pointed out, this doesn't work if we have another root domain that sees asymmetry. It also kinda sounds broken for the SDs of a root domain that does not see asymmetry (e.g. LITTLEs only) to see that key being set. Maybe what we want is to have a key per root domain?
Thanks, Quentin
On Monday 14 Oct 2019 at 14:46:58 (+0100), Valentin Schneider wrote:
(Replying to the reply because for some reason my mail client never got your reply?!)
On 14/10/2019 14:29, Vincent Guittot wrote:
On Mon, 14 Oct 2019 at 14:16, Quentin Perret <qperret@google.com> wrote:
FWIW we already clear the EAS static key properly (based on the sd pointer, not the static key), so this is really only for the capacity-aware stuff.
Ah, right.
So what happens if you have multiple root domains? You might skip build_sched_domains() for one of them and end up not setting the static key when you should, no?
I suppose an alternative would be to play with static_branch_inc() / static_branch_dec() from build_sched_domains() or something along those lines.
Hmph, so I went with the concept that having the key set should mandate having a non-NULL sd_asym_cpucapacity domain, which is why I unset it as soon as one CPU gets attached to a NULL domain.
Sadly, as you pointed out, this doesn't work if we have another root domain that sees asymmetry. It also kinda sounds broken for the SDs of a root domain that does not see asymmetry (e.g. LITTLEs only) to see that key being set. Maybe what we want is to have a key per root domain?
Right, but that's not possible by definition -- static keys aren't variables. The static keys for asym CPUs and for EAS are just there to optimize the case when they're disabled, but when they _are_ enabled, you have no choice but to do another per-rd check.
And to clarify what I tried to say before, it might be possible to 'count' the number of RDs that have SD_ASYM_CPUCAPACITY set using static_branch_inc()/dec(), like we do for the SMT static key. I remember trying to do something like that for EAS, but that was easier said than done ... :)
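A minimal sketch of that counting idea (purely hypothetical; rd_is_asym stands in for whatever per-rd bookkeeping would record the asymmetry):

	/* When building a root domain that sees capacity asymmetry: */
	if (rd_is_asym)
		static_branch_inc_cpuslocked(&sched_asym_cpucapacity);

	/* When destroying a root domain that saw capacity asymmetry: */
	if (rd_is_asym)
		static_branch_dec_cpuslocked(&sched_asym_cpucapacity);

The key would then remain enabled as long as at least one root domain is asymmetric.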
Thanks, Quentin
On 14/10/2019 14:52, Quentin Perret wrote:
Right, but that's not possible by definition -- static keys aren't variables. The static keys for asym CPUs and for EAS are just there to optimize the case when they're disabled, but when they _are_ enabled, you have no choice but to do another per-rd check.
Bleh, right, realized my nonsense after sending the email.
And to clarify what I tried to say before, it might be possible to 'count' the number of RDs that have SD_ASYM_CPUCAPACITY set using static_branch_inc()/dec(), like we do for the SMT static key. I remember trying to do something like that for EAS, but that was easier said than done ... :)
Gotcha, the reason I didn't go with this is that I wanted to maintain the relationship between the key and the flag (you either have both or none). It feels icky to have the key set and to have a NULL sd_asym_cpucapacity pointer.
An alternative might be to have a separate counter for asymmetric rd's, always disable the key on domain destruction, and use that counter to figure out if we need to restore it. If we don't care about having a NULL SD pointer while the key is set, we could use the key's built-in counter (static_branch_inc()/dec()) as you're suggesting.
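A sketch of that variant, with entirely hypothetical names (asym_rd_count would track how many root domains currently see asymmetry):

	static int asym_rd_count;

	/* In detach_destroy_domains(): always clear the key, adjust the count. */
	if (rd_was_asym)
		asym_rd_count--;
	static_branch_disable_cpuslocked(&sched_asym_cpucapacity);

	/* In build_sched_domains(): restore the key if any asymmetric rd remains. */
	if (has_asym)
		asym_rd_count++;
	if (asym_rd_count)
		static_branch_enable_cpuslocked(&sched_asym_cpucapacity);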
Thanks, Quentin
On 14/10/2019 18:03, Valentin Schneider wrote:
On 14/10/2019 14:52, Quentin Perret wrote:
Right, but that's not possible by definition -- static keys aren't variables. The static keys for asym CPUs and for EAS are just there to optimize the case when they're disabled, but when they _are_ enabled, you have no choice but to do another per-rd check.
Bleh, right, realized my nonsense after sending the email.
And to clarify what I tried to say before, it might be possible to 'count' the number of RDs that have SD_ASYM_CPUCAPACITY set using static_branch_inc()/dec(), like we do for the SMT static key. I remember trying to do something like that for EAS, but that was easier said than done ... :)
Gotcha, the reason I didn't go with this is that I wanted to maintain the relationship between the key and the flag (you either have both or none). It feels icky to have the key set and to have a NULL sd_asym_cpucapacity pointer.
An alternative might be to have a separate counter for asymmetric rd's, always disable the key on domain destruction, and use that counter to figure out if we need to restore it. If we don't care about having a NULL SD pointer while the key is set, we could use the key's built-in counter (static_branch_inc()/dec()) as you're suggesting.
I still don't understand the benefit of the counter approach here. sched_smt_present counts the number of cores with SMT. So in case you have 2 SMT cores with 2 HW threads each and you hotplug out one CPU, you still have sched_smt_present enabled, although 1 CPU doesn't have an SMT thread sibling anymore.
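For reference, sched_smt_present is maintained from sched_cpu_activate()/sched_cpu_deactivate() in kernel/sched/core.c; roughly (quoted from memory, so treat as a sketch):

	/* sched_cpu_activate(): going up, count cores that gain an SMT sibling. */
	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
		static_branch_inc_cpuslocked(&sched_smt_present);

	/* sched_cpu_deactivate(): going down, count cores that lose an SMT sibling. */
	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
		static_branch_dec_cpuslocked(&sched_smt_present);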
Valentin's patch makes sure that sched_asym_cpucapacity is correctly set when the sd hierarchy is rebuilt due to CPU hp. Including the unlikely scenario that an asymmetric CPU capacity system (based on DT's capacity-dmips-mhz values) turns into a normal SMP system because of the max frequency values of the CPUs involved.
Systems with a mix of asymmetric and symmetric CPU capacity rd's have to live with the fact that wake_cap() and misfit handling are enabled for them. This should already be the case today.
There should be no SD_ASYM_CPUCAPACITY flag on the sd's of the CPUs of the symmetric CPU capacity rd's. I.e. update_top_cache_domain() should set sd_asym_cpucapacity=NULL for those CPUs.
So as a rule we could say that even if a static key enables a code path, a dereferenced sd still has to be checked against NULL.
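I.e. the consumer side would always follow this pattern (a sketch of the rule, not a specific call site):

	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
		struct sched_domain *sd;

		rcu_read_lock();
		sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, cpu));
		if (sd) {
			/* This CPU's rd really is asymmetric: use sd. */
		}
		rcu_read_unlock();
	}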
On 15/10/2019 10:22, Dietmar Eggemann wrote:
I still don't understand the benefit of the counter approach here. sched_smt_present counts the number of cores with SMT. So in case you have 2 SMT cores with 2 HW threads each and you hotplug out one CPU, you still have sched_smt_present enabled, although 1 CPU doesn't have an SMT thread sibling anymore.
Valentin's patch makes sure that sched_asym_cpucapacity is correctly set when the sd hierarchy is rebuilt due to CPU hp. Including the unlikely scenario that an asymmetric CPU capacity system (based on DT's capacity-dmips-mhz values) turns into a normal SMP system because of the max frequency values of the CPUs involved.
Systems with a mix of asymmetric and symmetric CPU capacity rd's have to live with the fact that wake_cap() and misfit handling are enabled for them. This should already be the case today.
Good point, that's what I slowly came to realize this morning.
There should be no SD_ASYM_CPUCAPACITY flag on the sd's of the CPUs of the symmetric CPU capacity rd's. I.e. update_top_cache_domain() should set sd_asym_cpucapacity=NULL for those CPUs.
So as a rule we could say that even if a static key enables a code path, a dereferenced sd still has to be checked against NULL.
Yeah, I think there's no escaping it. Let me see if I can do something sensible regarding the static key.
On Tuesday 15 Oct 2019 at 11:22:12 (+0200), Dietmar Eggemann wrote:
I still don't understand the benefit of the counter approach here. sched_smt_present counts the number of cores with SMT. So in case you have 2 SMT cores with 2 HW threads each and you hotplug out one CPU, you still have sched_smt_present enabled, although 1 CPU doesn't have an SMT thread sibling anymore.
Right, and we want something similar for the asym static key I think. That is, it should be enabled if _at least one_ RD is asymmetric.
Valentin's patch makes sure that sched_asym_cpucapacity is correctly set when the sd hierarchy is rebuilt due to CPU hp.
As mentioned in a previous email, I think this patch is broken in case you have multiple asymmetric RDs, but counting should fix it, though it might not be that easy to implement.
Including the unlikely scenario that an asymmetric CPU capacity system (based on DT's capacity-dmips-mhz values) turns into a normal SMP system because of the max frequency values of the CPUs involved.
I'm not sure what you meant here?
Systems with a mix of asymmetric and symmetric CPU capacity rd's have to live with the fact that wake_cap() and misfit handling are enabled for them. This should already be the case today.
There should be no SD_ASYM_CPUCAPACITY flag on the sd's of the CPUs of the symmetric CPU capacity rd's. I.e. update_top_cache_domain() should set sd_asym_cpucapacity=NULL for those CPUs.
So as a rule we could say that even if a static key enables a code path, a dereferenced sd still has to be checked against NULL.
Right, that's already true today, and I don't see a possible alternative to that. Same thing for the EAS static key and the pd list btw.
Thanks, Quentin
On 15/10/2019 13:07, Quentin Perret wrote:
On Tuesday 15 Oct 2019 at 11:22:12 (+0200), Dietmar Eggemann wrote:
I still don't understand the benefit of the counter approach here. sched_smt_present counts the number of cores with SMT. So in case you have 2 SMT cores with 2 HW threads each and you hotplug out one CPU, you still have sched_smt_present enabled, although 1 CPU doesn't have an SMT thread sibling anymore.
Right, and we want something similar for the asym static key I think. That is, it should be enabled if _at least one_ RD is asymmetric.
Valentin's patch makes sure that sched_asym_cpucapacity is correctly set when the sd hierarchy is rebuilt due to CPU hp.
As mentioned in a previous email, I think this patch is broken in case you have multiple asymmetric RDs, but counting should fix it, though it might not be that easy to implement.
Correct, now I see your point! It can easily be recreated on Juno by setting up two exclusive cpusets, each with one big CPU, and then hotplugging one of those big CPUs out.
And partition_sched_domains_locked()/build_perf_domains() handle this by distinguishing existing vs. new PDs and OR-ing has_eas.
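Roughly, from partition_sched_domains_locked() (simplified; treat as a sketch):

	bool has_eas = false;

	for (i = 0; i < ndoms_new; i++) {
		/* ... skip doms whose perf domains already match ... */
		has_eas |= build_perf_domains(doms_new[i]);
	}

	/* Enable/disable the EAS key based on *all* root domains. */
	sched_energy_set(has_eas);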
Including the unlikely scenario that an asymmetric CPU capacity system (based on DT's capacity-dmips-mhz values) turns into a normal SMP system because of the max frequency values of the CPUs involved.
I'm not sure what you meant here?
The rebuild of the sd hierarchy after we know the max frequency values of the CPUs from CPUfreq (i.e. the second SD hierarchy build on asym CPU capacity systems, once CPUfreq is up).
E.g. on Juno, when you set capacity-dmips-mhz_{big;little}=791;1024 with max_CPU_freq_{big;little}=1,100,000;850,000.
It's a theoretical case and might not even work due to rounding errors.
Systems with a mix of asymmetric and symmetric CPU capacity rd's have to live with the fact that wake_cap() and misfit handling are enabled for them. This should already be the case today.
There should be no SD_ASYM_CPUCAPACITY flag on the sd's of the CPUs of the symmetric CPU capacity rd's. I.e. update_top_cache_domain() should set sd_asym_cpucapacity=NULL for those CPUs.
So as a rule we could say that even if a static key enables a code path, a dereferenced sd still has to be checked against NULL.
Right, that's already true today, and I don't see a possible alternative to that. Same thing for the EAS static key and the pd list btw.
Agreed.