On Thu, Feb 17, 2022 at 07:56:00AM +1300, Barry Song wrote: ...
Then, there is another point: in your case, the CLUSTER level still has the flag SD_SHARE_PKG_RESOURCES, which is used to define some scheduler-internal variables like sd_llc (the sched domain of the last level cache), which allows fast task migration between the CPUs at this level at wakeup. In your case, sd_llc should not be the cluster but the MC with only one CPU. But I would not be surprised if most of the perf improvement comes from this sd_llc being wrongly set to the cluster instead of the single CPU.
I assume this "mistake" is actually what Ampere Altra needs: it is wrong, but it gets the right result? Ampere Altra has already got both:
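To make the sd_llc point above concrete, here is a toy model of how the highest domain carrying SD_SHARE_PKG_RESOURCES ends up being treated as the LLC domain. It is a sketch only, written in Python rather than kernel C; the flag value, the dictionary layout, and the 8-CPU topology with 2-CPU clusters are all assumptions made for illustration, not the real Altra topology.

```python
# Toy model, NOT kernel code. The bit value is illustrative only.
SD_SHARE_PKG_RESOURCES = 0x1

def highest_flag_domain(domains, flag):
    """Walk a CPU's domains from lowest to highest level and return the
    highest one in the unbroken run of levels that carry `flag`."""
    highest = None
    for sd in domains:
        if not (sd["flags"] & flag):
            break          # first level without the flag ends the search
        highest = sd
    return highest

# Hypothetical domain hierarchy for CPU 0: a 2-CPU cluster that still
# carries the flag, under a wider level that does not.
cpu0_domains = [
    {"name": "CLS", "span": {0, 1}, "flags": SD_SHARE_PKG_RESOURCES},
    {"name": "DIE", "span": set(range(8)), "flags": 0},
]

sd_llc = highest_flag_domain(cpu0_domains, SD_SHARE_PKG_RESOURCES)

# Same hierarchy with the flag cleared at the cluster level.
cpu0_no_flag = [dict(sd, flags=0) for sd in cpu0_domains]
sd_llc_no_flag = highest_flag_domain(cpu0_no_flag, SD_SHARE_PKG_RESOURCES)
```

In this model sd_llc resolves to the CLS level because that is the highest level still carrying the flag; with the flag cleared there, no level qualifies and the lookup returns None.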
Hi Barry,
Generally yes - although I do think we're placing too much emphasis on the "right" or "wrong" of heuristics, whose definitions are fluid over time (e.g. I expect this will look different in a year, based on what we learn from this and other topologies that are not the current default).
1. load balancing between clusters
2. wake_affine by selecting a sibling CPU which shares the SCU
I am not sure how much 1 and 2 are helping Darren's workloads respectively.
We definitely see improvements with load balancing between clusters. We're running some tests with the wake_affine patchset you pointed me to (thanks for that). My initial tbench runs reported higher average and max latencies. I need to collect more results and see the impact on other benchmarks of interest before I have more to share on that.
Hi Darren, if you read Vincent's comments carefully, you will find it is pointless for you to test the wake_affine patchset, as you have already got its effect. In your case, sd_llc_id is set at the sd_cluster level due to SD_SHARE_PKG_RESOURCES sharing. So my new wake_affine patchset is completely redundant for your machine: it works on the assumption cluster -> llc, but in your case llc = cluster, so it works as cluster -> cluster.
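The redundancy argument above can be illustrated with a toy two-stage scan. This is a sketch under the assumption that the patchset looks at cluster siblings first and then widens to the rest of the LLC, which is how the thread describes it (cluster -> llc); the function name and CPU numbering are hypothetical and this is not the patchset's code.

```python
# Toy model, NOT the wake_affine patchset itself.
def wake_candidates(target, cluster_span, llc_span):
    """Return (stage1, stage2): cluster siblings of `target` first,
    then the remaining CPUs of the LLC outside the cluster."""
    stage1 = sorted(cluster_span - {target})
    stage2 = sorted(llc_span - cluster_span - {target})
    return stage1, stage2

# Assumed layout: LLC wider than the cluster (cluster -> llc).
wide = wake_candidates(0, {0, 1}, set(range(8)))

# Case described in the thread: llc == cluster (cluster -> cluster).
same = wake_candidates(0, {0, 1}, {0, 1})
```

When the LLC span equals the cluster span, the second stage is empty, so in this model the cluster-first scan adds nothing over a plain LLC scan.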
Thanks Barry,
Makes sense as described. I did see degradation in the tests we ran with this patch applied to 5.17-rc3. I'll have to follow up with you on that when I can dig into it more. I'd be interested in the specifics of your testing so I can run something similar. I think you said you were reporting on tbench?