Hello, Waiman.
On Wed, Apr 12, 2023 at 11:37:53AM -0400, Waiman Long wrote:
This patch series introduces a new "isolcpus" partition type to the existing list of {member, root, isolated} types. The primary reason for adding this new "isolcpus" partition is to facilitate the distribution of isolated CPUs down the cgroup v2 hierarchy.
The other non-member partition types have the limitation that their parents have to be valid partitions too. This makes it hard to create a partition a few layers down the hierarchy.
It is relatively rare to have applications that require the creation of a separate scheduling domain (root). However, it is more common to have applications that require the use of isolated CPUs (isolated), e.g. DPDK. One can use the "isolcpus" or "nohz_full" boot command line options to get that statically. Of course, the "isolated" partition is another way to achieve that dynamically.
Modern container orchestration tools like Kubernetes use the cgroup hierarchy to manage different containers. If a container needs to use isolated CPUs, it is hard to get those with the existing set of cpuset partition types. With this patch series, a new "isolcpus" partition can be created to hold a set of isolated CPUs that can be pulled into other "isolated" partitions.
The "isolcpus" partition is special in that there can be at most one instance of it in a system. It serves as a pool for isolated CPUs and cannot hold tasks or sub-cpusets underneath it. It is also not cpu-exclusive, so the isolated CPUs can be distributed down the sibling hierarchies, though those isolated CPUs will not be usable until the partition type becomes "isolated".
Once isolated CPUs are needed in a cgroup, the administrator can write a list of isolated CPUs into its "cpuset.cpus" and change its partition type to "isolated" to pull in those isolated CPUs from the "isolcpus" partition and use them in that cgroup. That will make the distribution of isolated CPUs to cgroups that need them much easier.
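For illustration, the two-step workflow described above (write "cpuset.cpus", then set the partition type to "isolated") might look like the following minimal Python sketch. It assumes a cgroup v2 mount, an existing cgroup directory whose parent has the cpuset controller enabled, and that a failed transition shows up as "invalid" in the value read back from "cpuset.cpus.partition". The helper name `make_isolated_partition` is hypothetical, and pulling the CPUs from an "isolcpus" pool is the behavior proposed by this series, not something mainline does.

```python
import os


def make_isolated_partition(cgroup_dir: str, cpus: str) -> str:
    """Hypothetical helper: give `cpus` to a cpuset and make it "isolated".

    Assumes `cgroup_dir` is an existing cgroup v2 directory whose parent
    has the cpuset controller enabled. Returns the partition state that
    is read back after the switch.
    """
    # Step 1: request the CPU list (e.g. "4-7") for this cgroup.
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as f:
        f.write(cpus)

    # Step 2: change the partition type. Under the proposed series, this
    # is where the CPUs would be pulled in from the "isolcpus" pool.
    part_file = os.path.join(cgroup_dir, "cpuset.cpus.partition")
    with open(part_file, "w") as f:
        f.write("isolated")

    # Step 3: read the state back; assumption: an unusable partition is
    # reported with "invalid" in this file.
    with open(part_file) as f:
        state = f.read().strip()
    if "invalid" in state:
        raise RuntimeError(f"partition not valid: {state}")
    return state
```

Writing "cpuset.cpus" first matters: the partition switch is validated against the CPUs the cgroup already has, so reversing the order would ask the kernel to isolate the wrong set.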
I'm not sure about this. It feels really hacky in that it side-steps the distribution hierarchy completely. I can imagine a non-isolated cpuset wanting to allow isolated cpusets downstream but that should be done hierarchically - e.g. by allowing a cgroup to express what isolated cpus are allowed in the subtree. Also, can you give more details on the targeted use cases?
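A toy model can make the hierarchical alternative concrete: each cgroup carries a set of isolated CPUs its subtree may use, and a cgroup may only claim CPUs that it and every ancestor permit. This is a sketch under assumptions; `CgroupNode`, `allowed_isolated` and `can_claim_isolated` are hypothetical names and do not correspond to any existing cpuset interface file.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class CgroupNode:
    name: str
    parent: Optional["CgroupNode"] = None
    # Hypothetical knob: isolated CPUs this cgroup's subtree may use.
    allowed_isolated: Set[int] = field(default_factory=set)


def can_claim_isolated(node: CgroupNode, cpus: Set[int]) -> bool:
    """Return True only if `node` and all of its ancestors allow `cpus`."""
    cur: Optional[CgroupNode] = node
    while cur is not None:
        # Any level that does not allow a requested CPU blocks the claim.
        if not cpus <= cur.allowed_isolated:
            return False
        cur = cur.parent
    return True
```

Unlike a single global pool, a grant made low in the tree cannot exceed what the levels above it delegated.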
Thanks.