Linux-kselftest-mirror

linux-kselftest-mirror@lists.linaro.org

[PATCH bpf-next v3 0/3] Add FOU support for externally controlled ipip devices

by Christian Ehrig

This patch set adds support for using FOU or GUE encapsulation with an ipip device operating in collect-metadata mode and a set of kfuncs for controlling encap parameters exposed to a BPF tc-hook.

BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses) in the ingress path of an externally controlled tunnel interface via the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be redirected to the same or a different externally controlled tunnel interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt} helpers and a call to bpf_redirect. This enables us to redirect packets between tunnel interfaces - and potentially change the encapsulation type - using only a single BPF program.

Today this approach works fine for a couple of tunnel combinations. For example: redirecting packets between Geneve and GRE interfaces or GRE and plain ipip interfaces. However, redirecting using FOU or GUE is not supported today. The ip_tunnel module does not allow us to egress packets using additional UDP encapsulation from an ipip device in collect-metadata mode.

Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to the tunnel metadata. It can be filled by a new BPF kfunc introduced in Patch 2 and evaluated by the ip_tunnel egress path. This will allow us to use FOU and GUE encap with externally controlled ipip devices.

Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap. These helpers can be used to set and get UDP encap parameters from the BPF tc-hook doing the packet redirect.

Patch 3 adds BPF tunnel selftests using the two kfuncs.

---
v3:
 - Integrate selftest into test_progs (Alexei)
v2:
 - Fixes for checkpatch.pl
 - Fixes for kernel test robot

Christian Ehrig (3):
  ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices
  bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
  selftests/bpf: Test FOU kfuncs for externally controlled ipip devices

 include/net/fou.h | 2 +
 include/net/ip_tunnels.h | 28 ++--
 net/ipv4/Makefile | 2 +-
 net/ipv4/fou_bpf.c | 119 ++++++++++++++
 net/ipv4/fou_core.c | 5 +
 net/ipv4/ip_tunnel.c | 22 ++-
 net/ipv4/ipip.c | 1 +
 net/ipv6/sit.c | 2 +-
 .../selftests/bpf/prog_tests/test_tunnel.c | 153 +++++++++++++++++-
 .../selftests/bpf/progs/test_tunnel_kern.c | 117 ++++++++++++++
 10 files changed, 432 insertions(+), 19 deletions(-)
 create mode 100644 net/ipv4/fou_bpf.c

-- 
2.39.2

2 years, 9 months

[PATCH v1 RESEND 0/6] mm: (pte|pmd)_mkdirty() should not unconditionally allow for write access

by David Hildenbrand

This is the follow-up on [1], adding selftests (testing for known issues we added workarounds for and other issues that haven't been fixed yet), fixing sparc64, reverting the workarounds, and performing one cleanup.

The patch from [1] was modified slightly (updated/extended patch description, dropped one unnecessary NOP instruction from the ASM in __pte_mkhwwrite()).

Retested on x86_64 and sparc64 (sun4u in QEMU).

I scanned most architectures to make sure their (pte|pmd)_mkdirty() handling is correct. To be sure, we can run the selftests and find out if other architectures are still affected (loongarch was fixed recently as well).

Based on master for now. I don't expect surprises regarding the mm trees, but I can rebase if there are any problems.

[1] https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com

Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>

David Hildenbrand (6):
  selftests/mm: reuse read_pmd_pagesize() in COW selftest
  selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs without write permissions
  sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit
  mm/migrate: revert "mm/migrate: fix wrongly apply write bit after mkdirty on sparc64"
  mm/huge_memory: revert "Partly revert "mm/thp: carry over dirty bit when thp splits on pmd""
  mm/huge_memory: conditionally call maybe_mkwrite() and drop pte_wrprotect() in __split_huge_pmd_locked()

 arch/sparc/include/asm/pgtable_64.h | 116 +++---
 mm/huge_memory.c | 16 +-
 mm/migrate.c | 2 -
 tools/testing/selftests/mm/Makefile | 2 +
 tools/testing/selftests/mm/cow.c | 33 +-
 tools/testing/selftests/mm/khugepaged.c | 4 +
 tools/testing/selftests/mm/mkdirty.c | 379 ++++++++++++++++++
 tools/testing/selftests/mm/soft-dirty.c | 3 +
 .../selftests/mm/split_huge_page_test.c | 4 +
 tools/testing/selftests/mm/vm_util.c | 4 +-
 10 files changed, 468 insertions(+), 95 deletions(-)
 create mode 100644 tools/testing/selftests/mm/mkdirty.c

-- 
2.39.2

2 years, 9 months

[RFC PATCH 0/5] cgroup/cpuset: A new "isolcpus" partition

by Waiman Long

This patch series introduces a new "isolcpus" partition type to the existing list of {member, root, isolated} types. The primary reason for adding this new "isolcpus" partition is to facilitate the distribution of isolated CPUs down the cgroup v2 hierarchy.

The other non-member partition types have the limitation that their parents have to be valid partitions too. It will be hard to create a partition a few layers down the hierarchy.

It is relatively rare to have applications that require creation of a separate scheduling domain (root). However, it is more common to have applications that require the use of isolated CPUs (isolated), e.g. DPDK. One can use the "isolcpus" or "nohz_full" boot command line options to get that statically. Of course, the "isolated" partition is another way to achieve that dynamically.

Modern container orchestration tools like Kubernetes use the cgroup hierarchy to manage different containers. If a container needs to use isolated CPUs, it is hard to get those with the existing set of cpuset partition types. With this patch series, a new "isolcpus" partition can be created to hold a set of isolated CPUs that can be pulled into other "isolated" partitions.

The "isolcpus" partition is special in that there can be at most one instance of it in a system. It serves as a pool for isolated CPUs and cannot hold tasks or sub-cpusets underneath it. It is also not cpu-exclusive, so that the isolated CPUs can be distributed down the sibling hierarchies, though those isolated CPUs will not be usable until the partition type becomes "isolated".

Once isolated CPUs are needed in a cgroup, the administrator can write a list of isolated CPUs into its "cpuset.cpus" and change its partition type to "isolated" to pull in those isolated CPUs from the "isolcpus" partition and use them in that cgroup. That will make the distribution of isolated CPUs to cgroups that need them much easier.

In the future, we may be able to extend this special "isolcpus" partition type to support other isolation attributes like those that can be specified with the "isolcpus" boot command line and related options.

Waiman Long (5):
  cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE handling
  cgroup/cpuset: Add a new "isolcpus" partition root state
  cgroup/cpuset: Make isolated partition pull CPUs from isolcpus partition
  cgroup/cpuset: Documentation update for the new "isolcpus" partition
  cgroup/cpuset: Extend test_cpuset_prs.sh to test isolcpus partition

 Documentation/admin-guide/cgroup-v2.rst | 89 ++-
 kernel/cgroup/cpuset.c | 548 +++++++++++++++---
 .../selftests/cgroup/test_cpuset_prs.sh | 376 ++++++++----
 3 files changed, 789 insertions(+), 224 deletions(-)

-- 
2.31.1
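
The administrator workflow described in this cover letter can be sketched in shell. This is only an illustration under stated assumptions: it presumes the interface proposed by this RFC (the "isolcpus" value for cpuset.cpus.partition, which is not part of mainline cgroup v2), and the cgroup names "pool" and "app" as well as both helper function names are hypothetical.

```sh
#!/bin/sh
# Sketch only: assumes the "isolcpus" partition type proposed in this RFC.
# The cgroup names ("pool", "app") and the helper names are hypothetical.

# make_isolcpus_pool <cgroup2-root> <cpu-list>
# Create the single "isolcpus" partition that serves as the pool of
# isolated CPUs. It cannot hold tasks or child cpusets.
make_isolcpus_pool() {
	# Make the cpuset control files available in child cgroups.
	echo +cpuset > "$1/cgroup.subtree_control" || return 1
	mkdir -p "$1/pool" || return 1
	echo "$2" > "$1/pool/cpuset.cpus" || return 1
	echo isolcpus > "$1/pool/cpuset.cpus.partition"
}

# pull_isolated_cpus <cgroup-dir> <cpu-list>
# List the wanted isolated CPUs in cpuset.cpus, then switch the cgroup to
# "isolated" so that it pulls those CPUs from the "isolcpus" pool.
# For a cgroup several levels down, every ancestor also needs +cpuset
# in its cgroup.subtree_control (not handled here).
pull_isolated_cpus() {
	mkdir -p "$1" || return 1
	echo "$2" > "$1/cpuset.cpus" || return 1
	echo isolated > "$1/cpuset.cpus.partition"
}
```

On a kernel carrying this series, one would presumably call `make_isolcpus_pool /sys/fs/cgroup 2-3` followed by `pull_isolated_cpus /sys/fs/cgroup/app 3` as root, and then read back cpuset.cpus.partition to confirm that the state did not become invalid.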

2 years, 9 months

[PATCH v8 0/6] Some improvements of resctrl selftest

by Shaopeng Tan

Hello,

The aim of this patch series is to improve the resctrl selftest. Without these fixes, some unnecessary processing will be executed and test results will be confusing. There is no behavior change in the tests themselves.

[patch 1] Make write_schemata() run to set up schemata with 100% allocation on the first run in the MBM test.
[patch 2] The MBA test result message is always output as "ok"; make the output message "not ok" if the MBA check fails.
[patch 3] When a child process is created by fork(), the buffer of the parent process is also copied. Flush the buffer before executing fork().
[patch 4] Whether an error occurs in the parent process or in the child process, the parent process always kills the child process and runs umount_resctrlfs(), and the child process always waits to be killed by the parent process.
[patch 5] To clean up properly before exiting the parent process when a signal is received, commonize the signal handler registered for the CMT/MBM/MBA tests and reuse it in CAT; also unregister the signal handler at the end of each test.
[patch 6] Before exiting each of the CMT/CAT/MBM/MBA tests, the functions that clear the test result files, cat/cmt/mbm/mba_test_cleanup(), are called twice. Remove one of the calls.

This patch series is based on Linux v6.2-rc7.

Difference from v7:
[patch 4]
 - Fix commitlog.
[patch 5]
 - Fix commitlog.

Previous versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…

Shaopeng Tan (6):
  selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test
  selftests/resctrl: Return MBA check result and make it to output message
  selftests/resctrl: Flush stdout file buffer before executing fork()
  selftests/resctrl: Cleanup properly when an error occurs in CAT test
  selftests/resctrl: Commonize the signal handler register/unregister for all tests
  selftests/resctrl: Remove duplicate codes that clear each test result file

 tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
 tools/testing/selftests/resctrl/cmt_test.c | 7 +-
 tools/testing/selftests/resctrl/fill_buf.c | 14 ----
 tools/testing/selftests/resctrl/mba_test.c | 23 +++----
 tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
 tools/testing/selftests/resctrl/resctrl.h | 2 +
 .../testing/selftests/resctrl/resctrl_tests.c | 4 --
 tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
 tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
 9 files changed, 96 insertions(+), 75 deletions(-)

-- 
2.27.0

2 years, 9 months

Re: [PATCH v6 1/3] mm: add new api to enable ksm per process

by Matthew Wilcox

On Tue, Apr 11, 2023 at 08:16:46PM -0700, Stefan Roesch wrote:
>  	case PR_SET_VMA:
>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
>  		break;
> +#ifdef CONFIG_KSM
> +	case PR_SET_MEMORY_MERGE:
> +		if (mmap_write_lock_killable(me->mm))
> +			return -EINTR;
> +
> +		if (arg2) {
> +			int err = ksm_add_mm(me->mm);
> +
> +			if (!err)
> +				ksm_add_vmas(me->mm);

in the last version of this patch, you reported the error. Now you swallow the error. I have no idea which is correct, but you've changed the behaviour without explaining it, so I assume it's wrong.

> +		} else {
> +			clear_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> +		}
> +		mmap_write_unlock(me->mm);
> +		break;
> +	case PR_GET_MEMORY_MERGE:
> +		if (arg2 || arg3 || arg4 || arg5)
> +			return -EINVAL;
> +
> +		error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> +		break;

Why do we need a GET? Just for symmetry, or is there an actual need for it?

2 years, 9 months

[PATCH bpf-next v6 4/4] selftests: xsk: Add tests for 8K and 9K frame sizes

by Kal Conley

Add tests:
- RUN_TO_COMPLETION_8K_FRAME_SIZE: frame_size=8192 (aligned)
- UNALIGNED_9K_FRAME_SIZE: frame_size=9000 (unaligned)

Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson(a)intel.com>
---
 tools/testing/selftests/bpf/xskxceiver.c | 25 ++++++++++++++++++++++++
 tools/testing/selftests/bpf/xskxceiver.h | 2 ++
 2 files changed, 27 insertions(+)

diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 7eccf57a0ccc..86797de7fc50 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -1841,6 +1841,17 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
 		pkt_stream_replace(test, DEFAULT_PKT_CNT, PKT_SIZE);
 		testapp_validate_traffic(test);
 		break;
+	case TEST_TYPE_RUN_TO_COMPLETION_8K_FRAME:
+		if (!hugepages_present(test->ifobj_tx)) {
+			ksft_test_result_skip("No 2M huge pages present.\n");
+			return;
+		}
+		test_spec_set_name(test, "RUN_TO_COMPLETION_8K_FRAME_SIZE");
+		test->ifobj_tx->umem->frame_size = 8192;
+		test->ifobj_rx->umem->frame_size = 8192;
+		pkt_stream_replace(test, DEFAULT_PKT_CNT, PKT_SIZE);
+		testapp_validate_traffic(test);
+		break;
 	case TEST_TYPE_RX_POLL:
 		test->ifobj_rx->use_poll = true;
 		test_spec_set_name(test, "POLL_RX");
@@ -1904,6 +1915,20 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
 		if (!testapp_unaligned(test))
 			return;
 		break;
+	case TEST_TYPE_UNALIGNED_9K_FRAME:
+		if (!hugepages_present(test->ifobj_tx)) {
+			ksft_test_result_skip("No 2M huge pages present.\n");
+			return;
+		}
+		test_spec_set_name(test, "UNALIGNED_9K_FRAME_SIZE");
+		test->ifobj_tx->umem->frame_size = 9000;
+		test->ifobj_rx->umem->frame_size = 9000;
+		test->ifobj_tx->umem->unaligned_mode = true;
+		test->ifobj_rx->umem->unaligned_mode = true;
+		pkt_stream_replace(test, DEFAULT_PKT_CNT, PKT_SIZE);
+		test->ifobj_rx->pkt_stream->use_addr_for_fill = true;
+		testapp_validate_traffic(test);
+		break;
 	case TEST_TYPE_HEADROOM:
 		testapp_headroom(test);
 		break;
diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h
index 919327807a4e..7f52f737f5e9 100644
--- a/tools/testing/selftests/bpf/xskxceiver.h
+++ b/tools/testing/selftests/bpf/xskxceiver.h
@@ -69,12 +69,14 @@ enum test_mode {
 enum test_type {
 	TEST_TYPE_RUN_TO_COMPLETION,
 	TEST_TYPE_RUN_TO_COMPLETION_2K_FRAME,
+	TEST_TYPE_RUN_TO_COMPLETION_8K_FRAME,
 	TEST_TYPE_RUN_TO_COMPLETION_SINGLE_PKT,
 	TEST_TYPE_RX_POLL,
 	TEST_TYPE_TX_POLL,
 	TEST_TYPE_POLL_RXQ_TMOUT,
 	TEST_TYPE_POLL_TXQ_TMOUT,
 	TEST_TYPE_UNALIGNED,
+	TEST_TYPE_UNALIGNED_9K_FRAME,
 	TEST_TYPE_ALIGNED_INV_DESC,
 	TEST_TYPE_ALIGNED_INV_DESC_2K_FRAME,
 	TEST_TYPE_UNALIGNED_INV_DESC,
-- 
2.39.2
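
Both new test cases are skipped with "No 2M huge pages present." when hugepages_present() fails, so it may help to check the host before running them. The following is a rough pre-flight sketch, not how xskxceiver itself performs the check (that C helper is not shown in this patch). The function name free_2m_hugepages is hypothetical, and only the default huge page size reported in /proc/meminfo is considered.

```sh
#!/bin/sh
# Sketch only: a pre-flight check, not the logic used by hugepages_present().
# free_2m_hugepages [meminfo-file]
# Print the number of free huge pages if the default huge page size is
# 2 MiB (2048 kB), otherwise print 0. Reads /proc/meminfo by default.
free_2m_hugepages() {
	awk '
		/^HugePages_Free:/ { free = $2 }
		/^Hugepagesize:/   { size_kb = $2 }
		END {
			if (size_kb == 2048)
				print free + 0
			else
				print 0
		}
	' "${1:-/proc/meminfo}"
}
```

If the function prints 0, pages can usually be reserved as root through /proc/sys/vm/nr_hugepages before running the selftest; how many pages the tests actually need is not stated in this patch.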

2 years, 9 months

[PATCH bpf-next v6 3/4] selftests: xsk: Use hugepages when umem->frame_size > PAGE_SIZE

by Kal Conley

HugeTLB UMEMs now support chunk_size > PAGE_SIZE. Set MAP_HUGETLB when frame_size > PAGE_SIZE for future tests.

Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson(a)intel.com>
---
 tools/testing/selftests/bpf/xskxceiver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 5a9691e942de..7eccf57a0ccc 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -1289,7 +1289,7 @@ static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject)
 	void *bufs;
 	int ret;
 
-	if (ifobject->umem->unaligned_mode)
+	if (ifobject->umem->frame_size > sysconf(_SC_PAGESIZE) || ifobject->umem->unaligned_mode)
 		mmap_flags |= MAP_HUGETLB;
 
 	if (ifobject->shared_umem)
-- 
2.39.2
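
To see which frame sizes trigger the new condition on a given machine, the same decision can be mirrored in shell. This is a sketch for illustration only: needs_hugetlb is a hypothetical helper, and the optional third argument exists purely so the page size can be overridden instead of queried with getconf.

```sh
#!/bin/sh
# Sketch only: mirrors the condition added to thread_common_ops() above.
# needs_hugetlb <frame_size> <unaligned: 0|1> [page_size]
# Succeeds (exit status 0) when the UMEM would be mapped with MAP_HUGETLB,
# i.e. when frame_size exceeds the page size or unaligned mode is on.
needs_hugetlb() (
	page_size=${3:-$(getconf PAGESIZE)}
	[ "$1" -gt "$page_size" ] || [ "$2" -eq 1 ]
)
```

With 4 KiB pages, `needs_hugetlb 8192 0` succeeds while `needs_hugetlb 2048 0` does not; on a system with 64 KiB pages, neither 8192 nor 9000 exceeds the page size, so only unaligned mode would request huge pages there.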

2 years, 9 months

[RFC PATCH 5/5] cgroup/cpuset: Extend test_cpuset_prs.sh to test isolcpus partition

by Waiman Long

This patch extends the test_cpuset_prs.sh test script to support testing the new isolcpus partition by adding new tests specifically for the isolcpus partition. In addition, the following changes are also made:
 1) Remove the first column of the TEST_MATRIX as it is always the same and so is redundant.
 2) Add a new C1 cgroup directory for testing and add that column to the TEST_MATRIX.
 3) Add support for the .__DEBUG__.cpuset.cpus.subpartitions file if the "cgroup_debug" kernel boot option is specified, and add a new column to TEST_MATRIX for testing against this cgroup control file.
 4) Add another column for the list of expected isolated CPUs and compare it with the actual value by looking at the state of /sys/kernel/debug/sched/domains.

Signed-off-by: Waiman Long <longman(a)redhat.com>
---
 .../selftests/cgroup/test_cpuset_prs.sh | 376 ++++++++++++------
 1 file changed, 258 insertions(+), 118 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 2b5215cc599f..7fa2bfe6c1c0 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -23,18 +23,18 @@ WAIT_INOTIFY=$(cd $(dirname $0); pwd)/wait_inotify
 CGROUP2=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
 [[ -n "$CGROUP2" ]] || skip_test "Cgroup v2 mount point not found!"
 
-CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//")
-[[ $CPUS -lt 8 ]] && skip_test "Test needs at least 8 cpus available!"
+NR_CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//")
+[[ $NR_CPUS -lt 8 ]] && skip_test "Test needs at least 8 cpus available!"
# Set verbose flag and delay factor PROG=$1 -VERBOSE= +VERBOSE=0 DELAY_FACTOR=1 SCHED_DEBUG= while [[ "$1" = -* ]] do case "$1" in - -v) VERBOSE=1 + -v) ((VERBOSE++)) # Enable sched/verbose can slow thing down [[ $DELAY_FACTOR -eq 1 ]] && DELAY_FACTOR=2 @@ -52,7 +52,7 @@ do done # Set sched verbose flag if available when "-v" option is specified -if [[ -n "$VERBOSE" && -d /sys/kernel/debug/sched ]] +if [[ $VERBOSE -gt 0 && -d /sys/kernel/debug/sched ]] then # Used to restore the original setting during cleanup SCHED_DEBUG=$(cat /sys/kernel/debug/sched/verbose) @@ -103,7 +103,7 @@ test_partition() [[ $? -eq 0 ]] || exit 1 ACTUAL_VAL=$(cat cpuset.cpus.partition) [[ $ACTUAL_VAL != $EXPECTED_VAL ]] && { - echo "cpuset.cpus.partition: expect $EXPECTED_VAL, found $EXPECTED_VAL" + echo "cpuset.cpus.partition: expect $EXPECTED_VAL, found $ACTUAL_VAL" echo "Test FAILED" exit 1 } @@ -114,7 +114,7 @@ test_effective_cpus() EXPECTED_VAL=$1 ACTUAL_VAL=$(cat cpuset.cpus.effective) [[ "$ACTUAL_VAL" != "$EXPECTED_VAL" ]] && { - echo "cpuset.cpus.effective: expect '$EXPECTED_VAL', found '$EXPECTED_VAL'" + echo "cpuset.cpus.effective: expect '$EXPECTED_VAL', found '$ACTUAL_VAL'" echo "Test FAILED" exit 1 } @@ -204,124 +204,175 @@ test_isolated() # Cgroup test hierarchy # # test -- A1 -- A2 -- A3 -# \- B1 +# +- B1 +# +- C1 # -# P<v> = set cpus.partition (0:member, 1:root, 2:isolated, -1:root invalid) +# P<v> = set cpus.partition (0:member, 1:root, 2:isolated, 3: isolcpus) # C<l> = add cpu-list # S<p> = use prefix in subtree_control # T = put a task into cgroup -# O<c>-<v> = Write <v> to CPU online file of <c> +# O<c>=<v> = Write <v> to CPU online file of <c> # SETUP_A123_PARTITIONS="C1-3:P1:S+ C2-3:P1:S+ C3:P1" TEST_MATRIX=( - # test old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate - # ---- ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ - " S+ C0-1 . . C2-3 S+ C4-5 . . 0 A2:0-1" - " S+ C0-1 . . C2-3 P1 . . . 0 " - " S+ C0-1 . . 
C2-3 P1:S+ C0-1:P1 . . 0 " - " S+ C0-1 . . C2-3 P1:S+ C1:P1 . . 0 " - " S+ C0-1:S+ . . C2-3 . . . P1 0 " - " S+ C0-1:P1 . . C2-3 S+ C1 . . 0 " - " S+ C0-1:P1 . . C2-3 S+ C1:P1 . . 0 " - " S+ C0-1:P1 . . C2-3 S+ C1:P1 . P1 0 " - " S+ C0-1:P1 . . C2-3 C4-5 . . . 0 A1:4-5" - " S+ C0-1:P1 . . C2-3 S+:C4-5 . . . 0 A1:4-5" - " S+ C0-1 . . C2-3:P1 . . . C2 0 " - " S+ C0-1 . . C2-3:P1 . . . C4-5 0 B1:4-5" - " S+ C0-3:P1:S+ C2-3:P1 . . . . . . 0 A1:0-1,A2:2-3" - " S+ C0-3:P1:S+ C2-3:P1 . . C1-3 . . . 0 A1:1,A2:2-3" - " S+ C2-3:P1:S+ C3:P1 . . C3 . . . 0 A1:,A2:3 A1:P1,A2:P1" - " S+ C2-3:P1:S+ C3:P1 . . C3 P0 . . 0 A1:3,A2:3 A1:P1,A2:P0" - " S+ C2-3:P1:S+ C2:P1 . . C2-4 . . . 0 A1:3-4,A2:2" - " S+ C2-3:P1:S+ C3:P1 . . C3 . . C0-2 0 A1:,B1:0-2 A1:P1,A2:P1" - " S+ $SETUP_A123_PARTITIONS . C2-3 . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1" + # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS + # ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- -------- + " C0-1 . . C2-3 S+ C4-5 . . . 0 A2:0-1" + " C0-1 . . C2-3 P1 . . . . 0 " + " C0-1 . . C2-3 P1:S+ C0-1:P1 . . . 0 " + " C0-1 . . C2-3 P1:S+ C1:P1 . . . 0 " + " C0-1:S+ . . C2-3 . . . P1 . 0 " + " C0-1:P1 . . C2-3 S+ C1 . . . 0 " + " C0-1:P1 . . C2-3 S+ C1:P1 . . . 0 " + " C0-1:P1 . . C2-3 S+ C1:P1 . P1 . 0 " + " C0-1:P1 . . C2-3 C4-5 . . . . 0 A1:4-5" + " C0-1:P1 . . C2-3 S+:C4-5 . . . . 0 A1:4-5" + " C0-1 . . C2-3:P1 . . . C2 . 0 " + " C0-1 . . C2-3:P1 . . . C4-5 . 0 B1:4-5" + "C0-3:P1:S+ C2-3:P1 . . . . . . . 0 A1:0-1,A2:2-3" + "C0-3:P1:S+ C2-3:P1 . . C1-3 . . . . 0 A1:1,A2:2-3" + "C2-3:P1:S+ C3:P1 . . C3 . . . . 0 A1:,A2:3 A1:P1,A2:P1" + "C2-3:P1:S+ C3:P1 . . C3 P0 . . . 0 A1:3,A2:3 A1:P1,A2:P0" + "C2-3:P1:S+ C2:P1 . . C2-4 . . . . 0 A1:3-4,A2:2" + "C2-3:P1:S+ C3:P1 . . C3 . . C0-2 . 0 A1:,B1:0-2 A1:P1,A2:P1" + "$SETUP_A123_PARTITIONS . C2-3 . . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1" # CPU offlining cases: - " S+ C0-1 . . C2-3 S+ C4-5 . 
O2-0 0 A1:0-1,B1:3" - " S+ C0-3:P1:S+ C2-3:P1 . . O2-0 . . . 0 A1:0-1,A2:3" - " S+ C0-3:P1:S+ C2-3:P1 . . O2-0 O2-1 . . 0 A1:0-1,A2:2-3" - " S+ C0-3:P1:S+ C2-3:P1 . . O1-0 . . . 0 A1:0,A2:2-3" - " S+ C0-3:P1:S+ C2-3:P1 . . O1-0 O1-1 . . 0 A1:0-1,A2:2-3" - " S+ C2-3:P1:S+ C3:P1 . . O3-0 O3-1 . . 0 A1:2,A2:3 A1:P1,A2:P1" - " S+ C2-3:P1:S+ C3:P2 . . O3-0 O3-1 . . 0 A1:2,A2:3 A1:P1,A2:P2" - " S+ C2-3:P1:S+ C3:P1 . . O2-0 O2-1 . . 0 A1:2,A2:3 A1:P1,A2:P1" - " S+ C2-3:P1:S+ C3:P2 . . O2-0 O2-1 . . 0 A1:2,A2:3 A1:P1,A2:P2" - " S+ C2-3:P1:S+ C3:P1 . . O2-0 . . . 0 A1:,A2:3 A1:P1,A2:P1" - " S+ C2-3:P1:S+ C3:P1 . . O3-0 . . . 0 A1:2,A2: A1:P1,A2:P1" - " S+ C2-3:P1:S+ C3:P1 . . T:O2-0 . . . 0 A1:3,A2:3 A1:P1,A2:P-1" - " S+ C2-3:P1:S+ C3:P1 . . . T:O3-0 . . 0 A1:2,A2:2 A1:P1,A2:P-1" - " S+ $SETUP_A123_PARTITIONS . O1-0 . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1" - " S+ $SETUP_A123_PARTITIONS . O2-0 . . . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1" - " S+ $SETUP_A123_PARTITIONS . O3-0 . . . 0 A1:1,A2:2,A3: A1:P1,A2:P1,A3:P1" - " S+ $SETUP_A123_PARTITIONS . T:O1-0 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1" - " S+ $SETUP_A123_PARTITIONS . . T:O2-0 . . 0 A1:1,A2:3,A3:3 A1:P1,A2:P1,A3:P-1" - " S+ $SETUP_A123_PARTITIONS . . . T:O3-0 . 0 A1:1,A2:2,A3:2 A1:P1,A2:P1,A3:P-1" - " S+ $SETUP_A123_PARTITIONS . T:O1-0 O1-1 . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" - " S+ $SETUP_A123_PARTITIONS . . T:O2-0 O2-1 . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" - " S+ $SETUP_A123_PARTITIONS . . . T:O3-0 O3-1 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" - " S+ $SETUP_A123_PARTITIONS . T:O1-0 O2-0 O1-1 . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1" - " S+ $SETUP_A123_PARTITIONS . T:O1-0 O2-0 O2-1 . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1" - - # test old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate - # ---- ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ + " C0-1 . . C2-3 S+ C4-5 . O2=0 . 0 A1:0-1,B1:3" + "C0-3:P1:S+ C2-3:P1 . . O2=0 . . . . 0 A1:0-1,A2:3" + "C0-3:P1:S+ C2-3:P1 . . 
O2=0 O2=1 . . . 0 A1:0-1,A2:2-3" + "C0-3:P1:S+ C2-3:P1 . . O1=0 . . . . 0 A1:0,A2:2-3" + "C0-3:P1:S+ C2-3:P1 . . O1=0 O1=1 . . . 0 A1:0-1,A2:2-3" + "C2-3:P1:S+ C3:P1 . . O3=0 O3=1 . . . 0 A1:2,A2:3 A1:P1,A2:P1" + "C2-3:P1:S+ C3:P2 . . O3=0 O3=1 . . . 0 A1:2,A2:3 A1:P1,A2:P2" + "C2-3:P1:S+ C3:P1 . . O2=0 O2=1 . . . 0 A1:2,A2:3 A1:P1,A2:P1" + "C2-3:P1:S+ C3:P2 . . O2=0 O2=1 . . . 0 A1:2,A2:3 A1:P1,A2:P2" + "C2-3:P1:S+ C3:P1 . . O2=0 . . . . 0 A1:,A2:3 A1:P1,A2:P1" + "C2-3:P1:S+ C3:P1 . . O3=0 . . . . 0 A1:2,A2: A1:P1,A2:P1" + "C2-3:P1:S+ C3:P1 . . T:O2=0 . . . . 0 A1:3,A2:3 A1:P1,A2:P-1" + "C2-3:P1:S+ C3:P1 . . . T:O3=0 . . . 0 A1:2,A2:2 A1:P1,A2:P-1" + "$SETUP_A123_PARTITIONS . O1=0 . . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1" + "$SETUP_A123_PARTITIONS . O2=0 . . . . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1" + "$SETUP_A123_PARTITIONS . O3=0 . . . . 0 A1:1,A2:2,A3: A1:P1,A2:P1,A3:P1" + "$SETUP_A123_PARTITIONS . T:O1=0 . . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1" + "$SETUP_A123_PARTITIONS . . T:O2=0 . . . 0 A1:1,A2:3,A3:3 A1:P1,A2:P1,A3:P-1" + "$SETUP_A123_PARTITIONS . . . T:O3=0 . . 0 A1:1,A2:2,A3:2 A1:P1,A2:P1,A3:P-1" + "$SETUP_A123_PARTITIONS . T:O1=0 O1=1 . . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" + "$SETUP_A123_PARTITIONS . . T:O2=0 O2=1 . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" + "$SETUP_A123_PARTITIONS . . . T:O3=0 O3=1 . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" + "$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O1=1 . . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1" + "$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O2=1 . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1" + + # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS + # ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- -------- + # + # isolcpus partition tests + # + + # isolcpus partition can have empty cpuset.cpus & effective cpus + " . . . P3 . . . . . 0 B1: B1:P3" + + # isolcpus partition is not exclusive + " C1-2 . . C3:P3 C1-3:S+ C3 . . . 
0 A1:1-2,A2:1-2,B1:3 B1:P3" + " C1-3 . . C3 . . . P3 . 0 A1:1-2,B1:3 B1:P3" + + # Only 1 isolcpus partition is allowed + " . . . C3:P3 C1:P3 . . . . 0 A1:1,B1:3 A1:P-3,B1:P3" + + # Isolated partition can pull isolated cpus from isolcpus partition + " C1-3:S+ C3 . C3:P3 . P2 . . . 0 A1:1-2,A2:3,B1: A2:P2,B1:P3 .:3,B1:3 3" + " C1-3:S+ C3 . C3:P3 . P2 . C2-3 . 0 A1:1,A2:3,B1:2 A2:P2,B1:P3 .:2-3,B1:3 2-3" + + # Isolated partition becomes invalid if cpu update fails pulling + " C1-3:S+ C3 . C3:P3 . P2:C2-3 . . . 0 A1:1-2,A2:2,B1:3 A2:P-2,B1:P3 .:3,B1: 3" + " C1-3:S+ C3 . C3:P3 . P2 . C1 . 0 A1:2-3,A2:3,B1:1 A2:P-2,B1:P3 .:1,B1: 1" + + # Once isolated partition pulls cpus from isolcpus, parent can shrink cpu list + " C1-3:S+ C3:P2 . C3:P3 C1-2 . . . . 0 A1:1-2,A2:3,B1: A2:P2,B1:P3 . 3" + " C1-3:S+ C3:P2 . C3:P3 C1 . . . . 0 A1:1,A2:3,B1: A2:P2,B1:P3 . 3" + + # Isolated partition can't be enabled if it can't pull all isolated cpus from parent or isolcpus + " C1-3:S+ C2 . C3:P3 . P2 . . . 0 A1:1-2,A2:2,B1:3 A2:P-2,B1:P3" + + # Isolated/isolcpus partition online/offline tests + " C1-3:S+ C3 . C2-3:P3 . P2 O2=0 . . 0 A1:1,A2:3,B1: A2:P2,B1:P3 .:2-3,B1:3 2-3" + " C1-3:S+ C3 . C2-3:P3 . P2 O2=0 O2=1 . 0 A1:1,A2:3,B1:2 A2:P2,B1:P3 .:2-3,B1:3 2-3" + " C1-3:S+ C2-3 . C2-3:P3 . P2 O2=0 . . 0 A1:1,A2:3,B1: A2:P2,B1:P3 .:2-3,B1:2-3 2-3" + " C1-3:S+ C2-3 . C2-3:P3 . P2 O2=0 O2=1 . 0 A1:1,A2:2-3,B1: A2:P2,B1:P3 .:2-3,B1:2-3 2-3" + + # Isolated partition pulling from isolcpus become invalid if all isolated cpus gone + " C1-3:S+ C3 . C2-3:P3 . P2 O3=0 . . 0 A1:1,A2:1,B1:2 A2:P-2,B1:P3 .:2-3,B1:" + " C1-3:S+ C3 . C2-3:P3 . P2 O3=0 O3=1 . 0 A1:1,A2:1,B1:2-3 A2:P-2,B1:P3 .:2-3,B1:" + + # Hotplug won't affect isolcpus partition with empty cpus_allowed + " C1-3 . . P3 . . O1=0 . . 
0 A1:2-3,B1: B1:P3" + + # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS + # ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- -------- # # Incorrect change to cpuset.cpus invalidates partition root # # Adding CPUs to partition root that are not in parent's # cpuset.cpus is allowed, but those extra CPUs are ignored. - " S+ C2-3:P1:S+ C3:P1 . . . C2-4 . . 0 A1:,A2:2-3 A1:P1,A2:P1" + "C2-3:P1:S+ C3:P1 . . . C2-4 . . . 0 A1:,A2:2-3 A1:P1,A2:P1" # Taking away all CPUs from parent or itself if there are tasks # will make the partition invalid. - " S+ C2-3:P1:S+ C3:P1 . . T C2-3 . . 0 A1:2-3,A2:2-3 A1:P1,A2:P-1" - " S+ C3:P1:S+ C3 . . T P1 . . 0 A1:3,A2:3 A1:P1,A2:P-1" - " S+ $SETUP_A123_PARTITIONS . T:C2-3 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1" - " S+ $SETUP_A123_PARTITIONS . T:C2-3:C1-3 . . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" + "C2-3:P1:S+ C3:P1 . . T C2-3 . . . 0 A1:2-3,A2:2-3 A1:P1,A2:P-1" + " C3:P1:S+ C3 . . T P1 . . . 0 A1:3,A2:3 A1:P1,A2:P-1" + "$SETUP_A123_PARTITIONS . T:C2-3 . . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1" + "$SETUP_A123_PARTITIONS . T:C2-3:C1-3 . . . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1" # Changing a partition root to member makes child partitions invalid - " S+ C2-3:P1:S+ C3:P1 . . P0 . . . 0 A1:2-3,A2:3 A1:P0,A2:P-1" - " S+ $SETUP_A123_PARTITIONS . C2-3 P0 . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P0,A3:P-1" + "C2-3:P1:S+ C3:P1 . . P0 . . . . 0 A1:2-3,A2:3 A1:P0,A2:P-1" + "$SETUP_A123_PARTITIONS . C2-3 P0 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P0,A3:P-1" # cpuset.cpus can contains cpus not in parent's cpuset.cpus as long # as they overlap. - " S+ C2-3:P1:S+ . . . . C3-4:P1 . . 0 A1:2,A2:3 A1:P1,A2:P1" + "C2-3:P1:S+ . . . . C3-4:P1 . . . 0 A1:2,A2:3 A1:P1,A2:P1" # Deletion of CPUs distributed to child cgroup is allowed. - " S+ C0-1:P1:S+ C1 . C2-3 C4-5 . . . 0 A1:4-5,A2:4-5" + "C0-1:P1:S+ C1 . C2-3 C4-5 . . . . 
0 A1:4-5,A2:4-5"

 	# To become a valid partition root, cpuset.cpus must overlap parent's
 	# cpuset.cpus.
-	" S+ C0-1:P1 . . C2-3 S+ C4-5:P1 . . 0 A1:0-1,A2:0-1 A1:P1,A2:P-1"
+	" C0-1:P1 . . C2-3 S+ C4-5:P1 . . . 0 A1:0-1,A2:0-1 A1:P1,A2:P-1"

 	# Enabling partition with child cpusets is allowed
-	" S+ C0-1:S+ C1 . C2-3 P1 . . . 0 A1:0-1,A2:1 A1:P1"
+	" C0-1:S+ C1 . C2-3 P1 . . . . 0 A1:0-1,A2:1 A1:P1"

 	# A partition root with non-partition root parent is invalid, but it
 	# can be made valid if its parent becomes a partition root too.
-	" S+ C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1,A2:1 A1:P0,A2:P-2"
-	" S+ C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0,A2:1 A1:P1,A2:P2"
+	" C0-1:S+ C1 . C2-3 . P2 . . . 0 A1:0-1,A2:1 A1:P0,A2:P-2"
+	" C0-1:S+ C1:P2 . C2-3 P1 . . . . 0 A1:0,A2:1 A1:P1,A2:P2"

 	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	" S+ C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P0"
-	" S+ C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P-1"
-	" S+ C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P0,B1:P-1"
+	" C0-1:P1 . . C2-3 C0-2 . . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P0"
+	" C0-1:P1 . . P1:C2-3 C0-2 . . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P-1"
+	" C0-1 . . P1:C2-3 C0-2 . . . . 0 A1:0-2,B1:2-3 A1:P0,B1:P-1"

-	# test old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate
-	# ---- ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------
+	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS
+	# ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- --------

 	# Failure cases:

 	# A task cannot be added to a partition with no cpu
-	" S+ C2-3:P1:S+ C3:P1 . . O2-0:T . . . 1 A1:,A2:3 A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P1 . . O2=0:T . . . . 1 A1:,A2:3 A1:P1,A2:P1"
+
+	# Task is not allowed in an isolcpus partition
+	" . . . C3:P3 . . . T . 1"
+
+	# Child cpuset is not allowed under an isolcpus partition
+	" C1:P3 . . . S+ . . . . 1"
 )

 #
 # Write to the cpu online file
-# $1 - <c>-<v> where <c> = cpu number, <v> value to be written
+# $1 - <c>=<v> where <c> = cpu number, <v> value to be written
 #
 write_cpu_online()
 {
-	CPU=${1%-*}
-	VAL=${1#*-}
+	CPU=${1%=*}
+	VAL=${1#*=}
 	CPUFILE=//sys/devices/system/cpu/cpu${CPU}/online
 	if [[ $VAL -eq 0 ]]
 	then
@@ -349,11 +400,12 @@ set_ctrl_state()
 	TMPMSG=/tmp/.msg_$$
 	CGRP=$1
 	STATE=$2
-	SHOWERR=${3}${VERBOSE}
+	SHOWERR=${3}
 	CTRL=${CTRL:=$CONTROLLER}
 	HASERR=0
 	REDIRECT="2> $TMPMSG"
 	[[ -z "$STATE" || "$STATE" = '.' ]] && return 0
+	[[ $VERBOSE -gt 0 ]] && SHOWERR=1

 	rm -f $TMPMSG
 	for CMD in $(echo $STATE | sed -e "s/:/ /g")
@@ -383,6 +435,9 @@ set_ctrl_state()
 				;;
 			2)
 				VAL=isolated
 				;;
+			3)
+				VAL=isolcpus
+				;;
 			*)
 				echo "Invalid partition state - $VAL"
 				exit 1
@@ -430,7 +485,7 @@ online_cpus()
 	[[ -n "OFFLINE_CPUS" ]] && {
 		for C in $OFFLINE_CPUS
 		do
-			write_cpu_online ${C}-1
+			write_cpu_online ${C}=1
 		done
 	}
 }
@@ -442,19 +497,23 @@ reset_cgroup_states()
 {
 	echo 0 > $CGROUP2/cgroup.procs
 	online_cpus
-	rmdir A1/A2/A3 A1/A2 A1 B1 > /dev/null 2>&1
+	rmdir A1/A2/A3 A1/A2 A1 B1 C1 > /dev/null 2>&1
 	set_ctrl_state . S-
 	pause 0.01
 }

 dump_states()
 {
-	for DIR in A1 A1/A2 A1/A2/A3 B1
+	for DIR in . A1 A1/A2 A1/A2/A3 B1 C1
 	do
 		ECPUS=$DIR/cpuset.cpus.effective
 		PRS=$DIR/cpuset.cpus.partition
+		PCPUS=$DIR/cpuset.cpus.subpartitions
+		[[ -e $PCPUS ]] ||
+			PCPUS=$DIR/.__DEBUG__.cpuset.cpus.subpartitions
 		[[ -e $ECPUS ]] && echo "$ECPUS: $(cat $ECPUS)"
 		[[ -e $PRS ]] && echo "$PRS: $(cat $PRS)"
+		[[ -e $PCPUS ]] && echo "$PCPUS: $(cat $PCPUS)"
 	done
 }

@@ -478,6 +537,26 @@ check_effective_cpus()
 	done
 }

+#
+# Check subparts cpus
+# $1 - check string, format: <cgroup>:<cpu-list>[,<cgroup>:<cpu-list>]*
+#
+check_subparts_cpus()
+{
+	CHK_STR=$1
+	for CHK in $(echo $CHK_STR | sed -e "s/,/ /g")
+	do
+		set -- $(echo $CHK | sed -e "s/:/ /g")
+		CGRP=$1
+		CPUS=$2
+		[[ $CGRP = A2 ]] && CGRP=A1/A2
+		[[ $CGRP = A3 ]] && CGRP=A1/A2/A3
+		FILE=$CGRP/.__DEBUG__.cpuset.cpus.subpartitions
+		[[ -e $FILE ]] || return 0	# Skip test
+		[[ $CPUS = $(cat $FILE) ]] || return 1
+	done
+}
+
 #
 # Check cgroup states
 # $1 - check string, format: <cgroup>:<state>[,<cgroup>:<state>]*
@@ -512,18 +591,80 @@ check_cgroup_states()
 		isolated)
 			VAL=2
 			;;
+		isolcpus)
+			VAL=3
+			;;
 		"root invalid"*)
 			VAL=-1
 			;;
 		"isolated invalid"*)
 			VAL=-2
 			;;
+		"isolcpus invalid"*)
+			VAL=-3
+			;;
 		esac
 		[[ $EVAL != $VAL ]] && return 1
 	done
 	return 0
 }

+#
+# Get isolated (including offline) CPUs by looking at
+# /sys/kernel/debug/sched/domains and compare that with the expected value.
+#
+# $1 - expected isolated cpu list
+#
+check_isolcpus()
+{
+	EXPECT_VAL=$1
+	ISOLCPUS=
+	LASTISOLCPU=
+	SCHED_DOMAINS=/sys/kernel/debug/sched/domains
+	[[ -d $SCHED_DOMAINS ]] || return 0	# Skip check
+
+	for ((CPU=0; CPU < $NR_CPUS; CPU++))
+	do
+		[[ -n "$(ls ${SCHED_DOMAINS}/cpu$CPU)" ]] && continue
+
+		if [[ -z "$LASTISOLCPU" ]]
+		then
+			ISOLCPUS=$CPU
+			LASTISOLCPU=$CPU
+		elif [[ "$LASTISOLCPU" -eq $((CPU - 1)) ]]
+		then
+			echo $ISOLCPUS | grep -q "\<$LASTISOLCPU\$"
+			if [[ $? -eq 0 ]]
+			then
+				ISOLCPUS=${ISOLCPUS}-
+			fi
+			LASTISOLCPU=$CPU
+		else
+			if [[ $ISOLCPUS = *- ]]
+			then
+				ISOLCPUS=${ISOLCPUS}$LASTISOLCPU
+			fi
+			ISOLCPUS=${ISOLCPUS},$CPU
+			LASTISOLCPU=$CPU
+		fi
+	done
+	[[ "$ISOLCPUS" = *- ]] && ISOLCPUS=${ISOLCPUS}$LASTISOLCPU
+	[[ $EXPECT_VAL = $ISOLCPUS ]]
+}
+
+test_fail()
+{
+	TESTNUM=$1
+	TESTTYPE=$2
+	ADDINFO=$3
+	echo "Test $TEST[$TESTNUM] failed $TESTTYPE check!"
+	[[ -n "$ADDINFO" ]] && echo "*** $ADDINFO ***"
+	eval echo \"\${$TEST[$I]}\"
+	echo
+	dump_states
+	exit 1
+}
+
 #
 # Run cpuset state transition test
 # $1 - test matrix name
@@ -548,60 +689,59 @@ run_state_test()
 	while [[ $I -lt $CNT ]]
 	do
 		echo "Running test $I ..." > /dev/console
+		[[ $VERBOSE -gt 1 ]] && eval echo \"\${$TEST[$I]}\"
 		eval set -- "\${$TEST[$I]}"
-		ROOT=$1
-		OLD_A1=$2
-		OLD_A2=$3
-		OLD_A3=$4
-		OLD_B1=$5
-		NEW_A1=$6
-		NEW_A2=$7
-		NEW_A3=$8
-		NEW_B1=$9
+		OLD_A1=$1
+		OLD_A2=$2
+		OLD_A3=$3
+		OLD_B1=$4
+		NEW_A1=$5
+		NEW_A2=$6
+		NEW_A3=$7
+		NEW_B1=$8
+		NEW_C1=$9
 		RESULT=${10}
 		ECPUS=${11}
 		STATES=${12}
+		PCPUS=${13}
+		ICPUS=${14}

-		set_ctrl_state_noerr . $ROOT
+		set_ctrl_state_noerr . "S+"
+		set_ctrl_state_noerr B1 $OLD_B1
 		set_ctrl_state_noerr A1 $OLD_A1
 		set_ctrl_state_noerr A1/A2 $OLD_A2
 		set_ctrl_state_noerr A1/A2/A3 $OLD_A3
-		set_ctrl_state_noerr B1 $OLD_B1
 		RETVAL=0
 		set_ctrl_state A1 $NEW_A1; ((RETVAL += $?))
 		set_ctrl_state A1/A2 $NEW_A2; ((RETVAL += $?))
 		set_ctrl_state A1/A2/A3 $NEW_A3; ((RETVAL += $?))
 		set_ctrl_state B1 $NEW_B1; ((RETVAL += $?))
+		set_ctrl_state C1 $NEW_C1; ((RETVAL += $?))

-		[[ $RETVAL -ne $RESULT ]] && {
-			echo "Test $TEST[$I] failed result check!"
-			eval echo \"\${$TEST[$I]}\"
-			dump_states
-			exit 1
-		}
+		[[ $RETVAL -ne $RESULT ]] && test_fail $I result

 		[[ -n "$ECPUS" && "$ECPUS" != . ]] && {
 			check_effective_cpus $ECPUS
-			[[ $? -ne 0 ]] && {
-				echo "Test $TEST[$I] failed effective CPU check!"
-				eval echo \"\${$TEST[$I]}\"
-				echo
-				dump_states
-				exit 1
-			}
+			[[ $? -ne 0 ]] && test_fail $I "effective CPU"
 		}

-		[[ -n "$STATES" ]] && {
+		[[ -n "$STATES" && "$STATES" != . ]] && {
 			check_cgroup_states $STATES
-			[[ $? -ne 0 ]] && {
-				echo "FAILED: Test $TEST[$I] failed states check!"
-				eval echo \"\${$TEST[$I]}\"
-				echo
-				dump_states
-				exit 1
-			}
+			[[ $? -ne 0 ]] && test_fail $I states
 		}

+		[[ -n "$PCPUS" && "$PCPUS" != . ]] && {
+			check_subparts_cpus $PCPUS
+			[[ $? -ne 0 ]] && test_fail $I "subpartitions CPU"
+		}
+
+		# Compare the expected isolated CPUs with the actual ones,
+		# if available
+		[[ -n "$ICPUS" ]] && {
+			check_isolcpus $ICPUS
+			[[ $? -ne 0 ]] && test_fail $I "isolated CPU" \
+				"Expect $ICPUS, get $ISOLCPUS instead"
+		}
 		reset_cgroup_states
 		#
 		# Check to see if effective cpu list changes
@@ -612,7 +752,7 @@ run_state_test()
 			echo "Effective cpus changed to $NEWLIST after test $I!"
 			exit 1
 		}
-		[[ -n "$VERBOSE" ]] && echo "Test $I done."
+		[[ $VERBOSE -gt 0 ]] && echo "Test $I done."
 		((I++))
 	done
 	echo "All $I tests of $TEST PASSED."
@@ -655,7 +795,7 @@ test_inotify()
 	rm -f $PRS
 	wait_inotify $PWD/cpuset.cpus.partition $PRS &
 	pause 0.01
-	set_ctrl_state . "O1-0"
+	set_ctrl_state . "O1=0"
 	pause 0.01
 	check_cgroup_states ".:P-1"
 	if [[ $? -ne 0 ]]
-- 
2.31.1
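The range-building loop in check_isolcpus() above turns a run of CPU numbers into the "a-b,c" list syntax used by the cpuset files. A minimal stand-alone sketch of the same idea follows; cpus_to_ranges is a hypothetical helper, not part of the patch, and it assumes its arguments are already sorted in ascending order.

```shell
# Hypothetical helper (not from the patch): collapse an ascending list of
# CPU numbers into cpulist syntax, e.g. "0 1 2 5" becomes "0-2,5".
# The body runs in a subshell so its variables do not leak to the caller.
cpus_to_ranges()
(
	out=""
	start=""
	prev=""
	for cpu in "$@"
	do
		if [ -z "$start" ]
		then
			start=$cpu
		elif [ "$cpu" -ne $((prev + 1)) ]
		then
			# Gap found: close the current run, then start a new one
			if [ "$start" -eq "$prev" ]
			then
				out="${out}${start},"
			else
				out="${out}${start}-${prev},"
			fi
			start=$cpu
		fi
		prev=$cpu
	done
	# Close the final run, if any CPU was seen at all
	if [ -n "$start" ]
	then
		if [ "$start" -eq "$prev" ]
		then
			out="${out}${start}"
		else
			out="${out}${start}-${prev}"
		fi
	fi
	echo "$out"
)
```

Unlike the selftest function, this sketch does not read /sys/kernel/debug/sched/domains; the caller supplies the CPU numbers.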

2 years, 9 months

[RFC PATCH 4/5] cgroup/cpuset: Documentation update for the new "isolcpus" partition

by Waiman Long

This patch updates the cgroup-v2.rst file to include information about
the new "isolcpus" partition type.

Signed-off-by: Waiman Long <longman(a)redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 89 +++++++++++++++++++------
 1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index f67c0829350b..352a02849fa7 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2225,7 +2225,8 @@ Cpuset Interface Files
 	  ==========	=====================================
 	  "member"	Non-root member of a partition
 	  "root"	Partition root
-	  "isolated"	Partition root without load balancing
+	  "isolcpus"	Partition root for isolated CPUs pool
+	  "isolated"	Partition root for isolated CPUs
 	  ==========	=====================================

 	The root cgroup is always a partition root and its state
@@ -2237,24 +2238,41 @@ Cpuset Interface Files
 	its descendants except those that are separate partition roots
 	themselves and their descendants.

+	When set to "isolcpus", the CPUs in that partition root will
+	be in an isolated state without any load balancing from the
+	scheduler. This partition root is special as there can be at
+	most one instance of it in a system and no task or child cpuset
+	is allowed in this cgroup. It acts as a pool of isolated CPUs to
+	be pulled into other "isolated" partitions. The "cpuset.cpus"
+	of an "isolcpus" partition root contains the list of isolated
+	CPUs it holds, where "cpuset.cpus.effective" contains the list
+	of freely available isolated CPUs that are ready to be pull
+	into other "isolated" partition.
+
 	When set to "isolated", the CPUs in that partition root will
 	be in an isolated state without any load balancing from the
 	scheduler. Tasks placed in such a partition with multiple
 	CPUs should be carefully distributed and bound to each of the
-	individual CPUs for optimal performance.
-
-	The value shown in "cpuset.cpus.effective" of a partition root
-	is the CPUs that the partition root can dedicate to a potential
-	new child partition root. The new child subtracts available
-	CPUs from its parent "cpuset.cpus.effective".
-
-	A partition root ("root" or "isolated") can be in one of the
-	two possible states - valid or invalid. An invalid partition
-	root is in a degraded state where some state information may
-	be retained, but behaves more like a "member".
-
-	All possible state transitions among "member", "root" and
-	"isolated" are allowed.
+	individual CPUs for optimal performance. The isolated CPUs can
+	come from either the parent partition root or from an "isolcpus"
+	partition if the parent cannot satisfy its request.
+
+	The value shown in "cpuset.cpus.effective" of a partition root is
+	the CPUs that the partition root can dedicate to a potential new
+	child partition root. The new child partition subtracts available
+	CPUs from its parent "cpuset.cpus.effective". An exception is
+	an "isolated" partition that pulls its isolated CPUs from the
+	"isolcpus" partition root that is not its direct parent.
+
+	A partition root can be in one of the two possible states -
+	valid or invalid. An invalid partition root is in a degraded
+	state where some state information may be retained, but behaves
+	more like a "member".
+
+	All possible state transitions among "member", "root", "isolcpus"
+	and "isolated" are allowed. However, the partition root may
+	not be valid if the corresponding prerequisite conditions are
+	not met.

 	On read, the "cpuset.cpus.partition" file can show the following
 	values.
@@ -2262,16 +2280,18 @@ Cpuset Interface Files
 	  =============================	=====================================
 	  "member"			Non-root member of a partition
 	  "root"			Partition root
-	  "isolated"			Partition root without load balancing
+	  "isolcpus"			Partition root for isolated CPUs pool
+	  "isolated"			Partition root for isolated CPUs
 	  "root invalid (<reason>)"	Invalid partition root
+	  "isolcpus invalid (<reason>)"	Invalid isolcpus partition root
 	  "isolated invalid (<reason>)"	Invalid isolated partition root
 	  =============================	=====================================

 	In the case of an invalid partition root, a descriptive string on
-	why the partition is invalid is included within parentheses.
+	why the partition is invalid may be included within parentheses.

-	For a partition root to become valid, the following conditions
-	must be met.
+	For a "root" partition root to become valid, the following
+	conditions must be met.

 	1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
 	   are not shared by any of its siblings (exclusivity rule).
@@ -2281,6 +2301,37 @@ Cpuset Interface Files
 	4) The "cpuset.cpus.effective" cannot be empty unless there is
 	   no task associated with this partition.

+	A valid "isolcpus" partition root requires the following
+	conditions.
+
+	1) The parent cgroup is a valid partition root.
+	2) The "cpuset.cpus" must be a subset of parent's "cpuset.cpus"
+	   including an empty cpu list.
+	3) There can be no more than one valid "isolcpus" partition.
+	4) No task or child cpuset is allowed.
+
+	Note that an "isolcpus" partition is not exclusive and its
+	isolated CPUs can be distributed down sibling cgroups even
+	though they may not appear in their "cpuset.cpus.effective".
+
+	A valid "isolated" partition root can pull isolated CPUs from
+	either its parent partition or from the "isolcpus" partition.
+	It also requires the following conditions to be met.
+
+	1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
+	   are not shared by any of its siblings (exclusivity rule).
+	2) The "cpuset.cpus" is not empty and must be a subset of
+	   parent's "cpuset.cpus".
+	3) The "cpuset.cpus.effective" cannot be empty unless there is
+	   no task associated with this partition.
+
+	If pulling isolated CPUS from "isolcpus" partition,
+	the "cpuset.cpus" must also be a subset of "isolcpus"
+	partition's "cpuset.cpus" and all the requested CPUs must
+	be available for pulling, i.e. in "isolcpus" partition's
+	"cpuset.cpus.effective". In this case, its hierarchical parent
+	does not need to be a valid partition root.
+
 	External events like hotplug or changes to "cpuset.cpus" can
 	cause a valid partition root to become invalid and vice versa.
 	Note that a task cannot be moved to a cgroup with empty
-- 
2.31.1
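The documented rule for pulling from the pool (the requested "cpuset.cpus" must be a subset of both the "isolcpus" partition's "cpuset.cpus" and its "cpuset.cpus.effective") can be checked from user space before writing "isolated". The sketch below is only an illustration under that reading of the text. The helper names expand_cpulist, cpulist_subset and can_pull_from_isolcpus are made up here, and the kernel remains the authority on whether a partition is valid.

```shell
# Illustrative helpers only; none of these names come from the patch.

# Expand a cpulist such as "0-2,5" into "0 1 2 5".
expand_cpulist()
(
	out=""
	oldifs=$IFS
	IFS=,
	for part in $1
	do
		lo=${part%-*}		# "5" stays "5", "0-2" gives "0"
		hi=${part#*-}		# "5" stays "5", "0-2" gives "2"
		cpu=$lo
		while [ "$cpu" -le "$hi" ]
		do
			out="$out $cpu"
			cpu=$((cpu + 1))
		done
	done
	IFS=$oldifs
	echo $out
)

# Succeed when every CPU in list $1 also appears in list $2.
cpulist_subset()
(
	have=" $(expand_cpulist "$2") "
	for cpu in $(expand_cpulist "$1")
	do
		case "$have" in
		*" $cpu "*) ;;
		*) exit 1 ;;
		esac
	done
	exit 0
)

# $1 - requested cpuset.cpus of the would-be "isolated" partition
# $2 - cpuset.cpus of the "isolcpus" partition
# $3 - cpuset.cpus.effective of the "isolcpus" partition
can_pull_from_isolcpus()
{
	[ -n "$1" ] && cpulist_subset "$1" "$2" && cpulist_subset "$1" "$3"
}
```

For example, with a pool holding CPUs 1-3 of which only CPU 3 is still free, can_pull_from_isolcpus 2-3 1-3 3 fails, because CPU 2 has already been claimed.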

2 years, 9 months

[RFC PATCH 3/5] cgroup/cpuset: Make isolated partition pull CPUs from isolcpus partition

by Waiman Long

With the addition of a new "isolcpus" partition in a previous patch,
this patch adds the capability for a privileged user to pull isolated
CPUs from the "isolcpus" partition to an "isolated" partition if its
parent cannot satisfy its request directly.

The following conditions must be true for the pulling of isolated CPUs
from the "isolcpus" partition to be successful.

(1) The value of "cpuset.cpus" must still be a subset of its parent's
    "cpuset.cpus" to ensure proper inheritance even though these CPUs
    cannot be used until the cpuset becomes an "isolated" partition.
(2) All the CPUs in "cpuset.cpus" are freely available in the
    "isolcpus" partition, i.e. in its "cpuset.cpus.effective" and not
    yet claimed by other isolated partitions.

With this change, the CPUs in an "isolated" partition can either come
from the "isolcpus" partition or from its direct parent, but not both.
Now the parent of an isolated partition does not need to be a partition
root anymore.

Because of the cpu exclusive nature of an "isolated" partition, these
isolated CPUs cannot be distributed to other siblings of that isolated
partition.

Changes to "cpuset.cpus" of such an isolated partition are allowed as
long as all the newly requested CPUs can be granted from the "isolcpus"
partition. Otherwise, the partition will become invalid.

This makes the management and distribution of isolated CPUs to those
applications that require them much easier.

An "isolated" partition that pulls CPUs from the special "isolcpus"
partition can now have 2 parents - the "isolcpus" partition where
it gets its isolated CPUs and its hierarchical parent where it gets
all the other resources. However, such an "isolated" partition cannot
have subpartitions as all the CPUs from "isolcpus" must be in the same
isolated state.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
 kernel/cgroup/cpuset.c | 282 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 264 insertions(+), 18 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 444eae3a9a6b..a5bbd43ed46e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -101,6 +101,7 @@ enum prs_errcode {
 	PERR_ISOLCPUS,
 	PERR_ISOLTASK,
 	PERR_ISOLCHILD,
+	PERR_ISOPARENT,
 };

 static const char * const perr_strings[] = {
@@ -114,6 +115,7 @@ static const char * const perr_strings[] = {
 	[PERR_ISOLCPUS] = "An isolcpus partition is already present",
 	[PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
 	[PERR_ISOLCHILD] = "Isolcpus partition can't have children",
+	[PERR_ISOPARENT] = "Isolated/isolcpus parent can't have subpartition",
 };

 struct cpuset {
@@ -1333,6 +1335,195 @@ static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
 		rebuild_sched_domains_locked();
 }

+/*
+ * isolcpus_pull - Enable or disable pulling of isolated cpus from isolcpus
+ * @cs: the cpuset to update
+ * @cmd: the command code (only partcmd_enable or partcmd_disable)
+ * Return: 1 if successful, 0 if error
+ *
+ * Note that pulling isolated cpus from isolcpus or cpus from parent does
+ * not require rebuilding sched domains. So we can change the flags directly.
+ */
+static int isolcpus_pull(struct cpuset *cs, enum subparts_cmd cmd)
+{
+	struct cpuset *parent = parent_cs(cs);
+
+	if (!isolcpus_cs)
+		return 0;
+
+	/*
+	 * To enable pulling of isolated CPUs from isolcpus, cpus_allowed
+	 * must be a subset of both its parent's cpus_allowed and isolcpus_cs's
+	 * effective_cpus and the user has sysadmin privilege.
+	 */
+	if ((cmd == partcmd_enable) && capable(CAP_SYS_ADMIN) &&
+	    cpumask_subset(cs->cpus_allowed, isolcpus_cs->effective_cpus) &&
+	    cpumask_subset(cs->cpus_allowed, parent->cpus_allowed)) {
+		/*
+		 * Move cpus from effective_cpus to subparts_cpus & make
+		 * cs a child of isolcpus partition.
+		 */
+		spin_lock_irq(&callback_lock);
+		cpumask_andnot(isolcpus_cs->effective_cpus,
+			       isolcpus_cs->effective_cpus, cs->cpus_allowed);
+		cpumask_or(isolcpus_cs->subparts_cpus,
+			   isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+		cpumask_copy(cs->effective_cpus, cs->cpus_allowed);
+		isolcpus_cs->nr_subparts_cpus
+			= cpumask_weight(isolcpus_cs->subparts_cpus);
+
+		if (cs->use_parent_ecpus) {
+			cs->use_parent_ecpus = false;
+			parent->child_ecpus_count--;
+		}
+		list_add(&cs->isol_sibling, &isol_children);
+		clear_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+		spin_unlock_irq(&callback_lock);
+		return 1;
+	}
+
+	if ((cmd == partcmd_disable) && !list_empty(&cs->isol_sibling)) {
+		/*
+		 * This can be called after isolcpus shrinks its cpu list.
+		 * So not all the cpus should be returned back to isolcpus.
+		 */
+		WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED);
+		spin_lock_irq(&callback_lock);
+		cpumask_andnot(isolcpus_cs->subparts_cpus,
+			       isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+		cpumask_or(isolcpus_cs->effective_cpus,
+			   isolcpus_cs->effective_cpus, cs->effective_cpus);
+		cpumask_and(isolcpus_cs->effective_cpus,
+			    isolcpus_cs->effective_cpus,
+			    isolcpus_cs->cpus_allowed);
+		cpumask_and(isolcpus_cs->effective_cpus,
+			    isolcpus_cs->effective_cpus, cpu_active_mask);
+		isolcpus_cs->nr_subparts_cpus
+			= cpumask_weight(isolcpus_cs->subparts_cpus);
+
+		if (!cpumask_and(cs->effective_cpus, parent->effective_cpus,
+				 cs->cpus_allowed)) {
+			cs->use_parent_ecpus = true;
+			parent->child_ecpus_count++;
+			cpumask_copy(cs->effective_cpus,
+				     parent->effective_cpus);
+		}
+		list_del_init(&cs->isol_sibling);
+		cs->partition_root_state = PRS_INVALID_ISOLATED;
+		cs->prs_err = PERR_INVCPUS;
+
+		set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+		clear_bit(CS_CPU_EXCLUSIVE, &cs->flags);
+		spin_unlock_irq(&callback_lock);
+		return 1;
+	}
+	return 0;
+}
+
+static void isolcpus_disable(void)
+{
+	struct cpuset *child, *next;
+
+	list_for_each_entry_safe(child, next, &isol_children, isol_sibling)
+		WARN_ON_ONCE(isolcpus_pull(child, partcmd_disable));
+
+	isolcpus_cs = NULL;
+}
+
+/*
+ * isolcpus_cpus_update - cpuset.cpus change in isolcpus partition
+ */
+static void isolcpus_cpus_update(struct cpuset *cs)
+{
+	struct cpuset *child, *next;
+
+	if (WARN_ON_ONCE(isolcpus_cs != cs))
+		return;
+
+	if (list_empty(&isol_children))
+		return;
+
+	/*
+	 * Remove child isolated partitions that are not fully covered by
+	 * subparts_cpus.
+	 */
+	list_for_each_entry_safe(child, next, &isol_children,
+				 isol_sibling) {
+		if (cpumask_subset(child->cpus_allowed,
+				   cs->subparts_cpus))
+			continue;
+
+		isolcpus_pull(child, partcmd_disable);
+	}
+}
+
+/*
+ * isolated_cpus_update - cpuset.cpus change in isolated partition
+ *
+ * Return: 1 if no further action needs, 0 otherwise
+ */
+static int isolated_cpus_update(struct cpuset *cs, struct cpumask *newmask,
+				struct tmpmasks *tmp)
+{
+	struct cpumask *addmask = tmp->addmask;
+	struct cpumask *delmask = tmp->delmask;
+
+	if (WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED) ||
+	    list_empty(&cs->isol_sibling))
+		return 0;
+
+	if (WARN_ON_ONCE(!isolcpus_cs) || cpumask_empty(newmask)) {
+		isolcpus_pull(cs, partcmd_disable);
+		return 0;
+	}
+
+	if (cpumask_andnot(addmask, newmask, cs->cpus_allowed)) {
+		/*
+		 * Check if isolcpus partition can provide the new CPUs
+		 */
+		if (!cpumask_subset(addmask, isolcpus_cs->cpus_allowed) ||
+		    cpumask_intersects(addmask, isolcpus_cs->subparts_cpus)) {
+			isolcpus_pull(cs, partcmd_disable);
+			return 0;
+		}
+
+		/*
+		 * Pull addmask isolated CPUs from isolcpus partition
+		 */
+		spin_lock_irq(&callback_lock);
+		cpumask_andnot(isolcpus_cs->subparts_cpus,
+			       isolcpus_cs->subparts_cpus, addmask);
+		cpumask_andnot(isolcpus_cs->effective_cpus,
+			       isolcpus_cs->effective_cpus, addmask);
+		isolcpus_cs->nr_subparts_cpus
+			= cpumask_weight(isolcpus_cs->subparts_cpus);
+		spin_unlock_irq(&callback_lock);
+	}
+
+	if (cpumask_andnot(tmp->delmask, cs->cpus_allowed, newmask)) {
+		/*
+		 * Return isolated CPUs back to isolcpus partition
+		 */
+		spin_lock_irq(&callback_lock);
+		cpumask_or(isolcpus_cs->subparts_cpus,
+			   isolcpus_cs->subparts_cpus, delmask);
+		cpumask_or(isolcpus_cs->effective_cpus,
+			   isolcpus_cs->effective_cpus, delmask);
+		cpumask_and(isolcpus_cs->effective_cpus,
+			    isolcpus_cs->effective_cpus, cpu_active_mask);
+		isolcpus_cs->nr_subparts_cpus
+			= cpumask_weight(isolcpus_cs->subparts_cpus);
+		spin_unlock_irq(&callback_lock);
+	}
+
+	spin_lock_irq(&callback_lock);
+	cpumask_copy(cs->cpus_allowed, newmask);
+	cpumask_andnot(cs->effective_cpus, newmask, cs->subparts_cpus);
+	cpumask_and(cs->effective_cpus, cs->effective_cpus, cpu_active_mask);
+	spin_unlock_irq(&callback_lock);
+	return 1;
+}
+
 /**
  * update_parent_subparts_cpumask - update subparts_cpus mask of parent cpuset
  * @cs: The cpuset that requests change in partition root state
@@ -1579,7 +1770,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
 	spin_unlock_irq(&callback_lock);

 	if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
-		isolcpus_cs = NULL;
+		isolcpus_disable();

 	if (adding || deleting)
 		update_tasks_cpumask(parent, tmp->addmask);
@@ -1625,6 +1816,12 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		struct cpuset *parent = parent_cs(cp);
 		bool update_parent = false;

+		/*
+		 * Skip isolated cpuset that pull isolated CPUs from isolcpus
+		 */
+		if (!list_empty(&cp->isol_sibling))
+			continue;
+
 		compute_effective_cpumask(tmp->new_cpus, cp, parent);

 		/*
@@ -1742,7 +1939,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		WARN_ON(!is_in_v2_mode() &&
 			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));

-		update_tasks_cpumask(cp, tmp->new_cpus);
+		update_tasks_cpumask(cp, cp->effective_cpus);

 		/*
 		 * On legacy hierarchy, if the effective cpumask of any non-
@@ -1888,6 +2085,10 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		return retval;

 	if (cs->partition_root_state) {
+		if (!list_empty(&cs->isol_sibling) &&
+		    isolated_cpus_update(cs, trialcs->cpus_allowed, &tmp))
+			goto update_hier;	/* CPUs update done */
+
 		if (invalidate)
 			update_parent_subparts_cpumask(cs, partcmd_invalidate,
 						       NULL, &tmp);
@@ -1920,6 +2121,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	}
 	spin_unlock_irq(&callback_lock);

+update_hier:
 #ifdef CONFIG_CPUMASK_OFFSTACK
 	/* Now trialcs->cpus_allowed is available */
 	tmp.new_cpus = trialcs->cpus_allowed;
@@ -1928,8 +2130,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	/* effective_cpus will be updated here */
 	update_cpumasks_hier(cs, &tmp, false);

-	if (cs->partition_root_state) {
-		bool force = (cs->partition_root_state == PRS_ISOLCPUS);
+	if (cs->partition_root_state && list_empty(&cs->isol_sibling)) {
 		struct cpuset *parent = parent_cs(cs);

 		/*
@@ -1937,8 +2138,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		 * cpusets if they use parent's effective_cpus or when
 		 * the current cpuset is an isolcpus partition.
 		 */
-		if (parent->child_ecpus_count || force)
-			update_sibling_cpumasks(parent, cs, &tmp, force);
+		if (cs->partition_root_state == PRS_ISOLCPUS) {
+			update_sibling_cpumasks(parent, cs, &tmp, true);
+			isolcpus_cpus_update(cs);
+		} else if (parent->child_ecpus_count) {
+			update_sibling_cpumasks(parent, cs, &tmp, false);
+		}

 		/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
 		update_partition_sd_lb(cs, old_prs);
@@ -2307,7 +2512,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	return err;
 }

-/**
+/*
  * update_prstate - update partition_root_state
  * @cs: the cpuset to update
  * @new_prs: new partition root state
@@ -2325,13 +2530,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		return 0;

 	/*
-	 * For a previously invalid partition root, leave it at being
-	 * invalid if new_prs is not "member".
+	 * For a previously invalid partition root, treat it like a "member".
 	 */
-	if (new_prs && is_prs_invalid(old_prs)) {
-		cs->partition_root_state = -new_prs;
-		return 0;
-	}
+	if (new_prs && is_prs_invalid(old_prs))
+		old_prs = PRS_MEMBER;

 	if (alloc_cpumasks(NULL, &tmpmask))
 		return -ENOMEM;
@@ -2371,6 +2573,21 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		}
 	}

+	/*
+	 * A parent isolated partition that gets its isolated CPUs from
+	 * isolcpus cannot have subpartition.
+	 */
+	if (new_prs && !list_empty(&parent->isol_sibling)) {
+		err = PERR_ISOPARENT;
+		goto out;
+	}
+
+	if ((old_prs == PRS_ISOLATED) && !list_empty(&cs->isol_sibling)) {
+		isolcpus_pull(cs, partcmd_disable);
+		old_prs = 0;
+	}
+	WARN_ON_ONCE(!list_empty(&cs->isol_sibling));
+
 	err = update_partition_exclusive(cs, new_prs);
 	if (err)
 		goto out;
@@ -2386,6 +2603,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		err = update_parent_subparts_cpumask(cs, partcmd_enable,
 						     NULL, &tmpmask);

+		if (err && (new_prs == PRS_ISOLATED) &&
+		    isolcpus_pull(cs, partcmd_enable))
+			err = 0; /* Successful isolcpus pull */
+
 		if (err)
 			goto out;
 	} else if (old_prs && new_prs) {
@@ -2445,7 +2666,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	if (new_prs == PRS_ISOLCPUS)
 		isolcpus_cs = cs;
 	else if (cs == isolcpus_cs)
-		isolcpus_cs = NULL;
+		isolcpus_disable();

 	/*
 	 * Update child cpusets, if present.
@@ -3674,8 +3895,31 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 	}

 	parent = parent_cs(cs);
-	compute_effective_cpumask(&new_cpus, cs, parent);
 	nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
+	/*
+	 * In the special case of a valid isolated cpuset pulling isolated
+	 * cpus from isolcpus. We just need to mask offline cpus from
+	 * cpus_allowed unless all the isolated cpus are gone.
+	 */
+	if (!list_empty(&cs->isol_sibling)) {
+		if (!cpumask_and(&new_cpus, cs->cpus_allowed, cpu_active_mask))
+			isolcpus_pull(cs, partcmd_disable);
+	} else if ((cs->partition_root_state == PRS_ISOLCPUS) &&
+		   cpumask_empty(cs->cpus_allowed)) {
+		/*
+		 * For isolcpus with empty cpus_allowed, just update
+		 * effective_mems and be done with it.
+		 */
+		spin_lock_irq(&callback_lock);
+		if (nodes_empty(new_mems))
+			cs->effective_mems = parent->effective_mems;
+		else
+			cs->effective_mems = new_mems;
+		spin_unlock_irq(&callback_lock);
+		goto unlock;
+	} else {
+		compute_effective_cpumask(&new_cpus, cs, parent);
+	}

 	if (cs->nr_subparts_cpus)
 		/*
@@ -3707,10 +3951,12 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 	 * the following conditions hold:
 	 * 1) empty effective cpus but not valid empty partition.
 	 * 2) parent is invalid or doesn't grant any cpus to child
-	 *    partitions.
+	 *    partitions and not an isolated cpuset pulling cpus from
+	 *    isolcpus.
 	 */
-	if (is_partition_valid(cs) && (!parent->nr_subparts_cpus ||
-	   (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
+	if (is_partition_valid(cs) &&
+	    ((!parent->nr_subparts_cpus && list_empty(&cs->isol_sibling)) ||
+	     (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
 		int old_prs, parent_prs;

 		update_parent_subparts_cpumask(cs, partcmd_disable, NULL, tmp);
-- 
2.31.1
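The pool bookkeeping described in the commit message (CPUs move out of the pool's free set when an "isolated" partition claims them, and back when it gives them up) can be modelled with plain bitmasks. This is a toy sketch, not the kernel code: the POOL_* variables and pool_* functions are invented for illustration, bit N stands for CPU N, shell return codes are used (0 means success, whereas isolcpus_pull() returns 1), and the CAP_SYS_ADMIN, hotplug and locking aspects are left out.

```shell
# Toy model of the "isolcpus" pool; all names here are hypothetical.
POOL_ALLOWED=0		# models the pool's cpuset.cpus
POOL_EFFECTIVE=0	# models the pool's cpuset.cpus.effective (still free)
POOL_SUBPARTS=0		# CPUs currently claimed by "isolated" partitions

pool_init()
{
	POOL_ALLOWED=$1
	POOL_EFFECTIVE=$1
	POOL_SUBPARTS=0
}

# pool_pull <requested-mask> <parent-mask>
# Claim CPUs only if all of them are free in the pool and also lie within
# the hierarchical parent's cpus (conditions (1) and (2) in the message).
pool_pull()
{
	req=$1
	parent=$2
	[ $((req & ~POOL_EFFECTIVE)) -eq 0 ] || return 1
	[ $((req & ~parent)) -eq 0 ] || return 1
	POOL_EFFECTIVE=$((POOL_EFFECTIVE & ~req))
	POOL_SUBPARTS=$((POOL_SUBPARTS | req))
}

# pool_return <mask>
# Give CPUs back; anything no longer in the pool's cpuset.cpus is dropped.
pool_return()
{
	req=$1
	POOL_SUBPARTS=$((POOL_SUBPARTS & ~req))
	POOL_EFFECTIVE=$(( (POOL_EFFECTIVE | req) & POOL_ALLOWED ))
}
```

With a pool of CPUs 2-3 (mask 12), pulling CPU 2 (mask 4) leaves mask 8 free; a second pull of the same CPU is refused until pool_return 4 hands it back.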

2 years, 9 months
