On 12/21/25 13:26, Li Wang wrote:
The hugetlb cgroup usage wait loops in charge_reserved_hugetlb.sh were unbounded and could hang forever if the expected cgroup file value never appears (e.g. due to write_to_hugetlbfs in Error mapping).
--- Error log --- # uname -r 6.12.0-xxx.el10.aarch64+64k
# ls /sys/kernel/mm/hugepages/hugepages-* hugepages-16777216kB/ hugepages-2048kB/ hugepages-524288kB/
#./charge_reserved_hugetlb.sh -cgroup-v2 # ----------------------------------------- ... # nr hugepages = 10 # writing cgroup limit: 5368709120 # writing reseravation limit: 5368709120 ... # write_to_hugetlbfs: Error mapping the file: Cannot allocate memory # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 # Waiting for hugetlb memory reservation to reach size 2684354560. # 0 ...
Introduce a small helper, wait_for_file_value(), and use it for:
- waiting for reservation usage to drop to 0,
- waiting for reservation usage to reach a given size,
- waiting for fault usage to reach a given size.
This makes the waits consistent and adds a hard timeout (60 tries with 1s sleep) so the test fails instead of stalling indefinitely.
Signed-off-by: Li Wang liwang@redhat.com Cc: David Hildenbrand david@kernel.org Cc: Mark Brown broonie@kernel.org Cc: Shuah Khan shuah@kernel.org Cc: Waiman Long longman@redhat.com
.../selftests/mm/charge_reserved_hugetlb.sh | 51 +++++++++++-------- 1 file changed, 30 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh index fa6713892d82..447769657634 100755 --- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh +++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh @@ -100,7 +100,7 @@ function setup_cgroup() { echo writing cgroup limit: "$cgroup_limit" echo "$cgroup_limit" >$cgroup_path/$name/hugetlb.${MB}MB.$fault_limit_file
- echo writing reseravation limit: "$reservation_limit"
- echo writing reservation limit: "$reservation_limit" echo "$reservation_limit" > \ $cgroup_path/$name/hugetlb.${MB}MB.$reservation_limit_file
@@ -112,41 +112,50 @@ function setup_cgroup() { fi } +function wait_for_file_value() {
- local path="$1"
- local expect="$2"
- local max_tries=60
- if [[ ! -r "$path" ]]; then
- echo "ERROR: cannot read '$path', missing or permission denied"
- return 1
- fi
- for ((i=1; i<=max_tries; i++)); do
- local cur="$(cat "$path")"
- if [[ "$cur" == "$expect" ]]; then
return 0- fi
- echo "Waiting for $path to become '$expect' (current: '$cur') (try $i/$max_tries)"
- sleep 1
- done
- echo "ERROR: timeout waiting for $path to become '$expect'"
- return 1
+}
- function wait_for_hugetlb_memory_to_get_depleted() { local cgroup="$1" local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
- # Wait for hugetlbfs memory to get depleted.
- while [ $(cat $path) != 0 ]; do
- echo Waiting for hugetlb memory to get depleted.
- cat $path
- sleep 0.5
- done
- wait_for_file_value "$path" "0" }
function wait_for_hugetlb_memory_to_get_reserved() { local cgroup="$1" local size="$2"
- local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
- # Wait for hugetlbfs memory to get written.
- while [ $(cat $path) != $size ]; do
- echo Waiting for hugetlb memory reservation to reach size $size.
- cat $path
- sleep 0.5
- done
- wait_for_file_value "$path" "$size" }
function wait_for_hugetlb_memory_to_get_written() { local cgroup="$1" local size="$2"
- local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
- # Wait for hugetlbfs memory to get written.
- while [ $(cat $path) != $size ]; do
- echo Waiting for hugetlb memory to reach size $size.
- cat $path
- sleep 0.5
- done
- wait_for_file_value "$path" "$size" }
function write_hugetlbfs_and_get_usage() {
It would indeed be cleaner to propagate the error and make it clearer how we then cleanly abort the test.
But if it keeps working for the time beiing, fine with me
Acked-by: David Hildenbrand (Red Hat) david@kernel.org