The resctrl selftest currently exhibits several failures on Hygon CPUs due to missing vendor detection and edge-case handling specific to Hygon's architecture.
This patch series addresses three distinct issues: 1. A division-by-zero crash in SNC detection on some platforms (e.g., Hygon). 2. Missing CPU vendor detection, causing the test to fail with "# Can not get vendor info..." on Hygon CPUs. 3. Incorrect handling of non-contiguous CBM support on Hygon CPUs.
These changes enable resctrl selftest to run successfully on Hygon CPUs that support Platform QoS features.
Maintainer notes: ----------------- Patch 1: selftests/resctrl: Fix a division by zero error on Hygon - This is a candidate for backport with "Fixes:" tag.
Patch 2: selftests/resctrl: Define CPU vendor IDs as bits to match usage - This is *not* a candidate for backport since it is an enhancement and preparatory patch for patch 3.
Patch 3: selftests/resctrl: Add CPU vendor detection for Hygon Patch 4: selftests/resctrl: Fix non-contiguous CBM check for Hygon - Even though they are fixes they are *not* candidates for backport since they are based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl. -----------------
Changelog: v4: - Cover letter: add maintainer notes outlining how these patches to be handled (Reinette). - Re-organize the patch series to move original patch 3 to the beginning of series. The patch order has changed between v3 and v4 (Reinette): v3 -> v4 patch #3 -> patch #1 patch #1 -> patch #2 patch #2 -> patch #3 patch #4 -> patch #4 - Patch 2: 1. Resolve a conflict against latest upstream kernel (Reinette). 2. Fix a nit to maintain the reverse fir ordering of variables in detect_vendor() (Reinette). - Patch 3: add Reviewed-by: Reinette Chatre reinette.chatre@intel.com - Patch 4: move the maintainer note into the cover letter (Reinette).
v3: - Patch 1: 1. Update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions (Fenghua). 2. Split the code changes of "define CPU vendor IDs as bits to match usage" from original patch 1 into a separate patch (this patch, suggested by Fenghua and Reinette). 3. Introduce the flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic (Reinette). - Patch 2 (original patch 1): 1. Move the code changes of "define CPU vendor IDs as bits to match usage" into patch 1. - Patch 3 (original patch 2): 1. Fix a nit of code comment for affected platforms (Fenghua). 2. Add Reviewed-by: Fenghua Yu fenghuay@nvidia.com. - Patch 4 (original patch 3): 1. Fix a nit to avoid calling get_vendor() twice (Fenghua). 2. Add Reviewed-by: Fenghua Yu fenghuay@nvidia.com.
v2: - Patch 1: switch all of the vendor id bitmasks to use BIT() (Reinette) - Patch 2: add Reviewed-by: Reinette Chatre reinette.chatre@intel.com - Patch 3: add Reviewed-by: Reinette Chatre reinette.chatre@intel.com add a maintainer note to highlight it is not a candidate for backport (Reinette)
Xiaochen Shen (4): selftests/resctrl: Fix a division by zero error on Hygon selftests/resctrl: Define CPU vendor IDs as bits to match usage selftests/resctrl: Add CPU vendor detection for Hygon selftests/resctrl: Fix non-contiguous CBM check for Hygon
tools/testing/selftests/resctrl/cat_test.c | 6 ++-- tools/testing/selftests/resctrl/resctrl.h | 8 ++++-- .../testing/selftests/resctrl/resctrl_tests.c | 28 +++++++++++++------ tools/testing/selftests/resctrl/resctrlfs.c | 10 +++++++ 4 files changed, 39 insertions(+), 13 deletions(-)
Commit
a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return: (1) >1: SNC mode is enabled. (2) 1: SNC mode is not enabled or not supported.
However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus).
This leads to a division by zero error in get_cache_size(): *cache_size /= snc_nodes_per_l3_cache();
Causing the resctrl selftest to fail with: "Floating point exception (core dumped)"
Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform.
Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled") Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com --- tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 195f04c4d158..b9c1bfb6cc02 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void) } snc_mode = cache_cpus / node_cpus;
+ /* + * On some platforms (e.g. Hygon), + * cache_cpus < node_cpus, the calculated snc_mode is 0. + * + * Set snc_mode = 1 to indicate that SNC mode is not + * supported on the platform. + */ + if (!snc_mode) + snc_mode = 1; + if (snc_mode > 1) ksft_print_msg("SNC-%d mode discovered.\n", snc_mode); }
The CPU vendor IDs are required to be unique bits because they're used for vendor_specific bitmask in the struct resctrl_test. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific
However, the definitions of CPU vendor IDs in file resctrl.h is quite subtle as a bitmask value: #define ARCH_INTEL 1 #define ARCH_AMD 2
A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs.
Accordingly, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions.
Furthermore, introduce a bool flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic. This ensures the vendor ID is detected only once and resolves the ambiguity of using the same variable 'vendor' both as a value and as a state.
Suggested-by: Reinette Chatre reinette.chatre@intel.com Suggested-by: Fenghua Yu fenghuay@nvidia.com Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net --- tools/testing/selftests/resctrl/resctrl.h | 7 ++--- .../testing/selftests/resctrl/resctrl_tests.c | 26 +++++++++++++------ 2 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 3c51bdac2dfa..4f9c7d04c98d 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -23,6 +23,7 @@ #include <asm/unistd.h> #include <linux/perf_event.h> #include <linux/compiler.h> +#include <linux/bits.h> #include "kselftest.h"
#define MB (1024 * 1024) @@ -36,8 +37,8 @@ * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ -#define ARCH_INTEL 1 -#define ARCH_AMD 2 +#define ARCH_INTEL BIT(0) +#define ARCH_AMD BIT(1)
#define END_OF_TESTS 1
@@ -163,7 +164,7 @@ extern int snc_unreliable; extern char llc_occup_path[1024];
int snc_nodes_per_l3_cache(void); -int get_vendor(void); +unsigned int get_vendor(void); bool check_resctrlfs_support(void); int filter_dmesg(void); int get_domain_id(const char *resource, int cpu_no, int *domain_id); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 5154ffd821c4..980ecf2bcf10 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -23,16 +23,24 @@ static struct resctrl_test *resctrl_tests[] = { &l2_noncont_cat_test, };
-static int detect_vendor(void) +static unsigned int detect_vendor(void) { - FILE *inf = fopen("/proc/cpuinfo", "r"); - int vendor_id = 0; + FILE *inf; + static unsigned int vendor_id; char *s = NULL; char *res; + static bool initialized;
- if (!inf) + if (initialized) return vendor_id;
+ inf = fopen("/proc/cpuinfo", "r"); + if (!inf) { + vendor_id = 0; + initialized = true; + return vendor_id; + } + res = fgrep(inf, "vendor_id");
if (res) @@ -45,15 +53,17 @@ static int detect_vendor(void)
fclose(inf); free(res); + + initialized = true; return vendor_id; }
-int get_vendor(void) +unsigned int get_vendor(void) { - static int vendor = -1; + unsigned int vendor; + + vendor = detect_vendor();
- if (vendor == -1) - vendor = detect_vendor(); if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");
The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID.
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com --- tools/testing/selftests/resctrl/resctrl.h | 1 + tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 4f9c7d04c98d..afe635b6e48d 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -39,6 +39,7 @@ */ #define ARCH_INTEL BIT(0) #define ARCH_AMD BIT(1) +#define ARCH_HYGON BIT(2)
#define END_OF_TESTS 1
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 980ecf2bcf10..78626bbae976 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -50,6 +50,8 @@ static unsigned int detect_vendor(void) vendor_id = ARCH_INTEL; else if (s && !strcmp(s, ": AuthenticAMD\n")) vendor_id = ARCH_AMD; + else if (s && !strcmp(s, ": HygonGenuine\n")) + vendor_id = ARCH_HYGON;
fclose(inf); free(res);
The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com --- tools/testing/selftests/resctrl/cat_test.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 94cfdba5308d..f00b622c1460 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -290,8 +290,10 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
static bool arch_supports_noncont_cat(const struct resctrl_test *test) { - /* AMD always supports non-contiguous CBM. */ - if (get_vendor() == ARCH_AMD) + unsigned int vendor_id = get_vendor(); + + /* AMD and Hygon always support non-contiguous CBM. */ + if (vendor_id == ARCH_AMD || vendor_id == ARCH_HYGON) return true;
#if defined(__i386__) || defined(__x86_64__) /* arch */
linux-kselftest-mirror@lists.linaro.org