The resctrl selftest currently exhibits several failures on Hygon CPUs due to missing vendor detection and edge-case handling specific to Hygon's architecture.
This patch series addresses three distinct issues: 1. Missing CPU vendor detection, causing the test to fail with "# Can not get vendor info..." on Hygon CPUs. 2. A division-by-zero crash in SNC detection on Hygon CPUs. 3. Incorrect handling of non-contiguous CBM support on Hygon CPUs.
These changes enable resctrl selftest to run successfully on Hygon CPUs that support Platform QoS features.
Changelog: v2: - Patch 1: switch all of the vendor id bitmasks to use BIT() (Reinette) - Patch 2: add Reviewed-by: Reinette Chatre reinette.chatre@intel.com - Patch 3: add Reviewed-by: Reinette Chatre reinette.chatre@intel.com add a maintainer note to highlight it is not a candidate for backport (Reinette)
Xiaochen Shen (3): selftests/resctrl: Add CPU vendor detection for Hygon selftests/resctrl: Fix a division by zero error on Hygon selftests/resctrl: Fix non-contiguous CBM check for Hygon
tools/testing/selftests/resctrl/cat_test.c | 4 ++-- tools/testing/selftests/resctrl/resctrl.h | 6 ++++-- tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++ tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++ 4 files changed, 18 insertions(+), 4 deletions(-)
The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID.
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net --- tools/testing/selftests/resctrl/resctrl.h | 6 ++++-- tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++ 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index cd3adfc14969..411ee10380a5 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -23,6 +23,7 @@ #include <asm/unistd.h> #include <linux/perf_event.h> #include <linux/compiler.h> +#include <linux/bits.h> #include "../kselftest.h"
#define MB (1024 * 1024) @@ -36,8 +37,9 @@ * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ -#define ARCH_INTEL 1 -#define ARCH_AMD 2 +#define ARCH_INTEL BIT(0) +#define ARCH_AMD BIT(1) +#define ARCH_HYGON BIT(2)
#define END_OF_TESTS 1
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 5154ffd821c4..9bf35f3beb6b 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -42,6 +42,8 @@ static int detect_vendor(void) vendor_id = ARCH_INTEL; else if (s && !strcmp(s, ": AuthenticAMD\n")) vendor_id = ARCH_AMD; + else if (s && !strcmp(s, ": HygonGenuine\n")) + vendor_id = ARCH_HYGON;
fclose(inf); free(res);
Hi, Xiaochen,
On 12/5/25 01:25, Xiaochen Shen wrote:
The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID.
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net
tools/testing/selftests/resctrl/resctrl.h | 6 ++++-- tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++ 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index cd3adfc14969..411ee10380a5 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -23,6 +23,7 @@ #include <asm/unistd.h> #include <linux/perf_event.h> #include <linux/compiler.h> +#include <linux/bits.h> #include "../kselftest.h" #define MB (1024 * 1024) @@ -36,8 +37,9 @@
- Define as bits because they're used for vendor_specific bitmask in
- the struct resctrl_test.
*/ -#define ARCH_INTEL 1 -#define ARCH_AMD 2 +#define ARCH_INTEL BIT(0) +#define ARCH_AMD BIT(1) +#define ARCH_HYGON BIT(2) #define END_OF_TESTS 1 diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 5154ffd821c4..9bf35f3beb6b 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -42,6 +42,8 @@ static int detect_vendor(void) vendor_id = ARCH_INTEL; else if (s && !strcmp(s, ": AuthenticAMD\n")) vendor_id = ARCH_AMD;
- else if (s && !strcmp(s, ": HygonGenuine\n"))
vendor_id = ARCH_HYGON;
Since vendor_id is bitmask now and BIT() is a UL value, it's better to define it as "unsigned int" (unsigned long is a bit overkill). Otherwise, type conversion may be risky.
Is it better to change vendor_id as "unsigned int", static unsigned int detect_vendor(), and a couple of other places?
fclose(inf); free(res);
Thanks. -Fenghua
Commit
a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return: (1) >1: SNC mode is enabled. (2) 1: SNC mode is not enabled or not supported.
However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus).
This leads to a division by zero error in get_cache_size(): *cache_size /= snc_nodes_per_l3_cache();
Causing the resctrl selftest to fail with: "Floating point exception (core dumped)"
Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform.
Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled") Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com --- tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 195f04c4d158..2b075e7334bf 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void) } snc_mode = cache_cpus / node_cpus;
+ /* + * On certain Hygon platforms: + * cache_cpus < node_cpus, the calculated snc_mode is 0. + * + * Set snc_mode = 1 to indicate that SNC mode is not + * supported on the platform. + */ + if (!snc_mode) + snc_mode = 1; + if (snc_mode > 1) ksft_print_msg("SNC-%d mode discovered.\n", snc_mode); }
Hi, Xiaochen,
On 12/5/25 01:25, Xiaochen Shen wrote:
Commit
a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return: (1) >1: SNC mode is enabled. (2) 1: SNC mode is not enabled or not supported.
However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus).
This leads to a division by zero error in get_cache_size(): *cache_size /= snc_nodes_per_l3_cache();
Causing the resctrl selftest to fail with: "Floating point exception (core dumped)"
Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform.
Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled") Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com
tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 195f04c4d158..2b075e7334bf 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void) } snc_mode = cache_cpus / node_cpus;
/** On certain Hygon platforms:
nit. This situation could happen on other platforms than Hygon. Maybe it's better to have a more generic comment here? * On some platforms (e.g. Hygon),
Reviewed-by: Fenghua Yu fenghuay@nvidia.com
* cache_cpus < node_cpus, the calculated snc_mode is 0.** Set snc_mode = 1 to indicate that SNC mode is not* supported on the platform.*/if (!snc_mode)snc_mode = 1;- if (snc_mode > 1) ksft_print_msg("SNC-%d mode discovered.\n", snc_mode); }
Thanks. -Fenghua
The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com --- Maintainer note: Even though this is a fix it is not a candidate for backport since it is based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl.
tools/testing/selftests/resctrl/cat_test.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 94cfdba5308d..59a0f80fdc5a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
static bool arch_supports_noncont_cat(const struct resctrl_test *test) { - /* AMD always supports non-contiguous CBM. */ - if (get_vendor() == ARCH_AMD) + /* AMD and Hygon always supports non-contiguous CBM. */ + if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON) return true;
#if defined(__i386__) || defined(__x86_64__) /* arch */
On 12/5/25 01:25, Xiaochen Shen wrote:
The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com
Maintainer note: Even though this is a fix it is not a candidate for backport since it is based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl.
tools/testing/selftests/resctrl/cat_test.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 94cfdba5308d..59a0f80fdc5a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param static bool arch_supports_noncont_cat(const struct resctrl_test *test) {
- /* AMD always supports non-contiguous CBM. */
- if (get_vendor() == ARCH_AMD)
- /* AMD and Hygon always supports non-contiguous CBM. */
- if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)
nit. Better to avoid call get_vendor() twice (or even more in the future)?
unsigned int vendor_id = get_vendor();
if (vendor_id == ARCH_AMD || vendor_id == ARCH_HYGON)
return true;#if defined(__i386__) || defined(__x86_64__) /* arch */
Reviewed-by: Fenghua Yu fenghuay@nvidia.com
Thanks.
-Fenghua
Hi Fenghua,
On 12/5/25 11:39 AM, Fenghua Yu wrote:
On 12/5/25 01:25, Xiaochen Shen wrote:
The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com
Maintainer note: Even though this is a fix it is not a candidate for backport since it is based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl.
tools/testing/selftests/resctrl/cat_test.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 94cfdba5308d..59a0f80fdc5a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param static bool arch_supports_noncont_cat(const struct resctrl_test *test) { - /* AMD always supports non-contiguous CBM. */ - if (get_vendor() == ARCH_AMD) + /* AMD and Hygon always supports non-contiguous CBM. */ + if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)
nit. Better to avoid call get_vendor() twice (or even more in the future)?
Are you perhaps referring to detect_vendor()? detect_vendor() does the actual digging to determine the vendor ID and is indeed called just once by get_vendor(). In subsequent calls get_vendor() just returns the static ID.
Reinette
Hi, Reinette,
On 12/5/25 13:30, Reinette Chatre wrote:
Hi Fenghua,
On 12/5/25 11:39 AM, Fenghua Yu wrote:
On 12/5/25 01:25, Xiaochen Shen wrote:
The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com
Maintainer note: Even though this is a fix it is not a candidate for backport since it is based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl.
tools/testing/selftests/resctrl/cat_test.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 94cfdba5308d..59a0f80fdc5a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param static bool arch_supports_noncont_cat(const struct resctrl_test *test) { - /* AMD always supports non-contiguous CBM. */ - if (get_vendor() == ARCH_AMD) + /* AMD and Hygon always supports non-contiguous CBM. */ + if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)
nit. Better to avoid call get_vendor() twice (or even more in the future)?
Are you perhaps referring to detect_vendor()? detect_vendor() does the actual digging to determine the vendor ID and is indeed called just once by get_vendor(). In subsequent calls get_vendor() just returns the static ID.
There is still cost to call get_vendor() (call, push, cmp, pop, ret, etc) in subsequent calls. I just feel it's redundant to call it multiple times in just one sentence.
Thanks.
-Fenghua
Hi Tony,
On 12/9/25 3:42 PM, Luck, Tony wrote:
On Tue, Dec 09, 2025 at 03:02:14PM -0800, Reinette Chatre wrote:
I suggest this be simplified to not have the vendor ID be used both as a value and as a state. Here is some pseudo-code that should be able to accomplish this:
unsigned int detect_vendor(void) { static bool initialized = false; static unsigned int vendor_id; ... FILE *inf;
if (initialized) return vendor_id; inf = fopen("/proc/cpuinfo", "r"); if (!inf) { vendor_id = 0; initialized = true; return vendor_id; } /* initialize vendor_id from /proc/cpuinfo */ initialized = true; return vendor_id;}
unsigned int get_vendor(void) { unsigned int vendor; vendor = detect_vendor();
if (vendor == 0) ksft_print_msg(...); return vendor;}
If detect_vendor() failed, this you'd get the ksft_print_msg() for every call to get_vendor().
Right. This is intended to match existing behavior. The goal is to only do the work of querying the vendor information once. The tests are independent so to avoid the failure message about obtaining vendor information to only be in the test that did the original query it is printed in every test's output.
Why not split completly.
static unsigned int vendor_id;
void detect_vendor(void) { FILE *inf = fopen("/proc/cpuinfo", "r");
if (!inf) { ... warning unable to get vendor id ... }
... initialize from /proc/cpuinfo ...
... warn if doesn't find a known vendor ... }
Call detect_vendor() at the beginning of main() in each test.
This will repeat the vendor detection for every (currently six) test? This seems unnecessary work to me considering this only needs to be done once.
Then just use "vendor_id" whenever you need to test for some vendor specific feature.
Including the warning within detect_vendor() and calling it in each test does address the goal of including any vendor detection failure message in log of every test that depends on it. Even so, from what I can tell this separates the message from where test actually fails though, making things harder to debug, and I expect will result in more code duplication: the duplicated calls to detect_vendor() and likely tests needing to duplicate current get_vendor() (more below):
If I understand correctly this would look something like:
some_function() { if (vendor_id == ARCH_TBD) /* Do something */ }
resctrl_test::run_test(...) { /* prep */ detect_vendor(); /* may print warning */ /* * Do not fail the test if vendor detection fails since not all * flows may depend on vendor information. */ /* run actual test */ /* do some things that result in messages in test log */ some_function(); /* do some things that result in messages in test log */ } In above the test log will show early that vendor detection failed but it is not clear in the test log what the impact on the test is by being clear where in the test the failed vendor detection is needed/used.
I anticipate that tests will start to be defensive and do things like below that would create unnecessary duplication that is currently handled by get_vendor(). some_function() { if (vendor_id == 0) { ksft_print_msg("Vendor invalid, cannot do <something>\n"); return; }
if (vendor_id == ARCH_TBD) /* Do something */ }
Just failing tests if the vendor cannot be determined may be an easy solution but since not all tests depend on vendor information this seems too severe.
Reinette
Hi Reinette,
On 12/10/2025 7:02 AM, Reinette Chatre wrote:
I suggest this be simplified to not have the vendor ID be used both as a value and as a state. Here is some pseudo-code that should be able to accomplish this:
unsigned int detect_vendor(void) { static bool initialized = false; static unsigned int vendor_id; ... FILE *inf;
if (initialized) return vendor_id; inf = fopen("/proc/cpuinfo", "r"); if (!inf) { vendor_id = 0; initialized = true; return vendor_id; } /* initialize vendor_id from /proc/cpuinfo */ initialized = true; return vendor_id;}
unsigned int get_vendor(void) { unsigned int vendor; vendor = detect_vendor();
if (vendor == 0) ksft_print_msg(...); return vendor;}
Reinette
Thank you very much! I will make the change in v3 patch series.
Could you help review the revised patch description for the change? -------------------------------- ... and makes it obvious when adding new vendor IDs.
Accordingly, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions.
Furthermore, introduce a bool flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic. This ensures the vendor ID is detected only once and resolves the ambiguity of using the same variable 'vendor' both as a value and as a state.
--------------------------------
Thank you!
Best regards, Xiaochen Shen
Hi Fenghua,
On 12/6/2025 2:53 AM, Fenghua Yu wrote:
@@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void) } snc_mode = cache_cpus / node_cpus; + /* + * On certain Hygon platforms:
nit. This situation could happen on other platforms than Hygon. Maybe it's better to have a more generic comment here? * On some platforms (e.g. Hygon),
I will update the comment as you suggested. Thank you!
Reviewed-by: Fenghua Yu fenghuay@nvidia.com
Thank you!
+ * cache_cpus < node_cpus, the calculated snc_mode is 0. + * + * Set snc_mode = 1 to indicate that SNC mode is not + * supported on the platform. + */ + if (!snc_mode) + snc_mode = 1;
if (snc_mode > 1) ksft_print_msg("SNC-%d mode discovered.\n", snc_mode); }
Thanks. -Fenghua
Best regards, Xiaochen Shen
Hi Fenghua,
On 12/6/2025 3:28 AM, Fenghua Yu wrote:
--- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -42,6 +42,8 @@ static int detect_vendor(void) vendor_id = ARCH_INTEL; else if (s && !strcmp(s, ": AuthenticAMD\n")) vendor_id = ARCH_AMD; + else if (s && !strcmp(s, ": HygonGenuine\n")) + vendor_id = ARCH_HYGON;
Since vendor_id is bitmask now and BIT() is a UL value, it's better to define it as "unsigned int" (unsigned long is a bit overkill). Otherwise, type conversion may be risky.
Thank you for the suggestion. How about using BIT_U8() instead of BIT()? In my opinion, 8-bits type "unsigned int" is enough for "vendor id".
Is it better to change vendor_id as "unsigned int", static unsigned int detect_vendor(), and a couple of other places?
Yes. It is better to update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions.
Should I split the code changes (using BIT_xx(), updates of type 'unsigned int') into a separate patch?
The patch may look like: ----------------------------- commit baaabb7bd3a3e45a8093422b576383da20488aca Author: Xiaochen Shen shenxiaochen@open-hieco.net Date: Mon Dec 8 14:26:45 2025 +0800
selftests/resctrl: Improve type definitions of CPU vendor IDs
In file resctrl.h: ----------------- /* * CPU vendor IDs * * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ #define ARCH_INTEL 1 #define ARCH_AMD 2 -----------------
The comment before the CPU vendor IDs defines attempts to provide guidance but it is clearly still quite subtle that these values are required to be unique bits. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific
A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs.
Additionally, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions.
Suggested-by: Reinette Chatre reinette.chatre@intel.com Suggested-by: Fenghua Yu fenghuay@nvidia.com Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index cd3adfc14969..2922dfbf9090 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -23,6 +23,7 @@ #include <asm/unistd.h> #include <linux/perf_event.h> #include <linux/compiler.h> +#include <linux/bits.h> #include "../kselftest.h"
#define MB (1024 * 1024) @@ -36,8 +37,8 @@ * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ -#define ARCH_INTEL 1 -#define ARCH_AMD 2 +#define ARCH_INTEL BIT_U8(0) +#define ARCH_AMD BIT_U8(1)
#define END_OF_TESTS 1
@@ -163,7 +164,7 @@ extern int snc_unreliable; extern char llc_occup_path[1024];
int snc_nodes_per_l3_cache(void); -int get_vendor(void); +unsigned int get_vendor(void); bool check_resctrlfs_support(void); int filter_dmesg(void); int get_domain_id(const char *resource, int cpu_no, int *domain_id); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 5154ffd821c4..0fef2e4171e7 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -23,10 +23,10 @@ static struct resctrl_test *resctrl_tests[] = { &l2_noncont_cat_test, };
-static int detect_vendor(void) +static unsigned int detect_vendor(void) { FILE *inf = fopen("/proc/cpuinfo", "r"); - int vendor_id = 0; + unsigned int vendor_id = 0; char *s = NULL; char *res;
@@ -48,12 +48,14 @@ static int detect_vendor(void) return vendor_id; }
-int get_vendor(void) +unsigned int get_vendor(void) { - static int vendor = -1; + static unsigned int vendor;
- if (vendor == -1) + if (vendor == 0) vendor = detect_vendor(); + + /* detect_vendor() returns invalid vendor id */ if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");
-----------------------------
Thank you!
Best regards, Xiaochen Shen
Hi Fenghua,
On 12/6/2025 3:39 AM, Fenghua Yu wrote:
static bool arch_supports_noncont_cat(const struct resctrl_test *test) { - /* AMD always supports non-contiguous CBM. */ - if (get_vendor() == ARCH_AMD) + /* AMD and Hygon always supports non-contiguous CBM. */ + if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)
nit. Better to avoid call get_vendor() twice (or even more in the future)?
unsigned int vendor_id = get_vendor();
if (vendor_id == ARCH_AMD || vendor_id == ARCH_HYGON)
Thank you! I will update the code as you suggested.
return true; #if defined(__i386__) || defined(__x86_64__) /* arch */
Reviewed-by: Fenghua Yu fenghuay@nvidia.com
Thank you!
Thanks.
-Fenghua
Best regards, Xiaochen Shen
Hi Xiaochen,
On 12/8/25 12:01 AM, Xiaochen Shen wrote:
Hi Fenghua,
On 12/6/2025 3:28 AM, Fenghua Yu wrote:
--- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -42,6 +42,8 @@ static int detect_vendor(void) vendor_id = ARCH_INTEL; else if (s && !strcmp(s, ": AuthenticAMD\n")) vendor_id = ARCH_AMD; + else if (s && !strcmp(s, ": HygonGenuine\n")) + vendor_id = ARCH_HYGON;
Since vendor_id is bitmask now and BIT() is a UL value, it's better to define it as "unsigned int" (unsigned long is a bit overkill). Otherwise, type conversion may be risky.
Thank you for the suggestion. How about using BIT_U8() instead of BIT()? In my opinion, 8-bits type "unsigned int" is enough for "vendor id".
BIT() is fine here. I prefer that types used by selftests are consistent, that is, not a mix of user space and kernel types. There may be good motivation to switch to kernel types but then it needs to be throughout the resctrl selftests, which is not something this work needs to take on.
Is it better to change vendor_id as "unsigned int", static unsigned int detect_vendor(), and a couple of other places?
Yes. It is better to update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions.
Should I split the code changes (using BIT_xx(), updates of type 'unsigned int') into a separate patch?
I agree this would be better as a separate patch.
The patch may look like:
commit baaabb7bd3a3e45a8093422b576383da20488aca Author: Xiaochen Shen shenxiaochen@open-hieco.net Date: Mon Dec 8 14:26:45 2025 +0800
selftests/resctrl: Improve type definitions of CPU vendor IDs
Instead of a generic "Improve" it can just be specific about what it does: "selftests/resctrl: Define CPU vendor IDs as bits to match usage"
In file resctrl.h: ----------------- /* * CPU vendor IDs * * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ #define ARCH_INTEL 1 #define ARCH_AMD 2 ----------------- The comment before the CPU vendor IDs defines attempts to provide guidance but it is clearly still quite subtle that these values are
I wrote "clearly" in response to the earlier patch that did not follow the quoted documentation, implying that the documentation was not sufficient. I do not think "clearly" applies here. This can just be specific about how these values are used ... which this paragraph duplicates from the quoted comment so either this paragraph or the code quote could be dropped?
required to be unique bits. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs. Additionally, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Suggested-by: Fenghua Yu <fenghuay@nvidia.com> Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index cd3adfc14969..2922dfbf9090 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -23,6 +23,7 @@ #include <asm/unistd.h> #include <linux/perf_event.h> #include <linux/compiler.h> +#include <linux/bits.h> #include "../kselftest.h"
#define MB (1024 * 1024) @@ -36,8 +37,8 @@
- Define as bits because they're used for vendor_specific bitmask in
- the struct resctrl_test.
*/ -#define ARCH_INTEL 1 -#define ARCH_AMD 2 +#define ARCH_INTEL BIT_U8(0) +#define ARCH_AMD BIT_U8(1)
#define END_OF_TESTS 1
@@ -163,7 +164,7 @@ extern int snc_unreliable; extern char llc_occup_path[1024];
int snc_nodes_per_l3_cache(void); -int get_vendor(void); +unsigned int get_vendor(void); bool check_resctrlfs_support(void); int filter_dmesg(void); int get_domain_id(const char *resource, int cpu_no, int *domain_id); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 5154ffd821c4..0fef2e4171e7 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -23,10 +23,10 @@ static struct resctrl_test *resctrl_tests[] = { &l2_noncont_cat_test, };
-static int detect_vendor(void) +static unsigned int detect_vendor(void) { FILE *inf = fopen("/proc/cpuinfo", "r");
int vendor_id = 0;
unsigned int vendor_id = 0; char *s = NULL; char *res;@@ -48,12 +48,14 @@ static int detect_vendor(void) return vendor_id; }
-int get_vendor(void) +unsigned int get_vendor(void) {
static int vendor = -1;
static unsigned int vendor;
if (vendor == -1)
if (vendor == 0) vendor = detect_vendor();/* detect_vendor() returns invalid vendor id */ if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");
detect_vendor() returns 0 if it cannot detect the vendor. Using "0" as well as return value of detect_vendor() to indicate that detect_vendor() should be run will thus cause detect_vendor() to always be called on failure even though it will keep failing.
Can vendor be kept as int and just cast it on return? This may be introducing the risky type conversion that the changelog claims to avoid though ....
Reinette
Hi Reinette,
On 12/9/2025 1:57 AM, Reinette Chatre wrote:
Thank you for the suggestion. How about using BIT_U8() instead of BIT()? In my opinion, 8-bits type "unsigned int" is enough for "vendor id".
BIT() is fine here. I prefer that types used by selftests are consistent, that is, not a mix of user space and kernel types. There may be good motivation to switch to kernel types but then it needs to be throughout the resctrl selftests, which is not something this work needs to take on.
Thank you. I will keep BIT() here.
Should I split the code changes (using BIT_xx(), updates of type 'unsigned int') into a separate patch?
I agree this would be better as a separate patch.
Sure. I will add a prerequisite patch in this series.
The patch may look like:
commit baaabb7bd3a3e45a8093422b576383da20488aca Author: Xiaochen Shen shenxiaochen@open-hieco.net Date: Mon Dec 8 14:26:45 2025 +0800
selftests/resctrl: Improve type definitions of CPU vendor IDsInstead of a generic "Improve" it can just be specific about what it does: "selftests/resctrl: Define CPU vendor IDs as bits to match usage"
Thank you for the suggestion. The subject of the patch looks much better.
In file resctrl.h:
/* * CPU vendor IDs * * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ #define ARCH_INTEL 1 #define ARCH_AMD 2 ----------------- The comment before the CPU vendor IDs defines attempts to provide guidance but it is clearly still quite subtle that these values areI wrote "clearly" in response to the earlier patch that did not follow the quoted documentation, implying that the documentation was not sufficient. I do not think "clearly" applies here. This can just be specific about how these values are used ... which this paragraph duplicates from the quoted comment so either this paragraph or the code quote could be dropped?
Thank you for the suggestion. The revised patch description as below: -------------------------------------- The CPU vendor IDs are required to be unique bits because they're used for vendor_specific bitmask in the struct resctrl_test. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific
However, the definitions of CPU vendor IDs in file resctrl.h is quite subtle as a bitmask value: #define ARCH_INTEL 1 #define ARCH_AMD 2
A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs. ... --------------------------------------
required to be unique bits. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific-int get_vendor(void) +unsigned int get_vendor(void) {
static int vendor = -1;
static unsigned int vendor;
if (vendor == -1)
if (vendor == 0) vendor = detect_vendor();/* detect_vendor() returns invalid vendor id */ if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");detect_vendor() returns 0 if it cannot detect the vendor. Using "0" as well as return value of detect_vendor() to indicate that detect_vendor() should be run will thus cause detect_vendor() to always be called on failure even though it will keep failing.
Thank you. I got it. In original code, "static int vendor = -1;" does it intentionally.
Can vendor be kept as int and just cast it on return? This may be introducing the risky type conversion that the changelog claims to avoid though ....
This is really a dilemma. I could keep vendor as int, even thought the code doesn't look graceful. I will try to add a comment for it. The code changes may look like: ------------------------------- -int get_vendor(void) +unsigned int get_vendor(void) { static int vendor = -1;
+ /* + * Notes on vendor: + * -1: initial value, detect_vendor() is not called yet. + * 0: detect_vendor() returns 0 if it cannot detect the vendor. + * > 0: detect_vendor() returns valid vendor id. + * + * The return type of detect_vendor() is 'unsigned int'. + * Cast vendor from 'int' to 'unsigned int' on return. + */ if (vendor == -1) vendor = detect_vendor(); + if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");
- return vendor; + return (unsigned int) vendor; } -------------------------------
Thank you!
Best regards, Xiaochen Shen
Hi Xiaochen,
On 12/8/25 10:10 PM, Xiaochen Shen wrote:
On 12/9/2025 1:57 AM, Reinette Chatre wrote:
...
In file resctrl.h:
/* * CPU vendor IDs * * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ #define ARCH_INTEL 1 #define ARCH_AMD 2 ----------------- The comment before the CPU vendor IDs defines attempts to provide guidance but it is clearly still quite subtle that these values areI wrote "clearly" in response to the earlier patch that did not follow the quoted documentation, implying that the documentation was not sufficient. I do not think "clearly" applies here. This can just be specific about how these values are used ... which this paragraph duplicates from the quoted comment so either this paragraph or the code quote could be dropped?
Thank you for the suggestion. The revised patch description as below:
The CPU vendor IDs are required to be unique bits because they're used for vendor_specific bitmask in the struct resctrl_test. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific However, the definitions of CPU vendor IDs in file resctrl.h is quite subtle as a bitmask value: #define ARCH_INTEL 1 #define ARCH_AMD 2 A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs.
Thank you. Looks good to me.
...
required to be unique bits. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific-int get_vendor(void) +unsigned int get_vendor(void) {
static int vendor = -1;
static unsigned int vendor;
if (vendor == -1)
if (vendor == 0) vendor = detect_vendor();/* detect_vendor() returns invalid vendor id */ if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");detect_vendor() returns 0 if it cannot detect the vendor. Using "0" as well as return value of detect_vendor() to indicate that detect_vendor() should be run will thus cause detect_vendor() to always be called on failure even though it will keep failing.
Thank you. I got it. In original code, "static int vendor = -1;" does it intentionally.
Can vendor be kept as int and just cast it on return? This may be introducing the risky type conversion that the changelog claims to avoid though ....
This is really a dilemma. I could keep vendor as int, even thought the code doesn't look graceful. I will try to add a comment for it. The code changes may look like:
-int get_vendor(void) +unsigned int get_vendor(void) { static int vendor = -1;
/** Notes on vendor:* -1: initial value, detect_vendor() is not called yet.* 0: detect_vendor() returns 0 if it cannot detect the vendor.* > 0: detect_vendor() returns valid vendor id.** The return type of detect_vendor() is 'unsigned int'.* Cast vendor from 'int' to 'unsigned int' on return.*/ if (vendor == -1) vendor = detect_vendor();if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");
return vendor;
return (unsigned int) vendor;}
I suggest this be simplified to not have the vendor ID be used both as a value and as a state. Here is some pseudo-code that should be able to accomplish this:
unsigned int detect_vendor(void) { static bool initialized = false; static unsigned int vendor_id; ... FILE *inf;
if (initialized) return vendor_id;
inf = fopen("/proc/cpuinfo", "r"); if (!inf) { vendor_id = 0; initialized = true; return vendor_id; }
/* initialize vendor_id from /proc/cpuinfo */
initialized = true; return vendor_id; }
unsigned int get_vendor(void) { unsigned int vendor; vendor = detect_vendor();
if (vendor == 0) ksft_print_msg(...);
return vendor; }
Reinette
On Tue, Dec 09, 2025 at 03:02:14PM -0800, Reinette Chatre wrote:
I suggest this be simplified to not have the vendor ID be used both as a value and as a state. Here is some pseudo-code that should be able to accomplish this:
unsigned int detect_vendor(void) { static bool initialized = false; static unsigned int vendor_id; ... FILE *inf;
if (initialized) return vendor_id; inf = fopen("/proc/cpuinfo", "r"); if (!inf) { vendor_id = 0; initialized = true; return vendor_id; } /* initialize vendor_id from /proc/cpuinfo */ initialized = true; return vendor_id;}
unsigned int get_vendor(void) { unsigned int vendor; vendor = detect_vendor();
if (vendor == 0) ksft_print_msg(...); return vendor;}
If detect_vendor() failed, this you'd get the ksft_print_msg() for every call to get_vendor().
Why not split completly.
static unsigned int vendor_id;
void detect_vendor(void) { FILE *inf = fopen("/proc/cpuinfo", "r");
if (!inf) { ... warning unable to get vendor id ... }
... initialize from /proc/cpuinfo ...
... warn if doesn't find a known vendor ... }
Call detect_vendor() at the beginning of main() in each test.
Then just use "vendor_id" whenever you need to test for some vendor specific feature.
-Tony
linux-kselftest-mirror@lists.linaro.org