Recent change in how get_user() handles pointers [1] has a specific case for LAM. It assigns a different bitmask that's later used to check whether a pointer comes from userland in get_user().
While currently commented out (until LASS [2] is merged into the kernel) it's worth making changes to the LAM selftest ahead of time.
Modify cpu_has_la57() so it provides current paging level information instead of the cpuid one.
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses get_user() in its implementation. Execute the syscall with differently tagged pointers to verify that valid user pointers are passing through and invalid kernel/non-canonical pointers are not.
Also to avoid unhelpful test failures add a check in main() to skip running tests if LAM was not compiled into the kernel.
Code was tested on a Sierra Forest Xeon machine that's LAM capable. The test was ran without issues with both the LAM lines from [1] untouched and commented out. The test was also ran without issues with LAM_SUP both enabled and disabled.
4/5 level pagetables code paths were also successfully tested in Simics on a 5-level capable machine.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundatio... [2] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@linu...
Maciej Wieczor-Retman (3): selftests/lam: Move cpu_has_la57() to use cpuinfo flag selftests/lam: Skip test if LAM is disabled selftests/lam: Test get_user() LAM pointer handling
tools/testing/selftests/x86/lam.c | 120 ++++++++++++++++++++++++++++-- 1 file changed, 115 insertions(+), 5 deletions(-)
In current form cpu_has_la57() reports platform's support for LA57 through reading the output of cpuid. A much more useful information is whether 5-level paging is actually enabled on the running system.
Presence of the la57 flag in /proc/cpuinfo signifies that 5-level paging was compiled into the kernel, is supported by the platform and wasn't disabled by kernel command line argument.
Use system() with cat and grep to figure out if la57 is enabled on the running system.
Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Acked-by: Shuah Khan skhan@linuxfoundation.org --- Changelog v5: - Remove "cat" from system() call and use only "grep".
Changelog v4: - Add this patch to the series.
tools/testing/selftests/x86/lam.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c index 0ea4f6813930..93f2f762d6b5 100644 --- a/tools/testing/selftests/x86/lam.c +++ b/tools/testing/selftests/x86/lam.c @@ -124,14 +124,9 @@ static inline int cpu_has_lam(void) return (cpuinfo[0] & (1 << 26)); }
-/* Check 5-level page table feature in CPUID.(EAX=07H, ECX=00H):ECX.[bit 16] */ static inline int cpu_has_la57(void) { - unsigned int cpuinfo[4]; - - __cpuid_count(0x7, 0, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]); - - return (cpuinfo[2] & (1 << 16)); + return !system("grep -wq la57 /proc/cpuinfo"); }
/*
On 11/27/24 09:35, Maciej Wieczor-Retman wrote:
-/* Check 5-level page table feature in CPUID.(EAX=07H, ECX=00H):ECX.[bit 16] */ static inline int cpu_has_la57(void) {
- unsigned int cpuinfo[4];
- __cpuid_count(0x7, 0, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]);
- return (cpuinfo[2] & (1 << 16));
- return !system("grep -wq la57 /proc/cpuinfo");
}
I would rather we find another way.
First, we've documented the behavior a bit in here:
https://docs.kernel.org/arch/x86/cpuinfo.html
The important part is:
"The absence of a flag in /proc/cpuinfo by itself means almost nothing to an end user."
Even worse, let's say there's a CPU bug and we say define a bug bit:
bugs : spectre_v1 spectre_v2 ... la57_is_broken
How is that grep going to work out? ;)
Could you poke around and see if there is any existing ABI that we can use to query LA57 support? Maybe one of the things KVM exports, or some TASK_SIZE_MAX comparisons?
On 2025-01-24 at 08:14:41 -0800, Dave Hansen wrote:
On 11/27/24 09:35, Maciej Wieczor-Retman wrote:
-/* Check 5-level page table feature in CPUID.(EAX=07H, ECX=00H):ECX.[bit 16] */ static inline int cpu_has_la57(void) {
- unsigned int cpuinfo[4];
- __cpuid_count(0x7, 0, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]);
- return (cpuinfo[2] & (1 << 16));
- return !system("grep -wq la57 /proc/cpuinfo");
}
I would rather we find another way.
First, we've documented the behavior a bit in here:
https://docs.kernel.org/arch/x86/cpuinfo.html
The important part is:
"The absence of a flag in /proc/cpuinfo by itself means almost nothing to an end user."
Even worse, let's say there's a CPU bug and we say define a bug bit:
bugs : spectre_v1 spectre_v2 ... la57_is_broken
How is that grep going to work out? ;)
Could you poke around and see if there is any existing ABI that we can use to query LA57 support? Maybe one of the things KVM exports, or some TASK_SIZE_MAX comparisons?
Sure, I'll try to find some other way.
My previous tactic was to munmap() a high address and see if it works. Does that sound okay in case there isn't anything else would indicate la57 to userspace reliably?
On 1/24/25 12:17, Maciej Wieczor-Retman wrote:
Could you poke around and see if there is any existing ABI that we can use to query LA57 support? Maybe one of the things KVM exports, or some TASK_SIZE_MAX comparisons?
Sure, I'll try to find some other way.
My previous tactic was to munmap() a high address and see if it works. Does that sound okay in case there isn't anything else would indicate la57 to userspace reliably?
Yeah, I think that's fine. I can't think of any obvious horrible pitfalls. All I'd ask is that you spend 20 minutes grepping for things that are conditional on X86_FEATURE_LA57 and see if there's anything else that's a good candidate.
On 2025-01-24 at 21:17:22 +0100, Maciej Wieczor-Retman wrote:
On 2025-01-24 at 08:14:41 -0800, Dave Hansen wrote:
On 11/27/24 09:35, Maciej Wieczor-Retman wrote:
Sure, I'll try to find some other way.
My previous tactic was to munmap() a high address and see if it works. Does that sound okay in case there isn't anything else would indicate la57 to userspace reliably?
Sorry, meant mmap() not munmap()
-- Kind regards Maciej Wieczór-Retman
Until LASS is merged into the kernel [1], LAM is left disabled in the config file. Running the LAM selftest with disabled LAM only results in unhelpful output.
Use one of LAM syscalls() to determine whether the kernel was compiled with LAM support (CONFIG_ADDRESS_MASKING) or not. Skip running the tests in the latter case.
[1] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@linu...
Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Acked-by: Shuah Khan skhan@linuxfoundation.org --- Changelog v4: - Add this patch to the series.
tools/testing/selftests/x86/lam.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c index 93f2f762d6b5..4ec37a4be007 100644 --- a/tools/testing/selftests/x86/lam.c +++ b/tools/testing/selftests/x86/lam.c @@ -124,6 +124,14 @@ static inline int cpu_has_lam(void) return (cpuinfo[0] & (1 << 26)); }
+static inline int kernel_has_lam(void) +{ + unsigned long bits; + + syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits); + return !!bits; +} + static inline int cpu_has_la57(void) { return !system("grep -wq la57 /proc/cpuinfo"); @@ -1181,6 +1189,11 @@ int main(int argc, char **argv) return KSFT_SKIP; }
+ if (!kernel_has_lam()) { + ksft_print_msg("LAM is disabled in the kernel!\n"); + return KSFT_SKIP; + } + while ((c = getopt(argc, argv, "ht:")) != -1) { switch (c) { case 't':
On 11/27/24 09:35, Maciej Wieczor-Retman wrote:
+static inline int kernel_has_lam(void) +{
- unsigned long bits;
- syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits);
- return !!bits;
+}
Generally, I'm less picky about selftest/ code than in-kernel code. But people really do take selftest code and use it as a starting point for production code.
I'd much rather have overly verbose, obviously correct code:
err = syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits);
/* Handle syscall failure, like pre-LAM kernels: */ if (err) return 0
/* Tag bits are empty on non-LAM systems: */ return !!bits;
Actually, I was going to argue for that^ just on style and writing good code. But then I spotted a bug. What happens if the kernel has CONFIG_ADDRESS_MASKING=n, either because it is config'd off or it's old? The:
put_user(0, (unsigned long __user *)arg2);
won't ever get run and 'bits' will be uninitialized.
So, I think this code was trying to be compact, fast and clever. But it really just turns out to be buggy.
On 2025-01-24 at 08:23:09 -0800, Dave Hansen wrote:
On 11/27/24 09:35, Maciej Wieczor-Retman wrote:
+static inline int kernel_has_lam(void) +{
- unsigned long bits;
- syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits);
- return !!bits;
+}
Generally, I'm less picky about selftest/ code than in-kernel code. But people really do take selftest code and use it as a starting point for production code.
I'd much rather have overly verbose, obviously correct code:
err = syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits);
/* Handle syscall failure, like pre-LAM kernels: */ if (err) return 0
/* Tag bits are empty on non-LAM systems: */ return !!bits;
Sure, more comments is always good :)
Actually, I was going to argue for that^ just on style and writing good code. But then I spotted a bug. What happens if the kernel has CONFIG_ADDRESS_MASKING=n, either because it is config'd off or it's old? The:
put_user(0, (unsigned long __user *)arg2);
won't ever get run and 'bits' will be uninitialized.
Huh, yeah, you're right. I tested it with both CONFIG_ADDRESS_MASKING=n and =y, and on systems with it both available and not available but must've been a coincidence it worked.
I'll fix the checks and init bits for the next version.
So, I think this code was trying to be compact, fast and clever. But it really just turns out to be buggy.
Recent change in how get_user() handles pointers [1] has a specific case for LAM. It assigns a different bitmask that's later used to check whether a pointer comes from userland in get_user().
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses get_user() in its implementation. Execute the syscall with differently tagged pointers to verify that valid user pointers are passing through and invalid kernel/non-canonical pointers are not.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundatio...
Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Acked-by: Shuah Khan skhan@linuxfoundation.org --- Changelog v4: - Use the changed cpu_has_la57() instead of mmap() to figure out current paging level. - Apply Kirill's other comments: Remove redundant always true check, close the ioctl file before exiting, change mapping size to PAGE_SIZE so it looks less odd. - Turn this patch into a series and move some text to the cover letter.
Changelog v3: - mmap the pointer passed to get_user to high address if 5 level paging is enabled and to low address if 4 level paging is enabled.
Changelog v2: - Use mmap with HIGH_ADDR to check if we're in 5 or 4 level pagetables.
tools/testing/selftests/x86/lam.c | 102 ++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c index 4ec37a4be007..b602541e7bdf 100644 --- a/tools/testing/selftests/x86/lam.c +++ b/tools/testing/selftests/x86/lam.c @@ -4,6 +4,7 @@ #include <stdlib.h> #include <string.h> #include <sys/syscall.h> +#include <sys/ioctl.h> #include <time.h> #include <signal.h> #include <setjmp.h> @@ -43,7 +44,15 @@ #define FUNC_INHERITE 0x20 #define FUNC_PASID 0x40
+/* get_user() pointer test cases */ +#define GET_USER_USER 0 +#define GET_USER_KERNEL_TOP 1 +#define GET_USER_KERNEL_BOT 2 +#define GET_USER_KERNEL 3 + #define TEST_MASK 0x7f +#define L5_SIGN_EXT_MASK (0xFFUL << 56) +#define L4_SIGN_EXT_MASK (0x1FFFFUL << 47)
#define LOW_ADDR (0x1UL << 30) #define HIGH_ADDR (0x3UL << 48) @@ -373,6 +382,72 @@ static int handle_syscall(struct testcases *test) return ret; }
+static int get_user_syscall(struct testcases *test) +{ + uint64_t ptr_address, bitmask; + int fd, ret = 0; + void *ptr; + + if (cpu_has_la57()) { + bitmask = L5_SIGN_EXT_MASK; + ptr_address = HIGH_ADDR; + } else { + bitmask = L4_SIGN_EXT_MASK; + ptr_address = LOW_ADDR; + } + + ptr = mmap((void *)ptr_address, PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0); + + if (ptr == MAP_FAILED) { + perror("failed to map byte to pass into get_user"); + return 1; + } + + if (set_lam(test->lam) != 0) { + ret = 2; + goto error; + } + + fd = memfd_create("lam_ioctl", 0); + if (fd == -1) { + munmap(ptr, PAGE_SIZE); + exit(EXIT_FAILURE); + } + + switch (test->later) { + case GET_USER_USER: + /* Control group - properly tagger user pointer */ + ptr = (void *)set_metadata((uint64_t)ptr, test->lam); + break; + case GET_USER_KERNEL_TOP: + /* Kernel address with top bit cleared */ + bitmask &= (bitmask >> 1); + ptr = (void *)((uint64_t)ptr | bitmask); + break; + case GET_USER_KERNEL_BOT: + /* Kernel address with bottom sign-extension bit cleared */ + bitmask &= (bitmask << 1); + ptr = (void *)((uint64_t)ptr | bitmask); + break; + case GET_USER_KERNEL: + /* Try to pass a kernel address */ + ptr = (void *)((uint64_t)ptr | bitmask); + break; + default: + printf("Invalid test case value passed!\n"); + break; + } + + if (ioctl(fd, FIOASYNC, ptr) != 0) + ret = 1; + + close(fd); +error: + munmap(ptr, PAGE_SIZE); + return ret; +} + int sys_uring_setup(unsigned int entries, struct io_uring_params *p) { return (int)syscall(__NR_io_uring_setup, entries, p); @@ -886,6 +961,33 @@ static struct testcases syscall_cases[] = { .test_func = handle_syscall, .msg = "SYSCALL:[Negative] Disable LAM. Dereferencing pointer with metadata.\n", }, + { + .later = GET_USER_USER, + .lam = LAM_U57_BITS, + .test_func = get_user_syscall, + .msg = "GET_USER: get_user() and pass a properly tagged user pointer.\n", + }, + { + .later = GET_USER_KERNEL_TOP, + .expected = 1, + .lam = LAM_U57_BITS, + .test_func = get_user_syscall, + .msg = "GET_USER:[Negative] get_user() with a kernel pointer and the top bit cleared.\n", + }, + { + .later = GET_USER_KERNEL_BOT, + .expected = 1, + .lam = LAM_U57_BITS, + .test_func = get_user_syscall, + .msg = "GET_USER:[Negative] get_user() with a kernel pointer and the bottom sign-extension bit cleared.\n", + }, + { + .later = GET_USER_KERNEL, + .expected = 1, + .lam = LAM_U57_BITS, + .test_func = get_user_syscall, + .msg = "GET_USER:[Negative] get_user() and pass a kernel pointer.\n", + }, };
static struct testcases mmap_cases[] = {
On 11/27/24 09:35, Maciej Wieczor-Retman wrote: ...
- switch (test->later) {
- case GET_USER_USER:
/* Control group - properly tagger user pointer */
ptr = (void *)set_metadata((uint64_t)ptr, test->lam);
break;
s/tagger/tagged/ ?
- default:
printf("Invalid test case value passed!\n");
break;
- }
- if (ioctl(fd, FIOASYNC, ptr) != 0)
ret = 1;
- close(fd);
+error:
- munmap(ptr, PAGE_SIZE);
- return ret;
+}
I'd really prefer that the theory of operation be in a code comment and not just in the changelog.
On 2025-01-24 at 08:32:18 -0800, Dave Hansen wrote:
On 11/27/24 09:35, Maciej Wieczor-Retman wrote: ...
- switch (test->later) {
- case GET_USER_USER:
/* Control group - properly tagger user pointer */
ptr = (void *)set_metadata((uint64_t)ptr, test->lam);
break;
s/tagger/tagged/ ?
Thanks!
- default:
printf("Invalid test case value passed!\n");
break;
- }
- if (ioctl(fd, FIOASYNC, ptr) != 0)
ret = 1;
- close(fd);
+error:
- munmap(ptr, PAGE_SIZE);
- return ret;
+}
I'd really prefer that the theory of operation be in a code comment and not just in the changelog.
Sure, I'll comment the ioctl() usage so it's more clear what's happening.
Hello Shuah, I'd like to bump this series for visibility and ask if you still consider these patches okay to merge?
Just checked and there were no conflicts after applying it on the newest kselftest-next.
On 2024-11-27 at 18:35:28 +0100, Maciej Wieczor-Retman wrote:
Recent change in how get_user() handles pointers [1] has a specific case for LAM. It assigns a different bitmask that's later used to check whether a pointer comes from userland in get_user().
While currently commented out (until LASS [2] is merged into the kernel) it's worth making changes to the LAM selftest ahead of time.
Modify cpu_has_la57() so it provides current paging level information instead of the cpuid one.
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses get_user() in its implementation. Execute the syscall with differently tagged pointers to verify that valid user pointers are passing through and invalid kernel/non-canonical pointers are not.
Also to avoid unhelpful test failures add a check in main() to skip running tests if LAM was not compiled into the kernel.
Code was tested on a Sierra Forest Xeon machine that's LAM capable. The test was ran without issues with both the LAM lines from [1] untouched and commented out. The test was also ran without issues with LAM_SUP both enabled and disabled.
4/5 level pagetables code paths were also successfully tested in Simics on a 5-level capable machine.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundatio... [2] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@linu...
Maciej Wieczor-Retman (3): selftests/lam: Move cpu_has_la57() to use cpuinfo flag selftests/lam: Skip test if LAM is disabled selftests/lam: Test get_user() LAM pointer handling
tools/testing/selftests/x86/lam.c | 120 ++++++++++++++++++++++++++++-- 1 file changed, 115 insertions(+), 5 deletions(-)
-- 2.47.1
On 1/15/25 02:06, Maciej Wieczor-Retman wrote:
Hello Shuah, I'd like to bump this series for visibility and ask if you still consider these patches okay to merge?
Just checked and there were no conflicts after applying it on the newest kselftest-next.
This is x86 test and usually goes through x86 tree. I can take this through kselftest tree if I get an ack from x86 maintainer.
On 2024-11-27 at 18:35:28 +0100, Maciej Wieczor-Retman wrote:
Recent change in how get_user() handles pointers [1] has a specific case for LAM. It assigns a different bitmask that's later used to check whether a pointer comes from userland in get_user().
While currently commented out (until LASS [2] is merged into the kernel) it's worth making changes to the LAM selftest ahead of time.
Modify cpu_has_la57() so it provides current paging level information instead of the cpuid one.
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses get_user() in its implementation. Execute the syscall with differently tagged pointers to verify that valid user pointers are passing through and invalid kernel/non-canonical pointers are not.
Also to avoid unhelpful test failures add a check in main() to skip running tests if LAM was not compiled into the kernel.
Code was tested on a Sierra Forest Xeon machine that's LAM capable. The test was ran without issues with both the LAM lines from [1] untouched and commented out. The test was also ran without issues with LAM_SUP both enabled and disabled.
4/5 level pagetables code paths were also successfully tested in Simics on a 5-level capable machine.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundatio... [2] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@linu...
Maciej Wieczor-Retman (3): selftests/lam: Move cpu_has_la57() to use cpuinfo flag selftests/lam: Skip test if LAM is disabled selftests/lam: Test get_user() LAM pointer handling
tools/testing/selftests/x86/lam.c | 120 ++++++++++++++++++++++++++++-- 1 file changed, 115 insertions(+), 5 deletions(-)
-- 2.47.1
thanks, -- Shuah
linux-kselftest-mirror@lists.linaro.org