On 6/14/24 12:06 PM, Jason A. Donenfeld wrote:
> Hook up the generic vDSO implementation to the x86 vDSO data page. Since
> the existing vDSO infrastructure is heavily based on the timekeeping
> functionality, which works over arrays of bases, a new macro is
> introduced for vvars that are not arrays.
>
> The vDSO function requires a ChaCha20 implementation that does not write
> to the stack, yet can still do an entire ChaCha20 permutation, so
> provide this using SSE2, since this is userland code that must work on
> all x86-64 processors. There's a simple test for this code as well.
>
> Reviewed-by: Samuel Neves <sneves(a)dei.uc.pt> # for vgetrandom-chacha.S
> Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/entry/vdso/Makefile | 3 +-
> arch/x86/entry/vdso/vdso.lds.S | 2 +
> arch/x86/entry/vdso/vgetrandom-chacha.S | 178 ++++++++++++++++++
> arch/x86/entry/vdso/vgetrandom.c | 17 ++
> arch/x86/include/asm/vdso/getrandom.h | 55 ++++++
> arch/x86/include/asm/vdso/vsyscall.h | 2 +
> arch/x86/include/asm/vvar.h | 16 ++
> tools/testing/selftests/vDSO/.gitignore | 1 +
> tools/testing/selftests/vDSO/Makefile | 13 ++
> .../testing/selftests/vDSO/vdso_test_chacha.c | 43 +++++
Hi Jason,
This is a large patch, so it might be helpful to split out the selftests
into their own patch. In fact, my comments here are only about those.
I'm adding linux-kselftest to Cc for visibility, and I've also Cc'd you
on a related selftests/vDSO series I just now posted [1].
In fact, I think it might work well if you insert patches 2/3 and 3/3
from that series, and build on top of those for the
selftests/vDSO/Makefile. See below for details.
...
> diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
> index a33b4d200a32..8b87ebea1630 100644
> --- a/tools/testing/selftests/vDSO/Makefile
> +++ b/tools/testing/selftests/vDSO/Makefile
> @@ -3,6 +3,7 @@ include ../lib.mk
>
> uname_M := $(shell uname -m 2>/dev/null || echo not)
> ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
> +SODIUM := $(shell pkg-config --libs libsodium 2>/dev/null)
>
> TEST_GEN_PROGS := $(OUTPUT)/vdso_test_gettimeofday $(OUTPUT)/vdso_test_getcpu
> TEST_GEN_PROGS += $(OUTPUT)/vdso_test_abi
> @@ -12,9 +13,15 @@ TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
> endif
> TEST_GEN_PROGS += $(OUTPUT)/vdso_test_correctness
> TEST_GEN_PROGS += $(OUTPUT)/vdso_test_getrandom
> +ifeq ($(uname_M),x86_64)
> +ifneq ($(SODIUM),)
> +TEST_GEN_PROGS += $(OUTPUT)/vdso_test_chacha
Unfortunately, this is "pre-existing wrong". :) That is, it is following
a pre-existing pattern that is broken: the $(OUTPUT) is not supposed to
be part of TEST_GEN_PROGS. Fixing it requires other changes, though, as
I've done in [2].
[1] https://lore.kernel.org/all/20240614233105.265009-1-jhubbard@nvidia.com/
[2] https://lore.kernel.org/all/20240614233105.265009-3-jhubbard@nvidia.com/
[3] https://lore.kernel.org/all/20240614233105.265009-4-jhubbard@nvidia.com/
> +endif
> +endif
>
> CFLAGS := -std=gnu99
> CFLAGS_vdso_standalone_test_x86 := -nostdlib -fno-asynchronous-unwind-tables -fno-stack-protector
> +CFLAGS_vdso_test_chacha := $(SODIUM) -idirafter $(top_srcdir)/include -idirafter $(top_srcdir)/arch/$(ARCH)/include -idirafter include -D__ASSEMBLY__ -DBULID_VDSO -DCONFIG_FUNCTION_ALIGNMENT=0 -Wa,--noexecstack
Line breaks via "\" are allowed in Makefiles. Might need two or three here.
> LDFLAGS_vdso_test_correctness := -ldl
> ifeq ($(CONFIG_X86_32),y)
> LDLIBS += -lgcc_s
> @@ -35,3 +42,9 @@ $(OUTPUT)/vdso_test_correctness: vdso_test_correctness.c
> -o $@ \
> $(LDFLAGS_vdso_test_correctness)
> $(OUTPUT)/vdso_test_getrandom: parse_vdso.c
> +$(OUTPUT)/vdso_test_chacha: CFLAGS += $(CFLAGS_vdso_test_chacha)
This also follows an unfortunate pattern, which I've also fixed just today
in a patch [3]. Please just add to CFLAGS directly, rather than creating
these name-mangled intermediate variables. See [3] for how that looks.
> +$(OUTPUT)/vdso_test_chacha: $(top_srcdir)/arch/$(ARCH)/entry/vdso/vgetrandom-chacha.S
> +$(OUTPUT)/vdso_test_chacha: include/asm/rwonce.h
> +include/asm/rwonce.h:
> + mkdir -p include/asm
> + touch $@
thanks,
--
John Hubbard
NVIDIA
Correctable memory errors are very common on servers with large
amount of memory, and are corrected by ECC, but with two
pain points to users:
1. Correction usually happens on the fly and adds latency overhead
2. Not-fully-proved theory states excessive correctable memory
errors can develop into uncorrectable memory error.
Soft offline is kernel's additional solution for memory pages
having (excessive) corrected memory errors. Impacted page is migrated
to healthy page if it is in use, then the original page is discarded
for any future use.
The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page.
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when later mmap hugepages MAP_FAILED due to lack of hugepages.
In case of a transparent hugepage, it will be split into 4K pages
as well; userspace will stop enjoying the transparent performance.
In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not
doing under the hood. But today there are at least 2 such cases:
1. GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed
by kernel's memory failure recovery.
This patch series give userspace the control of softofflining any page:
kernel only soft offlines raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to. The interface to userspace is a
new sysctl called enable_soft_offline under /proc/sys/vm. By default
enable_soft_line is 1 to preserve existing behavior in kernel.
Changelog
v1 => v2:
* incorporate feedbacks from both Miaohe Lin <linmiaohe(a)huawei.com> and
Jane Chu <jane.chu(a)oracle.com>.
* make the switch to control all pages, instead of HugeTLB specific.
* change the API from
/sys/kernel/mm/hugepages/hugepages-${size}kB/softoffline_corrected_errors
to /proc/sys/vm/enable_soft_offline.
* minor update to test code.
* update documentation of the user control API.
* v2 is based on commit 83a7eefedc9b ("Linux 6.10-rc3").
Jiaqi Yan (3):
mm/memory-failure: userspace controls soft-offlining pages
selftest/mm: test enable_soft_offline behaviors
docs: mm: add enable_soft_offline sysctl
Documentation/admin-guide/sysctl/vm.rst | 15 +
mm/memory-failure.c | 16 ++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 258 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
6 files changed, 295 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c
--
2.45.2.505.gda0bf45e8d-goog
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit cb708ab9f584f159798b60853edcf0c8b67ce295 ]
It's slightly better to set _GNU_SOURCE in the source code, but if one
must do it via the compiler invocation, then the best way to do so is
this:
$(CC) -D_GNU_SOURCE=
...because otherwise, if this form is used:
$(CC) -D_GNU_SOURCE
...then that leads the compiler to set a value, as if you had passed in:
$(CC) -D_GNU_SOURCE=1
That, in turn, leads to warnings under both gcc and clang, like this:
futex_requeue_pi.c:20: warning: "_GNU_SOURCE" redefined
Fix this by using the "-D_GNU_SOURCE=" form.
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/futex/functional/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index a392d0917b4e5..994fa3468f170 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
INCLUDES := -I../include -I../../ $(KHDR_INCLUDES)
-CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES) $(KHDR_INCLUDES)
+CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE= -pthread $(INCLUDES) $(KHDR_INCLUDES)
LDLIBS := -lpthread -lrt
LOCAL_HDRS := \
--
2.43.0
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit cb708ab9f584f159798b60853edcf0c8b67ce295 ]
It's slightly better to set _GNU_SOURCE in the source code, but if one
must do it via the compiler invocation, then the best way to do so is
this:
$(CC) -D_GNU_SOURCE=
...because otherwise, if this form is used:
$(CC) -D_GNU_SOURCE
...then that leads the compiler to set a value, as if you had passed in:
$(CC) -D_GNU_SOURCE=1
That, in turn, leads to warnings under both gcc and clang, like this:
futex_requeue_pi.c:20: warning: "_GNU_SOURCE" redefined
Fix this by using the "-D_GNU_SOURCE=" form.
Reviewed-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/futex/functional/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index a392d0917b4e5..994fa3468f170 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
INCLUDES := -I../include -I../../ $(KHDR_INCLUDES)
-CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES) $(KHDR_INCLUDES)
+CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE= -pthread $(INCLUDES) $(KHDR_INCLUDES)
LDLIBS := -lpthread -lrt
LOCAL_HDRS := \
--
2.43.0