mmap() fails to allocate 16GB virtual space chunk, skipping both low and
high VA range iterations. Hence, reduce MAP_CHUNK_SIZE to 1GB and update
relevant macros as required.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash(a)arm.com>
---
tools/testing/selftests/mm/virtual_address_range.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index c0592646ed93..50564512c5ee 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -15,11 +15,15 @@
/*
* Maximum address range mapped with a single mmap()
- * call is little bit more than 16GB. Hence 16GB is
+ * call is little bit more than 1GB. Hence 1GB is
* chosen as the single chunk size for address space
* mapping.
*/
-#define MAP_CHUNK_SIZE 17179869184UL /* 16GB */
+
+#define SZ_1GB (1024 * 1024 * 1024UL)
+#define SZ_1TB (1024 * 1024 * 1024 * 1024UL)
+
+#define MAP_CHUNK_SIZE SZ_1GB
/*
* Address space till 128TB is mapped without any hint
@@ -36,7 +40,7 @@
* are supported so far.
*/
-#define NR_CHUNKS_128TB 8192UL /* Number of 16GB chunks for 128TB */
+#define NR_CHUNKS_128TB ((128 * SZ_1TB) / MAP_CHUNK_SIZE) /* Number of chunks for 128TB */
#define NR_CHUNKS_256TB (NR_CHUNKS_128TB * 2UL)
#define NR_CHUNKS_384TB (NR_CHUNKS_128TB * 3UL)
--
2.30.2
Hi,
I of course took the opportunity at my first time to make a mistake: two
patches were missing in v1.. please note that patch #9 and #10 are newly
added.
Previous version:
v1: https://lore.kernel.org/rcu/20230315235454.2993-1-boqun.feng@gmail.com/
Changes since v1:
* Add two missing patches.
* Fix checkpatch warnings.
You will also be able to find the series at:
https://github.com/fbq/linux rcu/rcutorture.2023.03.20a
top commit is:
6bc6e6b27524
List of changes:
Bhaskar Chowdhury (1):
tools: rcu: Add usage function and check for argument
Paul E. McKenney (7):
rcutorture: Add test_nmis module parameter
rcutorture: Set CONFIG_BOOTPARAM_HOTPLUG_CPU0 to offline CPU 0
rcutorture: Make scenario TREE04 enable lazy call_rcu()
torture: Permit kvm-again.sh --duration to default to previous run
torture: Enable clocksource watchdog with "tsc=watchdog"
rcuscale: Move shutdown from wait_event() to wait_event_idle()
refscale: Move shutdown from wait_event() to wait_event_idle()
Yue Hu (1):
rcutorture: Eliminate variable n_rcu_torture_boost_rterror
Zqiang (1):
rcutorture: Create nocb kthreads only when testing rcu in
CONFIG_RCU_NOCB_CPU=y kernels
kernel/rcu/rcuscale.c | 7 ++-
kernel/rcu/rcutorture.c | 49 +++++++++++++++----
kernel/rcu/refscale.c | 2 +-
tools/rcu/extract-stall.sh | 26 +++++++---
.../selftests/rcutorture/bin/kvm-again.sh | 2 +-
.../selftests/rcutorture/bin/torture.sh | 6 +--
.../selftests/rcutorture/configs/rcu/TREE01 | 1 +
.../selftests/rcutorture/configs/rcu/TREE04 | 1 +
8 files changed, 69 insertions(+), 25 deletions(-)
mode change 100644 => 100755 tools/rcu/extract-stall.sh
--
2.38.1
This adds support for receiving KeyUpdate messages (RFC 8446, 4.6.3
[1]). A sender transmits a KeyUpdate message and then changes its TX
key. The receiver should react by updating its RX key before
processing the next message.
This patchset implements key updates by:
1. pausing decryption when a KeyUpdate message is received, to avoid
attempting to use the old key to decrypt a record encrypted with
the new key
2. returning -EKEYEXPIRED to syscalls that cannot receive the
KeyUpdate message, until the rekey has been performed by userspace
3. passing the KeyUpdate message to userspace as a control message
4. allowing updates of the crypto_info via the TLS_TX/TLS_RX
setsockopts
This API has been tested with gnutls to make sure that it allows
userspace libraries to implement key updates [2]. Thanks to Frantisek
Krenzelok <fkrenzel(a)redhat.com> for providing the implementation in
gnutls and testing the kernel patches.
Note: in a future series, I'll clean up tls_set_sw_offload and
eliminate the per-cipher copy-paste using tls_cipher_size_desc.
[1] https://www.rfc-editor.org/rfc/rfc8446#section-4.6.3
[2] https://gitlab.com/gnutls/gnutls/-/merge_requests/1625
Changes in v2
use reverse xmas tree ordering in tls_set_sw_offload and
do_tls_setsockopt_conf
turn the alt_crypto_info into an else if
selftests: add rekey_fail test
Vadim suggested simplifying tls_set_sw_offload by copying the new
crypto_info in the context in do_tls_setsockopt_conf, and then
detecting the rekey in tls_set_sw_offload based on whether the iv was
already set, but I don't think we can have a common error path
(otherwise we'd free the aead etc on rekey failure). I decided instead
to reorganize tls_set_sw_offload so that the context is unmodified
until we know the rekey cannot fail. Some fields will be touched
during the rekey, but they will be set to the same value they had
before the rekey (prot->rec_seq_size, etc).
Apoorv suggested to name the struct tls_crypto_info_keys "tls13"
rather than "tls12". Since we're using the same crypto_info data for
TLS1.3 as for 1.2, even if the tests only run for TLS1.3, I'd rather
keep the "tls12" name, in case we end up adding a
"tls13_crypto_info_aes_gcm_128" type in the future.
Kuniyuki and Apoorv also suggested preventing rekeys on RX when we
haven't received a matching KeyUpdate message, but I'd rather let
userspace handle this and have a symmetric API between TX and RX on
the kernel side. It's a bit of a foot-gun, but we can't really stop a
broken userspace from rolling back the rec_seq on an existing
crypto_info either, and that seems like a worse possible breakage.
Sabrina Dubroca (5):
tls: remove tls_context argument from tls_set_sw_offload
tls: block decryption when a rekey is pending
tls: implement rekey for TLS1.3
selftests: tls: add key_generation argument to tls_crypto_info_init
selftests: tls: add rekey tests
include/net/tls.h | 4 +
net/tls/tls.h | 3 +-
net/tls/tls_device.c | 2 +-
net/tls/tls_main.c | 37 +++-
net/tls/tls_sw.c | 189 +++++++++++++----
tools/testing/selftests/net/tls.c | 336 +++++++++++++++++++++++++++++-
6 files changed, 511 insertions(+), 60 deletions(-)
--
2.38.1
The purpose of this series is to improve/harden the security
provided by the Linux kernel's RPCSEC GSS Kerberos 5 mechanism.
There are lots of clean-ups in this series, but the pertinent
feature is the addition of a clean deprecation path for the DES-
and SHA1-based encryption types in accordance with Internet BCPs.
This series disables DES-based enctypes by default, provides a
mechanism for disabling SHA1-based enctypes, and introduces two
modern enctypes that do not use deprecated crypto algorithms.
Not only does that improve security for Kerberos 5 users, but it
also prepares SunRPC for eventually switching to a shared common
kernel Kerberos 5 implementation, which surely will not implement
any deprecated encryption types (in particular, DES-based ones).
Today, MIT supports both of the newly-introduced enctypes, but
Heimdal does not appear to. Thus distributions can enable and
disable kernel enctype support to match the set of enctypes
supported in their user space Kerberos libraries.
Scott has been kicking the tires -- we've found no regressions with
the current SHA1-based enctypes, while the new ones are disabled by
default until we have an opportunity for interop testing. The KUnit
tests for the new enctypes pass and this implementation successfully
interoperates with itself using these enctypes. Therefore I believe
it to be safe to merge.
When this series gets merged, the Linux NFS community should select
and announce a date-certain for removal of SunRPC's DES-based
enctype code.
---
Changes since v1:
- Addressed Simo's NAK on "SUNRPC: Improve Kerberos confounder generation"
- Added Cc: linux-kselftest@ for review of the KUnit-related patches
Chuck Lever (41):
SUNRPC: Add header ifdefs to linux/sunrpc/gss_krb5.h
SUNRPC: Remove .blocksize field from struct gss_krb5_enctype
SUNRPC: Remove .conflen field from struct gss_krb5_enctype
SUNRPC: Improve Kerberos confounder generation
SUNRPC: Obscure Kerberos session key
SUNRPC: Refactor set-up for aux_cipher
SUNRPC: Obscure Kerberos encryption keys
SUNRPC: Obscure Kerberos signing keys
SUNRPC: Obscure Kerberos integrity keys
SUNRPC: Refactor the GSS-API Per Message calls in the Kerberos mechanism
SUNRPC: Remove another switch on ctx->enctype
SUNRPC: Add /proc/net/rpc/gss_krb5_enctypes file
NFSD: Replace /proc/fs/nfsd/supported_krb5_enctypes with a symlink
SUNRPC: Replace KRB5_SUPPORTED_ENCTYPES macro
SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES
SUNRPC: Remove ->encrypt and ->decrypt methods from struct gss_krb5_enctype
SUNRPC: Rename .encrypt_v2 and .decrypt_v2 methods
SUNRPC: Hoist KDF into struct gss_krb5_enctype
SUNRPC: Clean up cipher set up for v1 encryption types
SUNRPC: Parametrize the key length passed to context_v2_alloc_cipher()
SUNRPC: Add new subkey length fields
SUNRPC: Refactor CBC with CTS into helpers
SUNRPC: Add gk5e definitions for RFC 8009 encryption types
SUNRPC: Add KDF-HMAC-SHA2
SUNRPC: Add RFC 8009 encryption and decryption functions
SUNRPC: Advertise support for RFC 8009 encryption types
SUNRPC: Support the Camellia enctypes
SUNRPC: Add KDF_FEEDBACK_CMAC
SUNRPC: Advertise support for the Camellia encryption types
SUNRPC: Move remaining internal definitions to gss_krb5_internal.h
SUNRPC: Add KUnit tests for rpcsec_krb5.ko
SUNRPC: Export get_gss_krb5_enctype()
SUNRPC: Add KUnit tests RFC 3961 Key Derivation
SUNRPC: Add Kunit tests for RFC 3962-defined encryption/decryption
SUNRPC: Add KDF KUnit tests for the RFC 6803 encryption types
SUNRPC: Add checksum KUnit tests for the RFC 6803 encryption types
SUNRPC: Add encryption KUnit tests for the RFC 6803 encryption types
SUNRPC: Add KDF-HMAC-SHA2 Kunit tests
SUNRPC: Add RFC 8009 checksum KUnit tests
SUNRPC: Add RFC 8009 encryption KUnit tests
SUNRPC: Add encryption self-tests
fs/nfsd/nfsctl.c | 74 +-
include/linux/sunrpc/gss_krb5.h | 196 +--
include/linux/sunrpc/gss_krb5_enctypes.h | 41 -
net/sunrpc/.kunitconfig | 30 +
net/sunrpc/Kconfig | 96 +-
net/sunrpc/auth_gss/Makefile | 2 +
net/sunrpc/auth_gss/auth_gss.c | 17 +
net/sunrpc/auth_gss/gss_krb5_crypto.c | 656 +++++--
net/sunrpc/auth_gss/gss_krb5_internal.h | 232 +++
net/sunrpc/auth_gss/gss_krb5_keys.c | 416 ++++-
net/sunrpc/auth_gss/gss_krb5_mech.c | 730 +++++---
net/sunrpc/auth_gss/gss_krb5_seal.c | 122 +-
net/sunrpc/auth_gss/gss_krb5_seqnum.c | 2 +
net/sunrpc/auth_gss/gss_krb5_test.c | 2040 ++++++++++++++++++++++
net/sunrpc/auth_gss/gss_krb5_unseal.c | 63 +-
net/sunrpc/auth_gss/gss_krb5_wrap.c | 124 +-
net/sunrpc/auth_gss/svcauth_gss.c | 65 +
17 files changed, 4001 insertions(+), 905 deletions(-)
delete mode 100644 include/linux/sunrpc/gss_krb5_enctypes.h
create mode 100644 net/sunrpc/.kunitconfig
create mode 100644 net/sunrpc/auth_gss/gss_krb5_internal.h
create mode 100644 net/sunrpc/auth_gss/gss_krb5_test.c
--
Chuck Lever
Changes in v2:
* Dropped patches which were pulled into maintainer trees.
* Split BPF patches out into another series targeting bpf-next.
* trace-agent now falls back to debugfs if tracefs isn't present.
* Added Acked-by from mst(a)redhat.com to series.
* Added a typo fixup for the virtio-trace README.
Steven, assuming there are no objections, would you feel comfortable
taking this series through your tree?
---
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
But, from Documentation/trace/ftrace.rst:
Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:
/sys/kernel/debug/tracing
There are many places where this older debugfs path is still used in
code comments, selftests, examples and tools, so let's update them to
avoid confusion.
I've broken up the series as best I could by maintainer or directory,
and I've only sent people the patches that I think they care about to
avoid spamming everyone.
Ross Zwisler (6):
tracing: always use canonical ftrace path
selftests: use canonical ftrace path
leaking_addresses: also skip canonical ftrace path
tools/kvm_stat: use canonical ftrace path
tools/virtio: use canonical ftrace path
tools/virtio: fix typo in README instructions
include/linux/kernel.h | 2 +-
include/linux/tracepoint.h | 4 ++--
kernel/trace/Kconfig | 20 +++++++++----------
kernel/trace/kprobe_event_gen_test.c | 2 +-
kernel/trace/ring_buffer.c | 2 +-
kernel/trace/synth_event_gen_test.c | 2 +-
kernel/trace/trace.c | 2 +-
samples/user_events/example.c | 4 ++--
scripts/leaking_addresses.pl | 1 +
scripts/tracing/draw_functrace.py | 6 +++---
tools/kvm/kvm_stat/kvm_stat | 2 +-
tools/lib/api/fs/tracing_path.c | 4 ++--
.../testing/selftests/user_events/dyn_test.c | 2 +-
.../selftests/user_events/ftrace_test.c | 10 +++++-----
.../testing/selftests/user_events/perf_test.c | 8 ++++----
tools/testing/selftests/vm/protection_keys.c | 4 ++--
tools/tracing/latency/latency-collector.c | 2 +-
tools/virtio/virtio-trace/README | 4 ++--
tools/virtio/virtio-trace/trace-agent.c | 12 +++++++----
19 files changed, 49 insertions(+), 44 deletions(-)
--
2.39.1.637.g21b0678d19-goog
From: Xiaoyan Li <lixiaoyan(a)google.com>
When compound pages are enabled, although the mm layer still
returns an array of page pointers, a subset (or all) of them
may have the same page head since a max 180kb skb can span 2
hugepages if it is on the boundary, be a mix of pages and 1 hugepage,
or fit completely in a hugepage. Instead of referencing page head
on all page pointers, use page length arithmetic to only call page
head when referencing a known different page head to avoid touching
a cold cacheline.
Tested:
See next patch with changes to tcp_mmap
Correntess:
On a pair of separate hosts as send with MSG_ZEROCOPY will
force a copy on tx if using loopback alone, check that the SHA
on the message sent is equivalent to checksum on the message received,
since the current program already checks for the length.
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
./tcp_mmap -s -z
./tcp_mmap -H $DADDR -z
SHA256 is correct
received 2 MB (100 % mmap'ed) in 0.005914 s, 2.83686 Gbit
cpu usage user:0.001984 sys:0.000963, 1473.5 usec per MB, 10 c-switches
Performance:
Run neper between adjacent hosts with the same config
tcp_stream -Z --skip-rx-copy -6 -T 20 -F 1000 --stime-use-proc --test-length=30
Before patch: stime_end=37.670000
After patch: stime_end=30.310000
Signed-off-by: Coco Li <lixiaoyan(a)google.com>
---
net/core/datagram.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e4ff2db40c98..5662dff3d381 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -622,12 +622,12 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
frag = skb_shinfo(skb)->nr_frags;
while (length && iov_iter_count(from)) {
+ struct page *head, *last_head = NULL;
struct page *pages[MAX_SKB_FRAGS];
- struct page *last_head = NULL;
+ int refs, order, n = 0;
size_t start;
ssize_t copied;
unsigned long truesize;
- int refs, n = 0;
if (frag == MAX_SKB_FRAGS)
return -EMSGSIZE;
@@ -650,9 +650,17 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
} else {
refcount_add(truesize, &skb->sk->sk_wmem_alloc);
}
+
+ head = compound_head(pages[n]);
+ order = compound_order(head);
+
for (refs = 0; copied != 0; start = 0) {
int size = min_t(int, copied, PAGE_SIZE - start);
- struct page *head = compound_head(pages[n]);
+
+ if (pages[n] - head > (1UL << order) - 1) {
+ head = compound_head(pages[n]);
+ order = compound_order(head);
+ }
start += (pages[n] - head) << PAGE_SHIFT;
copied -= size;
--
2.40.0.rc1.284.g88254d51c5-goog
Currently, anonymous PTE-mapped THPs cannot be collapsed in-place:
collapsing (e.g., via MADV_COLLAPSE) implies allocating a fresh THP and
mapping that new THP via a PMD: as it's a fresh anon THP, it will get the
exclusive flag set on the head page and everybody is happy.
However, if the kernel would ever support in-place collapse of anonymous
THPs (replacing a page table mapping each sub-page of a THP via PTEs with a
single PMD mapping the complete THP), exclusivity information stored for
each sub-page would have to be collapsed accordingly:
(1) All PTEs map !exclusive anon sub-pages: the in-place collapsed THP
must not not have the exclusive flag set on the head page mapped by
the PMD. This is the easiest case to handle ("simply don't set any
exclusive flags").
(2) All PTEs map exclusive anon sub-pages: when collapsing, we have to
clear the exclusive flag from all tail pages and only leave the
exclusive flag set for the head page. Otherwise, fork() after
collapse would not clear the exclusive flags from the tail pages
and we'd be in trouble once PTE-mapping the shared THP when writing
to shared tail pages that still have the exclusive flag set. This
would effectively revert what the PTE-mapping code does when
propagating the exclusive flag to all sub-pages.
(3) PTEs map a mixture of exclusive and !exclusive anon sub-pages (can
happen e.g., due to MADV_DONTFORK before fork()). We must not
collapse the THP in-place, otherwise bad things may happen:
the exclusive flags of sub-pages would get ignored and the
exclusive flag of the head page would get used instead.
Now that we have MADV_COLLAPSE in place to trigger collapsing a THP,
let's add some test cases that would bail out early, if we'd
voluntarily/accidantially unlock in-place collapse for anon THPs and
forget about taking proper care of exclusive flags.
Running the test on a kernel with MADV_COLLAPSE support:
# [INFO] Anonymous THP tests
# [RUN] Basic COW after fork() when collapsing before fork()
ok 169 No leak from parent into child
# [RUN] Basic COW after fork() when collapsing after fork() (fully shared)
ok 170 # SKIP MADV_COLLAPSE failed: Invalid argument
# [RUN] Basic COW after fork() when collapsing after fork() (lower shared)
ok 171 No leak from parent into child
# [RUN] Basic COW after fork() when collapsing after fork() (upper shared)
ok 172 No leak from parent into child
For now, MADV_COLLAPSE always seems to fail if all PTEs map shared
sub-pages.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Zach O'Keefe <zokeefe(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
A patch from Hugh made me explore the wonderful world of in-place collapse
of THP, and I was briefly concerned that it would apply to anon THP as
well. After thinking about it a bit, I decided to add test cases, to better
be safe than sorry in any case, and to document how PG_anon_exclusive is to
be handled in that case.
---
tools/testing/selftests/vm/cow.c | 228 +++++++++++++++++++++++++++++++
1 file changed, 228 insertions(+)
diff --git a/tools/testing/selftests/vm/cow.c b/tools/testing/selftests/vm/cow.c
index 26f6ea3079e2..16216d893d96 100644
--- a/tools/testing/selftests/vm/cow.c
+++ b/tools/testing/selftests/vm/cow.c
@@ -30,6 +30,10 @@
#include "../kselftest.h"
#include "vm_util.h"
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
+
static size_t pagesize;
static int pagemap_fd;
static size_t thpsize;
@@ -1178,6 +1182,228 @@ static int tests_per_anon_test_case(void)
return tests;
}
+enum anon_thp_collapse_test {
+ ANON_THP_COLLAPSE_UNSHARED,
+ ANON_THP_COLLAPSE_FULLY_SHARED,
+ ANON_THP_COLLAPSE_LOWER_SHARED,
+ ANON_THP_COLLAPSE_UPPER_SHARED,
+};
+
+static void do_test_anon_thp_collapse(char *mem, size_t size,
+ enum anon_thp_collapse_test test)
+{
+ struct comm_pipes comm_pipes;
+ char buf;
+ int ret;
+
+ ret = setup_comm_pipes(&comm_pipes);
+ if (ret) {
+ ksft_test_result_fail("pipe() failed\n");
+ return;
+ }
+
+ /*
+ * Trigger PTE-mapping the THP by temporarily mapping a single subpage
+ * R/O, such that we can try collapsing it later.
+ */
+ ret = mprotect(mem + pagesize, pagesize, PROT_READ);
+ if (ret) {
+ ksft_test_result_fail("mprotect() failed\n");
+ goto close_comm_pipes;
+ }
+ ret = mprotect(mem + pagesize, pagesize, PROT_READ | PROT_WRITE);
+ if (ret) {
+ ksft_test_result_fail("mprotect() failed\n");
+ goto close_comm_pipes;
+ }
+
+ switch (test) {
+ case ANON_THP_COLLAPSE_UNSHARED:
+ /* Collapse before actually COW-sharing the page. */
+ ret = madvise(mem, size, MADV_COLLAPSE);
+ if (ret) {
+ ksft_test_result_skip("MADV_COLLAPSE failed: %s\n",
+ strerror(errno));
+ goto close_comm_pipes;
+ }
+ break;
+ case ANON_THP_COLLAPSE_FULLY_SHARED:
+ /* COW-share the full PTE-mapped THP. */
+ break;
+ case ANON_THP_COLLAPSE_LOWER_SHARED:
+ /* Don't COW-share the upper part of the THP. */
+ ret = madvise(mem + size / 2, size / 2, MADV_DONTFORK);
+ if (ret) {
+ ksft_test_result_fail("MADV_DONTFORK failed\n");
+ goto close_comm_pipes;
+ }
+ break;
+ case ANON_THP_COLLAPSE_UPPER_SHARED:
+ /* Don't COW-share the lower part of the THP. */
+ ret = madvise(mem, size / 2, MADV_DONTFORK);
+ if (ret) {
+ ksft_test_result_fail("MADV_DONTFORK failed\n");
+ goto close_comm_pipes;
+ }
+ break;
+ default:
+ assert(false);
+ }
+
+ ret = fork();
+ if (ret < 0) {
+ ksft_test_result_fail("fork() failed\n");
+ goto close_comm_pipes;
+ } else if (!ret) {
+ switch (test) {
+ case ANON_THP_COLLAPSE_UNSHARED:
+ case ANON_THP_COLLAPSE_FULLY_SHARED:
+ exit(child_memcmp_fn(mem, size, &comm_pipes));
+ break;
+ case ANON_THP_COLLAPSE_LOWER_SHARED:
+ exit(child_memcmp_fn(mem, size / 2, &comm_pipes));
+ break;
+ case ANON_THP_COLLAPSE_UPPER_SHARED:
+ exit(child_memcmp_fn(mem + size / 2, size / 2,
+ &comm_pipes));
+ break;
+ default:
+ assert(false);
+ }
+ }
+
+ while (read(comm_pipes.child_ready[0], &buf, 1) != 1)
+ ;
+
+ switch (test) {
+ case ANON_THP_COLLAPSE_UNSHARED:
+ break;
+ case ANON_THP_COLLAPSE_UPPER_SHARED:
+ case ANON_THP_COLLAPSE_LOWER_SHARED:
+ /*
+ * Revert MADV_DONTFORK such that we merge the VMAs and are
+ * able to actually collapse.
+ */
+ ret = madvise(mem, size, MADV_DOFORK);
+ if (ret) {
+ ksft_test_result_fail("MADV_DOFORK failed\n");
+ write(comm_pipes.parent_ready[1], "0", 1);
+ wait(&ret);
+ goto close_comm_pipes;
+ }
+ /* FALLTHROUGH */
+ case ANON_THP_COLLAPSE_FULLY_SHARED:
+ /* Collapse before anyone modified the COW-shared page. */
+ ret = madvise(mem, size, MADV_COLLAPSE);
+ if (ret) {
+ ksft_test_result_skip("MADV_COLLAPSE failed: %s\n",
+ strerror(errno));
+ write(comm_pipes.parent_ready[1], "0", 1);
+ wait(&ret);
+ goto close_comm_pipes;
+ }
+ break;
+ default:
+ assert(false);
+ }
+
+ /* Modify the page. */
+ memset(mem, 0xff, size);
+ write(comm_pipes.parent_ready[1], "0", 1);
+
+ wait(&ret);
+ if (WIFEXITED(ret))
+ ret = WEXITSTATUS(ret);
+ else
+ ret = -EINVAL;
+
+ ksft_test_result(!ret, "No leak from parent into child\n");
+close_comm_pipes:
+ close_comm_pipes(&comm_pipes);
+}
+
+static void test_anon_thp_collapse_unshared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_UNSHARED);
+}
+
+static void test_anon_thp_collapse_fully_shared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_FULLY_SHARED);
+}
+
+static void test_anon_thp_collapse_lower_shared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_LOWER_SHARED);
+}
+
+static void test_anon_thp_collapse_upper_shared(char *mem, size_t size)
+{
+ do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_UPPER_SHARED);
+}
+
+/*
+ * Test cases that are specific to anonymous THP: pages in private mappings
+ * that may get shared via COW during fork().
+ */
+static const struct test_case anon_thp_test_cases[] = {
+ /*
+ * Basic COW test for fork() without any GUP when collapsing a THP
+ * before fork().
+ *
+ * Re-mapping a PTE-mapped anon THP using a single PMD ("in-place
+ * collapse") might easily get COW handling wrong when not collapsing
+ * exclusivity information properly.
+ */
+ {
+ "Basic COW after fork() when collapsing before fork()",
+ test_anon_thp_collapse_unshared,
+ },
+ /* Basic COW test, but collapse after COW-sharing a full THP. */
+ {
+ "Basic COW after fork() when collapsing after fork() (fully shared)",
+ test_anon_thp_collapse_fully_shared,
+ },
+ /*
+ * Basic COW test, but collapse after COW-sharing the lower half of a
+ * THP.
+ */
+ {
+ "Basic COW after fork() when collapsing after fork() (lower shared)",
+ test_anon_thp_collapse_lower_shared,
+ },
+ /*
+ * Basic COW test, but collapse after COW-sharing the upper half of a
+ * THP.
+ */
+ {
+ "Basic COW after fork() when collapsing after fork() (upper shared)",
+ test_anon_thp_collapse_upper_shared,
+ },
+};
+
+static void run_anon_thp_test_cases(void)
+{
+ int i;
+
+ if (!thpsize)
+ return;
+
+ ksft_print_msg("[INFO] Anonymous THP tests\n");
+
+ for (i = 0; i < ARRAY_SIZE(anon_thp_test_cases); i++) {
+ struct test_case const *test_case = &anon_thp_test_cases[i];
+
+ ksft_print_msg("[RUN] %s\n", test_case->desc);
+ do_run_with_thp(test_case->fn, THP_RUN_PMD);
+ }
+}
+
+static int tests_per_anon_thp_test_case(void)
+{
+ return thpsize ? 1 : 0;
+}
+
typedef void (*non_anon_test_fn)(char *mem, const char *smem, size_t size);
static void test_cow(char *mem, const char *smem, size_t size)
@@ -1518,6 +1744,7 @@ int main(int argc, char **argv)
ksft_print_header();
ksft_set_plan(ARRAY_SIZE(anon_test_cases) * tests_per_anon_test_case() +
+ ARRAY_SIZE(anon_thp_test_cases) * tests_per_anon_thp_test_case() +
ARRAY_SIZE(non_anon_test_cases) * tests_per_non_anon_test_case());
gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
@@ -1526,6 +1753,7 @@ int main(int argc, char **argv)
ksft_exit_fail_msg("opening pagemap failed\n");
run_anon_test_cases();
+ run_anon_thp_test_cases();
run_non_anon_test_cases();
err = ksft_get_fail_cnt();
--
2.39.0
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1]. I'm about to send a v2
of that series out that incorporates the vDSO function.
I was asked about the performance delta between this and something like
sysfs. I created a small test program [2] and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: .0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
[1] https://public-inbox.org/libc-alpha/20230206194819.1679472-1-evan@rivosinc.…
[2] https://pastebin.com/x84NEKaS
Changes in v4:
- Used real types in syscall prototypes (Arnd)
- Fixed static line break in do_riscv_hwprobe() (Conor)
- Added newlines between documentation lists (Conor)
- Crispen up size types to size_t, and cpu indices to int (Joe)
- Fix copy_from_user() return logic bug (found via kselftests!)
- Add __user to SYSCALL_DEFINE() to fix warning
- More newlines in BASE_BEHAVIOR_IMA documentation (Conor)
- Add newlines to CPUPERF_0 documentation (Conor)
- Add UNSUPPORTED value (Conor)
- Switched from DT to alternatives-based probing (Rob)
- Crispen up cpu index type to always be int (Conor)
- Fixed selftests commit description, no more tiny libc (Mark Brown)
- Fixed selftest syscall prototype types to match v4.
- Added a prototype to fix -Wmissing-prototype warning (lkp(a)intel.com)
- Fixed rv32 build failure (lkp(a)intel.com)
- Make vdso prototype match syscall types update
Changes in v3:
- Updated copyright date in cpufeature.h
- Fixed typo in cpufeature.h comment (Conor)
- Refactored functions so that kernel mode can query too, in
preparation for the vDSO data population.
- Changed the vendor/arch/imp IDs to return a value of -1 on mismatch
rather than failing the whole call.
- Const cpumask pointer in hwprobe_mid()
- Embellished documentation WRT cpu_set and the returned values.
- Renamed hwprobe_mid() to hwprobe_arch_id() (Conor)
- Fixed machine ID doc warnings, changed elements to c:macro:.
- Completed dangling unistd.h comment (Conor)
- Fixed line breaks and minor logic optimization (Conor).
- Use riscv_cached_mxxxid() (Conor)
- Refactored base ISA behavior probe to allow kernel probing as well,
in prep for vDSO data initialization.
- Fixed doc warnings in IMA text list, use :c:macro:.
- Have hwprobe_misaligned return int instead of long.
- Constify cpumask pointer in hwprobe_misaligned()
- Fix warnings in _PERF_O list documentation, use :c:macro:.
- Move include cpufeature.h to misaligned patch.
- Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
- Use for_each_possible_cpu() instead of NR_CPUS (Conor)
- Break early in misaligned access iteration (Conor)
- Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
value (Conor)
- Introduced vDSO function
Changes in v2:
- Factored the move of struct riscv_cpuinfo to its own header
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.
Evan Green (6):
RISC-V: Move struct riscv_cpuinfo to new header
RISC-V: Add a syscall for HW probing
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: hwprobe: Support probing of misaligned access performance
selftests: Test the new RISC-V hwprobe interface
RISC-V: Add hwprobe vDSO function and data
Documentation/riscv/hwprobe.rst | 86 +++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/Kconfig | 1 +
arch/riscv/errata/thead/errata.c | 9 +
arch/riscv/include/asm/alternative.h | 5 +
arch/riscv/include/asm/cpufeature.h | 23 ++
arch/riscv/include/asm/hwprobe.h | 13 +
arch/riscv/include/asm/syscall.h | 4 +
arch/riscv/include/asm/vdso/data.h | 17 ++
arch/riscv/include/asm/vdso/gettimeofday.h | 8 +
arch/riscv/include/uapi/asm/hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/unistd.h | 9 +
arch/riscv/kernel/alternative.c | 19 ++
arch/riscv/kernel/cpu.c | 8 +-
arch/riscv/kernel/cpufeature.c | 3 +
arch/riscv/kernel/smpboot.c | 1 +
arch/riscv/kernel/sys_riscv.c | 225 +++++++++++++++++-
arch/riscv/kernel/vdso.c | 6 -
arch/riscv/kernel/vdso/Makefile | 2 +
arch/riscv/kernel/vdso/hwprobe.c | 52 ++++
arch/riscv/kernel/vdso/sys_hwprobe.S | 15 ++
arch/riscv/kernel/vdso/vdso.lds.S | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 +++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +
.../testing/selftests/riscv/hwprobe/hwprobe.c | 90 +++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +
27 files changed, 703 insertions(+), 13 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/cpufeature.h
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/asm/vdso/data.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
create mode 100644 arch/riscv/kernel/vdso/hwprobe.c
create mode 100644 arch/riscv/kernel/vdso/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
--
2.25.1
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cced everyone to all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they almost all
depend on each other.
Changes from v1 (https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org):
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
Stephen Boyd (11):
of: Load KUnit DTB from of_core_init()
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: kunit: Add fixed rate clk consumer test
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add KUnit clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
.../clock/test,clk-kunit-parent-data.yaml | 47 ++
.../kunit/test,clk-kunit-fixed-rate.yaml | 35 ++
.../bindings/test/test,kunit-empty.yaml | 30 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform_kunit-test.c | 108 +++++
drivers/base/test/platform_kunit.c | 186 +++++++
drivers/clk/.kunitconfig | 3 +
drivers/clk/Kconfig | 7 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 299 ++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit.c | 219 +++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 459 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 5 +
drivers/of/Kconfig | 23 +
drivers/of/Makefile | 7 +
drivers/of/base.c | 182 +++++++
drivers/of/kunit.dtso | 10 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit.c | 123 +++++
drivers/of/of_private.h | 6 +
drivers/of/of_test.c | 43 ++
drivers/of/overlay_test.c | 107 ++++
drivers/of/unittest.c | 101 +---
include/kunit/clk.h | 28 ++
include/kunit/of.h | 90 ++++
include/kunit/platform_device.h | 15 +
31 files changed, 2119 insertions(+), 102 deletions(-)
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-kunit-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/kunit/test,clk-kunit-fixed-rate.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,kunit-empty.yaml
create mode 100644 drivers/base/test/platform_kunit-test.c
create mode 100644 drivers/base/test/platform_kunit.c
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/.kunitconfig
create mode 100644 drivers/of/kunit.dtso
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit.c
create mode 100644 drivers/of/of_test.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git