Use ARRAY_SIZE instead of dividing sizeof(array) by sizeof(element).
Clean up the following coccicheck warning:
./tools/testing/selftests/x86/syscall_numbering.c:316:35-36: WARNING:
Use ARRAY_SIZE.
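For reference, ARRAY_SIZE is provided by the tools/selftests headers and is
defined roughly as follows (a sketch of the common definition; the kernel
variant also adds a must-be-array type check):

  #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))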
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
tools/testing/selftests/x86/syscall_numbering.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/syscall_numbering.c b/tools/testing/selftests/x86/syscall_numbering.c
index 9915917..7d5e246 100644
--- a/tools/testing/selftests/x86/syscall_numbering.c
+++ b/tools/testing/selftests/x86/syscall_numbering.c
@@ -313,7 +313,7 @@ static void test_syscall_numbering(void)
* The MSB is supposed to be ignored, so we loop over a few
* to test that out.
*/
- for (size_t i = 0; i < sizeof(msbs)/sizeof(msbs[0]); i++) {
+ for (size_t i = 0; i < ARRAY_SIZE(msbs); i++) {
int msb = msbs[i];
run("Checking system calls with msb = %d (0x%x)\n",
msb, msb);
--
1.8.3.1
(39fe2fc96694 "selftests: kvm: make allocation of extra memory take effect")
changed the meaning of extra_mem_pages and treated it as slot0 memory size.
In fact extra_mem_pages is used for non-slot0 memory size, there is no custom
slot0 memory size support. See discuss in https://lkml.org/lkml/2021/6/3/551
for more details.
This patchset restores the original meaning of extra_mem_pages and adds
support for a customized slot0 memory size via a new parameter,
slot0_mem_pages.
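As a sketch of the intended usage (the call below is illustrative of this
patchset's API; argument order and values are examples, not a stable
interface):

  /* Create a VM with 512 pages of slot0 memory plus extra test memory. */
  struct kvm_vm *vm;

  vm = vm_create_with_vcpus(VM_MODE_DEFAULT, nr_vcpus,
                            512,              /* slot0_mem_pages */
                            extra_mem_pages,  /* non-slot0 test memory */
                            0,                /* num_percpu_pages */
                            guest_code, NULL);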
Running the command below, all 39 tests passed.
# make -C tools/testing/selftests/ TARGETS=kvm run_tests
Zhenzhong Duan (3):
Revert "selftests: kvm: make allocation of extra memory take effect"
Revert "selftests: kvm: fix overlapping addresses in
memslot_perf_test"
selftests: kvm: Add support for customized slot0 memory size
.../testing/selftests/kvm/include/kvm_util.h | 7 +--
.../selftests/kvm/kvm_page_table_test.c | 2 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 47 +++++++++++++++----
.../selftests/kvm/lib/perf_test_util.c | 2 +-
.../testing/selftests/kvm/memslot_perf_test.c | 2 +-
5 files changed, 45 insertions(+), 15 deletions(-)
--
2.25.1
The SRv6 End.DT46 Behavior is defined in IETF RFC 8986 [1], along with the
SRv6 End.DT4 and End.DT6 Behaviors.
The proposed End.DT46 implementation is meant to support the decapsulation
of both IPv4 and IPv6 traffic coming from a *single* SRv6 tunnel.
The SRv6 End.DT46 Behavior greatly simplifies the setup and operations of
SRv6 VPNs in the Linux kernel.
- patch 1/2 is the core patch that adds support for the SRv6 End.DT46
Behavior;
- patch 2/2 adds the selftest for SRv6 End.DT46 Behavior.
The patch introducing the new SRv6 End.DT46 Behavior in iproute2 will
follow shortly.
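For illustration, configuring the new behavior should look much like the
existing End.DT4/End.DT6 setup (the command below is a sketch; the exact
syntax is defined by the follow-up iproute2 patch):

  # Sketch: decapsulate both IPv4 and IPv6 inner traffic into VRF table 100
  ip -6 route add 2001:db8::1/128 encap seg6local action End.DT46 \
          vrftable 100 dev vrf0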
Comments, suggestions and improvements are very welcome as always!
Thanks,
Andrea
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
Andrea Mayer (2):
seg6: add support for SRv6 End.DT46 Behavior
selftests: seg6: add selftest for SRv6 End.DT46 Behavior
include/uapi/linux/seg6_local.h | 2 +
net/ipv6/seg6_local.c | 94 ++-
.../selftests/net/srv6_end_dt46_l3vpn_test.sh | 573 ++++++++++++++++++
3 files changed, 647 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
--
2.20.1
Hello,
I plan to contribute to n_gsm for better support of 3GPP TS 27.010.
Is there any test framework/suite for this module that I can use to avoid regressions?
I could not find anything within the kselftest framework.
And if there is nothing already available: what would be the best place for this?
With best regards,
Daniel Starke
Subject: Re: [PATCH v4 07/15] docs: locking: futex2: Add documentation
In-Reply-To: <20210603195924.361327-8-andrealmeid(a)collabora.com>
On Thu, 03 Jun 2021, André Almeida wrote:
>Add a new documentation file specifying both userspace API and internal
>implementation details of futex2 syscalls.
I think equally important would be to provide a manpage for each new
syscall you are introducing, and to keep mkt in the loop; in the past he
extensively documented and improved the futex manpages, and overall he has
a lot of experience dealing with kernel interfaces.
Thanks,
Davidlohr
>
>Signed-off-by: André Almeida <andrealmeid(a)collabora.com>
>---
> Documentation/locking/futex2.rst | 198 +++++++++++++++++++++++++++++++
> Documentation/locking/index.rst | 1 +
> 2 files changed, 199 insertions(+)
> create mode 100644 Documentation/locking/futex2.rst
>
>diff --git a/Documentation/locking/futex2.rst b/Documentation/locking/futex2.rst
>new file mode 100644
>index 000000000000..2f74d7c97a55
>--- /dev/null
>+++ b/Documentation/locking/futex2.rst
>@@ -0,0 +1,198 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+======
>+futex2
>+======
>+
>+:Author: André Almeida <andrealmeid(a)collabora.com>
>+
>+futex, or fast user mutex, is a set of syscalls to allow userspace to create
>+performant synchronization mechanisms, such as mutexes, semaphores and
>+condition variables in userspace. C standard libraries, like glibc, use it
>+as a means to implement higher-level interfaces such as pthreads.
>+
>+The interface
>+=============
>+
>+uAPI functions
>+--------------
>+
>+.. kernel-doc:: kernel/futex2.c
>+ :identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue
>+
>+uAPI structures
>+---------------
>+
>+.. kernel-doc:: include/uapi/linux/futex.h
>+
>+The ``flag`` argument
>+---------------------
>+
>+The flag is used to specify the size of the futex word
>+(FUTEX_[8, 16, 32, 64]). It's mandatory to define one, since there's no
>+default size.
>+
>+By default, the timeout uses a monotonic clock; a realtime clock can be
>+used instead by setting the FUTEX_REALTIME_CLOCK flag.
>+
>+By default, futexes are of the private type, meaning that the user address
>+will only be accessed by threads that share the same memory region. This
>+allows for some internal optimizations, so they are faster. However, if the
>+address needs to be shared with different processes (e.g. when using
>+``mmap()`` or ``shm()``), the futexes need to be defined as shared, which is
>+done by setting the FUTEX_SHARED_FLAG flag.
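As an illustration, a hypothetical call combining these flags (the wrapper
signature below is inferred from this document and is not part of the patch):

  /* Wait on a shared 32-bit futex, timing out on a realtime clock. */
  futex_wait(uaddr, expected_val,
             FUTEX_32 | FUTEX_SHARED_FLAG | FUTEX_REALTIME_CLOCK,
             &abs_timeout);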
>+
>+By default, the operation has no NUMA-awareness, meaning that the user can't
>+choose the memory node where the kernel-side futex data will be stored. The
>+user can choose the node where it wants to operate by setting the
>+FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, 32 or
>+64)::
>+
>+ struct futexX_numa {
>+ __uX value;
>+ __sX hint;
>+ };
>+
>+This structure should be passed as the ``void *uaddr`` of the futex
>+functions. The address of the structure is the one waited on/woken on, and the
>+``value`` will be compared to ``val`` as usual. The ``hint`` member is used to
>+define which node the futex will use. When waiting, the futex will be
>+registered on a kernel-side table stored on that node; when waking, the futex
>+will be searched for on that given table. That means that there's no redundancy
>+between tables, and the wrong ``hint`` value will lead to undesired behavior.
>+Userspace is responsible for dealing with node migration issues that may
>+occur. ``hint`` can range from [0, MAX_NUMA_NODES), for specifying a node, or
>+-1, to use the same node the current process is using.
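Putting the pieces above together, a minimal sketch (the futex_wait() call is
illustrative; the structure is the 32-bit variant described above):

  struct futex32_numa f = {
          .value = 0,   /* the futex word, compared against val as usual */
          .hint  = -1,  /* -1: use the node of the current process */
  };

  futex_wait(&f, 0, FUTEX_32 | FUTEX_NUMA_FLAG, &timeout);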
>+
>+When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored in
>+a global table allocated on the first node.
>+
>+The ``timo`` argument
>+---------------------
>+
>+As per the Y2038 work done in the kernel, new interfaces shouldn't add
>+timeout options known to be buggy. Given that, ``timo`` should be a 64-bit
>+timeout on all platforms, using an absolute timeout value.
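For example, a userspace caller could build the absolute timeout along these
lines (a sketch; futex_wait() as above):

  struct timespec ts;

  clock_gettime(CLOCK_MONOTONIC, &ts);  /* matches the default clock */
  ts.tv_sec += 5;                       /* absolute timeout: now + 5s */
  futex_wait(uaddr, val, FUTEX_32, &ts);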
>+
>+Implementation
>+==============
>+
>+The internal implementation follows a similar design to the original futex.
>+Given that we want to replicate the same external behavior of current futex,
>+this should be somewhat expected.
>+
>+Waiting
>+-------
>+
>+All wait operations are treated as if you want to wait on N futexes, so the
>+path for futex_wait() and futex_waitv() is basically the same.
>+For both syscalls, the first step is to prepare an internal list of the
>+futexes to wait for (using struct futexv_head). For futex_wait() calls, this
>+list will have a single object.
>+
>+We have a hash table, where waiters register themselves before sleeping. Then
>+the wake function checks this table looking for waiters at uaddr. The hash
>+bucket to be used is determined by a struct futex_key, that stores information
>+to uniquely identify an address from a given process. Given the huge address
>+space, there will be hash collisions, so we store information to be used
>+later for collision handling.
>+
>+First, for every futex we want to wait on, we check if (``*uaddr == val``).
>+This check is done holding the bucket lock, so we are correctly serialized with
>+any futex_wake() calls. If any waiter fails the check above, we dequeue all
>+futexes. The check (``*uaddr == val``) can fail for two reasons:
>+
>+- The values are different, and we return -EAGAIN. However, if while
>+ dequeueing we find that some futexes were awakened, we prioritize this
>+ and return success.
>+
>+- When trying to access the user address, we do so with page faults
>+ disabled because we are holding a bucket's spin lock (and can't sleep
>+ while holding a spin lock). If there's an error, it might be a page
>+ fault, or an invalid address. We release the lock, dequeue everyone
>+ (because it's illegal to sleep while there are futexes enqueued, we
>+ could lose wakeups) and try again with page fault enabled. If we
>+ succeed, this means that the address is valid, but we need to do
>+ all the work again. For serialization reasons, we need to have the
>+ spin lock when getting the user value. Additionally, for shared
>+ futexes, we also need to recalculate the hash, since the underlying
>+ mapping mechanisms could have changed when dealing with page fault.
>+ If, even with page faults enabled, we can't access the address, it
>+ means it's an invalid user address, and we return -EFAULT. For this
>+ case, we prioritize the error, even if some futexes were awakened.
>+
>+If the check succeeds, the futex is enqueued on a linked list in its bucket,
>+and we proceed to the next one. If all waiters succeed, we put the thread to sleep
>+until a futex_wake() call, timeout expires or we get a signal. After waking up,
>+we dequeue everyone, and check if some futex was awakened. This dequeue is done
>+by iteratively walking each element of the struct futexv_head list.
>+
>+All enqueuing/dequeuing operations require holding the bucket lock, to avoid
>+races while modifying the list.
>+
>+Waking
>+------
>+
>+We get the bucket that's storing the waiters at uaddr, and wake the required
>+number of waiters, checking for hash collision.
>+
>+There's an optimization that makes futex_wake() not take the bucket lock if
>+there's no one to be woken on that bucket. It checks an atomic counter that
>+each bucket has; if it is 0, the syscall exits. In order for this to work, the
>+waiter thread increments it before taking the lock, so the wake thread will
>+correctly see that there's someone waiting and will continue the path to take
>+the bucket lock. To get the correct serialization, the waiter issues a memory
>+barrier after increasing the bucket counter and the waker issues a memory
>+barrier before checking it.
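The pairing described above can be sketched as follows (pseudo-kernel C;
the field names are illustrative):

  /* Waiter side */
  atomic_inc(&bucket->waiters);
  smp_mb();                     /* pairs with the barrier in the waker */
  spin_lock(&bucket->lock);
  /* ... check *uaddr == val, enqueue, sleep ... */

  /* Waker side */
  smp_mb();                     /* pairs with the barrier in the waiter */
  if (!atomic_read(&bucket->waiters))
          return 0;             /* nobody to wake; skip taking the lock */
  spin_lock(&bucket->lock);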
>+
>+Requeuing
>+---------
>+
>+The requeue path first checks each struct futex_requeue and its flags. Then,
>+it will compare the expected value with the one at uaddr1::uaddr.
>+Following the same serialization explained at Waking_, we increment the atomic
>+counter for the bucket of uaddr2 before taking the lock. We need to hold both
>+bucket locks at the same time so we don't race with other futex operations. To
>+ensure the locks are taken in the same order for all threads (and thus avoiding
>+deadlocks), every requeue operation takes the "smaller" bucket first, when
>+comparing both addresses.
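That ordering rule can be sketched as follows (illustrative):

  /* Always take the "smaller" bucket lock first to avoid ABBA deadlocks. */
  if (bucket1 <= bucket2) {
          spin_lock(&bucket1->lock);
          if (bucket2 != bucket1)
                  spin_lock(&bucket2->lock);
  } else {
          spin_lock(&bucket2->lock);
          spin_lock(&bucket1->lock);
  }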
>+
>+If the compare with user value succeeds, we proceed by waking ``nr_wake``
>+futexes, and then requeuing ``nr_requeue`` futexes from the bucket of uaddr1
>+to that of uaddr2. This consists of a simple list deletion/addition and
>+replacing the old futex key
>+with the new one.
>+
>+Futex keys
>+----------
>+
>+There are two types of futexes: private and shared ones. Private futexes are
>+meant to be used by threads that share the same memory space; they are easier
>+to uniquely identify and thus allow some performance optimizations. The
>+elements for identifying one are: the start address of the page where the
>+address is, the address offset within the page and the current->mm pointer.
>+
>+Now, for uniquely identifying a shared futex:
>+
>+- If the page containing the user address is an anonymous page, we can
>+ just use the same data used for private futexes (the start address of
>+ the page, the address offset within the page and the current->mm
>+ pointer); that will be enough for uniquely identifying such futex. We
>+ also set one bit in the key to differentiate whether a private futex is
>+ used on the same address (mixing shared and private calls does not
>+ work).
>+
>+- If the page is file-backed, current->mm may not be the same for
>+ every user of this futex, so we need to use other data: the
>+ page->index, a UUID for the struct inode and the offset within the
>+ page.
>+
>+Note that members of futex_key don't have any particular meaning after they
>+are part of the struct - they are just bytes to identify a futex. Given that,
>+we don't need to use a particular name or type that matches the original data;
>+we only need to care about the bitsize of each component and make both private
>+and shared fit in the same memory space.
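Purely as an illustration of that idea (not the actual layout used by the
patch):

  /*
   * Illustrative futex_key sketch: private futexes fill it with
   * (page base address, offset within page, mm pointer); shared
   * file-backed futexes reuse the same bytes for (inode UUID,
   * page->index, offset within page).
   */
  struct futex_key {
          u64 base;    /* page base address, or inode UUID */
          u32 index;   /* mm-derived value, or page->index */
          u32 offset;  /* offset within page; one bit marks shared */
  };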
>+
>+Source code documentation
>+=========================
>+
>+.. kernel-doc:: kernel/futex2.c
>+ :no-identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue
>diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
>index 7003bd5aeff4..9bf03c7fa1ec 100644
>--- a/Documentation/locking/index.rst
>+++ b/Documentation/locking/index.rst
>@@ -24,6 +24,7 @@ locking
> percpu-rw-semaphore
> robust-futexes
> robust-futex-ABI
>+ futex2
>
> .. only:: subproject and html
>
>--
>2.31.1
>
Add a libbpf dumper function that supports dumping a representation of
data passed in, using the BTF id associated with the data, in a manner
similar to the bpf_snprintf_btf helper.
The default output format is identical to that dumped by bpf_snprintf_btf()
(bar using tabs instead of spaces for indentation; the indent string can
also be customized). For example, a "struct sk_buff" representation would
look like this:
(struct sk_buff){
	(union){
		(struct){
			.next = (struct sk_buff *)0xffffffffffffffff,
			.prev = (struct sk_buff *)0xffffffffffffffff,
			(union){
				.dev = (struct net_device *)0xffffffffffffffff,
				.dev_scratch = (long unsigned int)18446744073709551615,
			},
		},
...
Patch 1 implements the dump functionality in a manner similar
to that in kernel/bpf/btf.c, but with a view to fitting into
libbpf more naturally. For example, rather than using flags,
boolean dump options are used to control output. In addition,
rather than combining checks for display (such as is this
field zero?) and actual display - as is done for the kernel
code - the code is organized to separate zero and overflow
checks from type display.
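A minimal usage sketch (assuming the opts and error conventions described
here; 'd' is an existing struct btf_dump and 'id' is the BTF type id of
'data'):

  DECLARE_LIBBPF_OPTS(btf_dump_type_data_opts, opts,
                      .indent_str = "  ");  /* 2-space indent, not tab */

  int ret = btf_dump__dump_type_data(d, id, data, data_sz, &opts);

  if (ret == -E2BIG) {
          /* data_sz was smaller than the full type; partial data was
           * emitted, so the caller may append "..." or similar.
           */
  }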
Patch 2 consists of selftests that utilize a dump printf function
to snprintf the dump output to a string for comparison with
expected output. Tests deliberately mirror those in the
snprintf_btf helper test to keep output consistent, but
also cover overflow handling and var/section display.
Changes since v3 [1]
- Retained the separation of emitting the type name cast that prefixes
type values from existing functionality such as btf_dump_emit_type_chain(),
since the initial code-shared version had so many exceptions it became
hard to read. For example, we don't emit a type name if the type
to be displayed is an array member; we also always emit "forward"
definitions for structs/unions that aren't really forward definitions
(we just want a "struct foo" output for "(struct foo){.bar = ...}").
We also always ignore the modifiers const/volatile/restrict as they
clutter output when emitting large types.
- Added configurable 4-char indent string option; defaults to tab
(Andrii)
- Added support for BTF_KIND_FLOAT and associated tests (Andrii)
- Added support for BTF_KIND_FUNC_PROTO function pointers to
improve output of "ops" structures; for example:
(struct file_operations){
	.owner = (struct module *)0xffffffffffffffff,
	.llseek = (loff_t(*)(struct file *, loff_t, int))0xffffffffffffffff,
	...
Added associated test also (Andrii)
- Added handling for enum bitfields and associated test (Andrii)
- Allocation of "struct btf_dump_data" done on-demand (Andrii)
- Removed ".field = " output from function emitting type name and
into caller (Andrii)
- Removed BTF_INT_OFFSET() support (Andrii)
- Use libbpf_err() to set errno for error cases (Andrii)
- btf_dump_dump_type_data() returns size written, which is used
when returning successfully from btf_dump__dump_type_data()
(Andrii)
Changes since v2 [2]
- Renamed function to btf_dump__dump_type_data, reorganized
arguments such that opts are last (Andrii)
- Modified code to separate questions about display such
as have we overflowed?/is this field zero? from actual
display of typed data, such that we ask those questions
separately from the code that actually displays typed data
(Andrii)
- Reworked code to handle overflow - where we do not provide
enough data for the type we wish to display - by returning
-E2BIG and attempting to present as much data as possible.
Such a mode of operation allows for tracers which retrieve
partial data (such as first 1024 bytes of a
"struct task_struct" say), and want to display that partial
data, while also knowing that it is not the full type.
Such tracers can then denote this (perhaps via "..." or
similar).
- Explored reusing existing type emit functions, such as
passing in a type id stack with a single type id to
btf_dump_emit_type_chain() to support the display of
typed data where a "cast" is prepended to the data to
denote its type; "(int)1", "(struct foo){", etc.
However the task of emitting a
".field_name = (typecast)" did not match well with model
of walking the stack to display innermost types first
and made the resultant code harder to read. Added a
dedicated btf_dump_emit_type_name() function instead which
is only ~70 lines (Andrii)
- Various cleanups around bitfield macros, unneeded member
iteration macros, avoiding compiler complaints when
displaying int data by casting to long long, etc (Andrii)
- Use DECLARE_LIBBPF_OPTS() in defining opts for tests (Andrii)
- Added more type tests, overflow tests, var tests and
section tests.
Changes since RFC [3]
- The initial approach explored was to share the kernel code
with libbpf using #defines to paper over the different needs;
however it makes more sense to try and fit in with libbpf
code style for maintenance. A comment in the code points at
the implementation in kernel/bpf/btf.c and notes that any
issues found in it should be fixed there or vice versa;
mirroring the tests should help with this also
(Andrii)
[1] https://lore.kernel.org/bpf/1622131170-8260-1-git-send-email-alan.maguire@o…
[2] https://lore.kernel.org/bpf/1610921764-7526-1-git-send-email-alan.maguire@o…
[3] https://lore.kernel.org/bpf/1610386373-24162-1-git-send-email-alan.maguire@…
Alan Maguire (2):
libbpf: BTF dumper support for typed data
selftests/bpf: add dump type data tests to btf dump tests
tools/lib/bpf/btf.h | 22 +
tools/lib/bpf/btf_dump.c | 1008 ++++++++++++++++++++-
tools/lib/bpf/libbpf.map | 1 +
tools/testing/selftests/bpf/prog_tests/btf_dump.c | 638 +++++++++++++
4 files changed, 1667 insertions(+), 2 deletions(-)
--
1.8.3.1