The test_offload.py test fits in networking and bpf equally
well. We started adding more Python tests in networking
and some of the code in test_offload.py can be reused,
so move it to networking. Looks like it bit rotted over
time and some fixes are needed.
Admittedly more code could be extracted but I only had
the time for a minor cleanup :(
Jakub Kicinski (4):
selftests: move bpf-offload test from bpf to net
selftests: net: bpf_offload: wait for maps
selftests: net: declare section names for bpf_offload
selftests: net: reuse common code in bpf_offload
tools/testing/selftests/bpf/Makefile | 1 -
tools/testing/selftests/net/Makefile | 11 +-
.../test_offload.py => net/bpf_offload.py} | 138 +++++-------------
tools/testing/selftests/net/lib/py/nsim.py | 9 +-
.../sample_map_ret0.bpf.c} | 2 +-
.../sample_ret0.c => net/sample_ret0.bpf.c} | 3 +
6 files changed, 57 insertions(+), 107 deletions(-)
rename tools/testing/selftests/{bpf/test_offload.py => net/bpf_offload.py} (93%)
rename tools/testing/selftests/{bpf/progs/sample_map_ret0.c => net/sample_map_ret0.bpf.c} (96%)
rename tools/testing/selftests/{bpf/progs/sample_ret0.c => net/sample_ret0.bpf.c} (70%)
--
2.44.0
New version of the sleepable bpf_timer code, without BPF changes, as
they can now go through the HID tree independantly:
https://lore.kernel.org/all/20240315-hid-bpf-sleepable-v4-0-5658f2540564@ke…
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Jiri Kosina <jikos(a)kernel.org>
To: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <linux-input(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-doc(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v4:
- dropped the BPF changes, they can go independently in bpf-core
- dropped the HID-BPF integration tests with the sleppable timers,
I'll re-add them once both series (this and sleepable timers) are
merged
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (7):
HID: bpf/dispatch: regroup kfuncs definitions
HID: bpf: export hid_hw_output_report as a BPF kfunc
selftests/hid: add KASAN to the VM tests
selftests/hid: Add test for hid_bpf_hw_output_report
HID: bpf: allow to inject HID event from BPF
selftests/hid: add tests for hid_bpf_input_report
HID: bpf: allow to use bpf_timer_set_sleepable_cb() in tracing callbacks.
Documentation/hid/hid-bpf.rst | 2 +-
drivers/hid/bpf/hid_bpf_dispatch.c | 226 ++++++++++++++-------
drivers/hid/hid-core.c | 2 +
include/linux/hid_bpf.h | 3 +
tools/testing/selftests/hid/config.common | 1 +
tools/testing/selftests/hid/hid_bpf.c | 112 +++++++++-
tools/testing/selftests/hid/progs/hid.c | 46 +++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 6 +
8 files changed, 324 insertions(+), 74 deletions(-)
---
base-commit: 3e78a6c0d3e02e4cf881dc84c5127e9990f939d6
change-id: 20240314-b4-hid-bpf-new-funcs-ecf05d0ef870
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Currently, the sud_test expects the emulated syscall to return the
emulated syscall number. This assumption only works on architectures
were the syscall calling convention use the same register for syscall
number/syscall return value. This is not the case for RISC-V and thus
the return value must be also emulated using the provided ucontext.
Signed-off-by: Clément Léger <cleger(a)rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Acked-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
Changes in V2:
- Changes comment to be more explicit
- Use A7 syscall arg rather than hardcoding MAGIC_SYSCALL_1
---
.../selftests/syscall_user_dispatch/sud_test.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_test.c b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
index b5d592d4099e..d975a6767329 100644
--- a/tools/testing/selftests/syscall_user_dispatch/sud_test.c
+++ b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
@@ -158,6 +158,20 @@ static void handle_sigsys(int sig, siginfo_t *info, void *ucontext)
/* In preparation for sigreturn. */
SYSCALL_DISPATCH_OFF(glob_sel);
+
+ /*
+ * The tests for argument handling assume that `syscall(x) == x`. This
+ * is a NOP on x86 because the syscall number is passed in %rax, which
+ * happens to also be the function ABI return register. Other
+ * architectures may need to swizzle the arguments around.
+ */
+#if defined(__riscv)
+/* REG_A7 is not defined in libc headers */
+# define REG_A7 (REG_A0 + 7)
+
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A0] =
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A7];
+#endif
}
TEST(dispatch_and_return)
--
2.43.0
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v4 -> v5:
- Add 1st commit - flush id checks in udp_gro_receive segment which can be
backported by itself
- Add TCP measurements for the 5th commit
- Add flush id tests to ensure flush id logic is preserved in GRO
- Simplify gro_inet_flush by removing a branch
- v4:
https://lore.kernel.org/all/20240325182543.87683-1-richardbgobert@gmail.com/
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/all/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.com/
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (6):
net: gro: add flush check in udp_gro_receive_segment
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive
selftests/net: add flush id selftests
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 +-
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 93 ++++++++++++--
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +-
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 7 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 54 +-------
net/ipv4/fou_core.c | 9 +-
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 22 +---
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 31 +++--
net/ipv6/ip6_offload.c | 41 +++---
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 +-
tools/testing/selftests/net/gro.c | 144 ++++++++++++++++++++++
tools/testing/selftests/net/udpgro_fwd.sh | 10 +-
25 files changed, 336 insertions(+), 161 deletions(-)
--
2.36.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (14):
selftests/bpf: Add start_server_addr helper
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
selftests/bpf: Add function pointer for __start_server
selftests/bpf: Add start_server_setsockopt helper
selftests/bpf: Use start_server_setsockopt in sockopt_inherit
selftests/bpf: Use connect_to_fd in sockopt_inherit
selftests/bpf: Use start_server_* in test_tcp_check_syncookie
selftests/bpf: Use connect_to_addr in test_tcp_check_syncookie
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/network_helpers.c | 53 +++++++++----
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/sk_assign.c | 53 +------------
.../bpf/prog_tests/sockopt_inherit.c | 64 ++++------------
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++----------------
.../bpf/test_tcp_check_syncookie_user.c | 76 +++----------------
8 files changed, 85 insertions(+), 281 deletions(-)
--
2.40.1
This patch series provides workingset reporting of user pages in
lruvecs, of which coldness can be tracked by accessed bits and fd
references. However, the concept of workingset applies generically to
all types of memory, which could be kernel slab caches, discardable
userspace caches (databases), or CXL.mem. Therefore, data sources might
come from slab shrinkers, device drivers, or the userspace. IMO, the
kernel should provide a set of workingset interfaces that should be
generic enough to accommodate the various use cases, and be extensible
to potential future use cases. The current proposed interfaces are not
sufficient in that regard, but I would like to start somewhere, solicit
feedback, and iterate.
Use cases
==========
Job scheduling
For data center machines, workingset information allows the job scheduler
to right-size each job and land more jobs on the same host or NUMA node,
and in the case of a job with increasing workingset, policy decisions
can be made to migrate other jobs off the host/NUMA node, or oom-kill
the misbehaving job. If the job shape is very different from the machine
shape, knowing the workingset per-node can also help inform page
allocation policies.
Proactive reclaim
Workingset information allows the a container manager to proactively
reclaim memory while not impacting a job's performance. While PSI may
provide a reactive measure of when a proactive reclaim has reclaimed too
much, workingset reporting enables the policy to be more accurate and
flexible.
Ballooning (similar to proactive reclaim)
While this patch series does not extend the virtio-balloon device,
balloon policies benefit from workingset to more precisely determine
the size of the memory balloon. On desktops/laptops/mobile devices where
memory is scarce and overcommitted, the balloon sizing in multiple VMs
running on the same device can be orchestrated with workingset reports
from each one.
Promotion/Demotion
Similar to proactive reclaim, a workingset report enables demotion to a
slower tier of memory.
For promotion, the workingset report interfaces need to be extended to
report hotness and gather hotness information from the devices[1].
[1]
https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main
idea here is we break down the workingset per-node per-memcg into time
intervals (ms), e.g.
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
I realize this does not generalize well to hotness information, but I
lack the intuition for an abstraction that presents hotness in a useful
way. Based on a recent proposal for move_phys_pages[2], it seems like
userspace tiering software would like to move specific physical pages,
instead of informing the kernel "move x number of hot pages to y
device". Please advise.
[2]
https://lore.kernel.org/lkml/20240319172609.332900-1-gregory.price@memverge…
Implementation
==========
Currently, the reporting of user pages is based off of MGLRU, and
therefore requires CONFIG_LRU_GEN=y. We would benefit from more MGLRU
generations for a more fine-grained workingset report. I will make the
generation count configurable in the next version. The workingset
reporting mechanism is gated behind CONFIG_WORKINGSET_REPORT, and the
aging thread is behind CONFIG_WORKINGSET_REPORT_AGING.
--
Changes from RFC v2 -> RFC v3:
- Update to v6.8
- Added an aging kernel thread (gated behind config)
- Added basic selftests for sysfs interface files
- Track swapped out pages for reaccesses
- Refactoring and cleanup
- Dropped the virtio-balloon extension to make things manageable
Changes from RFC v1 -> RFC v2:
- Refactored the patchs into smaller pieces
- Renamed interfaces and functions from wss to wsr (Working Set Reporting)
- Fixed build errors when CONFIG_WSR is not set
- Changed working_set_num_bins to u8 for virtio-balloon
- Added support for per-NUMA node reporting for virtio-balloon
[rfc v1]
https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.co…
[rfc v2]
https://lore.kernel.org/linux-mm/20230621180454.973862-1-yuanchu@google.com/
Yuanchu Xie (8):
mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
mm: aggregate working set information into histograms
mm: use refresh interval to rate-limit workingset report aggregation
mm: report workingset during memory pressure driven scanning
mm: extend working set reporting to memcgs
mm: add per-memcg reaccess histogram
mm: add kernel aging thread for workingset reporting
mm: test system-wide workingset reporting
drivers/base/node.c | 3 +
include/linux/memcontrol.h | 5 +
include/linux/mmzone.h | 4 +
include/linux/workingset_report.h | 107 +++
mm/Kconfig | 15 +
mm/Makefile | 2 +
mm/internal.h | 45 ++
mm/memcontrol.c | 386 ++++++++-
mm/mmzone.c | 2 +
mm/vmscan.c | 95 ++-
mm/workingset.c | 9 +-
mm/workingset_report.c | 757 ++++++++++++++++++
mm/workingset_report_aging.c | 127 +++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
.../testing/selftests/mm/workingset_report.c | 315 ++++++++
.../testing/selftests/mm/workingset_report.h | 37 +
.../selftests/mm/workingset_report_test.c | 328 ++++++++
18 files changed, 2231 insertions(+), 10 deletions(-)
create mode 100644 include/linux/workingset_report.h
create mode 100644 mm/workingset_report.c
create mode 100644 mm/workingset_report_aging.c
create mode 100644 tools/testing/selftests/mm/workingset_report.c
create mode 100644 tools/testing/selftests/mm/workingset_report.h
create mode 100644 tools/testing/selftests/mm/workingset_report_test.c
--
2.44.0.396.g6e790dbe36-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- add two more patches.
- use log_err instead of ASSERT in v3.
- let send_recv_data return int as Martin suggested.
v2:
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (4):
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
selftests/bpf: Support nonblock for send_recv_data
tools/testing/selftests/bpf/network_helpers.c | 125 +++++++++++++++++-
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +---------
3 files changed, 121 insertions(+), 76 deletions(-)
--
2.40.1