We recently notice that the step_after_suspend_test would
fail on our plenty devices. The test believesit failed to
enter suspend state with
$ sudo ./step_after_suspend_test
TAP version 13
Bail out! Failed to enter Suspend state
However, in the kernel message, I indeed see the system get
suspended and then wake up later.
[611172.033108] PM: suspend entry (s2idle)
[611172.044940] Filesystems sync: 0.006 seconds
[611172.052254] Freezing user space processes
[611172.059319] Freezing user space processes completed (elapsed 0.001 seconds)
[611172.067920] OOM killer disabled.
[611172.072465] Freezing remaining freezable tasks
[611172.080332] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[611172.089724] printk: Suspending console(s) (use no_console_suspend to debug)
[611172.117126] serial 00:03: disabled
--- some other hardware get reconnected ---
[611203.136277] OOM killer enabled.
[611203.140637] Restarting tasks ...
[611203.141135] usb 1-8.1: USB disconnect, device number 7
[611203.141755] done.
[611203.155268] random: crng reseeded on system resumption
[611203.162059] PM: suspend exit
After investigation, I notice that for the code block
if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
ksft_exit_fail_msg("Failed to enter Suspend state\n");
The write will return -1 and errno is set to 16 (device busy).
It should be caused by the write function is not successfully returned
before the system suspend and the return value get messed when waking up.
As a result, It may be better to check the time passed of those few instructions
to determine whether the suspend is executed correctly for it is pretty hard to
execute those few lines for 5 seconds.
The timer to wake up the system is set to expired after 5 seconds and no-rearm.
If the timer remaining time is 0 second and 0 nano secomd, it means the timer
expired and wake the system up. Otherwise, the system could be considered to
enter the suspend state failed if there is any remaining time.
Fixes: bfd092b8c2728 ("selftests: breakpoint: add step_after_suspend_test")
Reported-by: Sinadin Shan <sinadin.shan(a)oracle.com>
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
.../testing/selftests/breakpoints/step_after_suspend_test.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index dfec31fb9b30d..33f5542bf741d 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -152,7 +152,10 @@ void suspend(void)
if (err < 0)
ksft_exit_fail_msg("timerfd_settime() failed\n");
- if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
+ system("(echo mem > /sys/power/state) 2> /dev/null");
+
+ timerfd_gettime(timerfd,&spec);
+ if (spec.it_value.tv_sec != 0 || spec.it_value.tv_nsec != 0)
ksft_exit_fail_msg("Failed to enter Suspend state\n");
close(timerfd);
--
2.46.0
These variables are never referenced in the code, just remove it.
Signed-off-by: Ba Jing <bajing(a)cmss.chinamobile.com>
---
tools/testing/selftests/damon/access_memory_even.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/damon/access_memory_even.c b/tools/testing/selftests/damon/access_memory_even.c
index 3be121487432..a9f4e9aaf3a9 100644
--- a/tools/testing/selftests/damon/access_memory_even.c
+++ b/tools/testing/selftests/damon/access_memory_even.c
@@ -14,10 +14,8 @@
int main(int argc, char *argv[])
{
char **regions;
- clock_t start_clock;
int nr_regions;
int sz_region;
- int access_time_ms;
int i;
if (argc != 3) {
--
2.33.0
Currently, sk_lookup allows an ebpf program to run on the ingress socket
lookup path, and accept traffic not only on a range of addresses, but
also on a range of ports. At Cloudflare we use sk_lookup for two main
cases:
1. Sharing a single port between multiple services - i.e. two services
(or more) use disjoint IP ranges but share the same port;
2. Receiving traffic on all ports - i.e. a service which accepts traffic
on specific IP ranges but any port [1].
However, one main challenge we face while using sk_lookup for these use
cases is how to source return UDP traffic:
- On point 1. above, sometimes this range of addresses are not local
(i.e. there's no local routes for these in the server), which means we
need IP_TRANSPARENT set to be able to egress traffic from addresses
we've received traffic on (or simply IP_FREEBIND in the case of IPv6);
- And on point 2. above, allowing traffic to a range of ports means a
service could get traffic on multiple ports, but currently there's no
way to set the source UDP port egress traffic should be sourced from -
it's possible to receive the original destination port using the
IP_ORIGDSTADDR ancilliary message in recvmsg, but not set it in
sendmsg.
Both of these limitations can be worked around, but in a sub-optimal
way. Using IP_TRANSPARENT, for instance, requires special privileges.
And while one could use UDP connected sockets to send return traffic,
creating a connected socket for each different address a UDP traffic is
received on does have performance implications.
Given sk_lookup allows services to accept traffic on a range of
addresses or ports, it seems sensible to also allow return traffic to
proceed through as well, without needing extra configurations / set ups.
This patch sets out to fix both of this issues by:
1. Allowing users to set the src address/port egress traffic should be
sent from, when calling sendmsg();
2. Validating that this egress traffic comes from a socket that matches
an ingress socket in sk_lookup.
- If it does, traffic is allowed to proceed;
- Otherwise it falls back to the regular egress path.
The downsides to this is that this runs on the egress hot path, although
this work tries to minimise its impact by only performing the reverse
socket lookup when necessary (i.e. only when the src address/port are
modified). Further performance measurements are to be taken, but we're
reaching out early for feedback to see what the technical concerns are
and if we can address them.
[1] https://blog.cloudflare.com/how-we-built-spectrum/
Suggested-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Signed-off-by: Tiago Lam <tiagolam(a)cloudflare.com>
---
Changes in v2:
- Amended commit messages and cover letter to make the intent and
implementation clearer (Willem de Bruijn);
- Fixed socket comparison by not using socket cookies and comparing them
directly (Eric Dumazet);
- Fixed misspellings and checkpatch.pl warnings on line lengths (Simon
Horman);
- Fixed usage of start_server_addr() and gcc compilation (Philo Lu);
- Link to v1: https://lore.kernel.org/r/20240913-reverse-sk-lookup-v1-0-e721ea003d4c@clou…
---
Tiago Lam (3):
ipv4: Support setting src port in sendmsg().
ipv6: Support setting src port in sendmsg().
bpf: Add sk_lookup test to use ORIGDSTADDR cmsg.
include/net/ip.h | 1 +
net/ipv4/ip_sockglue.c | 11 +++
net/ipv4/udp.c | 35 +++++++++-
net/ipv6/datagram.c | 79 ++++++++++++++++++++++
net/ipv6/udp.c | 8 ++-
tools/testing/selftests/bpf/prog_tests/sk_lookup.c | 67 ++++++++++++------
6 files changed, 176 insertions(+), 25 deletions(-)
---
base-commit: 6562a89739bbefddb5495c09aaab67c1c3756f36
change-id: 20240909-reverse-sk-lookup-f7bf36292bc4
Best regards,
--
Tiago Lam <tiagolam(a)cloudflare.com>
+CC linux-kselftest
-------
On 22/09/2024 17:16, Gabriela Bittencourt wrote:
> Hey all,
>
> We are making these changes as part of a KUnit Hackathon at LKCamp [1].
>
> This patch sets out to refactor fs/unicode/utf8-selftest.c to KUnit tests.
>
> The first commit is the refactoring itself from self test into KUnit, while
> the second one applies the naming style conventions.
>
> We appreciate any feedback and suggestions. :)
>
> [1] https://lkcamp.dev/about/
>
> Co-developed-by: Pedro Orlando <porlando(a)lkcamp.dev>
> Co-developed-by: Danilo Pereira <dpereira(a)lkcamp.dev>
> Signed-off-by: Pedro Orlando <porlando(a)lkcamp.dev>
> Signed-off-by: Danilo Pereira <dpereira(a)lkcamp.dev>
> Signed-off-by: Gabriela Bittencourt <gbittencourt(a)lkcamp.dev>
>
> Gabriela Bittencourt (2):
> unicode: kunit: refactor selftest to kunit tests
> unicode: kunit: change tests filename and path
>
> fs/unicode/Kconfig | 5 +-
> fs/unicode/Makefile | 2 +-
> fs/unicode/tests/.kunitconfig | 3 +
> .../{utf8-selftest.c => tests/utf8_kunit.c} | 152 ++++++++----------
> 4 files changed, 76 insertions(+), 86 deletions(-)
> create mode 100644 fs/unicode/tests/.kunitconfig
> rename fs/unicode/{utf8-selftest.c => tests/utf8_kunit.c} (63%)
>
We recently notice that the step_after_suspend_test would
fail on our plenty devices. The test believesit failed to
enter suspend state with
$ sudo ./step_after_suspend_test
TAP version 13
Bail out! Failed to enter Suspend state
However, in the kernel message, I indeed see the system get
suspended and then wake up later.
[611172.033108] PM: suspend entry (s2idle)
[611172.044940] Filesystems sync: 0.006 seconds
[611172.052254] Freezing user space processes
[611172.059319] Freezing user space processes completed (elapsed 0.001 seconds)
[611172.067920] OOM killer disabled.
[611172.072465] Freezing remaining freezable tasks
[611172.080332] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[611172.089724] printk: Suspending console(s) (use no_console_suspend to debug)
[611172.117126] serial 00:03: disabled
--- some other hardware get reconnected ---
[611203.136277] OOM killer enabled.
[611203.140637] Restarting tasks ...
[611203.141135] usb 1-8.1: USB disconnect, device number 7
[611203.141755] done.
[611203.155268] random: crng reseeded on system resumption
[611203.162059] PM: suspend exit
After investigation, I notice that for the code block
if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
ksft_exit_fail_msg("Failed to enter Suspend state\n");
The write will return -1 and errno is set to 16 (device busy).
It should be caused by the write function is not successfully returned
before the system suspend and the return value get messed when waking up.
As a result, It may be better to check the time passed of those few instructions
to determine whether the suspend is executed correctly for it is pretty hard to
execute those few lines for 4 seconds, or even more if it is not long enough.
Fixes: bfd092b8c2728 ("selftests: breakpoint: add step_after_suspend_test")
Reported-by: Sinadin Shan <sinadin.shan(a)oracle.com>
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
.../selftests/breakpoints/step_after_suspend_test.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index dfec31fb9b30d..d615f091e5bae 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -18,6 +18,7 @@
#include <sys/timerfd.h>
#include <sys/types.h>
#include <sys/wait.h>
+#include <time.h>
#include "../kselftest.h"
@@ -133,6 +134,7 @@ void suspend(void)
int timerfd;
int err;
struct itimerspec spec = {};
+ clock_t t;
if (getuid() != 0)
ksft_exit_skip("Please run the test as root - Exiting.\n");
@@ -152,8 +154,11 @@ void suspend(void)
if (err < 0)
ksft_exit_fail_msg("timerfd_settime() failed\n");
- if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
- ksft_exit_fail_msg("Failed to enter Suspend state\n");
+ t = clock();
+ write(power_state_fd, "mem", strlen("mem"));
+ t = clock()-t;
+ if ((int)(t) < 4)
+ ksft_exit_fail_msg("Failed to enter Suspend state %d\n",errno);
close(timerfd);
close(power_state_fd);
--
2.45.2
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define
__NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define
__NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define
__NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that:
x86 374
x86_64 323
The value of __NR_userfaultfd was changed to 282 when
asm-generic/unistd.h was included. It makes the test to fail every time
as the correct number of this syscall on x86_64 is 323. Fix the header
to asm/unistd.h.
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/pagemap_ioctl.c b/tools/testing/selftests/mm/pagemap_ioctl.c
index fc90af2a97b80..bcc73b4e805c6 100644
--- a/tools/testing/selftests/mm/pagemap_ioctl.c
+++ b/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -15,7 +15,7 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <math.h>
-#include <asm-generic/unistd.h>
+#include <asm/unistd.h>
#include <pthread.h>
#include <sys/resource.h>
#include <assert.h>
--
2.39.2
With `long` mapped to `isize`, `size_t`/`__kernel_size_t` mapped to
usize and `char` mapped to `u8`, many of the existing casts are no
longer necessary.
Signed-off-by: Gary Guo <gary(a)garyguo.net>
---
rust/kernel/kunit.rs | 10 ++--------
rust/kernel/print.rs | 4 ++--
rust/kernel/str.rs | 6 +++---
rust/kernel/uaccess.rs | 27 +++++++--------------------
4 files changed, 14 insertions(+), 33 deletions(-)
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 0ba77276ae7ef..766aeb1c6aea8 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -17,10 +17,7 @@ pub fn err(args: fmt::Arguments<'_>) {
// are passing.
#[cfg(CONFIG_PRINTK)]
unsafe {
- bindings::_printk(
- b"\x013%pA\0".as_ptr() as _,
- &args as *const _ as *const c_void,
- );
+ bindings::_printk(b"\x013%pA\0".as_ptr(), &args as *const _ as *const c_void);
}
}
@@ -33,10 +30,7 @@ pub fn info(args: fmt::Arguments<'_>) {
// are passing.
#[cfg(CONFIG_PRINTK)]
unsafe {
- bindings::_printk(
- b"\x016%pA\0".as_ptr() as _,
- &args as *const _ as *const c_void,
- );
+ bindings::_printk(b"\x016%pA\0".as_ptr(), &args as *const _ as *const c_void);
}
}
diff --git a/rust/kernel/print.rs b/rust/kernel/print.rs
index 508b0221256c9..90ae4f2568910 100644
--- a/rust/kernel/print.rs
+++ b/rust/kernel/print.rs
@@ -104,7 +104,7 @@ pub unsafe fn call_printk(
#[cfg(CONFIG_PRINTK)]
unsafe {
bindings::_printk(
- format_string.as_ptr() as _,
+ format_string.as_ptr(),
module_name.as_ptr(),
&args as *const _ as *const c_void,
);
@@ -125,7 +125,7 @@ pub fn call_printk_cont(args: fmt::Arguments<'_>) {
#[cfg(CONFIG_PRINTK)]
unsafe {
bindings::_printk(
- format_strings::CONT.as_ptr() as _,
+ format_strings::CONT.as_ptr(),
&args as *const _ as *const c_void,
);
}
diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index 3980d37bd4b29..2d30bca079e37 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -190,7 +190,7 @@ pub unsafe fn from_char_ptr<'a>(ptr: *const crate::ffi::c_char) -> &'a Self {
// to a `NUL`-terminated C string.
let len = unsafe { bindings::strlen(ptr) } + 1;
// SAFETY: Lifetime guaranteed by the safety precondition.
- let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len as _) };
+ let bytes = unsafe { core::slice::from_raw_parts(ptr as _, len) };
// SAFETY: As `len` is returned by `strlen`, `bytes` does not contain interior `NUL`.
// As we have added 1 to `len`, the last byte is known to be `NUL`.
unsafe { Self::from_bytes_with_nul_unchecked(bytes) }
@@ -249,7 +249,7 @@ pub unsafe fn from_bytes_with_nul_unchecked_mut(bytes: &mut [u8]) -> &mut CStr {
/// Returns a C pointer to the string.
#[inline]
pub const fn as_char_ptr(&self) -> *const crate::ffi::c_char {
- self.0.as_ptr() as _
+ self.0.as_ptr()
}
/// Convert the string to a byte slice without the trailing `NUL` byte.
@@ -817,7 +817,7 @@ pub fn try_from_fmt(args: fmt::Arguments<'_>) -> Result<Self, Error> {
// SAFETY: The buffer is valid for read because `f.bytes_written()` is bounded by `size`
// (which the minimum buffer size) and is non-zero (we wrote at least the `NUL` terminator)
// so `f.bytes_written() - 1` doesn't underflow.
- let ptr = unsafe { bindings::memchr(buf.as_ptr().cast(), 0, (f.bytes_written() - 1) as _) };
+ let ptr = unsafe { bindings::memchr(buf.as_ptr().cast(), 0, f.bytes_written() - 1) };
if !ptr.is_null() {
return Err(EINVAL);
}
diff --git a/rust/kernel/uaccess.rs b/rust/kernel/uaccess.rs
index c746a1f1bb5ad..eb72fbcf152a1 100644
--- a/rust/kernel/uaccess.rs
+++ b/rust/kernel/uaccess.rs
@@ -8,7 +8,7 @@
alloc::Flags,
bindings,
error::Result,
- ffi::{c_ulong, c_void},
+ ffi::c_void,
prelude::*,
types::{AsBytes, FromBytes},
};
@@ -227,13 +227,9 @@ pub fn read_raw(&mut self, out: &mut [MaybeUninit<u8>]) -> Result {
if len > self.length {
return Err(EFAULT);
}
- let Ok(len_ulong) = c_ulong::try_from(len) else {
- return Err(EFAULT);
- };
- // SAFETY: `out_ptr` points into a mutable slice of length `len_ulong`, so we may write
+ // SAFETY: `out_ptr` points into a mutable slice of length `len`, so we may write
// that many bytes to it.
- let res =
- unsafe { bindings::copy_from_user(out_ptr, self.ptr as *const c_void, len_ulong) };
+ let res = unsafe { bindings::copy_from_user(out_ptr, self.ptr as *const c_void, len) };
if res != 0 {
return Err(EFAULT);
}
@@ -262,9 +258,6 @@ pub fn read<T: FromBytes>(&mut self) -> Result<T> {
if len > self.length {
return Err(EFAULT);
}
- let Ok(len_ulong) = c_ulong::try_from(len) else {
- return Err(EFAULT);
- };
let mut out: MaybeUninit<T> = MaybeUninit::uninit();
// SAFETY: The local variable `out` is valid for writing `size_of::<T>()` bytes.
//
@@ -275,7 +268,7 @@ pub fn read<T: FromBytes>(&mut self) -> Result<T> {
bindings::_copy_from_user(
out.as_mut_ptr().cast::<c_void>(),
self.ptr as *const c_void,
- len_ulong,
+ len,
)
};
if res != 0 {
@@ -338,12 +331,9 @@ pub fn write_slice(&mut self, data: &[u8]) -> Result {
if len > self.length {
return Err(EFAULT);
}
- let Ok(len_ulong) = c_ulong::try_from(len) else {
- return Err(EFAULT);
- };
- // SAFETY: `data_ptr` points into an immutable slice of length `len_ulong`, so we may read
+ // SAFETY: `data_ptr` points into an immutable slice of length `len`, so we may read
// that many bytes from it.
- let res = unsafe { bindings::copy_to_user(self.ptr as *mut c_void, data_ptr, len_ulong) };
+ let res = unsafe { bindings::copy_to_user(self.ptr as *mut c_void, data_ptr, len) };
if res != 0 {
return Err(EFAULT);
}
@@ -362,9 +352,6 @@ pub fn write<T: AsBytes>(&mut self, value: &T) -> Result {
if len > self.length {
return Err(EFAULT);
}
- let Ok(len_ulong) = c_ulong::try_from(len) else {
- return Err(EFAULT);
- };
// SAFETY: The reference points to a value of type `T`, so it is valid for reading
// `size_of::<T>()` bytes.
//
@@ -375,7 +362,7 @@ pub fn write<T: AsBytes>(&mut self, value: &T) -> Result {
bindings::_copy_to_user(
self.ptr as *mut c_void,
(value as *const T).cast::<c_void>(),
- len_ulong,
+ len,
)
};
if res != 0 {
--
2.44.1