All architectures should use a long-aligned address passed to set_bit(). On a 64-bit kernel, user processes can pass either a 32-bit or a 64-bit sized value to be updated when tracing is enabled. Both cases are guaranteed to be naturally aligned; however, that is not enough. The address must be long aligned without affecting the checks on the value within the user process, which requires different bit adjustments for little and big endian CPUs.
A 32-bit value on a 64-bit kernel, even when properly long aligned, still requires a 32-bit offset to the bit on BE. Due to this, the adjustment cannot easily be put into a generic method.
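To make the adjustment concrete, here is a small illustrative sketch (plain userspace C, not kernel code, names are hypothetical) of the arithmetic: round a 4-byte-aligned-only address down to a long boundary, then bump the bit by 32 on LE, while BE needs no change in the unaligned case:

/* Illustrative only: how an unaligned 32-bit enable word maps onto a
 * long-aligned word on a 64-bit kernel. Not part of this series. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t backing = 0;
	/* 32-bit value placed in the upper half of an 8-byte long. */
	uint32_t *enable = (uint32_t *)((char *)&backing + 4);
	unsigned long addr = (unsigned long)enable;	/* 4-byte aligned only */
	int bit = 1;					/* bit the user registered */

	if (addr % sizeof(long)) {
		addr &= ~(sizeof(long) - 1UL);		/* long align downward */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		bit += 32;	/* bytes 4-7 are the high bits of an LE long */
#endif
		/* On BE, bytes 4-7 are the low bits, so the bit is unchanged. */
	}

	printf("set_bit(%d) against 0x%lx\n", bit, addr);
	return 0;
}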
The abi_test also used a long, which broke the test on 64-bit BE machines. The change simply uses an int for the 32-bit value checks and a long, on 64-bit kernels, for the 64-bit specific checks.
I've run these changes and the user_events selftests on ppc64 BE, x86_64 LE, and aarch64 LE. It would be great to also test this on RISC-V, but I do not have a RISC-V machine.
Clément Léger originally put a patch together for the alignment issue, but we uncovered more issues as we dug further into the problem. Clément felt my version was better [1], so I am sending out this series, which addresses the selftest, the BE bit offset, and the alignment issue.
1. https://lore.kernel.org/linux-trace-kernel/713f4916-00ff-4a24-82d1-72884500a...
Beau Belgrave (2):
  tracing/user_events: Align set_bit() address for all archs
  selftests/user_events: Fix abi_test for BE archs
 kernel/trace/trace_events_user.c              | 58 ++++++++++++++++---
 .../testing/selftests/user_events/abi_test.c  | 16 ++---
 2 files changed, 60 insertions(+), 14 deletions(-)
base-commit: fc1653abba0d554aad80224e51bcad42b09895ed
All architectures should use a long-aligned address passed to set_bit(). On a 64-bit kernel, user processes can pass either a 32-bit or a 64-bit sized value to be updated when tracing is enabled. Both cases are guaranteed to be naturally aligned; however, that is not enough. The address must be long aligned without affecting the checks on the value within the user process, which requires different bit adjustments for little and big endian CPUs.
Add a compat flag to user_event_enabler that indicates when a 32-bit value is being used on a 64-bit kernel. Long align addresses and correct the bit to be used by set_bit() to account for this alignment. Ensure compat flags are copied during forks and used during deletion clears.
Fixes: 7235759084a4 ("tracing/user_events: Use remote writes for event enablement")
Link: https://lore.kernel.org/linux-trace-kernel/20230914131102.179100-1-cleger@ri...
Reported-by: Clément Léger cleger@rivosinc.com
Suggested-by: Clément Léger cleger@rivosinc.com
Signed-off-by: Beau Belgrave beaub@linux.microsoft.com
---
 kernel/trace/trace_events_user.c | 58 ++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 6f046650e527..b87f41187c6a 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -127,8 +127,13 @@ struct user_event_enabler {
 /* Bit 7 is for freeing status of enablement */
 #define ENABLE_VAL_FREEING_BIT 7
 
-/* Only duplicate the bit value */
-#define ENABLE_VAL_DUP_MASK ENABLE_VAL_BIT_MASK
+/* Bit 8 is for marking 32-bit on 64-bit */
+#define ENABLE_VAL_32_ON_64_BIT 8
+
+#define ENABLE_VAL_COMPAT_MASK (1 << ENABLE_VAL_32_ON_64_BIT)
+
+/* Only duplicate the bit and compat values */
+#define ENABLE_VAL_DUP_MASK (ENABLE_VAL_BIT_MASK | ENABLE_VAL_COMPAT_MASK)
 
 #define ENABLE_BITOPS(e) (&(e)->values)
 
@@ -174,6 +179,30 @@ struct user_event_validator {
	int flags;
 };
 
+static inline void align_addr_bit(unsigned long *addr, int *bit,
+				  unsigned long *flags)
+{
+	if (IS_ALIGNED(*addr, sizeof(long))) {
+#ifdef __BIG_ENDIAN
+		/* 32 bit on BE 64 bit requires a 32 bit offset when aligned. */
+		if (test_bit(ENABLE_VAL_32_ON_64_BIT, flags))
+			*bit += 32;
+#endif
+		return;
+	}
+
+	*addr = ALIGN_DOWN(*addr, sizeof(long));
+
+	/*
+	 * We only support 32 and 64 bit values. The only time we need
+	 * to align is a 32 bit value on a 64 bit kernel, which on LE
+	 * is always 32 bits, and on BE requires no change when unaligned.
+	 */
+#ifdef __LITTLE_ENDIAN
+	*bit += 32;
+#endif
+}
+
 typedef void (*user_event_func_t) (struct user_event *user, struct iov_iter *i,
				   void *tpdata, bool *faulted);
 
@@ -482,6 +511,7 @@ static int user_event_enabler_write(struct user_event_mm *mm,
	unsigned long *ptr;
	struct page *page;
	void *kaddr;
+	int bit = ENABLE_BIT(enabler);
	int ret;
 
	lockdep_assert_held(&event_mutex);
@@ -497,6 +527,8 @@ static int user_event_enabler_write(struct user_event_mm *mm,
		     test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler))))
		return -EBUSY;
 
+	align_addr_bit(&uaddr, &bit, ENABLE_BITOPS(enabler));
+
	ret = pin_user_pages_remote(mm->mm, uaddr, 1, FOLL_WRITE | FOLL_NOFAULT,
				    &page, NULL);
 
@@ -515,9 +547,9 @@ static int user_event_enabler_write(struct user_event_mm *mm,
 
	/* Update bit atomically, user tracers must be atomic as well */
	if (enabler->event && enabler->event->status)
-		set_bit(ENABLE_BIT(enabler), ptr);
+		set_bit(bit, ptr);
	else
-		clear_bit(ENABLE_BIT(enabler), ptr);
+		clear_bit(bit, ptr);
 
	kunmap_local(kaddr);
	unpin_user_pages_dirty_lock(&page, 1, true);
@@ -849,6 +881,12 @@ static struct user_event_enabler
	enabler->event = user;
	enabler->addr = uaddr;
	enabler->values = reg->enable_bit;
+
+#if BITS_PER_LONG >= 64
+	if (reg->enable_size == 4)
+		set_bit(ENABLE_VAL_32_ON_64_BIT, ENABLE_BITOPS(enabler));
+#endif
+
 retry:
	/* Prevents state changes from racing with new enablers */
	mutex_lock(&event_mutex);
@@ -2377,7 +2415,8 @@ static long user_unreg_get(struct user_unreg __user *ureg,
 }
 
 static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
-				   unsigned long uaddr, unsigned char bit)
+				   unsigned long uaddr, unsigned char bit,
+				   unsigned long flags)
 {
	struct user_event_enabler enabler;
	int result;
@@ -2385,7 +2424,7 @@ static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
 
	memset(&enabler, 0, sizeof(enabler));
	enabler.addr = uaddr;
-	enabler.values = bit;
+	enabler.values = bit | flags;
 retry:
	/* Prevents state changes from racing with new enablers */
	mutex_lock(&event_mutex);
@@ -2415,6 +2454,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
	struct user_event_mm *mm = current->user_event_mm;
	struct user_event_enabler *enabler, *next;
	struct user_unreg reg;
+	unsigned long flags;
	long ret;
 
	ret = user_unreg_get(ureg, &reg);
@@ -2425,6 +2465,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
	if (!mm)
		return -ENOENT;
 
+	flags = 0;
	ret = -ENOENT;
 
	/*
@@ -2441,6 +2482,9 @@ static long user_events_ioctl_unreg(unsigned long uarg)
		    ENABLE_BIT(enabler) == reg.disable_bit) {
			set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));
 
+			/* We must keep compat flags for the clear */
+			flags |= enabler->values & ENABLE_VAL_COMPAT_MASK;
+
			if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)))
				user_event_enabler_destroy(enabler, true);
 
@@ -2454,7 +2498,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
	/* Ensure bit is now cleared for user, regardless of event status */
	if (!ret)
		ret = user_event_mm_clear_bit(mm, reg.disable_addr,
-					      reg.disable_bit);
+					      reg.disable_bit, flags);
 
	return ret;
 }
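For context, a hedged userspace sketch of the case being fixed: a process registering a 4-byte enable word so the kernel performs the remote set_bit()/clear_bit() into it, roughly what the selftest's reg_enable() helper does. This is not part of the patch; it assumes the struct user_reg / DIAG_IOCSREG interface from the uapi <linux/user_events.h>, and the tracefs path and event format string are illustrative.

/* Hedged sketch, not part of this series: register a 4-byte enable word
 * with user_events. Assumes the uapi <linux/user_events.h> struct user_reg
 * and DIAG_IOCSREG ioctl; the tracefs path and event format string are
 * assumptions; error handling is trimmed. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/user_events.h>

static int check;	/* 32-bit word the kernel updates when tracing toggles */

int main(void)
{
	struct user_reg reg = {0};
	int fd = open("/sys/kernel/tracing/user_events_data", O_RDWR);

	if (fd < 0)
		return 1;

	reg.size = sizeof(reg);
	reg.name_args = (uint64_t)(uintptr_t)"example_event u32 value";
	reg.enable_bit = 0;			/* bit to toggle in 'check' */
	reg.enable_size = sizeof(check);	/* 4: the 32-bit-on-64-bit case */
	reg.enable_addr = (uint64_t)(uintptr_t)&check;

	if (ioctl(fd, DIAG_IOCSREG, &reg) < 0) {
		close(fd);
		return 1;
	}

	/* 'check' now flips when the event is enabled/disabled via tracefs. */
	close(fd);
	return 0;
}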
The abi_test currently uses a long-sized test value for enablement checks. On LE this works fine; however, on BE this results in inaccurate assert checks, because a bit is used under the assumption that its position is the same on both LE and BE.
Use int type for 32-bit values and long type for 64-bit values to ensure appropriate behavior on both LE and BE.
Fixes: 60b1af8de8c1 ("tracing/user_events: Add ABI self-test")
Signed-off-by: Beau Belgrave beaub@linux.microsoft.com
---
 tools/testing/selftests/user_events/abi_test.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/user_events/abi_test.c b/tools/testing/selftests/user_events/abi_test.c
index 5125c42efe65..67af4c491c0c 100644
--- a/tools/testing/selftests/user_events/abi_test.c
+++ b/tools/testing/selftests/user_events/abi_test.c
@@ -46,7 +46,7 @@ static int change_event(bool enable)
	return ret;
 }
 
-static int reg_enable(long *enable, int size, int bit)
+static int reg_enable(void *enable, int size, int bit)
 {
	struct user_reg reg = {0};
	int fd = open(data_file, O_RDWR);
@@ -68,7 +68,7 @@ static int reg_enable(long *enable, int size, int bit)
	return ret;
 }
 
-static int reg_disable(long *enable, int bit)
+static int reg_disable(void *enable, int bit)
 {
	struct user_unreg reg = {0};
	int fd = open(data_file, O_RDWR);
@@ -89,12 +89,14 @@ static int reg_disable(long *enable, int bit)
 }
 
 FIXTURE(user) {
-	long check;
+	int check;
+	long check_long;
 };
 
 FIXTURE_SETUP(user) {
	change_event(false);
	self->check = 0;
+	self->check_long = 0;
 }
 
 FIXTURE_TEARDOWN(user) {
@@ -131,9 +133,9 @@ TEST_F(user, bit_sizes) {
 
 #if BITS_PER_LONG == 8
	/* Allow 0-64 bits for 64-bit */
-	ASSERT_EQ(0, reg_enable(&self->check, sizeof(long), 63));
-	ASSERT_NE(0, reg_enable(&self->check, sizeof(long), 64));
-	ASSERT_EQ(0, reg_disable(&self->check, 63));
+	ASSERT_EQ(0, reg_enable(&self->check_long, sizeof(long), 63));
+	ASSERT_NE(0, reg_enable(&self->check_long, sizeof(long), 64));
+	ASSERT_EQ(0, reg_disable(&self->check_long, 63));
 #endif
 
	/* Disallowed sizes (everything beside 4 and 8) */
@@ -195,7 +197,7 @@ static int clone_check(void *check)
	for (i = 0; i < 10; ++i) {
		usleep(100000);
 
-		if (*(long *)check)
+		if (*(int *)check)
			return 0;
	}
Note, this doesn't seem to apply to my tree so I only added the first patch. I think this needs to go through Shuah's tree.
-- Steve
On 10/3/23 18:59, Steven Rostedt wrote:
Note, this doesn't seem to apply to my tree so I only added the first patch. I think this needs to go through Shuah's tree.
-- Steve
Yes. I sent a fix up for rc4 - I can pull these two patches into linux-kselftest next
Steve! Does that work for you?
thanks, -- Shuah
On Wed, 4 Oct 2023 09:10:52 -0600 Shuah Khan skhan@linuxfoundation.org wrote:
On 10/3/23 18:59, Steven Rostedt wrote:
Note, this doesn't seem to apply to my tree so I only added the first patch. I think this needs to go through Shuah's tree.
-- Steve
Yes. I sent a fix up for rc4 - I can pull these two patches into linux-kselftest next
Steve! Does that work for you?
I applied the first patch to my tree, I think the second patch is fine to go separately through your tree.
-- Steve
On 10/4/23 09:14, Steven Rostedt wrote:
On Wed, 4 Oct 2023 09:10:52 -0600 Shuah Khan skhan@linuxfoundation.org wrote:
On 10/3/23 18:59, Steven Rostedt wrote:
Note, this doesn't seem to apply to my tree so I only added the first patch. I think this needs to go through Shuah's tree.
-- Steve
Yes. I sent a fix up for rc4 - I can pull these two patches into linux-kselftest next
Steve! Does that work for you?
I applied the first patch to my tree, I think the second patch is fine to go separately through your tree.
Yes I will apply this to linux-kselftest fixes branch once my PR clears.
thanks, -- Shuah
On 10/4/23 10:38, Shuah Khan wrote:
On 10/4/23 09:14, Steven Rostedt wrote:
On Wed, 4 Oct 2023 09:10:52 -0600 Shuah Khan skhan@linuxfoundation.org wrote:
On 10/3/23 18:59, Steven Rostedt wrote:
Note, this doesn't seem to apply to my tree so I only added the first patch. I think this needs to go through Shuah's tree.
-- Steve
Yes. I sent a fix up for rc4 - I can pull these two patches into linux-kselftest next
Steve! Does that work for you?
I applied the first patch to my tree, I think the second patch is fine to go separately through your tree.
Yes I will apply this to linux-kselftest fixes branch once my PR clears.
Hmm. Which tree is this patch based on? This doesn't apply to linux-kselftest fixes - I thought this was based on top of fixes since I sent in a fix for Linux 6.6-rc4 for user_events
Beau, Please rebase to the correct tree/branch and send v2 for this patch.
thanks, -- Shuah
On Thu, 5 Oct 2023 08:48:14 -0600 Shuah Khan skhan@linuxfoundation.org wrote:
Hmm. Which tree is this patch based on? This doesn't apply to linux-kselftest fixes - I thought this was based on top of fixes since I sent in a fix for Linux 6.6-rc4 for user_events
Beau, Please rebase to the correct tree/branch and send v2 for this patch.
Hmm, so this didn't apply to my tree nor yours.
Beau, can you verify which tree this goes to?
-- Steve
On Thu, Oct 05, 2023 at 11:08:15AM -0400, Steven Rostedt wrote:
On Thu, 5 Oct 2023 08:48:14 -0600 Shuah Khan skhan@linuxfoundation.org wrote:
Hmm. Which tree is this patch based on? This doesn't apply to linux-kselftest fixes - I thought this was based on top of fixes since I sent in a fix for Linux 6.6-rc4 for user_events
Beau, Please rebase to the correct tree/branch and send v2 for this patch.
Hmm, so this didn't apply to my tree nor yours.
Beau, can you verify which tree this goes to?
-- Steve
It was based on tracing/for-next.
I'll get a v2 out rebased upon linux-kselftest, does that work?
Thanks, -Beau
On Thu, 5 Oct 2023 09:52:30 -0700 Beau Belgrave beaub@linux.microsoft.com wrote:
It was based on tracing/for-next.
I'll get a v2 out rebased upon linux-kselftest, does that work?
Hmm, then it should have applied to my tree. I didn't look too deep.
Can you see if it applies to:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace/for-next ?
Thanks,
-- Steve