 
            The call to read_word_at_a_time() in sized_strscpy() is problematic with MTE because it may trigger a tag check fault when reading across a tag granule (16 bytes) boundary. To make this code MTE compatible, let's start using load_unaligned_zeropad() on architectures where it is available (i.e. architectures that define CONFIG_DCACHE_WORD_ACCESS). Because load_unaligned_zeropad() takes care of page boundaries as well as tag granule boundaries, also disable the code preventing crossing page boundaries when using load_unaligned_zeropad().
Signed-off-by: Peter Collingbourne pcc@google.com Link: https://linux-review.googlesource.com/id/If4b22e43b5a4ca49726b4bf98ada827fdf... Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS") Cc: stable@vger.kernel.org --- v2: - new approach
lib/string.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/lib/string.c b/lib/string.c index eb4486ed40d25..b632c71df1a50 100644 --- a/lib/string.c +++ b/lib/string.c @@ -119,6 +119,7 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) if (count == 0 || WARN_ON_ONCE(count > INT_MAX)) return -E2BIG;
+#ifndef CONFIG_DCACHE_WORD_ACCESS #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS /* * If src is unaligned, don't cross a page boundary, @@ -133,12 +134,14 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) /* If src or dest is unaligned, don't do word-at-a-time. */ if (((long) dest | (long) src) & (sizeof(long) - 1)) max = 0; +#endif #endif
/* - * read_word_at_a_time() below may read uninitialized bytes after the - * trailing zero and use them in comparisons. Disable this optimization - * under KMSAN to prevent false positive reports. + * load_unaligned_zeropad() or read_word_at_a_time() below may read + * uninitialized bytes after the trailing zero and use them in + * comparisons. Disable this optimization under KMSAN to prevent + * false positive reports. */ if (IS_ENABLED(CONFIG_KMSAN)) max = 0; @@ -146,7 +149,11 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) while (max >= sizeof(unsigned long)) { unsigned long c, data;
+#ifdef CONFIG_DCACHE_WORD_ACCESS + c = load_unaligned_zeropad(src+res); +#else c = read_word_at_a_time(src+res); +#endif if (has_zero(c, &data, &constants)) { data = prep_zero_mask(c, data, &constants); data = create_zero_mask(data);
 
            On Tue, Mar 18, 2025 at 02:40:32PM -0700, Peter Collingbourne wrote:
The call to read_word_at_a_time() in sized_strscpy() is problematic with MTE because it may trigger a tag check fault when reading across a tag granule (16 bytes) boundary. To make this code MTE compatible, let's start using load_unaligned_zeropad() on architectures where it is available (i.e. architectures that define CONFIG_DCACHE_WORD_ACCESS). Because load_unaligned_zeropad() takes care of page boundaries as well as tag granule boundaries, also disable the code preventing crossing page boundaries when using load_unaligned_zeropad().
Signed-off-by: Peter Collingbourne pcc@google.com Link: https://linux-review.googlesource.com/id/If4b22e43b5a4ca49726b4bf98ada827fdf... Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS") Cc: stable@vger.kernel.org
v2:
- new approach
lib/string.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/lib/string.c b/lib/string.c index eb4486ed40d25..b632c71df1a50 100644 --- a/lib/string.c +++ b/lib/string.c @@ -119,6 +119,7 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) if (count == 0 || WARN_ON_ONCE(count > INT_MAX)) return -E2BIG; +#ifndef CONFIG_DCACHE_WORD_ACCESS #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
I would prefer this were written as:
#if !defined(CONFIG_DCACHE_WORD_ACCESS) && \ defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
Having 2 #ifs makes me think there is some reason for having them separable. But the logic here is for a single check.
/* * If src is unaligned, don't cross a page boundary, @@ -133,12 +134,14 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) /* If src or dest is unaligned, don't do word-at-a-time. */ if (((long) dest | (long) src) & (sizeof(long) - 1)) max = 0; +#endif #endif
(Then no second #endif needed)
/*
* read_word_at_a_time() below may read uninitialized bytes after the
* trailing zero and use them in comparisons. Disable this optimization
* under KMSAN to prevent false positive reports.
* load_unaligned_zeropad() or read_word_at_a_time() below may read
* uninitialized bytes after the trailing zero and use them in
* comparisons. Disable this optimization under KMSAN to prevent*/ if (IS_ENABLED(CONFIG_KMSAN)) max = 0;
* false positive reports.@@ -146,7 +149,11 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) while (max >= sizeof(unsigned long)) { unsigned long c, data; +#ifdef CONFIG_DCACHE_WORD_ACCESS
c = load_unaligned_zeropad(src+res);+#else c = read_word_at_a_time(src+res); +#endif if (has_zero(c, &data, &constants)) { data = prep_zero_mask(c, data, &constants); data = create_zero_mask(data);
The rest seems good. Though I do wonder: what happens on a page boundary for read_word_at_a_time(), then? We get back zero-filled remainder? Will that hide a missing NUL terminator? As in, it's not actually there because of the end of the page/granule, but a zero was put in, so now it looks like it's been terminated and the exception got eaten? And doesn't this hide MTE faults since we can't differentiate "overran MTE tag" from "overran granule while over-reading"?
 
            On Tue, Mar 18, 2025 at 8:06 PM Kees Cook kees@kernel.org wrote:
On Tue, Mar 18, 2025 at 02:40:32PM -0700, Peter Collingbourne wrote:
The call to read_word_at_a_time() in sized_strscpy() is problematic with MTE because it may trigger a tag check fault when reading across a tag granule (16 bytes) boundary. To make this code MTE compatible, let's start using load_unaligned_zeropad() on architectures where it is available (i.e. architectures that define CONFIG_DCACHE_WORD_ACCESS). Because load_unaligned_zeropad() takes care of page boundaries as well as tag granule boundaries, also disable the code preventing crossing page boundaries when using load_unaligned_zeropad().
Signed-off-by: Peter Collingbourne pcc@google.com Link: https://linux-review.googlesource.com/id/If4b22e43b5a4ca49726b4bf98ada827fdf... Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS") Cc: stable@vger.kernel.org
v2:
- new approach
lib/string.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/lib/string.c b/lib/string.c index eb4486ed40d25..b632c71df1a50 100644 --- a/lib/string.c +++ b/lib/string.c @@ -119,6 +119,7 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) if (count == 0 || WARN_ON_ONCE(count > INT_MAX)) return -E2BIG;
+#ifndef CONFIG_DCACHE_WORD_ACCESS #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
I would prefer this were written as:
#if !defined(CONFIG_DCACHE_WORD_ACCESS) && \ defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
Having 2 #ifs makes me think there is some reason for having them separable. But the logic here is for a single check.
There is indeed a reason for having two: there's an #else in the middle (which pertains to CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS only).
/* * If src is unaligned, don't cross a page boundary,@@ -133,12 +134,14 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) /* If src or dest is unaligned, don't do word-at-a-time. */ if (((long) dest | (long) src) & (sizeof(long) - 1)) max = 0; +#endif #endif
(Then no second #endif needed)
/*
* read_word_at_a_time() below may read uninitialized bytes after the
* trailing zero and use them in comparisons. Disable this optimization
* under KMSAN to prevent false positive reports.
* load_unaligned_zeropad() or read_word_at_a_time() below may read
* uninitialized bytes after the trailing zero and use them in
* comparisons. Disable this optimization under KMSAN to prevent
* false positive reports. */ if (IS_ENABLED(CONFIG_KMSAN)) max = 0;@@ -146,7 +149,11 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count) while (max >= sizeof(unsigned long)) { unsigned long c, data;
+#ifdef CONFIG_DCACHE_WORD_ACCESS
c = load_unaligned_zeropad(src+res);+#else c = read_word_at_a_time(src+res); +#endif if (has_zero(c, &data, &constants)) { data = prep_zero_mask(c, data, &constants); data = create_zero_mask(data);
The rest seems good. Though I do wonder: what happens on a page boundary for read_word_at_a_time(), then? We get back zero-filled remainder? Will that hide a missing NUL terminator? As in, it's not actually there because of the end of the page/granule, but a zero was put in, so now it looks like it's been terminated and the exception got eaten? And doesn't this hide MTE faults since we can't differentiate "overran MTE tag" from "overran granule while over-reading"?
Correct. The behavior I implemented seems good enough for now IMO (and at least good enough for backports). If we did want to detect this case, we would need an alternative to "load_unaligned_zeropad" that would do something other than fill the unreadable bytes with zeros in the fault handler. For example, it could fill them with all-ones. That would prevent the loop from being terminated by the tag check fault, and we would proceed to the next granule, take another tag check fault which would recur in the fault handler and cause the bug to be detected.
Peter
linux-stable-mirror@lists.linaro.org

