On Fri, Jan 14, 2022 at 1:58 PM Andrey Konovalov andreyknvl@gmail.com wrote:
On Thu, Jan 13, 2022 at 6:14 AM Peter Collingbourne pcc@google.com wrote:
It has been reported that the tag setting operation on newly-allocated pages can cause the page flags to be corrupted when performed concurrently with other flag updates as a result of the use of non-atomic operations.
Is it known how exactly this race happens? Why are flags for a newly allocated page being accessed concurrently?
In the report that we received, the race resulted in a crash in kswapd. This may just be a symptom of the problem though.
I haven't closely audited all of the callers to page_kasan_tag_set() to check whether they may be operating on already-visible pages, but at least it doesn't appear to be unanticipated that there may be other threads accessing the page flags concurrently with a call to page_kasan_tag_set() (see the calls to smp_wmb() in arch/arm64/kernel/mte.c, arch/arm64/mm/copypage.c and arch/arm64/mm/mteswap.c).
Fix the problem by using a compare-exchange loop to update the tag.
Signed-off-by: Peter Collingbourne pcc@google.com
Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0ce...
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Cc: stable@vger.kernel.org
 include/linux/mm.h | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c768a7c81b0b..b544b0a9f537 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1531,11 +1531,17 @@ static inline u8 page_kasan_tag(const struct page *page)
 static inline void page_kasan_tag_set(struct page *page, u8 tag)
 {
-	if (kasan_enabled()) {
-		tag ^= 0xff;
-		page->flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT);
-		page->flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT;
-	}
+	unsigned long old_flags, flags;
+	if (!kasan_enabled())
+		return;
+	tag ^= 0xff;
+	do {
+		old_flags = flags = page->flags;
I guess this should be at least READ_ONCE(page->flags) if we care about concurrency.
Makes sense. I copied this code from page_cpupid_xchg_last() in mm/mmzone.c which has the same problem. I'll send a patch to fix that one as well.
Peter
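
For readers following the thread, the compare-exchange idea discussed above can be sketched in userspace C11. This is not the kernel code: the names demo_tag_get(), demo_tag_set() and demo_selftest(), and the DEMO_TAG_MASK/DEMO_TAG_SHIFT layout, are hypothetical and chosen only for illustration. C11 atomics stand in for the kernel's READ_ONCE()/cmpxchg() primitives, and the kernel's `tag ^= 0xff` inversion is left out. The point it shows is that the whole flags word is read once, only the tag field is changed in a local copy, and the store succeeds only if no other bit changed in the meantime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: an 8-bit tag in bits 56..63. */
#define DEMO_TAG_MASK  UINT64_C(0xff)
#define DEMO_TAG_SHIFT 56

/* Read the tag field out of the flags word with a single atomic load. */
static inline uint8_t demo_tag_get(_Atomic uint64_t *flags)
{
	uint64_t v = atomic_load_explicit(flags, memory_order_relaxed);

	return (uint8_t)((v >> DEMO_TAG_SHIFT) & DEMO_TAG_MASK);
}

/* Replace only the tag field; retry if the word changed underneath us. */
static inline void demo_tag_set(_Atomic uint64_t *flags, uint8_t tag)
{
	uint64_t old = atomic_load_explicit(flags, memory_order_relaxed);
	uint64_t new_flags;

	do {
		new_flags = old & ~(DEMO_TAG_MASK << DEMO_TAG_SHIFT);
		new_flags |= ((uint64_t)tag & DEMO_TAG_MASK) << DEMO_TAG_SHIFT;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(flags, &old, new_flags,
							memory_order_relaxed,
							memory_order_relaxed));
}

/* Single-threaded sanity check: unrelated bits survive a tag update. */
static inline int demo_selftest(void)
{
	_Atomic uint64_t flags = UINT64_C(0x5);	/* two unrelated flag bits */

	demo_tag_set(&flags, 0xab);
	assert(demo_tag_get(&flags) == 0xab);
	assert((atomic_load(&flags) & UINT64_C(0x5)) == UINT64_C(0x5));
	return 0;
}
```

With the original two-statement `&=` / `|=` sequence, a flag bit set by another thread between the load and the store of either statement would be overwritten by the stale value. In the loop above the same interleaving makes the compare-exchange fail, and the update is recomputed from the fresh value.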