The following commit has been merged into the perf/core branch of tip:
Commit-ID: 6532783310e2b2f50dc13f46c49aa6546cb6e7a3
Gitweb: https://git.kernel.org/tip/6532783310e2b2f50dc13f46c49aa6546cb6e7a3
Author: Alexander Antonov <alexander.antonov(a)linux.intel.com>
AuthorDate: Thu, 17 Nov 2022 12:28:25
Committer: Peter Zijlstra <peterz(a)infradead.org>
CommitterDate: Thu, 24 Nov 2022 11:09:20 +01:00
perf/x86/intel/uncore: Clear attr_update properly
Current clear_attr_update procedure in pmu_set_mapping() sets attr_update
field in NULL that is not correct because intel_uncore_type pmu types can
contain several groups in attr_update field. For example, SPR platform
already has uncore_alias_group to update and then UPI topology group will
be added in next patches.
Fix current behavior and clear attr_update group related to mapping only.
Fixes: bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")
Signed-off-by: Alexander Antonov <alexander.antonov(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221117122833.3103580-4-alexander.antonov@linux.…
---
arch/x86/events/intel/uncore_snbep.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index d3323f1..0d06b56 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -3872,6 +3872,21 @@ static const struct attribute_group *skx_iio_attr_update[] = {
NULL,
};
+static void pmu_clear_mapping_attr(const struct attribute_group **groups,
+ struct attribute_group *ag)
+{
+ int i;
+
+ for (i = 0; groups[i]; i++) {
+ if (groups[i] == ag) {
+ for (i++; groups[i]; i++)
+ groups[i - 1] = groups[i];
+ groups[i - 1] = NULL;
+ break;
+ }
+ }
+}
+
static int
pmu_set_mapping(struct intel_uncore_type *type, struct attribute_group *ag,
ssize_t (*show)(struct device*, struct device_attribute*, char*),
@@ -3926,7 +3941,7 @@ clear_attrs:
clear_topology:
pmu_free_topology(type);
clear_attr_update:
- type->attr_update = NULL;
+ pmu_clear_mapping_attr(type->attr_update, ag);
return ret;
}
Hi,
I'd like to request the follow commits to be backported to 5.15.y.
- dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
- 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly")
- a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
These patches fixed a data lost issue by preventing shmem pagecache from
being removed by memory error. These were not tagged for stable originally,
but that's revisited recently.
Thanks,
Naoya Horiguchi
On Wed, 23 Nov 2022 14:25:58 -0500
Steven Rostedt <rostedt(a)goodmis.org> wrote:
> From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
>
> After 65536 dynamic events have been added and removed, the "type" field
> of the event then uses the first type number that is available (not
> currently used by other events). A type number is the identifier of the
> binary blobs in the tracing ring buffer (known as events) to map them to
> logic that can parse the binary blob.
>
> The issue is that if a dynamic event (like a kprobe event) is traced and
> is in the ring buffer, and then that event is removed (because it is
> dynamic, which means it can be created and destroyed), if another dynamic
> event is created that has the same number that new event's logic on
> parsing the binary blob will be used.
>
> To show how this can be an issue, the following can crash the kernel:
>
> # cd /sys/kernel/tracing
> # for i in `seq 65536`; do
> echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
> # done
>
> For every iteration of the above, the writing to the kprobe_events will
> remove the old event and create a new one (with the same format) and
> increase the type number to the next available on until the type number
> reaches over 65535 which is the max number for the 16 bit type. After it
> reaches that number, the logic to allocate a new number simply looks for
> the next available number. When an dynamic event is removed, that number
> is then available to be reused by the next dynamic event created. That is,
> once the above reaches the max number, the number assigned to the event in
> that loop will remain the same.
>
> Now that means deleting one dynamic event and created another will reuse
> the previous events type number. This is where bad things can happen.
> After the above loop finishes, the kprobes/foo event which reads the
> do_sys_openat2 function call's first parameter as an integer.
>
> # echo 1 > kprobes/foo/enable
> # cat /etc/passwd > /dev/null
> # cat trace
> cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
> # echo 0 > kprobes/foo/enable
>
> Now if we delete the kprobe and create a new one that reads a string:
>
> # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
>
> And now we can the trace:
>
> # cat trace
> sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
> bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
Aah, right. Even if we remove one dynamic event, it was shown
as unknown events.
>
> And dmesg has:
>
> ==================================================================
> BUG: KASAN: use-after-free in string+0xd4/0x1c0
> Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
>
> CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5b/0x77
> print_report+0x17f/0x47b
> kasan_report+0xad/0x130
> string+0xd4/0x1c0
> vsnprintf+0x500/0x840
> seq_buf_vprintf+0x62/0xc0
> trace_seq_printf+0x10e/0x1e0
> print_type_string+0x90/0xa0
> print_kprobe_event+0x16b/0x290
> print_trace_line+0x451/0x8e0
> s_show+0x72/0x1f0
> seq_read_iter+0x58e/0x750
> seq_read+0x115/0x160
> vfs_read+0x11d/0x460
> ksys_read+0xa9/0x130
> do_syscall_64+0x3a/0x90
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fc2e972ade2
> Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
> RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
> RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
> RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
> R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
> R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
> </TASK>
>
> The buggy address belongs to the physical page:
> page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
> raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ^
> ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==================================================================
>
> This was found when Zheng Yejian sent a patch to convert the even type
> number assignment to use IDA, which gives the next available number, and
> this bug showed up in the fuzz testing by Yujie Liu and the kernel test
> robot. But after further analysis, I found that this behavior is the same
> as when the event type numbers go past the 16bit max (and the above shows
> that).
>
> As modules have a similar issue, but is dealt with by setting a
> "WAS_ENABLED" flag when a module event is enabled, and when the module is
> freed, if any of its events were enabled, the ring buffer that holds that
> event is also cleared, to prevent reading stale events. The same can be
> done for dynamic events.
Indeed. If the dynamic event had not been enabled, there is no reason
to clear the buffer.
This looks good to me.
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Thank you!
>
> If any dynamic event that is being removed was enabled, then make sure the
> buffers they were enabled in are now cleared.
>
> Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.co…
>
> Cc: stable(a)vger.kernel.org
> Depends-on: TBD ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
> Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework")
> Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events")
> Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in")
> Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced")
> Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
> Reported-by: Zheng Yejian <zhengyejian1(a)huawei.com>
> Reported-by: Yujie Liu <yujie.liu(a)intel.com>
> Reported-by: kernel test robot <yujie.liu(a)intel.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
> ---
> kernel/trace/trace_dynevent.c | 2 ++
> kernel/trace/trace_events.c | 11 ++++++++++-
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
> index 154996684fb5..4376887e0d8a 100644
> --- a/kernel/trace/trace_dynevent.c
> +++ b/kernel/trace/trace_dynevent.c
> @@ -118,6 +118,7 @@ int dyn_event_release(const char *raw_command, struct dyn_event_operations *type
> if (ret)
> break;
> }
> + tracing_reset_all_online_cpus();
> mutex_unlock(&event_mutex);
> out:
> argv_free(argv);
> @@ -214,6 +215,7 @@ int dyn_events_release_all(struct dyn_event_operations *type)
> break;
> }
> out:
> + tracing_reset_all_online_cpus();
> mutex_unlock(&event_mutex);
>
> return ret;
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 0449e3c7d327..3bfaf560ecc4 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2947,7 +2947,10 @@ static int probe_remove_event_call(struct trace_event_call *call)
> * TRACE_REG_UNREGISTER.
> */
> if (file->flags & EVENT_FILE_FL_ENABLED)
> - return -EBUSY;
> + goto busy;
> +
> + if (file->flags & EVENT_FILE_FL_WAS_ENABLED)
> + tr->clear_trace = true;
> /*
> * The do_for_each_event_file_safe() is
> * a double loop. After finding the call for this
> @@ -2960,6 +2963,12 @@ static int probe_remove_event_call(struct trace_event_call *call)
> __trace_remove_event_call(call);
>
> return 0;
> + busy:
> + /* No need to clear the trace now */
> + list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> + tr->clear_trace = false;
> + }
> + return -EBUSY;
> }
>
> /* Remove an event_call */
> --
> 2.35.1
>
>
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
The patch titled
Subject: mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
Date: Wed, 23 Nov 2022 17:56:52 +0100
Any codepath that zaps page table entries must invoke MMU notifiers to
ensure that secondary MMUs (like KVM) don't keep accessing pages which
aren't mapped anymore. Secondary MMUs don't hold their own references to
pages that are mirrored over, so failing to notify them can lead to page
use-after-free.
I'm marking this as addressing an issue introduced in commit f3f0e1d2150b
("khugepaged: add support of collapse for tmpfs/shmem pages"), but most of
the security impact of this only came in commit 27e1f8273113 ("khugepaged:
enable collapse pmd for pte-mapped THP"), which actually omitted flushes
for the removal of present PTEs, not just for the removal of empty page
tables.
Link: https://lkml.kernel.org/r/20221123165652.2204925-5-jannh@google.com
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem p=
ages")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths
+++ a/mm/khugepaged.c
@@ -1421,6 +1421,7 @@ static void collapse_and_free_pmd(struct
{
pmd_t pmd;
struct mmu_gather tlb;
+ struct mmu_notifier_range range;
mmap_assert_write_locked(mm);
if (vma->vm_file)
@@ -1433,12 +1434,16 @@ static void collapse_and_free_pmd(struct
lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
page_table_check_pte_clear_range(mm, addr, pmd);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, addr,
+ addr + HPAGE_PMD_SIZE);
+ mmu_notifier_invalidate_range_start(&range);
tlb_gather_mmu(&tlb, mm);
pmd = READ_ONCE(*pmdp);
pmd_clear(pmdp);
tlb_flush_pte_range(&tlb, addr, HPAGE_PMD_SIZE);
pte_free_tlb(&tlb, pmd_pgtable(pmd), addr);
tlb_finish_mmu(&tlb);
+ mmu_notifier_invalidate_range_end(&range);
mm_dec_nr_ptes(mm);
}
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-khugepaged-take-the-right-locks-for-page-table-retraction.patch
mmu_gather-use-macro-arguments-more-carefully.patch
mm-khugepaged-fix-gup-fast-interaction-by-freeing-ptes-via-mmu_gather.patch
mm-khugepaged-invoke-mmu-notifiers-in-shmem-file-collapse-paths.patch