On 3/16/2021 2:34 PM, Stephane Eranian wrote:
On Tue, Mar 16, 2021 at 5:28 AM Liang, Kan kan.liang@linux.intel.com wrote:
On 3/16/2021 3:22 AM, Namhyung Kim wrote:
Hi Peter and Kan,
On Thu, Mar 4, 2021 at 5:22 AM Peter Zijlstra peterz@infradead.org wrote:
On Wed, Mar 03, 2021 at 02:53:00PM -0500, Liang, Kan wrote:
On 3/3/2021 1:59 PM, Peter Zijlstra wrote:
On Wed, Mar 03, 2021 at 05:42:18AM -0800, kan.liang@linux.intel.com wrote:
> +++ b/arch/x86/events/intel/ds.c > @@ -2000,18 +2000,6 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d > continue; > } > - /* > - * On some CPUs the PEBS status can be zero when PEBS is > - * racing with clearing of GLOBAL_STATUS. > - * > - * Normally we would drop that record, but in the > - * case when there is only a single active PEBS event > - * we can assume it's for that event. > - */ > - if (!pebs_status && cpuc->pebs_enabled && > - !(cpuc->pebs_enabled & (cpuc->pebs_enabled-1))) > - pebs_status = cpuc->pebs_enabled;
Wouldn't something like:
pebs_status = p->status = cpus->pebs_enabled;
I didn't consider it as a potential solution in this patch because I don't think it's a proper way that SW modifies the buffer, which is supposed to be manipulated by the HW.
Right, but then HW was supposed to write sane values and it doesn't do that either ;-)
It's just a personal preference. I don't see any issue here. We may try it.
So I mostly agree with you, but I think it's a shame to unsupport such chips, HSW is still a plenty useable chip today.
I got a similar issue on ivybridge machines which caused kernel crash. My case it's related to the branch stack with PEBS events but I think it's the same issue. And I can confirm that the above approach of updating p->status fixed the problem.
I've talked to Stephane about this, and he wants to make it more robust when we see stale (or invalid) PEBS records. I'll send the patch soon.
Hi Namhyung,
In case you didn't see it, I've already submitted a patch to fix the issue last Friday. https://lore.kernel.org/lkml/1615555298-140216-1-git-send-email-kan.liang@li... But if you have a more robust proposal, please feel free to submit it.
This fixes the problem on the older systems. The other problem we identified related to the PEBS sample processing code is that you can end up with uninitialized perf_sample_data struct passed to perf_event_overflow():
setup_pebs_fixed_sample_data(pebs, data) { if (!pebs) return; perf_sample_data_init(data); <<< must be moved before the if (!pebs) ... }
__intel_pmu_pebs_event(pebs, data) { setup_sample(pebs, data) perf_event_overflow(data); ... }
If there is any other reason to get a pebs = NULL in fix_sample_data() or adaptive_sample_data(), then you must call perf_sample_data_init(data) BEFORE you return otherwise you end up in perf_event_overflow() with uninitialized data and you may die as follows:
[<ffffffff812f283d>] ? perf_output_copy+0x4d/0xb0 [<ffffffff812e0fb1>] perf_output_sample+0x561/0xab0 [<ffffffff812e0952>] ? __perf_event_header__init_id+0x112/0x130 [<ffffffff812e1be1>] ? perf_prepare_sample+0x1b1/0x730 [<ffffffff812e21b9>] perf_event_output_forward+0x59/0x80 [<ffffffff812e0634>] ? perf_event_update_userpage+0xf4/0x110 [<ffffffff812e4468>] perf_event_overflow+0x88/0xe0 [<ffffffff810175b8>] __intel_pmu_pebs_event+0x328/0x380
This all stems from get_next_pebs_record_by_bit() potentially returning NULL but the NULL is not handled correctly by the callers. This is what I'd like to see cleaned up in __intel_pmu_pebs_event() to avoid future problems.
Got it. Yes, we need another patch to correctly handle the potentially returning NULL. Thanks for pointing it out.
Thanks, Kan