Re: [PATCH] Revert "perf/x86: Allow zero PEBS status with only single active event"

3 Mar 2021


      On 3/3/2021 1:59 PM, Peter Zijlstra wrote:
...
On Wed, Mar 03, 2021 at 05:42:18AM -0800, kan.liang@linux.intel.com wrote:
...
For some old CPUs (HSW and earlier), the PEBS status in a PEBS record
may be mistakenly set to 0. To minimize the impact of the defect, the
commit was introduced to try to avoid dropping the PEBS record for some
cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates
the local pebs_status accordingly. However, it doesn't correct the PEBS
status in the PEBS record, which may trigger the crash, especially for
the large PEBS.
It's possible that all the PEBS records in a large PEBS have the PEBS
status 0. If so, the first get_next_pebs_record_by_bit() in the
__intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large
PEBS, the 'count' parameter must > 1. The second
get_next_pebs_record_by_bit() will crash.
Two solutions were considered to fix the crash.

Keep the SW workaround and add extra checks in the
 get_next_pebs_record_by_bit() to workaround the issue. The
 get_next_pebs_record_by_bit() is a critical path. The extra checks
 will bring extra overhead for the latest CPUs which don't have the
 defect. Also, the defect can only be observed on some old CPUs
 (For example, the issue can be reproduced on an HSW client, but I
 didn't observe the issue on my Haswell server machine.). The impact
 of the defect should be limit.
 This solution is dropped.
Drop the SW workaround and revert the commit.
 It seems that the commit never works, because the PEBS status in the
 PEBS record never be changed. The get_next_pebs_record_by_bit() only
 checks the PEBS status in the PEBS record. The record is dropped
 eventually. Reverting the commit should not change the current
 behavior.

...
+++ b/arch/x86/events/intel/ds.c
@@ -2000,18 +2000,6 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
   		continue;
   	}

/*


 * On some CPUs the PEBS status can be zero when PEBS is


 * racing with clearing of GLOBAL_STATUS.


 *


 * Normally we would drop that record, but in the


 * case when there is only a single active PEBS event


 * we can assume it's for that event.


 */


if (!pebs_status && cpuc->pebs_enabled &&


	!(cpuc->pebs_enabled & (cpuc->pebs_enabled-1)))


	pebs_status = cpuc->pebs_enabled;



Wouldn't something like:
	pebs_status = p->status = cpus->pebs_enabled;


I didn't consider it as a potential solution in this patch because I 
don't think it's a proper way that SW modifies the buffer, which is 
supposed to be manipulated by the HW.
It's just a personal preference. I don't see any issue here. We may try it.
Vince, could you please help check whether Peter's suggestion fixes the 
crash?
Thanks,
Kan
...
actually fix things without adding overhead?

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] Revert "perf/x86: Allow zero PEBS status with only single active event"