On Wed, Mar 03, 2021 at 05:42:18AM -0800, kan.liang@linux.intel.com wrote:
For some old CPUs (HSW and earlier), the PEBS status in a PEBS record may be mistakenly set to 0. To minimize the impact of the defect, the commit was introduced to try to avoid dropping the PEBS record for some cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates the local pebs_status accordingly. However, it doesn't correct the PEBS status in the PEBS record, which may trigger the crash, especially for the large PEBS.
It's possible that all the PEBS records in a large PEBS have the PEBS status 0. If so, the first get_next_pebs_record_by_bit() in the __intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large PEBS, the 'count' parameter must > 1. The second get_next_pebs_record_by_bit() will crash.
Two solutions were considered to fix the crash.
- Keep the SW workaround and add extra checks in the get_next_pebs_record_by_bit() to workaround the issue. The get_next_pebs_record_by_bit() is a critical path. The extra checks will bring extra overhead for the latest CPUs which don't have the defect. Also, the defect can only be observed on some old CPUs (For example, the issue can be reproduced on an HSW client, but I didn't observe the issue on my Haswell server machine.). The impact of the defect should be limit. This solution is dropped.
- Drop the SW workaround and revert the commit. It seems that the commit never works, because the PEBS status in the PEBS record never be changed. The get_next_pebs_record_by_bit() only checks the PEBS status in the PEBS record. The record is dropped eventually. Reverting the commit should not change the current behavior.
+++ b/arch/x86/events/intel/ds.c @@ -2000,18 +2000,6 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d continue; }
/*
* On some CPUs the PEBS status can be zero when PEBS is
* racing with clearing of GLOBAL_STATUS.
*
* Normally we would drop that record, but in the
* case when there is only a single active PEBS event
* we can assume it's for that event.
*/
if (!pebs_status && cpuc->pebs_enabled &&
!(cpuc->pebs_enabled & (cpuc->pebs_enabled-1)))
pebs_status = cpuc->pebs_enabled;
Wouldn't something like:
pebs_status = p->status = cpus->pebs_enabled;
actually fix things without adding overhead?