On Thu, Sep 16, 2021 at 10:07:07AM -0500, Bjorn Helgaas wrote:
> This seems to be an ongoing issue, not just a point defect in a single product, and I really hate the onesy-twosy nature of this. Is there really no way to detect this issue automatically or fix whatever Linux bug makes us trip over this? I am no clock expert, so I have absolutely no idea whether this is possible.
x86 is gifted with the grand total of _0_ reliable clocks. Given no accurate time, it is impossible to tell which one of them is broken worst. Although I suppose we could attempt to synchronize against the PMU or MPERF.
We could possibly disable the tsc watchdog for X86_FEATURE_TSC_KNOWN_FREQ && X86_FEATURE_TSC_ADJUST I suppose.
And then have people with 'creative' BIOS get to keep the pieces.
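
Roughly, the check floated above would look something like the sketch below. This is a standalone userspace illustration, not kernel code: struct cpu_features and tsc_needs_watchdog() are made-up names, and the two booleans only stand in for the real X86_FEATURE_TSC_KNOWN_FREQ and X86_FEATURE_TSC_ADJUST feature bits.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's CPU feature bits. */
struct cpu_features {
	bool tsc_known_freq;	/* models X86_FEATURE_TSC_KNOWN_FREQ */
	bool tsc_adjust;	/* models X86_FEATURE_TSC_ADJUST */
};

/*
 * Return true if the clocksource watchdog should keep checking the TSC.
 * Only when both features are present do we trust the TSC enough to
 * leave it unwatched; anything else keeps the watchdog on.
 */
static bool tsc_needs_watchdog(const struct cpu_features *f)
{
	if (f->tsc_known_freq && f->tsc_adjust)
		return false;

	return true;
}
```

The cost is the one already stated: a machine that advertises both features but has firmware that still messes with the TSC would no longer be caught.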