On Thu, Sep 16, 2021 at 08:30:42AM -0700, Jakub Kicinski wrote:
On Thu, 16 Sep 2021 10:07:07 -0500 Bjorn Helgaas wrote:
On Thu, Sep 16, 2021 at 06:17:39AM -0700, Jakub Kicinski wrote:
My Lenovo T490s with i7-8665U had been marking TSC as unstable since v5.13, resulting in very sluggish desktop experience...
Including the actual dmesg log line here might help others locate this fix.
Good point, will add in v2.
clocksource: timekeeping watchdog on CPU3: hpet read-back delay of 316000ns, attempt 4, marking unstable
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (14539801827657, -530891666)<-(14539319241737, -48307500)
clocksource: Checking clocksource tsc synchronization from CPU 3 to CPUs 0-2,6-7.
clocksource: Switched to clocksource hpet
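For anyone searching their own logs for this signature, here is a minimal sketch. It assumes the watchdog message keeps the exact wording quoted in this thread; the helper name `parse_watchdog_line` is hypothetical and not part of any kernel tooling.

```python
import re

# Pattern modeled on the single log line quoted in this thread; other kernel
# versions may word the message differently.
WATCHDOG_RE = re.compile(
    r"timekeeping watchdog on CPU(?P<cpu>\d+): "
    r"(?P<source>\w+) read-back delay of (?P<delay_ns>\d+)ns, "
    r"attempt (?P<attempt>\d+)"
)


def parse_watchdog_line(line):
    """Return the fields of a watchdog read-back complaint, or None."""
    match = WATCHDOG_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "source": match["source"],
        "delay_ns": int(match["delay_ns"]),
        "attempt": int(match["attempt"]),
    }
```

Feeding it the first line of the log above yields CPU 3, clocksource `hpet`, and a 316000 ns delay on attempt 4; lines that do not match return `None`.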
I have an 8086:3e34 bridge, also known as "Host bridge: Intel Corporation Coffee Lake HOST and DRAM Controller (rev 0c)". Add it to the list.
We should perhaps consider applying this quirk more widely. The Intel documentation does not list my device [1], but linuxhw [2] does, and it seems to list a few more bridges we do not currently cover (3e31, 3ecc, 3e35, 3e0f).
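To check whether a given machine falls into one of the groups discussed here, a rough sketch follows. It uses only the device IDs named in this thread; the function names and result labels are hypothetical, and an "unknown" result says nothing about whether the kernel already has a quirk for that bridge.

```python
INTEL_VENDOR_ID = 0x8086

# Device IDs taken only from this thread: 0x3e34 is the bridge this patch adds,
# and the rest are the linuxhw entries mentioned as not yet covered.
PATCHED_BRIDGE = 0x3E34
UNCOVERED_CANDIDATES = {0x3E31, 0x3ECC, 0x3E35, 0x3E0F}


def parse_pci_id(text):
    """Split an lspci-style 'vendor:device' string into two integers."""
    vendor, device = text.split(":")
    return int(vendor, 16), int(device, 16)


def classify_bridge(text):
    """Label a host bridge ID according to how this thread mentions it."""
    vendor, device = parse_pci_id(text)
    if vendor != INTEL_VENDOR_ID:
        return "not-intel"
    if device == PATCHED_BRIDGE:
        return "added-by-this-patch"
    if device in UNCOVERED_CANDIDATES:
        return "candidate-for-wider-quirk"
    return "unknown"
```

The ID string is in the form printed by `lspci -n`, for example `classify_bridge("8086:3e34")`.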
In the fine tradition of:
e0748539e3d5 ("x86/intel: Disable HPET on Intel Ice Lake platforms")
f8edbde885bb ("x86/intel: Disable HPET on Intel Coffee Lake H platforms")
fc5db58539b4 ("x86/quirks: Disable HPET on Intel Coffe Lake platforms")
62187910b0fc ("x86/intel: Add quirk to disable HPET for the Baytrail platform")
This seems to be an ongoing issue, not just a point defect in a single product, and I really hate the onesy-twosy nature of this.
Indeed. Or at least cover all Coffee Lakes in one fell swoop.
Is there really no way to detect this issue automatically or fix whatever Linux bug makes us trip over this? I am no clock expert, so I have absolutely no idea whether this is possible.
I'm deferring to clock experts. Paul mentioned he has some prototype patches that may help.
[1] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/8th-...
[2] https://github.com/linuxhw/DevicePopulation/blob/master/README.md
Cc: stable@vger.kernel.org # v5.13+
How did you pick v5.13? force_disable_hpet() was added by 62187910b0fc ("x86/intel: Add quirk to disable HPET for the Baytrail platform"), which appeared in v3.15.
Erm, good question, it started happening for me (and others with the same laptop) with v5.13. I just sort of assumed it was 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold").
It usually takes a day to repro (4 hours was the quickest repro I've seen) so bisection was kind of out of the question.
OK, so this is an intermittent condition where HPET is sometimes slow to access for a short period of time? If that is the case, my thought is to set the clocksource to be reinitialized (without a splat and without marking the clocksource unstable), and to splat (and mark the clocksource unstable) if it does not get a good read after 100 subsequent attempts.
So as long as the period of slowness lasts for less than 50 seconds, things would work fine.
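As a toy model of that proposal (a simulation, not kernel code), the sketch below counts consecutive bad reads and gives up only at the limit. The class and function names are hypothetical, the 0.5 second interval is an assumption implied by the "100 attempts" and "50 seconds" figures, and treating the first slow read as attempt one is just one reading of the proposal.

```python
MAX_BAD_READS = 100          # attempt limit proposed in the message above
WATCHDOG_INTERVAL_S = 0.5    # assumed: implied by 100 attempts ~ 50 seconds


class WatchdogPolicy:
    """Toy model of the proposal: tolerate slow reads, give up after a limit."""

    def __init__(self, max_bad_reads=MAX_BAD_READS):
        self.max_bad_reads = max_bad_reads
        self.consecutive_bad = 0
        self.unstable = False

    def observe(self, read_ok):
        """Record one watchdog pass and return the action taken."""
        if self.unstable:
            return "unstable"
        if read_ok:
            # A good read ends the slow period and resets the count.
            self.consecutive_bad = 0
            return "ok"
        self.consecutive_bad += 1
        if self.consecutive_bad >= self.max_bad_reads:
            self.unstable = True
            return "mark-unstable"
        # No splat, no unstable marking: just start the check over.
        return "reinitialize"


def tolerated_seconds(max_bad_reads=MAX_BAD_READS,
                      interval_s=WATCHDOG_INTERVAL_S):
    """Longest slow period the policy rides out, under the assumed interval."""
    return max_bad_reads * interval_s
```

Under these assumptions, `tolerated_seconds()` works out to 100 * 0.5 = 50 seconds, which matches the figure above.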
Seem reasonable?
Thanx, Paul