Hi all,
I've tested the following changes, belonging to merge commit f7dd3b1734e, on top of 4.9.68 after a very easy backport from 4.10, and I think it may be worthwhile adding them to 4.9.x:
x86/tsc: Limit the adjust value further
x86/tsc: Annotate printouts as firmware bug
x86/tsc: Force TSC_ADJUST register to value >= zero
x86/tsc: Validate TSC_ADJUST after resume
x86/tsc: Validate cpumask pointer before accessing it
x86/tsc: Fix broken CONFIG_X86_TSC=n build
x86/tsc: Try to adjust TSC if sync test fails
x86/tsc: Prepare warp test for TSC adjustment
x86/tsc: Move sync cleanup to a safe place
x86/tsc: Sync test only for the first cpu in a package
x86/tsc: Verify TSC_ADJUST from idle
x86/tsc: Store and check TSC ADJUST MSR
x86/tsc: Detect random warps
x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
x86/tsc: Finalize the split of the TSC_RELIABLE flag
x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs
x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
x86/tsc: Mark TSC frequency determined by CPUID as known
x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
These changes fix precisely the issue I am having with a relatively new 8-core Intel(R) Core(TM) i7-7820X with an updated ASUS BIOS (December 2017).
Under v4.9.68, the kernel falls back to HPET as its clocksource, which just doesn't work - there is a time drift of over 200ms that does not go away even after repeated ntpdate sync attempts.
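For anyone wanting to check for the same symptom, here is a minimal sketch of how to see which clocksource the kernel ended up with. It assumes the usual clocksource sysfs directory; the helper name show_clocksource and its optional directory argument are made up for illustration and are not part of the patches.

```shell
# show_clocksource is a hypothetical helper name, not something from the patches.
# It prints the clocksource currently in use and the ones the kernel offers.
# The optional argument overrides the sysfs directory (handy for a dry run).
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    printf 'current:   %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}
```

Running show_clocksource with no argument on an affected machine would presumably report hpet as the current clocksource; `dmesg | grep -i clocksource` should show the messages around the switch.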
For further testing I've posted a branch for these changes here:
https://github.com/kernelim/linux tsc-fix-for-4.9.x
On Wed, Dec 13, 2017 at 10:33:52AM +0200, Dan Aloni wrote:
Hi all,
I've tested the following changes, belonging to merge commit f7dd3b1734e, on top of 4.9.68 after a very easy backport from 4.10, and I think it may be worthwhile adding them to 4.9.x:
x86/tsc: Limit the adjust value further
x86/tsc: Annotate printouts as firmware bug
x86/tsc: Force TSC_ADJUST register to value >= zero
x86/tsc: Validate TSC_ADJUST after resume
x86/tsc: Validate cpumask pointer before accessing it
x86/tsc: Fix broken CONFIG_X86_TSC=n build
x86/tsc: Try to adjust TSC if sync test fails
x86/tsc: Prepare warp test for TSC adjustment
x86/tsc: Move sync cleanup to a safe place
x86/tsc: Sync test only for the first cpu in a package
x86/tsc: Verify TSC_ADJUST from idle
x86/tsc: Store and check TSC ADJUST MSR
x86/tsc: Detect random warps
x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
x86/tsc: Finalize the split of the TSC_RELIABLE flag
x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs
x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
x86/tsc: Mark TSC frequency determined by CPUID as known
x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
I need git commit ids to be able to do anything :)
These changes fix precisely the issue I am having with a relatively new 8-core Intel(R) Core(TM) i7-7820X with an updated ASUS BIOS (December 2017).
Under v4.9.68, the kernel falls back to HPET as its clocksource, which just doesn't work - there is a time drift of over 200ms that does not go away even after repeated ntpdate sync attempts.
For further testing I've posted a branch for these changes here:
https://github.com/kernelim/linux tsc-fix-for-4.9.x
Why not just use 4.14 instead? That's much easier than trying to use an old kernel like 4.9, right?
thanks,
greg k-h
On Wed, Dec 13, 2017 at 10:03:35AM +0100, Greg KH wrote:
On Wed, Dec 13, 2017 at 10:33:52AM +0200, Dan Aloni wrote:
Hi all,
I've tested the following changes, belonging to merge commit f7dd3b1734e, on top of 4.9.68 after a very easy backport from 4.10, and I think it may be worthwhile adding them to 4.9.x:
[..]
I need git commit ids to be able to do anything :)
Sure, how about:
# git log 8c9b9d87b855 --oneline -n 19 --reverse --pretty="%h # %s" | awk -F" " '{print "git cherry-pick -x " $0}'
git cherry-pick -x 47c95a46d0fa # x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
git cherry-pick -x 4ca4df0b7eb0 # x86/tsc: Mark TSC frequency determined by CPUID as known
git cherry-pick -x 4635fdc696a8 # x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
git cherry-pick -x f3a02ecebed7 # x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs
git cherry-pick -x 984fecebda3b # x86/tsc: Finalize the split of the TSC_RELIABLE flag
git cherry-pick -x 7b3d2f6e08ed # x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
git cherry-pick -x bec8520dca0d # x86/tsc: Detect random warps
git cherry-pick -x 8b223bc7abe0 # x86/tsc: Store and check TSC ADJUST MSR
git cherry-pick -x 1d0095feea59 # x86/tsc: Verify TSC_ADJUST from idle
git cherry-pick -x a36f5136814b # x86/tsc: Sync test only for the first cpu in a package
git cherry-pick -x 4c5e3c637521 # x86/tsc: Move sync cleanup to a safe place
git cherry-pick -x 76d3b8515850 # x86/tsc: Prepare warp test for TSC adjustment
git cherry-pick -x cc4db26899dc # x86/tsc: Try to adjust TSC if sync test fails
git cherry-pick -x b836554386cc # x86/tsc: Fix broken CONFIG_X86_TSC=n build
git cherry-pick -x 31f8a651fc57 # x86/tsc: Validate cpumask pointer before accessing it
git cherry-pick -x 6a369583178d # x86/tsc: Validate TSC_ADJUST after resume
git cherry-pick -x 5bae156241e0 # x86/tsc: Force TSC_ADJUST register to value >= zero
git cherry-pick -x 16588f659257 # x86/tsc: Annotate printouts as firmware bug
git cherry-pick -x 8c9b9d87b855 # x86/tsc: Limit the adjust value further
There's a conflict in only one small place, in the first few patches.
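To apply the series in one go rather than pasting 19 commands, something like the following could be used. This is only a sketch: apply_series and the PICK override are hypothetical names introduced here for illustration, and it assumes input in the same "<hash> # <subject>" form as the list above, oldest commit first.

```shell
# apply_series is a hypothetical helper, not something from the thread.
# Reads "<hash> # <subject>" lines on stdin, oldest first, and cherry-picks
# each hash in order, stopping at the first one that does not apply cleanly.
# PICK can be overridden (e.g. PICK=echo) for a dry run.
apply_series() {
    while read -r hash _rest; do
        [ -n "$hash" ] || continue
        # stdin is redirected so the pick command cannot eat the hash list
        if ! ${PICK:-git cherry-pick -x} "$hash" < /dev/null; then
            echo "stopped at $hash: resolve, then 'git cherry-pick --continue'" >&2
            return 1
        fi
    done
}
```

Usage on a 4.9.y checkout would be along the lines of `git log 8c9b9d87b855 -n 19 --reverse --pretty="%h # %s" | apply_series`. Since it stops at the first conflict, the remaining hashes have to be fed in again after resolving it.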
These changes fix precisely the issue I am having with a relatively new 8-core Intel(R) Core(TM) i7-7820X with an updated ASUS BIOS (December 2017).
Under v4.9.68, the kernel falls back to HPET as its clocksource, which just doesn't work - there is a time drift of over 200ms that does not go away even after repeated ntpdate sync attempts.
For further testing I've posted a branch for these changes here:
https://github.com/kernelim/linux tsc-fix-for-4.9.x
Why not just use 4.14 instead? That's much easier than trying to use an old kernel like 4.9, right?
Yes, however the mileage of 4.9.x seems somewhat more appealing.
I'll give 4.14.x a try mostly to see whether it solves hard locks that I've seen with 4.13.x (all Fedora-based stable kernels) on three of my machines -- an unrelated issue, and the main reason why I gave one of the LTS branches a try.
On Wed, Dec 13, 2017 at 11:45:20AM +0200, Dan Aloni wrote:
On Wed, Dec 13, 2017 at 10:03:35AM +0100, Greg KH wrote:
On Wed, Dec 13, 2017 at 10:33:52AM +0200, Dan Aloni wrote:
Hi all,
I've tested the following changes, belonging to merge commit f7dd3b1734e, on top of 4.9.68 after a very easy backport from 4.10, and I think it may be worthwhile adding them to 4.9.x:
[..]
I need git commit ids to be able to do anything :)
Sure, how about:
# git log 8c9b9d87b855 --oneline -n 19 --reverse --pretty="%h # %s" | awk -F" " '{print "git cherry-pick -x " $0}'
git cherry-pick -x 47c95a46d0fa # x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
git cherry-pick -x 4ca4df0b7eb0 # x86/tsc: Mark TSC frequency determined by CPUID as known
git cherry-pick -x 4635fdc696a8 # x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
git cherry-pick -x f3a02ecebed7 # x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs
git cherry-pick -x 984fecebda3b # x86/tsc: Finalize the split of the TSC_RELIABLE flag
git cherry-pick -x 7b3d2f6e08ed # x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
git cherry-pick -x bec8520dca0d # x86/tsc: Detect random warps
git cherry-pick -x 8b223bc7abe0 # x86/tsc: Store and check TSC ADJUST MSR
git cherry-pick -x 1d0095feea59 # x86/tsc: Verify TSC_ADJUST from idle
git cherry-pick -x a36f5136814b # x86/tsc: Sync test only for the first cpu in a package
git cherry-pick -x 4c5e3c637521 # x86/tsc: Move sync cleanup to a safe place
git cherry-pick -x 76d3b8515850 # x86/tsc: Prepare warp test for TSC adjustment
git cherry-pick -x cc4db26899dc # x86/tsc: Try to adjust TSC if sync test fails
git cherry-pick -x b836554386cc # x86/tsc: Fix broken CONFIG_X86_TSC=n build
git cherry-pick -x 31f8a651fc57 # x86/tsc: Validate cpumask pointer before accessing it
git cherry-pick -x 6a369583178d # x86/tsc: Validate TSC_ADJUST after resume
git cherry-pick -x 5bae156241e0 # x86/tsc: Force TSC_ADJUST register to value >= zero
git cherry-pick -x 16588f659257 # x86/tsc: Annotate printouts as firmware bug
git cherry-pick -x 8c9b9d87b855 # x86/tsc: Limit the adjust value further
There's a conflict in only one small place, in the first few patches.
That's a lot of changes to be backported. I'm _really_ hesitant to do this, unless the maintainer of the code agrees it is ok...
These changes fix precisely the issue I am having with a relatively new 8-core Intel(R) Core(TM) i7-7820X with an updated ASUS BIOS (December 2017).
Under v4.9.68, the kernel falls back to HPET as its clocksource, which just doesn't work - there is a time drift of over 200ms that does not go away even after repeated ntpdate sync attempts.
For further testing I've posted a branch for these changes here:
https://github.com/kernelim/linux tsc-fix-for-4.9.x
Why not just use 4.14 instead? That's much easier than trying to use an old kernel like 4.9, right?
Yes, however the mileage of 4.9.x seems somewhat more appealing.
Why? 4.14 should be much better, it's newer, has more hardware support, more bugs fixed, and more new things left to debug :)
I'll give 4.14.x a try mostly to see whether it solves hard locks that I've seen with 4.13.x (all Fedora-based stable kernels) on three of my machines -- an unrelated issue, and the main reason why I gave one of the LTS branches a try.
You really should report that. Without that, odds are it will not be fixed.
thanks,
greg k-h
On Wed, Dec 13, 2017 at 10:57:55AM +0100, Greg KH wrote:
On Wed, Dec 13, 2017 at 11:45:20AM +0200, Dan Aloni wrote:
git cherry-pick -x 16588f659257 # x86/tsc: Annotate printouts as firmware bug
git cherry-pick -x 8c9b9d87b855 # x86/tsc: Limit the adjust value further
There's a conflict in only one small place, in the first few patches.
[..] That's a lot of changes to be backported. I'm _really_ hesitant to do this, unless the maintainer of the code agrees it is ok...
I guessed so, that's why I probed. Otherwise I would have just sent out patches.
These changes fix precisely the issue I am having with a relatively new 8-core Intel(R) Core(TM) i7-7820X with an updated ASUS BIOS (December 2017).
Under v4.9.68, the kernel falls back to HPET as its clocksource, which just doesn't work - there is a time drift of over 200ms that does not go away even after repeated ntpdate sync attempts.
For further testing I've posted a branch for these changes here:
https://github.com/kernelim/linux tsc-fix-for-4.9.x
Why not just use 4.14 instead? That's much easier than trying to use an old kernel like 4.9, right?
Yes, however the mileage of 4.9.x seems somewhat more appealing.
Why? 4.14 should be much better, it's newer, has more hardware support, more bugs fixed, and more new things left to debug :)
I always enjoy debugging :)
I'll give 4.14.x a try mostly to see whether it solves hard locks that I've seen with 4.13.x (all Fedora-based stable kernels) on three of my machines -- an unrelated issue, and the main reason why I gave one of the LTS branches a try.
You really should report that. Without that, odds are it will not be fixed.
I am still collecting data, but these systems are in rather constant use, so the downtime is problematic. They are 1) a rather new workstation, 2) an Intel NUC, and 3) an old Lenovo Carbon X1 Gen 3.
I should have also used a vanilla build, because I know that LKML prefers it over the Fedora-based patchset. I will try to see if it reproduces on 4.14.x, and perhaps kdump will be able to capture it this time.
On Wed, 13 Dec 2017, Greg KH wrote:
On Wed, Dec 13, 2017 at 11:45:20AM +0200, Dan Aloni wrote:
# git log 8c9b9d87b855 --oneline -n 19 --reverse --pretty="%h # %s" | awk -F" " '{print "git cherry-pick -x " $0}'
git cherry-pick -x 47c95a46d0fa # x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
git cherry-pick -x 4ca4df0b7eb0 # x86/tsc: Mark TSC frequency determined by CPUID as known
git cherry-pick -x 4635fdc696a8 # x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
git cherry-pick -x f3a02ecebed7 # x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs
git cherry-pick -x 984fecebda3b # x86/tsc: Finalize the split of the TSC_RELIABLE flag
git cherry-pick -x 7b3d2f6e08ed # x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
git cherry-pick -x bec8520dca0d # x86/tsc: Detect random warps
git cherry-pick -x 8b223bc7abe0 # x86/tsc: Store and check TSC ADJUST MSR
git cherry-pick -x 1d0095feea59 # x86/tsc: Verify TSC_ADJUST from idle
git cherry-pick -x a36f5136814b # x86/tsc: Sync test only for the first cpu in a package
git cherry-pick -x 4c5e3c637521 # x86/tsc: Move sync cleanup to a safe place
git cherry-pick -x 76d3b8515850 # x86/tsc: Prepare warp test for TSC adjustment
git cherry-pick -x cc4db26899dc # x86/tsc: Try to adjust TSC if sync test fails
git cherry-pick -x b836554386cc # x86/tsc: Fix broken CONFIG_X86_TSC=n build
git cherry-pick -x 31f8a651fc57 # x86/tsc: Validate cpumask pointer before accessing it
git cherry-pick -x 6a369583178d # x86/tsc: Validate TSC_ADJUST after resume
git cherry-pick -x 5bae156241e0 # x86/tsc: Force TSC_ADJUST register to value >= zero
git cherry-pick -x 16588f659257 # x86/tsc: Annotate printouts as firmware bug
git cherry-pick -x 8c9b9d87b855 # x86/tsc: Limit the adjust value further
There's a conflict in only one small place, in the first few patches.
That's a lot of changes to be backported. I'm _really_ hesitant to do this, unless the maintainer of the code agrees it is ok...
Those TSC_ADJUST fixes are just an initial workaround. Peter has since updated that to the final and proper solution, which makes it dependent on microcode version checks. If at all, then the whole lot wants to be backported, which is way more than the above set.
Thanks,
tglx
linux-stable-mirror@lists.linaro.org