The commit: 3f235279828c ("x86/cpu: Restore AMD's DE_CFG MSR after resume") renamed the MSR_F10H_DECFG_LFENCE_SERIALIZE macro to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. The fix changed MSR_F10H_DECFG_LFENCE_SERIALIZE to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT in the init_amd() function, but should have used MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. This causes a discrepancy in the LFENCE serialization check in the init_amd() function.
This causes a ~16% sysbench memory regression, when running: sysbench --test=memory run
Fixes: 3f235279828c2a8aff3164fef08d58f7af2d64fc("x86/cpu: Restore AMD's DE_CFG MSR after resume ")
Signed-off-by: Rhythm Mahajan <rhythm.m.mahajan@oracle.com>
---
The test result before the commit 3f2352798("x86/cpu: Restore AMD's DE_CFG MSR after resume")
$ sysbench --test=memory run
sysbench 1.0.17 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 27466829 (2746182.07 per second)

26823.08 MiB transferred (2681.82 MiB/sec)

General statistics:
    total time:                          10.0001s
    total number of events:              27466829

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.20
         95th percentile:                        0.00
         sum:                                 4041.60

Threads fairness:
    events (avg/stddev):           27466829.0000/0.00
    execution time (avg/stddev):   4.0416/0.00
The test result after the commit 3f2352798("x86/cpu: Restore AMD's DE_CFG MSR after resume")
$ sysbench --test=memory run
sysbench 1.0.17 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 33758407 (3375232.84 per second)

32967.19 MiB transferred (3296.13 MiB/sec)

General statistics:
    total time:                          10.0001s
    total number of events:              33758407

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.06
         95th percentile:                        0.00
         sum:                                 4115.95

Threads fairness:
    events (avg/stddev):           33758407.0000/0.00
    execution time (avg/stddev):   4.1160/0.00
---
 arch/x86/kernel/cpu/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ee5d0f943ec8c..4122afeaaaff5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -941,7 +941,7 @@ static void init_amd(struct cpuinfo_x86 *c)
 		 * serializing.
 		 */
 		ret = rdmsrl_safe(MSR_AMD64_DE_CFG, &val);
-		if (!ret && (val & MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT)) {
+		if (!ret && (val & MSR_AMD64_DE_CFG_LFENCE_SERIALIZE)) {
 			/* A serializing LFENCE stops RDTSC speculation */
 			set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
 		} else {

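For readers following the one-line change, here is a minimal user-space sketch of why the two macros are not interchangeable. It assumes the usual msr-index.h pattern in which the _BIT macro is a bit index (taken to be 1 here) and the unsuffixed macro is the mask built from it. The DE_CFG_* names and both helper functions are hypothetical stand-ins for illustration, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modelled on the bit-index/mask pairs in msr-index.h. */
#define DE_CFG_LFENCE_SERIALIZE_BIT	1	/* bit index, not a mask */
#define DE_CFG_LFENCE_SERIALIZE		(1ULL << DE_CFG_LFENCE_SERIALIZE_BIT)	/* mask 0x2 */

/* Pre-patch form: ANDs with the bit index (1), so it really tests bit 0. */
static bool lfence_serializing_buggy(uint64_t val)
{
	return (val & DE_CFG_LFENCE_SERIALIZE_BIT) != 0;
}

/* Patched form: ANDs with the mask, so it tests bit 1 as intended. */
static bool lfence_serializing_fixed(uint64_t val)
{
	return (val & DE_CFG_LFENCE_SERIALIZE) != 0;
}
```

Under these assumptions, a register value with only bit 1 set (0x2) passes the patched check but fails the pre-patch one, so init_amd() would take the else branch even though LFENCE is serializing.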
On Tue, Mar 14, 2023 at 04:11:59AM -0700, Rhythm Mahajan wrote:
> The commit: 3f235279828c ("x86/cpu: Restore AMD's DE_CFG MSR after resume") renamed the MSR_F10H_DECFG_LFENCE_SERIALIZE macro to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. The fix changed MSR_F10H_DECFG_LFENCE_SERIALIZE to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT in the init_amd() function, but should have used MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. This causes a discrepancy in the LFENCE serialization check in the init_amd() function.
> This causes a ~16% sysbench memory regression, when running: sysbench --test=memory run
> Fixes: 3f235279828c2a8aff3164fef08d58f7af2d64fc("x86/cpu: Restore AMD's DE_CFG MSR after resume ")
Odd line-wrapping :(
And please use the proper way to reference SHA1 as documented in the kernel documentation.
And why is this only needed in 4.14.y? What about Linus's tree and all of the other stable trees?
Please get this fixed in Linus's tree first and then we can take a backport.
thanks,
greg k-h
On 15/03/23 1:27 pm, Greg KH wrote:
> On Tue, Mar 14, 2023 at 04:11:59AM -0700, Rhythm Mahajan wrote:
>> The commit: 3f235279828c ("x86/cpu: Restore AMD's DE_CFG MSR after resume") renamed the MSR_F10H_DECFG_LFENCE_SERIALIZE macro to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. The fix changed MSR_F10H_DECFG_LFENCE_SERIALIZE to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT in the init_amd() function, but should have used MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. This causes a discrepancy in the LFENCE serialization check in the init_amd() function.
>> This causes a ~16% sysbench memory regression, when running: sysbench --test=memory run
>> Fixes: 3f235279828c2a8aff3164fef08d58f7af2d64fc("x86/cpu: Restore AMD's DE_CFG MSR after resume ")
> Odd line-wrapping :(
> And please use the proper way to reference SHA1 as documented in the kernel documentation.
Thanks, I will send a v2 for this.
> And why is this only needed in 4.14.y? What about Linus's tree and all of the other stable trees?
The regression was introduced after the backport of 2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume") for 4.14.y and 4.9.y. Mainline and other stable don't have this regression. The fix is only needed for 4.14.y and 4.9.y.
> Please get this fixed in Linus's tree first and then we can take a backport.
Mainline doesn't require this fix.
> thanks,
> greg k-h
Thanks,
Rhythm
On Wed, Mar 15, 2023 at 02:11:57PM +0530, rhythm.m.mahajan@oracle.com wrote:
> On 15/03/23 1:27 pm, Greg KH wrote:
>> On Tue, Mar 14, 2023 at 04:11:59AM -0700, Rhythm Mahajan wrote:
>>> The commit: 3f235279828c ("x86/cpu: Restore AMD's DE_CFG MSR after resume") renamed the MSR_F10H_DECFG_LFENCE_SERIALIZE macro to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. The fix changed MSR_F10H_DECFG_LFENCE_SERIALIZE to MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT in the init_amd() function, but should have used MSR_AMD64_DE_CFG_LFENCE_SERIALIZE. This causes a discrepancy in the LFENCE serialization check in the init_amd() function.
>>> This causes a ~16% sysbench memory regression, when running: sysbench --test=memory run
>>> Fixes: 3f235279828c2a8aff3164fef08d58f7af2d64fc("x86/cpu: Restore AMD's DE_CFG MSR after resume ")
>> Odd line-wrapping :(
>> And please use the proper way to reference SHA1 as documented in the kernel documentation.
> Thanks, I will send a v2 for this.
>> And why is this only needed in 4.14.y? What about Linus's tree and all of the other stable trees?
> The regression was introduced after the backport of 2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume") for 4.14.y and 4.9.y. Mainline and other stable don't have this regression. The fix is only needed for 4.14.y and 4.9.y.
Ah, then please make this more obvious in your changelog text that this is a problem that is only caused by the backport and is not relevant anywhere else.
thanks,
greg k-h