From: Peter Zijlstra <peterz@infradead.org>
commit ddd07b750382adc2b78fdfbec47af8a6e0d8ef37 upstream.
CAT has happened, WBINVD is bad (even before CAT, blowing away the entire cache on a multi-core platform wasn't nice), try not to use it ever.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org
Cc: stable@vger.kernel.org # 4.19.x
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
---
 arch/x86/mm/pageattr.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 101f3ad0d6ad..ab87da7a6043 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -239,26 +239,12 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
 			    int in_flags, struct page **pages)
 {
 	unsigned int i, level;
-#ifdef CONFIG_PREEMPT
-	/*
-	 * Avoid wbinvd() because it causes latencies on all CPUs,
-	 * regardless of any CPU isolation that may be in effect.
-	 *
-	 * This should be extended for CAT enabled systems independent of
-	 * PREEMPT because wbinvd() does not respect the CAT partitions and
-	 * this is exposed to unpriviledged users through the graphics
-	 * subsystem.
-	 */
-	unsigned long do_wbinvd = 0;
-#else
-	unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
-#endif

 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);

-	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
+	flush_tlb_all();

-	if (!cache || do_wbinvd)
+	if (!cache)
 		return;

 	/*
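
For reference, this is roughly how the flush path reads once the hunk above is applied. It is a sketch reconstructed from the diff and the unchanged v4.19 context around it (the clflush loop below is the code the hunk falls through to), not a verbatim copy of the file:

static void cpa_flush_array(unsigned long *start, int numpages, int cache,
			    int in_flags, struct page **pages)
{
	unsigned int i, level;

	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);

	/* One global TLB flush instead of the old wbinvd() broadcast. */
	flush_tlb_all();

	if (!cache)
		return;

	/*
	 * Caches are still made coherent, but per page with CLFLUSH
	 * (which is MESI-coherent, so issuing it on one CPU is enough)
	 * rather than by dumping the whole cache hierarchy on every CPU.
	 */
	for (i = 0; i < numpages; i++) {
		unsigned long addr;
		pte_t *pte;

		if (in_flags & CPA_PAGES_ARRAY)
			addr = (unsigned long)page_address(pages[i]);
		else
			addr = start[i];

		pte = lookup_address(addr, &level);

		/* Only flush present mappings. */
		if (pte && (pte_val(*pte) & _PAGE_PRESENT))
			clflush_cache_range((void *)addr, PAGE_SIZE);
	}
}

Cache coherency is still maintained, just with per-page CLFLUSH issued from a single CPU instead of a machine-wide WBINVD.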
On Mon, Jul 04, 2022 at 11:45:08PM +0800, Wen Yang wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> commit ddd07b750382adc2b78fdfbec47af8a6e0d8ef37 upstream.
>
> CAT has happened, WBINVD is bad (even before CAT, blowing away the entire cache on a multi-core platform wasn't nice), try not to use it ever.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> Cc: Bin Yang <bin.yang@intel.com>
> Cc: Mark Gross <mark.gross@intel.com>
> Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org
> Cc: stable@vger.kernel.org # 4.19.x
> Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
> ---
>  arch/x86/mm/pageattr.c | 18 ++----------------
>  1 file changed, 2 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 101f3ad0d6ad..ab87da7a6043 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -239,26 +239,12 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
>  			    int in_flags, struct page **pages)
>  {
>  	unsigned int i, level;
> -#ifdef CONFIG_PREEMPT
> -	/*
> -	 * Avoid wbinvd() because it causes latencies on all CPUs,
> -	 * regardless of any CPU isolation that may be in effect.
> -	 *
> -	 * This should be extended for CAT enabled systems independent of
> -	 * PREEMPT because wbinvd() does not respect the CAT partitions and
> -	 * this is exposed to unpriviledged users through the graphics
> -	 * subsystem.
> -	 */
> -	unsigned long do_wbinvd = 0;
> -#else
> -	unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
> -#endif
>
>  	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
>
> -	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
> +	flush_tlb_all();
>
> -	if (!cache || do_wbinvd)
> +	if (!cache)
>  		return;
>
>  	/*
> --
> 2.19.1.6.gb485710b
Why is this needed on 4.19.y? What problem does it solve? It looks only like an optimization, not a bugfix.
And if it's a bugfix, why only 4.19.y, why not older kernels too?
We need more information here please.
thanks,
greg k-h
On 2022/7/4 11:59 PM, Greg Kroah-Hartman wrote:
> On Mon, Jul 04, 2022 at 11:45:08PM +0800, Wen Yang wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> >
> > commit ddd07b750382adc2b78fdfbec47af8a6e0d8ef37 upstream.
> >
> > CAT has happened, WBINVD is bad (even before CAT, blowing away the entire cache on a multi-core platform wasn't nice), try not to use it ever.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> > Cc: Bin Yang <bin.yang@intel.com>
> > Cc: Mark Gross <mark.gross@intel.com>
> > Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org
> > Cc: stable@vger.kernel.org # 4.19.x
> > Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
> > ---
> >  arch/x86/mm/pageattr.c | 18 ++----------------
> >  1 file changed, 2 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index 101f3ad0d6ad..ab87da7a6043 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -239,26 +239,12 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
> >  			    int in_flags, struct page **pages)
> >  {
> >  	unsigned int i, level;
> > -#ifdef CONFIG_PREEMPT
> > -	/*
> > -	 * Avoid wbinvd() because it causes latencies on all CPUs,
> > -	 * regardless of any CPU isolation that may be in effect.
> > -	 *
> > -	 * This should be extended for CAT enabled systems independent of
> > -	 * PREEMPT because wbinvd() does not respect the CAT partitions and
> > -	 * this is exposed to unpriviledged users through the graphics
> > -	 * subsystem.
> > -	 */
> > -	unsigned long do_wbinvd = 0;
> > -#else
> > -	unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
> > -#endif
> >
> >  	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
> >
> > -	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
> > +	flush_tlb_all();
> >
> > -	if (!cache || do_wbinvd)
> > +	if (!cache)
> >  		return;
> >
> >  	/*
> > --
> > 2.19.1.6.gb485710b
>
> Why is this needed on 4.19.y? What problem does it solve? It looks only like an optimization, not a bugfix.
>
> And if it's a bugfix, why only 4.19.y, why not older kernels too?
>
> We need more information here please.
On a 128-core Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz server, when the user program frequently calls nv_alloc_system_pages to allocate large memory, it often causes a delay of about 200 milliseconds across the entire system. As a result, other latency-sensitive tasks on this system are heavily impacted, causing stability issues in large-scale clusters as well.
nv_alloc_system_pages -> _set_memory_array -> change_page_attr_set_clr -> cpa_flush_array -> on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
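
To show where the stall comes from, here is a paraphrased sketch (not a verbatim quote) of the relevant pre-patch code in arch/x86/mm/pageattr.c as of v4.19:

/* Paraphrased from v4.19 arch/x86/mm/pageattr.c -- sketch, not verbatim. */

/* Run on every CPU via on_each_cpu() below. */
static void __cpa_flush_all(void *arg)
{
	unsigned long cache = (unsigned long)arg;

	__flush_tlb_all();

	/* The expensive part: write back and invalidate the whole cache. */
	if (cache && boot_cpu_has(X86_FEATURE_CLFLUSH))
		wbinvd();
}

static void cpa_flush_array(unsigned long *start, int numpages, int cache,
			    int in_flags, struct page **pages)
{
	/* Without CONFIG_PREEMPT, wbinvd() kicks in above a 4M threshold. */
	unsigned long do_wbinvd = cache && numpages >= 1024;

	/*
	 * An IPI is sent to every online CPU; each one runs
	 * __cpa_flush_all() and, once the threshold is crossed, executes
	 * WBINVD.  On a 128-core machine this turns a single large
	 * allocation into a system-wide stall of hundreds of milliseconds.
	 */
	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);

	/* ... per-page clflush loop follows when wbinvd() is not used ... */
}

With WBINVD the cost scales with the core count and the size of each cache hierarchy rather than with the number of pages actually being changed, which is why one allocation can stall the whole machine.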
This patch can be directly merged into the 4.19 kernel to solve this problem, and most of the machines in our production environment run the 4.19 kernel.
We're also happy to apply it to the 4.14 and 4.9 kernels, and send the corresponding patches soon, although there are very few such servers in our production clusters.
--
Best wishes,
Wen
On 7/4/22 20:45, Wen Yang wrote:
> On a 128-core Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz server,
> when the user program frequently calls nv_alloc_system_pages to allocate large memory,
We seem to be repeating the same conversation we had back in November of last year:
https://lore.kernel.org/all/9c415df9-9575-8217-03e9-a6bbf20a491a@linux.aliba...
It's the binary nvidia driver doing unusual stuff again. Please talk to the folks you got that driver from.
On Tue, Jul 05, 2022 at 11:45:29AM +0800, Wen Yang wrote:
> On 2022/7/4 11:59 PM, Greg Kroah-Hartman wrote:
> > On Mon, Jul 04, 2022 at 11:45:08PM +0800, Wen Yang wrote:
> > > From: Peter Zijlstra <peterz@infradead.org>
> > >
> > > commit ddd07b750382adc2b78fdfbec47af8a6e0d8ef37 upstream.
> > >
> > > CAT has happened, WBINVD is bad (even before CAT, blowing away the entire cache on a multi-core platform wasn't nice), try not to use it ever.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> > > Cc: Bin Yang <bin.yang@intel.com>
> > > Cc: Mark Gross <mark.gross@intel.com>
> > > Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org
> > > Cc: stable@vger.kernel.org # 4.19.x
> > > Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
> > > ---
> > >  arch/x86/mm/pageattr.c | 18 ++----------------
> > >  1 file changed, 2 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > > index 101f3ad0d6ad..ab87da7a6043 100644
> > > --- a/arch/x86/mm/pageattr.c
> > > +++ b/arch/x86/mm/pageattr.c
> > > @@ -239,26 +239,12 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
> > >  			    int in_flags, struct page **pages)
> > >  {
> > >  	unsigned int i, level;
> > > -#ifdef CONFIG_PREEMPT
> > > -	/*
> > > -	 * Avoid wbinvd() because it causes latencies on all CPUs,
> > > -	 * regardless of any CPU isolation that may be in effect.
> > > -	 *
> > > -	 * This should be extended for CAT enabled systems independent of
> > > -	 * PREEMPT because wbinvd() does not respect the CAT partitions and
> > > -	 * this is exposed to unpriviledged users through the graphics
> > > -	 * subsystem.
> > > -	 */
> > > -	unsigned long do_wbinvd = 0;
> > > -#else
> > > -	unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
> > > -#endif
> > >
> > >  	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
> > >
> > > -	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
> > > +	flush_tlb_all();
> > >
> > > -	if (!cache || do_wbinvd)
> > > +	if (!cache)
> > >  		return;
> > >
> > >  	/*
> > > --
> > > 2.19.1.6.gb485710b
> >
> > Why is this needed on 4.19.y? What problem does it solve? It looks only like an optimization, not a bugfix.
> >
> > And if it's a bugfix, why only 4.19.y, why not older kernels too?
> >
> > We need more information here please.
>
> On a 128-core Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz server, when the user program frequently calls nv_alloc_system_pages to allocate large memory, it often causes a delay of about 200 milliseconds across the entire system. As a result, other latency-sensitive tasks on this system are heavily impacted, causing stability issues in large-scale clusters as well.
>
> nv_alloc_system_pages -> _set_memory_array -> change_page_attr_set_clr -> cpa_flush_array -> on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
>
> This patch can be directly merged into the 4.19 kernel to solve this problem, and most of the machines in our production environment run the 4.19 kernel.
Ah. So what has changed from last year when I rejected this then: https://lore.kernel.org/all/9c415df9-9575-8217-03e9-a6bbf20a491a@linux.aliba...
Please do not try to submit previously-rejected patches, that is very disingenuous.
greg k-h
linux-stable-mirror@lists.linaro.org