Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- drivers/gpu/drm/i915/i915_memcpy.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/i915/i915_memcpy.c index fdd550405fd3..7c0e022586bc 100644 --- a/drivers/gpu/drm/i915/i915_memcpy.c +++ b/drivers/gpu/drm/i915/i915_memcpy.c @@ -24,6 +24,7 @@
#include <linux/kernel.h> #include <asm/fpu/api.h> +#include <asm/simd.h>
#include "i915_memcpy.h"
@@ -38,6 +39,12 @@ static DEFINE_STATIC_KEY_FALSE(has_movntdqa); #ifdef CONFIG_AS_MOVNTDQA static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len) { + if (unlikely(!may_use_simd())) { + memcpy(dst, src, len); + return; + } + + kernel_fpu_begin();
while (len >= 4) { @@ -67,6 +74,11 @@ static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len) { + if (unlikely(!may_use_simd())) { + memcpy(dst, src, len); + return; + } + kernel_fpu_begin();
while (len >= 4) {
On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
That would indicate that these functions are used from IRQ/softirq which break otherwise if the kernel is also using the registers. The crypto code uses it for that purpose.
So Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de
May I ask how large the memcpy can be? I'm asking in case it is large and an explicit rescheduling point might be needed.
Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
Sebastian
From: Sebastian Andrzej Siewior
Sent: 01 May 2020 11:42 On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
That would indicate that these functions are used from IRQ/softirq which break otherwise if the kernel is also using the registers. The crypto code uses it for that purpose.
So Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de
May I ask how large the memcpy can be? I'm asking in case it is large and an explicit rescheduling point might be needed.
It is also quite likely that a 'rep movs' copy will be at least just as fast on modern hardware.
Clearly if you are copying to/from PCIe memory you need the largest resisters possible - but I think the graphics buffers are mapped cached? (Otherwise I wouldn't see 3ms 'spins' while it invalidates the entire screen buffer cache.)
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, May 1, 2020 at 4:42 AM Sebastian Andrzej Siewior bigeasy@linutronix.de wrote:
Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de
Thanks.
May I ask how large the memcpy can be? I'm asking in case it is large and an explicit rescheduling point might be needed.
Yea I was worried about that too. I'm not an i915 developer, but so far as I can tell:
- The path from intel_engine_cmd_parser is <= 256 KiB for "known users", so that's rather large. - The path from perf_memcpy is either 4k, 64k, or 4M, depending on the type of object, so that seems gigantic, but I think that might be selftest code. - The path from compress_page appears to be PAGE_SIZE, so 4k, which meshes with the limits we set agreed on few weeks ago for the crypto stuff. - The path from guc_read_update_log_buffer appears to be 8k, 32k, 2M, or 8M, depending on the type of object, so that seems absurdly huge and doesn't appear to be selftest code either like the other case.
I have no doubt the i915 developers will jump in here waving their arms, but either way, it sure seems to me like you might have a point.
On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
Err, why does i915 implements its own uncached memcpy instead of relying on core functionality to start with?
On Fri, May 1, 2020 at 12:07 PM Christoph Hellwig hch@infradead.org wrote:
On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
Err, why does i915 implements its own uncached memcpy instead of relying on core functionality to start with?
I was wondering the same. It sure does seem like this ought to be more generalized functionality, with a name that represents the type of transfer it's optimized for (wc or similar).
Quoting Christoph Hellwig (2020-05-01 19:07:31)
On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
Err, why does i915 implements its own uncached memcpy instead of relying on core functionality to start with?
What is this core functionality that provides movntqda? -Chris
On Sun, May 03, 2020 at 09:20:19PM +0100, Chris Wilson wrote:
Err, why does i915 implements its own uncached memcpy instead of relying on core functionality to start with?
What is this core functionality that provides movntqda?
A sensible name might be memcpy_uncached or mempcy_nontemporal. But the important point is that this should be arch code with a common fallback rather than hacking it up in drivers.
From: Christoph Hellwig
Sent: 04 May 2020 17:03
On Sun, May 03, 2020 at 09:20:19PM +0100, Chris Wilson wrote:
Err, why does i915 implements its own uncached memcpy instead of relying on core functionality to start with?
What is this core functionality that provides movntqda?
A sensible name might be memcpy_uncached or mempcy_nontemporal. But the important point is that this should be arch code with a common fallback rather than hacking it up in drivers.
More the point, you are trying to do a copy where: 1) The kernel isn't expected to read the data - so can bypass the cache. and maybe: 2) The data needs flushing from the cache to actual memory. and maybe: 3) The cache lines need invalidating.
The fallbacks depend on the required behaviour.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Quoting Jason A. Donenfeld (2020-04-30 23:10:16)
Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
Cc: stable@vger.kernel.org
The same argument as on the previous submission is that we return to the caller if we can't use movntqda as their fallback path should be faster than uncached memcpy. -Chris
On Sun, May 3, 2020 at 2:30 PM Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Jason A. Donenfeld (2020-04-30 23:10:16)
Sometimes it's not okay to use SIMD registers, the conditions for which have changed subtly from kernel release to kernel release. Usually the pattern is to check for may_use_simd() and then fallback to using something slower in the unlikely case SIMD registers aren't available. So, this patch fixes up i915's accelerated memcpy routines to fallback to boring memcpy if may_use_simd() is false.
Cc: stable@vger.kernel.org
The same argument as on the previous submission is that we return to the caller if we can't use movntqda as their fallback path should be faster than uncached memcpy.
Oh, THAT's what you meant before. Okay, will follow up.
linux-stable-mirror@lists.linaro.org