Dear Greg,
Commit ef86f3a7 (genirq/affinity: assign vectors to all possible CPUs), added
in Linux 4.14.56, causes the aacraid module to no longer detect the attached
devices on a Dell PowerEdge R720 with two six-core 24x E5-2630 @ 2.30GHz.
```
$ dmesg | grep raid
[ 0.269768] raid6: sse2x1 gen() 7179 MB/s
[ 0.290069] raid6: sse2x1 xor() 5636 MB/s
[ 0.311068] raid6: sse2x2 gen() 9160 MB/s
[ 0.332076] raid6: sse2x2 xor() 6375 MB/s
[ 0.353075] raid6: sse2x4 gen() 11164 MB/s
[ 0.374064] raid6: sse2x4 xor() 7429 MB/s
[ 0.379001] raid6: using algorithm sse2x4 gen() 11164 MB/s
[ 0.386001] raid6: .... xor() 7429 MB/s, rmw enabled
[ 0.391008] raid6: using ssse3x2 recovery algorithm
[ 3.559682] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[ 3.570061] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
[ 10.725767] Adaptec aacraid driver 1.2.1[50834]-custom
[ 10.731724] aacraid 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 10.743295] aacraid: Comm Interface type3 enabled
$ lspci -nn | grep Adaptec
04:00.0 Serial Attached SCSI controller [0107]: Adaptec Series 8 12G SAS/PCIe 3 [9005:028d] (rev 01)
42:00.0 Serial Attached SCSI controller [0107]: Adaptec Smart Storage PQI 12G SAS/PCIe 3 [9005:028f] (rev 01)
```
However, it still works on a Dell PowerEdge R715 with two eight-core AMD
Opteron 6136 and the card below.
```
$ lspci -nn | grep Adaptec
22:00.0 Serial Attached SCSI controller [0107]: Adaptec Series 8 12G SAS/PCIe 3 [9005:028d] (rev 01)
```
Reverting the commit fixes the issue.
commit ef86f3a72adb8a7931f67335560740a7ad696d1d
Author: Christoph Hellwig <hch(a)lst.de>
Date: Fri Jan 12 10:53:05 2018 +0800
genirq/affinity: assign vectors to all possible CPUs
commit 84676c1f21e8ff54befe985f4f14dc1edc10046b upstream.
Currently we assign managed interrupt vectors to all present CPUs. This
works fine for systems where we only online/offline CPUs. But in case of
systems that support physical CPU hotplug (or the virtualized version of
it) this means the additional CPUs covered for in the ACPI tables or on
the command line are not catered for. To fix this we'd either need to
introduce new hotplug CPU states just for this case, or we can start
assigning vectors to possible but not present CPUs.
Reported-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Tested-by: Stefan Haberland <sth(a)linux.vnet.ibm.com>
Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
Cc: linux-kernel(a)vger.kernel.org
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
The problem doesn’t happen with Linux 4.17.11, so there are commits in
Linux master fixing this. Unfortunately, my attempts to find them failed.
I was able to cherry-pick the three commits below on top of 4.14.62,
but the problem persists.
6aba81b5a2f5 genirq/affinity: Don't return with empty affinity masks on error
355d7ecdea35 scsi: hpsa: fix selection of reply queue
e944e9615741 scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity
Trying to cherry-pick the commits below, which reference the commit
in question, resulted in conflicts.
1. adbe552349f2 scsi: megaraid_sas: fix selection of reply queue
2. d3056812e7df genirq/affinity: Spread irq vectors among present CPUs as far as possible
To avoid further trial and error on a server with slow firmware,
do you know which commits should fix the issue?
Kind regards,
Paul
PS: I couldn’t find out who suggested this commit for stable, that is,
how it was picked to be added to stable. Is there an easy way to find
that out?
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0722069a5374b904ec1a67f91249f90e1cfae259 Mon Sep 17 00:00:00 2001
From: Andreas Ziegler <andreas.ziegler(a)fau.de>
Date: Wed, 16 Jan 2019 15:16:29 +0100
Subject: [PATCH] tracing/uprobes: Fix output for multiple string arguments
When printing multiple uprobe arguments as strings the output for the
earlier arguments would also include all later string arguments.
This is best explained in an example:
Consider adding a uprobe to a function receiving two strings as
parameters which is at offset 0xa0 in strlib.so and we want to print
both parameters when the uprobe is hit (on x86_64):
$ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
/sys/kernel/debug/tracing/uprobe_events
When the function is called as func("foo", "bar") and we hit the probe,
the trace file shows a line like the following:
[...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"
Note the extra "bar" printed as part of arg1. This behaviour stacks up
for additional string arguments.
The strings are stored in a dynamically growing part of the uprobe
buffer by fetch_store_string() after copying them from userspace via
strncpy_from_user(). The return value of strncpy_from_user() is then
directly used as the required size for the string. However, this does
not take the terminating null byte into account as the documentation
for strncpy_from_user() clearly states that it "[...] returns the
length of the string (not including the trailing NUL)" even though the
null byte will be copied to the destination.
Therefore, subsequent calls to fetch_store_string() will overwrite
the terminating null byte of the most recently fetched string with
the first character of the current string, leading to the
"accumulation" of strings in earlier arguments in the output.
Fix this by incrementing the return value of strncpy_from_user() by
one if we did not hit the maximum buffer size.
Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes")
Acked-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler(a)fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 19a1a8e19062..9bde07c06362 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -160,6 +160,13 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
 	if (ret >= 0) {
 		if (ret == maxlen)
 			dst[ret - 1] = '\0';
+		else
+			/*
+			 * Include the terminating null byte. In this case it
+			 * was copied by strncpy_from_user but not accounted
+			 * for in ret.
+			 */
+			ret++;
 		*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
 	}
Since moving the bannable boolean into the context flags, we lost the
default setting of contexts being bannable. Oops.
Sadly, because we have a multi-level banning scheme, our testcase for being
banned cannot distinguish between the expected ban on the context and
the ban applied via the fd.
Fixes: 6095868a271d ("drm/i915: Complete kerneldoc for struct i915_gem_context")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v4.11+
---
drivers/gpu/drm/i915/i915_gem_context.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 280813a4bf82..102866967998 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -364,6 +364,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	list_add_tail(&ctx->link, &dev_priv->contexts.list);
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
+	ctx->user_flags = BIT(UCONTEXT_BANNABLE);
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
--
2.20.1