Commit 4c21b8fd8f14 (MIPS: seccomp: Handle indirect system calls (o32))
added indirect syscall detection for O32 processes running on MIPS64,
but it did not work correctly for big endian kernel/processes. The
reason is that the syscall number is loaded from ARG1 using the lw
instruction while this is a 64-bit value, so zero is loaded instead of
the syscall number.
Fix the code by using the ld instruction instead. When running a 32-bit
processes on a 64 bit CPU, the values are properly sign-extended, so it
ensures the value passed to syscall_trace_enter is correct.
Recent systemd versions with seccomp enabled whitelist the getpid
syscall for their internal processes (e.g. systemd-journald), but call
it through syscall(SYS_getpid). This fix therefore allows O32 big endian
systems with a 64-bit kernel to run recent systemd versions.
Signed-off-by: Aurelien Jarno <aurelien(a)aurel32.net>
Cc: <stable(a)vger.kernel.org> # v3.15+
---
arch/mips/kernel/scall64-o32.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index f158c5894a9a..feb2653490df 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -125,7 +125,7 @@ trace_a_syscall:
subu t1, v0, __NR_O32_Linux
move a1, v0
bnez t1, 1f /* __NR_syscall at offset 0 */
- lw a1, PT_R4(sp) /* Arg1 for __NR_syscall case */
+ ld a1, PT_R4(sp) /* Arg1 for __NR_syscall case */
.set pop
1: jal syscall_trace_enter
--
2.20.1
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: 8b298d3a0bd5 - Linux 5.0.7
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out a ref:
Repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Ref: 8b298d3a0bd5 - Linux 5.0.7
We then merged the patchset with `git am`:
drm-i915-gvt-do-not-let-pin-count-of-shadow-mm-go-ne.patch
kbuild-pkg-use-f-srctree-makefile-to-recurse-to-top-.patch
netfilter-nft_compat-use-.release_ops-and-remove-lis.patch
netfilter-nf_tables-use-after-free-in-dynamic-operat.patch
netfilter-nf_tables-add-missing-release_ops-in-error.patch
hv_netvsc-fix-unwanted-wakeup-after-tx_disable.patch
ibmvnic-fix-completion-structure-initialization.patch
ip6_tunnel-match-to-arphrd_tunnel6-for-dev-type.patch
ipv6-fix-dangling-pointer-when-ipv6-fragment.patch
ipv6-sit-reset-ip-header-pointer-in-ipip6_rcv.patch
kcm-switch-order-of-device-registration-to-fix-a-cra.patch
net-ethtool-not-call-vzalloc-for-zero-sized-memory-r.patch
net-gro-fix-gro-flush-when-receiving-a-gso-packet.patch
net-mlx5-decrease-default-mr-cache-size.patch
netns-provide-pure-entropy-for-net_hash_mix.patch
net-rds-force-to-destroy-connection-if-t_sock-is-nul.patch
net-sched-act_sample-fix-divide-by-zero-in-the-traff.patch
net-sched-fix-get-helper-of-the-matchall-cls.patch
openvswitch-fix-flow-actions-reallocation.patch
qmi_wwan-add-olicard-600.patch
r8169-disable-aspm-again.patch
sctp-initialize-_pad-of-sockaddr_in-before-copying-t.patch
tcp-ensure-dctcp-reacts-to-losses.patch
tcp-fix-a-potential-null-pointer-dereference-in-tcp_.patch
vrf-check-accept_source_route-on-the-original-netdev.patch
net-mlx5e-fix-error-handling-when-refreshing-tirs.patch
net-mlx5e-add-a-lock-on-tir-list.patch
nfp-validate-the-return-code-from-dev_queue_xmit.patch
nfp-disable-netpoll-on-representors.patch
bnxt_en-improve-rx-consumer-index-validity-check.patch
bnxt_en-reset-device-on-rx-buffer-errors.patch
net-ip_gre-fix-possible-use-after-free-in-erspan_rcv.patch
net-ip6_gre-fix-possible-use-after-free-in-ip6erspan.patch
net-bridge-always-clear-mcast-matching-struct-on-rep.patch
net-thunderx-fix-null-pointer-dereference-in-nicvf_o.patch
net-vrf-fix-ping-failed-when-vrf-mtu-is-set-to-0.patch
net-core-netif_receive_skb_list-unlist-skb-before-pa.patch
r8169-disable-default-rx-interrupt-coalescing-on-rtl.patch
net-mlx5-add-a-missing-check-on-idr_find-free-buf.patch
net-mlx5e-update-xoff-formula.patch
net-mlx5e-update-xon-formula.patch
kbuild-clang-choose-gcc_toolchain_dir-not-on-ld.patch
lib-string.c-implement-a-basic-bcmp.patch
revert-clk-meson-clean-up-clock-registration.patch
tty-mark-siemens-r3964-line-discipline-as-broken.patch
tty-ldisc-add-sysctl-to-prevent-autoloading-of-ldiscs.patch
hwmon-w83773g-select-regmap_i2c-to-fix-build-error.patch
hwmon-occ-fix-power-sensor-indexing.patch
smb3-allow-persistent-handle-timeout-to-be-configurable-on-mount.patch
hid-logitech-handle-0-scroll-events-for-the-m560.patch
acpica-clear-status-of-gpes-before-enabling-them.patch
acpica-namespace-remove-address-node-from-global-list-after-method-termination.patch
alsa-seq-fix-oob-reads-from-strlcpy.patch
alsa-hda-realtek-enable-headset-mic-of-acer-travelmate-b114-21-with-alc233.patch
alsa-hda-realtek-add-quirk-for-tuxedo-xc-1509.patch
alsa-xen-front-do-not-use-stream-buffer-size-before-it-is-set.patch
alsa-hda-add-two-more-machines-to-the-power_save_blacklist.patch
mm-huge_memory.c-fix-modifying-of-page-protection-by-insert_pfn_pmd.patch
arm64-dts-rockchip-fix-rk3328-sdmmc0-write-errors.patch
mmc-alcor-don-t-write-data-before-command-has-completed.patch
mmc-sdhci-omap-don-t-finish_mrq-on-a-command-error-during-tuning.patch
parisc-detect-qemu-earlier-in-boot-process.patch
parisc-regs_return_value-should-return-gpr28.patch
parisc-also-set-iaoq_b-in-instruction_pointer_set.patch
alarmtimer-return-correct-remaining-time.patch
drm-i915-gvt-do-not-deliver-a-workload-if-its-creation-fails.patch
drm-sun4i-dw-hdmi-lower-max.-supported-rate-for-h6.patch
drm-udl-add-a-release-method-and-delay-modeset-teardown.patch
kvm-svm-fix-potential-get_num_contig_pages-overflow.patch
include-linux-bitrev.h-fix-constant-bitrev.patch
mm-writeback-use-exact-memcg-dirty-counts.patch
asoc-intel-fix-crash-at-suspend-resume-after-failed-codec-registration.patch
asoc-fsl_esai-fix-channel-swap-issue-when-stream-starts.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
kernel build: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
ppc64le:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
kernel build: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
x86_64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
kernel build: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
✅ tuned: tune-processes-through-perf [8]
✅ Usex - version 1.9-29 [9]
🚧 ✅ stress: stress-ng [10]
ppc64le:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [11]
✅ tuned: tune-processes-through-perf [8]
✅ Usex - version 1.9-29 [9]
🚧 ✅ stress: stress-ng [10]
x86_64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [11]
✅ tuned: tune-processes-through-perf [8]
✅ Usex - version 1.9-29 [9]
🚧 ✅ stress: stress-ng [10]
Test source:
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/us…
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
[11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
Waived tests (marked with 🚧)
-----------------------------
This test run included waived tests. Such tests are executed but their results
are not taken into account. Tests are waived when their results are not
reliable enough, e.g. when they're just introduced or are being fixed.
From: Breno Leitao <leitao(a)debian.org>
commit 897bc3df8c5aebb54c32d831f917592e873d0559 upstream.
Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
moved a code block around and this block uses a 'msr' variable outside of
the CONFIG_PPC_TRANSACTIONAL_MEM, however the 'msr' variable is declared
inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when
CONFIG_PPC_TRANSACTION_MEM is not defined.
error: 'msr' undeclared (first use in this function)
This is not causing a compilation error in the mainline kernel, because
'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as
the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set:
#define MSR_TM_ACTIVE(x) 0
This patch just fixes this issue avoiding the 'msr' variable usage outside
the CONFIG_PPC_TRANSACTIONAL_MEM block, avoiding trusting in the
MSR_TM_ACTIVE() definition.
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Biedl <linux-kernel.bfrz(a)manchmal.in-ulm.de>
Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Michael Neuling <mikey(a)neuling.org>
---
Greg: I think the original patch got rejected with a conflict.
This correctly applies to v4.19.34.
---
arch/powerpc/kernel/signal_64.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index bbd1c73243..14b0f5b6a3 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -755,12 +755,25 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
&uc_transact->uc_mcontext))
goto badframe;
- }
- else
- /* Fall through, for non-TM restore */
+ } else
#endif
- if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
- goto badframe;
+ {
+ /*
+ * Fall through, for non-TM restore
+ *
+ * Unset MSR[TS] on the thread regs since MSR from user
+ * context does not have MSR active, and recheckpoint was
+ * not called since restore_tm_sigcontexts() was not called
+ * also.
+ *
+ * If not unsetting it, the code can RFID to userspace with
+ * MSR[TS] set, but without CPU in the proper state,
+ * causing a TM bad thing.
+ */
+ current->thread.regs->msr &= ~MSR_TS_MASK;
+ if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
+ goto badframe;
+ }
if (restore_altstack(&uc->uc_stack))
goto badframe;
--
2.20.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 Mon Sep 17 00:00:00 2001
From: Alexander Potapenko <glider(a)google.com>
Date: Tue, 2 Apr 2019 13:28:13 +0200
Subject: [PATCH] x86/asm: Use stricter assembly constraints in bitops
There's a number of problems with how arch/x86/include/asm/bitops.h
is currently using assembly constraints for the memory region
bitops are modifying:
1) Use memory clobber in bitops that touch arbitrary memory
Certain bit operations that read/write bits take a base pointer and an
arbitrarily large offset to address the bit relative to that base.
Inline assembly constraints aren't expressive enough to tell the
compiler that the assembly directive is going to touch a specific memory
location of unknown size, therefore we have to use the "memory" clobber
to indicate that the assembly is going to access memory locations other
than those listed in the inputs/outputs.
To indicate that BTR/BTS instructions don't necessarily touch the first
sizeof(long) bytes of the argument, we also move the address to assembly
inputs.
This particular change leads to size increase of 124 kernel functions in
a defconfig build. For some of them the diff is in NOP operations, other
end up re-reading values from memory and may potentially slow down the
execution. But without these clobbers the compiler is free to cache
the contents of the bitmaps and use them as if they weren't changed by
the inline assembly.
2) Use byte-sized arguments for operations touching single bytes.
Passing a long value to ANDB/ORB/XORB instructions makes the compiler
treat sizeof(long) bytes as being clobbered, which isn't the case. This
may theoretically lead to worse code in the case of heavy optimization.
Practical impact:
I've built a defconfig kernel and looked through some of the functions
generated by GCC 7.3.0 with and without this clobber, and didn't spot
any miscompilations.
However there is a (trivial) theoretical case where this code leads to
miscompilation:
https://lkml.org/lkml/2019/3/28/393
using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes
such a function in the kernel someday.
So the primary motivation is to fix an existing misuse of the asm
directive, which happens to work in certain configurations now, but
isn't guaranteed to work under different circumstances.
[ --mingo: Added -stable tag because defconfig only builds a fraction
of the kernel and the trivial testcase looks normal enough to
be used in existing or in-development code. ]
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: James Y Knight <jyknight(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Paul E. McKenney <paulmck(a)linux.ibm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
[ Edited the changelog, tidied up one of the defines. ]
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index d153d570bb04..8e790ec219a5 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -36,16 +36,17 @@
* bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
*/
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define RLONG_ADDR(x) "m" (*(volatile long *) (x))
+#define WBYTE_ADDR(x) "+m" (*(volatile char *) (x))
-#define ADDR BITOP_ADDR(addr)
+#define ADDR RLONG_ADDR(addr)
/*
* We do the locked ops that don't return the old value as
* a mask operation on a byte.
*/
#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))
-#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3))
+#define CONST_MASK_ADDR(nr, addr) WBYTE_ADDR((void *)(addr) + ((nr)>>3))
#define CONST_MASK(nr) (1 << ((nr) & 7))
/**
@@ -73,7 +74,7 @@ set_bit(long nr, volatile unsigned long *addr)
: "memory");
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
- : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -88,7 +89,7 @@ set_bit(long nr, volatile unsigned long *addr)
*/
static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory");
+ asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
/**
@@ -110,8 +111,7 @@ clear_bit(long nr, volatile unsigned long *addr)
: "iq" ((u8)~CONST_MASK(nr)));
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
- : BITOP_ADDR(addr)
- : "Ir" (nr));
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -131,7 +131,7 @@ static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *ad
static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr));
+ asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
@@ -139,7 +139,7 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
bool negative;
asm volatile(LOCK_PREFIX "andb %2,%1"
CC_SET(s)
- : CC_OUT(s) (negative), ADDR
+ : CC_OUT(s) (negative), WBYTE_ADDR(addr)
: "ir" ((char) ~(1 << nr)) : "memory");
return negative;
}
@@ -155,13 +155,9 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
* __clear_bit() is non-atomic and implies release semantics before the memory
* operation. It can be used for an unlock if no other CPUs can concurrently
* modify other bits in the word.
- *
- * No memory barrier is required here, because x86 cannot reorder stores past
- * older loads. Same principle as spin_unlock.
*/
static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
{
- barrier();
__clear_bit(nr, addr);
}
@@ -176,7 +172,7 @@ static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *
*/
static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr));
+ asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
/**
@@ -196,8 +192,7 @@ static __always_inline void change_bit(long nr, volatile unsigned long *addr)
: "iq" ((u8)CONST_MASK(nr)));
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0"
- : BITOP_ADDR(addr)
- : "Ir" (nr));
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -242,8 +237,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
asm(__ASM_SIZE(bts) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr));
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -282,8 +277,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
asm volatile(__ASM_SIZE(btr) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr));
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -294,8 +289,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
asm volatile(__ASM_SIZE(btc) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr) : "memory");
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -326,7 +321,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
asm volatile(__ASM_SIZE(bt) " %2,%1"
CC_SET(c)
: CC_OUT(c) (oldbit)
- : "m" (*(unsigned long *)addr), "Ir" (nr));
+ : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
return oldbit;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 Mon Sep 17 00:00:00 2001
From: Alexander Potapenko <glider(a)google.com>
Date: Tue, 2 Apr 2019 13:28:13 +0200
Subject: [PATCH] x86/asm: Use stricter assembly constraints in bitops
There's a number of problems with how arch/x86/include/asm/bitops.h
is currently using assembly constraints for the memory region
bitops are modifying:
1) Use memory clobber in bitops that touch arbitrary memory
Certain bit operations that read/write bits take a base pointer and an
arbitrarily large offset to address the bit relative to that base.
Inline assembly constraints aren't expressive enough to tell the
compiler that the assembly directive is going to touch a specific memory
location of unknown size, therefore we have to use the "memory" clobber
to indicate that the assembly is going to access memory locations other
than those listed in the inputs/outputs.
To indicate that BTR/BTS instructions don't necessarily touch the first
sizeof(long) bytes of the argument, we also move the address to assembly
inputs.
This particular change leads to size increase of 124 kernel functions in
a defconfig build. For some of them the diff is in NOP operations, other
end up re-reading values from memory and may potentially slow down the
execution. But without these clobbers the compiler is free to cache
the contents of the bitmaps and use them as if they weren't changed by
the inline assembly.
2) Use byte-sized arguments for operations touching single bytes.
Passing a long value to ANDB/ORB/XORB instructions makes the compiler
treat sizeof(long) bytes as being clobbered, which isn't the case. This
may theoretically lead to worse code in the case of heavy optimization.
Practical impact:
I've built a defconfig kernel and looked through some of the functions
generated by GCC 7.3.0 with and without this clobber, and didn't spot
any miscompilations.
However there is a (trivial) theoretical case where this code leads to
miscompilation:
https://lkml.org/lkml/2019/3/28/393
using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes
such a function in the kernel someday.
So the primary motivation is to fix an existing misuse of the asm
directive, which happens to work in certain configurations now, but
isn't guaranteed to work under different circumstances.
[ --mingo: Added -stable tag because defconfig only builds a fraction
of the kernel and the trivial testcase looks normal enough to
be used in existing or in-development code. ]
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: James Y Knight <jyknight(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Paul E. McKenney <paulmck(a)linux.ibm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
[ Edited the changelog, tidied up one of the defines. ]
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index d153d570bb04..8e790ec219a5 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -36,16 +36,17 @@
* bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
*/
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define RLONG_ADDR(x) "m" (*(volatile long *) (x))
+#define WBYTE_ADDR(x) "+m" (*(volatile char *) (x))
-#define ADDR BITOP_ADDR(addr)
+#define ADDR RLONG_ADDR(addr)
/*
* We do the locked ops that don't return the old value as
* a mask operation on a byte.
*/
#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))
-#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3))
+#define CONST_MASK_ADDR(nr, addr) WBYTE_ADDR((void *)(addr) + ((nr)>>3))
#define CONST_MASK(nr) (1 << ((nr) & 7))
/**
@@ -73,7 +74,7 @@ set_bit(long nr, volatile unsigned long *addr)
: "memory");
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
- : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -88,7 +89,7 @@ set_bit(long nr, volatile unsigned long *addr)
*/
static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory");
+ asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
/**
@@ -110,8 +111,7 @@ clear_bit(long nr, volatile unsigned long *addr)
: "iq" ((u8)~CONST_MASK(nr)));
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
- : BITOP_ADDR(addr)
- : "Ir" (nr));
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -131,7 +131,7 @@ static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *ad
static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr));
+ asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
@@ -139,7 +139,7 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
bool negative;
asm volatile(LOCK_PREFIX "andb %2,%1"
CC_SET(s)
- : CC_OUT(s) (negative), ADDR
+ : CC_OUT(s) (negative), WBYTE_ADDR(addr)
: "ir" ((char) ~(1 << nr)) : "memory");
return negative;
}
@@ -155,13 +155,9 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
* __clear_bit() is non-atomic and implies release semantics before the memory
* operation. It can be used for an unlock if no other CPUs can concurrently
* modify other bits in the word.
- *
- * No memory barrier is required here, because x86 cannot reorder stores past
- * older loads. Same principle as spin_unlock.
*/
static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
{
- barrier();
__clear_bit(nr, addr);
}
@@ -176,7 +172,7 @@ static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *
*/
static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr));
+ asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
/**
@@ -196,8 +192,7 @@ static __always_inline void change_bit(long nr, volatile unsigned long *addr)
: "iq" ((u8)CONST_MASK(nr)));
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0"
- : BITOP_ADDR(addr)
- : "Ir" (nr));
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -242,8 +237,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
asm(__ASM_SIZE(bts) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr));
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -282,8 +277,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
asm volatile(__ASM_SIZE(btr) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr));
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -294,8 +289,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
asm volatile(__ASM_SIZE(btc) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr) : "memory");
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -326,7 +321,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
asm volatile(__ASM_SIZE(bt) " %2,%1"
CC_SET(c)
: CC_OUT(c) (oldbit)
- : "m" (*(unsigned long *)addr), "Ir" (nr));
+ : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
return oldbit;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 Mon Sep 17 00:00:00 2001
From: Alexander Potapenko <glider(a)google.com>
Date: Tue, 2 Apr 2019 13:28:13 +0200
Subject: [PATCH] x86/asm: Use stricter assembly constraints in bitops
There's a number of problems with how arch/x86/include/asm/bitops.h
is currently using assembly constraints for the memory region
bitops are modifying:
1) Use memory clobber in bitops that touch arbitrary memory
Certain bit operations that read/write bits take a base pointer and an
arbitrarily large offset to address the bit relative to that base.
Inline assembly constraints aren't expressive enough to tell the
compiler that the assembly directive is going to touch a specific memory
location of unknown size, therefore we have to use the "memory" clobber
to indicate that the assembly is going to access memory locations other
than those listed in the inputs/outputs.
To indicate that BTR/BTS instructions don't necessarily touch the first
sizeof(long) bytes of the argument, we also move the address to assembly
inputs.
This particular change leads to size increase of 124 kernel functions in
a defconfig build. For some of them the diff is in NOP operations, other
end up re-reading values from memory and may potentially slow down the
execution. But without these clobbers the compiler is free to cache
the contents of the bitmaps and use them as if they weren't changed by
the inline assembly.
2) Use byte-sized arguments for operations touching single bytes.
Passing a long value to ANDB/ORB/XORB instructions makes the compiler
treat sizeof(long) bytes as being clobbered, which isn't the case. This
may theoretically lead to worse code in the case of heavy optimization.
Practical impact:
I've built a defconfig kernel and looked through some of the functions
generated by GCC 7.3.0 with and without this clobber, and didn't spot
any miscompilations.
However there is a (trivial) theoretical case where this code leads to
miscompilation:
https://lkml.org/lkml/2019/3/28/393
using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes
such a function in the kernel someday.
So the primary motivation is to fix an existing misuse of the asm
directive, which happens to work in certain configurations now, but
isn't guaranteed to work under different circumstances.
[ --mingo: Added -stable tag because defconfig only builds a fraction
of the kernel and the trivial testcase looks normal enough to
be used in existing or in-development code. ]
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: James Y Knight <jyknight(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Paul E. McKenney <paulmck(a)linux.ibm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
[ Edited the changelog, tidied up one of the defines. ]
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index d153d570bb04..8e790ec219a5 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -36,16 +36,17 @@
* bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
*/
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define RLONG_ADDR(x) "m" (*(volatile long *) (x))
+#define WBYTE_ADDR(x) "+m" (*(volatile char *) (x))
-#define ADDR BITOP_ADDR(addr)
+#define ADDR RLONG_ADDR(addr)
/*
* We do the locked ops that don't return the old value as
* a mask operation on a byte.
*/
#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))
-#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3))
+#define CONST_MASK_ADDR(nr, addr) WBYTE_ADDR((void *)(addr) + ((nr)>>3))
#define CONST_MASK(nr) (1 << ((nr) & 7))
/**
@@ -73,7 +74,7 @@ set_bit(long nr, volatile unsigned long *addr)
: "memory");
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
- : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -88,7 +89,7 @@ set_bit(long nr, volatile unsigned long *addr)
*/
static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory");
+ asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
/**
@@ -110,8 +111,7 @@ clear_bit(long nr, volatile unsigned long *addr)
: "iq" ((u8)~CONST_MASK(nr)));
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
- : BITOP_ADDR(addr)
- : "Ir" (nr));
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -131,7 +131,7 @@ static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *ad
static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr));
+ asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
@@ -139,7 +139,7 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
bool negative;
asm volatile(LOCK_PREFIX "andb %2,%1"
CC_SET(s)
- : CC_OUT(s) (negative), ADDR
+ : CC_OUT(s) (negative), WBYTE_ADDR(addr)
: "ir" ((char) ~(1 << nr)) : "memory");
return negative;
}
@@ -155,13 +155,9 @@ static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile
* __clear_bit() is non-atomic and implies release semantics before the memory
* operation. It can be used for an unlock if no other CPUs can concurrently
* modify other bits in the word.
- *
- * No memory barrier is required here, because x86 cannot reorder stores past
- * older loads. Same principle as spin_unlock.
*/
static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
{
- barrier();
__clear_bit(nr, addr);
}
@@ -176,7 +172,7 @@ static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *
*/
static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
{
- asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr));
+ asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
/**
@@ -196,8 +192,7 @@ static __always_inline void change_bit(long nr, volatile unsigned long *addr)
: "iq" ((u8)CONST_MASK(nr)));
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0"
- : BITOP_ADDR(addr)
- : "Ir" (nr));
+ : : RLONG_ADDR(addr), "Ir" (nr) : "memory");
}
}
@@ -242,8 +237,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
asm(__ASM_SIZE(bts) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr));
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -282,8 +277,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
asm volatile(__ASM_SIZE(btr) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr));
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -294,8 +289,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
asm volatile(__ASM_SIZE(btc) " %2,%1"
CC_SET(c)
- : CC_OUT(c) (oldbit), ADDR
- : "Ir" (nr) : "memory");
+ : CC_OUT(c) (oldbit)
+ : ADDR, "Ir" (nr) : "memory");
return oldbit;
}
@@ -326,7 +321,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
asm volatile(__ASM_SIZE(bt) " %2,%1"
CC_SET(c)
: CC_OUT(c) (oldbit)
- : "m" (*(unsigned long *)addr), "Ir" (nr));
+ : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
return oldbit;
}
On Mon, 15 Apr 2019, Sasha Levin wrote:
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 1f50ddb4f418 x86/speculation: Handle HT correctly on AMD.
>
> The bot has tested the following trees: v5.0.7, v4.19.34, v4.14.111, v4.9.168, v4.4.178.
>
> v5.0.7: Build OK!
> v4.19.34: Build OK!
> v4.14.111: Build failed! Errors:
> arch/x86/kernel/process.c:417:2: error: implicit declaration of function ???lockdep_assert_irqs_disabled???; did you mean ???lockdep_assert_cpus_held???? [-Werror=implicit-function-declaration]
>
Simply zap that line.
> v4.9.168: Failed to apply! Possible dependencies:
> 01daf56875ee ("x86/speculation: Reorganize speculation control MSRs update")
> 26c4d75b2340 ("x86/speculation: Rename SSBD update functions")
> 5bfbe3ad5840 ("x86/speculation: Prepare for per task indirect branch speculation control")
>
So this lacks all the IBPB goodies.
> v4.4.178: Failed to apply! Possible dependencies:
> 01daf56875ee ("x86/speculation: Reorganize speculation control MSRs update")
> 26c4d75b2340 ("x86/speculation: Rename SSBD update functions")
> 5bfbe3ad5840 ("x86/speculation: Prepare for per task indirect branch speculation control")
> cc69b3498921 ("x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}")
Ditto.
> How should we proceed with this patch?
4.14. is easy. For the dead kernels you need to talk to the kernel
necrophilia cult members. :)
Thanks,
tglx
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 662d66466637862ef955f7f6e78a286d8cf0ebef Mon Sep 17 00:00:00 2001
From: Kaike Wan <kaike.wan(a)intel.com>
Date: Mon, 18 Mar 2019 09:55:19 -0700
Subject: [PATCH] IB/hfi1: Failed to drain send queue when QP is put into error
state
When a QP is put into error state, all pending requests in the send work
queue should be drained. The following sequence of events could lead to a
failure, causing a request to hang:
(1) The QP builds a packet and tries to send through SDMA engine.
However, PIO engine is still busy. Consequently, this packet is put on
the QP's tx list and the QP is put on the PIO waiting list. The field
qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN;
(2) The QP is put into error state by the user application and
notify_error_qp() is called, which removes the QP from the PIO waiting
list and the packet from the QP's tx list. In addition, qp->s_flags is
cleared of RVT_S_ANY_WAIT_IO bits, which does not include
HFI1_S_WAIT_PIO_DRAIN bit;
(3) The hfi1_schdule_send() function is called to drain the QP's send
queue. Subsequently, hfi1_do_send() is called. Since the flag bit
HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As
a result, hfi1_do_send() bails out without draining any request from
the send queue;
(4) The PIO engine completes the sending and tries to wake up any QP on
its waiting list. But the QP has been removed from the PIO waiting
list and therefore is kept in sleep forever.
The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.
Fixes: 2e2ba09e48b7 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
Cc: <stable(a)vger.kernel.org> # 4.19.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Reviewed-by: Alex Estrin <alex.estrin(a)intel.com>
Signed-off-by: Kaike Wan <kaike.wan(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c
index 9b643c2409cf..64889eda1735 100644
--- a/drivers/infiniband/hw/hfi1/qp.c
+++ b/drivers/infiniband/hw/hfi1/qp.c
@@ -898,7 +898,7 @@ void notify_error_qp(struct rvt_qp *qp)
if (!list_empty(&priv->s_iowait.list) &&
!(qp->s_flags & RVT_S_BUSY) &&
!(priv->s_flags & RVT_S_BUSY)) {
- qp->s_flags &= ~RVT_S_ANY_WAIT_IO;
+ qp->s_flags &= ~HFI1_S_ANY_WAIT_IO;
list_del_init(&priv->s_iowait.list);
priv->s_iowait.lock = NULL;
rvt_put_qp(qp);
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 662d66466637862ef955f7f6e78a286d8cf0ebef Mon Sep 17 00:00:00 2001
From: Kaike Wan <kaike.wan(a)intel.com>
Date: Mon, 18 Mar 2019 09:55:19 -0700
Subject: [PATCH] IB/hfi1: Failed to drain send queue when QP is put into error
state
When a QP is put into error state, all pending requests in the send work
queue should be drained. The following sequence of events could lead to a
failure, causing a request to hang:
(1) The QP builds a packet and tries to send through SDMA engine.
However, PIO engine is still busy. Consequently, this packet is put on
the QP's tx list and the QP is put on the PIO waiting list. The field
qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN;
(2) The QP is put into error state by the user application and
notify_error_qp() is called, which removes the QP from the PIO waiting
list and the packet from the QP's tx list. In addition, qp->s_flags is
cleared of RVT_S_ANY_WAIT_IO bits, which does not include
HFI1_S_WAIT_PIO_DRAIN bit;
(3) The hfi1_schdule_send() function is called to drain the QP's send
queue. Subsequently, hfi1_do_send() is called. Since the flag bit
HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As
a result, hfi1_do_send() bails out without draining any request from
the send queue;
(4) The PIO engine completes the sending and tries to wake up any QP on
its waiting list. But the QP has been removed from the PIO waiting
list and therefore is kept in sleep forever.
The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.
Fixes: 2e2ba09e48b7 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
Cc: <stable(a)vger.kernel.org> # 4.19.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Reviewed-by: Alex Estrin <alex.estrin(a)intel.com>
Signed-off-by: Kaike Wan <kaike.wan(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c
index 9b643c2409cf..64889eda1735 100644
--- a/drivers/infiniband/hw/hfi1/qp.c
+++ b/drivers/infiniband/hw/hfi1/qp.c
@@ -898,7 +898,7 @@ void notify_error_qp(struct rvt_qp *qp)
if (!list_empty(&priv->s_iowait.list) &&
!(qp->s_flags & RVT_S_BUSY) &&
!(priv->s_flags & RVT_S_BUSY)) {
- qp->s_flags &= ~RVT_S_ANY_WAIT_IO;
+ qp->s_flags &= ~HFI1_S_ANY_WAIT_IO;
list_del_init(&priv->s_iowait.list);
priv->s_iowait.lock = NULL;
rvt_put_qp(qp);
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8efd6365417a044db03009724ecc1a9521524913 Mon Sep 17 00:00:00 2001
From: Dinh Nguyen <dinguyen(a)kernel.org>
Date: Wed, 13 Mar 2019 17:28:37 -0500
Subject: [PATCH] arm64: dts: stratix10: add the sysmgr-syscon property from
the gmac's
The gmac ethernet driver uses the "altr,sysmgr-syscon" property to
configure phy settings for the gmac controller.
Add the "altr,sysmgr-syscon" property to all gmac nodes.
This patch fixes:
[ 0.917530] socfpga-dwmac ff800000.ethernet: No sysmgr-syscon node found
[ 0.924209] socfpga-dwmac ff800000.ethernet: Unable to parse OF data
Cc: stable(a)vger.kernel.org
Reported-by: Ley Foon Tan <ley.foon.tan(a)intel.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 7c649f6b14cb..cd7c76e58b09 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -162,6 +162,7 @@
rx-fifo-depth = <16384>;
snps,multicast-filter-bins = <256>;
iommus = <&smmu 1>;
+ altr,sysmgr-syscon = <&sysmgr 0x44 0>;
status = "disabled";
};
@@ -179,6 +180,7 @@
rx-fifo-depth = <16384>;
snps,multicast-filter-bins = <256>;
iommus = <&smmu 2>;
+ altr,sysmgr-syscon = <&sysmgr 0x48 0>;
status = "disabled";
};
@@ -196,6 +198,7 @@
rx-fifo-depth = <16384>;
snps,multicast-filter-bins = <256>;
iommus = <&smmu 3>;
+ altr,sysmgr-syscon = <&sysmgr 0x4c 0>;
status = "disabled";
};
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: 8b298d3a0bd5 - Linux 5.0.7
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out a ref:
Repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Ref: 8b298d3a0bd5 - Linux 5.0.7
We then merged the patchset with `git am`:
drm-i915-gvt-do-not-let-pin-count-of-shadow-mm-go-ne.patch
kbuild-pkg-use-f-srctree-makefile-to-recurse-to-top-.patch
netfilter-nft_compat-use-.release_ops-and-remove-lis.patch
netfilter-nf_tables-use-after-free-in-dynamic-operat.patch
netfilter-nf_tables-add-missing-release_ops-in-error.patch
hv_netvsc-fix-unwanted-wakeup-after-tx_disable.patch
ibmvnic-fix-completion-structure-initialization.patch
ip6_tunnel-match-to-arphrd_tunnel6-for-dev-type.patch
ipv6-fix-dangling-pointer-when-ipv6-fragment.patch
ipv6-sit-reset-ip-header-pointer-in-ipip6_rcv.patch
kcm-switch-order-of-device-registration-to-fix-a-cra.patch
net-ethtool-not-call-vzalloc-for-zero-sized-memory-r.patch
net-gro-fix-gro-flush-when-receiving-a-gso-packet.patch
net-mlx5-decrease-default-mr-cache-size.patch
netns-provide-pure-entropy-for-net_hash_mix.patch
net-rds-force-to-destroy-connection-if-t_sock-is-nul.patch
net-sched-act_sample-fix-divide-by-zero-in-the-traff.patch
net-sched-fix-get-helper-of-the-matchall-cls.patch
openvswitch-fix-flow-actions-reallocation.patch
qmi_wwan-add-olicard-600.patch
r8169-disable-aspm-again.patch
sctp-initialize-_pad-of-sockaddr_in-before-copying-t.patch
tcp-ensure-dctcp-reacts-to-losses.patch
tcp-fix-a-potential-null-pointer-dereference-in-tcp_.patch
vrf-check-accept_source_route-on-the-original-netdev.patch
net-mlx5e-fix-error-handling-when-refreshing-tirs.patch
net-mlx5e-add-a-lock-on-tir-list.patch
nfp-validate-the-return-code-from-dev_queue_xmit.patch
nfp-disable-netpoll-on-representors.patch
bnxt_en-improve-rx-consumer-index-validity-check.patch
bnxt_en-reset-device-on-rx-buffer-errors.patch
net-ip_gre-fix-possible-use-after-free-in-erspan_rcv.patch
net-ip6_gre-fix-possible-use-after-free-in-ip6erspan.patch
net-bridge-always-clear-mcast-matching-struct-on-rep.patch
net-thunderx-fix-null-pointer-dereference-in-nicvf_o.patch
net-vrf-fix-ping-failed-when-vrf-mtu-is-set-to-0.patch
net-core-netif_receive_skb_list-unlist-skb-before-pa.patch
r8169-disable-default-rx-interrupt-coalescing-on-rtl.patch
net-mlx5-add-a-missing-check-on-idr_find-free-buf.patch
net-mlx5e-update-xoff-formula.patch
net-mlx5e-update-xon-formula.patch
kbuild-clang-choose-gcc_toolchain_dir-not-on-ld.patch
lib-string.c-implement-a-basic-bcmp.patch
revert-clk-meson-clean-up-clock-registration.patch
tty-mark-siemens-r3964-line-discipline-as-broken.patch
tty-ldisc-add-sysctl-to-prevent-autoloading-of-ldiscs.patch
hwmon-w83773g-select-regmap_i2c-to-fix-build-error.patch
hwmon-occ-fix-power-sensor-indexing.patch
smb3-allow-persistent-handle-timeout-to-be-configurable-on-mount.patch
hid-logitech-handle-0-scroll-events-for-the-m560.patch
acpica-clear-status-of-gpes-before-enabling-them.patch
acpica-namespace-remove-address-node-from-global-list-after-method-termination.patch
alsa-seq-fix-oob-reads-from-strlcpy.patch
alsa-hda-realtek-enable-headset-mic-of-acer-travelmate-b114-21-with-alc233.patch
alsa-hda-realtek-add-quirk-for-tuxedo-xc-1509.patch
alsa-xen-front-do-not-use-stream-buffer-size-before-it-is-set.patch
alsa-hda-add-two-more-machines-to-the-power_save_blacklist.patch
mm-huge_memory.c-fix-modifying-of-page-protection-by-insert_pfn_pmd.patch
arm64-dts-rockchip-fix-rk3328-sdmmc0-write-errors.patch
mmc-alcor-don-t-write-data-before-command-has-completed.patch
mmc-sdhci-omap-don-t-finish_mrq-on-a-command-error-during-tuning.patch
parisc-detect-qemu-earlier-in-boot-process.patch
parisc-regs_return_value-should-return-gpr28.patch
parisc-also-set-iaoq_b-in-instruction_pointer_set.patch
alarmtimer-return-correct-remaining-time.patch
drm-i915-gvt-do-not-deliver-a-workload-if-its-creation-fails.patch
drm-sun4i-dw-hdmi-lower-max.-supported-rate-for-h6.patch
drm-udl-add-a-release-method-and-delay-modeset-teardown.patch
kvm-svm-fix-potential-get_num_contig_pages-overflow.patch
include-linux-bitrev.h-fix-constant-bitrev.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
kernel build: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
ppc64le:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
kernel build: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
x86_64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
kernel build: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
✅ tuned: tune-processes-through-perf [8]
✅ Usex - version 1.9-29 [9]
🚧 ✅ stress: stress-ng [10]
ppc64le:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [11]
✅ tuned: tune-processes-through-perf [8]
✅ Usex - version 1.9-29 [9]
🚧 ✅ stress: stress-ng [10]
x86_64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [11]
✅ tuned: tune-processes-through-perf [8]
✅ Usex - version 1.9-29 [9]
🚧 ✅ stress: stress-ng [10]
Test source:
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/us…
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
[11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
Waived tests (marked with 🚧)
-----------------------------
This test run included waived tests. Such tests are executed but their results
are not taken into account. Tests are waived when their results are not
reliable enough, e.g. when they're just introduced or are being fixed.
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 21635d7311734d2d1b177f8a95e2f9386174b76d Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Fri, 5 Apr 2019 10:52:20 +0300
Subject: [PATCH] drm/i915/dp: revert back to max link rate and lane count on
eDP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast
and narrow") started to optize the eDP 1.4+ link config, both per spec
and as preparation for display stream compression support.
Sadly, we again face panels that flat out fail with parameters they
claim to support. Revert, and go back to the drawing board.
v2: Actually revert to max params instead of just wide-and-slow.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109959
Fixes: 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow")
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Manasi Navare <manasi.d.navare(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Matt Atwood <matthew.s.atwood(a)intel.com>
Cc: "Lee, Shawn C" <shawn.c.lee(a)intel.com>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.0+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare(a)intel.com>
Tested-by: Albert Astals Cid <aacid(a)kde.org> # v5.0 backport
Tested-by: Emanuele Panigati <ilpanich(a)gmail.com> # v5.0 backport
Tested-by: Matteo Iervasi <matteoiervasi(a)gmail.com> # v5.0 backport
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405075220.9815-1-jani.ni…
(cherry picked from commit f11cb1c19ad0563b3c1ea5eb16a6bac0e401f428)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index cf709835fb9a..8891f29a8c7f 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1859,42 +1859,6 @@ intel_dp_compute_link_config_wide(struct intel_dp *intel_dp,
return -EINVAL;
}
-/* Optimize link config in order: max bpp, min lanes, min clock */
-static int
-intel_dp_compute_link_config_fast(struct intel_dp *intel_dp,
- struct intel_crtc_state *pipe_config,
- const struct link_config_limits *limits)
-{
- struct drm_display_mode *adjusted_mode = &pipe_config->base.adjusted_mode;
- int bpp, clock, lane_count;
- int mode_rate, link_clock, link_avail;
-
- for (bpp = limits->max_bpp; bpp >= limits->min_bpp; bpp -= 2 * 3) {
- mode_rate = intel_dp_link_required(adjusted_mode->crtc_clock,
- bpp);
-
- for (lane_count = limits->min_lane_count;
- lane_count <= limits->max_lane_count;
- lane_count <<= 1) {
- for (clock = limits->min_clock; clock <= limits->max_clock; clock++) {
- link_clock = intel_dp->common_rates[clock];
- link_avail = intel_dp_max_data_rate(link_clock,
- lane_count);
-
- if (mode_rate <= link_avail) {
- pipe_config->lane_count = lane_count;
- pipe_config->pipe_bpp = bpp;
- pipe_config->port_clock = link_clock;
-
- return 0;
- }
- }
- }
- }
-
- return -EINVAL;
-}
-
static int intel_dp_dsc_compute_bpp(struct intel_dp *intel_dp, u8 dsc_max_bpc)
{
int i, num_bpc;
@@ -2031,15 +1995,13 @@ intel_dp_compute_link_config(struct intel_encoder *encoder,
limits.min_bpp = 6 * 3;
limits.max_bpp = intel_dp_compute_bpp(intel_dp, pipe_config);
- if (intel_dp_is_edp(intel_dp) && intel_dp->edp_dpcd[0] < DP_EDP_14) {
+ if (intel_dp_is_edp(intel_dp)) {
/*
* Use the maximum clock and number of lanes the eDP panel
- * advertizes being capable of. The eDP 1.3 and earlier panels
- * are generally designed to support only a single clock and
- * lane configuration, and typically these values correspond to
- * the native resolution of the panel. With eDP 1.4 rate select
- * and DSC, this is decreasingly the case, and we need to be
- * able to select less than maximum link config.
+ * advertizes being capable of. The panels are generally
+ * designed to support only a single clock and lane
+ * configuration, and typically these values correspond to the
+ * native resolution of the panel.
*/
limits.min_lane_count = limits.max_lane_count;
limits.min_clock = limits.max_clock;
@@ -2053,22 +2015,11 @@ intel_dp_compute_link_config(struct intel_encoder *encoder,
intel_dp->common_rates[limits.max_clock],
limits.max_bpp, adjusted_mode->crtc_clock);
- if (intel_dp_is_edp(intel_dp))
- /*
- * Optimize for fast and narrow. eDP 1.3 section 3.3 and eDP 1.4
- * section A.1: "It is recommended that the minimum number of
- * lanes be used, using the minimum link rate allowed for that
- * lane configuration."
- *
- * Note that we use the max clock and lane count for eDP 1.3 and
- * earlier, and fast vs. wide is irrelevant.
- */
- ret = intel_dp_compute_link_config_fast(intel_dp, pipe_config,
- &limits);
- else
- /* Optimize for slow and wide. */
- ret = intel_dp_compute_link_config_wide(intel_dp, pipe_config,
- &limits);
+ /*
+ * Optimize for slow and wide. This is the place to add alternative
+ * optimization policy.
+ */
+ ret = intel_dp_compute_link_config_wide(intel_dp, pipe_config, &limits);
/* enable compression if the mode doesn't fit available BW */
DRM_DEBUG_KMS("Force DSC en = %d\n", intel_dp->force_dsc_en);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0b3d6e6f2dd0a7b697b1aa8c167265908940624b Mon Sep 17 00:00:00 2001
From: Greg Thelen <gthelen(a)google.com>
Date: Fri, 5 Apr 2019 18:39:18 -0700
Subject: [PATCH] mm: writeback: use exact memcg dirty counts
Since commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed
as:
1) per-memcg per-cpu values in range of [-32..32]
2) per-memcg atomic counter
When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic. Stat readers only check the atomic. Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per
cpu.
Assuming 100 cpus:
4k x86 page_size: 13 MiB error per memcg
64k ppc page_size: 200 MiB error per memcg
Considering that dirty+writeback are used together for some decisions the
errors double.
This inaccuracy can lead to undeserved oom kills. One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages. If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom
kill.
It could be argued that tiny containers are not supported, but it's more
subtle. It's the amount the space available for file lru that matters.
If a container has memory.max-200MiB of non reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.
The following test reliably ooms without this patch. This patch avoids
oom kills.
$ cat test
mount -t cgroup2 none /dev/cgroup
cd /dev/cgroup
echo +io +memory > cgroup.subtree_control
mkdir test
cd test
echo 10M > memory.max
(echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
(echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)
$ cat memcg-writeback-stress.c
/*
* Dirty pages from all but one cpu.
* Clean pages from the non dirtying cpu.
* This is to stress per cpu counter imbalance.
* On a 100 cpu machine:
* - per memcg per cpu dirty count is 32 pages for each of 99 cpus
* - per memcg atomic is -99*32 pages
* - thus the complete dirty limit: sum of all counters 0
* - balance_dirty_pages() only sees atomic count -99*32 pages, which
* it max()s to 0.
* - So a workload can dirty -99*32 pages before balance_dirty_pages()
* cares.
*/
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <sys/types.h>
#include <unistd.h>
static char *buf;
static int bufSize;
static void set_affinity(int cpu)
{
cpu_set_t affinity;
CPU_ZERO(&affinity);
CPU_SET(cpu, &affinity);
if (sched_setaffinity(0, sizeof(affinity), &affinity))
err(1, "sched_setaffinity");
}
static void dirty_on(int output_fd, int cpu)
{
int i, wrote;
set_affinity(cpu);
for (i = 0; i < 32; i++) {
for (wrote = 0; wrote < bufSize; ) {
int ret = write(output_fd, buf+wrote, bufSize-wrote);
if (ret == -1)
err(1, "write");
wrote += ret;
}
}
}
int main(int argc, char **argv)
{
int cpu, flush_cpu = 1, output_fd;
const char *output;
if (argc != 2)
errx(1, "usage: output_file");
output = argv[1];
bufSize = getpagesize();
buf = malloc(getpagesize());
if (buf == NULL)
errx(1, "malloc failed");
output_fd = open(output, O_CREAT|O_RDWR);
if (output_fd == -1)
err(1, "open(%s)", output);
for (cpu = 0; cpu < get_nprocs(); cpu++) {
if (cpu != flush_cpu)
dirty_on(output_fd, cpu);
}
set_affinity(flush_cpu);
if (fsync(output_fd))
err(1, "fsync(%s)", output);
if (close(output_fd))
err(1, "close(%s)", output);
free(buf);
}
Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters. This avoids the aforementioned oom
kills.
This does not affect the overhead of memory.stat, which still reads the
single atomic counter.
Why not use percpu_counter? memcg already handles cpus going offline, so
no need for that overhead from percpu_counter. And the percpu_counter
spinlocks are more heavyweight than is required.
It probably also makes sense to use exact dirty and writeback counters
in memcg oom reports. But that is saved for later.
Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1f3d880b7ca1..dbb6118370c1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -566,7 +566,10 @@ struct mem_cgroup *lock_page_memcg(struct page *page);
void __unlock_page_memcg(struct mem_cgroup *memcg);
void unlock_page_memcg(struct page *page);
-/* idx can be of type enum memcg_stat_item or node_stat_item */
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page_state().
+ */
static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
int idx)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 532e0e2a4817..81a0d3914ec9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3882,6 +3882,22 @@ struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb)
return &memcg->cgwb_domain;
}
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page().
+ */
+static unsigned long memcg_exact_page_state(struct mem_cgroup *memcg, int idx)
+{
+ long x = atomic_long_read(&memcg->stat[idx]);
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ x += per_cpu_ptr(memcg->stat_cpu, cpu)->count[idx];
+ if (x < 0)
+ x = 0;
+ return x;
+}
+
/**
* mem_cgroup_wb_stats - retrieve writeback related stats from its memcg
* @wb: bdi_writeback in question
@@ -3907,10 +3923,10 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
struct mem_cgroup *parent;
- *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
+ *pdirty = memcg_exact_page_state(memcg, NR_FILE_DIRTY);
/* this should eventually include NR_UNSTABLE_NFS */
- *pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
+ *pwriteback = memcg_exact_page_state(memcg, NR_WRITEBACK);
*pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) |
(1 << LRU_ACTIVE_FILE));
*pheadroom = PAGE_COUNTER_MAX;
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: 8b298d3a0bd5 - Linux 5.0.7
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out a ref:
Repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Ref: 8b298d3a0bd5 - Linux 5.0.7
We then merged the patchset with `git am`:
drm-i915-gvt-do-not-let-pin-count-of-shadow-mm-go-ne.patch
kbuild-pkg-use-f-srctree-makefile-to-recurse-to-top-.patch
netfilter-nft_compat-use-.release_ops-and-remove-lis.patch
netfilter-nf_tables-use-after-free-in-dynamic-operat.patch
netfilter-nf_tables-add-missing-release_ops-in-error.patch
hv_netvsc-fix-unwanted-wakeup-after-tx_disable.patch
ibmvnic-fix-completion-structure-initialization.patch
ip6_tunnel-match-to-arphrd_tunnel6-for-dev-type.patch
ipv6-fix-dangling-pointer-when-ipv6-fragment.patch
ipv6-sit-reset-ip-header-pointer-in-ipip6_rcv.patch
kcm-switch-order-of-device-registration-to-fix-a-cra.patch
net-ethtool-not-call-vzalloc-for-zero-sized-memory-r.patch
net-gro-fix-gro-flush-when-receiving-a-gso-packet.patch
net-mlx5-decrease-default-mr-cache-size.patch
netns-provide-pure-entropy-for-net_hash_mix.patch
net-rds-force-to-destroy-connection-if-t_sock-is-nul.patch
net-sched-act_sample-fix-divide-by-zero-in-the-traff.patch
net-sched-fix-get-helper-of-the-matchall-cls.patch
openvswitch-fix-flow-actions-reallocation.patch
qmi_wwan-add-olicard-600.patch
r8169-disable-aspm-again.patch
sctp-initialize-_pad-of-sockaddr_in-before-copying-t.patch
tcp-ensure-dctcp-reacts-to-losses.patch
tcp-fix-a-potential-null-pointer-dereference-in-tcp_.patch
vrf-check-accept_source_route-on-the-original-netdev.patch
net-mlx5e-fix-error-handling-when-refreshing-tirs.patch
net-mlx5e-add-a-lock-on-tir-list.patch
nfp-validate-the-return-code-from-dev_queue_xmit.patch
nfp-disable-netpoll-on-representors.patch
bnxt_en-improve-rx-consumer-index-validity-check.patch
bnxt_en-reset-device-on-rx-buffer-errors.patch
net-ip_gre-fix-possible-use-after-free-in-erspan_rcv.patch
net-ip6_gre-fix-possible-use-after-free-in-ip6erspan.patch
net-bridge-always-clear-mcast-matching-struct-on-rep.patch
net-thunderx-fix-null-pointer-dereference-in-nicvf_o.patch
net-vrf-fix-ping-failed-when-vrf-mtu-is-set-to-0.patch
net-core-netif_receive_skb_list-unlist-skb-before-pa.patch
r8169-disable-default-rx-interrupt-coalescing-on-rtl.patch
net-mlx5-add-a-missing-check-on-idr_find-free-buf.patch
net-mlx5e-update-xoff-formula.patch
net-mlx5e-update-xon-formula.patch
kbuild-clang-choose-gcc_toolchain_dir-not-on-ld.patch
lib-string.c-implement-a-basic-bcmp.patch
revert-clk-meson-clean-up-clock-registration.patch
tty-mark-siemens-r3964-line-discipline-as-broken.patch
tty-ldisc-add-sysctl-to-prevent-autoloading-of-ldiscs.patch
hwmon-w83773g-select-regmap_i2c-to-fix-build-error.patch
hwmon-occ-fix-power-sensor-indexing.patch
smb3-allow-persistent-handle-timeout-to-be-configurable-on-mount.patch
hid-logitech-handle-0-scroll-events-for-the-m560.patch
acpica-clear-status-of-gpes-before-enabling-them.patch
acpica-namespace-remove-address-node-from-global-list-after-method-termination.patch
alsa-seq-fix-oob-reads-from-strlcpy.patch
alsa-hda-realtek-enable-headset-mic-of-acer-travelmate-b114-21-with-alc233.patch
alsa-hda-realtek-add-quirk-for-tuxedo-xc-1509.patch
alsa-xen-front-do-not-use-stream-buffer-size-before-it-is-set.patch
alsa-hda-add-two-more-machines-to-the-power_save_blacklist.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
kernel build: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
ppc64le:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
kernel build: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
x86_64:
make options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
kernel build: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
✅ tuned: tune-processes-through-perf [8]
🚧 ✅ stress: stress-ng [9]
ppc64le:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [10]
✅ tuned: tune-processes-through-perf [8]
🚧 ✅ stress: stress-ng [9]
x86_64:
✅ Boot test [0]
✅ LTP lite [1]
✅ Loopdev Sanity [2]
✅ AMTU (Abstract Machine Test Utility) [3]
🚧 ✅ audit: audit testsuite test [4]
✅ httpd: mod_ssl smoke sanity [5]
✅ httpd: php sanity [6]
✅ iotop: sanity [7]
✅ /CoreOS/net-snmp/Regression/bz251332-tcp-transport
🚧 ✅ selinux-policy: serge-testsuite [10]
✅ tuned: tune-processes-through-perf [8]
🚧 ✅ stress: stress-ng [9]
Test source:
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
Waived tests (marked with 🚧)
-----------------------------
This test run included waived tests. Such tests are executed but their results
are not taken into account. Tests are waived when their results are not
reliable enough, e.g. when they're just introduced or are being fixed.
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b9d0a85d6b2e76630cfd4c475ee3af4109bfd87a Mon Sep 17 00:00:00 2001
From: Yue Haibing <yuehaibing(a)huawei.com>
Date: Wed, 20 Feb 2019 16:25:38 +0800
Subject: [PATCH] tpm: Fix the type of the return value in
calc_tpm2_event_size()
calc_tpm2_event_size() has an invalid signature because
it returns a 'size_t' where as its signature says that
it returns 'int'.
Cc: <stable(a)vger.kernel.org>
Fixes: 4d23cc323cdb ("tpm: add securityfs support for TPM 2.0 firmware event log")
Suggested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: James Morris <james.morris(a)microsoft.com>
diff --git a/drivers/char/tpm/eventlog/tpm2.c b/drivers/char/tpm/eventlog/tpm2.c
index d8b77133a83a..f824563fc28d 100644
--- a/drivers/char/tpm/eventlog/tpm2.c
+++ b/drivers/char/tpm/eventlog/tpm2.c
@@ -37,8 +37,8 @@
*
* Returns size of the event. If it is an invalid event, returns 0.
*/
-static int calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
- struct tcg_pcr_event *event_header)
+static size_t calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
+ struct tcg_pcr_event *event_header)
{
struct tcg_efi_specid_event_head *efispecid;
struct tcg_event_field *event_field;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b9d0a85d6b2e76630cfd4c475ee3af4109bfd87a Mon Sep 17 00:00:00 2001
From: Yue Haibing <yuehaibing(a)huawei.com>
Date: Wed, 20 Feb 2019 16:25:38 +0800
Subject: [PATCH] tpm: Fix the type of the return value in
calc_tpm2_event_size()
calc_tpm2_event_size() has an invalid signature because
it returns a 'size_t' where as its signature says that
it returns 'int'.
Cc: <stable(a)vger.kernel.org>
Fixes: 4d23cc323cdb ("tpm: add securityfs support for TPM 2.0 firmware event log")
Suggested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Yue Haibing <yuehaibing(a)huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: James Morris <james.morris(a)microsoft.com>
diff --git a/drivers/char/tpm/eventlog/tpm2.c b/drivers/char/tpm/eventlog/tpm2.c
index d8b77133a83a..f824563fc28d 100644
--- a/drivers/char/tpm/eventlog/tpm2.c
+++ b/drivers/char/tpm/eventlog/tpm2.c
@@ -37,8 +37,8 @@
*
* Returns size of the event. If it is an invalid event, returns 0.
*/
-static int calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
- struct tcg_pcr_event *event_header)
+static size_t calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
+ struct tcg_pcr_event *event_header)
{
struct tcg_efi_specid_event_head *efispecid;
struct tcg_event_field *event_field;