Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 24311f884189 NFSv4: Recovery of recalled read delegations is broken.
The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135, v4.9.186, v4.4.186.
v5.2.5: Build OK!
v4.19.63: Failed to apply! Possible dependencies:
07d02a67b7fa ("SUNRPC: Simplify lookup code")
79b181810285 ("SUNRPC: Convert auth creds to use refcount_t")
8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred")
95cd623250ad ("SUNRPC: Clean up the AUTH cache code")
97f68c6b02e0 ("SUNRPC: add 'struct cred *' to auth_cred and rpc_cred")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
fc0664fd9bcc ("SUNRPC: remove groupinfo from struct auth_cred.")
v4.14.135: Failed to apply! Possible dependencies:
07d02a67b7fa ("SUNRPC: Simplify lookup code")
12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
1eb5d98f16f6 ("nfs: convert to new i_version API")
35156bfff3c0 ("NFSv4: Fix the nfs_inode_set_delegation() arguments")
79b181810285 ("SUNRPC: Convert auth creds to use refcount_t")
95cd623250ad ("SUNRPC: Clean up the AUTH cache code")
97f68c6b02e0 ("SUNRPC: add 'struct cred *' to auth_cred and rpc_cred")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
fc0664fd9bcc ("SUNRPC: remove groupinfo from struct auth_cred.")
v4.9.186: Failed to apply! Possible dependencies:
1eb5d98f16f6 ("nfs: convert to new i_version API")
35156bfff3c0 ("NFSv4: Fix the nfs_inode_set_delegation() arguments")
39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution")
7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally")
9cf09d68b89a ("arm64: xen: Enable user access before a privcmd hvc call")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
bd38967d406f ("arm64: Factor out PAN enabling/disabling into separate uaccess_* macros")
v4.4.186: Failed to apply! Possible dependencies:
0654cc726fc6 ("NFSv4.1/pNFS: Add a helper to mark the layout as returned")
10335556c9e6 ("NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout")
13c13a6ad71f ("pNFS: Fix missing layoutreturn calls")
2454dfea0aef ("NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout")
3982a6a2d0e6 ("pnfs: keep track of the return sequence number in pnfs_layout_hdr")
4b0934baf931 ("NFSv4.1/pNFS: Fix a race in initiate_file_draining()")
506c0d68269e ("NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments")
50f563ef5d41 ("NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids")
5c97f5de2c7c ("NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode")
68d264cf02b0 ("NFS42: handle layoutstats stateid error")
6d597e175012 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
71b39854a500 ("NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()")
9fd4b9fc7695 ("NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b20135d0b243 ("NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
e036f46453f2 ("NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id")
e0d9243048fd ("NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL")
ed429d6b934d ("NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()")
fc7ff36747b9 ("pNFS: If we have to delay the layout callback, mark the layout for return")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
This is the start of the stable review cycle for the 4.14.136 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun 04 Aug 2019 09:19:34 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.136-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.136-rc1
Yan, Zheng <zyan(a)redhat.com>
ceph: hold i_ceph_lock when removing caps for freeing inode
Yoshinori Sato <ysato(a)users.sourceforge.jp>
Fix allyesconfig output.
Miroslav Lichvar <mlichvar(a)redhat.com>
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Jann Horn <jannh(a)google.com>
sched/fair: Don't free p->numa_faults with concurrent readers
Vladis Dronov <vdronov(a)redhat.com>
Bluetooth: hci_uart: check for missing tty operations
Sunil Muthuswamy <sunilmut(a)microsoft.com>
hv_sock: Add support for delayed close
Joerg Roedel <jroedel(a)suse.de>
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
Dmitry Safonov <dima(a)arista.com>
iommu/vt-d: Don't queue_iova() if there is no flush queue
Luke Nowakowski-Krijger <lnowakow(a)eng.ucsd.edu>
media: radio-raremono: change devm_k*alloc to k*alloc
Benjamin Coddington <bcodding(a)redhat.com>
NFS: Cleanup if nfs_match_client is interrupted
Andrey Konovalov <andreyknvl(a)google.com>
media: pvrusb2: use a different format for warnings
Oliver Neukum <oneukum(a)suse.com>
media: cpia2_usb: first wake up, then free in disconnect
Fabio Estevam <festevam(a)gmail.com>
ath10k: Change the warning message string
Sean Young <sean(a)mess.org>
media: au0828: fix null dereference in error path
Phong Tran <tranmanphong(a)gmail.com>
ISDN: hfcsusb: checking idx of ep configuration
Todd Kjos <tkjos(a)android.com>
binder: fix possible UAF when freeing buffer
Will Deacon <will.deacon(a)arm.com>
arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
Abhishek Sahu <absahu(a)codeaurora.org>
i2c: qup: fixed releasing dma without flush operation completion
allen yan <yanwei(a)marvell.com>
arm64: dts: marvell: Fix A37xx UART0 register size
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Fix lookup revalidate of regular files
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Refactor nfs_lookup_revalidate()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Fix dentry revalidation on NFSv4 lookup
Sunil Muthuswamy <sunilmut(a)microsoft.com>
vsock: correct removal of socket from the list
Stefan Hajnoczi <stefanha(a)redhat.com>
VSOCK: use TCP state constants for sk_state
-------------
Diffstat:
.../devicetree/bindings/serial/mvebu-uart.txt | 2 +-
Makefile | 4 +-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +-
arch/arm64/include/asm/compat.h | 1 +
arch/sh/boards/Kconfig | 14 +-
drivers/android/binder.c | 16 +-
drivers/bluetooth/hci_ath.c | 3 +
drivers/bluetooth/hci_bcm.c | 3 +
drivers/bluetooth/hci_intel.c | 3 +
drivers/bluetooth/hci_ldisc.c | 13 +
drivers/bluetooth/hci_mrvl.c | 3 +
drivers/bluetooth/hci_uart.h | 1 +
drivers/i2c/busses/i2c-qup.c | 2 +
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/iova.c | 18 +-
drivers/isdn/hardware/mISDN/hfcsusb.c | 3 +
drivers/media/radio/radio-raremono.c | 30 ++-
drivers/media/usb/au0828/au0828-core.c | 12 +-
drivers/media/usb/cpia2/cpia2_usb.c | 3 +-
drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 4 +-
drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c | 6 +-
drivers/media/usb/pvrusb2/pvrusb2-std.c | 2 +-
drivers/net/wireless/ath/ath10k/usb.c | 2 +-
drivers/pps/pps.c | 8 +
fs/ceph/caps.c | 7 +-
fs/exec.c | 2 +-
fs/nfs/client.c | 4 +-
fs/nfs/dir.c | 295 +++++++++++----------
fs/nfs/nfs4proc.c | 15 +-
include/linux/iova.h | 6 +
include/linux/sched/numa_balancing.h | 4 +-
include/net/af_vsock.h | 3 -
kernel/fork.c | 2 +-
kernel/sched/fair.c | 24 +-
net/vmw_vsock/af_vsock.c | 84 +++---
net/vmw_vsock/hyperv_transport.c | 118 ++++++---
net/vmw_vsock/virtio_transport.c | 2 +-
net/vmw_vsock/virtio_transport_common.c | 22 +-
net/vmw_vsock/vmci_transport.c | 34 +--
net/vmw_vsock/vmci_transport_notify.c | 2 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
41 files changed, 472 insertions(+), 311 deletions(-)
This is the start of the stable review cycle for the 4.14.132 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.132-rc1
Xin Long <lucien.xin(a)gmail.com>
tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
Will Deacon <will.deacon(a)arm.com>
futex: Update comments and docs about return values of arch futex code
Daniel Borkmann <daniel(a)iogearbox.net>
bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
Will Deacon <will.deacon(a)arm.com>
arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg()
Martin KaFai Lau <kafai(a)fb.com>
bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err
Martin KaFai Lau <kafai(a)fb.com>
bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro
YueHaibing <yuehaibing(a)huawei.com>
bonding: Always enable vlan tx offload
YueHaibing <yuehaibing(a)huawei.com>
team: Always enable vlan tx offload
Fei Li <lifei.shirley(a)bytedance.com>
tun: wake up waitqueues after IFF_UP is set
Xin Long <lucien.xin(a)gmail.com>
tipc: check msg->req data len in tipc_nl_compat_bearer_disable
Xin Long <lucien.xin(a)gmail.com>
tipc: change to use register_pernet_device
Xin Long <lucien.xin(a)gmail.com>
sctp: change to hold sk after auth shkey is created successfully
Roland Hii <roland.king.guan.hii(a)intel.com>
net: stmmac: fixed new system time seconds value calculation
JingYi Hou <houjingyi647(a)gmail.com>
net: remove duplicate fetch in sock_getsockopt
Eric Dumazet <edumazet(a)google.com>
net/packet: fix memory leak in packet_set_ring()
Stephen Suryaputra <ssuryaextr(a)gmail.com>
ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
Neil Horman <nhorman(a)tuxdriver.com>
af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Wang Xin <xin.wang7(a)cn.bosch.com>
eeprom: at24: fix unexpected timeout under high load
Geert Uytterhoeven <geert(a)linux-m68k.org>
cpu/speculation: Warn on unsupported mitigations= parameter
Trond Myklebust <trondmy(a)gmail.com>
NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
Thomas Gleixner <tglx(a)linutronix.de>
x86/microcode: Fix the microcode load on CPU hotplug for real
Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
x86/speculation: Allow guests to use SSBD even if host does not
Jan Kara <jack(a)suse.cz>
scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
zhangyi (F) <yi.zhang(a)huawei.com>
dm log writes: make sure super sector log updates are written in order
Colin Ian King <colin.king(a)canonical.com>
mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
Jann Horn <jannh(a)google.com>
fs/binfmt_flat.c: make load_flat_shared_library() work
zhong jiang <zhongjiang(a)huawei.com>
mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
John Ogness <john.ogness(a)linutronix.de>
fs/proc/array.c: allow reporting eip/esp for all coredumping threads
Sasha Levin <sashal(a)kernel.org>
Revert "compiler.h: update definition of unreachable()"
Kristian Evensen <kristian.evensen(a)gmail.com>
qmi_wwan: Fix out-of-bounds read
Adeodato Simó <dato(a)net.com.org.es>
net/9p: include trans_common.h to fix missing prototype warning.
Dominique Martinet <dominique.martinet(a)cea.fr>
9p: p9dirent_read: check network-provided name length
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/rdma: remove useless check in cm_event_handler
Dominique Martinet <dominique.martinet(a)cea.fr>
9p: acl: fix uninitialized iattr access
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/rdma: do not disconnect on down_interruptible EAGAIN
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/xen: fix check for xenbus_read error in front_probe
Martin Wilck <mwilck(a)suse.com>
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
Christoph Hellwig <hch(a)lst.de>
block: add a lower-level bio_add_page interface
Mike Marciniszyn <mike.marciniszyn(a)intel.com>
IB/hfi1: Close PSM sdma_progress sleep window
Sasha Levin <sashal(a)kernel.org>
Revert "x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP"
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf header: Fix unchecked usage of strncpy()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf help: Remove needless use of strncpy()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
-------------
Diffstat:
Documentation/robust-futexes.txt | 3 +-
Makefile | 4 +-
arch/arm64/include/asm/futex.h | 4 +-
arch/arm64/include/asm/insn.h | 8 ++
arch/arm64/kernel/insn.c | 40 +++++++
arch/arm64/net/bpf_jit.h | 4 +
arch/arm64/net/bpf_jit_comp.c | 28 +++--
arch/x86/kernel/cpu/bugs.c | 11 +-
arch/x86/kernel/cpu/microcode/core.c | 15 ++-
block/bio.c | 131 +++++++++++++++------
drivers/infiniband/hw/hfi1/user_sdma.c | 12 +-
drivers/infiniband/hw/hfi1/user_sdma.h | 1 -
drivers/md/dm-log-writes.c | 23 +++-
drivers/misc/eeprom/at24.c | 107 ++++++++++++-----
drivers/net/bonding/bond_main.c | 2 +-
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 2 +-
drivers/net/team/team.c | 2 +-
drivers/net/tun.c | 19 ++-
drivers/net/usb/qmi_wwan.c | 4 +-
drivers/scsi/vmw_pvscsi.c | 6 +-
fs/9p/acl.c | 2 +-
fs/binfmt_flat.c | 23 ++--
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +-
fs/proc/array.c | 2 +-
include/asm-generic/futex.h | 8 +-
include/linux/bio.h | 9 ++
include/linux/compiler.h | 5 +-
kernel/cpu.c | 3 +
kernel/trace/trace_branch.c | 4 -
mm/mempolicy.c | 2 +-
mm/page_idle.c | 4 +-
net/9p/protocol.c | 12 +-
net/9p/trans_common.c | 1 +
net/9p/trans_rdma.c | 7 +-
net/9p/trans_xen.c | 4 +-
net/core/sock.c | 3 -
net/ipv4/raw.c | 2 +-
net/ipv4/udp.c | 6 +-
net/ipv6/udp.c | 4 +-
net/packet/af_packet.c | 23 +++-
net/packet/internal.h | 1 +
net/sctp/endpointola.c | 8 +-
net/tipc/core.c | 12 +-
net/tipc/netlink_compat.c | 18 ++-
net/tipc/udp_media.c | 8 +-
tools/perf/builtin-help.c | 2 +-
tools/perf/ui/tui/helpline.c | 2 +-
tools/perf/util/header.c | 2 +-
48 files changed, 418 insertions(+), 187 deletions(-)
From: Chris Down <chris(a)chrisdown.name>
Subject: cgroup: kselftest: relax fs_spec checks
On my laptop most memcg kselftests were being skipped because the test
claimed the cgroup v2 hierarchy wasn't mounted, but this isn't correct.
Rather, it seems current systemd HEAD mounts it with the name "cgroup2"
instead of "cgroup":
% grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
I can't think of a reason to need to check fs_spec explicitly
since it's arbitrary, so we can just rely on fs_vfstype.
After these changes, `make TARGETS=cgroup kselftest` actually runs the
cgroup v2 tests in more cases.
Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.name
Signed-off-by: Chris Down <chris(a)chrisdown.name>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/cgroup/cgroup_util.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/tools/testing/selftests/cgroup/cgroup_util.c~cgroup-kselftest-relax-fs_spec-checks
+++ a/tools/testing/selftests/cgroup/cgroup_util.c
@@ -191,8 +191,7 @@ int cg_find_unified_root(char *root, siz
strtok(NULL, delim);
strtok(NULL, delim);
- if (strcmp(fs, "cgroup") == 0 &&
- strcmp(type, "cgroup2") == 0) {
+ if (strcmp(type, "cgroup2") == 0) {
strncpy(root, mount, len);
return 0;
}
_
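The check that remains after the cgroup_util.c change can be sketched in isolation. This is a minimal sketch, not the selftest's code: parse_cgroup2_mount() is a hypothetical helper, and it assumes the usual /proc/mounts field order (fs_spec, mount point, fs_vfstype).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from cgroup_util.c): parse one
 * /proc/mounts line and decide purely on the fs_vfstype field.
 * On a match, copy the mount point into root and return 0; else -1.
 */
static int parse_cgroup2_mount(const char *line, char *root, size_t len)
{
	char fs[64], mount[256], type[64];

	/* Fields: fs_spec, mount point, fs_vfstype; the rest is ignored. */
	if (sscanf(line, "%63s %255s %63s", fs, mount, type) != 3)
		return -1;

	/* fs_spec is arbitrary ("cgroup", "cgroup2", ...), so only type counts. */
	if (strcmp(type, "cgroup2") != 0)
		return -1;

	/* snprintf() always NUL-terminates, unlike strncpy(). */
	snprintf(root, len, "%s", mount);
	return 0;
}
```

With this shape, both "cgroup /sys/fs/cgroup cgroup2 ..." and "cgroup2 /sys/fs/cgroup cgroup2 ..." are accepted, while a tmpfs line is rejected.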
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: ubsan: build ubsan.c more conservatively
objtool points out several conditions that it does not like, depending on
the combination with other configuration options and compiler variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file as
we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is mixed
with the gcc-specific -fno-conserve-stack option. According to Andrey
Ryabinin, that option is not even needed, so dropping it here fixes the
stack protector issue.
Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes: d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/Makefile~ubsan-build-ubsanc-more-conservatively
+++ a/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
_
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm: compaction: avoid 100% CPU usage during compaction when a task is killed
"howaboutsynergy" reported via kernel bugzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.
stress -m 220 --vm-bytes 1000000000 --timeout 10
Tracing indicated a pattern as follows:
stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting, and
it makes no progress because the fatal signal is still pending.
It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and isolating
a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
This was reproduced locally (16G single socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000, using Colin King's stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task exited
normally instead of consuming CPU.
This problem has existed for some time but it was made worse by
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/mm/compaction.c~mm-compaction-avoid-100%-cpu-usage-during-compaction-when-a-task-is-killed
+++ a/mm/compaction.c
@@ -842,13 +842,15 @@ isolate_migratepages_block(struct compac
/*
* Periodically drop the lock (if held) regardless of its
- * contention, to give chance to IRQs. Abort async compaction
- * if contended.
+ * contention, to give chance to IRQs. Abort completely if
+ * a fatal signal is pending.
*/
if (!(low_pfn % SWAP_CLUSTER_MAX)
&& compact_unlock_should_abort(&pgdat->lru_lock,
- flags, &locked, cc))
- break;
+ flags, &locked, cc)) {
+ low_pfn = 0;
+ goto fatal_pending;
+ }
if (!pfn_valid_within(low_pfn))
goto isolate_fail;
@@ -1060,6 +1062,7 @@ isolate_abort:
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
+fatal_pending:
cc->total_migrate_scanned += nr_scanned;
if (nr_isolated)
count_compact_events(COMPACTISOLATED, nr_isolated);
_
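The control flow of the compaction fix can be modelled in userspace. This is a rough sketch only: scan_block(), struct scan_result, CLUSTER_MAX and the two predicates are hypothetical stand-ins, and it assumes the caller treats a returned pfn of 0 as "abort, do not retry".

```c
#include <stdbool.h>

#define CLUSTER_MAX 32UL	/* stand-in for SWAP_CLUSTER_MAX */

struct scan_result {
	unsigned long next_pfn;		/* 0 means "aborted, do not retry" */
	unsigned long nr_scanned;
};

/* Example predicates standing in for a fatal-signal check. */
static bool never_fatal(unsigned long pfn)
{
	(void)pfn;
	return false;
}

static bool fatal_from_64(unsigned long pfn)
{
	return pfn >= 64;
}

/*
 * Hypothetical model of the scan loop: walk [low_pfn, end_pfn) and, at
 * each cluster-aligned pfn, ask whether a fatal signal is pending.
 */
static struct scan_result scan_block(unsigned long low_pfn,
				     unsigned long end_pfn,
				     bool (*fatal_pending)(unsigned long pfn))
{
	struct scan_result res = { .next_pfn = low_pfn, .nr_scanned = 0 };

	for (; low_pfn < end_pfn; low_pfn++) {
		if (!(low_pfn % CLUSTER_MAX) && fatal_pending(low_pfn)) {
			/* Report 0 so the caller stops instead of rescanning. */
			res.next_pfn = 0;
			return res;
		}
		res.nr_scanned++;
	}
	res.next_pfn = low_pfn;
	return res;
}
```

Returning a distinct value on the fatal path, rather than just breaking out of the loop, is what lets the caller tell "finished this block" apart from "stop compacting altogether".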
From: Jan Kara <jack(a)suse.cz>
Subject: mm: migrate: fix reference check race between __find_get_block() and migration
buffer_migrate_page_norefs() can race with bh users in the following way:
CPU1                                    CPU2
buffer_migrate_page_norefs()
  buffer_migrate_lock_buffers()
  checks bh refs
  spin_unlock(&mapping->private_lock)
                                        __find_get_block()
                                          spin_lock(&mapping->private_lock)
                                          grab bh ref
                                          spin_unlock(&mapping->private_lock)
  move page                             do bh work
This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use after free issues for the old page.
This patch closes the race by holding mapping->private_lock while the
mapping is being moved to a new page. Ordinarily, a reference can be
taken outside of the private_lock using the per-cpu BH LRU but the
references are checked and the LRU invalidated if necessary. The
private_lock is held once the references are known so the buffer lookup
slow path will spin on the private_lock. Between the page lock and
private_lock, it should be impossible for other references to be acquired
and updates to happen during the migration.
A user had reported data corruption issues on a distribution kernel with a
page migration implementation similar to mainline. The data corruption
could not be reproduced with this patch applied. A small number of
migration-intensive tests were run and no performance problems were noted.
[mgorman(a)techsingularity.net: Changelog, removed tracing]
Link: http://lkml.kernel.org/r/20190718090238.GF24383@techsingularity.net
Fixes: 89cb0888ca14 ("mm: migrate: provide buffer_migrate_page_norefs()")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fix-reference-check-race-between-__find_get_block-and-migration
+++ a/mm/migrate.c
@@ -767,12 +767,12 @@ recheck_buffers:
}
bh = bh->b_this_page;
} while (bh != head);
- spin_unlock(&mapping->private_lock);
if (busy) {
if (invalidated) {
rc = -EAGAIN;
goto unlock_buffers;
}
+ spin_unlock(&mapping->private_lock);
invalidate_bh_lrus();
invalidated = true;
goto recheck_buffers;
@@ -805,6 +805,8 @@ recheck_buffers:
rc = MIGRATEPAGE_SUCCESS;
unlock_buffers:
+ if (check_refs)
+ spin_unlock(&mapping->private_lock);
bh = head;
do {
unlock_buffer(bh);
_
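The locking idea behind the migrate.c change, keeping the reference check and the move inside one critical section, can be illustrated with a toy userspace model. Everything here is hypothetical (struct toy_mapping, the toy_* functions, and the C11 atomic_flag used as a spinlock); it is a sketch of the pattern, not the kernel implementation.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the mapping and its buffer heads. */
struct toy_mapping {
	atomic_flag private_lock;	/* models mapping->private_lock */
	int bh_refs;			/* references held on the buffers */
	int page;			/* which "page" backs the mapping */
};

static void toy_lock(struct toy_mapping *m)
{
	while (atomic_flag_test_and_set_explicit(&m->private_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void toy_unlock(struct toy_mapping *m)
{
	atomic_flag_clear_explicit(&m->private_lock, memory_order_release);
}

/* Lookup slow path: a reference is only taken under the lock. */
static void toy_find_get_block(struct toy_mapping *m)
{
	toy_lock(m);
	m->bh_refs++;
	toy_unlock(m);
}

/* Migration: the ref check and the move share one critical section. */
static int toy_migrate(struct toy_mapping *m, int new_page)
{
	int rc = 0;

	toy_lock(m);
	if (m->bh_refs > 0)
		rc = -EAGAIN;		/* buffers busy, do not move */
	else
		m->page = new_page;	/* no lookup can slip in here */
	toy_unlock(m);
	return rc;
}
```

If the lock were dropped between the check and the assignment, toy_find_get_block() could take a reference in the gap, which is the window the CPU1/CPU2 diagram describes.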