This is an automatically generated email to let you know that the following patch was queued:
Subject: media: don't drop front-end reference count for ->detach
Author: Arnd Bergmann <arnd(a)arndb.de>
Date: Wed Jun 19 10:24:17 2019 -0300
A bugfix introduced a link failure in configurations without CONFIG_MODULES:
In file included from drivers/media/usb/dvb-usb/pctv452e.c:20:0:
drivers/media/usb/dvb-usb/pctv452e.c: In function 'pctv452e_frontend_attach':
drivers/media/dvb-frontends/stb0899_drv.h:151:36: error: weak declaration of 'stb0899_attach' being applied to a already existing, static definition
The problem is that the !IS_REACHABLE() declaration of stb0899_attach()
is a 'static inline' definition that clashes with the weak definition.
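For context (a sketch, not the verbatim kernel code): with !CONFIG_MODULES,
symbol_get() expands to a weak extern declaration, while the !IS_REACHABLE()
branch of stb0899_drv.h provides a 'static inline' stub for the same symbol:

    /* include/linux/module.h, !CONFIG_MODULES case (roughly): */
    #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })

    /* stb0899_drv.h, !IS_REACHABLE(CONFIG_DVB_STB0899) case (roughly): */
    static inline struct dvb_frontend *stb0899_attach(struct stb0899_config *config,
                                                      struct i2c_adapter *i2c)
    {
            printk(KERN_WARNING "%s: driver disabled by Kconfig\n", __func__);
            return NULL;
    }

The weak extern declaration generated by symbol_get() collides with the
static definition, which is exactly what the compiler complains about.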
I further observed that the bugfix was only done for one of the five users
of stb0899_attach(); the other four still have the problem. This reverts
the bugfix and instead addresses the problem by not dropping the reference
count when calling '->detach()'; instead, we call this function directly
in dvb_frontend_put() before dropping the kref on the front-end.
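For reference, dvb_attach() takes a module reference via symbol_request()
and only drops it again when the attach call itself fails; roughly, from
the dvb_frontend.h header:

    #define dvb_attach(FUNCTION, ARGS...) ({ \
            void *__r = NULL; \
            typeof(&FUNCTION) __a = symbol_request(FUNCTION); \
            if (__a) { \
                    __r = (void *) __a(ARGS); \
                    if (__r == NULL) \
                            symbol_put(FUNCTION); \
            } else { \
                    printk(KERN_ERR "DVB: Unable to find symbol "#FUNCTION"()\n"); \
            } \
            __r; \
    })

with dvb_detach(FUNC) essentially being symbol_put_addr(FUNC); this is why
each '->detach()' must be balanced against exactly one dvb_attach(), and why
taking an extra reference in only one driver could not be a complete fix.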
I first submitted this in early 2018, and after some discussion it
was apparently discarded. While there is a long-term plan in place,
that plan is obviously not nearing completion yet, and the current
kernel is still broken unless this patch is applied.
Link: https://patchwork.kernel.org/patch/10140175/
Link: https://patchwork.linuxtv.org/patch/54831/
Cc: Max Kellermann <max.kellermann(a)gmail.com>
Cc: Wolfgang Rohdewald <wolfgang(a)rohdewald.de>
Cc: stable(a)vger.kernel.org
Fixes: f686c14364ad ("[media] stb0899: move code to "detach" callback")
Fixes: 6cdeaed3b142 ("media: dvb_usb_pctv452e: module refcount changes were unbalanced")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/media/dvb-core/dvb_frontend.c | 4 +++-
drivers/media/usb/dvb-usb/pctv452e.c | 8 --------
2 files changed, 3 insertions(+), 9 deletions(-)
---
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c
index 209186c5cd9b..06ea30a689d7 100644
--- a/drivers/media/dvb-core/dvb_frontend.c
+++ b/drivers/media/dvb-core/dvb_frontend.c
@@ -152,6 +152,9 @@ static void dvb_frontend_free(struct kref *ref)
static void dvb_frontend_put(struct dvb_frontend *fe)
{
+ /* call detach before dropping the reference count */
+ if (fe->ops.detach)
+ fe->ops.detach(fe);
/*
* Check if the frontend was registered, as otherwise
* kref was not initialized yet.
@@ -3040,7 +3043,6 @@ void dvb_frontend_detach(struct dvb_frontend *fe)
dvb_frontend_invoke_release(fe, fe->ops.release_sec);
dvb_frontend_invoke_release(fe, fe->ops.tuner_ops.release);
dvb_frontend_invoke_release(fe, fe->ops.analog_ops.release);
- dvb_frontend_invoke_release(fe, fe->ops.detach);
dvb_frontend_put(fe);
}
EXPORT_SYMBOL(dvb_frontend_detach);
diff --git a/drivers/media/usb/dvb-usb/pctv452e.c b/drivers/media/usb/dvb-usb/pctv452e.c
index d6b36e4f33d2..441d878fc22c 100644
--- a/drivers/media/usb/dvb-usb/pctv452e.c
+++ b/drivers/media/usb/dvb-usb/pctv452e.c
@@ -909,14 +909,6 @@ static int pctv452e_frontend_attach(struct dvb_usb_adapter *a)
&a->dev->i2c_adap);
if (!a->fe_adap[0].fe)
return -ENODEV;
-
- /*
- * dvb_frontend will call dvb_detach for both stb0899_detach
- * and stb0899_release but we only do dvb_attach(stb0899_attach).
- * Increment the module refcount instead.
- */
- symbol_get(stb0899_attach);
-
if ((dvb_attach(lnbp22_attach, a->fe_adap[0].fe,
&a->dev->i2c_adap)) == NULL)
err("Cannot attach lnbp22\n");
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: e23008ec81ef NFSv4 reduce attribute requests for open reclaim.
The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135, v4.9.186, v4.4.186.
v5.2.5: Build OK!
v4.19.63: Failed to apply! Possible dependencies:
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
v4.14.135: Failed to apply! Possible dependencies:
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
c9399f21c215 ("NFSv4: Fix OPEN / CLOSE race")
v4.9.186: Failed to apply! Possible dependencies:
4e2fcac77390 ("NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state()")
75e8c48b9ef3 ("NFSv4: Use the nfs4_state being recovered in _nfs4_opendata_to_nfs4_state()")
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
c9399f21c215 ("NFSv4: Fix OPEN / CLOSE race")
v4.4.186: Failed to apply! Possible dependencies:
1393d9612ba0 ("NFSv4: Fix a race when updating an open_stateid")
4586f6e28327 ("NFSv4.1: Add a helper function to deal with expired stateids")
4e2fcac77390 ("NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state()")
75e8c48b9ef3 ("NFSv4: Use the nfs4_state being recovered in _nfs4_opendata_to_nfs4_state()")
8a64c4ef106d ("NFSv4.1: Even if the stateid is OK, we may need to recover the open modes")
a8ce377a5db8 ("nfs: track whether server sets MAY_NOTIFY_LOCK flag")
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
b134fc4a5333 ("NFSv4: Don't test open_stateid unless it is set")
c9399f21c215 ("NFSv4: Fix OPEN / CLOSE race")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 24311f884189 NFSv4: Recovery of recalled read delegations is broken.
The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135, v4.9.186, v4.4.186.
v5.2.5: Build OK!
v4.19.63: Failed to apply! Possible dependencies:
07d02a67b7fa ("SUNRPC: Simplify lookup code")
79b181810285 ("SUNRPC: Convert auth creds to use refcount_t")
8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred")
95cd623250ad ("SUNRPC: Clean up the AUTH cache code")
97f68c6b02e0 ("SUNRPC: add 'struct cred *' to auth_cred and rpc_cred")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
fc0664fd9bcc ("SUNRPC: remove groupinfo from struct auth_cred.")
v4.14.135: Failed to apply! Possible dependencies:
07d02a67b7fa ("SUNRPC: Simplify lookup code")
12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
1eb5d98f16f6 ("nfs: convert to new i_version API")
35156bfff3c0 ("NFSv4: Fix the nfs_inode_set_delegation() arguments")
79b181810285 ("SUNRPC: Convert auth creds to use refcount_t")
95cd623250ad ("SUNRPC: Clean up the AUTH cache code")
97f68c6b02e0 ("SUNRPC: add 'struct cred *' to auth_cred and rpc_cred")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
fc0664fd9bcc ("SUNRPC: remove groupinfo from struct auth_cred.")
v4.9.186: Failed to apply! Possible dependencies:
1eb5d98f16f6 ("nfs: convert to new i_version API")
35156bfff3c0 ("NFSv4: Fix the nfs_inode_set_delegation() arguments")
39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution")
7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally")
9cf09d68b89a ("arm64: xen: Enable user access before a privcmd hvc call")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
bd38967d406f ("arm64: Factor out PAN enabling/disabling into separate uaccess_* macros")
v4.4.186: Failed to apply! Possible dependencies:
0654cc726fc6 ("NFSv4.1/pNFS: Add a helper to mark the layout as returned")
10335556c9e6 ("NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout")
13c13a6ad71f ("pNFS: Fix missing layoutreturn calls")
2454dfea0aef ("NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout")
3982a6a2d0e6 ("pnfs: keep track of the return sequence number in pnfs_layout_hdr")
4b0934baf931 ("NFSv4.1/pNFS: Fix a race in initiate_file_draining()")
506c0d68269e ("NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments")
50f563ef5d41 ("NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids")
5c97f5de2c7c ("NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode")
68d264cf02b0 ("NFS42: handle layoutstats stateid error")
6d597e175012 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
71b39854a500 ("NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()")
9fd4b9fc7695 ("NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b20135d0b243 ("NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
e036f46453f2 ("NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id")
e0d9243048fd ("NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL")
ed429d6b934d ("NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()")
fc7ff36747b9 ("pNFS: If we have to delay the layout callback, mark the layout for return")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
This is the start of the stable review cycle for the 4.14.136 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun 04 Aug 2019 09:19:34 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.136-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.136-rc1
Yan, Zheng <zyan(a)redhat.com>
ceph: hold i_ceph_lock when removing caps for freeing inode
Yoshinori Sato <ysato(a)users.sourceforge.jp>
Fix allyesconfig output.
Miroslav Lichvar <mlichvar(a)redhat.com>
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Jann Horn <jannh(a)google.com>
sched/fair: Don't free p->numa_faults with concurrent readers
Vladis Dronov <vdronov(a)redhat.com>
Bluetooth: hci_uart: check for missing tty operations
Sunil Muthuswamy <sunilmut(a)microsoft.com>
hv_sock: Add support for delayed close
Joerg Roedel <jroedel(a)suse.de>
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
Dmitry Safonov <dima(a)arista.com>
iommu/vt-d: Don't queue_iova() if there is no flush queue
Luke Nowakowski-Krijger <lnowakow(a)eng.ucsd.edu>
media: radio-raremono: change devm_k*alloc to k*alloc
Benjamin Coddington <bcodding(a)redhat.com>
NFS: Cleanup if nfs_match_client is interrupted
Andrey Konovalov <andreyknvl(a)google.com>
media: pvrusb2: use a different format for warnings
Oliver Neukum <oneukum(a)suse.com>
media: cpia2_usb: first wake up, then free in disconnect
Fabio Estevam <festevam(a)gmail.com>
ath10k: Change the warning message string
Sean Young <sean(a)mess.org>
media: au0828: fix null dereference in error path
Phong Tran <tranmanphong(a)gmail.com>
ISDN: hfcsusb: checking idx of ep configuration
Todd Kjos <tkjos(a)android.com>
binder: fix possible UAF when freeing buffer
Will Deacon <will.deacon(a)arm.com>
arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
Abhishek Sahu <absahu(a)codeaurora.org>
i2c: qup: fixed releasing dma without flush operation completion
allen yan <yanwei(a)marvell.com>
arm64: dts: marvell: Fix A37xx UART0 register size
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Fix lookup revalidate of regular files
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Refactor nfs_lookup_revalidate()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Fix dentry revalidation on NFSv4 lookup
Sunil Muthuswamy <sunilmut(a)microsoft.com>
vsock: correct removal of socket from the list
Stefan Hajnoczi <stefanha(a)redhat.com>
VSOCK: use TCP state constants for sk_state
-------------
Diffstat:
.../devicetree/bindings/serial/mvebu-uart.txt | 2 +-
Makefile | 4 +-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +-
arch/arm64/include/asm/compat.h | 1 +
arch/sh/boards/Kconfig | 14 +-
drivers/android/binder.c | 16 +-
drivers/bluetooth/hci_ath.c | 3 +
drivers/bluetooth/hci_bcm.c | 3 +
drivers/bluetooth/hci_intel.c | 3 +
drivers/bluetooth/hci_ldisc.c | 13 +
drivers/bluetooth/hci_mrvl.c | 3 +
drivers/bluetooth/hci_uart.h | 1 +
drivers/i2c/busses/i2c-qup.c | 2 +
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/iova.c | 18 +-
drivers/isdn/hardware/mISDN/hfcsusb.c | 3 +
drivers/media/radio/radio-raremono.c | 30 ++-
drivers/media/usb/au0828/au0828-core.c | 12 +-
drivers/media/usb/cpia2/cpia2_usb.c | 3 +-
drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 4 +-
drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c | 6 +-
drivers/media/usb/pvrusb2/pvrusb2-std.c | 2 +-
drivers/net/wireless/ath/ath10k/usb.c | 2 +-
drivers/pps/pps.c | 8 +
fs/ceph/caps.c | 7 +-
fs/exec.c | 2 +-
fs/nfs/client.c | 4 +-
fs/nfs/dir.c | 295 +++++++++++----------
fs/nfs/nfs4proc.c | 15 +-
include/linux/iova.h | 6 +
include/linux/sched/numa_balancing.h | 4 +-
include/net/af_vsock.h | 3 -
kernel/fork.c | 2 +-
kernel/sched/fair.c | 24 +-
net/vmw_vsock/af_vsock.c | 84 +++---
net/vmw_vsock/hyperv_transport.c | 118 ++++++---
net/vmw_vsock/virtio_transport.c | 2 +-
net/vmw_vsock/virtio_transport_common.c | 22 +-
net/vmw_vsock/vmci_transport.c | 34 +--
net/vmw_vsock/vmci_transport_notify.c | 2 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
41 files changed, 472 insertions(+), 311 deletions(-)
This is the start of the stable review cycle for the 4.14.132 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.132-rc1
Xin Long <lucien.xin(a)gmail.com>
tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
Will Deacon <will.deacon(a)arm.com>
futex: Update comments and docs about return values of arch futex code
Daniel Borkmann <daniel(a)iogearbox.net>
bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
Will Deacon <will.deacon(a)arm.com>
arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg()
Martin KaFai Lau <kafai(a)fb.com>
bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err
Martin KaFai Lau <kafai(a)fb.com>
bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro
YueHaibing <yuehaibing(a)huawei.com>
bonding: Always enable vlan tx offload
YueHaibing <yuehaibing(a)huawei.com>
team: Always enable vlan tx offload
Fei Li <lifei.shirley(a)bytedance.com>
tun: wake up waitqueues after IFF_UP is set
Xin Long <lucien.xin(a)gmail.com>
tipc: check msg->req data len in tipc_nl_compat_bearer_disable
Xin Long <lucien.xin(a)gmail.com>
tipc: change to use register_pernet_device
Xin Long <lucien.xin(a)gmail.com>
sctp: change to hold sk after auth shkey is created successfully
Roland Hii <roland.king.guan.hii(a)intel.com>
net: stmmac: fixed new system time seconds value calculation
JingYi Hou <houjingyi647(a)gmail.com>
net: remove duplicate fetch in sock_getsockopt
Eric Dumazet <edumazet(a)google.com>
net/packet: fix memory leak in packet_set_ring()
Stephen Suryaputra <ssuryaextr(a)gmail.com>
ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
Neil Horman <nhorman(a)tuxdriver.com>
af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Wang Xin <xin.wang7(a)cn.bosch.com>
eeprom: at24: fix unexpected timeout under high load
Geert Uytterhoeven <geert(a)linux-m68k.org>
cpu/speculation: Warn on unsupported mitigations= parameter
Trond Myklebust <trondmy(a)gmail.com>
NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
Thomas Gleixner <tglx(a)linutronix.de>
x86/microcode: Fix the microcode load on CPU hotplug for real
Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
x86/speculation: Allow guests to use SSBD even if host does not
Jan Kara <jack(a)suse.cz>
scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
zhangyi (F) <yi.zhang(a)huawei.com>
dm log writes: make sure super sector log updates are written in order
Colin Ian King <colin.king(a)canonical.com>
mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
Jann Horn <jannh(a)google.com>
fs/binfmt_flat.c: make load_flat_shared_library() work
zhong jiang <zhongjiang(a)huawei.com>
mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
John Ogness <john.ogness(a)linutronix.de>
fs/proc/array.c: allow reporting eip/esp for all coredumping threads
Sasha Levin <sashal(a)kernel.org>
Revert "compiler.h: update definition of unreachable()"
Kristian Evensen <kristian.evensen(a)gmail.com>
qmi_wwan: Fix out-of-bounds read
Adeodato Simó <dato(a)net.com.org.es>
net/9p: include trans_common.h to fix missing prototype warning.
Dominique Martinet <dominique.martinet(a)cea.fr>
9p: p9dirent_read: check network-provided name length
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/rdma: remove useless check in cm_event_handler
Dominique Martinet <dominique.martinet(a)cea.fr>
9p: acl: fix uninitialized iattr access
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/rdma: do not disconnect on down_interruptible EAGAIN
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/xen: fix check for xenbus_read error in front_probe
Martin Wilck <mwilck(a)suse.com>
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
Christoph Hellwig <hch(a)lst.de>
block: add a lower-level bio_add_page interface
Mike Marciniszyn <mike.marciniszyn(a)intel.com>
IB/hfi1: Close PSM sdma_progress sleep window
Sasha Levin <sashal(a)kernel.org>
Revert "x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP"
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf header: Fix unchecked usage of strncpy()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf help: Remove needless use of strncpy()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
-------------
Diffstat:
Documentation/robust-futexes.txt | 3 +-
Makefile | 4 +-
arch/arm64/include/asm/futex.h | 4 +-
arch/arm64/include/asm/insn.h | 8 ++
arch/arm64/kernel/insn.c | 40 +++++++
arch/arm64/net/bpf_jit.h | 4 +
arch/arm64/net/bpf_jit_comp.c | 28 +++--
arch/x86/kernel/cpu/bugs.c | 11 +-
arch/x86/kernel/cpu/microcode/core.c | 15 ++-
block/bio.c | 131 +++++++++++++++------
drivers/infiniband/hw/hfi1/user_sdma.c | 12 +-
drivers/infiniband/hw/hfi1/user_sdma.h | 1 -
drivers/md/dm-log-writes.c | 23 +++-
drivers/misc/eeprom/at24.c | 107 ++++++++++++-----
drivers/net/bonding/bond_main.c | 2 +-
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 2 +-
drivers/net/team/team.c | 2 +-
drivers/net/tun.c | 19 ++-
drivers/net/usb/qmi_wwan.c | 4 +-
drivers/scsi/vmw_pvscsi.c | 6 +-
fs/9p/acl.c | 2 +-
fs/binfmt_flat.c | 23 ++--
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +-
fs/proc/array.c | 2 +-
include/asm-generic/futex.h | 8 +-
include/linux/bio.h | 9 ++
include/linux/compiler.h | 5 +-
kernel/cpu.c | 3 +
kernel/trace/trace_branch.c | 4 -
mm/mempolicy.c | 2 +-
mm/page_idle.c | 4 +-
net/9p/protocol.c | 12 +-
net/9p/trans_common.c | 1 +
net/9p/trans_rdma.c | 7 +-
net/9p/trans_xen.c | 4 +-
net/core/sock.c | 3 -
net/ipv4/raw.c | 2 +-
net/ipv4/udp.c | 6 +-
net/ipv6/udp.c | 4 +-
net/packet/af_packet.c | 23 +++-
net/packet/internal.h | 1 +
net/sctp/endpointola.c | 8 +-
net/tipc/core.c | 12 +-
net/tipc/netlink_compat.c | 18 ++-
net/tipc/udp_media.c | 8 +-
tools/perf/builtin-help.c | 2 +-
tools/perf/ui/tui/helpline.c | 2 +-
tools/perf/util/header.c | 2 +-
48 files changed, 418 insertions(+), 187 deletions(-)
From: Chris Down <chris(a)chrisdown.name>
Subject: cgroup: kselftest: relax fs_spec checks
On my laptop most memcg kselftests were being skipped because the test
claimed the cgroup v2 hierarchy wasn't mounted, but this isn't correct.
Instead, it seems current systemd HEAD mounts it with the name "cgroup2"
instead of "cgroup":
% grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
I can't think of a reason to need to check fs_spec explicitly
since it's arbitrary, so we can just rely on fs_vfstype.
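For illustration only (this is not the selftest code), the distinction
between the arbitrary fs_spec (first field) and the reliable fs_vfstype
(third field) of a /proc/mounts line looks like this:

    #include <stdio.h>
    #include <string.h>

    /* Print the mount point of the cgroup2 hierarchy, keyed on fs_vfstype. */
    int main(void)
    {
            char line[4096];
            FILE *f = fopen("/proc/self/mounts", "r");

            if (!f)
                    return 1;
            while (fgets(line, sizeof(line), f)) {
                    char *spec = strtok(line, " "); /* fs_spec, e.g. "cgroup" or "cgroup2" */
                    char *file = strtok(NULL, " "); /* fs_file, the mount point */
                    char *type = strtok(NULL, " "); /* fs_vfstype, e.g. "cgroup2" */

                    if (spec && file && type && strcmp(type, "cgroup2") == 0)
                            printf("cgroup2 root: %s\n", file);
            }
            fclose(f);
            return 0;
    }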
After these changes, `make TARGETS=cgroup kselftest` actually runs the
cgroup v2 tests in more cases.
Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.name
Signed-off-by: Chris Down <chris(a)chrisdown.name>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/cgroup/cgroup_util.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/tools/testing/selftests/cgroup/cgroup_util.c~cgroup-kselftest-relax-fs_spec-checks
+++ a/tools/testing/selftests/cgroup/cgroup_util.c
@@ -191,8 +191,7 @@ int cg_find_unified_root(char *root, siz
strtok(NULL, delim);
strtok(NULL, delim);
- if (strcmp(fs, "cgroup") == 0 &&
- strcmp(type, "cgroup2") == 0) {
+ if (strcmp(type, "cgroup2") == 0) {
strncpy(root, mount, len);
return 0;
}
_
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: ubsan: build ubsan.c more conservatively
objtool points out several conditions that it does not like, depending on
the combination with other configuration options and compiler variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file as
we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is mixed
with the gcc specific -fno-conserve-stack option. According to Andrey
Ryabinin, that option is not even needed, dropping it here fixes the
stackprotector issue.
Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes: d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/Makefile~ubsan-build-ubsanc-more-conservatively
+++ a/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
_
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm: compaction: avoid 100% CPU usage during compaction when a task is killed
"howaboutsynergy" reported via kernel buzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.
stress -m 220 --vm-bytes 1000000000 --timeout 10
Tracing indicated a pattern as follows:
stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely and makes no progress
instead of exiting while the fatal signal is pending.
It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and isolating
a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
This was reproduced locally (16G single-socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000, using Colin King's stress-ng
implementation from github running in a loop until the problem hit).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task exited
normally instead of consuming CPU.
This problem has existed for some time but it was made worse by
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.
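For context, compact_unlock_should_abort() already reports a pending fatal
signal; roughly, from mm/compaction.c of that era:

    static bool compact_unlock_should_abort(spinlock_t *lock,
                    unsigned long flags, bool *locked, struct compact_control *cc)
    {
            if (*locked) {
                    spin_unlock_irqrestore(lock, flags);
                    *locked = false;
            }

            if (fatal_signal_pending(current)) {
                    cc->contended = true;
                    return true;
            }

            cond_resched();

            return false;
    }

The fix below therefore only changes the caller: instead of a 'break' that
lets the outer loops retry, a pending fatal signal now zeroes low_pfn and
jumps to fatal_pending so compaction bails out completely.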
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/mm/compaction.c~mm-compaction-avoid-100%-cpu-usage-during-compaction-when-a-task-is-killed
+++ a/mm/compaction.c
@@ -842,13 +842,15 @@ isolate_migratepages_block(struct compac
/*
* Periodically drop the lock (if held) regardless of its
- * contention, to give chance to IRQs. Abort async compaction
- * if contended.
+ * contention, to give chance to IRQs. Abort completely if
+ * a fatal signal is pending.
*/
if (!(low_pfn % SWAP_CLUSTER_MAX)
&& compact_unlock_should_abort(&pgdat->lru_lock,
- flags, &locked, cc))
- break;
+ flags, &locked, cc)) {
+ low_pfn = 0;
+ goto fatal_pending;
+ }
if (!pfn_valid_within(low_pfn))
goto isolate_fail;
@@ -1060,6 +1062,7 @@ isolate_abort:
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
+fatal_pending:
cc->total_migrate_scanned += nr_scanned;
if (nr_isolated)
count_compact_events(COMPACTISOLATED, nr_isolated);
_
From: Jan Kara <jack(a)suse.cz>
Subject: mm: migrate: fix reference check race between __find_get_block() and migration
buffer_migrate_page_norefs() can race with bh users in the following way:
CPU1                                    CPU2
buffer_migrate_page_norefs()
  buffer_migrate_lock_buffers()
  checks bh refs
  spin_unlock(&mapping->private_lock)
                                        __find_get_block()
                                          spin_lock(&mapping->private_lock)
                                          grab bh ref
                                          spin_unlock(&mapping->private_lock)
move page                               do bh work
This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use-after-free issues for the old page.
This patch closes the race by holding mapping->private_lock while the
mapping is being moved to a new page. Ordinarily, a reference can be
taken outside of the private_lock using the per-cpu BH LRU but the
references are checked and the LRU invalidated if necessary. The
private_lock is held once the references are known so the buffer lookup
slow path will spin on the private_lock. Between the page lock and
private_lock, it should be impossible for other references to be acquired
and updates to happen during the migration.
A user had reported data corruption issues on a distribution kernel with a
page migration implementation similar to mainline's. The data corruption
could not be reproduced with this patch applied. A small number of
migration-intensive tests were run and no performance problems were noted.
[mgorman(a)techsingularity.net: Changelog, removed tracing]
Link: http://lkml.kernel.org/r/20190718090238.GF24383@techsingularity.net
Fixes: 89cb0888ca14 ("mm: migrate: provide buffer_migrate_page_norefs()")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fix-reference-check-race-between-__find_get_block-and-migration
+++ a/mm/migrate.c
@@ -767,12 +767,12 @@ recheck_buffers:
}
bh = bh->b_this_page;
} while (bh != head);
- spin_unlock(&mapping->private_lock);
if (busy) {
if (invalidated) {
rc = -EAGAIN;
goto unlock_buffers;
}
+ spin_unlock(&mapping->private_lock);
invalidate_bh_lrus();
invalidated = true;
goto recheck_buffers;
@@ -805,6 +805,8 @@ recheck_buffers:
rc = MIGRATEPAGE_SUCCESS;
unlock_buffers:
+ if (check_refs)
+ spin_unlock(&mapping->private_lock);
bh = head;
do {
unlock_buffer(bh);
_
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
Shakeel Butt reported a premature OOM on a kernel with
"cgroup_disable=memory" since mem_cgroup_is_root() returns false even
though memcg is actually NULL. The drop_caches path is also broken.
This is because aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls
in shrink_node()") removed the !memcg check before !mem_cgroup_is_root().
And, surprisingly, the root memcg is allocated even though the memory
cgroup is disabled by the kernel boot parameter.
Add mem_cgroup_disabled() check to make reclaimer work as expected.
Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.a…
Fixes: aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Jan Hadrava <had(a)kam.mff.cuni.cz>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-check-if-mem-cgroup-is-disabled-or-not-before-calling-memcg-slab-shrinker
+++ a/mm/vmscan.c
@@ -699,7 +699,14 @@ static unsigned long shrink_slab(gfp_t g
unsigned long ret, freed = 0;
struct shrinker *shrinker;
- if (!mem_cgroup_is_root(memcg))
+ /*
+ * The root memcg might be allocated even though memcg is disabled
+ * via "cgroup_disable=memory" boot parameter. This could make
+ * mem_cgroup_is_root() return false, then just run memcg slab
+ * shrink, but skip global shrink. This may result in premature
+ * oom.
+ */
+ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
if (!down_read_trylock(&shrinker_rwsem))
_
Some devices are not supposed to support on-die ECC, but experience
shows that the internal ECC machinery can actually be enabled through the
"SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
Tables" reports that it is not.
Currently, the driver checks the "READ ID Parameter" field directly
after having enabled the feature. If the check fails, it returns
immediately but leaves the ECC on.
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction.
To address this in a common way we need to turn off the on-die ECC
directly after reading the "READ ID Parameter" and before checking the
"ECC status".
Cc: stable(a)vger.kernel.org
Fixes: dbc44edbf833 ("mtd: rawnand: micron: Fix on-die ECC detection logic")
Signed-off-by: Marco Felsch <m.felsch(a)pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
drivers/mtd/nand/raw/nand_micron.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 1622d3145587..8ca9fad6e6ad 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -390,6 +390,14 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
(chip->id.data[4] & MICRON_ID_INTERNAL_ECC_MASK) != 0x2)
return MICRON_ON_DIE_UNSUPPORTED;
+ /*
+ * It seems that there are devices which do not support ECC officially.
+ * At least the MT29F2G08ABAGA / MT29F2G08ABBGA devices supports
+ * enabling the ECC feature but don't reflect that to the READ_ID table.
+ * So we have to guarantee that we disable the ECC feature directly
+ * after we did the READ_ID table command. Later we can evaluate the
+ * ECC_ENABLE support.
+ */
ret = micron_nand_on_die_ecc_setup(chip, true);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
@@ -398,13 +406,13 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
- if (!(id[4] & MICRON_ID_ECC_ENABLED))
- return MICRON_ON_DIE_UNSUPPORTED;
-
ret = micron_nand_on_die_ecc_setup(chip, false);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
+ if (!(id[4] & MICRON_ID_ECC_ENABLED))
+ return MICRON_ON_DIE_UNSUPPORTED;
+
ret = nand_readid_op(chip, 0, id, sizeof(id));
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
--
2.20.1
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table.
For example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
  migrate_vma_collect()
    src[] = MIGRATE_PFN_MIGRATE
  migrate_vma_prepare()
    /* no page to lock or isolate so OK */
  migrate_vma_unmap()
    /* no page to unmap so OK */
  ops->alloc_and_copy()
    /* driver allocates ZONE_DEVICE page for dst[] */
  migrate_vma_pages()
    migrate_vma_insert_page()
      page_add_new_anon_rmap()
        __page_set_anon_rmap()
          /* This check sees the page's stale mapping field */
          if (PageAnon(page))
            return
  /* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
---
kernel/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 6ee03a816d67..289a086e1467 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -397,6 +397,30 @@ void __put_devmap_managed_page(struct page *page)
mem_cgroup_uncharge(page);
+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
--
2.20.1
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ]
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided the following analysis:

    conntrack -e NEW -E
        netlink_bind()
            netlink_lock_table() -> increases "nl_table_users"
            nfnetlink_bind()
            # does not unlock the table as it's locked by netlink_bind()
            __request_module()
                call_usermodehelper_exec()

This triggers "modprobe nf_conntrack_netlink" from the kernel; netlink_bind()
won't return until the modprobe process is done.

    "modprobe xfrm_user":
        xfrm_user_init()
            register_pernet_subsys()
                -> grab pernet_ops_rwsem
                ..
                netlink_table_grab()
                    calls schedule() as "nl_table_users" is non-zero

so modprobe is blocked because netlink_bind() increased nl_table_users
while also holding pernet_ops_rwsem.

    "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
        ctnetlink_init()
            register_pernet_subsys()
                -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module

Both modprobe processes wait on one another -- neither can make progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
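For reference, the two variants differ only in whether the caller waits for
the usermode helper to finish; roughly, from include/linux/kmod.h:

    #define request_module(mod...) __request_module(true, mod)
    #define request_module_nowait(mod...) __request_module(false, mod)

With the nowait variant, nfnetlink_bind() no longer blocks inside modprobe,
so the netlink table lock can be dropped before modprobe needs it.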
Reported-by: Thomas Jarosch <thomas.jarosch(a)intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro(a)intra2net.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/netfilter/nfnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 9adedba78eeac..044559c10e98e 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -495,7 +495,7 @@ static int nfnetlink_bind(struct net *net, int group)
ss = nfnetlink_get_subsys(type << 8);
rcu_read_unlock();
if (!ss)
- request_module("nfnetlink-subsys-%d", type);
+ request_module_nowait("nfnetlink-subsys-%d", type);
return 0;
}
#endif
--
2.20.1
> From: Sasha Levin <sashal(a)kernel.org>
> Sent: Friday, August 2, 2019 11:08 AM
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 15becc2b56c6 PCI: hv: Add hv_pci_remove_slots() when we
> unload the driver.
>
> The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135.
>
> v5.2.5: Build OK!
> v4.19.63: Build OK!
> v4.14.135: Failed to apply! Possible dependencies:
> Unable to calculate
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
> Sasha
I'm waiting for the patch to be accepted into the pci tree first.
We do not need to do anything at this moment.
Thanks,
-- Dexuan
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ]
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided the following analysis:

    conntrack -e NEW -E
        netlink_bind()
            netlink_lock_table() -> increases "nl_table_users"
            nfnetlink_bind()
            # does not unlock the table as it's locked by netlink_bind()
            __request_module()
                call_usermodehelper_exec()

This triggers "modprobe nf_conntrack_netlink" from the kernel; netlink_bind()
won't return until the modprobe process is done.

    "modprobe xfrm_user":
        xfrm_user_init()
            register_pernet_subsys()
                -> grab pernet_ops_rwsem
                ..
                netlink_table_grab()
                    calls schedule() as "nl_table_users" is non-zero

so modprobe is blocked because netlink_bind() increased nl_table_users
while also holding pernet_ops_rwsem.

    "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
        ctnetlink_init()
            register_pernet_subsys()
                -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module

Both modprobe processes wait on one another -- neither can make progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
Reported-by: Thomas Jarosch <thomas.jarosch(a)intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro(a)intra2net.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/netfilter/nfnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 733d3e4a30d85..2cee032af46d2 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -530,7 +530,7 @@ static int nfnetlink_bind(struct net *net, int group)
ss = nfnetlink_get_subsys(type << 8);
rcu_read_unlock();
if (!ss)
- request_module("nfnetlink-subsys-%d", type);
+ request_module_nowait("nfnetlink-subsys-%d", type);
return 0;
}
#endif
--
2.20.1
This is a note to let you know that I've just added the patch titled
usb: typec: tcpm: Ignore unsupported/unknown alternate mode requests
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 88d02c9ba2e83fc22d37ccb1f11c62ea6fc9ae50 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux(a)roeck-us.net>
Date: Fri, 2 Aug 2019 09:03:42 -0700
Subject: usb: typec: tcpm: Ignore unsupported/unknown alternate mode requests
TCPM may receive PD messages associated with unknown or unsupported
alternate modes. If that happens, calls to typec_match_altmode()
will return NULL. The tcpm code does not currently take this into
account. This results in crashes.
Unable to handle kernel NULL pointer dereference at virtual address 000001f0
pgd = 41dad9a1
[000001f0] *pgd=00000000
Internal error: Oops: 5 [#1] THUMB2
Modules linked in: tcpci tcpm
CPU: 0 PID: 2338 Comm: kworker/u2:0 Not tainted 5.1.18-sama5-armv7-r2 #6
Hardware name: Atmel SAMA5
Workqueue: 2-0050 tcpm_pd_rx_handler [tcpm]
PC is at typec_altmode_attention+0x0/0x14
LR is at tcpm_pd_rx_handler+0xa3b/0xda0 [tcpm]
...
[<c03fbee8>] (typec_altmode_attention) from [<bf8030fb>] (tcpm_pd_rx_handler+0xa3b/0xda0 [tcpm])
[<bf8030fb>] (tcpm_pd_rx_handler [tcpm]) from [<c012082b>] (process_one_work+0x123/0x2a8)
[<c012082b>] (process_one_work) from [<c0120a6d>] (worker_thread+0xbd/0x3b0)
[<c0120a6d>] (worker_thread) from [<c012431f>] (kthread+0xcf/0xf4)
[<c012431f>] (kthread) from [<c01010f9>] (ret_from_fork+0x11/0x38)
Ignore PD messages if the associated alternate mode is not supported.
Fixes: e9576fe8e605c ("usb: typec: tcpm: Support for Alternate Modes")
Cc: stable <stable(a)vger.kernel.org>
Reported-by: Douglas Gilbert <dgilbert(a)interlog.com>
Cc: Douglas Gilbert <dgilbert(a)interlog.com>
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Tested-by: Douglas Gilbert <dgilbert(a)interlog.com>
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/r/1564761822-13984-1-git-send-email-linux@roeck-us.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/tcpm.c | 38 ++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index ab6456622120..15abe1d9958f 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1109,7 +1109,8 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
break;
case CMD_ATTENTION:
/* Attention command does not have response */
- typec_altmode_attention(adev, p[1]);
+ if (adev)
+ typec_altmode_attention(adev, p[1]);
return 0;
default:
break;
@@ -1161,20 +1162,26 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
}
break;
case CMD_ENTER_MODE:
- typec_altmode_update_active(pdev, true);
-
- if (typec_altmode_vdm(adev, p[0], &p[1], cnt)) {
- response[0] = VDO(adev->svid, 1, CMD_EXIT_MODE);
- response[0] |= VDO_OPOS(adev->mode);
- return 1;
+ if (adev && pdev) {
+ typec_altmode_update_active(pdev, true);
+
+ if (typec_altmode_vdm(adev, p[0], &p[1], cnt)) {
+ response[0] = VDO(adev->svid, 1,
+ CMD_EXIT_MODE);
+ response[0] |= VDO_OPOS(adev->mode);
+ return 1;
+ }
}
return 0;
case CMD_EXIT_MODE:
- typec_altmode_update_active(pdev, false);
+ if (adev && pdev) {
+ typec_altmode_update_active(pdev, false);
- /* Back to USB Operation */
- WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB,
- NULL));
+ /* Back to USB Operation */
+ WARN_ON(typec_altmode_notify(adev,
+ TYPEC_STATE_USB,
+ NULL));
+ }
break;
default:
break;
@@ -1184,8 +1191,10 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
switch (cmd) {
case CMD_ENTER_MODE:
/* Back to USB Operation */
- WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB,
- NULL));
+ if (adev)
+ WARN_ON(typec_altmode_notify(adev,
+ TYPEC_STATE_USB,
+ NULL));
break;
default:
break;
@@ -1196,7 +1205,8 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
}
/* Informing the alternate mode drivers about everything */
- typec_altmode_vdm(adev, p[0], &p[1], cnt);
+ if (adev)
+ typec_altmode_vdm(adev, p[0], &p[1], cnt);
return rlen;
}
--
2.22.0
This is a note to let you know that I've just added the patch titled
usb: host: xhci-rcar: Fix timeout in xhci_suspend()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 783bda5e41acc71f98336e1a402c180f9748e5dc Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Fri, 2 Aug 2019 17:33:35 +0900
Subject: usb: host: xhci-rcar: Fix timeout in xhci_suspend()
When a USB device is connected to the host controller and
the system enters suspend, the following error happens
in xhci_suspend():
xhci-hcd ee000000.usb: WARN: xHC CMD_RUN timeout
Since the firmware/internal CPU controls USBSTS.STS_HALT
and its processing slows down when the roothub port enters U3,
a long delay is needed for the STS_HALT handshake in xhci_suspend().
So, this patch sets the XHCI_SLOW_SUSPEND quirk.
Fixes: 435cc1138ec9 ("usb: host: xhci-plat: set resume_quirk() for R-Car controllers")
Cc: <stable(a)vger.kernel.org> # v4.12+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/1564734815-17964-1-git-send-email-yoshihiro.shimo…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-rcar.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci-rcar.c b/drivers/usb/host/xhci-rcar.c
index 671bce18782c..8616c52849c6 100644
--- a/drivers/usb/host/xhci-rcar.c
+++ b/drivers/usb/host/xhci-rcar.c
@@ -238,10 +238,15 @@ int xhci_rcar_init_quirk(struct usb_hcd *hcd)
* pointers. So, this driver clears the AC64 bit of xhci->hcc_params
* to call dma_set_coherent_mask(dev, DMA_BIT_MASK(32)) in
* xhci_gen_setup().
+ *
+ * And, since the firmware/internal CPU control the USBSTS.STS_HALT
+ * and the process speed is down when the roothub port enters U3,
+ * long delay for the handshake of STS_HALT is neeed in xhci_suspend().
*/
if (xhci_rcar_is_gen2(hcd->self.controller) ||
- xhci_rcar_is_gen3(hcd->self.controller))
- xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
+ xhci_rcar_is_gen3(hcd->self.controller)) {
+ xhci->quirks |= XHCI_NO_64BIT_SUPPORT | XHCI_SLOW_SUSPEND;
+ }
if (!xhci_rcar_wait_for_pll_active(hcd))
return -ETIMEDOUT;
--
2.22.0
The callback pmu->read() can be called with interrupts enabled.
Currently, on ARM, this can cause the following callchain:
armpmu_read() -> armpmu_event_update() -> armv7pmu_read_counter()
The last function might modify the counter selector register and then
read the target counter, without taking any lock. With interrupts
enabled, a PMU interrupt could occur and modify the selector register
as well, between the selection and read of the interrupted context.
Save and restore the value of the selector register in the PMU interrupt
handler, ensuring the interrupted context is left with the correct PMU
registers selected.
Signed-off-by: Julien Thierry <julien.thierry(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnaldo Carvalho de Melo <acme(a)kernel.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: stable(a)vger.kernel.org
---
arch/arm/kernel/perf_event_v7.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index a4fb0f8..b7be2a3 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -736,10 +736,22 @@ static inline int armv7_pmnc_counter_has_overflowed(u32 pmnc, int idx)
return pmnc & BIT(ARMV7_IDX_TO_COUNTER(idx));
}
-static inline void armv7_pmnc_select_counter(int idx)
+static inline u32 armv7_pmsel_read(void)
+{
+ u32 pmsel;
+
+ asm volatile("mrc p15, 0, %0, c9, c12, 5" : "=&r" (pmsel));
+ return pmsel;
+}
+
+static inline void armv7_pmsel_write(u32 counter)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));
+}
+
+static inline void armv7_pmnc_select_counter(int idx)
+{
+ armv7_pmsel_write(ARMV7_IDX_TO_COUNTER(idx));
isb();
}
@@ -952,8 +964,15 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
struct perf_sample_data data;
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
struct pt_regs *regs;
+ u32 pmsel;
int idx;
+
+ /*
+ * Save pmsel in case the interrupted context was using it.
+ */
+ pmsel = armv7_pmsel_read();
+
/*
* Get and reset the IRQ flags
*/
@@ -1004,6 +1023,8 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
*/
irq_work_run();
+ armv7_pmsel_write(pmsel);
+
return IRQ_HANDLED;
}
--
1.9.1
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ]
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided the following analysis:

    conntrack -e NEW -E
        netlink_bind()
            netlink_lock_table() -> increases "nl_table_users"
            nfnetlink_bind()
            # does not unlock the table as it's locked by netlink_bind()
            __request_module()
                call_usermodehelper_exec()

This triggers "modprobe nf_conntrack_netlink" from the kernel; netlink_bind()
won't return until the modprobe process is done.

    "modprobe xfrm_user":
        xfrm_user_init()
            register_pernet_subsys()
                -> grab pernet_ops_rwsem
                ..
                netlink_table_grab()
                    calls schedule() as "nl_table_users" is non-zero

so modprobe is blocked because netlink_bind() increased nl_table_users
while also holding pernet_ops_rwsem.

    "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
        ctnetlink_init()
            register_pernet_subsys()
                -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module

Both modprobe processes wait on one another -- neither can make progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
Reported-by: Thomas Jarosch <thomas.jarosch(a)intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro(a)intra2net.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/netfilter/nfnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 2278d9ab723bf..9837a61cb3e3b 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -490,7 +490,7 @@ static int nfnetlink_bind(struct net *net, int group)
ss = nfnetlink_get_subsys(type << 8);
rcu_read_unlock();
if (!ss)
- request_module("nfnetlink-subsys-%d", type);
+ request_module_nowait("nfnetlink-subsys-%d", type);
return 0;
}
#endif
--
2.20.1
From: Tomas Bortoli <tomasbortoli(a)gmail.com>
Uninitialized Kernel memory can leak to USB devices.
Fix by using kzalloc() instead of kmalloc() on the affected buffers.
Signed-off-by: Tomas Bortoli <tomasbortoli(a)gmail.com>
Reported-by: syzbot+d6a5a1a3657b596ef132(a)syzkaller.appspotmail.com
Fixes: f14e22435a27 ("net: can: peak_usb: Do not do dma on the stack")
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/peak_usb/pcan_usb_pro.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
index 178bb7cff0c1..53cb2f72bdd0 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
@@ -494,7 +494,7 @@ static int pcan_usb_pro_drv_loaded(struct peak_usb_device *dev, int loaded)
u8 *buffer;
int err;
- buffer = kmalloc(PCAN_USBPRO_FCT_DRVLD_REQ_LEN, GFP_KERNEL);
+ buffer = kzalloc(PCAN_USBPRO_FCT_DRVLD_REQ_LEN, GFP_KERNEL);
if (!buffer)
return -ENOMEM;
--
2.20.1
From: Tomas Bortoli <tomasbortoli(a)gmail.com>
Uninitialized Kernel memory can leak to USB devices.
Fix by using kzalloc() instead of kmalloc() on the affected buffers.
Signed-off-by: Tomas Bortoli <tomasbortoli(a)gmail.com>
Reported-by: syzbot+513e4d0985298538bf9b(a)syzkaller.appspotmail.com
Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
index 34761c3a6286..47cc1ff5b88e 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -841,7 +841,7 @@ static int pcan_usb_fd_init(struct peak_usb_device *dev)
goto err_out;
/* allocate command buffer once for all for the interface */
- pdev->cmd_buffer_addr = kmalloc(PCAN_UFD_CMD_BUFFER_SIZE,
+ pdev->cmd_buffer_addr = kzalloc(PCAN_UFD_CMD_BUFFER_SIZE,
GFP_KERNEL);
if (!pdev->cmd_buffer_addr)
goto err_out_1;
--
2.20.1
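A short sketch of why both peak_usb patches switch to kzalloc()
(illustrative, not code from the driver; send_cmd() and the length 16
are made up): the command buffer is larger than the bytes the driver
actually fills before handing it to the USB layer, so any slack bytes
would otherwise carry stale kernel memory out to the device:

    #include <linux/slab.h>
    #include <linux/usb.h>

    static int send_cmd(struct usb_device *udev, int loaded)
    {
            u8 *buf;
            int err;

            /* kmalloc() would leave the bytes beyond what we write
             * below uninitialized, and they would still go out on the
             * wire; kzalloc() makes the slack read back as zeroes. */
            buf = kzalloc(16, GFP_KERNEL);  /* 16: illustrative length */
            if (!buf)
                    return -ENOMEM;

            buf[0] = loaded ? 1 : 0;        /* only one byte is filled */
            err = usb_bulk_msg(udev, usb_sndbulkpipe(udev, 1), buf, 16,
                               NULL, 1000);
            kfree(buf);
            return err;
    }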
This adds support for __copy to v4.9.y so that we can use it in
init/exit_module to avoid -Werror=missing-attributes errors on GCC 9.
Link: https://lore.kernel.org/lkml/259986242.BvXPX32bHu@devpool35/
Cc: <stable(a)vger.kernel.org>
Suggested-by: Rolf Eike Beer <eb(a)emlix.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
---
include/linux/compiler.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 80a5bc623c47..e905353f217a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -54,6 +54,22 @@ extern void __chk_io_ptr(const volatile void __iomem *);
#ifdef __KERNEL__
+/*
+ * Minimal backport of compiler_attributes.h to add support for __copy
+ * to v4.9.y so that we can use it in init/exit_module to avoid
+ * -Werror=missing-attributes errors on GCC 9.
+ */
+#ifndef __has_attribute
+# define __has_attribute(x) __GCC4_has_attribute_##x
+# define __GCC4_has_attribute___copy__ 0
+#endif
+
+#if __has_attribute(__copy__)
+# define __copy(symbol) __attribute__((__copy__(symbol)))
+#else
+# define __copy(symbol)
+#endif
+
#ifdef __GNUC__
#include <linux/compiler-gcc.h>
#endif
--
2.17.1
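For reference, a sketch of how __copy ends up being used by the
module_init() wrapper once the follow-up module.h change lands
(paraphrased from the upstream change this backport enables; not part
of this patch):

    /* The alias inherits initfn's attributes, so GCC 9 no longer warns
     * about a mismatch between init_module() and the real init
     * function. */
    #define module_init(initfn)                                     \
            static inline initcall_t __maybe_unused __inittest(void)\
            { return initfn; }                                      \
            int init_module(void) __copy(initfn)                    \
                    __attribute__((alias(#initfn)));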
I decided to dig out a toy project which uses a DragonBoard 410c. This has
been "running" with kernel 4.9, which I would like to keep for unrelated
reasons. The vanilla 4.9 kernel wasn't bootable back then, but it was
buildable, which was good enough.
Upgrading the kernel to 4.9.180 suddenly caused the build to fail:
aarch64-unknown-linux-gnueabi-ld: ./drivers/firmware/efi/libstub/lib.a(arm64-
stub.stub.o): in function `handle_kernel_image':
/tmp/e2/build/linux-4.9.139/drivers/firmware/efi/libstub/arm64-stub.c:63:
undefined reference to `__efistub__GLOBAL_OFFSET_TABLE_'
aarch64-unknown-linux-gnueabi-ld: ./drivers/firmware/efi/libstub/lib.a(arm64-
stub.stub.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol
`__efistub__GLOBAL_OFFSET_TABLE_' which may bind externally can not be used
when making a shared object; recompile with -fPIC
/tmp/e2/build/linux-4.9.139/drivers/firmware/efi/libstub/arm64-stub.c:63:
(.init.text+0xc): dangerous relocation: unsupported relocation
/tmp/e2/build/linux-4.9.139/Makefile:1001: recipe for target 'vmlinux' failed
make[1]: *** [vmlinux] Error 1
This is caused by commit 27b5ebf61818749b3568354c64a8ec2d9cd5ecca from
linux-4.9.y (which is 91ee5b21ee026c49e4e7483de69b55b8b47042be upstream);
reverting this commit fixes the build.
This happens with vanilla binutils 2.32 and gcc 8.3.0 as well as 9.1.0. See
the attached .config for reference.
If you have questions or patches just ping me.
Greetings,
Eike
--
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055
emlix - smart embedded open source
Hi, back in December of 2016 there was commit
"1c52d859cb2d417e7216d3e56bb7fea88444cec9",
which was followed shortly by "c198b121b1a1d7a7171770c634cd49191bac4477".
Unfortunately only the first commit was later included in long-term kernel
branches such as 4.4 and 4.9, which left some of the microcode functionality
broken on 32-bit systems.
I guess it should be easily fixed by including
"c198b121b1a1d7a7171770c634cd49191bac4477" in those branches.
Hi,
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.1.4, v5.0.18, v4.19.45, v4.14.121, v4.9.178, v4.4.180, v3.18.140.
v5.1.4: Build OK!
v5.0.18: Failed to apply! Possible dependencies:
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.19.45: Failed to apply! Possible dependencies:
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.14.121: Failed to apply! Possible dependencies:
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
a57d9064e4ee4 ("ceph: flush pending works before shutdown super")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.9.178: Failed to apply! Possible dependencies:
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
a57d9064e4ee4 ("ceph: flush pending works before shutdown super")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.4.180: Failed to apply! Possible dependencies:
13d1ad16d05ee ("libceph: move message allocation out of ceph_osdc_alloc_request()")
34b759b4a22b0 ("ceph: kill ceph_empty_snapc")
3f1af42ad0fad ("libceph: enable large, variable-sized OSD requests")
5be0389dac662 ("ceph: re-send AIO write request when getting -EOLDSNAP error")
7627151ea30bc ("libceph: define new ceph_file_layout structure")
779fe0fb8e188 ("ceph: rados pool namespace support")
922dab6134178 ("libceph, rbd: ceph_osd_linger_request, watch/notify v2")
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
ae458f5a171ba ("libceph: make r_request msg_size calculation clearer")
c41d13a31fefe ("rbd: use header_oid instead of header_name")
c8fe9b17d055f ("ceph: Asynchronous IO support")
d30291b985d18 ("libceph: variable-sized ceph_object_id")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v3.18.140: Failed to apply! Possible dependencies:
10183a69551f7 ("ceph: check OSD caps before read/write")
28127bdd2f843 ("ceph: convert inline data to normal data before data write")
31c542a199d79 ("ceph: add inline data to pagecache")
5be0389dac662 ("ceph: re-send AIO write request when getting -EOLDSNAP error")
70db4f3629b34 ("ceph: introduce a new inode flag indicating if cached dentries are ordered")
745a8e3bccbc6 ("ceph: don't pre-allocate space for cap release messages")
7627151ea30bc ("libceph: define new ceph_file_layout structure")
779fe0fb8e188 ("ceph: rados pool namespace support")
83701246aee8f ("ceph: sync read inline data")
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
affbc19a68f99 ("ceph: make sure syncfs flushes all cap snaps")
c8fe9b17d055f ("ceph: Asynchronous IO support")
d30291b985d18 ("libceph: variable-sized ceph_object_id")
d3383a8e37f80 ("ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
e96a650a8174e ("ceph, rbd: delete unnecessary checks before two function calls")
How should we proceed with this patch?
--
Thanks,
Sasha
From: Andrea Arcangeli <aarcange(a)redhat.com>
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.
The core dumping code has always run without holding the mmap_sem for
writing, despite the fact that this is the only way to ensure that the
entire vma layout will not change from under it. Using only some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.
For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.
Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.
In order to facilitate the backporting I added "Fixes: 86039bd3b4e6";
however, the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2, so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: Jann Horn <jannh(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[akaher(a)vmware.com: stable 4.9 backport
- handle binder_update_page_range - mhocko(a)suse.com]
Signed-off-by: Ajay Kaher <akaher(a)vmware.com>
---
drivers/android/binder.c | 6 ++++++
fs/proc/task_mmu.c | 18 ++++++++++++++++++
fs/userfaultfd.c | 9 +++++++++
include/linux/mm.h | 21 +++++++++++++++++++++
mm/mmap.c | 6 +++++-
5 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 80499f4..f05ab8f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -581,6 +581,12 @@ static int binder_update_page_range(struct binder_proc *proc, int allocate,
if (mm) {
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm)) {
+ if (allocate == 0)
+ goto free_range;
+ goto err_no_vma;
+ }
+
vma = proc->vma;
if (vma && mm != proc->vma_vm_mm) {
pr_err("%d: vma mm and task mm mismatch\n",
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5138e78..4b207b1 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1057,6 +1057,24 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
count = -EINTR;
goto out_mm;
}
+ /*
+ * Avoid to modify vma->vm_flags
+ * without locked ops while the
+ * coredump reads the vm_flags.
+ */
+ if (!mmget_still_valid(mm)) {
+ /*
+ * Silently return "count"
+ * like if get_task_mm()
+ * failed. FIXME: should this
+ * function have returned
+ * -ESRCH if get_task_mm()
+ * failed like if
+ * get_proc_task() fails?
+ */
+ up_write(&mm->mmap_sem);
+ goto out_mm;
+ }
for (vma = mm->mmap; vma; vma = vma->vm_next) {
vma->vm_flags &= ~VM_SOFTDIRTY;
vma_set_page_prot(vma);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 784d667..8bf425a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -479,6 +479,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
* taking the mmap_sem for writing.
*/
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto skip_mm;
prev = NULL;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cond_resched();
@@ -501,6 +503,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
+skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
wakeup:
@@ -802,6 +805,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
+
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
@@ -947,6 +953,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
+
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e3c8d40..c239984 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1189,6 +1189,27 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long address,
void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
unsigned long start, unsigned long end);
+/*
+ * This has to be called after a get_task_mm()/mmget_not_zero()
+ * followed by taking the mmap_sem for writing before modifying the
+ * vmas or anything the coredump pretends not to change from under it.
+ *
+ * NOTE: find_extend_vma() called from GUP context is the only place
+ * that can modify the "mm" (notably the vm_start/end) under mmap_sem
+ * for reading and outside the context of the process, so it is also
+ * the only case that holds the mmap_sem for reading that must call
+ * this function. Generally if the mmap_sem is hold for reading
+ * there's no need of this check after get_task_mm()/mmget_not_zero().
+ *
+ * This function can be obsoleted and the check can be removed, after
+ * the coredump code will hold the mmap_sem for writing before
+ * invoking the ->core_dump methods.
+ */
+static inline bool mmget_still_valid(struct mm_struct *mm)
+{
+ return likely(!mm->core_state);
+}
+
/**
* mm_walk - callbacks for walk_page_range
* @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f2314a..19368fb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2448,7 +2448,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
vma = find_vma_prev(mm, addr, &prev);
if (vma && (vma->vm_start <= addr))
return vma;
- if (!prev || expand_stack(prev, addr))
+ /* don't alter vm_end if the coredump is running */
+ if (!prev || !mmget_still_valid(mm) || expand_stack(prev, addr))
return NULL;
if (prev->vm_flags & VM_LOCKED)
populate_vma_page_range(prev, addr, prev->vm_end, NULL);
@@ -2474,6 +2475,9 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
return vma;
if (!(vma->vm_flags & VM_GROWSDOWN))
return NULL;
+ /* don't alter vm_start if the coredump is running */
+ if (!mmget_still_valid(mm))
+ return NULL;
start = vma->vm_start;
if (expand_stack(vma, addr))
return NULL;
--
2.7.4
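The pattern the backport enforces, reduced to a stand-alone sketch
(modify_vmas() is an illustrative name, not a kernel function): after
mmget_not_zero()/get_task_mm() plus down_write(&mm->mmap_sem), bail out
silently if a coredump raced in and set mm->core_state:

    #include <linux/sched.h>
    #include <linux/mm.h>

    static int modify_vmas(struct task_struct *task)
    {
            struct mm_struct *mm = get_task_mm(task);

            if (!mm)
                    return -ESRCH;

            down_write(&mm->mmap_sem);
            if (!mmget_still_valid(mm)) {
                    /* A coredump is running; touching the vmas now
                     * would change the layout from under it. */
                    up_write(&mm->mmap_sem);
                    mmput(mm);
                    return 0;
            }
            /* ... safe to modify vma layout/flags here ... */
            up_write(&mm->mmap_sem);
            mmput(mm);
            return 0;
    }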
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cb361d8cdef69990f6b4504dc1fd9a594d983c97 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 16 Jul 2019 17:20:47 +0200
Subject: [PATCH] sched/fair: Use RCU accessors consistently for ->numa_group
The old code used RCU annotations and accessors inconsistently for
->numa_group, which can lead to use-after-frees and NULL dereferences.
Let all accesses to ->numa_group use proper RCU helpers to prevent such
issues.
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..9f51932bd543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1092,7 +1092,15 @@ struct task_struct {
u64 last_sum_exec_runtime;
struct callback_head numa_work;
- struct numa_group *numa_group;
+ /*
+ * This pointer is only modified for current in syscall and
+ * pagefault context (and for tasks being destroyed), so it can be read
+ * from any of the following contexts:
+ * - RCU read-side critical section
+ * - current->numa_group from everywhere
+ * - task's runqueue locked, task not running
+ */
+ struct numa_group __rcu *numa_group;
/*
* numa_faults is an array split into four regions:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6adb0e0f5feb..bc9cfeaac8bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1086,6 +1086,21 @@ struct numa_group {
unsigned long faults[0];
};
+/*
+ * For functions that can be called in multiple contexts that permit reading
+ * ->numa_group (see struct task_struct for locking rules).
+ */
+static struct numa_group *deref_task_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_check(p->numa_group, p == current ||
+ (lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+}
+
+static struct numa_group *deref_curr_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_protected(p->numa_group, p == current);
+}
+
static inline unsigned long group_faults_priv(struct numa_group *ng);
static inline unsigned long group_faults_shared(struct numa_group *ng);
@@ -1129,10 +1144,12 @@ static unsigned int task_scan_start(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long period = smin;
+ struct numa_group *ng;
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
@@ -1140,6 +1157,7 @@ static unsigned int task_scan_start(struct task_struct *p)
period *= shared + 1;
period /= private + shared + 1;
}
+ rcu_read_unlock();
return max(smin, period);
}
@@ -1148,13 +1166,14 @@ static unsigned int task_scan_max(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long smax;
+ struct numa_group *ng;
/* Watch for min being lower than max due to floor calculations */
smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p);
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;
@@ -1186,7 +1205,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
- p->numa_group = NULL;
+ RCU_INIT_POINTER(p->numa_group, NULL);
p->last_task_numa_placement = 0;
p->last_sum_exec_runtime = 0;
@@ -1233,7 +1252,16 @@ static void account_numa_dequeue(struct rq *rq, struct task_struct *p)
pid_t task_numa_group_id(struct task_struct *p)
{
- return p->numa_group ? p->numa_group->gid : 0;
+ struct numa_group *ng;
+ pid_t gid = 0;
+
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng)
+ gid = ng->gid;
+ rcu_read_unlock();
+
+ return gid;
}
/*
@@ -1258,11 +1286,13 @@ static inline unsigned long task_faults(struct task_struct *p, int nid)
static inline unsigned long group_faults(struct task_struct *p, int nid)
{
- if (!p->numa_group)
+ struct numa_group *ng = deref_task_numa_group(p);
+
+ if (!ng)
return 0;
- return p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
- p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 1)];
+ return ng->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
+ ng->faults[task_faults_idx(NUMA_MEM, nid, 1)];
}
static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
@@ -1400,12 +1430,13 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
static inline unsigned long group_weight(struct task_struct *p, int nid,
int dist)
{
+ struct numa_group *ng = deref_task_numa_group(p);
unsigned long faults, total_faults;
- if (!p->numa_group)
+ if (!ng)
return 0;
- total_faults = p->numa_group->total_faults;
+ total_faults = ng->total_faults;
if (!total_faults)
return 0;
@@ -1419,7 +1450,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
int src_nid, int dst_cpu)
{
- struct numa_group *ng = p->numa_group;
+ struct numa_group *ng = deref_curr_numa_group(p);
int dst_nid = cpu_to_node(dst_cpu);
int last_cpupid, this_cpupid;
@@ -1600,13 +1631,14 @@ static bool load_too_imbalanced(long src_load, long dst_load,
static void task_numa_compare(struct task_numa_env *env,
long taskimp, long groupimp, bool maymove)
{
+ struct numa_group *cur_ng, *p_ng = deref_curr_numa_group(env->p);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
+ long imp = p_ng ? groupimp : taskimp;
struct task_struct *cur;
long src_load, dst_load;
- long load;
- long imp = env->p->numa_group ? groupimp : taskimp;
- long moveimp = imp;
int dist = env->dist;
+ long moveimp = imp;
+ long load;
if (READ_ONCE(dst_rq->numa_migrate_on))
return;
@@ -1645,21 +1677,22 @@ static void task_numa_compare(struct task_numa_env *env,
* If dst and source tasks are in the same NUMA group, or not
* in any group then look only at task weights.
*/
- if (cur->numa_group == env->p->numa_group) {
+ cur_ng = rcu_dereference(cur->numa_group);
+ if (cur_ng == p_ng) {
imp = taskimp + task_weight(cur, env->src_nid, dist) -
task_weight(cur, env->dst_nid, dist);
/*
* Add some hysteresis to prevent swapping the
* tasks within a group over tiny differences.
*/
- if (cur->numa_group)
+ if (cur_ng)
imp -= imp / 16;
} else {
/*
* Compare the group weights. If a task is all by itself
* (not part of a group), use the task weight instead.
*/
- if (cur->numa_group && env->p->numa_group)
+ if (cur_ng && p_ng)
imp += group_weight(cur, env->src_nid, dist) -
group_weight(cur, env->dst_nid, dist);
else
@@ -1757,11 +1790,12 @@ static int task_numa_migrate(struct task_struct *p)
.best_imp = 0,
.best_cpu = -1,
};
+ unsigned long taskweight, groupweight;
struct sched_domain *sd;
+ long taskimp, groupimp;
+ struct numa_group *ng;
struct rq *best_rq;
- unsigned long taskweight, groupweight;
int nid, ret, dist;
- long taskimp, groupimp;
/*
* Pick the lowest SD_NUMA domain, as that would have the smallest
@@ -1807,7 +1841,8 @@ static int task_numa_migrate(struct task_struct *p)
* multiple NUMA nodes; in order to better consolidate the group,
* we need to check other locations.
*/
- if (env.best_cpu == -1 || (p->numa_group && p->numa_group->active_nodes > 1)) {
+ ng = deref_curr_numa_group(p);
+ if (env.best_cpu == -1 || (ng && ng->active_nodes > 1)) {
for_each_online_node(nid) {
if (nid == env.src_nid || nid == p->numa_preferred_nid)
continue;
@@ -1840,7 +1875,7 @@ static int task_numa_migrate(struct task_struct *p)
* A task that migrated to a second choice node will be better off
* trying for a better one later. Do not set the preferred node here.
*/
- if (p->numa_group) {
+ if (ng) {
if (env.best_cpu == -1)
nid = env.src_nid;
else
@@ -2135,6 +2170,7 @@ static void task_numa_placement(struct task_struct *p)
unsigned long total_faults;
u64 runtime, period;
spinlock_t *group_lock = NULL;
+ struct numa_group *ng;
/*
* The p->mm->numa_scan_seq field gets updated without
@@ -2152,8 +2188,9 @@ static void task_numa_placement(struct task_struct *p)
runtime = numa_get_avg_runtime(p, &period);
/* If the task is part of a group prevent parallel updates to group stats */
- if (p->numa_group) {
- group_lock = &p->numa_group->lock;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
+ group_lock = &ng->lock;
spin_lock_irq(group_lock);
}
@@ -2194,7 +2231,7 @@ static void task_numa_placement(struct task_struct *p)
p->numa_faults[cpu_idx] += f_diff;
faults += p->numa_faults[mem_idx];
p->total_numa_faults += diff;
- if (p->numa_group) {
+ if (ng) {
/*
* safe because we can only change our own group
*
@@ -2202,14 +2239,14 @@ static void task_numa_placement(struct task_struct *p)
* nid and priv in a specific region because it
* is at the beginning of the numa_faults array.
*/
- p->numa_group->faults[mem_idx] += diff;
- p->numa_group->faults_cpu[mem_idx] += f_diff;
- p->numa_group->total_faults += diff;
- group_faults += p->numa_group->faults[mem_idx];
+ ng->faults[mem_idx] += diff;
+ ng->faults_cpu[mem_idx] += f_diff;
+ ng->total_faults += diff;
+ group_faults += ng->faults[mem_idx];
}
}
- if (!p->numa_group) {
+ if (!ng) {
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -2220,8 +2257,8 @@ static void task_numa_placement(struct task_struct *p)
}
}
- if (p->numa_group) {
- numa_group_count_active_nodes(p->numa_group);
+ if (ng) {
+ numa_group_count_active_nodes(ng);
spin_unlock_irq(group_lock);
max_nid = preferred_group_nid(p, max_nid);
}
@@ -2255,7 +2292,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
int cpu = cpupid_to_cpu(cpupid);
int i;
- if (unlikely(!p->numa_group)) {
+ if (unlikely(!deref_curr_numa_group(p))) {
unsigned int size = sizeof(struct numa_group) +
4*nr_node_ids*sizeof(unsigned long);
@@ -2291,7 +2328,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
goto no_join;
- my_grp = p->numa_group;
+ my_grp = deref_curr_numa_group(p);
if (grp == my_grp)
goto no_join;
@@ -2362,7 +2399,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
*/
void task_numa_free(struct task_struct *p, bool final)
{
- struct numa_group *grp = p->numa_group;
+ /* safe: p either is current or is being freed by current */
+ struct numa_group *grp = rcu_dereference_raw(p->numa_group);
unsigned long *numa_faults = p->numa_faults;
unsigned long flags;
int i;
@@ -2442,7 +2480,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
* actively using should be counted as local. This allows the
* scan rate to slow down when a workload has settled down.
*/
- ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
if (!priv && !local && ng && ng->active_nodes > 1 &&
numa_is_active_node(cpu_node, ng) &&
numa_is_active_node(mem_node, ng))
@@ -10460,18 +10498,22 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
{
int node;
unsigned long tsf = 0, tpf = 0, gsf = 0, gpf = 0;
+ struct numa_group *ng;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
for_each_online_node(node) {
if (p->numa_faults) {
tsf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 0)];
tpf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 1)];
}
- if (p->numa_group) {
- gsf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 0)],
- gpf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 1)];
+ if (ng) {
+ gsf = ng->faults[task_faults_idx(NUMA_MEM, node, 0)],
+ gpf = ng->faults[task_faults_idx(NUMA_MEM, node, 1)];
}
print_numa_stats(m, node, tsf, tpf, gsf, gpf);
}
+ rcu_read_unlock();
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_SCHED_DEBUG */
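The read-side rule this patch standardizes, reduced to a sketch
(report_group() is illustrative; struct numa_group is only visible
inside kernel/sched/fair.c): any reader that is not current and does
not hold the task's runqueue lock must take the RCU read lock and use
rcu_dereference() rather than a plain load:

    static void report_group(struct task_struct *p)
    {
            struct numa_group *ng;

            rcu_read_lock();
            ng = rcu_dereference(p->numa_group);    /* never a plain read */
            if (ng)
                    pr_info("numa group id: %d\n", ng->gid);
            rcu_read_unlock();
    }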
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cb361d8cdef69990f6b4504dc1fd9a594d983c97 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 16 Jul 2019 17:20:47 +0200
Subject: [PATCH] sched/fair: Use RCU accessors consistently for ->numa_group
The old code used RCU annotations and accessors inconsistently for
->numa_group, which can lead to use-after-frees and NULL dereferences.
Let all accesses to ->numa_group use proper RCU helpers to prevent such
issues.
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..9f51932bd543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1092,7 +1092,15 @@ struct task_struct {
u64 last_sum_exec_runtime;
struct callback_head numa_work;
- struct numa_group *numa_group;
+ /*
+ * This pointer is only modified for current in syscall and
+ * pagefault context (and for tasks being destroyed), so it can be read
+ * from any of the following contexts:
+ * - RCU read-side critical section
+ * - current->numa_group from everywhere
+ * - task's runqueue locked, task not running
+ */
+ struct numa_group __rcu *numa_group;
/*
* numa_faults is an array split into four regions:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6adb0e0f5feb..bc9cfeaac8bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1086,6 +1086,21 @@ struct numa_group {
unsigned long faults[0];
};
+/*
+ * For functions that can be called in multiple contexts that permit reading
+ * ->numa_group (see struct task_struct for locking rules).
+ */
+static struct numa_group *deref_task_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_check(p->numa_group, p == current ||
+ (lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+}
+
+static struct numa_group *deref_curr_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_protected(p->numa_group, p == current);
+}
+
static inline unsigned long group_faults_priv(struct numa_group *ng);
static inline unsigned long group_faults_shared(struct numa_group *ng);
@@ -1129,10 +1144,12 @@ static unsigned int task_scan_start(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long period = smin;
+ struct numa_group *ng;
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
@@ -1140,6 +1157,7 @@ static unsigned int task_scan_start(struct task_struct *p)
period *= shared + 1;
period /= private + shared + 1;
}
+ rcu_read_unlock();
return max(smin, period);
}
@@ -1148,13 +1166,14 @@ static unsigned int task_scan_max(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long smax;
+ struct numa_group *ng;
/* Watch for min being lower than max due to floor calculations */
smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p);
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;
@@ -1186,7 +1205,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
- p->numa_group = NULL;
+ RCU_INIT_POINTER(p->numa_group, NULL);
p->last_task_numa_placement = 0;
p->last_sum_exec_runtime = 0;
@@ -1233,7 +1252,16 @@ static void account_numa_dequeue(struct rq *rq, struct task_struct *p)
pid_t task_numa_group_id(struct task_struct *p)
{
- return p->numa_group ? p->numa_group->gid : 0;
+ struct numa_group *ng;
+ pid_t gid = 0;
+
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng)
+ gid = ng->gid;
+ rcu_read_unlock();
+
+ return gid;
}
/*
@@ -1258,11 +1286,13 @@ static inline unsigned long task_faults(struct task_struct *p, int nid)
static inline unsigned long group_faults(struct task_struct *p, int nid)
{
- if (!p->numa_group)
+ struct numa_group *ng = deref_task_numa_group(p);
+
+ if (!ng)
return 0;
- return p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
- p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 1)];
+ return ng->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
+ ng->faults[task_faults_idx(NUMA_MEM, nid, 1)];
}
static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
@@ -1400,12 +1430,13 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
static inline unsigned long group_weight(struct task_struct *p, int nid,
int dist)
{
+ struct numa_group *ng = deref_task_numa_group(p);
unsigned long faults, total_faults;
- if (!p->numa_group)
+ if (!ng)
return 0;
- total_faults = p->numa_group->total_faults;
+ total_faults = ng->total_faults;
if (!total_faults)
return 0;
@@ -1419,7 +1450,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
int src_nid, int dst_cpu)
{
- struct numa_group *ng = p->numa_group;
+ struct numa_group *ng = deref_curr_numa_group(p);
int dst_nid = cpu_to_node(dst_cpu);
int last_cpupid, this_cpupid;
@@ -1600,13 +1631,14 @@ static bool load_too_imbalanced(long src_load, long dst_load,
static void task_numa_compare(struct task_numa_env *env,
long taskimp, long groupimp, bool maymove)
{
+ struct numa_group *cur_ng, *p_ng = deref_curr_numa_group(env->p);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
+ long imp = p_ng ? groupimp : taskimp;
struct task_struct *cur;
long src_load, dst_load;
- long load;
- long imp = env->p->numa_group ? groupimp : taskimp;
- long moveimp = imp;
int dist = env->dist;
+ long moveimp = imp;
+ long load;
if (READ_ONCE(dst_rq->numa_migrate_on))
return;
@@ -1645,21 +1677,22 @@ static void task_numa_compare(struct task_numa_env *env,
* If dst and source tasks are in the same NUMA group, or not
* in any group then look only at task weights.
*/
- if (cur->numa_group == env->p->numa_group) {
+ cur_ng = rcu_dereference(cur->numa_group);
+ if (cur_ng == p_ng) {
imp = taskimp + task_weight(cur, env->src_nid, dist) -
task_weight(cur, env->dst_nid, dist);
/*
* Add some hysteresis to prevent swapping the
* tasks within a group over tiny differences.
*/
- if (cur->numa_group)
+ if (cur_ng)
imp -= imp / 16;
} else {
/*
* Compare the group weights. If a task is all by itself
* (not part of a group), use the task weight instead.
*/
- if (cur->numa_group && env->p->numa_group)
+ if (cur_ng && p_ng)
imp += group_weight(cur, env->src_nid, dist) -
group_weight(cur, env->dst_nid, dist);
else
@@ -1757,11 +1790,12 @@ static int task_numa_migrate(struct task_struct *p)
.best_imp = 0,
.best_cpu = -1,
};
+ unsigned long taskweight, groupweight;
struct sched_domain *sd;
+ long taskimp, groupimp;
+ struct numa_group *ng;
struct rq *best_rq;
- unsigned long taskweight, groupweight;
int nid, ret, dist;
- long taskimp, groupimp;
/*
* Pick the lowest SD_NUMA domain, as that would have the smallest
@@ -1807,7 +1841,8 @@ static int task_numa_migrate(struct task_struct *p)
* multiple NUMA nodes; in order to better consolidate the group,
* we need to check other locations.
*/
- if (env.best_cpu == -1 || (p->numa_group && p->numa_group->active_nodes > 1)) {
+ ng = deref_curr_numa_group(p);
+ if (env.best_cpu == -1 || (ng && ng->active_nodes > 1)) {
for_each_online_node(nid) {
if (nid == env.src_nid || nid == p->numa_preferred_nid)
continue;
@@ -1840,7 +1875,7 @@ static int task_numa_migrate(struct task_struct *p)
* A task that migrated to a second choice node will be better off
* trying for a better one later. Do not set the preferred node here.
*/
- if (p->numa_group) {
+ if (ng) {
if (env.best_cpu == -1)
nid = env.src_nid;
else
@@ -2135,6 +2170,7 @@ static void task_numa_placement(struct task_struct *p)
unsigned long total_faults;
u64 runtime, period;
spinlock_t *group_lock = NULL;
+ struct numa_group *ng;
/*
* The p->mm->numa_scan_seq field gets updated without
@@ -2152,8 +2188,9 @@ static void task_numa_placement(struct task_struct *p)
runtime = numa_get_avg_runtime(p, &period);
/* If the task is part of a group prevent parallel updates to group stats */
- if (p->numa_group) {
- group_lock = &p->numa_group->lock;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
+ group_lock = &ng->lock;
spin_lock_irq(group_lock);
}
@@ -2194,7 +2231,7 @@ static void task_numa_placement(struct task_struct *p)
p->numa_faults[cpu_idx] += f_diff;
faults += p->numa_faults[mem_idx];
p->total_numa_faults += diff;
- if (p->numa_group) {
+ if (ng) {
/*
* safe because we can only change our own group
*
@@ -2202,14 +2239,14 @@ static void task_numa_placement(struct task_struct *p)
* nid and priv in a specific region because it
* is at the beginning of the numa_faults array.
*/
- p->numa_group->faults[mem_idx] += diff;
- p->numa_group->faults_cpu[mem_idx] += f_diff;
- p->numa_group->total_faults += diff;
- group_faults += p->numa_group->faults[mem_idx];
+ ng->faults[mem_idx] += diff;
+ ng->faults_cpu[mem_idx] += f_diff;
+ ng->total_faults += diff;
+ group_faults += ng->faults[mem_idx];
}
}
- if (!p->numa_group) {
+ if (!ng) {
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -2220,8 +2257,8 @@ static void task_numa_placement(struct task_struct *p)
}
}
- if (p->numa_group) {
- numa_group_count_active_nodes(p->numa_group);
+ if (ng) {
+ numa_group_count_active_nodes(ng);
spin_unlock_irq(group_lock);
max_nid = preferred_group_nid(p, max_nid);
}
@@ -2255,7 +2292,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
int cpu = cpupid_to_cpu(cpupid);
int i;
- if (unlikely(!p->numa_group)) {
+ if (unlikely(!deref_curr_numa_group(p))) {
unsigned int size = sizeof(struct numa_group) +
4*nr_node_ids*sizeof(unsigned long);
@@ -2291,7 +2328,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
goto no_join;
- my_grp = p->numa_group;
+ my_grp = deref_curr_numa_group(p);
if (grp == my_grp)
goto no_join;
@@ -2362,7 +2399,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
*/
void task_numa_free(struct task_struct *p, bool final)
{
- struct numa_group *grp = p->numa_group;
+ /* safe: p either is current or is being freed by current */
+ struct numa_group *grp = rcu_dereference_raw(p->numa_group);
unsigned long *numa_faults = p->numa_faults;
unsigned long flags;
int i;
@@ -2442,7 +2480,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
* actively using should be counted as local. This allows the
* scan rate to slow down when a workload has settled down.
*/
- ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
if (!priv && !local && ng && ng->active_nodes > 1 &&
numa_is_active_node(cpu_node, ng) &&
numa_is_active_node(mem_node, ng))
@@ -10460,18 +10498,22 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
{
int node;
unsigned long tsf = 0, tpf = 0, gsf = 0, gpf = 0;
+ struct numa_group *ng;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
for_each_online_node(node) {
if (p->numa_faults) {
tsf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 0)];
tpf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 1)];
}
- if (p->numa_group) {
- gsf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 0)],
- gpf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 1)];
+ if (ng) {
+ gsf = ng->faults[task_faults_idx(NUMA_MEM, node, 0)],
+ gpf = ng->faults[task_faults_idx(NUMA_MEM, node, 1)];
}
print_numa_stats(m, node, tsf, tpf, gsf, gpf);
}
+ rcu_read_unlock();
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_SCHED_DEBUG */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cb361d8cdef69990f6b4504dc1fd9a594d983c97 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 16 Jul 2019 17:20:47 +0200
Subject: [PATCH] sched/fair: Use RCU accessors consistently for ->numa_group
The old code used RCU annotations and accessors inconsistently for
->numa_group, which can lead to use-after-frees and NULL dereferences.
Let all accesses to ->numa_group use proper RCU helpers to prevent such
issues.
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..9f51932bd543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1092,7 +1092,15 @@ struct task_struct {
u64 last_sum_exec_runtime;
struct callback_head numa_work;
- struct numa_group *numa_group;
+ /*
+ * This pointer is only modified for current in syscall and
+ * pagefault context (and for tasks being destroyed), so it can be read
+ * from any of the following contexts:
+ * - RCU read-side critical section
+ * - current->numa_group from everywhere
+ * - task's runqueue locked, task not running
+ */
+ struct numa_group __rcu *numa_group;
/*
* numa_faults is an array split into four regions:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6adb0e0f5feb..bc9cfeaac8bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1086,6 +1086,21 @@ struct numa_group {
unsigned long faults[0];
};
+/*
+ * For functions that can be called in multiple contexts that permit reading
+ * ->numa_group (see struct task_struct for locking rules).
+ */
+static struct numa_group *deref_task_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_check(p->numa_group, p == current ||
+ (lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+}
+
+static struct numa_group *deref_curr_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_protected(p->numa_group, p == current);
+}
+
static inline unsigned long group_faults_priv(struct numa_group *ng);
static inline unsigned long group_faults_shared(struct numa_group *ng);
@@ -1129,10 +1144,12 @@ static unsigned int task_scan_start(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long period = smin;
+ struct numa_group *ng;
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
@@ -1140,6 +1157,7 @@ static unsigned int task_scan_start(struct task_struct *p)
period *= shared + 1;
period /= private + shared + 1;
}
+ rcu_read_unlock();
return max(smin, period);
}
@@ -1148,13 +1166,14 @@ static unsigned int task_scan_max(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long smax;
+ struct numa_group *ng;
/* Watch for min being lower than max due to floor calculations */
smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p);
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;
@@ -1186,7 +1205,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
- p->numa_group = NULL;
+ RCU_INIT_POINTER(p->numa_group, NULL);
p->last_task_numa_placement = 0;
p->last_sum_exec_runtime = 0;
@@ -1233,7 +1252,16 @@ static void account_numa_dequeue(struct rq *rq, struct task_struct *p)
pid_t task_numa_group_id(struct task_struct *p)
{
- return p->numa_group ? p->numa_group->gid : 0;
+ struct numa_group *ng;
+ pid_t gid = 0;
+
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng)
+ gid = ng->gid;
+ rcu_read_unlock();
+
+ return gid;
}
/*
@@ -1258,11 +1286,13 @@ static inline unsigned long task_faults(struct task_struct *p, int nid)
static inline unsigned long group_faults(struct task_struct *p, int nid)
{
- if (!p->numa_group)
+ struct numa_group *ng = deref_task_numa_group(p);
+
+ if (!ng)
return 0;
- return p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
- p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 1)];
+ return ng->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
+ ng->faults[task_faults_idx(NUMA_MEM, nid, 1)];
}
static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
@@ -1400,12 +1430,13 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
static inline unsigned long group_weight(struct task_struct *p, int nid,
int dist)
{
+ struct numa_group *ng = deref_task_numa_group(p);
unsigned long faults, total_faults;
- if (!p->numa_group)
+ if (!ng)
return 0;
- total_faults = p->numa_group->total_faults;
+ total_faults = ng->total_faults;
if (!total_faults)
return 0;
@@ -1419,7 +1450,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
int src_nid, int dst_cpu)
{
- struct numa_group *ng = p->numa_group;
+ struct numa_group *ng = deref_curr_numa_group(p);
int dst_nid = cpu_to_node(dst_cpu);
int last_cpupid, this_cpupid;
@@ -1600,13 +1631,14 @@ static bool load_too_imbalanced(long src_load, long dst_load,
static void task_numa_compare(struct task_numa_env *env,
long taskimp, long groupimp, bool maymove)
{
+ struct numa_group *cur_ng, *p_ng = deref_curr_numa_group(env->p);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
+ long imp = p_ng ? groupimp : taskimp;
struct task_struct *cur;
long src_load, dst_load;
- long load;
- long imp = env->p->numa_group ? groupimp : taskimp;
- long moveimp = imp;
int dist = env->dist;
+ long moveimp = imp;
+ long load;
if (READ_ONCE(dst_rq->numa_migrate_on))
return;
@@ -1645,21 +1677,22 @@ static void task_numa_compare(struct task_numa_env *env,
* If dst and source tasks are in the same NUMA group, or not
* in any group then look only at task weights.
*/
- if (cur->numa_group == env->p->numa_group) {
+ cur_ng = rcu_dereference(cur->numa_group);
+ if (cur_ng == p_ng) {
imp = taskimp + task_weight(cur, env->src_nid, dist) -
task_weight(cur, env->dst_nid, dist);
/*
* Add some hysteresis to prevent swapping the
* tasks within a group over tiny differences.
*/
- if (cur->numa_group)
+ if (cur_ng)
imp -= imp / 16;
} else {
/*
* Compare the group weights. If a task is all by itself
* (not part of a group), use the task weight instead.
*/
- if (cur->numa_group && env->p->numa_group)
+ if (cur_ng && p_ng)
imp += group_weight(cur, env->src_nid, dist) -
group_weight(cur, env->dst_nid, dist);
else
@@ -1757,11 +1790,12 @@ static int task_numa_migrate(struct task_struct *p)
.best_imp = 0,
.best_cpu = -1,
};
+ unsigned long taskweight, groupweight;
struct sched_domain *sd;
+ long taskimp, groupimp;
+ struct numa_group *ng;
struct rq *best_rq;
- unsigned long taskweight, groupweight;
int nid, ret, dist;
- long taskimp, groupimp;
/*
* Pick the lowest SD_NUMA domain, as that would have the smallest
@@ -1807,7 +1841,8 @@ static int task_numa_migrate(struct task_struct *p)
* multiple NUMA nodes; in order to better consolidate the group,
* we need to check other locations.
*/
- if (env.best_cpu == -1 || (p->numa_group && p->numa_group->active_nodes > 1)) {
+ ng = deref_curr_numa_group(p);
+ if (env.best_cpu == -1 || (ng && ng->active_nodes > 1)) {
for_each_online_node(nid) {
if (nid == env.src_nid || nid == p->numa_preferred_nid)
continue;
@@ -1840,7 +1875,7 @@ static int task_numa_migrate(struct task_struct *p)
* A task that migrated to a second choice node will be better off
* trying for a better one later. Do not set the preferred node here.
*/
- if (p->numa_group) {
+ if (ng) {
if (env.best_cpu == -1)
nid = env.src_nid;
else
@@ -2135,6 +2170,7 @@ static void task_numa_placement(struct task_struct *p)
unsigned long total_faults;
u64 runtime, period;
spinlock_t *group_lock = NULL;
+ struct numa_group *ng;
/*
* The p->mm->numa_scan_seq field gets updated without
@@ -2152,8 +2188,9 @@ static void task_numa_placement(struct task_struct *p)
runtime = numa_get_avg_runtime(p, &period);
/* If the task is part of a group prevent parallel updates to group stats */
- if (p->numa_group) {
- group_lock = &p->numa_group->lock;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
+ group_lock = &ng->lock;
spin_lock_irq(group_lock);
}
@@ -2194,7 +2231,7 @@ static void task_numa_placement(struct task_struct *p)
p->numa_faults[cpu_idx] += f_diff;
faults += p->numa_faults[mem_idx];
p->total_numa_faults += diff;
- if (p->numa_group) {
+ if (ng) {
/*
* safe because we can only change our own group
*
@@ -2202,14 +2239,14 @@ static void task_numa_placement(struct task_struct *p)
* nid and priv in a specific region because it
* is at the beginning of the numa_faults array.
*/
- p->numa_group->faults[mem_idx] += diff;
- p->numa_group->faults_cpu[mem_idx] += f_diff;
- p->numa_group->total_faults += diff;
- group_faults += p->numa_group->faults[mem_idx];
+ ng->faults[mem_idx] += diff;
+ ng->faults_cpu[mem_idx] += f_diff;
+ ng->total_faults += diff;
+ group_faults += ng->faults[mem_idx];
}
}
- if (!p->numa_group) {
+ if (!ng) {
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -2220,8 +2257,8 @@ static void task_numa_placement(struct task_struct *p)
}
}
- if (p->numa_group) {
- numa_group_count_active_nodes(p->numa_group);
+ if (ng) {
+ numa_group_count_active_nodes(ng);
spin_unlock_irq(group_lock);
max_nid = preferred_group_nid(p, max_nid);
}
@@ -2255,7 +2292,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
int cpu = cpupid_to_cpu(cpupid);
int i;
- if (unlikely(!p->numa_group)) {
+ if (unlikely(!deref_curr_numa_group(p))) {
unsigned int size = sizeof(struct numa_group) +
4*nr_node_ids*sizeof(unsigned long);
@@ -2291,7 +2328,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
goto no_join;
- my_grp = p->numa_group;
+ my_grp = deref_curr_numa_group(p);
if (grp == my_grp)
goto no_join;
@@ -2362,7 +2399,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
*/
void task_numa_free(struct task_struct *p, bool final)
{
- struct numa_group *grp = p->numa_group;
+ /* safe: p either is current or is being freed by current */
+ struct numa_group *grp = rcu_dereference_raw(p->numa_group);
unsigned long *numa_faults = p->numa_faults;
unsigned long flags;
int i;
@@ -2442,7 +2480,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
* actively using should be counted as local. This allows the
* scan rate to slow down when a workload has settled down.
*/
- ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
if (!priv && !local && ng && ng->active_nodes > 1 &&
numa_is_active_node(cpu_node, ng) &&
numa_is_active_node(mem_node, ng))
@@ -10460,18 +10498,22 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
{
int node;
unsigned long tsf = 0, tpf = 0, gsf = 0, gpf = 0;
+ struct numa_group *ng;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
for_each_online_node(node) {
if (p->numa_faults) {
tsf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 0)];
tpf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 1)];
}
- if (p->numa_group) {
- gsf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 0)],
- gpf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 1)];
+ if (ng) {
+ gsf = ng->faults[task_faults_idx(NUMA_MEM, node, 0)],
+ gpf = ng->faults[task_faults_idx(NUMA_MEM, node, 1)];
}
print_numa_stats(m, node, tsf, tpf, gsf, gpf);
}
+ rcu_read_unlock();
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_SCHED_DEBUG */
Hi, Greg, hi Sasha,
I noticed the fixes for CVE-2019-3900 were only backported to 4.14.133+,
but not to 4.19 or 5.1; the fixes are already included in 5.2.
So I backported them to 4.19; they compile fine, but no functional tests
were run.
Please review, and consider including them in the next release.
Regards,
Jack Wang 1 & 1 IONOS Cloud GmbH
Jason Wang (4):
vhost: introduce vhost_exceeds_weight()
vhost_net: fix possible infinite loop
vhost: vsock: add weight support
vhost: scsi: add weight support
drivers/vhost/net.c | 41 ++++++++++++++---------------------------
drivers/vhost/scsi.c | 15 +++++++++++----
drivers/vhost/vhost.c | 20 +++++++++++++++++++-
drivers/vhost/vhost.h | 5 ++++-
drivers/vhost/vsock.c | 28 +++++++++++++++++++++-------
5 files changed, 69 insertions(+), 40 deletions(-)
--
2.17.1
Looks like a regression got introduced into nv50_mstc_atomic_check()
that somehow didn't get found until now. If userspace changes
crtc_state->active to false but leaves the CRTC enabled, we end up
calling drm_dp_atomic_find_vcpi_slots() using the PBN calculated in
asyh->dp.pbn. However, if the display is inactive we end up calculating
a PBN of 0, which inadvertently causes us to have an allocation of 0.
From there, if userspace disables the CRTC afterwards, we end up
accidentally attempting to free the VCPI twice:
WARNING: CPU: 0 PID: 1484 at drivers/gpu/drm/drm_dp_mst_topology.c:3336
drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
RIP: 0010:drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
Call Trace:
drm_atomic_helper_check_modeset+0x3f3/0xa60 [drm_kms_helper]
? drm_atomic_check_only+0x43/0x780 [drm]
drm_atomic_helper_check+0x15/0x90 [drm_kms_helper]
nv50_disp_atomic_check+0x83/0x1d0 [nouveau]
drm_atomic_check_only+0x54d/0x780 [drm]
? drm_atomic_set_crtc_for_connector+0xec/0x100 [drm]
drm_atomic_commit+0x13/0x50 [drm]
drm_atomic_helper_set_config+0x81/0x90 [drm_kms_helper]
drm_mode_setcrtc+0x194/0x6a0 [drm]
? vprintk_emit+0x16a/0x230
? drm_ioctl+0x163/0x390 [drm]
? drm_mode_getcrtc+0x180/0x180 [drm]
drm_ioctl_kernel+0xaa/0xf0 [drm]
drm_ioctl+0x208/0x390 [drm]
? drm_mode_getcrtc+0x180/0x180 [drm]
nouveau_drm_ioctl+0x63/0xb0 [nouveau]
do_vfs_ioctl+0x405/0x660
? recalc_sigpending+0x17/0x50
? _copy_from_user+0x37/0x60
ksys_ioctl+0x5e/0x90
? exit_to_usermode_loop+0x92/0xe0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x59/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
WARNING: CPU: 0 PID: 1484 at drivers/gpu/drm/drm_dp_mst_topology.c:3336
drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
---[ end trace 4c395c0c51b1f88d ]---
[drm:drm_dp_atomic_release_vcpi_slots [drm_kms_helper]] *ERROR* no VCPI for
[MST PORT:00000000e288eb7d] found in mst state 000000008e642070
So, fix this by doing what we probably should have done from the start: only
call drm_dp_atomic_find_vcpi_slots() when crtc_state->mode_changed is set, so
that VCPI allocations remain for as long as the CRTC is enabled.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 232c9eec417a ("drm/nouveau: Use atomic VCPI helpers for MST")
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Ben Skeggs <bskeggs(a)redhat.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: David Airlie <airlied(a)redhat.com>
Cc: Jerry Zuo <Jerry.Zuo(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Juston Li <juston.li(a)intel.com>
Cc: Karol Herbst <karolherbst(a)gmail.com>
Cc: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Cc: Ilia Mirkin <imirkin(a)alum.mit.edu>
Cc: <stable(a)vger.kernel.org> # v5.1+
---
drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 8497768f1b41..126703816794 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -780,7 +780,7 @@ nv50_msto_atomic_check(struct drm_encoder *encoder,
drm_dp_calc_pbn_mode(crtc_state->adjusted_mode.clock,
connector->display_info.bpc * 3);
- if (drm_atomic_crtc_needs_modeset(crtc_state)) {
+ if (crtc_state->mode_changed) {
slots = drm_dp_atomic_find_vcpi_slots(state, &mstm->mgr,
mstc->port,
asyh->dp.pbn);
--
2.21.0
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: .
The bot has tested the following trees: v5.2.4, v5.1.21, v4.19.62, v4.14.134, v4.9.186, v4.4.186.
v5.2.4: Build OK!
v5.1.21: Build OK!
v4.19.62: Build OK!
v4.14.134: Failed to apply! Possible dependencies:
25a13e382de2 ("bluetooth: hci_qca: Replace GFP_ATOMIC with GFP_KERNEL")
v4.9.186: Failed to apply! Possible dependencies:
25a13e382de2 ("bluetooth: hci_qca: Replace GFP_ATOMIC with GFP_KERNEL")
v4.4.186: Failed to apply! Possible dependencies:
162f812f23ba ("Bluetooth: hci_uart: Add Marvell support")
25a13e382de2 ("bluetooth: hci_qca: Replace GFP_ATOMIC with GFP_KERNEL")
395174bb07c1 ("Bluetooth: hci_uart: Add Intel/AG6xx support")
9e69130c4efc ("Bluetooth: hci_uart: Add Nokia Protocol identifier")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
These patches include a few backported fixes for the 4.4 stable
tree.
I would appreciate if you could kindly consider including them in the
next release.
Ajay
---
[PATCH 1/8]:
Backporting of upstream commit f958d7b528b1:
mm: make page ref count overflow check tighter and more explicit
[PATCH 2/8]:
Backporting of upstream commit 88b1a17dfc3e:
mm: add 'try_get_page()' helper function
[PATCH 3/8]:
Backporting of upstream commit 7aef4172c795:
mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton
[PATCH 4/8]:
Backporting of upstream commit a3e328556d41:
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages
[PATCH 5/8]:
Backporting of upstream commit d63206ee32b6:
mm, gup: ensure real head page is ref-counted when using hugepages
[PATCH 6/8]:
Backporting of upstream commit 8fde12ca79af:
mm: prevent get_user_pages() from overflowing page refcount
[PATCH 7/8]:
Backporting of upstream commit 7bf2d1df8082:
pipe: add pipe_buf_get() helper
[PATCH 8/8]:
Backporting of upstream commit 15fab63e1e57:
fs: prevent page refcount overflow in pipe_buf_get
Hi,
Sunil Muthuswamy <sunilmut(a)microsoft.com> made some important fixes for
hv_sock recently. Sunil and I think it would be great to backport the
fixes to the longterm stable kernels.
Since hv_sock was first introduced in v4.14, we only care about
v4.14, v4.19 and v5.2.
For linux-5.2.y (currently it's v5.2.4), only one patch is missing.
The mainline commit ID is:
d5afa82c977e ("vsock: correct removal of socket from the list")
It can be cleanly cherry-picked from the mainline.
For linux-4.19.y (currently it's v4.19.62), 3 patches are missing.
The mainline commit IDs are:
cb359b604167 ("hvsock: fix epollout hang from race condition")
a9eeb998c28d ("hv_sock: Add support for delayed close")
d5afa82c977e ("vsock: correct removal of socket from the list")
They can be cleanly cherry-picked from the mainline, in the listed order here.
Note: it looks like the first commit (cb359b604167) has already been queued.
For linux-4.14.y (currently it's v4.14.134), 4 patches are missing.
The mainline commit IDs are:
cb359b604167 ("hvsock: fix epollout hang from race condition")
3b4477d2dcf2 ("VSOCK: use TCP state constants for sk_state")
a9eeb998c28d ("hv_sock: Add support for delayed close")
d5afa82c977e ("vsock: correct removal of socket from the list")
The third patch (a9eeb998c28d) needs small manual adjustments, and please
use the attached backported patch for it; the other 3 patches can be cleanly
cherry-picked from the mainline, in the listed order here.
Note: it looks like the first commit (cb359b604167) has already been queued.
Please let me know, if any clarification is needed.
Thanks!
-- Dexuan
On Thu, 1 Aug 2019 at 07:31, Sasha Levin <sashal(a)kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.2.4, v5.1.21, v4.19.62, v4.14.134, v4.9.186, v4.4.186.
>
> v5.2.4: Build OK!
> v5.1.21: Build OK!
> v4.19.62: Failed to apply! Possible dependencies:
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> e006d89abedd ("coresight: etm4x: Add support for handling errors")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
>
> v4.14.134: Failed to apply! Possible dependencies:
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> e006d89abedd ("coresight: etm4x: Add support for handling errors")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
>
> v4.9.186: Failed to apply! Possible dependencies:
> 297ab90f15f6 ("coresight: tmc: Cleanup operation mode handling")
> 2cd541402829 ("coresight: tmc: minor fix for output log")
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> c38e505e2701 ("coresight: tmc: Get rid of mode parameter for helper routines")
> e006d89abedd ("coresight: etm4x: Add support for handling errors")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
>
> v4.4.186: Failed to apply! Possible dependencies:
> 1925a470ce69 ("coresight: etm3x: splitting struct etm_drvdata")
> 2127154d115d ("coresight: etm3x: implementing user/kernel mode tracing")
> 22fd532eaa0c ("coresight: etm3x: adding operation mode for etm_enable()")
> 27b10da8fff2 ("coresight: etb10: moving to local atomic operations")
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 52210c8745e4 ("coresight: implementing 'cpu_id()' API")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> 882d5e112491 ("coresight: etm3x: implementing perf_enable/disable() API")
> b3e94405941e ("coresight: associating path with session rather than tracer")
> c04148e708c0 ("coresight: etm3x: moving sysFS entries to dedicated file")
> c1f8e57c9e66 ("coresight: etm3x: moving etm_readl/writel to header file")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
> e827d4550aa3 ("coresight: etb10: adding operation mode for sink->enable()")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
I have another one like that; I will send a separate set that applies correctly.
Thanks for the consideration,
Mathieu
>
> --
> Thanks,
> Sasha
On 01/08/2019 14:31, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.2.4, v5.1.21, v4.19.62, v4.14.134, v4.9.186, v4.4.186.
>
> v5.2.4: Build OK!
> v5.1.21: Build OK!
> v4.19.62: Build OK!
> v4.14.134: Failed to apply! Possible dependencies:
> 031e3601869c ("lib: Add generic PIO mapping method")
>
> v4.9.186: Failed to apply! Possible dependencies:
> 031e3601869c ("lib: Add generic PIO mapping method")
>
> v4.4.186: Failed to apply! Possible dependencies:
> 031e3601869c ("lib: Add generic PIO mapping method")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
Please only consider for v4.19.x, v5.1.x, and v5.2.x stable trees, same
for all patches in the patchset.
Thanks,
John
> --
> Thanks,
> Sasha
Dear stable maintainers,
Please include the commit 66b20ac0a1a10769d059d6903202f53494e3d902 into the stable 5.1 and 5.2.
Commit title: nvme: fix multipath crash when ANA is deactivated
The bug that this commit fixes was seen crashing kernels or provoking an oops from 5.0 onwards.
Regards,
Marta
Dear Sir or Madam,
After visiting your homepage, we would like to present an offer of products that will enable you to significantly increase the sales of your products and services.
The company databases are broken down into target groups that are interesting and relevant to you.
The company records include: company name, contact person, e-mail address, phone and fax numbers, postal code, city, street, etc.
1. Complete package 2019 DE - 1.4 million company addresses (1,457,620) - 190 EUR (until 01.08.2019)
2. Complete package 2019 DE, AT, CH - 1.7 million company addresses (1,747,921) - 240 EUR (until 01.08.2019)
3. Switzerland 2019 (187,911) - 149 EUR (until 01.08.2019)
4. Austria 2019 (104,000) - 149 EUR (until 01.08.2019)
The possible uses of the databases are practically unlimited; using the programs we have developed for personalized sending
of offers and the like by e-mail or fax, you can run effective and reliable advertising campaigns with them.
Please see the further details, without obligation, on our websites:
http://www.acdbmarketing.net/?page=catalog
Kind regards,
Oliver Klemens
From: Wanpeng Li <wanpengli(a)tencent.com>
After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a
five-year-old bug was exposed. Running the ebizzy benchmark in three 80-vCPU VMs
on one 80-pCPU Skylake server produces a lot of rcu_sched stall warning splats
in the VMs after stress testing:
INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
Call Trace:
flush_tlb_mm_range+0x68/0x140
tlb_flush_mmu.part.75+0x37/0xe0
tlb_finish_mmu+0x55/0x60
zap_page_range+0x142/0x190
SyS_madvise+0x3cd/0x9c0
system_call_fastpath+0x1c/0x21
swait_active() keeps returning true before finish_swait() is called in
kvm_vcpu_block(), and voluntarily preempted vCPUs are taken into account
by the kvm_vcpu_on_spin() loop. This greatly increases the probability
that the condition kvm_arch_vcpu_runnable(vcpu) is checked and found
true; when APICv is enabled, the yield-candidate vCPU's VMCS RVI field
leaks (via vmx_sync_pir_to_irr()) into the spinning-on-a-taken-lock
vCPU's current VMCS.
This patch fixes it by reverting the kvm_arch_vcpu_runnable() condition
in kvm_vcpu_on_spin() loop.
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ed061d8..12f2c91 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2506,7 +2506,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
continue;
if (vcpu == me)
continue;
- if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
+ if (swait_active(&vcpu->wq))
continue;
if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
!kvm_arch_vcpu_in_kernel(vcpu))
--
2.7.4
From: Munehisa Kamata <kamatam(a)amazon.com>
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.
EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.
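For context on why the replacement is safe, my understanding of the two
helpers (a simplified summary, not the exact kernel implementation):

	/*
	 * kill_bdev(bdev) truncates the whole block-device page cache,
	 * destroying buffer_head mappings even while a mounted
	 * filesystem still holds references to them -- which is how we
	 * reach the BUG_ON in submit_bh_wbc() above.
	 *
	 * __invalidate_device(bdev, true) only invalidates clean,
	 * unused cached pages and leaves busy buffers intact, so the
	 * filesystem sees ordinary I/O errors on the dead connection
	 * instead of crashing.
	 */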
Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
Cc: linux-block(a)vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br(a)gmail.com>
Cc: nbd(a)other.debian.org
Cc: stable(a)vger.kernel.org
Cc: David Woodhouse <dwmw(a)amazon.com>
Signed-off-by: Munehisa Kamata <kamatam(a)amazon.com>
---
I reproduced this phenomenon on the FAT filesystem.
Reproduction steps:
1. Establish an nbd connection.
2. Run two threads: one does mount and umount, the other does the clear_sock ioctl.
3. Hit the BUG_ON.
v2: Delete a link.
Signed-off-by: SunKe <sunke32(a)huawei.com>
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9bcde23..e21d2de 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1231,7 +1231,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
struct block_device *bdev)
{
sock_shutdown(nbd);
- kill_bdev(bdev);
+ __invalidate_device(bdev, true);
nbd_bdev_reset(bdev);
if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
&nbd->config->runtime_flags))
--
2.7.4
The patch titled
Subject: mm/usercopy: use memory range to be accessed for wraparound check
has been added to the -mm tree. Its filename is
mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-usercopy-use-memory-range-to-be…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-usercopy-use-memory-range-to-be…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Isaac J. Manjarres" <isaacm(a)codeaurora.org>
Subject: mm/usercopy: use memory range to be accessed for wraparound check
Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the range
of addresses that will be accessed is [ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the memory address,
when trying to read 4 KB from memory that is mapped to the last
possible page in the virtual address space, when in fact, accessing that
range of memory would not cause a wraparound to occur.
Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to wrap
around.
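To make the off-by-one concrete, a small worked example (hypothetical
addresses, 64-bit unsigned arithmetic):

	unsigned long ptr = 0xfffffffffffff000UL; /* start of the last 4 KB page */
	unsigned long n = 4096;                   /* read exactly that page */

	/* Bytes actually touched: [ptr, ptr + n - 1] == [0x...f000, 0x...ffff]. */
	/* Old check: ptr + n wraps to 0, and 0 < ptr, so the copy is falsely rejected. */
	/* New check: ptr + (n - 1) == 0xffffffffffffffff does not wrap; accepted. */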
Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeauror…
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Trilok Soni <tsoni(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/usercopy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/usercopy.c~mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check
+++ a/mm/usercopy.c
@@ -147,7 +147,7 @@ static inline void check_bogus_address(c
bool to_user)
{
/* Reject if object wraps past end of memory. */
- if (ptr + n < ptr)
+ if (ptr + (n - 1) < ptr)
usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
/* Reject if NULL or ZERO-allocation. */
_
Patches currently in -mm which might be from isaacm(a)codeaurora.org are
mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check.patch
Currently, when checking to see if accessing n bytes starting at
address "ptr" will cause a wraparound in the memory addresses,
the check in check_bogus_address() adds an extra byte, which is
incorrect, as the range of addresses that will be accessed is
[ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the
memory address, when trying to read 4 KB from memory that is
mapped to the last possible page in the virtual address
space, when in fact, accessing that range of memory would not
cause a wraparound to occur.
Use the memory range that will actually be accessed when
considering if accessing a certain amount of bytes will cause
the memory address to wrap around.
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kees Cook <keescook(a)chromium.org>
---
mm/usercopy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 2a09796..98e92486 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -147,7 +147,7 @@ static inline void check_bogus_address(const unsigned long ptr, unsigned long n,
bool to_user)
{
/* Reject if object wraps past end of memory. */
- if (ptr + n < ptr)
+ if (ptr + (n - 1) < ptr)
usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
/* Reject if NULL or ZERO-allocation. */
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Hi Sasha,
On Thu, Jul 18, 2019 at 5:45 PM Sasha Levin <sashal(a)kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
Where did you get this patch from? Since stable needs patches that are
merged in Linus' tree, shouldn't your scripts only try to backport
patches from there?
Thanks,
Rodrigo.
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 88a0d9606aff drm/i915/vbt: Parse and use the new field with PSR2 TP2/3 wakeup time.
>
> The bot has tested the following trees: v5.2.1.
> v5.2.1: Failed to apply! Possible dependencies:
> 231dcffc234f ("drm/i915/bios: add BDB block comments before definitions")
> 843444ed1301 ("drm/i915/bios: sort BDB block definitions using block ID")
> f87f6599c843 ("drm/i915/bios: reserve struct bdb_ prefix for BDB blocks")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
> --
> Thanks,
> Sasha
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx(a)lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Rodrigo Vivi
Blog: http://blog.vivi.eng.br