This is the start of the stable review cycle for the 4.20.2 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jan 13 13:10:14 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.2-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.20.2-rc1
Enric Balletbo i Serra enric.balletbo@collabora.com drm/rockchip: psr: do not dereference encoder before it is null checked.
Boris Brezillon boris.brezillon@bootlin.com drm/vc4: Set ->is_yuv to false when num_planes == 1
Lyude Paul lyude@redhat.com drm/nouveau/drm/nouveau: Check rc from drm_dp_mst_topology_mgr_resume()
Christophe Leroy christophe.leroy@c-s.fr lib: fix build failure in CONFIG_DEBUG_VIRTUAL test
Frank Rowand frank.rowand@sony.com of: __of_detach_node() - remove node from phandle cache
Frank Rowand frank.rowand@sony.com of: of_node_get()/of_node_put() nodes held in phandle cache
Lubomir Rintel lkundrak@v3.sk power: supply: olpc_battery: correct the temperature units
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: msu: Fix an off-by-one in attribute store
Christian Borntraeger borntraeger@de.ibm.com genwqe: Fix size check
Shuah Khan shuah@kernel.org selftests: Fix test errors related to lib.mk khdr target
Christian Lamparter chunkeey@gmail.com powerpc/4xx/ocm: Fix compilation error due to PAGE_KERNEL usage
Shaokun Zhang zhangshaokun@hisilicon.com drivers/perf: hisi: Fixup one DDRC PMU register offset
YueHaibing yuehaibing@huawei.com video: fbdev: pxafb: Fix "WARNING: invalid free of devm_ allocated data"
Yan, Zheng zyan@redhat.com ceph: don't update importing cap's mseq when handing cap export
Linus Torvalds torvalds@linux-foundation.org sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
Sohil Mehta sohil.mehta@intel.com iommu/vt-d: Handle domain agaw being less than iommu agaw
Steve Wise swise@opengridcomputing.com RDMA/iwcm: Don't copy past the end of dev_name() string
Bart Van Assche bvanassche@acm.org RDMA/srpt: Fix a use-after-free in the channel release code
Alexander Shishkin alexander.shishkin@linux.intel.com stm class: Fix a module refcount leak in policy creation error path
Sagi Grimberg sagi@grimberg.me rxe: fix error completion wr_id and qp_num
Dominique Martinet dominique.martinet@cea.fr 9p/net: put a lower bound on msize
Mircea Caprioru mircea.caprioru@analog.com iio: dac: ad5686: fix bit shift read register
Evan Green evgreen@chromium.org iio: adc: qcom-spmi-adc5: Initialize prescale properly
Breno Leitao leitao@debian.org powerpc/tm: Set MSR[TS] just prior to recheckpoint
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing"
J. Bruce Fields bfields@redhat.com nfsd4: zero-length WRITE should succeed
Chuck Lever chuck.lever@oracle.com xprtrdma: Yet another double DMA-unmap
Benjamin Coddington bcodding@redhat.com lockd: Show pid of lockd for remote locks
Jarkko Nikula jarkko.nikula@linux.intel.com PCI / PM: Allow runtime PM without callback functions
Ondrej Mosnacek omosnace@redhat.com selinux: policydb - fix byte order and alignment issues
Larry Finger Larry.Finger@lwfinger.net b43: Fix error in cordic routine
Andreas Gruenbacher agruenba@redhat.com gfs2: Fix loop in gfs2_rbm_find
Andreas Gruenbacher agruenba@redhat.com gfs2: Get rid of potential double-freeing in gfs2_create_inode
Vasily Averin vvs@virtuozzo.com dlm: memory leaks on error path in dlm_user_request()
Vasily Averin vvs@virtuozzo.com dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
Vasily Averin vvs@virtuozzo.com dlm: possible memory leak on error path in create_lkb()
Vasily Averin vvs@virtuozzo.com dlm: fixed memory leaks after failed ls_remove_names allocation
Jaegeuk Kim jaegeuk@kernel.org dm: do not allow readahead to limit IO size
Damien Le Moal damien.lemoal@wdc.com block: mq-deadline: Fix write completion handling
Ming Lei ming.lei@redhat.com block: deactivate blk_stat timer in wbt_disable_default()
Matthew Wilcox willy@infradead.org Fix failure path in alloc_pid()
Rafael J. Wysocki rafael.j.wysocki@intel.com driver core: Add missing dev->bus->need_parent_lock checks
Dennis Krein Dennis.Krein@netapp.com srcu: Lock srcu_data structure in srcu_gp_start()
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Always check descriptor sizes in parser code
Hui Peng benquike@163.com ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Check mixer unit descriptors more strictly
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
Dan Carpenter dan.carpenter@oracle.com ALSA: cs46xx: Potential NULL dereference in probe
Brad Love brad@nextdimension.cc media: cx23885: only reset DMA on problematic CPUs
Huang Ying ying.huang@intel.com mm, swap: fix swapoff with KSM pages
Dan Williams dan.j.williams@intel.com mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
Dan Williams dan.j.williams@intel.com mm, hmm: replace hmm_devmem_pages_create() with devm_memremap_pages()
Dan Williams dan.j.williams@intel.com mm, hmm: use devm semantics for hmm_devmem_{add, remove}
Dan Williams dan.j.williams@intel.com mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support
Vasily Averin vvs@virtuozzo.com sunrpc: use SVC_NET() in svcauth_gss_* functions
Vasily Averin vvs@virtuozzo.com sunrpc: fix cache_head leak due to queued request
Michal Hocko mhocko@suse.com memcg, oom: notify on oom killer invocation from the charge path
Dan Williams dan.j.williams@intel.com mm, devm_memremap_pages: fix shutdown handling
Dan Williams dan.j.williams@intel.com mm, devm_memremap_pages: kill mapping "System RAM" support
Dan Williams dan.j.williams@intel.com mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
Michal Hocko mhocko@suse.com hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
Minchan Kim minchan@kernel.org zram: fix double free backing device
David Herrmann dh.herrmann@gmail.com fork: record start_time late
Ewan D. Milne emilne@redhat.com scsi: lpfc: do not set queue->page_count to 0 if pc_sli4_params.wqpcnt is invalid
Steffen Maier maier@linux.ibm.com scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
-------------
Diffstat:
 Makefile | 4 +-
 arch/powerpc/kernel/signal_32.c | 38 ++-
 arch/powerpc/kernel/signal_64.c | 64 +++--
 arch/powerpc/platforms/4xx/ocm.c | 4 +-
 block/blk-mq-sched.c | 3 +-
 block/blk-mq-sched.h | 1 +
 block/blk-stat.h | 5 +
 block/blk-wbt.c | 4 +-
 block/mq-deadline.c | 12 +-
 drivers/base/dd.c | 4 +-
 drivers/block/zram/zram_drv.c | 4 +-
 drivers/dax/pmem.c | 14 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 12 +-
 drivers/gpu/drm/rockchip/rockchip_drm_psr.c | 4 +-
 drivers/gpu/drm/vc4/vc4_plane.c | 1 +
 drivers/hwtracing/intel_th/msu.c | 3 +-
 drivers/hwtracing/stm/policy.c | 12 +-
 drivers/iio/adc/qcom-spmi-adc5.c | 58 ++--
 drivers/iio/dac/ad5686.c | 3 +-
 drivers/infiniband/core/iwcm.c | 12 +-
 drivers/infiniband/sw/rxe/rxe_resp.c | 13 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c | 18 +-
 drivers/iommu/intel-iommu.c | 4 +-
 drivers/md/dm-table.c | 3 +
 drivers/media/pci/cx23885/cx23885-core.c | 55 +++-
 drivers/media/pci/cx23885/cx23885.h | 2 +
 drivers/misc/genwqe/card_utils.c | 2 +-
 drivers/net/wireless/broadcom/b43/phy_common.c | 2 +-
 drivers/nvdimm/pmem.c | 13 +-
 drivers/of/base.c | 101 +++++--
 drivers/of/dynamic.c | 3 +
 drivers/of/of_private.h | 4 +
 drivers/pci/p2pdma.c | 10 +-
 drivers/pci/pci-driver.c | 27 +-
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 4 +-
 drivers/power/supply/olpc_battery.c | 4 +-
 drivers/s390/scsi/zfcp_aux.c | 6 +-
 drivers/scsi/lpfc/lpfc_sli.c | 3 +-
 drivers/video/fbdev/pxafb.c | 4 +-
 fs/ceph/caps.c | 1 -
 fs/dlm/lock.c | 17 +-
 fs/dlm/lockspace.c | 2 +-
 fs/gfs2/inode.c | 18 +-
 fs/gfs2/rgrp.c | 2 +-
 fs/lockd/clntproc.c | 2 +-
 fs/lockd/xdr.c | 4 +-
 fs/lockd/xdr4.c | 4 +-
 fs/nfsd/nfs4proc.c | 2 -
 include/linux/hmm.h | 4 +-
 include/linux/memremap.h | 2 +
 kernel/fork.c | 13 +-
 kernel/memremap.c | 94 ++++---
 kernel/pid.c | 6 +-
 kernel/rcu/srcutree.c | 2 +
 kernel/sched/fair.c | 43 +--
 lib/test_debug_virtual.c | 1 +
 mm/hmm.c | 305 +++------------------
 mm/memcontrol.c | 20 +-
 mm/memory_hotplug.c | 16 ++
 mm/swapfile.c | 3 +-
 net/9p/client.c | 21 ++
 net/sunrpc/auth_gss/svcauth_gss.c | 8 +-
 net/sunrpc/cache.c | 10 +-
 net/sunrpc/xprtrdma/frwr_ops.c | 6 +-
 net/sunrpc/xprtrdma/verbs.c | 9 +-
 security/selinux/ss/policydb.c | 51 +++-
 sound/pci/cs46xx/dsp_spos.c | 3 +
 sound/usb/card.c | 2 +-
 sound/usb/mixer.c | 29 +-
 sound/usb/quirks-table.h | 6 +
 sound/usb/stream.c | 36 ++-
 tools/testing/nvdimm/test/iomap.c | 17 +-
 tools/testing/selftests/android/Makefile | 2 +-
 tools/testing/selftests/futex/functional/Makefile | 1 +
 tools/testing/selftests/gpio/Makefile | 6 +-
 tools/testing/selftests/kvm/Makefile | 2 +-
 tools/testing/selftests/lib.mk | 8 +-
 .../selftests/networking/timestamping/Makefile | 1 +
 tools/testing/selftests/tc-testing/bpf/Makefile | 1 +
 tools/testing/selftests/vm/Makefile | 1 +
 80 files changed, 710 insertions(+), 611 deletions(-)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steffen Maier maier@linux.ibm.com
commit 60a161b7e5b2a252ff0d4c622266a7d8da1120ce upstream.
Suppose adapter (open) recovery is between opened QDIO queues and before (the end of) initial posting of status read buffers (SRBs). This time window can be seconds long because FSF_PROT_HOST_CONNECTION_INITIALIZING by design causes looping with exponentially increasing sleeps in the function that performs exchange config data during recovery [zfcp_erp_adapter_strat_fsf_xconf()]. The recovery was triggered by a local link up.
Suppose an event occurs for which the FCP channel would send an unsolicited notification to zfcp by means of a previously posted SRB. We saw it with local cable pull (link down) in multi-initiator zoning with multiple NPIV-enabled subchannels of the same shared FCP channel.
As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial status read buffers from within the adapter's ERP thread, the channel does send an unsolicited notification.
Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules adapter->stat_work to re-fill the just consumed SRB from a work item.
Now the ERP thread and the work item post SRBs in parallel. Both contexts call the helper function zfcp_status_read_refill(). The tracking of missing (to be posted / re-filled) SRBs is not thread-safe: it uses a separate atomic_read() and atomic_dec() so that the decrement can depend on posting success. Hence, both contexts can see atomic_read(&adapter->stat_miss) == 1, and one of the two contexts posts one SRB too many. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue (trace tag "qdireq1"), leading to zfcp_erp_adapter_shutdown() in zfcp_qdio_handler_error().
An obvious and seemingly clean fix would be to schedule stat_work from the ERP thread and wait for it to finish. This would serialize all SRB re-fills. However, we already have another work item wait on the ERP thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the open port recoveries during zfcp auto port scan, but in fact it waits for any pending recovery including an adapter recovery. This approach leads to a deadlock. [see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan resiliency"); v2.6.37 commit d3e1088d6873 ("[SCSI] zfcp: No ERP escalation on gpn_ft eval"); v2.6.28 commit fca55b6fb587 ("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP") fixing v2.6.27 commit c57a39a45a76 ("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port"); v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports")]
Instead make the accounting of missing SRBs atomic for parallel execution in both the ERP thread and adapter->stat_work.
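To make the race and the new accounting easier to see, here is a minimal userspace sketch (C11 atomics, illustrative names, not the zfcp code): the broken pattern checks and decrements stat_miss in two separate steps, so two contexts can both consume the same "missing SRB" credit, while the fixed pattern mirrors atomic_add_unless(&stat_miss, -1, 0) and only decrements if the counter is still positive, in one atomic step.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int stat_miss = 1;

/* Racy variant: check and decrement are two separate operations. */
static bool take_credit_racy(void)
{
	if (atomic_load(&stat_miss) > 0) {	/* ERP thread and stat_work can both see 1 here */
		atomic_fetch_sub(&stat_miss, 1);	/* ...and both decrement, posting one SRB too many */
		return true;
	}
	return false;
}

/* Fixed variant: the moral equivalent of atomic_add_unless(&stat_miss, -1, 0). */
static bool take_credit_atomic(void)
{
	int old = atomic_load(&stat_miss);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&stat_miss, &old, old - 1))
			return true;	/* exactly one context consumes this credit */
	}
	return false;		/* counter already 0: nothing left to post */
}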
Signed-off-by: Steffen Maier maier@linux.ibm.com Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall") Cc: stable@vger.kernel.org #2.6.27+ Reviewed-by: Jens Remus jremus@linux.ibm.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/s390/scsi/zfcp_aux.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -275,16 +275,16 @@ static void zfcp_free_low_mem_buffers(st
  */
 int zfcp_status_read_refill(struct zfcp_adapter *adapter)
 {
-	while (atomic_read(&adapter->stat_miss) > 0)
+	while (atomic_add_unless(&adapter->stat_miss, -1, 0))
 		if (zfcp_fsf_status_read(adapter->qdio)) {
+			atomic_inc(&adapter->stat_miss); /* undo add -1 */
 			if (atomic_read(&adapter->stat_miss) >=
 			    adapter->stat_read_buf_num) {
 				zfcp_erp_adapter_reopen(adapter, 0, "axsref1");
 				return 1;
 			}
 			break;
-		} else
-			atomic_dec(&adapter->stat_miss);
+		}
 	return 0;
 }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ewan D. Milne emilne@redhat.com
commit 4e87eb2f46ea547d12a276b2e696ab934d16cfb6 upstream.
Certain older adapters such as the OneConnect OCe10100 may not have a valid wqpcnt value. In this case, do not set queue->page_count to 0 in lpfc_sli4_queue_alloc() as this will prevent the driver from initializing.
Fixes: 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications") Cc: stable@vger.kernel.org # 4.11+ Signed-off-by: Ewan D. Milne emilne@redhat.com Reviewed-by: Laurence Oberman loberman@redhat.com Tested-by: Laurence Oberman loberman@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/scsi/lpfc/lpfc_sli.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -14501,7 +14501,8 @@ lpfc_sli4_queue_alloc(struct lpfc_hba *p
 			hw_page_size))/hw_page_size;

 	/* If needed, Adjust page count to match the max the adapter supports */
-	if (queue->page_count > phba->sli4_hba.pc_sli4_params.wqpcnt)
+	if (phba->sli4_hba.pc_sli4_params.wqpcnt &&
+	    (queue->page_count > phba->sli4_hba.pc_sli4_params.wqpcnt))
 		queue->page_count = phba->sli4_hba.pc_sli4_params.wqpcnt;

 	INIT_LIST_HEAD(&queue->list);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Herrmann dh.herrmann@gmail.com
commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream.
This changes the fork(2) syscall to record the process start_time after initializing the basic task structure but still before making the new process visible to user-space.
Technically, we could record the start_time anytime during fork(2). But this might lead to scenarios where a start_time is recorded long before a process becomes visible to user-space. For instance, with userfaultfd(2) and TLS, user-space can delay the execution of fork(2) for an indefinite amount of time (and will, if this causes network access, or similar).
By recording the start_time late, it much closer reflects the point in time where the process becomes live and can be observed by other processes.
Lastly, this makes it much harder for user-space to predict and control the start_time it gets assigned. Previously, user-space could fork a process and stall it in copy_thread_tls() before its pid is allocated, but after its start_time is recorded. This can be misused to cycle through PIDs later on and then resume the stalled fork(2), yielding a process that has the same pid and start_time as a process that existed before. This can be used to circumvent security systems that identify processes by their pid+start_time combination.
Even though user-space was always aware that start_time recording is flaky (but several projects are known to still rely on start_time-based identification), changing the start_time to be recorded late will help mitigate existing attacks and make it much harder for user-space to control the start_time a process gets assigned.
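For context, here is a minimal userspace sketch of the pid + start_time identification scheme this change hardens. It assumes the documented /proc/<pid>/stat layout (field 22, "starttime", in clock ticks since boot); the helper name is just illustrative.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the "starttime" field (22nd) of /proc/<pid>/stat, or -1 on error. */
static long long proc_start_time(int pid)
{
	char path[64], buf[4096];
	long long starttime = -1;
	char *p;
	int field;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	p = strrchr(buf, ')');		/* skip "pid (comm)"; comm may contain spaces */
	if (!p)
		return -1;
	for (field = 3, p = strtok(p + 2, " "); p; p = strtok(NULL, " "), field++) {
		if (field == 22) {	/* starttime, in clock ticks */
			starttime = atoll(p);
			break;
		}
	}
	return starttime;
}

int main(int argc, char **argv)
{
	int pid = argc > 1 ? atoi(argv[1]) : getpid();

	printf("pid %d starttime %lld\n", pid, proc_start_time(pid));
	return 0;
}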
Reported-by: Jann Horn jannh@google.com Signed-off-by: Tom Gundersen teg@jklm.no Signed-off-by: David Herrmann dh.herrmann@gmail.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/fork.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1837,8 +1837,6 @@ static __latent_entropy struct task_stru

 	posix_cpu_timers_init(p);

-	p->start_time = ktime_get_ns();
-	p->real_start_time = ktime_get_boot_ns();
 	p->io_context = NULL;
 	audit_set_context(p, NULL);
 	cgroup_fork(p);
@@ -2005,6 +2003,17 @@ static __latent_entropy struct task_stru
 		goto bad_fork_free_pid;

 	/*
+	 * From this point on we must avoid any synchronous user-space
+	 * communication until we take the tasklist-lock. In particular, we do
+	 * not want user-space to be able to predict the process start-time by
+	 * stalling fork(2) after we recorded the start_time but before it is
+	 * visible to the system.
+	 */
+
+	p->start_time = ktime_get_ns();
+	p->real_start_time = ktime_get_boot_ns();
+
+	/*
 	 * Make it visible to the rest of the system, but dont wake it up yet.
 	 * Need tasklist lock for parent etc handling!
 	 */
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Minchan Kim minchan@kernel.org
commit 5547932dc67a48713eece4fa4703bfdf0cfcb818 upstream.
If blkdev_get() fails, we should not call blkdev_put(). Otherwise, the kernel emits the log below. This patch fixes it.
WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120
Modules linked in:
CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:blkdev_put+0x105/0x120
Call Trace:
 __x64_sys_swapoff+0x46d/0x490
 do_syscall_64+0x5a/0x190
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
irq event stamp: 4466
hardirqs last enabled at (4465): __free_pages_ok+0x1e3/0x490
hardirqs last disabled at (4466): trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last enabled at (3420): __do_softirq+0x333/0x446
softirqs last disabled at (3407): irq_exit+0xd1/0xe0
Link: http://lkml.kernel.org/r/20181127055429.251614-3-minchan@kernel.org Signed-off-by: Minchan Kim minchan@kernel.org Reviewed-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Reviewed-by: Joey Pabalinas joeypabalinas@gmail.com Cc: stable@vger.kernel.org [4.14+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/block/zram/zram_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -382,8 +382,10 @@ static ssize_t backing_dev_store(struct

 	bdev = bdgrab(I_BDEV(inode));
 	err = blkdev_get(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
-	if (err < 0)
+	if (err < 0) {
+		bdev = NULL;
 		goto out;
+	}

 	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
 	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Hocko mhocko@suse.com
commit b15c87263a69272423771118c653e9a1d0672caa upstream.
We have received a bug report that an injected MCE about faulty memory prevents memory offline from succeeding on a 4.4 base kernel. The underlying reason was that the HWPoison page has an elevated reference count and the migration keeps failing. There are two problems with that. First of all, it is dubious to migrate the poisoned page because we know that accessing that memory can fail. Secondly, it doesn't make any sense to migrate potentially broken content and preserve the memory corruption over to a new location.
Oscar has found out that 4.4 and the current upstream kernels behave slightly differently with his simple testcase
===
int main(void)
{
	int ret;
	int i;
	int fd;
	char *array = malloc(4096);
	char *array_locked = malloc(4096);

	fd = open("/tmp/data", O_RDONLY);
	read(fd, array, 4095);

	for (i = 0; i < 4096; i++)
		array_locked[i] = 'd';

	ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
	if (ret)
		perror("mlock");

	sleep (20);

	ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
	if (ret)
		perror("madvise");

	for (i = 0; i < 4096; i++)
		array_locked[i] = 'd';

	return 0;
}
===
+ offline this memory.
In 4.4 kernels he saw the hwpoisoned page returned back to the LRU list:
kernel: [<ffffffff81019ac9>] dump_trace+0x59/0x340
kernel: [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
kernel: [<ffffffff8101ac71>] show_stack+0x21/0x40
kernel: [<ffffffff8132bb90>] dump_stack+0x5c/0x7c
kernel: [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0
kernel: [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160
kernel: [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100
kernel: [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0
kernel: [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70
kernel: [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs]
kernel: [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200
kernel: [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0
kernel: [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660
kernel: [<ffffffff8120e50d>] __vfs_read+0xcd/0x140
kernel: [<ffffffff8120e9ea>] vfs_read+0x7a/0x120
kernel: [<ffffffff8121404b>] kernel_read+0x3b/0x50
kernel: [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0
kernel: [<ffffffff81215f08>] do_execve+0x28/0x30
kernel: [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130
kernel: [<ffffffff8161c045>] ret_from_fork+0x55/0x80
The latter confuses the hotremove path because an LRU page is attempted to be migrated and that fails due to an elevated reference count. It is quite possible that the reuse of the HWPoisoned page is some kind of fixed race condition, but I am not really sure about that.
With the upstream kernel the failure is slightly different. The page doesn't seem to have the LRU bit set, but isolate_movable_page() simply fails, do_migrate_range() puts all the isolated pages back on the LRU, and therefore no progress is made and scan_movable_pages() finds the same set of pages over and over again.
Fix both cases by explicitly checking for HWPoisoned pages before we even try to get a reference on the page, and try to unmap it if it is still mapped. As explained by Naoya:
: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.
Also put a WARN_ON(PageLRU) in case there is a race and we can hit LRU HWPoison pages, which shouldn't happen, but I couldn't convince myself about that. Naoya has noted the following:
: Theoretically no such guarantee, because try_to_unmap() doesn't have a
: guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also implies
: that the target page can still have PageLRU flag.
:
:        /*
:         * Torn down by someone else?
:         */
:        if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
:                action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
:                res = -EBUSY;
:                goto out;
:        }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.
Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org Signed-off-by: Michal Hocko mhocko@suse.com Reviewed-by: Oscar Salvador osalvador@suse.com Debugged-by: Oscar Salvador osalvador@suse.com Tested-by: Oscar Salvador osalvador@suse.com Acked-by: David Hildenbrand david@redhat.com Acked-by: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/memory_hotplug.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
--- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -34,6 +34,7 @@ #include <linux/hugetlb.h> #include <linux/memblock.h> #include <linux/compaction.h> +#include <linux/rmap.h>
#include <asm/tlbflush.h>
@@ -1369,6 +1370,21 @@ do_migrate_range(unsigned long start_pfn pfn = page_to_pfn(compound_head(page)) + hpage_nr_pages(page) - 1;
+ /* + * HWPoison pages have elevated reference counts so the migration would + * fail on them. It also doesn't make any sense to migrate them in the + * first place. Still try to unmap such a page in case it is still mapped + * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep + * the unmap as the catch all safety net). + */ + if (PageHWPoison(page)) { + if (WARN_ON(PageLRU(page))) + isolate_lru_page(page); + if (page_mapped(page)) + try_to_unmap(page, TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS); + continue; + } + if (!get_page_unless_zero(page)) continue; /*
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 808153e1187fa77ac7d7dad261ff476888dcf398 upstream.
devm_memremap_pages() is a facility that can create struct page entries for any arbitrary range and give drivers the ability to subvert core aspects of page management.
Specifically the facility is tightly integrated with the kernel's memory hotplug functionality. It injects an altmap argument deep into the architecture specific vmemmap implementation to allow allocating from specific reserved pages, and it has Linux specific assumptions about page structure reference counting relative to get_user_pages() and get_user_pages_fast(). It was an oversight and a mistake that this was not marked EXPORT_SYMBOL_GPL from the outset.
Again, devm_memremap_pages() exposes and relies upon core kernel internal assumptions and will continue to evolve along with 'struct page', memory hotplug, and support for new memory types / topologies. Only an in-kernel GPL-only driver is expected to keep up with this ongoing evolution. This interface, and functionality derived from this interface, is not suitable for kernel-external drivers.
Link: http://lkml.kernel.org/r/154275557457.76910.16923571232582744134.stgit@dwill... Signed-off-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Michal Hocko mhocko@suse.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Balbir Singh bsingharora@gmail.com Cc: Logan Gunthorpe logang@deltatee.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/memremap.c | 2 +-
 tools/testing/nvdimm/test/iomap.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -233,7 +233,7 @@ void *devm_memremap_pages(struct device
 err_array:
 	return ERR_PTR(error);
 }
-EXPORT_SYMBOL(devm_memremap_pages);
+EXPORT_SYMBOL_GPL(devm_memremap_pages);

 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 {
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -113,7 +113,7 @@ void *__wrap_devm_memremap_pages(struct
 		return nfit_res->buf + offset - nfit_res->res.start;
 	return devm_memremap_pages(dev, pgmap);
 }
-EXPORT_SYMBOL(__wrap_devm_memremap_pages);
+EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);

 pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
 {
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 06489cfbd915ff36c8e36df27f1c2dc60f97ca56 upstream.
Given the fact that devm_memremap_pages() requires a percpu_ref that is torn down by devm_memremap_pages_release() the current support for mapping RAM is broken.
Support for remapping "System RAM" has been broken since the beginning and there is no existing user of this code path, so just kill the support and make it an explicit error.
This cleanup also simplifies a follow-on patch to fix the error path when setting a devm release action for devm_memremap_pages_release() fails.
Link: http://lkml.kernel.org/r/154275557997.76910.14689813630968180480.stgit@dwill... Signed-off-by: Dan Williams dan.j.williams@intel.com Reviewed-by: "Jérôme Glisse" jglisse@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Logan Gunthorpe logang@deltatee.com Cc: Balbir Singh bsingharora@gmail.com Cc: Michal Hocko mhocko@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/memremap.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -167,15 +167,12 @@ void *devm_memremap_pages(struct device
 	is_ram = region_intersects(align_start, align_size,
 		IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);

-	if (is_ram == REGION_MIXED) {
-		WARN_ONCE(1, "%s attempted on mixed region %pr\n",
-				__func__, res);
+	if (is_ram != REGION_DISJOINT) {
+		WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__,
+				is_ram == REGION_MIXED ? "mixed" : "ram", res);
 		return ERR_PTR(-ENXIO);
 	}

-	if (is_ram == REGION_INTERSECTS)
-		return __va(res->start);
-
 	if (!pgmap->ref)
 		return ERR_PTR(-EINVAL);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit a95c90f1e2c253b280385ecf3d4ebfe476926b28 upstream.
The last step before devm_memremap_pages() returns success is to allocate a release action, devm_memremap_pages_release(), to tear the entire setup down. However, the result from devm_add_action() is not checked.
Checking the error from devm_add_action() is not enough. The api currently relies on the fact that the percpu_ref it is using is killed by the time the devm_memremap_pages_release() is run. Rather than continue this awkward situation, offload the responsibility of killing the percpu_ref to devm_memremap_pages_release() directly. This allows devm_memremap_pages() to do the right thing relative to init failures and shutdown.
Without this change we could fail to register the teardown of devm_memremap_pages(). The likelihood of hitting this failure is tiny as small memory allocations almost always succeed. However, the impact of the failure is large given any future reconfiguration, or disable/enable, of an nvdimm namespace will fail forever as subsequent calls to devm_memremap_pages() will fail to setup the pgmap_radix since there will be stale entries for the physical address range.
An argument could be made to require that the ->kill() operation be set in the @pgmap arg rather than passed in separately. However, it helps code readability, tracking the lifetime of a given instance, to be able to grep the kill routine directly at the devm_memremap_pages() call site.
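As a rough sketch of the resulting calling convention (made-up driver names; the real conversions are in the dax_pmem and nvdimm hunks below), a pgmap user now supplies both ->ref and ->kill up front and drops its hand-rolled unwind code, since devm_memremap_pages() owns the teardown ordering either way:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/percpu-refcount.h>

struct my_pgmap_drv {				/* illustrative private driver state */
	struct percpu_ref ref;			/* assumed percpu_ref_init()ed by the caller */
	struct dev_pagemap pgmap;
};

static void my_pgmap_kill(struct percpu_ref *ref)
{
	percpu_ref_kill(ref);			/* stop new page references; release path waits */
}

static int my_pgmap_attach(struct device *dev, struct my_pgmap_drv *drv)
{
	void *addr;

	drv->pgmap.ref = &drv->ref;
	drv->pgmap.kill = my_pgmap_kill;	/* new: a NULL ->kill is now rejected */

	addr = devm_memremap_pages(dev, &drv->pgmap);
	if (IS_ERR(addr))
		return PTR_ERR(addr);		/* init failure already unwound, including ->kill */

	return 0;
}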
Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwilli... Signed-off-by: Dan Williams dan.j.williams@intel.com Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") Reviewed-by: "Jérôme Glisse" jglisse@redhat.com Reported-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Christoph Hellwig hch@lst.de Cc: Balbir Singh bsingharora@gmail.com Cc: Michal Hocko mhocko@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/dax/pmem.c | 14 +++-----------
 drivers/nvdimm/pmem.c | 13 +++++--------
 include/linux/memremap.h | 2 ++
 kernel/memremap.c | 30 ++++++++++++++----------------
 tools/testing/nvdimm/test/iomap.c | 15 ++++++++++++++-
 5 files changed, 38 insertions(+), 36 deletions(-)
--- a/drivers/dax/pmem.c +++ b/drivers/dax/pmem.c @@ -48,9 +48,8 @@ static void dax_pmem_percpu_exit(void *d percpu_ref_exit(ref); }
-static void dax_pmem_percpu_kill(void *data) +static void dax_pmem_percpu_kill(struct percpu_ref *ref) { - struct percpu_ref *ref = data; struct dax_pmem *dax_pmem = to_dax_pmem(ref);
dev_dbg(dax_pmem->dev, "trace\n"); @@ -112,17 +111,10 @@ static int dax_pmem_probe(struct device }
dax_pmem->pgmap.ref = &dax_pmem->ref; + dax_pmem->pgmap.kill = dax_pmem_percpu_kill; addr = devm_memremap_pages(dev, &dax_pmem->pgmap); - if (IS_ERR(addr)) { - devm_remove_action(dev, dax_pmem_percpu_exit, &dax_pmem->ref); - percpu_ref_exit(&dax_pmem->ref); + if (IS_ERR(addr)) return PTR_ERR(addr); - } - - rc = devm_add_action_or_reset(dev, dax_pmem_percpu_kill, - &dax_pmem->ref); - if (rc) - return rc;
/* adjust the dax_region resource to the start of data */ memcpy(&res, &dax_pmem->pgmap.res, sizeof(res)); --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -309,8 +309,11 @@ static void pmem_release_queue(void *q) blk_cleanup_queue(q); }
-static void pmem_freeze_queue(void *q) +static void pmem_freeze_queue(struct percpu_ref *ref) { + struct request_queue *q; + + q = container_of(ref, typeof(*q), q_usage_counter); blk_freeze_queue_start(q); }
@@ -402,6 +405,7 @@ static int pmem_attach_disk(struct devic
pmem->pfn_flags = PFN_DEV; pmem->pgmap.ref = &q->q_usage_counter; + pmem->pgmap.kill = pmem_freeze_queue; if (is_nd_pfn(dev)) { if (setup_pagemap_fsdax(dev, &pmem->pgmap)) return -ENOMEM; @@ -427,13 +431,6 @@ static int pmem_attach_disk(struct devic memcpy(&bb_res, &nsio->res, sizeof(bb_res)); }
- /* - * At release time the queue must be frozen before - * devm_memremap_pages is unwound - */ - if (devm_add_action_or_reset(dev, pmem_freeze_queue, q)) - return -ENOMEM; - if (IS_ERR(addr)) return PTR_ERR(addr); pmem->virt_addr = addr; --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -111,6 +111,7 @@ typedef void (*dev_page_free_t)(struct p * @altmap: pre-allocated/reserved memory for vmemmap allocations * @res: physical address range covered by @ref * @ref: reference count that pins the devm_memremap_pages() mapping + * @kill: callback to transition @ref to the dead state * @dev: host device of the mapping for debug * @data: private data pointer for page_free() * @type: memory type: see MEMORY_* in memory_hotplug.h @@ -122,6 +123,7 @@ struct dev_pagemap { bool altmap_valid; struct resource res; struct percpu_ref *ref; + void (*kill)(struct percpu_ref *ref); struct device *dev; void *data; enum memory_type type; --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -88,14 +88,10 @@ static void devm_memremap_pages_release( resource_size_t align_start, align_size; unsigned long pfn;
+ pgmap->kill(pgmap->ref); for_each_device_pfn(pfn, pgmap) put_page(pfn_to_page(pfn));
- if (percpu_ref_tryget_live(pgmap->ref)) { - dev_WARN(dev, "%s: page mapping is still live!\n", __func__); - percpu_ref_put(pgmap->ref); - } - /* pages are dead and unused, undo the arch mapping */ align_start = res->start & ~(SECTION_SIZE - 1); align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) @@ -116,7 +112,7 @@ static void devm_memremap_pages_release( /** * devm_memremap_pages - remap and provide memmap backing for the given resource * @dev: hosting device for @res - * @pgmap: pointer to a struct dev_pgmap + * @pgmap: pointer to a struct dev_pagemap * * Notes: * 1/ At a minimum the res, ref and type members of @pgmap must be initialized @@ -125,11 +121,8 @@ static void devm_memremap_pages_release( * 2/ The altmap field may optionally be initialized, in which case altmap_valid * must be set to true * - * 3/ pgmap.ref must be 'live' on entry and 'dead' before devm_memunmap_pages() - * time (or devm release event). The expected order of events is that ref has - * been through percpu_ref_kill() before devm_memremap_pages_release(). The - * wait for the completion of all references being dropped and - * percpu_ref_exit() must occur after devm_memremap_pages_release(). + * 3/ pgmap->ref must be 'live' on entry and will be killed at + * devm_memremap_pages_release() time, or if this routine fails. * * 4/ res is expected to be a host memory range that could feasibly be * treated as a "System RAM" range, i.e. not a device mmio range, but @@ -145,6 +138,9 @@ void *devm_memremap_pages(struct device pgprot_t pgprot = PAGE_KERNEL; int error, nid, is_ram;
+ if (!pgmap->ref || !pgmap->kill) + return ERR_PTR(-EINVAL); + align_start = res->start & ~(SECTION_SIZE - 1); align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) - align_start; @@ -170,12 +166,10 @@ void *devm_memremap_pages(struct device if (is_ram != REGION_DISJOINT) { WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__, is_ram == REGION_MIXED ? "mixed" : "ram", res); - return ERR_PTR(-ENXIO); + error = -ENXIO; + goto err_array; }
- if (!pgmap->ref) - return ERR_PTR(-EINVAL); - pgmap->dev = dev;
error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(res->start), @@ -217,7 +211,10 @@ void *devm_memremap_pages(struct device align_size >> PAGE_SHIFT, pgmap); percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
- devm_add_action(dev, devm_memremap_pages_release, pgmap); + error = devm_add_action_or_reset(dev, devm_memremap_pages_release, + pgmap); + if (error) + return ERR_PTR(error);
return __va(res->start);
@@ -228,6 +225,7 @@ void *devm_memremap_pages(struct device err_pfn_remap: pgmap_array_delete(res); err_array: + pgmap->kill(pgmap->ref); return ERR_PTR(error); } EXPORT_SYMBOL_GPL(devm_memremap_pages); --- a/tools/testing/nvdimm/test/iomap.c +++ b/tools/testing/nvdimm/test/iomap.c @@ -104,13 +104,26 @@ void *__wrap_devm_memremap(struct device } EXPORT_SYMBOL(__wrap_devm_memremap);
+static void nfit_test_kill(void *_pgmap) +{ + struct dev_pagemap *pgmap = _pgmap; + + pgmap->kill(pgmap->ref); +} + void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) { resource_size_t offset = pgmap->res.start; struct nfit_test_resource *nfit_res = get_nfit_res(offset);
- if (nfit_res) + if (nfit_res) { + int rc; + + rc = devm_add_action_or_reset(dev, nfit_test_kill, pgmap); + if (rc) + return ERR_PTR(rc); return nfit_res->buf + offset - nfit_res->res.start; + } return devm_memremap_pages(dev, pgmap); } EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Hocko mhocko@suse.com
commit 7056d3a37d2c6aaaab10c13e8e69adc67ec1fc65 upstream.
Burt Holzman has noticed that memcg v1 doesn't notify about OOM events via eventfd anymore. The reason is that 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") has moved the oom handling back to the charge path. While doing so the notification was left behind in mem_cgroup_oom_synchronize.
Fix the issue by replicating the oom hierarchy locking and the notification.
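The notification in question is the cgroup v1 eventfd mechanism; for reference, a minimal userspace listener might look like the sketch below (the cgroup path is an assumption about where the v1 memory controller is mounted):

#include <sys/eventfd.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *cg = "/sys/fs/cgroup/memory/mygroup";	/* assumed v1 mount + group */
	char path[256], buf[64];
	int efd, ofd, cfd;
	uint64_t events;

	efd = eventfd(0, 0);
	snprintf(path, sizeof(path), "%s/memory.oom_control", cg);
	ofd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", cg);
	cfd = open(path, O_WRONLY);
	if (efd < 0 || ofd < 0 || cfd < 0)
		return 1;

	/* register the listener: "<eventfd fd> <memory.oom_control fd>" */
	snprintf(buf, sizeof(buf), "%d %d", efd, ofd);
	if (write(cfd, buf, strlen(buf)) < 0)
		return 1;

	/* blocks until the kernel signals an OOM event for this memcg */
	if (read(efd, &events, sizeof(events)) == sizeof(events))
		printf("memcg OOM events: %llu\n", (unsigned long long)events);
	return 0;
}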
Link: http://lkml.kernel.org/r/20181224091107.18354-1-mhocko@kernel.org Fixes: 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") Signed-off-by: Michal Hocko mhocko@suse.com Reported-by: Burt Holzman burt@fnal.gov Acked-by: Johannes Weiner hannes@cmpxchg.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: stable@vger.kernel.org [4.19+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/memcontrol.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1666,6 +1666,9 @@ enum oom_status {
static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order) { + enum oom_status ret; + bool locked; + if (order > PAGE_ALLOC_COSTLY_ORDER) return OOM_SKIPPED;
@@ -1700,10 +1703,23 @@ static enum oom_status mem_cgroup_oom(st return OOM_ASYNC; }
+ mem_cgroup_mark_under_oom(memcg); + + locked = mem_cgroup_oom_trylock(memcg); + + if (locked) + mem_cgroup_oom_notify(memcg); + + mem_cgroup_unmark_under_oom(memcg); if (mem_cgroup_out_of_memory(memcg, mask, order)) - return OOM_SUCCESS; + ret = OOM_SUCCESS; + else + ret = OOM_FAILED; + + if (locked) + mem_cgroup_oom_unlock(memcg);
- return OOM_FAILED; + return ret; }
/**
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit 4ecd55ea074217473f94cfee21bb72864d39f8d7 upstream.
After commit d202cce8963d, an expired cache_head can be removed from the cache_detail's hash.
However, the expired cache_head may be waiting for a reply from a previously submitted request. Such a cache_head has an increased refcounter and therefore it won't be freed after cache_put(freeme).
Because the cache_head was removed from the hash it cannot be found during cache_clean() and can be leaked forever, together with stalled cache_request and other taken resources.
In our case we noticed it because an entry in the export cache was holding a reference on a filesystem.
Fixes: d202cce8963d ("sunrpc: never return expired entries in sunrpc_cache_lookup") Cc: Pavel Tikhomirov ptikhomirov@virtuozzo.com Cc: stable@kernel.org # 2.6.35 Signed-off-by: Vasily Averin vvs@virtuozzo.com Reviewed-by: NeilBrown neilb@suse.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sunrpc/cache.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/net/sunrpc/cache.c +++ b/net/sunrpc/cache.c @@ -54,6 +54,11 @@ static void cache_init(struct cache_head h->last_refresh = now; }
+static void cache_fresh_locked(struct cache_head *head, time_t expiry, + struct cache_detail *detail); +static void cache_fresh_unlocked(struct cache_head *head, + struct cache_detail *detail); + static struct cache_head *sunrpc_cache_find_rcu(struct cache_detail *detail, struct cache_head *key, int hash) @@ -100,6 +105,7 @@ static struct cache_head *sunrpc_cache_a if (cache_is_expired(detail, tmp)) { hlist_del_init_rcu(&tmp->cache_list); detail->entries --; + cache_fresh_locked(tmp, 0, detail); freeme = tmp; break; } @@ -115,8 +121,10 @@ static struct cache_head *sunrpc_cache_a cache_get(new); spin_unlock(&detail->hash_lock);
- if (freeme) + if (freeme) { + cache_fresh_unlocked(freeme, detail); cache_put(freeme, detail); + } return new; }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit b8be5674fa9a6f3677865ea93f7803c4212f3e10 upstream.
Signed-off-by: Vasily Averin vvs@virtuozzo.com Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sunrpc/auth_gss/svcauth_gss.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1142,7 +1142,7 @@ static int svcauth_gss_legacy_init(struc struct kvec *resv = &rqstp->rq_res.head[0]; struct rsi *rsip, rsikey; int ret; - struct sunrpc_net *sn = net_generic(rqstp->rq_xprt->xpt_net, sunrpc_net_id); + struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);
memset(&rsikey, 0, sizeof(rsikey)); ret = gss_read_verf(gc, argv, authp, @@ -1253,7 +1253,7 @@ static int svcauth_gss_proxy_init(struct uint64_t handle; int status; int ret; - struct net *net = rqstp->rq_xprt->xpt_net; + struct net *net = SVC_NET(rqstp); struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
memset(&ud, 0, sizeof(ud)); @@ -1444,7 +1444,7 @@ svcauth_gss_accept(struct svc_rqst *rqst __be32 *rpcstart; __be32 *reject_stat = resv->iov_base + resv->iov_len; int ret; - struct sunrpc_net *sn = net_generic(rqstp->rq_xprt->xpt_net, sunrpc_net_id); + struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);
dprintk("RPC: svcauth_gss: argv->iov_len = %zd\n", argv->iov_len); @@ -1734,7 +1734,7 @@ svcauth_gss_release(struct svc_rqst *rqs struct rpc_gss_wire_cred *gc = &gsd->clcred; struct xdr_buf *resbuf = &rqstp->rq_res; int stat = -EINVAL; - struct sunrpc_net *sn = net_generic(rqstp->rq_xprt->xpt_net, sunrpc_net_id); + struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);
if (gc->gc_proc != RPC_GSS_PROC_DATA) goto out;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 69324b8f48339de2f90fdf2f774687fc6c47629a upstream.
In preparation for consolidating all ZONE_DEVICE enabling via devm_memremap_pages(), teach it how to handle the constraints of MEMORY_DEVICE_PRIVATE ranges.
[jglisse@redhat.com: call move_pfn_range_to_zone for MEMORY_DEVICE_PRIVATE] Link: http://lkml.kernel.org/r/154275559036.76910.12434636179931292607.stgit@dwill... Signed-off-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Jérôme Glisse jglisse@redhat.com Acked-by: Christoph Hellwig hch@lst.de Reported-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Logan Gunthorpe logang@deltatee.com Cc: Balbir Singh bsingharora@gmail.com Cc: Michal Hocko mhocko@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/memremap.c | 53 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 41 insertions(+), 12 deletions(-)
--- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -98,9 +98,15 @@ static void devm_memremap_pages_release( - align_start;
mem_hotplug_begin(); - arch_remove_memory(align_start, align_size, pgmap->altmap_valid ? - &pgmap->altmap : NULL); - kasan_remove_zero_shadow(__va(align_start), align_size); + if (pgmap->type == MEMORY_DEVICE_PRIVATE) { + pfn = align_start >> PAGE_SHIFT; + __remove_pages(page_zone(pfn_to_page(pfn)), pfn, + align_size >> PAGE_SHIFT, NULL); + } else { + arch_remove_memory(align_start, align_size, + pgmap->altmap_valid ? &pgmap->altmap : NULL); + kasan_remove_zero_shadow(__va(align_start), align_size); + } mem_hotplug_done();
untrack_pfn(NULL, PHYS_PFN(align_start), align_size); @@ -187,17 +193,40 @@ void *devm_memremap_pages(struct device goto err_pfn_remap;
mem_hotplug_begin(); - error = kasan_add_zero_shadow(__va(align_start), align_size); - if (error) { - mem_hotplug_done(); - goto err_kasan; + + /* + * For device private memory we call add_pages() as we only need to + * allocate and initialize struct page for the device memory. More- + * over the device memory is un-accessible thus we do not want to + * create a linear mapping for the memory like arch_add_memory() + * would do. + * + * For all other device memory types, which are accessible by + * the CPU, we do want the linear mapping and thus use + * arch_add_memory(). + */ + if (pgmap->type == MEMORY_DEVICE_PRIVATE) { + error = add_pages(nid, align_start >> PAGE_SHIFT, + align_size >> PAGE_SHIFT, NULL, false); + } else { + error = kasan_add_zero_shadow(__va(align_start), align_size); + if (error) { + mem_hotplug_done(); + goto err_kasan; + } + + error = arch_add_memory(nid, align_start, align_size, altmap, + false); + } + + if (!error) { + struct zone *zone; + + zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE]; + move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT, + align_size >> PAGE_SHIFT, altmap); }
- error = arch_add_memory(nid, align_start, align_size, altmap, false); - if (!error) - move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], - align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, altmap); mem_hotplug_done(); if (error) goto err_add_memory;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 58ef15b765af0d2cbe6799ec564f1dc485010ab8 upstream.
devm semantics arrange for resources to be torn down when device-driver-probe fails or when device-driver-release completes. Similar to devm_memremap_pages() there is no need to support an explicit remove operation when the users properly adhere to devm semantics.
Note that devm_kzalloc() automatically handles allocating node-local memory.
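To illustrate why the explicit error unwinding can go away (a generic sketch with assumed names, not the hmm code): devm_add_action_or_reset() runs the release action immediately if registering it fails, so every error path collapses to a plain return and the normal teardown happens automatically on driver unbind.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/slab.h>

static void my_release(void *data)
{
	kfree(data);			/* example teardown, run automatically on unbind */
}

static void *my_setup(struct device *dev, size_t size)
{
	void *buf = kzalloc(size, GFP_KERNEL);
	int ret;

	if (!buf)
		return ERR_PTR(-ENOMEM);

	ret = devm_add_action_or_reset(dev, my_release, buf);
	if (ret)
		return ERR_PTR(ret);	/* my_release(buf) has already been called */

	return buf;
}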
Link: http://lkml.kernel.org/r/154275559545.76910.9186690723515469051.stgit@dwilli... Signed-off-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Jérôme Glisse jglisse@redhat.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Logan Gunthorpe logang@deltatee.com Cc: Balbir Singh bsingharora@gmail.com Cc: Michal Hocko mhocko@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/hmm.h | 4 -
 mm/hmm.c | 127 +++++++++-------------------------------------
 2 files changed, 25 insertions(+), 106 deletions(-)
--- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -512,8 +512,7 @@ struct hmm_devmem { * enough and allocate struct page for it. * * The device driver can wrap the hmm_devmem struct inside a private device - * driver struct. The device driver must call hmm_devmem_remove() before the - * device goes away and before freeing the hmm_devmem struct memory. + * driver struct. */ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops, struct device *device, @@ -521,7 +520,6 @@ struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops, struct device *device, struct resource *res); -void hmm_devmem_remove(struct hmm_devmem *devmem);
/* * hmm_devmem_page_set_drvdata - set per-page driver data field --- a/mm/hmm.c +++ b/mm/hmm.c @@ -987,7 +987,6 @@ static void hmm_devmem_ref_exit(void *da
devmem = container_of(ref, struct hmm_devmem, ref); percpu_ref_exit(ref); - devm_remove_action(devmem->device, &hmm_devmem_ref_exit, data); }
static void hmm_devmem_ref_kill(void *data) @@ -998,7 +997,6 @@ static void hmm_devmem_ref_kill(void *da devmem = container_of(ref, struct hmm_devmem, ref); percpu_ref_kill(ref); wait_for_completion(&devmem->completion); - devm_remove_action(devmem->device, &hmm_devmem_ref_kill, data); }
static int hmm_devmem_fault(struct vm_area_struct *vma, @@ -1036,7 +1034,7 @@ static void hmm_devmem_radix_release(str mutex_unlock(&hmm_devmem_lock); }
-static void hmm_devmem_release(struct device *dev, void *data) +static void hmm_devmem_release(void *data) { struct hmm_devmem *devmem = data; struct resource *resource = devmem->resource; @@ -1044,11 +1042,6 @@ static void hmm_devmem_release(struct de struct zone *zone; struct page *page;
- if (percpu_ref_tryget_live(&devmem->ref)) { - dev_WARN(dev, "%s: page mapping is still live!\n", __func__); - percpu_ref_put(&devmem->ref); - } - /* pages are dead and unused, undo the arch mapping */ start_pfn = (resource->start & ~(PA_SECTION_SIZE - 1)) >> PAGE_SHIFT; npages = ALIGN(resource_size(resource), PA_SECTION_SIZE) >> PAGE_SHIFT; @@ -1174,19 +1167,6 @@ error: return ret; }
-static int hmm_devmem_match(struct device *dev, void *data, void *match_data) -{ - struct hmm_devmem *devmem = data; - - return devmem->resource == match_data; -} - -static void hmm_devmem_pages_remove(struct hmm_devmem *devmem) -{ - devres_release(devmem->device, &hmm_devmem_release, - &hmm_devmem_match, devmem->resource); -} - /* * hmm_devmem_add() - hotplug ZONE_DEVICE memory for device memory * @@ -1214,8 +1194,7 @@ struct hmm_devmem *hmm_devmem_add(const
dev_pagemap_get_ops();
- devmem = devres_alloc_node(&hmm_devmem_release, sizeof(*devmem), - GFP_KERNEL, dev_to_node(device)); + devmem = devm_kzalloc(device, sizeof(*devmem), GFP_KERNEL); if (!devmem) return ERR_PTR(-ENOMEM);
@@ -1229,11 +1208,11 @@ struct hmm_devmem *hmm_devmem_add(const ret = percpu_ref_init(&devmem->ref, &hmm_devmem_ref_release, 0, GFP_KERNEL); if (ret) - goto error_percpu_ref; + return ERR_PTR(ret);
- ret = devm_add_action(device, hmm_devmem_ref_exit, &devmem->ref); + ret = devm_add_action_or_reset(device, hmm_devmem_ref_exit, &devmem->ref); if (ret) - goto error_devm_add_action; + return ERR_PTR(ret);
size = ALIGN(size, PA_SECTION_SIZE); addr = min((unsigned long)iomem_resource.end, @@ -1253,16 +1232,12 @@ struct hmm_devmem *hmm_devmem_add(const
devmem->resource = devm_request_mem_region(device, addr, size, dev_name(device)); - if (!devmem->resource) { - ret = -ENOMEM; - goto error_no_resource; - } + if (!devmem->resource) + return ERR_PTR(-ENOMEM); break; } - if (!devmem->resource) { - ret = -ERANGE; - goto error_no_resource; - } + if (!devmem->resource) + return ERR_PTR(-ERANGE);
devmem->resource->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY; devmem->pfn_first = devmem->resource->start >> PAGE_SHIFT; @@ -1271,28 +1246,13 @@ struct hmm_devmem *hmm_devmem_add(const
ret = hmm_devmem_pages_create(devmem); if (ret) - goto error_pages; - - devres_add(device, devmem); + return ERR_PTR(ret);
- ret = devm_add_action(device, hmm_devmem_ref_kill, &devmem->ref); - if (ret) { - hmm_devmem_remove(devmem); + ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem); + if (ret) return ERR_PTR(ret); - }
return devmem; - -error_pages: - devm_release_mem_region(device, devmem->resource->start, - resource_size(devmem->resource)); -error_no_resource: -error_devm_add_action: - hmm_devmem_ref_kill(&devmem->ref); - hmm_devmem_ref_exit(&devmem->ref); -error_percpu_ref: - devres_free(devmem); - return ERR_PTR(ret); } EXPORT_SYMBOL(hmm_devmem_add);
@@ -1308,8 +1268,7 @@ struct hmm_devmem *hmm_devmem_add_resour
dev_pagemap_get_ops();
- devmem = devres_alloc_node(&hmm_devmem_release, sizeof(*devmem), - GFP_KERNEL, dev_to_node(device)); + devmem = devm_kzalloc(device, sizeof(*devmem), GFP_KERNEL); if (!devmem) return ERR_PTR(-ENOMEM);
@@ -1323,12 +1282,12 @@ struct hmm_devmem *hmm_devmem_add_resour ret = percpu_ref_init(&devmem->ref, &hmm_devmem_ref_release, 0, GFP_KERNEL); if (ret) - goto error_percpu_ref; + return ERR_PTR(ret);
- ret = devm_add_action(device, hmm_devmem_ref_exit, &devmem->ref); + ret = devm_add_action_or_reset(device, hmm_devmem_ref_exit, + &devmem->ref); if (ret) - goto error_devm_add_action; - + return ERR_PTR(ret);
devmem->pfn_first = devmem->resource->start >> PAGE_SHIFT; devmem->pfn_last = devmem->pfn_first + @@ -1336,60 +1295,22 @@ struct hmm_devmem *hmm_devmem_add_resour
ret = hmm_devmem_pages_create(devmem); if (ret) - goto error_devm_add_action; + return ERR_PTR(ret);
- devres_add(device, devmem); + ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem); + if (ret) + return ERR_PTR(ret);
- ret = devm_add_action(device, hmm_devmem_ref_kill, &devmem->ref); - if (ret) { - hmm_devmem_remove(devmem); + ret = devm_add_action_or_reset(device, hmm_devmem_ref_kill, + &devmem->ref); + if (ret) return ERR_PTR(ret); - }
return devmem; - -error_devm_add_action: - hmm_devmem_ref_kill(&devmem->ref); - hmm_devmem_ref_exit(&devmem->ref); -error_percpu_ref: - devres_free(devmem); - return ERR_PTR(ret); } EXPORT_SYMBOL(hmm_devmem_add_resource);
/* - * hmm_devmem_remove() - remove device memory (kill and free ZONE_DEVICE) - * - * @devmem: hmm_devmem struct use to track and manage the ZONE_DEVICE memory - * - * This will hot-unplug memory that was hotplugged by hmm_devmem_add on behalf - * of the device driver. It will free struct page and remove the resource that - * reserved the physical address range for this device memory. - */ -void hmm_devmem_remove(struct hmm_devmem *devmem) -{ - resource_size_t start, size; - struct device *device; - bool cdm = false; - - if (!devmem) - return; - - device = devmem->device; - start = devmem->resource->start; - size = resource_size(devmem->resource); - - cdm = devmem->resource->desc == IORES_DESC_DEVICE_PUBLIC_MEMORY; - hmm_devmem_ref_kill(&devmem->ref); - hmm_devmem_ref_exit(&devmem->ref); - hmm_devmem_pages_remove(devmem); - - if (!cdm) - devm_release_mem_region(device, start, size); -} -EXPORT_SYMBOL(hmm_devmem_remove); - -/* * A device driver that wants to handle multiple devices memory through a * single fake device can use hmm_device to do so. This is purely a helper * and it is not needed to make use of any HMM functionality.
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit bbecd94e6c514a1559fc1a7749a62715958137b1 upstream.
Commit e8d513483300 ("memremap: change devm_memremap_pages interface to use struct dev_pagemap") refactored devm_memremap_pages() to allow a dev_pagemap instance to be supplied. Passing in a dev_pagemap interface simplifies the design of pgmap type drivers in that they can rely on container_of() to lookup any private data associated with the given dev_pagemap instance.
In addition to the cleanups, this also gives HMM users the multi-order-radix improvements that arrived with commit ab1b597ee0e4 ("mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups").
As part of the conversion to the devm_memremap_pages() method of handling the percpu_ref relative to when pages are put, the percpu_ref completion needs to move to hmm_devmem_ref_exit(). See 71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page...") for details.
Link: http://lkml.kernel.org/r/154275560053.76910.10870962637383152392.stgit@dwill... Signed-off-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Jérôme Glisse jglisse@redhat.com Acked-by: Balbir Singh bsingharora@gmail.com Cc: Logan Gunthorpe logang@deltatee.com Cc: Michal Hocko mhocko@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/hmm.c | 196 ++++++++------------------------------------------------------- 1 file changed, 26 insertions(+), 170 deletions(-)
--- a/mm/hmm.c +++ b/mm/hmm.c @@ -986,17 +986,16 @@ static void hmm_devmem_ref_exit(void *da struct hmm_devmem *devmem;
devmem = container_of(ref, struct hmm_devmem, ref); + wait_for_completion(&devmem->completion); percpu_ref_exit(ref); }
-static void hmm_devmem_ref_kill(void *data) +static void hmm_devmem_ref_kill(struct percpu_ref *ref) { - struct percpu_ref *ref = data; struct hmm_devmem *devmem;
devmem = container_of(ref, struct hmm_devmem, ref); percpu_ref_kill(ref); - wait_for_completion(&devmem->completion); }
static int hmm_devmem_fault(struct vm_area_struct *vma, @@ -1019,154 +1018,6 @@ static void hmm_devmem_free(struct page devmem->ops->free(devmem, page); }
-static DEFINE_MUTEX(hmm_devmem_lock); -static RADIX_TREE(hmm_devmem_radix, GFP_KERNEL); - -static void hmm_devmem_radix_release(struct resource *resource) -{ - resource_size_t key; - - mutex_lock(&hmm_devmem_lock); - for (key = resource->start; - key <= resource->end; - key += PA_SECTION_SIZE) - radix_tree_delete(&hmm_devmem_radix, key >> PA_SECTION_SHIFT); - mutex_unlock(&hmm_devmem_lock); -} - -static void hmm_devmem_release(void *data) -{ - struct hmm_devmem *devmem = data; - struct resource *resource = devmem->resource; - unsigned long start_pfn, npages; - struct zone *zone; - struct page *page; - - /* pages are dead and unused, undo the arch mapping */ - start_pfn = (resource->start & ~(PA_SECTION_SIZE - 1)) >> PAGE_SHIFT; - npages = ALIGN(resource_size(resource), PA_SECTION_SIZE) >> PAGE_SHIFT; - - page = pfn_to_page(start_pfn); - zone = page_zone(page); - - mem_hotplug_begin(); - if (resource->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) - __remove_pages(zone, start_pfn, npages, NULL); - else - arch_remove_memory(start_pfn << PAGE_SHIFT, - npages << PAGE_SHIFT, NULL); - mem_hotplug_done(); - - hmm_devmem_radix_release(resource); -} - -static int hmm_devmem_pages_create(struct hmm_devmem *devmem) -{ - resource_size_t key, align_start, align_size, align_end; - struct device *device = devmem->device; - int ret, nid, is_ram; - - align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1); - align_size = ALIGN(devmem->resource->start + - resource_size(devmem->resource), - PA_SECTION_SIZE) - align_start; - - is_ram = region_intersects(align_start, align_size, - IORESOURCE_SYSTEM_RAM, - IORES_DESC_NONE); - if (is_ram == REGION_MIXED) { - WARN_ONCE(1, "%s attempted on mixed region %pr\n", - __func__, devmem->resource); - return -ENXIO; - } - if (is_ram == REGION_INTERSECTS) - return -ENXIO; - - if (devmem->resource->desc == IORES_DESC_DEVICE_PUBLIC_MEMORY) - devmem->pagemap.type = MEMORY_DEVICE_PUBLIC; - else - devmem->pagemap.type = MEMORY_DEVICE_PRIVATE; - - devmem->pagemap.res = *devmem->resource; - devmem->pagemap.page_fault = hmm_devmem_fault; - devmem->pagemap.page_free = hmm_devmem_free; - devmem->pagemap.dev = devmem->device; - devmem->pagemap.ref = &devmem->ref; - devmem->pagemap.data = devmem; - - mutex_lock(&hmm_devmem_lock); - align_end = align_start + align_size - 1; - for (key = align_start; key <= align_end; key += PA_SECTION_SIZE) { - struct hmm_devmem *dup; - - dup = radix_tree_lookup(&hmm_devmem_radix, - key >> PA_SECTION_SHIFT); - if (dup) { - dev_err(device, "%s: collides with mapping for %s\n", - __func__, dev_name(dup->device)); - mutex_unlock(&hmm_devmem_lock); - ret = -EBUSY; - goto error; - } - ret = radix_tree_insert(&hmm_devmem_radix, - key >> PA_SECTION_SHIFT, - devmem); - if (ret) { - dev_err(device, "%s: failed: %d\n", __func__, ret); - mutex_unlock(&hmm_devmem_lock); - goto error_radix; - } - } - mutex_unlock(&hmm_devmem_lock); - - nid = dev_to_node(device); - if (nid < 0) - nid = numa_mem_id(); - - mem_hotplug_begin(); - /* - * For device private memory we call add_pages() as we only need to - * allocate and initialize struct page for the device memory. More- - * over the device memory is un-accessible thus we do not want to - * create a linear mapping for the memory like arch_add_memory() - * would do. - * - * For device public memory, which is accesible by the CPU, we do - * want the linear mapping and thus use arch_add_memory(). 
- */ - if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC) - ret = arch_add_memory(nid, align_start, align_size, NULL, - false); - else - ret = add_pages(nid, align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, NULL, false); - if (ret) { - mem_hotplug_done(); - goto error_add_memory; - } - move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], - align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, NULL); - mem_hotplug_done(); - - /* - * Initialization of the pages has been deferred until now in order - * to allow us to do the work while not holding the hotplug lock. - */ - memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], - align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, &devmem->pagemap); - - return 0; - -error_add_memory: - untrack_pfn(NULL, PHYS_PFN(align_start), align_size); -error_radix: - hmm_devmem_radix_release(devmem->resource); -error: - return ret; -} - /* * hmm_devmem_add() - hotplug ZONE_DEVICE memory for device memory * @@ -1190,6 +1041,7 @@ struct hmm_devmem *hmm_devmem_add(const { struct hmm_devmem *devmem; resource_size_t addr; + void *result; int ret;
dev_pagemap_get_ops(); @@ -1244,14 +1096,18 @@ struct hmm_devmem *hmm_devmem_add(const devmem->pfn_last = devmem->pfn_first + (resource_size(devmem->resource) >> PAGE_SHIFT);
- ret = hmm_devmem_pages_create(devmem); - if (ret) - return ERR_PTR(ret); - - ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem); - if (ret) - return ERR_PTR(ret); + devmem->pagemap.type = MEMORY_DEVICE_PRIVATE; + devmem->pagemap.res = *devmem->resource; + devmem->pagemap.page_fault = hmm_devmem_fault; + devmem->pagemap.page_free = hmm_devmem_free; + devmem->pagemap.altmap_valid = false; + devmem->pagemap.ref = &devmem->ref; + devmem->pagemap.data = devmem; + devmem->pagemap.kill = hmm_devmem_ref_kill;
+ result = devm_memremap_pages(devmem->device, &devmem->pagemap); + if (IS_ERR(result)) + return result; return devmem; } EXPORT_SYMBOL(hmm_devmem_add); @@ -1261,6 +1117,7 @@ struct hmm_devmem *hmm_devmem_add_resour struct resource *res) { struct hmm_devmem *devmem; + void *result; int ret;
if (res->desc != IORES_DESC_DEVICE_PUBLIC_MEMORY) @@ -1293,19 +1150,18 @@ struct hmm_devmem *hmm_devmem_add_resour devmem->pfn_last = devmem->pfn_first + (resource_size(devmem->resource) >> PAGE_SHIFT);
- ret = hmm_devmem_pages_create(devmem); - if (ret) - return ERR_PTR(ret); - - ret = devm_add_action_or_reset(device, hmm_devmem_release, devmem); - if (ret) - return ERR_PTR(ret); - - ret = devm_add_action_or_reset(device, hmm_devmem_ref_kill, - &devmem->ref); - if (ret) - return ERR_PTR(ret); + devmem->pagemap.type = MEMORY_DEVICE_PUBLIC; + devmem->pagemap.res = *devmem->resource; + devmem->pagemap.page_fault = hmm_devmem_fault; + devmem->pagemap.page_free = hmm_devmem_free; + devmem->pagemap.altmap_valid = false; + devmem->pagemap.ref = &devmem->ref; + devmem->pagemap.data = devmem; + devmem->pagemap.kill = hmm_devmem_ref_kill;
+ result = devm_memremap_pages(devmem->device, &devmem->pagemap); + if (IS_ERR(result)) + return result; return devmem; } EXPORT_SYMBOL(hmm_devmem_add_resource);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 02917e9f8676207a4c577d4d94eae12bf348e9d7 upstream.
At Maintainer Summit, Greg brought up a topic I proposed around EXPORT_SYMBOL_GPL usage. The motivation was considerations for when EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional step of reclassifying an existing export. Specifically, I wanted to make the case that although the line is fuzzy and hard to specify in abstract terms, it is nonetheless clear that devm_memremap_pages() and HMM (Heterogeneous Memory Management) have crossed it. The devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the beginning, and HMM as a derivative of that functionality should have naturally picked up that designation as well.
Contrary to typical rules, the HMM infrastructure was merged upstream with zero in-tree consumers. There was a promise at the time that those users would be merged "soon", but it has been over a year with no drivers arriving. While the Nouveau driver is about to belatedly make good on that promise it is clear that HMM was targeted first and foremost at an out-of-tree consumer.
HMM is derived from devm_memremap_pages(), a facility Christoph and I spearheaded to support persistent memory. It combines a device lifetime model with a dynamically created 'struct page' / memmap array for any physical address range. It enables coordination and control of the many code paths in the kernel built to interact with memory via 'struct page' objects. With HMM the integration goes even deeper by allowing device drivers to hook and manipulate page fault and page free events.
One interpretation of when EXPORT_SYMBOL is suitable is when it is exporting stable and generic leaf functionality. The devm_memremap_pages() facility continues to see expanding use cases, peer-to-peer DMA being the most recent, with no clear end date when it will stop attracting reworks and semantic changes. It is not suitable to export devm_memremap_pages() as a stable 3rd party driver API due to the fact that it is still changing and manipulates core behavior. Moreover, it is not in the best interest of the long term development of the core memory management subsystem to permit any external driver to effectively define its own system-wide memory management policies with no encouragement to engage with upstream.
I am also concerned that HMM was designed in a way to minimize further engagement with the core-MM. That, with these hooks in place, device-drivers are free to implement their own policies without much consideration for whether and how the core-MM could grow to meet that need. Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the core-MM should be allowed the opportunity and stimulus to change and address these new use cases as first class functionality.
Original changelog:
hmm_devmem_add() and hmm_devmem_add_resource() duplicated devm_memremap_pages() and are now simple wrappers around the core facility to inject a dev_pagemap instance into the global pgmap_radix and hook page-idle events. The devm_memremap_pages() interface is base infrastructure for HMM. HMM has more and deeper ties into the kernel memory management implementation than base ZONE_DEVICE, which is itself an EXPORT_SYMBOL_GPL facility.
Originally, the HMM page structure creation routines copied the devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify the implementations was discussed during the initial review: http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html Recent work to extend devm_memremap_pages() for the peer-to-peer-DMA facility enabled this cleanup to move forward.
In addition to the integration with devm_memremap_pages() HMM depends on other GPL-only symbols:
mmu_notifier_unregister_no_release percpu_ref region_intersects __class_create
It goes further to consume / indirectly expose functionality that is not exported to any other driver:
alloc_pages_vma walk_page_range
HMM is derived from devm_memremap_pages(), and extends deep core-kernel fundamentals. Similar to devm_memremap_pages(), mark its entry points EXPORT_SYMBOL_GPL().
[logang@deltatee.com: PCI/P2PDMA: match interface changes to devm_memremap_pages()] Link: http://lkml.kernel.org/r/20181130225911.2900-1-logang@deltatee.com Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwill... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Christoph Hellwig hch@lst.de Cc: Logan Gunthorpe logang@deltatee.com Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Balbir Singh bsingharora@gmail.com, Cc: Michal Hocko mhocko@suse.com Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/p2pdma.c | 10 ++-------- mm/hmm.c | 4 ++-- 2 files changed, 4 insertions(+), 10 deletions(-)
--- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -82,10 +82,8 @@ static void pci_p2pdma_percpu_release(st complete_all(&p2p->devmap_ref_done); }
-static void pci_p2pdma_percpu_kill(void *data) +static void pci_p2pdma_percpu_kill(struct percpu_ref *ref) { - struct percpu_ref *ref = data; - /* * pci_p2pdma_add_resource() may be called multiple times * by a driver and may register the percpu_kill devm action multiple @@ -198,6 +196,7 @@ int pci_p2pdma_add_resource(struct pci_d pgmap->type = MEMORY_DEVICE_PCI_P2PDMA; pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) - pci_resource_start(pdev, bar); + pgmap->kill = pci_p2pdma_percpu_kill;
addr = devm_memremap_pages(&pdev->dev, pgmap); if (IS_ERR(addr)) { @@ -211,11 +210,6 @@ int pci_p2pdma_add_resource(struct pci_d if (error) goto pgmap_free;
- error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill, - &pdev->p2pdma->devmap_ref); - if (error) - goto pgmap_free; - pci_info(pdev, "added peer-to-peer DMA memory %pR\n", &pgmap->res);
--- a/mm/hmm.c +++ b/mm/hmm.c @@ -1110,7 +1110,7 @@ struct hmm_devmem *hmm_devmem_add(const return result; return devmem; } -EXPORT_SYMBOL(hmm_devmem_add); +EXPORT_SYMBOL_GPL(hmm_devmem_add);
struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops, struct device *device, @@ -1164,7 +1164,7 @@ struct hmm_devmem *hmm_devmem_add_resour return result; return devmem; } -EXPORT_SYMBOL(hmm_devmem_add_resource); +EXPORT_SYMBOL_GPL(hmm_devmem_add_resource);
/* * A device driver that wants to handle multiple devices memory through a
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huang Ying ying.huang@intel.com
commit 7af7a8e19f0c5425ff639b0f0d2d244c2a647724 upstream.
KSM pages may be mapped to multiple VMAs that cannot be reached from one anon_vma. So during swapin, a new copy of the page needs to be generated if a different anon_vma is needed; please refer to the comments of ksm_might_need_to_copy() for details.

During swapoff, unuse_vma() uses the anon_vma (if available) to locate the VMA and virtual address mapped to the page, so not all mappings of a swapped-out KSM page can be found. So in try_to_unuse(), even if the swap count of a swap entry isn't zero, the page still needs to be deleted from the swap cache, so that in the next round a new page can be allocated and swapped in for the other mappings of the swapped-out KSM page.

But this conflicts with THP swap support, where a THP may be deleted from the swap cache only after the swap count of every swap entry in the huge swap cluster backing the THP has reached 0. So try_to_unuse() was changed in commit e07098294adf ("mm, THP, swap: support to reclaim swap space for THP swapped out") to check that before deleting a page from the swap cache, but this broke KSM swapoff too.
Fortunately, KSM is for normal pages only, so the original behavior for KSM pages can easily be restored by checking PageTransCompound(). That is how this patch works.
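Annotated restatement of the resulting check in try_to_unuse() (taken from the hunk below, with comments added):

        if (PageSwapCache(page) &&
            likely(page_private(page) == entry.val) &&
            (!PageTransCompound(page) ||                        /* normal page, possibly KSM */
             !swap_page_trans_huge_swapped(si, entry)))         /* or THP with no swapped entries left */
                delete_from_swap_cache(compound_head(page));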
The bug was introduced by e07098294adf ("mm, THP, swap: support to reclaim swap space for THP swapped out"), which was merged in v4.14-rc1, so I think we should backport the fix to 4.14 and later. But Hugh thinks it may be rare for KSM pages to be in the swap device at swapoff time, which would explain why nobody has reported the bug so far.
Link: http://lkml.kernel.org/r/20181226051522.28442-1-ying.huang@intel.com Fixes: e07098294adf ("mm, THP, swap: support to reclaim swap space for THP swapped out") Signed-off-by: "Huang, Ying" ying.huang@intel.com Reported-by: Hugh Dickins hughd@google.com Tested-by: Hugh Dickins hughd@google.com Acked-by: Hugh Dickins hughd@google.com Cc: Rik van Riel riel@redhat.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Minchan Kim minchan@kernel.org Cc: Shaohua Li shli@kernel.org Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/swapfile.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2197,7 +2197,8 @@ int try_to_unuse(unsigned int type, bool
 		 */
 		if (PageSwapCache(page) &&
 		    likely(page_private(page) == entry.val) &&
-		    !page_swapped(page))
+		    (!PageTransCompound(page) ||
+		     !swap_page_trans_huge_swapped(si, entry)))
 			delete_from_swap_cache(compound_head(page));
/*
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brad Love brad@nextdimension.cc
commit 4bd46aa0353e022c2401a258e93b107880a66533 upstream.
It is reported that commit 95f408bbc4e4 ("media: cx23885: Ryzen DMA related RiSC engine stall fixes") caused regressions with other CPUs.
Ensure that the quirk will be applied only for the CPUs that are known to cause problems.
A module option is added for explicit control of the behaviour.
Fixes: 95f408bbc4e4 ("media: cx23885: Ryzen DMA related RiSC engine stall fixes")
Signed-off-by: Brad Love brad@nextdimension.cc Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/pci/cx23885/cx23885-core.c | 55 +++++++++++++++++++++++++++++-- drivers/media/pci/cx23885/cx23885.h | 2 + 2 files changed, 55 insertions(+), 2 deletions(-)
--- a/drivers/media/pci/cx23885/cx23885-core.c +++ b/drivers/media/pci/cx23885/cx23885-core.c @@ -23,6 +23,7 @@ #include <linux/moduleparam.h> #include <linux/kmod.h> #include <linux/kernel.h> +#include <linux/pci.h> #include <linux/slab.h> #include <linux/interrupt.h> #include <linux/delay.h> @@ -41,6 +42,18 @@ MODULE_AUTHOR("Steven Toth <stoth@linuxt MODULE_LICENSE("GPL"); MODULE_VERSION(CX23885_VERSION);
+/* + * Some platforms have been found to require periodic resetting of the DMA + * engine. Ryzen and XEON platforms are known to be affected. The symptom + * encountered is "mpeg risc op code error". Only Ryzen platforms employ + * this workaround if the option equals 1. The workaround can be explicitly + * disabled for all platforms by setting to 0, the workaround can be forced + * on for any platform by setting to 2. + */ +static unsigned int dma_reset_workaround = 1; +module_param(dma_reset_workaround, int, 0644); +MODULE_PARM_DESC(dma_reset_workaround, "periodic RiSC dma engine reset; 0-force disable, 1-driver detect (default), 2-force enable"); + static unsigned int debug; module_param(debug, int, 0644); MODULE_PARM_DESC(debug, "enable debug messages"); @@ -603,8 +616,13 @@ static void cx23885_risc_disasm(struct c
static void cx23885_clear_bridge_error(struct cx23885_dev *dev) { - uint32_t reg1_val = cx_read(TC_REQ); /* read-only */ - uint32_t reg2_val = cx_read(TC_REQ_SET); + uint32_t reg1_val, reg2_val; + + if (!dev->need_dma_reset) + return; + + reg1_val = cx_read(TC_REQ); /* read-only */ + reg2_val = cx_read(TC_REQ_SET);
if (reg1_val && reg2_val) { cx_write(TC_REQ, reg1_val); @@ -2058,6 +2076,37 @@ void cx23885_gpio_enable(struct cx23885_ /* TODO: 23-19 */ }
+static struct { + int vendor, dev; +} const broken_dev_id[] = { + /* According with + * https://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen%207%201800X..., + * 0x1451 is PCI ID for the IOMMU found on Ryzen + */ + { PCI_VENDOR_ID_AMD, 0x1451 }, +}; + +static bool cx23885_does_need_dma_reset(void) +{ + int i; + struct pci_dev *pdev = NULL; + + if (dma_reset_workaround == 0) + return false; + else if (dma_reset_workaround == 2) + return true; + + for (i = 0; i < ARRAY_SIZE(broken_dev_id); i++) { + pdev = pci_get_device(broken_dev_id[i].vendor, + broken_dev_id[i].dev, NULL); + if (pdev) { + pci_dev_put(pdev); + return true; + } + } + return false; +} + static int cx23885_initdev(struct pci_dev *pci_dev, const struct pci_device_id *pci_id) { @@ -2069,6 +2118,8 @@ static int cx23885_initdev(struct pci_de if (NULL == dev) return -ENOMEM;
+ dev->need_dma_reset = cx23885_does_need_dma_reset(); + err = v4l2_device_register(&pci_dev->dev, &dev->v4l2_dev); if (err < 0) goto fail_free; --- a/drivers/media/pci/cx23885/cx23885.h +++ b/drivers/media/pci/cx23885/cx23885.h @@ -451,6 +451,8 @@ struct cx23885_dev { /* Analog raw audio */ struct cx23885_audio_dev *audio_dev;
+ /* Does the system require periodic DMA resets? */ + unsigned int need_dma_reset:1; };
static inline struct cx23885_dev *to_cx23885(struct v4l2_device *v4l2_dev)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@oracle.com
commit 1524f4e47f90b27a3ac84efbdd94c63172246a6f upstream.
The "chip->dsp_spos_instance" can be NULL on some of the ealier error paths in snd_cs46xx_create().
Reported-by: "Yavuz, Tuba" tuba@ece.ufl.edu Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/cs46xx/dsp_spos.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/sound/pci/cs46xx/dsp_spos.c
+++ b/sound/pci/cs46xx/dsp_spos.c
@@ -903,6 +903,9 @@ int cs46xx_dsp_proc_done (struct snd_cs4
 	struct dsp_spos_instance * ins = chip->dsp_spos_instance;
 	int i;
 
+	if (!ins)
+		return 0;
+
 	snd_info_free_entry(ins->proc_sym_info_entry);
 	ins->proc_sym_info_entry = NULL;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit f4351a199cc120ff9d59e06d02e8657d08e6cc46 upstream.
The parser for the processing unit reads the bNrInPins field before the bLength sanity check, which may lead to an out-of-bound access when a malformed descriptor is given. Fix it by doing the assignment only after the bLength check.
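The safe ordering, as a simplified sketch (13 is the size of the fixed part of the processing unit descriptor; this is an illustration, not the full driver code):

        if (desc->bLength < 13)
                return -EINVAL;                 /* too short to even contain bNrInPins */
        num_ins = desc->bNrInPins;              /* read only after the fixed part is validated */
        if (desc->bLength < 13 + num_ins)
                return -EINVAL;                 /* variable-length part is truncated */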
Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/mixer.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2314,7 +2314,7 @@ static int build_audio_procunit(struct m char *name) { struct uac_processing_unit_descriptor *desc = raw_desc; - int num_ins = desc->bNrInPins; + int num_ins; struct usb_mixer_elem_info *cval; struct snd_kcontrol *kctl; int i, err, nameid, type, len; @@ -2329,7 +2329,13 @@ static int build_audio_procunit(struct m 0, NULL, default_value_info };
- if (desc->bLength < 13 || desc->bLength < 13 + num_ins || + if (desc->bLength < 13) { + usb_audio_err(state->chip, "invalid %s descriptor (id %d)\n", name, unitid); + return -EINVAL; + } + + num_ins = desc->bNrInPins; + if (desc->bLength < 13 + num_ins || desc->bLength < num_ins + uac_processing_unit_bControlSize(desc, state->mixer->protocol)) { usb_audio_err(state->chip, "invalid %s descriptor (id %d)\n", name, unitid); return -EINVAL;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 0bfe5e434e6665b3590575ec3c5e4f86a1ce51c9 upstream.
We've had some sanity checks of the mixer unit descriptors but they are too loose and some corner cases are overlooked. Add more strict checks in uac_mixer_unit_get_channels() for avoiding possible OOB accesses by malformed descriptors.
This also changes the semantics of uac_mixer_unit_get_channels() slightly: it now returns zero instead of -EINVAL for the cases where the descriptor lacks bmControls. The caller side then skips the mixer creation for such a unit while it keeps parsing. This corresponds to cases like the Maya44.
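Schematically, a caller now distinguishes the two cases like this (an illustrative sketch of the new semantics, not the literal driver code):

        chans = uac_mixer_unit_get_channels(state, desc);
        if (chans < 0)
                return chans;   /* malformed descriptor: hard error */
        if (chans == 0)
                return 0;       /* no bmControls (e.g. Maya44): skip this unit, keep parsing */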
Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/mixer.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -753,8 +753,9 @@ static int uac_mixer_unit_get_channels(s struct uac_mixer_unit_descriptor *desc) { int mu_channels; + void *c;
- if (desc->bLength < 11) + if (desc->bLength < sizeof(*desc)) return -EINVAL; if (!desc->bNrInPins) return -EINVAL; @@ -763,6 +764,8 @@ static int uac_mixer_unit_get_channels(s case UAC_VERSION_1: case UAC_VERSION_2: default: + if (desc->bLength < sizeof(*desc) + desc->bNrInPins + 1) + return 0; /* no bmControls -> skip */ mu_channels = uac_mixer_unit_bNrChannels(desc); break; case UAC_VERSION_3: @@ -772,7 +775,11 @@ static int uac_mixer_unit_get_channels(s }
if (!mu_channels) - return -EINVAL; + return 0; + + c = uac_mixer_unit_bmControls(desc, state->mixer->protocol); + if (c - (void *)desc + (mu_channels - 1) / 8 >= desc->bLength) + return 0; /* no bmControls -> skip */
return mu_channels; } @@ -944,7 +951,7 @@ static int check_input_term(struct mixer struct uac_mixer_unit_descriptor *d = p1;
err = uac_mixer_unit_get_channels(state, d); - if (err < 0) + if (err <= 0) return err;
term->channels = err; @@ -2118,7 +2125,7 @@ static int parse_audio_mixer_unit(struct if (err < 0) continue; /* no bmControls field (e.g. Maya44) -> ignore */ - if (desc->bLength <= 10 + input_pins) + if (!num_outs) continue; err = check_input_term(state, desc->baSourceID[pin], &iterm); if (err < 0)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hui Peng benquike@163.com
commit cbb2ebf70daf7f7d97d3811a2ff8e39655b8c184 upstream.
In `create_composite_quirk`, the terminating condition of the for loops is `quirk->ifnum < 0`. So any composite quirk should end with a `struct snd_usb_audio_quirk` object with ifnum < 0.
for (quirk = quirk_comp->data; quirk->ifnum >= 0; ++quirk) {
..... }
The data field of the Bowers & Wilkins PX headphones USB device quirks does not end with {.ifnum = -1}, which may result in an out-of-bounds read.

This patch fixes the bug by adding a terminating quirk object.
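Schematically, a composite quirk table therefore has to look like this (a sketch with made-up entries, not the actual B&W PX table):

        static const struct snd_usb_audio_quirk px_ifaces[] = {        /* hypothetical name */
                { .ifnum = 0, .type = QUIRK_AUDIO_STANDARD_INTERFACE },
                /* ... further interface entries ... */
                { .ifnum = -1 },        /* terminator expected by create_composite_quirk() */
        };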
Fixes: 240a8af929c7 ("ALSA: usb-audio: Add a quirck for B&W PX headphones") Signed-off-by: Hui Peng benquike@163.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/quirks-table.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -3326,6 +3326,9 @@ AU0828_DEVICE(0x2040, 0x7270, "Hauppauge } } }, + { + .ifnum = -1 + }, } } }, @@ -3369,6 +3372,9 @@ AU0828_DEVICE(0x2040, 0x7270, "Hauppauge } } }, + { + .ifnum = -1 + }, } } },
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 3e96d7280f16e2f787307f695a31296b9e4a1cd7 upstream.
There are a few places where we access the data without checking the actual object size from the USB audio descriptor. This may result in OOB access, as recently reported.
This patch addresses these missing checks. Most of the added code consists of simple bLength checks on the caller side. For the input and output terminal parsers, the length check is put into the parser functions themselves. For the input terminal, a new argument is added to distinguish between UAC1 and the rest, as they deal with different objects.
Reported-by: Mathias Payer mathias.payer@nebelwelt.net Reported-by: Hui Peng benquike@163.com Tested-by: Hui Peng benquike@163.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/card.c | 2 +- sound/usb/mixer.c | 4 ++++ sound/usb/stream.c | 36 +++++++++++++++++++++++++----------- 3 files changed, 30 insertions(+), 12 deletions(-)
--- a/sound/usb/card.c +++ b/sound/usb/card.c @@ -246,7 +246,7 @@ static int snd_usb_create_streams(struct h1 = snd_usb_find_csint_desc(host_iface->extra, host_iface->extralen, NULL, UAC_HEADER); - if (!h1) { + if (!h1 || h1->bLength < sizeof(*h1)) { dev_err(&dev->dev, "cannot find UAC_HEADER\n"); return -EINVAL; } --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2075,11 +2075,15 @@ static int parse_audio_input_terminal(st
if (state->mixer->protocol == UAC_VERSION_2) { struct uac2_input_terminal_descriptor *d_v2 = raw_desc; + if (d_v2->bLength < sizeof(*d_v2)) + return -EINVAL; control = UAC2_TE_CONNECTOR; term_id = d_v2->bTerminalID; bmctls = le16_to_cpu(d_v2->bmControls); } else if (state->mixer->protocol == UAC_VERSION_3) { struct uac3_input_terminal_descriptor *d_v3 = raw_desc; + if (d_v3->bLength < sizeof(*d_v3)) + return -EINVAL; control = UAC3_TE_INSERTION; term_id = d_v3->bTerminalID; bmctls = le32_to_cpu(d_v3->bmControls); --- a/sound/usb/stream.c +++ b/sound/usb/stream.c @@ -596,12 +596,8 @@ static int parse_uac_endpoint_attributes csep = snd_usb_find_desc(alts->extra, alts->extralen, NULL, USB_DT_CS_ENDPOINT);
if (!csep || csep->bLength < 7 || - csep->bDescriptorSubtype != UAC_EP_GENERAL) { - usb_audio_warn(chip, - "%u:%d : no or invalid class specific endpoint descriptor\n", - iface_no, altsd->bAlternateSetting); - return 0; - } + csep->bDescriptorSubtype != UAC_EP_GENERAL) + goto error;
if (protocol == UAC_VERSION_1) { attributes = csep->bmAttributes; @@ -609,6 +605,8 @@ static int parse_uac_endpoint_attributes struct uac2_iso_endpoint_descriptor *csep2 = (struct uac2_iso_endpoint_descriptor *) csep;
+ if (csep2->bLength < sizeof(*csep2)) + goto error; attributes = csep->bmAttributes & UAC_EP_CS_ATTR_FILL_MAX;
/* emulate the endpoint attributes of a v1 device */ @@ -618,12 +616,20 @@ static int parse_uac_endpoint_attributes struct uac3_iso_endpoint_descriptor *csep3 = (struct uac3_iso_endpoint_descriptor *) csep;
+ if (csep3->bLength < sizeof(*csep3)) + goto error; /* emulate the endpoint attributes of a v1 device */ if (le32_to_cpu(csep3->bmControls) & UAC2_CONTROL_PITCH) attributes |= UAC_EP_CS_ATTR_PITCH_CONTROL; }
return attributes; + + error: + usb_audio_warn(chip, + "%u:%d : no or invalid class specific endpoint descriptor\n", + iface_no, altsd->bAlternateSetting); + return 0; }
/* find an input terminal descriptor (either UAC1 or UAC2) with the given @@ -631,13 +637,17 @@ static int parse_uac_endpoint_attributes */ static void * snd_usb_find_input_terminal_descriptor(struct usb_host_interface *ctrl_iface, - int terminal_id) + int terminal_id, bool uac23) { struct uac2_input_terminal_descriptor *term = NULL; + size_t minlen = uac23 ? sizeof(struct uac2_input_terminal_descriptor) : + sizeof(struct uac_input_terminal_descriptor);
while ((term = snd_usb_find_csint_desc(ctrl_iface->extra, ctrl_iface->extralen, term, UAC_INPUT_TERMINAL))) { + if (term->bLength < minlen) + continue; if (term->bTerminalID == terminal_id) return term; } @@ -655,7 +665,8 @@ snd_usb_find_output_terminal_descriptor( while ((term = snd_usb_find_csint_desc(ctrl_iface->extra, ctrl_iface->extralen, term, UAC_OUTPUT_TERMINAL))) { - if (term->bTerminalID == terminal_id) + if (term->bLength >= sizeof(*term) && + term->bTerminalID == terminal_id) return term; }
@@ -729,7 +740,8 @@ snd_usb_get_audioformat_uac12(struct snd format = le16_to_cpu(as->wFormatTag); /* remember the format value */
iterm = snd_usb_find_input_terminal_descriptor(chip->ctrl_intf, - as->bTerminalLink); + as->bTerminalLink, + false); if (iterm) { num_channels = iterm->bNrChannels; chconfig = le16_to_cpu(iterm->wChannelConfig); @@ -764,7 +776,8 @@ snd_usb_get_audioformat_uac12(struct snd * to extract the clock */ input_term = snd_usb_find_input_terminal_descriptor(chip->ctrl_intf, - as->bTerminalLink); + as->bTerminalLink, + true); if (input_term) { clock = input_term->bCSourceID; if (!chconfig && (num_channels == input_term->bNrChannels)) @@ -998,7 +1011,8 @@ snd_usb_get_audioformat_uac3(struct snd_ * to extract the clock */ input_term = snd_usb_find_input_terminal_descriptor(chip->ctrl_intf, - as->bTerminalLink); + as->bTerminalLink, + true); if (input_term) { clock = input_term->bCSourceID; goto found_clock;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dennis Krein Dennis.Krein@netapp.com
commit eb4c2382272ae7ae5d81fdfa5b7a6c86146eaaa4 upstream.
The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results.
This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption.
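Annotated, the core of the fix is simply to bracket the two callback-list operations with the per-CPU srcu_data lock (comments added; interrupts are already off at this point, so the plain lock/unlock variants suffice):

        spin_lock_rcu_node(sdp);        /* protects sdp->srcu_cblist */
        rcu_segcblist_advance(&sdp->srcu_cblist,
                              rcu_seq_current(&sp->srcu_gp_seq));
        (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
                                       rcu_seq_snap(&sp->srcu_gp_seq));
        spin_unlock_rcu_node(sdp);      /* interrupts stay disabled */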
Reported-by: Bart Van Assche bvanassche@acm.org Reported-by: Christoph Hellwig hch@infradead.org Reported-by: Sebastian Kuzminsky seb.kuzminsky@gmail.com Signed-off-by: Dennis Krein Dennis.Krein@netapp.com Signed-off-by: Paul E. McKenney paulmck@linux.ibm.com Tested-by: Dennis Krein Dennis.Krein@netapp.com Cc: stable@vger.kernel.org # 4.16.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/rcu/srcutree.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -451,10 +451,12 @@ static void srcu_gp_start(struct srcu_st
 
 	lockdep_assert_held(&ACCESS_PRIVATE(sp, lock));
 	WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
+	spin_lock_rcu_node(sdp);  /* Interrupts already disabled. */
 	rcu_segcblist_advance(&sdp->srcu_cblist,
 			      rcu_seq_current(&sp->srcu_gp_seq));
 	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
 				       rcu_seq_snap(&sp->srcu_gp_seq));
+	spin_unlock_rcu_node(sdp);  /* Interrupts remain disabled. */
 	smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
 	rcu_seq_start(&sp->srcu_gp_seq);
 	state = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit e121a833745b4708b660e3fe6776129c2956b041 upstream.
__device_release_driver() has to check dev->bus->need_parent_lock before dropping the parent lock and acquiring it again as it may attempt to drop a lock that hasn't been acquired or lock a device that shouldn't be locked and create a lock imbalance.
Fixes: 8c97a46af04b (driver core: hold dev's parent lock when needed) Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: stable stable@vger.kernel.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/base/dd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -933,11 +933,11 @@ static void __device_release_driver(stru
 
 	while (device_links_busy(dev)) {
 		device_unlock(dev);
-		if (parent)
+		if (parent && dev->bus->need_parent_lock)
 			device_unlock(parent);
 
 		device_links_unbind_consumers(dev);
-		if (parent)
+		if (parent && dev->bus->need_parent_lock)
 			device_lock(parent);
 
device_lock(dev);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Wilcox willy@infradead.org
commit 1a80dade010c7a7f4885a4c4c2a7ac22cc7b34df upstream.
The failure path removes the allocated PIDs from the wrong namespace. This could lead to us inadvertently reusing PIDs in the leaf namespace and leaking PIDs in parent namespaces.
Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox willy@infradead.org Acked-by: "Eric W. Biederman" ebiederm@xmission.com Reviewed-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/pid.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -233,8 +233,10 @@ out_unlock:
 
 out_free:
 	spin_lock_irq(&pidmap_lock);
-	while (++i <= ns->level)
-		idr_remove(&ns->idr, (pid->numbers + i)->nr);
+	while (++i <= ns->level) {
+		upid = pid->numbers + i;
+		idr_remove(&upid->ns->idr, upid->nr);
+	}
 
 	/* On failure to allocate the first pid, reset the state */
 	if (ns->pid_allocated == PIDNS_ADDING)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
commit 544fbd16a461a318cd80537d1331c0df5c6cf930 upstream.
rwb_enabled() can't be changed when there is any inflight IO.
wbt_disable_default() may set rwb->wb_normal to zero; however, the blk_stat timer may still be pending, and the timer function will update rwb->wb_normal again.
This patch introduces blk_stat_deactivate() and applies it in wbt_disable_default(), then the following IO hang triggered when running parted & switching io scheduler can be fixed:
[ 369.937806] INFO: task parted:3645 blocked for more than 120 seconds.
[ 369.938941]       Not tainted 4.20.0-rc6-00284-g906c801e5248 #498
[ 369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.940768] parted          D    0  3645   3239 0x00000000
[ 369.941500] Call Trace:
[ 369.941874]  ? __schedule+0x6d9/0x74c
[ 369.942392]  ? wbt_done+0x5e/0x5e
[ 369.942864]  ? wbt_cleanup_cb+0x16/0x16
[ 369.943404]  ? wbt_done+0x5e/0x5e
[ 369.943874]  schedule+0x67/0x78
[ 369.944298]  io_schedule+0x12/0x33
[ 369.944771]  rq_qos_wait+0xb5/0x119
[ 369.945193]  ? karma_partition+0x1c2/0x1c2
[ 369.945691]  ? wbt_cleanup_cb+0x16/0x16
[ 369.946151]  wbt_wait+0x85/0xb6
[ 369.946540]  __rq_qos_throttle+0x23/0x2f
[ 369.947014]  blk_mq_make_request+0xe6/0x40a
[ 369.947518]  generic_make_request+0x192/0x2fe
[ 369.948042]  ? submit_bio+0x103/0x11f
[ 369.948486]  ? __radix_tree_lookup+0x35/0xb5
[ 369.949011]  submit_bio+0x103/0x11f
[ 369.949436]  ? blkg_lookup_slowpath+0x25/0x44
[ 369.949962]  submit_bio_wait+0x53/0x7f
[ 369.950469]  blkdev_issue_flush+0x8a/0xae
[ 369.951032]  blkdev_fsync+0x2f/0x3a
[ 369.951502]  do_fsync+0x2e/0x47
[ 369.951887]  __x64_sys_fsync+0x10/0x13
[ 369.952374]  do_syscall_64+0x89/0x149
[ 369.952819]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 369.953492] RIP: 0033:0x7f95a1e729d4
[ 369.953996] Code: Bad RIP value.
[ 369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[ 369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4
[ 369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004
[ 369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0
[ 369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380
[ 369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008
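The new helper is deliberately tiny; an annotated sketch (the body matches the hunk below, the comment is added explanation):

        static inline void blk_stat_deactivate(struct blk_stat_callback *cb)
        {
                /*
                 * del_timer_sync() also waits for a concurrently running
                 * timer callback to finish, so blk_stat can no longer
                 * overwrite rwb->wb_normal after wbt_disable_default()
                 * returns.
                 */
                del_timer_sync(&cb->timer);
        }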
Cc: stable@vger.kernel.org Cc: Paolo Valente paolo.valente@linaro.org Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-stat.h | 5 +++++ block/blk-wbt.c | 4 +++- 2 files changed, 8 insertions(+), 1 deletion(-)
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -145,6 +145,11 @@ static inline void blk_stat_activate_nse
 	mod_timer(&cb->timer, jiffies + nsecs_to_jiffies(nsecs));
 }
 
+static inline void blk_stat_deactivate(struct blk_stat_callback *cb)
+{
+	del_timer_sync(&cb->timer);
+}
+
 /**
  * blk_stat_activate_msecs() - Gather block statistics during a time window in
  * milliseconds.
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -760,8 +760,10 @@ void wbt_disable_default(struct request_
 	if (!rqos)
 		return;
 	rwb = RQWB(rqos);
-	if (rwb->enable_state == WBT_STATE_ON_DEFAULT)
+	if (rwb->enable_state == WBT_STATE_ON_DEFAULT) {
+		blk_stat_deactivate(rwb->cb);
 		rwb->wb_normal = 0;
+	}
 }
 EXPORT_SYMBOL_GPL(wbt_disable_default);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal damien.lemoal@wdc.com
commit 7211aef86f79583e59b88a0aba0bc830566f7e8e upstream.
For a zoned block device using mq-deadline, if a write request for a zone is received while another write was already dispatched for the same zone, dd_dispatch_request() will return NULL and the newly inserted write request is kept in the scheduler queue waiting for the ongoing zone write to complete. With this behavior, when no other request has been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to __blk_mq_free_request() call of blk_mq_sched_restart() to not run the queue when the already dispatched write request completes. The newly dispatched request stays stuck in the scheduler queue until eventually another request is submitted.
This problem does not affect SCSI disks, as the SCSI stack handles the queue restart on request completion. However, the problem can be triggered with the null_blk driver with zoned mode enabled.
Fix this by always requesting a queue restart in dd_dispatch_request() if no request was dispatched while WRITE requests are queued.
Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Add missing export of blk_mq_sched_restart()
Signed-off-by: Jens Axboe axboe@kernel.dk
--- block/blk-mq-sched.c | 3 ++- block/blk-mq-sched.h | 1 + block/mq-deadline.c | 12 +++++++++++- 3 files changed, 14 insertions(+), 2 deletions(-)
--- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -54,13 +54,14 @@ void blk_mq_sched_assign_ioc(struct requ * Mark a hardware queue as needing a restart. For shared queues, maintain * a count of how many hardware queues are marked for restart. */ -static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx) +void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx) { if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) return;
set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); } +EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) { --- a/block/blk-mq-sched.h +++ b/block/blk-mq-sched.h @@ -15,6 +15,7 @@ bool blk_mq_sched_try_merge(struct reque struct request **merged_request); bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio); bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq); +void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx); void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_insert_request(struct request *rq, bool at_head, --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -373,9 +373,16 @@ done:
/* * One confusing aspect here is that we get called for a specific - * hardware queue, but we return a request that may not be for a + * hardware queue, but we may return a request that is for a * different hardware queue. This is because mq-deadline has shared * state for all hardware queues, in terms of sorting, FIFOs, etc. + * + * For a zoned block device, __dd_dispatch_request() may return NULL + * if all the queued write requests are directed at zones that are already + * locked due to on-going write requests. In this case, make sure to mark + * the queue as needing a restart to ensure that the queue is run again + * and the pending writes dispatched once the target zones for the ongoing + * write requests are unlocked in dd_finish_request(). */ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx) { @@ -384,6 +391,9 @@ static struct request *dd_dispatch_reque
spin_lock(&dd->lock); rq = __dd_dispatch_request(dd); + if (!rq && blk_queue_is_zoned(hctx->queue) && + !list_empty(&dd->fifo_list[WRITE])) + blk_mq_sched_mark_restart_hctx(hctx); spin_unlock(&dd->lock);
return rq;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaegeuk Kim jaegeuk@kernel.org
commit c6d6e9b0f6b4201c77f2cea3964dd122697e3543 upstream.
Update DM to set the bdi's io_pages. This fixes reads to be capped at the device's max request size (even if user's read IO exceeds the established readahead setting).
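The conversion in the hunk below works because max_sectors is in 512-byte units, so shifting right by (PAGE_SHIFT - 9) turns sectors into pages. A worked example with 4 KiB pages (PAGE_SHIFT = 12):

        /* max_sectors = 2560 sectors  -> 2560 * 512 B = 1280 KiB
         * PAGE_SHIFT - 9 = 12 - 9 = 3
         * io_pages = 2560 >> 3 = 320 pages = 320 * 4 KiB = 1280 KiB
         */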
Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting") Cc: stable@vger.kernel.org Reviewed-by: Jens Axboe axboe@kernel.dk Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-table.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1927,6 +1927,9 @@ void dm_table_set_restrictions(struct dm
 	 */
 	if (blk_queue_is_zoned(q))
 		blk_revalidate_disk_zones(t->md->disk);
+
+	/* Allow reads to exceed readahead limits */
+	q->backing_dev_info->io_pages = limits->max_sectors >> (PAGE_SHIFT - 9);
 }
unsigned int dm_table_get_num_targets(struct dm_table *t)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit b982896cdb6e6a6b89d86dfb39df489d9df51e14 upstream.
If allocation fails on the last elements of the array, we need to free the already allocated elements.

v2: just move the existing out_rsbtbl label to the right place
Fixes 789924ba635f ("dlm: fix race between remove and lookup") Cc: stable@kernel.org # 3.6
Signed-off-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/dlm/lockspace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -680,11 +680,11 @@ static int new_lockspace(const char *nam
 	kfree(ls->ls_recover_buf);
  out_lkbidr:
 	idr_destroy(&ls->ls_lkbidr);
+ out_rsbtbl:
 	for (i = 0; i < DLM_REMOVE_NAMES_MAX; i++) {
 		if (ls->ls_remove_names[i])
 			kfree(ls->ls_remove_names[i]);
 	}
- out_rsbtbl:
 	vfree(ls->ls_rsbtbl);
  out_lsfree:
 	if (do_unreg)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit 23851e978f31eda8b2d01bd410d3026659ca06c7 upstream.
Fixes 3d6aa675fff9 ("dlm: keep lkbs in idr") Cc: stable@kernel.org # 3.1
Signed-off-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/dlm/lock.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1209,6 +1209,7 @@ static int create_lkb(struct dlm_ls *ls,
 
 	if (rv < 0) {
 		log_error(ls, "create_lkb idr error %d", rv);
+		dlm_free_lkb(lkb);
 		return rv;
 	}
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit c0174726c3976e67da8649ac62cae43220ae173a upstream.
Fixes 6d40c4a708e0 ("dlm: improve error and debug messages") Cc: stable@kernel.org # 3.5
Signed-off-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/dlm/lock.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4180,6 +4180,7 @@ static int receive_convert(struct dlm_ls
 			  (unsigned long long)lkb->lkb_recover_seq,
 			  ms->m_header.h_nodeid, ms->m_lkid);
 		error = -ENOENT;
+		dlm_put_lkb(lkb);
 		goto fail;
 	}
 
@@ -4233,6 +4234,7 @@ static int receive_unlock(struct dlm_ls
 			  lkb->lkb_id, lkb->lkb_remid,
 			  ms->m_header.h_nodeid, ms->m_lkid);
 		error = -ENOENT;
+		dlm_put_lkb(lkb);
 		goto fail;
 	}
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit d47b41aceeadc6b58abc9c7c6485bef7cfb75636 upstream.
According to the comment in dlm_user_request(), ua should be freed in dlm_free_lkb() after a successful attach to lkb.

However, ua is attached to lkb not in set_lock_args() but later, inside request_lock().
Fixes 597d0cae0f99 ("[DLM] dlm: user locks") Cc: stable@kernel.org # 2.6.19
Signed-off-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/dlm/lock.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -5795,20 +5795,20 @@ int dlm_user_request(struct dlm_ls *ls, goto out; } } - - /* After ua is attached to lkb it will be freed by dlm_free_lkb(). - When DLM_IFL_USER is set, the dlm knows that this is a userspace - lock and that lkb_astparam is the dlm_user_args structure. */ - error = set_lock_args(mode, &ua->lksb, flags, namelen, timeout_cs, fake_astfn, ua, fake_bastfn, &args); - lkb->lkb_flags |= DLM_IFL_USER; - if (error) { + kfree(ua->lksb.sb_lvbptr); + ua->lksb.sb_lvbptr = NULL; + kfree(ua); __put_lkb(ls, lkb); goto out; }
+ /* After ua is attached to lkb it will be freed by dlm_free_lkb(). + When DLM_IFL_USER is set, the dlm knows that this is a userspace + lock and that lkb_astparam is the dlm_user_args structure. */ + lkb->lkb_flags |= DLM_IFL_USER; error = request_lock(ls, lkb, name, namelen, &args);
switch (error) {
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit 6ff9b09e00a441599f3aacdf577254455a048bc9 upstream.
In gfs2_create_inode, after setting and releasing the acl / default_acl, the acl / default_acl pointers are not set to NULL as they should be. In that state, when the function reaches label fail_free_acls, gfs2_create_inode will try to release the same acls again.
Fix that by setting the pointers to NULL after releasing the acls. Slightly simplify the logic. Also, posix_acl_release checks for NULL already, so there is no need to duplicate those checks here.
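The resulting pattern (extracted from the hunk below, with a comment added) is to drop each reference exactly once and rely on posix_acl_release() ignoring NULL on the error path:

        error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT);
        if (error)
                goto fail_gunlock3;
        posix_acl_release(default_acl);
        default_acl = NULL;     /* fail_free_acls: releases it again; NULL makes that a no-op */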
Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure") Reported-by: Pan Bian bianpan2016@163.com Cc: Christoph Hellwig hch@lst.de Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Bob Peterson rpeterso@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/gfs2/inode.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
--- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -744,17 +744,19 @@ static int gfs2_create_inode(struct inod the gfs2 structures. */ if (default_acl) { error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT); + if (error) + goto fail_gunlock3; posix_acl_release(default_acl); + default_acl = NULL; } if (acl) { - if (!error) - error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS); + error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS); + if (error) + goto fail_gunlock3; posix_acl_release(acl); + acl = NULL; }
- if (error) - goto fail_gunlock3; - error = security_inode_init_security(&ip->i_inode, &dip->i_inode, name, &gfs2_initxattrs, NULL); if (error) @@ -789,10 +791,8 @@ fail_free_inode: } gfs2_rsqa_delete(ip, NULL); fail_free_acls: - if (default_acl) - posix_acl_release(default_acl); - if (acl) - posix_acl_release(acl); + posix_acl_release(default_acl); + posix_acl_release(acl); fail_gunlock: gfs2_dir_no_add(&da); gfs2_glock_dq_uninit(ghs);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Gruenbacher agruenba@redhat.com
commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream.
Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop.
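The subtle point (an annotated restatement of the hunk below) is that the scan count has to be updated from rbm->bii before bii is reset for the wrap-around; otherwise the "- initial_bii" term is applied to an already zeroed index and n is under-counted:

        if (ret == -E2BIG) {
                n += rbm->bii - initial_bii;    /* account for the bitmaps scanned so far */
                rbm->bii = 0;                   /* then wrap to the first bitmap */
                rbm->offset = 0;
                goto res_covered_end_of_rgrp;
        }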
Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Bob Peterson rpeterso@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/gfs2/rgrp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1780,9 +1780,9 @@ static int gfs2_rbm_find(struct gfs2_rbm
 			goto next_iter;
 		}
 		if (ret == -E2BIG) {
+			n += rbm->bii - initial_bii;
 			rbm->bii = 0;
 			rbm->offset = 0;
-			n += (rbm->bii - initial_bii);
 			goto res_covered_end_of_rgrp;
 		}
 		return ret;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Larry Finger Larry.Finger@lwfinger.net
commit 8ea3819c0bbef57a51d8abe579e211033e861677 upstream.
The cordic routine for calculating sines and cosines that was added in commit 6f98e62a9f1b ("b43: update cordic code to match current specs") contains an error whereby a quantity declared u32 can in fact go negative.
This problem was detected by Priit Laes, who is switching b43 over to the cordic routine in the kernel's library functions.
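A minimal illustration of why the type matters (generic C, not driver code): subtracting past zero from an unsigned 32-bit variable wraps instead of going negative, so any later signed comparison misbehaves:

        u32 u = 0;
        s32 s = 0;

        u -= 1;         /* wraps to 0xffffffff = 4294967295 */
        s -= 1;         /* becomes -1, as the cordic math expects */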
Fixes: 986504540306 ("b43: make cordic common (LP-PHY and N-PHY need it)") Reported-by: Priit Laes plaes@plaes.org Cc: Rafał Miłecki zajec5@gmail.com Cc: Stable stable@vger.kernel.org # 2.6.34 Signed-off-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Priit Laes plaes@plaes.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/broadcom/b43/phy_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/broadcom/b43/phy_common.c
+++ b/drivers/net/wireless/broadcom/b43/phy_common.c
@@ -616,7 +616,7 @@ struct b43_c32 b43_cordic(int theta)
 	u8 i;
 	s32 tmp;
 	s8 signx = 1;
-	u32 angle = 0;
+	s32 angle = 0;
 	struct b43_c32 ret = { .i = 39797, .q = 0, };
 
while (theta > (180 << 16))
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ondrej Mosnacek omosnace@redhat.com
commit 5df275cd4cf51c86d49009f1397132f284ba515e upstream.
Do the LE conversions before doing the Infiniband-related range checks. The incorrect checks are otherwise causing a failure to load any policy with an ibendportcon rule on BE systems. This can be reproduced by running (on e.g. ppc64):
cat >my_module.cil <<EOF (type test_ibendport_t) (roletype object_r test_ibendport_t) (ibendportcon mlx4_0 1 (system_u object_r test_ibendport_t ((s0) (s0)))) EOF semodule -i my_module.cil
Also, fix loading/storing the 64-bit subnet prefix for OCON_IBPKEY to use a correctly aligned buffer.
Finally, do not use the 'nodebuf' (u32) buffer where 'buf' (__le32) should be used instead.
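To see why the order matters on big-endian (illustrative arithmetic, not code from the patch): a pkey of 1 is stored on disk as the little-endian bytes 01 00 00 00; interpreted as a native u32 on a BE machine that is 0x01000000, which fails the '> 0xffff' range check even though the value is valid:

        /* on-disk (LE):                      01 00 00 00   (value 1)
         * read as native u32 on big-endian:  0x01000000
         * old check:  0x01000000 > 0xffff       -> -EINVAL (bogus failure)
         * fixed:      le32_to_cpu(buf) == 1     -> 1 <= U16_MAX, accepted
         */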
Tested internally on a ppc64 machine with a RHEL 7 kernel with this patch applied.
Cc: Daniel Jurgens danielj@mellanox.com Cc: Eli Cohen eli@mellanox.com Cc: James Morris jmorris@namei.org Cc: Doug Ledford dledford@redhat.com Cc: stable@vger.kernel.org # 4.13+ Fixes: a806f7a1616f ("selinux: Create policydb version for Infiniband support") Signed-off-by: Ondrej Mosnacek omosnace@redhat.com Acked-by: Stephen Smalley sds@tycho.nsa.gov Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- security/selinux/ss/policydb.c | 51 ++++++++++++++++++++++++++++------------- 1 file changed, 36 insertions(+), 15 deletions(-)
--- a/security/selinux/ss/policydb.c +++ b/security/selinux/ss/policydb.c @@ -2108,6 +2108,7 @@ static int ocontext_read(struct policydb { int i, j, rc; u32 nel, len; + __be64 prefixbuf[1]; __le32 buf[3]; struct ocontext *l, *c; u32 nodebuf[8]; @@ -2217,21 +2218,30 @@ static int ocontext_read(struct policydb goto out; break; } - case OCON_IBPKEY: - rc = next_entry(nodebuf, fp, sizeof(u32) * 4); + case OCON_IBPKEY: { + u32 pkey_lo, pkey_hi; + + rc = next_entry(prefixbuf, fp, sizeof(u64)); + if (rc) + goto out; + + /* we need to have subnet_prefix in CPU order */ + c->u.ibpkey.subnet_prefix = be64_to_cpu(prefixbuf[0]); + + rc = next_entry(buf, fp, sizeof(u32) * 2); if (rc) goto out;
- c->u.ibpkey.subnet_prefix = be64_to_cpu(*((__be64 *)nodebuf)); + pkey_lo = le32_to_cpu(buf[0]); + pkey_hi = le32_to_cpu(buf[1]);
- if (nodebuf[2] > 0xffff || - nodebuf[3] > 0xffff) { + if (pkey_lo > U16_MAX || pkey_hi > U16_MAX) { rc = -EINVAL; goto out; }
- c->u.ibpkey.low_pkey = le32_to_cpu(nodebuf[2]); - c->u.ibpkey.high_pkey = le32_to_cpu(nodebuf[3]); + c->u.ibpkey.low_pkey = pkey_lo; + c->u.ibpkey.high_pkey = pkey_hi;
rc = context_read_and_validate(&c->context[0], p, @@ -2239,7 +2249,10 @@ static int ocontext_read(struct policydb if (rc) goto out; break; - case OCON_IBENDPORT: + } + case OCON_IBENDPORT: { + u32 port; + rc = next_entry(buf, fp, sizeof(u32) * 2); if (rc) goto out; @@ -2249,12 +2262,13 @@ static int ocontext_read(struct policydb if (rc) goto out;
- if (buf[1] > 0xff || buf[1] == 0) { + port = le32_to_cpu(buf[1]); + if (port > U8_MAX || port == 0) { rc = -EINVAL; goto out; }
- c->u.ibendport.port = le32_to_cpu(buf[1]); + c->u.ibendport.port = port;
rc = context_read_and_validate(&c->context[0], p, @@ -2262,7 +2276,8 @@ static int ocontext_read(struct policydb if (rc) goto out; break; - } + } /* end case */ + } /* end switch */ } } rc = 0; @@ -3105,6 +3120,7 @@ static int ocontext_write(struct policyd { unsigned int i, j, rc; size_t nel, len; + __be64 prefixbuf[1]; __le32 buf[3]; u32 nodebuf[8]; struct ocontext *c; @@ -3192,12 +3208,17 @@ static int ocontext_write(struct policyd return rc; break; case OCON_IBPKEY: - *((__be64 *)nodebuf) = cpu_to_be64(c->u.ibpkey.subnet_prefix); + /* subnet_prefix is in CPU order */ + prefixbuf[0] = cpu_to_be64(c->u.ibpkey.subnet_prefix);
- nodebuf[2] = cpu_to_le32(c->u.ibpkey.low_pkey); - nodebuf[3] = cpu_to_le32(c->u.ibpkey.high_pkey); + rc = put_entry(prefixbuf, sizeof(u64), 1, fp); + if (rc) + return rc; + + buf[0] = cpu_to_le32(c->u.ibpkey.low_pkey); + buf[1] = cpu_to_le32(c->u.ibpkey.high_pkey);
- rc = put_entry(nodebuf, sizeof(u32), 4, fp); + rc = put_entry(buf, sizeof(u32), 2, fp); if (rc) return rc; rc = context_write(p, &c->context[0], fp);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jarkko Nikula jarkko.nikula@linux.intel.com
commit c5eb1190074cfb14c5d9cac692f1912eecf1a5e4 upstream.
a9c8088c7988 ("i2c: i801: Don't restore config registers on runtime PM") nullified the runtime PM suspend/resume callback pointers while keeping the runtime PM enabled.
This caused the SMBus PCI device to stay in D0 with /sys/devices/.../power/runtime_status showing "error" when the runtime PM framework attempted to autosuspend the device. This is due to PCI bus runtime PM, which checks for driver runtime PM callbacks and returns -ENOSYS if they are not set.
Since i2c-i801.c doesn't need to do anything device-specific for runtime PM, Jean Delvare proposed this be fixed in the PCI core rather than adding dummy runtime PM callback functions in the PCI drivers.
Change pci_pm_runtime_suspend()/pci_pm_runtime_resume() so they allow changing the PCI device power state during runtime PM transitions even if the driver supplies no runtime PM callbacks.
This fixes the runtime PM regression on i2c-i801.c.
It is not obvious why the code previously required the runtime PM callbacks. The test has been there since the code was introduced by 6cbf82148ff2 ("PCI PM: Run-time callbacks for PCI bus type").
On the other hand, a similar change was done to generic runtime PM callbacks in 05aa55dddb9e ("PM / Runtime: Lenient generic runtime pm callbacks").
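For driver authors, the practical upshot is that a PCI driver can enable runtime PM and let the PCI core handle the power-state transitions itself. A hypothetical probe() sketch (device and function names are invented, not taken from i2c-i801):

	#include <linux/pci.h>
	#include <linux/pm_runtime.h>

	static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
	{
		int ret = pcim_enable_device(pdev);

		if (ret)
			return ret;

		/* ... device-specific setup ... */

		/* No .runtime_suspend/.runtime_resume callbacks are supplied;
		 * with this change the PCI core performs the D0 <-> D3hot
		 * transitions on its own instead of returning -ENOSYS.
		 */
		pm_runtime_set_autosuspend_delay(&pdev->dev, 1000);
		pm_runtime_use_autosuspend(&pdev->dev);
		pm_runtime_put_autosuspend(&pdev->dev);
		pm_runtime_allow(&pdev->dev);

		return 0;
	}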
Fixes: a9c8088c7988 ("i2c: i801: Don't restore config registers on runtime PM") Reported-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Jean Delvare jdelvare@suse.de Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/pci-driver.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-)
--- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -1251,30 +1251,29 @@ static int pci_pm_runtime_suspend(struct return 0; }
- if (!pm || !pm->runtime_suspend) - return -ENOSYS; - pci_dev->state_saved = false; - error = pm->runtime_suspend(dev); - if (error) { + if (pm && pm->runtime_suspend) { + error = pm->runtime_suspend(dev); /* * -EBUSY and -EAGAIN is used to request the runtime PM core * to schedule a new suspend, so log the event only with debug * log level. */ - if (error == -EBUSY || error == -EAGAIN) + if (error == -EBUSY || error == -EAGAIN) { dev_dbg(dev, "can't suspend now (%pf returned %d)\n", pm->runtime_suspend, error); - else + return error; + } else if (error) { dev_err(dev, "can't suspend (%pf returned %d)\n", pm->runtime_suspend, error); - - return error; + return error; + } }
pci_fixup_device(pci_fixup_suspend, pci_dev);
- if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0 + if (pm && pm->runtime_suspend + && !pci_dev->state_saved && pci_dev->current_state != PCI_D0 && pci_dev->current_state != PCI_UNKNOWN) { WARN_ONCE(pci_dev->current_state != prev, "PCI PM: State of device not saved by %pF\n", @@ -1292,7 +1291,7 @@ static int pci_pm_runtime_suspend(struct
static int pci_pm_runtime_resume(struct device *dev) { - int rc; + int rc = 0; struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
@@ -1306,14 +1305,12 @@ static int pci_pm_runtime_resume(struct if (!pci_dev->driver) return 0;
- if (!pm || !pm->runtime_resume) - return -ENOSYS; - pci_fixup_device(pci_fixup_resume_early, pci_dev); pci_enable_wake(pci_dev, PCI_D0, false); pci_fixup_device(pci_fixup_resume, pci_dev);
- rc = pm->runtime_resume(dev); + if (pm && pm->runtime_resume) + rc = pm->runtime_resume(dev);
pci_dev->runtime_d3cold = false;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Coddington bcodding@redhat.com
commit b8eee0e90f9797b747113638bc75e739b192ad38 upstream.
Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks") specified that the l_pid returned for F_GETLK on a local file that has a remote lock should be the pid of the lock manager process. That commit, while updating other filesystems, failed to update lockd, such that locks created by lockd had their fl_pid set to that of the remote process holding the lock. Fix that here to be the pid of lockd.
Also, fix the client case so that the returned lock pid is negative, which indicates a remote lock on a remote file.
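Seen from userspace, the convention shows up through F_GETLK: on the NFS client a conflicting lock held remotely now reports a negative l_pid, while on the server the lock is attributed to lockd's pid. A small illustrative check (the file path is hypothetical):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		struct flock fl = {
			.l_type   = F_WRLCK,
			.l_whence = SEEK_SET,
			.l_start  = 0,
			.l_len    = 0,		/* whole file */
		};
		int fd = open("/mnt/nfs/testfile", O_RDWR);

		if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
			return 1;

		if (fl.l_type != F_UNLCK && fl.l_pid < 0)
			printf("conflicting lock is held remotely, l_pid=%d\n",
			       fl.l_pid);

		close(fd);
		return 0;
	}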
Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...") Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington bcodding@redhat.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/lockd/clntproc.c | 2 +- fs/lockd/xdr.c | 4 ++-- fs/lockd/xdr4.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-)
--- a/fs/lockd/clntproc.c +++ b/fs/lockd/clntproc.c @@ -442,7 +442,7 @@ nlmclnt_test(struct nlm_rqst *req, struc fl->fl_start = req->a_res.lock.fl.fl_start; fl->fl_end = req->a_res.lock.fl.fl_end; fl->fl_type = req->a_res.lock.fl.fl_type; - fl->fl_pid = 0; + fl->fl_pid = -req->a_res.lock.fl.fl_pid; break; default: status = nlm_stat_to_errno(req->a_res.status); --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -127,7 +127,7 @@ nlm_decode_lock(__be32 *p, struct nlm_lo
locks_init_lock(fl); fl->fl_owner = current->files; - fl->fl_pid = (pid_t)lock->svid; + fl->fl_pid = current->tgid; fl->fl_flags = FL_POSIX; fl->fl_type = F_RDLCK; /* as good as anything else */ start = ntohl(*p++); @@ -269,7 +269,7 @@ nlmsvc_decode_shareargs(struct svc_rqst memset(lock, 0, sizeof(*lock)); locks_init_lock(&lock->fl); lock->svid = ~(u32) 0; - lock->fl.fl_pid = (pid_t)lock->svid; + lock->fl.fl_pid = current->tgid;
if (!(p = nlm_decode_cookie(p, &argp->cookie)) || !(p = xdr_decode_string_inplace(p, &lock->caller, --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -119,7 +119,7 @@ nlm4_decode_lock(__be32 *p, struct nlm_l
locks_init_lock(fl); fl->fl_owner = current->files; - fl->fl_pid = (pid_t)lock->svid; + fl->fl_pid = current->tgid; fl->fl_flags = FL_POSIX; fl->fl_type = F_RDLCK; /* as good as anything else */ p = xdr_decode_hyper(p, &start); @@ -266,7 +266,7 @@ nlm4svc_decode_shareargs(struct svc_rqst memset(lock, 0, sizeof(*lock)); locks_init_lock(&lock->fl); lock->svid = ~(u32) 0; - lock->fl.fl_pid = (pid_t)lock->svid; + lock->fl.fl_pid = current->tgid;
if (!(p = nlm4_decode_cookie(p, &argp->cookie)) || !(p = xdr_decode_string_inplace(p, &lock->caller,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
commit e2f34e26710bfaa545a9d9cd0c70137406401467 upstream.
While chasing yet another set of DMAR fault reports, I noticed that the frwr recycler conflates whether or not an MR has been DMA unmapped with frwr->fr_state. Actually the two have only an indirect relationship. It's in fact impossible to guess reliably whether the MR has been DMA unmapped based on its fr_state field, especially as the surrounding code and its assumptions have changed over time.
A better approach is to track the DMA mapping status explicitly so that the recycler is less brittle to unexpected situations, and attempts to DMA-unmap a second time are prevented.
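The resulting pattern, reduced to a generic sketch (struct and function names here are illustrative, not the xprtrdma ones): record the mapping direction in the object, treat DMA_NONE as "not mapped", and make the unmap path idempotent.

	#include <linux/dma-mapping.h>
	#include <linux/scatterlist.h>

	struct foo_mr {				/* hypothetical MR wrapper */
		struct scatterlist *sgl;
		int nents;
		enum dma_data_direction dir;	/* DMA_NONE == not mapped */
	};

	static void foo_mr_unmap(struct device *dev, struct foo_mr *mr)
	{
		if (mr->dir == DMA_NONE)
			return;			/* already unmapped: no-op */

		dma_unmap_sg(dev, mr->sgl, mr->nents, mr->dir);
		mr->dir = DMA_NONE;		/* a second call is now harmless */
	}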
Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/xprtrdma/frwr_ops.c | 6 ++++-- net/sunrpc/xprtrdma/verbs.c | 9 ++++++--- 2 files changed, 10 insertions(+), 5 deletions(-)
--- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -117,15 +117,15 @@ static void frwr_mr_recycle_worker(struct work_struct *work) { struct rpcrdma_mr *mr = container_of(work, struct rpcrdma_mr, mr_recycle); - enum rpcrdma_frwr_state state = mr->frwr.fr_state; struct rpcrdma_xprt *r_xprt = mr->mr_xprt;
trace_xprtrdma_mr_recycle(mr);
- if (state != FRWR_FLUSHED_LI) { + if (mr->mr_dir != DMA_NONE) { trace_xprtrdma_mr_unmap(mr); ib_dma_unmap_sg(r_xprt->rx_ia.ri_device, mr->mr_sg, mr->mr_nents, mr->mr_dir); + mr->mr_dir = DMA_NONE; }
spin_lock(&r_xprt->rx_buf.rb_mrlock); @@ -150,6 +150,8 @@ frwr_op_init_mr(struct rpcrdma_ia *ia, s if (!mr->mr_sg) goto out_list_err;
+ frwr->fr_state = FRWR_IS_INVALID; + mr->mr_dir = DMA_NONE; INIT_LIST_HEAD(&mr->mr_list); INIT_WORK(&mr->mr_recycle, frwr_mr_recycle_worker); sg_init_table(mr->mr_sg, depth); --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1329,9 +1329,12 @@ rpcrdma_mr_unmap_and_put(struct rpcrdma_ { struct rpcrdma_xprt *r_xprt = mr->mr_xprt;
- trace_xprtrdma_mr_unmap(mr); - ib_dma_unmap_sg(r_xprt->rx_ia.ri_device, - mr->mr_sg, mr->mr_nents, mr->mr_dir); + if (mr->mr_dir != DMA_NONE) { + trace_xprtrdma_mr_unmap(mr); + ib_dma_unmap_sg(r_xprt->rx_ia.ri_device, + mr->mr_sg, mr->mr_nents, mr->mr_dir); + mr->mr_dir = DMA_NONE; + } __rpcrdma_mr_put(&r_xprt->rx_buf, mr); }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
commit fdec6114ee1f0f43b1ad081ad8d46b23ba126d70 upstream.
Zero-length writes are legal; from 5661 section 18.32.3: "If the count is zero, the WRITE will succeed and return a count of zero subject to permissions checking".
This check is unnecessary and is causing zero-length writes to return EINVAL.
Cc: stable@vger.kernel.org Fixes: 3fd9557aec91 "NFSD: Refactor the generic write vector fill helper" Cc: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfsd/nfs4proc.c | 2 -- 1 file changed, 2 deletions(-)
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1016,8 +1016,6 @@ nfsd4_write(struct svc_rqst *rqstp, stru
 
 	nvecs = svc_fill_write_vector(rqstp, write->wr_pagelist,
 				      &write->wr_head, write->wr_buflen);
-	if (!nvecs)
-		return nfserr_io;
 	WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
 
 	status = nfsd_vfs_write(rqstp, &cstate->current_fh, filp,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit d412deb85a4aada382352a8202beb7af8921cd53 which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
It breaks the powerpc build, so drop it from the tree until a fix goes upstream.
Reported-by: Guenter Roeck linux@roeck-us.net Cc: Breno Leitao leitao@debian.org Cc: Michal Suchánek msuchanek@suse.de Cc: Michael Ellerman mpe@ellerman.id.au Cc: Christoph Biedl linux-kernel.bfrz@manchmal.in-ulm.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/signal_32.c | 18 +++++------------- arch/powerpc/kernel/signal_64.c | 20 ++++---------------- 2 files changed, 9 insertions(+), 29 deletions(-)
--- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -1140,11 +1140,11 @@ SYSCALL_DEFINE0(rt_sigreturn) { struct rt_sigframe __user *rt_sf; struct pt_regs *regs = current_pt_regs(); - int tm_restore = 0; #ifdef CONFIG_PPC_TRANSACTIONAL_MEM struct ucontext __user *uc_transact; unsigned long msr_hi; unsigned long tmp; + int tm_restore = 0; #endif /* Always make any pending restarted system calls return -EINTR */ current->restart_block.fn = do_no_restart_syscall; @@ -1192,19 +1192,11 @@ SYSCALL_DEFINE0(rt_sigreturn) goto bad; } } - if (!tm_restore) { - /* - * Unset regs->msr because ucontext MSR TS is not - * set, and recheckpoint was not called. This avoid - * hitting a TM Bad thing at RFID - */ - regs->msr &= ~MSR_TS_MASK; - } - /* Fall through, for non-TM restore */ -#endif if (!tm_restore) - if (do_setcontext(&rt_sf->uc, regs, 1)) - goto bad; + /* Fall through, for non-TM restore */ +#endif + if (do_setcontext(&rt_sf->uc, regs, 1)) + goto bad;
/* * It's not clear whether or why it is desirable to save the --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -740,23 +740,11 @@ SYSCALL_DEFINE0(rt_sigreturn) &uc_transact->uc_mcontext)) goto badframe; } -#endif + else /* Fall through, for non-TM restore */ - if (!MSR_TM_ACTIVE(msr)) { - /* - * Unset MSR[TS] on the thread regs since MSR from user - * context does not have MSR active, and recheckpoint was - * not called since restore_tm_sigcontexts() was not called - * also. - * - * If not unsetting it, the code can RFID to userspace with - * MSR[TS] set, but without CPU in the proper state, - * causing a TM bad thing. - */ - current->thread.regs->msr &= ~MSR_TS_MASK; - if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) - goto badframe; - } +#endif + if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) + goto badframe;
if (restore_altstack(&uc->uc_stack)) goto badframe;
Greg Kroah-Hartman wrote...
4.20-stable review patch. If anyone has any objections, please let me know.
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit d412deb85a4aada382352a8202beb7af8921cd53 which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
It breaks the powerpc build, so drop it from the tree until a fix goes upstream.
Is this necessary on 4.20? The build failures I reported were on 4.19 only. The 4.20.2-rc1 kernel for my Powermac G5 builds with and without that patch, both boot fine, no visible differences. Again however, Breno is authoritative here.
Aside, I also checked 4.19.15-rc1, builds and runs without any noticeable problems.
Christoph
On Sat, Jan 12, 2019 at 10:35:59PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.20-stable review patch. If anyone has any objections, please let me know.
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit d412deb85a4aada382352a8202beb7af8921cd53 which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
It breaks the powerpc build, so drop it from the tree until a fix goes upstream.
Is this necessary on 4.20? The build failures I reported were on 4.19 only. The 4.20.2-rc1 kernel for my Powermac G5 builds with and without that patch, both boot fine, no visible differences. Again however, Breno is authoritative here.
If there's no difference on 4.20, then maybe it's not needed there? :)
And yes, I would like confirmation from Breno as well.
thanks,
greg k-h
Greg Kroah-Hartman gregkh@linuxfoundation.org writes:
On Sat, Jan 12, 2019 at 10:35:59PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.20-stable review patch. If anyone has any objections, please let me know.
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit d412deb85a4aada382352a8202beb7af8921cd53 which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
It breaks the powerpc build, so drop it from the tree until a fix goes upstream.
Is this necessary on 4.20? The build failures I reported were on 4.19 only. The 4.20.2-rc1 kernel for my Powermac G5 builds with and without that patch, both boot fine, no visible differences. Again however, Breno is authoritative here.
If there's no difference on 4.20, then maybe it's not needed there? :)
And yes, I would like confirmation from Breno as well.
You shouldn't need the revert on 4.20.
In 4.20 we changed how MSR_TM_ACTIVE() is defined, which means commit 6f5b9f018f4c ("powerpc/tm: Unset MSR[TS] if not recheckpointing") should build fine on 4.20.
For 4.19 and earlier MSR_TM_ACTIVE() is different and that's what's causing the build error.
I have a fix queued in my fixes tree and will push it to Linus in the next few days.
cheers
On Mon, Jan 14, 2019 at 11:00:25AM +1100, Michael Ellerman wrote:
Greg Kroah-Hartman gregkh@linuxfoundation.org writes:
On Sat, Jan 12, 2019 at 10:35:59PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.20-stable review patch. If anyone has any objections, please let me know.
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit d412deb85a4aada382352a8202beb7af8921cd53 which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
It breaks the powerpc build, so drop it from the tree until a fix goes upstream.
Is this necessary on 4.20? The build failures I reported were on 4.19 only. The 4.20.2-rc1 kernel for my Powermac G5 builds with and without that patch, both boot fine, no visible differences. Again however, Breno is authoritative here.
If there's no difference on 4.20, then maybe it's not needed there? :)
And yes, I would like confirmation from Breno as well.
You shouldn't need the revert on 4.20.
In 4.20 we changed how MSR_TM_ACTIVE() is defined, which means commit 6f5b9f018f4c ("powerpc/tm: Unset MSR[TS] if not recheckpointing") should build fine on 4.20.
For 4.19 and earlier MSR_TM_ACTIVE() is different and that's what's causing the build error.
I have a fix queued in my fixes tree and will push it to Linus in the next few days.
Ok, added back to 4.20.y
thanks,
greg k-h
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
commit e1c3743e1a20647c53b719dbf28b48f45d23f2cd upstream.
On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr.
At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed.
This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid.
The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXASR[FS] bit set also.
Since TEXASR might not have the FS bit set, when the process is scheduled back it will try to reclaim, which will be aborted because the CPU is not in the suspended state, and then recheckpoint. This recheckpoint will restore thread->texasr into the TEXASR SPR, which might be zero, hitting a BUG_ON().
kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0] pc: c000000000054550: restore_gprs+0xb0/0x180 lr: 0000000000000000 sp: c00000041f157950 msr: 8000000100021033 current = 0xc00000041f143000 paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01 pid = 1021, comm = kworker/11:1 kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) enter ? for help [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0 [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0 [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990 [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0 [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610 [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140 [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule.
With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state.
Other than that, if CONFIG_PREEMPT is set, a preemption could happen just after setting MSR[TS] and before tm_recheckpoint(), so this block must be atomic from a preemption perspective; hence the preempt_disable()/preempt_enable() calls around this code.
It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set.
The 32-bit signal handler seems to be safe from this particular issue, but it might be exposed to the preemption issue, so preemption is disabled in that chunk of code as well.
Changes from v2: * Run the critical section with preempt_disable.
Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Cc: stable@vger.kernel.org (v3.9+) Signed-off-by: Breno Leitao leitao@debian.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/signal_32.c | 20 +++++++++++++++++- arch/powerpc/kernel/signal_64.c | 44 +++++++++++++++++++++++++++------------- 2 files changed, 49 insertions(+), 15 deletions(-)
--- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct /* If TM bits are set to the reserved value, it's an invalid context */ if (MSR_TM_RESV(msr_hi)) return 1; - /* Pull in the MSR TM bits from the user context */ + + /* + * Disabling preemption, since it is unsafe to be preempted + * with MSR[TS] set without recheckpointing. + */ + preempt_disable(); + + /* + * CAUTION: + * After regs->MSR[TS] being updated, make sure that get_user(), + * put_user() or similar functions are *not* called. These + * functions can generate page faults which will cause the process + * to be de-scheduled with MSR[TS] set but without calling + * tm_recheckpoint(). This can cause a bug. + * + * Pull in the MSR TM bits from the user context + */ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK); /* Now, recheckpoint. This loads up all of the checkpointed (older) * registers, including FP and V[S]Rs. After recheckpointing, the @@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct } #endif
+ preempt_enable(); + return 0; } #endif --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struc if (MSR_TM_RESV(msr)) return -EINVAL;
- /* pull in MSR TS bits from user context */ - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); - - /* - * Ensure that TM is enabled in regs->msr before we leave the signal - * handler. It could be the case that (a) user disabled the TM bit - * through the manipulation of the MSR bits in uc_mcontext or (b) the - * TM bit was disabled because a sufficient number of context switches - * happened whilst in the signal handler and load_tm overflowed, - * disabling the TM bit. In either case we can end up with an illegal - * TM state leading to a TM Bad Thing when we return to userspace. - */ - regs->msr |= MSR_TM; - /* pull in MSR LE from user context */ regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
@@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struc tm_enable(); /* Make sure the transaction is marked as failed */ tsk->thread.tm_texasr |= TEXASR_FS; + + /* + * Disabling preemption, since it is unsafe to be preempted + * with MSR[TS] set without recheckpointing. + */ + preempt_disable(); + + /* pull in MSR TS bits from user context */ + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); + + /* + * Ensure that TM is enabled in regs->msr before we leave the signal + * handler. It could be the case that (a) user disabled the TM bit + * through the manipulation of the MSR bits in uc_mcontext or (b) the + * TM bit was disabled because a sufficient number of context switches + * happened whilst in the signal handler and load_tm overflowed, + * disabling the TM bit. In either case we can end up with an illegal + * TM state leading to a TM Bad Thing when we return to userspace. + * + * CAUTION: + * After regs->MSR[TS] being updated, make sure that get_user(), + * put_user() or similar functions are *not* called. These + * functions can generate page faults which will cause the process + * to be de-scheduled with MSR[TS] set but without calling + * tm_recheckpoint(). This can cause a bug. + */ + regs->msr |= MSR_TM; + /* This loads the checkpointed FP/VEC state, if used */ tm_recheckpoint(&tsk->thread);
@@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struc regs->msr |= MSR_VEC; }
+ preempt_enable(); + return err; } #endif
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Evan Green evgreen@chromium.org
commit db23d88756abd38e0995ea8449d0025b3de4b26b upstream.
adc5_get_dt_data uses a local, prop, feeds it to adc5_get_dt_channel_data, and then puts the result into adc->chan_props. The problem is adc5_get_dt_channel_data may not initialize that structure fully, so a garbage value is used for prescale if the optional "qcom,pre-scaling" is not defined in DT. adc5_read_raw then uses this as an array index, generating a crash that looks like this:
[ 6.683186] Unable to handle kernel paging request at virtual address ffffff90e78c7964 Call trace: qcom_vadc_scale_code_voltage_factor+0x74/0x104 qcom_vadc_scale_hw_calib_die_temp+0x20/0x60 qcom_adc5_hw_scale+0x78/0xa4 adc5_read_raw+0x3d0/0x65c iio_channel_read+0x240/0x30c iio_read_channel_processed+0x10c/0x150 qpnp_tm_get_temp+0xc0/0x40c of_thermal_get_temp+0x7c/0x98 thermal_zone_get_temp+0xac/0xd8 thermal_zone_device_update+0xc0/0x38c qpnp_tm_probe+0x624/0x81c platform_drv_probe+0xe4/0x11c really_probe+0x188/0x3fc driver_probe_device+0xb8/0x188 __device_attach_driver+0x114/0x180 bus_for_each_drv+0xd8/0x118 __device_attach+0x180/0x27c device_initial_probe+0x20/0x2c bus_probe_device+0x78/0x124 deferred_probe_work_func+0xfc/0x138 process_one_work+0x3d8/0x8b0 process_scheduled_works+0x48/0x6c worker_thread+0x488/0x7cc kthread+0x24c/0x264 ret_from_fork+0x10/0x18
Unfortunately, when I went to add the initializer for this and tried to boot it, my machine shut down immediately, complaining that it was hotter than the sun. It appears that adc5_chans_pmic and adc5_chans_rev2 were initializing prescale_index as if it were directly a divisor, rather than the index into adc5_prescale_ratios that it is.
Fix the uninitialized value, and change the static initialization to use indices into adc5_prescale_ratios.
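The distinction the fix restores, with made-up numbers (the real ratio table lives in the Qualcomm VADC common code and may differ): prescale is an index into a ratio table, not the divider itself.

	#include <stdio.h>

	/* illustrative values only */
	static const unsigned int prescale_ratios[] = { 1, 3, 4, 6, 20, 8, 10, 16 };

	int main(void)
	{
		unsigned int divider = 3;	/* a "divide by 3" channel ... */
		unsigned int index   = 1;	/* ... whose table index is 1 */

		/* Pre-fix, the channel tables stored the divider where an index
		 * belonged, so lookups picked the wrong ratio; channels missing
		 * "qcom,pre-scaling" in DT indexed with an uninitialized value.
		 */
		printf("wrong: %u  right: %u\n",
		       prescale_ratios[divider], prescale_ratios[index]);
		return 0;
	}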
Signed-off-by: Evan Green evgreen@chromium.org Reviewed-by: Matthias Kaehlcke mka@chromium.org Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/iio/adc/qcom-spmi-adc5.c | 58 ++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 27 deletions(-)
--- a/drivers/iio/adc/qcom-spmi-adc5.c +++ b/drivers/iio/adc/qcom-spmi-adc5.c @@ -423,6 +423,7 @@ struct adc5_channels { enum vadc_scale_fn_type scale_fn_type; };
+/* In these definitions, _pre refers to an index into adc5_prescale_ratios. */ #define ADC5_CHAN(_dname, _type, _mask, _pre, _scale) \ { \ .datasheet_name = _dname, \ @@ -443,63 +444,63 @@ struct adc5_channels { _pre, _scale) \
static const struct adc5_channels adc5_chans_pmic[ADC5_MAX_CHANNEL] = { - [ADC5_REF_GND] = ADC5_CHAN_VOLT("ref_gnd", 1, + [ADC5_REF_GND] = ADC5_CHAN_VOLT("ref_gnd", 0, SCALE_HW_CALIB_DEFAULT) - [ADC5_1P25VREF] = ADC5_CHAN_VOLT("vref_1p25", 1, + [ADC5_1P25VREF] = ADC5_CHAN_VOLT("vref_1p25", 0, SCALE_HW_CALIB_DEFAULT) - [ADC5_VPH_PWR] = ADC5_CHAN_VOLT("vph_pwr", 3, + [ADC5_VPH_PWR] = ADC5_CHAN_VOLT("vph_pwr", 1, SCALE_HW_CALIB_DEFAULT) - [ADC5_VBAT_SNS] = ADC5_CHAN_VOLT("vbat_sns", 3, + [ADC5_VBAT_SNS] = ADC5_CHAN_VOLT("vbat_sns", 1, SCALE_HW_CALIB_DEFAULT) - [ADC5_DIE_TEMP] = ADC5_CHAN_TEMP("die_temp", 1, + [ADC5_DIE_TEMP] = ADC5_CHAN_TEMP("die_temp", 0, SCALE_HW_CALIB_PMIC_THERM) - [ADC5_USB_IN_I] = ADC5_CHAN_VOLT("usb_in_i_uv", 1, + [ADC5_USB_IN_I] = ADC5_CHAN_VOLT("usb_in_i_uv", 0, SCALE_HW_CALIB_DEFAULT) - [ADC5_USB_IN_V_16] = ADC5_CHAN_VOLT("usb_in_v_div_16", 16, + [ADC5_USB_IN_V_16] = ADC5_CHAN_VOLT("usb_in_v_div_16", 8, SCALE_HW_CALIB_DEFAULT) - [ADC5_CHG_TEMP] = ADC5_CHAN_TEMP("chg_temp", 1, + [ADC5_CHG_TEMP] = ADC5_CHAN_TEMP("chg_temp", 0, SCALE_HW_CALIB_PM5_CHG_TEMP) /* Charger prescales SBUx and MID_CHG to fit within 1.8V upper unit */ - [ADC5_SBUx] = ADC5_CHAN_VOLT("chg_sbux", 3, + [ADC5_SBUx] = ADC5_CHAN_VOLT("chg_sbux", 1, SCALE_HW_CALIB_DEFAULT) - [ADC5_MID_CHG_DIV6] = ADC5_CHAN_VOLT("chg_mid_chg", 6, + [ADC5_MID_CHG_DIV6] = ADC5_CHAN_VOLT("chg_mid_chg", 3, SCALE_HW_CALIB_DEFAULT) - [ADC5_XO_THERM_100K_PU] = ADC5_CHAN_TEMP("xo_therm", 1, + [ADC5_XO_THERM_100K_PU] = ADC5_CHAN_TEMP("xo_therm", 0, SCALE_HW_CALIB_XOTHERM) - [ADC5_AMUX_THM1_100K_PU] = ADC5_CHAN_TEMP("amux_thm1_100k_pu", 1, + [ADC5_AMUX_THM1_100K_PU] = ADC5_CHAN_TEMP("amux_thm1_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_AMUX_THM2_100K_PU] = ADC5_CHAN_TEMP("amux_thm2_100k_pu", 1, + [ADC5_AMUX_THM2_100K_PU] = ADC5_CHAN_TEMP("amux_thm2_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_AMUX_THM3_100K_PU] = ADC5_CHAN_TEMP("amux_thm3_100k_pu", 1, + [ADC5_AMUX_THM3_100K_PU] = ADC5_CHAN_TEMP("amux_thm3_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_AMUX_THM2] = ADC5_CHAN_TEMP("amux_thm2", 1, + [ADC5_AMUX_THM2] = ADC5_CHAN_TEMP("amux_thm2", 0, SCALE_HW_CALIB_PM5_SMB_TEMP) };
static const struct adc5_channels adc5_chans_rev2[ADC5_MAX_CHANNEL] = { - [ADC5_REF_GND] = ADC5_CHAN_VOLT("ref_gnd", 1, + [ADC5_REF_GND] = ADC5_CHAN_VOLT("ref_gnd", 0, SCALE_HW_CALIB_DEFAULT) - [ADC5_1P25VREF] = ADC5_CHAN_VOLT("vref_1p25", 1, + [ADC5_1P25VREF] = ADC5_CHAN_VOLT("vref_1p25", 0, SCALE_HW_CALIB_DEFAULT) - [ADC5_VPH_PWR] = ADC5_CHAN_VOLT("vph_pwr", 3, + [ADC5_VPH_PWR] = ADC5_CHAN_VOLT("vph_pwr", 1, SCALE_HW_CALIB_DEFAULT) - [ADC5_VBAT_SNS] = ADC5_CHAN_VOLT("vbat_sns", 3, + [ADC5_VBAT_SNS] = ADC5_CHAN_VOLT("vbat_sns", 1, SCALE_HW_CALIB_DEFAULT) - [ADC5_VCOIN] = ADC5_CHAN_VOLT("vcoin", 3, + [ADC5_VCOIN] = ADC5_CHAN_VOLT("vcoin", 1, SCALE_HW_CALIB_DEFAULT) - [ADC5_DIE_TEMP] = ADC5_CHAN_TEMP("die_temp", 1, + [ADC5_DIE_TEMP] = ADC5_CHAN_TEMP("die_temp", 0, SCALE_HW_CALIB_PMIC_THERM) - [ADC5_AMUX_THM1_100K_PU] = ADC5_CHAN_TEMP("amux_thm1_100k_pu", 1, + [ADC5_AMUX_THM1_100K_PU] = ADC5_CHAN_TEMP("amux_thm1_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_AMUX_THM2_100K_PU] = ADC5_CHAN_TEMP("amux_thm2_100k_pu", 1, + [ADC5_AMUX_THM2_100K_PU] = ADC5_CHAN_TEMP("amux_thm2_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_AMUX_THM3_100K_PU] = ADC5_CHAN_TEMP("amux_thm3_100k_pu", 1, + [ADC5_AMUX_THM3_100K_PU] = ADC5_CHAN_TEMP("amux_thm3_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_AMUX_THM4_100K_PU] = ADC5_CHAN_TEMP("amux_thm4_100k_pu", 1, + [ADC5_AMUX_THM4_100K_PU] = ADC5_CHAN_TEMP("amux_thm4_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_AMUX_THM5_100K_PU] = ADC5_CHAN_TEMP("amux_thm5_100k_pu", 1, + [ADC5_AMUX_THM5_100K_PU] = ADC5_CHAN_TEMP("amux_thm5_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) - [ADC5_XO_THERM_100K_PU] = ADC5_CHAN_TEMP("xo_therm_100k_pu", 1, + [ADC5_XO_THERM_100K_PU] = ADC5_CHAN_TEMP("xo_therm_100k_pu", 0, SCALE_HW_CALIB_THERM_100K_PULLUP) };
@@ -558,6 +559,9 @@ static int adc5_get_dt_channel_data(stru return ret; } prop->prescale = ret; + } else { + prop->prescale = + adc->data->adc_chans[prop->channel].prescale_index; }
ret = of_property_read_u32(node, "qcom,hw-settle-time", &value);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mircea Caprioru mircea.caprioru@analog.com
commit 0e76df5c978338f3051e5126fc0c4245c57a307a upstream.
This patch solves the register readback issue with the bit shift. When the DAC resolution was lower than the register size (e.g. 12 bits out of 16), the readback value was not shifted down by the difference in bits, so the returned value was too high. A mask is also applied to the read value in order to keep only the bits of the actual resolution.
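A worked example of the shift-and-mask, assuming a 12-bit DAC whose code sits left-aligned in a 16-bit register (the exact layout is part-specific):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint16_t reg = 0xabc0;		/* 12-bit code 0xabc, shift = 4 */
		unsigned int shift = 4, realbits = 12;

		/* same operation the fix performs: shift down, mask to realbits */
		uint16_t val = (reg >> shift) & ((1u << realbits) - 1);

		printf("raw=0x%04x value=0x%03x\n", reg, val);	/* value=0xabc */
		return 0;
	}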
Fixes: 0357e488b8 ("iio:dac:ad5686: Refactor the driver") Signed-off-by: Mircea Caprioru mircea.caprioru@analog.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/iio/dac/ad5686.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/iio/dac/ad5686.c
+++ b/drivers/iio/dac/ad5686.c
@@ -124,7 +124,8 @@ static int ad5686_read_raw(struct iio_de
 		mutex_unlock(&indio_dev->mlock);
 		if (ret < 0)
 			return ret;
-		*val = ret;
+		*val = (ret >> chan->scan_type.shift) &
+			GENMASK(chan->scan_type.realbits - 1, 0);
 		return IIO_VAL_INT;
 	case IIO_CHAN_INFO_SCALE:
 		*val = st->vref_mv;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dominique Martinet dominique.martinet@cea.fr
commit 574d356b7a02c7e1b01a1d9cba8a26b3c2888f45 upstream.
If the requested msize is too small (either from command line argument or from the server version reply), we won't get any work done. If it's *really* too small, nothing will work, and this got caught by syzbot recently (on a new kmem_cache_create_usercopy() call)
Just set a minimum msize to 4k in both code paths, until someone complains they have a use-case for a smaller msize.
We need to check in both mount option and server reply individually because the msize for the first version request would be unchecked with just a global check on clnt->msize.
Link: http://lkml.kernel.org/r/1541407968-31350-1-git-send-email-asmadeus@codewrec... Reported-by: syzbot+0c1d61e4db7db94102ca@syzkaller.appspotmail.com Signed-off-by: Dominique Martinet dominique.martinet@cea.fr Cc: Eric Van Hensbergen ericvh@gmail.com Cc: Latchesar Ionkov lucho@ionkov.net Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/9p/client.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
--- a/net/9p/client.c +++ b/net/9p/client.c @@ -181,6 +181,12 @@ static int parse_opts(char *opts, struct ret = r; continue; } + if (option < 4096) { + p9_debug(P9_DEBUG_ERROR, + "msize should be at least 4k\n"); + ret = -EINVAL; + continue; + } clnt->msize = option; break; case Opt_trans: @@ -983,10 +989,18 @@ static int p9_client_version(struct p9_c else if (!strncmp(version, "9P2000", 6)) c->proto_version = p9_proto_legacy; else { + p9_debug(P9_DEBUG_ERROR, + "server returned an unknown version: %s\n", version); err = -EREMOTEIO; goto error; }
+ if (msize < 4096) { + p9_debug(P9_DEBUG_ERROR, + "server returned a msize < 4096: %d\n", msize); + err = -EREMOTEIO; + goto error; + } if (msize < c->msize) c->msize = msize;
@@ -1043,6 +1057,13 @@ struct p9_client *p9_client_create(const if (clnt->msize > clnt->trans_mod->maxsize) clnt->msize = clnt->trans_mod->maxsize;
+ if (clnt->msize < 4096) { + p9_debug(P9_DEBUG_ERROR, + "Please specify a msize of at least 4k\n"); + err = -EINVAL; + goto free_client; + } + err = p9_client_version(clnt); if (err) goto close_trans;
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sagi Grimberg sagi@grimberg.me
commit e48d8ed9c6193502d849b35767fd18e20bbd7ba2 upstream.
Error completions must still contain a valid wr_id and qp_num that the consumer can rely on. Correctly fill these fields in receive error completions.
Reported-by: Walker Benjamin benjamin.walker@intel.com Cc: stable@vger.kernel.org Signed-off-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Zhu Yanjun yanjun.zhu@oracle.com Tested-by: Zhu Yanjun yanjun.zhu@oracle.com Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/sw/rxe/rxe_resp.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -844,11 +844,16 @@ static enum resp_states do_complete(stru
 
 	memset(&cqe, 0, sizeof(cqe));
 
-	wc->wr_id = wqe->wr_id;
-	wc->status = qp->resp.status;
-	wc->qp = &qp->ibqp;
+	if (qp->rcq->is_user) {
+		uwc->status = qp->resp.status;
+		uwc->qp_num = qp->ibqp.qp_num;
+		uwc->wr_id = wqe->wr_id;
+	} else {
+		wc->status = qp->resp.status;
+		wc->qp = &qp->ibqp;
+		wc->wr_id = wqe->wr_id;
+	}
 
-	/* fields after status are not required for errors */
 	if (wc->status == IB_WC_SUCCESS) {
 		wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
 			      pkt->mask & RXE_WRITE_MASK) ?
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit c18614a1a11276837bdd44403d84d207c9951538 upstream.
Commit c7fd62bc69d0 ("stm class: Introduce framing protocol drivers") adds a bug into the error path of policy creation, that would do a module_put() on a wrong module, if one tried to create a policy for an stm device which already has a policy, using a different protocol. IOW,
| mkdir /config/stp-policy/dummy_stm.0:p_basic.test
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test  # puts "p_basic"
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test  # "p_basic" -> -1
throws:
| general protection fault: 0000 [#1] SMP PTI
| CPU: 3 PID: 2887 Comm: mkdir
| RIP: 0010:module_put.part.31+0xe/0x90
| Call Trace:
|  module_put+0x13/0x20
|  stm_put_protocol+0x11/0x20 [stm_core]
|  stp_policy_make+0xf1/0x210 [stm_core]
|  ? __kmalloc+0x183/0x220
|  ? configfs_mkdir+0x10d/0x4c0
|  configfs_mkdir+0x169/0x4c0
|  vfs_mkdir+0x108/0x1c0
|  do_mkdirat+0xe8/0x110
|  __x64_sys_mkdir+0x1b/0x20
|  do_syscall_64+0x5a/0x140
|  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Correct this sad mistake by calling 'put' on the correct reference, which happens to match another error path in the same function, so we consolidate the two at the same time.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Fixes: c7fd62bc69d0 ("stm class: Introduce framing protocol drivers") Reported-by: Ammy Yi ammy.yi@intel.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwtracing/stm/policy.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/hwtracing/stm/policy.c +++ b/drivers/hwtracing/stm/policy.c @@ -440,10 +440,8 @@ stp_policy_make(struct config_group *gro
stm->policy = kzalloc(sizeof(*stm->policy), GFP_KERNEL); if (!stm->policy) { - mutex_unlock(&stm->policy_mutex); - stm_put_protocol(pdrv); - stm_put_device(stm); - return ERR_PTR(-ENOMEM); + ret = ERR_PTR(-ENOMEM); + goto unlock_policy; }
config_group_init_type_name(&stm->policy->group, name, @@ -458,7 +456,11 @@ unlock_policy: mutex_unlock(&stm->policy_mutex);
if (IS_ERR(ret)) { - stm_put_protocol(stm->pdrv); + /* + * pdrv and stm->pdrv at this point can be quite different, + * and only one of them needs to be 'put' + */ + stm_put_protocol(pdrv); stm_put_device(stm); }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bvanassche@acm.org
commit ed041919f0d23c109d52cde8da6ddc211c52d67e upstream.
This patch avoids that KASAN sporadically reports the following:
BUG: KASAN: use-after-free in rxe_run_task+0x1e/0x60 [rdma_rxe] Read of size 1 at addr ffff88801c50d8f4 by task check/24830
CPU: 4 PID: 24830 Comm: check Not tainted 4.20.0-rc6-dbg+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xca print_address_description+0x71/0x239 kasan_report.cold.5+0x242/0x301 __asan_load1+0x47/0x50 rxe_run_task+0x1e/0x60 [rdma_rxe] rxe_post_send+0x4bd/0x8d0 [rdma_rxe] srpt_zerolength_write+0xe1/0x160 [ib_srpt] srpt_close_ch+0x8b/0xe0 [ib_srpt] srpt_set_enabled+0xe7/0x150 [ib_srpt] srpt_tpg_enable_store+0xc0/0x100 [ib_srpt] configfs_write_file+0x157/0x1d0 __vfs_write+0xd7/0x3d0 vfs_write+0x102/0x290 ksys_write+0xab/0x130 __x64_sys_write+0x43/0x50 do_syscall_64+0x71/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Allocated by task 13856: save_stack+0x43/0xd0 kasan_kmalloc+0xc7/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0x105/0x320 rxe_alloc+0xff/0x1f0 [rdma_rxe] rxe_create_qp+0x9f/0x160 [rdma_rxe] ib_create_qp+0xf5/0x690 [ib_core] rdma_create_qp+0x6a/0x140 [rdma_cm] srpt_cm_req_recv.cold.59+0x1588/0x237b [ib_srpt] srpt_rdma_cm_req_recv.isra.35+0x1d5/0x220 [ib_srpt] srpt_rdma_cm_handler+0x6f/0x100 [ib_srpt] cma_listen_handler+0x59/0x60 [rdma_cm] cma_ib_req_handler+0xd5b/0x2570 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_work_handler+0x2aae/0x502b [ib_cm] process_one_work+0x481/0x9e0 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30
Freed by task 3440: save_stack+0x43/0xd0 __kasan_slab_free+0x139/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xbc/0x330 rxe_elem_release+0x66/0xe0 [rdma_rxe] rxe_destroy_qp+0x3f/0x50 [rdma_rxe] ib_destroy_qp+0x140/0x360 [ib_core] srpt_release_channel_work+0xdc/0x310 [ib_srpt] process_one_work+0x481/0x9e0 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30
Cc: Sergey Gorenko sergeygo@mellanox.com Cc: Max Gurtovoy maxg@mellanox.com Cc: Laurence Oberman loberman@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/ulp/srpt/ib_srpt.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -2010,6 +2010,14 @@ static void srpt_free_ch(struct kref *kr kfree_rcu(ch, rcu); }
+/* + * Shut down the SCSI target session, tell the connection manager to + * disconnect the associated RDMA channel, transition the QP to the error + * state and remove the channel from the channel list. This function is + * typically called from inside srpt_zerolength_write_done(). Concurrent + * srpt_zerolength_write() calls from inside srpt_close_ch() are possible + * as long as the channel is on sport->nexus_list. + */ static void srpt_release_channel_work(struct work_struct *w) { struct srpt_rdma_ch *ch; @@ -2037,6 +2045,11 @@ static void srpt_release_channel_work(st else ib_destroy_cm_id(ch->ib_cm.cm_id);
+ sport = ch->sport; + mutex_lock(&sport->mutex); + list_del_rcu(&ch->list); + mutex_unlock(&sport->mutex); + srpt_destroy_ch_ib(ch);
srpt_free_ioctx_ring((struct srpt_ioctx **)ch->ioctx_ring, @@ -2047,11 +2060,6 @@ static void srpt_release_channel_work(st sdev, ch->rq_size, srp_max_req_size, DMA_FROM_DEVICE);
- sport = ch->sport; - mutex_lock(&sport->mutex); - list_del_rcu(&ch->list); - mutex_unlock(&sport->mutex); - wake_up(&sport->ch_releaseQ);
kref_put(&ch->kref, srpt_free_ch);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve Wise swise@opengridcomputing.com
commit d53ec8af56d5163f8a42e961ece3aeb5c560e79d upstream.
We now use dev_name(&ib_device->dev) instead of ib_device->name in iwpm messages. The name field in struct device is a const char *, where as ib_device->name is a char array of size IB_DEVICE_NAME_MAX, and it is pre-initialized to zeros.
Since iw_cm_map() was using memcpy() to copy in the device name, and copying IWPM_DEVNAME_SIZE bytes, it ends up copying past the end of the source device name string and copying random bytes. This results in iwpmd failing the REGISTER_PID request from iwcm. Thus port mapping is broken.
Validate the device and interface names, and use strncpy() to initialize the entire message field.
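Reduced to a standalone sketch (the buffer size is made up), the difference between the two copies: memcpy() with the destination size over-reads a shorter NUL-terminated source, while a length check plus strncpy() copies the string and zero-fills the rest of the fixed-size field.

	#include <stdio.h>
	#include <string.h>

	#define DEVNAME_SIZE 32			/* stand-in for IWPM_DEVNAME_SIZE */

	int main(void)
	{
		const char *devname = "mlx5_0";	/* only 7 bytes including NUL */
		char msg[DEVNAME_SIZE];

		/* buggy pattern: reads 32 bytes out of a 7-byte string, dragging
		 * whatever follows it in memory into the message
		 */
		memcpy(msg, devname, sizeof(msg));

		/* fixed pattern: reject oversized names, then let strncpy() pad
		 * the remainder of the field with zeros
		 */
		if (strlen(devname) >= sizeof(msg))
			return 1;
		strncpy(msg, devname, sizeof(msg));

		printf("%s\n", msg);
		return 0;
	}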
Fixes: 896de0090a85 ("RDMA/core: Use dev_name instead of ibdev->name") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise swise@opengridcomputing.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/iwcm.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -502,17 +502,21 @@ static void iw_cm_check_wildcard(struct
  */
 static int iw_cm_map(struct iw_cm_id *cm_id, bool active)
 {
+	const char *devname = dev_name(&cm_id->device->dev);
+	const char *ifname = cm_id->device->iwcm->ifname;
 	struct iwpm_dev_data pm_reg_msg;
 	struct iwpm_sa_data pm_msg;
 	int status;
 
+	if (strlen(devname) >= sizeof(pm_reg_msg.dev_name) ||
+	    strlen(ifname) >= sizeof(pm_reg_msg.if_name))
+		return -EINVAL;
+
 	cm_id->m_local_addr = cm_id->local_addr;
 	cm_id->m_remote_addr = cm_id->remote_addr;
 
-	memcpy(pm_reg_msg.dev_name, dev_name(&cm_id->device->dev),
-	       sizeof(pm_reg_msg.dev_name));
-	memcpy(pm_reg_msg.if_name, cm_id->device->iwcm->ifname,
-	       sizeof(pm_reg_msg.if_name));
+	strncpy(pm_reg_msg.dev_name, devname, sizeof(pm_reg_msg.dev_name));
+	strncpy(pm_reg_msg.if_name, ifname, sizeof(pm_reg_msg.if_name));
 
 	if (iwpm_register_pid(&pm_reg_msg, RDMA_NL_IWCM) ||
 	    !iwpm_valid_pid())
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sohil Mehta sohil.mehta@intel.com
commit 3569dd07aaad71920c5ea4da2d5cc9a167c1ffd4 upstream.
The Intel IOMMU driver opportunistically skips a few top level page tables from the domain paging directory while programming the IOMMU context entry. However there is an implicit assumption in the code that domain's adjusted guest address width (agaw) would always be greater than IOMMU's agaw.
The IOMMU capabilities in an upcoming platform cause the domain's agaw to be lower than IOMMU's agaw. The issue is seen when the IOMMU supports both 4-level and 5-level paging. The domain builds a 4-level page table based on agaw of 2. However the IOMMU's agaw is set as 3 (5-level). In this case the code incorrectly tries to skip page table levels. This causes the IOMMU driver to avoid programming the context entry. The fix handles this case and programs the context entry accordingly.
Fixes: de24e55395698 ("iommu/vt-d: Simplify domain_context_mapping_one") Cc: stable@vger.kernel.org Cc: Ashok Raj ashok.raj@intel.com Cc: Jacob Pan jacob.jun.pan@linux.intel.com Cc: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Reported-by: Ramos Falcon, Ernesto R ernesto.r.ramos.falcon@intel.com Tested-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Signed-off-by: Sohil Mehta sohil.mehta@intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/iommu/intel-iommu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2044,7 +2044,7 @@ static int domain_context_mapping_one(st
 	 * than default. Unnecessary for PT mode.
 	 */
 	if (translation != CONTEXT_TT_PASS_THROUGH) {
-		for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+		for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
 			ret = -ENOMEM;
 			pgd = phys_to_virt(dma_pte_addr(pgd));
 			if (!dma_pte_present(pgd))
@@ -2058,7 +2058,7 @@ static int domain_context_mapping_one(st
 			translation = CONTEXT_TT_MULTI_LEVEL;
 
 		context_set_address_root(context, virt_to_phys(pgd));
-		context_set_address_width(context, iommu->agaw);
+		context_set_address_width(context, agaw);
 	} else {
 		/*
 		 * In pass through mode, AW must be programmed to
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit c40f7d74c741a907cfaeb73a7697081881c497d0 upstream.
Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the scheduler under high loads, starting at around the v4.18 time frame, and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list manipulation.
Do a (manual) revert of:
a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
It turns out that the list_del_leaf_cfs_rq() introduced by this commit is a surprising property that was not considered in followup commits such as:
9c2791f936ef ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list")
As Vincent Guittot explains:
"I think that there is a bigger problem with commit a9e7f6544b9c and cfs_rq throttling:
Let take the example of the following topology TG2 --> TG1 --> root:
1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1 cfs_rq to leaf_cfs_rq_list and we are sure to do the whole branch in one path because it has never been used and can't be throttled so tmp_alone_branch will point to leaf_cfs_rq_list at the end.
2) Then TG1 is throttled
3) and we add TG3 as a new child of TG1.
4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before TG1 cfs_rq and tmp_alone_branch will stay on rq->leaf_cfs_rq_list.
With commit a9e7f6544b9c, we can del a cfs_rq from rq->leaf_cfs_rq_list. So if the load of TG1 cfs_rq becomes NULL before step 2) above, TG1 cfs_rq is removed from the list. Then at step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list but tmp_alone_branch still points to TG3 cfs_rq because its throttled parent can't be enqueued when the lock is released. tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it should.
So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch points on another TG cfs_rq, the next TG cfs_rq that will be added, will be linked outside rq->leaf_cfs_rq_list - which is bad.
In addition, we can break the ordering of the cfs_rq in rq->leaf_cfs_rq_list but this ordering is used to update and propagate the update from leaf down to root."
Instead of trying to work through all these cases and trying to reproduce the very high loads that produced the lockup to begin with, simplify the code temporarily by reverting a9e7f6544b9c - which change was clearly not thought through completely.
This (hopefully) gives us a kernel that doesn't lock up so people can continue to enjoy their holidays without worrying about regressions. ;-)
[ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ]
Analyzed-by: Xie XiuQi xiexiuqi@huawei.com Analyzed-by: Vincent Guittot vincent.guittot@linaro.org Reported-by: Zhipeng Xie xiezhipeng1@huawei.com Reported-by: Sargun Dhillon sargun@sargun.me Reported-by: Xie XiuQi xiexiuqi@huawei.com Tested-by: Zhipeng Xie xiezhipeng1@huawei.com Tested-by: Sargun Dhillon sargun@sargun.me Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Acked-by: Vincent Guittot vincent.guittot@linaro.org Cc: stable@vger.kernel.org # v4.13+ Cc: Bin Li huawei.libin@huawei.com Cc: Mike Galbraith efault@gmx.de Cc: Peter Zijlstra peterz@infradead.org Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path") Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexiuqi@huawei.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/fair.c | 43 +++++++++---------------------------------- 1 file changed, 9 insertions(+), 34 deletions(-)
--- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -352,10 +352,9 @@ static inline void list_del_leaf_cfs_rq( } }
-/* Iterate thr' all leaf cfs_rq's on a runqueue */ -#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \ - list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list, \ - leaf_cfs_rq_list) +/* Iterate through all leaf cfs_rq's on a runqueue: */ +#define for_each_leaf_cfs_rq(rq, cfs_rq) \ + list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
/* Do the two (enqueued) entities belong to the same group ? */ static inline struct cfs_rq * @@ -447,8 +446,8 @@ static inline void list_del_leaf_cfs_rq( { }
-#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \ - for (cfs_rq = &rq->cfs, pos = NULL; cfs_rq; cfs_rq = pos) +#define for_each_leaf_cfs_rq(rq, cfs_rq) \ + for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)
static inline struct sched_entity *parent_entity(struct sched_entity *se) { @@ -7387,27 +7386,10 @@ static inline bool others_have_blocked(s
#ifdef CONFIG_FAIR_GROUP_SCHED
-static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq) -{ - if (cfs_rq->load.weight) - return false; - - if (cfs_rq->avg.load_sum) - return false; - - if (cfs_rq->avg.util_sum) - return false; - - if (cfs_rq->avg.runnable_load_sum) - return false; - - return true; -} - static void update_blocked_averages(int cpu) { struct rq *rq = cpu_rq(cpu); - struct cfs_rq *cfs_rq, *pos; + struct cfs_rq *cfs_rq; const struct sched_class *curr_class; struct rq_flags rf; bool done = true; @@ -7419,7 +7401,7 @@ static void update_blocked_averages(int * Iterates the task_group tree in a bottom up fashion, see * list_add_leaf_cfs_rq() for details. */ - for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) { + for_each_leaf_cfs_rq(rq, cfs_rq) { struct sched_entity *se;
/* throttled entities do not contribute to load */ @@ -7434,13 +7416,6 @@ static void update_blocked_averages(int if (se && !skip_blocked_update(se)) update_load_avg(cfs_rq_of(se), se, 0);
- /* - * There can be a lot of idle CPU cgroups. Don't let fully - * decayed cfs_rqs linger on the list. - */ - if (cfs_rq_is_decayed(cfs_rq)) - list_del_leaf_cfs_rq(cfs_rq); - /* Don't need periodic decay once load/util_avg are null */ if (cfs_rq_has_blocked(cfs_rq)) done = false; @@ -10289,10 +10264,10 @@ const struct sched_class fair_sched_clas #ifdef CONFIG_SCHED_DEBUG void print_cfs_stats(struct seq_file *m, int cpu) { - struct cfs_rq *cfs_rq, *pos; + struct cfs_rq *cfs_rq;
rcu_read_lock(); - for_each_leaf_cfs_rq_safe(cpu_rq(cpu), cfs_rq, pos) + for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq) print_cfs_rq(m, cpu, cfs_rq); rcu_read_unlock(); }
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yan, Zheng zyan@redhat.com
commit 3c1392d4c49962a31874af14ae9ff289cb2b3851 upstream.
Updating mseq makes the client think the importer MDS has accepted all prior cap messages and knows what caps the client wants. In reality, some cap messages may have been dropped because of an mseq mismatch.
If mseq is left untouched, the importing cap's mds_wanted will later get reset by the cap import message.
Cc: stable@vger.kernel.org Signed-off-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/caps.c | 1 - 1 file changed, 1 deletion(-)
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -3569,7 +3569,6 @@ retry:
 		tcap->cap_id = t_cap_id;
 		tcap->seq = t_seq - 1;
 		tcap->issue_seq = t_seq - 1;
-		tcap->mseq = t_mseq;
 		tcap->issued |= issued;
 		tcap->implemented |= issued;
 		if (cap == ci->i_auth_cap)
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: YueHaibing yuehaibing@huawei.com
commit 2607391882fca37463187e7f2a9c76dec286947e upstream.
'info->modes' got allocated with devm_kcalloc in of_get_pxafb_display.
This gives this error message: ./drivers/video/fbdev/pxafb.c:2238:2-7: WARNING: invalid free of devm_ allocated data
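The rule behind the fix, as a short hypothetical fragment (not taken from pxafb): devm_* allocations are released automatically when probe fails or the device is detached, so freeing them by hand sets up a double free.

	#include <linux/platform_device.h>
	#include <linux/slab.h>

	struct foo_mode { u32 pixclock; };	/* hypothetical type */

	static int foo_parse_modes(struct platform_device *pdev,
				   struct foo_mode *modes)
	{
		return -EINVAL;			/* stub for the sketch */
	}

	static int foo_probe(struct platform_device *pdev)
	{
		struct foo_mode *modes;

		modes = devm_kcalloc(&pdev->dev, 4, sizeof(*modes), GFP_KERNEL);
		if (!modes)
			return -ENOMEM;

		if (foo_parse_modes(pdev, modes) < 0) {
			/* wrong: kfree(modes); -- devres frees it again later */
			return -EINVAL;		/* right: just return */
		}

		return 0;
	}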
Fixes: c8f96304ec8b4 ("video: fbdev: pxafb: switch to devm_* API") Cc: stable@kernel.org [v4.19+] Signed-off-by: YueHaibing yuehaibing@huawei.com Reviewed-by: Daniel Mack daniel@zonque.org Cc: Robert Jarzmik robert.jarzmik@free.fr Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/video/fbdev/pxafb.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/drivers/video/fbdev/pxafb.c
+++ b/drivers/video/fbdev/pxafb.c
@@ -2234,10 +2234,8 @@ static struct pxafb_mach_info *of_pxafb_
 	if (!info)
 		return ERR_PTR(-ENOMEM);
 	ret = of_get_pxafb_mode_info(dev, info);
-	if (ret) {
-		kfree(info->modes);
+	if (ret)
 		return ERR_PTR(ret);
-	}
 
 	/*
 	 * On purpose, neither lccrX registers nor video memory size can be
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shaokun Zhang zhangshaokun@hisilicon.com
commit eb4f5213251833567570df1a09803f895653274d upstream.
For the DDRC PMU, each PMU counter is fixed-purpose. There is a mismatch between perf list and the driver definition for the rw_chg event:

# perf list | grep chg
  hisi_sccl1_ddrc0/rnk_chg/                          [Kernel PMU event]
  hisi_sccl1_ddrc0/rw_chg/                           [Kernel PMU event]

The register offset of the rw_chg event is not defined in the driver, and the bnk_chg register offset is mis-defined in its place; let's fix it up.
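Each fixed-purpose counter is read through a per-event register-offset table, so a wrong entry silently makes one event count a different register. A short sketch of how such a table is typically consumed; the table contents mirror the fix below, while the helper and its name are only illustrative:

	static const u32 ddrc_reg_off[] = {
		DDRC_FLUX_WR, DDRC_FLUX_RD, DDRC_FLUX_WCMD, DDRC_FLUX_RCMD,
		DDRC_PRE_CMD, DDRC_ACT_CMD, DDRC_RNK_CHG, DDRC_RW_CHG
	};

	/* Illustrative helper: the counter/event index picks its register. */
	static u32 ddrc_read_counter(void __iomem *base, int idx)
	{
		return readl(base + ddrc_reg_off[idx]);
	}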
Fixes: 904dcf03f086 ("perf: hisi: Add support for HiSilicon SoC DDRC PMU driver") Cc: stable@vger.kernel.org Cc: John Garry john.garry@huawei.com Cc: Will Deacon will.deacon@arm.com Cc: Mark Rutland mark.rutland@arm.com Reported-by: Weijian Huang huangweijian4@hisilicon.com Signed-off-by: Shaokun Zhang zhangshaokun@hisilicon.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
@@ -30,8 +30,8 @@
 #define DDRC_FLUX_RCMD		0x38c
 #define DDRC_PRE_CMD		0x3c0
 #define DDRC_ACT_CMD		0x3c4
-#define DDRC_BNK_CHG		0x3c8
 #define DDRC_RNK_CHG		0x3cc
+#define DDRC_RW_CHG		0x3d0
 #define DDRC_EVENT_CTRL		0x6C0
 #define DDRC_INT_MASK		0x6c8
 #define DDRC_INT_STATUS		0x6cc
@@ -51,7 +51,7 @@
 
 static const u32 ddrc_reg_off[] = {
 	DDRC_FLUX_WR, DDRC_FLUX_RD, DDRC_FLUX_WCMD, DDRC_FLUX_RCMD,
-	DDRC_PRE_CMD, DDRC_ACT_CMD, DDRC_BNK_CHG, DDRC_RNK_CHG
+	DDRC_PRE_CMD, DDRC_ACT_CMD, DDRC_RNK_CHG, DDRC_RW_CHG
 };
 
 /*
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Lamparter chunkeey@gmail.com
commit d0757237d7b18b1ce74293be7c077d86f7a732e8 upstream.
This patch fixes a recent compilation regression in ocm:
ocm.c: In function ‘ocm_init_node’:
ocm.c:182:18: error: invalid operands to binary | (have ‘int’ and ‘pgprot_t’ {aka ‘struct <anonymous>’})
     _PAGE_EXEC | PAGE_KERNEL_NCG);
                ^
ocm.c:197:17: error: invalid operands to binary | (have ‘int’ and ‘pgprot_t’ {aka ‘struct <anonymous>’})
     _PAGE_EXEC | PAGE_KERNEL);
                ^
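The underlying issue is that PAGE_KERNEL and friends are pgprot_t values, a one-member struct, so they can no longer be OR-ed with a plain integer flag; pgprot_val() unwraps the number. A simplified sketch of the types involved (made-up flag values, not the real powerpc definitions):

	/* Simplified illustration of the pgprot_t wrapper type. */
	typedef struct { unsigned long pgprot; } pgprot_t;

	#define __pgprot(x)	((pgprot_t) { (x) })
	#define pgprot_val(x)	((x).pgprot)

	#define PAGE_KERNEL	__pgprot(0x10)	/* made-up value */
	#define _PAGE_EXEC	0x4		/* made-up value */

	static unsigned long example_flags(void)
	{
		/* "_PAGE_EXEC | PAGE_KERNEL" no longer compiles: int | struct */
		return _PAGE_EXEC | pgprot_val(PAGE_KERNEL);
	}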
Fixes: 56f3c1413f5c ("powerpc/mm: properly set PAGE_KERNEL flags in ioremap()") Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Christian Lamparter chunkeey@gmail.com Reviewed-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/4xx/ocm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/powerpc/platforms/4xx/ocm.c
+++ b/arch/powerpc/platforms/4xx/ocm.c
@@ -179,7 +179,7 @@ static void __init ocm_init_node(int cou
 	/* ioremap the non-cached region */
 	if (ocm->nc.memtotal) {
 		ocm->nc.virt = __ioremap(ocm->nc.phys, ocm->nc.memtotal,
-					 _PAGE_EXEC | PAGE_KERNEL_NCG);
+					 _PAGE_EXEC | pgprot_val(PAGE_KERNEL_NCG));
 
 		if (!ocm->nc.virt) {
 			printk(KERN_ERR
@@ -194,7 +194,7 @@ static void __init ocm_init_node(int cou
 
 	if (ocm->c.memtotal) {
 		ocm->c.virt = __ioremap(ocm->c.phys, ocm->c.memtotal,
-					_PAGE_EXEC | PAGE_KERNEL);
+					_PAGE_EXEC | pgprot_val(PAGE_KERNEL));
 
 		if (!ocm->c.virt) {
 			printk(KERN_ERR
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan shuah@kernel.org
commit 211929fd3f7c8de4d541b1cc243b82830e5ea1e8 upstream.
Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk") added a khdr target to run the headers_install target from the main Makefile. The logic uses KSFT_KHDR_INSTALL and top_srcdir as controls to initialize variables and include files to run headers_install from the top-level Makefile. There are a few problems with this logic:

1. It exposes top_srcdir to all tests.
2. The common logic impacts all tests.
3. It uses KSFT_KHDR_INSTALL, top_srcdir, and khdr in an ad hoc way. Tests add a "khdr" dependency in their Makefiles to TEST_PROGS_EXTENDED in some cases and to STATIC_LIBS in others, which makes the framework confusing to use.

The common logic runs for all tests even when KSFT_KHDR_INSTALL isn't defined by the test, and top_srcdir is initialized to a default value when the test doesn't set it. That works for tests without a sub-directory structure, but tests with a sub-directory structure fail to build.
e.g: make -C sparc64/drivers/ or make -C drivers/dma-buf
../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory
make: *** No rule to make target '../../../../scripts/subarch.include'.  Stop.
There is no reason to require all tests to define top_srcdir, and there is no need to require tests to add a khdr dependency through ad hoc changes to TEST_* and other variables.

Fix it with a consistent use of KSFT_KHDR_INSTALL and top_srcdir in the tests that depend on headers_install.

Change the common logic to define the khdr target, and an "all" target that depends on khdr, only when KSFT_KHDR_INSTALL is defined.

Only tests that depend on headers_install have to define the KSFT_KHDR_INSTALL and top_srcdir variables; there is no longer any need to spell out a khdr dependency in the test Makefiles.
Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk") Cc: stable@vger.kernel.org Signed-off-by: Shuah Khan shuah@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/android/Makefile                  | 2 +-
 tools/testing/selftests/futex/functional/Makefile         | 1 +
 tools/testing/selftests/gpio/Makefile                      | 6 +++---
 tools/testing/selftests/kvm/Makefile                       | 2 +-
 tools/testing/selftests/lib.mk                             | 8 ++++----
 tools/testing/selftests/networking/timestamping/Makefile   | 1 +
 tools/testing/selftests/tc-testing/bpf/Makefile            | 1 +
 tools/testing/selftests/vm/Makefile                        | 1 +
 8 files changed, 13 insertions(+), 9 deletions(-)

--- a/tools/testing/selftests/android/Makefile
+++ b/tools/testing/selftests/android/Makefile
@@ -6,7 +6,7 @@ TEST_PROGS := run.sh
 
 include ../lib.mk
 
-all: khdr
+all:
 	@for DIR in $(SUBDIRS); do		\
 		BUILD_TARGET=$(OUTPUT)/$$DIR;	\
 		mkdir $$BUILD_TARGET  -p;	\
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -19,6 +19,7 @@ TEST_GEN_FILES := \
 TEST_PROGS := run.sh
 
 top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
 include ../../lib.mk
 
 $(TEST_GEN_FILES): $(HEADERS)
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -10,8 +10,6 @@ TEST_PROGS_EXTENDED := gpio-mockup-chard
 GPIODIR := $(realpath ../../../gpio)
 GPIOOBJ := gpio-utils.o
 
-include ../lib.mk
-
 all: $(TEST_PROGS_EXTENDED)
 
 override define CLEAN
@@ -19,7 +17,9 @@ override define CLEAN
 	$(MAKE) -C $(GPIODIR) OUTPUT=$(GPIODIR)/ clean
 endef
 
-$(TEST_PROGS_EXTENDED):| khdr
+KSFT_KHDR_INSTALL := 1
+include ../lib.mk
+
 $(TEST_PROGS_EXTENDED): $(GPIODIR)/$(GPIOOBJ)
 
 $(GPIODIR)/$(GPIOOBJ):
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -1,6 +1,7 @@
 all:
 
 top_srcdir = ../../../..
+KSFT_KHDR_INSTALL := 1
 UNAME_M := $(shell uname -m)
 
 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/ucall.c lib/sparsebit.c
@@ -44,7 +45,6 @@ $(OUTPUT)/libkvm.a: $(LIBKVM_OBJ)
 
 all: $(STATIC_LIBS)
 $(TEST_GEN_PROGS): $(STATIC_LIBS)
-$(STATIC_LIBS):| khdr
 
 cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
 cscope:
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -16,18 +16,18 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)
 TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
 TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
 
+ifdef KSFT_KHDR_INSTALL
 top_srcdir ?= ../../../..
 include $(top_srcdir)/scripts/subarch.include
 ARCH		?= $(SUBARCH)
 
-all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
-
 .PHONY: khdr
 khdr:
 	make ARCH=$(ARCH) -C $(top_srcdir) headers_install
 
-ifdef KSFT_KHDR_INSTALL
-$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES):| khdr
+all: khdr $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
+else
+all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
 endif
 
 .ONESHELL:
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -6,6 +6,7 @@ TEST_PROGS := hwtstamp_config rxtimestam
 all: $(TEST_PROGS)
 
 top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
 include ../../lib.mk
 
 clean:
--- a/tools/testing/selftests/tc-testing/bpf/Makefile
+++ b/tools/testing/selftests/tc-testing/bpf/Makefile
@@ -4,6 +4,7 @@ APIDIR := ../../../../include/uapi
 TEST_GEN_FILES = action.o
 
 top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
 include ../../lib.mk
 
 CLANG ?= clang
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -25,6 +25,7 @@ TEST_GEN_FILES += virtual_address_range
 
 TEST_PROGS := run_vmtests
 
+KSFT_KHDR_INSTALL := 1
 include ../lib.mk
 
 $(OUTPUT)/userfaultfd: LDLIBS += -lpthread
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Borntraeger borntraeger@de.ibm.com
commit fdd669684655c07dacbdb0d753fd13833de69a33 upstream.
Calling the test program genwqe_cksum with the default buffer size of 2MB triggers the following kernel warning on s390:
WARNING: CPU: 30 PID: 9311 at mm/page_alloc.c:3189 __alloc_pages_nodemask+0x45c/0xbe0
CPU: 30 PID: 9311 Comm: genwqe_cksum Kdump: loaded Not tainted 3.10.0-957.el7.s390x #1
task: 00000005e5d13980 ti: 00000005e7c6c000 task.ti: 00000005e7c6c000
Krnl PSW : 0704c00180000000 00000000002780ac (__alloc_pages_nodemask+0x45c/0xbe0)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000002932b8 0000000000b73d7c 0000000000000010 0000000000000009
           0000000000000041 00000005e7c6f9b8 0000000000000001 00000000000080d0
           0000000000000000 0000000000b70500 0000000000000001 0000000000000000
           0000000000b70528 00000000007682c0 0000000000277df2 00000005e7c6f9a0
Krnl Code: 000000000027809e: de7195001000	ed	1280(114,%r9),0(%r1)
           00000000002780a4: a774fead		brc	7,277dfe
          #00000000002780a8: a7f40001		brc	15,2780aa
          >00000000002780ac: 92011000		mvi	0(%r1),1
           00000000002780b0: a7f4fea7		brc	15,277dfe
           00000000002780b4: 9101c6b6		tm	1718(%r12),1
           00000000002780b8: a784ff3a		brc	8,277f2c
           00000000002780bc: a7f4fe2e		brc	15,277d18
Call Trace:
([<0000000000277df2>] __alloc_pages_nodemask+0x1a2/0xbe0)
 [<000000000013afae>] s390_dma_alloc+0xfe/0x310
 [<000003ff8065f362>] __genwqe_alloc_consistent+0xfa/0x148 [genwqe_card]
 [<000003ff80658f7a>] genwqe_mmap+0xca/0x248 [genwqe_card]
 [<00000000002b2712>] mmap_region+0x4e2/0x778
 [<00000000002b2c54>] do_mmap+0x2ac/0x3e0
 [<0000000000292d7e>] vm_mmap_pgoff+0xd6/0x118
 [<00000000002b081c>] SyS_mmap_pgoff+0xdc/0x268
 [<00000000002b0a34>] SyS_old_mmap+0x8c/0xb0
 [<000000000074e518>] sysc_tracego+0x14/0x1e
 [<000003ffacf87dc6>] 0x3ffacf87dc6
It turns out the check in __genwqe_alloc_consistent() uses "> MAX_ORDER" while the mm code uses ">= MAX_ORDER". Fix genwqe to match.
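In other words, MAX_ORDER is an exclusive upper bound: the largest order the page allocator can serve is MAX_ORDER - 1, so a size whose order equals MAX_ORDER must already be rejected. A tiny sketch of the corrected guard (the helper name is made up):

	/* Sketch only: reject sizes whose order reaches MAX_ORDER. */
	static bool genwqe_size_is_allocatable(size_t size)
	{
		return get_order(size) < MAX_ORDER;	/* i.e. fail for >= MAX_ORDER */
	}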
Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Frank Haverkamp haver@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/misc/genwqe/card_utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -215,7 +215,7 @@ u32 genwqe_crc32(u8 *buff, size_t len, u
 void *__genwqe_alloc_consistent(struct genwqe_dev *cd, size_t size,
			       dma_addr_t *dma_handle)
 {
-	if (get_order(size) > MAX_ORDER)
+	if (get_order(size) >= MAX_ORDER)
 		return NULL;
 
 	return dma_zalloc_coherent(&cd->pci_dev->dev, size, dma_handle,
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit ec5b5ad6e272d8d6b92d1007f79574919862a2d2 upstream.
The 'nr_pages' attribute of the 'msc' subdevices parses a comma-separated list of window sizes passed from userspace. However, the string parsing logic doesn't account for the comma character when it consumes each number, so the remaining length stays one byte too large for every entry. This leads to an out-of-bounds access given a sufficiently long list. For example:
# echo 8,8,8,8 > /sys/bus/intel_th/devices/0-msc0/nr_pages
BUG: KASAN: slab-out-of-bounds in memchr+0x1e/0x40
Read of size 1 at addr ffff8803ffcebcd1 by task sh/825

CPU: 3 PID: 825 Comm: npktest.sh Tainted: G        W         4.20.0-rc1+
Call Trace:
 dump_stack+0x7c/0xc0
 print_address_description+0x6c/0x23c
 ? memchr+0x1e/0x40
 kasan_report.cold.5+0x241/0x308
 memchr+0x1e/0x40
 nr_pages_store+0x203/0xd00 [intel_th_msu]
Fix this by accounting for the comma character.
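A self-contained userspace sketch of the same parsing loop shows why the +1 matters: without it, len never shrinks past the commas and the next memchr() can scan beyond the buffer. The function below is illustrative only, not the driver's nr_pages_store():

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/* Minimal sketch of parsing a comma-separated list of length len. */
	static void parse_window_sizes(const char *buf, size_t len)
	{
		const char *p = buf;

		do {
			const char *end = memchr(p, ',', len);
			unsigned long val = strtoul(p, NULL, 10);

			printf("window: %lu pages\n", val);
			if (!end)
				break;

			/* consume the number and the following comma, hence +1;
			 * dropping the +1 leaves len too large and lets the
			 * next memchr() read past the end of the buffer */
			len -= end - p + 1;
			p = end + 1;
		} while (len);
	}

	int main(void)
	{
		const char *input = "8,8,8,8";

		parse_window_sizes(input, strlen(input));
		return 0;
	}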
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Fixes: ba82664c134ef ("intel_th: Add Memory Storage Unit driver") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/hwtracing/intel_th/msu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -1423,7 +1423,8 @@ nr_pages_store(struct device *dev, struc
 		if (!end)
 			break;
 
-		len -= end - p;
+		/* consume the number and the following comma, hence +1 */
+		len -= end - p + 1;
 		p = end + 1;
 	} while (len);
 
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lubomir Rintel lkundrak@v3.sk
commit ed54ffbe554f0902689fd6d1712bbacbacd11376 upstream.
According to [1] and [2], the temperature values are in tenths of a degree Celsius. Exposing the Celsius value makes the battery appear on fire:
$ upower -i /org/freedesktop/UPower/devices/battery_olpc_battery
  ...
  temperature:         236.9 degrees C
Tested on OLPC XO-1 and OLPC XO-1.75 laptops.
[1] include/linux/power_supply.h [2] Documentation/power/power_supply_class.txt
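To make the scale concrete: the EC word is in 1/256ths of a degree Celsius and the power-supply class wants tenths of a degree, so the conversion is value * 10 / 256. A small worked example, using a made-up EC reading of 6064 (about 23.7 degrees C):

	#include <stdio.h>

	int main(void)
	{
		short ec_word = 6064;		/* hypothetical EC reading */
		int old = ec_word * 100 / 256;	/* previous conversion */
		int new = ec_word * 10 / 256;	/* fixed conversion, tenths of a degree */

		printf("old: %d -> shown as %d.%d C\n", old, old / 10, old % 10);
		printf("new: %d -> shown as %d.%d C\n", new, new / 10, new % 10);
		return 0;
	}

This prints 2368 ("236.8 C", the on-fire reading) for the old conversion and 236 ("23.6 C") for the fixed one.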
Fixes: fb972873a767 ("[BATTERY] One Laptop Per Child power/battery driver") Cc: stable@vger.kernel.org Signed-off-by: Lubomir Rintel lkundrak@v3.sk Acked-by: Pavel Machek pavel@ucw.cz Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/power/supply/olpc_battery.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/power/supply/olpc_battery.c
+++ b/drivers/power/supply/olpc_battery.c
@@ -428,14 +428,14 @@ static int olpc_bat_get_property(struct
 		if (ret)
 			return ret;
 
-		val->intval = (s16)be16_to_cpu(ec_word) * 100 / 256;
+		val->intval = (s16)be16_to_cpu(ec_word) * 10 / 256;
 		break;
 	case POWER_SUPPLY_PROP_TEMP_AMBIENT:
 		ret = olpc_ec_cmd(EC_AMB_TEMP, NULL, 0, (void *)&ec_word, 2);
 		if (ret)
 			return ret;
 
-		val->intval = (int)be16_to_cpu(ec_word) * 100 / 256;
+		val->intval = (int)be16_to_cpu(ec_word) * 10 / 256;
 		break;
 	case POWER_SUPPLY_PROP_CHARGE_COUNTER:
 		ret = olpc_ec_cmd(EC_BAT_ACR, NULL, 0, (void *)&ec_word, 2);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Rowand frank.rowand@sony.com
commit b8a9ac1a5b99a2fcbed19fd29d2d59270c281a31 upstream.
The phandle cache contains struct device_node pointers. The refcount of those nodes was not incremented while they sat in the cache, allowing a use-after-free once the node is kfree()d. Add the proper increment and decrement of the use count.
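Put differently, a pointer parked in the cache must hold its own reference for as long as it stays there. A stripped-down sketch of the intended pattern (simplified, hypothetical helpers; the real code also takes devtree_lock and handles the populate/free paths):

	/* Sketch: insertion takes a reference, eviction drops it. */
	static void phandle_cache_insert(struct device_node **cache, u32 slot,
					 struct device_node *np)
	{
		of_node_get(np);		/* reference now owned by the cache */
		cache[slot] = np;
	}

	static void phandle_cache_evict(struct device_node **cache, u32 slot)
	{
		of_node_put(cache[slot]);	/* drop the cache's reference */
		cache[slot] = NULL;
	}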
Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Frank Rowand frank.rowand@sony.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/of/base.c | 70 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 24 deletions(-)

--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -116,9 +116,6 @@ int __weak of_node_to_nid(struct device_
 }
 #endif
 
-static struct device_node **phandle_cache;
-static u32 phandle_cache_mask;
-
 /*
  * Assumptions behind phandle_cache implementation:
  *   - phandle property values are in a contiguous range of 1..n
@@ -127,6 +124,44 @@ static u32 phandle_cache_mask;
  *   - the phandle lookup overhead reduction provided by the cache
  *     will likely be less
  */
+
+static struct device_node **phandle_cache;
+static u32 phandle_cache_mask;
+
+/*
+ * Caller must hold devtree_lock.
+ */
+static void __of_free_phandle_cache(void)
+{
+	u32 cache_entries = phandle_cache_mask + 1;
+	u32 k;
+
+	if (!phandle_cache)
+		return;
+
+	for (k = 0; k < cache_entries; k++)
+		of_node_put(phandle_cache[k]);
+
+	kfree(phandle_cache);
+	phandle_cache = NULL;
+}
+
+int of_free_phandle_cache(void)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&devtree_lock, flags);
+
+	__of_free_phandle_cache();
+
+	raw_spin_unlock_irqrestore(&devtree_lock, flags);
+
+	return 0;
+}
+#if !defined(CONFIG_MODULES)
+late_initcall_sync(of_free_phandle_cache);
+#endif
+
 void of_populate_phandle_cache(void)
 {
 	unsigned long flags;
@@ -136,8 +171,7 @@ void of_populate_phandle_cache(void)
 
 	raw_spin_lock_irqsave(&devtree_lock, flags);
 
-	kfree(phandle_cache);
-	phandle_cache = NULL;
+	__of_free_phandle_cache();
 
 	for_each_of_allnodes(np)
 		if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
@@ -155,30 +189,15 @@ void of_populate_phandle_cache(void)
 		goto out;
 
 	for_each_of_allnodes(np)
-		if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
+		if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) {
+			of_node_get(np);
 			phandle_cache[np->phandle & phandle_cache_mask] = np;
+		}
 
 out:
 	raw_spin_unlock_irqrestore(&devtree_lock, flags);
 }
 
-int of_free_phandle_cache(void)
-{
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&devtree_lock, flags);
-
-	kfree(phandle_cache);
-	phandle_cache = NULL;
-
-	raw_spin_unlock_irqrestore(&devtree_lock, flags);
-
-	return 0;
-}
-#if !defined(CONFIG_MODULES)
-late_initcall_sync(of_free_phandle_cache);
-#endif
-
 void __init of_core_init(void)
 {
 	struct device_node *np;
@@ -1195,8 +1214,11 @@ struct device_node *of_find_node_by_phan
 	if (!np) {
 		for_each_of_allnodes(np)
 			if (np->phandle == handle) {
-				if (phandle_cache)
+				if (phandle_cache) {
+					/* will put when removed from cache */
+					of_node_get(np);
 					phandle_cache[masked_handle] = np;
+				}
 				break;
 			}
 	}
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Rowand frank.rowand@sony.com
commit 5801169a2ed20003f771acecf3ac00574cf10a38 upstream.
Non-overlay dynamic devicetree node removal may leave the node in the phandle cache. Subsequent calls to of_find_node_by_phandle() will incorrectly find the stale entry. Remove the node from the cache.
Add paranoia checks in of_find_node_by_phandle() as a second level of defense (do not return cached node if detached, do not add node to cache if detached).
Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") Reported-by: Michael Bringmann mwb@linux.vnet.ibm.com Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Frank Rowand frank.rowand@sony.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/of/base.c       | 31 ++++++++++++++++++++++++++++++-
 drivers/of/dynamic.c    |  3 +++
 drivers/of/of_private.h |  4 ++++
 3 files changed, 37 insertions(+), 1 deletion(-)

--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -162,6 +162,28 @@ int of_free_phandle_cache(void)
 late_initcall_sync(of_free_phandle_cache);
 #endif
 
+/*
+ * Caller must hold devtree_lock.
+ */
+void __of_free_phandle_cache_entry(phandle handle)
+{
+	phandle masked_handle;
+	struct device_node *np;
+
+	if (!handle)
+		return;
+
+	masked_handle = handle & phandle_cache_mask;
+
+	if (phandle_cache) {
+		np = phandle_cache[masked_handle];
+		if (np && handle == np->phandle) {
+			of_node_put(np);
+			phandle_cache[masked_handle] = NULL;
+		}
+	}
+}
+
 void of_populate_phandle_cache(void)
 {
 	unsigned long flags;
@@ -1209,11 +1231,18 @@ struct device_node *of_find_node_by_phan
 		if (phandle_cache[masked_handle] &&
 		    handle == phandle_cache[masked_handle]->phandle)
 			np = phandle_cache[masked_handle];
+		if (np && of_node_check_flag(np, OF_DETACHED)) {
+			WARN_ON(1); /* did not uncache np on node removal */
+			of_node_put(np);
+			phandle_cache[masked_handle] = NULL;
+			np = NULL;
+		}
 	}
 
 	if (!np) {
 		for_each_of_allnodes(np)
-			if (np->phandle == handle) {
+			if (np->phandle == handle &&
+			    !of_node_check_flag(np, OF_DETACHED)) {
 				if (phandle_cache) {
 					/* will put when removed from cache */
 					of_node_get(np);
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -268,6 +268,9 @@ void __of_detach_node(struct device_node
 	}
 
 	of_node_set_flag(np, OF_DETACHED);
+
+	/* race with of_find_node_by_phandle() prevented by devtree_lock */
+	__of_free_phandle_cache_entry(np->phandle);
 }
 
 /**
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -84,6 +84,10 @@ static inline void __of_detach_node_sysf
 int of_resolve_phandles(struct device_node *tree);
 #endif
 
+#if defined(CONFIG_OF_DYNAMIC)
+void __of_free_phandle_cache_entry(phandle handle);
+#endif
+
 #if defined(CONFIG_OF_OVERLAY)
 void of_overlay_mutex_lock(void);
 void of_overlay_mutex_unlock(void);
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Leroy christophe.leroy@c-s.fr
commit 10fdf838e5f540beca466e9d1325999c072e5d3f upstream.
On several arches, virt_to_phys() is in io.h
Build fails without it:
  CC      lib/test_debug_virtual.o
lib/test_debug_virtual.c: In function 'test_debug_virtual_init':
lib/test_debug_virtual.c:26:7: error: implicit declaration of function 'virt_to_phys' [-Werror=implicit-function-declaration]
  pa = virt_to_phys(va);
       ^
Fixes: e4dace361552 ("lib: add test module for CONFIG_DEBUG_VIRTUAL") CC: stable@vger.kernel.org Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 lib/test_debug_virtual.c | 1 +
 1 file changed, 1 insertion(+)

--- a/lib/test_debug_virtual.c
+++ b/lib/test_debug_virtual.c
@@ -5,6 +5,7 @@
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
 #include <linux/sizes.h>
+#include <linux/io.h>
 
 #include <asm/page.h>
 #ifdef CONFIG_MIPS
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lyude Paul lyude@redhat.com
commit b89fdf7ae8500feae1100d8b283176a44d31d698 upstream.
We need to actually check the return value of drm_dp_mst_topology_mgr_resume() here, since otherwise we won't know whether or not the topology is still there once we've resumed. Without the check we keep treating the topology as connected even after it has been removed, if the removal happens mid-suspend.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs bskeggs@redhat.com Signed-off-by: Ben Skeggs bskeggs@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -1262,8 +1262,16 @@ nv50_mstm_fini(struct nv50_mstm *mstm)
 static void
 nv50_mstm_init(struct nv50_mstm *mstm)
 {
-	if (mstm && mstm->mgr.mst_state)
-		drm_dp_mst_topology_mgr_resume(&mstm->mgr);
+	int ret;
+
+	if (!mstm || !mstm->mgr.mst_state)
+		return;
+
+	ret = drm_dp_mst_topology_mgr_resume(&mstm->mgr);
+	if (ret == -1) {
+		drm_dp_mst_topology_mgr_set_mst(&mstm->mgr, false);
+		drm_kms_helper_hotplug_event(mstm->mgr.dev);
+	}
 }
 
 static void
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Brezillon boris.brezillon@bootlin.com
commit 2b02a05bdc3a62d36e0d0b015351897109e25991 upstream.
When vc4_plane_state is duplicated, ->is_yuv keeps its previous value and is never set back to false when switching to a non-YUV format.
Fix that by setting ->is_yuv to false in the 'num_planes == 1' branch of the vc4_plane_setup_clipping_and_scaling() function.
Fixes: fc04023fafecf ("drm/vc4: Add support for YUV planes.") Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Reviewed-by: Eric Anholt eric@anholt.net Link: https://patchwork.freedesktop.org/patch/msgid/20181009132446.21960-1-boris.b... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/vc4/vc4_plane.c | 1 +
 1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -321,6 +321,7 @@ static int vc4_plane_setup_clipping_and_
 		if (vc4_state->is_unity)
 			vc4_state->x_scaling[0] = VC4_SCALING_PPF;
 	} else {
+		vc4_state->is_yuv = false;
 		vc4_state->x_scaling[1] = VC4_SCALING_NONE;
 		vc4_state->y_scaling[1] = VC4_SCALING_NONE;
 	}
4.20-stable review patch. If anyone has any objections, please let me know.
------------------
From: Enric Balletbo i Serra enric.balletbo@collabora.com
commit 4eda776c3cefcb1f01b2d85bd8753f67606282b5 upstream.
'encoder' is dereferenced before it is null sanity checked, hence we potentially have a null pointer dereference bug. Instead, initialise drm_drv from encoder->dev->dev_private after we are sure 'encoder' is not null.
Fixes: 5182c1a556d7f ("drm/rockchip: add an common abstracted PSR driver") Cc: stable@vger.kernel.org Signed-off-by: Enric Balletbo i Serra enric.balletbo@collabora.com Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20181013105654.11827-1-enric.b... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/rockchip/rockchip_drm_psr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_psr.c
@@ -189,12 +189,14 @@ EXPORT_SYMBOL(rockchip_drm_psr_flush_all
 int rockchip_drm_psr_register(struct drm_encoder *encoder,
			int (*psr_set)(struct drm_encoder *, bool enable))
 {
-	struct rockchip_drm_private *drm_drv = encoder->dev->dev_private;
+	struct rockchip_drm_private *drm_drv;
 	struct psr_drv *psr;
 
 	if (!encoder || !psr_set)
 		return -EINVAL;
 
+	drm_drv = encoder->dev->dev_private;
+
 	psr = kzalloc(sizeof(struct psr_drv), GFP_KERNEL);
 	if (!psr)
 		return -ENOMEM;
On 1/11/19 7:14 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.20.2 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jan 13 13:10:14 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.2-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah
On Fri, Jan 11, 2019 at 02:35:25PM -0700, shuah wrote:
On 1/11/19 7:14 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.20.2 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jan 13 13:10:14 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.2-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Thanks for testing all of these and letting me know.
greg k-h
On Fri, 11 Jan 2019 at 20:13, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.20.2 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jan 13 13:10:14 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.2-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------

kernel: 4.20.2-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.20.y
git commit: 61a581953554b7aae086213d64da1fd3a760e0bd
git describe: v4.20.1-66-g61a581953554
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.20-oe/build/v4.20.1-66-...
No regressions (compared to build v4.20.1)
No fixes (compared to build v4.20.1)
Ran 20568 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* boot
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* spectre-meltdown-checker-test
* ltp-fs-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On Sat, Jan 12, 2019 at 01:58:59PM +0530, Naresh Kamboju wrote:
On Fri, 11 Jan 2019 at 20:13, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.20.2 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jan 13 13:10:14 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.20.2-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.20.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing all of these and letting me know.
greg k-h
On 1/11/19 6:14 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.20.2 release. There are 65 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jan 13 13:10:14 UTC 2019. Anything received after that time might be too late.
Build results:
	total: 159 pass: 159 fail: 0
Qemu test results:
	total: 332 pass: 332 fail: 0
Guenter