This is the start of the stable review cycle for the 4.19.79 release. There are 114 patches in this series; all will be posted as responses to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.79-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.19.79-rc1
Johannes Berg johannes.berg@intel.com nl80211: validate beacon head
Jouni Malinen j@w1.fi cfg80211: Use const more consistently in for_each_element macros
Johannes Berg johannes.berg@intel.com cfg80211: add and use strongly typed element iteration macros
Gao Xiang gaoxiang25@huawei.com staging: erofs: detect potential multiref due to corrupted images
Gao Xiang gaoxiang25@huawei.com staging: erofs: add two missing erofs_workgroup_put for corrupted images
Gao Xiang gaoxiang25@huawei.com staging: erofs: some compressed cluster should be submitted for corrupted images
Gao Xiang gaoxiang25@huawei.com staging: erofs: fix an error handling in erofs_readdir()
Andrew Murray andrew.murray@arm.com coresight: etm4x: Use explicit barriers on enable/disable
Eric Sandeen sandeen@redhat.com vfs: Fix EOVERFLOW testing in put_compat_statfs64
Josh Poimboeuf jpoimboe@redhat.com arm64/speculation: Support 'mitigations=' cmdline option
Marc Zyngier marc.zyngier@arm.com arm64: Use firmware to detect CPUs that are not affected by Spectre-v2
Marc Zyngier marc.zyngier@arm.com arm64: Force SSBS on context switch
Will Deacon will.deacon@arm.com arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
Jeremy Linton jeremy.linton@arm.com arm64: add sysfs vulnerability show for speculative store bypass
Jeremy Linton jeremy.linton@arm.com arm64: add sysfs vulnerability show for spectre-v2
Jeremy Linton jeremy.linton@arm.com arm64: Always enable spectre-v2 vulnerability detection
Marc Zyngier marc.zyngier@arm.com arm64: Advertise mitigation of Spectre-v2, or lack thereof
Jeremy Linton jeremy.linton@arm.com arm64: Provide a command line to disable spectre_v2 mitigation
Jeremy Linton jeremy.linton@arm.com arm64: Always enable ssb vulnerability detection
Mian Yousaf Kaukab ykaukab@suse.de arm64: enable generic CPU vulnerabilites support
Jeremy Linton jeremy.linton@arm.com arm64: add sysfs vulnerability show for meltdown
Mian Yousaf Kaukab ykaukab@suse.de arm64: Add sysfs vulnerability show for spectre-v1
Mark Rutland mark.rutland@arm.com arm64: fix SSBS sanitization
Will Deacon will.deacon@arm.com arm64: docs: Document SSBS HWCAP
Will Deacon will.deacon@arm.com KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
Will Deacon will.deacon@arm.com arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
Vincent Chen vincent.chen@sifive.com riscv: Avoid interrupts being erroneously enabled in handle_exception()
Chris Wilson chris@chris-wilson.co.uk drm/i915/userptr: Acquire the page lock around set_page_dirty()
Srikar Dronamraju srikar@linux.vnet.ibm.com perf stat: Reset previous counts on repeat with interval
Jiri Olsa jolsa@kernel.org perf tools: Fix segfault in cpu_cache_level__read()
Balasubramani Vivekanandan balasubramani_vivekanandan@mentor.com tick: broadcast-hrtimer: Fix a race in bc_set_next
Steven Rostedt (VMware) rostedt@goodmis.org tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
Gautham R. Shenoy ego@linux.vnet.ibm.com powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
Xiubo Li xiubli@redhat.com nbd: fix crash when the blksize is zero
Sean Christopherson sean.j.christopherson@intel.com KVM: nVMX: Fix consistency check on injected exception error code
Cédric Le Goater clg@kaod.org KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP
Hans de Goede hdegoede@redhat.com drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed
Navid Emamdoost navid.emamdoost@gmail.com nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
Arnaldo Carvalho de Melo acme@redhat.com perf unwind: Fix libunwind build failure on i386 systems
Valdis Kletnieks valdis.kletnieks@vt.edu kernel/elfcore.c: include proper prototypes
Thomas Richter tmricht@linux.ibm.com perf build: Add detection of java-11-openjdk-devel package
KeMeng Shi shikemeng@huawei.com sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
Mathieu Desnoyers mathieu.desnoyers@efficios.com sched/membarrier: Fix private expedited registration check
Mathieu Desnoyers mathieu.desnoyers@efficios.com sched/membarrier: Call sync_core only before usermode for same mm
Nathan Chancellor natechancellor@gmail.com libnvdimm/nfit_test: Fix acpi_handle redefinition
zhengbin zhengbin13@huawei.com fuse: fix memleak in cuse_channel_open
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com libnvdimm/region: Initialize bad block for volatile namespaces
Stefan Mavrodiev stefan@olimex.com thermal_hwmon: Sanitize thermal_zone type
Ido Schimmel idosch@mellanox.com thermal: Fix use-after-free when unregistering thermal zone device
Sanjay R Mehta sanju.mehta@amd.com ntb: point to right memory window index
Arvind Sankar nivedita@alum.mit.edu x86/purgatory: Disable the stackleak GCC plugin for the purgatory
Fabrice Gasnier fabrice.gasnier@st.com pwm: stm32-lp: Add check in case requested period cannot be achieved
Trond Myklebust trondmy@gmail.com pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
Trek trek00@inbox.ru drm/amdgpu: Check for valid number of registers to read
Felix Kuehling Felix.Kuehling@amd.com drm/amdgpu: Fix KFD-related kernel oops on Hawaii
Florian Westphal fw@strlen.de netfilter: nf_tables: allow lookups in dynamic sets
Ryan Chen ryan_chen@aspeedtech.com watchdog: aspeed: Add support for AST2600
Erqi Chen chenerqi@gmail.com ceph: reconnect connection if session hang in opening state
Luis Henriques lhenriques@suse.com ceph: fix directories inode i_blkbits initialization
Igor Druzhinin igor.druzhinin@citrix.com xen/pci: reserve MCFG areas earlier
Chengguang Xu cgxu519@zoho.com.cn 9p: avoid attaching writeback_fid on mmap with type PRIVATE
Lu Shuaibing shuaibinglu@126.com 9p: Transport error uninitialized
Jia-Ju Bai baijiaju1990@gmail.com fs: nfs: Fix possible null-pointer dereferences in encode_attrs()
Sascha Hauer s.hauer@pengutronix.de ima: fix freeing ongoing ahash_request
Sascha Hauer s.hauer@pengutronix.de ima: always return negative code for error
Will Deacon will.deacon@arm.com arm64: cpufeature: Detect SSBS and advertise to userspace
Johannes Berg johannes.berg@intel.com cfg80211: initialize on-stack chandefs
Vasily Gorbik gor@linux.ibm.com s390/cio: avoid calling strlen on null pointer
Johan Hovold johan@kernel.org ieee802154: atusb: fix use-after-free at disconnect
Juergen Gross jgross@suse.com xen/xenbus: fix self-deadlock after killing user process
Wanpeng Li wanpengli@tencent.com Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
Russell King rmk+kernel@armlinux.org.uk mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
Russell King rmk+kernel@armlinux.org.uk mmc: sdhci: improve ADMA error reporting
Xiaolin Zhang xiaolin.zhang@intel.com drm/i915/gvt: update vgpu workload head pointer correctly
Lyude Paul lyude@redhat.com drm/nouveau/kms/nv50-: Don't create MSTMs for eDP connectors
Sean Paul seanpaul@chromium.org drm/msm/dsi: Fix return value check for clk_get_parent
Tomi Valkeinen tomi.valkeinen@ti.com drm/omap: fix max fclk divider for omap36xx
Srikar Dronamraju srikar@linux.vnet.ibm.com perf stat: Fix a segmentation fault when using repeat forever
Rasmus Villemoes linux@rasmusvillemoes.dk watchdog: imx2_wdt: fix min() calculation in imx2_wdt_set_timeout
Sumit Saxena sumit.saxena@broadcom.com PCI: Restore Resizable BAR size bits correctly for 1MB BARs
Jon Derrick jonathan.derrick@intel.com PCI: vmd: Fix shadow offsets to reflect spec changes
Li RongQing lirongqing@baidu.com timer: Read jiffies once when forwarding base clk
Kees Cook keescook@chromium.org usercopy: Avoid HIGHMEM pfn warning
Tom Zanussi zanussi@kernel.org tracing: Make sure variable reference alias has correct var_ref_idx
Michael Nosthoff committed@heine.so power: supply: sbs-battery: only return health when battery present
Michael Nosthoff committed@heine.so power: supply: sbs-battery: use correct flags field
Jiaxun Yang jiaxun.yang@flygoat.com MIPS: Treat Loongson Extensions as ASEs
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - use the full crypt length value
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - account for TEE not ready to report
Horia Geantă horia.geanta@nxp.com crypto: caam - fix concurrency issue in givencrypt descriptor
Wei Yongjun weiyongjun1@huawei.com crypto: cavium/zip - Add missing single_release()
Herbert Xu herbert@gondor.apana.org.au crypto: skcipher - Unmap pages after an external error
Alexander Sverdlin alexander.sverdlin@nokia.com crypto: qat - Silence smp_processor_id() warning
Steven Rostedt (VMware) rostedt@goodmis.org tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file
Marc Kleine-Budde mkl@pengutronix.de can: mcp251x: mcp251x_hw_reset(): allow more time after a reset
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
Alexey Kardashevskiy aik@ozlabs.ru powerpc/powernv/ioda: Fix race in TCE level allocation
Andrew Donnellan ajd@linux.ibm.com powerpc/powernv: Restrict OPAL symbol map to only be readable by root
Santosh Sivaraj santosh@fossix.org powerpc/mce: Schedule work from irq_work
Balbir Singh bsingharora@gmail.com powerpc/mce: Fix MCE handling for huge pages
Oleksandr Suvorov oleksandr.suvorov@toradex.com ASoC: sgtl5000: Improve VAG power and mute control
Oleksandr Suvorov oleksandr.suvorov@toradex.com ASoC: Define a set of DAPM pre/post-up events
Dmitry Osipenko digetx@gmail.com PM / devfreq: tegra: Fix kHz to Hz conversion
Mike Christie mchristi@redhat.com nbd: fix max number of supported devs
Jack Wang jinpu.wang@cloud.ionos.com KVM: nVMX: handle page fault in vmread fix
Wanpeng Li wanpengli@tencent.com KVM: X86: Fix userspace set invalid CR4
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts
Vasily Gorbik gor@linux.ibm.com s390/cio: exclude subchannels with no parent from pseudo check
Vasily Gorbik gor@linux.ibm.com s390/topology: avoid firing events before kobjs are created
Thomas Huth thuth@redhat.com KVM: s390: Test for bad access register and size at the start of S390_MEM_OP
Vasily Gorbik gor@linux.ibm.com s390/process: avoid potential reading of freed stack
-------------
Diffstat:
 Documentation/admin-guide/kernel-parameters.txt | 16 +-
 Documentation/arm64/elf_hwcaps.txt | 4 +
 Makefile | 4 +-
 arch/arm64/Kconfig | 1 +
 arch/arm64/include/asm/cpucaps.h | 3 +-
 arch/arm64/include/asm/cpufeature.h | 4 -
 arch/arm64/include/asm/kvm_host.h | 11 +
 arch/arm64/include/asm/processor.h | 17 ++
 arch/arm64/include/asm/ptrace.h | 1 +
 arch/arm64/include/asm/sysreg.h | 19 +-
 arch/arm64/include/uapi/asm/hwcap.h | 1 +
 arch/arm64/include/uapi/asm/ptrace.h | 1 +
 arch/arm64/kernel/cpu_errata.c | 271 +++++++++++++++++------
 arch/arm64/kernel/cpufeature.c | 130 +++++++++--
 arch/arm64/kernel/cpuinfo.c | 1 +
 arch/arm64/kernel/process.c | 31 +++
 arch/arm64/kernel/ptrace.c | 15 +-
 arch/arm64/kernel/ssbd.c | 21 ++
 arch/arm64/kvm/hyp/sysreg-sr.c | 11 +
 arch/mips/include/asm/cpu-features.h | 16 ++
 arch/mips/include/asm/cpu.h | 4 +
 arch/mips/kernel/cpu-probe.c | 6 +
 arch/mips/kernel/proc.c | 4 +
 arch/powerpc/include/asm/cputable.h | 4 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c | 30 ++-
 arch/powerpc/kernel/mce.c | 11 +-
 arch/powerpc/kernel/mce_power.c | 19 +-
 arch/powerpc/kvm/book3s_hv.c | 24 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 36 +--
 arch/powerpc/kvm/book3s_xive.c | 18 +-
 arch/powerpc/mm/hash_native_64.c | 2 +-
 arch/powerpc/mm/hash_utils_64.c | 9 +-
 arch/powerpc/mm/tlb-radix.c | 4 +-
 arch/powerpc/platforms/powernv/opal.c | 11 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +-
 arch/powerpc/platforms/pseries/lpar.c | 8 +-
 arch/riscv/kernel/entry.S | 6 +-
 arch/s390/kernel/process.c | 22 +-
 arch/s390/kernel/topology.c | 3 +-
 arch/s390/kvm/kvm-s390.c | 2 +-
 arch/x86/kvm/vmx.c | 4 +-
 arch/x86/kvm/x86.c | 38 ++--
 arch/x86/purgatory/Makefile | 1 +
 crypto/skcipher.c | 42 ++--
 drivers/block/nbd.c | 62 ++++--
 drivers/crypto/caam/caamalg_desc.c | 9 +
 drivers/crypto/caam/caamalg_desc.h | 2 +-
 drivers/crypto/cavium/zip/zip_main.c | 3 +
 drivers/crypto/ccree/cc_aead.c | 2 +-
 drivers/crypto/ccree/cc_fips.c | 8 +-
 drivers/crypto/qat/qat_common/adf_common_drv.h | 2 +-
 drivers/devfreq/tegra-devfreq.c | 12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +
 drivers/gpu/drm/i915/gvt/scheduler.c | 28 +--
 drivers/gpu/drm/i915/i915_gem_userptr.c | 10 +-
 drivers/gpu/drm/msm/dsi/dsi_host.c | 8 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 3 +-
 drivers/gpu/drm/omapdrm/dss/dss.c | 2 +-
 drivers/gpu/drm/radeon/radeon_drv.c | 31 +++
 drivers/gpu/drm/radeon/radeon_kms.c | 25 ---
 drivers/hwtracing/coresight/coresight-etm4x.c | 14 +-
 drivers/mmc/host/sdhci-of-esdhc.c | 7 +-
 drivers/mmc/host/sdhci.c | 15 +-
 drivers/net/can/spi/mcp251x.c | 19 +-
 drivers/net/ethernet/netronome/nfp/flower/main.c | 1 +
 drivers/net/ieee802154/atusb.c | 3 +-
 drivers/ntb/test/ntb_perf.c | 2 +-
 drivers/nvdimm/bus.c | 2 +-
 drivers/nvdimm/region.c | 4 +-
 drivers/nvdimm/region_devs.c | 4 +-
 drivers/pci/controller/vmd.c | 9 +-
 drivers/pci/pci.c | 2 +-
 drivers/power/supply/sbs-battery.c | 27 ++-
 drivers/pwm/pwm-stm32-lp.c | 6 +
 drivers/s390/cio/ccwgroup.c | 2 +-
 drivers/s390/cio/css.c | 2 +
 drivers/staging/erofs/dir.c | 11 +-
 drivers/staging/erofs/unzip_vle.c | 37 +++-
 drivers/thermal/thermal_core.c | 2 +-
 drivers/thermal/thermal_hwmon.c | 8 +-
 drivers/watchdog/aspeed_wdt.c | 4 +-
 drivers/watchdog/imx2_wdt.c | 4 +-
 drivers/xen/pci.c | 21 +-
 drivers/xen/xenbus/xenbus_dev_frontend.c | 20 +-
 fs/9p/vfs_file.c | 3 +
 fs/ceph/inode.c | 7 +-
 fs/ceph/mds_client.c | 4 +-
 fs/fuse/cuse.c | 1 +
 fs/nfs/nfs4xdr.c | 2 +-
 fs/nfs/pnfs.c | 9 +-
 fs/statfs.c | 17 +-
 include/linux/ieee80211.h | 53 +++++
 include/linux/sched/mm.h | 2 +
 include/sound/soc-dapm.h | 2 +
 kernel/elfcore.c | 1 +
 kernel/locking/qspinlock_paravirt.h | 2 +-
 kernel/sched/core.c | 4 +-
 kernel/sched/membarrier.c | 2 +-
 kernel/time/tick-broadcast-hrtimer.c | 57 ++---
 kernel/time/timer.c | 8 +-
 kernel/trace/trace_events_hist.c | 2 +
 mm/usercopy.c | 8 +-
 net/9p/client.c | 1 +
 net/netfilter/nf_tables_api.c | 7 +-
 net/netfilter/nft_lookup.c | 3 -
 net/wireless/nl80211.c | 42 +++-
 net/wireless/reg.c | 2 +-
 net/wireless/scan.c | 14 +-
 net/wireless/wext-compat.c | 2 +-
 security/integrity/ima/ima_crypto.c | 10 +-
 sound/soc/codecs/sgtl5000.c | 224 ++++++++++++++++---
 tools/lib/traceevent/Makefile | 4 +-
 tools/lib/traceevent/event-parse.c | 3 +-
 tools/perf/Makefile.config | 2 +-
 tools/perf/arch/x86/util/unwind-libunwind.c | 2 +-
 tools/perf/builtin-stat.c | 5 +-
 tools/perf/util/header.c | 2 +-
 tools/perf/util/stat.c | 17 ++
 tools/perf/util/stat.h | 1 +
 tools/testing/nvdimm/test/nfit_test.h | 4 +-
 122 files changed, 1415 insertions(+), 459 deletions(-)
From: Vasily Gorbik gor@linux.ibm.com
commit 8769f610fe6d473e5e8e221709c3ac402037da6c upstream.
With THREAD_INFO_IN_TASK (which is selected on s390) task's stack usage is refcounted and should always be protected by get/put when touching other task's stack to avoid race conditions with task's destruction code.
Fixes: d5c352cdd022 ("s390: move thread_info into task_struct")
Cc: stable@vger.kernel.org # v4.10+
Acked-by: Ilya Leoshkevich iii@linux.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/s390/kernel/process.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -183,20 +183,30 @@ unsigned long get_wchan(struct task_stru

 	if (!p || p == current || p->state == TASK_RUNNING || !task_stack_page(p))
 		return 0;
+
+	if (!try_get_task_stack(p))
+		return 0;
+
 	low = task_stack_page(p);
 	high = (struct stack_frame *) task_pt_regs(p);
 	sf = (struct stack_frame *) p->thread.ksp;
-	if (sf <= low || sf > high)
-		return 0;
+	if (sf <= low || sf > high) {
+		return_address = 0;
+		goto out;
+	}
 	for (count = 0; count < 16; count++) {
 		sf = (struct stack_frame *) sf->back_chain;
-		if (sf <= low || sf > high)
-			return 0;
+		if (sf <= low || sf > high) {
+			return_address = 0;
+			goto out;
+		}
 		return_address = sf->gprs[8];
 		if (!in_sched_functions(return_address))
-			return return_address;
+			goto out;
 	}
-	return 0;
+out:
+	put_task_stack(p);
+	return return_address;
 }

 unsigned long arch_align_stack(unsigned long sp)
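The get/put discipline used in the get_wchan() fix above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_task`, `fake_try_get_stack()`, `fake_put_stack()` and `fake_get_wchan()` are hypothetical stand-ins, not kernel interfaces, and the "address >= 100" test merely plays the role of !in_sched_functions().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task whose stack is reference counted. */
struct fake_task {
	int stack_refs;          /* 0 means the stack is already gone */
	unsigned long frames[4]; /* pretend return addresses, 0 = end of stack */
};

/* Models try_get_task_stack(): take a reference unless the stack is gone. */
static bool fake_try_get_stack(struct fake_task *t)
{
	if (t->stack_refs == 0)
		return false;
	t->stack_refs++;
	return true;
}

/* Models put_task_stack(): drop the reference taken above. */
static void fake_put_stack(struct fake_task *t)
{
	assert(t->stack_refs > 0);
	t->stack_refs--;
}

/* Walk the frames; every exit taken after the "get" goes through "out". */
static unsigned long fake_get_wchan(struct fake_task *t)
{
	unsigned long ret = 0;
	size_t i;

	if (!fake_try_get_stack(t))
		return 0; /* no reference held, nothing to release */

	for (i = 0; i < 4; i++) {
		if (t->frames[i] == 0) { /* walked off the stack */
			ret = 0;
			goto out;
		}
		ret = t->frames[i];
		if (ret >= 100) /* stand-in for !in_sched_functions() */
			goto out;
	}
	ret = 0; /* sketch choice: report nothing if no frame qualified */
out:
	fake_put_stack(t);
	return ret;
}
```

The single `out:` label is what keeps the reference count balanced: once the reference has been taken, no path returns without dropping it.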
From: Thomas Huth thuth@redhat.com
commit a13b03bbb4575b350b46090af4dfd30e735aaed1 upstream.
If the KVM_S390_MEM_OP ioctl is called with an access register >= 16, then there is certainly a bug in the calling userspace application. We check for wrong access registers, but only if the vCPU was already in the access register mode before (i.e. the SIE block has recorded it). The check is also buried somewhere deep in the calling chain (in the function ar_translation()), so this is somewhat hard to find.
It's better to always report an error to the userspace in case this field is set wrong, and it's safer in the KVM code if we block wrong values here early instead of relying on a check somewhere deep down the calling chain, so let's add another check to kvm_s390_guest_mem_op() directly.
We also should check that the "size" is non-zero here (thanks to Janosch Frank for the hint!). If we do not check the size, we could call vmalloc() with this 0 value, and this will cause a kernel warning.
Signed-off-by: Thomas Huth thuth@redhat.com
Link: https://lkml.kernel.org/r/20190829122517.31042-1-thuth@redhat.com
Reviewed-by: Cornelia Huck cohuck@redhat.com
Reviewed-by: Janosch Frank frankja@linux.ibm.com
Reviewed-by: David Hildenbrand david@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/s390/kvm/kvm-s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3890,7 +3890,7 @@ static long kvm_s390_guest_mem_op(struct
 	const u64 supported_flags = KVM_S390_MEMOP_F_INJECT_EXCEPTION
 				    | KVM_S390_MEMOP_F_CHECK_ONLY;

-	if (mop->flags & ~supported_flags)
+	if (mop->flags & ~supported_flags || mop->ar >= NUM_ACRS || !mop->size)
 		return -EINVAL;

 	if (mop->size > MEM_OP_MAX_SIZE)
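The "validate everything at the entry point" idea from the KVM_S390_MEM_OP fix above can be sketched as a standalone check. The struct, the flag values, the `SKETCH_` constants and the -E2BIG result for oversized requests are assumptions made for this illustration; only the shape of the check (unknown flags, access register out of range, zero size) follows the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_ACRS    16u         /* assumed number of access registers */
#define SKETCH_MAX_SIZE    65536u      /* assumed size cap, illustration only */
#define SKETCH_FLAG_INJECT (1ull << 0) /* illustrative flag values */
#define SKETCH_FLAG_CHECK  (1ull << 1)

/* Hypothetical, trimmed-down request descriptor. */
struct sketch_mem_op {
	uint64_t flags;
	uint32_t size;
	uint8_t  ar;
};

/* Reject malformed requests before any allocation or translation happens. */
static int sketch_validate_mem_op(const struct sketch_mem_op *mop)
{
	const uint64_t supported = SKETCH_FLAG_INJECT | SKETCH_FLAG_CHECK;

	assert(mop != NULL);
	if ((mop->flags & ~supported) || mop->ar >= SKETCH_NUM_ACRS || !mop->size)
		return -EINVAL;
	if (mop->size > SKETCH_MAX_SIZE)
		return -E2BIG;
	return 0;
}
```

Checking the size for zero here means a later allocation never sees a zero length, which is the warning the commit message describes.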
From: Vasily Gorbik gor@linux.ibm.com
commit f3122a79a1b0a113d3aea748e0ec26f2cb2889de upstream.
arch_update_cpu_topology is first called from:
  kernel_init_freeable->sched_init_smp->sched_init_domains

even before CPUs have been registered in:
  kernel_init_freeable->do_one_initcall->s390_smp_init
Do not trigger kobject_uevent change events until cpu devices are actually created. Fixes the following kasan findings:
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb40/0xee0
Read of size 8 at addr 0000000000000020 by task swapper/0/1

BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb36/0xee0
Read of size 8 at addr 0000000000000018 by task swapper/0/1

CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000143c6db7e>] show_stack+0x14e/0x1a8)
 [<0000000145956498>] dump_stack+0x1d0/0x218
 [<000000014429fb4c>] print_address_description+0x64/0x380
 [<000000014429f630>] __kasan_report+0x138/0x168
 [<0000000145960b96>] kobject_uevent_env+0xb36/0xee0
 [<0000000143c7c47c>] arch_update_cpu_topology+0x104/0x108
 [<0000000143df9e22>] sched_init_domains+0x62/0xe8
 [<000000014644c94a>] sched_init_smp+0x3a/0xc0
 [<0000000146433a20>] kernel_init_freeable+0x558/0x958
 [<000000014599002a>] kernel_init+0x22/0x160
 [<00000001459a71d4>] ret_from_fork+0x28/0x30
 [<00000001459a71dc>] kernel_thread_starter+0x0/0x10
Cc: stable@vger.kernel.org
Reviewed-by: Heiko Carstens heiko.carstens@de.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/s390/kernel/topology.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -311,7 +311,8 @@ int arch_update_cpu_topology(void)
 	on_each_cpu(__arch_update_dedicated_flag, NULL, 0);
 	for_each_online_cpu(cpu) {
 		dev = get_cpu_device(cpu);
-		kobject_uevent(&dev->kobj, KOBJ_CHANGE);
+		if (dev)
+			kobject_uevent(&dev->kobj, KOBJ_CHANGE);
 	}
 	return rc;
 }
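The ordering problem in the topology fix above (events requested before the per-CPU devices exist) can be illustrated with a small loop that skips entries that have not been created yet. `sk_dev` and `sk_notify_online()` are hypothetical names for this sketch; a NULL slot plays the role of get_cpu_device() returning no device during early boot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU device; uevents counts change notifications sent. */
struct sk_dev {
	int uevents;
};

/* Notify only those CPUs whose device has already been registered. */
static int sk_notify_online(struct sk_dev *const devs[], size_t n)
{
	int notified = 0;
	size_t cpu;

	assert(devs != NULL);
	for (cpu = 0; cpu < n; cpu++) {
		if (!devs[cpu]) /* device not created yet: skip, do not dereference */
			continue;
		devs[cpu]->uevents++;
		notified++;
	}
	return notified;
}
```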
From: Vasily Gorbik gor@linux.ibm.com
commit ab5758848039de9a4b249d46e4ab591197eebaf2 upstream.
ccw console is created early in start_kernel and used before css is initialized or ccw console subchannel is registered. Until then console subchannel does not have a parent. For that reason assume subchannels with no parent are not pseudo subchannels. This fixes the following kasan finding:
BUG: KASAN: global-out-of-bounds in sch_is_pseudo_sch+0x8e/0x98
Read of size 8 at addr 00000000000005e8 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc8-07370-g6ac43dd12538 #2
Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
Call Trace:
([<000000000012cd76>] show_stack+0x14e/0x1e0)
 [<0000000001f7fb44>] dump_stack+0x1a4/0x1f8
 [<00000000007d7afc>] print_address_description+0x64/0x3c8
 [<00000000007d75f6>] __kasan_report+0x14e/0x180
 [<00000000018a2986>] sch_is_pseudo_sch+0x8e/0x98
 [<000000000189b950>] cio_enable_subchannel+0x1d0/0x510
 [<00000000018cac7c>] ccw_device_recognition+0x12c/0x188
 [<0000000002ceb1a8>] ccw_device_enable_console+0x138/0x340
 [<0000000002cf1cbe>] con3215_init+0x25e/0x300
 [<0000000002c8770a>] console_init+0x68a/0x9b8
 [<0000000002c6a3d6>] start_kernel+0x4fe/0x728
 [<0000000000100070>] startup_continue+0x70/0xd0
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Ott sebott@linux.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/s390/cio/css.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/s390/cio/css.c
+++ b/drivers/s390/cio/css.c
@@ -1213,6 +1213,8 @@ device_initcall(cio_settle_init);

 int sch_is_pseudo_sch(struct subchannel *sch)
 {
+	if (!sch->dev.parent)
+		return 0;
 	return sch == to_css(sch->dev.parent)->pseudo_subchannel;
 }
From: Paul Mackerras paulus@ozlabs.org
commit 959c5d5134786b4988b6fdd08e444aa67d1667ed upstream.
Escalation interrupts are interrupts sent to the host by the XIVE hardware when it has an interrupt to deliver to a guest VCPU but that VCPU is not running anywhere in the system. Hence we disable the escalation interrupt for the VCPU being run when we enter the guest and re-enable it when the guest does an H_CEDE hypercall indicating it is idle.
It is possible that an escalation interrupt gets generated just as we are entering the guest. In that case the escalation interrupt may be using a queue entry in one of the interrupt queues, and that queue entry may not have been processed when the guest exits with an H_CEDE. The existing entry code detects this situation and does not clear the vcpu->arch.xive_esc_on flag as an indication that there is a pending queue entry (if the queue entry gets processed, xive_esc_irq() will clear the flag). There is a comment in the code saying that if the flag is still set on H_CEDE, we have to abort the cede rather than re-enabling the escalation interrupt, lest we end up with two occurrences of the escalation interrupt in the interrupt queue.
However, the exit code doesn't do that; it aborts the cede in the sense that vcpu->arch.ceded gets cleared, but it still enables the escalation interrupt by setting the source's PQ bits to 00. Instead we need to set the PQ bits to 10, indicating that an interrupt has been triggered. We also need to avoid setting vcpu->arch.xive_esc_on in this case (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because xive_esc_irq() will run at some point and clear it, and if we race with that we may end up with an incorrect result (i.e. xive_esc_on set when the escalation interrupt has just been handled).
It is extremely unlikely that having two queue entries would cause observable problems; theoretically it could cause queue overflow, but the CPU would have to have thousands of interrupts targeted to it for that to be possible. However, this fix will also make it possible to determine accurately whether there is an unhandled escalation interrupt in the queue, which will be needed by the following patch.
Fixes: 9b9b13a6d153 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 36 ++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 13 deletions(-)

--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2903,29 +2903,39 @@ kvm_cede_prodded:
 kvm_cede_exit:
 	ld	r9, HSTATE_KVM_VCPU(r13)
 #ifdef CONFIG_KVM_XICS
-	/* Abort if we still have a pending escalation */
+	/* are we using XIVE with single escalation? */
+	ld	r10, VCPU_XIVE_ESC_VADDR(r9)
+	cmpdi	r10, 0
+	beq	3f
+	li	r6, XIVE_ESB_SET_PQ_00
+	/*
+	 * If we still have a pending escalation, abort the cede,
+	 * and we must set PQ to 10 rather than 00 so that we don't
+	 * potentially end up with two entries for the escalation
+	 * interrupt in the XIVE interrupt queue. In that case
+	 * we also don't want to set xive_esc_on to 1 here in
+	 * case we race with xive_esc_irq().
+	 */
 	lbz	r5, VCPU_XIVE_ESC_ON(r9)
 	cmpwi	r5, 0
-	beq	1f
+	beq	4f
 	li	r0, 0
 	stb	r0, VCPU_CEDED(r9)
-1:	/* Enable XIVE escalation */
-	li	r5, XIVE_ESB_SET_PQ_00
+	li	r6, XIVE_ESB_SET_PQ_10
+	b	5f
+4:	li	r0, 1
+	stb	r0, VCPU_XIVE_ESC_ON(r9)
+	/* make sure store to xive_esc_on is seen before xive_esc_irq runs */
+	sync
+5:	/* Enable XIVE escalation */
 	mfmsr	r0
 	andi.	r0, r0, MSR_DR		/* in real mode? */
 	beq	1f
-	ld	r10, VCPU_XIVE_ESC_VADDR(r9)
-	cmpdi	r10, 0
-	beq	3f
-	ldx	r0, r10, r5
+	ldx	r0, r10, r6
 	b	2f
 1:	ld	r10, VCPU_XIVE_ESC_RADDR(r9)
-	cmpdi	r10, 0
-	beq	3f
-	ldcix	r0, r10, r5
+	ldcix	r0, r10, r6
 2:	sync
-	li	r0, 1
-	stb	r0, VCPU_XIVE_ESC_ON(r9)
 #endif /* CONFIG_KVM_XICS */
 3:	b	guest_exit_cont
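The decision the patched kvm_cede_exit path makes can be restated as a few lines of C. This is a simplified model, not the kernel code: `sketch_vcpu`, `sketch_pq` and `sketch_cede_exit()` are hypothetical names, and the memory barrier and the real-mode versus virtual-mode ESB access from the assembly are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two PQ settings the commit message talks about. */
enum sketch_pq {
	SKETCH_PQ_00, /* escalation source enabled, nothing pending */
	SKETCH_PQ_10  /* an interrupt has already been triggered */
};

/* Hypothetical subset of the vcpu state touched on cede exit. */
struct sketch_vcpu {
	bool ceded;
	bool xive_esc_on;
};

/* Returns the PQ value that would be written to the escalation source. */
static enum sketch_pq sketch_cede_exit(struct sketch_vcpu *v)
{
	assert(v != NULL);
	if (v->xive_esc_on) {
		/*
		 * A queue entry for the escalation is still outstanding:
		 * abort the cede and leave the flag alone so the handler
		 * that eventually processes the entry can clear it.
		 */
		v->ceded = false;
		return SKETCH_PQ_10;
	}
	/* No outstanding entry: arm the escalation and record that fact. */
	v->xive_esc_on = true;
	return SKETCH_PQ_00;
}
```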
From: Paul Mackerras paulus@ozlabs.org
commit d28eafc5a64045c78136162af9d4ba42f8230080 upstream.
When we are running multiple vcores on the same physical core, they could be from different VMs and so it is possible that one of the VMs could have its arch.mmu_ready flag cleared (for example by a concurrent HPT resize) when we go to run it on a physical core. We currently check the arch.mmu_ready flag for the primary vcore but not the flags for the other vcores that will be run alongside it. This adds that check, and also a check when we select the secondary vcores from the preempted vcores list.
Cc: stable@vger.kernel.org # v4.14+
Fixes: 38c53af85306 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates")
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/powerpc/kvm/book3s_hv.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2550,7 +2550,7 @@ static void collect_piggybacks(struct co
 		if (!spin_trylock(&pvc->lock))
 			continue;
 		prepare_threads(pvc);
-		if (!pvc->n_runnable) {
+		if (!pvc->n_runnable || !pvc->kvm->arch.mmu_ready) {
 			list_del_init(&pvc->preempt_list);
 			if (pvc->runner == NULL) {
 				pvc->vcore_state = VCORE_INACTIVE;
@@ -2571,15 +2571,20 @@ static void collect_piggybacks(struct co
 	spin_unlock(&lp->lock);
 }

-static bool recheck_signals(struct core_info *cip)
+static bool recheck_signals_and_mmu(struct core_info *cip)
 {
 	int sub, i;
 	struct kvm_vcpu *vcpu;
+	struct kvmppc_vcore *vc;

-	for (sub = 0; sub < cip->n_subcores; ++sub)
-		for_each_runnable_thread(i, vcpu, cip->vc[sub])
+	for (sub = 0; sub < cip->n_subcores; ++sub) {
+		vc = cip->vc[sub];
+		if (!vc->kvm->arch.mmu_ready)
+			return true;
+		for_each_runnable_thread(i, vcpu, vc)
 			if (signal_pending(vcpu->arch.run_task))
 				return true;
+	}
 	return false;
 }

@@ -2800,7 +2805,7 @@ static noinline void kvmppc_run_core(str
 	local_irq_disable();
 	hard_irq_disable();
 	if (lazy_irq_pending() || need_resched() ||
-	    recheck_signals(&core_info) || !vc->kvm->arch.mmu_ready) {
+	    recheck_signals_and_mmu(&core_info)) {
 		local_irq_enable();
 		vc->vcore_state = VCORE_INACTIVE;
 		/* Unlock all except the primary vcore */
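The point of the patch above is that the readiness check has to cover every vcore about to share the physical core, not just the primary one. A minimal userspace model of the combined check, with hypothetical `sk_vcore` and `sk_recheck_signals_and_mmu()` names and plain arrays standing in for the kernel's thread iteration, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subcore state; signal_pending has n_threads entries. */
struct sk_vcore {
	bool mmu_ready;
	size_t n_threads;
	const bool *signal_pending;
};

/* True means "back out": some vcore is not ready or a thread has a signal. */
static bool sk_recheck_signals_and_mmu(const struct sk_vcore *vcs,
				       size_t n_subcores)
{
	size_t sub, i;

	assert(vcs != NULL);
	for (sub = 0; sub < n_subcores; sub++) {
		const struct sk_vcore *vc = &vcs[sub];

		/* The flag is checked per vcore, since they may belong to different VMs. */
		if (!vc->mmu_ready)
			return true;
		for (i = 0; i < vc->n_threads; i++)
			if (vc->signal_pending[i])
				return true;
	}
	return false;
}
```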
From: Paul Mackerras paulus@ozlabs.org
commit ff42df49e75f053a8a6b4c2533100cdcc23afe69 upstream.
On POWER9, when userspace reads the value of the DPDES register on a vCPU, it is possible for 0 to be returned although there is a doorbell interrupt pending for the vCPU. This can lead to a doorbell interrupt being lost across migration. If the guest kernel uses doorbell interrupts for IPIs, then it could malfunction because of the lost interrupt.
This happens because a newly-generated doorbell interrupt is signalled by setting vcpu->arch.doorbell_request to 1; the DPDES value in vcpu->arch.vcore->dpdes is not updated, because it can only be updated when holding the vcpu mutex, in order to avoid races.
To fix this, we OR in vcpu->arch.doorbell_request when reading the DPDES value.
Cc: stable@vger.kernel.org # v4.13+
Fixes: 579006944e0d ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Tested-by: Alexey Kardashevskiy aik@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/powerpc/kvm/book3s_hv.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1407,7 +1407,14 @@ static int kvmppc_get_one_reg_hv(struct
 		*val = get_reg_val(id, vcpu->arch.pspb);
 		break;
 	case KVM_REG_PPC_DPDES:
-		*val = get_reg_val(id, vcpu->arch.vcore->dpdes);
+		/*
+		 * On POWER9, where we are emulating msgsndp etc.,
+		 * we return 1 bit for each vcpu, which can come from
+		 * either vcore->dpdes or doorbell_request.
+		 * On POWER8, doorbell_request is 0.
+		 */
+		*val = get_reg_val(id, vcpu->arch.vcore->dpdes |
+				   vcpu->arch.doorbell_request);
 		break;
 	case KVM_REG_PPC_VTB:
 		*val = get_reg_val(id, vcpu->arch.vcore->vtb);
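The effect of the DPDES fix above is easy to see in isolation: the value reported to userspace is the saved register OR-ed with the not-yet-folded-in request flag. `sk_read_dpdes()` is a hypothetical helper for this sketch, and the assumption that the request flag is only ever 0 or 1 is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Value userspace should see: saved DPDES plus any pending request. */
static uint64_t sk_read_dpdes(uint64_t vcore_dpdes, uint8_t doorbell_request)
{
	assert(doorbell_request <= 1); /* the flag is set to 1 or left at 0 */
	return vcore_dpdes | doorbell_request;
}
```

With the old behaviour, a pending request with a saved value of 0 would have been reported as 0, which is how the doorbell could get lost across migration.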
From: Wanpeng Li wanpengli@tencent.com
commit 3ca94192278ca8de169d78c085396c424be123b3 upstream.
Reported by syzkaller:
WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
CPU: 0 PID: 6544 Comm: a.out Tainted: G OE 5.3.0-rc4+ #4
RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
Call Trace:
 vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
 vcpu_enter_guest+0x4dc/0x18d0 [kvm]
 kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
 kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
 do_vfs_ioctl+0xa2/0x690
 ksys_ioctl+0x6d/0x80
 __x64_sys_ioctl+0x1a/0x20
 do_syscall_64+0x74/0x720
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

When CR4.UMIP is set, the guest should have the UMIP cpuid flag. The current kvm set_sregs function has no such check when userspace inputs sregs values. SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP in vmx_set_cr4 even though the guest doesn't have the UMIP cpuid flag. The testcase triggers the handle_desc warning when executing the ltr instruction since the guest architectural CR4 doesn't set UMIP. This patch fixes it by adding valid CR4 and CPUID combination checking in __set_sregs.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000
Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li wanpengli@tencent.com
Reviewed-by: Sean Christopherson sean.j.christopherson@intel.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -791,34 +791,42 @@ int kvm_set_xcr(struct kvm_vcpu *vcpu, u } EXPORT_SYMBOL_GPL(kvm_set_xcr);
-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { - unsigned long old_cr4 = kvm_read_cr4(vcpu); - unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE | - X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE; - if (cr4 & CR4_RESERVED_BITS) - return 1; + return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && (cr4 & X86_CR4_OSXSAVE)) - return 1; + return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_SMEP) && (cr4 & X86_CR4_SMEP)) - return 1; + return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_SMAP) && (cr4 & X86_CR4_SMAP)) - return 1; + return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_FSGSBASE) && (cr4 & X86_CR4_FSGSBASE)) - return 1; + return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE)) - return 1; + return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57)) - return 1; + return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_UMIP) && (cr4 & X86_CR4_UMIP)) + return -EINVAL; + + return 0; +} + +int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + unsigned long old_cr4 = kvm_read_cr4(vcpu); + unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE | + X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE; + + if (kvm_valid_cr4(vcpu, cr4)) return 1;
if (is_long_mode(vcpu)) { @@ -8237,10 +8245,6 @@ EXPORT_SYMBOL_GPL(kvm_task_switch);
static int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) { - if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && - (sregs->cr4 & X86_CR4_OSXSAVE)) - return -EINVAL; - if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) { /* * When EFER.LME and CR0.PG are set, the processor is in @@ -8259,7 +8263,7 @@ static int kvm_valid_sregs(struct kvm_vc return -EINVAL; }
- return 0; + return kvm_valid_cr4(vcpu, sregs->cr4); }
static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
From: Jack Wang jinpu.wang@cloud.ionos.com
During the backport of f7eea636c3d5 ("KVM: nVMX: handle page fault in vmread") there was a mistake: the exception reference should be passed to kvm_write_guest_virt_system() instead of NULL. Otherwise we get a NULL pointer dereference, e.g.
kvm-unit-test triggered a NULL pointer deref below:
[ 948.518437] kvm [24114]: vcpu0, guest rIP: 0x407ef9 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x3, nop
[ 949.106464] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 949.106707] PGD 0 P4D 0
[ 949.106872] Oops: 0002 [#1] SMP
[ 949.107038] CPU: 2 PID: 24126 Comm: qemu-2.7 Not tainted 4.19.77-pserver #4.19.77-1+feature+daily+update+20191005.1625+a4168bb~deb9
[ 949.107283] Hardware name: Dell Inc. Precision Tower 3620/09WH54, BIOS 2.7.3 01/31/2018
[ 949.107549] RIP: 0010:kvm_write_guest_virt_system+0x12/0x40 [kvm]
[ 949.107719] Code: c0 5d 41 5c 41 5d 41 5e 83 f8 03 41 0f 94 c0 41 c1 e0 02 e9 b0 ed ff ff 0f 1f 44 00 00 48 89 f0 c6 87 59 56 00 00 01 48 89 d6 <49> c7 00 00 00 00 00 89 ca 49 c7 40 08 00 00 00 00 49 c7 40 10 00
[ 949.108044] RSP: 0018:ffffb31b0a953cb0 EFLAGS: 00010202
[ 949.108216] RAX: 000000000046b4d8 RBX: ffff9e9f415b0000 RCX: 0000000000000008
[ 949.108389] RDX: ffffb31b0a953cc0 RSI: ffffb31b0a953cc0 RDI: ffff9e9f415b0000
[ 949.108562] RBP: 00000000d2e14928 R08: 0000000000000000 R09: 0000000000000000
[ 949.108733] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffc8
[ 949.108907] R13: 0000000000000002 R14: ffff9e9f4f26f2e8 R15: 0000000000000000
[ 949.109079] FS: 00007eff8694c700(0000) GS:ffff9e9f51a80000(0000) knlGS:0000000031415928
[ 949.109318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 949.109495] CR2: 0000000000000000 CR3: 00000003be53b002 CR4: 00000000003626e0
[ 949.109671] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 949.109845] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 949.110017] Call Trace:
[ 949.110186] handle_vmread+0x22b/0x2f0 [kvm_intel]
[ 949.110356] ? vmexit_fill_RSB+0xc/0x30 [kvm_intel]
[ 949.110549] kvm_arch_vcpu_ioctl_run+0xa98/0x1b30 [kvm]
[ 949.110725] ? kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
[ 949.110901] kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
[ 949.111072] do_vfs_ioctl+0xa2/0x620
Signed-off-by: Jack Wang jinpu.wang@cloud.ionos.com
Acked-by: Paolo Bonzini pbonzini@redhat.com
--- arch/x86/kvm/vmx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8801,7 +8801,7 @@ static int handle_vmread(struct kvm_vcpu /* _system ok, nested_vmx_check_permission has verified cpl=0 */ if (kvm_write_guest_virt_system(vcpu, gva, &field_value, (is_long_mode(vcpu) ? 8 : 4), - NULL)) + &e)) kvm_inject_page_fault(vcpu, &e); }
From: Mike Christie mchristi@redhat.com
commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 upstream.
This fixes a bug added in 4.10 with commit:
commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024
Author: Josef Bacik jbacik@fb.com
Date: Tue Nov 22 14:04:40 2016 -0500
nbd: add multi-connection support
that limited the number of devices to 256. Before the patch we could create 1000s of devices, but the patch switched us from using our own thread to using a work queue which has a default limit of 256 active works.
The problem is that our recv_work function sits in a loop until disconnection but only handles IO for one connection. The work is started when the connection is started/restarted, but if we end up creating 257 or more connections, the queue_work call just queues the recv_work for connection 257 and beyond, which then waits until one of connections 1 - 256 is disconnected and its work instance completes.
Instead of reverting back to kthreads, this has us allocate a workqueue_struct per device, so we can block in the work.
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik josef@toxicpanda.com
Signed-off-by: Mike Christie mchristi@redhat.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/nbd.c | 39 +++++++++++++++++++++++++-------------- 1 file changed, 25 insertions(+), 14 deletions(-)
--- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -106,6 +106,7 @@ struct nbd_device { struct nbd_config *config; struct mutex config_lock; struct gendisk *disk; + struct workqueue_struct *recv_workq;
struct list_head list; struct task_struct *task_recv; @@ -134,7 +135,6 @@ static struct dentry *nbd_dbg_dir;
static unsigned int nbds_max = 16; static int max_part = 16; -static struct workqueue_struct *recv_workqueue; static int part_shift;
static int nbd_dev_dbg_init(struct nbd_device *nbd); @@ -1025,7 +1025,7 @@ static int nbd_reconnect_socket(struct n /* We take the tx_mutex in an error path in the recv_work, so we * need to queue_work outside of the tx_mutex. */ - queue_work(recv_workqueue, &args->work); + queue_work(nbd->recv_workq, &args->work);
atomic_inc(&config->live_connections); wake_up(&config->conn_wait); @@ -1126,6 +1126,10 @@ static void nbd_config_put(struct nbd_de kfree(nbd->config); nbd->config = NULL;
+ if (nbd->recv_workq) + destroy_workqueue(nbd->recv_workq); + nbd->recv_workq = NULL; + nbd->tag_set.timeout = 0; nbd->disk->queue->limits.discard_granularity = 0; nbd->disk->queue->limits.discard_alignment = 0; @@ -1154,6 +1158,14 @@ static int nbd_start_device(struct nbd_d return -EINVAL; }
+ nbd->recv_workq = alloc_workqueue("knbd%d-recv", + WQ_MEM_RECLAIM | WQ_HIGHPRI | + WQ_UNBOUND, 0, nbd->index); + if (!nbd->recv_workq) { + dev_err(disk_to_dev(nbd->disk), "Could not allocate knbd recv work queue.\n"); + return -ENOMEM; + } + blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections); nbd->task_recv = current;
@@ -1184,7 +1196,7 @@ static int nbd_start_device(struct nbd_d INIT_WORK(&args->work, recv_work); args->nbd = nbd; args->index = i; - queue_work(recv_workqueue, &args->work); + queue_work(nbd->recv_workq, &args->work); } nbd_size_update(nbd); return error; @@ -1204,8 +1216,10 @@ static int nbd_start_device_ioctl(struct mutex_unlock(&nbd->config_lock); ret = wait_event_interruptible(config->recv_wq, atomic_read(&config->recv_threads) == 0); - if (ret) + if (ret) { sock_shutdown(nbd); + flush_workqueue(nbd->recv_workq); + } mutex_lock(&nbd->config_lock); nbd_bdev_reset(bdev); /* user requested, ignore socket errors */ @@ -1835,6 +1849,12 @@ static void nbd_disconnect_and_put(struc nbd_disconnect(nbd); nbd_clear_sock(nbd); mutex_unlock(&nbd->config_lock); + /* + * Make sure recv thread has finished, so it does not drop the last + * config ref and try to destroy the workqueue from inside the work + * queue. + */ + flush_workqueue(nbd->recv_workq); if (test_and_clear_bit(NBD_HAS_CONFIG_REF, &nbd->config->runtime_flags)) nbd_config_put(nbd); @@ -2215,20 +2235,12 @@ static int __init nbd_init(void)
if (nbds_max > 1UL << (MINORBITS - part_shift)) return -EINVAL; - recv_workqueue = alloc_workqueue("knbd-recv", - WQ_MEM_RECLAIM | WQ_HIGHPRI | - WQ_UNBOUND, 0); - if (!recv_workqueue) - return -ENOMEM;
- if (register_blkdev(NBD_MAJOR, "nbd")) { - destroy_workqueue(recv_workqueue); + if (register_blkdev(NBD_MAJOR, "nbd")) return -EIO; - }
if (genl_register_family(&nbd_genl_family)) { unregister_blkdev(NBD_MAJOR, "nbd"); - destroy_workqueue(recv_workqueue); return -EINVAL; } nbd_dbg_init(); @@ -2270,7 +2282,6 @@ static void __exit nbd_cleanup(void)
idr_destroy(&nbd_index_idr); genl_unregister_family(&nbd_genl_family); - destroy_workqueue(recv_workqueue); unregister_blkdev(NBD_MAJOR, "nbd"); }
From: Dmitry Osipenko digetx@gmail.com
commit 62bacb06b9f08965c4ef10e17875450490c948c0 upstream.
The kHz to Hz conversion is done incorrectly in a few places in the code. This results in a wrong frequency being calculated, because the devfreq core uses OPP frequencies given in Hz to clamp the rate, while tegra-devfreq passes the core a value in kHz and then also expects to receive a value in kHz back from the core. As a result, the memory frequency is always set to a value close to ULONG_MAX. Hence the EMC frequency is always capped to the maximum and the driver doesn't do anything useful. This patch was tested on Tegra30 and Tegra124 SoCs; EMC frequency scaling works properly now.
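A minimal sketch of the unit boundary the fix establishes, assuming the driver keeps its own bookkeeping in kHz and converts exactly once when handing a value to a Hz-based core. The helper names (clamp_hz, governor_target_hz) and the limits used in the usage note are made up for illustration and are not the driver's or devfreq's API.

```c
#define KHZ 1000UL

/* Clamp a frequency in Hz to a [min_hz, max_hz] range given in Hz. */
static unsigned long clamp_hz(unsigned long hz, unsigned long min_hz,
			      unsigned long max_hz)
{
	if (hz < min_hz)
		return min_hz;
	if (hz > max_hz)
		return max_hz;
	return hz;
}

/*
 * The governor computes its target in kHz; convert to Hz before the
 * value crosses into code that compares it against limits in Hz.
 */
static unsigned long governor_target_hz(unsigned long target_khz,
					unsigned long min_hz,
					unsigned long max_hz)
{
	return clamp_hz(target_khz * KHZ, min_hz, max_hz);
}
```

For example, governor_target_hz(204000, 12750000, 800000000) yields 204000000, i.e. 204 MHz expressed in Hz, which is the unit the clamp limits use.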
Cc: stable@vger.kernel.org # 4.14+
Tested-by: Steev Klimaszewski steev@kali.org
Reviewed-by: Chanwoo Choi cw00.choi@samsung.com
Signed-off-by: Dmitry Osipenko digetx@gmail.com
Acked-by: Thierry Reding treding@nvidia.com
Signed-off-by: MyungJoo Ham myungjoo.ham@samsung.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/devfreq/tegra-devfreq.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
--- a/drivers/devfreq/tegra-devfreq.c +++ b/drivers/devfreq/tegra-devfreq.c @@ -486,11 +486,11 @@ static int tegra_devfreq_target(struct d { struct tegra_devfreq *tegra = dev_get_drvdata(dev); struct dev_pm_opp *opp; - unsigned long rate = *freq * KHZ; + unsigned long rate;
- opp = devfreq_recommended_opp(dev, &rate, flags); + opp = devfreq_recommended_opp(dev, freq, flags); if (IS_ERR(opp)) { - dev_err(dev, "Failed to find opp for %lu KHz\n", *freq); + dev_err(dev, "Failed to find opp for %lu Hz\n", *freq); return PTR_ERR(opp); } rate = dev_pm_opp_get_freq(opp); @@ -499,8 +499,6 @@ static int tegra_devfreq_target(struct d clk_set_min_rate(tegra->emc_clock, rate); clk_set_rate(tegra->emc_clock, 0);
- *freq = rate; - return 0; }
@@ -510,7 +508,7 @@ static int tegra_devfreq_get_dev_status( struct tegra_devfreq *tegra = dev_get_drvdata(dev); struct tegra_devfreq_device *actmon_dev;
- stat->current_frequency = tegra->cur_freq; + stat->current_frequency = tegra->cur_freq * KHZ;
/* To be used by the tegra governor */ stat->private_data = tegra; @@ -565,7 +563,7 @@ static int tegra_governor_get_target(str target_freq = max(target_freq, dev->target_freq); }
- *freq = target_freq; + *freq = target_freq * KHZ;
return 0; }
From: Oleksandr Suvorov oleksandr.suvorov@toradex.com
commit cfc8f568aada98f9608a0a62511ca18d647613e2 upstream.
Prepare to use the SND_SOC_DAPM_PRE_POST_PMU definition to reduce the size of upcoming code and make it more readable.
Cc: stable@vger.kernel.org
Signed-off-by: Oleksandr Suvorov oleksandr.suvorov@toradex.com
Reviewed-by: Marcel Ziswiler marcel.ziswiler@toradex.com
Reviewed-by: Igor Opaniuk igor.opaniuk@toradex.com
Reviewed-by: Fabio Estevam festevam@gmail.com
Link: https://lore.kernel.org/r/20190719100524.23300-2-oleksandr.suvorov@toradex.c...
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/sound/soc-dapm.h | 2 ++ 1 file changed, 2 insertions(+)
--- a/include/sound/soc-dapm.h +++ b/include/sound/soc-dapm.h @@ -353,6 +353,8 @@ struct device; #define SND_SOC_DAPM_WILL_PMD 0x80 /* called at start of sequence */ #define SND_SOC_DAPM_PRE_POST_PMD \ (SND_SOC_DAPM_PRE_PMD | SND_SOC_DAPM_POST_PMD) +#define SND_SOC_DAPM_PRE_POST_PMU \ + (SND_SOC_DAPM_PRE_PMU | SND_SOC_DAPM_POST_PMU)
/* convenience event type detection */ #define SND_SOC_DAPM_EVENT_ON(e) \
From: Oleksandr Suvorov oleksandr.suvorov@toradex.com
commit b1f373a11d25fc9a5f7679c9b85799fe09b0dc4a upstream.
VAG power control is improved to fit the manual [1]. This patch fixes at least one bug: if the customer muxes Headphone to Line-In right after boot, the VAG power remains off, which leads to poor sound quality from line-in.
I.e. after boot:
- Connect sound source to Line-In jack;
- Connect headphone to HP jack;
- Run following commands:
  $ amixer set 'Headphone' 80%
  $ amixer set 'Headphone Mux' LINE_IN
Change VAG power on/off control according to the following algorithm:
- turn VAG power ON on the 1st incoming event.
- keep it ON if there is any active VAG consumer (ADC/DAC/HP/Line-In).
- turn VAG power OFF when the last consumer's pre-down event arrives.
- always delay after VAG power OFF to avoid pop.
- delay after VAG power ON if the initiating consumer is Line-In; this prevents pop during line-in muxing.
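The "keep it ON while there is another consumer" rule can be sketched as a small counting function. This is a userspace illustration under simplifying assumptions: struct codec_state and its boolean fields are hypothetical replacements for the register bits the driver reads, and the delays and muting are left out.

```c
#include <stdbool.h>

enum power_source { HP_EVENT, DAC_EVENT, ADC_EVENT };

/* Hypothetical snapshot of the power state at the pre-down event. */
struct codec_state {
	bool dac_on;
	bool adc_on;
	bool hp_on;
	bool hp_from_line_in;	/* headphone mux currently selects Line-In */
};

/* Count VAG consumers as seen when 'src' is about to power down. */
static int vag_consumers(const struct codec_state *s, enum power_source src)
{
	int n = 0;

	if (s->dac_on)
		n++;
	if (s->adc_on)
		n++;

	if (src == HP_EVENT) {
		/* HP fed from Line-In still needs VAG after this event. */
		if (s->hp_from_line_in)
			n++;
	} else if (s->hp_on) {
		n++;
	}
	return n;
}

/*
 * The consumer that is going away is still included in the count,
 * so VAG may be switched off only when fewer than two are present.
 */
static bool vag_may_power_off(const struct codec_state *s,
			      enum power_source src)
{
	return vag_consumers(s, src) < 2;
}
```

The threshold of two follows from the timing: the function runs at the pre-down event, so the disappearing consumer is still counted, and at least one other must remain for VAG to stay on.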
According to the data sheet [1], to avoid any pops/clicks, the outputs should be muted during input/output routing changes.
[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf
Cc: stable@vger.kernel.org
Fixes: 9b34e6cc3bc2 ("ASoC: Add Freescale SGTL5000 codec support")
Signed-off-by: Oleksandr Suvorov oleksandr.suvorov@toradex.com
Reviewed-by: Marcel Ziswiler marcel.ziswiler@toradex.com
Reviewed-by: Fabio Estevam festevam@gmail.com
Reviewed-by: Cezary Rojewski cezary.rojewski@intel.com
Link: https://lore.kernel.org/r/20190719100524.23300-3-oleksandr.suvorov@toradex.c...
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/codecs/sgtl5000.c | 224 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 194 insertions(+), 30 deletions(-)
--- a/sound/soc/codecs/sgtl5000.c +++ b/sound/soc/codecs/sgtl5000.c @@ -31,6 +31,13 @@ #define SGTL5000_DAP_REG_OFFSET 0x0100 #define SGTL5000_MAX_REG_OFFSET 0x013A
+/* Delay for the VAG ramp up */ +#define SGTL5000_VAG_POWERUP_DELAY 500 /* ms */ +/* Delay for the VAG ramp down */ +#define SGTL5000_VAG_POWERDOWN_DELAY 500 /* ms */ + +#define SGTL5000_OUTPUTS_MUTE (SGTL5000_HP_MUTE | SGTL5000_LINE_OUT_MUTE) + /* default value of sgtl5000 registers */ static const struct reg_default sgtl5000_reg_defaults[] = { { SGTL5000_CHIP_DIG_POWER, 0x0000 }, @@ -116,6 +123,13 @@ enum { I2S_LRCLK_STRENGTH_HIGH, };
+enum { + HP_POWER_EVENT, + DAC_POWER_EVENT, + ADC_POWER_EVENT, + LAST_POWER_EVENT = ADC_POWER_EVENT +}; + /* sgtl5000 private structure in codec */ struct sgtl5000_priv { int sysclk; /* sysclk rate */ @@ -129,8 +143,109 @@ struct sgtl5000_priv { u8 micbias_resistor; u8 micbias_voltage; u8 lrclk_strength; + u16 mute_state[LAST_POWER_EVENT + 1]; };
+static inline int hp_sel_input(struct snd_soc_component *component) +{ + return (snd_soc_component_read32(component, SGTL5000_CHIP_ANA_CTRL) & + SGTL5000_HP_SEL_MASK) >> SGTL5000_HP_SEL_SHIFT; +} + +static inline u16 mute_output(struct snd_soc_component *component, + u16 mute_mask) +{ + u16 mute_reg = snd_soc_component_read32(component, + SGTL5000_CHIP_ANA_CTRL); + + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_CTRL, + mute_mask, mute_mask); + return mute_reg; +} + +static inline void restore_output(struct snd_soc_component *component, + u16 mute_mask, u16 mute_reg) +{ + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_CTRL, + mute_mask, mute_reg); +} + +static void vag_power_on(struct snd_soc_component *component, u32 source) +{ + if (snd_soc_component_read32(component, SGTL5000_CHIP_ANA_POWER) & + SGTL5000_VAG_POWERUP) + return; + + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, + SGTL5000_VAG_POWERUP, SGTL5000_VAG_POWERUP); + + /* When VAG powering on to get local loop from Line-In, the sleep + * is required to avoid loud pop. + */ + if (hp_sel_input(component) == SGTL5000_HP_SEL_LINE_IN && + source == HP_POWER_EVENT) + msleep(SGTL5000_VAG_POWERUP_DELAY); +} + +static int vag_power_consumers(struct snd_soc_component *component, + u16 ana_pwr_reg, u32 source) +{ + int consumers = 0; + + /* count dac/adc consumers unconditional */ + if (ana_pwr_reg & SGTL5000_DAC_POWERUP) + consumers++; + if (ana_pwr_reg & SGTL5000_ADC_POWERUP) + consumers++; + + /* + * If the event comes from HP and Line-In is selected, + * current action is 'DAC to be powered down'. + * As HP_POWERUP is not set when HP muxed to line-in, + * we need to keep VAG power ON. 
+ */ + if (source == HP_POWER_EVENT) { + if (hp_sel_input(component) == SGTL5000_HP_SEL_LINE_IN) + consumers++; + } else { + if (ana_pwr_reg & SGTL5000_HP_POWERUP) + consumers++; + } + + return consumers; +} + +static void vag_power_off(struct snd_soc_component *component, u32 source) +{ + u16 ana_pwr = snd_soc_component_read32(component, + SGTL5000_CHIP_ANA_POWER); + + if (!(ana_pwr & SGTL5000_VAG_POWERUP)) + return; + + /* + * This function calls when any of VAG power consumers is disappearing. + * Thus, if there is more than one consumer at the moment, as minimum + * one consumer will definitely stay after the end of the current + * event. + * Don't clear VAG_POWERUP if 2 or more consumers of VAG present: + * - LINE_IN (for HP events) / HP (for DAC/ADC events) + * - DAC + * - ADC + * (the current consumer is disappearing right now) + */ + if (vag_power_consumers(component, ana_pwr, source) >= 2) + return; + + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, + SGTL5000_VAG_POWERUP, 0); + /* In power down case, we need wait 400-1000 ms + * when VAG fully ramped down. + * As longer we wait, as smaller pop we've got. + */ + msleep(SGTL5000_VAG_POWERDOWN_DELAY); +} + /* * mic_bias power on/off share the same register bits with * output impedance of mic bias, when power on mic bias, we @@ -162,36 +277,46 @@ static int mic_bias_event(struct snd_soc return 0; }
-/* - * As manual described, ADC/DAC only works when VAG powerup, - * So enabled VAG before ADC/DAC up. - * In power down case, we need wait 400ms when vag fully ramped down. - */ -static int power_vag_event(struct snd_soc_dapm_widget *w, - struct snd_kcontrol *kcontrol, int event) +static int vag_and_mute_control(struct snd_soc_component *component, + int event, int event_source) { - struct snd_soc_component *component = snd_soc_dapm_to_component(w->dapm); - const u32 mask = SGTL5000_DAC_POWERUP | SGTL5000_ADC_POWERUP; + static const u16 mute_mask[] = { + /* + * Mask for HP_POWER_EVENT. + * Muxing Headphones have to be wrapped with mute/unmute + * headphones only. + */ + SGTL5000_HP_MUTE, + /* + * Masks for DAC_POWER_EVENT/ADC_POWER_EVENT. + * Muxing DAC or ADC block have to wrapped with mute/unmute + * both headphones and line-out. + */ + SGTL5000_OUTPUTS_MUTE, + SGTL5000_OUTPUTS_MUTE + }; + + struct sgtl5000_priv *sgtl5000 = + snd_soc_component_get_drvdata(component);
switch (event) { + case SND_SOC_DAPM_PRE_PMU: + sgtl5000->mute_state[event_source] = + mute_output(component, mute_mask[event_source]); + break; case SND_SOC_DAPM_POST_PMU: - snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, - SGTL5000_VAG_POWERUP, SGTL5000_VAG_POWERUP); - msleep(400); + vag_power_on(component, event_source); + restore_output(component, mute_mask[event_source], + sgtl5000->mute_state[event_source]); break; - case SND_SOC_DAPM_PRE_PMD: - /* - * Don't clear VAG_POWERUP, when both DAC and ADC are - * operational to prevent inadvertently starving the - * other one of them. - */ - if ((snd_soc_component_read32(component, SGTL5000_CHIP_ANA_POWER) & - mask) != mask) { - snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, - SGTL5000_VAG_POWERUP, 0); - msleep(400); - } + sgtl5000->mute_state[event_source] = + mute_output(component, mute_mask[event_source]); + vag_power_off(component, event_source); + break; + case SND_SOC_DAPM_POST_PMD: + restore_output(component, mute_mask[event_source], + sgtl5000->mute_state[event_source]); break; default: break; @@ -200,6 +325,41 @@ static int power_vag_event(struct snd_so return 0; }
+/* + * Mute Headphone when power it up/down. + * Control VAG power on HP power path. + */ +static int headphone_pga_event(struct snd_soc_dapm_widget *w, + struct snd_kcontrol *kcontrol, int event) +{ + struct snd_soc_component *component = + snd_soc_dapm_to_component(w->dapm); + + return vag_and_mute_control(component, event, HP_POWER_EVENT); +} + +/* As manual describes, ADC/DAC powering up/down requires + * to mute outputs to avoid pops. + * Control VAG power on ADC/DAC power path. + */ +static int adc_updown_depop(struct snd_soc_dapm_widget *w, + struct snd_kcontrol *kcontrol, int event) +{ + struct snd_soc_component *component = + snd_soc_dapm_to_component(w->dapm); + + return vag_and_mute_control(component, event, ADC_POWER_EVENT); +} + +static int dac_updown_depop(struct snd_soc_dapm_widget *w, + struct snd_kcontrol *kcontrol, int event) +{ + struct snd_soc_component *component = + snd_soc_dapm_to_component(w->dapm); + + return vag_and_mute_control(component, event, DAC_POWER_EVENT); +} + /* input sources for ADC */ static const char *adc_mux_text[] = { "MIC_IN", "LINE_IN" @@ -272,7 +432,10 @@ static const struct snd_soc_dapm_widget mic_bias_event, SND_SOC_DAPM_POST_PMU | SND_SOC_DAPM_PRE_PMD),
- SND_SOC_DAPM_PGA("HP", SGTL5000_CHIP_ANA_POWER, 4, 0, NULL, 0), + SND_SOC_DAPM_PGA_E("HP", SGTL5000_CHIP_ANA_POWER, 4, 0, NULL, 0, + headphone_pga_event, + SND_SOC_DAPM_PRE_POST_PMU | + SND_SOC_DAPM_PRE_POST_PMD), SND_SOC_DAPM_PGA("LO", SGTL5000_CHIP_ANA_POWER, 0, 0, NULL, 0),
SND_SOC_DAPM_MUX("Capture Mux", SND_SOC_NOPM, 0, 0, &adc_mux), @@ -293,11 +456,12 @@ static const struct snd_soc_dapm_widget 0, SGTL5000_CHIP_DIG_POWER, 1, 0),
- SND_SOC_DAPM_ADC("ADC", "Capture", SGTL5000_CHIP_ANA_POWER, 1, 0), - SND_SOC_DAPM_DAC("DAC", "Playback", SGTL5000_CHIP_ANA_POWER, 3, 0), - - SND_SOC_DAPM_PRE("VAG_POWER_PRE", power_vag_event), - SND_SOC_DAPM_POST("VAG_POWER_POST", power_vag_event), + SND_SOC_DAPM_ADC_E("ADC", "Capture", SGTL5000_CHIP_ANA_POWER, 1, 0, + adc_updown_depop, SND_SOC_DAPM_PRE_POST_PMU | + SND_SOC_DAPM_PRE_POST_PMD), + SND_SOC_DAPM_DAC_E("DAC", "Playback", SGTL5000_CHIP_ANA_POWER, 3, 0, + dac_updown_depop, SND_SOC_DAPM_PRE_POST_PMU | + SND_SOC_DAPM_PRE_POST_PMD), };
/* routes for sgtl5000 */
From: Balbir Singh bsingharora@gmail.com
commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee upstream.
The current code would fail on huge page addresses, since the shift would be incorrect. Use the correct page shift value returned by __find_linux_pte() to get the correct physical address. The code is more generic and can handle both regular and compound pages.
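The arithmetic behind the fix can be sketched as follows, with two loud simplifications: the PTE is reduced to a bare physical base address (no flag bits), and a 4K base page is assumed via the hypothetical SKETCH_PAGE_SHIFT. For a mapping larger than the base page, the address bits between the base page size and the mapping size select the base page inside the huge page.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4K base pages */
#define SKETCH_PAGE_SIZE	(UINT64_C(1) << SKETCH_PAGE_SHIFT)

/*
 * phys_base: physical address the (simplified) PTE points at.
 * addr:      faulting effective address.
 * shift:     log2 of the mapping size reported by the page-table walk.
 */
static uint64_t pfn_for_addr(uint64_t phys_base, uint64_t addr,
			     unsigned int shift)
{
	if (shift > SKETCH_PAGE_SHIFT) {
		/* Bits of addr that index base pages within the huge page. */
		uint64_t rpnmask = (UINT64_C(1) << shift) - SKETCH_PAGE_SIZE;

		return (phys_base | (addr & rpnmask)) >> SKETCH_PAGE_SHIFT;
	}
	return phys_base >> SKETCH_PAGE_SHIFT;
}
```

With a 2M mapping (shift 21) based at 0x40000000 and an address whose low bits are 0x123456, the mask keeps 0x123000 and the result is pfn 0x40123; ignoring the shift would report 0x40000, the first base page of the huge page, for every address inside it.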
Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
Signed-off-by: Balbir Singh bsingharora@gmail.com
[arbab@linux.ibm.com: Fixup pseries_do_memory_failure()]
Signed-off-by: Reza Arbab arbab@linux.ibm.com
Tested-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
Signed-off-by: Santosh Sivaraj santosh@fossix.org
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@fossix.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/mce_power.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
--- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -39,6 +39,7 @@ static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) { pte_t *ptep; + unsigned int shift; unsigned long flags; struct mm_struct *mm;
@@ -48,13 +49,18 @@ static unsigned long addr_to_pfn(struct mm = &init_mm;
local_irq_save(flags); - if (mm == current->mm) - ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL); - else - ptep = find_init_mm_pte(addr, NULL); + ptep = __find_linux_pte(mm->pgd, addr, NULL, &shift); local_irq_restore(flags); + if (!ptep || pte_special(*ptep)) return ULONG_MAX; + + if (shift > PAGE_SHIFT) { + unsigned long rpnmask = (1ul << shift) - PAGE_SIZE; + + return pte_pfn(__pte(pte_val(*ptep) | (addr & rpnmask))); + } + return pte_pfn(*ptep); }
@@ -339,7 +345,7 @@ static const struct mce_derror_table mce MCE_INITIATOR_CPU, MCE_SEV_ERROR_SYNC, }, { 0, false, 0, 0, 0, 0 } };
-static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, +static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr, uint64_t *phys_addr) { /* @@ -530,7 +536,8 @@ static int mce_handle_derror(struct pt_r * kernel/exception-64s.h */ if (get_paca()->in_mce < MAX_MCE_DEPTH) - mce_find_instr_ea_and_pfn(regs, addr, phys_addr); + mce_find_instr_ea_and_phys(regs, addr, + phys_addr); } found = 1; }
From: Santosh Sivaraj santosh@fossix.org
commit b5bda6263cad9a927e1a4edb7493d542da0c1410 upstream.
schedule_work() cannot be called from MCE exception context as MCE can interrupt even in interrupt disabled context.
Fixes: 733e4a4c4467 ("powerpc/mce: hookup memory_failure for UE errors")
Cc: stable@vger.kernel.org # v4.15+
Reviewed-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
Reviewed-by: Nicholas Piggin npiggin@gmail.com
Acked-by: Balbir Singh bsingharora@gmail.com
Signed-off-by: Santosh Sivaraj santosh@fossix.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190820081352.8641-2-santosh@fossix.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/mce.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -45,6 +45,7 @@ static DEFINE_PER_CPU(struct machine_che mce_ue_event_queue);
static void machine_check_process_queued_event(struct irq_work *work); +static void machine_check_ue_irq_work(struct irq_work *work); void machine_check_ue_event(struct machine_check_event *evt); static void machine_process_ue_event(struct work_struct *work);
@@ -52,6 +53,10 @@ static struct irq_work mce_event_process .func = machine_check_process_queued_event, };
+static struct irq_work mce_ue_event_irq_work = { + .func = machine_check_ue_irq_work, +}; + DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);
static void mce_set_error_info(struct machine_check_event *mce, @@ -208,6 +213,10 @@ void release_mce_event(void) get_mce_event(NULL, true); }
+static void machine_check_ue_irq_work(struct irq_work *work) +{ + schedule_work(&mce_ue_event_work); +}
/* * Queue up the MCE event which then can be handled later. @@ -225,7 +234,7 @@ void machine_check_ue_event(struct machi memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt));
/* Queue work to process this event later. */ - schedule_work(&mce_ue_event_work); + irq_work_queue(&mce_ue_event_irq_work); }
/*
From: Andrew Donnellan ajd@linux.ibm.com
commit e7de4f7b64c23e503a8c42af98d56f2a7462bd6d upstream.
Currently the OPAL symbol map is globally readable, which seems bad as it contains physical addresses.
Restrict it to root.
Fixes: c8742f85125d ("powerpc/powernv: Expose OPAL firmware symbol map")
Cc: stable@vger.kernel.org # v3.19+
Suggested-by: Michael Ellerman mpe@ellerman.id.au
Signed-off-by: Andrew Donnellan ajd@linux.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190503075253.22798-1-ajd@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/platforms/powernv/opal.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -680,7 +680,10 @@ static ssize_t symbol_map_read(struct fi bin_attr->size); }
-static BIN_ATTR_RO(symbol_map, 0); +static struct bin_attribute symbol_map_attr = { + .attr = {.name = "symbol_map", .mode = 0400}, + .read = symbol_map_read +};
static void opal_export_symmap(void) { @@ -697,10 +700,10 @@ static void opal_export_symmap(void) return;
/* Setup attributes */ - bin_attr_symbol_map.private = __va(be64_to_cpu(syms[0])); - bin_attr_symbol_map.size = be64_to_cpu(syms[1]); + symbol_map_attr.private = __va(be64_to_cpu(syms[0])); + symbol_map_attr.size = be64_to_cpu(syms[1]);
- rc = sysfs_create_bin_file(opal_kobj, &bin_attr_symbol_map); + rc = sysfs_create_bin_file(opal_kobj, &symbol_map_attr); if (rc) pr_warn("Error %d creating OPAL symbols file\n", rc); }
From: Alexey Kardashevskiy aik@ozlabs.ru
commit 56090a3902c80c296e822d11acdb6a101b322c52 upstream.
pnv_tce() returns a pointer to a TCE entry and originally a TCE table would be pre-allocated. For the default case of 2GB window the table needs only a single level and that is fine. However if more levels are requested, it is possible to get a race when 2 threads want a pointer to a TCE entry from the same page of TCEs.
This adds cmpxchg to handle the race. Note that once TCE is non-zero, it cannot become zero again.
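The install-if-still-empty pattern can be sketched in userspace with C11 atomics. This is an analogy rather than the kernel code: atomic_compare_exchange_strong() stands in for cmpxchg(), calloc()/free() stand in for the TCE page allocator, and get_or_install_level() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Return the next-level table referenced by *slot, allocating and
 * publishing one if the slot is still empty. If another thread
 * published first, free our allocation and use theirs.
 */
static uint64_t *get_or_install_level(_Atomic(uintptr_t) *slot,
				      size_t entries)
{
	uintptr_t cur = atomic_load(slot);
	uintptr_t expected = 0;
	uint64_t *fresh;

	if (cur)
		return (uint64_t *)cur;

	fresh = calloc(entries, sizeof(*fresh));
	if (!fresh)
		return NULL;

	if (!atomic_compare_exchange_strong(slot, &expected,
					    (uintptr_t)fresh)) {
		/* Lost the race: 'expected' now holds the winner's value. */
		free(fresh);
		return (uint64_t *)expected;
	}
	return fresh;
}
```

Freeing the loser's allocation is only safe because of the property stated above: once a slot is non-zero it never returns to zero, so the value observed by the failed exchange stays valid.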
Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
CC: stable@vger.kernel.org # v4.19+
Signed-off-by: Alexey Kardashevskiy aik@ozlabs.ru
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190718051139.74787-2-aik@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c @@ -49,6 +49,9 @@ static __be64 *pnv_alloc_tce_level(int n return addr; }
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr, + unsigned long size, unsigned int levels); + static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc) { __be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base; @@ -58,9 +61,9 @@ static __be64 *pnv_tce(struct iommu_tabl
while (level) { int n = (idx & mask) >> (level * shift); - unsigned long tce; + unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
- if (tmp[n] == 0) { + if (!tce) { __be64 *tmp2;
if (!alloc) @@ -71,10 +74,15 @@ static __be64 *pnv_tce(struct iommu_tabl if (!tmp2) return NULL;
- tmp[n] = cpu_to_be64(__pa(tmp2) | - TCE_PCI_READ | TCE_PCI_WRITE); + tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE; + oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0, + cpu_to_be64(tce))); + if (oldtce) { + pnv_pci_ioda2_table_do_free_pages(tmp2, + ilog2(tbl->it_level_size) + 3, 1); + tce = oldtce; + } } - tce = be64_to_cpu(tmp[n]);
tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE)); idx &= ~mask;
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit 677733e296b5c7a37c47da391fc70a43dc40bd67 upstream.
The store ordering vs tlbie issue mentioned in commit a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't need to apply the fixup if we are running on them.
We can only do this on PowerNV. On a pseries guest with KVM we still don't support redoing the feature fixup after migration. So we should be enabling all the workarounds needed, because we can possibly migrate between DD 2.3 and DD 2.2.
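As a rough userspace sketch of the revision check this patch adds (the masks and thresholds are taken from the diff below; the function name p9_needs_tlbie_fixup() is made up, and the PVR_VER/PVR_POWER9 macros are restated locally for illustration), the decision boils down to something like:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local restatement for the sketch; the kernel has its own definitions. */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)
#define PVR_POWER9	0x004E

/*
 * Hypothetical helper: true if the tlbie workaround should stay enabled
 * for this PVR. It does not model the PowerNV vs pseries distinction.
 */
static bool p9_needs_tlbie_fixup(uint32_t pvr)
{
	if (PVR_VER(pvr) != PVR_POWER9)
		return false;

	if ((pvr & 0xe000) == 0)		/* Nimbus */
		return (pvr & 0xfff) < 0x203;
	if ((pvr & 0xc000) == 0)		/* Cumulus */
		return (pvr & 0xfff) < 0x103;

	/* Unknown variant: keep the workaround to be safe. */
	return true;
}
```

The PVR values one would feed into this are synthetic in any test; only the mask logic is meant to mirror the patch.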
Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/dt_cpu_ftrs.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -694,9 +694,35 @@ static bool __init cpufeatures_process_f
 	return true;
 }
 
+/*
+ * Handle POWER9 broadcast tlbie invalidation issue using
+ * cpu feature flag.
+ */
+static __init void update_tlbie_feature_flag(unsigned long pvr)
+{
+	if (PVR_VER(pvr) == PVR_POWER9) {
+		/*
+		 * Set the tlbie feature flag for anything below
+		 * Nimbus DD 2.3 and Cumulus DD 1.3
+		 */
+		if ((pvr & 0xe000) == 0) {
+			/* Nimbus */
+			if ((pvr & 0xfff) < 0x203)
+				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+		} else if ((pvr & 0xc000) == 0) {
+			/* Cumulus */
+			if ((pvr & 0xfff) < 0x103)
+				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+		} else {
+			WARN_ONCE(1, "Unknown PVR");
+			cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+		}
+	}
+}
+
 static __init void cpufeatures_cpu_quirks(void)
 {
-	int version = mfspr(SPRN_PVR);
+	unsigned long version = mfspr(SPRN_PVR);
 
 	/*
 	 * Not all quirks can be derived from the cpufeatures device tree.
@@ -715,10 +741,10 @@ static __init void cpufeatures_cpu_quirk
 
 	if ((version & 0xffff0000) == 0x004e0000) {
 		cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
-		cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
 		cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR;
 	}
 
+	update_tlbie_feature_flag(version);
 	/*
 	 * PKEY was not in the initial base or feature node
 	 * specification, but it should become optional in the next
From: Marc Kleine-Budde mkl@pengutronix.de
commit d84ea2123f8d27144e3f4d58cd88c9c6ddc799de upstream.
Some boards take longer than 5ms to power up after a reset, so allow some retry attempts before giving up.
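The general shape of "poll until ready, but give up after a bounded wait" can be sketched in plain C as below. This is a simplified illustration, not the driver code: wait_until_ready(), ready_fn, sleep_fn and the two test doubles are hypothetical names, and the sketch accounts for slept time in milliseconds instead of comparing jiffies against a timeout.

```c
#include <stdbool.h>

/* Hypothetical callbacks: "is the device in config mode?" and "sleep". */
typedef bool (*ready_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int ms);

/*
 * Poll ready() every poll_ms until it succeeds or roughly budget_ms of
 * sleeping has been spent. Returns 0 on success, -1 on timeout.
 */
static int wait_until_ready(ready_fn ready, void *ctx, sleep_fn sleep_ms,
			    unsigned int poll_ms, unsigned int budget_ms)
{
	unsigned int waited = 0;

	while (!ready(ctx)) {
		if (waited >= budget_ms)
			return -1;
		sleep_ms(poll_ms);
		waited += poll_ms;
	}
	return 0;
}

/* Test double: reports ready once the counter in ctx has run out. */
static bool ready_after(void *ctx)
{
	int *polls_left = ctx;

	return (*polls_left)-- <= 0;
}

/* Test double: does not actually sleep. */
static void no_sleep(unsigned int ms)
{
	(void)ms;
}
```

A slow board then only costs a few extra polls, while a dead one still fails after a bounded time instead of hanging.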
Fixes: ff06d611a31c ("can: mcp251x: Improve mcp251x_hw_reset()")
Cc: linux-stable stable@vger.kernel.org
Tested-by: Sean Nyekjaer sean@geanix.com
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/can/spi/mcp251x.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -626,7 +626,7 @@ static int mcp251x_setup(struct net_devi
 static int mcp251x_hw_reset(struct spi_device *spi)
 {
 	struct mcp251x_priv *priv = spi_get_drvdata(spi);
-	u8 reg;
+	unsigned long timeout;
 	int ret;
 
 	/* Wait for oscillator startup timer after power up */
@@ -640,10 +640,19 @@ static int mcp251x_hw_reset(struct spi_d
 	/* Wait for oscillator startup timer after reset */
 	mdelay(MCP251X_OST_DELAY_MS);
 
-	reg = mcp251x_read_reg(spi, CANSTAT);
-	if ((reg & CANCTRL_REQOP_MASK) != CANCTRL_REQOP_CONF)
-		return -ENODEV;
-
+	/* Wait for reset to finish */
+	timeout = jiffies + HZ;
+	while ((mcp251x_read_reg(spi, CANSTAT) & CANCTRL_REQOP_MASK) !=
+	       CANCTRL_REQOP_CONF) {
+		usleep_range(MCP251X_OST_DELAY_MS * 1000,
+			     MCP251X_OST_DELAY_MS * 1000 * 2);
+
+		if (time_after(jiffies, timeout)) {
+			dev_err(&spi->dev,
+				"MCP251x didn't enter in conf mode after reset\n");
+			return -EBUSY;
+		}
+	}
 	return 0;
 }
 
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 82a2f88458d70704be843961e10b5cef9a6e95d3 upstream.
The tools/lib/traceevent/Makefile had a test added to it to detect a failure of the "nm" when making the dynamic list file (whatever that is). The problem is that the test sorts the values "U W w" and some versions of sort will place "w" ahead of "W" (even though it has a higher ASCII value), which breaks the test.
Add 'tr "w" "W"' to merge the two and not worry about the ordering.
Reported-by: Tzvetomir Stoyanov tstoyanov@vmware.com
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: David Carrillo-Cisneros davidcc@google.com
Cc: He Kuang hekuang@huawei.com
Cc: Jiri Olsa jolsa@kernel.org
Cc: Michal Marek mmarek@suse.com
Cc: Paul Turner pjt@google.com
Cc: Peter Zijlstra peterz@infradead.org
Cc: Stephane Eranian eranian@google.com
Cc: Uwe Kleine-König u.kleine-koenig@pengutronix.de
Cc: Wang Nan wangnan0@huawei.com
Cc: stable@vger.kernel.org
Fixes: 6467753d61399 ("tools lib traceevent: Robustify do_generate_dynamic_list_file")
Link: http://lkml.kernel.org/r/20190805130150.25acfeb1@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/lib/traceevent/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -259,8 +259,8 @@ endef
 
 define do_generate_dynamic_list_file
 	symbol_type=`$(NM) -u -D $1 | awk 'NF>1 {print $$1}' | \
-	xargs echo "U W w" | tr ' ' '\n' | sort -u | xargs echo`;\
-	if [ "$$symbol_type" = "U W w" ];then \
+	xargs echo "U w W" | tr 'w ' 'W\n' | sort -u | xargs echo`;\
+	if [ "$$symbol_type" = "U W" ];then \
 		(echo '{'; \
 		$(NM) -u -D $1 | awk 'NF>1 {print "\t"$$2";"}' | sort -u;\
 		echo '};'; \
From: Alexander Sverdlin alexander.sverdlin@nokia.com
commit 1b82feb6c5e1996513d0fb0bbb475417088b4954 upstream.
It seems that smp_processor_id() is only used for a best-effort load-balancing, refer to qat_crypto_get_instance_node(). It's not feasible to disable preemption for the duration of the crypto requests. Therefore, just silence the warning. This commit is similar to e7a9b05ca4 ("crypto: cavium - Fix smp_processor_id() warnings").
Silences the following splat:
BUG: using smp_processor_id() in preemptible [00000000] code: cryptomgr_test/2904
caller is qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
CPU: 1 PID: 2904 Comm: cryptomgr_test Tainted: P O 4.14.69 #1
...
Call Trace:
 dump_stack+0x5f/0x86
 check_preemption_disabled+0xd3/0xe0
 qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
 skcipher_setkey_ablkcipher+0x2b/0x40
 __test_skcipher+0x1f3/0xb20
 ? cpumask_next_and+0x26/0x40
 ? find_busiest_group+0x10e/0x9d0
 ? preempt_count_add+0x49/0xa0
 ? try_module_get+0x61/0xf0
 ? crypto_mod_get+0x15/0x30
 ? __kmalloc+0x1df/0x1f0
 ? __crypto_alloc_tfm+0x116/0x180
 ? crypto_skcipher_init_tfm+0xa6/0x180
 ? crypto_create_tfm+0x4b/0xf0
 test_skcipher+0x21/0xa0
 alg_test_skcipher+0x3f/0xa0
 alg_test.part.6+0x126/0x2a0
 ? finish_task_switch+0x21b/0x260
 ? __schedule+0x1e9/0x800
 ? __wake_up_common+0x8d/0x140
 cryptomgr_test+0x40/0x50
 kthread+0xff/0x130
 ? cryptomgr_notify+0x540/0x540
 ? kthread_create_on_node+0x70/0x70
 ret_from_fork+0x24/0x50
Fixes: ed8ccaef52 ("crypto: qat - Add support for SRIOV")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/crypto/qat/qat_common/adf_common_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/crypto/qat/qat_common/adf_common_drv.h
+++ b/drivers/crypto/qat/qat_common/adf_common_drv.h
@@ -95,7 +95,7 @@ struct service_hndl {
 
 static inline int get_current_node(void)
 {
-	return topology_physical_package_id(smp_processor_id());
+	return topology_physical_package_id(raw_smp_processor_id());
 }
 
 int adf_service_register(struct service_hndl *service);
From: Herbert Xu herbert@gondor.apana.org.au
commit 0ba3c026e685573bd3534c17e27da7c505ac99c4 upstream.
skcipher_walk_done may be called with an error by internal or external callers. For those internal callers we shouldn't unmap pages but for external callers we must unmap any pages that are in use.
This patch distinguishes between the two cases by checking whether walk->nbytes is zero or not. For internal callers, we now set walk->nbytes to zero prior to the call. For external callers, walk->nbytes has always been non-zero (as zero is used to indicate the termination of a walk).
Reported-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Fixes: 5cde0af2a982 ("[CRYPTO] cipher: Added block cipher type")
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 crypto/skcipher.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -95,7 +95,7 @@ static inline u8 *skcipher_get_spot(u8 *
 	return max(start, end_page);
 }
 
-static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
+static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
 {
 	u8 *addr;
 
@@ -103,19 +103,21 @@ static void skcipher_done_slow(struct sk
 	addr = skcipher_get_spot(addr, bsize);
 	scatterwalk_copychunks(addr, &walk->out, bsize,
 			       (walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1);
+	return 0;
 }
 
 int skcipher_walk_done(struct skcipher_walk *walk, int err)
 {
-	unsigned int n; /* bytes processed */
-	bool more;
+	unsigned int n = walk->nbytes;
+	unsigned int nbytes = 0;
 
-	if (unlikely(err < 0))
+	if (!n)
 		goto finish;
 
-	n = walk->nbytes - err;
-	walk->total -= n;
-	more = (walk->total != 0);
+	if (likely(err >= 0)) {
+		n -= err;
+		nbytes = walk->total - n;
+	}
 
 	if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
 				    SKCIPHER_WALK_SLOW |
@@ -131,7 +133,7 @@ unmap_src:
 		memcpy(walk->dst.virt.addr, walk->page, n);
 		skcipher_unmap_dst(walk);
 	} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
-		if (err) {
+		if (err > 0) {
 			/*
 			 * Didn't process all bytes. Either the algorithm is
 			 * broken, or this was the last step and it turned out
@@ -139,27 +141,29 @@ unmap_src:
 			 * the algorithm requires it.
 			 */
 			err = -EINVAL;
-			goto finish;
-		}
-		skcipher_done_slow(walk, n);
-		goto already_advanced;
+			nbytes = 0;
+		} else
+			n = skcipher_done_slow(walk, n);
 	}
 
+	if (err > 0)
+		err = 0;
+
+	walk->total = nbytes;
+	walk->nbytes = 0;
+
 	scatterwalk_advance(&walk->in, n);
 	scatterwalk_advance(&walk->out, n);
-already_advanced:
-	scatterwalk_done(&walk->in, 0, more);
-	scatterwalk_done(&walk->out, 1, more);
+	scatterwalk_done(&walk->in, 0, nbytes);
+	scatterwalk_done(&walk->out, 1, nbytes);
 
-	if (more) {
+	if (nbytes) {
 		crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
 			     CRYPTO_TFM_REQ_MAY_SLEEP : 0);
 		return skcipher_walk_next(walk);
 	}
-	err = 0;
-finish:
-	walk->nbytes = 0;
 
+finish:
 	/* Short-circuit for the common/fast path. */
 	if (!((unsigned long)walk->buffer | (unsigned long)walk->page))
 		goto out;
From: Wei Yongjun weiyongjun1@huawei.com
commit c552ffb5c93d9d65aaf34f5f001c4e7e8484ced1 upstream.
When using single_open() for opening, single_release() should be used instead of seq_release(), otherwise there is a memory leak.
Fixes: 09ae5d37e093 ("crypto: zip - Add Compression/Decompression statistics")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Yongjun weiyongjun1@huawei.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/crypto/cavium/zip/zip_main.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/crypto/cavium/zip/zip_main.c
+++ b/drivers/crypto/cavium/zip/zip_main.c
@@ -593,6 +593,7 @@ static const struct file_operations zip_
 	.owner = THIS_MODULE,
 	.open = zip_stats_open,
 	.read = seq_read,
+	.release = single_release,
 };
 
 static int zip_clear_open(struct inode *inode, struct file *file)
@@ -604,6 +605,7 @@ static const struct file_operations zip_
 	.owner = THIS_MODULE,
 	.open = zip_clear_open,
 	.read = seq_read,
+	.release = single_release,
 };
 
 static int zip_regs_open(struct inode *inode, struct file *file)
@@ -615,6 +617,7 @@ static const struct file_operations zip_
 	.owner = THIS_MODULE,
 	.open = zip_regs_open,
 	.read = seq_read,
+	.release = single_release,
 };
 
 /* Root directory for thunderx_zip debugfs entry */
From: Horia Geantă horia.geanta@nxp.com
commit 48f89d2a2920166c35b1c0b69917dbb0390ebec7 upstream.
IV transfer from ofifo to class2 (set up at [29][30]) is not guaranteed to be scheduled before the data transfer from ofifo to external memory (set up at [38]):

[29] 10FA0004 ld: ind-nfifo (len=4) imm
[30] 81F00010 <nfifo_entry: ofifo->class2 type=msg len=16>
[31] 14820004 ld: ccb2-datasz len=4 offs=0 imm
[32] 00000010 data:0x00000010
[33] 8210010D operation: cls1-op aes cbc init-final enc
[34] A8080B04 math: (seqin + math0)->vseqout len=4
[35] 28000010 seqfifold: skip len=16
[36] A8080A04 math: (seqin + math0)->vseqin len=4
[37] 2F1E0000 seqfifold: both msg1->2-last2-last1 len=vseqinsz
[38] 69300000 seqfifostr: msg len=vseqoutsz
[39] 5C20000C seqstr: ccb2 ctx len=12 offs=0
If ofifo -> external memory transfer happens first, DECO will hang (issuing a Watchdog Timeout error, if WDOG is enabled) waiting for data availability in ofifo for the ofifo -> c2 ififo transfer.
Make sure IV transfer happens first by waiting for all CAAM internal transfers to end before starting payload transfer.
New descriptor with jump command inserted at [37]:
[..]
[36] A8080A04 math: (seqin + math0)->vseqin len=4
[37] A1000401 jump: jsl1 all-match[!nfifopend] offset=[01] local->[38]
[38] 2F1E0000 seqfifold: both msg1->2-last2-last1 len=vseqinsz
[39] 69300000 seqfifostr: msg len=vseqoutsz
[40] 5C20000C seqstr: ccb2 ctx len=12 offs=0
[Note: the issue is present in the descriptor from the very beginning (cf. Fixes tag). However I've marked it v4.19+ since it's the oldest maintained kernel that the patch applies clean against.]
Cc: stable@vger.kernel.org # v4.19+
Fixes: 1acebad3d8db8 ("crypto: caam - faster aead implementation")
Signed-off-by: Horia Geantă horia.geanta@nxp.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/crypto/caam/caamalg_desc.c | 9 +++++++++
 drivers/crypto/caam/caamalg_desc.h | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/crypto/caam/caamalg_desc.c
+++ b/drivers/crypto/caam/caamalg_desc.c
@@ -509,6 +509,7 @@ void cnstr_shdsc_aead_givencap(u32 * con
 			       const bool is_qi, int era)
 {
 	u32 geniv, moveiv;
+	u32 *wait_cmd;
 
 	/* Note: Context registers are saved. */
 	init_sh_desc_key_aead(desc, cdata, adata, is_rfc3686, nonce, era);
@@ -604,6 +605,14 @@ copy_iv:
 
 	/* Will read cryptlen */
 	append_math_add(desc, VARSEQINLEN, SEQINLEN, REG0, CAAM_CMD_SZ);
+
+	/*
+	 * Wait for IV transfer (ofifo -> class2) to finish before starting
+	 * ciphertext transfer (ofifo -> external memory).
+	 */
+	wait_cmd = append_jump(desc, JUMP_JSL | JUMP_TEST_ALL | JUMP_COND_NIFP);
+	set_jump_tgt_here(desc, wait_cmd);
+
 	append_seq_fifo_load(desc, 0, FIFOLD_CLASS_BOTH | KEY_VLF |
 			     FIFOLD_TYPE_MSG1OUT2 | FIFOLD_TYPE_LASTBOTH);
 	append_seq_fifo_store(desc, 0, FIFOST_TYPE_MESSAGE_DATA | KEY_VLF);
--- a/drivers/crypto/caam/caamalg_desc.h
+++ b/drivers/crypto/caam/caamalg_desc.h
@@ -12,7 +12,7 @@
 #define DESC_AEAD_BASE (4 * CAAM_CMD_SZ)
 #define DESC_AEAD_ENC_LEN (DESC_AEAD_BASE + 11 * CAAM_CMD_SZ)
 #define DESC_AEAD_DEC_LEN (DESC_AEAD_BASE + 15 * CAAM_CMD_SZ)
-#define DESC_AEAD_GIVENC_LEN (DESC_AEAD_ENC_LEN + 7 * CAAM_CMD_SZ)
+#define DESC_AEAD_GIVENC_LEN (DESC_AEAD_ENC_LEN + 8 * CAAM_CMD_SZ)
 #define DESC_QI_AEAD_ENC_LEN (DESC_AEAD_ENC_LEN + 3 * CAAM_CMD_SZ)
 #define DESC_QI_AEAD_DEC_LEN (DESC_AEAD_DEC_LEN + 3 * CAAM_CMD_SZ)
 #define DESC_QI_AEAD_GIVENC_LEN (DESC_AEAD_GIVENC_LEN + 3 * CAAM_CMD_SZ)
From: Gilad Ben-Yossef gilad@benyossef.com
commit 76a95bd8f9e10cade9c4c8df93b5c20ff45dc0f5 upstream.
When the ccree driver runs, it checks the state of the Trusted Execution Environment CryptoCell driver before proceeding. We did not account for cases where the TEE side is not ready or not available at all. Fix it by only considering the TEE error state after sync with the TEE side driver.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com
Fixes: ab8ec9658f5a ("crypto: ccree - add FIPS support")
CC: stable@vger.kernel.org # v4.17+
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/crypto/ccree/cc_fips.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/crypto/ccree/cc_fips.c
+++ b/drivers/crypto/ccree/cc_fips.c
@@ -21,7 +21,13 @@ static bool cc_get_tee_fips_status(struc
 	u32 reg;
 
 	reg = cc_ioread(drvdata, CC_REG(GPR_HOST));
-	return (reg == (CC_FIPS_SYNC_TEE_STATUS | CC_FIPS_SYNC_MODULE_OK));
+	/* Did the TEE report status? */
+	if (reg & CC_FIPS_SYNC_TEE_STATUS)
+		/* Yes. Is it OK? */
+		return (reg & CC_FIPS_SYNC_MODULE_OK);
+
+	/* No. It's either not in use or will be reported later */
+	return true;
 }
 
 /*
From: Gilad Ben-Yossef gilad@benyossef.com
commit 7a4be6c113c1f721818d1e3722a9015fe393295c upstream.
In case of an AEAD decryption verification error, we were using the wrong value to zero out the plaintext buffer, leaving the end of the buffer with the false plaintext.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com
Fixes: ff27e85a85bb ("crypto: ccree - add AEAD support")
CC: stable@vger.kernel.org # v4.17+
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/crypto/ccree/cc_aead.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/crypto/ccree/cc_aead.c
+++ b/drivers/crypto/ccree/cc_aead.c
@@ -227,7 +227,7 @@ static void cc_aead_complete(struct devi
 			/* In case of payload authentication failure, MUST NOT
 			 * revealed the decrypted message --> zero its memory.
 			 */
-			cc_zero_sgl(areq->dst, areq_ctx->cryptlen);
+			cc_zero_sgl(areq->dst, areq->cryptlen);
 			err = -EBADMSG;
 		}
 	} else { /*ENCRYPT*/
From: Jiaxun Yang jiaxun.yang@flygoat.com
commit d2f965549006acb865c4638f1f030ebcefdc71f6 upstream.
Recently, binutils split the Loongson-3 Extensions into four ASEs: MMI, CAM, EXT, EXT2. This patch does the same thing in the kernel and exposes them in cpuinfo so applications can probe supported ASEs at runtime.
Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com
Cc: Huacai Chen chenhc@lemote.com
Cc: Yunqiang Su ysu@wavecomp.com
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Paul Burton paul.burton@mips.com
Cc: linux-mips@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/mips/include/asm/cpu-features.h | 16 ++++++++++++++++
 arch/mips/include/asm/cpu.h | 4 ++++
 arch/mips/kernel/cpu-probe.c | 6 ++++++
 arch/mips/kernel/proc.c | 4 ++++
 4 files changed, 30 insertions(+)

--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -387,6 +387,22 @@
 #define cpu_has_dsp3 __ase(MIPS_ASE_DSP3)
 #endif
 
+#ifndef cpu_has_loongson_mmi
+#define cpu_has_loongson_mmi __ase(MIPS_ASE_LOONGSON_MMI)
+#endif
+
+#ifndef cpu_has_loongson_cam
+#define cpu_has_loongson_cam __ase(MIPS_ASE_LOONGSON_CAM)
+#endif
+
+#ifndef cpu_has_loongson_ext
+#define cpu_has_loongson_ext __ase(MIPS_ASE_LOONGSON_EXT)
+#endif
+
+#ifndef cpu_has_loongson_ext2
+#define cpu_has_loongson_ext2 __ase(MIPS_ASE_LOONGSON_EXT2)
+#endif
+
 #ifndef cpu_has_mipsmt
 #define cpu_has_mipsmt __isa_lt_and_ase(6, MIPS_ASE_MIPSMT)
 #endif
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -436,5 +436,9 @@ enum cpu_type_enum {
 #define MIPS_ASE_MSA 0x00000100 /* MIPS SIMD Architecture */
 #define MIPS_ASE_DSP3 0x00000200 /* Signal Processing ASE Rev 3*/
 #define MIPS_ASE_MIPS16E2 0x00000400 /* MIPS16e2 */
+#define MIPS_ASE_LOONGSON_MMI 0x00000800 /* Loongson MultiMedia extensions Instructions */
+#define MIPS_ASE_LOONGSON_CAM 0x00001000 /* Loongson CAM */
+#define MIPS_ASE_LOONGSON_EXT 0x00002000 /* Loongson EXTensions */
+#define MIPS_ASE_LOONGSON_EXT2 0x00004000 /* Loongson EXTensions R2 */
 
 #endif /* _ASM_CPU_H */
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -1489,6 +1489,8 @@ static inline void cpu_probe_legacy(stru
 			__cpu_name[cpu] = "ICT Loongson-3";
 			set_elf_platform(cpu, "loongson3a");
 			set_isa(c, MIPS_CPU_ISA_M64R1);
+			c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM |
+				MIPS_ASE_LOONGSON_EXT);
 			break;
 		case PRID_REV_LOONGSON3B_R1:
 		case PRID_REV_LOONGSON3B_R2:
@@ -1496,6 +1498,8 @@ static inline void cpu_probe_legacy(stru
 			__cpu_name[cpu] = "ICT Loongson-3";
 			set_elf_platform(cpu, "loongson3b");
 			set_isa(c, MIPS_CPU_ISA_M64R1);
+			c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM |
+				MIPS_ASE_LOONGSON_EXT);
 			break;
 		}
 
@@ -1861,6 +1865,8 @@ static inline void cpu_probe_loongson(st
 		decode_configs(c);
 		c->options |= MIPS_CPU_FTLB | MIPS_CPU_TLBINV | MIPS_CPU_LDPTE;
 		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
+		c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM |
+			MIPS_ASE_LOONGSON_EXT | MIPS_ASE_LOONGSON_EXT2);
 		break;
 	default:
 		panic("Unknown Loongson Processor ID!");
--- a/arch/mips/kernel/proc.c
+++ b/arch/mips/kernel/proc.c
@@ -124,6 +124,10 @@ static int show_cpuinfo(struct seq_file
 	if (cpu_has_eva) seq_printf(m, "%s", " eva");
 	if (cpu_has_htw) seq_printf(m, "%s", " htw");
 	if (cpu_has_xpa) seq_printf(m, "%s", " xpa");
+	if (cpu_has_loongson_mmi) seq_printf(m, "%s", " loongson-mmi");
+	if (cpu_has_loongson_cam) seq_printf(m, "%s", " loongson-cam");
+	if (cpu_has_loongson_ext) seq_printf(m, "%s", " loongson-ext");
+	if (cpu_has_loongson_ext2) seq_printf(m, "%s", " loongson-ext2");
 	seq_printf(m, "\n");
 
 	if (cpu_has_mmips) {
From: Michael Nosthoff committed@heine.so
commit 99956a9e08251a1234434b492875b1eaff502a12 upstream.
The type flag is stored in the chip->flags field, not in the client->flags field. This currently leads to never using the TI-specific health function, as client->flags doesn't use that bit, so it always falls back to the general one.
Fixes: 76b16f4cdfb8 ("power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Nosthoff committed@heine.so
Reviewed-by: Brian Norris briannorris@chromium.org
Reviewed-by: Enric Balletbo i Serra enric.balletbo@collabora.com
Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/power/supply/sbs-battery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/power/supply/sbs-battery.c
+++ b/drivers/power/supply/sbs-battery.c
@@ -629,7 +629,7 @@ static int sbs_get_property(struct power
 	switch (psp) {
 	case POWER_SUPPLY_PROP_PRESENT:
 	case POWER_SUPPLY_PROP_HEALTH:
-		if (client->flags & SBS_FLAGS_TI_BQ20Z75)
+		if (chip->flags & SBS_FLAGS_TI_BQ20Z75)
 			ret = sbs_get_ti_battery_presence_and_health(client,
 								     psp, val);
 		else
From: Michael Nosthoff committed@heine.so
commit fe55e770327363304c4111423e6f7ff3c650136d upstream.
When the battery is set to sbs-mode and no gpio detection is enabled, "health" always returns a value, even when the battery is not present. All other fields return "not present". This leads to a scenario where the driver is constantly switching between the "present" and "not present" states, which generates a lot of constant traffic on the i2c bus.
This commit changes the response of "health" to an error when the battery is not responding leading to a consistent "not present" state.
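The new behaviour can be summarised with a small standalone sketch. It is a simplification under assumptions: presence_and_health(), enum prop and HEALTH_UNKNOWN are illustrative stand-ins for the driver's function and the power-supply constants, and the result of the status read is passed in directly instead of doing an i2c transfer.

```c
/* Illustrative stand-ins for the power-supply property ids. */
enum prop { PROP_PRESENT, PROP_HEALTH };

/* Placeholder value; the driver uses POWER_SUPPLY_HEALTH_UNKNOWN. */
#define HEALTH_UNKNOWN 1

/*
 * status_read is the result of the dummy status read: negative means the
 * battery did not answer. "present" still reports 0 in that case, while
 * "health" now propagates the error instead of returning a value.
 */
static int presence_and_health(int status_read, enum prop psp, int *val)
{
	if (status_read < 0) {
		if (psp == PROP_PRESENT) {
			*val = 0;
			return 0;
		}
		return status_read;
	}

	*val = (psp == PROP_PRESENT) ? 1 : HEALTH_UNKNOWN;
	return 0;
}
```

With both properties failing consistently for an absent battery, the caller no longer sees "health" succeed while everything else reports "not present".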
Fixes: 76b16f4cdfb8 ("power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Nosthoff committed@heine.so
Reviewed-by: Brian Norris briannorris@chromium.org
Tested-by: Brian Norris briannorris@chromium.org
Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/power/supply/sbs-battery.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

--- a/drivers/power/supply/sbs-battery.c
+++ b/drivers/power/supply/sbs-battery.c
@@ -323,17 +323,22 @@ static int sbs_get_battery_presence_and_
 {
 	int ret;
 
-	if (psp == POWER_SUPPLY_PROP_PRESENT) {
-		/* Dummy command; if it succeeds, battery is present. */
-		ret = sbs_read_word_data(client, sbs_data[REG_STATUS].addr);
-		if (ret < 0)
-			val->intval = 0; /* battery disconnected */
-		else
-			val->intval = 1; /* battery present */
-	} else { /* POWER_SUPPLY_PROP_HEALTH */
+	/* Dummy command; if it succeeds, battery is present. */
+	ret = sbs_read_word_data(client, sbs_data[REG_STATUS].addr);
+
+	if (ret < 0) { /* battery not present*/
+		if (psp == POWER_SUPPLY_PROP_PRESENT) {
+			val->intval = 0;
+			return 0;
+		}
+		return ret;
+	}
+
+	if (psp == POWER_SUPPLY_PROP_PRESENT)
+		val->intval = 1; /* battery present */
+	else /* POWER_SUPPLY_PROP_HEALTH */
 		/* SBS spec doesn't have a general health command. */
 		val->intval = POWER_SUPPLY_HEALTH_UNKNOWN;
-	}
 
 	return 0;
 }
@@ -635,6 +640,8 @@ static int sbs_get_property(struct power
 		else
 			ret = sbs_get_battery_presence_and_health(client, psp,
 								  val);
+
+		/* this can only be true if no gpio is used */
 		if (psp == POWER_SUPPLY_PROP_PRESENT)
 			return 0;
 		break;
From: Tom Zanussi zanussi@kernel.org
commit 17f8607a1658a8e70415eef67909f990d13017b5 upstream.
Original changelog from Steve Rostedt (except last sentence which explains the problem, and the Fixes: tag):
I performed a three way histogram with the following commands:
echo 'irq_lat u64 lat pid_t pid' > synthetic_events
echo 'wake_lat u64 lat u64 irqlat pid_t pid' >> synthetic_events
echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' > events/timer/hrtimer_start/trigger
echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags & 1' > events/sched/sched_waking/trigger
echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' > events/synthetic/irq_lat/trigger
echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' > events/sched/sched_switch/trigger
echo 1 > events/synthetic/wake_lat/enable
Basically I wanted to see:
hrtimer_start (calling function tick_sched_timer)
Note:
# grep tick_sched_timer /proc/kallsyms
ffffffff81200580 t tick_sched_timer
And save the time of that, and then record sched_waking if it is called in interrupt context and with the same pid as the hrtimer_start, it will record the latency between that and the waking event.
I then look at when the task that is woken is scheduled in, and record the latency between the wakeup and the task running.
At the end, the wake_lat synthetic event will show the wakeup to scheduled latency, as well as the irq latency from hrtimer_start to the wakeup. The problem is that I found this:
<idle>-0 [007] d... 190.485261: wake_lat: lat=27 irqlat=190485230 pid=698
<idle>-0 [005] d... 190.485283: wake_lat: lat=40 irqlat=190485239 pid=10
<idle>-0 [002] d... 190.488327: wake_lat: lat=56 irqlat=190488266 pid=335
<idle>-0 [005] d... 190.489330: wake_lat: lat=64 irqlat=190489262 pid=10
<idle>-0 [003] d... 190.490312: wake_lat: lat=43 irqlat=190490265 pid=77
<idle>-0 [005] d... 190.493322: wake_lat: lat=54 irqlat=190493262 pid=10
<idle>-0 [005] d... 190.497305: wake_lat: lat=35 irqlat=190497267 pid=10
<idle>-0 [005] d... 190.501319: wake_lat: lat=50 irqlat=190501264 pid=10
The irqlat seemed quite large! Investigating this further, if I had enabled the irq_lat synthetic event, I noticed this:
<idle>-0 [002] d.s. 249.429308: irq_lat: lat=164968 pid=335
<idle>-0 [002] d... 249.429369: wake_lat: lat=55 irqlat=249429308 pid=335
Notice that the timestamp of the irq_lat "249.429308" is awfully similar to the reported irqlat variable. In fact, all instances were like this. It appeared that:
irqlat=$irqlat
Wasn't assigning the old $irqlat to the new irqlat variable, but instead was assigning the $irqts to it.
The issue is that assigning the old $irqlat to the new irqlat variable creates a variable reference alias, but the alias creation code forgets to make sure the alias uses the same var_ref_idx to access the reference.
Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org
Cc: Linux Trace Devel linux-trace-devel@vger.kernel.org
Cc: linux-rt-users linux-rt-users@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7e8b88a30b085 ("tracing: Add hist trigger support for variable reference aliases")
Reported-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Tom Zanussi zanussi@kernel.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/trace_events_hist.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -2526,6 +2526,8 @@ static struct hist_field *create_alias(s
 		return NULL;
 	}
 
+	alias->var_ref_idx = var_ref->var_ref_idx;
+
 	return alias;
 }
 
From: Kees Cook keescook@chromium.org
commit 314eed30ede02fa925990f535652254b5bad6b65 upstream.
When running on a system with >512MB RAM with a 32-bit kernel built with:
CONFIG_DEBUG_VIRTUAL=y
CONFIG_HIGHMEM=y
CONFIG_HARDENED_USERCOPY=y
all execve()s will fail due to argv copying into kmap()ed pages, and on usercopy checking, the eventual calls to virt_to_page() will be looking for "bad" kmap (highmem) pointers due to CONFIG_DEBUG_VIRTUAL=y:
------------[ cut here ]------------
kernel BUG at ../arch/x86/mm/physaddr.c:83!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc8 #6
Hardware name: Dell Inc. Inspiron 1318/0C236D, BIOS A04 01/15/2009
EIP: __phys_addr+0xaf/0x100
...
Call Trace:
 __check_object_size+0xaf/0x3c0
 ? __might_sleep+0x80/0xa0
 copy_strings+0x1c2/0x370
 copy_strings_kernel+0x2b/0x40
 __do_execve_file+0x4ca/0x810
 ? kmem_cache_alloc+0x1c7/0x370
 do_execve+0x1b/0x20
 ...
The check is from arch/x86/mm/physaddr.c:
VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn);
Due to the kmap() in fs/exec.c:
kaddr = kmap(kmapped_page);
...
if (copy_from_user(kaddr+offset, str, bytes_to_copy)) ...
Now we can fetch the correct page to avoid the pfn check. In both cases, hardened usercopy will need to walk the page-span checker (if enabled) to do sanity checking.
Reported-by: Randy Dunlap rdunlap@infradead.org
Tested-by: Randy Dunlap rdunlap@infradead.org
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Cc: Matthew Wilcox willy@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org
Link: https://lore.kernel.org/r/201909171056.7F2FFD17@keescook
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/usercopy.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -231,7 +232,12 @@ static inline void check_heap_object(con
 	if (!virt_addr_valid(ptr))
 		return;
 
-	page = virt_to_head_page(ptr);
+	/*
+	 * When CONFIG_HIGHMEM=y, kmap_to_page() will give either the
+	 * highmem page or fallback to virt_to_page(). The following
+	 * is effectively a highmem-aware virt_to_head_page().
+	 */
+	page = compound_head(kmap_to_page((void *)ptr));
 
 	if (PageSlab(page)) {
 		/* Check slab allocator for flags and size. */
From: Li RongQing lirongqing@baidu.com
commit e430d802d6a3aaf61bd3ed03d9404888a29b9bf9 upstream.
The timer delayed for more than 3 seconds warning was triggered during testing.
Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0xee/0x100
...
Call Trace:
 process_one_work+0x18c/0x3a0
 worker_thread+0x30/0x380
 kthread+0x113/0x130
 ret_from_fork+0x22/0x40
The reason is that the code in collect_expired_timers() uses jiffies unprotected:
	if (next_event > jiffies)
		base->clk = jiffies;
As the compiler is allowed to reload the value base->clk can advance between the check and the store and in the worst case advance farther than next event. That causes the timer expiry to be delayed until the wheel pointer wraps around.
Convert the code to use READ_ONCE()
Fixes: 236968383cf5 ("timers: Optimize collect_expired_timers() for NOHZ")
Signed-off-by: Li RongQing lirongqing@baidu.com
Signed-off-by: Liang ZhiCheng liangzhicheng@baidu.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 kernel/time/timer.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1590,24 +1590,26 @@ void timer_clear_idle(void)
 static int collect_expired_timers(struct timer_base *base,
 				  struct hlist_head *heads)
 {
+	unsigned long now = READ_ONCE(jiffies);
+
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
 	 * base to current jiffies. Avoid a loop by searching the bitfield for
 	 * the next expiring timer.
 	 */
-	if ((long)(jiffies - base->clk) > 2) {
+	if ((long)(now - base->clk) > 2) {
 		unsigned long next = __next_timer_interrupt(base);

 		/*
 		 * If the next timer is ahead of time forward to current
 		 * jiffies, otherwise forward to the next expiry time:
 		 */
-		if (time_after(next, jiffies)) {
+		if (time_after(next, now)) {
 			/*
 			 * The call site will increment base->clk and then
 			 * terminate the expiry loop immediately.
 			 */
-			base->clk = jiffies;
+			base->clk = now;
 			return 0;
 		}
 		base->clk = next;
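For illustration only, and not part of the patch: the point of the change is that the comparison and the store must see the same counter value, which a single snapshot guarantees. A minimal user-space sketch, assuming a hypothetical `fake_jiffies` counter and a trimmed-down `struct timer_base_sketch`; `time_after_sketch()` re-implements the usual wrap-safe signed-difference comparison and is not the kernel header. The kernel's outer "more than 2 ticks behind" check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's jiffies counter. */
static volatile unsigned long fake_jiffies;

/* Simplified stand-in for struct timer_base: only the wheel clock. */
struct timer_base_sketch {
	unsigned long clk;
};

/* Wrap-safe "a is after b", same idea as the kernel's time_after(). */
static bool time_after_sketch(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Forward base->clk using a single snapshot of the counter, so the
 * comparison and the store are guaranteed to use the same value.
 * Returns true if clk was forwarded to "now", false if to "next".
 */
static bool forward_clk(struct timer_base_sketch *base, unsigned long next)
{
	unsigned long now = fake_jiffies;	/* read exactly once */

	assert((long)(now - base->clk) >= 0);	/* clk never runs ahead */

	if (time_after_sketch(next, now)) {
		base->clk = now;
		return true;
	}
	base->clk = next;
	return false;
}
```

With `fake_jiffies` at 200 and `clk` at 100, a next expiry of 250 forwards `clk` to 200, while a next expiry of 150 forwards it to 150.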
On Thu 2019-10-10 10:35:39, Greg Kroah-Hartman wrote:
> From: Li RongQing lirongqing@baidu.com
>
> commit e430d802d6a3aaf61bd3ed03d9404888a29b9bf9 upstream.
>
> The reason is that the code in collect_expired_timers() uses jiffies unprotected:
>
>	if (next_event > jiffies)
>		base->clk = jiffies;
>
> As the compiler is allowed to reload the value base->clk can advance between the check and the store and in the worst case advance farther than next event. That causes the timer expiry to be delayed until the wheel pointer wraps around.
>
> Convert the code to use READ_ONCE()
Does it really need to use READ_ONCE? "jiffies" is already volatile, READ_ONCE just adds another volatile...
Best regards, Pavel
From: Jon Derrick jonathan.derrick@intel.com
commit a1a30170138c9c5157bd514ccd4d76b47060f29b upstream.
The shadow offset scratchpad was moved to 0x2000-0x2010. Update the location to get the correct shadow offset.
Fixes: 6788958e4f3c ("PCI: vmd: Assign membar addresses from shadow registers")
Signed-off-by: Jon Derrick jonathan.derrick@intel.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/pci/controller/vmd.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -31,6 +31,9 @@
 #define PCI_REG_VMLOCK		0x70
 #define MB2_SHADOW_EN(vmlock)	(vmlock & 0x2)

+#define MB2_SHADOW_OFFSET	0x2000
+#define MB2_SHADOW_SIZE		16
+
 enum vmd_features {
 	/*
 	 * Device may contain registers which hint the physical location of the
@@ -600,7 +603,7 @@ static int vmd_enable_domain(struct vmd_
 		u32 vmlock;
 		int ret;

-		membar2_offset = 0x2018;
+		membar2_offset = MB2_SHADOW_OFFSET + MB2_SHADOW_SIZE;
 		ret = pci_read_config_dword(vmd->dev, PCI_REG_VMLOCK, &vmlock);
 		if (ret || vmlock == ~0)
 			return -ENODEV;
@@ -612,9 +615,9 @@ static int vmd_enable_domain(struct vmd_
 			if (!membar2)
 				return -ENOMEM;
 			offset[0] = vmd->dev->resource[VMD_MEMBAR1].start -
-					readq(membar2 + 0x2008);
+					readq(membar2 + MB2_SHADOW_OFFSET);
 			offset[1] = vmd->dev->resource[VMD_MEMBAR2].start -
-					readq(membar2 + 0x2010);
+					readq(membar2 + MB2_SHADOW_OFFSET + 8);
 			pci_iounmap(vmd->dev, membar2);
 		}
 	}
From: Sumit Saxena sumit.saxena@broadcom.com
commit d2182b2d4b71ff0549a07f414d921525fade707b upstream.
In a Resizable BAR Control Register, bits 13:8 control the size of the BAR. The encoded values of these bits are as follows (see PCIe r5.0, sec 7.8.6.3):
  Value    BAR size
     0     1 MB (2^20 bytes)
     1     2 MB (2^21 bytes)
     2     4 MB (2^22 bytes)
   ...
    43     8 EB (2^63 bytes)
Previously we incorrectly set the BAR size bits for a 1 MB BAR to 0x1f instead of 0, so devices that support that size, e.g., new megaraid_sas and mpt3sas adapters, fail to initialize during resume from S3 sleep.
Correctly calculate the BAR size bits for Resizable BAR control registers.
Link: https://lore.kernel.org/r/20190725192552.24295-1-sumit.saxena@broadcom.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203939
Fixes: d3252ace0bc6 ("PCI: Restore resized BAR state on resume")
Signed-off-by: Sumit Saxena sumit.saxena@broadcom.com
Signed-off-by: Bjorn Helgaas bhelgaas@google.com
Reviewed-by: Christian König christian.koenig@amd.com
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/pci/pci.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1366,7 +1366,7 @@ static void pci_restore_rebar_state(stru
 		pci_read_config_dword(pdev, pos + PCI_REBAR_CTRL, &ctrl);
 		bar_idx = ctrl & PCI_REBAR_CTRL_BAR_IDX;
 		res = pdev->resource + bar_idx;
-		size = order_base_2((resource_size(res) >> 20) | 1) - 1;
+		size = ilog2(resource_size(res)) - 20;
 		ctrl &= ~PCI_REBAR_CTRL_BAR_SIZE;
 		ctrl |= size << PCI_REBAR_CTRL_BAR_SHIFT;
 		pci_write_config_dword(pdev, pos + PCI_REBAR_CTRL, ctrl);
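For illustration only, and not part of the patch: the old and new size computations can be compared in user space. This is a sketch under stated assumptions: `ilog2_sketch()` and `order_base_2_sketch()` are hypothetical re-implementations that mimic the kernel helpers for the values of interest, and the 0x1f mask in the usage note simply follows the value quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of log2(v); v must be non-zero. Plays the role of ilog2(). */
static unsigned int ilog2_sketch(uint64_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Ceiling of log2(v), returning 0 for v <= 1, like order_base_2(). */
static unsigned int order_base_2_sketch(uint64_t v)
{
	return v > 1 ? ilog2_sketch(v - 1) + 1 : 0;
}

/* Old computation: wraps around below zero for a 1 MB BAR. */
static unsigned int rebar_size_old(uint64_t bytes)
{
	return order_base_2_sketch((bytes >> 20) | 1) - 1;
}

/* New computation: encoded value n means a BAR of 2^(n + 20) bytes. */
static unsigned int rebar_size_new(uint64_t bytes)
{
	return ilog2_sketch(bytes) - 20;
}
```

For a 1 MB BAR, `rebar_size_old(1ULL << 20) & 0x1f` evaluates to 0x1f while `rebar_size_new(1ULL << 20)` is 0; for 2 MB and larger power-of-two sizes both expressions agree, which matches the observation that only devices supporting the 1 MB size were affected.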
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit 144783a80cd2cbc45c6ce17db649140b65f203dd upstream.
Converting from ms to s requires dividing by 1000, not multiplying. So this is currently taking the smaller of new_timeout and 1.28e8, i.e. effectively new_timeout.
The driver knows what it set max_hw_heartbeat_ms to, so use that value instead of doing a division at run-time.
FWIW, this can easily be tested by booting into a busybox shell and doing "watchdog -t 5 -T 130 /dev/watchdog" - without this patch, the watchdog fires after 130&127 == 2 seconds.
Fixes: b07e228eee69 "watchdog: imx2_wdt: Fix set_timeout for big timeout values"
Cc: stable@vger.kernel.org # 5.2 plus anything the above got backported to
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk
Reviewed-by: Guenter Roeck linux@roeck-us.net
Link: https://lore.kernel.org/r/20190812131356.23039-1-linux@rasmusvillemoes.dk
Signed-off-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/watchdog/imx2_wdt.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/watchdog/imx2_wdt.c
+++ b/drivers/watchdog/imx2_wdt.c
@@ -55,7 +55,7 @@

 #define IMX2_WDT_WMCR		0x08		/* Misc Register */

-#define IMX2_WDT_MAX_TIME	128
+#define IMX2_WDT_MAX_TIME	128U
 #define IMX2_WDT_DEFAULT_TIME	60		/* in seconds */

 #define WDOG_SEC_TO_COUNT(s)	((s * 2 - 1) << 8)
@@ -180,7 +180,7 @@ static int imx2_wdt_set_timeout(struct w
 {
 	unsigned int actual;

-	actual = min(new_timeout, wdog->max_hw_heartbeat_ms * 1000);
+	actual = min(new_timeout, IMX2_WDT_MAX_TIME);
 	__imx2_wdt_set_timeout(wdog, actual);
 	wdog->timeout = new_timeout;
 	return 0;
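For illustration only, and not part of the patch: the "130 becomes 2 seconds" behaviour from the commit message can be reproduced with plain arithmetic. This sketch assumes the timeout field is the 8 bits at positions 15:8 and counts in half-second units (inferred from `WDOG_SEC_TO_COUNT()` above, not taken from a datasheet); every `*_SKETCH` name and helper function is hypothetical.

```c
#include <assert.h>

#define WDT_MAX_TIME_SKETCH 128U	/* seconds, as IMX2_WDT_MAX_TIME */

/* Same expression as WDOG_SEC_TO_COUNT() in the driver. */
#define SEC_TO_COUNT_SKETCH(s) (((s) * 2 - 1) << 8)

/* Assumption: the timeout field is the 8 bits at 15:8, in 0.5 s units. */
#define WT_FIELD_MASK_SKETCH (0xffU << 8)

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Seconds the hardware would really count after the field is masked. */
static unsigned int effective_timeout(unsigned int secs)
{
	unsigned int field;

	assert(secs > 0);
	field = (SEC_TO_COUNT_SKETCH(secs) & WT_FIELD_MASK_SKETCH) >> 8;
	return (field + 1) / 2;
}

/* Before the fix: the bound is 128000 * 1000, so nothing is clamped. */
static unsigned int clamp_old(unsigned int new_timeout,
			      unsigned int max_hw_heartbeat_ms)
{
	return min_uint(new_timeout, max_hw_heartbeat_ms * 1000);
}

/* After the fix: clamp against the maximum expressed in seconds. */
static unsigned int clamp_new(unsigned int new_timeout)
{
	return min_uint(new_timeout, WDT_MAX_TIME_SKETCH);
}
```

With a requested timeout of 130 s and `max_hw_heartbeat_ms` of 128000, `clamp_old()` passes 130 through and the masked field corresponds to 2 s, whereas `clamp_new()` yields 128, which the field can represent.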
From: Srikar Dronamraju srikar@linux.vnet.ibm.com
commit 443f2d5ba13d65ccfd879460f77941875159d154 upstream.
Observe a segmentation fault when 'perf stat' is asked to repeat forever with the interval option.
Without fix:
  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.000211692  3,13,89,82,34,157      cycles
      10.000380119  1,53,98,52,22,294      cycles
      10.040467280       17,16,79,265      cycles
  Segmentation fault
This problem is only observed with the forever option (-r 0); runs with a limited repeat count work. Calling print_counters() with ts set to NULL is not correct when an interval is set. Hence avoid print_counters(NULL, ...) if interval is set.
With fix:
  # perf stat -r 0 -I 5000 -e cycles -a sleep 10
  #           time             counts unit events
       5.019866622  3,15,14,43,08,697      cycles
      10.039865756  3,15,16,31,95,261      cycles
      10.059950628     1,26,05,47,158      cycles
       5.009902655  3,14,52,62,33,932      cycles
      10.019880228  3,14,52,22,89,154      cycles
      10.030543876       66,90,18,333      cycles
       5.009848281  3,14,51,98,25,437      cycles
      10.029854402  3,15,14,93,04,918      cycles
       5.009834177  3,14,51,95,92,316      cycles
Committer notes:
Did the 'git bisect' to find the cset introducing the problem to add the Fixes tag below, and at that time the problem reproduced as:
  (gdb) run stat -r0 -I500 sleep 1
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
  (gdb) bt
  #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
  #1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
  #2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
  #3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
  #4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
  #5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
  #6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
  (gdb)
Mostly the same as just before this patch:
  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
  (gdb) bt
  #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
  #1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670) at util/stat-display.c:1172
  #2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
  #3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
  #4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
  #5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
  #6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
  #7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
  (gdb)
Fixes: d4f63a4741a8 ("perf stat: Introduce print_counters function")
Signed-off-by: Srikar Dronamraju srikar@linux.vnet.ibm.com
Acked-by: Jiri Olsa jolsa@kernel.org
Tested-by: Arnaldo Carvalho de Melo acme@redhat.com
Tested-by: Ravi Bangoria ravi.bangoria@linux.ibm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org # v4.2+
Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 tools/perf/builtin-stat.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -3091,7 +3091,7 @@ int cmd_stat(int argc, const char **argv
 				run_idx + 1);

 		status = run_perf_stat(argc, argv, run_idx);
-		if (forever && status != -1) {
+		if (forever && status != -1 && !interval) {
 			print_counters(NULL, argc, argv);
 			perf_stat__reset_stats();
 		}
From: Tomi Valkeinen tomi.valkeinen@ti.com
commit e2c4ed148cf3ec8669a1d90dc66966028e5fad70 upstream.
The OMAP36xx and AM/DM37x TRMs say that the maximum divider for DSS fclk (in CM_CLKSEL_DSS) is 32. Experimentation shows that this is not correct, and using a divider of 32 breaks DSS with a flood of underflows and sync losts. Dividers up to 31 seem to work fine.
There is another patch to the DT files to limit the divider correctly, but as the DSS driver also needs to know the maximum divider to be able to iteratively find good rates, we also need to do the fix in the DSS driver.
Signed-off-by: Tomi Valkeinen tomi.valkeinen@ti.com
Cc: Adam Ford aford173@gmail.com
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191002122542.8449-1-tomi.val...
Tested-by: Adam Ford aford173@gmail.com
Reviewed-by: Jyri Sarha jsarha@ti.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/gpu/drm/omapdrm/dss/dss.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/omapdrm/dss/dss.c
+++ b/drivers/gpu/drm/omapdrm/dss/dss.c
@@ -1110,7 +1110,7 @@ static const struct dss_features omap34x

 static const struct dss_features omap3630_dss_feats = {
 	.model			=	DSS_MODEL_OMAP3,
-	.fck_div_max		=	32,
+	.fck_div_max		=	31,
 	.fck_freq_max		=	173000000,
 	.dss_fck_multiplier	=	1,
 	.parent_clk_name	=	"dpll4_ck",
From: Sean Paul seanpaul@chromium.org
commit 5fb9b797d5ccf311ae4aba69e86080d47668b5f7 upstream.
clk_get_parent returns an error pointer upon failure, not NULL. So the checks as they exist won't catch a failure. This patch changes the checks and the return values to properly handle an error pointer.
Fixes: c4d8cfe516dc ("drm/msm/dsi: add implementation for helper functions")
Cc: Sibi Sankar sibis@codeaurora.org
Cc: Sean Paul seanpaul@chromium.org
Cc: Rob Clark robdclark@chromium.org
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Sean Paul seanpaul@chromium.org
Signed-off-by: Rob Clark robdclark@chromium.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/gpu/drm/msm/dsi/dsi_host.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -429,15 +429,15 @@ static int dsi_clk_init(struct msm_dsi_h
 	}

 	msm_host->byte_clk_src = clk_get_parent(msm_host->byte_clk);
-	if (!msm_host->byte_clk_src) {
-		ret = -ENODEV;
+	if (IS_ERR(msm_host->byte_clk_src)) {
+		ret = PTR_ERR(msm_host->byte_clk_src);
 		pr_err("%s: can't find byte_clk clock. ret=%d\n", __func__, ret);
 		goto exit;
 	}

 	msm_host->pixel_clk_src = clk_get_parent(msm_host->pixel_clk);
-	if (!msm_host->pixel_clk_src) {
-		ret = -ENODEV;
+	if (IS_ERR(msm_host->pixel_clk_src)) {
+		ret = PTR_ERR(msm_host->pixel_clk_src);
 		pr_err("%s: can't find pixel_clk clock. ret=%d\n", __func__, ret);
 		goto exit;
 	}
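For illustration only, and not part of the patch: the reason a NULL check cannot catch an error pointer is visible once the encoding is spelled out. A minimal user-space sketch of the error-pointer idiom, assuming hypothetical `*_sketch` helpers that imitate the idea behind ERR_PTR(), IS_ERR() and PTR_ERR(); `get_parent_sketch()` is an invented lookup that fails the way the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno in the topmost addresses. */
static void *err_ptr_sketch(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO_SKETCH);
	return (void *)(intptr_t)error;
}

/* True if ptr falls in the range reserved for encoded errors. */
static bool is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Recover the negative errno from an error pointer. */
static long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical lookup: returns an error pointer, never NULL, on failure. */
static void *get_parent_sketch(bool fail)
{
	static int parent;

	return fail ? err_ptr_sketch(-ENOENT) : (void *)&parent;
}
```

A failing `get_parent_sketch(true)` returns a non-NULL value, so an `if (!ptr)` test passes it through as if it were valid; only `is_err_sketch()` detects it, and `ptr_err_sketch()` recovers the original error code instead of a hard-coded one.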
From: Lyude Paul lyude@redhat.com
commit 698c1aa9f83b618de79e9e5e19a58f70a4a6ae0f upstream.
On the ThinkPad P71, we have one eDP connector exposed along with 5 DP connectors, resulting in a total of 11 TMDS encoders. Since the GPU on this system is also capable of MST, we create an additional 4 fake MST encoders for each DP port. Unfortunately, we also do this for the eDP port as well, resulting in:
  1 eDP port: +1 TMDS encoder
              +4 DPMST encoders
  5 DP ports: +2 TMDS encoders
              +4 DPMST encoders
              *5 ports
              == 35 encoders
Which breaks things, since DRM has a hard coded limit of 32 encoders. So, fix this by not creating MSTMs for any eDP connectors. This brings us down to 31 encoders, although we can do better.
This fixes driver probing for nouveau on the ThinkPad P71.
Signed-off-by: Lyude Paul lyude@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs bskeggs@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/gpu/drm/nouveau/dispnv50/disp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -1517,7 +1517,8 @@ nv50_sor_create(struct drm_connector *co
 			nv_encoder->aux = aux;
 		}

-		if ((data = nvbios_dp_table(bios, &ver, &hdr, &cnt, &len)) &&
+		if (nv_connector->type != DCB_CONNECTOR_eDP &&
+		    (data = nvbios_dp_table(bios, &ver, &hdr, &cnt, &len)) &&
 		    ver >= 0x40 && (nvbios_rd08(bios, data + 0x08) & 0x04)) {
 			ret = nv50_mstm_new(nv_encoder, &nv_connector->aux, 16,
 					    nv_connector->base.base.id,
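For illustration only, and not part of the patch: the encoder arithmetic from the commit message can be written out as a small sketch. The port counts are the ThinkPad P71 numbers quoted above; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DRM_MAX_ENCODERS_SKETCH 32	/* limit quoted in the commit message */
#define MST_ENCODERS_PER_PORT 4

/* Counts taken from the ThinkPad P71 description in the commit message. */
static unsigned int count_encoders(bool mstm_on_edp)
{
	unsigned int edp = 1 /* TMDS */ +
			   (mstm_on_edp ? MST_ENCODERS_PER_PORT : 0);
	unsigned int dp = 5 /* ports */ *
			  (2 /* TMDS */ + MST_ENCODERS_PER_PORT);

	return edp + dp;
}

/* True if n encoders fit under the hard-coded limit. */
static bool fits_encoder_limit(unsigned int n)
{
	assert(n > 0);
	return n <= DRM_MAX_ENCODERS_SKETCH;
}
```

`count_encoders(true)` gives 35, which exceeds the limit, and `count_encoders(false)` gives 31, which fits.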
From: Xiaolin Zhang xiaolin.zhang@intel.com
commit 0a3242bdb47713e09cb004a0ba4947d3edf82d8a upstream.
When creating a vGPU workload, the guest context head pointer should be updated correctly by comparing with the existing workloads in the guest workload queue, including the currently running context.

In some situations, there is a running context A and then two new vGPU workloads are received, for context B and context A. For the new workload of context A, its head pointer should be updated with the running context A's tail.
v2: walk through guest workload list in backward way.
Cc: stable@vger.kernel.org
Signed-off-by: Xiaolin Zhang xiaolin.zhang@intel.com
Reviewed-by: Zhenyu Wang zhenyuw@linux.intel.com
Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/gpu/drm/i915/gvt/scheduler.c |   28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -1276,9 +1276,6 @@ static int prepare_mm(struct intel_vgpu_
 #define same_context(a, b) (((a)->context_id == (b)->context_id) && \
 		((a)->lrca == (b)->lrca))

-#define get_last_workload(q) \
-	(list_empty(q) ? NULL : container_of(q->prev, \
-	struct intel_vgpu_workload, list))
 /**
  * intel_vgpu_create_workload - create a vGPU workload
  * @vgpu: a vGPU
@@ -1297,7 +1294,7 @@ intel_vgpu_create_workload(struct intel_
 {
 	struct intel_vgpu_submission *s = &vgpu->submission;
 	struct list_head *q = workload_q_head(vgpu, ring_id);
-	struct intel_vgpu_workload *last_workload = get_last_workload(q);
+	struct intel_vgpu_workload *last_workload = NULL;
 	struct intel_vgpu_workload *workload = NULL;
 	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 	u64 ring_context_gpa;
@@ -1320,15 +1317,20 @@ intel_vgpu_create_workload(struct intel_
 	head &= RB_HEAD_OFF_MASK;
 	tail &= RB_TAIL_OFF_MASK;

-	if (last_workload && same_context(&last_workload->ctx_desc, desc)) {
-		gvt_dbg_el("ring id %d cur workload == last\n", ring_id);
-		gvt_dbg_el("ctx head %x real head %lx\n", head,
-				last_workload->rb_tail);
-		/*
-		 * cannot use guest context head pointer here,
-		 * as it might not be updated at this time
-		 */
-		head = last_workload->rb_tail;
+	list_for_each_entry_reverse(last_workload, q, list) {
+
+		if (same_context(&last_workload->ctx_desc, desc)) {
+			gvt_dbg_el("ring id %d cur workload == last\n",
+				   ring_id);
+			gvt_dbg_el("ctx head %x real head %lx\n", head,
+				   last_workload->rb_tail);
+			/*
+			 * cannot use guest context head pointer here,
+			 * as it might not be updated at this time
+			 */
+			head = last_workload->rb_tail;
+			break;
+		}
 	}

 	gvt_dbg_el("ring id %d begin a new workload\n", ring_id);
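For illustration only, and not part of the patch: the difference between checking only the last queued workload and walking the queue backwards can be shown with a plain array. This is a sketch under stated assumptions: `struct workload_sketch` is a hypothetical, trimmed-down record, and an array ordered oldest to newest stands in for the kernel list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down workload record. */
struct workload_sketch {
	uint32_t context_id;
	uint32_t lrca;
	unsigned long rb_tail;
};

static bool same_context_sketch(const struct workload_sketch *a,
				const struct workload_sketch *b)
{
	return a->context_id == b->context_id && a->lrca == b->lrca;
}

/*
 * q[0..n-1] is ordered oldest to newest. Walk it backwards and return
 * the rb_tail of the most recent workload of the same context, or the
 * guest-provided head if that context has nothing queued.
 */
static unsigned long pick_head(const struct workload_sketch *q, size_t n,
			       const struct workload_sketch *desc,
			       unsigned long guest_head)
{
	assert(q != NULL || n == 0);

	for (size_t i = n; i-- > 0; ) {
		if (same_context_sketch(&q[i], desc))
			return q[i].rb_tail;
	}
	return guest_head;
}
```

With a queue holding context A followed by context B, a new workload for A picks up A's tail even though the last queued entry belongs to B, which is the A, B, A case described in the commit message.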
From: Russell King rmk+kernel@armlinux.org.uk
commit d1c536e3177390da43d99f20143b810c35433d1f upstream.
ADMA errors are potentially data corrupting events; although we print the register state, we do not usefully print the ADMA descriptors. Worse than that, we print them by referencing their virtual address which is meaningless when the register state gives us the DMA address of the failing descriptor.
Print the ADMA descriptors giving their DMA addresses rather than their virtual addresses, and print them using SDHCI_DUMP() rather than DBG().
We also do not show the correct value of the interrupt status register; the register dump shows the current value, after we have cleared the pending interrupts we are going to service. What is more useful is to print the interrupts that _were_ pending at the time the ADMA error was encountered. Fix that too.
Signed-off-by: Russell King rmk+kernel@armlinux.org.uk
Acked-by: Adrian Hunter adrian.hunter@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/mmc/host/sdhci.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -2720,6 +2720,7 @@ static void sdhci_cmd_irq(struct sdhci_h
 static void sdhci_adma_show_error(struct sdhci_host *host)
 {
 	void *desc = host->adma_table;
+	dma_addr_t dma = host->adma_addr;

 	sdhci_dumpregs(host);

@@ -2727,18 +2728,21 @@ static void sdhci_adma_show_error(struct
 		struct sdhci_adma2_64_desc *dma_desc = desc;

 		if (host->flags & SDHCI_USE_64_BIT_DMA)
-			DBG("%p: DMA 0x%08x%08x, LEN 0x%04x, Attr=0x%02x\n",
-			    desc, le32_to_cpu(dma_desc->addr_hi),
+			SDHCI_DUMP("%08llx: DMA 0x%08x%08x, LEN 0x%04x, Attr=0x%02x\n",
+			    (unsigned long long)dma,
+			    le32_to_cpu(dma_desc->addr_hi),
 			    le32_to_cpu(dma_desc->addr_lo),
 			    le16_to_cpu(dma_desc->len),
 			    le16_to_cpu(dma_desc->cmd));
 		else
-			DBG("%p: DMA 0x%08x, LEN 0x%04x, Attr=0x%02x\n",
-			    desc, le32_to_cpu(dma_desc->addr_lo),
+			SDHCI_DUMP("%08llx: DMA 0x%08x, LEN 0x%04x, Attr=0x%02x\n",
+			    (unsigned long long)dma,
+			    le32_to_cpu(dma_desc->addr_lo),
 			    le16_to_cpu(dma_desc->len),
 			    le16_to_cpu(dma_desc->cmd));

 		desc += host->desc_sz;
+		dma += host->desc_sz;

 		if (dma_desc->cmd & cpu_to_le16(ADMA2_END))
 			break;
@@ -2814,7 +2818,8 @@ static void sdhci_data_irq(struct sdhci_
 				!= MMC_BUS_TEST_R)
 			host->data->error = -EILSEQ;
 		else if (intmask & SDHCI_INT_ADMA_ERROR) {
-			pr_err("%s: ADMA error\n", mmc_hostname(host->mmc));
+			pr_err("%s: ADMA error: 0x%08x\n", mmc_hostname(host->mmc),
+			       intmask);
 			sdhci_adma_show_error(host);
 			host->data->error = -EIO;
 			if (host->ops->adma_workaround)
From: Russell King rmk+kernel@armlinux.org.uk
commit 121bd08b029e03404c451bb237729cdff76eafed upstream.
We must not unconditionally set the DMA snoop bit; if the DMA API is assuming that the device is not DMA coherent, and the device snoops the CPU caches, the device can see stale cache lines brought in by speculative prefetch.
This leads to the device seeing stale data, potentially resulting in corrupted data transfers. Commonly, this results in a descriptor fetch error such as:
mmc0: ADMA error
mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
mmc0: sdhci: Blk size:  0x00000008 | Blk cnt:  0x00000001
mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
mmc0: sdhci: Present:   0x01f50008 | Host ctl: 0x00000038
mmc0: sdhci: Power:     0x00000003 | Blk gap:  0x00000000
mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x000040d8
mmc0: sdhci: Timeout:   0x00000003 | Int stat: 0x00000001
mmc0: sdhci: Int enab:  0x037f108f | Sig enab: 0x037f108b
mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
mmc0: sdhci: Caps:      0x35fa0000 | Caps_1:   0x0000af00
mmc0: sdhci: Cmd:       0x0000333a | Max curr: 0x00000000
mmc0: sdhci: Resp[0]:   0x00000920 | Resp[1]:  0x001d8a33
mmc0: sdhci: Resp[2]:   0x325b5900 | Resp[3]:  0x3f400e00
mmc0: sdhci: Host ctl2: 0x00000000
mmc0: sdhci: ADMA Err:  0x00000009 | ADMA Ptr: 0x000000236d43820c
mmc0: sdhci: ============================================
mmc0: error -5 whilst initialising SD card
but can lead to other errors, and potentially direct the SDHCI controller to read/write data to other memory locations (e.g. if a valid descriptor is visible to the device in a stale cache line.)
Fix this by ensuring that the DMA snoop bit corresponds with the behaviour of the DMA API. Since the driver currently only supports DT, use of_dma_is_coherent(). Note that device_get_dma_attr() can not be used as that risks re-introducing this bug if/when the driver is converted to ACPI.
Signed-off-by: Russell King rmk+kernel@armlinux.org.uk
Acked-by: Adrian Hunter adrian.hunter@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/mmc/host/sdhci-of-esdhc.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/mmc/host/sdhci-of-esdhc.c
+++ b/drivers/mmc/host/sdhci-of-esdhc.c
@@ -480,7 +480,12 @@ static int esdhc_of_enable_dma(struct sd
 	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));

 	value = sdhci_readl(host, ESDHC_DMA_SYSCTL);
-	value |= ESDHC_DMA_SNOOP;
+
+	if (of_dma_is_coherent(dev->of_node))
+		value |= ESDHC_DMA_SNOOP;
+	else
+		value &= ~ESDHC_DMA_SNOOP;
+
 	sdhci_writel(host, value, ESDHC_DMA_SYSCTL);
 	return 0;
 }
From: Wanpeng Li wanpengli@tencent.com
commit 89340d0935c9296c7b8222b6eab30e67cb57ab82 upstream.
This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted). That commit caused a large performance regression in over-subscription scenarios.
The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads, with three VMs of 80 vCPUs each. The score of ebizzy -M is reduced from 13000-14000 records/s to 1700-1800 records/s:
  Host                             Guest       score

  vanilla w/o kvm optimizations    upstream    1700-1800 records/s
  vanilla w/o kvm optimizations    revert      13000-14000 records/s
  vanilla w/ kvm optimizations     upstream    4500-5000 records/s
  vanilla w/ kvm optimizations     revert      14000-15500 records/s
Exit from aggressive wait-early mechanism can result in premature yield and extra scheduling latency.
Actually, only 6% of wait_early events are caused by vcpu_is_preempted() being true. However, when one vCPU voluntarily releases its vCPU, all the subsequent waiters in the queue will do the same, and the cascading effect leads to bad performance.
kvm optimizations:
[1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
[2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
Tested-by: loobinliu@tencent.com
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@kernel.org
Cc: Waiman Long longman@redhat.com
Cc: Paolo Bonzini pbonzini@redhat.com
Cc: Radim Krčmář rkrcmar@redhat.com
Cc: loobinliu@tencent.com
Cc: stable@vger.kernel.org
Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
Signed-off-by: Wanpeng Li wanpengli@tencent.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 kernel/locking/qspinlock_paravirt.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -271,7 +271,7 @@ pv_wait_early(struct pv_node *prev, int
 	if ((loop & PV_PREV_CHECK_MASK) != 0)
 		return false;

-	return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
+	return READ_ONCE(prev->state) != vcpu_running;
 }

 /*
From: Juergen Gross jgross@suse.com
commit a8fabb38525c51a094607768bac3ba46b3f4a9d5 upstream.
In case a user process using xenbus has open transactions and is killed e.g. via ctrl-C the following cleanup of the allocated resources might result in a deadlock due to trying to end a transaction in the xenbus worker thread:
[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus D 0 37 2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603]  __schedule+0x2c0/0x870
[ 2551.528606]  ? _cond_resched+0x19/0x40
[ 2551.528632]  schedule+0x2c/0x70
[ 2551.528637]  xs_talkv+0x1ec/0x2b0
[ 2551.528642]  ? wait_woken+0x80/0x80
[ 2551.528645]  xs_single+0x53/0x80
[ 2551.528648]  xenbus_transaction_end+0x3b/0x70
[ 2551.528651]  xenbus_file_free+0x5a/0x160
[ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657]  xenbus_thread+0x7de/0x880
[ 2551.528660]  ? wait_woken+0x80/0x80
[ 2551.528665]  kthread+0x121/0x140
[ 2551.528667]  ? xb_read+0x1d0/0x1d0
[ 2551.528670]  ? kthread_park+0x90/0x90
[ 2551.528673]  ret_from_fork+0x35/0x40
Fix this by doing the cleanup via a workqueue instead.
Reported-by: James Dingwall james@dingwall.me.uk
Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Cc: stable@vger.kernel.org # 4.11
Signed-off-by: Juergen Gross jgross@suse.com
Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/xen/xenbus/xenbus_dev_frontend.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -55,6 +55,7 @@
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/miscdevice.h>
+#include <linux/workqueue.h>

 #include <xen/xenbus.h>
 #include <xen/xen.h>
@@ -116,6 +117,8 @@ struct xenbus_file_priv {
 	wait_queue_head_t read_waitq;

 	struct kref kref;
+
+	struct work_struct wq;
 };

 /* Read out any raw xenbus messages queued up. */
@@ -300,14 +303,14 @@ static void watch_fired(struct xenbus_wa
 	mutex_unlock(&adap->dev_data->reply_mutex);
 }

-static void xenbus_file_free(struct kref *kref)
+static void xenbus_worker(struct work_struct *wq)
 {
 	struct xenbus_file_priv *u;
 	struct xenbus_transaction_holder *trans, *tmp;
 	struct watch_adapter *watch, *tmp_watch;
 	struct read_buffer *rb, *tmp_rb;

-	u = container_of(kref, struct xenbus_file_priv, kref);
+	u = container_of(wq, struct xenbus_file_priv, wq);

 	/*
 	 * No need for locking here because there are no other users,
@@ -333,6 +336,18 @@ static void xenbus_file_free(struct kref
 	kfree(u);
 }

+static void xenbus_file_free(struct kref *kref)
+{
+	struct xenbus_file_priv *u;
+
+	/*
+	 * We might be called in xenbus_thread().
+	 * Use workqueue to avoid deadlock.
+	 */
+	u = container_of(kref, struct xenbus_file_priv, kref);
+	schedule_work(&u->wq);
+}
+
 static struct xenbus_transaction_holder *xenbus_get_transaction(
 	struct xenbus_file_priv *u, uint32_t tx_id)
 {
@@ -652,6 +667,7 @@ static int xenbus_file_open(struct inode
 	INIT_LIST_HEAD(&u->watches);
 	INIT_LIST_HEAD(&u->read_buffers);
 	init_waitqueue_head(&u->read_waitq);
+	INIT_WORK(&u->wq, xenbus_worker);

 	mutex_init(&u->reply_mutex);
 	mutex_init(&u->msgbuffer_mutex);
From: Johan Hovold johan@kernel.org
commit 7fd25e6fc035f4b04b75bca6d7e8daa069603a76 upstream.
The disconnect callback was accessing the hardware-descriptor private data after having freed it.
Fixes: 7490b008d123 ("ieee802154: add support for atusb transceiver")
Cc: stable stable@vger.kernel.org # 4.2
Cc: Alexander Aring alex.aring@gmail.com
Reported-by: syzbot+f4509a9138a1472e7e80@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold johan@kernel.org
Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/ieee802154/atusb.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ieee802154/atusb.c
+++ b/drivers/net/ieee802154/atusb.c
@@ -1140,10 +1140,11 @@ static void atusb_disconnect(struct usb_

 	ieee802154_unregister_hw(atusb->hw);

+	usb_put_dev(atusb->usb_dev);
+
 	ieee802154_free_hw(atusb->hw);

 	usb_set_intfdata(interface, NULL);
-	usb_put_dev(atusb->usb_dev);

 	pr_debug("%s done\n", __func__);
 }
From: Vasily Gorbik gor@linux.ibm.com
commit ea298e6ee8b34b3ed4366be7eb799d0650ebe555 upstream.
Fix the following kasan finding:
BUG: KASAN: global-out-of-bounds in ccwgroup_create_dev+0x850/0x1140
Read of size 1 at addr 0000000000000000 by task systemd-udevd.r/561

CPU: 30 PID: 561 Comm: systemd-udevd.r Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000231b3db7e>] show_stack+0x14e/0x1a8)
 [<0000000233826410>] dump_stack+0x1d0/0x218
 [<000000023216fac4>] print_address_description+0x64/0x380
 [<000000023216f5a8>] __kasan_report+0x138/0x168
 [<00000002331b8378>] ccwgroup_create_dev+0x850/0x1140
 [<00000002332b618a>] group_store+0x3a/0x50
 [<00000002323ac706>] kernfs_fop_write+0x246/0x3b8
 [<00000002321d409a>] vfs_write+0x132/0x450
 [<00000002321d47da>] ksys_write+0x122/0x208
 [<0000000233877102>] system_call+0x2a6/0x2c8

Triggered by:
openat(AT_FDCWD, "/sys/bus/ccwgroup/drivers/qeth/group",
       O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 16
write(16, "0.0.bd00,0.0.bd01,0.0.bd02", 26) = 26
The problem is that __get_next_id in ccwgroup_create_dev might set "buf" buffer pointer to NULL and explicit check for that is required.
Cc: stable@vger.kernel.org Reviewed-by: Sebastian Ott sebott@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/cio/ccwgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/cio/ccwgroup.c +++ b/drivers/s390/cio/ccwgroup.c @@ -372,7 +372,7 @@ int ccwgroup_create_dev(struct device *p goto error; } /* Check for trailing stuff. */ - if (i == num_devices && strlen(buf) > 0) { + if (i == num_devices && buf && strlen(buf) > 0) { rc = -EINVAL; goto error; }
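A minimal sketch of the guard added above, assuming a parser that may leave its cursor at NULL once the input is fully consumed. The helper name has_trailing_stuff() is hypothetical and does not exist in ccwgroup.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether unparsed text remains.
 * `rest` may be NULL when the parser consumed the whole input.
 */
static int has_trailing_stuff(const char *rest)
{
    /* The NULL test must come first; strlen(NULL) is undefined behaviour. */
    return rest && strlen(rest) > 0;
}
```

The short-circuit evaluation of && is what makes the single-line fix sufficient: strlen() is never reached when the pointer is NULL.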
From: Johannes Berg johannes.berg@intel.com
commit f43e5210c739fe76a4b0ed851559d6902f20ceb1 upstream.
In a few places we don't properly initialize on-stack chandefs, resulting in non-zero EDMG data, which broke things.
Additionally, in a few places we rely on the driver to initialize the data completely, but perhaps we shouldn't, as non-EDMG drivers may not initialize the EDMG data; initialize it there as well.
Cc: stable@vger.kernel.org Fixes: 2a38075cd0be ("nl80211: Add support for EDMG channels") Reported-by: Dmitry Osipenko digetx@gmail.com Tested-by: Dmitry Osipenko digetx@gmail.com Link: https://lore.kernel.org/r/1569239475-I2dcce394ecf873376c386a78f31c2ec8b538fa... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/wireless/nl80211.c | 4 +++- net/wireless/reg.c | 2 +- net/wireless/wext-compat.c | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -2299,6 +2299,8 @@ static int nl80211_parse_chandef(struct
control_freq = nla_get_u32(info->attrs[NL80211_ATTR_WIPHY_FREQ]);
+ memset(chandef, 0, sizeof(*chandef)); + chandef->chan = ieee80211_get_channel(&rdev->wiphy, control_freq); chandef->width = NL80211_CHAN_WIDTH_20_NOHT; chandef->center_freq1 = control_freq; @@ -2819,7 +2821,7 @@ static int nl80211_send_iface(struct sk_
if (rdev->ops->get_channel) { int ret; - struct cfg80211_chan_def chandef; + struct cfg80211_chan_def chandef = {};
ret = rdev_get_channel(rdev, wdev, &chandef); if (ret == 0) { --- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -2095,7 +2095,7 @@ static void reg_call_notifier(struct wip
static bool reg_wdev_chan_valid(struct wiphy *wiphy, struct wireless_dev *wdev) { - struct cfg80211_chan_def chandef; + struct cfg80211_chan_def chandef = {}; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy); enum nl80211_iftype iftype;
--- a/net/wireless/wext-compat.c +++ b/net/wireless/wext-compat.c @@ -800,7 +800,7 @@ static int cfg80211_wext_giwfreq(struct { struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); - struct cfg80211_chan_def chandef; + struct cfg80211_chan_def chandef = {}; int ret;
switch (wdev->iftype) {
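The effect of zero-initializing before handing the structure to a callback can be sketched as follows. The struct and the callback are hypothetical simplifications, with edmg_channels standing in for the newer fields that an older driver never writes. The patch uses the GNU C `= {}` initializer; the sketch uses memset(), which has the same effect and also works with strict ISO C compilers.

```c
#include <string.h>

/* Hypothetical, simplified channel definition. */
struct chan_def {
    int width;
    int center_freq1;
    int edmg_channels; /* newer field that older drivers do not know about */
};

/* Stand-in for a driver callback that predates edmg_channels. */
static int old_driver_get_channel(struct chan_def *def)
{
    def->width = 20;
    def->center_freq1 = 2412;
    return 0;
}

static struct chan_def query_channel(void)
{
    struct chan_def def;

    /* Clear everything first so fields the driver skips are zero, not stack garbage. */
    memset(&def, 0, sizeof(def));
    old_driver_get_channel(&def);
    return def;
}
```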
From: Will Deacon will.deacon@arm.com
commit d71be2b6c0e19180b5f80a6d42039cc074a693a2 upstream.
Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass Safe (SSBS) which can be used as a mitigation against Spectre variant 4.
Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS directly, so that userspace can toggle the SSBS control without trapping to the kernel.
This patch probes for the existence of SSBS and advertises the new instructions to userspace if they exist.
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/cpucaps.h | 3 ++- arch/arm64/include/asm/sysreg.h | 16 ++++++++++++---- arch/arm64/include/uapi/asm/hwcap.h | 1 + arch/arm64/kernel/cpufeature.c | 19 +++++++++++++++++-- arch/arm64/kernel/cpuinfo.c | 1 + 5 files changed, 33 insertions(+), 7 deletions(-)
--- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -52,7 +52,8 @@ #define ARM64_MISMATCHED_CACHE_TYPE 31 #define ARM64_HAS_STAGE2_FWB 32 #define ARM64_WORKAROUND_1463225 33 +#define ARM64_SSBS 34
-#define ARM64_NCAPS 34 +#define ARM64_NCAPS 35
#endif /* __ASM_CPUCAPS_H */ --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -419,6 +419,7 @@ #define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
/* Common SCTLR_ELx flags. */ +#define SCTLR_ELx_DSSBS (1UL << 44) #define SCTLR_ELx_EE (1 << 25) #define SCTLR_ELx_IESB (1 << 21) #define SCTLR_ELx_WXN (1 << 19) @@ -439,7 +440,7 @@ (1 << 10) | (1 << 13) | (1 << 14) | (1 << 15) | \ (1 << 17) | (1 << 20) | (1 << 24) | (1 << 26) | \ (1 << 27) | (1 << 30) | (1 << 31) | \ - (0xffffffffUL << 32)) + (0xffffefffUL << 32))
#ifdef CONFIG_CPU_BIG_ENDIAN #define ENDIAN_SET_EL2 SCTLR_ELx_EE @@ -453,7 +454,7 @@ #define SCTLR_EL2_SET (SCTLR_ELx_IESB | ENDIAN_SET_EL2 | SCTLR_EL2_RES1) #define SCTLR_EL2_CLEAR (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \ SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_WXN | \ - ENDIAN_CLEAR_EL2 | SCTLR_EL2_RES0) + SCTLR_ELx_DSSBS | ENDIAN_CLEAR_EL2 | SCTLR_EL2_RES0)
#if (SCTLR_EL2_SET ^ SCTLR_EL2_CLEAR) != 0xffffffffffffffff #error "Inconsistent SCTLR_EL2 set/clear bits" @@ -477,7 +478,7 @@ (1 << 29)) #define SCTLR_EL1_RES0 ((1 << 6) | (1 << 10) | (1 << 13) | (1 << 17) | \ (1 << 27) | (1 << 30) | (1 << 31) | \ - (0xffffffffUL << 32)) + (0xffffefffUL << 32))
#ifdef CONFIG_CPU_BIG_ENDIAN #define ENDIAN_SET_EL1 (SCTLR_EL1_E0E | SCTLR_ELx_EE) @@ -494,7 +495,7 @@ ENDIAN_SET_EL1 | SCTLR_EL1_UCI | SCTLR_EL1_RES1) #define SCTLR_EL1_CLEAR (SCTLR_ELx_A | SCTLR_EL1_CP15BEN | SCTLR_EL1_ITD |\ SCTLR_EL1_UMA | SCTLR_ELx_WXN | ENDIAN_CLEAR_EL1 |\ - SCTLR_EL1_RES0) + SCTLR_ELx_DSSBS | SCTLR_EL1_RES0)
#if (SCTLR_EL1_SET ^ SCTLR_EL1_CLEAR) != 0xffffffffffffffff #error "Inconsistent SCTLR_EL1 set/clear bits" @@ -544,6 +545,13 @@ #define ID_AA64PFR0_EL0_64BIT_ONLY 0x1 #define ID_AA64PFR0_EL0_32BIT_64BIT 0x2
+/* id_aa64pfr1 */ +#define ID_AA64PFR1_SSBS_SHIFT 4 + +#define ID_AA64PFR1_SSBS_PSTATE_NI 0 +#define ID_AA64PFR1_SSBS_PSTATE_ONLY 1 +#define ID_AA64PFR1_SSBS_PSTATE_INSNS 2 + /* id_aa64mmfr0 */ #define ID_AA64MMFR0_TGRAN4_SHIFT 28 #define ID_AA64MMFR0_TGRAN64_SHIFT 24 --- a/arch/arm64/include/uapi/asm/hwcap.h +++ b/arch/arm64/include/uapi/asm/hwcap.h @@ -48,5 +48,6 @@ #define HWCAP_USCAT (1 << 25) #define HWCAP_ILRCPC (1 << 26) #define HWCAP_FLAGM (1 << 27) +#define HWCAP_SSBS (1 << 28)
#endif /* _UAPI__ASM_HWCAP_H */ --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -164,6 +164,11 @@ static const struct arm64_ftr_bits ftr_i ARM64_FTR_END, };
+static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = { + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SSBS_SHIFT, 4, ID_AA64PFR1_SSBS_PSTATE_NI), + ARM64_FTR_END, +}; + static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = { /* * We already refuse to boot CPUs that don't support our configured @@ -379,7 +384,7 @@ static const struct __ftr_reg_entry {
/* Op1 = 0, CRn = 0, CRm = 4 */ ARM64_FTR_REG(SYS_ID_AA64PFR0_EL1, ftr_id_aa64pfr0), - ARM64_FTR_REG(SYS_ID_AA64PFR1_EL1, ftr_raz), + ARM64_FTR_REG(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1), ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_raz),
/* Op1 = 0, CRn = 0, CRm = 5 */ @@ -669,7 +674,6 @@ void update_cpu_features(int cpu,
/* * EL3 is not our concern. - * ID_AA64PFR1 is currently RES0. */ taint |= check_update_ftr_reg(SYS_ID_AA64PFR0_EL1, cpu, info->reg_id_aa64pfr0, boot->reg_id_aa64pfr0); @@ -1254,6 +1258,16 @@ static const struct arm64_cpu_capabiliti .cpu_enable = cpu_enable_hw_dbm, }, #endif + { + .desc = "Speculative Store Bypassing Safe (SSBS)", + .capability = ARM64_SSBS, + .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE, + .matches = has_cpuid_feature, + .sys_reg = SYS_ID_AA64PFR1_EL1, + .field_pos = ID_AA64PFR1_SSBS_SHIFT, + .sign = FTR_UNSIGNED, + .min_field_value = ID_AA64PFR1_SSBS_PSTATE_ONLY, + }, {}, };
@@ -1299,6 +1313,7 @@ static const struct arm64_cpu_capabiliti #ifdef CONFIG_ARM64_SVE HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_SVE_SHIFT, FTR_UNSIGNED, ID_AA64PFR0_SVE, CAP_HWCAP, HWCAP_SVE), #endif + HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SSBS_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_SSBS_PSTATE_INSNS, CAP_HWCAP, HWCAP_SSBS), {}, };
--- a/arch/arm64/kernel/cpuinfo.c +++ b/arch/arm64/kernel/cpuinfo.c @@ -81,6 +81,7 @@ static const char *const hwcap_str[] = { "uscat", "ilrcpc", "flagm", + "ssbs", NULL };
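The mask change from 0xffffffff to 0xffffefff in the RES0 definitions is easier to verify with a little arithmetic: bit 44 is bit 12 of the upper 32-bit word, so removing DSSBS from the reserved bits clears 0x1000 there. The sketch below checks that, and also shows how the 4-bit SSBS field at shift 4 of ID_AA64PFR1 would be extracted. The macro and function names are illustrative, not the kernel's.

```c
#include <stdint.h>

#define DSSBS_BIT      (UINT64_C(1) << 44)
#define OLD_RES0_UPPER (UINT64_C(0xffffffff) << 32)
#define NEW_RES0_UPPER (UINT64_C(0xffffefff) << 32)

/* Remove the DSSBS bit from the old "all upper bits reserved" mask. */
static uint64_t res0_without_dssbs(void)
{
    return OLD_RES0_UPPER & ~DSSBS_BIT;
}

/* Extract the 4-bit SSBS field, assuming it sits at bits [7:4] as in the patch. */
static unsigned int ssbs_field(uint64_t pfr1)
{
    return (unsigned int)((pfr1 >> 4) & 0xf);
}
```

With the field values from the patch, 0 means not implemented, 1 means the PSTATE bit only, and 2 means the PSTATE bit plus the instructions that userspace can use directly.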
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit f5e1040196dbfe14c77ce3dfe3b7b08d2d961e88 ]
integrity_kernel_read() returns the number of bytes read. If this is a short read then this positive value is returned from ima_calc_file_hash_atfm(). Currently this is only indirectly called from ima_calc_file_hash() and this function only tests for the return value being zero or nonzero and also doesn't forward the return value. Nevertheless there's no point in returning a positive value as an error, so translate a short read into -EINVAL.
Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- security/integrity/ima/ima_crypto.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c index d9e7728027c6c..b7822d2b79736 100644 --- a/security/integrity/ima/ima_crypto.c +++ b/security/integrity/ima/ima_crypto.c @@ -271,8 +271,11 @@ static int ima_calc_file_hash_atfm(struct file *file, rbuf_len = min_t(loff_t, i_size - offset, rbuf_size[active]); rc = integrity_kernel_read(file, offset, rbuf[active], rbuf_len); - if (rc != rbuf_len) + if (rc != rbuf_len) { + if (rc >= 0) + rc = -EINVAL; goto out3; + }
if (rbuf[1] && offset) { /* Using two buffers, and it is not the first
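The return-value convention this patch enforces can be sketched as a small helper. The name check_full_read() is hypothetical; it only mirrors the logic of the hunk above, where a positive short-read count must not escape as if it were an error code.

```c
#include <errno.h>

/*
 * Hypothetical helper: rc is the byte count returned by a read, or a
 * negative errno. Returns 0 for a complete read, a negative errno otherwise.
 */
static int check_full_read(long rc, long want)
{
    if (rc == want)
        return 0;
    if (rc >= 0)
        return -EINVAL; /* short read: translate the positive count */
    return (int)rc;     /* genuine error: pass it through unchanged */
}
```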
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit 4ece3125f21b1d42b84896c5646dbf0e878464e1 ]
integrity_kernel_read() can fail, in which case we go on to call ahash_request_free() on a currently running request. We have to wait for its completion before we can free the request.
This was observed by interrupting a "find / -type f -xdev -print0 | xargs -0 cat 1>/dev/null" with ctrl-c on an IMA enabled filesystem.
Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- security/integrity/ima/ima_crypto.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c index b7822d2b79736..f63b4bd45d60e 100644 --- a/security/integrity/ima/ima_crypto.c +++ b/security/integrity/ima/ima_crypto.c @@ -274,6 +274,11 @@ static int ima_calc_file_hash_atfm(struct file *file, if (rc != rbuf_len) { if (rc >= 0) rc = -EINVAL; + /* + * Forward current rc, do not overwrite with return value + * from ahash_wait() + */ + ahash_wait(ahash_rc, &wait); goto out3; }
From: Jia-Ju Bai baijiaju1990@gmail.com
[ Upstream commit e2751463eaa6f9fec8fea80abbdc62dbc487b3c5 ]
In encode_attrs(), there is an if statement on line 1145 to check whether label is NULL: if (label && (attrmask[2] & FATTR4_WORD2_SECURITY_LABEL))
When label is NULL, it is used on lines 1178-1181: *p++ = cpu_to_be32(label->lfs); *p++ = cpu_to_be32(label->pi); *p++ = cpu_to_be32(label->len); p = xdr_encode_opaque_fixed(p, label->label, label->len);
To fix these bugs, label is checked before being used.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index b7bde12d8cd51..1c0227c78a7bc 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -1171,7 +1171,7 @@ static void encode_attrs(struct xdr_stream *xdr, const struct iattr *iap, } else *p++ = cpu_to_be32(NFS4_SET_TO_SERVER_TIME); } - if (bmval[2] & FATTR4_WORD2_SECURITY_LABEL) { + if (label && (bmval[2] & FATTR4_WORD2_SECURITY_LABEL)) { *p++ = cpu_to_be32(label->lfs); *p++ = cpu_to_be32(label->pi); *p++ = cpu_to_be32(label->len);
From: Lu Shuaibing shuaibinglu@126.com
[ Upstream commit 0ce772fe79b68f83df40f07f28207b292785c677 ]
The p9_tag_alloc() does not initialize the transport error t_err field. The struct p9_req_t *req is allocated and stored in a struct p9_client variable. The field t_err is never initialized before p9_conn_cancel() checks its value.
KUMSAN (KernelUninitializedMemorySanitizer, a new error detection tool) reports this bug.
================================================================== BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0 Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216
CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Workqueue: events p9_write_work Call Trace: dump_stack+0x75/0xae __kumsan_report+0x17c/0x3e6 kumsan_report+0xe/0x20 p9_conn_cancel+0x2d9/0x3b0 p9_write_work+0x183/0x4a0 process_one_work+0x4d1/0x8c0 worker_thread+0x6e/0x780 kthread+0x1ca/0x1f0 ret_from_fork+0x35/0x40
Allocated by task 1979: save_stack+0x19/0x80 __kumsan_kmalloc.constprop.3+0xbc/0x120 kmem_cache_alloc+0xa7/0x170 p9_client_prepare_req.part.9+0x3b/0x380 p9_client_rpc+0x15e/0x880 p9_client_create+0x3d0/0xac0 v9fs_session_init+0x192/0xc80 v9fs_mount+0x67/0x470 legacy_get_tree+0x70/0xd0 vfs_get_tree+0x4a/0x1c0 do_mount+0xba9/0xf90 ksys_mount+0xa8/0x120 __x64_sys_mount+0x62/0x70 do_syscall_64+0x6d/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 0: (stack is not available)
The buggy address belongs to the object at ffff88805f9b6008 which belongs to the cache p9_req_t of size 144 The buggy address is located 4 bytes inside of 144-byte region [ffff88805f9b6008, ffff88805f9b6098) The buggy address belongs to the page: page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0 flags: 0x100000000010200(slab|head) raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740 raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000 page dumped because: kumsan: bad access detected ==================================================================
Link: http://lkml.kernel.org/r/20190613070854.10434-1-shuaibinglu@126.com Signed-off-by: Lu Shuaibing shuaibinglu@126.com [dominique.martinet@cea.fr: grouped the added init with the others] Signed-off-by: Dominique Martinet dominique.martinet@cea.fr Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/client.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/9p/client.c b/net/9p/client.c index b615aae5a0f81..d62f83f93d7bb 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -296,6 +296,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
p9pdu_reset(&req->tc); p9pdu_reset(&req->rc); + req->t_err = 0; req->status = REQ_STATUS_ALLOC; init_waitqueue_head(&req->wq); INIT_LIST_HEAD(&req->req_list);
From: Chengguang Xu cgxu519@zoho.com.cn
[ Upstream commit c87a37ebd40b889178664c2c09cc187334146292 ]
Currently, with the mmap cache policy, we always attach writeback_fid whether the mmap type is SHARED or PRIVATE. However, in the use case of kata-container, which combines 9p (Guest OS) with overlayfs (Host OS), this behavior will trigger overlayfs' copy-up when executing a command inside the container.
Link: http://lkml.kernel.org/r/20190820100325.10313-1-cgxu519@zoho.com.cn Signed-off-by: Chengguang Xu cgxu519@zoho.com.cn Signed-off-by: Dominique Martinet dominique.martinet@cea.fr Signed-off-by: Sasha Levin sashal@kernel.org --- fs/9p/vfs_file.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c index 05454a7e22dc2..550d0b169d7c2 100644 --- a/fs/9p/vfs_file.c +++ b/fs/9p/vfs_file.c @@ -528,6 +528,7 @@ v9fs_mmap_file_mmap(struct file *filp, struct vm_area_struct *vma) v9inode = V9FS_I(inode); mutex_lock(&v9inode->v_mutex); if (!v9inode->writeback_fid && + (vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE)) { /* * clone a fid and add it to writeback_fid @@ -629,6 +630,8 @@ static void v9fs_mmap_vm_close(struct vm_area_struct *vma) (vma->vm_end - vma->vm_start - 1), };
+ if (!(vma->vm_flags & VM_SHARED)) + return;
p9_debug(P9_DEBUG_VFS, "9p VMA close, %p, flushing", vma);
From: Igor Druzhinin igor.druzhinin@citrix.com
[ Upstream commit a4098bc6eed5e31e0391bcc068e61804c98138df ]
If MCFG area is not reserved in E820, Xen by default will defer its usage until Dom0 registers it explicitly after ACPI parser recognizes it as a reserved resource in DSDT. Having it reserved in E820 is not mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2) and firmware is free to keep a hole in E820 in that place. Xen doesn't know what exactly is inside this hole since it lacks full ACPI view of the platform therefore it's potentially harmful to access MCFG region without additional checks as some machines are known to provide inconsistent information on the size of the region.
Now xen_mcfg_late() runs after acpi_init() which is too late as some basic PCI enumeration starts exactly there as well. Trying to register a device prior to MCFG reservation causes multiple problems with PCIe extended capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are no convenient hooks for us to subscribe to so register MCFG areas earlier upon the first invocation of xen_add_device(). It should be safe to do once since all the boot time buses must have their MCFG areas in MCFG table already and we don't support PCI bus hot-plug.
Signed-off-by: Igor Druzhinin igor.druzhinin@citrix.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/xen/pci.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 7494dbeb4409c..db58aaa4dc598 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -29,6 +29,8 @@ #include "../pci/pci.h" #ifdef CONFIG_PCI_MMCONFIG #include <asm/pci_x86.h> + +static int xen_mcfg_late(void); #endif
static bool __read_mostly pci_seg_supported = true; @@ -40,7 +42,18 @@ static int xen_add_device(struct device *dev) #ifdef CONFIG_PCI_IOV struct pci_dev *physfn = pci_dev->physfn; #endif - +#ifdef CONFIG_PCI_MMCONFIG + static bool pci_mcfg_reserved = false; + /* + * Reserve MCFG areas in Xen on first invocation due to this being + * potentially called from inside of acpi_init immediately after + * MCFG table has been finally parsed. + */ + if (!pci_mcfg_reserved) { + xen_mcfg_late(); + pci_mcfg_reserved = true; + } +#endif if (pci_seg_supported) { struct { struct physdev_pci_device_add add; @@ -213,7 +226,7 @@ static int __init register_xen_pci_notifier(void) arch_initcall(register_xen_pci_notifier);
#ifdef CONFIG_PCI_MMCONFIG -static int __init xen_mcfg_late(void) +static int xen_mcfg_late(void) { struct pci_mmcfg_region *cfg; int rc; @@ -252,8 +265,4 @@ static int __init xen_mcfg_late(void) } return 0; } -/* - * Needs to be done after acpi_init which are subsys_initcall. - */ -subsys_initcall_sync(xen_mcfg_late); #endif
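The run-once guard that replaces the initcall can be sketched in plain C. All names here are hypothetical; the counter exists only so the behaviour is observable. Like the patch, the sketch assumes the callers are serialized, since a bare static flag is not safe against concurrent first calls.

```c
/* Counts how often the one-time setup ran; for illustration only. */
static int reserve_calls;

/* Hypothetical stand-in for the late MCFG registration. */
static void reserve_mcfg_sketch(void)
{
    reserve_calls++;
}

/* Hypothetical stand-in for the per-device hook. */
static void add_device_sketch(void)
{
    static int reserved; /* zero-initialized; persists across calls */

    if (!reserved) {
        reserve_mcfg_sketch();
        reserved = 1;
    }
    /* per-device registration would follow here */
}
```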
From: Luis Henriques lhenriques@suse.com
[ Upstream commit 750670341a24cb714e624e0fd7da30900ad93752 ]
When filling an inode with info from the MDS, i_blkbits is being initialized using fl_stripe_unit, which contains the stripe unit in bytes. Unfortunately, this doesn't make sense for directories as they have fl_stripe_unit set to '0'. This means that i_blkbits will be set to 0xff, causing an UBSAN undefined behaviour in i_blocksize():
UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 shift exponent 255 is too large for 32-bit type 'int'
Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit is zero.
Signed-off-by: Luis Henriques lhenriques@suse.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/inode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index c06845237cbaa..8196c21d8623c 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -807,7 +807,12 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
/* update inode */ inode->i_rdev = le32_to_cpu(info->rdev); - inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; + /* directories have fl_stripe_unit set to zero */ + if (le32_to_cpu(info->layout.fl_stripe_unit)) + inode->i_blkbits = + fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; + else + inode->i_blkbits = CEPH_BLOCK_SHIFT;
__ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files);
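The 0xff value mentioned in the commit message follows directly from the arithmetic: fls(0) is 0, so fls(0) - 1 is -1, which becomes 255 when stored in an 8-bit field. A sketch with a portable stand-in for fls(); the fallback shift of 22 is an assumption about the value of CEPH_BLOCK_SHIFT, and the function names are illustrative.

```c
#include <stdint.h>

#define SKETCH_BLOCK_SHIFT 22 /* assumed fallback, standing in for CEPH_BLOCK_SHIFT */

/* Portable stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int fls_sketch(uint32_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Old behaviour: a stripe unit of zero wraps around to 0xff. */
static uint8_t blkbits_old(uint32_t stripe_unit)
{
    return (uint8_t)(fls_sketch(stripe_unit) - 1);
}

/* New behaviour: fall back to a sane default when the stripe unit is zero. */
static uint8_t blkbits_new(uint32_t stripe_unit)
{
    if (stripe_unit)
        return (uint8_t)(fls_sketch(stripe_unit) - 1);
    return SKETCH_BLOCK_SHIFT;
}
```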
From: Erqi Chen chenerqi@gmail.com
[ Upstream commit 71a228bc8d65900179e37ac309e678f8c523f133 ]
If a client MDS session is evicted in the CEPH_MDS_SESSION_OPENING state, the MDS won't send a session message to the client, and delayed_work skips sessions in the CEPH_MDS_SESSION_OPENING state, so the session hangs forever.
Allow ceph_con_keepalive to reconnect a session in OPENING to avoid session hang. Also, ensure that we skip sessions in RESTARTING and REJECTED states since those states can't be resurrected by issuing a keepalive.
Link: https://tracker.ceph.com/issues/41551 Signed-off-by: Erqi Chen chenerqi@gmail.com Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/mds_client.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index bfcf11c70bfad..09db6d08614d2 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -3640,7 +3640,9 @@ static void delayed_work(struct work_struct *work) pr_info("mds%d hung\n", s->s_mds); } } - if (s->s_state < CEPH_MDS_SESSION_OPEN) { + if (s->s_state == CEPH_MDS_SESSION_NEW || + s->s_state == CEPH_MDS_SESSION_RESTARTING || + s->s_state == CEPH_MDS_SESSION_REJECTED) { /* this mds is failed or recovering, just wait */ ceph_put_mds_session(s); continue;
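The difference between the old range test and the new explicit list can be sketched with a small enum. The ordering below is an assumption about how the session states are numbered (NEW lowest, then OPENING, OPEN, and so on); the names are shortened and the helper functions are hypothetical.

```c
/* Assumed ordering of session states, lowest first. */
enum session_state {
    SESSION_NEW = 1,
    SESSION_OPENING,
    SESSION_OPEN,
    SESSION_HUNG,
    SESSION_CLOSING,
    SESSION_RESTARTING,
    SESSION_RECONNECTING,
    SESSION_REJECTED
};

/* Old test: everything below OPEN is skipped, including OPENING. */
static int old_skips_keepalive(enum session_state s)
{
    return s < SESSION_OPEN;
}

/* New test: only states a keepalive cannot help are skipped. */
static int new_skips_keepalive(enum session_state s)
{
    return s == SESSION_NEW ||
           s == SESSION_RESTARTING ||
           s == SESSION_REJECTED;
}
```

Under that assumed ordering, the old test also let RESTARTING and REJECTED sessions through to the keepalive, which matches the second half of the commit message.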
From: Ryan Chen ryan_chen@aspeedtech.com
[ Upstream commit b3528b4874480818e38e4da019d655413c233e6a ]
The ast2600 can be supported by the same code as the ast2500.
Signed-off-by: Ryan Chen ryan_chen@aspeedtech.com Signed-off-by: Joel Stanley joel@jms.id.au Reviewed-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20190819051738.17370-3-joel@jms.id.au Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/watchdog/aspeed_wdt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/watchdog/aspeed_wdt.c b/drivers/watchdog/aspeed_wdt.c index 1abe4d021fd27..ffde179a9bb2c 100644 --- a/drivers/watchdog/aspeed_wdt.c +++ b/drivers/watchdog/aspeed_wdt.c @@ -38,6 +38,7 @@ static const struct aspeed_wdt_config ast2500_config = { static const struct of_device_id aspeed_wdt_of_table[] = { { .compatible = "aspeed,ast2400-wdt", .data = &ast2400_config }, { .compatible = "aspeed,ast2500-wdt", .data = &ast2500_config }, + { .compatible = "aspeed,ast2600-wdt", .data = &ast2500_config }, { }, }; MODULE_DEVICE_TABLE(of, aspeed_wdt_of_table); @@ -264,7 +265,8 @@ static int aspeed_wdt_probe(struct platform_device *pdev) set_bit(WDOG_HW_RUNNING, &wdt->wdd.status); }
- if (of_device_is_compatible(np, "aspeed,ast2500-wdt")) { + if ((of_device_is_compatible(np, "aspeed,ast2500-wdt")) || + (of_device_is_compatible(np, "aspeed,ast2600-wdt"))) { u32 reg = readl(wdt->base + WDT_RESET_WIDTH);
reg &= config->ext_pulse_width_mask;
From: Florian Westphal fw@strlen.de
[ Upstream commit acab713177377d9e0889c46bac7ff0cfb9a90c4d ]
This un-breaks lookups in sets that have the 'dynamic' flag set. Given this active example configuration:
table filter { set set1 { type ipv4_addr size 64 flags dynamic,timeout timeout 1m }
chain input { type filter hook input priority 0; policy accept; } }
... this works: nft add rule ip filter input add @set1 { ip saddr }
-> whenever rule is triggered, the source ip address is inserted into the set (if it did not exist).
This won't work: nft add rule ip filter input ip saddr @set1 counter Error: Could not process rule: Operation not supported
In other words, we can add entries to the set, but then can't make matching decision based on that set.
That is just wrong -- all set backends support lookups (else they would not be very useful). The failure comes from an explicit rejection in nft_lookup.c.
Looking at the history, it seems like NFT_SET_EVAL used to mean 'set contains expressions' (aka. "is a meter"), for instance something like
nft add rule ip filter input meter example { ip saddr limit rate 10/second } or nft add rule ip filter input meter example { ip saddr counter }
The actual meaning of NFT_SET_EVAL however, is 'set can be updated from the packet path'.
'meters' and packet-path insertions into sets, such as 'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c) and thus require a set backend that provides the ->update() function.
The only set that provides this also is the only one that has the NFT_SET_EVAL feature flag.
Removing the wrong check makes the above example work. While at it, also fix the flag check during set instantiation to allow supported combinations only.
Fixes: 8aeff920dcc9b3f ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 7 +++++-- net/netfilter/nft_lookup.c | 3 --- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 2145581d7b3dc..24fddf0322790 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3429,8 +3429,11 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk, NFT_SET_OBJECT)) return -EINVAL; /* Only one of these operations is supported */ - if ((flags & (NFT_SET_MAP | NFT_SET_EVAL | NFT_SET_OBJECT)) == - (NFT_SET_MAP | NFT_SET_EVAL | NFT_SET_OBJECT)) + if ((flags & (NFT_SET_MAP | NFT_SET_OBJECT)) == + (NFT_SET_MAP | NFT_SET_OBJECT)) + return -EOPNOTSUPP; + if ((flags & (NFT_SET_EVAL | NFT_SET_OBJECT)) == + (NFT_SET_EVAL | NFT_SET_OBJECT)) return -EOPNOTSUPP; }
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index 161c3451a747a..55754d9939b50 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -76,9 +76,6 @@ static int nft_lookup_init(const struct nft_ctx *ctx, if (IS_ERR(set)) return PTR_ERR(set);
- if (set->flags & NFT_SET_EVAL) - return -EOPNOTSUPP; - priv->sreg = nft_parse_register(tb[NFTA_LOOKUP_SREG]); err = nft_validate_register_load(priv->sreg, set->klen); if (err < 0)
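The flag check rewritten in nf_tables_newset() is worth a closer look: the old condition only fired when all three flags were set at once, so any pair slipped through. A sketch comparing both conditions; the flag values are illustrative and the function names are hypothetical.

```c
/* Illustrative flag bits; the names mirror the NFT_SET_* flags in the hunk. */
#define SET_MAP    0x08u
#define SET_EVAL   0x20u
#define SET_OBJECT 0x40u

/* Old condition: rejects only when all three flags are present together. */
static int old_check_rejects(unsigned int flags)
{
    unsigned int all = SET_MAP | SET_EVAL | SET_OBJECT;

    return (flags & all) == all;
}

/* New condition: rejects OBJECT combined with either MAP or EVAL. */
static int new_check_rejects(unsigned int flags)
{
    if ((flags & (SET_MAP | SET_OBJECT)) == (SET_MAP | SET_OBJECT))
        return 1;
    if ((flags & (SET_EVAL | SET_OBJECT)) == (SET_EVAL | SET_OBJECT))
        return 1;
    return 0;
}
```

MAP together with EVAL stays allowed under the new condition, and so does EVAL on its own.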
From: Felix Kuehling Felix.Kuehling@amd.com
[ Upstream commit dcafbd50f2e4d5cc964aae409fb5691b743fba23 ]
Hawaii needs to flush caches explicitly, submitting an IB in a user VMID from kernel mode. There is no s_fence in this case.
Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job") Signed-off-by: Felix Kuehling Felix.Kuehling@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 51b5e977ca885..f4e9d1b10e3ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -139,7 +139,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, /* ring tests don't use a job */ if (job) { vm = job->vm; - fence_ctx = job->base.s_fence->scheduled.context; + fence_ctx = job->base.s_fence ? + job->base.s_fence->scheduled.context : 0; } else { vm = NULL; fence_ctx = 0;
From: Trek trek00@inbox.ru
[ Upstream commit 73d8e6c7b841d9bf298c8928f228fb433676635c ]
Do not try to allocate any amount of memory requested by the user. Instead limit it to 128 registers. Actually the longest series of consecutive allowed registers is 48, mmGB_TILE_MODE0-31 and mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273 Signed-off-by: Trek trek00@inbox.ru Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index c0396e83f3526..fc93b103f7778 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -562,6 +562,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file if (sh_num == AMDGPU_INFO_MMR_SH_INDEX_MASK) sh_num = 0xffffffff;
+ if (info->read_mmr_reg.count > 128) + return -EINVAL; + regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); if (!regs) return -ENOMEM;
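The pattern here is to validate a caller-supplied count before it reaches the allocator. A user-space sketch, with calloc() standing in for kmalloc_array() and a hypothetical alloc_regs() helper; the limit of 128 is taken from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_REGS 128u /* limit taken from the patch */

/* Hypothetical helper: allocate `count` registers, refusing oversized requests. */
static int alloc_regs(uint32_t count, uint32_t **out)
{
    if (count > MAX_REGS)
        return -EINVAL; /* reject before touching the allocator */

    /* calloc(0, ...) may legally return NULL, so allocate at least one slot. */
    *out = calloc(count ? count : 1, sizeof(**out));
    if (!*out)
        return -ENOMEM;
    return 0;
}
```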
From: Trond Myklebust trondmy@gmail.com
[ Upstream commit 9c47b18cf722184f32148784189fca945a7d0561 ]
If the server rejected our layout return with a state error such as NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want to clear out all the remaining layout segments and mark that stateid as invalid.
Fixes: 1c5bd76d17cca ("pNFS: Enable layoutreturn operation for...") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/pnfs.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 4931c3a75f038..c818f9886f618 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -1426,10 +1426,15 @@ void pnfs_roc_release(struct nfs4_layoutreturn_args *args, const nfs4_stateid *res_stateid = NULL; struct nfs4_xdr_opaque_data *ld_private = args->ld_private;
- if (ret == 0) { - arg_stateid = &args->stateid; + switch (ret) { + case -NFS4ERR_NOMATCHING_LAYOUT: + break; + case 0: if (res->lrs_present) res_stateid = &res->stateid; + /* Fallthrough */ + default: + arg_stateid = &args->stateid; } pnfs_layoutreturn_free_lsegs(lo, arg_stateid, &args->range, res_stateid);
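The control flow of the new switch, including the deliberate fall-through from case 0 into default, can be sketched as follows. The error constant is a placeholder value and the struct and function are hypothetical; only the branching mirrors the hunk above.

```c
#define ERR_NOMATCHING_LAYOUT 10060 /* placeholder standing in for NFS4ERR_NOMATCHING_LAYOUT */

/* Which stateids would be handed on to the free routine. */
struct picked {
    int use_arg; /* argument stateid is passed */
    int use_res; /* result stateid is passed */
};

static struct picked pick_stateids(int ret, int lrs_present)
{
    struct picked p = { 0, 0 };

    switch (ret) {
    case -ERR_NOMATCHING_LAYOUT:
        break; /* nothing matched on the server: pass neither stateid */
    case 0:
        if (lrs_present)
            p.use_res = 1;
        /* fall through */
    default:
        p.use_arg = 1; /* success and every other error both pass the argument stateid */
    }
    return p;
}
```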
From: Fabrice Gasnier fabrice.gasnier@st.com
[ Upstream commit c91e3234c6035baf5a79763cb4fcd5d23ce75c2b ]
LPTimer can use a 32KHz clock for counting. It depends on clock tree configuration. In such a case, PWM output frequency range is limited. Although unlikely, nothing prevents user from requesting a PWM frequency above counting clock (32KHz for instance): - This causes (prd - 1) = 0xffff to be written in ARR register later in the apply() routine. This results in badly configured PWM period (and also duty_cycle). Add a check to report an error in such a case.
Signed-off-by: Fabrice Gasnier fabrice.gasnier@st.com Reviewed-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pwm/pwm-stm32-lp.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/pwm/pwm-stm32-lp.c b/drivers/pwm/pwm-stm32-lp.c index 0059b24cfdc3c..28e1f64134763 100644 --- a/drivers/pwm/pwm-stm32-lp.c +++ b/drivers/pwm/pwm-stm32-lp.c @@ -58,6 +58,12 @@ static int stm32_pwm_lp_apply(struct pwm_chip *chip, struct pwm_device *pwm, /* Calculate the period and prescaler value */ div = (unsigned long long)clk_get_rate(priv->clk) * state->period; do_div(div, NSEC_PER_SEC); + if (!div) { + /* Clock is too slow to achieve requested period. */ + dev_dbg(priv->chip.dev, "Can't reach %u ns\n", state->period); + return -EINVAL; + } + prd = div; while (div > STM32_LPTIM_MAX_ARR) { presc++;
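A worked version of the arithmetic behind this check, as a sketch with hypothetical names: with a 32768 Hz clock, any period shorter than one clock tick (about 30518 ns) makes the integer division yield 0, and 0 - 1 truncated to 16 bits is the 0xffff from the commit message.

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH UINT64_C(1000000000)

/*
 * Hypothetical helper: number of counter ticks in one PWM period,
 * or -EINVAL when the clock is too slow to represent the period.
 */
static int64_t period_ticks(uint64_t clk_hz, uint64_t period_ns)
{
    uint64_t div = clk_hz * period_ns / NSEC_PER_SEC_SKETCH;

    if (!div)
        return -EINVAL; /* without this, (div - 1) would wrap around */
    return (int64_t)div;
}
```

For example, a 1 ms period at 32768 Hz gives 32 ticks, while a 10 us period (100 kHz) gives 0 and is rejected.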
From: Arvind Sankar nivedita@alum.mit.edu
[ Upstream commit ca14c996afe7228ff9b480cf225211cc17212688 ]
Since commit:
b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
kexec breaks if GCC_PLUGIN_STACKLEAK=y is enabled, as the purgatory contains undefined references to stackleak_track_stack.
Attempting to load a kexec kernel results in this failure:
kexec: Undefined symbol: stackleak_track_stack kexec-bzImage64: Loading purgatory failed
Fix this by disabling the stackleak plugin for the purgatory.
Signed-off-by: Arvind Sankar nivedita@alum.mit.edu Reviewed-by: Nick Desaulniers ndesaulniers@google.com Cc: Borislav Petkov bp@alien8.de Cc: H. Peter Anvin hpa@zytor.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS") Link: https://lkml.kernel.org/r/20190923171753.GA2252517@rani.riverdale.lan Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/purgatory/Makefile | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile index 10fb42da0007e..b81b5172cf994 100644 --- a/arch/x86/purgatory/Makefile +++ b/arch/x86/purgatory/Makefile @@ -23,6 +23,7 @@ KCOV_INSTRUMENT := n
PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss +PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN)
# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That # in turn leaves some undefined symbols like __fentry__ in purgatory and not
From: Sanjay R Mehta sanju.mehta@amd.com
[ Upstream commit ae89339b08f3fe02457ec9edd512ddc3d246d0f8 ]
The second parameter of ntb_peer_mw_get_addr() points to the wrong memory window index because "peer gidx" is passed instead of "local gidx".
For example, if the "local gidx" value is '0' and the "peer gidx" value is '1', then
on the peer side the ntb_mw_set_trans() API is used as below with gidx pointing to the local side gidx, which is '0', so memory window '0' is chosen and XLAT '0' will be programmed by the peer side.
ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx, peer->inbuf_xlat, peer->inbuf_size);
Now, on the local side ntb_peer_mw_get_addr() is used as below with gidx pointing to "peer gidx", which is '1', so it points to memory window '1' instead of memory window '0'.
ntb_peer_mw_get_addr(perf->ntb, peer->gidx, &phys_addr, &peer->outbuf_size);
So this patch passes "local gidx" as the parameter to ntb_peer_mw_get_addr().
Signed-off-by: Sanjay R Mehta sanju.mehta@amd.com Signed-off-by: Jon Mason jdmason@kudzu.us Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ntb/test/ntb_perf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c index 2a9d6b0d1f193..80508da3c8b5c 100644 --- a/drivers/ntb/test/ntb_perf.c +++ b/drivers/ntb/test/ntb_perf.c @@ -1373,7 +1373,7 @@ static int perf_setup_peer_mw(struct perf_peer *peer) int ret;
/* Get outbound MW parameters and map it */ - ret = ntb_peer_mw_get_addr(perf->ntb, peer->gidx, &phys_addr, + ret = ntb_peer_mw_get_addr(perf->ntb, perf->gidx, &phys_addr, &peer->outbuf_size); if (ret) return ret;
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 ]
thermal_zone_device_unregister() cancels the delayed work that polls the thermal zone, but it does not wait for it to finish. This is racy with respect to the freeing of the thermal zone device, which can result in a use-after-free [1].
Fix this by waiting for the delayed work to finish before freeing the thermal zone device. Note that thermal_zone_device_set_polling() is never invoked from an atomic context, so it is safe to call cancel_delayed_work_sync() that can block.
[1] [ +0.002221] ================================================================== [ +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0 [ +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17
[ +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701 [ +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check [ +0.000012] Call Trace: [ +0.000021] dump_stack+0xa9/0x10e [ +0.000020] print_address_description.cold.2+0x9/0x25e [ +0.000018] __kasan_report.cold.3+0x78/0x9d [ +0.000016] kasan_report+0xe/0x20 [ +0.000016] __mutex_lock+0x1076/0x11c0 [ +0.000014] step_wise_throttle+0x72/0x150 [ +0.000018] handle_thermal_trip+0x167/0x760 [ +0.000019] thermal_zone_device_update+0x19e/0x5f0 [ +0.000019] process_one_work+0x969/0x16f0 [ +0.000017] worker_thread+0x91/0xc40 [ +0.000014] kthread+0x33d/0x400 [ +0.000015] ret_from_fork+0x3a/0x50
[ +0.000020] Allocated by task 1: [ +0.000015] save_stack+0x19/0x80 [ +0.000015] __kasan_kmalloc.constprop.4+0xc1/0xd0 [ +0.000014] kmem_cache_alloc_trace+0x152/0x320 [ +0.000015] thermal_zone_device_register+0x1b4/0x13a0 [ +0.000015] mlxsw_thermal_init+0xc92/0x23d0 [ +0.000014] __mlxsw_core_bus_device_register+0x659/0x11b0 [ +0.000013] mlxsw_core_bus_device_register+0x3d/0x90 [ +0.000013] mlxsw_pci_probe+0x355/0x4b0 [ +0.000014] local_pci_probe+0xc3/0x150 [ +0.000013] pci_device_probe+0x280/0x410 [ +0.000013] really_probe+0x26a/0xbb0 [ +0.000013] driver_probe_device+0x208/0x2e0 [ +0.000013] device_driver_attach+0xfe/0x140 [ +0.000013] __driver_attach+0x110/0x310 [ +0.000013] bus_for_each_dev+0x14b/0x1d0 [ +0.000013] driver_register+0x1c0/0x400 [ +0.000015] mlxsw_sp_module_init+0x5d/0xd3 [ +0.000014] do_one_initcall+0x239/0x4dd [ +0.000013] kernel_init_freeable+0x42b/0x4e8 [ +0.000012] kernel_init+0x11/0x18b [ +0.000013] ret_from_fork+0x3a/0x50
[ +0.000015] Freed by task 581: [ +0.000013] save_stack+0x19/0x80 [ +0.000014] __kasan_slab_free+0x125/0x170 [ +0.000013] kfree+0xf3/0x310 [ +0.000013] thermal_release+0xc7/0xf0 [ +0.000014] device_release+0x77/0x200 [ +0.000014] kobject_put+0x1a8/0x4c0 [ +0.000014] device_unregister+0x38/0xc0 [ +0.000014] thermal_zone_device_unregister+0x54e/0x6a0 [ +0.000014] mlxsw_thermal_fini+0x184/0x35a [ +0.000014] mlxsw_core_bus_device_unregister+0x10a/0x640 [ +0.000013] mlxsw_devlink_core_bus_device_reload+0x92/0x210 [ +0.000015] devlink_nl_cmd_reload+0x113/0x1f0 [ +0.000014] genl_family_rcv_msg+0x700/0xee0 [ +0.000013] genl_rcv_msg+0xca/0x170 [ +0.000013] netlink_rcv_skb+0x137/0x3a0 [ +0.000012] genl_rcv+0x29/0x40 [ +0.000013] netlink_unicast+0x49b/0x660 [ +0.000013] netlink_sendmsg+0x755/0xc90 [ +0.000013] __sys_sendto+0x3de/0x430 [ +0.000013] __x64_sys_sendto+0xe2/0x1b0 [ +0.000013] do_syscall_64+0xa4/0x4d0 [ +0.000013] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ +0.000017] The buggy address belongs to the object at ffff8881e48e0008 which belongs to the cache kmalloc-2k of size 2048 [ +0.000012] The buggy address is located 1096 bytes inside of 2048-byte region [ffff8881e48e0008, ffff8881e48e0808) [ +0.000007] The buggy address belongs to the page: [ +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0 [ +0.000020] flags: 0x200000000010200(slab|head) [ +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0 [ +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000 [ +0.000007] page dumped because: kasan: bad access detected
[ +0.000012] Memory state around the buggy address: [ +0.000012] ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000008] ^ [ +0.000012] ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000007] ==================================================================
Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer") Reported-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Ido Schimmel idosch@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index bf9721fc2824e..be3eafc7682ba 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -296,7 +296,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz, mod_delayed_work(system_freezable_wq, &tz->poll_queue, msecs_to_jiffies(delay)); else - cancel_delayed_work(&tz->poll_queue); + cancel_delayed_work_sync(&tz->poll_queue); }
static void monitor_thermal_zone(struct thermal_zone_device *tz)
From: Stefan Mavrodiev stefan@olimex.com
[ Upstream commit 8c7aa184281c01fc26f319059efb94725012921d ]
When calling thermal_add_hwmon_sysfs(), the device type is sanitized by replacing '-' with '_'. However tz->type remains unsanitized. Thus calling thermal_hwmon_lookup_by_type() returns no device. And if there is no device, thermal_remove_hwmon_sysfs() fails with "hwmon device lookup failed!".
The result is hwmon devices that are never unregistered and remain in sysfs.
Fixes: 409ef0bacacf ("thermal_hwmon: Sanitize attribute name passed to hwmon")
Signed-off-by: Stefan Mavrodiev stefan@olimex.com Signed-off-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_hwmon.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c index 40c69a533b240..dd5d8ee379287 100644 --- a/drivers/thermal/thermal_hwmon.c +++ b/drivers/thermal/thermal_hwmon.c @@ -87,13 +87,17 @@ static struct thermal_hwmon_device * thermal_hwmon_lookup_by_type(const struct thermal_zone_device *tz) { struct thermal_hwmon_device *hwmon; + char type[THERMAL_NAME_LENGTH];
mutex_lock(&thermal_hwmon_list_lock); - list_for_each_entry(hwmon, &thermal_hwmon_list, node) - if (!strcmp(hwmon->type, tz->type)) { + list_for_each_entry(hwmon, &thermal_hwmon_list, node) { + strcpy(type, tz->type); + strreplace(type, '-', '_'); + if (!strcmp(hwmon->type, type)) { mutex_unlock(&thermal_hwmon_list_lock); return hwmon; } + } mutex_unlock(&thermal_hwmon_list_lock);
return NULL;
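The lookup mismatch fixed above can be illustrated with a small userspace sketch. It is not the kernel code: `replace_char` stands in for the kernel's strreplace(), `hwmon_name_matches` is a hypothetical helper, and the buffer size of 20 is an assumed stand-in for THERMAL_NAME_LENGTH.

```c
#include <string.h>

#define NAME_LEN 20 /* assumed stand-in for THERMAL_NAME_LENGTH */

/* Userspace stand-in for strreplace(): swap every 'from' for 'to' in place. */
void replace_char(char *s, char from, char to)
{
	for (; *s; s++)
		if (*s == from)
			*s = to;
}

/*
 * Returns 1 when a zone type matches a registered hwmon name after
 * applying the same '-' to '_' sanitization used at registration time.
 */
int hwmon_name_matches(const char *hwmon_type, const char *zone_type)
{
	char type[NAME_LEN];

	/* Bounded copy so an over-long zone type cannot overflow the buffer. */
	strncpy(type, zone_type, sizeof(type) - 1);
	type[sizeof(type) - 1] = '\0';
	replace_char(type, '-', '_');
	return strcmp(hwmon_type, type) == 0;
}
```

A zone named "cpu-thermal" is registered with hwmon as "cpu_thermal"; a plain strcmp() of the two strings fails, while the sanitized comparison succeeds. The sketch uses a bounded copy where the patch uses strcpy(), which is a choice made for the example only.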
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit c42adf87e4e7ed77f6ffe288dc90f980d07d68df ]
We check for bad blocks during namespace init, and that check uses the region bad block list. We need to initialize the bad block list for volatile regions for this to work. We also observe the lockdep warning below because the lock is not initialized correctly, since we skip bad block init for volatile regions.
INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149 Call Trace: [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable) [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60 [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0 [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270 [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290 [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0 [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0 [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160 [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240 [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0 [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0 [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0 [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0 [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130 [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50 [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0 [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170 [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100 [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48 [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0 [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180 [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvdimm/bus.c | 2 +- drivers/nvdimm/region.c | 4 ++-- drivers/nvdimm/region_devs.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index 2ba22cd1331b0..54a633e8cb5d2 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -189,7 +189,7 @@ static int nvdimm_clear_badblocks_region(struct device *dev, void *data) sector_t sector;
/* make sure device is a region */ - if (!is_nd_pmem(dev)) + if (!is_memory(dev)) return 0;
nd_region = to_nd_region(dev); diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c index f9130cc157e83..22224b21c34df 100644 --- a/drivers/nvdimm/region.c +++ b/drivers/nvdimm/region.c @@ -42,7 +42,7 @@ static int nd_region_probe(struct device *dev) if (rc) return rc;
- if (is_nd_pmem(&nd_region->dev)) { + if (is_memory(&nd_region->dev)) { struct resource ndr_res;
if (devm_init_badblocks(dev, &nd_region->bb)) @@ -131,7 +131,7 @@ static void nd_region_notify(struct device *dev, enum nvdimm_event event) struct nd_region *nd_region = to_nd_region(dev); struct resource res;
- if (is_nd_pmem(&nd_region->dev)) { + if (is_memory(&nd_region->dev)) { res.start = nd_region->ndr_start; res.end = nd_region->ndr_start + nd_region->ndr_size - 1; diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index 0303296e6d5b6..609fc450522a1 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -633,11 +633,11 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n) if (!is_memory(dev) && a == &dev_attr_dax_seed.attr) return 0;
- if (!is_nd_pmem(dev) && a == &dev_attr_badblocks.attr) + if (!is_memory(dev) && a == &dev_attr_badblocks.attr) return 0;
if (a == &dev_attr_resource.attr) { - if (is_nd_pmem(dev)) + if (is_memory(dev)) return 0400; else return 0;
From: zhengbin zhengbin13@huawei.com
[ Upstream commit 9ad09b1976c562061636ff1e01bfc3a57aebe56b ]
If cuse_send_init fails, need to fuse_conn_put cc->fc.
cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1) ->fuse_dev_alloc->fuse_conn_get ->fuse_dev_free->fuse_conn_put
Fixes: cc080e9e9be1 ("fuse: introduce per-instance fuse_dev structure") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: zhengbin zhengbin13@huawei.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/fuse/cuse.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index 8f68181256c00..f057c213c453a 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -518,6 +518,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file) rc = cuse_send_init(cc); if (rc) { fuse_dev_free(fud); + fuse_conn_put(&cc->fc); return rc; } file->private_data = fud;
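The reference counting in the call chain above can be followed with a toy model. This is a sketch under stated assumptions: `struct conn` and the `conn_*` helpers are hypothetical stand-ins for fuse_conn and its get/put functions, and the real fuse_conn_put() frees the connection when the count reaches zero.

```c
/* Toy reference counter standing in for fuse_conn (hypothetical model). */
struct conn {
	int count;
};

void conn_init(struct conn *c) { c->count = 1; } /* models fuse_conn_init() */
void conn_get(struct conn *c)  { c->count++; }   /* models fuse_conn_get()  */
void conn_put(struct conn *c)  { c->count--; }   /* models fuse_conn_put()  */

/*
 * Model of the cuse_channel_open() failure path.
 * Returns the reference count left once the error path has run.
 */
int open_failure_path(int with_fix)
{
	struct conn c;

	conn_init(&c); /* count = 1: reference held by cuse_channel_open() */
	conn_get(&c);  /* count = 2: reference taken in fuse_dev_alloc()   */

	/* cuse_send_init() fails at this point. */
	conn_put(&c);  /* count = 1: reference dropped in fuse_dev_free()  */
	if (with_fix)
		conn_put(&c); /* count = 0: the put added by the patch      */
	return c.count;
}
```

Without the extra put the count stays at 1 and the connection is never released; with it the count reaches 0, which is where the real code would free the object.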
From: Nathan Chancellor natechancellor@gmail.com
[ Upstream commit 59f08896f058a92f03a0041b397a1a227c5e8529 ]
After commit 62974fc389b3 ("libnvdimm: Enable unit test infrastructure compile checks"), clang warns:
In file included from ../drivers/nvdimm/../../tools/testing/nvdimm/test/iomap.c:15: ../drivers/nvdimm/../../tools/testing/nvdimm/test/nfit_test.h:206:15: warning: redefinition of typedef 'acpi_handle' is a C11 feature [-Wtypedef-redefinition] typedef void *acpi_handle; ^ ../include/acpi/actypes.h:424:15: note: previous definition is here typedef void *acpi_handle; /* Actually a ptr to a NS Node */ ^ 1 warning generated.
The include chain:
iomap.c -> linux/acpi.h -> acpi/acpi.h -> acpi/actypes.h nfit_test.h
Avoid this by including linux/acpi.h in nfit_test.h, which allows us to remove both the typedef and the forward declaration of acpi_object.
Link: https://github.com/ClangBuiltLinux/linux/issues/660 Signed-off-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Ira Weiny ira.weiny@intel.com Link: https://lore.kernel.org/r/20190918042148.77553-1-natechancellor@gmail.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/nvdimm/test/nfit_test.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/nvdimm/test/nfit_test.h b/tools/testing/nvdimm/test/nfit_test.h index 33752e06ff8d0..3de57cc8716b9 100644 --- a/tools/testing/nvdimm/test/nfit_test.h +++ b/tools/testing/nvdimm/test/nfit_test.h @@ -12,6 +12,7 @@ */ #ifndef __NFIT_TEST_H__ #define __NFIT_TEST_H__ +#include <linux/acpi.h> #include <linux/list.h> #include <linux/uuid.h> #include <linux/ioport.h> @@ -234,9 +235,6 @@ struct nd_intel_lss { __u32 status; } __packed;
-union acpi_object; -typedef void *acpi_handle; - typedef struct nfit_test_resource *(*nfit_test_lookup_fn)(resource_size_t); typedef union acpi_object *(*nfit_test_evaluate_dsm_fn)(acpi_handle handle, const guid_t *guid, u64 rev, u64 func,
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
[ Upstream commit 2840cf02fae627860156737e83326df354ee4ec6 ]
When the prev and next task's mm change, switch_mm() provides the core serializing guarantees before returning to usermode. The only case where an explicit core serialization is needed is when the scheduler keeps the same mm for prev and next.
Suggested-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sched/mm.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 0d10b7ce0da74..e9d4e389aed93 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -330,6 +330,8 @@ enum {
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) { + if (current->mm != mm) + return; if (likely(!(atomic_read(&mm->membarrier_state) & MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))) return;
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
[ Upstream commit fc0d77387cb5ae883fd774fc559e056a8dde024c ]
Fix a logic flaw in the way membarrier_register_private_expedited() handles ready state checks for private expedited sync core and private expedited registrations.
If a private expedited membarrier registration is first performed, and then a private expedited sync_core registration is performed, the ready state check will skip the second registration when it really should not.
Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/membarrier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index 76e0eaf4654e0..dd27e632b1bab 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -235,7 +235,7 @@ static int membarrier_register_private_expedited(int flags) * groups, which use the same mm. (CLONE_VM but not * CLONE_THREAD). */ - if (atomic_read(&mm->membarrier_state) & state) + if ((atomic_read(&mm->membarrier_state) & state) == state) return 0; atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state); if (flags & MEMBARRIER_FLAG_SYNC_CORE)
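The difference between the two forms of the ready-state test can be shown with plain bit masks. This is a generic sketch: the flag names and values below are hypothetical and are not the kernel's MEMBARRIER_STATE_* definitions, and it assumes the mask being tested can carry more than one bit.

```c
/* Hypothetical flag values chosen for illustration only. */
enum {
	STATE_EXPEDITED_READY = 1u << 0,
	STATE_SYNC_CORE_READY = 1u << 1
};

/* Old-style check: true as soon as any requested bit is set. */
int any_bit_ready(unsigned int current_state, unsigned int wanted)
{
	return (current_state & wanted) != 0;
}

/* New-style check: true only when every requested bit is set. */
int all_bits_ready(unsigned int current_state, unsigned int wanted)
{
	return (current_state & wanted) == wanted;
}
```

If only STATE_EXPEDITED_READY is set and both bits are requested, the first check reports "ready" and registration would be skipped, while the second correctly reports "not ready".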
From: KeMeng Shi shikemeng@huawei.com
[ Upstream commit 714e501e16cd473538b609b3e351b2cc9f7f09ed ]
An oops can be triggered in the scheduler when running qemu on arm64:
Unable to handle kernel paging request at virtual address ffff000008effe40 Internal error: Oops: 96000007 [#1] SMP Process migration/0 (pid: 12, stack limit = 0x00000000084e3736) pstate: 20000085 (nzCv daIf -PAN -UAO) pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20 lr : move_queued_task.isra.21+0x124/0x298 ... Call trace: __ll_sc___cmpxchg_case_acq_4+0x4/0x20 __migrate_task+0xc8/0xe0 migration_cpu_stop+0x170/0x180 cpu_stopper_thread+0xec/0x178 smpboot_thread_fn+0x1ac/0x1e8 kthread+0x134/0x138 ret_from_fork+0x10/0x18
__set_cpus_allowed_ptr() will choose an active dest_cpu in the affinity mask to migrate the process if the process is not currently running on any one of the CPUs specified in the affinity mask. __set_cpus_allowed_ptr() will choose an invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if the CPUs in the affinity mask are deactivated by cpu_down after the cpumask_intersects check. The subsequent cpumask_test_cpu() of dest_cpu then reads past the end of the mask and may pass if the corresponding bit is coincidentally set. As a consequence, the kernel will access an invalid rq address associated with the invalid CPU in migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.
To reproduce the crash:
1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling sched_setaffinity.
2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online" and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
3) The Oops appears if the bit for the invalid CPU happens to be set in the memory after the tested cpumask.
Signed-off-by: KeMeng Shi shikemeng@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Valentin Schneider valentin.schneider@arm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f4e050681ba1c..78ecdfae25b69 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1077,7 +1077,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p, if (cpumask_equal(&p->cpus_allowed, new_mask)) goto out;
- if (!cpumask_intersects(new_mask, cpu_valid_mask)) { + dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask); + if (dest_cpu >= nr_cpu_ids) { ret = -EINVAL; goto out; } @@ -1098,7 +1099,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p, if (cpumask_test_cpu(task_cpu(p), new_mask)) goto out;
- dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask); if (task_running(rq, p) || p->state == TASK_WAKING) { struct migration_arg arg = { p, dest_cpu }; /* Need help from migration thread: drop lock and wait. */
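The reordering in this patch amounts to computing dest_cpu once and validating that same value, instead of testing for an intersection first and picking the CPU later. A minimal userspace sketch, assuming a 64-bit toy mask in place of struct cpumask; `any_and`, `pick_dest_cpu` and `NR_CPU_IDS` are hypothetical names for this example.

```c
#include <stdint.h>

#define NR_CPU_IDS 64u /* assumed size of the toy CPU space */

/*
 * Toy version of cpumask_any_and() over 64-bit masks: returns the index of
 * the first CPU set in both masks, or NR_CPU_IDS when they do not intersect.
 */
unsigned int any_and(uint64_t a, uint64_t b)
{
	uint64_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Patched ordering: pick dest_cpu once and check the bound on that value.
 * Returns the chosen CPU, or -1 (standing in for -EINVAL) when none is valid.
 */
int pick_dest_cpu(uint64_t valid_mask, uint64_t new_mask)
{
	unsigned int dest_cpu = any_and(valid_mask, new_mask);

	if (dest_cpu >= NR_CPU_IDS)
		return -1;
	return (int)dest_cpu;
}
```

Because the bound check and the later use refer to the same value, a CPU going offline between the two steps can no longer leave dest_cpu at an out-of-range index.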
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit 815c1560bf8fd522b8d93a1d727868b910c1cc24 ]
With Java 11 there is no separate JRE anymore.
Details:
https://coderanch.com/t/701603/java/JRE-JDK
Therefore the detection of the JRE needs to be adapted.
This change works for s390 and x86. I have not tested other platforms.
Committer testing:
Continues to work with the OpenJDK 8:
$ rm -f ~acme/lib64/libperf-jvmti.so $ rpm -qa | grep jdk-devel java-1.8.0-openjdk-devel-1.8.0.222.b10-0.fc30.x86_64 $ git log --oneline -1 a51937170f33 (HEAD -> perf/core) perf build: Add detection of java-11-openjdk-devel package $ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install > /dev/null 2>1 $ ls -la ~acme/lib64/libperf-jvmti.so -rwxr-xr-x. 1 acme acme 230744 Sep 24 16:46 /home/acme/lib64/libperf-jvmti.so $
Suggested-by: Andreas Krebbel krebbel@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: Hendrik Brueckner brueckner@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Link: http://lore.kernel.org/lkml/20190909114116.50469-4-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/Makefile.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index 849b3be15bd89..510caedd73194 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -837,7 +837,7 @@ ifndef NO_JVMTI JDIR=$(shell /usr/sbin/update-java-alternatives -l | head -1 | awk '{print $$3}') else ifneq (,$(wildcard /usr/sbin/alternatives)) - JDIR=$(shell /usr/sbin/alternatives --display java | tail -1 | cut -d' ' -f 5 | sed 's%/jre/bin/java.%%g') + JDIR=$(shell /usr/sbin/alternatives --display java | tail -1 | cut -d' ' -f 5 | sed -e 's%/jre/bin/java.%%g' -e 's%/bin/java.%%g') endif endif ifndef JDIR
From: Valdis Kletnieks valdis.kletnieks@vt.edu
[ Upstream commit 0f74914071ab7e7b78731ed62bf350e3a344e0a5 ]
When building with W=1, gcc properly complains that there are no prototypes:
CC kernel/elfcore.o kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes] 7 | Elf_Half __weak elf_core_extra_phdrs(void) | ^~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes] 12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes] 17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm) | ^~~~~~~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes] 22 | size_t __weak elf_core_extra_data_size(void) | ^~~~~~~~~~~~~~~~~~~~~~~~
Provide the include file so gcc is happy, and we don't have potential code drift.
Link: http://lkml.kernel.org/r/29875.1565224705@turing-police Signed-off-by: Valdis Kletnieks valdis.kletnieks@vt.edu Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/elfcore.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/elfcore.c b/kernel/elfcore.c index fc482c8e0bd88..57fb4dcff4349 100644 --- a/kernel/elfcore.c +++ b/kernel/elfcore.c @@ -3,6 +3,7 @@ #include <linux/fs.h> #include <linux/mm.h> #include <linux/binfmts.h> +#include <linux/elfcore.h>
Elf_Half __weak elf_core_extra_phdrs(void) {
From: Arnaldo Carvalho de Melo acme@redhat.com
[ Upstream commit 26acf400d2dcc72c7e713e1f55db47ad92010cc2 ]
Naresh Kamboju reported that on the i386 build pr_err() doesn't get defined properly due to header ordering:
perf-in.o: In function `libunwind__x86_reg_id': tools/perf/util/libunwind/../../arch/x86/util/unwind-libunwind.c:109: undefined reference to `pr_err'
Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: David Ahern dsahern@gmail.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/arch/x86/util/unwind-libunwind.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/arch/x86/util/unwind-libunwind.c b/tools/perf/arch/x86/util/unwind-libunwind.c index 05920e3edf7a7..47357973b55b2 100644 --- a/tools/perf/arch/x86/util/unwind-libunwind.c +++ b/tools/perf/arch/x86/util/unwind-libunwind.c @@ -1,11 +1,11 @@ // SPDX-License-Identifier: GPL-2.0
#include <errno.h> +#include "../../util/debug.h" #ifndef REMOTE_UNWIND_LIBUNWIND #include <libunwind.h> #include "perf_regs.h" #include "../../util/unwind.h" -#include "../../util/debug.h" #endif
#ifdef HAVE_ARCH_X86_64_SUPPORT
From: Navid Emamdoost navid.emamdoost@gmail.com
[ Upstream commit 8ce39eb5a67aee25d9f05b40b673c95b23502e3e ]
In the loop in nfp_flower_spawn_vnic_reprs(), memory is leaked if initialization or the allocations fail. Add the appropriate releases.
Fixes: b94524529741 ("nfp: flower: add per repr private data for LAG offload") Signed-off-by: Navid Emamdoost navid.emamdoost@gmail.com Acked-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/netronome/nfp/flower/main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c index 22c572a09b320..c19e88efe958d 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/main.c +++ b/drivers/net/ethernet/netronome/nfp/flower/main.c @@ -272,6 +272,7 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app, port = nfp_port_alloc(app, port_type, repr); if (IS_ERR(port)) { err = PTR_ERR(port); + kfree(repr_priv); nfp_repr_free(repr); goto err_reprs_clean; }
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 9dbc88d013b79c62bd845cb9e7c0256e660967c5 ]
Bail from the pci_driver probe function instead of from the drm_driver load function.
This avoids /dev/dri/card0 temporarily getting registered and then unregistered again, sending unwanted add / remove udev events to userspace.
Specifically this avoids triggering the (userspace) bug fixed by this plymouth merge-request: https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59
Note that despite that being a userspace bug, not sending unnecessary udev events is a good idea in general.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490 Reviewed-by: Michel Dänzer mdaenzer@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/radeon/radeon_drv.c | 31 +++++++++++++++++++++++++++++ drivers/gpu/drm/radeon/radeon_kms.c | 25 ----------------------- 2 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 25b5407c74b5a..d83310751a8e4 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -340,8 +340,39 @@ static int radeon_kick_out_firmware_fb(struct pci_dev *pdev)
 static int radeon_pci_probe(struct pci_dev *pdev,
 			    const struct pci_device_id *ent)
 {
+	unsigned long flags = 0;
 	int ret;
 
+	if (!ent)
+		return -ENODEV; /* Avoid NULL-ptr deref in drm_get_pci_dev */
+
+	flags = ent->driver_data;
+
+	if (!radeon_si_support) {
+		switch (flags & RADEON_FAMILY_MASK) {
+		case CHIP_TAHITI:
+		case CHIP_PITCAIRN:
+		case CHIP_VERDE:
+		case CHIP_OLAND:
+		case CHIP_HAINAN:
+			dev_info(&pdev->dev,
+				 "SI support disabled by module param\n");
+			return -ENODEV;
+		}
+	}
+	if (!radeon_cik_support) {
+		switch (flags & RADEON_FAMILY_MASK) {
+		case CHIP_KAVERI:
+		case CHIP_BONAIRE:
+		case CHIP_HAWAII:
+		case CHIP_KABINI:
+		case CHIP_MULLINS:
+			dev_info(&pdev->dev,
+				 "CIK support disabled by module param\n");
+			return -ENODEV;
+		}
+	}
+
 	if (vga_switcheroo_client_probe_defer(pdev))
 		return -EPROBE_DEFER;
 
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 6a8fb6fd183c3..3ff835767ac58 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -95,31 +95,6 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
 	struct radeon_device *rdev;
 	int r, acpi_status;
 
-	if (!radeon_si_support) {
-		switch (flags & RADEON_FAMILY_MASK) {
-		case CHIP_TAHITI:
-		case CHIP_PITCAIRN:
-		case CHIP_VERDE:
-		case CHIP_OLAND:
-		case CHIP_HAINAN:
-			dev_info(dev->dev,
-				 "SI support disabled by module param\n");
-			return -ENODEV;
-		}
-	}
-	if (!radeon_cik_support) {
-		switch (flags & RADEON_FAMILY_MASK) {
-		case CHIP_KAVERI:
-		case CHIP_BONAIRE:
-		case CHIP_HAWAII:
-		case CHIP_KABINI:
-		case CHIP_MULLINS:
-			dev_info(dev->dev,
-				 "CIK support disabled by module param\n");
-			return -ENODEV;
-		}
-	}
-
 	rdev = kzalloc(sizeof(struct radeon_device), GFP_KERNEL);
 	if (rdev == NULL) {
 		return -ENOMEM;
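For readers who want to see the shape of the probe-time check in isolation, here is a minimal userspace sketch. The enum values, FAMILY_MASK and probe_check() are hypothetical stand-ins chosen for illustration; the real family enum and RADEON_FAMILY_MASK live in the radeon driver headers and are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical family IDs; the real ones are defined by the radeon driver. */
enum chip_family { CHIP_OTHER = 1, CHIP_SI_EXAMPLE, CHIP_CIK_EXAMPLE };

/* Assumption: the family sits in the low bits of driver_data. */
#define FAMILY_MASK 0xffffUL

/*
 * Decide at probe time whether the device should be bound at all.
 * Returning -ENODEV here means no DRM device node is ever registered,
 * so userspace never sees an add/remove pair of udev events.
 */
static inline int probe_check(unsigned long driver_data,
			      bool si_support, bool cik_support)
{
	switch (driver_data & FAMILY_MASK) {
	case CHIP_SI_EXAMPLE:
		return si_support ? 0 : -ENODEV;
	case CHIP_CIK_EXAMPLE:
		return cik_support ? 0 : -ENODEV;
	default:
		return 0;
	}
}
```

The point of the sketch is only the ordering: the decision depends on nothing but the PCI ID table entry and the module parameters, so it can be made before any device registration happens.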
From: Cédric Le Goater clg@kaod.org
[ Upstream commit 237aed48c642328ff0ab19b63423634340224a06 ]
When a vCPU is brought down, the XIVE VP (Virtual Processor) is first disabled and then the event notification queues are freed. When freeing the queues, we check for possible escalation interrupts and free them also.
But when a XIVE VP is disabled, the underlying XIVE ENDs are also disabled in OPAL. When an END (Event Notification Descriptor) is disabled, its ESB pages (ESn and ESe) are disabled and loads return all 1s, which means that any access to the ESB page of the escalation interrupt will return invalid values.
When an interrupt is freed, the shutdown handler computes a 'saved_p' field from the value returned by a load in xive_do_source_set_mask(). This value is incorrect for escalation interrupts for the reason described above.
This has no impact on Linux/KVM today because we don't make use of it, but future changes will introduce a xive_get_irqchip_state() handler. This handler will use the 'saved_p' field to return the state of an interrupt and, with 'saved_p' being incorrect, a softlockup will occur.
Fix the vCPU cleanup sequence by first freeing the escalation interrupts, if any, then disabling the XIVE VP, and finally freeing the queues.
Fixes: 90c73795afa2 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Cédric Le Goater clg@kaod.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/powerpc/kvm/book3s_xive.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index aae34f218ab45..031f07f048afd 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1037,20 +1037,22 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 	/* Mask the VP IPI */
 	xive_vm_esb_load(&xc->vp_ipi_data, XIVE_ESB_SET_PQ_01);
 
-	/* Disable the VP */
-	xive_native_disable_vp(xc->vp_id);
-
-	/* Free the queues & associated interrupts */
+	/* Free escalations */
 	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
-		struct xive_q *q = &xc->queues[i];
-
-		/* Free the escalation irq */
 		if (xc->esc_virq[i]) {
 			free_irq(xc->esc_virq[i], vcpu);
 			irq_dispose_mapping(xc->esc_virq[i]);
 			kfree(xc->esc_virq_names[i]);
 		}
-		/* Free the queue */
+	}
+
+	/* Disable the VP */
+	xive_native_disable_vp(xc->vp_id);
+
+	/* Free the queues */
+	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
+		struct xive_q *q = &xc->queues[i];
+
 		xive_native_disable_queue(xc->vp_id, q, i);
 		if (q->qpage) {
 			free_pages((unsigned long)q->qpage,
From: Sean Christopherson sean.j.christopherson@intel.com
[ Upstream commit 567926cca99ba1750be8aae9c4178796bf9bb90b ]
Current versions of Intel's SDM incorrectly state that "bits 31:15 of the VM-Entry exception error-code field" must be zero. In reality, bits 31:16 must be zero, i.e. error codes are 16-bit values.
The bogus error code check manifests as an unexpected VM-Entry failure due to an invalid code field (error number 7) in L1, e.g. when injecting a #GP with error_code=0x9f00.
Nadav previously reported the bug[*], both to KVM and Intel, and fixed the associated kvm-unit-test.
[*] https://patchwork.kernel.org/patch/11124749/
Reported-by: Nadav Amit namit@vmware.com
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com
Reviewed-by: Jim Mattson jmattson@google.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d3a900a4fa0e7..6f7b3acdab263 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -12574,7 +12574,7 @@ static int check_vmentry_prereqs(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 
 		/* VM-entry exception error code */
 		if (has_error_code &&
-		    vmcs12->vm_entry_exception_error_code & GENMASK(31, 15))
+		    vmcs12->vm_entry_exception_error_code & GENMASK(31, 16))
 			return VMXERR_ENTRY_INVALID_CONTROL_FIELD;
 
 		/* VM-entry interruption-info field: reserved bits */
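To make the off-by-one concrete, here is a small userspace sketch of the two checks applied to the error code from the changelog (0x9f00). GENMASK32() is a local stand-in for the kernel's GENMASK(), and the two helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l), limited to 32 bits. */
#define GENMASK32(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/* Old check: treats bits 31:15 as reserved, so a set bit 15 is rejected. */
static inline bool error_code_rejected_old(uint32_t error_code)
{
	return (error_code & GENMASK32(31, 15)) != 0;
}

/* Fixed check: only bits 31:16 are reserved, error codes are 16 bits wide. */
static inline bool error_code_rejected_new(uint32_t error_code)
{
	return (error_code & GENMASK32(31, 16)) != 0;
}
```

With error_code = 0x9f00, bit 15 is set, so the old mask (0xffff8000) flags it while the fixed mask (0xffff0000) lets it through. Anything with a bit set at position 16 or above is still rejected by both.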
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 553768d1169a48c0cd87c4eb4ab57534ee663415 ]
This allows the blksize to be set to zero, in which case 1024 is used as the default.
Reviewed-by: Josef Bacik josef@toxicpanda.com
Signed-off-by: Xiubo Li xiubli@redhat.com
[fix to use goto out instead of return in genl_connect]
Signed-off-by: Mike Christie mchristi@redhat.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/block/nbd.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 21f54c7946a0e..bc2fa4e85f0ca 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -133,6 +133,8 @@ static struct dentry *nbd_dbg_dir;
 
 #define NBD_MAGIC 0x68797548
 
+#define NBD_DEF_BLKSIZE 1024
+
 static unsigned int nbds_max = 16;
 static int max_part = 16;
 static int part_shift;
@@ -1241,6 +1243,14 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
 	nbd_config_put(nbd);
 }
 
+static bool nbd_is_valid_blksize(unsigned long blksize)
+{
+	if (!blksize || !is_power_of_2(blksize) || blksize < 512 ||
+	    blksize > PAGE_SIZE)
+		return false;
+	return true;
+}
+
 /* Must be called with config_lock held */
 static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		       unsigned int cmd, unsigned long arg)
@@ -1256,8 +1266,9 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 	case NBD_SET_SOCK:
 		return nbd_add_socket(nbd, arg, false);
 	case NBD_SET_BLKSIZE:
-		if (!arg || !is_power_of_2(arg) || arg < 512 ||
-		    arg > PAGE_SIZE)
+		if (!arg)
+			arg = NBD_DEF_BLKSIZE;
+		if (!nbd_is_valid_blksize(arg))
 			return -EINVAL;
 		nbd_size_set(nbd, arg,
 			     div_s64(config->bytesize, arg));
@@ -1337,7 +1348,7 @@ static struct nbd_config *nbd_alloc_config(void)
 	atomic_set(&config->recv_threads, 0);
 	init_waitqueue_head(&config->recv_wq);
 	init_waitqueue_head(&config->conn_wait);
-	config->blksize = 1024;
+	config->blksize = NBD_DEF_BLKSIZE;
 	atomic_set(&config->live_connections, 0);
 	try_module_get(THIS_MODULE);
 	return config;
@@ -1773,6 +1784,12 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 	if (info->attrs[NBD_ATTR_BLOCK_SIZE_BYTES]) {
 		u64 bsize =
 			nla_get_u64(info->attrs[NBD_ATTR_BLOCK_SIZE_BYTES]);
+		if (!bsize)
+			bsize = NBD_DEF_BLKSIZE;
+		if (!nbd_is_valid_blksize(bsize)) {
+			ret = -EINVAL;
+			goto out;
+		}
 		nbd_size_set(nbd, bsize, div64_u64(config->bytesize, bsize));
 	}
 	if (info->attrs[NBD_ATTR_TIMEOUT]) {
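The resulting policy (zero means "use the default", everything else must pass validation) can be sketched in userspace as follows. ASSUMED_PAGE_SIZE is an assumption for this example only, since PAGE_SIZE depends on the architecture and kernel configuration, and resolve_blksize() is a hypothetical helper that does not exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_BLKSIZE		1024UL	/* mirrors NBD_DEF_BLKSIZE in the patch */
#define ASSUMED_PAGE_SIZE	4096UL	/* assumption: real PAGE_SIZE varies */

static inline bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Same conditions as nbd_is_valid_blksize() in the patch. */
static inline bool blksize_is_valid(unsigned long blksize)
{
	return blksize && is_power_of_2(blksize) &&
	       blksize >= 512 && blksize <= ASSUMED_PAGE_SIZE;
}

/*
 * Returns the block size to use, or 0 where the driver would
 * fail the request with -EINVAL.
 */
static inline unsigned long resolve_blksize(unsigned long requested)
{
	if (!requested)
		requested = DEF_BLKSIZE;
	return blksize_is_valid(requested) ? requested : 0;
}
```

Under these assumptions a request of 0 resolves to 1024, while 256, 1000 and 8192 are all rejected.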
From: Gautham R. Shenoy ego@linux.vnet.ibm.com
[ Upstream commit c784be435d5dae28d3b03db31753dd7a18733f0c ]
The calls to arch_add_memory()/arch_remove_memory() are always made with the read-side cpu_hotplug_lock acquired via memory_hotplug_begin(). On pSeries, arch_add_memory()/arch_remove_memory() eventually call resize_hpt() which in turn calls stop_machine() which acquires the read-side cpu_hotplug_lock again, thereby resulting in the recursive acquisition of this lock.
In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system lockup during a memory hotplug operation because cpus_read_lock() is a per-cpu rwsem read, which, in the fast-path (in the absence of the writer, which in our case is a CPU-hotplug operation) simply increments the read_count on the semaphore. Thus a recursive read in the fast-path doesn't cause any problems.
However, we can hit this problem in practice if there is a concurrent CPU-Hotplug operation in progress which is waiting to acquire the write-side of the lock. This will cause the second recursive read to block until the writer finishes, while the writer itself is blocked because the first read still holds the lock. Thus neither the reader nor the writer makes any progress, blocking both CPU-Hotplug and Memory-Hotplug operations.
Memory-Hotplug                              CPU-Hotplug
CPU 0                                       CPU 1
------                                      ------

1. down_read(cpu_hotplug_lock.rw_sem)
   [memory_hotplug_begin]
                                            2. down_write(cpu_hotplug_lock.rw_sem)
                                               [cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
   [stop_machine()]
Lockdep complains as follows in these code-paths.
 swapper/0/1 is trying to acquire lock:
 (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60

 but task is already holding lock:
 (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(cpu_hotplug_lock.rw_sem);
   lock(cpu_hotplug_lock.rw_sem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 3 locks held by swapper/0/1:
  #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
  #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
  #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0

 stack backtrace:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
 Call Trace:
   dump_stack+0xe8/0x164 (unreliable)
   __lock_acquire+0x1110/0x1c70
   lock_acquire+0x240/0x290
   cpus_read_lock+0x64/0xf0
   stop_machine+0x2c/0x60
   pseries_lpar_resize_hpt+0x19c/0x2c0
   resize_hpt_for_hotplug+0x70/0xd0
   arch_add_memory+0x58/0xfc
   devm_memremap_pages+0x5e8/0x8f0
   pmem_attach_disk+0x764/0x830
   nvdimm_bus_probe+0x118/0x240
   really_probe+0x230/0x4b0
   driver_probe_device+0x16c/0x1e0
   __driver_attach+0x148/0x1b0
   bus_for_each_dev+0x90/0x130
   driver_attach+0x34/0x50
   bus_add_driver+0x1a8/0x360
   driver_register+0x108/0x170
   __nd_driver_register+0xd0/0xf0
   nd_pmem_driver_init+0x34/0x48
   do_one_initcall+0x1e0/0x45c
   kernel_init_freeable+0x540/0x64c
   kernel_init+0x2c/0x160
   ret_from_kernel_thread+0x5c/0x68
Fix this issue by
1) Requiring all the calls to pseries_lpar_resize_hpt() be made with cpu_hotplug_lock held.
2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked() as a consequence of 1)
3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt() with cpu_hotplug_lock held.
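The following toy model is a sketch of why the recursive read acquisition only bites when a writer is queued, and why the "_cpuslocked" shape avoids it. It is not the kernel's percpu-rwsem; struct toy_rwsem and all functions below are invented for illustration, and "would block" is modelled as a false return instead of actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: new readers are held back once a writer is waiting. */
struct toy_rwsem {
	int readers;		/* readers currently holding the lock */
	bool writer_waiting;	/* a writer has queued behind the readers */
};

/* Returns true if the read lock was taken, false if we would block. */
static inline bool toy_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer_waiting)
		return false;
	sem->readers++;
	return true;
}

static inline void toy_read_unlock(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* The writer can only proceed once every reader has dropped the lock. */
static inline bool toy_writer_can_run(const struct toy_rwsem *sem)
{
	return sem->readers == 0;
}

/* Buggy shape: the callee retakes a read lock its caller already holds. */
static inline bool resize_retaking_lock(struct toy_rwsem *sem)
{
	if (!toy_read_trylock(sem))
		return false;	/* would sleep forever: deadlock */
	toy_read_unlock(sem);
	return true;
}

/* Fixed shape: the callee relies on the caller's lock and takes nothing. */
static inline bool resize_cpuslocked(const struct toy_rwsem *sem)
{
	return sem->readers > 0;	/* precondition: caller holds the lock */
}
```

Replaying steps 1 to 3 from the diagram: after the first read lock and a queued writer, the writer cannot run, the retaking variant cannot get its second read lock, and only the variant that trusts the caller's lock makes progress.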
Fixes: dbcf929c0062 ("powerpc/pseries: Add support for hash table resizing")
Cc: stable@vger.kernel.org # v4.11+
Reported-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
Signed-off-by: Gautham R. Shenoy ego@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.i...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/powerpc/mm/hash_utils_64.c       | 9 ++++++++-
 arch/powerpc/platforms/pseries/lpar.c | 8 ++++++--
 2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 29fd8940867e5..b1007e9a31ba7 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -37,6 +37,7 @@
 #include <linux/context_tracking.h>
 #include <linux/libfdt.h>
 #include <linux/pkeys.h>
+#include <linux/cpu.h>
 
 #include <asm/debugfs.h>
 #include <asm/processor.h>
@@ -1891,10 +1892,16 @@ static int hpt_order_get(void *data, u64 *val)
 
 static int hpt_order_set(void *data, u64 val)
 {
+	int ret;
+
 	if (!mmu_hash_ops.resize_hpt)
 		return -ENODEV;
 
-	return mmu_hash_ops.resize_hpt(val);
+	cpus_read_lock();
+	ret = mmu_hash_ops.resize_hpt(val);
+	cpus_read_unlock();
+
+	return ret;
 }
 
 DEFINE_SIMPLE_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n");
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 9e52b686a8fa4..ea602f7f97ce1 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -647,7 +647,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
 	return 0;
 }
 
-/* Must be called in user context */
+/*
+ * Must be called in process context. The caller must hold the
+ * cpus_lock.
+ */
 static int pseries_lpar_resize_hpt(unsigned long shift)
 {
 	struct hpt_resize_state state = {
@@ -699,7 +702,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
 
 	t1 = ktime_get();
 
-	rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
+	rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit,
+				     &state, NULL);
 
 	t2 = ktime_get();
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
Rename the #define to indicate that it is related to the store vs tlbie ordering issue. In the next patch, we will add another feature flag that is used to handle the ERAT flush vs tlbie ordering issue.
Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
---
 arch/powerpc/include/asm/cputable.h | 4 ++--
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 arch/powerpc/mm/hash_native_64.c    | 2 +-
 arch/powerpc/mm/tlb-radix.c         | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 29f49a35d6eec..6a6804c2e1b08 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -212,7 +212,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_POWER9_DD2_1 LONG_ASM_CONST(0x0000080000000000)
 #define CPU_FTR_P9_TM_HV_ASSIST LONG_ASM_CONST(0x0000100000000000)
 #define CPU_FTR_P9_TM_XER_SO_BUG LONG_ASM_CONST(0x0000200000000000)
-#define CPU_FTR_P9_TLBIE_BUG LONG_ASM_CONST(0x0000400000000000)
+#define CPU_FTR_P9_TLBIE_STQ_BUG LONG_ASM_CONST(0x0000400000000000)
 #define CPU_FTR_P9_TIDR LONG_ASM_CONST(0x0000800000000000)
 
 #ifndef __ASSEMBLY__
@@ -460,7 +460,7 @@ static inline void cpu_feature_keys_init(void) { }
 	    CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
 	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
 	    CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
-	    CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR)
+	    CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TIDR)
 #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
 #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1)
 #define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1 | \
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 2fdc08ab6b9e2..f3b8e04eca9c3 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -708,14 +708,14 @@ static __init void update_tlbie_feature_flag(unsigned long pvr)
 		if ((pvr & 0xe000) == 0) {
 			/* Nimbus */
 			if ((pvr & 0xfff) < 0x203)
-				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG;
 		} else if ((pvr & 0xc000) == 0) {
 			/* Cumulus */
 			if ((pvr & 0xfff) < 0x103)
-				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+				cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG;
 		} else {
 			WARN_ONCE(1, "Unknown PVR");
-			cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+			cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG;
 		}
 	}
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index a67cf1cdeda40..7c68d834c94a7 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -452,7 +452,7 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
 		}
 
-		if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+		if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 			/*
 			 * Need the extra ptesync to make sure we don't
 			 * re-order the tlbie
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index aaa28fd918fe4..0c13561d8b807 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -203,7 +203,7 @@ static inline unsigned long ___tlbie(unsigned long vpn, int psize,
 
 static inline void fixup_tlbie(unsigned long vpn, int psize, int apsize, int ssize)
 {
-	if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 		/* Need the extra ptesync to ensure we don't reorder tlbie*/
 		asm volatile("ptesync": : :"memory");
 		___tlbie(vpn, psize, apsize, ssize);
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index fef3e1eb3a199..0cddae4263f96 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -220,7 +220,7 @@ static inline void fixup_tlbie(void)
 	unsigned long pid = 0;
 	unsigned long va = ((1UL << 52) - 1);
 
-	if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 		asm volatile("ptesync": : :"memory");
 		__tlbie_va(va, pid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
 	}
@@ -230,7 +230,7 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)
 {
 	unsigned long va = ((1UL << 52) - 1);
 
-	if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 		asm volatile("ptesync": : :"memory");
 		__tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
 	}
Hi!
> From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
> Rename the #define to indicate that it is related to the store vs tlbie ordering issue. In the next patch, we will add another feature flag that is used to handle the ERAT flush vs tlbie ordering issue.
> Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
> Cc: stable@vger.kernel.org # v4.16+
> Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
> Signed-off-by: Michael Ellerman mpe@ellerman.id.au
> Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
Apparently this is upstream commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0, but the changelog does not say so.
Best regards,
Pavel
On Fri, Oct 11, 2019 at 01:21:06PM +0200, Pavel Machek wrote:
> Hi!
> > From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
> > Rename the #define to indicate that it is related to the store vs tlbie ordering issue. In the next patch, we will add another feature flag that is used to handle the ERAT flush vs tlbie ordering issue.
> > Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
> > Cc: stable@vger.kernel.org # v4.16+
> > Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
> > Signed-off-by: Michael Ellerman mpe@ellerman.id.au
> > Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
> Apparently this is upstream commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0, but the changelog does not say so.
Yeah, somehow when Sasha backported this, he didn't add that :(
Nor did he add his signed-off-by :(
I'll go fix it up and add mine, thanks for noticing it.
greg k-h
On Fri, Oct 11, 2019 at 02:58:38PM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 11, 2019 at 01:21:06PM +0200, Pavel Machek wrote:
> > Hi!
> > > From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
> > > Rename the #define to indicate that it is related to the store vs tlbie ordering issue. In the next patch, we will add another feature flag that is used to handle the ERAT flush vs tlbie ordering issue.
> > > Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
> > > Cc: stable@vger.kernel.org # v4.16+
> > > Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
> > > Signed-off-by: Michael Ellerman mpe@ellerman.id.au
> > > Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
> > Apparently this is upstream commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0, but the changelog does not say so.
> Yeah, somehow when Sasha backported this, he didn't add that :(
> Nor did he add his signed-off-by :(
> I'll go fix it up and add mine, thanks for noticing it.
I forgot to run my "prettyfying" script on it, sorry and thanks for catching it.
From: Steven Rostedt (VMware) rostedt@goodmis.org
[ Upstream commit e0d2615856b2046c2e8d5bfd6933f37f69703b0b ]
If the re-allocation of tep->cmdlines succeeds, then the previous allocation of tep->cmdlines will be freed. If we later fail in add_new_comm(), we must not free cmdlines, and also should assign tep->cmdlines to the new allocation. Otherwise when freeing tep, the tep->cmdlines will be pointing to garbage.
Fixes: a6d2a61ac653a ("tools lib traceevent: Remove some die() calls")
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Jiri Olsa jolsa@redhat.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190828191819.970121417@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/lib/traceevent/event-parse.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 6ccfd13d5cf9c..382e476629fb1 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -254,10 +254,10 @@ static int add_new_comm(struct tep_handle *pevent, const char *comm, int pid)
 		errno = ENOMEM;
 		return -1;
 	}
+	pevent->cmdlines = cmdlines;
 
 	cmdlines[pevent->cmdline_count].comm = strdup(comm);
 	if (!cmdlines[pevent->cmdline_count].comm) {
-		free(cmdlines);
 		errno = ENOMEM;
 		return -1;
 	}
@@ -268,7 +268,6 @@ static int add_new_comm(struct tep_handle *pevent, const char *comm, int pid)
 		pevent->cmdline_count++;
 
 	qsort(cmdlines, pevent->cmdline_count, sizeof(*cmdlines), cmdline_cmp);
-	pevent->cmdlines = cmdlines;
 
 	return 0;
 }
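The ownership rule behind this fix (once realloc() succeeds, the old pointer is dead, so the owner must be updated before anything else can fail) can be sketched in standalone C. struct handle, add_comm() and the injectable dup callback are hypothetical simplifications for this example and are not the libtraceevent API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct cmdline {
	char *comm;
	int pid;
};

/* Hypothetical stand-in for the handle that owns the array. */
struct handle {
	struct cmdline *cmdlines;
	int cmdline_count;
};

/* Portable strdup() replacement so the sketch needs only ISO C. */
static inline char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/* Always fails; used to exercise the error path. */
static inline char *failing_dup(const char *s)
{
	(void)s;
	return NULL;
}

static inline int add_comm(struct handle *h, const char *comm, int pid,
			   char *(*dup)(const char *))
{
	struct cmdline *cmdlines;

	cmdlines = realloc(h->cmdlines,
			   sizeof(*cmdlines) * (size_t)(h->cmdline_count + 1));
	if (!cmdlines) {
		errno = ENOMEM;
		return -1;
	}
	/* The old block may be gone now: publish the new one right away. */
	h->cmdlines = cmdlines;

	cmdlines[h->cmdline_count].comm = dup(comm);
	if (!cmdlines[h->cmdline_count].comm) {
		errno = ENOMEM;
		return -1;	/* no free(): the handle still owns the array */
	}
	cmdlines[h->cmdline_count].pid = pid;
	h->cmdline_count++;
	return 0;
}
```

After a failed second insertion the handle still points at valid memory holding the first entry, so a later teardown can free it exactly once.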
From: Balasubramani Vivekanandan balasubramani_vivekanandan@mentor.com
[ Upstream commit b9023b91dd020ad7e093baa5122b6968c48cc9e0 ]
When a cpu requests broadcasting, before starting the tick broadcast hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide the required synchronization when the callback is active on another core.
The callback could have already executed tick_handle_oneshot_broadcast() and could have also returned. But still there is a small time window where the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns without doing anything, but the next_event of the tick broadcast clock device is already set to a timeout value.
In the race condition diagram below, CPU #1 is running the timer callback and CPU #2 is entering idle state and so calls bc_set_next().
In the worst case, the next_event will contain an expiry time, but the hrtimer will not be started, which happens when the racing callback returns HRTIMER_NORESTART. The hrtimer might never recover if all further requests from the CPUs to subscribe to tick broadcast have a timeout greater than the next_event of the tick broadcast clock device. This leads to a cascade of failures that is finally noticed as rcu stall warnings.
Here is a depiction of the race condition:
CPU #1 (Running timer callback)                 CPU #2 (Enter idle
                                                and subscribe to
                                                tick broadcast)
---------------------                           ---------------------

__run_hrtimer()                                 tick_broadcast_enter()

  bc_handler()                                    __tick_broadcast_oneshot_control()

    tick_handle_oneshot_broadcast()

      raw_spin_lock(&tick_broadcast_lock);

      dev->next_event = KTIME_MAX;                //wait for tick_broadcast_lock
      //next_event for tick broadcast clock
      set to KTIME_MAX since no other cores
      subscribed to tick broadcasting

      raw_spin_unlock(&tick_broadcast_lock);

    if (dev->next_event == KTIME_MAX)
      return HRTIMER_NORESTART
    // callback function exits without
       restarting the hrtimer                     //tick_broadcast_lock acquired
                                                  raw_spin_lock(&tick_broadcast_lock);

                                                  tick_broadcast_set_event()

                                                    clockevents_program_event()

                                                      dev->next_event = expires;

                                                      bc_set_next()

                                                        hrtimer_try_to_cancel()
                                                        //returns -1 since the timer
                                                        callback is active. Exits without
                                                        restarting the timer
  cpu_base->running = NULL;
The comment that hrtimer cannot be armed from within the callback is wrong. It is fine to start the hrtimer from within the callback. Also it is safe to start the hrtimer from the enter/exit idle code while the broadcast handler is active. The enter/exit idle code and the broadcast handler are synchronized using tick_broadcast_lock. So there is no need for the existing try to cancel logic. All this can be removed which will eliminate the race condition as well.
Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Balasubramani Vivekanandan balasubramani_vivekanandan@mentor.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/time/tick-broadcast-hrtimer.c | 57 ++++++++++++++--------------
 1 file changed, 29 insertions(+), 28 deletions(-)
diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index a59641fb88b69..a836efd345895 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -44,34 +44,39 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
-	int bc_moved;
 	/*
-	 * We try to cancel the timer first. If the callback is on
-	 * flight on some other cpu then we let it handle it. If we
-	 * were able to cancel the timer nothing can rearm it as we
-	 * own broadcast_lock.
+	 * This is called either from enter/exit idle code or from the
+	 * broadcast handler. In all cases tick_broadcast_lock is held.
 	 *
-	 * However we can also be called from the event handler of
-	 * ce_broadcast_hrtimer itself when it expires. We cannot
-	 * restart the timer because we are in the callback, but we
-	 * can set the expiry time and let the callback return
-	 * HRTIMER_RESTART.
+	 * hrtimer_cancel() cannot be called here neither from the
+	 * broadcast handler nor from the enter/exit idle code. The idle
+	 * code can run into the problem described in bc_shutdown() and the
+	 * broadcast handler cannot wait for itself to complete for obvious
+	 * reasons.
 	 *
-	 * Since we are in the idle loop at this point and because
-	 * hrtimer_{start/cancel} functions call into tracing,
-	 * calls to these functions must be bound within RCU_NONIDLE.
+	 * Each caller tries to arm the hrtimer on its own CPU, but if the
+	 * hrtimer callbback function is currently running, then
+	 * hrtimer_start() cannot move it and the timer stays on the CPU on
+	 * which it is assigned at the moment.
+	 *
+	 * As this can be called from idle code, the hrtimer_start()
+	 * invocation has to be wrapped with RCU_NONIDLE() as
+	 * hrtimer_start() can call into tracing.
 	 */
-	RCU_NONIDLE({
-			bc_moved = hrtimer_try_to_cancel(&bctimer) >= 0;
-			if (bc_moved)
-				hrtimer_start(&bctimer, expires,
-					      HRTIMER_MODE_ABS_PINNED);});
-	if (bc_moved) {
-		/* Bind the "device" to the cpu */
-		bc->bound_on = smp_processor_id();
-	} else if (bc->bound_on == smp_processor_id()) {
-		hrtimer_set_expires(&bctimer, expires);
-	}
+	RCU_NONIDLE( {
+		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
+		/*
+		 * The core tick broadcast mode expects bc->bound_on to be set
+		 * correctly to prevent a CPU which has the broadcast hrtimer
+		 * armed from going deep idle.
+		 *
+		 * As tick_broadcast_lock is held, nothing can change the cpu
+		 * base which was just established in hrtimer_start() above. So
+		 * the below access is safe even without holding the hrtimer
+		 * base lock.
+		 */
+		bc->bound_on = bctimer.base->cpu_base->cpu;
+	} );
 	return 0;
 }
 
@@ -97,10 +102,6 @@ static enum hrtimer_restart bc_handler(struct hrtimer *t)
 {
 	ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);
 
-	if (clockevent_state_oneshot(&ce_broadcast_hrtimer))
-		if (ce_broadcast_hrtimer.next_event != KTIME_MAX)
-			return HRTIMER_RESTART;
-
 	return HRTIMER_NORESTART;
 }
From: Srikar Dronamraju srikar@linux.vnet.ibm.com
[ Upstream commit b63fd11cced17fcb8e133def29001b0f6aaa5e06 ]
When using 'perf stat' with repeat and interval option, it shows wrong values for events.
The wrong values will be shown for the first interval on the second and subsequent repetitions.
Without the fix:
  # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5

     2.000282489                             53      faults
     2.000282489                            513      sched:sched_switch
     4.005478208                          3,721      faults
     4.005478208                          2,666      sched:sched_switch
     5.025470933                            395      faults
     5.025470933                          1,307      sched:sched_switch
     2.009602825  1,84,46,74,40,73,70,95,47,520      faults               <------
     2.009602825  1,84,46,74,40,73,70,95,49,568      sched:sched_switch   <------
     4.019612206                          4,730      faults
     4.019612206                          2,746      sched:sched_switch
     5.039615484                          3,953      faults
     5.039615484                          1,496      sched:sched_switch
     2.000274620  1,84,46,74,40,73,70,95,47,520      faults               <------
     2.000274620  1,84,46,74,40,73,70,95,47,520      sched:sched_switch   <------
     4.000480342                          4,282      faults
     4.000480342                          2,303      sched:sched_switch
     5.000916811                          1,322      faults
     5.000916811                          1,064      sched:sched_switch
  #
prev_raw_counts is allocated when using intervals. This is used when calculating the difference in the counts of events when using interval.
The current counts are stored in prev_raw_counts to calculate the differences in the next iteration.
On the first interval of the second and subsequent repetitions, prev_raw_counts would be the values stored in the last interval of the previous repetitions, while the current counts will only be for the first interval of the current repetition.
Hence there is a possibility of events showing up as a big number.
Fix this by resetting prev_raw_counts whenever perf stat repeats the command.
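As a rough illustration of where the huge numbers come from, the sketch below models the interval delta as an unsigned 64-bit subtraction against a stored previous value. struct counts, interval_delta() and reset_prev() are simplified, hypothetical stand-ins (only a single value is modelled), and the input numbers are made up. The wrapped result, 18446744073709547520, is 2^64 - 4096, which is the same value the "Without the fix" output prints with Indian-style digit grouping.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the stored previous raw count of one event. */
struct counts {
	uint64_t val;
};

/*
 * Report the count for this interval as (current - previous) and remember
 * the current value for the next interval. If the stored previous value is
 * larger than the current one, the subtraction wraps modulo 2^64.
 */
static inline uint64_t interval_delta(struct counts *prev, uint64_t cur)
{
	uint64_t delta = cur - prev->val;

	prev->val = cur;
	return delta;
}

/* What the fix does between repetitions, reduced to one field. */
static inline void reset_prev(struct counts *prev)
{
	prev->val = 0;
}
```

If the previous repetition ended with a stored value of 5000 and the first interval of the next repetition reads 904, the delta wraps to 2^64 - 4096. Resetting the stored value before the new repetition yields 904 instead.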
With the fix:
  # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5

     2.019349347              2,597      faults
     2.019349347              2,753      sched:sched_switch
     4.019577372              3,098      faults
     4.019577372              2,532      sched:sched_switch
     5.019415481              1,879      faults
     5.019415481              1,356      sched:sched_switch
     2.000178813              8,468      faults
     2.000178813              2,254      sched:sched_switch
     4.000404621              7,440      faults
     4.000404621              1,266      sched:sched_switch
     5.040196079              2,458      faults
     5.040196079                556      sched:sched_switch
     2.000191939              6,870      faults
     2.000191939              1,170      sched:sched_switch
     4.000414103                541      faults
     4.000414103                902      sched:sched_switch
     5.000809863                450      faults
     5.000809863                364      sched:sched_switch
  #
Committer notes:
This has been broken since the cset that introduced the --interval feature, i.e. --repeat + --interval wasn't tested at that point. Add the Fixes tag so that automatic scripts can pick this up.
Fixes: 13370a9b5bb8 ("perf stat: Add interval printing")
Signed-off-by: Srikar Dronamraju srikar@linux.vnet.ibm.com
Acked-by: Jiri Olsa jolsa@kernel.org
Tested-by: Arnaldo Carvalho de Melo acme@redhat.com
Tested-by: Ravi Bangoria ravi.bangoria@linux.ibm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
Cc: Stephane Eranian eranian@google.com
Cc: stable@vger.kernel.org # v3.9+
Link: http://lore.kernel.org/lkml/20190904094738.9558-2-srikar@linux.vnet.ibm.com
[ Fixed up conflicts with libperf, i.e. some perf_{evsel,evlist} lost the 'perf' prefix ]
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/builtin-stat.c |  3 +++
 tools/perf/util/stat.c    | 17 +++++++++++++++++
 tools/perf/util/stat.h    |  1 +
 3 files changed, 21 insertions(+)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 11650910e089a..6aae10ff954c7 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -3090,6 +3090,9 @@ int cmd_stat(int argc, const char **argv) fprintf(output, "[ perf stat: executing run #%d ... ]\n", run_idx + 1);
+ if (run_idx != 0) + perf_evlist__reset_prev_raw_counts(evsel_list); + status = run_perf_stat(argc, argv, run_idx); if (forever && status != -1 && !interval) { print_counters(NULL, argc, argv); diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c index a0061e0b0fade..6917ba8a00240 100644 --- a/tools/perf/util/stat.c +++ b/tools/perf/util/stat.c @@ -154,6 +154,15 @@ static void perf_evsel__free_prev_raw_counts(struct perf_evsel *evsel) evsel->prev_raw_counts = NULL; }
+static void perf_evsel__reset_prev_raw_counts(struct perf_evsel *evsel) +{ + if (evsel->prev_raw_counts) { + evsel->prev_raw_counts->aggr.val = 0; + evsel->prev_raw_counts->aggr.ena = 0; + evsel->prev_raw_counts->aggr.run = 0; + } +} + static int perf_evsel__alloc_stats(struct perf_evsel *evsel, bool alloc_raw) { int ncpus = perf_evsel__nr_cpus(evsel); @@ -204,6 +213,14 @@ void perf_evlist__reset_stats(struct perf_evlist *evlist) } }
+void perf_evlist__reset_prev_raw_counts(struct perf_evlist *evlist) +{ + struct perf_evsel *evsel; + + evlist__for_each_entry(evlist, evsel) + perf_evsel__reset_prev_raw_counts(evsel); +} + static void zero_per_pkg(struct perf_evsel *counter) { if (counter->per_pkg_mask) diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h index 36efb986f7fc6..e19abb1635c4e 100644 --- a/tools/perf/util/stat.h +++ b/tools/perf/util/stat.h @@ -158,6 +158,7 @@ void perf_stat__collect_metric_expr(struct perf_evlist *); int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw); void perf_evlist__free_stats(struct perf_evlist *evlist); void perf_evlist__reset_stats(struct perf_evlist *evlist); +void perf_evlist__reset_prev_raw_counts(struct perf_evlist *evlist);
int perf_stat_process_counter(struct perf_stat_config *config, struct perf_evsel *counter);
From: Chris Wilson chris@chris-wilson.co.uk
[ Upstream commit cb6d7c7dc7ff8cace666ddec66334117a6068ce2 ]
set_page_dirty says:
For pages with a mapping this should be done under the page lock for the benefit of asynchronous memory errors who prefer a consistent dirty state. This rule can be broken in some special cases, but should be better not to.
Under those rules, it is only safe for us to use the plain set_page_dirty calls for shmemfs/anonymous memory. Userptr may be used with real mappings and so needs to use the locked version (set_page_dirty_lock).
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20190708140327.26825-1-chris@c...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 2c9b284036d10..e13ea2ecd669c 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -692,7 +692,15 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,

 	for_each_sgt_page(page, sgt_iter, pages) {
 		if (obj->mm.dirty)
-			set_page_dirty(page);
+			/*
+			 * As this may not be anonymous memory (e.g. shmem)
+			 * but exist on a real mapping, we have to lock
+			 * the page in order to dirty it -- holding
+			 * the page reference is not sufficient to
+			 * prevent the inode from being truncated.
+			 * Play safe and take the lock.
+			 */
+			set_page_dirty_lock(page);

 		mark_page_accessed(page);
 		put_page(page);
From: Vincent Chen vincent.chen@sifive.com
[ Upstream commit c82dd6d078a2bb29d41eda032bb96d05699a524d ]
When the handle_exception function handles an exception, interrupts are unconditionally enabled once the context save has finished. However, this erroneously enables interrupts that were disabled before entering handle_exception.
For example, suppose one of the WARN_ON() conditions is satisfied in the scheduler while interrupts are disabled and rq.lock is held. The WARN_ON will trigger a break exception, and handle_exception will enable interrupts before entering the do_trap_break function. If a timer interrupt is pending at that point, it is taken as soon as interrupts are enabled, which can deadlock if the timer ISR tries to take rq.lock again.
Hence, the handle_exception() can only enable interrupts when the state of sstatus.SPIE is 1.
This patch is tested on HiFive Unleashed board.
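The rule can be sketched in C as a pure function on the status register values. This is an illustrative model only, assuming the saved sstatus carries the pre-trap interrupt state in its SPIE bit (as the diff's use of `s1` suggests). The function names `sstatus_old` and `sstatus_new` are hypothetical, and the bit values are the usual RISC-V supervisor status bit positions.

```c
/* sstatus bit positions (assumed to match the kernel's SR_* definitions). */
#define SR_SIE	0x00000002UL	/* supervisor interrupt enable */
#define SR_SPIE	0x00000020UL	/* interrupt enable state before the trap */

/* Old behaviour: interrupts are always re-enabled after the context save. */
static unsigned long sstatus_old(unsigned long sstatus,
				 unsigned long saved_sstatus)
{
	(void)saved_sstatus;	/* the pre-trap state was ignored */
	return sstatus | SR_SIE;
}

/* Patched behaviour: re-enable only if they were enabled before the trap. */
static unsigned long sstatus_new(unsigned long sstatus,
				 unsigned long saved_sstatus)
{
	if (saved_sstatus & SR_SPIE)
		return sstatus | SR_SIE;
	return sstatus;
}
```

In the WARN_ON scenario above, the saved sstatus has SPIE clear, so `sstatus_new()` leaves SIE unset and the pending timer interrupt is not taken while rq.lock is held.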
Signed-off-by: Vincent Chen vincent.chen@sifive.com
Reviewed-by: Palmer Dabbelt palmer@sifive.com
[paul.walmsley@sifive.com: updated to apply]
Fixes: bcae803a21317 ("RISC-V: Enable IRQ during exception handling")
Cc: David Abdurachmanov david.abdurachmanov@sifive.com
Cc: stable@vger.kernel.org
Signed-off-by: Paul Walmsley paul.walmsley@sifive.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/riscv/kernel/entry.S | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index fa2c08e3c05e6..a03821b2656aa 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -171,9 +171,13 @@ ENTRY(handle_exception)
 	move a1, s4 /* scause */
 	tail do_IRQ
 1:
-	/* Exceptions run with interrupts enabled */
+	/* Exceptions run with interrupts enabled or disabled
+	   depending on the state of sstatus.SR_SPIE */
+	andi t0, s1, SR_SPIE
+	beqz t0, 1f
 	csrs sstatus, SR_SIE

+1:
 	/* Handle syscalls */
 	li t0, EXC_SYSCALL
 	beq s4, t0, handle_syscall
From: Will Deacon will.deacon@arm.com
[ Upstream commit 8f04e8e6e29c93421a95b61cad62e3918425eac7 ]
On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD state without needing to call into firmware.
This patch hooks into the existing SSBD infrastructure so that SSBS is used on CPUs that support it, but it's all made horribly complicated by the very real possibility of big/little systems that don't uniformly provide the new capability.
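As a rough illustration of how the SET_PSTATE_SSBS encoding and the undef-hook emulation in this patch fit together, the following user-space sketch reproduces the arithmetic. The field shift values are an assumption (they are not shown in this patch and are taken to match the arm64 sysreg.h definitions), and the function names are hypothetical. The real handler additionally refuses to emulate for user mode and skips the faulting instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field shifts for system-instruction encodings (assumed, see lead-in). */
#define Op0_shift	19
#define Op1_shift	16
#define CRn_shift	12
#define CRm_shift	8
#define Op2_shift	5

#define sys_reg(op0, op1, crn, crm, op2) \
	(((uint32_t)(op0) << Op0_shift) | ((uint32_t)(op1) << Op1_shift) | \
	 ((uint32_t)(crn) << CRn_shift) | ((uint32_t)(crm) << CRm_shift) | \
	 ((uint32_t)(op2) << Op2_shift))

#define REG_PSTATE_SSBS_IMM	sys_reg(0, 3, 4, 0, 1)
#define PSR_SSBS_BIT		0x00001000ULL

/* Instruction word built the same way as SET_PSTATE_SSBS(x) in the patch. */
static uint32_t set_pstate_ssbs_insn(bool x)
{
	return 0xd5000000u | REG_PSTATE_SSBS_IMM |
	       ((uint32_t)x << CRm_shift) | 0x1fu;
}

/* Same test as the undef hook: ignore the immediate bit, compare the rest. */
static bool ssbs_hook_matches(uint32_t instr)
{
	uint32_t mask = ~(1u << CRm_shift);
	uint32_t val = 0xd500001fu | REG_PSTATE_SSBS_IMM;

	return (instr & mask) == val;
}

/* What the emulation handler does to the saved PSTATE value. */
static uint64_t emulate_ssbs(uint64_t pstate, uint32_t instr)
{
	if (instr & (1u << CRm_shift))
		return pstate | PSR_SSBS_BIT;
	return pstate & ~PSR_SSBS_BIT;
}
```

With these shift values, the two possible instruction words differ only in bit 8 (the immediate), which is why a single hook with that bit masked out can catch both and recover the requested SSBS value from the instruction itself.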
Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/processor.h | 7 +++++ arch/arm64/include/asm/ptrace.h | 1 arch/arm64/include/asm/sysreg.h | 3 ++ arch/arm64/include/uapi/asm/ptrace.h | 1 arch/arm64/kernel/cpu_errata.c | 26 ++++++++++++++++++-- arch/arm64/kernel/cpufeature.c | 45 +++++++++++++++++++++++++++++++++++ arch/arm64/kernel/process.c | 4 +++ arch/arm64/kernel/ssbd.c | 21 ++++++++++++++++ 8 files changed, 106 insertions(+), 2 deletions(-)
--- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -182,6 +182,10 @@ static inline void start_thread(struct p { start_thread_common(regs, pc); regs->pstate = PSR_MODE_EL0t; + + if (arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE) + regs->pstate |= PSR_SSBS_BIT; + regs->sp = sp; }
@@ -198,6 +202,9 @@ static inline void compat_start_thread(s regs->pstate |= PSR_AA32_E_BIT; #endif
+ if (arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE) + regs->pstate |= PSR_AA32_SSBS_BIT; + regs->compat_sp = sp; } #endif --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -50,6 +50,7 @@ #define PSR_AA32_I_BIT 0x00000080 #define PSR_AA32_A_BIT 0x00000100 #define PSR_AA32_E_BIT 0x00000200 +#define PSR_AA32_SSBS_BIT 0x00800000 #define PSR_AA32_DIT_BIT 0x01000000 #define PSR_AA32_Q_BIT 0x08000000 #define PSR_AA32_V_BIT 0x10000000 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -86,11 +86,14 @@
#define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4) #define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3) +#define REG_PSTATE_SSBS_IMM sys_reg(0, 3, 4, 0, 1)
#define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM | \ (!!x)<<8 | 0x1f) #define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM | \ (!!x)<<8 | 0x1f) +#define SET_PSTATE_SSBS(x) __emit_inst(0xd5000000 | REG_PSTATE_SSBS_IMM | \ + (!!x)<<8 | 0x1f)
#define SYS_DC_ISW sys_insn(1, 0, 7, 6, 2) #define SYS_DC_CSW sys_insn(1, 0, 7, 10, 2) --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -46,6 +46,7 @@ #define PSR_I_BIT 0x00000080 #define PSR_A_BIT 0x00000100 #define PSR_D_BIT 0x00000200 +#define PSR_SSBS_BIT 0x00001000 #define PSR_PAN_BIT 0x00400000 #define PSR_UAO_BIT 0x00800000 #define PSR_V_BIT 0x10000000 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -312,6 +312,14 @@ void __init arm64_enable_wa2_handling(st
void arm64_set_ssbd_mitigation(bool state) { + if (this_cpu_has_cap(ARM64_SSBS)) { + if (state) + asm volatile(SET_PSTATE_SSBS(0)); + else + asm volatile(SET_PSTATE_SSBS(1)); + return; + } + switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); @@ -336,6 +344,11 @@ static bool has_ssbd_mitigation(const st
WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+ if (this_cpu_has_cap(ARM64_SSBS)) { + required = false; + goto out_printmsg; + } + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) { ssbd_state = ARM64_SSBD_UNKNOWN; return false; @@ -384,7 +397,6 @@ static bool has_ssbd_mitigation(const st
switch (ssbd_state) { case ARM64_SSBD_FORCE_DISABLE: - pr_info_once("%s disabled from command-line\n", entry->desc); arm64_set_ssbd_mitigation(false); required = false; break; @@ -397,7 +409,6 @@ static bool has_ssbd_mitigation(const st break;
case ARM64_SSBD_FORCE_ENABLE: - pr_info_once("%s forced from command-line\n", entry->desc); arm64_set_ssbd_mitigation(true); required = true; break; @@ -407,6 +418,17 @@ static bool has_ssbd_mitigation(const st break; }
+out_printmsg: + switch (ssbd_state) { + case ARM64_SSBD_FORCE_DISABLE: + pr_info_once("%s disabled from command-line\n", entry->desc); + break; + + case ARM64_SSBD_FORCE_ENABLE: + pr_info_once("%s forced from command-line\n", entry->desc); + break; + } + return required; } #endif /* CONFIG_ARM64_SSBD */ --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1071,6 +1071,48 @@ static void cpu_has_fwb(const struct arm WARN_ON(val & (7 << 27 | 7 << 21)); }
+#ifdef CONFIG_ARM64_SSBD +static int ssbs_emulation_handler(struct pt_regs *regs, u32 instr) +{ + if (user_mode(regs)) + return 1; + + if (instr & BIT(CRm_shift)) + regs->pstate |= PSR_SSBS_BIT; + else + regs->pstate &= ~PSR_SSBS_BIT; + + arm64_skip_faulting_instruction(regs, 4); + return 0; +} + +static struct undef_hook ssbs_emulation_hook = { + .instr_mask = ~(1U << CRm_shift), + .instr_val = 0xd500001f | REG_PSTATE_SSBS_IMM, + .fn = ssbs_emulation_handler, +}; + +static void cpu_enable_ssbs(const struct arm64_cpu_capabilities *__unused) +{ + static bool undef_hook_registered = false; + static DEFINE_SPINLOCK(hook_lock); + + spin_lock(&hook_lock); + if (!undef_hook_registered) { + register_undef_hook(&ssbs_emulation_hook); + undef_hook_registered = true; + } + spin_unlock(&hook_lock); + + if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE) { + sysreg_clear_set(sctlr_el1, 0, SCTLR_ELx_DSSBS); + arm64_set_ssbd_mitigation(false); + } else { + arm64_set_ssbd_mitigation(true); + } +} +#endif /* CONFIG_ARM64_SSBD */ + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -1258,6 +1300,7 @@ static const struct arm64_cpu_capabiliti .cpu_enable = cpu_enable_hw_dbm, }, #endif +#ifdef CONFIG_ARM64_SSBD { .desc = "Speculative Store Bypassing Safe (SSBS)", .capability = ARM64_SSBS, @@ -1267,7 +1310,9 @@ static const struct arm64_cpu_capabiliti .field_pos = ID_AA64PFR1_SSBS_SHIFT, .sign = FTR_UNSIGNED, .min_field_value = ID_AA64PFR1_SSBS_PSTATE_ONLY, + .cpu_enable = cpu_enable_ssbs, }, +#endif {}, };
--- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -358,6 +358,10 @@ int copy_thread(unsigned long clone_flag if (IS_ENABLED(CONFIG_ARM64_UAO) && cpus_have_const_cap(ARM64_HAS_UAO)) childregs->pstate |= PSR_UAO_BIT; + + if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE) + childregs->pstate |= PSR_SSBS_BIT; + p->thread.cpu_context.x19 = stack_start; p->thread.cpu_context.x20 = stk_sz; } --- a/arch/arm64/kernel/ssbd.c +++ b/arch/arm64/kernel/ssbd.c @@ -3,13 +3,31 @@ * Copyright (C) 2018 ARM Ltd, All Rights Reserved. */
+#include <linux/compat.h> #include <linux/errno.h> #include <linux/prctl.h> #include <linux/sched.h> +#include <linux/sched/task_stack.h> #include <linux/thread_info.h>
#include <asm/cpufeature.h>
+static void ssbd_ssbs_enable(struct task_struct *task) +{ + u64 val = is_compat_thread(task_thread_info(task)) ? + PSR_AA32_SSBS_BIT : PSR_SSBS_BIT; + + task_pt_regs(task)->pstate |= val; +} + +static void ssbd_ssbs_disable(struct task_struct *task) +{ + u64 val = is_compat_thread(task_thread_info(task)) ? + PSR_AA32_SSBS_BIT : PSR_SSBS_BIT; + + task_pt_regs(task)->pstate &= ~val; +} + /* * prctl interface for SSBD * FIXME: Drop the below ifdefery once merged in 4.18. @@ -47,12 +65,14 @@ static int ssbd_prctl_set(struct task_st return -EPERM; task_clear_spec_ssb_disable(task); clear_tsk_thread_flag(task, TIF_SSBD); + ssbd_ssbs_enable(task); break; case PR_SPEC_DISABLE: if (state == ARM64_SSBD_FORCE_DISABLE) return -EPERM; task_set_spec_ssb_disable(task); set_tsk_thread_flag(task, TIF_SSBD); + ssbd_ssbs_disable(task); break; case PR_SPEC_FORCE_DISABLE: if (state == ARM64_SSBD_FORCE_DISABLE) @@ -60,6 +80,7 @@ static int ssbd_prctl_set(struct task_st task_set_spec_ssb_disable(task); task_set_spec_ssb_force_disable(task); set_tsk_thread_flag(task, TIF_SSBD); + ssbd_ssbs_disable(task); break; default: return -ERANGE;
From: Will Deacon will.deacon@arm.com
[ Upstream commit 7c36447ae5a090729e7b129f24705bb231a07e0b ]
When running without VHE, it is necessary to set SCTLR_EL2.DSSBS if SSBD has been forcefully disabled on the kernel command-line.
Acked-by: Christoffer Dall christoffer.dall@arm.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/kvm_host.h | 11 +++++++++++
 arch/arm64/kvm/hyp/sysreg-sr.c    | 11 +++++++++++
 2 files changed, 22 insertions(+)

--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -398,6 +398,8 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struc

 DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);

+void __kvm_enable_ssbs(void);
+
 static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
 				       unsigned long vector_ptr)
@@ -418,6 +420,15 @@ static inline void __cpu_init_hyp_mode(p
 	 */
 	BUG_ON(!static_branch_likely(&arm64_const_caps_ready));
 	__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr, tpidr_el2);
+
+	/*
+	 * Disabling SSBD on a non-VHE system requires us to enable SSBS
+	 * at EL2.
+	 */
+	if (!has_vhe() && this_cpu_has_cap(ARM64_SSBS) &&
+	    arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE) {
+		kvm_call_hyp(__kvm_enable_ssbs);
+	}
 }

 static inline bool kvm_arch_check_sve_has_vhe(void)
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -293,3 +293,14 @@ void kvm_vcpu_put_sysregs(struct kvm_vcp

 	vcpu->arch.sysregs_loaded_on_cpu = false;
 }
+
+void __hyp_text __kvm_enable_ssbs(void)
+{
+	u64 tmp;
+
+	asm volatile(
+	"mrs	%0, sctlr_el2\n"
+	"orr	%0, %0, %1\n"
+	"msr	sctlr_el2, %0"
+	: "=&r" (tmp) : "L" (SCTLR_ELx_DSSBS));
+}
From: Will Deacon will.deacon@arm.com
[ Upstream commit ee91176120bd584aa10c564e7e9fdcaf397190a1 ]
We advertise the MRS/MSR instructions for toggling SSBS at EL0 using an HWCAP, so document it along with the others.
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/arm64/elf_hwcaps.txt | 4 ++++
 1 file changed, 4 insertions(+)

--- a/Documentation/arm64/elf_hwcaps.txt
+++ b/Documentation/arm64/elf_hwcaps.txt
@@ -178,3 +178,7 @@ HWCAP_ILRCPC
 HWCAP_FLAGM

     Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
+
+HWCAP_SSBS
+
+    Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.
From: Mark Rutland mark.rutland@arm.com
[ Upstream commit f54dada8274643e3ff4436df0ea124aeedc43cae ]
In valid_user_regs() we treat SSBS as a RES0 bit, and consequently it is unexpectedly cleared when we restore a sigframe or fiddle with GPRs via ptrace.
This patch fixes valid_user_regs() to account for this, updating the function to refer to the latest ARM ARM (ARM DDI 0487D.a). For AArch32 tasks, SSBS appears in bit 23 of SPSR_EL1, matching its position in the AArch32-native PSR format, and we don't need to translate it as we have to for DIT.
There are no other bit assignments that we need to account for today. As the recent documentation describes the DIT bit, we can drop our comment regarding DIT.
While removing SSBS from the RES0 masks, existing inconsistent whitespace is corrected.
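The effect of the mask change can be checked with a small stand-alone sketch. It copies the old and new RES0 masks from this patch and models the sanitising step as clearing every RES0 bit, which is an assumption based on the description above that SSBS was "unexpectedly cleared". The `GENMASK_ULL` definition here is a local re-implementation with the same meaning (bits h down to l set), and `sanitize_pstate` is a hypothetical helper name.

```c
#include <stdint.h>

/* Local equivalent of the kernel's GENMASK_ULL(h, l): bits h..l inclusive. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

#define PSR_SSBS_BIT		0x00001000ULL	/* AArch64: bit 12 */
#define PSR_AA32_SSBS_BIT	0x00800000ULL	/* AArch32: bit 23 */

/* RES0 masks before the patch. */
#define OLD_AARCH64_RES0 \
	(GENMASK_ULL(63, 32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
	 GENMASK_ULL(20, 10) | GENMASK_ULL(5, 5))
#define OLD_AARCH32_RES0 \
	(GENMASK_ULL(63, 32) | GENMASK_ULL(23, 22) | GENMASK_ULL(20, 20))

/* RES0 masks after the patch: SSBS is carved out of each. */
#define NEW_AARCH64_RES0 \
	(GENMASK_ULL(63, 32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
	 GENMASK_ULL(20, 13) | GENMASK_ULL(11, 10) | GENMASK_ULL(5, 5))
#define NEW_AARCH32_RES0 \
	(GENMASK_ULL(63, 32) | GENMASK_ULL(22, 22) | GENMASK_ULL(20, 20))

/* Assumed sanitising step: user-supplied RES0 bits are cleared. */
static uint64_t sanitize_pstate(uint64_t pstate, uint64_t res0_mask)
{
	return pstate & ~res0_mask;
}
```

The old and new masks differ in exactly one bit each (bit 12 for AArch64, bit 23 for AArch32), so a PSTATE value with SSBS set now survives `sanitize_pstate()` where it was previously stripped.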
Fixes: d71be2b6c0e19180 ("arm64: cpufeature: Detect SSBS and advertise to userspace")
Signed-off-by: Mark Rutland mark.rutland@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Suzuki K Poulose suzuki.poulose@arm.com
Cc: Will Deacon will.deacon@arm.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/ptrace.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1666,19 +1666,20 @@ void syscall_trace_exit(struct pt_regs *
 }

 /*
- * SPSR_ELx bits which are always architecturally RES0 per ARM DDI 0487C.a
- * We also take into account DIT (bit 24), which is not yet documented, and
- * treat PAN and UAO as RES0 bits, as they are meaningless at EL0, and may be
- * allocated an EL0 meaning in future.
+ * SPSR_ELx bits which are always architecturally RES0 per ARM DDI 0487D.a.
+ * We permit userspace to set SSBS (AArch64 bit 12, AArch32 bit 23) which is
+ * not described in ARM DDI 0487D.a.
+ * We treat PAN and UAO as RES0 bits, as they are meaningless at EL0, and may
+ * be allocated an EL0 meaning in future.
  * Userspace cannot use these until they have an architectural meaning.
  * Note that this follows the SPSR_ELx format, not the AArch32 PSR format.
  * We also reserve IL for the kernel; SS is handled dynamically.
  */
 #define SPSR_EL1_AARCH64_RES0_BITS \
-	(GENMASK_ULL(63,32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
-	 GENMASK_ULL(20, 10) | GENMASK_ULL(5, 5))
+	(GENMASK_ULL(63, 32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
+	 GENMASK_ULL(20, 13) | GENMASK_ULL(11, 10) | GENMASK_ULL(5, 5))
 #define SPSR_EL1_AARCH32_RES0_BITS \
-	(GENMASK_ULL(63,32) | GENMASK_ULL(23, 22) | GENMASK_ULL(20,20))
+	(GENMASK_ULL(63, 32) | GENMASK_ULL(22, 22) | GENMASK_ULL(20, 20))

 static int valid_compat_regs(struct user_pt_regs *regs)
 {
From: Mian Yousaf Kaukab ykaukab@suse.de
[ Upstream commit 3891ebccace188af075ce143d8b072b65e90f695 ]
spectre-v1 has been mitigated and the mitigation is always active. Report this to userspace via sysfs.
Signed-off-by: Mian Yousaf Kaukab ykaukab@suse.de
Signed-off-by: Jeremy Linton jeremy.linton@arm.com
Reviewed-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Tested-by: Stefan Wahren stefan.wahren@i2se.com
Acked-by: Suzuki K Poulose suzuki.poulose@arm.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/cpu_errata.c | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -729,3 +729,9 @@ const struct arm64_cpu_capabilities arm6
 	{
 	}
 };
+
+ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	return sprintf(buf, "Mitigation: __user pointer sanitization\n");
+}
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit 1b3ccf4be0e7be8c4bd8522066b6cbc92591e912 ]
We implement page table isolation as a mitigation for meltdown. Report this to userspace via sysfs.
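The decision order that this patch introduces can be summarised with a small user-space model. This is only a sketch under stated assumptions: the parameters stand in for kernel state (`config_kpti` for CONFIG_UNMAP_KERNEL_AT_EL0, `forced` for `__kpti_forced`, `kaslr_active` for the CONFIG_RANDOMIZE_BASE plus non-zero offset check), the Cavium ThunderX special case is left out, and both function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Model of the patched unmap_kernel_at_el0() ordering.
 * forced: 0 = not forced, >0 = forced on, <0 = forced off.
 */
static bool kpti_wanted(bool config_kpti, int forced, bool kaslr_active,
			bool meltdown_safe)
{
	/* KASLR forces KPTI on unless something else already forced it. */
	if (kaslr_active && !forced)
		forced = 1;

	/* Without kernel support nothing can be enabled. */
	if (!config_kpti)
		return false;

	if (forced)
		return forced > 0;

	return !meltdown_safe;
}

/* Model of cpu_show_meltdown(): which string the sysfs file reports. */
static const char *meltdown_sysfs_string(bool all_cpus_safe,
					 bool kernel_unmapped_at_el0)
{
	if (all_cpus_safe)
		return "Not affected\n";
	if (kernel_unmapped_at_el0)
		return "Mitigation: PTI\n";
	return "Vulnerable\n";
}
```

Note that the safety check now runs regardless of the kernel configuration, so a kernel built without KPTI on an affected CPU reports "Vulnerable" rather than lacking the information altogether.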
Signed-off-by: Jeremy Linton jeremy.linton@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Tested-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpufeature.c | 58 +++++++++++++++++++++++++++++++---------- 1 file changed, 44 insertions(+), 14 deletions(-)
--- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -889,7 +889,7 @@ static bool has_cache_dic(const struct a return ctr & BIT(CTR_DIC_SHIFT); }
-#ifdef CONFIG_UNMAP_KERNEL_AT_EL0 +static bool __meltdown_safe = true; static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */
static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry, @@ -908,6 +908,16 @@ static bool unmap_kernel_at_el0(const st { /* sentinel */ } }; char const *str = "command line option"; + bool meltdown_safe; + + meltdown_safe = is_midr_in_range_list(read_cpuid_id(), kpti_safe_list); + + /* Defer to CPU feature registers */ + if (has_cpuid_feature(entry, scope)) + meltdown_safe = true; + + if (!meltdown_safe) + __meltdown_safe = false;
/* * For reasons that aren't entirely clear, enabling KPTI on Cavium @@ -919,6 +929,19 @@ static bool unmap_kernel_at_el0(const st __kpti_forced = -1; }
+ /* Useful for KASLR robustness */ + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0) { + if (!__kpti_forced) { + str = "KASLR"; + __kpti_forced = 1; + } + } + + if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0)) { + pr_info_once("kernel page table isolation disabled by kernel configuration\n"); + return false; + } + /* Forced? */ if (__kpti_forced) { pr_info_once("kernel page table isolation forced %s by %s\n", @@ -926,18 +949,10 @@ static bool unmap_kernel_at_el0(const st return __kpti_forced > 0; }
- /* Useful for KASLR robustness */ - if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) - return true; - - /* Don't force KPTI for CPUs that are not vulnerable */ - if (is_midr_in_range_list(read_cpuid_id(), kpti_safe_list)) - return false; - - /* Defer to CPU feature registers */ - return !has_cpuid_feature(entry, scope); + return !meltdown_safe; }
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0 static void kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused) { @@ -962,6 +977,12 @@ kpti_install_ng_mappings(const struct ar
return; } +#else +static void +kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused) +{ +} +#endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
static int __init parse_kpti(char *str) { @@ -975,7 +996,6 @@ static int __init parse_kpti(char *str) return 0; } early_param("kpti", parse_kpti); -#endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
#ifdef CONFIG_ARM64_HW_AFDBM static inline void __cpu_enable_hw_dbm(void) @@ -1196,7 +1216,6 @@ static const struct arm64_cpu_capabiliti .field_pos = ID_AA64PFR0_EL0_SHIFT, .min_field_value = ID_AA64PFR0_EL0_32BIT_64BIT, }, -#ifdef CONFIG_UNMAP_KERNEL_AT_EL0 { .desc = "Kernel page table isolation (KPTI)", .capability = ARM64_UNMAP_KERNEL_AT_EL0, @@ -1212,7 +1231,6 @@ static const struct arm64_cpu_capabiliti .matches = unmap_kernel_at_el0, .cpu_enable = kpti_install_ng_mappings, }, -#endif { /* FP/SIMD is not implemented */ .capability = ARM64_HAS_NO_FPSIMD, @@ -1853,3 +1871,15 @@ void cpu_clear_disr(const struct arm64_c /* Firmware may have left a deferred SError in this register. */ write_sysreg_s(0, SYS_DISR_EL1); } + +ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, + char *buf) +{ + if (__meltdown_safe) + return sprintf(buf, "Not affected\n"); + + if (arm64_kernel_unmapped_at_el0()) + return sprintf(buf, "Mitigation: PTI\n"); + + return sprintf(buf, "Vulnerable\n"); +}
From: Mian Yousaf Kaukab ykaukab@suse.de
[ Upstream commit 61ae1321f06c4489c724c803e9b8363dea576da3 ]
Enable CPU vulnerability show functions for spectre_v1, spectre_v2, meltdown and store-bypass.
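Once this option is selected, the status strings become readable from user space. A minimal sketch of reading one of them follows. It assumes the generic sysfs location /sys/devices/system/cpu/vulnerabilities/ with entries such as `meltdown`, `spectre_v1`, `spectre_v2` and `spec_store_bypass`. The helper name `read_vulnerability` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a vulnerability status file into buf and strip
 * the trailing newline. Returns 0 on success and -1 if the file cannot be
 * opened or read (for example on a kernel without the generic
 * vulnerabilities support).
 */
static int read_vulnerability(const char *name, char *buf, size_t len)
{
	char path[256];
	FILE *f;
	int n;

	if (len == 0)
		return -1;

	n = snprintf(path, sizeof(path),
		     "/sys/devices/system/cpu/vulnerabilities/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would pass a name such as "spectre_v1" and print `buf` on success. The exact text returned depends on the CPU and kernel configuration.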
Signed-off-by: Mian Yousaf Kaukab ykaukab@suse.de
Signed-off-by: Jeremy Linton jeremy.linton@arm.com
Reviewed-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Tested-by: Stefan Wahren stefan.wahren@i2se.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -84,6 +84,7 @@ config ARM64
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_CLOCKEVENTS_BROADCAST
 	select GENERIC_CPU_AUTOPROBE
+	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
 	select GENERIC_IDLE_POLL_SETUP
 	select GENERIC_IRQ_MULTI_HANDLER
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit d42281b6e49510f078ace15a8ea10f71e6262581 ]
Ensure we are always able to detect whether or not the CPU is affected by SSB, so that we can later advertise this to userspace.
Signed-off-by: Jeremy Linton jeremy.linton@arm.com Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Tested-by: Stefan Wahren stefan.wahren@i2se.com [will: Use IS_ENABLED instead of #ifdef] Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cpufeature.h | 4 ---- arch/arm64/kernel/cpu_errata.c | 9 +++++---- 2 files changed, 5 insertions(+), 8 deletions(-)
--- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -525,11 +525,7 @@ static inline int arm64_get_ssbd_state(v #endif }
-#ifdef CONFIG_ARM64_SSBD void arm64_set_ssbd_mitigation(bool state); -#else -static inline void arm64_set_ssbd_mitigation(bool state) {} -#endif
#endif /* __ASSEMBLY__ */
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -239,7 +239,6 @@ enable_smccc_arch_workaround_1(const str } #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
-#ifdef CONFIG_ARM64_SSBD DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);
int ssbd_state __read_mostly = ARM64_SSBD_KERNEL; @@ -312,6 +311,11 @@ void __init arm64_enable_wa2_handling(st
void arm64_set_ssbd_mitigation(bool state) { + if (!IS_ENABLED(CONFIG_ARM64_SSBD)) { + pr_info_once("SSBD disabled by kernel configuration\n"); + return; + } + if (this_cpu_has_cap(ARM64_SSBS)) { if (state) asm volatile(SET_PSTATE_SSBS(0)); @@ -431,7 +435,6 @@ out_printmsg:
return required; } -#endif /* CONFIG_ARM64_SSBD */
#ifdef CONFIG_ARM64_ERRATUM_1463225 DEFINE_PER_CPU(int, __in_cortex_a76_erratum_1463225_wa); @@ -710,14 +713,12 @@ const struct arm64_cpu_capabilities arm6 ERRATA_MIDR_RANGE_LIST(arm64_harden_el2_vectors), }, #endif -#ifdef CONFIG_ARM64_SSBD { .desc = "Speculative Store Bypass Disable", .capability = ARM64_SSBD, .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, .matches = has_ssbd_mitigation, }, -#endif #ifdef CONFIG_ARM64_ERRATUM_1463225 { .desc = "ARM erratum 1463225",
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit e5ce5e7267ddcbe13ab9ead2542524e1b7993e5a ]
There are various reasons, such as benchmarking, to disable spectrev2 mitigation on a machine. Provide a command-line option to do so.
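The option is a bare flag: its presence on the command line is all that matters. As a rough user-space analogue of what the `early_param("nospectre_v2", ...)` handler in this patch achieves, the sketch below scans a space-separated command line for an exact token match. The function name `cmdline_has_flag` is hypothetical, and the kernel's real parameter parsing is more involved (quoting, `key=value` forms and so on).

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if flag appears as a whole space-separated token in cmdline.
 * A token that merely starts with the flag (e.g. "nospectre_v2x") does
 * not count.
 */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
	size_t flen = strlen(flag);
	const char *p = cmdline;

	while (*p) {
		size_t tlen;

		/* Skip separators before the next token. */
		while (*p == ' ')
			p++;

		tlen = strcspn(p, " ");
		if (tlen > 0 && tlen == flen && strncmp(p, flag, flen) == 0)
			return true;

		p += tlen;
	}
	return false;
}
```

In the patch itself the equivalent of a match sets `__nospectre_v2`, and `enable_smccc_arch_workaround_1()` then logs a message and returns before installing the workaround.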
Signed-off-by: Jeremy Linton jeremy.linton@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Tested-by: Stefan Wahren stefan.wahren@i2se.com Cc: Jonathan Corbet corbet@lwn.net Cc: linux-doc@vger.kernel.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 8 ++++---- arch/arm64/kernel/cpu_errata.c | 13 +++++++++++++ 2 files changed, 17 insertions(+), 4 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2866,10 +2866,10 @@ (bounds check bypass). With this option data leaks are possible in the system.
- nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2 - (indirect branch prediction) vulnerability. System may - allow data leaks with this option, which is equivalent - to spectre_v2=off. + nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for + the Spectre variant 2 (indirect branch prediction) + vulnerability. System may allow data leaks with this + option.
nospec_store_bypass_disable [HW] Disable all mitigations for the Speculative Store Bypass vulnerability --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -189,6 +189,14 @@ static void qcom_link_stack_sanitization : "=&r" (tmp)); }
+static bool __nospectre_v2; +static int __init parse_nospectre_v2(char *str) +{ + __nospectre_v2 = true; + return 0; +} +early_param("nospectre_v2", parse_nospectre_v2); + static void enable_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry) { @@ -200,6 +208,11 @@ enable_smccc_arch_workaround_1(const str if (!entry->matches(entry, SCOPE_LOCAL_CPU)) return;
+ if (__nospectre_v2) { + pr_info_once("spectrev2 mitigation disabled by command line option\n"); + return; + } + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) return;
From: Marc Zyngier marc.zyngier@arm.com
[ Upstream commit 73f38166095947f3b86b02fbed6bd592223a7ac8 ]
We currently have a list of CPUs affected by Spectre-v2, for which we check that the firmware implements ARCH_WORKAROUND_1. It turns out that not all firmwares do implement the required mitigation, and that we fail to let the user know about it.
Instead, let's slightly revamp our checks, and rely on a whitelist of cores that are known to be non-vulnerable, and let the user know the status of the mitigation in the kernel log.
Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Jeremy Linton jeremy.linton@arm.com Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Tested-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 109 +++++++++++++++++++++-------------------- 1 file changed, 56 insertions(+), 53 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -109,9 +109,9 @@ static void __copy_hyp_vect_bpi(int slot
 	__flush_icache_range((uintptr_t)dst, (uintptr_t)dst + SZ_2K);
 }

-static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
-				      const char *hyp_vecs_start,
-				      const char *hyp_vecs_end)
+static void install_bp_hardening_cb(bp_hardening_cb_t fn,
+				    const char *hyp_vecs_start,
+				    const char *hyp_vecs_end)
 {
 	static DEFINE_SPINLOCK(bp_lock);
 	int cpu, slot = -1;
@@ -138,7 +138,7 @@ static void __install_bp_hardening_cb(bp
 #define __smccc_workaround_1_smc_start		NULL
 #define __smccc_workaround_1_smc_end		NULL

-static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
+static void install_bp_hardening_cb(bp_hardening_cb_t fn,
 				      const char *hyp_vecs_start,
 				      const char *hyp_vecs_end)
 {
@@ -146,23 +146,6 @@ static void __install_bp_hardening_cb(bp
 }
 #endif	/* CONFIG_KVM_INDIRECT_VECTORS */

-static void install_bp_hardening_cb(const struct arm64_cpu_capabilities *entry,
-				     bp_hardening_cb_t fn,
-				     const char *hyp_vecs_start,
-				     const char *hyp_vecs_end)
-{
-	u64 pfr0;
-
-	if (!entry->matches(entry, SCOPE_LOCAL_CPU))
-		return;
-
-	pfr0 = read_cpuid(ID_AA64PFR0_EL1);
-	if (cpuid_feature_extract_unsigned_field(pfr0, ID_AA64PFR0_CSV2_SHIFT))
-		return;
-
-	__install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end);
-}
-
 #include <uapi/linux/psci.h>
 #include <linux/arm-smccc.h>
 #include <linux/psci.h>
@@ -197,31 +180,27 @@ static int __init parse_nospectre_v2(cha
 }
 early_param("nospectre_v2", parse_nospectre_v2);

-static void
-enable_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry)
+/*
+ * -1: No workaround
+ *  0: No workaround required
+ *  1: Workaround installed
+ */
+static int detect_harden_bp_fw(void)
 {
 	bp_hardening_cb_t cb;
 	void *smccc_start, *smccc_end;
 	struct arm_smccc_res res;
 	u32 midr = read_cpuid_id();

-	if (!entry->matches(entry, SCOPE_LOCAL_CPU))
-		return;
-
-	if (__nospectre_v2) {
-		pr_info_once("spectrev2 mitigation disabled by command line option\n");
-		return;
-	}
-
 	if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
-		return;
+		return -1;

 	switch (psci_ops.conduit) {
 	case PSCI_CONDUIT_HVC:
 		arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
 				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
 		if ((int)res.a0 < 0)
-			return;
+			return -1;
 		cb = call_hvc_arch_workaround_1;
 		/* This is a guest, no need to patch KVM vectors */
 		smccc_start = NULL;
@@ -232,23 +211,23 @@ enable_smccc_arch_workaround_1(const str
 		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
 				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
 		if ((int)res.a0 < 0)
-			return;
+			return -1;
 		cb = call_smc_arch_workaround_1;
 		smccc_start = __smccc_workaround_1_smc_start;
 		smccc_end = __smccc_workaround_1_smc_end;
 		break;

 	default:
-		return;
+		return -1;
 	}

 	if (((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR) ||
 	    ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1))
 		cb = qcom_link_stack_sanitization;

-	install_bp_hardening_cb(entry, cb, smccc_start, smccc_end);
+	install_bp_hardening_cb(cb, smccc_start, smccc_end);

-	return;
+	return 1;
 }
 #endif	/* CONFIG_HARDEN_BRANCH_PREDICTOR */

@@ -535,24 +514,48 @@ multi_entry_cap_cpu_enable(const struct
 }

 #ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
-
 /*
- * List of CPUs where we need to issue a psci call to
- * harden the branch predictor.
+ * List of CPUs that do not need any Spectre-v2 mitigation at all.
  */
-static const struct midr_range arm64_bp_harden_smccc_cpus[] = {
-	MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
-	MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
-	MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
-	MIDR_ALL_VERSIONS(MIDR_CORTEX_A75),
-	MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN),
-	MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2),
-	MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR_V1),
-	MIDR_ALL_VERSIONS(MIDR_QCOM_FALKOR),
-	MIDR_ALL_VERSIONS(MIDR_NVIDIA_DENVER),
-	{},
+static const struct midr_range spectre_v2_safe_list[] = {
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A35),
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A53),
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A55),
+	{ /* sentinel */ }
 };

+static bool __maybe_unused
+check_branch_predictor(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	int need_wa;
+
+	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+
+	/* If the CPU has CSV2 set, we're safe */
+	if (cpuid_feature_extract_unsigned_field(read_cpuid(ID_AA64PFR0_EL1),
+						 ID_AA64PFR0_CSV2_SHIFT))
+		return false;
+
+	/* Alternatively, we have a list of unaffected CPUs */
+	if (is_midr_in_range_list(read_cpuid_id(), spectre_v2_safe_list))
+		return false;
+
+	/* Fallback to firmware detection */
+	need_wa = detect_harden_bp_fw();
+	if (!need_wa)
+		return false;
+
+	/* forced off */
+	if (__nospectre_v2) {
+		pr_info_once("spectrev2 mitigation disabled by command line option\n");
+		return false;
+	}
+
+	if (need_wa < 0)
+		pr_warn_once("ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware\n");
+
+	return (need_wa > 0);
+}
 #endif

 #ifdef CONFIG_HARDEN_EL2_VECTORS
@@ -715,8 +718,8 @@ const struct arm64_cpu_capabilities arm6
 #ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
 	{
 		.capability = ARM64_HARDEN_BRANCH_PREDICTOR,
-		.cpu_enable = enable_smccc_arch_workaround_1,
-		ERRATA_MIDR_RANGE_LIST(arm64_bp_harden_smccc_cpus),
+		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
+		.matches = check_branch_predictor,
 	},
 #endif
 #ifdef CONFIG_HARDEN_EL2_VECTORS
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit 8c1e3d2bb44cbb998cb28ff9a18f105fee7f1eb3 ]
Ensure we are always able to detect whether or not the CPU is affected by Spectre-v2, so that we can later advertise this to userspace.
Signed-off-by: Jeremy Linton jeremy.linton@arm.com
Reviewed-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Tested-by: Stefan Wahren stefan.wahren@i2se.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/cpu_errata.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -87,7 +87,6 @@ cpu_enable_trap_ctr_access(const struct

 atomic_t arm64_el2_vector_last_slot = ATOMIC_INIT(-1);

-#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
 #include <asm/mmu_context.h>
 #include <asm/cacheflush.h>

@@ -225,11 +224,11 @@ static int detect_harden_bp_fw(void)
 	    ((midr & MIDR_CPU_MODEL_MASK) == MIDR_QCOM_FALKOR_V1))
 		cb = qcom_link_stack_sanitization;

-	install_bp_hardening_cb(cb, smccc_start, smccc_end);
+	if (IS_ENABLED(CONFIG_HARDEN_BRANCH_PREDICTOR))
+		install_bp_hardening_cb(cb, smccc_start, smccc_end);

 	return 1;
 }
-#endif	/* CONFIG_HARDEN_BRANCH_PREDICTOR */

 DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);

@@ -513,7 +512,6 @@ multi_entry_cap_cpu_enable(const struct
 			caps->cpu_enable(caps);
 }

-#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
 /*
  * List of CPUs that do not need any Spectre-v2 mitigation at all.
  */
@@ -545,6 +543,12 @@ check_branch_predictor(const struct arm6
 	if (!need_wa)
 		return false;

+	if (!IS_ENABLED(CONFIG_HARDEN_BRANCH_PREDICTOR)) {
+		pr_warn_once("spectrev2 mitigation disabled by kernel configuration\n");
+		__hardenbp_enab = false;
+		return false;
+	}
+
 	/* forced off */
 	if (__nospectre_v2) {
 		pr_info_once("spectrev2 mitigation disabled by command line option\n");
@@ -556,7 +560,6 @@ check_branch_predictor(const struct arm6

 	return (need_wa > 0);
 }
-#endif

 #ifdef CONFIG_HARDEN_EL2_VECTORS

@@ -715,13 +718,11 @@ const struct arm64_cpu_capabilities arm6
 		ERRATA_MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
 	},
 #endif
-#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
 	{
 		.capability = ARM64_HARDEN_BRANCH_PREDICTOR,
 		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
 		.matches = check_branch_predictor,
 	},
-#endif
 #ifdef CONFIG_HARDEN_EL2_VECTORS
 	{
 		.desc = "EL2 vector hardening",
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit d2532e27b5638bb2e2dd52b80b7ea2ec65135377 ]
Track whether all the cores in the machine are vulnerable to Spectre-v2, and whether all the vulnerable cores have been mitigated. We then expose this information to userspace via sysfs.
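As a rough illustration of how the two tracked flags combine into the sysfs string, here is a userspace sketch. It assumes a per-core array, which the kernel does not keep (it updates two globals, __spectrev2_safe and __hardenbp_enab, as each core boots); `struct core_state`, `enum sv2_status` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-core observation. */
struct core_state {
	bool affected;  /* core failed the CSV2, safe-list and firmware checks */
	bool mitigated; /* the workaround was actually enabled on this core */
};

enum sv2_status { SV2_NOT_AFFECTED, SV2_MITIGATED, SV2_VULNERABLE };

/* Fold all booted cores into one machine-wide status. */
static enum sv2_status spectre_v2_status(const struct core_state *cores, size_t n)
{
	bool all_safe = true;     /* plays the role of __spectrev2_safe */
	bool all_hardened = true; /* plays the role of __hardenbp_enab */
	size_t i;

	assert(cores || n == 0);

	for (i = 0; i < n; i++) {
		if (!cores[i].affected)
			continue;
		all_safe = false;
		if (!cores[i].mitigated)
			all_hardened = false;
	}

	if (all_safe)
		return SV2_NOT_AFFECTED;
	return all_hardened ? SV2_MITIGATED : SV2_VULNERABLE;
}

/* Strings as emitted by cpu_show_spectre_v2() in the diff below. */
static const char *spectre_v2_status_str(enum sv2_status s)
{
	switch (s) {
	case SV2_NOT_AFFECTED:
		return "Not affected";
	case SV2_MITIGATED:
		return "Mitigation: Branch predictor hardening";
	default:
		return "Vulnerable";
	}
}
```

A single affected and unmitigated core is enough to make the whole machine report as vulnerable, which is the point of tracking the state across heterogeneous cores.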
Signed-off-by: Jeremy Linton jeremy.linton@arm.com
Reviewed-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Tested-by: Stefan Wahren stefan.wahren@i2se.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/cpu_errata.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -480,6 +480,10 @@ has_cortex_a76_erratum_1463225(const str
 	.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,				\
 	CAP_MIDR_RANGE_LIST(midr_list)

+/* Track overall mitigation state. We are only mitigated if all cores are ok */
+static bool __hardenbp_enab = true;
+static bool __spectrev2_safe = true;
+
 /*
  * Generic helper for handling capabilties with multiple (match,enable) pairs
  * of call backs, sharing the same capability bit.
@@ -522,6 +526,10 @@ static const struct midr_range spectre_v
 	{ /* sentinel */ }
 };

+/*
+ * Track overall bp hardening for all heterogeneous cores in the machine.
+ * We are only considered "safe" if all booted cores are known safe.
+ */
 static bool __maybe_unused
 check_branch_predictor(const struct arm64_cpu_capabilities *entry, int scope)
 {
@@ -543,6 +551,8 @@ check_branch_predictor(const struct arm6
 	if (!need_wa)
 		return false;

+	__spectrev2_safe = false;
+
 	if (!IS_ENABLED(CONFIG_HARDEN_BRANCH_PREDICTOR)) {
 		pr_warn_once("spectrev2 mitigation disabled by kernel configuration\n");
 		__hardenbp_enab = false;
@@ -552,11 +562,14 @@ check_branch_predictor(const struct arm6
 	/* forced off */
 	if (__nospectre_v2) {
 		pr_info_once("spectrev2 mitigation disabled by command line option\n");
+		__hardenbp_enab = false;
 		return false;
 	}

-	if (need_wa < 0)
+	if (need_wa < 0) {
 		pr_warn_once("ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware\n");
+		__hardenbp_enab = false;
+	}

 	return (need_wa > 0);
 }
@@ -753,3 +766,15 @@ ssize_t cpu_show_spectre_v1(struct devic
 {
 	return sprintf(buf, "Mitigation: __user pointer sanitization\n");
 }
+
+ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	if (__spectrev2_safe)
+		return sprintf(buf, "Not affected\n");
+
+	if (__hardenbp_enab)
+		return sprintf(buf, "Mitigation: Branch predictor hardening\n");
+
+	return sprintf(buf, "Vulnerable\n");
+}
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit 526e065dbca6df0b5a130b84b836b8b3c9f54e21 ]
Return status based on ssbd_state and __ssb_safe. If the mitigation is disabled, or the firmware isn't responding then return the expected machine state based on a whitelist of known good cores.
Given a heterogeneous machine, the overall machine vulnerability defaults to safe but is reset to unsafe when we miss the whitelist and the firmware doesn't explicitly tell us the core is safe. In order to make that work we delay transitioning to vulnerable until we know the firmware isn't responding to avoid a case where we miss the whitelist, but the firmware goes ahead and reports the core is not vulnerable. If all the cores in the machine have SSBS, then __ssb_safe will remain true.
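The delayed transition described above can be sketched in userspace as follows. This is a simplified model under assumptions: it ignores SSBS and the command-line handling, `enum fw_answer`, `struct ssb_core` and `machine_ssb_safe()` are hypothetical names, and the three firmware answers stand in for the SMCCC return values handled in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what firmware said about one core. */
enum fw_answer {
	FW_UNAVAILABLE,         /* old SMCCC, no conduit, or NOT_SUPPORTED */
	FW_NOT_REQUIRED,        /* firmware says the core is not vulnerable */
	FW_MITIGATION_AVAILABLE /* firmware offers the mitigation */
};

struct ssb_core {
	bool on_whitelist; /* MIDR is in the list of known good cores */
	enum fw_answer fw;
};

/* The machine starts out safe and is only demoted once firmware has answered. */
static bool machine_ssb_safe(const struct ssb_core *cores, size_t n)
{
	bool safe = true; /* plays the role of __ssb_safe */
	size_t i;

	assert(cores || n == 0);

	for (i = 0; i < n; i++) {
		switch (cores[i].fw) {
		case FW_UNAVAILABLE:
			/* Only the whitelist can vouch for this core. */
			if (!cores[i].on_whitelist)
				safe = false;
			break;
		case FW_NOT_REQUIRED:
			/* Missing the whitelist is fine if firmware says safe. */
			break;
		case FW_MITIGATION_AVAILABLE:
			safe = false;
			break;
		}
	}
	return safe;
}
```

The FW_NOT_REQUIRED case is the reason for the delay: a core that misses the whitelist must not be marked vulnerable before firmware has had the chance to report it as safe.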
Tested-by: Stefan Wahren stefan.wahren@i2se.com
Signed-off-by: Jeremy Linton jeremy.linton@arm.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/cpu_errata.c | 42 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -233,6 +233,7 @@ static int detect_harden_bp_fw(void)
 DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);

 int ssbd_state __read_mostly = ARM64_SSBD_KERNEL;
+static bool __ssb_safe = true;

 static const struct ssbd_options {
 	const char	*str;
@@ -336,6 +337,7 @@ static bool has_ssbd_mitigation(const st
 	struct arm_smccc_res res;
 	bool required = true;
 	s32 val;
+	bool this_cpu_safe = false;

 	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());

@@ -344,8 +346,14 @@ static bool has_ssbd_mitigation(const st
 		goto out_printmsg;
 	}

+	/* delay setting __ssb_safe until we get a firmware response */
+	if (is_midr_in_range_list(read_cpuid_id(), entry->midr_range_list))
+		this_cpu_safe = true;
+
 	if (psci_ops.smccc_version == SMCCC_VERSION_1_0) {
 		ssbd_state = ARM64_SSBD_UNKNOWN;
+		if (!this_cpu_safe)
+			__ssb_safe = false;
 		return false;
 	}

@@ -362,6 +370,8 @@ static bool has_ssbd_mitigation(const st

 	default:
 		ssbd_state = ARM64_SSBD_UNKNOWN;
+		if (!this_cpu_safe)
+			__ssb_safe = false;
 		return false;
 	}

@@ -370,14 +380,18 @@ static bool has_ssbd_mitigation(const st
 	switch (val) {
 	case SMCCC_RET_NOT_SUPPORTED:
 		ssbd_state = ARM64_SSBD_UNKNOWN;
+		if (!this_cpu_safe)
+			__ssb_safe = false;
 		return false;

+	/* machines with mixed mitigation requirements must not return this */
 	case SMCCC_RET_NOT_REQUIRED:
 		pr_info_once("%s mitigation not required\n", entry->desc);
 		ssbd_state = ARM64_SSBD_MITIGATED;
 		return false;

 	case SMCCC_RET_SUCCESS:
+		__ssb_safe = false;
 		required = true;
 		break;

@@ -387,6 +401,8 @@ static bool has_ssbd_mitigation(const st

 	default:
 		WARN_ON(1);
+		if (!this_cpu_safe)
+			__ssb_safe = false;
 		return false;
 	}

@@ -427,6 +443,14 @@ out_printmsg:
 	return required;
 }

+/* known invulnerable cores */
+static const struct midr_range arm64_ssb_cpus[] = {
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A35),
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A53),
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A55),
+	{},
+};
+
 #ifdef CONFIG_ARM64_ERRATUM_1463225
 DEFINE_PER_CPU(int, __in_cortex_a76_erratum_1463225_wa);

@@ -748,6 +772,7 @@ const struct arm64_cpu_capabilities arm6
 		.capability = ARM64_SSBD,
 		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
 		.matches = has_ssbd_mitigation,
+		.midr_range_list = arm64_ssb_cpus,
 	},
 #ifdef CONFIG_ARM64_ERRATUM_1463225
 	{
@@ -778,3 +803,20 @@ ssize_t cpu_show_spectre_v2(struct devic

 	return sprintf(buf, "Vulnerable\n");
 }
+
+ssize_t cpu_show_spec_store_bypass(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	if (__ssb_safe)
+		return sprintf(buf, "Not affected\n");
+
+	switch (ssbd_state) {
+	case ARM64_SSBD_KERNEL:
+	case ARM64_SSBD_FORCE_ENABLE:
+		if (IS_ENABLED(CONFIG_ARM64_SSBD))
+			return sprintf(buf,
+			    "Mitigation: Speculative Store Bypass disabled via prctl\n");
+	}
+
+	return sprintf(buf, "Vulnerable\n");
+}
From: Will Deacon will.deacon@arm.com
[ Upstream commit eb337cdfcd5dd3b10522c2f34140a73a4c285c30 ]
SSBS provides a relatively cheap mitigation for SSB, but it is still a mitigation and its presence does not indicate that the CPU is unaffected by the vulnerability.
Tweak the mitigation logic so that we report the correct string in sysfs.
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/cpu_errata.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -341,15 +341,17 @@ static bool has_ssbd_mitigation(const st

 	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());

+	/* delay setting __ssb_safe until we get a firmware response */
+	if (is_midr_in_range_list(read_cpuid_id(), entry->midr_range_list))
+		this_cpu_safe = true;
+
 	if (this_cpu_has_cap(ARM64_SSBS)) {
+		if (!this_cpu_safe)
+			__ssb_safe = false;
 		required = false;
 		goto out_printmsg;
 	}

-	/* delay setting __ssb_safe until we get a firmware response */
-	if (is_midr_in_range_list(read_cpuid_id(), entry->midr_range_list))
-		this_cpu_safe = true;
-
 	if (psci_ops.smccc_version == SMCCC_VERSION_1_0) {
 		ssbd_state = ARM64_SSBD_UNKNOWN;
 		if (!this_cpu_safe)
From: Marc Zyngier marc.zyngier@arm.com
[ Upstream commit cbdf8a189a66001c36007bf0f5c975d0376c5c3a ]
On a CPU that doesn't support SSBS, PSTATE[12] is RES0. In a system where only some of the CPUs implement SSBS, we end up losing track of the SSBS bit across task migration.
To address this issue, let's force the SSBS bit on context switch.
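The effect can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel implementation: `struct task_model` and both function names are hypothetical, the loss of the bit on a RES0 CPU is simulated by clearing it, and the two bit values are assumed to match the kernel's PSR_SSBS_BIT and PSR_AA32_SSBS_BIT definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSR_SSBS_BIT      UINT64_C(0x00001000) /* PSTATE[12]; value assumed */
#define PSR_AA32_SSBS_BIT UINT64_C(0x00800000) /* AArch32 position; value assumed */

/* Hypothetical reduction of a task to the fields that matter here. */
struct task_model {
	uint64_t pstate;       /* saved user PSTATE */
	bool kthread;          /* kernel thread, saved regs may be junk */
	bool compat;           /* 32-bit user task */
	bool wants_mitigation; /* TIF_SSBD set, or mitigation forced on */
};

/* Simulate running on a CPU that treats the SSBS bit as RES0. */
static void run_on_res0_cpu(struct task_model *t)
{
	assert(t);
	t->pstate &= ~(PSR_SSBS_BIT | PSR_AA32_SSBS_BIT);
}

/* Re-establish the SSBS bit when the task is switched in. */
static void ssbs_switch_in(struct task_model *t)
{
	assert(t);

	if (t->kthread)
		return;

	/* If the mitigation is enabled, SSBS is left clear. */
	if (t->wants_mitigation)
		return;

	t->pstate |= t->compat ? PSR_AA32_SSBS_BIT : PSR_SSBS_BIT;
}
```

Forcing the bit on every switch avoids having to remember whether the previous CPU preserved it.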
Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
[will: inverted logic and added comments]
Signed-off-by: Will Deacon will@kernel.org
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/processor.h | 14 ++++++++++++--
 arch/arm64/kernel/process.c | 29 ++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 3 deletions(-)
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -177,6 +177,16 @@ static inline void start_thread_common(s
 	regs->pc = pc;
 }

+static inline void set_ssbs_bit(struct pt_regs *regs)
+{
+	regs->pstate |= PSR_SSBS_BIT;
+}
+
+static inline void set_compat_ssbs_bit(struct pt_regs *regs)
+{
+	regs->pstate |= PSR_AA32_SSBS_BIT;
+}
+
 static inline void start_thread(struct pt_regs *regs, unsigned long pc,
 				unsigned long sp)
 {
@@ -184,7 +194,7 @@ static inline void start_thread(struct p
 	regs->pstate = PSR_MODE_EL0t;

 	if (arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE)
-		regs->pstate |= PSR_SSBS_BIT;
+		set_ssbs_bit(regs);

 	regs->sp = sp;
 }
@@ -203,7 +213,7 @@ static inline void compat_start_thread(s
 #endif

 	if (arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE)
-		regs->pstate |= PSR_AA32_SSBS_BIT;
+		set_compat_ssbs_bit(regs);

 	regs->compat_sp = sp;
 }
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -360,7 +360,7 @@ int copy_thread(unsigned long clone_flag
 			childregs->pstate |= PSR_UAO_BIT;

 		if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE)
-			childregs->pstate |= PSR_SSBS_BIT;
+			set_ssbs_bit(childregs);

 		p->thread.cpu_context.x19 = stack_start;
 		p->thread.cpu_context.x20 = stk_sz;
@@ -402,6 +402,32 @@ void uao_thread_switch(struct task_struc
 }

 /*
+ * Force SSBS state on context-switch, since it may be lost after migrating
+ * from a CPU which treats the bit as RES0 in a heterogeneous system.
+ */
+static void ssbs_thread_switch(struct task_struct *next)
+{
+	struct pt_regs *regs = task_pt_regs(next);
+
+	/*
+	 * Nothing to do for kernel threads, but 'regs' may be junk
+	 * (e.g. idle task) so check the flags and bail early.
+	 */
+	if (unlikely(next->flags & PF_KTHREAD))
+		return;
+
+	/* If the mitigation is enabled, then we leave SSBS clear. */
+	if ((arm64_get_ssbd_state() == ARM64_SSBD_FORCE_ENABLE) ||
+	    test_tsk_thread_flag(next, TIF_SSBD))
+		return;
+
+	if (compat_user_mode(regs))
+		set_compat_ssbs_bit(regs);
+	else if (user_mode(regs))
+		set_ssbs_bit(regs);
+}
+
+/*
  * We store our current task in sp_el0, which is clobbered by userspace. Keep a
  * shadow copy so that we can restore this upon entry from userspace.
  *
@@ -429,6 +455,7 @@ __notrace_funcgraph struct task_struct *
 	contextidr_thread_switch(next);
 	entry_task_switch(next);
 	uao_thread_switch(next);
+	ssbs_thread_switch(next);

 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
From: Marc Zyngier marc.zyngier@arm.com
commit 517953c2c47f9c00a002f588ac856a5bc70cede3 upstream.
The SMCCC ARCH_WORKAROUND_1 service can indicate that although the firmware knows about the Spectre-v2 mitigation, this particular CPU is not vulnerable, and it is thus not necessary to call the firmware on this CPU.
Let's use this information to our benefit.
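The mapping from the firmware's answer to the result of detect_harden_bp_fw() can be summarised in a few lines. This is a standalone sketch, assuming the -1/0/1 convention documented earlier in this series; the enum names and `classify_workaround_1()` are hypothetical.

```c
#include <assert.h>

/* Result convention of detect_harden_bp_fw() in this series. */
enum bp_fw_result {
	BP_NO_WORKAROUND = -1, /* firmware cannot help */
	BP_NOT_REQUIRED  = 0,  /* this CPU is not vulnerable */
	BP_WORKAROUND    = 1   /* workaround must be called on this CPU */
};

/* a0 is the value returned by the ARCH_FEATURES query for ARCH_WORKAROUND_1. */
static enum bp_fw_result classify_workaround_1(int a0)
{
	switch (a0) {
	case 1:
		/* Firmware knows the mitigation but this CPU does not need it. */
		return BP_NOT_REQUIRED;
	case 0:
		return BP_WORKAROUND;
	default:
		return BP_NO_WORKAROUND;
	}
}
```

Note that the two conventions run in opposite directions: a firmware answer of 1 becomes a result of 0, and an answer of 0 becomes a result of 1.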
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Jeremy Linton jeremy.linton@arm.com
Reviewed-by: Andre Przywara andre.przywara@arm.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Tested-by: Stefan Wahren stefan.wahren@i2se.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/cpu_errata.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -198,22 +198,36 @@ static int detect_harden_bp_fw(void)
 	case PSCI_CONDUIT_HVC:
 		arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
 				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-		if ((int)res.a0 < 0)
+		switch ((int)res.a0) {
+		case 1:
+			/* Firmware says we're just fine */
+			return 0;
+		case 0:
+			cb = call_hvc_arch_workaround_1;
+			/* This is a guest, no need to patch KVM vectors */
+			smccc_start = NULL;
+			smccc_end = NULL;
+			break;
+		default:
 			return -1;
-		cb = call_hvc_arch_workaround_1;
-		/* This is a guest, no need to patch KVM vectors */
-		smccc_start = NULL;
-		smccc_end = NULL;
+		}
 		break;

 	case PSCI_CONDUIT_SMC:
 		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
 				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-		if ((int)res.a0 < 0)
+		switch ((int)res.a0) {
+		case 1:
+			/* Firmware says we're just fine */
+			return 0;
+		case 0:
+			cb = call_smc_arch_workaround_1;
+			smccc_start = __smccc_workaround_1_smc_start;
+			smccc_end = __smccc_workaround_1_smc_end;
+			break;
+		default:
 			return -1;
-		cb = call_smc_arch_workaround_1;
-		smccc_start = __smccc_workaround_1_smc_start;
-		smccc_end = __smccc_workaround_1_smc_end;
+		}
 		break;

 	default:
From: Josh Poimboeuf jpoimboe@redhat.com
commit a111b7c0f20e13b54df2fa959b3dc0bdf1925ae6 upstream.
Configure arm64 runtime CPU speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2, and Speculative Store Bypass.
The default behavior is unchanged.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
[will: reorder checks so KASLR implies KPTI and SSBS is affected by cmdline]
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/admin-guide/kernel-parameters.txt | 8 +++++---
 arch/arm64/kernel/cpu_errata.c | 6 +++++-
 arch/arm64/kernel/cpufeature.c | 8 +++++++-
 3 files changed, 17 insertions(+), 5 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2503,8 +2503,8 @@
 				http://repo.or.cz/w/linux-2.6/mini2440.git

 	mitigations=
-			[X86,PPC,S390] Control optional mitigations for CPU
-			vulnerabilities. This is a set of curated,
+			[X86,PPC,S390,ARM64] Control optional mitigations for
+			CPU vulnerabilities. This is a set of curated,
 			arch-independent options, each of which is an
 			aggregation of existing arch-specific options.

@@ -2513,12 +2513,14 @@
 				improves system performance, but it may also
 				expose users to several CPU vulnerabilities.
 				Equivalent to: nopti [X86,PPC]
+					       kpti=0 [ARM64]
 					       nospectre_v1 [PPC]
 					       nobp=0 [S390]
 					       nospectre_v1 [X86]
-					       nospectre_v2 [X86,PPC,S390]
+					       nospectre_v2 [X86,PPC,S390,ARM64]
 					       spectre_v2_user=off [X86]
 					       spec_store_bypass_disable=off [X86,PPC]
+					       ssbd=force-off [ARM64]
 					       l1tf=off [X86]
 					       mds=off [X86]

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -19,6 +19,7 @@
 #include <linux/arm-smccc.h>
 #include <linux/psci.h>
 #include <linux/types.h>
+#include <linux/cpu.h>
 #include <asm/cpu.h>
 #include <asm/cputype.h>
 #include <asm/cpufeature.h>
@@ -355,6 +356,9 @@ static bool has_ssbd_mitigation(const st

 	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());

+	if (cpu_mitigations_off())
+		ssbd_state = ARM64_SSBD_FORCE_DISABLE;
+
 	/* delay setting __ssb_safe until we get a firmware response */
 	if (is_midr_in_range_list(read_cpuid_id(), entry->midr_range_list))
 		this_cpu_safe = true;
@@ -600,7 +604,7 @@ check_branch_predictor(const struct arm6
 	}

 	/* forced off */
-	if (__nospectre_v2) {
+	if (__nospectre_v2 || cpu_mitigations_off()) {
 		pr_info_once("spectrev2 mitigation disabled by command line option\n");
 		__hardenbp_enab = false;
 		return false;
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -24,6 +24,7 @@
 #include <linux/stop_machine.h>
 #include <linux/types.h>
 #include <linux/mm.h>
+#include <linux/cpu.h>
 #include <asm/cpu.h>
 #include <asm/cpufeature.h>
 #include <asm/cpu_ops.h>
@@ -907,7 +908,7 @@ static bool unmap_kernel_at_el0(const st
 		MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
 		{ /* sentinel */ }
 	};
-	char const *str = "command line option";
+	char const *str = "kpti command line option";
 	bool meltdown_safe;

 	meltdown_safe = is_midr_in_range_list(read_cpuid_id(), kpti_safe_list);
@@ -937,6 +938,11 @@ static bool unmap_kernel_at_el0(const st
 		}
 	}

+	if (cpu_mitigations_off() && !__kpti_forced) {
+		str = "mitigations=off";
+		__kpti_forced = -1;
+	}
+
 	if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0)) {
 		pr_info_once("kernel page table isolation disabled by kernel configuration\n");
 		return false;
From: Eric Sandeen sandeen@redhat.com
commit cc3a7bfe62b947b423fcb2cfe89fcba92bf48fa3 upstream.
Today, put_compat_statfs64() disallows nearly any field value over 2^32 if f_bsize is only 32 bits, but that makes no sense. compat_statfs64 is there for the explicit purpose of providing 64-bit fields for f_files, f_ffree, etc. And f_bsize is always only 32 bits.
As a result, 32-bit userspace gets -EOVERFLOW for, e.g., large file counts even with -D_FILE_OFFSET_BITS=64 set.
In reality, only f_bsize and f_frsize can legitimately overflow (fields like f_type and f_namelen should never be large), so test only those fields.
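The reduced check is easy to exercise in userspace. The sketch below only assumes that the listed fields are held as 64-bit values, which is a simplification of the real struct kstatfs; `struct kstatfs_model` and `compat_statfs64_check()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of struct kstatfs, with every field widened to 64 bits. */
struct kstatfs_model {
	uint64_t f_bsize;
	uint64_t f_frsize;
	uint64_t f_files;
	uint64_t f_ffree;
};

/* Only the fields that stay 32-bit wide in compat_statfs64 can overflow. */
static int compat_statfs64_check(const struct kstatfs_model *k)
{
	assert(k);

	if ((k->f_bsize | k->f_frsize) & UINT64_C(0xffffffff00000000))
		return -EOVERFLOW;

	return 0;
}
```

With this check a filesystem reporting more than 2^32 files no longer fails, while an oversized block or fragment size still does.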
This bug was discussed at length some time ago, and this is the proposal Al suggested at https://lkml.org/lkml/2018/8/6/640. It seemed to get dropped amid the discussion of other related changes, but this part seems obviously correct on its own, so I've picked it up and sent it, for expediency.
Fixes: 64d2ab32efe3 ("vfs: fix put_compat_statfs64() does not handle errors")
Signed-off-by: Eric Sandeen sandeen@redhat.com
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/statfs.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -304,19 +304,10 @@ COMPAT_SYSCALL_DEFINE2(fstatfs, unsigned
 static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstatfs *kbuf)
 {
 	struct compat_statfs64 buf;
-	if (sizeof(ubuf->f_bsize) == 4) {
-		if ((kbuf->f_type | kbuf->f_bsize | kbuf->f_namelen |
-		     kbuf->f_frsize | kbuf->f_flags) & 0xffffffff00000000ULL)
-			return -EOVERFLOW;
-		/* f_files and f_ffree may be -1; it's okay
-		 * to stuff that into 32 bits */
-		if (kbuf->f_files != 0xffffffffffffffffULL
-		 && (kbuf->f_files & 0xffffffff00000000ULL))
-			return -EOVERFLOW;
-		if (kbuf->f_ffree != 0xffffffffffffffffULL
-		 && (kbuf->f_ffree & 0xffffffff00000000ULL))
-			return -EOVERFLOW;
-	}
+
+	if ((kbuf->f_bsize | kbuf->f_frsize) & 0xffffffff00000000ULL)
+		return -EOVERFLOW;
+
 	memset(&buf, 0, sizeof(struct compat_statfs64));
 	buf.f_type = kbuf->f_type;
 	buf.f_bsize = kbuf->f_bsize;
From: Andrew Murray andrew.murray@arm.com
commit 1004ce4c255fc3eb3ad9145ddd53547d1b7ce327 upstream.
Synchronization is recommended before disabling the trace registers to prevent any start or stop points being speculative at the point of disabling the unit (section 7.3.77 of ARM IHI 0064D).
Synchronization is also recommended after programming the trace registers to ensure all updates are committed prior to normal code resuming (section 4.3.7 of ARM IHI 0064D).
Let's ensure these synchronization points are present in the code and clearly commented.
Note that we could rely on the barriers in CS_LOCK and coresight_disclaim_device_unlocked or the context switch to user space - however coresight may be of use in the kernel.
On armv8 the mb macro is defined as dsb(sy) - Given that the etm4x is only used on armv8 let's directly use dsb(sy) instead of mb(). This removes some ambiguity and makes it easier to correlate the code with the TRM.
Signed-off-by: Andrew Murray andrew.murray@arm.com
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
[Fixed capital letter for "use" in title]
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
Link: https://lore.kernel.org/r/20190829202842.580-11-mathieu.poirier@linaro.org
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/hwtracing/coresight/coresight-etm4x.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -174,6 +174,12 @@ static void etm4_enable_hw(void *info)
 	if (coresight_timeout(drvdata->base, TRCSTATR, TRCSTATR_IDLE_BIT, 0))
 		dev_err(drvdata->dev,
 			"timeout while waiting for Idle Trace Status\n");
+	/*
+	 * As recommended by section 4.3.7 ("Synchronization when using the
+	 * memory-mapped interface") of ARM IHI 0064D
+	 */
+	dsb(sy);
+	isb();

 	CS_LOCK(drvdata->base);

@@ -324,8 +330,12 @@ static void etm4_disable_hw(void *info)
 	/* EN, bit[0] Trace unit enable bit */
 	control &= ~0x1;

-	/* make sure everything completes before disabling */
-	mb();
+	/*
+	 * Make sure everything completes before disabling, as recommended
+	 * by section 7.3.77 ("TRCVICTLR, ViewInst Main Control Register,
+	 * SSTATUS") of ARM IHI 0064D
+	 */
+	dsb(sy);
 	isb();
 	writel_relaxed(control, drvdata->base + TRCPRGCTLR);
From: Gao Xiang gaoxiang25@huawei.com
commit acb383f1dcb4f1e79b66d4be3a0b6f519a957b0d upstream.
Richard observed an endless loop in erofs_read_raw_page() [1], which can be triggered by forcibly setting ->u.i_blkaddr to 0xdeadbeef (as I understand it, the block layer handles accesses beyond the end of the device correctly).
After digging into that, the problem turned out to be closely related to directories, and the root cause is improper error handling in erofs_readdir().
Let's fix it now.
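The change in control flow can be shown with a small userspace model. It is a sketch only: `walk_dir_blocks()`, the `read_block_fn` callback type and the sample reader are hypothetical, and the model reduces each directory block read to a function that returns 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical block reader: 0 on success, negative errno on failure. */
typedef int (*read_block_fn)(unsigned int block);

/* Sample reader that fails on block 2, standing in for a corrupted image. */
static int fail_at_block_two(unsigned int block)
{
	return block == 2 ? -EIO : 0;
}

/*
 * Walk the directory blocks and stop at the first read error, returning it
 * to the caller. The old code skipped failed blocks with 'continue', so the
 * error never reached the caller.
 */
static int walk_dir_blocks(unsigned int nblocks, read_block_fn read_block,
			   unsigned int *visited)
{
	unsigned int i;
	int err = 0;

	assert(read_block && visited);
	*visited = 0;

	for (i = 0; i < nblocks; i++) {
		err = read_block(i);
		if (err)
			break;
		(*visited)++;
	}

	return err;
}
```

Here `visited` only counts blocks that were read successfully before the failure, so a caller can tell how far the walk got.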
[1] https://lore.kernel.org/r/1163995781.68824.1566084358245.JavaMail.zimbra@nod...
Reported-by: Richard Weinberger richard@nod.at
Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations")
Cc: stable@vger.kernel.org # 4.19+
Reviewed-by: Chao Yu yuchao0@huawei.com
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Link: https://lore.kernel.org/r/20190818125457.25906-1-hsiangkao@aol.com
[ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED, let's use original error code instead. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/dir.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/drivers/staging/erofs/dir.c
+++ b/drivers/staging/erofs/dir.c
@@ -100,8 +100,15 @@ static int erofs_readdir(struct file *f,
 		unsigned nameoff, maxsize;

 		dentry_page = read_mapping_page(mapping, i, NULL);
-		if (IS_ERR(dentry_page))
-			continue;
+		if (dentry_page == ERR_PTR(-ENOMEM)) {
+			err = -ENOMEM;
+			break;
+		} else if (IS_ERR(dentry_page)) {
+			errln("fail to readdir of logical block %u of nid %llu",
+			      i, EROFS_V(dir)->nid);
+			err = PTR_ERR(dentry_page);
+			break;
+		}

 		lock_page(dentry_page);
 		de = (struct erofs_dirent *)kmap(dentry_page);
From: Gao Xiang gaoxiang25@huawei.com
commit ee45197c807895e156b2be0abcaebdfc116487c8 upstream.
As reported by the erofs-utils fuzzer, a logical page can belong to at most 2 compressed clusters. If one compressed cluster is corrupted but the other is already in the submission chain, the chain still needs to be submitted in order to keep the page working properly (page unlocked with PG_error set, PG_uptodate not set).
Let's fix it now.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Reviewed-by: Chao Yu yuchao0@huawei.com
Link: https://lore.kernel.org/r/20190819103426.87579-2-gaoxiang25@huawei.com
[ Gao Xiang: Manually backport to v4.19.y stable. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/unzip_vle.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -1335,19 +1335,18 @@ static int z_erofs_vle_normalaccess_read
 	err = z_erofs_do_read_page(&f, page, &pagepool);
 	(void)z_erofs_vle_work_iter_end(&f.builder);

-	if (err) {
+	/* if some compressed cluster ready, need submit them anyway */
+	z_erofs_submit_and_unzip(&f, &pagepool, true);
+
+	if (err)
 		errln("%s, failed to read, err [%d]", __func__, err);
-		goto out;
-	}

-	z_erofs_submit_and_unzip(&f, &pagepool, true);
-out:
 	if (f.m_iter.mpage != NULL)
 		put_page(f.m_iter.mpage);

 	/* clean up the remaining free pages */
 	put_pages_list(&pagepool);
-	return 0;
+	return err;
 }

 static inline int __z_erofs_vle_normalaccess_readpages(
From: Gao Xiang gaoxiang25@huawei.com
commit 138e1a0990e80db486ab9f6c06bd5c01f9a97999 upstream.
As reported by the erofs-utils fuzzer, these error handling paths are entered when handling corrupted images.
The missing erofs_workgroup_put() calls cause unmounting to fail.
Fix these return values to EFSCORRUPTED as well.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Reviewed-by: Chao Yu yuchao0@huawei.com
Link: https://lore.kernel.org/r/20190819103426.87579-4-gaoxiang25@huawei.com
[ Gao Xiang: Older kernel versions don't have length validity check and EFSCORRUPTED, thus backport pageofs check for now. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/unzip_vle.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -311,7 +311,11 @@ z_erofs_vle_work_lookup(struct super_blo
 	/* if multiref is disabled, `primary' is always true */
 	primary = true;

-	DBG_BUGON(work->pageofs != pageofs);
+	if (work->pageofs != pageofs) {
+		DBG_BUGON(1);
+		erofs_workgroup_put(egrp);
+		return ERR_PTR(-EIO);
+	}

 	/*
 	 * lock must be taken first to avoid grp->next == NIL between

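For illustration only (not part of the patch): the hunk above turns a debug-only assertion into a real error path, and that path has to drop the reference taken by the lookup. A minimal userspace sketch of this pattern follows. `struct group`, `group_get()`, `group_put()` and `lookup()` are hypothetical names, and NULL stands in for the kernel's ERR_PTR(-EIO).

```c
#include <stddef.h>

/* Hypothetical refcounted object, standing in for a workgroup. */
struct group {
	int refcount;
	unsigned int pageofs;
};

static struct group *group_get(struct group *g)
{
	g->refcount++;
	return g;
}

static void group_put(struct group *g)
{
	g->refcount--;
}

/*
 * Look up a group and validate it against the expected page offset.
 * Returns the group with an extra reference held, or NULL when the
 * on-disk value does not match.
 */
static struct group *lookup(struct group *g, unsigned int pageofs)
{
	struct group *found = group_get(g);

	if (found->pageofs != pageofs) {
		/* Drop the reference taken above before bailing out. */
		group_put(found);
		return NULL;
	}
	return found;
}
```

Without the put on the failure path, every mismatch would leak one reference, which is the kind of imbalance the commit message describes as preventing a clean unmount.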
From: Gao Xiang gaoxiang25@huawei.com
commit e12a0ce2fa69798194f3a8628baf6edfbd5c548f upstream.
As reported by the erofs-utils fuzzer, multiref (on-disk deduplication) is not supported yet, so we should forbid it properly.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Reviewed-by: Chao Yu yuchao0@huawei.com
Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com
[ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED,
  let's use EIO instead. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/unzip_vle.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -857,6 +857,7 @@ repeat:
 	for (i = 0; i < nr_pages; ++i)
 		pages[i] = NULL;

+	err = 0;
 	z_erofs_pagevec_ctor_init(&ctor,
 		Z_EROFS_VLE_INLINE_PAGEVECS, work->pagevec, 0);

@@ -878,8 +879,17 @@ repeat:
 			pagenr = z_erofs_onlinepage_index(page);

 		DBG_BUGON(pagenr >= nr_pages);
-		DBG_BUGON(pages[pagenr]);

+		/*
+		 * currently EROFS doesn't support multiref(dedup),
+		 * so here erroring out one multiref page.
+		 */
+		if (pages[pagenr]) {
+			DBG_BUGON(1);
+			SetPageError(pages[pagenr]);
+			z_erofs_onlinepage_endio(pages[pagenr]);
+			err = -EIO;
+		}
 		pages[pagenr] = page;
 	}
 	sparsemem_pages = i;
@@ -889,7 +899,6 @@ repeat:
 	overlapped = false;
 	compressed_pages = grp->compressed_pages;

-	err = 0;
 	for (i = 0; i < clusterpages; ++i) {
 		unsigned pagenr;

@@ -915,7 +924,12 @@ repeat:
 		pagenr = z_erofs_onlinepage_index(page);

 		DBG_BUGON(pagenr >= nr_pages);
-		DBG_BUGON(pages[pagenr]);
+		if (pages[pagenr]) {
+			DBG_BUGON(1);
+			SetPageError(pages[pagenr]);
+			z_erofs_onlinepage_endio(pages[pagenr]);
+			err = -EIO;
+		}
 		++sparsemem_pages;
 		pages[pagenr] = page;

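For illustration only (not part of the patch): the multiref check above boils down to "a slot in the page array must not be claimed twice". A minimal userspace sketch, assuming integer handles instead of `struct page` pointers; `place_pages()` is a hypothetical name, and the out-of-range check is an addition of this sketch (the patch keeps that case as a DBG_BUGON).

```c
#include <errno.h>
#include <stddef.h>

/*
 * Place entry i into slots[index[i]]. A slot that is already occupied
 * means two entries map to the same output page (multiref), which is
 * reported as -EIO. The later entry still replaces the earlier one,
 * mirroring the order of operations in the hunk above.
 */
static int place_pages(const unsigned int *index, size_t n,
		       int *slots, size_t nr_slots)
{
	int err = 0;

	for (size_t i = 0; i < nr_slots; i++)
		slots[i] = -1;			/* -1 marks an empty slot */

	for (size_t i = 0; i < n; i++) {
		unsigned int nr = index[i];

		if (nr >= nr_slots) {		/* sketch-only bounds check */
			err = -EIO;
			continue;
		}
		if (slots[nr] != -1)
			err = -EIO;		/* duplicate: flag the error */
		slots[nr] = (int)i;
	}
	return err;
}
```

Note that `err` is initialised once before the first loop and carried through; this is why the patch moves `err = 0;` ahead of the first page loop instead of resetting it before the second one.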
From: Johannes Berg johannes.berg@intel.com
commit 0f3b07f027f87a38ebe5c436490095df762819be upstream.
Rather than always iterating elements from frames with pure u8 pointers, add a type "struct element" that encapsulates the id/datalen/data format of them.
Then, add the element iteration macros
 * for_each_element
 * for_each_element_id
 * for_each_element_extid
which take, as their first 'argument', such a structure and iterate through a given u8 array interpreting it as elements.
While at it and since we'll need it, also add
 * for_each_subelement
 * for_each_subelement_id
 * for_each_subelement_extid
which instead of taking data/length just take an outer element and use its data/datalen.
Also add for_each_element_completed() to determine if any of the loops above completed, i.e. it was able to parse all of the elements successfully and no data remained.
Use for_each_element_id() in cfg80211_find_ie_match() as the first user of this.
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/ieee80211.h | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 net/wireless/scan.c       | 14 +++++-------
 2 files changed, 59 insertions(+), 8 deletions(-)

--- a/include/linux/ieee80211.h
+++ b/include/linux/ieee80211.h
@@ -3185,4 +3185,57 @@ static inline bool ieee80211_action_cont
 	return true;
 }

+struct element {
+	u8 id;
+	u8 datalen;
+	u8 data[];
+};
+
+/* element iteration helpers */
+#define for_each_element(element, _data, _datalen)			\
+	for (element = (void *)(_data);					\
+	     (u8 *)(_data) + (_datalen) - (u8 *)element >=		\
+		sizeof(*element) &&					\
+	     (u8 *)(_data) + (_datalen) - (u8 *)element >=		\
+		sizeof(*element) + element->datalen;			\
+	     element = (void *)(element->data + element->datalen))
+
+#define for_each_element_id(element, _id, data, datalen)		\
+	for_each_element(element, data, datalen)			\
+		if (element->id == (_id))
+
+#define for_each_element_extid(element, extid, data, datalen)		\
+	for_each_element(element, data, datalen)			\
+		if (element->id == WLAN_EID_EXTENSION &&		\
+		    element->datalen > 0 &&				\
+		    element->data[0] == (extid))
+
+#define for_each_subelement(sub, element)				\
+	for_each_element(sub, (element)->data, (element)->datalen)
+
+#define for_each_subelement_id(sub, id, element)			\
+	for_each_element_id(sub, id, (element)->data, (element)->datalen)
+
+#define for_each_subelement_extid(sub, extid, element)			\
+	for_each_element_extid(sub, extid, (element)->data, (element)->datalen)
+
+/**
+ * for_each_element_completed - determine if element parsing consumed all data
+ * @element: element pointer after for_each_element() or friends
+ * @data: same data pointer as passed to for_each_element() or friends
+ * @datalen: same data length as passed to for_each_element() or friends
+ *
+ * This function returns %true if all the data was parsed or considered
+ * while walking the elements. Only use this if your for_each_element()
+ * loop cannot be broken out of, otherwise it always returns %false.
+ *
+ * If some data was malformed, this returns %false since the last parsed
+ * element will not fill the whole remaining data.
+ */
+static inline bool for_each_element_completed(const struct element *element,
+					      const void *data, size_t datalen)
+{
+	return (u8 *)element == (u8 *)data + datalen;
+}
+
 #endif /* LINUX_IEEE80211_H */
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -484,6 +484,8 @@ const u8 *cfg80211_find_ie_match(u8 eid,
 				 const u8 *match, int match_len,
 				 int match_offset)
 {
+	const struct element *elem;
+
 	/* match_offset can't be smaller than 2, unless match_len is
 	 * zero, in which case match_offset must be zero as well.
 	 */
@@ -491,14 +493,10 @@ const u8 *cfg80211_find_ie_match(u8 eid,
 		    (!match_len && match_offset)))
 		return NULL;

-	while (len >= 2 && len >= ies[1] + 2) {
-		if ((ies[0] == eid) &&
-		    (ies[1] + 2 >= match_offset + match_len) &&
-		    !memcmp(ies + match_offset, match, match_len))
-			return ies;
-
-		len -= ies[1] + 2;
-		ies += ies[1] + 2;
+	for_each_element_id(elem, eid, ies, len) {
+		if (elem->datalen >= match_offset - 2 + match_len &&
+		    !memcmp(elem->data + match_offset - 2, match, match_len))
+			return (void *)elem;
 	}

 	return NULL;

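For illustration only (not part of the patch): a userspace adaptation of the iteration macro, to show how a caller walks an id/datalen/data buffer and checks that it was consumed completely. `uint8_t` replaces the kernel's `u8`, the const-qualified casts and the `(ptrdiff_t)` casts are choices made for this sketch, and `find_element()` and `elements_well_formed()` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

struct element {
	uint8_t id;
	uint8_t datalen;
	uint8_t data[];
};

/* Userspace adaptation of the kernel macro; same loop conditions. */
#define for_each_element(_elem, _data, _datalen)				\
	for (_elem = (const struct element *)(_data);				\
	     (const uint8_t *)(_data) + (_datalen) - (const uint8_t *)_elem >=	\
		(ptrdiff_t)sizeof(*_elem) &&					\
	     (const uint8_t *)(_data) + (_datalen) - (const uint8_t *)_elem >=	\
		(ptrdiff_t)sizeof(*_elem) + _elem->datalen;			\
	     _elem = (const struct element *)(_elem->data + _elem->datalen))

/* Return the first element with the given id, or NULL if none fits. */
static const struct element *find_element(uint8_t id, const uint8_t *buf,
					  size_t len)
{
	const struct element *elem;

	for_each_element(elem, buf, len) {
		if (elem->id == id)
			return elem;
	}
	return NULL;
}

/* Return 1 if the buffer consists of whole elements only, else 0. */
static int elements_well_formed(const uint8_t *buf, size_t len)
{
	const struct element *elem;

	for_each_element(elem, buf, len) {
		/* nothing: just walk to the end */
	}
	/* same test as for_each_element_completed() */
	return (const uint8_t *)elem == buf + len;
}
```

The loop stops as soon as the remaining bytes cannot hold a two-byte header plus the advertised payload, so a truncated trailing element is never dereferenced beyond its header, and the final pointer comparison tells the caller whether anything was left over.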
From: Jouni Malinen j@w1.fi
commit 7388afe09143210f555bdd6c75035e9acc1fab96 upstream.
Enforce the first argument to be a correct type of a pointer to struct element and avoid unnecessary typecasts from const to non-const pointers (the change in validate_ie_attr() is needed to make this part work). In addition, avoid signed/unsigned comparison within for_each_element() and mark struct element packed just in case.
Signed-off-by: Jouni Malinen j@w1.fi
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/ieee80211.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/include/linux/ieee80211.h
+++ b/include/linux/ieee80211.h
@@ -3189,16 +3189,16 @@ struct element {
 	u8 id;
 	u8 datalen;
 	u8 data[];
-};
+} __packed;

 /* element iteration helpers */
-#define for_each_element(element, _data, _datalen)			\
-	for (element = (void *)(_data);					\
-	     (u8 *)(_data) + (_datalen) - (u8 *)element >=		\
-		sizeof(*element) &&					\
-	     (u8 *)(_data) + (_datalen) - (u8 *)element >=		\
-		sizeof(*element) + element->datalen;			\
-	     element = (void *)(element->data + element->datalen))
+#define for_each_element(_elem, _data, _datalen)			\
+	for (_elem = (const struct element *)(_data);			\
+	     (const u8 *)(_data) + (_datalen) - (const u8 *)_elem >=	\
+		(int)sizeof(*_elem) &&					\
+	     (const u8 *)(_data) + (_datalen) - (const u8 *)_elem >=	\
+		(int)sizeof(*_elem) + _elem->datalen;			\
+	     _elem = (const struct element *)(_elem->data + _elem->datalen))

 #define for_each_element_id(element, _id, data, datalen)		\
 	for_each_element(element, data, datalen)			\
@@ -3235,7 +3235,7 @@ struct element {
 static inline bool for_each_element_completed(const struct element *element,
 					      const void *data, size_t datalen)
 {
-	return (u8 *)element == (u8 *)data + datalen;
+	return (const u8 *)element == (const u8 *)data + datalen;
 }

 #endif /* LINUX_IEEE80211_H */

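For illustration only (not part of the patch): why comparing a pointer difference against `sizeof` deserves the cast. A pointer difference is a signed `ptrdiff_t`, while `sizeof` yields an unsigned `size_t`; on common platforms the signed operand is converted to unsigned for the comparison. The sketch below spells that conversion out with an explicit cast so the behaviour does not depend on the platform. Both function names are hypothetical.

```c
#include <stddef.h>

/*
 * Mixed comparison: the (size_t) cast makes explicit what the usual
 * arithmetic conversions typically do implicitly when ptrdiff_t and
 * size_t have the same rank. A negative value wraps to a huge one.
 */
static int has_room_mixed(ptrdiff_t remaining, size_t need)
{
	return (size_t)remaining >= need;
}

/*
 * Signed comparison: bring the size to the signed side instead, in the
 * spirit of the (int)sizeof(...) casts added by the patch.
 */
static int has_room_signed(ptrdiff_t remaining, size_t need)
{
	return remaining >= (ptrdiff_t)need;
}
```

For example, `has_room_mixed(-1, 2)` reports room although nothing remains, whereas `has_room_signed(-1, 2)` does not. Within the macro the difference should not go negative as long as the loop invariant holds, so the cast is mainly about keeping the comparison well-defined and free of sign-compare warnings.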
From: Johannes Berg johannes.berg@intel.com
commit f88eb7c0d002a67ef31aeb7850b42ff69abc46dc upstream.
We currently don't validate the beacon head, i.e. the header, fixed part and elements that are to go in front of the TIM element. This means that the variable elements there can be malformed, e.g. have a length exceeding the buffer size, but most downstream code from this assumes that this has already been checked.
Add the necessary checks to the netlink policy.
Cc: stable@vger.kernel.org
Fixes: ed1b6cc7f80f ("cfg80211/nl80211: add beacon settings")
Link: https://lore.kernel.org/r/1569009255-I7ac7fbe9436e9d8733439eab8acbbd35e55c74...
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/wireless/nl80211.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -200,6 +200,38 @@ cfg80211_get_dev_from_info(struct net *n
 	return __cfg80211_rdev_from_attrs(netns, info->attrs);
 }

+static int validate_beacon_head(const struct nlattr *attr,
+				struct netlink_ext_ack *extack)
+{
+	const u8 *data = nla_data(attr);
+	unsigned int len = nla_len(attr);
+	const struct element *elem;
+	const struct ieee80211_mgmt *mgmt = (void *)data;
+	unsigned int fixedlen = offsetof(struct ieee80211_mgmt,
+					 u.beacon.variable);
+
+	if (len < fixedlen)
+		goto err;
+
+	if (ieee80211_hdrlen(mgmt->frame_control) !=
+	    offsetof(struct ieee80211_mgmt, u.beacon))
+		goto err;
+
+	data += fixedlen;
+	len -= fixedlen;
+
+	for_each_element(elem, data, len) {
+		/* nothing */
+	}
+
+	if (for_each_element_completed(elem, data, len))
+		return 0;
+
+err:
+	NL_SET_ERR_MSG_ATTR(extack, attr, "malformed beacon head");
+	return -EINVAL;
+}
+
 /* policy for the attributes */
 static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = {
 	[NL80211_ATTR_WIPHY] = { .type = NLA_U32 },
@@ -4016,6 +4048,12 @@ static int nl80211_parse_beacon(struct n
 	memset(bcn, 0, sizeof(*bcn));

 	if (attrs[NL80211_ATTR_BEACON_HEAD]) {
+		int ret = validate_beacon_head(attrs[NL80211_ATTR_BEACON_HEAD],
+					       NULL);
+
+		if (ret)
+			return ret;
+
 		bcn->head = nla_data(attrs[NL80211_ATTR_BEACON_HEAD]);
 		bcn->head_len = nla_len(attrs[NL80211_ATTR_BEACON_HEAD]);
 		if (!bcn->head_len)

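For illustration only (not part of the patch): the shape of the validation above, reduced to userspace C. `validate_head()` is a hypothetical name, the fixed-part length is passed in as a parameter instead of being derived from `struct ieee80211_mgmt`, the frame-control header-length check is omitted, and the element walk is open-coded so the sketch does not depend on the kernel macros.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Accept a buffer only if it contains at least `fixedlen` bytes of
 * fixed header, followed by zero or more complete id/len/data
 * elements with no trailing bytes.
 */
static int validate_head(const uint8_t *buf, size_t len, size_t fixedlen)
{
	if (len < fixedlen)
		return -EINVAL;

	buf += fixedlen;
	len -= fixedlen;

	/* Walk elements while a 2-byte header plus its payload still fits. */
	while (len >= 2 && len >= (size_t)buf[1] + 2) {
		size_t step = (size_t)buf[1] + 2;

		len -= step;
		buf += step;
	}

	/* Leftover bytes mean a truncated or overlong element. */
	return len == 0 ? 0 : -EINVAL;
}
```

Rejecting the attribute at parse time is what lets later consumers of the beacon head assume that every element length stays inside the buffer.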
stable-rc/linux-4.19.y boot: 117 boots: 0 failed, 107 passed with 10 offline (v4.19.78-115-g4d84b0bb68d4)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.19.y/kernel/v4.19...
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.19.y/kernel/v4.19.78-115...

Tree: stable-rc
Branch: linux-4.19.y
Git Describe: v4.19.78-115-g4d84b0bb68d4
Git Commit: 4d84b0bb68d49edd179af2b16d4b912c1568a182
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 70 unique boards, 22 SoC families, 16 builds out of 206

Offline Platforms:
arm:
    qcom_defconfig:
        gcc-8
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab

    davinci_all_defconfig:
        gcc-8
            dm365evm,legacy: 1 offline lab

    sunxi_defconfig:
        gcc-8
            sun5i-r8-chip: 1 offline lab
            sun7i-a20-bananapi: 1 offline lab

    multi_v7_defconfig:
        gcc-8
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab
            sun5i-r8-chip: 1 offline lab
            sun7i-a20-bananapi: 1 offline lab

arm64:

    defconfig:
        gcc-8
            apq8016-sbc: 1 offline lab

---
For more info write to info@kernelci.org

On Thu, 10 Oct 2019 at 14:16, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
> This is the start of the stable review cycle for the 4.19.79 release. There are 114 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.79-rc1...
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------

kernel: 4.19.79-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: 4d84b0bb68d49edd179af2b16d4b912c1568a182
git describe: v4.19.78-115-g4d84b0bb68d4
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.78-11...

No regressions (compared to build v4.19.78)
No fixes (compared to build v4.19.78)
Ran 23612 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* ltp-fs-tests
* network-basic-tests
* ltp-open-posix-tests
* kvm-unit-tests
* ssuite
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

On 10/10/19 1:35 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.79 release. There are 114 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.

Build results:
        total: 156 pass: 156 fail: 0
Qemu test results:
        total: 390 pass: 390 fail: 0

Guenter
On Thu, Oct 10, 2019 at 10:35:07AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.79 release. There are 114 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.79-rc1...
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Compiled, booted, and no regressions found on my x86_64 system.
Thanks,
Didik Setiawan

On 10/10/19 2:35 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.79 release. There are 114 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.79-rc1...
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah

On 10/10/2019 09:35, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.79 release. There are 114 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.79-rc1...
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

All tests passing for Tegra ...
Test results for stable-v4.19:
    12 builds:  12 pass, 0 fail
    22 boots:   22 pass, 0 fail
    32 tests:   32 pass, 0 fail

Linux version:  4.19.79-rc1-g4d84b0bb68d4
Boards tested:  tegra124-jetson-tk1, tegra186-p2771-0000,
                tegra194-p2972-0000, tegra20-ventana,
                tegra210-p2371-2180, tegra30-cardhu-a04

Cheers
Jon