This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.9.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.9.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.9.3-rc1
Pali Rohár pali@kernel.org phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Ricky Wu ricky_wu@realtek.com misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()
Pavel Begunkov asml.silence@gmail.com io_uring: don't reuse linked_timeout
Souptick Joarder jrdr.linux@gmail.com xen/gntdev.c: Mark pages as dirty
Jens Axboe axboe@kernel.dk mm: mark async iocb read as NOWAIT once some data has been copied
Geert Uytterhoeven geert+renesas@glider.be ata: sata_rcar: Fix DMA boundary mask
Grygorii Strashko grygorii.strashko@ti.com PM: runtime: Fix timer_expires data type on 32-bit arches
Peter Zijlstra peterz@infradead.org serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt
Paras Sharma parashar@codeaurora.org serial: qcom_geni_serial: To correct QUP Version detection logic
Chris Wilson chris@chris-wilson.co.uk drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex
Gustavo A. R. Silva gustavo@embeddedor.com mtd: lpddr: Fix bad logic in print_drs_error
Jason Gunthorpe jgg@ziepe.ca RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()
Frederic Barrat fbarrat@linux.ibm.com cxl: Rework error message for incompatible slots
Jia-Ju Bai baijiaju@tsinghua.edu.cn p54: avoid accessing the data mapped to streaming DMA
Roberto Sassu roberto.sassu@huawei.com evm: Check size of security.evm before using it
Song Liu songliubraving@fb.com bpf: Fix comment for helper bpf_current_task_under_cgroup()
Miklos Szeredi mszeredi@redhat.com fuse: fix page dereference after free
Pali Rohár pali@kernel.org ata: ahci: mvebu: Make SATA PHY optional for Armada 3720
Pali Rohár pali@kernel.org PCI: aardvark: Fix initialization with old Marvell's Arm Trusted Firmware
Juergen Gross jgross@suse.com x86/xen: disable Firmware First mode for correctable memory errors
Thomas Gleixner tglx@linutronix.de x86/traps: Fix #DE Oops message regression
Kim Phillips kim.phillips@amd.com arch/x86/amd/ibs: Fix re-arming IBS Fetch
Gao Xiang hsiangkao@redhat.com erofs: avoid duplicated permission check for "trusted." xattrs
Leon Romanovsky leon@kernel.org net: protect tcf_block_unbind with block lock
Karsten Graul kgraul@linux.ibm.com net/smc: fix suppressed return code
Karsten Graul kgraul@linux.ibm.com net/smc: fix invalid return code in smcd_new_buf_create()
Tung Nguyen tung.q.nguyen@dektech.com.au tipc: fix memory leak caused by tipc_buf_append()
Arjun Roy arjunroy@google.com tcp: Prevent low rmem stalls with SO_RCVLOWAT.
Andrew Gabbasov andrew_gabbasov@mentor.com ravb: Fix bit fields checking in ravb_hwtstamp_get()
Heiner Kallweit hkallweit1@gmail.com r8169: fix issue with forced threading in combination with shared interrupts
Guillaume Nault gnault@redhat.com net/sched: act_mpls: Add softdep on mpls_gso.ko
Alex Elder elder@linaro.org net: ipa: command payloads already mapped
Zenghui Yu yuzenghui@huawei.com net: hns3: Clear the CMDQ registers before unmapping BAR region
Aleksandr Nogikh nogikh@google.com netem: fix zero division in tabledist
Amit Cohen amcohen@nvidia.com mlxsw: Only advertise link modes supported by both driver and device
Ido Schimmel idosch@nvidia.com mlxsw: core: Fix memory leak on module removal
Lijun Pan ljp@linux.ibm.com ibmvnic: fix ibmvnic_set_mac
Thomas Bogendoerfer tbogendoerfer@suse.de ibmveth: Fix use of ibmveth in a bridge.
Masahiro Fujiwara fujiwara.masahiro@gmail.com gtp: fix an use-before-init in gtp_newlink()
Raju Rangoju rajur@chelsio.com cxgb4: set up filter action after rewrites
Vinay Kumar Yadav vinay.yadav@chelsio.com chelsio/chtls: fix tls record info to user
Vinay Kumar Yadav vinay.yadav@chelsio.com chelsio/chtls: fix memory leaks in CPL handlers
Vinay Kumar Yadav vinay.yadav@chelsio.com chelsio/chtls: fix deadlock issue
Vasundhara Volam vasundhara-v.volam@broadcom.com bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
Vasundhara Volam vasundhara-v.volam@broadcom.com bnxt_en: Re-write PCI BARs after PCI fatal error.
Vasundhara Volam vasundhara-v.volam@broadcom.com bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
Vasundhara Volam vasundhara-v.volam@broadcom.com bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
Michael Chan michael.chan@broadcom.com bnxt_en: Check abort error state in bnxt_open_nic().
Michael Schaller misch@google.com efivarfs: Replace invalid slashes with exclamation marks in dentries.
Dan Williams dan.j.williams@intel.com x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
Dan Williams dan.j.williams@intel.com x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
Randy Dunlap rdunlap@infradead.org x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
Nick Desaulniers ndesaulniers@google.com arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
Marc Zyngier maz@kernel.org arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs
Marc Zyngier maz@kernel.org arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs
Kees Cook keescook@chromium.org fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum
Ard Biesheuvel ardb@kernel.org efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure
Rasmus Villemoes linux@rasmusvillemoes.dk scripts/setlocalversion: make git describe output more reliable
Matthew Wilcox (Oracle) willy@infradead.org io_uring: Convert advanced XArray uses to the normal API
Matthew Wilcox (Oracle) willy@infradead.org io_uring: Fix XArray usage in io_uring_add_task_file
Matthew Wilcox (Oracle) willy@infradead.org io_uring: Fix use of XArray in __io_uring_files_cancel
Jens Axboe axboe@kernel.dk io_uring: no need to call xa_destroy() on empty xarray
Hillf Danton hdanton@sina.com io-wq: fix use-after-free in io_wq_worker_running
Sebastian Andrzej Siewior bigeasy@linutronix.de io_wq: Make io_wqe::lock a raw_spinlock_t
Jens Axboe axboe@kernel.dk io_uring: reference ->nsproxy for file table commands
Jens Axboe axboe@kernel.dk io_uring: don't rely on weak ->files references
Jens Axboe axboe@kernel.dk io_uring: enable task/files specific overflow flushing
Jens Axboe axboe@kernel.dk io_uring: return cancelation status from poll/timeout/files handlers
Jens Axboe axboe@kernel.dk io_uring: unconditionally grab req->task
Jens Axboe axboe@kernel.dk io_uring: stash ctx task reference for SQPOLL
Jens Axboe axboe@kernel.dk io_uring: move dropping of files into separate helper
Jens Axboe axboe@kernel.dk io_uring: allow timeout/poll/files killing to take task into account
Saeed Mirzamohammadi saeed.mirzamohammadi@oracle.com netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create
Viresh Kumar viresh.kumar@linaro.org cpufreq: Improve code around unlisted freq check
-------------
Diffstat:
Makefile | 4 +- arch/arm64/Makefile | 4 +- arch/arm64/kernel/cpu_errata.c | 15 + arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/string.h | 2 - arch/powerpc/include/asm/uaccess.h | 40 +- arch/powerpc/lib/Makefile | 2 +- .../lib/{memcpy_mcsafe_64.S => copy_mc_64.S} | 4 +- arch/x86/Kconfig | 2 +- arch/x86/Kconfig.debug | 2 +- arch/x86/events/amd/ibs.c | 15 +- arch/x86/include/asm/copy_mc_test.h | 75 +++ arch/x86/include/asm/mce.h | 9 + arch/x86/include/asm/mcsafe_test.h | 75 --- arch/x86/include/asm/string_64.h | 32 -- arch/x86/include/asm/uaccess.h | 9 + arch/x86/include/asm/uaccess_64.h | 20 - arch/x86/kernel/cpu/mce/core.c | 8 +- arch/x86/kernel/quirks.c | 10 +- arch/x86/kernel/traps.c | 2 +- arch/x86/lib/Makefile | 1 + arch/x86/lib/copy_mc.c | 96 ++++ arch/x86/lib/copy_mc_64.S | 163 +++++++ arch/x86/lib/memcpy_64.S | 115 ----- arch/x86/lib/usercopy_64.c | 21 - arch/x86/pci/intel_mid_pci.c | 1 + arch/x86/xen/enlighten_pv.c | 9 + drivers/ata/ahci.h | 2 + drivers/ata/ahci_mvebu.c | 2 +- drivers/ata/libahci_platform.c | 2 +- drivers/ata/sata_rcar.c | 2 +- drivers/base/firmware_loader/fallback_platform.c | 2 +- drivers/cpufreq/cpufreq.c | 15 +- drivers/crypto/chelsio/chtls/chtls_cm.c | 29 +- drivers/crypto/chelsio/chtls/chtls_io.c | 7 +- drivers/firmware/efi/libstub/arm64-stub.c | 8 +- drivers/firmware/efi/libstub/fdt.c | 4 +- drivers/gpu/drm/i915/i915_debugfs.c | 2 + drivers/infiniband/core/addr.c | 11 +- drivers/md/dm-writecache.c | 15 +- drivers/misc/cardreader/rtsx_pcr.c | 4 - drivers/misc/cxl/pci.c | 4 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 49 +- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 + drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 56 ++- drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h | 4 + .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +- drivers/net/ethernet/ibm/ibmveth.c | 6 - drivers/net/ethernet/ibm/ibmvnic.c | 8 +- drivers/net/ethernet/mellanox/mlxsw/core.c | 2 + drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 9 +- drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 + .../net/ethernet/mellanox/mlxsw/spectrum_ethtool.c | 30 ++ drivers/net/ethernet/realtek/r8169_main.c | 4 +- drivers/net/ethernet/renesas/ravb_main.c | 10 +- drivers/net/gtp.c | 16 +- drivers/net/ipa/gsi_trans.c | 21 +- drivers/net/wireless/intersil/p54/p54pci.c | 4 +- drivers/nvdimm/claim.c | 2 +- drivers/nvdimm/pmem.c | 6 +- drivers/pci/controller/pci-aardvark.c | 4 +- drivers/phy/marvell/phy-mvebu-a3700-comphy.c | 14 +- drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 14 +- drivers/tty/serial/amba-pl011.c | 11 +- drivers/tty/serial/qcom_geni_serial.c | 2 +- drivers/xen/gntdev.c | 17 +- fs/efivarfs/super.c | 3 + fs/erofs/xattr.c | 2 - fs/exec.c | 6 + fs/file.c | 2 + fs/fuse/dev.c | 28 +- fs/io-wq.c | 172 +++---- fs/io-wq.h | 1 + fs/io_uring.c | 502 ++++++++++++++++----- include/linux/fs.h | 1 - include/linux/io_uring.h | 53 +++ include/linux/mtd/pfow.h | 2 +- include/linux/pm.h | 2 +- include/linux/qcom-geni-se.h | 3 + include/linux/sched.h | 5 + include/linux/string.h | 9 +- include/linux/uaccess.h | 13 + include/linux/uio.h | 10 +- include/net/netfilter/nf_tables.h | 6 + include/uapi/linux/bpf.h | 4 +- init/init_task.c | 3 + kernel/fork.c | 6 + lib/Kconfig | 7 +- lib/iov_iter.c | 48 +- mm/filemap.c | 8 + net/ipv4/tcp.c | 2 + net/ipv4/tcp_input.c | 3 +- net/netfilter/nf_tables_api.c | 6 +- net/netfilter/nf_tables_offload.c | 4 +- net/sched/act_mpls.c | 1 + net/sched/cls_api.c | 4 +- net/sched/sch_netem.c | 9 +- net/smc/smc_core.c | 6 +- net/tipc/msg.c | 5 +- scripts/setlocalversion | 21 +- security/integrity/evm/evm_main.c | 6 + tools/arch/x86/include/asm/mcsafe_test.h | 13 - tools/arch/x86/lib/memcpy_64.S | 115 ----- tools/include/uapi/linux/bpf.h | 4 +- tools/objtool/check.c | 5 +- tools/perf/bench/Build | 1 - tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 - tools/testing/nvdimm/test/nfit.c | 49 +- .../testing/selftests/powerpc/copyloops/.gitignore | 2 +- tools/testing/selftests/powerpc/copyloops/Makefile | 6 +- .../selftests/powerpc/copyloops/copy_mc_64.S | 242 ++++++++++ 111 files changed, 1638 insertions(+), 926 deletions(-)
From: Viresh Kumar viresh.kumar@linaro.org
commit 97148d0ae5303bcc18fcd1c9b968a9485292f32a upstream.
The cpufreq core checks if the frequency programmed by the bootloaders is not listed in the freq table and programs one from the table in such a case. This is done only if the driver has set the CPUFREQ_NEED_INITIAL_FREQ_CHECK flag.
Currently we print two separate messages, with almost the same content, and do this with a pr_warn() which may be a bit too much as the driver only asked us to check this as it expected this to be the case. Lower down the severity of the print message by switching to pr_info() instead and print a single message only.
Reported-by: Sumit Gupta sumitg@nvidia.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Reviewed-by: Sumit Gupta sumitg@nvidia.com Tested-by: Sumit Gupta sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: Jon Hunter jonathanh@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/cpufreq/cpufreq.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
--- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1450,14 +1450,13 @@ static int cpufreq_online(unsigned int c */ if ((cpufreq_driver->flags & CPUFREQ_NEED_INITIAL_FREQ_CHECK) && has_target()) { + unsigned int old_freq = policy->cur; + /* Are we running at unknown frequency ? */ - ret = cpufreq_frequency_table_get_index(policy, policy->cur); + ret = cpufreq_frequency_table_get_index(policy, old_freq); if (ret == -EINVAL) { - /* Warn user and fix it */ - pr_warn("%s: CPU%d: Running at unlisted freq: %u KHz\n", - __func__, policy->cpu, policy->cur); - ret = __cpufreq_driver_target(policy, policy->cur - 1, - CPUFREQ_RELATION_L); + ret = __cpufreq_driver_target(policy, old_freq - 1, + CPUFREQ_RELATION_L);
/* * Reaching here after boot in a few seconds may not @@ -1465,8 +1464,8 @@ static int cpufreq_online(unsigned int c * frequency for longer duration. Hence, a BUG_ON(). */ BUG_ON(ret); - pr_warn("%s: CPU%d: Unlisted initial frequency changed to: %u KHz\n", - __func__, policy->cpu, policy->cur); + pr_info("%s: CPU%d: Running at unlisted initial frequency: %u KHz, changing to: %u KHz\n", + __func__, policy->cpu, old_freq, policy->cur); } }
From: Saeed Mirzamohammadi saeed.mirzamohammadi@oracle.com
commit 31cc578ae2de19c748af06d859019dced68e325d upstream.
This patch fixes the issue due to:
BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2 net/netfilter/nf_tables_offload.c:40 Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244
The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds.
This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue.
Add nft_expr_more() and use it to fix this problem.
Signed-off-by: Saeed Mirzamohammadi saeed.mirzamohammadi@oracle.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/netfilter/nf_tables.h | 6 ++++++ net/netfilter/nf_tables_api.c | 6 +++--- net/netfilter/nf_tables_offload.c | 4 ++-- 3 files changed, 11 insertions(+), 5 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -896,6 +896,12 @@ static inline struct nft_expr *nft_expr_ return (struct nft_expr *)&rule->data[rule->dlen]; }
+static inline bool nft_expr_more(const struct nft_rule *rule, + const struct nft_expr *expr) +{ + return expr != nft_expr_last(rule) && expr->ops; +} + static inline struct nft_userdata *nft_userdata(const struct nft_rule *rule) { return (void *)&rule->data[rule->dlen]; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -302,7 +302,7 @@ static void nft_rule_expr_activate(const struct nft_expr *expr;
expr = nft_expr_first(rule); - while (expr != nft_expr_last(rule) && expr->ops) { + while (nft_expr_more(rule, expr)) { if (expr->ops->activate) expr->ops->activate(ctx, expr);
@@ -317,7 +317,7 @@ static void nft_rule_expr_deactivate(con struct nft_expr *expr;
expr = nft_expr_first(rule); - while (expr != nft_expr_last(rule) && expr->ops) { + while (nft_expr_more(rule, expr)) { if (expr->ops->deactivate) expr->ops->deactivate(ctx, expr, phase);
@@ -3036,7 +3036,7 @@ static void nf_tables_rule_destroy(const * is called on error from nf_tables_newrule(). */ expr = nft_expr_first(rule); - while (expr != nft_expr_last(rule) && expr->ops) { + while (nft_expr_more(rule, expr)) { next = nft_expr_next(expr); nf_tables_expr_destroy(ctx, expr); expr = next; --- a/net/netfilter/nf_tables_offload.c +++ b/net/netfilter/nf_tables_offload.c @@ -37,7 +37,7 @@ struct nft_flow_rule *nft_flow_rule_crea struct nft_expr *expr;
expr = nft_expr_first(rule); - while (expr->ops && expr != nft_expr_last(rule)) { + while (nft_expr_more(rule, expr)) { if (expr->ops->offload_flags & NFT_OFFLOAD_F_ACTION) num_actions++;
@@ -61,7 +61,7 @@ struct nft_flow_rule *nft_flow_rule_crea ctx->net = net; ctx->dep.type = NFT_OFFLOAD_DEP_UNSPEC;
- while (expr->ops && expr != nft_expr_last(rule)) { + while (nft_expr_more(rule, expr)) { if (!expr->ops->offload) { err = -EOPNOTSUPP; goto err_out;
From: Jens Axboe axboe@kernel.dk
commit 07d3ca52b0056f25eef61b1c896d089f8d365468 upstream.
We currently cancel these when the ring exits, and we cancel all of them. This is in preparation for killing only the ones associated with a given task.
Reviewed-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1226,13 +1226,26 @@ static void io_kill_timeout(struct io_ki } }
-static void io_kill_timeouts(struct io_ring_ctx *ctx) +static bool io_task_match(struct io_kiocb *req, struct task_struct *tsk) +{ + struct io_ring_ctx *ctx = req->ctx; + + if (!tsk || req->task == tsk) + return true; + if ((ctx->flags & IORING_SETUP_SQPOLL) && req->task == ctx->sqo_thread) + return true; + return false; +} + +static void io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk) { struct io_kiocb *req, *tmp;
spin_lock_irq(&ctx->completion_lock); - list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) - io_kill_timeout(req); + list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) { + if (io_task_match(req, tsk)) + io_kill_timeout(req); + } spin_unlock_irq(&ctx->completion_lock); }
@@ -5017,7 +5030,7 @@ static bool io_poll_remove_one(struct io return do_complete; }
-static void io_poll_remove_all(struct io_ring_ctx *ctx) +static void io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk) { struct hlist_node *tmp; struct io_kiocb *req; @@ -5028,8 +5041,10 @@ static void io_poll_remove_all(struct io struct hlist_head *list;
list = &ctx->cancel_hash[i]; - hlist_for_each_entry_safe(req, tmp, list, hash_node) - posted += io_poll_remove_one(req); + hlist_for_each_entry_safe(req, tmp, list, hash_node) { + if (io_task_match(req, tsk)) + posted += io_poll_remove_one(req); + } } spin_unlock_irq(&ctx->completion_lock);
@@ -7989,8 +8004,8 @@ static void io_ring_ctx_wait_and_kill(st percpu_ref_kill(&ctx->refs); mutex_unlock(&ctx->uring_lock);
- io_kill_timeouts(ctx); - io_poll_remove_all(ctx); + io_kill_timeouts(ctx, NULL); + io_poll_remove_all(ctx, NULL);
if (ctx->io_wq) io_wq_cancel_all(ctx->io_wq); @@ -8221,7 +8236,7 @@ static bool io_cancel_task_cb(struct io_ struct io_kiocb *req = container_of(work, struct io_kiocb, work); struct task_struct *task = data;
- return req->task == task; + return io_task_match(req, task); }
static int io_uring_flush(struct file *file, void *data)
From: Jens Axboe axboe@kernel.dk
commit 0444ce1e0b5967393447dcd5adbf2bb023a50aab upstream.
No functional changes in this patch, prep patch for grabbing references to the files_struct.
Reviewed-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5648,6 +5648,20 @@ static int io_req_defer(struct io_kiocb return -EIOCBQUEUED; }
+static void io_req_drop_files(struct io_kiocb *req) +{ + struct io_ring_ctx *ctx = req->ctx; + unsigned long flags; + + spin_lock_irqsave(&ctx->inflight_lock, flags); + list_del(&req->inflight_entry); + if (waitqueue_active(&ctx->inflight_wait)) + wake_up(&ctx->inflight_wait); + spin_unlock_irqrestore(&ctx->inflight_lock, flags); + req->flags &= ~REQ_F_INFLIGHT; + req->work.files = NULL; +} + static void __io_clean_op(struct io_kiocb *req) { struct io_async_ctx *io = req->io; @@ -5697,17 +5711,8 @@ static void __io_clean_op(struct io_kioc req->flags &= ~REQ_F_NEED_CLEANUP; }
- if (req->flags & REQ_F_INFLIGHT) { - struct io_ring_ctx *ctx = req->ctx; - unsigned long flags; - - spin_lock_irqsave(&ctx->inflight_lock, flags); - list_del(&req->inflight_entry); - if (waitqueue_active(&ctx->inflight_wait)) - wake_up(&ctx->inflight_wait); - spin_unlock_irqrestore(&ctx->inflight_lock, flags); - req->flags &= ~REQ_F_INFLIGHT; - } + if (req->flags & REQ_F_INFLIGHT) + io_req_drop_files(req); }
static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
From: Jens Axboe axboe@kernel.dk
commit 2aede0e417db846793c276c7a1bbf7262c8349b0 upstream.
We can grab a reference to the task instead of stashing away the task files_struct. This is doable without creating a circular reference between the ring fd and the task itself.
Reviewed-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 47 ++++++++++++++++++++++++++++++++++------------- 1 file changed, 34 insertions(+), 13 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -265,7 +265,16 @@ struct io_ring_ctx { /* IO offload */ struct io_wq *io_wq; struct task_struct *sqo_thread; /* if using sq thread polling */ - struct mm_struct *sqo_mm; + + /* + * For SQPOLL usage - we hold a reference to the parent task, so we + * have access to the ->files + */ + struct task_struct *sqo_task; + + /* Only used for accounting purposes */ + struct mm_struct *mm_account; + wait_queue_head_t sqo_wait;
/* @@ -969,9 +978,10 @@ static int __io_sq_thread_acquire_mm(str { if (!current->mm) { if (unlikely(!(ctx->flags & IORING_SETUP_SQPOLL) || - !mmget_not_zero(ctx->sqo_mm))) + !ctx->sqo_task->mm || + !mmget_not_zero(ctx->sqo_task->mm))) return -EFAULT; - kthread_use_mm(ctx->sqo_mm); + kthread_use_mm(ctx->sqo_task->mm); }
return 0; @@ -7591,11 +7601,11 @@ static void io_unaccount_mem(struct io_r if (ctx->limit_mem) __io_unaccount_mem(ctx->user, nr_pages);
- if (ctx->sqo_mm) { + if (ctx->mm_account) { if (acct == ACCT_LOCKED) - ctx->sqo_mm->locked_vm -= nr_pages; + ctx->mm_account->locked_vm -= nr_pages; else if (acct == ACCT_PINNED) - atomic64_sub(nr_pages, &ctx->sqo_mm->pinned_vm); + atomic64_sub(nr_pages, &ctx->mm_account->pinned_vm); } }
@@ -7610,11 +7620,11 @@ static int io_account_mem(struct io_ring return ret; }
- if (ctx->sqo_mm) { + if (ctx->mm_account) { if (acct == ACCT_LOCKED) - ctx->sqo_mm->locked_vm += nr_pages; + ctx->mm_account->locked_vm += nr_pages; else if (acct == ACCT_PINNED) - atomic64_add(nr_pages, &ctx->sqo_mm->pinned_vm); + atomic64_add(nr_pages, &ctx->mm_account->pinned_vm); }
return 0; @@ -7918,9 +7928,12 @@ static void io_ring_ctx_free(struct io_r { io_finish_async(ctx); io_sqe_buffer_unregister(ctx); - if (ctx->sqo_mm) { - mmdrop(ctx->sqo_mm); - ctx->sqo_mm = NULL; + + if (ctx->sqo_task) { + put_task_struct(ctx->sqo_task); + ctx->sqo_task = NULL; + mmdrop(ctx->mm_account); + ctx->mm_account = NULL; }
io_sqe_files_unregister(ctx); @@ -8665,8 +8678,16 @@ static int io_uring_create(unsigned entr ctx->user = user; ctx->creds = get_current_cred();
+ ctx->sqo_task = get_task_struct(current); + + /* + * This is just grabbed for accounting purposes. When a process exits, + * the mm is exited and dropped before the files, hence we need to hang + * on to this mm purely for the purposes of being able to unaccount + * memory (locked/pinned vm). It's not used for anything else. + */ mmgrab(current->mm); - ctx->sqo_mm = current->mm; + ctx->mm_account = current->mm;
/* * Account memory _before_ installing the file descriptor. Once
From: Jens Axboe axboe@kernel.dk
commit e3bc8e9dad7f2f83cc807111d4472164c9210153 upstream.
Sometimes we assign a weak reference to it, sometimes we grab a reference to it. Clean this up and make it unconditional, and drop the flag related to tracking this state.
Reviewed-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 47 +++++++++-------------------------------------- 1 file changed, 9 insertions(+), 38 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -553,7 +553,6 @@ enum { REQ_F_BUFFER_SELECTED_BIT, REQ_F_NO_FILE_TABLE_BIT, REQ_F_WORK_INITIALIZED_BIT, - REQ_F_TASK_PINNED_BIT,
/* not a real bit, just to check we're not overflowing the space */ __REQ_F_LAST_BIT, @@ -599,8 +598,6 @@ enum { REQ_F_NO_FILE_TABLE = BIT(REQ_F_NO_FILE_TABLE_BIT), /* io_wq_work is initialized */ REQ_F_WORK_INITIALIZED = BIT(REQ_F_WORK_INITIALIZED_BIT), - /* req->task is refcounted */ - REQ_F_TASK_PINNED = BIT(REQ_F_TASK_PINNED_BIT), };
struct async_poll { @@ -942,14 +939,6 @@ struct sock *io_uring_get_socket(struct } EXPORT_SYMBOL(io_uring_get_socket);
-static void io_get_req_task(struct io_kiocb *req) -{ - if (req->flags & REQ_F_TASK_PINNED) - return; - get_task_struct(req->task); - req->flags |= REQ_F_TASK_PINNED; -} - static inline void io_clean_op(struct io_kiocb *req) { if (req->flags & (REQ_F_NEED_CLEANUP | REQ_F_BUFFER_SELECTED | @@ -957,13 +946,6 @@ static inline void io_clean_op(struct io __io_clean_op(req); }
-/* not idempotent -- it doesn't clear REQ_F_TASK_PINNED */ -static void __io_put_req_task(struct io_kiocb *req) -{ - if (req->flags & REQ_F_TASK_PINNED) - put_task_struct(req->task); -} - static void io_sq_thread_drop_mm(void) { struct mm_struct *mm = current->mm; @@ -1589,7 +1571,8 @@ static void __io_free_req_finish(struct { struct io_ring_ctx *ctx = req->ctx;
- __io_put_req_task(req); + put_task_struct(req->task); + if (likely(!io_is_fallback_req(req))) kmem_cache_free(req_cachep, req); else @@ -1916,16 +1899,13 @@ static void io_req_free_batch(struct req if (req->flags & REQ_F_LINK_HEAD) io_queue_next(req);
- if (req->flags & REQ_F_TASK_PINNED) { - if (req->task != rb->task) { - if (rb->task) - put_task_struct_many(rb->task, rb->task_refs); - rb->task = req->task; - rb->task_refs = 0; - } - rb->task_refs++; - req->flags &= ~REQ_F_TASK_PINNED; + if (req->task != rb->task) { + if (rb->task) + put_task_struct_many(rb->task, rb->task_refs); + rb->task = req->task; + rb->task_refs = 0; } + rb->task_refs++;
WARN_ON_ONCE(io_dismantle_req(req)); rb->reqs[rb->to_free++] = req; @@ -2550,9 +2530,6 @@ static int io_prep_rw(struct io_kiocb *r if (kiocb->ki_flags & IOCB_NOWAIT) req->flags |= REQ_F_NOWAIT;
- if (kiocb->ki_flags & IOCB_DIRECT) - io_get_req_task(req); - if (force_nonblock) kiocb->ki_flags |= IOCB_NOWAIT;
@@ -2564,7 +2541,6 @@ static int io_prep_rw(struct io_kiocb *r kiocb->ki_flags |= IOCB_HIPRI; kiocb->ki_complete = io_complete_rw_iopoll; req->iopoll_completed = 0; - io_get_req_task(req); } else { if (kiocb->ki_flags & IOCB_HIPRI) return -EINVAL; @@ -3132,8 +3108,6 @@ static bool io_rw_should_retry(struct io kiocb->ki_flags |= IOCB_WAITQ; kiocb->ki_flags &= ~IOCB_NOWAIT; kiocb->ki_waitq = wait; - - io_get_req_task(req); return true; }
@@ -4965,7 +4939,6 @@ static bool io_arm_poll_handler(struct i apoll->double_poll = NULL;
req->flags |= REQ_F_POLLED; - io_get_req_task(req); req->apoll = apoll; INIT_HLIST_NODE(&req->hash_node);
@@ -5148,8 +5121,6 @@ static int io_poll_add_prep(struct io_ki #endif poll->events = demangle_poll(events) | EPOLLERR | EPOLLHUP | (events & EPOLLEXCLUSIVE); - - io_get_req_task(req); return 0; }
@@ -6336,7 +6307,6 @@ static int io_submit_sqe(struct io_kiocb return ret; } trace_io_uring_link(ctx, req, head); - io_get_req_task(req); list_add_tail(&req->link_list, &head->link_list);
/* last request of a link, enqueue the link */ @@ -6461,6 +6431,7 @@ static int io_init_req(struct io_ring_ct /* one is dropped after submission, the other at completion */ refcount_set(&req->refs, 2); req->task = current; + get_task_struct(req->task); req->result = 0;
if (unlikely(req->opcode >= IORING_OP_LAST))
From: Jens Axboe axboe@kernel.dk
commit 76e1b6427fd8246376a97e3227049d49188dfb9c upstream.
Return whether we found and canceled requests or not. This is in preparation for using this information, no functional changes in this patch.
Reviewed-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1229,16 +1229,23 @@ static bool io_task_match(struct io_kioc return false; }
-static void io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk) +/* + * Returns true if we found and killed one or more timeouts + */ +static bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk) { struct io_kiocb *req, *tmp; + int canceled = 0;
spin_lock_irq(&ctx->completion_lock); list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) { - if (io_task_match(req, tsk)) + if (io_task_match(req, tsk)) { io_kill_timeout(req); + canceled++; + } } spin_unlock_irq(&ctx->completion_lock); + return canceled != 0; }
static void __io_queue_deferred(struct io_ring_ctx *ctx) @@ -5013,7 +5020,10 @@ static bool io_poll_remove_one(struct io return do_complete; }
-static void io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk) +/* + * Returns true if we found and killed one or more poll requests + */ +static bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk) { struct hlist_node *tmp; struct io_kiocb *req; @@ -5033,6 +5043,8 @@ static void io_poll_remove_all(struct io
if (posted) io_cqring_ev_posted(ctx); + + return posted != 0; }
static int io_poll_cancel(struct io_ring_ctx *ctx, __u64 sqe_addr) @@ -8178,11 +8190,14 @@ static void io_cancel_defer_files(struct } }
-static void io_uring_cancel_files(struct io_ring_ctx *ctx, +/* + * Returns true if we found and killed one or more files pinning requests + */ +static bool io_uring_cancel_files(struct io_ring_ctx *ctx, struct files_struct *files) { if (list_empty_careful(&ctx->inflight_list)) - return; + return false;
io_cancel_defer_files(ctx, files); /* cancel all at once, should be faster than doing it one by one*/ @@ -8218,6 +8233,8 @@ static void io_uring_cancel_files(struct schedule(); finish_wait(&ctx->inflight_wait, &wait); } + + return true; }
static bool io_cancel_task_cb(struct io_wq_work *work, void *data)
From: Jens Axboe axboe@kernel.dk
commit e6c8aa9ac33bd7c968af7816240fc081401fddcd upstream.
This allows us to selectively flush out pending overflows, depending on the task and/or files_struct being passed in.
No intended functional changes in this patch.
Reviewed-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 41 +++++++++++++++++++++++++---------------- 1 file changed, 25 insertions(+), 16 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1344,12 +1344,24 @@ static void io_cqring_mark_overflow(stru } }
+static inline bool io_match_files(struct io_kiocb *req, + struct files_struct *files) +{ + if (!files) + return true; + if (req->flags & REQ_F_WORK_INITIALIZED) + return req->work.files == files; + return false; +} + /* Returns true if there are no backlogged entries after the flush */ -static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force) +static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force, + struct task_struct *tsk, + struct files_struct *files) { struct io_rings *rings = ctx->rings; + struct io_kiocb *req, *tmp; struct io_uring_cqe *cqe; - struct io_kiocb *req; unsigned long flags; LIST_HEAD(list);
@@ -1368,13 +1380,16 @@ static bool io_cqring_overflow_flush(str ctx->cq_overflow_flushed = 1;
cqe = NULL; - while (!list_empty(&ctx->cq_overflow_list)) { + list_for_each_entry_safe(req, tmp, &ctx->cq_overflow_list, compl.list) { + if (tsk && req->task != tsk) + continue; + if (!io_match_files(req, files)) + continue; + cqe = io_get_cqring(ctx); if (!cqe && !force) break;
- req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb, - compl.list); list_move(&req->compl.list, &list); if (cqe) { WRITE_ONCE(cqe->user_data, req->user_data); @@ -1988,7 +2003,7 @@ static unsigned io_cqring_events(struct if (noflush && !list_empty(&ctx->cq_overflow_list)) return -1U;
- io_cqring_overflow_flush(ctx, false); + io_cqring_overflow_flush(ctx, false, NULL, NULL); }
/* See comment at the top of this file */ @@ -6489,7 +6504,7 @@ static int io_submit_sqes(struct io_ring /* if we have a backlog and couldn't flush it all, return BUSY */ if (test_bit(0, &ctx->sq_check_overflow)) { if (!list_empty(&ctx->cq_overflow_list) && - !io_cqring_overflow_flush(ctx, false)) + !io_cqring_overflow_flush(ctx, false, NULL, NULL)) return -EBUSY; }
@@ -7993,7 +8008,7 @@ static void io_ring_exit_work(struct wor */ do { if (ctx->rings) - io_cqring_overflow_flush(ctx, true); + io_cqring_overflow_flush(ctx, true, NULL, NULL); io_iopoll_try_reap_events(ctx); } while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20)); io_ring_ctx_free(ctx); @@ -8013,7 +8028,7 @@ static void io_ring_ctx_wait_and_kill(st
/* if we failed setting up the ctx, we might not have any rings */ if (ctx->rings) - io_cqring_overflow_flush(ctx, true); + io_cqring_overflow_flush(ctx, true, NULL, NULL); io_iopoll_try_reap_events(ctx); idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);
@@ -8069,12 +8084,6 @@ static bool io_match_link(struct io_kioc return false; }
-static inline bool io_match_files(struct io_kiocb *req, - struct files_struct *files) -{ - return (req->flags & REQ_F_WORK_INITIALIZED) && req->work.files == files; -} - static bool io_match_link_files(struct io_kiocb *req, struct files_struct *files) { @@ -8365,7 +8374,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned ret = 0; if (ctx->flags & IORING_SETUP_SQPOLL) { if (!list_empty_careful(&ctx->cq_overflow_list)) - io_cqring_overflow_flush(ctx, false); + io_cqring_overflow_flush(ctx, false, NULL, NULL); if (flags & IORING_ENTER_SQ_WAKEUP) wake_up(&ctx->sqo_wait); submitted = to_submit;
From: Jens Axboe axboe@kernel.dk
commit 0f2122045b946241a9e549c2a76cea54fa58a7ff upstream.
Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking.
With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed.
Cc: stable@vger.kernel.org # v5.5+ Reviewed-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/exec.c | 6 fs/file.c | 2 fs/io_uring.c | 306 +++++++++++++++++++++++++++++++++++++++++------ include/linux/io_uring.h | 53 ++++++++ include/linux/sched.h | 5 init/init_task.c | 3 kernel/fork.c | 6 7 files changed, 344 insertions(+), 37 deletions(-) create mode 100644 include/linux/io_uring.h
--- a/fs/exec.c +++ b/fs/exec.c @@ -62,6 +62,7 @@ #include <linux/oom.h> #include <linux/compat.h> #include <linux/vmalloc.h> +#include <linux/io_uring.h>
#include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -1895,6 +1896,11 @@ static int bprm_execve(struct linux_binp struct files_struct *displaced; int retval;
+ /* + * Cancel any io_uring activity across execve + */ + io_uring_task_cancel(); + retval = unshare_files(&displaced); if (retval) return retval; --- a/fs/file.c +++ b/fs/file.c @@ -21,6 +21,7 @@ #include <linux/rcupdate.h> #include <linux/close_range.h> #include <net/sock.h> +#include <linux/io_uring.h>
unsigned int sysctl_nr_open __read_mostly = 1024*1024; unsigned int sysctl_nr_open_min = BITS_PER_LONG; @@ -452,6 +453,7 @@ void exit_files(struct task_struct *tsk) struct files_struct * files = tsk->files;
if (files) { + io_uring_files_cancel(files); task_lock(tsk); tsk->files = NULL; task_unlock(tsk); --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -79,6 +79,7 @@ #include <linux/splice.h> #include <linux/task_work.h> #include <linux/pagemap.h> +#include <linux/io_uring.h>
#define CREATE_TRACE_POINTS #include <trace/events/io_uring.h> @@ -284,8 +285,6 @@ struct io_ring_ctx { */ struct fixed_file_data *file_data; unsigned nr_user_files; - int ring_fd; - struct file *ring_file;
/* if used, fixed mapped user buffers */ unsigned nr_user_bufs; @@ -1433,7 +1432,12 @@ static void __io_cqring_fill_event(struc WRITE_ONCE(cqe->user_data, req->user_data); WRITE_ONCE(cqe->res, res); WRITE_ONCE(cqe->flags, cflags); - } else if (ctx->cq_overflow_flushed) { + } else if (ctx->cq_overflow_flushed || req->task->io_uring->in_idle) { + /* + * If we're in ring overflow flush mode, or in task cancel mode, + * then we cannot store the request for later flushing, we need + * to drop it on the floor. + */ WRITE_ONCE(ctx->rings->cq_overflow, atomic_inc_return(&ctx->cached_cq_overflow)); } else { @@ -1591,8 +1595,12 @@ static bool io_dismantle_req(struct io_k
static void __io_free_req_finish(struct io_kiocb *req) { + struct io_uring_task *tctx = req->task->io_uring; struct io_ring_ctx *ctx = req->ctx;
+ atomic_long_inc(&tctx->req_complete); + if (tctx->in_idle) + wake_up(&tctx->wait); put_task_struct(req->task);
if (likely(!io_is_fallback_req(req))) @@ -1907,6 +1915,7 @@ static void io_req_free_batch_finish(str if (rb->to_free) __io_req_free_batch_flush(ctx, rb); if (rb->task) { + atomic_long_add(rb->task_refs, &rb->task->io_uring->req_complete); put_task_struct_many(rb->task, rb->task_refs); rb->task = NULL; } @@ -1922,8 +1931,10 @@ static void io_req_free_batch(struct req io_queue_next(req);
if (req->task != rb->task) { - if (rb->task) + if (rb->task) { + atomic_long_add(rb->task_refs, &rb->task->io_uring->req_complete); put_task_struct_many(rb->task, rb->task_refs); + } rb->task = req->task; rb->task_refs = 0; } @@ -3978,8 +3989,7 @@ static int io_close_prep(struct io_kiocb return -EBADF;
req->close.fd = READ_ONCE(sqe->fd); - if ((req->file && req->file->f_op == &io_uring_fops) || - req->close.fd == req->ctx->ring_fd) + if ((req->file && req->file->f_op == &io_uring_fops)) return -EBADF;
req->close.put_file = NULL; @@ -5667,6 +5677,7 @@ static void io_req_drop_files(struct io_ wake_up(&ctx->inflight_wait); spin_unlock_irqrestore(&ctx->inflight_lock, flags); req->flags &= ~REQ_F_INFLIGHT; + put_files_struct(req->work.files); req->work.files = NULL; }
@@ -6067,34 +6078,20 @@ static int io_req_set_file(struct io_sub
static int io_grab_files(struct io_kiocb *req) { - int ret = -EBADF; struct io_ring_ctx *ctx = req->ctx;
io_req_init_async(req);
if (req->work.files || (req->flags & REQ_F_NO_FILE_TABLE)) return 0; - if (!ctx->ring_file) - return -EBADF;
- rcu_read_lock(); + req->work.files = get_files_struct(current); + req->flags |= REQ_F_INFLIGHT; + spin_lock_irq(&ctx->inflight_lock); - /* - * We use the f_ops->flush() handler to ensure that we can flush - * out work accessing these files if the fd is closed. Check if - * the fd has changed since we started down this path, and disallow - * this operation if it has. - */ - if (fcheck(ctx->ring_fd) == ctx->ring_file) { - list_add(&req->inflight_entry, &ctx->inflight_list); - req->flags |= REQ_F_INFLIGHT; - req->work.files = current->files; - ret = 0; - } + list_add(&req->inflight_entry, &ctx->inflight_list); spin_unlock_irq(&ctx->inflight_lock); - rcu_read_unlock(); - - return ret; + return 0; }
static inline int io_prep_work_files(struct io_kiocb *req) @@ -6459,6 +6456,7 @@ static int io_init_req(struct io_ring_ct refcount_set(&req->refs, 2); req->task = current; get_task_struct(req->task); + atomic_long_inc(&req->task->io_uring->req_issue); req->result = 0;
if (unlikely(req->opcode >= IORING_OP_LAST)) @@ -6494,8 +6492,7 @@ static int io_init_req(struct io_ring_ct return io_req_set_file(state, req, READ_ONCE(sqe->fd)); }
-static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, - struct file *ring_file, int ring_fd) +static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr) { struct io_submit_state state; struct io_kiocb *link = NULL; @@ -6516,9 +6513,6 @@ static int io_submit_sqes(struct io_ring
io_submit_state_start(&state, ctx, nr);
- ctx->ring_fd = ring_fd; - ctx->ring_file = ring_file; - for (i = 0; i < nr; i++) { const struct io_uring_sqe *sqe; struct io_kiocb *req; @@ -6687,7 +6681,7 @@ static int io_sq_thread(void *data)
mutex_lock(&ctx->uring_lock); if (likely(!percpu_ref_is_dying(&ctx->refs))) - ret = io_submit_sqes(ctx, to_submit, NULL, -1); + ret = io_submit_sqes(ctx, to_submit); mutex_unlock(&ctx->uring_lock); timeout = jiffies + ctx->sq_thread_idle; } @@ -7516,6 +7510,34 @@ out_fput: return ret; }
+static int io_uring_alloc_task_context(struct task_struct *task) +{ + struct io_uring_task *tctx; + + tctx = kmalloc(sizeof(*tctx), GFP_KERNEL); + if (unlikely(!tctx)) + return -ENOMEM; + + xa_init(&tctx->xa); + init_waitqueue_head(&tctx->wait); + tctx->last = NULL; + tctx->in_idle = 0; + atomic_long_set(&tctx->req_issue, 0); + atomic_long_set(&tctx->req_complete, 0); + task->io_uring = tctx; + return 0; +} + +void __io_uring_free(struct task_struct *tsk) +{ + struct io_uring_task *tctx = tsk->io_uring; + + WARN_ON_ONCE(!xa_empty(&tctx->xa)); + xa_destroy(&tctx->xa); + kfree(tctx); + tsk->io_uring = NULL; +} + static int io_sq_offload_start(struct io_ring_ctx *ctx, struct io_uring_params *p) { @@ -7551,6 +7573,9 @@ static int io_sq_offload_start(struct io ctx->sqo_thread = NULL; goto err; } + ret = io_uring_alloc_task_context(ctx->sqo_thread); + if (ret) + goto err; wake_up_process(ctx->sqo_thread); } else if (p->flags & IORING_SETUP_SQ_AFF) { /* Can't have SQ_AFF without SQPOLL */ @@ -8063,7 +8088,7 @@ static bool io_wq_files_match(struct io_ { struct files_struct *files = data;
- return work->files == files; + return !files || work->files == files; }
/* @@ -8218,7 +8243,7 @@ static bool io_uring_cancel_files(struct
spin_lock_irq(&ctx->inflight_lock); list_for_each_entry(req, &ctx->inflight_list, inflight_entry) { - if (req->work.files != files) + if (files && req->work.files != files) continue; /* req is being completed, ignore */ if (!refcount_inc_not_zero(&req->refs)) @@ -8254,18 +8279,217 @@ static bool io_cancel_task_cb(struct io_ return io_task_match(req, task); }
+static bool __io_uring_cancel_task_requests(struct io_ring_ctx *ctx, + struct task_struct *task, + struct files_struct *files) +{ + bool ret; + + ret = io_uring_cancel_files(ctx, files); + if (!files) { + enum io_wq_cancel cret; + + cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, task, true); + if (cret != IO_WQ_CANCEL_NOTFOUND) + ret = true; + + /* SQPOLL thread does its own polling */ + if (!(ctx->flags & IORING_SETUP_SQPOLL)) { + while (!list_empty_careful(&ctx->iopoll_list)) { + io_iopoll_try_reap_events(ctx); + ret = true; + } + } + + ret |= io_poll_remove_all(ctx, task); + ret |= io_kill_timeouts(ctx, task); + } + + return ret; +} + +/* + * We need to iteratively cancel requests, in case a request has dependent + * hard links. These persist even for failure of cancelations, hence keep + * looping until none are found. + */ +static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx, + struct files_struct *files) +{ + struct task_struct *task = current; + + if (ctx->flags & IORING_SETUP_SQPOLL) + task = ctx->sqo_thread; + + io_cqring_overflow_flush(ctx, true, task, files); + + while (__io_uring_cancel_task_requests(ctx, task, files)) { + io_run_task_work(); + cond_resched(); + } +} + +/* + * Note that this task has used io_uring. We use it for cancelation purposes. + */ +static int io_uring_add_task_file(struct file *file) +{ + if (unlikely(!current->io_uring)) { + int ret; + + ret = io_uring_alloc_task_context(current); + if (unlikely(ret)) + return ret; + } + if (current->io_uring->last != file) { + XA_STATE(xas, ¤t->io_uring->xa, (unsigned long) file); + void *old; + + rcu_read_lock(); + old = xas_load(&xas); + if (old != file) { + get_file(file); + xas_lock(&xas); + xas_store(&xas, file); + xas_unlock(&xas); + } + rcu_read_unlock(); + current->io_uring->last = file; + } + + return 0; +} + +/* + * Remove this io_uring_file -> task mapping. + */ +static void io_uring_del_task_file(struct file *file) +{ + struct io_uring_task *tctx = current->io_uring; + XA_STATE(xas, &tctx->xa, (unsigned long) file); + + if (tctx->last == file) + tctx->last = NULL; + + xas_lock(&xas); + file = xas_store(&xas, NULL); + xas_unlock(&xas); + + if (file) + fput(file); +} + +static void __io_uring_attempt_task_drop(struct file *file) +{ + XA_STATE(xas, ¤t->io_uring->xa, (unsigned long) file); + struct file *old; + + rcu_read_lock(); + old = xas_load(&xas); + rcu_read_unlock(); + + if (old == file) + io_uring_del_task_file(file); +} + +/* + * Drop task note for this file if we're the only ones that hold it after + * pending fput() + */ +static void io_uring_attempt_task_drop(struct file *file, bool exiting) +{ + if (!current->io_uring) + return; + /* + * fput() is pending, will be 2 if the only other ref is our potential + * task file note. If the task is exiting, drop regardless of count. + */ + if (!exiting && atomic_long_read(&file->f_count) != 2) + return; + + __io_uring_attempt_task_drop(file); +} + +void __io_uring_files_cancel(struct files_struct *files) +{ + struct io_uring_task *tctx = current->io_uring; + XA_STATE(xas, &tctx->xa, 0); + + /* make sure overflow events are dropped */ + tctx->in_idle = true; + + do { + struct io_ring_ctx *ctx; + struct file *file; + + xas_lock(&xas); + file = xas_next_entry(&xas, ULONG_MAX); + xas_unlock(&xas); + + if (!file) + break; + + ctx = file->private_data; + + io_uring_cancel_task_requests(ctx, files); + if (files) + io_uring_del_task_file(file); + } while (1); +} + +static inline bool io_uring_task_idle(struct io_uring_task *tctx) +{ + return atomic_long_read(&tctx->req_issue) == + atomic_long_read(&tctx->req_complete); +} + +/* + * Find any io_uring fd that this task has registered or done IO on, and cancel + * requests. + */ +void __io_uring_task_cancel(void) +{ + struct io_uring_task *tctx = current->io_uring; + DEFINE_WAIT(wait); + long completions; + + /* make sure overflow events are dropped */ + tctx->in_idle = true; + + while (!io_uring_task_idle(tctx)) { + /* read completions before cancelations */ + completions = atomic_long_read(&tctx->req_complete); + __io_uring_files_cancel(NULL); + + prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE); + + /* + * If we've seen completions, retry. This avoids a race where + * a completion comes in before we did prepare_to_wait(). + */ + if (completions != atomic_long_read(&tctx->req_complete)) + continue; + if (io_uring_task_idle(tctx)) + break; + schedule(); + } + + finish_wait(&tctx->wait, &wait); + tctx->in_idle = false; +} + static int io_uring_flush(struct file *file, void *data) { struct io_ring_ctx *ctx = file->private_data;
- io_uring_cancel_files(ctx, data); - /* * If the task is going away, cancel work it may have pending */ if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) - io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, current, true); + data = NULL;
+ io_uring_cancel_task_requests(ctx, data); + io_uring_attempt_task_drop(file, !data); return 0; }
@@ -8379,8 +8603,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned wake_up(&ctx->sqo_wait); submitted = to_submit; } else if (to_submit) { + ret = io_uring_add_task_file(f.file); + if (unlikely(ret)) + goto out; mutex_lock(&ctx->uring_lock); - submitted = io_submit_sqes(ctx, to_submit, f.file, fd); + submitted = io_submit_sqes(ctx, to_submit); mutex_unlock(&ctx->uring_lock);
if (submitted != to_submit) @@ -8590,6 +8817,7 @@ static int io_uring_get_fd(struct io_rin file = anon_inode_getfile("[io_uring]", &io_uring_fops, ctx, O_RDWR | O_CLOEXEC); if (IS_ERR(file)) { +err_fd: put_unused_fd(ret); ret = PTR_ERR(file); goto err; @@ -8598,6 +8826,10 @@ static int io_uring_get_fd(struct io_rin #if defined(CONFIG_UNIX) ctx->ring_sock->file = file; #endif + if (unlikely(io_uring_add_task_file(file))) { + file = ERR_PTR(-ENOMEM); + goto err_fd; + } fd_install(ret, file); return ret; err: --- /dev/null +++ b/include/linux/io_uring.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_IO_URING_H +#define _LINUX_IO_URING_H + +#include <linux/sched.h> +#include <linux/xarray.h> +#include <linux/percpu-refcount.h> + +struct io_uring_task { + /* submission side */ + struct xarray xa; + struct wait_queue_head wait; + struct file *last; + atomic_long_t req_issue; + + /* completion side */ + bool in_idle ____cacheline_aligned_in_smp; + atomic_long_t req_complete; +}; + +#if defined(CONFIG_IO_URING) +void __io_uring_task_cancel(void); +void __io_uring_files_cancel(struct files_struct *files); +void __io_uring_free(struct task_struct *tsk); + +static inline void io_uring_task_cancel(void) +{ + if (current->io_uring && !xa_empty(¤t->io_uring->xa)) + __io_uring_task_cancel(); +} +static inline void io_uring_files_cancel(struct files_struct *files) +{ + if (current->io_uring && !xa_empty(¤t->io_uring->xa)) + __io_uring_files_cancel(files); +} +static inline void io_uring_free(struct task_struct *tsk) +{ + if (tsk->io_uring) + __io_uring_free(tsk); +} +#else +static inline void io_uring_task_cancel(void) +{ +} +static inline void io_uring_files_cancel(struct files_struct *files) +{ +} +static inline void io_uring_free(struct task_struct *tsk) +{ +} +#endif + +#endif --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -63,6 +63,7 @@ struct sighand_struct; struct signal_struct; struct task_delay_info; struct task_group; +struct io_uring_task;
/* * Task state bitmask. NOTE! These bits are also @@ -935,6 +936,10 @@ struct task_struct { /* Open file information: */ struct files_struct *files;
+#ifdef CONFIG_IO_URING + struct io_uring_task *io_uring; +#endif + /* Namespaces: */ struct nsproxy *nsproxy;
--- a/init/init_task.c +++ b/init/init_task.c @@ -114,6 +114,9 @@ struct task_struct init_task .thread = INIT_THREAD, .fs = &init_fs, .files = &init_files, +#ifdef CONFIG_IO_URING + .io_uring = NULL, +#endif .signal = &init_signals, .sighand = &init_sighand, .nsproxy = &init_nsproxy, --- a/kernel/fork.c +++ b/kernel/fork.c @@ -95,6 +95,7 @@ #include <linux/stackleak.h> #include <linux/kasan.h> #include <linux/scs.h> +#include <linux/io_uring.h>
#include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -728,6 +729,7 @@ void __put_task_struct(struct task_struc WARN_ON(refcount_read(&tsk->usage)); WARN_ON(tsk == current);
+ io_uring_free(tsk); cgroup_free(tsk); task_numa_free(tsk, true); security_task_free(tsk); @@ -2002,6 +2004,10 @@ static __latent_entropy struct task_stru p->vtime.state = VTIME_INACTIVE; #endif
+#ifdef CONFIG_IO_URING + p->io_uring = NULL; +#endif + #if defined(SPLIT_RSS_COUNTING) memset(&p->rss_stat, 0, sizeof(p->rss_stat)); #endif
From: Jens Axboe axboe@kernel.dk
commit 9b8284921513fc1ea57d87777283a59b05862f03 upstream.
If we don't get and assign the namespace for the async work, then certain paths just don't work properly (like /dev/stdin, /proc/mounts, etc). Anything that references the current namespace of the given task should be assigned for async work on behalf of that task.
Cc: stable@vger.kernel.org # v5.5+ Reported-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io-wq.c | 4 ++++ fs/io-wq.h | 1 + fs/io_uring.c | 3 +++ 3 files changed, 8 insertions(+)
--- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -60,6 +60,7 @@ struct io_worker { const struct cred *cur_creds; const struct cred *saved_creds; struct files_struct *restore_files; + struct nsproxy *restore_nsproxy; struct fs_struct *restore_fs; };
@@ -153,6 +154,7 @@ static bool __io_worker_unuse(struct io_
task_lock(current); current->files = worker->restore_files; + current->nsproxy = worker->restore_nsproxy; task_unlock(current); }
@@ -318,6 +320,7 @@ static void io_worker_start(struct io_wq
worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING); worker->restore_files = current->files; + worker->restore_nsproxy = current->nsproxy; worker->restore_fs = current->fs; io_wqe_inc_running(wqe, worker); } @@ -454,6 +457,7 @@ static void io_impersonate_work(struct i if (work->files && current->files != work->files) { task_lock(current); current->files = work->files; + current->nsproxy = work->nsproxy; task_unlock(current); } if (work->fs && current->fs != work->fs) --- a/fs/io-wq.h +++ b/fs/io-wq.h @@ -88,6 +88,7 @@ struct io_wq_work { struct files_struct *files; struct mm_struct *mm; const struct cred *creds; + struct nsproxy *nsproxy; struct fs_struct *fs; unsigned long fsize; unsigned flags; --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5678,6 +5678,7 @@ static void io_req_drop_files(struct io_ spin_unlock_irqrestore(&ctx->inflight_lock, flags); req->flags &= ~REQ_F_INFLIGHT; put_files_struct(req->work.files); + put_nsproxy(req->work.nsproxy); req->work.files = NULL; }
@@ -6086,6 +6087,8 @@ static int io_grab_files(struct io_kiocb return 0;
req->work.files = get_files_struct(current); + get_nsproxy(current->nsproxy); + req->work.nsproxy = current->nsproxy; req->flags |= REQ_F_INFLIGHT;
spin_lock_irq(&ctx->inflight_lock);
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 95da84659226d75698a1ab958be0af21d9cc2a9c upstream.
During a context switch the scheduler invokes wq_worker_sleeping() with disabled preemption. Disabling preemption is needed because it protects access to `worker->sleeping'. As an optimisation it avoids invoking schedule() within the schedule path as part of possible wake up (thus preempt_enable_no_resched() afterwards).
The io-wq has been added to the mix in the same section with disabled preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping() acquires a spinlock_t. Also within the schedule() the spinlock_t must be acquired after tsk_is_pi_blocked() otherwise it will block on the sleeping lock again while scheduling out.
While playing with `io_uring-bench' I didn't notice a significant latency spike after converting io_wqe::lock to a raw_spinlock_t. The latency was more or less the same.
In order to keep the spinlock_t it would have to be moved after the tsk_is_pi_blocked() check which would introduce a branch instruction into the hot path.
The lock is used to maintain the `work_list' and wakes one task up at most. Should io_wqe_cancel_pending_work() cause latency spikes, while searching for a specific item, then it would need to drop the lock during iterations. revert_creds() is also invoked under the lock. According to debug cred::non_rcu is 0. Otherwise it should be moved outside of the locked section because put_cred_rcu()->free_uid() acquires a sleeping lock.
Convert io_wqe::lock to a raw_spinlock_t.c
Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io-wq.c | 52 ++++++++++++++++++++++++++-------------------------- 1 file changed, 26 insertions(+), 26 deletions(-)
--- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -88,7 +88,7 @@ enum { */ struct io_wqe { struct { - spinlock_t lock; + raw_spinlock_t lock; struct io_wq_work_list work_list; unsigned long hash_map; unsigned flags; @@ -149,7 +149,7 @@ static bool __io_worker_unuse(struct io_
if (current->files != worker->restore_files) { __acquire(&wqe->lock); - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); dropped_lock = true;
task_lock(current); @@ -168,7 +168,7 @@ static bool __io_worker_unuse(struct io_ if (worker->mm) { if (!dropped_lock) { __acquire(&wqe->lock); - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); dropped_lock = true; } __set_current_state(TASK_RUNNING); @@ -222,17 +222,17 @@ static void io_worker_exit(struct io_wor worker->flags = 0; preempt_enable();
- spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); hlist_nulls_del_rcu(&worker->nulls_node); list_del_rcu(&worker->all_list); if (__io_worker_unuse(wqe, worker)) { __release(&wqe->lock); - spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); } acct->nr_workers--; nr_workers = wqe->acct[IO_WQ_ACCT_BOUND].nr_workers + wqe->acct[IO_WQ_ACCT_UNBOUND].nr_workers; - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock);
/* all workers gone, wq exit can proceed */ if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs)) @@ -508,7 +508,7 @@ get_next: else if (!wq_list_empty(&wqe->work_list)) wqe->flags |= IO_WQE_FLAG_STALLED;
- spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); if (!work) break; io_assign_current_work(worker, work); @@ -542,17 +542,17 @@ get_next: io_wqe_enqueue(wqe, linked);
if (hash != -1U && !next_hashed) { - spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); wqe->hash_map &= ~BIT_ULL(hash); wqe->flags &= ~IO_WQE_FLAG_STALLED; /* skip unnecessary unlock-lock wqe->lock */ if (!work) goto get_next; - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); } } while (work);
- spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); } while (1); }
@@ -567,7 +567,7 @@ static int io_wqe_worker(void *data) while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) { set_current_state(TASK_INTERRUPTIBLE); loop: - spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); if (io_wqe_run_queue(wqe)) { __set_current_state(TASK_RUNNING); io_worker_handle_work(worker); @@ -578,7 +578,7 @@ loop: __release(&wqe->lock); goto loop; } - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); if (signal_pending(current)) flush_signals(current); if (schedule_timeout(WORKER_IDLE_TIMEOUT)) @@ -590,11 +590,11 @@ loop: }
if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) { - spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); if (!wq_list_empty(&wqe->work_list)) io_worker_handle_work(worker); else - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); }
io_worker_exit(worker); @@ -634,9 +634,9 @@ void io_wq_worker_sleeping(struct task_s
worker->flags &= ~IO_WORKER_F_RUNNING;
- spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); io_wqe_dec_running(wqe, worker); - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); }
static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index) @@ -660,7 +660,7 @@ static bool create_io_worker(struct io_w return false; }
- spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list); list_add_tail_rcu(&worker->all_list, &wqe->all_list); worker->flags |= IO_WORKER_F_FREE; @@ -669,7 +669,7 @@ static bool create_io_worker(struct io_w if (!acct->nr_workers && (worker->flags & IO_WORKER_F_BOUND)) worker->flags |= IO_WORKER_F_FIXED; acct->nr_workers++; - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock);
if (index == IO_WQ_ACCT_UNBOUND) atomic_inc(&wq->user->processes); @@ -724,12 +724,12 @@ static int io_wq_manager(void *data) if (!node_online(node)) continue;
- spin_lock_irq(&wqe->lock); + raw_spin_lock_irq(&wqe->lock); if (io_wqe_need_worker(wqe, IO_WQ_ACCT_BOUND)) fork_worker[IO_WQ_ACCT_BOUND] = true; if (io_wqe_need_worker(wqe, IO_WQ_ACCT_UNBOUND)) fork_worker[IO_WQ_ACCT_UNBOUND] = true; - spin_unlock_irq(&wqe->lock); + raw_spin_unlock_irq(&wqe->lock); if (fork_worker[IO_WQ_ACCT_BOUND]) create_io_worker(wq, wqe, IO_WQ_ACCT_BOUND); if (fork_worker[IO_WQ_ACCT_UNBOUND]) @@ -825,10 +825,10 @@ static void io_wqe_enqueue(struct io_wqe }
work_flags = work->flags; - spin_lock_irqsave(&wqe->lock, flags); + raw_spin_lock_irqsave(&wqe->lock, flags); io_wqe_insert_work(wqe, work); wqe->flags &= ~IO_WQE_FLAG_STALLED; - spin_unlock_irqrestore(&wqe->lock, flags); + raw_spin_unlock_irqrestore(&wqe->lock, flags);
if ((work_flags & IO_WQ_WORK_CONCURRENT) || !atomic_read(&acct->nr_running)) @@ -955,13 +955,13 @@ static void io_wqe_cancel_pending_work(s unsigned long flags;
retry: - spin_lock_irqsave(&wqe->lock, flags); + raw_spin_lock_irqsave(&wqe->lock, flags); wq_list_for_each(node, prev, &wqe->work_list) { work = container_of(node, struct io_wq_work, list); if (!match->fn(work, match->data)) continue; io_wqe_remove_pending(wqe, work, prev); - spin_unlock_irqrestore(&wqe->lock, flags); + raw_spin_unlock_irqrestore(&wqe->lock, flags); io_run_cancel(work, wqe); match->nr_pending++; if (!match->cancel_all) @@ -970,7 +970,7 @@ retry: /* not safe to continue after unlock */ goto retry; } - spin_unlock_irqrestore(&wqe->lock, flags); + raw_spin_unlock_irqrestore(&wqe->lock, flags); }
static void io_wqe_cancel_running_work(struct io_wqe *wqe, @@ -1078,7 +1078,7 @@ struct io_wq *io_wq_create(unsigned boun } atomic_set(&wqe->acct[IO_WQ_ACCT_UNBOUND].nr_running, 0); wqe->wq = wq; - spin_lock_init(&wqe->lock); + raw_spin_lock_init(&wqe->lock); INIT_WQ_LIST(&wqe->work_list); INIT_HLIST_NULLS_HEAD(&wqe->free_list, 0); INIT_LIST_HEAD(&wqe->all_list);
From: Hillf Danton hdanton@sina.com
commit c4068bf898ddaef791049a366828d9b84b467bda upstream.
The smart syzbot has found a reproducer for the following issue:
================================================================== BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline] BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771
CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:192 instrument_atomic_write include/linux/instrumented.h:71 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] io_wqe_inc_running fs/io-wq.c:301 [inline] io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 schedule_timeout+0x148/0x250 kernel/time/timer.c:1879 io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Allocated by task 7768: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594 kmalloc_node include/linux/slab.h:572 [inline] kzalloc_node include/linux/slab.h:677 [inline] io_wq_create+0x57b/0xa10 fs/io-wq.c:1064 io_init_wq_offload fs/io_uring.c:7432 [inline] io_sq_offload_start fs/io_uring.c:7504 [inline] io_uring_create fs/io_uring.c:8625 [inline] io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 21: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3418 [inline] kfree+0x10e/0x2b0 mm/slab.c:3756 __io_wq_destroy fs/io-wq.c:1138 [inline] io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146 io_finish_async fs/io_uring.c:6836 [inline] io_ring_ctx_free fs/io_uring.c:7870 [inline] io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
The buggy address belongs to the object at ffff8882183db000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 140 bytes inside of 1024-byte region [ffff8882183db000, ffff8882183db400) The buggy address belongs to the page: page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db flags: 0x57ffe0000000200(slab) raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700 raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
which is down to the comment below,
/* all workers gone, wq exit can proceed */ if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs)) complete(&wqe->wq->done);
because there might be multiple cases of wqe in a wq and we would wait for every worker in every wqe to go home before releasing wq's resources on destroying.
To that end, rework wq's refcount by making it independent of the tracking of workers because after all they are two different things, and keeping it balanced when workers come and go. Note the manager kthread, like other workers, now holds a grab to wq during its lifetime.
Finally to help destroy wq, check IO_WQ_BIT_EXIT upon creating worker and do nothing for exiting wq.
Cc: stable@vger.kernel.org # v5.5+ Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com Cc: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Hillf Danton hdanton@sina.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io-wq.c | 116 ++++++++++++++++++++++++++++++------------------------------- 1 file changed, 58 insertions(+), 58 deletions(-)
--- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -202,7 +202,6 @@ static void io_worker_exit(struct io_wor { struct io_wqe *wqe = worker->wqe; struct io_wqe_acct *acct = io_wqe_get_acct(wqe, worker); - unsigned nr_workers;
/* * If we're not at zero, someone else is holding a brief reference @@ -230,15 +229,11 @@ static void io_worker_exit(struct io_wor raw_spin_lock_irq(&wqe->lock); } acct->nr_workers--; - nr_workers = wqe->acct[IO_WQ_ACCT_BOUND].nr_workers + - wqe->acct[IO_WQ_ACCT_UNBOUND].nr_workers; raw_spin_unlock_irq(&wqe->lock);
- /* all workers gone, wq exit can proceed */ - if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs)) - complete(&wqe->wq->done); - kfree_rcu(worker, rcu); + if (refcount_dec_and_test(&wqe->wq->refs)) + complete(&wqe->wq->done); }
static inline bool io_wqe_run_queue(struct io_wqe *wqe) @@ -641,7 +636,7 @@ void io_wq_worker_sleeping(struct task_s
static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index) { - struct io_wqe_acct *acct =&wqe->acct[index]; + struct io_wqe_acct *acct = &wqe->acct[index]; struct io_worker *worker;
worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, wqe->node); @@ -674,6 +669,7 @@ static bool create_io_worker(struct io_w if (index == IO_WQ_ACCT_UNBOUND) atomic_inc(&wq->user->processes);
+ refcount_inc(&wq->refs); wake_up_process(worker->task); return true; } @@ -689,28 +685,63 @@ static inline bool io_wqe_need_worker(st return acct->nr_workers < acct->max_workers; }
+static bool io_wqe_worker_send_sig(struct io_worker *worker, void *data) +{ + send_sig(SIGINT, worker->task, 1); + return false; +} + +/* + * Iterate the passed in list and call the specific function for each + * worker that isn't exiting + */ +static bool io_wq_for_each_worker(struct io_wqe *wqe, + bool (*func)(struct io_worker *, void *), + void *data) +{ + struct io_worker *worker; + bool ret = false; + + list_for_each_entry_rcu(worker, &wqe->all_list, all_list) { + if (io_worker_get(worker)) { + /* no task if node is/was offline */ + if (worker->task) + ret = func(worker, data); + io_worker_release(worker); + if (ret) + break; + } + } + + return ret; +} + +static bool io_wq_worker_wake(struct io_worker *worker, void *data) +{ + wake_up_process(worker->task); + return false; +} + /* * Manager thread. Tasked with creating new workers, if we need them. */ static int io_wq_manager(void *data) { struct io_wq *wq = data; - int workers_to_create = num_possible_nodes(); int node;
/* create fixed workers */ - refcount_set(&wq->refs, workers_to_create); + refcount_set(&wq->refs, 1); for_each_node(node) { if (!node_online(node)) continue; - if (!create_io_worker(wq, wq->wqes[node], IO_WQ_ACCT_BOUND)) - goto err; - workers_to_create--; + if (create_io_worker(wq, wq->wqes[node], IO_WQ_ACCT_BOUND)) + continue; + set_bit(IO_WQ_BIT_ERROR, &wq->state); + set_bit(IO_WQ_BIT_EXIT, &wq->state); + goto out; }
- while (workers_to_create--) - refcount_dec(&wq->refs); - complete(&wq->done);
while (!kthread_should_stop()) { @@ -742,12 +773,18 @@ static int io_wq_manager(void *data) if (current->task_works) task_work_run();
- return 0; -err: - set_bit(IO_WQ_BIT_ERROR, &wq->state); - set_bit(IO_WQ_BIT_EXIT, &wq->state); - if (refcount_sub_and_test(workers_to_create, &wq->refs)) +out: + if (refcount_dec_and_test(&wq->refs)) { complete(&wq->done); + return 0; + } + /* if ERROR is set and we get here, we have workers to wake */ + if (test_bit(IO_WQ_BIT_ERROR, &wq->state)) { + rcu_read_lock(); + for_each_node(node) + io_wq_for_each_worker(wq->wqes[node], io_wq_worker_wake, NULL); + rcu_read_unlock(); + } return 0; }
@@ -854,37 +891,6 @@ void io_wq_hash_work(struct io_wq_work * work->flags |= (IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT)); }
-static bool io_wqe_worker_send_sig(struct io_worker *worker, void *data) -{ - send_sig(SIGINT, worker->task, 1); - return false; -} - -/* - * Iterate the passed in list and call the specific function for each - * worker that isn't exiting - */ -static bool io_wq_for_each_worker(struct io_wqe *wqe, - bool (*func)(struct io_worker *, void *), - void *data) -{ - struct io_worker *worker; - bool ret = false; - - list_for_each_entry_rcu(worker, &wqe->all_list, all_list) { - if (io_worker_get(worker)) { - /* no task if node is/was offline */ - if (worker->task) - ret = func(worker, data); - io_worker_release(worker); - if (ret) - break; - } - } - - return ret; -} - void io_wq_cancel_all(struct io_wq *wq) { int node; @@ -1117,12 +1123,6 @@ bool io_wq_get(struct io_wq *wq, struct return refcount_inc_not_zero(&wq->use_refs); }
-static bool io_wq_worker_wake(struct io_worker *worker, void *data) -{ - wake_up_process(worker->task); - return false; -} - static void __io_wq_destroy(struct io_wq *wq) { int node;
From: Jens Axboe axboe@kernel.dk
commit ca6484cd308a671811bf39f3119e81966eb476e3 upstream.
The kernel test robot reports this lockdep issue:
[child1:659] mbind (274) returned ENOSYS, marking as inactive. [child1:659] mq_timedsend (279) returned ENOSYS, marking as inactive. [main] 10175 iterations. [F:7781 S:2344 HI:2397] [ 24.610601] [ 24.610743] ================================ [ 24.611083] WARNING: inconsistent lock state [ 24.611437] 5.9.0-rc7-00017-g0f2122045b9462 #5 Not tainted [ 24.611861] -------------------------------- [ 24.612193] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 24.612660] ksoftirqd/0/7 [HC0[0]:SC1[3]:HE0:SE0] takes: [ 24.613086] f00ed998 (&xa->xa_lock#4){+.?.}-{2:2}, at: xa_destroy+0x43/0xc1 [ 24.613642] {SOFTIRQ-ON-W} state was registered at: [ 24.614024] lock_acquire+0x20c/0x29b [ 24.614341] _raw_spin_lock+0x21/0x30 [ 24.614636] io_uring_add_task_file+0xe8/0x13a [ 24.614987] io_uring_create+0x535/0x6bd [ 24.615297] io_uring_setup+0x11d/0x136 [ 24.615606] __ia32_sys_io_uring_setup+0xd/0xf [ 24.615977] do_int80_syscall_32+0x53/0x6c [ 24.616306] restore_all_switch_stack+0x0/0xb1 [ 24.616677] irq event stamp: 939881 [ 24.616968] hardirqs last enabled at (939880): [<8105592d>] __local_bh_enable_ip+0x13c/0x145 [ 24.617642] hardirqs last disabled at (939881): [<81b6ace3>] _raw_spin_lock_irqsave+0x1b/0x4e [ 24.618321] softirqs last enabled at (939738): [<81b6c7c8>] __do_softirq+0x3f0/0x45a [ 24.618924] softirqs last disabled at (939743): [<81055741>] run_ksoftirqd+0x35/0x61 [ 24.619521] [ 24.619521] other info that might help us debug this: [ 24.620028] Possible unsafe locking scenario: [ 24.620028] [ 24.620492] CPU0 [ 24.620685] ---- [ 24.620894] lock(&xa->xa_lock#4); [ 24.621168] <Interrupt> [ 24.621381] lock(&xa->xa_lock#4); [ 24.621695] [ 24.621695] *** DEADLOCK *** [ 24.621695] [ 24.622154] 1 lock held by ksoftirqd/0/7: [ 24.622468] #0: 823bfb94 (rcu_callback){....}-{0:0}, at: rcu_process_callbacks+0xc0/0x155 [ 24.623106] [ 24.623106] stack backtrace: [ 24.623454] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.9.0-rc7-00017-g0f2122045b9462 #5 [ 24.624090] Call Trace: [ 24.624284] ? show_stack+0x40/0x46 [ 24.624551] dump_stack+0x1b/0x1d [ 24.624809] print_usage_bug+0x17a/0x185 [ 24.625142] mark_lock+0x11d/0x1db [ 24.625474] ? print_shortest_lock_dependencies+0x121/0x121 [ 24.625905] __lock_acquire+0x41e/0x7bf [ 24.626206] lock_acquire+0x20c/0x29b [ 24.626517] ? xa_destroy+0x43/0xc1 [ 24.626810] ? lock_acquire+0x20c/0x29b [ 24.627110] _raw_spin_lock_irqsave+0x3e/0x4e [ 24.627450] ? xa_destroy+0x43/0xc1 [ 24.627725] xa_destroy+0x43/0xc1 [ 24.627989] __io_uring_free+0x57/0x71 [ 24.628286] ? get_pid+0x22/0x22 [ 24.628544] __put_task_struct+0xf2/0x163 [ 24.628865] put_task_struct+0x1f/0x2a [ 24.629161] delayed_put_task_struct+0xe2/0xe9 [ 24.629509] rcu_process_callbacks+0x128/0x155 [ 24.629860] __do_softirq+0x1a3/0x45a [ 24.630151] run_ksoftirqd+0x35/0x61 [ 24.630443] smpboot_thread_fn+0x304/0x31a [ 24.630763] kthread+0x124/0x139 [ 24.631016] ? sort_range+0x18/0x18 [ 24.631290] ? kthread_create_worker_on_cpu+0x17/0x17 [ 24.631682] ret_from_fork+0x1c/0x28
which is complaining about xa_destroy() grabbing the xa lock in an IRQ disabling fashion, whereas the io_uring uses cases aren't interrupt safe. This is really an xarray issue, since it should not assume the lock type. But for our use case, since we know the xarray is empty at this point, there's no need to actually call xa_destroy(). So just get rid of it.
Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 1 - 1 file changed, 1 deletion(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7536,7 +7536,6 @@ void __io_uring_free(struct task_struct struct io_uring_task *tctx = tsk->io_uring;
WARN_ON_ONCE(!xa_empty(&tctx->xa)); - xa_destroy(&tctx->xa); kfree(tctx); tsk->io_uring = NULL; }
From: "Matthew Wilcox (Oracle)" willy@infradead.org
commit ce765372bc443573d1d339a2bf4995de385dea3a upstream.
We have to drop the lock during each iteration, so there's no advantage to using the advanced API. Convert this to a standard xa_for_each() loop.
Reported-by: syzbot+27c12725d8ff0bfe1a13@syzkaller.appspotmail.com Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8415,28 +8415,19 @@ static void io_uring_attempt_task_drop(s void __io_uring_files_cancel(struct files_struct *files) { struct io_uring_task *tctx = current->io_uring; - XA_STATE(xas, &tctx->xa, 0); + struct file *file; + unsigned long index;
/* make sure overflow events are dropped */ tctx->in_idle = true;
- do { - struct io_ring_ctx *ctx; - struct file *file; - - xas_lock(&xas); - file = xas_next_entry(&xas, ULONG_MAX); - xas_unlock(&xas); - - if (!file) - break; - - ctx = file->private_data; + xa_for_each(&tctx->xa, index, file) { + struct io_ring_ctx *ctx = file->private_data;
io_uring_cancel_task_requests(ctx, files); if (files) io_uring_del_task_file(file); - } while (1); + } }
static inline bool io_uring_task_idle(struct io_uring_task *tctx)
From: "Matthew Wilcox (Oracle)" willy@infradead.org
commit 236434c3438c4da3dfbd6aeeab807577b85e951a upstream.
The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't allocate memory using GFP_NOWAIT, it would leak the reference to the file descriptor. Also the node pointed to by the xas could be freed between the call to xas_load() under the rcu_read_lock() and the acquisition of the xa_lock.
It's easier to just use the normal xa_load/xa_store interface here.
Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org [axboe: fix missing assign after alloc, cur_uring -> tctx rename] Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8336,27 +8336,24 @@ static void io_uring_cancel_task_request */ static int io_uring_add_task_file(struct file *file) { - if (unlikely(!current->io_uring)) { + struct io_uring_task *tctx = current->io_uring; + + if (unlikely(!tctx)) { int ret;
ret = io_uring_alloc_task_context(current); if (unlikely(ret)) return ret; + tctx = current->io_uring; } - if (current->io_uring->last != file) { - XA_STATE(xas, ¤t->io_uring->xa, (unsigned long) file); - void *old; + if (tctx->last != file) { + void *old = xa_load(&tctx->xa, (unsigned long)file);
- rcu_read_lock(); - old = xas_load(&xas); - if (old != file) { + if (!old) { get_file(file); - xas_lock(&xas); - xas_store(&xas, file); - xas_unlock(&xas); + xa_store(&tctx->xa, (unsigned long)file, file, GFP_KERNEL); } - rcu_read_unlock(); - current->io_uring->last = file; + tctx->last = file; }
return 0;
From: "Matthew Wilcox (Oracle)" willy@infradead.org
commit 5e2ed8c4f45093698855b1f45cdf43efbf6dd498 upstream.
There are no bugs here that I've spotted, it's just easier to use the normal API and there are no performance advantages to using the more verbose advanced API.
Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8365,27 +8365,17 @@ static int io_uring_add_task_file(struct static void io_uring_del_task_file(struct file *file) { struct io_uring_task *tctx = current->io_uring; - XA_STATE(xas, &tctx->xa, (unsigned long) file);
if (tctx->last == file) tctx->last = NULL; - - xas_lock(&xas); - file = xas_store(&xas, NULL); - xas_unlock(&xas); - + file = xa_erase(&tctx->xa, (unsigned long)file); if (file) fput(file); }
static void __io_uring_attempt_task_drop(struct file *file) { - XA_STATE(xas, ¤t->io_uring->xa, (unsigned long) file); - struct file *old; - - rcu_read_lock(); - old = xas_load(&xas); - rcu_read_unlock(); + struct file *old = xa_load(¤t->io_uring->xa, (unsigned long)file);
if (old == file) io_uring_del_task_file(file);
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream.
When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails.
The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by
(1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig
So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, make sure the abbreviated sha1 always consists of exactly 12 hex characters. That is consistent with the current rule for -stable patches, and is almost always enough to identify the head commit unambigously - in the few cases where it does not, the v5.4.3-00021- prefix would certainly nail it down.
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- scripts/setlocalversion | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
--- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -45,7 +45,7 @@ scm_version()
# Check for git and a git repo. if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then + head=$(git rev-parse --verify HEAD 2>/dev/null); then
# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore # it, because this version is defined in the top level Makefile. @@ -59,11 +59,22 @@ scm_version() fi # If we are past a tagged commit (like # "v2.6.30-rc5-302-g72357d5"), we pretty print it. - if atag="$(git describe 2>/dev/null)"; then - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' + # + # Ensure the abbreviated sha1 has exactly 12 + # hex characters, to make the output + # independent of git version, local + # core.abbrev settings and/or total number of + # objects in the current repository - passing + # --abbrev=12 ensures a minimum of 12, and the + # awk substr() then picks the 'g' and first 12 + # hex chars. + if atag="$(git describe --abbrev=12 2>/dev/null)"; then + echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}'
- # If we don't have a tag at all we print -g{commitish}. + # If we don't have a tag at all we print -g{commitish}, + # again using exactly 12 hex chars. else + head="$(echo $head | cut -c1-12)" printf '%s%s' -g $head fi fi
From: Ard Biesheuvel ardb@kernel.org
commit d32de9130f6c79533508e2c7879f18997bfbe2a0 upstream.
Currently, on arm64, we abort on any failure from efi_get_random_bytes() other than EFI_NOT_FOUND when it comes to setting the physical seed for KASLR, but ignore such failures when obtaining the seed for virtual KASLR or for early seeding of the kernel's entropy pool via the config table. This is inconsistent, and may lead to unexpected boot failures.
So let's permit any failure for the physical seed, and simply report the error code if it does not equal EFI_NOT_FOUND.
Cc: stable@vger.kernel.org # v5.8+ Reported-by: Heinrich Schuchardt xypron.glpk@gmx.de Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/firmware/efi/libstub/arm64-stub.c | 8 +++++--- drivers/firmware/efi/libstub/fdt.c | 4 +--- 2 files changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/firmware/efi/libstub/arm64-stub.c +++ b/drivers/firmware/efi/libstub/arm64-stub.c @@ -62,10 +62,12 @@ efi_status_t handle_kernel_image(unsigne status = efi_get_random_bytes(sizeof(phys_seed), (u8 *)&phys_seed); if (status == EFI_NOT_FOUND) { - efi_info("EFI_RNG_PROTOCOL unavailable, no randomness supplied\n"); + efi_info("EFI_RNG_PROTOCOL unavailable, KASLR will be disabled\n"); + efi_nokaslr = true; } else if (status != EFI_SUCCESS) { - efi_err("efi_get_random_bytes() failed\n"); - return status; + efi_err("efi_get_random_bytes() failed (0x%lx), KASLR will be disabled\n", + status); + efi_nokaslr = true; } } else { efi_info("KASLR disabled on kernel command line\n"); --- a/drivers/firmware/efi/libstub/fdt.c +++ b/drivers/firmware/efi/libstub/fdt.c @@ -136,7 +136,7 @@ static efi_status_t update_fdt(void *ori if (status) goto fdt_set_fail;
- if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) { + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) { efi_status_t efi_status;
efi_status = efi_get_random_bytes(sizeof(fdt_val64), @@ -145,8 +145,6 @@ static efi_status_t update_fdt(void *ori status = fdt_setprop_var(fdt, node, "kaslr-seed", fdt_val64); if (status) goto fdt_set_fail; - } else if (efi_status != EFI_NOT_FOUND) { - return efi_status; } }
From: Kees Cook keescook@chromium.org
commit 06e67b849ab910a49a629445f43edb074153d0eb upstream.
The "FIRMWARE_EFI_EMBEDDED" enum is a "where", not a "what". It should not be distinguished separately from just "FIRMWARE", as this confuses the LSMs about what is being loaded. Additionally, there was no actual validation of the firmware contents happening.
Fixes: e4c2c0ff00ec ("firmware: Add new platform fallback mechanism and firmware_request_platform()") Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Luis Chamberlain mcgrof@kernel.org Acked-by: Scott Branden scott.branden@broadcom.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201002173828.2099543-3-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/firmware_loader/fallback_platform.c | 2 +- include/linux/fs.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/base/firmware_loader/fallback_platform.c +++ b/drivers/base/firmware_loader/fallback_platform.c @@ -17,7 +17,7 @@ int firmware_fallback_platform(struct fw if (!(opt_flags & FW_OPT_FALLBACK_PLATFORM)) return -ENOENT;
- rc = security_kernel_load_data(LOADING_FIRMWARE_EFI_EMBEDDED); + rc = security_kernel_load_data(LOADING_FIRMWARE); if (rc) return rc;
--- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2862,7 +2862,6 @@ extern int do_pipe_flags(int *, int); id(UNKNOWN, unknown) \ id(FIRMWARE, firmware) \ id(FIRMWARE_PREALLOC_BUFFER, firmware) \ - id(FIRMWARE_EFI_EMBEDDED, firmware) \ id(MODULE, kernel-module) \ id(KEXEC_IMAGE, kexec-image) \ id(KEXEC_INITRAMFS, kexec-initramfs) \
From: Marc Zyngier maz@kernel.org
commit 18fce56134c987e5b4eceddafdbe4b00c07e2ae1 upstream.
Commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof") changed the way we deal with ARCH_WORKAROUND_1, by moving most of the enabling code to the .matches() callback.
This has the unfortunate effect that the workaround gets only enabled on the first affected CPU, and no other.
In order to address this, forcefully call the .matches() callback from a .cpu_enable() callback, which brings us back to the original behaviour.
Fixes: 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof") Cc: stable@vger.kernel.org Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/cpu_errata.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -599,6 +599,12 @@ check_branch_predictor(const struct arm6 return (need_wa > 0); }
+static void +cpu_enable_branch_predictor_hardening(const struct arm64_cpu_capabilities *cap) +{ + cap->matches(cap, SCOPE_LOCAL_CPU); +} + static const __maybe_unused struct midr_range tx2_family_cpus[] = { MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN), MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2), @@ -890,9 +896,11 @@ const struct arm64_cpu_capabilities arm6 }, #endif { + .desc = "Branch predictor hardening", .capability = ARM64_HARDEN_BRANCH_PREDICTOR, .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, .matches = check_branch_predictor, + .cpu_enable = cpu_enable_branch_predictor_hardening, }, #ifdef CONFIG_RANDOMIZE_BASE {
From: Marc Zyngier maz@kernel.org
commit 39533e12063be7f55e3d6ae21ffe067799d542a4 upstream.
Commit 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification") changed the way we deal with per-CPU errata by only calling the .matches() callback until one CPU is found to be affected. At this point, .matches() stop being called, and .cpu_enable() will be called on all CPUs.
This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be mitigated.
In order to address this, forcefully call the .matches() callback from a .cpu_enable() callback, which brings us back to the original behaviour.
Fixes: 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification") Cc: stable@vger.kernel.org Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/cpu_errata.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -457,6 +457,12 @@ out_printmsg: return required; }
+static void cpu_enable_ssbd_mitigation(const struct arm64_cpu_capabilities *cap) +{ + if (ssbd_state != ARM64_SSBD_FORCE_DISABLE) + cap->matches(cap, SCOPE_LOCAL_CPU); +} + /* known invulnerable cores */ static const struct midr_range arm64_ssb_cpus[] = { MIDR_ALL_VERSIONS(MIDR_CORTEX_A35), @@ -914,6 +920,7 @@ const struct arm64_cpu_capabilities arm6 .capability = ARM64_SSBD, .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, .matches = has_ssbd_mitigation, + .cpu_enable = cpu_enable_ssbd_mitigation, .midr_range_list = arm64_ssb_cpus, }, #ifdef CONFIG_ARM64_ERRATUM_1418040
From: Nick Desaulniers ndesaulniers@google.com
commit 3b92fa7485eba16b05166fddf38ab42f2ff6ab95 upstream.
With CONFIG_EXPERT=y, CONFIG_KASAN=y, CONFIG_RANDOMIZE_BASE=n, CONFIG_RELOCATABLE=n, we observe the following failure when trying to link the kernel image with LD=ld.lld:
error: section: .exit.data is not contiguous with other relro sections
ld.lld defaults to -z relro while ld.bfd defaults to -z norelro. This was previously fixed, but only for CONFIG_RELOCATABLE=y.
Fixes: 3bbd3db86470 ("arm64: relocatable: fix inconsistencies in linker script and options") Signed-off-by: Nick Desaulniers ndesaulniers@google.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201016175339.2429280-1-ndesaulniers@google.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -10,14 +10,14 @@ # # Copyright (C) 1995-2001 by Russell King
-LDFLAGS_vmlinux :=--no-undefined -X +LDFLAGS_vmlinux :=--no-undefined -X -z norelro CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
ifeq ($(CONFIG_RELOCATABLE), y) # Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour # for relative relocs, since this leads to better Image compression # with the relocation offsets always being zero. -LDFLAGS_vmlinux += -shared -Bsymbolic -z notext -z norelro \ +LDFLAGS_vmlinux += -shared -Bsymbolic -z notext \ $(call ld-option, --no-apply-dynamic-relocs) endif
From: Randy Dunlap rdunlap@infradead.org
commit 035fff1f7aab43e420e0098f0854470a5286fb83 upstream.
Fix build error when CONFIG_ACPI is not set/enabled by adding the header file <asm/acpi.h> which contains a stub for the function in the build error.
../arch/x86/pci/intel_mid_pci.c: In function ‘intel_mid_pci_init’: ../arch/x86/pci/intel_mid_pci.c:303:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration] acpi_noirq_set();
Fixes: a912a7584ec3 ("x86/platform/intel-mid: Move PCI initialization to arch_init()") Link: https://lore.kernel.org/r/ea903917-e51b-4cc9-2680-bc1e36efa026@infradead.org Signed-off-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reviewed-by: Jesse Barnes jsbarnes@google.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org # v4.16+ Cc: Jacob Pan jacob.jun.pan@linux.intel.com Cc: Len Brown lenb@kernel.org Cc: Jesse Barnes jsbarnes@google.com Cc: Arjan van de Ven arjan@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/pci/intel_mid_pci.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/pci/intel_mid_pci.c +++ b/arch/x86/pci/intel_mid_pci.c @@ -33,6 +33,7 @@ #include <asm/hw_irq.h> #include <asm/io_apic.h> #include <asm/intel-mid.h> +#include <asm/acpi.h>
#define PCIE_CAP_OFFSET 0x100
From: Dan Williams dan.j.williams@intel.com
commit ec6347bb43395cb92126788a1a5b25302543f815 upstream.
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled.
Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case:
On Fri, May 1, 2020 at 11:28 AM Linus Torvalds torvalds@linux-foundation.org wrote:
On Thu, Apr 30, 2020 at 6:21 PM Dan Williams dan.j.williams@intel.com wrote:
However now I see that copy_user_generic() works for the wrong reason. It works because the exception on the source address due to poison looks no different than a write fault on the user address to the caller, it's still just a short copy. So it makes copy_to_user() work for the wrong reason relative to the name.
Right.
And it won't work that way on other architectures. On x86, we have a generic function that can take faults on either side, and we use it for both cases (and for the "in_user" case too), but that's an artifact of the architecture oddity.
In fact, it's probably wrong even on x86 - because it can hide bugs - but writing those things is painful enough that everybody prefers having just one function.
Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel().
Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch.
One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks.
[ bp: Massage a bit. ]
Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Tony Luck tony.luck@intel.com Acked-by: Michael Ellerman mpe@ellerman.id.au Cc: stable@vger.kernel.org Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg... Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dw... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/Kconfig | 2 arch/powerpc/include/asm/string.h | 2 arch/powerpc/include/asm/uaccess.h | 40 +- arch/powerpc/lib/Makefile | 2 arch/powerpc/lib/copy_mc_64.S | 242 +++++++++++++++++ arch/powerpc/lib/memcpy_mcsafe_64.S | 242 ----------------- arch/x86/Kconfig | 2 arch/x86/Kconfig.debug | 2 arch/x86/include/asm/copy_mc_test.h | 75 +++++ arch/x86/include/asm/mce.h | 9 arch/x86/include/asm/mcsafe_test.h | 75 ----- arch/x86/include/asm/string_64.h | 32 -- arch/x86/include/asm/uaccess.h | 9 arch/x86/include/asm/uaccess_64.h | 20 - arch/x86/kernel/cpu/mce/core.c | 8 arch/x86/kernel/quirks.c | 10 arch/x86/lib/Makefile | 1 arch/x86/lib/copy_mc.c | 82 +++++ arch/x86/lib/copy_mc_64.S | 127 ++++++++ arch/x86/lib/memcpy_64.S | 115 -------- arch/x86/lib/usercopy_64.c | 21 - drivers/md/dm-writecache.c | 15 - drivers/nvdimm/claim.c | 2 drivers/nvdimm/pmem.c | 6 include/linux/string.h | 9 include/linux/uaccess.h | 13 include/linux/uio.h | 10 lib/Kconfig | 7 lib/iov_iter.c | 48 +-- tools/arch/x86/include/asm/mcsafe_test.h | 13 tools/arch/x86/lib/memcpy_64.S | 115 -------- tools/objtool/check.c | 4 tools/perf/bench/Build | 1 tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 - tools/testing/nvdimm/test/nfit.c | 49 +-- tools/testing/selftests/powerpc/copyloops/.gitignore | 2 tools/testing/selftests/powerpc/copyloops/Makefile | 6 tools/testing/selftests/powerpc/copyloops/copy_mc_64.S | 242 +++++++++++++++++ 38 files changed, 914 insertions(+), 770 deletions(-)
--- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -135,7 +135,7 @@ config PPC select ARCH_HAS_STRICT_KERNEL_RWX if (PPC32 && !HIBERNATION) select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAS_UACCESS_FLUSHCACHE - select ARCH_HAS_UACCESS_MCSAFE if PPC64 + select ARCH_HAS_COPY_MC if PPC64 select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_HAVE_NMI_SAFE_CMPXCHG select ARCH_KEEP_MEMBLOCK --- a/arch/powerpc/include/asm/string.h +++ b/arch/powerpc/include/asm/string.h @@ -53,9 +53,7 @@ void *__memmove(void *to, const void *fr #ifndef CONFIG_KASAN #define __HAVE_ARCH_MEMSET32 #define __HAVE_ARCH_MEMSET64 -#define __HAVE_ARCH_MEMCPY_MCSAFE
-extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz); extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t); extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t); extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t); --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -435,6 +435,32 @@ do { \ extern unsigned long __copy_tofrom_user(void __user *to, const void __user *from, unsigned long size);
+#ifdef CONFIG_ARCH_HAS_COPY_MC +unsigned long __must_check +copy_mc_generic(void *to, const void *from, unsigned long size); + +static inline unsigned long __must_check +copy_mc_to_kernel(void *to, const void *from, unsigned long size) +{ + return copy_mc_generic(to, from, size); +} +#define copy_mc_to_kernel copy_mc_to_kernel + +static inline unsigned long __must_check +copy_mc_to_user(void __user *to, const void *from, unsigned long n) +{ + if (likely(check_copy_size(from, n, true))) { + if (access_ok(to, n)) { + allow_write_to_user(to, n); + n = copy_mc_generic((void *)to, from, n); + prevent_write_to_user(to, n); + } + } + + return n; +} +#endif + #ifdef __powerpc64__ static inline unsigned long raw_copy_in_user(void __user *to, const void __user *from, unsigned long n) @@ -523,20 +549,6 @@ raw_copy_to_user(void __user *to, const return ret; }
-static __always_inline unsigned long __must_check -copy_to_user_mcsafe(void __user *to, const void *from, unsigned long n) -{ - if (likely(check_copy_size(from, n, true))) { - if (access_ok(to, n)) { - allow_write_to_user(to, n); - n = memcpy_mcsafe((void *)to, from, n); - prevent_write_to_user(to, n); - } - } - - return n; -} - unsigned long __arch_clear_user(void __user *addr, unsigned long size);
static inline unsigned long clear_user(void __user *addr, unsigned long size) --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -39,7 +39,7 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_ memcpy_power7.o
obj64-y += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \ - memcpy_64.o memcpy_mcsafe_64.o + memcpy_64.o copy_mc_64.o
ifndef CONFIG_PPC_QUEUED_SPINLOCKS obj64-$(CONFIG_SMP) += locks.o --- /dev/null +++ b/arch/powerpc/lib/copy_mc_64.S @@ -0,0 +1,242 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) IBM Corporation, 2011 + * Derived from copyuser_power7.s by Anton Blanchard anton@au.ibm.com + * Author - Balbir Singh bsingharora@gmail.com + */ +#include <asm/ppc_asm.h> +#include <asm/errno.h> +#include <asm/export.h> + + .macro err1 +100: + EX_TABLE(100b,.Ldo_err1) + .endm + + .macro err2 +200: + EX_TABLE(200b,.Ldo_err2) + .endm + + .macro err3 +300: EX_TABLE(300b,.Ldone) + .endm + +.Ldo_err2: + ld r22,STK_REG(R22)(r1) + ld r21,STK_REG(R21)(r1) + ld r20,STK_REG(R20)(r1) + ld r19,STK_REG(R19)(r1) + ld r18,STK_REG(R18)(r1) + ld r17,STK_REG(R17)(r1) + ld r16,STK_REG(R16)(r1) + ld r15,STK_REG(R15)(r1) + ld r14,STK_REG(R14)(r1) + addi r1,r1,STACKFRAMESIZE +.Ldo_err1: + /* Do a byte by byte copy to get the exact remaining size */ + mtctr r7 +46: +err3; lbz r0,0(r4) + addi r4,r4,1 +err3; stb r0,0(r3) + addi r3,r3,1 + bdnz 46b + li r3,0 + blr + +.Ldone: + mfctr r3 + blr + + +_GLOBAL(copy_mc_generic) + mr r7,r5 + cmpldi r5,16 + blt .Lshort_copy + +.Lcopy: + /* Get the source 8B aligned */ + neg r6,r4 + mtocrf 0x01,r6 + clrldi r6,r6,(64-3) + + bf cr7*4+3,1f +err1; lbz r0,0(r4) + addi r4,r4,1 +err1; stb r0,0(r3) + addi r3,r3,1 + subi r7,r7,1 + +1: bf cr7*4+2,2f +err1; lhz r0,0(r4) + addi r4,r4,2 +err1; sth r0,0(r3) + addi r3,r3,2 + subi r7,r7,2 + +2: bf cr7*4+1,3f +err1; lwz r0,0(r4) + addi r4,r4,4 +err1; stw r0,0(r3) + addi r3,r3,4 + subi r7,r7,4 + +3: sub r5,r5,r6 + cmpldi r5,128 + + mflr r0 + stdu r1,-STACKFRAMESIZE(r1) + std r14,STK_REG(R14)(r1) + std r15,STK_REG(R15)(r1) + std r16,STK_REG(R16)(r1) + std r17,STK_REG(R17)(r1) + std r18,STK_REG(R18)(r1) + std r19,STK_REG(R19)(r1) + std r20,STK_REG(R20)(r1) + std r21,STK_REG(R21)(r1) + std r22,STK_REG(R22)(r1) + std r0,STACKFRAMESIZE+16(r1) + + blt 5f + srdi r6,r5,7 + mtctr r6 + + /* Now do cacheline (128B) sized loads and stores. */ + .align 5 +4: +err2; ld r0,0(r4) +err2; ld r6,8(r4) +err2; ld r8,16(r4) +err2; ld r9,24(r4) +err2; ld r10,32(r4) +err2; ld r11,40(r4) +err2; ld r12,48(r4) +err2; ld r14,56(r4) +err2; ld r15,64(r4) +err2; ld r16,72(r4) +err2; ld r17,80(r4) +err2; ld r18,88(r4) +err2; ld r19,96(r4) +err2; ld r20,104(r4) +err2; ld r21,112(r4) +err2; ld r22,120(r4) + addi r4,r4,128 +err2; std r0,0(r3) +err2; std r6,8(r3) +err2; std r8,16(r3) +err2; std r9,24(r3) +err2; std r10,32(r3) +err2; std r11,40(r3) +err2; std r12,48(r3) +err2; std r14,56(r3) +err2; std r15,64(r3) +err2; std r16,72(r3) +err2; std r17,80(r3) +err2; std r18,88(r3) +err2; std r19,96(r3) +err2; std r20,104(r3) +err2; std r21,112(r3) +err2; std r22,120(r3) + addi r3,r3,128 + subi r7,r7,128 + bdnz 4b + + clrldi r5,r5,(64-7) + + /* Up to 127B to go */ +5: srdi r6,r5,4 + mtocrf 0x01,r6 + +6: bf cr7*4+1,7f +err2; ld r0,0(r4) +err2; ld r6,8(r4) +err2; ld r8,16(r4) +err2; ld r9,24(r4) +err2; ld r10,32(r4) +err2; ld r11,40(r4) +err2; ld r12,48(r4) +err2; ld r14,56(r4) + addi r4,r4,64 +err2; std r0,0(r3) +err2; std r6,8(r3) +err2; std r8,16(r3) +err2; std r9,24(r3) +err2; std r10,32(r3) +err2; std r11,40(r3) +err2; std r12,48(r3) +err2; std r14,56(r3) + addi r3,r3,64 + subi r7,r7,64 + +7: ld r14,STK_REG(R14)(r1) + ld r15,STK_REG(R15)(r1) + ld r16,STK_REG(R16)(r1) + ld r17,STK_REG(R17)(r1) + ld r18,STK_REG(R18)(r1) + ld r19,STK_REG(R19)(r1) + ld r20,STK_REG(R20)(r1) + ld r21,STK_REG(R21)(r1) + ld r22,STK_REG(R22)(r1) + addi r1,r1,STACKFRAMESIZE + + /* Up to 63B to go */ + bf cr7*4+2,8f +err1; ld r0,0(r4) +err1; ld r6,8(r4) +err1; ld r8,16(r4) +err1; ld r9,24(r4) + addi r4,r4,32 +err1; std r0,0(r3) +err1; std r6,8(r3) +err1; std r8,16(r3) +err1; std r9,24(r3) + addi r3,r3,32 + subi r7,r7,32 + + /* Up to 31B to go */ +8: bf cr7*4+3,9f +err1; ld r0,0(r4) +err1; ld r6,8(r4) + addi r4,r4,16 +err1; std r0,0(r3) +err1; std r6,8(r3) + addi r3,r3,16 + subi r7,r7,16 + +9: clrldi r5,r5,(64-4) + + /* Up to 15B to go */ +.Lshort_copy: + mtocrf 0x01,r5 + bf cr7*4+0,12f +err1; lwz r0,0(r4) /* Less chance of a reject with word ops */ +err1; lwz r6,4(r4) + addi r4,r4,8 +err1; stw r0,0(r3) +err1; stw r6,4(r3) + addi r3,r3,8 + subi r7,r7,8 + +12: bf cr7*4+1,13f +err1; lwz r0,0(r4) + addi r4,r4,4 +err1; stw r0,0(r3) + addi r3,r3,4 + subi r7,r7,4 + +13: bf cr7*4+2,14f +err1; lhz r0,0(r4) + addi r4,r4,2 +err1; sth r0,0(r3) + addi r3,r3,2 + subi r7,r7,2 + +14: bf cr7*4+3,15f +err1; lbz r0,0(r4) +err1; stb r0,0(r3) + +15: li r3,0 + blr + +EXPORT_SYMBOL_GPL(copy_mc_generic); --- a/arch/powerpc/lib/memcpy_mcsafe_64.S +++ /dev/null @@ -1,242 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright (C) IBM Corporation, 2011 - * Derived from copyuser_power7.s by Anton Blanchard anton@au.ibm.com - * Author - Balbir Singh bsingharora@gmail.com - */ -#include <asm/ppc_asm.h> -#include <asm/errno.h> -#include <asm/export.h> - - .macro err1 -100: - EX_TABLE(100b,.Ldo_err1) - .endm - - .macro err2 -200: - EX_TABLE(200b,.Ldo_err2) - .endm - - .macro err3 -300: EX_TABLE(300b,.Ldone) - .endm - -.Ldo_err2: - ld r22,STK_REG(R22)(r1) - ld r21,STK_REG(R21)(r1) - ld r20,STK_REG(R20)(r1) - ld r19,STK_REG(R19)(r1) - ld r18,STK_REG(R18)(r1) - ld r17,STK_REG(R17)(r1) - ld r16,STK_REG(R16)(r1) - ld r15,STK_REG(R15)(r1) - ld r14,STK_REG(R14)(r1) - addi r1,r1,STACKFRAMESIZE -.Ldo_err1: - /* Do a byte by byte copy to get the exact remaining size */ - mtctr r7 -46: -err3; lbz r0,0(r4) - addi r4,r4,1 -err3; stb r0,0(r3) - addi r3,r3,1 - bdnz 46b - li r3,0 - blr - -.Ldone: - mfctr r3 - blr - - -_GLOBAL(memcpy_mcsafe) - mr r7,r5 - cmpldi r5,16 - blt .Lshort_copy - -.Lcopy: - /* Get the source 8B aligned */ - neg r6,r4 - mtocrf 0x01,r6 - clrldi r6,r6,(64-3) - - bf cr7*4+3,1f -err1; lbz r0,0(r4) - addi r4,r4,1 -err1; stb r0,0(r3) - addi r3,r3,1 - subi r7,r7,1 - -1: bf cr7*4+2,2f -err1; lhz r0,0(r4) - addi r4,r4,2 -err1; sth r0,0(r3) - addi r3,r3,2 - subi r7,r7,2 - -2: bf cr7*4+1,3f -err1; lwz r0,0(r4) - addi r4,r4,4 -err1; stw r0,0(r3) - addi r3,r3,4 - subi r7,r7,4 - -3: sub r5,r5,r6 - cmpldi r5,128 - - mflr r0 - stdu r1,-STACKFRAMESIZE(r1) - std r14,STK_REG(R14)(r1) - std r15,STK_REG(R15)(r1) - std r16,STK_REG(R16)(r1) - std r17,STK_REG(R17)(r1) - std r18,STK_REG(R18)(r1) - std r19,STK_REG(R19)(r1) - std r20,STK_REG(R20)(r1) - std r21,STK_REG(R21)(r1) - std r22,STK_REG(R22)(r1) - std r0,STACKFRAMESIZE+16(r1) - - blt 5f - srdi r6,r5,7 - mtctr r6 - - /* Now do cacheline (128B) sized loads and stores. */ - .align 5 -4: -err2; ld r0,0(r4) -err2; ld r6,8(r4) -err2; ld r8,16(r4) -err2; ld r9,24(r4) -err2; ld r10,32(r4) -err2; ld r11,40(r4) -err2; ld r12,48(r4) -err2; ld r14,56(r4) -err2; ld r15,64(r4) -err2; ld r16,72(r4) -err2; ld r17,80(r4) -err2; ld r18,88(r4) -err2; ld r19,96(r4) -err2; ld r20,104(r4) -err2; ld r21,112(r4) -err2; ld r22,120(r4) - addi r4,r4,128 -err2; std r0,0(r3) -err2; std r6,8(r3) -err2; std r8,16(r3) -err2; std r9,24(r3) -err2; std r10,32(r3) -err2; std r11,40(r3) -err2; std r12,48(r3) -err2; std r14,56(r3) -err2; std r15,64(r3) -err2; std r16,72(r3) -err2; std r17,80(r3) -err2; std r18,88(r3) -err2; std r19,96(r3) -err2; std r20,104(r3) -err2; std r21,112(r3) -err2; std r22,120(r3) - addi r3,r3,128 - subi r7,r7,128 - bdnz 4b - - clrldi r5,r5,(64-7) - - /* Up to 127B to go */ -5: srdi r6,r5,4 - mtocrf 0x01,r6 - -6: bf cr7*4+1,7f -err2; ld r0,0(r4) -err2; ld r6,8(r4) -err2; ld r8,16(r4) -err2; ld r9,24(r4) -err2; ld r10,32(r4) -err2; ld r11,40(r4) -err2; ld r12,48(r4) -err2; ld r14,56(r4) - addi r4,r4,64 -err2; std r0,0(r3) -err2; std r6,8(r3) -err2; std r8,16(r3) -err2; std r9,24(r3) -err2; std r10,32(r3) -err2; std r11,40(r3) -err2; std r12,48(r3) -err2; std r14,56(r3) - addi r3,r3,64 - subi r7,r7,64 - -7: ld r14,STK_REG(R14)(r1) - ld r15,STK_REG(R15)(r1) - ld r16,STK_REG(R16)(r1) - ld r17,STK_REG(R17)(r1) - ld r18,STK_REG(R18)(r1) - ld r19,STK_REG(R19)(r1) - ld r20,STK_REG(R20)(r1) - ld r21,STK_REG(R21)(r1) - ld r22,STK_REG(R22)(r1) - addi r1,r1,STACKFRAMESIZE - - /* Up to 63B to go */ - bf cr7*4+2,8f -err1; ld r0,0(r4) -err1; ld r6,8(r4) -err1; ld r8,16(r4) -err1; ld r9,24(r4) - addi r4,r4,32 -err1; std r0,0(r3) -err1; std r6,8(r3) -err1; std r8,16(r3) -err1; std r9,24(r3) - addi r3,r3,32 - subi r7,r7,32 - - /* Up to 31B to go */ -8: bf cr7*4+3,9f -err1; ld r0,0(r4) -err1; ld r6,8(r4) - addi r4,r4,16 -err1; std r0,0(r3) -err1; std r6,8(r3) - addi r3,r3,16 - subi r7,r7,16 - -9: clrldi r5,r5,(64-4) - - /* Up to 15B to go */ -.Lshort_copy: - mtocrf 0x01,r5 - bf cr7*4+0,12f -err1; lwz r0,0(r4) /* Less chance of a reject with word ops */ -err1; lwz r6,4(r4) - addi r4,r4,8 -err1; stw r0,0(r3) -err1; stw r6,4(r3) - addi r3,r3,8 - subi r7,r7,8 - -12: bf cr7*4+1,13f -err1; lwz r0,0(r4) - addi r4,r4,4 -err1; stw r0,0(r3) - addi r3,r3,4 - subi r7,r7,4 - -13: bf cr7*4+2,14f -err1; lhz r0,0(r4) - addi r4,r4,2 -err1; sth r0,0(r3) - addi r3,r3,2 - subi r7,r7,2 - -14: bf cr7*4+3,15f -err1; lbz r0,0(r4) -err1; stb r0,0(r3) - -15: li r3,0 - blr - -EXPORT_SYMBOL_GPL(memcpy_mcsafe); --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -75,7 +75,7 @@ config X86 select ARCH_HAS_PTE_DEVMAP if X86_64 select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64 - select ARCH_HAS_UACCESS_MCSAFE if X86_64 && X86_MCE + select ARCH_HAS_COPY_MC if X86_64 select ARCH_HAS_SET_MEMORY select ARCH_HAS_SET_DIRECT_MAP select ARCH_HAS_STRICT_KERNEL_RWX --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -62,7 +62,7 @@ config EARLY_PRINTK_USB_XDBC You should normally say N here, unless you want to debug early crashes or need a very simple printk logging facility.
-config MCSAFE_TEST +config COPY_MC_TEST def_bool n
config EFI_PGT_DUMP --- /dev/null +++ b/arch/x86/include/asm/copy_mc_test.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _COPY_MC_TEST_H_ +#define _COPY_MC_TEST_H_ + +#ifndef __ASSEMBLY__ +#ifdef CONFIG_COPY_MC_TEST +extern unsigned long copy_mc_test_src; +extern unsigned long copy_mc_test_dst; + +static inline void copy_mc_inject_src(void *addr) +{ + if (addr) + copy_mc_test_src = (unsigned long) addr; + else + copy_mc_test_src = ~0UL; +} + +static inline void copy_mc_inject_dst(void *addr) +{ + if (addr) + copy_mc_test_dst = (unsigned long) addr; + else + copy_mc_test_dst = ~0UL; +} +#else /* CONFIG_COPY_MC_TEST */ +static inline void copy_mc_inject_src(void *addr) +{ +} + +static inline void copy_mc_inject_dst(void *addr) +{ +} +#endif /* CONFIG_COPY_MC_TEST */ + +#else /* __ASSEMBLY__ */ +#include <asm/export.h> + +#ifdef CONFIG_COPY_MC_TEST +.macro COPY_MC_TEST_CTL + .pushsection .data + .align 8 + .globl copy_mc_test_src + copy_mc_test_src: + .quad 0 + EXPORT_SYMBOL_GPL(copy_mc_test_src) + .globl copy_mc_test_dst + copy_mc_test_dst: + .quad 0 + EXPORT_SYMBOL_GPL(copy_mc_test_dst) + .popsection +.endm + +.macro COPY_MC_TEST_SRC reg count target + leaq \count(\reg), %r9 + cmp copy_mc_test_src, %r9 + ja \target +.endm + +.macro COPY_MC_TEST_DST reg count target + leaq \count(\reg), %r9 + cmp copy_mc_test_dst, %r9 + ja \target +.endm +#else +.macro COPY_MC_TEST_CTL +.endm + +.macro COPY_MC_TEST_SRC reg count target +.endm + +.macro COPY_MC_TEST_DST reg count target +.endm +#endif /* CONFIG_COPY_MC_TEST */ +#endif /* __ASSEMBLY__ */ +#endif /* _COPY_MC_TEST_H_ */ --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -174,6 +174,15 @@ extern void mce_unregister_decode_chain(
extern int mce_p5_enabled;
+#ifdef CONFIG_ARCH_HAS_COPY_MC +extern void enable_copy_mc_fragile(void); +unsigned long __must_check copy_mc_fragile(void *dst, const void *src, unsigned cnt); +#else +static inline void enable_copy_mc_fragile(void) +{ +} +#endif + #ifdef CONFIG_X86_MCE int mcheck_init(void); void mcheck_cpu_init(struct cpuinfo_x86 *c); --- a/arch/x86/include/asm/mcsafe_test.h +++ /dev/null @@ -1,75 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _MCSAFE_TEST_H_ -#define _MCSAFE_TEST_H_ - -#ifndef __ASSEMBLY__ -#ifdef CONFIG_MCSAFE_TEST -extern unsigned long mcsafe_test_src; -extern unsigned long mcsafe_test_dst; - -static inline void mcsafe_inject_src(void *addr) -{ - if (addr) - mcsafe_test_src = (unsigned long) addr; - else - mcsafe_test_src = ~0UL; -} - -static inline void mcsafe_inject_dst(void *addr) -{ - if (addr) - mcsafe_test_dst = (unsigned long) addr; - else - mcsafe_test_dst = ~0UL; -} -#else /* CONFIG_MCSAFE_TEST */ -static inline void mcsafe_inject_src(void *addr) -{ -} - -static inline void mcsafe_inject_dst(void *addr) -{ -} -#endif /* CONFIG_MCSAFE_TEST */ - -#else /* __ASSEMBLY__ */ -#include <asm/export.h> - -#ifdef CONFIG_MCSAFE_TEST -.macro MCSAFE_TEST_CTL - .pushsection .data - .align 8 - .globl mcsafe_test_src - mcsafe_test_src: - .quad 0 - EXPORT_SYMBOL_GPL(mcsafe_test_src) - .globl mcsafe_test_dst - mcsafe_test_dst: - .quad 0 - EXPORT_SYMBOL_GPL(mcsafe_test_dst) - .popsection -.endm - -.macro MCSAFE_TEST_SRC reg count target - leaq \count(\reg), %r9 - cmp mcsafe_test_src, %r9 - ja \target -.endm - -.macro MCSAFE_TEST_DST reg count target - leaq \count(\reg), %r9 - cmp mcsafe_test_dst, %r9 - ja \target -.endm -#else -.macro MCSAFE_TEST_CTL -.endm - -.macro MCSAFE_TEST_SRC reg count target -.endm - -.macro MCSAFE_TEST_DST reg count target -.endm -#endif /* CONFIG_MCSAFE_TEST */ -#endif /* __ASSEMBLY__ */ -#endif /* _MCSAFE_TEST_H_ */ --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -82,38 +82,6 @@ int strcmp(const char *cs, const char *c
#endif
-#define __HAVE_ARCH_MEMCPY_MCSAFE 1 -__must_check unsigned long __memcpy_mcsafe(void *dst, const void *src, - size_t cnt); -DECLARE_STATIC_KEY_FALSE(mcsafe_key); - -/** - * memcpy_mcsafe - copy memory with indication if a machine check happened - * - * @dst: destination address - * @src: source address - * @cnt: number of bytes to copy - * - * Low level memory copy function that catches machine checks - * We only call into the "safe" function on systems that can - * actually do machine check recovery. Everyone else can just - * use memcpy(). - * - * Return 0 for success, or number of bytes not copied if there was an - * exception. - */ -static __always_inline __must_check unsigned long -memcpy_mcsafe(void *dst, const void *src, size_t cnt) -{ -#ifdef CONFIG_X86_MCE - if (static_branch_unlikely(&mcsafe_key)) - return __memcpy_mcsafe(dst, src, cnt); - else -#endif - memcpy(dst, src, cnt); - return 0; -} - #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE #define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1 void __memcpy_flushcache(void *dst, const void *src, size_t cnt); --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -455,6 +455,15 @@ extern __must_check long strnlen_user(co unsigned long __must_check clear_user(void __user *mem, unsigned long len); unsigned long __must_check __clear_user(void __user *mem, unsigned long len);
+#ifdef CONFIG_ARCH_HAS_COPY_MC +unsigned long __must_check +copy_mc_to_kernel(void *to, const void *from, unsigned len); +#define copy_mc_to_kernel copy_mc_to_kernel + +unsigned long __must_check +copy_mc_to_user(void *to, const void *from, unsigned len); +#endif + /* * movsl can be slow when source and dest are not both 8-byte aligned */ --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -47,22 +47,6 @@ copy_user_generic(void *to, const void * }
static __always_inline __must_check unsigned long -copy_to_user_mcsafe(void *to, const void *from, unsigned len) -{ - unsigned long ret; - - __uaccess_begin(); - /* - * Note, __memcpy_mcsafe() is explicitly used since it can - * handle exceptions / faults. memcpy_mcsafe() may fall back to - * memcpy() which lacks this handling. - */ - ret = __memcpy_mcsafe(to, from, len); - __uaccess_end(); - return ret; -} - -static __always_inline __must_check unsigned long raw_copy_from_user(void *dst, const void __user *src, unsigned long size) { return copy_user_generic(dst, (__force void *)src, size); @@ -102,8 +86,4 @@ __copy_from_user_flushcache(void *dst, c kasan_check_write(dst, size); return __copy_user_flushcache(dst, src, size); } - -unsigned long -mcsafe_handle_tail(char *to, char *from, unsigned len); - #endif /* _ASM_X86_UACCESS_64_H */ --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -40,7 +40,6 @@ #include <linux/debugfs.h> #include <linux/irq_work.h> #include <linux/export.h> -#include <linux/jump_label.h> #include <linux/set_memory.h> #include <linux/sync_core.h> #include <linux/task_work.h> @@ -2127,7 +2126,7 @@ void mce_disable_bank(int bank) and older. * mce=nobootlog Don't log MCEs from before booting. * mce=bios_cmci_threshold Don't program the CMCI threshold - * mce=recovery force enable memcpy_mcsafe() + * mce=recovery force enable copy_mc_fragile() */ static int __init mcheck_enable(char *str) { @@ -2735,13 +2734,10 @@ static void __init mcheck_debugfs_init(v static void __init mcheck_debugfs_init(void) { } #endif
-DEFINE_STATIC_KEY_FALSE(mcsafe_key); -EXPORT_SYMBOL_GPL(mcsafe_key); - static int __init mcheck_late_init(void) { if (mca_cfg.recovery) - static_branch_inc(&mcsafe_key); + enable_copy_mc_fragile();
mcheck_debugfs_init();
--- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -8,6 +8,7 @@
#include <asm/hpet.h> #include <asm/setup.h> +#include <asm/mce.h>
#if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)
@@ -624,10 +625,6 @@ static void amd_disable_seq_and_redirect DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3, amd_disable_seq_and_redirect_scrub);
-#if defined(CONFIG_X86_64) && defined(CONFIG_X86_MCE) -#include <linux/jump_label.h> -#include <asm/string_64.h> - /* Ivy Bridge, Haswell, Broadwell */ static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev) { @@ -636,7 +633,7 @@ static void quirk_intel_brickland_xeon_r pci_read_config_dword(pdev, 0x84, &capid0);
if (capid0 & 0x10) - static_branch_inc(&mcsafe_key); + enable_copy_mc_fragile(); }
/* Skylake */ @@ -653,7 +650,7 @@ static void quirk_intel_purley_xeon_ras_ * enabled, so memory machine check recovery is also enabled. */ if ((capid0 & 0xc0) == 0xc0 || (capid5 & 0x1e0)) - static_branch_inc(&mcsafe_key); + enable_copy_mc_fragile();
} DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0ec3, quirk_intel_brickland_xeon_ras_cap); @@ -661,7 +658,6 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IN DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, quirk_intel_brickland_xeon_ras_cap); DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2083, quirk_intel_purley_xeon_ras_cap); #endif -#endif
bool x86_apple_machine; EXPORT_SYMBOL(x86_apple_machine); --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -44,6 +44,7 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp lib-y := delay.o misc.o cmdline.o cpu.o lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o lib-y += memcpy_$(BITS).o +lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o lib-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o --- /dev/null +++ b/arch/x86/lib/copy_mc.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */ + +#include <linux/jump_label.h> +#include <linux/uaccess.h> +#include <linux/export.h> +#include <linux/string.h> +#include <linux/types.h> + +#include <asm/mce.h> + +#ifdef CONFIG_X86_MCE +/* + * See COPY_MC_TEST for self-test of the copy_mc_fragile() + * implementation. + */ +static DEFINE_STATIC_KEY_FALSE(copy_mc_fragile_key); + +void enable_copy_mc_fragile(void) +{ + static_branch_inc(©_mc_fragile_key); +} +#define copy_mc_fragile_enabled (static_branch_unlikely(©_mc_fragile_key)) + +/* + * Similar to copy_user_handle_tail, probe for the write fault point, or + * source exception point. + */ +__visible notrace unsigned long +copy_mc_fragile_handle_tail(char *to, char *from, unsigned len) +{ + for (; len; --len, to++, from++) + if (copy_mc_fragile(to, from, 1)) + break; + return len; +} +#else +/* + * No point in doing careful copying, or consulting a static key when + * there is no #MC handler in the CONFIG_X86_MCE=n case. + */ +void enable_copy_mc_fragile(void) +{ +} +#define copy_mc_fragile_enabled (0) +#endif + +/** + * copy_mc_to_kernel - memory copy that handles source exceptions + * + * @dst: destination address + * @src: source address + * @len: number of bytes to copy + * + * Call into the 'fragile' version on systems that have trouble + * actually do machine check recovery. Everyone else can just + * use memcpy(). + * + * Return 0 for success, or number of bytes not copied if there was an + * exception. + */ +unsigned long __must_check copy_mc_to_kernel(void *dst, const void *src, unsigned len) +{ + if (copy_mc_fragile_enabled) + return copy_mc_fragile(dst, src, len); + memcpy(dst, src, len); + return 0; +} +EXPORT_SYMBOL_GPL(copy_mc_to_kernel); + +unsigned long __must_check copy_mc_to_user(void *dst, const void *src, unsigned len) +{ + unsigned long ret; + + if (!copy_mc_fragile_enabled) + return copy_user_generic(dst, src, len); + + __uaccess_begin(); + ret = copy_mc_fragile(dst, src, len); + __uaccess_end(); + return ret; +} --- /dev/null +++ b/arch/x86/lib/copy_mc_64.S @@ -0,0 +1,127 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */ + +#include <linux/linkage.h> +#include <asm/copy_mc_test.h> +#include <asm/export.h> +#include <asm/asm.h> + +#ifndef CONFIG_UML + +#ifdef CONFIG_X86_MCE +COPY_MC_TEST_CTL + +/* + * copy_mc_fragile - copy memory with indication if an exception / fault happened + * + * The 'fragile' version is opted into by platform quirks and takes + * pains to avoid unrecoverable corner cases like 'fast-string' + * instruction sequences, and consuming poison across a cacheline + * boundary. The non-fragile version is equivalent to memcpy() + * regardless of CPU machine-check-recovery capability. + */ +SYM_FUNC_START(copy_mc_fragile) + cmpl $8, %edx + /* Less than 8 bytes? Go to byte copy loop */ + jb .L_no_whole_words + + /* Check for bad alignment of source */ + testl $7, %esi + /* Already aligned */ + jz .L_8byte_aligned + + /* Copy one byte at a time until source is 8-byte aligned */ + movl %esi, %ecx + andl $7, %ecx + subl $8, %ecx + negl %ecx + subl %ecx, %edx +.L_read_leading_bytes: + movb (%rsi), %al + COPY_MC_TEST_SRC %rsi 1 .E_leading_bytes + COPY_MC_TEST_DST %rdi 1 .E_leading_bytes +.L_write_leading_bytes: + movb %al, (%rdi) + incq %rsi + incq %rdi + decl %ecx + jnz .L_read_leading_bytes + +.L_8byte_aligned: + movl %edx, %ecx + andl $7, %edx + shrl $3, %ecx + jz .L_no_whole_words + +.L_read_words: + movq (%rsi), %r8 + COPY_MC_TEST_SRC %rsi 8 .E_read_words + COPY_MC_TEST_DST %rdi 8 .E_write_words +.L_write_words: + movq %r8, (%rdi) + addq $8, %rsi + addq $8, %rdi + decl %ecx + jnz .L_read_words + + /* Any trailing bytes? */ +.L_no_whole_words: + andl %edx, %edx + jz .L_done_memcpy_trap + + /* Copy trailing bytes */ + movl %edx, %ecx +.L_read_trailing_bytes: + movb (%rsi), %al + COPY_MC_TEST_SRC %rsi 1 .E_trailing_bytes + COPY_MC_TEST_DST %rdi 1 .E_trailing_bytes +.L_write_trailing_bytes: + movb %al, (%rdi) + incq %rsi + incq %rdi + decl %ecx + jnz .L_read_trailing_bytes + + /* Copy successful. Return zero */ +.L_done_memcpy_trap: + xorl %eax, %eax +.L_done: + ret +SYM_FUNC_END(copy_mc_fragile) +EXPORT_SYMBOL_GPL(copy_mc_fragile) + + .section .fixup, "ax" + /* + * Return number of bytes not copied for any failure. Note that + * there is no "tail" handling since the source buffer is 8-byte + * aligned and poison is cacheline aligned. + */ +.E_read_words: + shll $3, %ecx +.E_leading_bytes: + addl %edx, %ecx +.E_trailing_bytes: + mov %ecx, %eax + jmp .L_done + + /* + * For write fault handling, given the destination is unaligned, + * we handle faults on multi-byte writes with a byte-by-byte + * copy up to the write-protected page. + */ +.E_write_words: + shll $3, %ecx + addl %edx, %ecx + movl %ecx, %edx + jmp copy_mc_fragile_handle_tail + + .previous + + _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes) + _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words) + _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes) + _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes) + _ASM_EXTABLE(.L_write_words, .E_write_words) + _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes) +#endif /* CONFIG_X86_MCE */ +#endif /* !CONFIG_UML */ --- a/arch/x86/lib/memcpy_64.S +++ b/arch/x86/lib/memcpy_64.S @@ -4,7 +4,6 @@ #include <linux/linkage.h> #include <asm/errno.h> #include <asm/cpufeatures.h> -#include <asm/mcsafe_test.h> #include <asm/alternative-asm.h> #include <asm/export.h>
@@ -187,117 +186,3 @@ SYM_FUNC_START_LOCAL(memcpy_orig) SYM_FUNC_END(memcpy_orig)
.popsection - -#ifndef CONFIG_UML - -MCSAFE_TEST_CTL - -/* - * __memcpy_mcsafe - memory copy with machine check exception handling - * Note that we only catch machine checks when reading the source addresses. - * Writes to target are posted and don't generate machine checks. - */ -SYM_FUNC_START(__memcpy_mcsafe) - cmpl $8, %edx - /* Less than 8 bytes? Go to byte copy loop */ - jb .L_no_whole_words - - /* Check for bad alignment of source */ - testl $7, %esi - /* Already aligned */ - jz .L_8byte_aligned - - /* Copy one byte at a time until source is 8-byte aligned */ - movl %esi, %ecx - andl $7, %ecx - subl $8, %ecx - negl %ecx - subl %ecx, %edx -.L_read_leading_bytes: - movb (%rsi), %al - MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes - MCSAFE_TEST_DST %rdi 1 .E_leading_bytes -.L_write_leading_bytes: - movb %al, (%rdi) - incq %rsi - incq %rdi - decl %ecx - jnz .L_read_leading_bytes - -.L_8byte_aligned: - movl %edx, %ecx - andl $7, %edx - shrl $3, %ecx - jz .L_no_whole_words - -.L_read_words: - movq (%rsi), %r8 - MCSAFE_TEST_SRC %rsi 8 .E_read_words - MCSAFE_TEST_DST %rdi 8 .E_write_words -.L_write_words: - movq %r8, (%rdi) - addq $8, %rsi - addq $8, %rdi - decl %ecx - jnz .L_read_words - - /* Any trailing bytes? */ -.L_no_whole_words: - andl %edx, %edx - jz .L_done_memcpy_trap - - /* Copy trailing bytes */ - movl %edx, %ecx -.L_read_trailing_bytes: - movb (%rsi), %al - MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes - MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes -.L_write_trailing_bytes: - movb %al, (%rdi) - incq %rsi - incq %rdi - decl %ecx - jnz .L_read_trailing_bytes - - /* Copy successful. Return zero */ -.L_done_memcpy_trap: - xorl %eax, %eax -.L_done: - ret -SYM_FUNC_END(__memcpy_mcsafe) -EXPORT_SYMBOL_GPL(__memcpy_mcsafe) - - .section .fixup, "ax" - /* - * Return number of bytes not copied for any failure. Note that - * there is no "tail" handling since the source buffer is 8-byte - * aligned and poison is cacheline aligned. - */ -.E_read_words: - shll $3, %ecx -.E_leading_bytes: - addl %edx, %ecx -.E_trailing_bytes: - mov %ecx, %eax - jmp .L_done - - /* - * For write fault handling, given the destination is unaligned, - * we handle faults on multi-byte writes with a byte-by-byte - * copy up to the write-protected page. - */ -.E_write_words: - shll $3, %ecx - addl %edx, %ecx - movl %ecx, %edx - jmp mcsafe_handle_tail - - .previous - - _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes) - _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words) - _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes) - _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes) - _ASM_EXTABLE(.L_write_words, .E_write_words) - _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes) -#endif --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -56,27 +56,6 @@ unsigned long clear_user(void __user *to } EXPORT_SYMBOL(clear_user);
-/* - * Similar to copy_user_handle_tail, probe for the write fault point, - * but reuse __memcpy_mcsafe in case a new read error is encountered. - * clac() is handled in _copy_to_iter_mcsafe(). - */ -__visible notrace unsigned long -mcsafe_handle_tail(char *to, char *from, unsigned len) -{ - for (; len; --len, to++, from++) { - /* - * Call the assembly routine back directly since - * memcpy_mcsafe() may silently fallback to memcpy. - */ - unsigned long rem = __memcpy_mcsafe(to, from, 1); - - if (rem) - break; - } - return len; -} - #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE /** * clean_cache_range - write back a cache range with CLWB --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -49,7 +49,7 @@ do { \ #define pmem_assign(dest, src) ((dest) = (src)) #endif
-#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM) +#if IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) && defined(DM_WRITECACHE_HAS_PMEM) #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS #endif
@@ -992,7 +992,8 @@ static void writecache_resume(struct dm_ } wc->freelist_size = 0;
- r = memcpy_mcsafe(&sb_seq_count, &sb(wc)->seq_count, sizeof(uint64_t)); + r = copy_mc_to_kernel(&sb_seq_count, &sb(wc)->seq_count, + sizeof(uint64_t)); if (r) { writecache_error(wc, r, "hardware memory error when reading superblock: %d", r); sb_seq_count = cpu_to_le64(0); @@ -1008,7 +1009,8 @@ static void writecache_resume(struct dm_ e->seq_count = -1; continue; } - r = memcpy_mcsafe(&wme, memory_entry(wc, e), sizeof(struct wc_memory_entry)); + r = copy_mc_to_kernel(&wme, memory_entry(wc, e), + sizeof(struct wc_memory_entry)); if (r) { writecache_error(wc, r, "hardware memory error when reading metadata entry %lu: %d", (unsigned long)b, r); @@ -1206,7 +1208,7 @@ static void bio_copy_block(struct dm_wri
if (rw == READ) { int r; - r = memcpy_mcsafe(buf, data, size); + r = copy_mc_to_kernel(buf, data, size); flush_dcache_page(bio_page(bio)); if (unlikely(r)) { writecache_error(wc, r, "hardware memory error when reading data: %d", r); @@ -2349,7 +2351,7 @@ invalid_optional: } }
- r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock)); + r = copy_mc_to_kernel(&s, sb(wc), sizeof(struct wc_memory_superblock)); if (r) { ti->error = "Hardware memory error when reading superblock"; goto bad; @@ -2360,7 +2362,8 @@ invalid_optional: ti->error = "Unable to initialize device"; goto bad; } - r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock)); + r = copy_mc_to_kernel(&s, sb(wc), + sizeof(struct wc_memory_superblock)); if (r) { ti->error = "Hardware memory error when reading superblock"; goto bad; --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -268,7 +268,7 @@ static int nsio_rw_bytes(struct nd_names if (rw == READ) { if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) return -EIO; - if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0) + if (copy_mc_to_kernel(buf, nsio->addr + offset, size) != 0) return -EIO; return 0; } --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -125,7 +125,7 @@ static blk_status_t read_pmem(struct pag while (len) { mem = kmap_atomic(page); chunk = min_t(unsigned int, len, PAGE_SIZE - off); - rem = memcpy_mcsafe(mem + off, pmem_addr, chunk); + rem = copy_mc_to_kernel(mem + off, pmem_addr, chunk); kunmap_atomic(mem); if (rem) return BLK_STS_IOERR; @@ -304,7 +304,7 @@ static long pmem_dax_direct_access(struc
/* * Use the 'no check' versions of copy_from_iter_flushcache() and - * copy_to_iter_mcsafe() to bypass HARDENED_USERCOPY overhead. Bounds + * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds * checking, both file offset and device offset, is handled by * dax_iomap_actor() */ @@ -317,7 +317,7 @@ static size_t pmem_copy_from_iter(struct static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i) { - return _copy_to_iter_mcsafe(addr, bytes, i); + return _copy_mc_to_iter(addr, bytes, i); }
static const struct dax_operations pmem_dax_ops = { --- a/include/linux/string.h +++ b/include/linux/string.h @@ -161,20 +161,13 @@ extern int bcmp(const void *,const void #ifndef __HAVE_ARCH_MEMCHR extern void * memchr(const void *,int,__kernel_size_t); #endif -#ifndef __HAVE_ARCH_MEMCPY_MCSAFE -static inline __must_check unsigned long memcpy_mcsafe(void *dst, - const void *src, size_t cnt) -{ - memcpy(dst, src, cnt); - return 0; -} -#endif #ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt) { memcpy(dst, src, cnt); } #endif + void *memchr_inv(const void *s, int c, size_t n); char *strreplace(char *s, char old, char new);
--- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -179,6 +179,19 @@ copy_in_user(void __user *to, const void } #endif
+#ifndef copy_mc_to_kernel +/* + * Without arch opt-in this generic copy_mc_to_kernel() will not handle + * #MC (or arch equivalent) during source read. + */ +static inline unsigned long __must_check +copy_mc_to_kernel(void *dst, const void *src, size_t cnt) +{ + memcpy(dst, src, cnt); + return 0; +} +#endif + static __always_inline void pagefault_disabled_inc(void) { current->pagefault_disabled++; --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -185,10 +185,10 @@ size_t _copy_from_iter_flushcache(void * #define _copy_from_iter_flushcache _copy_from_iter_nocache #endif
-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE -size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i); +#ifdef CONFIG_ARCH_HAS_COPY_MC +size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i); #else -#define _copy_to_iter_mcsafe _copy_to_iter +#define _copy_mc_to_iter _copy_to_iter #endif
static __always_inline __must_check @@ -201,12 +201,12 @@ size_t copy_from_iter_flushcache(void *a }
static __always_inline __must_check -size_t copy_to_iter_mcsafe(void *addr, size_t bytes, struct iov_iter *i) +size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i) { if (unlikely(!check_copy_size(addr, bytes, true))) return 0; else - return _copy_to_iter_mcsafe(addr, bytes, i); + return _copy_mc_to_iter(addr, bytes, i); }
size_t iov_iter_zero(size_t bytes, struct iov_iter *); --- a/lib/Kconfig +++ b/lib/Kconfig @@ -635,7 +635,12 @@ config UACCESS_MEMCPY config ARCH_HAS_UACCESS_FLUSHCACHE bool
-config ARCH_HAS_UACCESS_MCSAFE +# arch has a concept of a recoverable synchronous exception due to a +# memory-read error like x86 machine-check or ARM data-abort, and +# implements copy_mc_to_{user,kernel} to abort and report +# 'bytes-transferred' if that exception fires when accessing the source +# buffer. +config ARCH_HAS_COPY_MC bool
# Temporary. Goes away when all archs are cleaned up --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -637,30 +637,30 @@ size_t _copy_to_iter(const void *addr, s } EXPORT_SYMBOL(_copy_to_iter);
-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE -static int copyout_mcsafe(void __user *to, const void *from, size_t n) +#ifdef CONFIG_ARCH_HAS_COPY_MC +static int copyout_mc(void __user *to, const void *from, size_t n) { if (access_ok(to, n)) { instrument_copy_to_user(to, from, n); - n = copy_to_user_mcsafe((__force void *) to, from, n); + n = copy_mc_to_user((__force void *) to, from, n); } return n; }
-static unsigned long memcpy_mcsafe_to_page(struct page *page, size_t offset, +static unsigned long copy_mc_to_page(struct page *page, size_t offset, const char *from, size_t len) { unsigned long ret; char *to;
to = kmap_atomic(page); - ret = memcpy_mcsafe(to + offset, from, len); + ret = copy_mc_to_kernel(to + offset, from, len); kunmap_atomic(to);
return ret; }
-static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes, +static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes, struct iov_iter *i) { struct pipe_inode_info *pipe = i->pipe; @@ -678,7 +678,7 @@ static size_t copy_pipe_to_iter_mcsafe(c size_t chunk = min_t(size_t, n, PAGE_SIZE - off); unsigned long rem;
- rem = memcpy_mcsafe_to_page(pipe->bufs[i_head & p_mask].page, + rem = copy_mc_to_page(pipe->bufs[i_head & p_mask].page, off, addr, chunk); i->head = i_head; i->iov_offset = off + chunk - rem; @@ -695,18 +695,17 @@ static size_t copy_pipe_to_iter_mcsafe(c }
/** - * _copy_to_iter_mcsafe - copy to user with source-read error exception handling + * _copy_mc_to_iter - copy to iter with source memory error exception handling * @addr: source kernel address * @bytes: total transfer length * @iter: destination iterator * - * The pmem driver arranges for filesystem-dax to use this facility via - * dax_copy_to_iter() for protecting read/write to persistent memory. - * Unless / until an architecture can guarantee identical performance - * between _copy_to_iter_mcsafe() and _copy_to_iter() it would be a - * performance regression to switch more users to the mcsafe version. + * The pmem driver deploys this for the dax operation + * (dax_copy_to_iter()) for dax reads (bypass page-cache and the + * block-layer). Upon #MC read(2) aborts and returns EIO or the bytes + * successfully copied. * - * Otherwise, the main differences between this and typical _copy_to_iter(). + * The main differences between this and typical _copy_to_iter(). * * * Typical tail/residue handling after a fault retries the copy * byte-by-byte until the fault happens again. Re-triggering machine @@ -717,23 +716,22 @@ static size_t copy_pipe_to_iter_mcsafe(c * * ITER_KVEC, ITER_PIPE, and ITER_BVEC can return short copies. * Compare to copy_to_iter() where only ITER_IOVEC attempts might return * a short copy. - * - * See MCSAFE_TEST for self-test. */ -size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i) +size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i) { const char *from = addr; unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
if (unlikely(iov_iter_is_pipe(i))) - return copy_pipe_to_iter_mcsafe(addr, bytes, i); + return copy_mc_pipe_to_iter(addr, bytes, i); if (iter_is_iovec(i)) might_fault(); iterate_and_advance(i, bytes, v, - copyout_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len), + copyout_mc(v.iov_base, (from += v.iov_len) - v.iov_len, + v.iov_len), ({ - rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset, - (from += v.bv_len) - v.bv_len, v.bv_len); + rem = copy_mc_to_page(v.bv_page, v.bv_offset, + (from += v.bv_len) - v.bv_len, v.bv_len); if (rem) { curr_addr = (unsigned long) from; bytes = curr_addr - s_addr - rem; @@ -741,8 +739,8 @@ size_t _copy_to_iter_mcsafe(const void * } }), ({ - rem = memcpy_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len, - v.iov_len); + rem = copy_mc_to_kernel(v.iov_base, (from += v.iov_len) + - v.iov_len, v.iov_len); if (rem) { curr_addr = (unsigned long) from; bytes = curr_addr - s_addr - rem; @@ -753,8 +751,8 @@ size_t _copy_to_iter_mcsafe(const void *
return bytes; } -EXPORT_SYMBOL_GPL(_copy_to_iter_mcsafe); -#endif /* CONFIG_ARCH_HAS_UACCESS_MCSAFE */ +EXPORT_SYMBOL_GPL(_copy_mc_to_iter); +#endif /* CONFIG_ARCH_HAS_COPY_MC */
size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) { --- a/tools/arch/x86/include/asm/mcsafe_test.h +++ /dev/null @@ -1,13 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _MCSAFE_TEST_H_ -#define _MCSAFE_TEST_H_ - -.macro MCSAFE_TEST_CTL -.endm - -.macro MCSAFE_TEST_SRC reg count target -.endm - -.macro MCSAFE_TEST_DST reg count target -.endm -#endif /* _MCSAFE_TEST_H_ */ --- a/tools/arch/x86/lib/memcpy_64.S +++ b/tools/arch/x86/lib/memcpy_64.S @@ -4,7 +4,6 @@ #include <linux/linkage.h> #include <asm/errno.h> #include <asm/cpufeatures.h> -#include <asm/mcsafe_test.h> #include <asm/alternative-asm.h> #include <asm/export.h>
@@ -187,117 +186,3 @@ SYM_FUNC_START(memcpy_orig) SYM_FUNC_END(memcpy_orig)
.popsection - -#ifndef CONFIG_UML - -MCSAFE_TEST_CTL - -/* - * __memcpy_mcsafe - memory copy with machine check exception handling - * Note that we only catch machine checks when reading the source addresses. - * Writes to target are posted and don't generate machine checks. - */ -SYM_FUNC_START(__memcpy_mcsafe) - cmpl $8, %edx - /* Less than 8 bytes? Go to byte copy loop */ - jb .L_no_whole_words - - /* Check for bad alignment of source */ - testl $7, %esi - /* Already aligned */ - jz .L_8byte_aligned - - /* Copy one byte at a time until source is 8-byte aligned */ - movl %esi, %ecx - andl $7, %ecx - subl $8, %ecx - negl %ecx - subl %ecx, %edx -.L_read_leading_bytes: - movb (%rsi), %al - MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes - MCSAFE_TEST_DST %rdi 1 .E_leading_bytes -.L_write_leading_bytes: - movb %al, (%rdi) - incq %rsi - incq %rdi - decl %ecx - jnz .L_read_leading_bytes - -.L_8byte_aligned: - movl %edx, %ecx - andl $7, %edx - shrl $3, %ecx - jz .L_no_whole_words - -.L_read_words: - movq (%rsi), %r8 - MCSAFE_TEST_SRC %rsi 8 .E_read_words - MCSAFE_TEST_DST %rdi 8 .E_write_words -.L_write_words: - movq %r8, (%rdi) - addq $8, %rsi - addq $8, %rdi - decl %ecx - jnz .L_read_words - - /* Any trailing bytes? */ -.L_no_whole_words: - andl %edx, %edx - jz .L_done_memcpy_trap - - /* Copy trailing bytes */ - movl %edx, %ecx -.L_read_trailing_bytes: - movb (%rsi), %al - MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes - MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes -.L_write_trailing_bytes: - movb %al, (%rdi) - incq %rsi - incq %rdi - decl %ecx - jnz .L_read_trailing_bytes - - /* Copy successful. Return zero */ -.L_done_memcpy_trap: - xorl %eax, %eax -.L_done: - ret -SYM_FUNC_END(__memcpy_mcsafe) -EXPORT_SYMBOL_GPL(__memcpy_mcsafe) - - .section .fixup, "ax" - /* - * Return number of bytes not copied for any failure. Note that - * there is no "tail" handling since the source buffer is 8-byte - * aligned and poison is cacheline aligned. - */ -.E_read_words: - shll $3, %ecx -.E_leading_bytes: - addl %edx, %ecx -.E_trailing_bytes: - mov %ecx, %eax - jmp .L_done - - /* - * For write fault handling, given the destination is unaligned, - * we handle faults on multi-byte writes with a byte-by-byte - * copy up to the write-protected page. - */ -.E_write_words: - shll $3, %ecx - addl %edx, %ecx - movl %ecx, %edx - jmp mcsafe_handle_tail - - .previous - - _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes) - _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words) - _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes) - _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes) - _ASM_EXTABLE(.L_write_words, .E_write_words) - _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes) -#endif --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -548,8 +548,8 @@ static const char *uaccess_safe_builtin[ "__ubsan_handle_shift_out_of_bounds", /* misc */ "csum_partial_copy_generic", - "__memcpy_mcsafe", - "mcsafe_handle_tail", + "copy_mc_fragile", + "copy_mc_fragile_handle_tail", "ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */ NULL }; --- a/tools/perf/bench/Build +++ b/tools/perf/bench/Build @@ -13,7 +13,6 @@ perf-y += synthesize.o perf-y += kallsyms-parse.o perf-y += find-bit-bench.o
-perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o
--- a/tools/perf/bench/mem-memcpy-x86-64-lib.c +++ /dev/null @@ -1,24 +0,0 @@ -/* - * From code in arch/x86/lib/usercopy_64.c, copied to keep tools/ copy - * of the kernel's arch/x86/lib/memcpy_64.s used in 'perf bench mem memcpy' - * happy. - */ -#include <linux/types.h> - -unsigned long __memcpy_mcsafe(void *dst, const void *src, size_t cnt); -unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len); - -unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len) -{ - for (; len; --len, to++, from++) { - /* - * Call the assembly routine back directly since - * memcpy_mcsafe() may silently fallback to memcpy. - */ - unsigned long rem = __memcpy_mcsafe(to, from, 1); - - if (rem) - break; - } - return len; -} --- a/tools/testing/nvdimm/test/nfit.c +++ b/tools/testing/nvdimm/test/nfit.c @@ -23,7 +23,8 @@ #include "nfit_test.h" #include "../watermark.h"
-#include <asm/mcsafe_test.h> +#include <asm/copy_mc_test.h> +#include <asm/mce.h>
/* * Generate an NFIT table to describe the following topology: @@ -3283,7 +3284,7 @@ static struct platform_driver nfit_test_ .id_table = nfit_test_id, };
-static char mcsafe_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE))); +static char copy_mc_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
enum INJECT { INJECT_NONE, @@ -3291,7 +3292,7 @@ enum INJECT { INJECT_DST, };
-static void mcsafe_test_init(char *dst, char *src, size_t size) +static void copy_mc_test_init(char *dst, char *src, size_t size) { size_t i;
@@ -3300,7 +3301,7 @@ static void mcsafe_test_init(char *dst, src[i] = (char) i; }
-static bool mcsafe_test_validate(unsigned char *dst, unsigned char *src, +static bool copy_mc_test_validate(unsigned char *dst, unsigned char *src, size_t size, unsigned long rem) { size_t i; @@ -3321,12 +3322,12 @@ static bool mcsafe_test_validate(unsigne return true; }
-void mcsafe_test(void) +void copy_mc_test(void) { char *inject_desc[] = { "none", "source", "destination" }; enum INJECT inj;
- if (IS_ENABLED(CONFIG_MCSAFE_TEST)) { + if (IS_ENABLED(CONFIG_COPY_MC_TEST)) { pr_info("%s: run...\n", __func__); } else { pr_info("%s: disabled, skip.\n", __func__); @@ -3344,31 +3345,31 @@ void mcsafe_test(void)
switch (inj) { case INJECT_NONE: - mcsafe_inject_src(NULL); - mcsafe_inject_dst(NULL); - dst = &mcsafe_buf[2048]; - src = &mcsafe_buf[1024 - i]; + copy_mc_inject_src(NULL); + copy_mc_inject_dst(NULL); + dst = ©_mc_buf[2048]; + src = ©_mc_buf[1024 - i]; expect = 0; break; case INJECT_SRC: - mcsafe_inject_src(&mcsafe_buf[1024]); - mcsafe_inject_dst(NULL); - dst = &mcsafe_buf[2048]; - src = &mcsafe_buf[1024 - i]; + copy_mc_inject_src(©_mc_buf[1024]); + copy_mc_inject_dst(NULL); + dst = ©_mc_buf[2048]; + src = ©_mc_buf[1024 - i]; expect = 512 - i; break; case INJECT_DST: - mcsafe_inject_src(NULL); - mcsafe_inject_dst(&mcsafe_buf[2048]); - dst = &mcsafe_buf[2048 - i]; - src = &mcsafe_buf[1024]; + copy_mc_inject_src(NULL); + copy_mc_inject_dst(©_mc_buf[2048]); + dst = ©_mc_buf[2048 - i]; + src = ©_mc_buf[1024]; expect = 512 - i; break; }
- mcsafe_test_init(dst, src, 512); - rem = __memcpy_mcsafe(dst, src, 512); - valid = mcsafe_test_validate(dst, src, 512, expect); + copy_mc_test_init(dst, src, 512); + rem = copy_mc_fragile(dst, src, 512); + valid = copy_mc_test_validate(dst, src, 512, expect); if (rem == expect && valid) continue; pr_info("%s: copy(%#lx, %#lx, %d) off: %d rem: %ld %s expect: %ld\n", @@ -3380,8 +3381,8 @@ void mcsafe_test(void) } }
- mcsafe_inject_src(NULL); - mcsafe_inject_dst(NULL); + copy_mc_inject_src(NULL); + copy_mc_inject_dst(NULL); }
static __init int nfit_test_init(void) @@ -3392,7 +3393,7 @@ static __init int nfit_test_init(void) libnvdimm_test(); acpi_nfit_test(); device_dax_test(); - mcsafe_test(); + copy_mc_test(); dax_pmem_test(); dax_pmem_core_test(); #ifdef CONFIG_DEV_DAX_PMEM_COMPAT --- a/tools/testing/selftests/powerpc/copyloops/.gitignore +++ b/tools/testing/selftests/powerpc/copyloops/.gitignore @@ -12,4 +12,4 @@ memcpy_p7_t1 copyuser_64_exc_t0 copyuser_64_exc_t1 copyuser_64_exc_t2 -memcpy_mcsafe_64 +copy_mc_64 --- a/tools/testing/selftests/powerpc/copyloops/Makefile +++ b/tools/testing/selftests/powerpc/copyloops/Makefile @@ -12,7 +12,7 @@ ASFLAGS = $(CFLAGS) -Wa,-mpower4 TEST_GEN_PROGS := copyuser_64_t0 copyuser_64_t1 copyuser_64_t2 \ copyuser_p7_t0 copyuser_p7_t1 \ memcpy_64_t0 memcpy_64_t1 memcpy_64_t2 \ - memcpy_p7_t0 memcpy_p7_t1 memcpy_mcsafe_64 \ + memcpy_p7_t0 memcpy_p7_t1 copy_mc_64 \ copyuser_64_exc_t0 copyuser_64_exc_t1 copyuser_64_exc_t2
EXTRA_SOURCES := validate.c ../harness.c stubs.S @@ -45,9 +45,9 @@ $(OUTPUT)/memcpy_p7_t%: memcpy_power7.S -D SELFTEST_CASE=$(subst memcpy_p7_t,,$(notdir $@)) \ -o $@ $^
-$(OUTPUT)/memcpy_mcsafe_64: memcpy_mcsafe_64.S $(EXTRA_SOURCES) +$(OUTPUT)/copy_mc_64: copy_mc_64.S $(EXTRA_SOURCES) $(CC) $(CPPFLAGS) $(CFLAGS) \ - -D COPY_LOOP=test_memcpy_mcsafe \ + -D COPY_LOOP=test_copy_mc_generic \ -o $@ $^
$(OUTPUT)/copyuser_64_exc_t%: copyuser_64.S exc_validate.c ../harness.c \ --- /dev/null +++ b/tools/testing/selftests/powerpc/copyloops/copy_mc_64.S @@ -0,0 +1,242 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) IBM Corporation, 2011 + * Derived from copyuser_power7.s by Anton Blanchard anton@au.ibm.com + * Author - Balbir Singh bsingharora@gmail.com + */ +#include <asm/ppc_asm.h> +#include <asm/errno.h> +#include <asm/export.h> + + .macro err1 +100: + EX_TABLE(100b,.Ldo_err1) + .endm + + .macro err2 +200: + EX_TABLE(200b,.Ldo_err2) + .endm + + .macro err3 +300: EX_TABLE(300b,.Ldone) + .endm + +.Ldo_err2: + ld r22,STK_REG(R22)(r1) + ld r21,STK_REG(R21)(r1) + ld r20,STK_REG(R20)(r1) + ld r19,STK_REG(R19)(r1) + ld r18,STK_REG(R18)(r1) + ld r17,STK_REG(R17)(r1) + ld r16,STK_REG(R16)(r1) + ld r15,STK_REG(R15)(r1) + ld r14,STK_REG(R14)(r1) + addi r1,r1,STACKFRAMESIZE +.Ldo_err1: + /* Do a byte by byte copy to get the exact remaining size */ + mtctr r7 +46: +err3; lbz r0,0(r4) + addi r4,r4,1 +err3; stb r0,0(r3) + addi r3,r3,1 + bdnz 46b + li r3,0 + blr + +.Ldone: + mfctr r3 + blr + + +_GLOBAL(copy_mc_generic) + mr r7,r5 + cmpldi r5,16 + blt .Lshort_copy + +.Lcopy: + /* Get the source 8B aligned */ + neg r6,r4 + mtocrf 0x01,r6 + clrldi r6,r6,(64-3) + + bf cr7*4+3,1f +err1; lbz r0,0(r4) + addi r4,r4,1 +err1; stb r0,0(r3) + addi r3,r3,1 + subi r7,r7,1 + +1: bf cr7*4+2,2f +err1; lhz r0,0(r4) + addi r4,r4,2 +err1; sth r0,0(r3) + addi r3,r3,2 + subi r7,r7,2 + +2: bf cr7*4+1,3f +err1; lwz r0,0(r4) + addi r4,r4,4 +err1; stw r0,0(r3) + addi r3,r3,4 + subi r7,r7,4 + +3: sub r5,r5,r6 + cmpldi r5,128 + + mflr r0 + stdu r1,-STACKFRAMESIZE(r1) + std r14,STK_REG(R14)(r1) + std r15,STK_REG(R15)(r1) + std r16,STK_REG(R16)(r1) + std r17,STK_REG(R17)(r1) + std r18,STK_REG(R18)(r1) + std r19,STK_REG(R19)(r1) + std r20,STK_REG(R20)(r1) + std r21,STK_REG(R21)(r1) + std r22,STK_REG(R22)(r1) + std r0,STACKFRAMESIZE+16(r1) + + blt 5f + srdi r6,r5,7 + mtctr r6 + + /* Now do cacheline (128B) sized loads and stores. */ + .align 5 +4: +err2; ld r0,0(r4) +err2; ld r6,8(r4) +err2; ld r8,16(r4) +err2; ld r9,24(r4) +err2; ld r10,32(r4) +err2; ld r11,40(r4) +err2; ld r12,48(r4) +err2; ld r14,56(r4) +err2; ld r15,64(r4) +err2; ld r16,72(r4) +err2; ld r17,80(r4) +err2; ld r18,88(r4) +err2; ld r19,96(r4) +err2; ld r20,104(r4) +err2; ld r21,112(r4) +err2; ld r22,120(r4) + addi r4,r4,128 +err2; std r0,0(r3) +err2; std r6,8(r3) +err2; std r8,16(r3) +err2; std r9,24(r3) +err2; std r10,32(r3) +err2; std r11,40(r3) +err2; std r12,48(r3) +err2; std r14,56(r3) +err2; std r15,64(r3) +err2; std r16,72(r3) +err2; std r17,80(r3) +err2; std r18,88(r3) +err2; std r19,96(r3) +err2; std r20,104(r3) +err2; std r21,112(r3) +err2; std r22,120(r3) + addi r3,r3,128 + subi r7,r7,128 + bdnz 4b + + clrldi r5,r5,(64-7) + + /* Up to 127B to go */ +5: srdi r6,r5,4 + mtocrf 0x01,r6 + +6: bf cr7*4+1,7f +err2; ld r0,0(r4) +err2; ld r6,8(r4) +err2; ld r8,16(r4) +err2; ld r9,24(r4) +err2; ld r10,32(r4) +err2; ld r11,40(r4) +err2; ld r12,48(r4) +err2; ld r14,56(r4) + addi r4,r4,64 +err2; std r0,0(r3) +err2; std r6,8(r3) +err2; std r8,16(r3) +err2; std r9,24(r3) +err2; std r10,32(r3) +err2; std r11,40(r3) +err2; std r12,48(r3) +err2; std r14,56(r3) + addi r3,r3,64 + subi r7,r7,64 + +7: ld r14,STK_REG(R14)(r1) + ld r15,STK_REG(R15)(r1) + ld r16,STK_REG(R16)(r1) + ld r17,STK_REG(R17)(r1) + ld r18,STK_REG(R18)(r1) + ld r19,STK_REG(R19)(r1) + ld r20,STK_REG(R20)(r1) + ld r21,STK_REG(R21)(r1) + ld r22,STK_REG(R22)(r1) + addi r1,r1,STACKFRAMESIZE + + /* Up to 63B to go */ + bf cr7*4+2,8f +err1; ld r0,0(r4) +err1; ld r6,8(r4) +err1; ld r8,16(r4) +err1; ld r9,24(r4) + addi r4,r4,32 +err1; std r0,0(r3) +err1; std r6,8(r3) +err1; std r8,16(r3) +err1; std r9,24(r3) + addi r3,r3,32 + subi r7,r7,32 + + /* Up to 31B to go */ +8: bf cr7*4+3,9f +err1; ld r0,0(r4) +err1; ld r6,8(r4) + addi r4,r4,16 +err1; std r0,0(r3) +err1; std r6,8(r3) + addi r3,r3,16 + subi r7,r7,16 + +9: clrldi r5,r5,(64-4) + + /* Up to 15B to go */ +.Lshort_copy: + mtocrf 0x01,r5 + bf cr7*4+0,12f +err1; lwz r0,0(r4) /* Less chance of a reject with word ops */ +err1; lwz r6,4(r4) + addi r4,r4,8 +err1; stw r0,0(r3) +err1; stw r6,4(r3) + addi r3,r3,8 + subi r7,r7,8 + +12: bf cr7*4+1,13f +err1; lwz r0,0(r4) + addi r4,r4,4 +err1; stw r0,0(r3) + addi r3,r3,4 + subi r7,r7,4 + +13: bf cr7*4+2,14f +err1; lhz r0,0(r4) + addi r4,r4,2 +err1; sth r0,0(r3) + addi r3,r3,2 + subi r7,r7,2 + +14: bf cr7*4+3,15f +err1; lbz r0,0(r4) +err1; stb r0,0(r3) + +15: li r3,0 + blr + +EXPORT_SYMBOL_GPL(copy_mc_generic);
Hi Greg, hi Dan,
Am Samstag, 31. Oktober 2020, 12:36:06 CET schrieb Greg Kroah-Hartman:
From: Dan Williams dan.j.williams@intel.com
commit ec6347bb43395cb92126788a1a5b25302543f815 upstream.
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled.
Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch.
One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks.
arch/powerpc/lib/copy_mc_64.S | 242 +++++++++++++++++ arch/powerpc/lib/memcpy_mcsafe_64.S | 242 -----------------
tools/testing/selftests/powerpc/copyloops/copy_mc_64.S | 242 +++++++++++++++++
This change leaves a dangling symlink in tools/testing/selftests/powerpc/copyloops behind. At least, this is, what I could track down, when building 5.9.3 within an environment, that bails out on this:
[ 2908s] calling /usr/lib/rpm/brp-suse.d/brp-25-symlink [ 2908s] ERROR: link target doesn't exist (neither in build root nor in installed system): [ 2908s] /usr/src/linux-5.9.3-lp152.3-vanilla/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S -> /usr/src/linux-5.9.3-lp152.3-vanilla/arch/powerpc/lib/memcpy_mcsafe_64.S [ 2908s] Add the package providing the target to BuildRequires and Requires [ 2909s] INFO: relinking /usr/src/linux-5.9.3-lp152.3-vanilla/tools/testing/selftests/powerpc/primitives/asm/asm-compat.h -> ../../../../../../arch/powerpc/include/asm/asm-compat.h (was ../.././../../../../arch/powerpc/include/asm/asm-compat.h)
Linus` tree seems to not suffer from this, though.
Cheers, Pete
On Mon, Nov 02, 2020 at 10:34:08PM +0100, Hans-Peter Jansen wrote:
Hi Greg, hi Dan,
Am Samstag, 31. Oktober 2020, 12:36:06 CET schrieb Greg Kroah-Hartman:
From: Dan Williams dan.j.williams@intel.com
commit ec6347bb43395cb92126788a1a5b25302543f815 upstream.
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled.
Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch.
One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks.
arch/powerpc/lib/copy_mc_64.S | 242 +++++++++++++++++ arch/powerpc/lib/memcpy_mcsafe_64.S | 242 -----------------
tools/testing/selftests/powerpc/copyloops/copy_mc_64.S | 242 +++++++++++++++++
This change leaves a dangling symlink in tools/testing/selftests/powerpc/copyloops behind. At least, this is, what I could track down, when building 5.9.3 within an environment, that bails out on this:
[ 2908s] calling /usr/lib/rpm/brp-suse.d/brp-25-symlink [ 2908s] ERROR: link target doesn't exist (neither in build root nor in installed system): [ 2908s] /usr/src/linux-5.9.3-lp152.3-vanilla/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S -> /usr/src/linux-5.9.3-lp152.3-vanilla/arch/powerpc/lib/memcpy_mcsafe_64.S [ 2908s] Add the package providing the target to BuildRequires and Requires [ 2909s] INFO: relinking /usr/src/linux-5.9.3-lp152.3-vanilla/tools/testing/selftests/powerpc/primitives/asm/asm-compat.h -> ../../../../../../arch/powerpc/include/asm/asm-compat.h (was ../.././../../../../arch/powerpc/include/asm/asm-compat.h)
Linus` tree seems to not suffer from this, though.
Ah, that kind of makes sense, I saw odd things with these patches that I couldn't figure out.
So, is there a symlink that I need to add/fix to resolve this?
thanks,
greg k-h
Am Dienstag, 3. November 2020, 07:35:29 CET schrieb Greg Kroah-Hartman:
On Mon, Nov 02, 2020 at 10:34:08PM +0100, Hans-Peter Jansen wrote:
Ah, that kind of makes sense, I saw odd things with these patches that I couldn't figure out.
So, is there a symlink that I need to add/fix to resolve this?
rm tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S should do the trick.
Cheers, Pete
From: Dan Williams dan.j.williams@intel.com
commit 5da8e4a658109e3b7e1f45ae672b7c06ac3e7158 upstream.
The motivations to go rework memcpy_mcsafe() are that the benefit of doing slow and careful copies is obviated on newer CPUs, and that the current opt-in list of CPUs to instrument recovery is broken relative to those CPUs. There is no need to keep an opt-in list up to date on an ongoing basis if pmem/dax operations are instrumented for recovery by default. With recovery enabled by default the old "mcsafe_key" opt-in to careful copying can be made a "fragile" opt-out. Where the "fragile" list takes steps to not consume poison across cachelines.
The discussion with Linus made clear that the current "_mcsafe" suffix was imprecise to a fault. The operations that are needed by pmem/dax are to copy from a source address that might throw #MC to a destination that may write-fault, if it is a user page.
So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate the separate precautions taken on source and destination. copy_mc_to_kernel() is introduced as a non-SMAP version that does not expect write-faults on the destination, but is still prepared to abort with an error code upon taking #MC.
The original copy_mc_fragile() implementation had negative performance implications since it did not use the fast-string instruction sequence to perform copies. For this reason copy_mc_to_kernel() fell back to plain memcpy() to preserve performance on platforms that did not indicate the capability to recover from machine check exceptions. However, that capability detection was not architectural and now that some platforms can recover from fast-string consumption of memory errors the memcpy() fallback now causes these more capable platforms to fail.
Introduce copy_mc_enhanced_fast_string() as the fast default implementation of copy_mc_to_kernel() and finalize the transition of copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'. With this in place, copy_mc_to_kernel() is fast and recovery-ready by default regardless of hardware capability.
Thanks to Vivek for identifying that copy_user_generic() is not suitable as the copy_mc_to_user() backend since the #MC handler explicitly checks ex_has_fault_handler(). Thanks to the 0day robot for catching a performance bug in the x86/copy_mc_to_user implementation.
[ bp: Add the "why" for this change from the 0/2th message, massage. ]
Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()") Reported-by: Erwin Tsaur erwin.tsaur@intel.com Reported-by: 0day robot lkp@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Tony Luck tony.luck@intel.com Tested-by: Erwin Tsaur erwin.tsaur@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dw... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/lib/copy_mc.c | 32 +++++++++++++++++++++++--------- arch/x86/lib/copy_mc_64.S | 36 ++++++++++++++++++++++++++++++++++++ tools/objtool/check.c | 1 + 3 files changed, 60 insertions(+), 9 deletions(-)
--- a/arch/x86/lib/copy_mc.c +++ b/arch/x86/lib/copy_mc.c @@ -45,6 +45,8 @@ void enable_copy_mc_fragile(void) #define copy_mc_fragile_enabled (0) #endif
+unsigned long copy_mc_enhanced_fast_string(void *dst, const void *src, unsigned len); + /** * copy_mc_to_kernel - memory copy that handles source exceptions * @@ -52,9 +54,11 @@ void enable_copy_mc_fragile(void) * @src: source address * @len: number of bytes to copy * - * Call into the 'fragile' version on systems that have trouble - * actually do machine check recovery. Everyone else can just - * use memcpy(). + * Call into the 'fragile' version on systems that benefit from avoiding + * corner case poison consumption scenarios, For example, accessing + * poison across 2 cachelines with a single instruction. Almost all + * other uses case can use copy_mc_enhanced_fast_string() for a fast + * recoverable copy, or fallback to plain memcpy. * * Return 0 for success, or number of bytes not copied if there was an * exception. @@ -63,6 +67,8 @@ unsigned long __must_check copy_mc_to_ke { if (copy_mc_fragile_enabled) return copy_mc_fragile(dst, src, len); + if (static_cpu_has(X86_FEATURE_ERMS)) + return copy_mc_enhanced_fast_string(dst, src, len); memcpy(dst, src, len); return 0; } @@ -72,11 +78,19 @@ unsigned long __must_check copy_mc_to_us { unsigned long ret;
- if (!copy_mc_fragile_enabled) - return copy_user_generic(dst, src, len); + if (copy_mc_fragile_enabled) { + __uaccess_begin(); + ret = copy_mc_fragile(dst, src, len); + __uaccess_end(); + return ret; + } + + if (static_cpu_has(X86_FEATURE_ERMS)) { + __uaccess_begin(); + ret = copy_mc_enhanced_fast_string(dst, src, len); + __uaccess_end(); + return ret; + }
- __uaccess_begin(); - ret = copy_mc_fragile(dst, src, len); - __uaccess_end(); - return ret; + return copy_user_generic(dst, src, len); } --- a/arch/x86/lib/copy_mc_64.S +++ b/arch/x86/lib/copy_mc_64.S @@ -124,4 +124,40 @@ EXPORT_SYMBOL_GPL(copy_mc_fragile) _ASM_EXTABLE(.L_write_words, .E_write_words) _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes) #endif /* CONFIG_X86_MCE */ + +/* + * copy_mc_enhanced_fast_string - memory copy with exception handling + * + * Fast string copy + fault / exception handling. If the CPU does + * support machine check exception recovery, but does not support + * recovering from fast-string exceptions then this CPU needs to be + * added to the copy_mc_fragile_key set of quirks. Otherwise, absent any + * machine check recovery support this version should be no slower than + * standard memcpy. + */ +SYM_FUNC_START(copy_mc_enhanced_fast_string) + movq %rdi, %rax + movq %rdx, %rcx +.L_copy: + rep movsb + /* Copy successful. Return zero */ + xorl %eax, %eax + ret +SYM_FUNC_END(copy_mc_enhanced_fast_string) + + .section .fixup, "ax" +.E_copy: + /* + * On fault %rcx is updated such that the copy instruction could + * optionally be restarted at the fault position, i.e. it + * contains 'bytes remaining'. A non-zero return indicates error + * to copy_mc_generic() users, or indicate short transfers to + * user-copy routines. + */ + movq %rcx, %rax + ret + + .previous + + _ASM_EXTABLE_FAULT(.L_copy, .E_copy) #endif /* !CONFIG_UML */ --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -550,6 +550,7 @@ static const char *uaccess_safe_builtin[ "csum_partial_copy_generic", "copy_mc_fragile", "copy_mc_fragile_handle_tail", + "copy_mc_enhanced_fast_string", "ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */ NULL };
From: Michael Schaller misch@google.com
commit 336af6a4686d885a067ecea8c3c3dd129ba4fc75 upstream.
Without this patch efivarfs_alloc_dentry creates dentries with slashes in their name if the respective EFI variable has slashes in its name. This in turn causes EIO on getdents64, which prevents a complete directory listing of /sys/firmware/efi/efivars/.
This patch replaces the invalid shlashes with exclamation marks like kobject_set_name_vargs does for /sys/firmware/efi/vars/ to have consistently named dentries under /sys/firmware/efi/vars/ and /sys/firmware/efi/efivars/.
Signed-off-by: Michael Schaller misch@google.com Link: https://lore.kernel.org/r/20200925074502.150448-1-misch@google.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: dann frazier dann.frazier@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/efivarfs/super.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -141,6 +141,9 @@ static int efivarfs_callback(efi_char16_
name[len + EFI_VARIABLE_GUID_LEN+1] = '\0';
+ /* replace invalid slashes like kobject_set_name_vargs does for /sys/firmware/efi/vars. */ + strreplace(name, '/', '!'); + inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0644, 0, is_removable); if (!inode)
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit a1301f08c5acf992d9c1fafddc84c3a822844b04 ]
bnxt_open_nic() is called during configuration changes that require the NIC to be closed and then opened. This call is protected by rtnl_lock. Firmware reset can be happening at the same time. Only critical portions of the entire firmware reset sequence are protected by the rtnl_lock. It is possible that bnxt_open_nic() can be called when the firmware reset sequence is aborting. In that case, bnxt_open_nic() needs to check if the ABORT_ERR flag is set and abort if it is. The configuration change that resulted in the bnxt_open_nic() call will fail but the NIC will be brought to a consistent IF_DOWN state.
Without this patch, if bnxt_open_nic() were to continue in this error state, it may crash like this:
[ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0 [ 1648.659813] Oops: 0000 [#1] SMP [ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common [ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse [ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1 [ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020 [ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000 [ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202 [ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000 [ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0 [ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008 [ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0 [ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40 [ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000 [ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0 [ 1648.669966] Call Trace: [ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en] [ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en] [ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en] [ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en] [ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0 [ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0 [ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60 [ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0 [ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0 [ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500 [ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0 [ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a [ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7 [ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.682724] RSP <ffff94f55df1fba8> [ 1648.683451] CR2: 0000000000000000
Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: Vasundhara Volam vasundhara-v.volam@broadcom.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -9566,7 +9566,10 @@ int bnxt_open_nic(struct bnxt *bp, bool { int rc = 0;
- rc = __bnxt_open_nic(bp, irq_re_init, link_re_init); + if (test_bit(BNXT_STATE_ABORT_ERR, &bp->state)) + rc = -EIO; + if (!rc) + rc = __bnxt_open_nic(bp, irq_re_init, link_re_init); if (rc) { netdev_err(bp->dev, "nic open fail (rc: %x)\n", rc); dev_close(bp->dev);
From: Vasundhara Volam vasundhara-v.volam@broadcom.com
[ Upstream commit 21d6a11e2cadfb8446265a3efff0e2aad206e15e ]
A recent patch has moved the workqueue cleanup logic before calling unregister_netdev() in bnxt_remove_one(). This caused a regression because the workqueue can be restarted if the device is still open. Workqueue cleanup must be done after unregister_netdev(). The workqueue will not restart itself after the device is closed.
Call bnxt_cancel_sp_work() after unregister_netdev() and call bnxt_dl_fw_reporters_destroy() after that. This fixes the regession and the original NULL ptr dereference issue.
Fixes: b16939b59cc0 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()") Signed-off-by: Vasundhara Volam vasundhara-v.volam@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -11790,15 +11790,16 @@ static void bnxt_remove_one(struct pci_d if (BNXT_PF(bp)) bnxt_sriov_disable(bp);
+ if (BNXT_PF(bp)) + devlink_port_type_clear(&bp->dl_port); + pci_disable_pcie_error_reporting(pdev); + unregister_netdev(dev); clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state); + /* Flush any pending tasks */ bnxt_cancel_sp_work(bp); bp->sp_event = 0;
bnxt_dl_fw_reporters_destroy(bp, true); - if (BNXT_PF(bp)) - devlink_port_type_clear(&bp->dl_port); - pci_disable_pcie_error_reporting(pdev); - unregister_netdev(dev); bnxt_dl_unregister(bp); bnxt_shutdown_tc(bp);
From: Vasundhara Volam vasundhara-v.volam@broadcom.com
[ Upstream commit 631ce27a3006fc0b732bfd589c6df505f62eadd9 ]
As part of the commit b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."), cancel_delayed_work_sync() is called only for VFs to fix a possible crash by cancelling any pending delayed work items. It was assumed by mistake that the flush_workqueue() call on the PF would flush delayed work items as well.
As flush_workqueue() does not cancel the delayed workqueue, extend the fix for PFs. This fix will avoid the system crash, if there are any pending delayed work items in fw_reset_task() during driver's .remove() call.
Unify the workqueue cleanup logic for both PF and VF by calling cancel_work_sync() and cancel_delayed_work_sync() directly in bnxt_remove_one().
Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Reviewed-by: Andy Gospodarek gospo@broadcom.com Signed-off-by: Vasundhara Volam vasundhara-v.volam@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1158,16 +1158,6 @@ static void bnxt_queue_sp_work(struct bn schedule_work(&bp->sp_task); }
-static void bnxt_cancel_sp_work(struct bnxt *bp) -{ - if (BNXT_PF(bp)) { - flush_workqueue(bnxt_pf_wq); - } else { - cancel_work_sync(&bp->sp_task); - cancel_delayed_work_sync(&bp->fw_reset_task); - } -} - static void bnxt_sched_reset(struct bnxt *bp, struct bnxt_rx_ring_info *rxr) { if (!rxr->bnapi->in_reset) { @@ -11796,7 +11786,8 @@ static void bnxt_remove_one(struct pci_d unregister_netdev(dev); clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state); /* Flush any pending tasks */ - bnxt_cancel_sp_work(bp); + cancel_work_sync(&bp->sp_task); + cancel_delayed_work_sync(&bp->fw_reset_task); bp->sp_event = 0;
bnxt_dl_fw_reporters_destroy(bp, true);
From: Vasundhara Volam vasundhara-v.volam@broadcom.com
[ Upstream commit f75d9a0aa96721d20011cd5f8c7a24eb32728589 ]
When a PCIe fatal error occurs, the internal latched BAR addresses in the chip get reset even though the BAR register values in config space are retained.
pci_restore_state() will not rewrite the BAR addresses if the BAR address values are valid, causing the chip's internal BAR addresses to stay invalid. So we need to zero the BAR registers during PCIe fatal error to force pci_restore_state() to restore the BAR addresses. These write cycles to the BAR registers will cause the proper BAR addresses to latch internally.
Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.") Signed-off-by: Vasundhara Volam vasundhara-v.volam@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 ++++++++++++++++++- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 + 2 files changed, 19 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -12530,6 +12530,9 @@ static pci_ers_result_t bnxt_io_error_de return PCI_ERS_RESULT_DISCONNECT; }
+ if (state == pci_channel_io_frozen) + set_bit(BNXT_STATE_PCI_CHANNEL_IO_FROZEN, &bp->state); + if (netif_running(netdev)) bnxt_close(netdev);
@@ -12556,7 +12559,7 @@ static pci_ers_result_t bnxt_io_slot_res { struct net_device *netdev = pci_get_drvdata(pdev); struct bnxt *bp = netdev_priv(netdev); - int err = 0; + int err = 0, off; pci_ers_result_t result = PCI_ERS_RESULT_DISCONNECT;
netdev_info(bp->dev, "PCI Slot Reset\n"); @@ -12568,6 +12571,20 @@ static pci_ers_result_t bnxt_io_slot_res "Cannot re-enable PCI device after reset.\n"); } else { pci_set_master(pdev); + /* Upon fatal error, our device internal logic that latches to + * BAR value is getting reset and will restore only upon + * rewritting the BARs. + * + * As pci_restore_state() does not re-write the BARs if the + * value is same as saved value earlier, driver needs to + * write the BARs to 0 to force restore, in case of fatal error. + */ + if (test_and_clear_bit(BNXT_STATE_PCI_CHANNEL_IO_FROZEN, + &bp->state)) { + for (off = PCI_BASE_ADDRESS_0; + off <= PCI_BASE_ADDRESS_5; off += 4) + pci_write_config_dword(bp->pdev, off, 0); + } pci_restore_state(pdev); pci_save_state(pdev);
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -1736,6 +1736,7 @@ struct bnxt { #define BNXT_STATE_ABORT_ERR 5 #define BNXT_STATE_FW_FATAL_COND 6 #define BNXT_STATE_DRV_REGISTERED 7 +#define BNXT_STATE_PCI_CHANNEL_IO_FROZEN 8
#define BNXT_NO_FW_ACCESS(bp) \ (test_bit(BNXT_STATE_FW_FATAL_COND, &(bp)->state) || \
From: Vasundhara Volam vasundhara-v.volam@broadcom.com
[ Upstream commit 825741b071722f1c8ad692cead562c4b5f5eaa93 ]
In the AER or firmware reset flow, if we are in fatal error state or if pci_channel_offline() is true, we don't send any commands to the firmware because the commands will likely not reach the firmware and most commands don't matter much because the firmware is likely to be reset imminently.
However, the HWRM_FUNC_RESET command is different and we should always attempt to send it. In the AER flow for example, the .slot_reset() call will trigger this fw command and we need to try to send it to effect the proper reset.
Fixes: b340dc680ed4 ("bnxt_en: Avoid sending firmware messages when AER error is detected.") Reviewed-by: Edwin Peer edwin.peer@broadcom.com Signed-off-by: Vasundhara Volam vasundhara-v.volam@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4296,7 +4296,8 @@ static int bnxt_hwrm_do_send_msg(struct u32 bar_offset = BNXT_GRCPF_REG_CHIMP_COMM; u16 dst = BNXT_HWRM_CHNL_CHIMP;
- if (BNXT_NO_FW_ACCESS(bp)) + if (BNXT_NO_FW_ACCESS(bp) && + le16_to_cpu(req->req_type) != HWRM_FUNC_RESET) return -EBUSY;
if (msg_len > BNXT_HWRM_MAX_REQ_LEN) {
From: Vinay Kumar Yadav vinay.yadav@chelsio.com
[ Upstream commit 28e9dcd9172028263c8225c15c4e329e08475e89 ]
In chtls_pass_establish() we hold child socket lock using bh_lock_sock and we are again trying bh_lock_sock in add_to_reap_list, causing deadlock. Remove bh_lock_sock in add_to_reap_list() as lock is already held.
Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav vinay.yadav@chelsio.com Link: https://lore.kernel.org/r/20201025193538.31112-1-vinay.yadav@chelsio.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/chelsio/chtls/chtls_cm.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/crypto/chelsio/chtls/chtls_cm.c +++ b/drivers/crypto/chelsio/chtls/chtls_cm.c @@ -1514,7 +1514,6 @@ static void add_to_reap_list(struct sock struct chtls_sock *csk = sk->sk_user_data;
local_bh_disable(); - bh_lock_sock(sk); release_tcp_port(sk); /* release the port immediately */
spin_lock(&reap_list_lock); @@ -1523,7 +1522,6 @@ static void add_to_reap_list(struct sock if (!csk->passive_reap_next) schedule_work(&reap_task); spin_unlock(&reap_list_lock); - bh_unlock_sock(sk); local_bh_enable(); }
From: Vinay Kumar Yadav vinay.yadav@chelsio.com
[ Upstream commit 6daa1da4e262b0cd52ef0acc1989ff22b5540264 ]
CPL handler functions chtls_pass_open_rpl() and chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE so that caller function will do skb free to avoid leak.
Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav vinay.yadav@chelsio.com Link: https://lore.kernel.org/r/20201025194228.31271-1-vinay.yadav@chelsio.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/chelsio/chtls/chtls_cm.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-)
--- a/drivers/crypto/chelsio/chtls/chtls_cm.c +++ b/drivers/crypto/chelsio/chtls/chtls_cm.c @@ -772,14 +772,13 @@ static int chtls_pass_open_rpl(struct ch if (rpl->status != CPL_ERR_NONE) { pr_info("Unexpected PASS_OPEN_RPL status %u for STID %u\n", rpl->status, stid); - return CPL_RET_BUF_DONE; + } else { + cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family); + sock_put(listen_ctx->lsk); + kfree(listen_ctx); + module_put(THIS_MODULE); } - cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family); - sock_put(listen_ctx->lsk); - kfree(listen_ctx); - module_put(THIS_MODULE); - - return 0; + return CPL_RET_BUF_DONE; }
static int chtls_close_listsrv_rpl(struct chtls_dev *cdev, struct sk_buff *skb) @@ -796,15 +795,13 @@ static int chtls_close_listsrv_rpl(struc if (rpl->status != CPL_ERR_NONE) { pr_info("Unexpected CLOSE_LISTSRV_RPL status %u for STID %u\n", rpl->status, stid); - return CPL_RET_BUF_DONE; + } else { + cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family); + sock_put(listen_ctx->lsk); + kfree(listen_ctx); + module_put(THIS_MODULE); } - - cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family); - sock_put(listen_ctx->lsk); - kfree(listen_ctx); - module_put(THIS_MODULE); - - return 0; + return CPL_RET_BUF_DONE; }
static void chtls_purge_wr_queue(struct sock *sk)
From: Vinay Kumar Yadav vinay.yadav@chelsio.com
[ Upstream commit 4f3391ce8f5a69e7e6d66d0a3fc654eb6dbdc919 ]
chtls_pt_recvmsg() receives a skb with tls header and subsequent skb with data, need to finalize the data copy whenever next skb with tls header is available. but here current tls header is overwritten by next available tls header, ends up corrupting user buffer data. fixing it by finalizing current record whenever next skb contains tls header.
v1->v2: - Improved commit message.
Fixes: 17a7d24aa89d ("crypto: chtls - generic handling of data and hdr") Signed-off-by: Vinay Kumar Yadav vinay.yadav@chelsio.com Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/chelsio/chtls/chtls_io.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/crypto/chelsio/chtls/chtls_io.c +++ b/drivers/crypto/chelsio/chtls/chtls_io.c @@ -1585,6 +1585,7 @@ skip_copy: tp->urg_data = 0;
if ((avail + offset) >= skb->len) { + struct sk_buff *next_skb; if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_TLS_HDR) { tp->copied_seq += skb->len; hws->rcvpld = skb->hdr_len; @@ -1595,8 +1596,10 @@ skip_copy: chtls_free_skb(sk, skb); buffers_freed++; hws->copied_seq = 0; - if (copied >= target && - !skb_peek(&sk->sk_receive_queue)) + next_skb = skb_peek(&sk->sk_receive_queue); + if (copied >= target && !next_skb) + break; + if (ULP_SKB_CB(next_skb)->flags & ULPCB_FLAG_TLS_HDR) break; } } while (len > 0);
From: Raju Rangoju rajur@chelsio.com
[ Upstream commit 937d8420588421eaa5c7aa5c79b26b42abb288ef ]
The current code sets up the filter action field before rewrites are set up. When the action 'switch' is used with rewrites, this may result in initial few packets that get switched out don't have rewrites applied on them.
So, make sure filter action is set up along with rewrites or only after everything else is set up for rewrites.
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters") Signed-off-by: Raju Rangoju rajur@chelsio.com Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 56 ++++++++++------------ drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h | 4 + 2 files changed, 31 insertions(+), 29 deletions(-)
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@ -145,13 +145,13 @@ static int configure_filter_smac(struct int err;
/* do a set-tcb for smac-sel and CWR bit.. */ - err = set_tcb_tflag(adap, f, f->tid, TF_CCTRL_CWR_S, 1, 1); - if (err) - goto smac_err; - err = set_tcb_field(adap, f, f->tid, TCB_SMAC_SEL_W, TCB_SMAC_SEL_V(TCB_SMAC_SEL_M), TCB_SMAC_SEL_V(f->smt->idx), 1); + if (err) + goto smac_err; + + err = set_tcb_tflag(adap, f, f->tid, TF_CCTRL_CWR_S, 1, 1); if (!err) return 0;
@@ -865,6 +865,7 @@ int set_filter_wr(struct adapter *adapte FW_FILTER_WR_DIRSTEERHASH_V(f->fs.dirsteerhash) | FW_FILTER_WR_LPBK_V(f->fs.action == FILTER_SWITCH) | FW_FILTER_WR_DMAC_V(f->fs.newdmac) | + FW_FILTER_WR_SMAC_V(f->fs.newsmac) | FW_FILTER_WR_INSVLAN_V(f->fs.newvlan == VLAN_INSERT || f->fs.newvlan == VLAN_REWRITE) | FW_FILTER_WR_RMVLAN_V(f->fs.newvlan == VLAN_REMOVE || @@ -882,7 +883,7 @@ int set_filter_wr(struct adapter *adapte FW_FILTER_WR_OVLAN_VLD_V(f->fs.val.ovlan_vld) | FW_FILTER_WR_IVLAN_VLDM_V(f->fs.mask.ivlan_vld) | FW_FILTER_WR_OVLAN_VLDM_V(f->fs.mask.ovlan_vld)); - fwr->smac_sel = 0; + fwr->smac_sel = f->smt->idx; fwr->rx_chan_rx_rpl_iq = htons(FW_FILTER_WR_RX_CHAN_V(0) | FW_FILTER_WR_RX_RPL_IQ_V(adapter->sge.fw_evtq.abs_id)); @@ -1326,11 +1327,8 @@ static void mk_act_open_req6(struct filt TX_QUEUE_V(f->fs.nat_mode) | T5_OPT_2_VALID_F | RX_CHANNEL_V(cxgb4_port_e2cchan(f->dev)) | - CONG_CNTRL_V((f->fs.action == FILTER_DROP) | - (f->fs.dirsteer << 1)) | PACE_V((f->fs.maskhash) | - ((f->fs.dirsteerhash) << 1)) | - CCTRL_ECN_V(f->fs.action == FILTER_SWITCH)); + ((f->fs.dirsteerhash) << 1))); }
static void mk_act_open_req(struct filter_entry *f, struct sk_buff *skb, @@ -1366,11 +1364,8 @@ static void mk_act_open_req(struct filte TX_QUEUE_V(f->fs.nat_mode) | T5_OPT_2_VALID_F | RX_CHANNEL_V(cxgb4_port_e2cchan(f->dev)) | - CONG_CNTRL_V((f->fs.action == FILTER_DROP) | - (f->fs.dirsteer << 1)) | PACE_V((f->fs.maskhash) | - ((f->fs.dirsteerhash) << 1)) | - CCTRL_ECN_V(f->fs.action == FILTER_SWITCH)); + ((f->fs.dirsteerhash) << 1))); }
static int cxgb4_set_hash_filter(struct net_device *dev, @@ -2042,6 +2037,20 @@ void hash_filter_rpl(struct adapter *ada } return; } + switch (f->fs.action) { + case FILTER_PASS: + if (f->fs.dirsteer) + set_tcb_tflag(adap, f, tid, + TF_DIRECT_STEER_S, 1, 1); + break; + case FILTER_DROP: + set_tcb_tflag(adap, f, tid, TF_DROP_S, 1, 1); + break; + case FILTER_SWITCH: + set_tcb_tflag(adap, f, tid, TF_LPBK_S, 1, 1); + break; + } + break;
default: @@ -2109,22 +2118,11 @@ void filter_rpl(struct adapter *adap, co if (ctx) ctx->result = 0; } else if (ret == FW_FILTER_WR_FLT_ADDED) { - int err = 0; - - if (f->fs.newsmac) - err = configure_filter_smac(adap, f); - - if (!err) { - f->pending = 0; /* async setup completed */ - f->valid = 1; - if (ctx) { - ctx->result = 0; - ctx->tid = idx; - } - } else { - clear_filter(adap, f); - if (ctx) - ctx->result = err; + f->pending = 0; /* async setup completed */ + f->valid = 1; + if (ctx) { + ctx->result = 0; + ctx->tid = idx; } } else { /* Something went wrong. Issue a warning about the --- a/drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h @@ -50,6 +50,10 @@ #define TCB_T_FLAGS_M 0xffffffffffffffffULL #define TCB_T_FLAGS_V(x) ((__u64)(x) << TCB_T_FLAGS_S)
+#define TF_DROP_S 22 +#define TF_DIRECT_STEER_S 23 +#define TF_LPBK_S 59 + #define TF_CCTRL_ECE_S 60 #define TF_CCTRL_CWR_S 61 #define TF_CCTRL_RFR_S 62
From: Masahiro Fujiwara fujiwara.masahiro@gmail.com
[ Upstream commit 51467431200b91682b89d31317e35dcbca1469ce ]
*_pdp_find() from gtp_encap_recv() would trigger a crash when a peer sends GTP packets while creating new GTP device.
RIP: 0010:gtp1_pdp_find.isra.0+0x68/0x90 [gtp] <SNIP> Call Trace: <IRQ> gtp_encap_recv+0xc2/0x2e0 [gtp] ? gtp1_pdp_find.isra.0+0x90/0x90 [gtp] udp_queue_rcv_one_skb+0x1fe/0x530 udp_queue_rcv_skb+0x40/0x1b0 udp_unicast_rcv_skb.isra.0+0x78/0x90 __udp4_lib_rcv+0x5af/0xc70 udp_rcv+0x1a/0x20 ip_protocol_deliver_rcu+0xc5/0x1b0 ip_local_deliver_finish+0x48/0x50 ip_local_deliver+0xe5/0xf0 ? ip_protocol_deliver_rcu+0x1b0/0x1b0
gtp_encap_enable() should be called after gtp_hastable_new() otherwise *_pdp_find() will access the uninitialized hash table.
Fixes: 1e3a3abd8b28 ("gtp: make GTP sockets in gtp_newlink optional") Signed-off-by: Masahiro Fujiwara fujiwara.masahiro@gmail.com Link: https://lore.kernel.org/r/20201027114846.3924-1-fujiwara.masahiro@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/gtp.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -663,10 +663,6 @@ static int gtp_newlink(struct net *src_n
gtp = netdev_priv(dev);
- err = gtp_encap_enable(gtp, data); - if (err < 0) - return err; - if (!data[IFLA_GTP_PDP_HASHSIZE]) { hashsize = 1024; } else { @@ -677,12 +673,16 @@ static int gtp_newlink(struct net *src_n
err = gtp_hashtable_new(gtp, hashsize); if (err < 0) - goto out_encap; + return err; + + err = gtp_encap_enable(gtp, data); + if (err < 0) + goto out_hashtable;
err = register_netdevice(dev); if (err < 0) { netdev_dbg(dev, "failed to register new netdev %d\n", err); - goto out_hashtable; + goto out_encap; }
gn = net_generic(dev_net(dev), gtp_net_id); @@ -693,11 +693,11 @@ static int gtp_newlink(struct net *src_n
return 0;
+out_encap: + gtp_encap_disable(gtp); out_hashtable: kfree(gtp->addr_hash); kfree(gtp->tid_hash); -out_encap: - gtp_encap_disable(gtp); return err; }
From: Thomas Bogendoerfer tbogendoerfer@suse.de
[ Upstream commit 2ac8af0967aaa2b67cb382727e784900d2f4d0da ]
The check for src mac address in ibmveth_is_packet_unsupported is wrong. Commit 6f2275433a2f wanted to shut down messages for loopback packets, but now suppresses bridged frames, which are accepted by the hypervisor otherwise bridging won't work at all.
Fixes: 6f2275433a2f ("ibmveth: Detect unsupported packets before sending to the hypervisor") Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Thomas Bogendoerfer tbogendoerfer@suse.de Link: https://lore.kernel.org/r/20201026104221.26570-1-msuchanek@suse.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmveth.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1031,12 +1031,6 @@ static int ibmveth_is_packet_unsupported ret = -EOPNOTSUPP; }
- if (!ether_addr_equal(ether_header->h_source, netdev->dev_addr)) { - netdev_dbg(netdev, "source packet MAC address does not match veth device's, dropping packet.\n"); - netdev->stats.tx_dropped++; - ret = -EOPNOTSUPP; - } - return ret; }
From: Lijun Pan ljp@linux.ibm.com
[ Upstream commit 8fc3672a8ad3e782bac80e979bc2a2c10960cbe9 ]
Jakub Kicinski brought up a concern in ibmvnic_set_mac(). ibmvnic_set_mac() does this:
ether_addr_copy(adapter->mac_addr, addr->sa_data); if (adapter->state != VNIC_PROBED) rc = __ibmvnic_set_mac(netdev, addr->sa_data);
So if state == VNIC_PROBED, the user can assign an invalid address to adapter->mac_addr, and ibmvnic_set_mac() will still return 0.
The fix is to validate ethernet address at the beginning of ibmvnic_set_mac(), and move the ether_addr_copy to the case of "adapter->state != VNIC_PROBED".
Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters") Signed-off-by: Lijun Pan ljp@linux.ibm.com Link: https://lore.kernel.org/r/20201027220456.71450-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmvnic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1828,9 +1828,13 @@ static int ibmvnic_set_mac(struct net_de int rc;
rc = 0; - ether_addr_copy(adapter->mac_addr, addr->sa_data); - if (adapter->state != VNIC_PROBED) + if (!is_valid_ether_addr(addr->sa_data)) + return -EADDRNOTAVAIL; + + if (adapter->state != VNIC_PROBED) { + ether_addr_copy(adapter->mac_addr, addr->sa_data); rc = __ibmvnic_set_mac(netdev, addr->sa_data); + }
return rc; }
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit adc80b6cfedff6dad8b93d46a5ea2775fd5af9ec ]
Free the devlink instance during the teardown sequence in the non-reload case to avoid the following memory leak.
unreferenced object 0xffff888232895000 (size 2048): comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2.... backtrace: [<00000000c704e9a6>] __kmalloc+0x13a/0x2a0 [<00000000ee30129d>] devlink_alloc+0xff/0x760 [<0000000092ab3e5d>] 0xffffffffa042e5b0 [<000000004f3f8a31>] 0xffffffffa042f6ad [<0000000092800b4b>] 0xffffffffa0491df3 [<00000000c4843903>] local_pci_probe+0xcb/0x170 [<000000006993ded7>] pci_device_probe+0x2c2/0x4e0 [<00000000a8e0de75>] really_probe+0x2c5/0xf90 [<00000000d42ba75d>] driver_probe_device+0x1eb/0x340 [<00000000bcc95e05>] device_driver_attach+0x294/0x300 [<000000000e2bc177>] __driver_attach+0x167/0x2f0 [<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0 [<000000003cd5a91e>] driver_attach+0x45/0x60 [<000000000041ce51>] bus_add_driver+0x3b8/0x720 [<00000000f5215476>] driver_register+0x230/0x4e0 [<00000000d79356f5>] __pci_register_driver+0x190/0x200
Fixes: a22712a96291 ("mlxsw: core: Fix devlink unregister flow") Signed-off-by: Ido Schimmel idosch@nvidia.com Reported-by: Vadim Pasternak vadimp@nvidia.com Tested-by: Oleksandr Shamray oleksandrs@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/core.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -1485,6 +1485,8 @@ void mlxsw_core_bus_device_unregister(st if (!reload) devlink_resources_unregister(devlink, NULL); mlxsw_core->bus->fini(mlxsw_core->bus_priv); + if (!reload) + devlink_free(devlink);
return;
From: Amit Cohen amcohen@nvidia.com
[ Upstream commit 1601559be3e4213148b4cb4a1abe672b00bf4f67 ]
During port creation the driver instructs the device to advertise all the supported link modes queried from the device.
Since cited commit not all the link modes supported by the device are supported by the driver. This can result in the device negotiating a link mode that is not recognized by the driver causing ethtool to show an unsupported speed:
$ ethtool swp1 ... Speed: Unknown!
This is especially problematic when the netdev is enslaved to a bond, as the bond driver uses unknown speed as an indication that the link is down:
[13048.900895] net_ratelimit: 86 callbacks suppressed [13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex [13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex
Fix this by making sure that only link modes that are supported by both the device and the driver are advertised.
Fixes: b97cd891268d ("mlxsw: Remove 56G speed support") Signed-off-by: Amit Cohen amcohen@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 9 +++-- drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c | 30 +++++++++++++++++ 3 files changed, 38 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -1546,11 +1546,14 @@ mlxsw_sp_port_speed_by_width_set(struct u32 eth_proto_cap, eth_proto_admin, eth_proto_oper; const struct mlxsw_sp_port_type_speed_ops *ops; char ptys_pl[MLXSW_REG_PTYS_LEN]; + u32 eth_proto_cap_masked; int err;
ops = mlxsw_sp->port_type_speed_ops;
- /* Set advertised speeds to supported speeds. */ + /* Set advertised speeds to speeds supported by both the driver + * and the device. + */ ops->reg_ptys_eth_pack(mlxsw_sp, ptys_pl, mlxsw_sp_port->local_port, 0, false); err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(ptys), ptys_pl); @@ -1559,8 +1562,10 @@ mlxsw_sp_port_speed_by_width_set(struct
ops->reg_ptys_eth_unpack(mlxsw_sp, ptys_pl, ð_proto_cap, ð_proto_admin, ð_proto_oper); + eth_proto_cap_masked = ops->ptys_proto_cap_masked_get(eth_proto_cap); ops->reg_ptys_eth_pack(mlxsw_sp, ptys_pl, mlxsw_sp_port->local_port, - eth_proto_cap, mlxsw_sp_port->link.autoneg); + eth_proto_cap_masked, + mlxsw_sp_port->link.autoneg); return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ptys), ptys_pl); }
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h @@ -340,6 +340,7 @@ struct mlxsw_sp_port_type_speed_ops { u32 *p_eth_proto_cap, u32 *p_eth_proto_admin, u32 *p_eth_proto_oper); + u32 (*ptys_proto_cap_masked_get)(u32 eth_proto_cap); };
static inline struct net_device * --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ethtool.c @@ -1208,6 +1208,20 @@ mlxsw_sp1_reg_ptys_eth_unpack(struct mlx p_eth_proto_oper); }
+static u32 mlxsw_sp1_ptys_proto_cap_masked_get(u32 eth_proto_cap) +{ + u32 ptys_proto_cap_masked = 0; + int i; + + for (i = 0; i < MLXSW_SP1_PORT_LINK_MODE_LEN; i++) { + if (mlxsw_sp1_port_link_mode[i].mask & eth_proto_cap) + ptys_proto_cap_masked |= + mlxsw_sp1_port_link_mode[i].mask; + } + + return ptys_proto_cap_masked; +} + const struct mlxsw_sp_port_type_speed_ops mlxsw_sp1_port_type_speed_ops = { .from_ptys_supported_port = mlxsw_sp1_from_ptys_supported_port, .from_ptys_link = mlxsw_sp1_from_ptys_link, @@ -1217,6 +1231,7 @@ const struct mlxsw_sp_port_type_speed_op .to_ptys_speed = mlxsw_sp1_to_ptys_speed, .reg_ptys_eth_pack = mlxsw_sp1_reg_ptys_eth_pack, .reg_ptys_eth_unpack = mlxsw_sp1_reg_ptys_eth_unpack, + .ptys_proto_cap_masked_get = mlxsw_sp1_ptys_proto_cap_masked_get, };
static const enum ethtool_link_mode_bit_indices @@ -1632,6 +1647,20 @@ mlxsw_sp2_reg_ptys_eth_unpack(struct mlx p_eth_proto_admin, p_eth_proto_oper); }
+static u32 mlxsw_sp2_ptys_proto_cap_masked_get(u32 eth_proto_cap) +{ + u32 ptys_proto_cap_masked = 0; + int i; + + for (i = 0; i < MLXSW_SP2_PORT_LINK_MODE_LEN; i++) { + if (mlxsw_sp2_port_link_mode[i].mask & eth_proto_cap) + ptys_proto_cap_masked |= + mlxsw_sp2_port_link_mode[i].mask; + } + + return ptys_proto_cap_masked; +} + const struct mlxsw_sp_port_type_speed_ops mlxsw_sp2_port_type_speed_ops = { .from_ptys_supported_port = mlxsw_sp2_from_ptys_supported_port, .from_ptys_link = mlxsw_sp2_from_ptys_link, @@ -1641,4 +1670,5 @@ const struct mlxsw_sp_port_type_speed_op .to_ptys_speed = mlxsw_sp2_to_ptys_speed, .reg_ptys_eth_pack = mlxsw_sp2_reg_ptys_eth_pack, .reg_ptys_eth_unpack = mlxsw_sp2_reg_ptys_eth_unpack, + .ptys_proto_cap_masked_get = mlxsw_sp2_ptys_proto_cap_masked_get, };
From: Aleksandr Nogikh nogikh@google.com
[ Upstream commit eadd1befdd778a1eca57fad058782bd22b4db804 ]
Currently it is possible to craft a special netlink RTM_NEWQDISC command that can result in jitter being equal to 0x80000000. It is enough to set the 32 bit jitter to 0x02000000 (it will later be multiplied by 2^6) or just set the 64 bit jitter via TCA_NETEM_JITTER64. This causes an overflow during the generation of uniformly distributed numbers in tabledist(), which in turn leads to division by zero (sigma != 0, but sigma * 2 is 0).
The related fragment of code needs 32-bit division - see commit 9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to 64 bit is not an option.
Fix the issue by keeping the value of jitter within the range that can be adequately handled by tabledist() - [0;INT_MAX]. As negative std deviation makes no sense, take the absolute value of the passed value and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit arithmetic in order to prevent overflows.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Aleksandr Nogikh nogikh@google.com Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com Acked-by: Stephen Hemminger stephen@networkplumber.org Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_netem.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -330,7 +330,7 @@ static s64 tabledist(s64 mu, s32 sigma,
/* default uniform distribution */ if (dist == NULL) - return ((rnd % (2 * sigma)) + mu) - sigma; + return ((rnd % (2 * (u32)sigma)) + mu) - sigma;
t = dist->table[rnd % dist->size]; x = (sigma % NETEM_DIST_SCALE) * t; @@ -812,6 +812,10 @@ static void get_slot(struct netem_sched_ q->slot_config.max_packets = INT_MAX; if (q->slot_config.max_bytes == 0) q->slot_config.max_bytes = INT_MAX; + + /* capping dist_jitter to the range acceptable by tabledist() */ + q->slot_config.dist_jitter = min_t(__s64, INT_MAX, abs(q->slot_config.dist_jitter)); + q->slot.packets_left = q->slot_config.max_packets; q->slot.bytes_left = q->slot_config.max_bytes; if (q->slot_config.min_delay | q->slot_config.max_delay | @@ -1037,6 +1041,9 @@ static int netem_change(struct Qdisc *sc if (tb[TCA_NETEM_SLOT]) get_slot(q, tb[TCA_NETEM_SLOT]);
+ /* capping jitter to the range acceptable by tabledist() */ + q->jitter = min_t(s64, abs(q->jitter), INT_MAX); + return ret;
get_table_failure:
From: Zenghui Yu yuzenghui@huawei.com
[ Upstream commit e3364c5ff3ff975b943a7bf47e21a2a4bf20f3fe ]
When unbinding the hns3 driver with the HNS3 VF, I got the following kernel panic:
[ 265.709989] Unable to handle kernel paging request at virtual address ffff800054627000 [ 265.717928] Mem abort info: [ 265.720740] ESR = 0x96000047 [ 265.723810] EC = 0x25: DABT (current EL), IL = 32 bits [ 265.729126] SET = 0, FnV = 0 [ 265.732195] EA = 0, S1PTW = 0 [ 265.735351] Data abort info: [ 265.738227] ISV = 0, ISS = 0x00000047 [ 265.742071] CM = 0, WnR = 1 [ 265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000 [ 265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000 [ 265.764314] Internal error: Oops: 96000047 [#1] SMP [ 265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206 [ 265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019 [ 265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--) [ 265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300 [ 265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300 [ 265.861992] sp : ffff80004c983b50 [ 265.881411] pmr_save: 000000e0 [ 265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500 [ 265.889744] x27: 0000000000000000 x26: 0000000000000000 [ 265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90 [ 265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150 [ 265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90 [ 265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004 [ 265.916198] x17: 0000000000000000 x16: ffff800011ae32e8 [ 265.916201] x15: 0000000000000028 x14: 0000000000000002 [ 265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8 [ 265.946619] x11: ffff80004c983b50 x10: 0000000000000000 [ 265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000 [ 265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff [ 265.951917] x5 : ffff80004c983930 x4 : 0000000000000001 [ 265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100 [ 265.964487] x1 : 0000000000000000 x0 : ffff800054627000 [ 265.964491] Call trace: [ 265.964494] hclgevf_cmd_uninit+0xbc/0x300 [ 265.964496] hclgevf_uninit_ae_dev+0x9c/0xe8 [ 265.964501] hnae3_unregister_ae_dev+0xb0/0x130 [ 265.964516] hns3_remove+0x34/0x88 [hns3] [ 266.009683] pci_device_remove+0x48/0xf0 [ 266.009692] device_release_driver_internal+0x114/0x1e8 [ 266.030058] device_driver_detach+0x28/0x38 [ 266.034224] unbind_store+0xd4/0x108 [ 266.037784] drv_attr_store+0x40/0x58 [ 266.041435] sysfs_kf_write+0x54/0x80 [ 266.045081] kernfs_fop_write+0x12c/0x250 [ 266.049076] vfs_write+0xc4/0x248 [ 266.052378] ksys_write+0x74/0xf8 [ 266.055677] __arm64_sys_write+0x24/0x30 [ 266.059584] el0_svc_common.constprop.3+0x84/0x270 [ 266.064354] do_el0_svc+0x34/0xa0 [ 266.067658] el0_svc+0x38/0x40 [ 266.070700] el0_sync_handler+0x8c/0xb0 [ 266.074519] el0_sync+0x140/0x180
It looks like the BAR memory region had already been unmapped before we start clearing CMDQ registers in it, which is pretty bad and the kernel happily kills itself because of a Current EL Data Abort (on arm64).
Moving the CMDQ uninitialization a bit early fixes the issue for me.
Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR") Signed-off-by: Zenghui Yu yuzenghui@huawei.com Link: https://lore.kernel.org/r/20201023051550.793-1-yuzenghui@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -3146,8 +3146,8 @@ static void hclgevf_uninit_hdev(struct h hclgevf_uninit_msi(hdev); }
- hclgevf_pci_uninit(hdev); hclgevf_cmd_uninit(hdev); + hclgevf_pci_uninit(hdev); hclgevf_uninit_mac_list(hdev); }
From: Alex Elder elder@linaro.org
[ Upstream commit df833050cced27e1b343cc8bc41f90191b289334 ]
IPA transactions describe actions to be performed by the IPA hardware. Three cases use IPA transactions: transmitting a socket buffer; providing a page to receive packet data; and issuing an IPA immediate command. An IPA transaction contains a scatter/gather list (SGL) to hold the set of actions to be performed.
We map buffers in the SGL for DMA at the time they are added to the transaction. For skb TX transactions, we fill the SGL with a call to skb_to_sgvec(). Page RX transactions involve a single page pointer, and that is recorded in the SGL with sg_set_page(). In both of these cases we then map the SGL for DMA with a call to dma_map_sg().
Immediate commands are different. The payload for an immediate command comes from a region of coherent DMA memory, which must *not* be mapped for DMA. For that reason, gsi_trans_cmd_add() sort of hand-crafts each SGL entry added to a command transaction.
This patch fixes a problem with the code that crafts the SGL entry for an immediate command. Previously a portion of the SGL entry was updated using sg_set_buf(). However this is not valid because it includes a call to virt_to_page() on the buffer, but the command buffer pointer is not a linear address.
Since we never actually map the SGL for command transactions, there are very few fields in the SGL we need to fill. Specifically, we only need to record the DMA address and the length, so they can be used by __gsi_trans_commit() to fill a TRE. We additionally need to preserve the SGL flags so for_each_sg() still works. For that we can simply assign a null page pointer for command SGL entries.
Fixes: 9dd441e4ed575 ("soc: qcom: ipa: GSI transactions") Reported-by: Stephen Boyd swboyd@chromium.org Tested-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Alex Elder elder@linaro.org Link: https://lore.kernel.org/r/20201022010029.11877-1-elder@linaro.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ipa/gsi_trans.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
--- a/drivers/net/ipa/gsi_trans.c +++ b/drivers/net/ipa/gsi_trans.c @@ -398,15 +398,24 @@ void gsi_trans_cmd_add(struct gsi_trans
/* assert(which < trans->tre_count); */
- /* Set the page information for the buffer. We also need to fill in - * the DMA address and length for the buffer (something dma_map_sg() - * normally does). + /* Commands are quite different from data transfer requests. + * Their payloads come from a pool whose memory is allocated + * using dma_alloc_coherent(). We therefore do *not* map them + * for DMA (unlike what we do for pages and skbs). + * + * When a transaction completes, the SGL is normally unmapped. + * A command transaction has direction DMA_NONE, which tells + * gsi_trans_complete() to skip the unmapping step. + * + * The only things we use directly in a command scatter/gather + * entry are the DMA address and length. We still need the SG + * table flags to be maintained though, so assign a NULL page + * pointer for that purpose. */ sg = &trans->sgl[which]; - - sg_set_buf(sg, buf, size); + sg_assign_page(sg, NULL); sg_dma_address(sg) = addr; - sg_dma_len(sg) = sg->length; + sg_dma_len(sg) = size;
info = &trans->info[which]; info->opcode = opcode;
From: Guillaume Nault gnault@redhat.com
TCA_MPLS_ACT_PUSH and TCA_MPLS_ACT_MAC_PUSH might be used on gso packets. Such packets will thus require mpls_gso.ko for segmentation.
v2: Drop dependency on CONFIG_NET_MPLS_GSO in Kconfig (from Jakub and David).
Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Signed-off-by: Guillaume Nault gnault@redhat.com Link: https://lore.kernel.org/r/1f6cab15bbd15666795061c55563aaf6a386e90e.160370800... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/act_mpls.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/sched/act_mpls.c +++ b/net/sched/act_mpls.c @@ -408,6 +408,7 @@ static void __exit mpls_cleanup_module(v module_init(mpls_init_module); module_exit(mpls_cleanup_module);
+MODULE_SOFTDEP("post: mpls_gso"); MODULE_AUTHOR("Netronome Systems oss-drivers@netronome.com"); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("MPLS manipulation actions");
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 2734a24e6e5d18522fbf599135c59b82ec9b2c9e ]
As reported by Serge flag IRQF_NO_THREAD causes an error if the interrupt is actually shared and the other driver(s) don't have this flag set. This situation can occur if a PCI(e) legacy interrupt is used in combination with forced threading. There's no good way to deal with this properly, therefore we have to remove flag IRQF_NO_THREAD. For fixing the original forced threading issue switch to napi_schedule().
Fixes: 424a646e072a ("r8169: fix operation under forced interrupt threading") Link: https://www.spinics.net/lists/netdev/msg694960.html Reported-by: Serge Belyshev belyshev@depni.sinp.msu.ru Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Tested-by: Serge Belyshev belyshev@depni.sinp.msu.ru Link: https://lore.kernel.org/r/b5b53bfe-35ac-3768-85bf-74d1290cf394@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/realtek/r8169_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4563,7 +4563,7 @@ static irqreturn_t rtl8169_interrupt(int }
rtl_irq_disable(tp); - napi_schedule_irqoff(&tp->napi); + napi_schedule(&tp->napi); out: rtl_ack_events(tp, status);
@@ -4738,7 +4738,7 @@ static int rtl_open(struct net_device *d rtl_request_firmware(tp);
retval = request_irq(pci_irq_vector(pdev, 0), rtl8169_interrupt, - IRQF_NO_THREAD | IRQF_SHARED, dev->name, tp); + IRQF_SHARED, dev->name, tp); if (retval < 0) goto err_release_fw_2;
From: Andrew Gabbasov andrew_gabbasov@mentor.com
[ Upstream commit 68b9f0865b1ef545da180c57d54b82c94cb464a4 ]
In the function ravb_hwtstamp_get() in ravb_main.c with the existing values for RAVB_RXTSTAMP_TYPE_V2_L2_EVENT (0x2) and RAVB_RXTSTAMP_TYPE_ALL (0x6)
if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT) config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT; else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL) config.rx_filter = HWTSTAMP_FILTER_ALL;
if the test on RAVB_RXTSTAMP_TYPE_ALL should be true, it will never be reached.
This issue can be verified with 'hwtstamp_config' testing program (tools/testing/selftests/net/hwtstamp_config.c). Setting filter type to ALL and subsequent retrieving it gives incorrect value:
$ hwtstamp_config eth0 OFF ALL flags = 0 tx_type = OFF rx_filter = ALL $ hwtstamp_config eth0 flags = 0 tx_type = OFF rx_filter = PTP_V2_L2_EVENT
Correct this by converting if-else's to switch.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Reported-by: Julia Lawall julia.lawall@inria.fr Signed-off-by: Andrew Gabbasov andrew_gabbasov@mentor.com Reviewed-by: Sergei Shtylyov sergei.shtylyov@gmail.com Link: https://lore.kernel.org/r/20201026102130.29368-1-andrew_gabbasov@mentor.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/renesas/ravb_main.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1747,12 +1747,16 @@ static int ravb_hwtstamp_get(struct net_ config.flags = 0; config.tx_type = priv->tstamp_tx_ctrl ? HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF; - if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT) + switch (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE) { + case RAVB_RXTSTAMP_TYPE_V2_L2_EVENT: config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT; - else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL) + break; + case RAVB_RXTSTAMP_TYPE_ALL: config.rx_filter = HWTSTAMP_FILTER_ALL; - else + break; + default: config.rx_filter = HWTSTAMP_FILTER_NONE; + }
return copy_to_user(req->ifr_data, &config, sizeof(config)) ? -EFAULT : 0;
From: Arjun Roy arjunroy@google.com
[ Upstream commit 435ccfa894e35e3d4a1799e6ac030e48a7b69ef5 ]
With SO_RCVLOWAT, under memory pressure, it is possible to enter a state where:
1. We have not received enough bytes to satisfy SO_RCVLOWAT. 2. We have not entered buffer pressure (see tcp_rmem_pressure()). 3. But, we do not have enough buffer space to accept more packets.
In this case, we advertise 0 rwnd (due to #3) but the application does not drain the receive queue (no wakeup because of #1 and #2) so the flow stalls.
Modify the heuristic for SO_RCVLOWAT so that, if we are advertising rwnd<=rcv_mss, force a wakeup to prevent a stall.
Without this patch, setting tcp_rmem to 6143 and disabling TCP autotune causes a stalled flow. With this patch, no stall occurs. This is with RPC-style traffic with large messages.
Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Arjun Roy arjunroy@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp.c | 2 ++ net/ipv4/tcp_input.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -483,6 +483,8 @@ static inline bool tcp_stream_is_readabl return true; if (tcp_rmem_pressure(sk)) return true; + if (tcp_receive_window(tp) <= inet_csk(sk)->icsk_ack.rcv_mss) + return true; } if (sk->sk_prot->stream_memory_read) return sk->sk_prot->stream_memory_read(sk); --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4840,7 +4840,8 @@ void tcp_data_ready(struct sock *sk) int avail = tp->rcv_nxt - tp->copied_seq;
if (avail < sk->sk_rcvlowat && !tcp_rmem_pressure(sk) && - !sock_flag(sk, SOCK_DONE)) + !sock_flag(sk, SOCK_DONE) && + tcp_receive_window(tp) > inet_csk(sk)->icsk_ack.rcv_mss) return;
sk->sk_data_ready(sk);
From: Tung Nguyen tung.q.nguyen@dektech.com.au
[ Upstream commit ceb1eb2fb609c88363e06618b8d4bbf7815a4e03 ]
Commit ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()") replaced skb_unshare() with skb_copy() to not reduce the data reference counter of the original skb intentionally. This is not the correct way to handle the cloned skb because it causes memory leak in 2 following cases: 1/ Sending multicast messages via broadcast link The original skb list is cloned to the local skb list for local destination. After that, the data reference counter of each skb in the original list has the value of 2. This causes each skb not to be freed after receiving ACK: tipc_link_advance_transmq() { ... /* release skb */ __skb_unlink(skb, &l->transmq); kfree_skb(skb); <-- memory exists after being freed }
2/ Sending multicast messages via replicast link Similar to the above case, each skb cannot be freed after purging the skb list: tipc_mcast_xmit() { ... __skb_queue_purge(pkts); <-- memory exists after being freed }
This commit fixes this issue by using skb_unshare() instead. Besides, to avoid use-after-free error reported by KASAN, the pointer to the fragment is set to NULL before calling skb_unshare() to make sure that the original skb is not freed after freeing the fragment 2 times in case skb_unshare() returns NULL.
Fixes: ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()") Acked-by: Jon Maloy jmaloy@redhat.com Reported-by: Thang Hoang Ngo thang.h.ngo@dektech.com.au Signed-off-by: Tung Nguyen tung.q.nguyen@dektech.com.au Reviewed-by: Xin Long lucien.xin@gmail.com Acked-by: Cong Wang xiyou.wangcong@gmail.com Link: https://lore.kernel.org/r/20201027032403.1823-1-tung.q.nguyen@dektech.com.au Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/msg.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -150,12 +150,11 @@ int tipc_buf_append(struct sk_buff **hea if (fragid == FIRST_FRAGMENT) { if (unlikely(head)) goto err; - if (skb_cloned(frag)) - frag = skb_copy(frag, GFP_ATOMIC); + *buf = NULL; + frag = skb_unshare(frag, GFP_ATOMIC); if (unlikely(!frag)) goto err; head = *headbuf = frag; - *buf = NULL; TIPC_SKB_CB(head)->tail = NULL; if (skb_is_nonlinear(head)) { skb_walk_frags(head, tail) {
From: Karsten Graul kgraul@linux.ibm.com
[ Upstream commit 6b1bbf94ab369d97ed3bdaa561521a52c27ef619 ]
smc_ism_register_dmb() returns error codes set by the ISM driver which are not guaranteed to be negative or in the errno range. Such values would not be handled by ERR_PTR() and finally the return code will be used as a memory address. Fix that by using a valid negative errno value with ERR_PTR().
Fixes: 72b7f6c48708 ("net/smc: unique reason code for exceeded max dmb count") Signed-off-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/smc_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1616,7 +1616,8 @@ static struct smc_buf_desc *smcd_new_buf rc = smc_ism_register_dmb(lgr, bufsize, buf_desc); if (rc) { kfree(buf_desc); - return (rc == -ENOMEM) ? ERR_PTR(-EAGAIN) : ERR_PTR(rc); + return (rc == -ENOMEM) ? ERR_PTR(-EAGAIN) : + ERR_PTR(-EIO); } buf_desc->pages = virt_to_page(buf_desc->cpu_addr); /* CDC header stored in buf. So, pretend it was smaller */
From: Karsten Graul kgraul@linux.ibm.com
[ Upstream commit 96d6fded958d971a3695009e0ed43aca6c598283 ]
The patch that repaired the invalid return code in smcd_new_buf_create() missed to take care of errno ENOSPC which has a special meaning that no more DMBEs can be registered on the device. Fix that by keeping this errno value during the translation of the return code.
Fixes: 6b1bbf94ab36 ("net/smc: fix invalid return code in smcd_new_buf_create()") Signed-off-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/smc_core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1616,8 +1616,11 @@ static struct smc_buf_desc *smcd_new_buf rc = smc_ism_register_dmb(lgr, bufsize, buf_desc); if (rc) { kfree(buf_desc); - return (rc == -ENOMEM) ? ERR_PTR(-EAGAIN) : - ERR_PTR(-EIO); + if (rc == -ENOMEM) + return ERR_PTR(-EAGAIN); + if (rc == -ENOSPC) + return ERR_PTR(-ENOSPC); + return ERR_PTR(-EIO); } buf_desc->pages = virt_to_page(buf_desc->cpu_addr); /* CDC header stored in buf. So, pretend it was smaller */
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit d6535dca28859d8d9ef80894eb287b2ac35a32e8 ]
The tcf_block_unbind() expects that the caller will take block->cb_lock before calling it, however the code took RTNL lock and dropped cb_lock instead. This causes to the following kernel panic.
WARNING: CPU: 1 PID: 13524 at net/sched/cls_api.c:1488 tcf_block_unbind+0x2db/0x420 Modules linked in: mlx5_ib mlx5_core mlxfw ptp pps_core act_mirred act_tunnel_key cls_flower vxlan ip6_udp_tunnel udp_tunnel dummy sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm ib_uverbs ib_core overlay [last unloaded: mlxfw] CPU: 1 PID: 13524 Comm: test-ecmp-add-v Tainted: G W 5.9.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:tcf_block_unbind+0x2db/0x420 Code: ff 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8d bc 24 30 01 00 00 be ff ff ff ff e8 7d 7f 70 00 85 c0 0f 85 7b fd ff ff <0f> 0b e9 74 fd ff ff 48 c7 c7 dc 6a 24 84 e8 02 ec fe fe e9 55 fd RSP: 0018:ffff888117d17968 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88812f713c00 RCX: 1ffffffff0848d5b RDX: 0000000000000001 RSI: ffff88814fbc8130 RDI: ffff888107f2b878 RBP: 1ffff11022fa2f3f R08: 0000000000000000 R09: ffffffff84115a87 R10: fffffbfff0822b50 R11: ffff888107f2b898 R12: ffff88814fbc8000 R13: ffff88812f713c10 R14: ffff888117d17a38 R15: ffff88814fbc80c0 FS: 00007f6593d36740(0000) GS:ffff8882a4f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005607a00758f8 CR3: 0000000131aea006 CR4: 0000000000170ea0 Call Trace: tc_block_indr_cleanup+0x3e0/0x5a0 ? tcf_block_unbind+0x420/0x420 ? __mutex_unlock_slowpath+0xe7/0x610 flow_indr_dev_unregister+0x5e2/0x930 ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core] ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core] ? flow_indr_block_cb_alloc+0x3c0/0x3c0 ? mlx5_db_free+0x37c/0x4b0 [mlx5_core] mlx5e_cleanup_rep_tx+0x8b/0xc0 [mlx5_core] mlx5e_detach_netdev+0xe5/0x120 [mlx5_core] mlx5e_vport_rep_unload+0x155/0x260 [mlx5_core] esw_offloads_disable+0x227/0x2b0 [mlx5_core] mlx5_eswitch_disable_locked.cold+0x38e/0x699 [mlx5_core] mlx5_eswitch_disable+0x94/0xf0 [mlx5_core] mlx5_device_disable_sriov+0x183/0x1f0 [mlx5_core] mlx5_core_sriov_configure+0xfd/0x230 [mlx5_core] sriov_numvfs_store+0x261/0x2f0 ? sriov_drivers_autoprobe_store+0x110/0x110 ? sysfs_file_ops+0x170/0x170 ? sysfs_file_ops+0x117/0x170 ? sysfs_file_ops+0x170/0x170 kernfs_fop_write+0x1ff/0x3f0 ? rcu_read_lock_any_held+0x6e/0x90 vfs_write+0x1f3/0x620 ksys_write+0xf9/0x1d0 ? __x64_sys_read+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x273/0x3f0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9
<...>
---[ end trace bfdd028ada702879 ]---
Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()") Signed-off-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20201026123327.1141066-1-leon@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/cls_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -652,12 +652,12 @@ static void tc_block_indr_cleanup(struct block_cb->indr.binder_type, &block->flow_block, tcf_block_shared(block), &extack); + rtnl_lock(); down_write(&block->cb_lock); list_del(&block_cb->driver_list); list_move(&block_cb->list, &bo.cb_list); - up_write(&block->cb_lock); - rtnl_lock(); tcf_block_unbind(block, &bo); + up_write(&block->cb_lock); rtnl_unlock(); }
From: Gao Xiang hsiangkao@redhat.com
commit d578b46db69d125a654f509bdc9091d84e924dc8 upstream.
Don't recheck it since xattr_permission() already checks CAP_SYS_ADMIN capability.
Just follow 5d3ce4f70172 ("f2fs: avoid duplicated permission check for "trusted." xattrs")
Reported-by: Hongyu Jin hongyu.jin@unisoc.com [ Gao Xiang: since it could cause some complex Android overlay permission issue as well on android-5.4+, it'd be better to backport to 5.4+ rather than pure cleanup on mainline. ] Cc: stable@vger.kernel.org # 5.4+ Link: https://lore.kernel.org/r/20200811070020.6339-1-hsiangkao@redhat.com Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang hsiangkao@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/erofs/xattr.c | 2 -- 1 file changed, 2 deletions(-)
--- a/fs/erofs/xattr.c +++ b/fs/erofs/xattr.c @@ -473,8 +473,6 @@ static int erofs_xattr_generic_get(const return -EOPNOTSUPP; break; case EROFS_XATTR_INDEX_TRUSTED: - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; break; case EROFS_XATTR_INDEX_SECURITY: break;
From: Kim Phillips kim.phillips@amd.com
commit 221bfce5ebbdf72ff08b3bf2510ae81058ee568b upstream.
Stephane Eranian found a bug in that IBS' current Fetch counter was not being reset when the driver would write the new value to clear it along with the enable bit set, and found that adding an MSR write that would first disable IBS Fetch would make IBS Fetch reset its current count.
Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12, 2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when IbsFetchEn is changed from 0 to 1."
Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch, so the driver properly resets the internal counter to 0 and IBS Fetch starts counting again.
A family 15h machine tested does not have this problem, and the extra wrmsr is also not needed on Family 19h, so only do the extra wrmsr on families 16h through 18h.
Reported-by: Stephane Eranian stephane.eranian@google.com Signed-off-by: Kim Phillips kim.phillips@amd.com [peterz: optimized] Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/ibs.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
--- a/arch/x86/events/amd/ibs.c +++ b/arch/x86/events/amd/ibs.c @@ -89,6 +89,7 @@ struct perf_ibs { u64 max_period; unsigned long offset_mask[1]; int offset_max; + unsigned int fetch_count_reset_broken : 1; struct cpu_perf_ibs __percpu *pcpu;
struct attribute **format_attrs; @@ -363,7 +364,12 @@ perf_ibs_event_update(struct perf_ibs *p static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs, struct hw_perf_event *hwc, u64 config) { - wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask); + u64 tmp = hwc->config | config; + + if (perf_ibs->fetch_count_reset_broken) + wrmsrl(hwc->config_base, tmp & ~perf_ibs->enable_mask); + + wrmsrl(hwc->config_base, tmp | perf_ibs->enable_mask); }
/* @@ -733,6 +739,13 @@ static __init void perf_event_ibs_init(v { struct attribute **attr = ibs_op_format_attrs;
+ /* + * Some chips fail to reset the fetch count when it is written; instead + * they need a 0-1 transition of IbsFetchEn. + */ + if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18) + perf_ibs_fetch.fetch_count_reset_broken = 1; + perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
if (ibs_caps & IBS_CAPS_OPCNT) {
From: Thomas Gleixner tglx@linutronix.de
commit 5f1ec1fd32252af5130dac23b5542e8e66fe0bcb upstream.
The conversion of #DE to the idtentry mechanism introduced a change in the Ooops message which confuses tools which parse crash information in dmesg.
Remove the underscore from 'divide_error' to restore previous behaviour.
Fixes: 9d06c4027f21 ("x86/entry: Convert Divide Error to IDTENTRY") Reported-by: Dmitry Vyukov dvyukov@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/CACT4Y+bTZFkuZd7+bPArowOv-7Die+WZpfOWnEO_Wgs3U59+o... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/traps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -195,7 +195,7 @@ static __always_inline void __user *erro
DEFINE_IDTENTRY(exc_divide_error) { - do_error_trap(regs, 0, "divide_error", X86_TRAP_DE, SIGFPE, + do_error_trap(regs, 0, "divide error", X86_TRAP_DE, SIGFPE, FPE_INTDIV, error_get_trap_addr(regs)); }
From: Juergen Gross jgross@suse.com
commit d759af38572f97321112a0852353613d18126038 upstream.
When running as Xen dom0 the kernel isn't responsible for selecting the error handling mode, this should be handled by the hypervisor.
So disable setting FF mode when running as Xen pv guest. Not doing so might result in boot splats like:
[ 7.509696] HEST: Enabling Firmware First mode for corrected errors. [ 7.510382] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 2. [ 7.510383] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 3. [ 7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 4. [ 7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 5. [ 7.510385] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 6. [ 7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 7. [ 7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8.
Reason is that the HEST ACPI table contains the real number of MCA banks, while the hypervisor is emulating only 2 banks for guests.
Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Link: https://lore.kernel.org/r/20200925140751.31381-1-jgross@suse.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/xen/enlighten_pv.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1376,6 +1376,15 @@ asmlinkage __visible void __init xen_sta x86_init.mpparse.get_smp_config = x86_init_uint_noop;
xen_boot_params_init_edd(); + +#ifdef CONFIG_ACPI + /* + * Disable selecting "Firmware First mode" for correctable + * memory errors, as this is the duty of the hypervisor to + * decide. + */ + acpi_disable_cmcff = 1; +#endif }
if (!boot_params.screen_info.orig_video_isVGA)
From: Pali Rohár pali@kernel.org
commit b0c6ae0f8948a2be6bf4e8b4bbab9ca1343289b6 upstream.
Old ATF automatically power on pcie phy and does not provide SMC call for phy power on functionality which leads to aardvark initialization failure:
[ 0.330134] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware [ 0.338846] phy phy-d0018300.phy.1: phy poweron failed --> -95 [ 0.344753] advk-pcie d0070000.pcie: Failed to initialize PHY (-95) [ 0.351160] advk-pcie: probe of d0070000.pcie failed with error -95
This patch fixes above failure by ignoring 'not supported' error in aardvark driver. In this case it is expected that phy is already power on.
Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Link: https://lore.kernel.org/r/20200902144344.16684-3-pali@kernel.org Fixes: 366697018c9a ("PCI: aardvark: Add PHY support") Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Rob Herring robh@kernel.org Cc: stable@vger.kernel.org # 5.8+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/controller/pci-aardvark.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1076,7 +1076,9 @@ static int advk_pcie_enable_phy(struct a }
ret = phy_power_on(pcie->phy); - if (ret) { + if (ret == -EOPNOTSUPP) { + dev_warn(&pcie->pdev->dev, "PHY unsupported by firmware\n"); + } else if (ret) { phy_exit(pcie->phy); return ret; }
From: Pali Rohár pali@kernel.org
commit 45aefe3d2251e4e229d7662052739f96ad1d08d9 upstream.
Older ATF does not provide SMC call for SATA phy power on functionality and therefore initialization of ahci_mvebu is failing when older version of ATF is using. In this case phy_power_on() function returns -EOPNOTSUPP.
This patch adds a new hflag AHCI_HFLAG_IGN_NOTSUPP_POWER_ON which cause that ahci_platform_enable_phys() would ignore -EOPNOTSUPP errors from phy_power_on() call.
It fixes initialization of ahci_mvebu on Espressobin boards where is older Marvell's Arm Trusted Firmware without SMC call for SATA phy power.
This is regression introduced in commit 8e18c8e58da64 ("arm64: dts: marvell: armada-3720-espressobin: declare SATA PHY property") where SATA phy was defined and therefore ahci_platform_enable_phys() on Espressobin started failing.
Fixes: 8e18c8e58da64 ("arm64: dts: marvell: armada-3720-espressobin: declare SATA PHY property") Signed-off-by: Pali Rohár pali@kernel.org Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Cc: stable@vger.kernel.org # 5.1+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/ata/ahci.h | 2 ++ drivers/ata/ahci_mvebu.c | 2 +- drivers/ata/libahci_platform.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/ata/ahci.h +++ b/drivers/ata/ahci.h @@ -240,6 +240,8 @@ enum { as default lpm_policy */ AHCI_HFLAG_SUSPEND_PHYS = (1 << 26), /* handle PHYs during suspend/resume */ + AHCI_HFLAG_IGN_NOTSUPP_POWER_ON = (1 << 27), /* ignore -EOPNOTSUPP + from phy_power_on() */
/* ap->flags bits */
--- a/drivers/ata/ahci_mvebu.c +++ b/drivers/ata/ahci_mvebu.c @@ -227,7 +227,7 @@ static const struct ahci_mvebu_plat_data
static const struct ahci_mvebu_plat_data ahci_mvebu_armada_3700_plat_data = { .plat_config = ahci_mvebu_armada_3700_config, - .flags = AHCI_HFLAG_SUSPEND_PHYS, + .flags = AHCI_HFLAG_SUSPEND_PHYS | AHCI_HFLAG_IGN_NOTSUPP_POWER_ON, };
static const struct of_device_id ahci_mvebu_of_match[] = { --- a/drivers/ata/libahci_platform.c +++ b/drivers/ata/libahci_platform.c @@ -59,7 +59,7 @@ int ahci_platform_enable_phys(struct ahc }
rc = phy_power_on(hpriv->phys[i]); - if (rc) { + if (rc && !(rc == -EOPNOTSUPP && (hpriv->flags & AHCI_HFLAG_IGN_NOTSUPP_POWER_ON))) { phy_exit(hpriv->phys[i]); goto disable_phys; }
From: Miklos Szeredi mszeredi@redhat.com
commit d78092e4937de9ce55edcb4ee4c5e3c707be0190 upstream.
After unlock_request() pages from the ap->pages[] array may be put (e.g. by aborting the connection) and the pages can be freed.
Prevent use after free by grabbing a reference to the page before calling unlock_request().
The original patch was created by Pradeep P V K.
Reported-by: Pradeep P V K ppvk@codeaurora.org Cc: stable@vger.kernel.org Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fuse/dev.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
--- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -785,15 +785,16 @@ static int fuse_try_move_page(struct fus struct page *newpage; struct pipe_buffer *buf = cs->pipebufs;
+ get_page(oldpage); err = unlock_request(cs->req); if (err) - return err; + goto out_put_old;
fuse_copy_finish(cs);
err = pipe_buf_confirm(cs->pipe, buf); if (err) - return err; + goto out_put_old;
BUG_ON(!cs->nr_segs); cs->currbuf = buf; @@ -833,7 +834,7 @@ static int fuse_try_move_page(struct fus err = replace_page_cache_page(oldpage, newpage, GFP_KERNEL); if (err) { unlock_page(newpage); - return err; + goto out_put_old; }
get_page(newpage); @@ -852,14 +853,19 @@ static int fuse_try_move_page(struct fus if (err) { unlock_page(newpage); put_page(newpage); - return err; + goto out_put_old; }
unlock_page(oldpage); + /* Drop ref for ap->pages[] array */ put_page(oldpage); cs->len = 0;
- return 0; + err = 0; +out_put_old: + /* Drop ref obtained in this function */ + put_page(oldpage); + return err;
out_fallback_unlock: unlock_page(newpage); @@ -868,10 +874,10 @@ out_fallback: cs->offset = buf->offset;
err = lock_request(cs->req); - if (err) - return err; + if (!err) + err = 1;
- return 1; + goto out_put_old; }
static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, @@ -883,14 +889,16 @@ static int fuse_ref_page(struct fuse_cop if (cs->nr_segs >= cs->pipe->max_usage) return -EIO;
+ get_page(page); err = unlock_request(cs->req); - if (err) + if (err) { + put_page(page); return err; + }
fuse_copy_finish(cs);
buf = cs->pipebufs; - get_page(page); buf->page = page; buf->offset = offset; buf->len = count;
From: Song Liu songliubraving@fb.com
commit 1aef5b4391f0c75c0a1523706a7b0311846ee12f upstream.
This should be "current" not "skb".
Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)") Signed-off-by: Song Liu songliubraving@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200910203314.70018-1-songliubraving@fb.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/uapi/linux/bpf.h | 4 ++-- tools/include/uapi/linux/bpf.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
--- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1438,8 +1438,8 @@ union bpf_attr { * Return * The return value depends on the result of the test, and can be: * - * * 0, if the *skb* task belongs to the cgroup2. - * * 1, if the *skb* task does not belong to the cgroup2. + * * 0, if current task belongs to the cgroup2. + * * 1, if current task does not belong to the cgroup2. * * A negative error code, if an error occurred. * * long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags) --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1438,8 +1438,8 @@ union bpf_attr { * Return * The return value depends on the result of the test, and can be: * - * * 0, if the *skb* task belongs to the cgroup2. - * * 1, if the *skb* task does not belong to the cgroup2. + * * 0, if current task belongs to the cgroup2. + * * 1, if current task does not belong to the cgroup2. * * A negative error code, if an error occurred. * * long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
From: Roberto Sassu roberto.sassu@huawei.com
commit 455b6c9112eff8d249e32ba165742085678a80a4 upstream.
This patch checks the size for the EVM_IMA_XATTR_DIGSIG and EVM_XATTR_PORTABLE_DIGSIG types to ensure that the algorithm is read from the buffer returned by vfs_getxattr_alloc().
Cc: stable@vger.kernel.org # 4.19.x Fixes: 5feeb61183dde ("evm: Allow non-SHA1 digital signatures") Signed-off-by: Roberto Sassu roberto.sassu@huawei.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- security/integrity/evm/evm_main.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/security/integrity/evm/evm_main.c +++ b/security/integrity/evm/evm_main.c @@ -181,6 +181,12 @@ static enum integrity_status evm_verify_ break; case EVM_IMA_XATTR_DIGSIG: case EVM_XATTR_PORTABLE_DIGSIG: + /* accept xattr with non-empty signature field */ + if (xattr_len <= sizeof(struct signature_v2_hdr)) { + evm_status = INTEGRITY_FAIL; + goto out; + } + hdr = (struct signature_v2_hdr *)xattr_data; digest.hdr.algo = hdr->hash_algo; rc = evm_calc_hash(dentry, xattr_name, xattr_value,
From: Jia-Ju Bai baijiaju@tsinghua.edu.cn
commit 478762855b5ae9f68fa6ead1edf7abada70fcd5f upstream.
In p54p_tx(), skb->data is mapped to streaming DMA on line 337: mapping = pci_map_single(..., skb->data, ...);
Then skb->data is accessed on line 349: desc->device_addr = ((struct p54_hdr *)skb->data)->req_id;
This access may cause data inconsistency between CPU cache and hardware.
To fix this problem, ((struct p54_hdr *)skb->data)->req_id is stored in a local variable before DMA mapping, and then the driver accesses this local variable instead of skb->data.
Cc: stable@vger.kernel.org Signed-off-by: Jia-Ju Bai baijiaju@tsinghua.edu.cn Acked-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Link: https://lore.kernel.org/r/20200802132949.26788-1-baijiaju@tsinghua.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/intersil/p54/p54pci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/wireless/intersil/p54/p54pci.c +++ b/drivers/net/wireless/intersil/p54/p54pci.c @@ -333,10 +333,12 @@ static void p54p_tx(struct ieee80211_hw struct p54p_desc *desc; dma_addr_t mapping; u32 idx, i; + __le32 device_addr;
spin_lock_irqsave(&priv->lock, flags); idx = le32_to_cpu(ring_control->host_idx[1]); i = idx % ARRAY_SIZE(ring_control->tx_data); + device_addr = ((struct p54_hdr *)skb->data)->req_id;
mapping = dma_map_single(&priv->pdev->dev, skb->data, skb->len, DMA_TO_DEVICE); @@ -350,7 +352,7 @@ static void p54p_tx(struct ieee80211_hw
desc = &ring_control->tx_data[i]; desc->host_addr = cpu_to_le32(mapping); - desc->device_addr = ((struct p54_hdr *)skb->data)->req_id; + desc->device_addr = device_addr; desc->len = cpu_to_le16(skb->len); desc->flags = 0;
From: Frederic Barrat fbarrat@linux.ibm.com
commit 40ac790d99c6dd16b367d5c2339e446a5f1b0593 upstream.
Improve the error message shown if a capi adapter is plugged on a capi-incompatible slot directly under the PHB (no intermediate switch).
Fixes: 5632874311db ("cxl: Add support for POWER9 DD2") Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Frederic Barrat fbarrat@linux.ibm.com Reviewed-by: Andrew Donnellan ajd@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20200407115601.25453-1-fbarrat@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/misc/cxl/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -393,8 +393,8 @@ int cxl_calc_capp_routing(struct pci_dev *capp_unit_id = get_capp_unit_id(np, *phb_index); of_node_put(np); if (!*capp_unit_id) { - pr_err("cxl: invalid capp unit id (phb_index: %d)\n", - *phb_index); + pr_err("cxl: No capp unit found for PHB[%lld,%d]. Make sure the adapter is on a capi-compatible slot\n", + *chipid, *phb_index); return -ENODEV; }
From: Jason Gunthorpe jgg@nvidia.com
commit 2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e upstream.
This three thread race can result in the work being run once the callback becomes NULL:
CPU1 CPU2 CPU3 netevent_callback() process_one_req() rdma_addr_cancel() [..] spin_lock_bh() set_timeout() spin_unlock_bh()
spin_lock_bh() list_del_init(&req->list); spin_unlock_bh()
req->callback = NULL spin_lock_bh() if (!list_empty(&req->list)) // Skipped! // cancel_delayed_work(&req->work); spin_unlock_bh()
process_one_req() // again req->callback() // BOOM cancel_delayed_work_sync()
The solution is to always cancel the work once it is completed so any in between set_timeout() does not result in it running again.
Cc: stable@vger.kernel.org Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence") Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.org Reported-by: Dan Aloni dan@kernelim.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/addr.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -647,13 +647,12 @@ static void process_one_req(struct work_ req->callback = NULL;
spin_lock_bh(&lock); + /* + * Although the work will normally have been canceled by the workqueue, + * it can still be requeued as long as it is on the req_list. + */ + cancel_delayed_work(&req->work); if (!list_empty(&req->list)) { - /* - * Although the work will normally have been canceled by the - * workqueue, it can still be requeued as long as it is on the - * req_list. - */ - cancel_delayed_work(&req->work); list_del_init(&req->list); kfree(req); }
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 1c9c02bb22684f6949d2e7ddc0a3ff364fd5a6fc upstream.
Update logic for broken test. Use a more common logging style.
It appears the logic in this function is broken for the consecutive tests of
if (prog_status & 0x3) ... else if (prog_status & 0x2) ... else (prog_status & 0x1) ...
Likely the first test should be
if ((prog_status & 0x3) == 0x3)
Found by inspection of include files using printk.
Fixes: eb3db27507f7 ("[MTD] LPDDR PFOW definition") Cc: stable@vger.kernel.org Reported-by: Joe Perches joe@perches.com Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Acked-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/3fb0e29f5b601db8be2938a01d974b00c8788501.1... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/mtd/pfow.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/mtd/pfow.h +++ b/include/linux/mtd/pfow.h @@ -128,7 +128,7 @@ static inline void print_drs_error(unsig
if (!(dsr & DSR_AVAILABLE)) printk(KERN_NOTICE"DSR.15: (0) Device not Available\n"); - if (prog_status & 0x03) + if ((prog_status & 0x03) == 0x03) printk(KERN_NOTICE"DSR.9,8: (11) Attempt to program invalid " "half with 41h command\n"); else if (prog_status & 0x02)
From: Chris Wilson chris@chris-wilson.co.uk
commit 4fe9af8e881d946bf60790eeb37a7c4f96e28382 upstream.
Since the debugfs may peek into the GEM contexts as the corresponding client/fd is being closed, we may try and follow a dangling pointer. However, the context closure itself is serialised with the ctx->mutex, so if we hold that mutex as we inspect the state coupled in the context, we know the pointers within the context are stable and will remain valid as we inspect their tables.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: CQ Tang cq.tang@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20200723172119.17649-3-chris@c... (cherry picked from commit 102f5aa491f262c818e607fc4fee08a724a76c69) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/i915_debugfs.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -326,6 +326,7 @@ static void print_context_stats(struct s } i915_gem_context_unlock_engines(ctx);
+ mutex_lock(&ctx->mutex); if (!IS_ERR_OR_NULL(ctx->file_priv)) { struct file_stats stats = { .vm = rcu_access_pointer(ctx->vm), @@ -346,6 +347,7 @@ static void print_context_stats(struct s
print_file_stats(m, name, stats); } + mutex_unlock(&ctx->mutex);
spin_lock(&i915->gem.contexts.lock); list_safe_reset_next(ctx, cn, link);
From: Paras Sharma parashar@codeaurora.org
commit c9ca43d42ed8d5fd635d327a664ed1d8579eb2af upstream.
For QUP IP versions 2.5 and above the oversampling rate is halved from 32 to 16.
Commit ce734600545f ("tty: serial: qcom_geni_serial: Update the oversampling rate") is pushed to handle this scenario. But the existing logic is failing to classify QUP Version 3.0 into the correct group ( 2.5 and above).
As result Serial Engine clocks are not configured properly for baud rate and garbage data is sampled to FIFOs from the line.
So, fix the logic to detect QUP with versions 2.5 and above.
Fixes: ce734600545f ("tty: serial: qcom_geni_serial: Update the oversampling rate") Cc: stable stable@vger.kernel.org Signed-off-by: Paras Sharma parashar@codeaurora.org Reviewed-by: Akash Asthana akashast@codeaurora.org Link: https://lore.kernel.org/r/1601445926-23673-1-git-send-email-parashar@codeaur... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/serial/qcom_geni_serial.c | 2 +- include/linux/qcom-geni-se.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -1000,7 +1000,7 @@ static void qcom_geni_serial_set_termios sampling_rate = UART_OVERSAMPLING; /* Sampling rate is halved for IP versions >= 2.5 */ ver = geni_se_get_qup_hw_version(&port->se); - if (GENI_SE_VERSION_MAJOR(ver) >= 2 && GENI_SE_VERSION_MINOR(ver) >= 5) + if (ver >= QUP_SE_VERSION_2_5) sampling_rate /= 2;
clk_rate = get_clk_div_rate(baud, sampling_rate, &clk_div); --- a/include/linux/qcom-geni-se.h +++ b/include/linux/qcom-geni-se.h @@ -248,6 +248,9 @@ struct geni_se { #define GENI_SE_VERSION_MINOR(ver) ((ver & HW_VER_MINOR_MASK) >> HW_VER_MINOR_SHFT) #define GENI_SE_VERSION_STEP(ver) (ver & HW_VER_STEP_MASK)
+/* QUP SE VERSION value for major number 2 and minor number 5 */ +#define QUP_SE_VERSION_2_5 0x20050000 + /* * Define bandwidth thresholds that cause the underlying Core 2X interconnect * clock to run at the named frequency. These baseline values are recommended
From: Peter Zijlstra peterz@infradead.org
commit 534cf755d9df99e214ddbe26b91cd4d81d2603e2 upstream.
Issuing a magic-sysrq via the PL011 causes the following lockdep splat, which is easily reproducible under QEMU:
| sysrq: Changing Loglevel | sysrq: Loglevel set to 9 | | ====================================================== | WARNING: possible circular locking dependency detected | 5.9.0-rc7 #1 Not tainted | ------------------------------------------------------ | systemd-journal/138 is trying to acquire lock: | ffffab133ad950c0 (console_owner){-.-.}-{0:0}, at: console_lock_spinning_enable+0x34/0x70 | | but task is already holding lock: | ffff0001fd47b098 (&port_lock_key){-.-.}-{2:2}, at: pl011_int+0x40/0x488 | | which lock already depends on the new lock.
[...]
| Possible unsafe locking scenario: | | CPU0 CPU1 | ---- ---- | lock(&port_lock_key); | lock(console_owner); | lock(&port_lock_key); | lock(console_owner); | | *** DEADLOCK ***
The issue being that CPU0 takes 'port_lock' on the irq path in pl011_int() before taking 'console_owner' on the printk() path, whereas CPU1 takes the two locks in the opposite order on the printk() path due to setting the "console_owner" prior to calling into into the actual console driver.
Fix this in the same way as the msm-serial driver by dropping 'port_lock' before handling the sysrq.
Cc: stable@vger.kernel.org # 4.19+ Cc: Russell King linux@armlinux.org.uk Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jirislaby@kernel.org Link: https://lore.kernel.org/r/20200811101313.GA6970@willie-the-truck Signed-off-by: Peter Zijlstra peterz@infradead.org Tested-by: Will Deacon will@kernel.org Signed-off-by: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20200930120432.16551-1-will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/serial/amba-pl011.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -308,8 +308,9 @@ static void pl011_write(unsigned int val */ static int pl011_fifo_to_tty(struct uart_amba_port *uap) { - u16 status; unsigned int ch, flag, fifotaken; + int sysrq; + u16 status;
for (fifotaken = 0; fifotaken != 256; fifotaken++) { status = pl011_read(uap, REG_FR); @@ -344,10 +345,12 @@ static int pl011_fifo_to_tty(struct uart flag = TTY_FRAME; }
- if (uart_handle_sysrq_char(&uap->port, ch & 255)) - continue; + spin_unlock(&uap->port.lock); + sysrq = uart_handle_sysrq_char(&uap->port, ch & 255); + spin_lock(&uap->port.lock);
- uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag); + if (!sysrq) + uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag); }
return fifotaken;
From: Grygorii Strashko grygorii.strashko@ti.com
commit 6b61d49a55796dbbc479eeb4465e59fd656c719c upstream.
Commit 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers") switched PM runtime autosuspend to use hrtimers and all related time accounting in ns, but missed to update the timer_expires data type in struct dev_pm_info to u64.
This causes the timer_expires value to be truncated on 32-bit architectures when assignment is done from u64 values:
rpm_suspend() |- dev->power.timer_expires = expires;
Fix it by changing the timer_expires type to u64.
Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers") Signed-off-by: Grygorii Strashko grygorii.strashko@ti.com Acked-by: Pavel Machek pavel@ucw.cz Acked-by: Vincent Guittot vincent.guittot@linaro.org Cc: 5.0+ stable@vger.kernel.org # 5.0+ [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/pm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -590,7 +590,7 @@ struct dev_pm_info { #endif #ifdef CONFIG_PM struct hrtimer suspend_timer; - unsigned long timer_expires; + u64 timer_expires; struct work_struct work; wait_queue_head_t wait_queue; struct wake_irq *wakeirq;
From: Geert Uytterhoeven geert+renesas@glider.be
commit df9c590986fdb6db9d5636d6cd93bc919c01b451 upstream.
Before commit 9495b7e92f716ab2 ("driver core: platform: Initialize dma_parms for platform devices"), the R-Car SATA device didn't have DMA parameters. Hence the DMA boundary mask supplied by its driver was silently ignored, as __scsi_init_queue() doesn't check the return value of dma_set_seg_boundary(), and the default value of 0xffffffff was used.
Now the device has gained DMA parameters, the driver-supplied value is used, and the following warning is printed on Salvator-XS:
DMA-API: sata_rcar ee300000.sata: mapping sg segment across boundary [start=0x00000000ffffe000] [end=0x00000000ffffefff] [boundary=0x000000001ffffffe] WARNING: CPU: 5 PID: 38 at kernel/dma/debug.c:1233 debug_dma_map_sg+0x298/0x300
(the range of start/end values depend on whether IOMMU support is enabled or not)
The issue here is that SATA_RCAR_DMA_BOUNDARY doesn't have bit 0 set, so any typical end value, which is odd, will trigger the check.
Fix this by increasing the DMA boundary value by 1.
This also fixes the following WRITE DMA EXT timeout issue:
# dd if=/dev/urandom of=/mnt/de1/file1-1024M bs=1M count=1024 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata1.00: failed command: WRITE DMA EXT ata1.00: cmd 35/00:00:00:e6:0c/00:0a:00:00:00/e0 tag 0 dma 1310720 out res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata1.00: status: { DRDY }
as seen by Shimoda-san since commit 429120f3df2dba2b ("block: fix splitting segments on boundary masks").
Fixes: 8bfbeed58665dbbf ("sata_rcar: correct 'sata_rcar_sht'") Fixes: 9495b7e92f716ab2 ("driver core: platform: Initialize dma_parms for platform devices") Fixes: 429120f3df2dba2b ("block: fix splitting segments on boundary masks") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Tested-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com Tested-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Sergei Shtylyov sergei.shtylyov@cogentembedded.com Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Cc: stable stable@vger.kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/ata/sata_rcar.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/ata/sata_rcar.c +++ b/drivers/ata/sata_rcar.c @@ -120,7 +120,7 @@ /* Descriptor table word 0 bit (when DTA32M = 1) */ #define SATA_RCAR_DTEND BIT(0)
-#define SATA_RCAR_DMA_BOUNDARY 0x1FFFFFFEUL +#define SATA_RCAR_DMA_BOUNDARY 0x1FFFFFFFUL
/* Gen2 Physical Layer Control Registers */ #define RCAR_GEN2_PHY_CTL1_REG 0x1704
From: Jens Axboe axboe@kernel.dk
commit 13bd691421bc191a402d2e0d3da5f248d170a632 upstream.
Once we've copied some data for an iocb that is marked with IOCB_WAITQ, we should no longer attempt to async lock a new page. Instead make sure we return the copied amount, and let the caller retry, instead of returning -EIOCBQUEUED for a new page.
This should only be possible with read-ahead disabled on the below device, and multiple threads racing on the same file. Haven't been able to reproduce on anything else.
Cc: stable@vger.kernel.org # v5.9 Fixes: 1a0a7853b901 ("mm: support async buffered reads in generic_file_buffered_read()") Reported-by: Kent Overstreet kent.overstreet@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/filemap.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/mm/filemap.c +++ b/mm/filemap.c @@ -2179,6 +2179,14 @@ ssize_t generic_file_buffered_read(struc last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT; offset = *ppos & ~PAGE_MASK;
+ /* + * If we've already successfully copied some data, then we + * can no longer safely return -EIOCBQUEUED. Hence mark + * an async read NOWAIT at that point. + */ + if (written && (iocb->ki_flags & IOCB_WAITQ)) + iocb->ki_flags |= IOCB_NOWAIT; + for (;;) { struct page *page; pgoff_t end_index;
From: Souptick Joarder jrdr.linux@gmail.com
commit 779055842da5b2e508f3ccf9a8153cb1f704f566 upstream.
There seems to be a bug in the original code when gntdev_get_page() is called with writeable=true then the page needs to be marked dirty before being put.
To address this, a bool writeable is added in gnt_dev_copy_batch, set it in gntdev_grant_copy_seg() (and drop `writeable` argument to gntdev_get_page()) and then, based on batch->writeable, use set_page_dirty_lock().
Fixes: a4cdb556cae0 (xen/gntdev: add ioctl for grant copy) Suggested-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Souptick Joarder jrdr.linux@gmail.com Cc: John Hubbard jhubbard@nvidia.com Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Juergen Gross jgross@suse.com Cc: David Vrabel david.vrabel@citrix.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1599375114-32360-1-git-send-email-jrdr.linux@gmail... Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/xen/gntdev.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
--- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -720,17 +720,18 @@ struct gntdev_copy_batch { s16 __user *status[GNTDEV_COPY_BATCH]; unsigned int nr_ops; unsigned int nr_pages; + bool writeable; };
static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt, - bool writeable, unsigned long *gfn) + unsigned long *gfn) { unsigned long addr = (unsigned long)virt; struct page *page; unsigned long xen_pfn; int ret;
- ret = get_user_pages_fast(addr, 1, writeable ? FOLL_WRITE : 0, &page); + ret = get_user_pages_fast(addr, 1, batch->writeable ? FOLL_WRITE : 0, &page); if (ret < 0) return ret;
@@ -746,9 +747,13 @@ static void gntdev_put_pages(struct gntd { unsigned int i;
- for (i = 0; i < batch->nr_pages; i++) + for (i = 0; i < batch->nr_pages; i++) { + if (batch->writeable && !PageDirty(batch->pages[i])) + set_page_dirty_lock(batch->pages[i]); put_page(batch->pages[i]); + } batch->nr_pages = 0; + batch->writeable = false; }
static int gntdev_copy(struct gntdev_copy_batch *batch) @@ -837,8 +842,9 @@ static int gntdev_grant_copy_seg(struct virt = seg->source.virt + copied; off = (unsigned long)virt & ~XEN_PAGE_MASK; len = min(len, (size_t)XEN_PAGE_SIZE - off); + batch->writeable = false;
- ret = gntdev_get_page(batch, virt, false, &gfn); + ret = gntdev_get_page(batch, virt, &gfn); if (ret < 0) return ret;
@@ -856,8 +862,9 @@ static int gntdev_grant_copy_seg(struct virt = seg->dest.virt + copied; off = (unsigned long)virt & ~XEN_PAGE_MASK; len = min(len, (size_t)XEN_PAGE_SIZE - off); + batch->writeable = true;
- ret = gntdev_get_page(batch, virt, true, &gfn); + ret = gntdev_get_page(batch, virt, &gfn); if (ret < 0) return ret;
From: Pavel Begunkov asml.silence@gmail.com
commit ff5771613cd7b3a76cd16cb54aa81d30d3c11d48 upstream.
Clear linked_timeout for next requests in __io_queue_sqe() so we won't queue it up unnecessary when it's going to be punted.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Cc: stable@vger.kernel.org # v5.9 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/io_uring.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6249,8 +6249,10 @@ err: if (nxt) { req = nxt;
- if (req->flags & REQ_F_FORCE_ASYNC) + if (req->flags & REQ_F_FORCE_ASYNC) { + linked_timeout = NULL; goto punt; + } goto again; } exit:
From: Ricky Wu ricky_wu@realtek.com
commit 551b6729578a8981c46af964c10bf7d5d9ddca83 upstream.
this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform missing card reader
Signed-off-by: Ricky Wu ricky_wu@realtek.com Link: https://lore.kernel.org/r/20200824030006.30033-1-ricky_wu@realtek.com Cc: Chris Clayton chris2553@googlemail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/misc/cardreader/rtsx_pcr.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/drivers/misc/cardreader/rtsx_pcr.c +++ b/drivers/misc/cardreader/rtsx_pcr.c @@ -1155,10 +1155,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr * rtsx_pci_write_register(pcr, REG_OCPGLITCH, SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch); rtsx_pci_enable_ocp(pcr); - } else { - /* OC power down */ - rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, - OC_POWER_DOWN); } } }
From: Pali Rohár pali@kernel.org
commit ea17a0f153af2cd890e4ce517130dcccaa428c13 upstream.
Driver ->power_on and ->power_off callbacks leaks internal SMCC firmware return codes to phy caller. This patch converts SMCC error codes to standard linux errno codes. Include file linux/arm-smccc.h already provides defines for SMCC error codes, so use them instead of custom driver defines. Note that return value is signed 32bit, but stored in unsigned long type with zero padding.
Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Link: https://lore.kernel.org/r/20200902144344.16684-2-pali@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/phy/marvell/phy-mvebu-a3700-comphy.c | 14 +++++++++++--- drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 14 +++++++++++--- 2 files changed, 22 insertions(+), 6 deletions(-)
--- a/drivers/phy/marvell/phy-mvebu-a3700-comphy.c +++ b/drivers/phy/marvell/phy-mvebu-a3700-comphy.c @@ -26,7 +26,6 @@ #define COMPHY_SIP_POWER_ON 0x82000001 #define COMPHY_SIP_POWER_OFF 0x82000002 #define COMPHY_SIP_PLL_LOCK 0x82000003 -#define COMPHY_FW_NOT_SUPPORTED (-1)
#define COMPHY_FW_MODE_SATA 0x1 #define COMPHY_FW_MODE_SGMII 0x2 @@ -112,10 +111,19 @@ static int mvebu_a3700_comphy_smc(unsign unsigned long mode) { struct arm_smccc_res res; + s32 ret;
arm_smccc_smc(function, lane, mode, 0, 0, 0, 0, 0, &res); + ret = res.a0;
- return res.a0; + switch (ret) { + case SMCCC_RET_SUCCESS: + return 0; + case SMCCC_RET_NOT_SUPPORTED: + return -EOPNOTSUPP; + default: + return -EINVAL; + } }
static int mvebu_a3700_comphy_get_fw_mode(int lane, int port, @@ -220,7 +228,7 @@ static int mvebu_a3700_comphy_power_on(s }
ret = mvebu_a3700_comphy_smc(COMPHY_SIP_POWER_ON, lane->id, fw_param); - if (ret == COMPHY_FW_NOT_SUPPORTED) + if (ret == -EOPNOTSUPP) dev_err(lane->dev, "unsupported SMC call, try updating your firmware\n");
--- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c +++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c @@ -123,7 +123,6 @@
#define COMPHY_SIP_POWER_ON 0x82000001 #define COMPHY_SIP_POWER_OFF 0x82000002 -#define COMPHY_FW_NOT_SUPPORTED (-1)
/* * A lane is described by the following bitfields: @@ -273,10 +272,19 @@ static int mvebu_comphy_smc(unsigned lon unsigned long lane, unsigned long mode) { struct arm_smccc_res res; + s32 ret;
arm_smccc_smc(function, phys, lane, mode, 0, 0, 0, 0, &res); + ret = res.a0;
- return res.a0; + switch (ret) { + case SMCCC_RET_SUCCESS: + return 0; + case SMCCC_RET_NOT_SUPPORTED: + return -EOPNOTSUPP; + default: + return -EINVAL; + } }
static int mvebu_comphy_get_mode(bool fw_mode, int lane, int port, @@ -819,7 +827,7 @@ static int mvebu_comphy_power_on(struct if (!ret) return ret;
- if (ret == COMPHY_FW_NOT_SUPPORTED) + if (ret == -EOPNOTSUPP) dev_err(priv->dev, "unsupported SMC call, try updating your firmware\n");
On Sat, 2020-10-31 at 12:35 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.9.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux- stable-rc.git linux-5.9.y and the diffstat can be found below.
thanks,
greg k-h
hello, i have build using ktest. but then i did the normal compile. compiled and booted 5.9.3-rc1+ . dmesg -l err is clear.
some lines from dmesg output related ----------x------------------x---------------------------x--- video: module verification failed: signature and/or required key missing - tainting kernel sdhci-pci 0000:00:1e.6: failed to setup card detect gpio --------x-------------------------------x-----------------x---
Now something related to kernel build and install.. ----------x---------------------x--------------------------x------- WARNING: modpost: EXPORT symbol "copy_mc_fragile" [vmlinux] version generation failed, symbol will not be versioned. W: Possible missing firmware /lib/firmware/i915/rkl_dmc_ver2_01.bin for module i915 -------------x-------------------x-------------------------x---------
Now one thing during boot.. -----------x------------x---- unable to start nftables -x-----------------------x---
iam attaching a file....please see...
Tested-by: Jeffrin Jose T jeffrin@rajagiritech.edu.in
On Sat, Oct 31, 2020 at 10:41:38PM +0530, Jeffrin Jose T wrote:
On Sat, 2020-10-31 at 12:35 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.9.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux- stable-rc.git linux-5.9.y and the diffstat can be found below.
thanks,
greg k-h
hello, i have build using ktest. but then i did the normal compile. compiled and booted 5.9.3-rc1+ . dmesg -l err is clear.
some lines from dmesg output related ----------x------------------x---------------------------x--- video: module verification failed: signature and/or required key missing - tainting kernel sdhci-pci 0000:00:1e.6: failed to setup card detect gpio --------x-------------------------------x-----------------x---
Now something related to kernel build and install.. ----------x---------------------x--------------------------x------- WARNING: modpost: EXPORT symbol "copy_mc_fragile" [vmlinux] version generation failed, symbol will not be versioned. W: Possible missing firmware /lib/firmware/i915/rkl_dmc_ver2_01.bin for module i915 -------------x-------------------x-------------------------x---------
Now one thing during boot.. -----------x------------x---- unable to start nftables -x-----------------------x---
iam attaching a file....please see...
Is this any different from 5.9.2?
thanks,
greg k-h
On Sun, 2020-11-01 at 11:41 +0100, Greg Kroah-Hartman wrote:
On Sat, Oct 31, 2020 at 10:41:38PM +0530, Jeffrin Jose T wrote:
On Sat, 2020-10-31 at 12:35 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.9.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux- stable-rc.git linux-5.9.y and the diffstat can be found below.
thanks,
greg k-h
hello, i have build using ktest. but then i did the normal compile. compiled and booted 5.9.3-rc1+ . dmesg -l err is clear.
some lines from dmesg output related ----------x------------------x---------------------------x--- video: module verification failed: signature and/or required key missing - tainting kernel sdhci-pci 0000:00:1e.6: failed to setup card detect gpio --------x-------------------------------x-----------------x---
Now something related to kernel build and install.. ----------x---------------------x--------------------------x------- WARNING: modpost: EXPORT symbol "copy_mc_fragile" [vmlinux] version generation failed, symbol will not be versioned. W: Possible missing firmware /lib/firmware/i915/rkl_dmc_ver2_01.bin for module i915
-------------x-------------------x-------------------------x-------
Now one thing during boot.. -----------x------------x---- unable to start nftables -x-----------------------x---
iam attaching a file....please see...
Is this any different from 5.9.2?
according to dmesg,things are to an extent ok.
the other stuff may be because of my system upgradation and may be there is a change in kernel configuration
Overall i think there may be nothing serious in kernel space.
.
On Sat, Oct 31, 2020 at 12:35:42PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
Build results: total: 154 pass: 154 fail: 0 Qemu test results: total: 426 pass: 426 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Sat, Oct 31, 2020 at 01:09:15PM -0700, Guenter Roeck wrote:
On Sat, Oct 31, 2020 at 12:35:42PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
Build results: total: 154 pass: 154 fail: 0 Qemu test results: total: 426 pass: 426 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Thanks for testing these and letting me know.
greg k-h
On Sat, 31 Oct 2020 at 17:17, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.9.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
NOTE: LTP version upgrade to 20200930. Due to this change we have noticed few test failures and fixes which are not related to kernel changes.
Summary ------------------------------------------------------------------------
kernel: 5.9.3-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.9.y git commit: dae2c902d0480f9e51891768862f034ee97f4db1 git describe: v5.9.2-75-gdae2c902d048 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.9.y/build/v5.9.2-...
No regressions (compared to build v5.9.2)
No fixes (compared to build v5.9.2)
Fixes (compared to LTP 20200515) These fixes are coming from LTP upgrade 20200930. ltp-commands-tests: * logrotate_sh
ltp-containers-tests: * netns_netlink
ltp-controllers-tests: * cpuset_hotplug
ltp-crypto-tests: * af_alg02
ltp-cve-tests: * cve-2017-17805 * cve-2018-1000199
ltp-syscalls-tests: * clock_gettime03 * clone302 * copy_file_range02 * mknod07 * ptrace08 * syslog01 * syslog02 * syslog03 * syslog04 * syslog05 * syslog07 * syslog08 * syslog09 * syslog10
ltp-open-posix-tests: * clocks_invaliddates
On Sun, Nov 01, 2020 at 12:34:42PM +0530, Naresh Kamboju wrote:
On Sat, 31 Oct 2020 at 17:17, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.9.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
Thanks for testing all of these and letting me konw.
greg k-h
On Sat, 31 Oct 2020 12:35:42 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.9.3 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.9.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.9.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.9: 15 builds: 15 pass, 0 fail 26 boots: 26 pass, 0 fail 61 tests: 61 pass, 0 fail
Linux version: 5.9.3-rc1-gdae2c902d048 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
linux-stable-mirror@lists.linaro.org