This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.144-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.144-rc1
Richard Guy Briggs rgb@redhat.com audit: move put_tree() to avoid trim_trees refcount underflow and UAF
Peter Collingbourne pcc@google.com net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
Helge Deller deller@gmx.de Revert "parisc: Add assembly implementations for memset, strlen, strcpy, strncpy and strcat"
Denis Efremov efremov@linux.com Revert "floppy: reintroduce O_NDELAY fix"
Qu Wenruo wqu@suse.com btrfs: fix NULL pointer dereference when deleting device by invalid id
Petr Vorel petr.vorel@gmail.com arm64: dts: qcom: msm8994-angler: Fix gpio-reserved-ranges 85-88
Sean Christopherson seanjc@google.com KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
DENG Qingfang dqfext@gmail.com net: dsa: mt7530: fix VLAN traffic leaks again
Andrii Nakryiko andriin@fb.com bpf: Fix cast to pointer from integer of different size warning
Andrii Nakryiko andriin@fb.com bpf: Track contents of read-only maps as scalars
Linus Torvalds torvalds@linux-foundation.org vt_kdsetmode: extend console locking
Filipe Manana fdmanana@suse.com btrfs: fix race between marking inode needs to be logged and log syncing
Gerd Rausch gerd.rausch@oracle.com net/rds: dma_map_sg is entitled to merge entries
Ben Skeggs bskeggs@redhat.com drm/nouveau/disp: power down unused DP links during init
Mark Yacoub markyacoub@google.com drm: Copy drm_wait_vblank to user before returning
Shai Malin smalin@marvell.com qed: Fix null-pointer dereference in qed_rdma_create_qp()
Shai Malin smalin@marvell.com qed: qed ll2 race condition fixes
Neeraj Upadhyay neeraju@codeaurora.org vringh: Use wiov->used to check for read/write desc order
Parav Pandit parav@nvidia.com virtio_pci: Support surprise removal of virtio pci device
Parav Pandit parav@nvidia.com virtio: Improve vq->broken access to avoid any compiler optimization
Michał Mirosław mirq-linux@rere.qmqm.pl opp: remove WARN when no valid OPPs remain
Colin Ian King colin.king@canonical.com perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32
Jerome Brunet jbrunet@baylibre.com usb: gadget: u_audio: fix race condition on endpoint stop
Matthew Brost matthew.brost@intel.com drm/i915: Fix syncmap memory leak
Guangbin Huang huangguangbin2@huawei.com net: hns3: fix get wrong pfc_en when query PFC configuration
Guojia Liao liaoguojia@huawei.com net: hns3: fix duplicate node in VLAN list
Yufeng Mo moyufeng@huawei.com net: hns3: clear hardware resource when loading driver
Andrey Ignatov rdna@fb.com rtnetlink: Return correct error on changing device netns
Maxim Kiselev bigunclemax@gmail.com net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Christophe JAILLET christophe.jaillet@wanadoo.fr xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'
Shreyansh Chouhan chouhan.shreyansh630@gmail.com ip_gre: add validation for csum_start
Gal Pressman galpress@amazon.com RDMA/efa: Free IRQ vectors on error flow
Sasha Neftin sasha.neftin@intel.com e1000e: Fix the max snoop/no-snoop latency for 10M
Tuo Li islituo@gmail.com IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
Naresh Kumar PBS nareshkumar.pbs@broadcom.com RDMA/bnxt_re: Add missing spin lock initialization
Li Jinlin lijinlin3@huawei.com scsi: core: Fix hang of freezing queue between blocking and running device
Wesley Cheng wcheng@codeaurora.org usb: dwc3: gadget: Stop EP0 transfers during pullup disable
Thinh Nguyen Thinh.Nguyen@synopsys.com usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Zhengjun Zhang zhangzhengjun@aicrobo.com USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold johan@kernel.org Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje stefan.maetje@esd.eu can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Yafang Shao laoar.shao@gmail.com mm, oom: make the calculation of oom badness more accurate
Shaik Sajida Bhanu sbhanu@codeaurora.org mmc: sdhci-msm: Update the software timeout value for sdhc
Miklos Szeredi mszeredi@redhat.com ovl: fix uninitialized pointer read in ovl_lookup_real_one()
Kefeng Wang wangkefeng.wang@huawei.com once: Fix panic when module unload
Florian Westphal fw@strlen.de netfilter: conntrack: collect all entries in one cycle
Guenter Roeck linux@roeck-us.net ARC: Fix CONFIG_STACKDEPOT
Xiaolong Huang butterflyhuangxx@gmail.com net: qrtr: fix another OOB Read in qrtr_endpoint_post
-------------
Diffstat:
Makefile | 4 +- arch/arc/kernel/vmlinux.lds.S | 2 + .../arm64/boot/dts/qcom/msm8994-angler-rev-101.dts | 4 + arch/parisc/include/asm/string.h | 15 --- arch/parisc/kernel/parisc_ksyms.c | 4 - arch/parisc/lib/Makefile | 4 +- arch/parisc/lib/memset.c | 72 +++++++++++ arch/parisc/lib/string.S | 136 --------------------- arch/x86/events/intel/uncore_snbep.c | 2 +- arch/x86/kvm/mmu.c | 10 +- drivers/block/floppy.c | 27 ++-- drivers/gpu/drm/drm_ioc32.c | 4 +- drivers/gpu/drm/i915/gt/intel_timeline.c | 8 ++ drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 2 +- drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h | 1 + drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c | 9 ++ drivers/infiniband/hw/bnxt_re/ib_verbs.c | 1 + drivers/infiniband/hw/efa/efa_main.c | 1 + drivers/infiniband/hw/hfi1/sdma.c | 9 +- drivers/mmc/host/sdhci-msm.c | 18 +++ drivers/net/can/usb/esd_usb2.c | 4 +- drivers/net/dsa/mt7530.c | 5 +- drivers/net/ethernet/apm/xgene-v2/main.c | 4 +- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h | 3 + .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 13 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 32 ++++- drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 ++- drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 + drivers/net/ethernet/marvell/mvneta.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_ll2.c | 20 +++ drivers/net/ethernet/qlogic/qed/qed_rdma.c | 3 +- drivers/opp/of.c | 5 +- drivers/scsi/scsi_sysfs.c | 9 +- drivers/tty/vt/vt_ioctl.c | 11 +- drivers/usb/dwc3/gadget.c | 23 ++-- drivers/usb/gadget/function/u_audio.c | 5 +- drivers/usb/serial/ch341.c | 1 - drivers/usb/serial/option.c | 2 + drivers/vhost/vringh.c | 2 +- drivers/virtio/virtio_pci_common.c | 7 ++ drivers/virtio/virtio_ring.c | 6 +- fs/btrfs/btrfs_inode.h | 15 +++ fs/btrfs/file.c | 10 +- fs/btrfs/inode.c | 4 +- fs/btrfs/transaction.h | 2 +- fs/btrfs/volumes.c | 2 +- fs/overlayfs/export.c | 2 +- fs/proc/base.c | 11 +- include/linux/netdevice.h | 4 + include/linux/once.h | 4 +- include/linux/oom.h | 4 +- kernel/audit_tree.c | 2 +- kernel/bpf/verifier.c | 57 ++++++++- lib/once.c | 11 +- mm/oom_kill.c | 22 ++-- net/core/rtnetlink.c | 3 +- net/ipv4/ip_gre.c | 2 + net/netfilter/nf_conntrack_core.c | 71 ++++------- net/qrtr/qrtr.c | 2 +- net/rds/ib_frmr.c | 4 +- net/socket.c | 6 +- 61 files changed, 419 insertions(+), 326 deletions(-)
From: Xiaolong Huang butterflyhuangxx@gmail.com
commit 7e78c597c3ebfd0cb329aa09a838734147e4f117 upstream.
This check was incomplete, did not consider size is 0:
if (len != ALIGN(size, 4) + hdrlen) goto err;
if size from qrtr_hdr is 0, the result of ALIGN(size, 4) will be 0, In case of len == hdrlen and size == 0 in header this check won't fail and
if (cb->type == QRTR_TYPE_NEW_SERVER) { /* Remote node endpoint can bridge other distant nodes */ const struct qrtr_ctrl_pkt *pkt = data + hdrlen;
qrtr_node_assign(node, le32_to_cpu(pkt->server.node)); }
will also read out of bound from data, which is hdrlen allocated block.
Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets") Fixes: ad9d24c9429e ("net: qrtr: fix OOB Read in qrtr_endpoint_post") Signed-off-by: Xiaolong Huang butterflyhuangxx@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/qrtr/qrtr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/qrtr/qrtr.c +++ b/net/qrtr/qrtr.c @@ -314,7 +314,7 @@ int qrtr_endpoint_post(struct qrtr_endpo goto err; }
- if (len != ALIGN(size, 4) + hdrlen) + if (!size || len != ALIGN(size, 4) + hdrlen) goto err;
if (cb->dst_port != QRTR_PORT_CTRL && cb->type != QRTR_TYPE_DATA)
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit bf79167fd86f3b97390fe2e70231d383526bd9cc ]
Enabling CONFIG_STACKDEPOT results in the following build error.
arc-elf-ld: lib/stackdepot.o: in function `filter_irq_stacks': stackdepot.c:(.text+0x456): undefined reference to `__irqentry_text_start' arc-elf-ld: stackdepot.c:(.text+0x456): undefined reference to `__irqentry_text_start' arc-elf-ld: stackdepot.c:(.text+0x476): undefined reference to `__irqentry_text_end' arc-elf-ld: stackdepot.c:(.text+0x476): undefined reference to `__irqentry_text_end' arc-elf-ld: stackdepot.c:(.text+0x484): undefined reference to `__softirqentry_text_start' arc-elf-ld: stackdepot.c:(.text+0x484): undefined reference to `__softirqentry_text_start' arc-elf-ld: stackdepot.c:(.text+0x48c): undefined reference to `__softirqentry_text_end' arc-elf-ld: stackdepot.c:(.text+0x48c): undefined reference to `__softirqentry_text_end'
Other architectures address this problem by adding IRQENTRY_TEXT and SOFTIRQENTRY_TEXT to the text segment, so do the same here.
Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arc/kernel/vmlinux.lds.S | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S index 6c693a9d29b6..0391b8293ad8 100644 --- a/arch/arc/kernel/vmlinux.lds.S +++ b/arch/arc/kernel/vmlinux.lds.S @@ -88,6 +88,8 @@ SECTIONS CPUIDLE_TEXT LOCK_TEXT KPROBES_TEXT + IRQENTRY_TEXT + SOFTIRQENTRY_TEXT *(.fixup) *(.gnu.warning) }
From: Florian Westphal fw@strlen.de
[ Upstream commit 4608fdfc07e116f9fc0895beb40abad7cdb5ee3d ]
Michal Kubecek reports that conntrack gc is responsible for frequent wakeups (every 125ms) on idle systems.
On busy systems, timed out entries are evicted during lookup. The gc worker is only needed to remove entries after system becomes idle after a busy period.
To resolve this, always scan the entire table. If the scan is taking too long, reschedule so other work_structs can run and resume from next bucket.
After a completed scan, wait for 2 minutes before the next cycle. Heuristics for faster re-schedule are removed.
GC_SCAN_INTERVAL could be exposed as a sysctl in the future to allow tuning this as-needed or even turn the gc worker off.
Reported-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_core.c | 71 ++++++++++--------------------- 1 file changed, 22 insertions(+), 49 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 4a988ce4264c..4bcc36e4b2ef 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -66,22 +66,17 @@ EXPORT_SYMBOL_GPL(nf_conntrack_hash);
struct conntrack_gc_work { struct delayed_work dwork; - u32 last_bucket; + u32 next_bucket; bool exiting; bool early_drop; - long next_gc_run; };
static __read_mostly struct kmem_cache *nf_conntrack_cachep; static DEFINE_SPINLOCK(nf_conntrack_locks_all_lock); static __read_mostly bool nf_conntrack_locks_all;
-/* every gc cycle scans at most 1/GC_MAX_BUCKETS_DIV part of table */ -#define GC_MAX_BUCKETS_DIV 128u -/* upper bound of full table scan */ -#define GC_MAX_SCAN_JIFFIES (16u * HZ) -/* desired ratio of entries found to be expired */ -#define GC_EVICT_RATIO 50u +#define GC_SCAN_INTERVAL (120u * HZ) +#define GC_SCAN_MAX_DURATION msecs_to_jiffies(10)
static struct conntrack_gc_work conntrack_gc_work;
@@ -1226,17 +1221,13 @@ static void nf_ct_offload_timeout(struct nf_conn *ct)
static void gc_worker(struct work_struct *work) { - unsigned int min_interval = max(HZ / GC_MAX_BUCKETS_DIV, 1u); - unsigned int i, goal, buckets = 0, expired_count = 0; - unsigned int nf_conntrack_max95 = 0; + unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION; + unsigned int i, hashsz, nf_conntrack_max95 = 0; + unsigned long next_run = GC_SCAN_INTERVAL; struct conntrack_gc_work *gc_work; - unsigned int ratio, scanned = 0; - unsigned long next_run; - gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
- goal = nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV; - i = gc_work->last_bucket; + i = gc_work->next_bucket; if (gc_work->early_drop) nf_conntrack_max95 = nf_conntrack_max / 100u * 95u;
@@ -1244,22 +1235,21 @@ static void gc_worker(struct work_struct *work) struct nf_conntrack_tuple_hash *h; struct hlist_nulls_head *ct_hash; struct hlist_nulls_node *n; - unsigned int hashsz; struct nf_conn *tmp;
- i++; rcu_read_lock();
nf_conntrack_get_ht(&ct_hash, &hashsz); - if (i >= hashsz) - i = 0; + if (i >= hashsz) { + rcu_read_unlock(); + break; + }
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) { struct net *net;
tmp = nf_ct_tuplehash_to_ctrack(h);
- scanned++; if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) { nf_ct_offload_timeout(tmp); continue; @@ -1267,7 +1257,6 @@ static void gc_worker(struct work_struct *work)
if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); - expired_count++; continue; }
@@ -1299,7 +1288,14 @@ static void gc_worker(struct work_struct *work) */ rcu_read_unlock(); cond_resched(); - } while (++buckets < goal); + i++; + + if (time_after(jiffies, end_time) && i < hashsz) { + gc_work->next_bucket = i; + next_run = 0; + break; + } + } while (i < hashsz);
if (gc_work->exiting) return; @@ -1310,40 +1306,17 @@ static void gc_worker(struct work_struct *work) * * This worker is only here to reap expired entries when system went * idle after a busy period. - * - * The heuristics below are supposed to balance conflicting goals: - * - * 1. Minimize time until we notice a stale entry - * 2. Maximize scan intervals to not waste cycles - * - * Normally, expire ratio will be close to 0. - * - * As soon as a sizeable fraction of the entries have expired - * increase scan frequency. */ - ratio = scanned ? expired_count * 100 / scanned : 0; - if (ratio > GC_EVICT_RATIO) { - gc_work->next_gc_run = min_interval; - } else { - unsigned int max = GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV; - - BUILD_BUG_ON((GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV) == 0); - - gc_work->next_gc_run += min_interval; - if (gc_work->next_gc_run > max) - gc_work->next_gc_run = max; + if (next_run) { + gc_work->early_drop = false; + gc_work->next_bucket = 0; } - - next_run = gc_work->next_gc_run; - gc_work->last_bucket = i; - gc_work->early_drop = false; queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run); }
static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work) { INIT_DEFERRABLE_WORK(&gc_work->dwork, gc_worker); - gc_work->next_gc_run = HZ; gc_work->exiting = false; }
From: Kefeng Wang wangkefeng.wang@huawei.com
[ Upstream commit 1027b96ec9d34f9abab69bc1a4dc5b1ad8ab1349 ]
DO_ONCE DEFINE_STATIC_KEY_TRUE(___once_key); __do_once_done once_disable_jump(once_key); INIT_WORK(&w->work, once_deferred); struct once_work *w; w->key = key; schedule_work(&w->work); module unload //*the key is destroy* process_one_work once_deferred BUG_ON(!static_key_enabled(work->key)); static_key_count((struct static_key *)x) //*access key, crash*
When module uses DO_ONCE mechanism, it could crash due to the above concurrency problem, we could reproduce it with link[1].
Fix it by add/put module refcount in the once work process.
[1] https://lore.kernel.org/netdev/eaa6c371-465e-57eb-6be9-f4b16b9d7cbf@huawei.c...
Cc: Hannes Frederic Sowa hannes@stressinduktion.org Cc: Daniel Borkmann daniel@iogearbox.net Cc: David S. Miller davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Reported-by: Minmin chen chenmingmin@huawei.com Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Acked-by: Hannes Frederic Sowa hannes@stressinduktion.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/once.h | 4 ++-- lib/once.c | 11 ++++++++--- 2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/linux/once.h b/include/linux/once.h index 9225ee6d96c7..ae6f4eb41cbe 100644 --- a/include/linux/once.h +++ b/include/linux/once.h @@ -7,7 +7,7 @@
bool __do_once_start(bool *done, unsigned long *flags); void __do_once_done(bool *done, struct static_key_true *once_key, - unsigned long *flags); + unsigned long *flags, struct module *mod);
/* Call a function exactly once. The idea of DO_ONCE() is to perform * a function call such as initialization of random seeds, etc, only @@ -46,7 +46,7 @@ void __do_once_done(bool *done, struct static_key_true *once_key, if (unlikely(___ret)) { \ func(__VA_ARGS__); \ __do_once_done(&___done, &___once_key, \ - &___flags); \ + &___flags, THIS_MODULE); \ } \ } \ ___ret; \ diff --git a/lib/once.c b/lib/once.c index 8b7d6235217e..59149bf3bfb4 100644 --- a/lib/once.c +++ b/lib/once.c @@ -3,10 +3,12 @@ #include <linux/spinlock.h> #include <linux/once.h> #include <linux/random.h> +#include <linux/module.h>
struct once_work { struct work_struct work; struct static_key_true *key; + struct module *module; };
static void once_deferred(struct work_struct *w) @@ -16,10 +18,11 @@ static void once_deferred(struct work_struct *w) work = container_of(w, struct once_work, work); BUG_ON(!static_key_enabled(work->key)); static_branch_disable(work->key); + module_put(work->module); kfree(work); }
-static void once_disable_jump(struct static_key_true *key) +static void once_disable_jump(struct static_key_true *key, struct module *mod) { struct once_work *w;
@@ -29,6 +32,8 @@ static void once_disable_jump(struct static_key_true *key)
INIT_WORK(&w->work, once_deferred); w->key = key; + w->module = mod; + __module_get(mod); schedule_work(&w->work); }
@@ -53,11 +58,11 @@ bool __do_once_start(bool *done, unsigned long *flags) EXPORT_SYMBOL(__do_once_start);
void __do_once_done(bool *done, struct static_key_true *once_key, - unsigned long *flags) + unsigned long *flags, struct module *mod) __releases(once_lock) { *done = true; spin_unlock_irqrestore(&once_lock, *flags); - once_disable_jump(once_key); + once_disable_jump(once_key, mod); } EXPORT_SYMBOL(__do_once_done);
From: Miklos Szeredi mszeredi@redhat.com
[ Upstream commit 580c610429b3994e8db24418927747cf28443cde ]
One error path can result in release_dentry_name_snapshot() being called before "name" was initialized by take_dentry_name_snapshot().
Fix by moving the release_dentry_name_snapshot() to immediately after the only use.
Reported-by: Colin Ian King colin.king@canonical.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/overlayfs/export.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c index 11dd8177770d..19574ef17470 100644 --- a/fs/overlayfs/export.c +++ b/fs/overlayfs/export.c @@ -395,6 +395,7 @@ static struct dentry *ovl_lookup_real_one(struct dentry *connected, */ take_dentry_name_snapshot(&name, real); this = lookup_one_len(name.name.name, connected, name.name.len); + release_dentry_name_snapshot(&name); err = PTR_ERR(this); if (IS_ERR(this)) { goto fail; @@ -409,7 +410,6 @@ static struct dentry *ovl_lookup_real_one(struct dentry *connected, }
out: - release_dentry_name_snapshot(&name); dput(parent); inode_unlock(dir); return this;
From: Shaik Sajida Bhanu sbhanu@codeaurora.org
[ Upstream commit 67b13f3e221ed81b46a657e2b499bf8b20162476 ]
Whenever SDHC run at clock rate 50MHZ or below, the hardware data timeout value will be 21.47secs, which is approx. 22secs and we have a current software timeout value as 10secs. We have to set software timeout value more than the hardware data timeout value to avioid seeing the below register dumps.
[ 332.953670] mmc2: Timeout waiting for hardware interrupt. [ 332.959608] mmc2: sdhci: ============ SDHCI REGISTER DUMP =========== [ 332.966450] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00007202 [ 332.973256] mmc2: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000001 [ 332.980054] mmc2: sdhci: Argument: 0x00000000 | Trn mode: 0x00000027 [ 332.986864] mmc2: sdhci: Present: 0x01f801f6 | Host ctl: 0x0000001f [ 332.993671] mmc2: sdhci: Power: 0x00000001 | Blk gap: 0x00000000 [ 333.000583] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007 [ 333.007386] mmc2: sdhci: Timeout: 0x0000000e | Int stat: 0x00000000 [ 333.014182] mmc2: sdhci: Int enab: 0x03ff100b | Sig enab: 0x03ff100b [ 333.020976] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 333.027771] mmc2: sdhci: Caps: 0x322dc8b2 | Caps_1: 0x0000808f [ 333.034561] mmc2: sdhci: Cmd: 0x0000183a | Max curr: 0x00000000 [ 333.041359] mmc2: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x00000000 [ 333.048157] mmc2: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000 [ 333.054945] mmc2: sdhci: Host ctl2: 0x00000000 [ 333.059657] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000ffffff218 [ 333.067178] mmc2: sdhci_msm: ----------- VENDOR REGISTER DUMP ----------- [ 333.074343] mmc2: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x6000642c | DLL cfg2: 0x0020a000 [ 333.083417] mmc2: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00000000 | DDR cfg: 0x80040873 [ 333.092850] mmc2: sdhci_msm: Vndr func: 0x00008a9c | Vndr func2 : 0xf88218a8 Vndr func3: 0x02626040 [ 333.102371] mmc2: sdhci: ============================================
So, set software timeout value more than hardware timeout value.
Signed-off-by: Shaik Sajida Bhanu sbhanu@codeaurora.org Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1626435974-14462-1-git-send-email-sbhanu@codeauror... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/sdhci-msm.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c index 8bed81cf03ad..8ab963055238 100644 --- a/drivers/mmc/host/sdhci-msm.c +++ b/drivers/mmc/host/sdhci-msm.c @@ -1589,6 +1589,23 @@ out: __sdhci_msm_set_clock(host, clock); }
+static void sdhci_msm_set_timeout(struct sdhci_host *host, struct mmc_command *cmd) +{ + u32 count, start = 15; + + __sdhci_set_timeout(host, cmd); + count = sdhci_readb(host, SDHCI_TIMEOUT_CONTROL); + /* + * Update software timeout value if its value is less than hardware data + * timeout value. Qcom SoC hardware data timeout value was calculated + * using 4 * MCLK * 2^(count + 13). where MCLK = 1 / host->clock. + */ + if (cmd && cmd->data && host->clock > 400000 && + host->clock <= 50000000 && + ((1 << (count + start)) > (10 * host->clock))) + host->data_timeout = 22LL * NSEC_PER_SEC; +} + /* * Platform specific register write functions. This is so that, if any * register write needs to be followed up by platform specific actions, @@ -1753,6 +1770,7 @@ static const struct sdhci_ops sdhci_msm_ops = { .set_uhs_signaling = sdhci_msm_set_uhs_signaling, .write_w = sdhci_msm_writew, .write_b = sdhci_msm_writeb, + .set_timeout = sdhci_msm_set_timeout, };
static const struct sdhci_pltfm_data sdhci_msm_pdata = {
From: Yafang Shao laoar.shao@gmail.com
[ Upstream commit 9066e5cfb73cdbcdbb49e87999482ab615e9fc76 ]
Recently we found an issue on our production environment that when memcg oom is triggered the oom killer doesn't chose the process with largest resident memory but chose the first scanned process. Note that all processes in this memcg have the same oom_score_adj, so the oom killer should chose the process with largest resident memory.
Bellow is part of the oom info, which is enough to analyze this issue. [7516987.983223] memory: usage 16777216kB, limit 16777216kB, failcnt 52843037 [7516987.983224] memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0 [7516987.983225] kmem: usage 301464kB, limit 9007199254740988kB, failcnt 0 [...] [7516987.983293] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [7516987.983510] [ 5740] 0 5740 257 1 32768 0 -998 pause [7516987.983574] [58804] 0 58804 4594 771 81920 0 -998 entry_point.bas [7516987.983577] [58908] 0 58908 7089 689 98304 0 -998 cron [7516987.983580] [58910] 0 58910 16235 5576 163840 0 -998 supervisord [7516987.983590] [59620] 0 59620 18074 1395 188416 0 -998 sshd [7516987.983594] [59622] 0 59622 18680 6679 188416 0 -998 python [7516987.983598] [59624] 0 59624 1859266 5161 548864 0 -998 odin-agent [7516987.983600] [59625] 0 59625 707223 9248 983040 0 -998 filebeat [7516987.983604] [59627] 0 59627 416433 64239 774144 0 -998 odin-log-agent [7516987.983607] [59631] 0 59631 180671 15012 385024 0 -998 python3 [7516987.983612] [61396] 0 61396 791287 3189 352256 0 -998 client [7516987.983615] [61641] 0 61641 1844642 29089 946176 0 -998 client [7516987.983765] [ 9236] 0 9236 2642 467 53248 0 -998 php_scanner [7516987.983911] [42898] 0 42898 15543 838 167936 0 -998 su [7516987.983915] [42900] 1000 42900 3673 867 77824 0 -998 exec_script_vr2 [7516987.983918] [42925] 1000 42925 36475 19033 335872 0 -998 python [7516987.983921] [57146] 1000 57146 3673 848 73728 0 -998 exec_script_J2p [7516987.983925] [57195] 1000 57195 186359 22958 491520 0 -998 python2 [7516987.983928] [58376] 1000 58376 275764 14402 290816 0 -998 rosmaster [7516987.983931] [58395] 1000 58395 155166 4449 245760 0 -998 rosout [7516987.983935] [58406] 1000 58406 18285584 3967322 37101568 0 -998 data_sim [7516987.984221] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=3aa16c9482ae3a6f6b78bda68a55d32c87c99b985e0f11331cddf05af6c4d753,mems_allowed=0-1,oom_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184,task_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184/1f246a3eeea8f70bf91141eeaf1805346a666e225f823906485ea0b6c37dfc3d,task=pause,pid=5740,uid=0 [7516987.984254] Memory cgroup out of memory: Killed process 5740 (pause) total-vm:1028kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB [7516988.092344] oom_reaper: reaped process 5740 (pause), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
We can find that the first scanned process 5740 (pause) was killed, but its rss is only one page. That is because, when we calculate the oom badness in oom_badness(), we always ignore the negtive point and convert all of these negtive points to 1. Now as oom_score_adj of all the processes in this targeted memcg have the same value -998, the points of these processes are all negtive value. As a result, the first scanned process will be killed.
The oom_socre_adj (-998) in this memcg is set by kubelet, because it is a a Guaranteed pod, which has higher priority to prevent from being killed by system oom.
To fix this issue, we should make the calculation of oom point more accurate. We can achieve it by convert the chosen_point from 'unsigned long' to 'long'.
[cai@lca.pw: reported a issue in the previous version] [mhocko@suse.com: fixed the issue reported by Cai] [mhocko@suse.com: add the comment in proc_oom_score()] [laoar.shao@gmail.com: v3] Link: http://lkml.kernel.org/r/1594396651-9931-1-git-send-email-laoar.shao@gmail.c...
Signed-off-by: Yafang Shao laoar.shao@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Tested-by: Naresh Kamboju naresh.kamboju@linaro.org Acked-by: Michal Hocko mhocko@suse.com Cc: David Rientjes rientjes@google.com Cc: Qian Cai cai@lca.pw Link: http://lkml.kernel.org/r/1594309987-9919-1-git-send-email-laoar.shao@gmail.c... Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/base.c | 11 ++++++++++- include/linux/oom.h | 4 ++-- mm/oom_kill.c | 22 ++++++++++------------ 3 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c index 90d2f62a9672..5a187e9b7221 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -549,8 +549,17 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns, { unsigned long totalpages = totalram_pages() + total_swap_pages; unsigned long points = 0; + long badness; + + badness = oom_badness(task, totalpages); + /* + * Special case OOM_SCORE_ADJ_MIN for all others scale the + * badness value into [0, 2000] range which we have been + * exporting for a long time so userspace might depend on it. + */ + if (badness != LONG_MIN) + points = (1000 + badness * 1000 / (long)totalpages) * 2 / 3;
- points = oom_badness(task, totalpages) * 1000 / totalpages; seq_printf(m, "%lu\n", points);
return 0; diff --git a/include/linux/oom.h b/include/linux/oom.h index b9df34326772..2db9a1432511 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -48,7 +48,7 @@ struct oom_control { /* Used by oom implementation, do not set */ unsigned long totalpages; struct task_struct *chosen; - unsigned long chosen_points; + long chosen_points;
/* Used to print the constraint info. */ enum oom_constraint constraint; @@ -108,7 +108,7 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
bool __oom_reap_task_mm(struct mm_struct *mm);
-extern unsigned long oom_badness(struct task_struct *p, +long oom_badness(struct task_struct *p, unsigned long totalpages);
extern bool out_of_memory(struct oom_control *oc); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 212e71874301..f1b810ddf232 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -197,17 +197,17 @@ static bool is_dump_unreclaim_slabs(void) * predictable as possible. The goal is to return the highest value for the * task consuming the most memory to avoid subsequent oom failures. */ -unsigned long oom_badness(struct task_struct *p, unsigned long totalpages) +long oom_badness(struct task_struct *p, unsigned long totalpages) { long points; long adj;
if (oom_unkillable_task(p)) - return 0; + return LONG_MIN;
p = find_lock_task_mm(p); if (!p) - return 0; + return LONG_MIN;
/* * Do not even consider tasks which are explicitly marked oom @@ -219,7 +219,7 @@ unsigned long oom_badness(struct task_struct *p, unsigned long totalpages) test_bit(MMF_OOM_SKIP, &p->mm->flags) || in_vfork(p)) { task_unlock(p); - return 0; + return LONG_MIN; }
/* @@ -234,11 +234,7 @@ unsigned long oom_badness(struct task_struct *p, unsigned long totalpages) adj *= totalpages / 1000; points += adj;
- /* - * Never return 0 for an eligible task regardless of the root bonus and - * oom_score_adj (oom_score_adj can't be OOM_SCORE_ADJ_MIN here). - */ - return points > 0 ? points : 1; + return points; }
static const char * const oom_constraint_text[] = { @@ -311,7 +307,7 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc) static int oom_evaluate_task(struct task_struct *task, void *arg) { struct oom_control *oc = arg; - unsigned long points; + long points;
if (oom_unkillable_task(task)) goto next; @@ -337,12 +333,12 @@ static int oom_evaluate_task(struct task_struct *task, void *arg) * killed first if it triggers an oom, then select it. */ if (oom_task_origin(task)) { - points = ULONG_MAX; + points = LONG_MAX; goto select; }
points = oom_badness(task, oc->totalpages); - if (!points || points < oc->chosen_points) + if (points == LONG_MIN || points < oc->chosen_points) goto next;
select: @@ -366,6 +362,8 @@ static int oom_evaluate_task(struct task_struct *task, void *arg) */ static void select_bad_process(struct oom_control *oc) { + oc->chosen_points = LONG_MIN; + if (is_memcg_oom(oc)) mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc); else {
From: Stefan Mätje stefan.maetje@esd.eu
commit 044012b52029204900af9e4230263418427f4ba4 upstream.
This patch fixes the interchanged fetch of the CAN RX and TX error counters from the ESD_EV_CAN_ERROR_EXT message. The RX error counter is really in struct rx_msg::data[2] and the TX error counter is in struct rx_msg::data[3].
Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device") Link: https://lore.kernel.org/r/20210825215227.4947-2-stefan.maetje@esd.eu Cc: stable@vger.kernel.org Signed-off-by: Stefan Mätje stefan.maetje@esd.eu Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/usb/esd_usb2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/can/usb/esd_usb2.c +++ b/drivers/net/can/usb/esd_usb2.c @@ -224,8 +224,8 @@ static void esd_usb2_rx_event(struct esd if (id == ESD_EV_CAN_ERROR_EXT) { u8 state = msg->msg.rx.data[0]; u8 ecc = msg->msg.rx.data[1]; - u8 txerr = msg->msg.rx.data[2]; - u8 rxerr = msg->msg.rx.data[3]; + u8 rxerr = msg->msg.rx.data[2]; + u8 txerr = msg->msg.rx.data[3];
skb = alloc_can_err_skb(priv->netdev, &cf); if (skb == NULL) {
From: Johan Hovold johan@kernel.org
commit df7b16d1c00ecb3da3a30c999cdb39f273c99a2f upstream.
This reverts commit 3c18e9baee0ef97510dcda78c82285f52626764b.
These devices do not appear to send a zero-length packet when the transfer size is a multiple of the bulk-endpoint max-packet size. This means that incoming data may not be processed by the driver until a short packet is received or the receive buffer is full.
Revert back to using endpoint-sized receive buffers to avoid stalled reads.
Reported-by: Paul Größel pb.g@gmx.de Link: https://bugzilla.kernel.org/show_bug.cgi?id=214131 Fixes: 3c18e9baee0e ("USB: serial: ch341: fix character loss at high transfer rates") Cc: stable@vger.kernel.org Cc: Willy Tarreau w@1wt.eu Link: https://lore.kernel.org/r/20210824121926.19311-1-johan@kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/ch341.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/usb/serial/ch341.c +++ b/drivers/usb/serial/ch341.c @@ -678,7 +678,6 @@ static struct usb_serial_driver ch341_de .owner = THIS_MODULE, .name = "ch341-uart", }, - .bulk_in_size = 512, .id_table = id_table, .num_ports = 1, .open = ch341_open,
From: Zhengjun Zhang zhangzhengjun@aicrobo.com
commit 2829a4e3cf3a6ac2fa3cdb681b37574630fb9c1a upstream.
Fibocom FG150 is a 5G module based on Qualcomm SDX55 platform, support Sub-6G band.
Here are the outputs of lsusb -v and usb-devices:
T: Bus=02 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1 P: Vendor=2cb7 ProdID=010b Rev=04.14 S: Manufacturer=Fibocom S: Product=Fibocom Modem_SN:XXXXXXXX S: SerialNumber=XXXXXXXX C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#=0x0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host I: If#=0x1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=(none) I: If#=0x4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
Bus 002 Device 002: ID 2cb7:010b Fibocom Fibocom Modem_SN:XXXXXXXX Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 3.20 bDeviceClass 0 bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 9 idVendor 0x2cb7 Fibocom idProduct 0x010b bcdDevice 4.14 iManufacturer 1 Fibocom iProduct 2 Fibocom Modem_SN:XXXXXXXX iSerial 3 XXXXXXXX bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 0x00e6 bNumInterfaces 5 bConfigurationValue 1 iConfiguration 4 RNDIS_DUN_DIAG_ADB bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 896mA Interface Association: bLength 8 bDescriptorType 11 bFirstInterface 0 bInterfaceCount 2 bFunctionClass 239 Miscellaneous Device bFunctionSubClass 4 bFunctionProtocol 1 iFunction 7 RNDIS Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 239 Miscellaneous Device bInterfaceSubClass 4 bInterfaceProtocol 1 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 01 ** UNRECOGNIZED: 04 24 02 00 ** UNRECOGNIZED: 05 24 06 00 01 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 9 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 10 CDC Data bInterfaceSubClass 0 bInterfaceProtocol 0 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x8e EP 14 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 6 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x0f EP 15 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 6 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 2 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 0 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 00 ** UNRECOGNIZED: 04 24 02 02 ** UNRECOGNIZED: 05 24 06 00 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x000a 1x 10 bytes bInterval 9 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 3 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 48 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x84 EP 4 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 4 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 66 bInterfaceProtocol 1 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x03 EP 3 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x85 EP 5 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Binary Object Store Descriptor: bLength 5 bDescriptorType 15 wTotalLength 0x0016 bNumDeviceCaps 2 USB 2.0 Extension Device Capability: bLength 7 bDescriptorType 16 bDevCapabilityType 2 bmAttributes 0x00000006 BESL Link Power Management (LPM) Supported SuperSpeed USB Device Capability: bLength 10 bDescriptorType 16 bDevCapabilityType 3 bmAttributes 0x00 wSpeedsSupported 0x000f Device can operate at Low Speed (1Mbps) Device can operate at Full Speed (12Mbps) Device can operate at High Speed (480Mbps) Device can operate at SuperSpeed (5Gbps) bFunctionalitySupport 1 Lowest fully-functional device speed is Full Speed (12Mbps) bU1DevExitLat 1 micro seconds bU2DevExitLat 500 micro seconds Device Status: 0x0000 (Bus Powered)
Signed-off-by: Zhengjun Zhang zhangzhengjun@aicrobo.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -2074,6 +2074,8 @@ static const struct usb_device_id option .driver_info = RSVD(4) | RSVD(5) }, { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x0105, 0xff), /* Fibocom NL678 series */ .driver_info = RSVD(6) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x2cb7, 0x010b, 0xff, 0xff, 0x30) }, /* Fibocom FG150 Diag */ + { USB_DEVICE_AND_INTERFACE_INFO(0x2cb7, 0x010b, 0xff, 0, 0) }, /* Fibocom FG150 AT */ { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a0, 0xff) }, /* Fibocom NL668-AM/NL652-EU (laptop MBIM) */ { USB_DEVICE_INTERFACE_CLASS(0x2df3, 0x9d03, 0xff) }, /* LongSung M5710 */ { USB_DEVICE_INTERFACE_CLASS(0x305a, 0x1404, 0xff) }, /* GosunCn GM500 RNDIS */
From: Thinh Nguyen Thinh.Nguyen@synopsys.com
commit 51f1954ad853d01ba4dc2b35dee14d8490ee05a1 upstream.
We can't depend on the TRB's HWO bit to determine if the TRB ring is "full". A TRB is only available when the driver had processed it, not when the controller consumed and relinquished the TRB's ownership to the driver. Otherwise, the driver may overwrite unprocessed TRBs. This can happen when many transfer events accumulate and the system is slow to process them and/or when there are too many small requests.
If a request is in the started_list, that means there is one or more unprocessed TRBs remained. Check this instead of the TRB's HWO bit whether the TRB ring is full.
Fixes: c4233573f6ee ("usb: dwc3: gadget: prepare TRBs on update transfers too") Cc: stable@vger.kernel.org Acked-by: Felipe Balbi balbi@kernel.org Signed-off-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Link: https://lore.kernel.org/r/e91e975affb0d0d02770686afc3a5b9eb84409f6.162933541... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/gadget.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -894,19 +894,19 @@ static struct dwc3_trb *dwc3_ep_prev_trb
static u32 dwc3_calc_trbs_left(struct dwc3_ep *dep) { - struct dwc3_trb *tmp; u8 trbs_left;
/* - * If enqueue & dequeue are equal than it is either full or empty. - * - * One way to know for sure is if the TRB right before us has HWO bit - * set or not. If it has, then we're definitely full and can't fit any - * more transfers in our ring. + * If the enqueue & dequeue are equal then the TRB ring is either full + * or empty. It's considered full when there are DWC3_TRB_NUM-1 of TRBs + * pending to be processed by the driver. */ if (dep->trb_enqueue == dep->trb_dequeue) { - tmp = dwc3_ep_prev_trb(dep, dep->trb_enqueue); - if (tmp->ctrl & DWC3_TRB_CTRL_HWO) + /* + * If there is any request remained in the started_list at + * this point, that means there is no TRB available. + */ + if (!list_empty(&dep->started_list)) return 0;
return DWC3_TRB_NUM - 1;
From: Wesley Cheng wcheng@codeaurora.org
commit 4a1e25c0a029b97ea4a3d423a6392bfacc3b2e39 upstream.
During a USB cable disconnect, or soft disconnect scenario, a pending SETUP transaction may not be completed, leading to the following error:
dwc3 a600000.dwc3: timed out waiting for SETUP phase
If this occurs, then the entire pullup disable routine is skipped and proper cleanup and halting of the controller does not complete.
Instead of returning an error (which is ignored from the UDC perspective), allow the pullup disable routine to continue, which will also handle disabling of EP0/1. This will end any active transfers as well. Ensure to clear any delayed_status also, as the timeout could happen within the STATUS stage.
Fixes: bb0147364850 ("usb: dwc3: gadget: don't clear RUN/STOP when it's invalid to do so") Cc: stable@vger.kernel.org Reviewed-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Acked-by: Felipe Balbi balbi@kernel.org Signed-off-by: Wesley Cheng wcheng@codeaurora.org Link: https://lore.kernel.org/r/20210825042855.7977-1-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/gadget.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2012,10 +2012,8 @@ static int dwc3_gadget_pullup(struct usb
ret = wait_for_completion_timeout(&dwc->ep0_in_setup, msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT)); - if (ret == 0) { - dev_err(dwc->dev, "timed out waiting for SETUP phase\n"); - return -ETIMEDOUT; - } + if (ret == 0) + dev_warn(dwc->dev, "timed out waiting for SETUP phase\n"); }
/* @@ -2217,6 +2215,7 @@ static int __dwc3_gadget_start(struct dw /* begin to receive SETUP packets */ dwc->ep0state = EP0_SETUP_PHASE; dwc->link_state = DWC3_LINK_STATE_SS_DIS; + dwc->delayed_status = false; dwc3_ep0_out_start(dwc);
dwc3_gadget_enable_irq(dwc);
From: Li Jinlin lijinlin3@huawei.com
commit 02c6dcd543f8f051973ee18bfbc4dc3bd595c558 upstream.
We found a hang, the steps to reproduce are as follows:
1. blocking device via scsi_device_set_state()
2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10
3. echo none > /sys/block/sda/queue/scheduler
4. echo "running" >/sys/block/sda/device/state
Step 3 and 4 should complete after step 4, but they hang.
CPU#0 CPU#1 CPU#2 --------------- ---------------- ---------------- Step 1: blocking device
Step 2: dd xxxx ^^^^^^ get request q_usage_counter++
Step 3: switching scheculer elv_iosched_store elevator_switch blk_mq_freeze_queue blk_freeze_queue > blk_freeze_queue_start ^^^^^^ mq_freeze_depth++
> blk_mq_run_hw_queues ^^^^^^ can't run queue when dev blocked
> blk_mq_freeze_queue_wait ^^^^^^ Hang here!!! wait q_usage_counter==0
Step 4: running device store_state_field scsi_rescan_device scsi_attach_vpd scsi_vpd_inquiry __scsi_execute blk_get_request blk_mq_alloc_request blk_queue_enter ^^^^^^ Hang here!!! wait mq_freeze_depth==0
blk_mq_run_hw_queues ^^^^^^ dispatch IO, q_usage_counter will reduce to zero
blk_mq_unfreeze_queue ^^^^^ mq_freeze_depth--
To fix this, we need to run queue before rescanning device when the device state changes to SDEV_RUNNING.
Link: https://lore.kernel.org/r/20210824025921.3277629-1-lijinlin3@huawei.com Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after offlinining device") Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Li Jinlin lijinlin3@huawei.com Signed-off-by: Qiu Laibin qiulaibin@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/scsi_sysfs.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -788,12 +788,15 @@ store_state_field(struct device *dev, st ret = scsi_device_set_state(sdev, state); /* * If the device state changes to SDEV_RUNNING, we need to - * rescan the device to revalidate it, and run the queue to - * avoid I/O hang. + * run the queue to avoid I/O hang, and rescan the device + * to revalidate it. Running the queue first is necessary + * because another thread may be waiting inside + * blk_mq_freeze_queue_wait() and because that call may be + * waiting for pending I/O to finish. */ if (ret == 0 && state == SDEV_RUNNING) { - scsi_rescan_device(dev); blk_mq_run_hw_queues(sdev->request_queue, true); + scsi_rescan_device(dev); } mutex_unlock(&sdev->state_mutex);
From: Naresh Kumar PBS nareshkumar.pbs@broadcom.com
[ Upstream commit 17f2569dce1848080825b8336e6b7c6900193b44 ]
Add the missing initialization of srq lock.
Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Link: https://lore.kernel.org/r/1629343553-5843-3-git-send-email-selvin.xavier@bro... Signed-off-by: Naresh Kumar PBS nareshkumar.pbs@broadcom.com Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 58c021648b7c..a96f9142fe08 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -1404,6 +1404,7 @@ int bnxt_re_create_srq(struct ib_srq *ib_srq, if (nq) nq->budget++; atomic_inc(&rdev->srq_count); + spin_lock_init(&srq->lock);
return 0;
From: Tuo Li islituo@gmail.com
[ Upstream commit cbe71c61992c38f72c2b625b2ef25916b9f0d060 ]
kmalloc_array() is called to allocate memory for tx->descp. If it fails, the function __sdma_txclean() is called: __sdma_txclean(dd, tx);
However, in the function __sdma_txclean(), tx-descp is dereferenced if tx->num_desc is not zero: sdma_unmap_desc(dd, &tx->descp[0]);
To fix this possible null-pointer dereference, assign the return value of kmalloc_array() to a local variable descp, and then assign it to tx->descp if it is not NULL. Otherwise, go to enomem.
Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20210806133029.194964-1-islituo@gmail.com Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Tuo Li islituo@gmail.com Tested-by: Mike Marciniszyn mike.marciniszyn@cornelisnetworks.com Acked-by: Mike Marciniszyn mike.marciniszyn@cornelisnetworks.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/sdma.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c index c61b6022575e..248be21acdbe 100644 --- a/drivers/infiniband/hw/hfi1/sdma.c +++ b/drivers/infiniband/hw/hfi1/sdma.c @@ -3056,6 +3056,7 @@ static void __sdma_process_event(struct sdma_engine *sde, static int _extend_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx) { int i; + struct sdma_desc *descp;
/* Handle last descriptor */ if (unlikely((tx->num_desc == (MAX_DESC - 1)))) { @@ -3076,12 +3077,10 @@ static int _extend_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx) if (unlikely(tx->num_desc == MAX_DESC)) goto enomem;
- tx->descp = kmalloc_array( - MAX_DESC, - sizeof(struct sdma_desc), - GFP_ATOMIC); - if (!tx->descp) + descp = kmalloc_array(MAX_DESC, sizeof(struct sdma_desc), GFP_ATOMIC); + if (!descp) goto enomem; + tx->descp = descp;
/* reserve last descriptor for coalescing */ tx->desc_limit = MAX_DESC - 1;
From: Sasha Neftin sasha.neftin@intel.com
[ Upstream commit 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73 ]
We should decode the latency and the max_latency before directly compare. The latency should be presented as lat_enc = scale x value: lat_enc_d = (lat_enc & 0x0x3ff) x (1U << (5*((max_ltr_enc & 0x1c00)
10)))
Fixes: cf8fb73c23aa ("e1000e: add support for LTR on I217/I218") Suggested-by: Yee Li seven.yi.lee@gmail.com Signed-off-by: Sasha Neftin sasha.neftin@intel.com Tested-by: Dvora Fuxbrumer dvorax.fuxbrumer@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 +++++++++++++- drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++ 2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c index a1fab77b2096..58ff747a42ae 100644 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c @@ -995,6 +995,8 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link) { u32 reg = link << (E1000_LTRV_REQ_SHIFT + E1000_LTRV_NOSNOOP_SHIFT) | link << E1000_LTRV_REQ_SHIFT | E1000_LTRV_SEND; + u16 max_ltr_enc_d = 0; /* maximum LTR decoded by platform */ + u16 lat_enc_d = 0; /* latency decoded */ u16 lat_enc = 0; /* latency encoded */
if (link) { @@ -1048,7 +1050,17 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link) E1000_PCI_LTR_CAP_LPT + 2, &max_nosnoop); max_ltr_enc = max_t(u16, max_snoop, max_nosnoop);
- if (lat_enc > max_ltr_enc) + lat_enc_d = (lat_enc & E1000_LTRV_VALUE_MASK) * + (1U << (E1000_LTRV_SCALE_FACTOR * + ((lat_enc & E1000_LTRV_SCALE_MASK) + >> E1000_LTRV_SCALE_SHIFT))); + + max_ltr_enc_d = (max_ltr_enc & E1000_LTRV_VALUE_MASK) * + (1U << (E1000_LTRV_SCALE_FACTOR * + ((max_ltr_enc & E1000_LTRV_SCALE_MASK) + >> E1000_LTRV_SCALE_SHIFT))); + + if (lat_enc_d > max_ltr_enc_d) lat_enc = max_ltr_enc; }
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.h b/drivers/net/ethernet/intel/e1000e/ich8lan.h index 1502895eb45d..e757896287eb 100644 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.h +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.h @@ -274,8 +274,11 @@
/* Latency Tolerance Reporting */ #define E1000_LTRV 0x000F8 +#define E1000_LTRV_VALUE_MASK 0x000003FF #define E1000_LTRV_SCALE_MAX 5 #define E1000_LTRV_SCALE_FACTOR 5 +#define E1000_LTRV_SCALE_SHIFT 10 +#define E1000_LTRV_SCALE_MASK 0x00001C00 #define E1000_LTRV_REQ_SHIFT 15 #define E1000_LTRV_NOSNOOP_SHIFT 16 #define E1000_LTRV_SEND (1 << 30)
From: Gal Pressman galpress@amazon.com
[ Upstream commit dbe986bdfd6dfe6ef24b833767fff4151e024357 ]
Make sure to free the IRQ vectors in case the allocation doesn't return the expected number of IRQs.
Fixes: b7f5e880f377 ("RDMA/efa: Add the efa module") Link: https://lore.kernel.org/r/20210811151131.39138-2-galpress@amazon.com Reviewed-by: Firas JahJah firasj@amazon.com Reviewed-by: Yossi Leybovich sleybo@amazon.com Signed-off-by: Gal Pressman galpress@amazon.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/efa/efa_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c index 83858f7e83d0..75dfe9d1564c 100644 --- a/drivers/infiniband/hw/efa/efa_main.c +++ b/drivers/infiniband/hw/efa/efa_main.c @@ -340,6 +340,7 @@ static int efa_enable_msix(struct efa_dev *dev) }
if (irq_num != msix_vecs) { + efa_disable_msix(dev); dev_err(&dev->pdev->dev, "Allocated %d MSI-X (out of %d requested)\n", irq_num, msix_vecs);
From: Shreyansh Chouhan chouhan.shreyansh630@gmail.com
[ Upstream commit 1d011c4803c72f3907eccfc1ec63caefb852fcbf ]
Validate csum_start in gre_handle_offloads before we call _gre_xmit so that we do not crash later when the csum_start value is used in the lco_csum function call.
This patch deals with ipv4 code.
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com Signed-off-by: Shreyansh Chouhan chouhan.shreyansh630@gmail.com Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_gre.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index fedad3a3e61b..fd8298b8b1c5 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -446,6 +446,8 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
static int gre_handle_offloads(struct sk_buff *skb, bool csum) { + if (csum && skb_checksum_start(skb) < skb->data) + return -EINVAL; return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); }
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 5ed74b03eb4d08f5dd281dcb5f1c9bb92b363a8d ]
A successful 'xge_mdio_config()' call should be balanced by a corresponding 'xge_mdio_remove()' call in the error handling path of the probe, as already done in the remove function.
Update the error handling path accordingly.
Fixes: ea8ab16ab225 ("drivers: net: xgene-v2: Add MDIO support") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/apm/xgene-v2/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c b/drivers/net/ethernet/apm/xgene-v2/main.c index 02b4f3af02b5..848be6bf2fd1 100644 --- a/drivers/net/ethernet/apm/xgene-v2/main.c +++ b/drivers/net/ethernet/apm/xgene-v2/main.c @@ -677,11 +677,13 @@ static int xge_probe(struct platform_device *pdev) ret = register_netdev(ndev); if (ret) { netdev_err(ndev, "Failed to register netdev\n"); - goto err; + goto err_mdio_remove; }
return 0;
+err_mdio_remove: + xge_mdio_remove(ndev); err: free_netdev(ndev);
From: Maxim Kiselev bigunclemax@gmail.com
[ Upstream commit 359f4cdd7d78fdf8c098713b05fee950a730f131 ]
According to Armada XP datasheet bit at 0 position is corresponding for TxInProg indication.
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: Maxim Kiselev bigunclemax@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 7b0543056b10..64aa5510e61a 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -101,7 +101,7 @@ #define MVNETA_DESC_SWAP BIT(6) #define MVNETA_TX_BRST_SZ_MASK(burst) ((burst) << 22) #define MVNETA_PORT_STATUS 0x2444 -#define MVNETA_TX_IN_PRGRS BIT(1) +#define MVNETA_TX_IN_PRGRS BIT(0) #define MVNETA_TX_FIFO_EMPTY BIT(8) #define MVNETA_RX_MIN_FRAME_SIZE 0x247c /* Only exists on Armada XP and Armada 370 */
From: Andrey Ignatov rdna@fb.com
[ Upstream commit 96a6b93b69880b2c978e1b2be9cae6970b605008 ]
Currently when device is moved between network namespaces using RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID, IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and target namespace already has device with same name, userspace will get EINVAL what is confusing and makes debugging harder.
Fix it so that userspace gets more appropriate EEXIST instead what makes debugging much easier.
Before:
# ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: Invalid argument
After:
# ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: File exists
The problem is that do_setlink() passes its `char *ifname` argument, that it gets from a caller, to __dev_change_net_namespace() as is (as `const char *pat`), but semantics of ifname and pat can be different.
For example, __rtnl_newlink() does this:
net/core/rtnetlink.c 3270 char ifname[IFNAMSIZ]; ... 3286 if (tb[IFLA_IFNAME]) 3287 nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ); 3288 else 3289 ifname[0] = '\0'; ... 3364 if (dev) { ... 3394 return do_setlink(skb, dev, ifm, extack, tb, ifname, status); 3395 }
, i.e. do_setlink() gets ifname pointer that is always valid no matter if user specified IFLA_IFNAME or not and then do_setlink() passes this ifname pointer as is to __dev_change_net_namespace() as pat argument.
But the pat (pattern) in __dev_change_net_namespace() is used as:
net/core/dev.c 11198 err = -EEXIST; 11199 if (__dev_get_by_name(net, dev->name)) { 11200 /* We get here if we can't use the current device name */ 11201 if (!pat) 11202 goto out; 11203 err = dev_get_valid_name(net, dev, pat); 11204 if (err < 0) 11205 goto out; 11206 }
As the result the `goto out` path on line 11202 is neven taken and instead of returning EEXIST defined on line 11198, __dev_change_net_namespace() returns an error from dev_get_valid_name() and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier.
Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.") Signed-off-by: Andrey Ignatov rdna@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 0bad5db23129..6fbc9cb09dc0 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2414,6 +2414,7 @@ static int do_setlink(const struct sk_buff *skb, return err;
if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD] || tb[IFLA_TARGET_NETNSID]) { + const char *pat = ifname && ifname[0] ? ifname : NULL; struct net *net = rtnl_link_get_net_capable(skb, dev_net(dev), tb, CAP_NET_ADMIN); if (IS_ERR(net)) { @@ -2421,7 +2422,7 @@ static int do_setlink(const struct sk_buff *skb, goto errout; }
- err = dev_change_net_namespace(dev, net, ifname); + err = dev_change_net_namespace(dev, net, pat); put_net(net); if (err) goto errout;
From: Yufeng Mo moyufeng@huawei.com
[ Upstream commit 1a6d281946c330cee2855f6d0cd796616e54601f ]
If a PF is bonded to a virtual machine and the virtual machine exits unexpectedly, some hardware resource cannot be cleared. In this case, loading driver may cause exceptions. Therefore, the hardware resource needs to be cleared when the driver is loaded.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yufeng Mo moyufeng@huawei.com Signed-off-by: Salil Mehta salil.mehta@huawei.com Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../hisilicon/hns3/hns3pf/hclge_cmd.h | 3 +++ .../hisilicon/hns3/hns3pf/hclge_main.c | 26 +++++++++++++++++++ 2 files changed, 29 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h index e34e0854635c..d64cded30eeb 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h @@ -257,6 +257,9 @@ enum hclge_opcode_type { /* Led command */ HCLGE_OPC_LED_STATUS_CFG = 0xB000,
+ /* clear hardware resource command */ + HCLGE_OPC_CLEAR_HW_RESOURCE = 0x700B, + /* NCL config command */ HCLGE_OPC_QUERY_NCL_CONFIG = 0x7011, /* M7 stats command */ diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 93f3865b679b..28e260439196 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -9165,6 +9165,28 @@ static void hclge_clear_resetting_state(struct hclge_dev *hdev) } }
+static int hclge_clear_hw_resource(struct hclge_dev *hdev) +{ + struct hclge_desc desc; + int ret; + + hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_CLEAR_HW_RESOURCE, false); + + ret = hclge_cmd_send(&hdev->hw, &desc, 1); + /* This new command is only supported by new firmware, it will + * fail with older firmware. Error value -EOPNOSUPP can only be + * returned by older firmware running this command, to keep code + * backward compatible we will override this value and return + * success. + */ + if (ret && ret != -EOPNOTSUPP) { + dev_err(&hdev->pdev->dev, + "failed to clear hw resource, ret = %d\n", ret); + return ret; + } + return 0; +} + static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev) { struct pci_dev *pdev = ae_dev->pdev; @@ -9206,6 +9228,10 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev) if (ret) goto err_cmd_uninit;
+ ret = hclge_clear_hw_resource(hdev); + if (ret) + goto err_cmd_uninit; + ret = hclge_get_cap(hdev); if (ret) { dev_err(&pdev->dev, "get hw capability error, ret = %d.\n",
From: Guojia Liao liaoguojia@huawei.com
[ Upstream commit 94391fae82f71c98ecc7716a32611fcca73c74eb ]
VLAN list should not be added duplicate VLAN node, otherwise it would cause "add failed" when restore VLAN from VLAN list, so this patch adds VLAN ID check before adding node into VLAN list.
Fixes: c6075b193462 ("net: hns3: Record VF vlan tables") Signed-off-by: Guojia Liao liaoguojia@huawei.com Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 28e260439196..aa402e267121 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -8006,7 +8006,11 @@ static int hclge_init_vlan_config(struct hclge_dev *hdev) static void hclge_add_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id, bool writen_to_tbl) { - struct hclge_vport_vlan_cfg *vlan; + struct hclge_vport_vlan_cfg *vlan, *tmp; + + list_for_each_entry_safe(vlan, tmp, &vport->vlan_list, node) + if (vlan->vlan_id == vlan_id) + return;
vlan = kzalloc(sizeof(*vlan), GFP_KERNEL); if (!vlan)
From: Guangbin Huang huangguangbin2@huawei.com
[ Upstream commit 8c1671e0d13d4a0ba4fb3a0da932bf3736d7ff73 ]
Currently, when query PFC configuration by dcbtool, driver will return PFC enable status based on TC. As all priorities are mapped to TC0 by default, if TC0 is enabled, then all priorities mapped to TC0 will be shown as enabled status when query PFC setting, even though some priorities have never been set.
for example: $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off $ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on
To fix this problem, just returns user's PFC config parameter saved in driver.
Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c index a1790af73096..d16488bab86f 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c @@ -281,21 +281,12 @@ static int hclge_ieee_getpfc(struct hnae3_handle *h, struct ieee_pfc *pfc) u64 requests[HNAE3_MAX_TC], indications[HNAE3_MAX_TC]; struct hclge_vport *vport = hclge_get_vport(h); struct hclge_dev *hdev = vport->back; - u8 i, j, pfc_map, *prio_tc; int ret; + u8 i;
memset(pfc, 0, sizeof(*pfc)); pfc->pfc_cap = hdev->pfc_max; - prio_tc = hdev->tm_info.prio_tc; - pfc_map = hdev->tm_info.hw_pfc_map; - - /* Pfc setting is based on TC */ - for (i = 0; i < hdev->tm_info.num_tc; i++) { - for (j = 0; j < HNAE3_MAX_USER_PRIO; j++) { - if ((prio_tc[j] == i) && (pfc_map & BIT(i))) - pfc->pfc_en |= BIT(j); - } - } + pfc->pfc_en = hdev->tm_info.pfc_en;
ret = hclge_pfc_tx_stats_get(hdev, requests); if (ret)
From: Matthew Brost matthew.brost@intel.com
[ Upstream commit a63bcf08f0efb5348105bb8e0e1e8c6671077753 ]
A small race exists between intel_gt_retire_requests_timeout and intel_timeline_exit which could result in the syncmap not getting free'd. Rather than work to hard to seal this race, simply cleanup the syncmap on fini.
unreferenced object 0xffff88813bc53b18 (size 96): comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 ................ 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00 ........kkkk.... backtrace: [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915] [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915] [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915] [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915] [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915] [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915] [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm] [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm] [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0 [<0000000076c362a4>] do_syscall_64+0x33/0x80 [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Matthew Brost matthew.brost@intel.com Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit") Cc: stable@vger.kernel.org Reviewed-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: John Harrison John.C.Harrison@Intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20210730195342.110234-1-matthe... (cherry picked from commit faf890985e30d5e88cc3a7c50c1bcad32f89ab7c) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_timeline.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 9cb01d9828f1..c970e3deb008 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -289,6 +289,14 @@ void intel_timeline_fini(struct intel_timeline *timeline) i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
i915_vma_put(timeline->hwsp_ggtt); + + /* + * A small race exists between intel_gt_retire_requests_timeout and + * intel_timeline_exit which could result in the syncmap not getting + * free'd. Rather than work to hard to seal this race, simply cleanup + * the syncmap on fini. + */ + i915_syncmap_free(&timeline->sync); }
struct intel_timeline *
From: Jerome Brunet jbrunet@baylibre.com
[ Upstream commit 068fdad20454f815e61e6f6eb9f051a8b3120e88 ]
If the endpoint completion callback is call right after the ep_enabled flag is cleared and before usb_ep_dequeue() is call, we could do a double free on the request and the associated buffer.
Fix this by clearing ep_enabled after all the endpoint requests have been dequeued.
Fixes: 7de8681be2cd ("usb: gadget: u_audio: Free requests only after callback") Cc: stable stable@vger.kernel.org Reported-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Signed-off-by: Jerome Brunet jbrunet@baylibre.com Link: https://lore.kernel.org/r/20210827092927.366482-1-jbrunet@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/function/u_audio.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c index 223029fa8445..4e01ba0ab8ec 100644 --- a/drivers/usb/gadget/function/u_audio.c +++ b/drivers/usb/gadget/function/u_audio.c @@ -349,8 +349,6 @@ static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep) if (!prm->ep_enabled) return;
- prm->ep_enabled = false; - audio_dev = uac->audio_dev; params = &audio_dev->params;
@@ -368,11 +366,12 @@ static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep) } }
+ prm->ep_enabled = false; + if (usb_ep_disable(ep)) dev_err(uac->card->dev, "%s:%d Error!\n", __func__, __LINE__); }
- int u_audio_start_capture(struct g_audio *audio_dev) { struct snd_uac_chip *uac = audio_dev->uac;
From: Colin Ian King colin.king@canonical.com
[ Upstream commit 0b3a8738b76fe2087f7bc2bd59f4c78504c79180 ]
The u32 variable pci_dword is being masked with 0x1fffffff and then left shifted 23 places. The shift is a u32 operation,so a value of 0x200 or more in pci_dword will overflow the u32 and only the bottow 32 bits are assigned to addr. I don't believe this was the original intent. Fix this by casting pci_dword to a resource_size_t to ensure no overflow occurs.
Note that the mask and 12 bit left shift operation does not need this because the mask SNR_IMC_MMIO_MEM0_MASK and shift is always a 32 bit value.
Fixes: ee49532b38dd ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge") Addresses-Coverity: ("Unintentional integer overflow") Signed-off-by: Colin Ian King colin.king@canonical.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Kan Liang kan.liang@linux.intel.com Link: https://lore.kernel.org/r/20210706114553.28249-1-colin.king@canonical.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/intel/uncore_snbep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c index 40751af62dd3..9096a1693942 100644 --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -4382,7 +4382,7 @@ static void snr_uncore_mmio_init_box(struct intel_uncore_box *box) return;
pci_read_config_dword(pdev, SNR_IMC_MMIO_BASE_OFFSET, &pci_dword); - addr = (pci_dword & SNR_IMC_MMIO_BASE_MASK) << 23; + addr = ((resource_size_t)pci_dword & SNR_IMC_MMIO_BASE_MASK) << 23;
pci_read_config_dword(pdev, SNR_IMC_MMIO_MEM0_OFFSET, &pci_dword); addr |= (pci_dword & SNR_IMC_MMIO_MEM0_MASK) << 12;
From: Michał Mirosław mirq-linux@rere.qmqm.pl
[ Upstream commit 335ffab3ef864539e814b9a2903b0ae420c1c067 ]
This WARN can be triggered per-core and the stack trace is not useful. Replace it with plain dev_err(). Fix a comment while at it.
Signed-off-by: Michał Mirosław mirq-linux@rere.qmqm.pl Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/opp/of.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 249738e1e0b7..603c688fe23d 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -682,8 +682,9 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table) } }
- /* There should be one of more OPP defined */ - if (WARN_ON(!count)) { + /* There should be one or more OPPs defined */ + if (!count) { + dev_err(dev, "%s: no supported OPPs", __func__); ret = -ENOENT; goto remove_static_opp; }
From: Parav Pandit parav@nvidia.com
[ Upstream commit 60f0779862e4ab943810187752c462e85f5fa371 ]
Currently vq->broken field is read by virtqueue_is_broken() in busy loop in one context by virtnet_send_command().
vq->broken is set to true in other process context by virtio_break_device(). Reader and writer are accessing it without any synchronization. This may lead to a compiler optimization which may result to optimize reading vq->broken only once.
Hence, force reading vq->broken on each invocation of virtqueue_is_broken() and also force writing it so that such update is visible to the readers.
It is a theoretical fix that isn't yet encountered in the field.
Signed-off-by: Parav Pandit parav@nvidia.com Link: https://lore.kernel.org/r/20210721142648.1525924-2-parav@nvidia.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/virtio/virtio_ring.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index f6011c9ed32f..e442d400dbb2 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2268,7 +2268,7 @@ bool virtqueue_is_broken(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->broken; + return READ_ONCE(vq->broken); } EXPORT_SYMBOL_GPL(virtqueue_is_broken);
@@ -2283,7 +2283,9 @@ void virtio_break_device(struct virtio_device *dev) spin_lock(&dev->vqs_list_lock); list_for_each_entry(_vq, &dev->vqs, list) { struct vring_virtqueue *vq = to_vvq(_vq); - vq->broken = true; + + /* Pairs with READ_ONCE() in virtqueue_is_broken(). */ + WRITE_ONCE(vq->broken, true); } spin_unlock(&dev->vqs_list_lock); }
From: Parav Pandit parav@nvidia.com
[ Upstream commit 43bb40c5b92659966bdf4bfe584fde0a3575a049 ]
When a virtio pci device undergo surprise removal (aka async removal in PCIe spec), mark the device as broken so that any upper layer drivers can abort any outstanding operation.
When a virtio net pci device undergo surprise removal which is used by a NetworkManager, a below call trace was observed.
kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059] CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S W I L 5.13.0-hotplug+ #8 Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020 Workqueue: events linkwatch_event RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net] Call Trace: virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net] ? __hw_addr_create_ex+0x85/0xc0 __dev_mc_add+0x72/0x80 igmp6_group_added+0xa7/0xd0 ipv6_mc_up+0x3c/0x60 ipv6_find_idev+0x36/0x80 addrconf_add_dev+0x1e/0xa0 addrconf_dev_config+0x71/0x130 addrconf_notify+0x1f5/0xb40 ? rtnl_is_locked+0x11/0x20 ? __switch_to_asm+0x42/0x70 ? finish_task_switch+0xaf/0x2c0 ? raw_notifier_call_chain+0x3e/0x50 raw_notifier_call_chain+0x3e/0x50 netdev_state_change+0x67/0x90 linkwatch_do_dev+0x3c/0x50 __linkwatch_run_queue+0xd2/0x220 linkwatch_event+0x21/0x30 process_one_work+0x1c8/0x370 worker_thread+0x30/0x380 ? process_one_work+0x370/0x370 kthread+0x118/0x140 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30
Hence, add the ability to abort the command on surprise removal which prevents infinite loop and system lockup.
Signed-off-by: Parav Pandit parav@nvidia.com Link: https://lore.kernel.org/r/20210721142648.1525924-5-parav@nvidia.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/virtio/virtio_pci_common.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 222d630c41fc..b35bb2d57f62 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev);
+ /* + * Device is marked broken on surprise removal so that virtio upper + * layers can abort any ongoing operation. + */ + if (!pci_device_is_present(pci_dev)) + virtio_break_device(&vp_dev->vdev); + pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev);
From: Neeraj Upadhyay neeraju@codeaurora.org
[ Upstream commit e74cfa91f42c50f7f649b0eca46aa049754ccdbd ]
As __vringh_iov() traverses a descriptor chain, it populates each descriptor entry into either read or write vring iov and increments that iov's ->used member. So, as we iterate over a descriptor chain, at any point, (riov/wriov)->used value gives the number of descriptor enteries available, which are to be read or written by the device. As all read iovs must precede the write iovs, wiov->used should be zero when we are traversing a read descriptor. Current code checks for wiov->i, to figure out whether any previous entry in the current descriptor chain was a write descriptor. However, iov->i is only incremented, when these vring iovs are consumed, at a later point, and remain 0 in __vringh_iov(). So, correct the check for read and write descriptor order, to use wiov->used.
Acked-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Neeraj Upadhyay neeraju@codeaurora.org Link: https://lore.kernel.org/r/1624591502-4827-1-git-send-email-neeraju@codeauror... Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vringh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c index 026a37ee4177..4653de001e26 100644 --- a/drivers/vhost/vringh.c +++ b/drivers/vhost/vringh.c @@ -331,7 +331,7 @@ __vringh_iov(struct vringh *vrh, u16 i, iov = wiov; else { iov = riov; - if (unlikely(wiov && wiov->i)) { + if (unlikely(wiov && wiov->used)) { vringh_bad("Readable desc %p after writable", &descs[i]); err = -EINVAL;
From: Shai Malin smalin@marvell.com
[ Upstream commit 37110237f31105d679fc0aa7b11cdec867750ea7 ]
Avoiding qed ll2 race condition and NULL pointer dereference as part of the remove and recovery flows.
Changes form V1: - Change (!p_rx->set_prod_addr). - qed_ll2.c checkpatch fixes.
Change from V2: - Revert "qed_ll2.c checkpatch fixes".
Signed-off-by: Ariel Elior aelior@marvell.com Signed-off-by: Shai Malin smalin@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_ll2.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c index 19a1a58d60f8..c449ecc0add2 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c @@ -353,6 +353,9 @@ static int qed_ll2_txq_completion(struct qed_hwfn *p_hwfn, void *p_cookie) unsigned long flags; int rc = -EINVAL;
+ if (!p_ll2_conn) + return rc; + spin_lock_irqsave(&p_tx->lock, flags); if (p_tx->b_completing_packet) { rc = -EBUSY; @@ -526,7 +529,16 @@ static int qed_ll2_rxq_completion(struct qed_hwfn *p_hwfn, void *cookie) unsigned long flags = 0; int rc = 0;
+ if (!p_ll2_conn) + return rc; + spin_lock_irqsave(&p_rx->lock, flags); + + if (!QED_LL2_RX_REGISTERED(p_ll2_conn)) { + spin_unlock_irqrestore(&p_rx->lock, flags); + return 0; + } + cq_new_idx = le16_to_cpu(*p_rx->p_fw_cons); cq_old_idx = qed_chain_get_cons_idx(&p_rx->rcq_chain);
@@ -847,6 +859,9 @@ static int qed_ll2_lb_rxq_completion(struct qed_hwfn *p_hwfn, void *p_cookie) struct qed_ll2_info *p_ll2_conn = (struct qed_ll2_info *)p_cookie; int rc;
+ if (!p_ll2_conn) + return 0; + if (!QED_LL2_RX_REGISTERED(p_ll2_conn)) return 0;
@@ -870,6 +885,9 @@ static int qed_ll2_lb_txq_completion(struct qed_hwfn *p_hwfn, void *p_cookie) u16 new_idx = 0, num_bds = 0; int rc;
+ if (!p_ll2_conn) + return 0; + if (!QED_LL2_TX_REGISTERED(p_ll2_conn)) return 0;
@@ -1642,6 +1660,8 @@ int qed_ll2_post_rx_buffer(void *cxt, if (!p_ll2_conn) return -EINVAL; p_rx = &p_ll2_conn->rx_queue; + if (!p_rx->set_prod_addr) + return -EIO;
spin_lock_irqsave(&p_rx->lock, flags); if (!list_empty(&p_rx->free_descq))
From: Shai Malin smalin@marvell.com
[ Upstream commit d33d19d313d3466abdf8b0428be7837aff767802 ]
Fix a possible null-pointer dereference in qed_rdma_create_qp().
Changes from V2: - Revert checkpatch fixes.
Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Ariel Elior aelior@marvell.com Signed-off-by: Shai Malin smalin@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_rdma.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c index 38b1f402f7ed..b291971bcf92 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c +++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c @@ -1245,8 +1245,7 @@ qed_rdma_create_qp(void *rdma_cxt,
if (!rdma_cxt || !in_params || !out_params || !p_hwfn->p_rdma_info->active) { - DP_ERR(p_hwfn->cdev, - "qed roce create qp failed due to NULL entry (rdma_cxt=%p, in=%p, out=%p, roce_info=?\n", + pr_err("qed roce create qp failed due to NULL entry (rdma_cxt=%p, in=%p, out=%p, roce_info=?\n", rdma_cxt, in_params, out_params); return NULL; }
From: Mark Yacoub markyacoub@google.com
[ Upstream commit fa0b1ef5f7a694f48e00804a391245f3471aa155 ]
[Why] Userspace should get back a copy of drm_wait_vblank that's been modified even when drm_wait_vblank_ioctl returns a failure.
Rationale: drm_wait_vblank_ioctl modifies the request and expects the user to read it back. When the type is RELATIVE, it modifies it to ABSOLUTE and updates the sequence to become current_vblank_count + sequence (which was RELATIVE), but now it became ABSOLUTE. drmWaitVBlank (in libdrm) expects this to be the case as it modifies the request to be Absolute so it expects the sequence to would have been updated.
The change is in compat_drm_wait_vblank, which is called by drm_compat_ioctl. This change of copying the data back regardless of the return number makes it en par with drm_ioctl, which always copies the data before returning.
[How] Return from the function after everything has been copied to user.
Fixes IGT:kms_flip::modeset-vs-vblank-race-interruptible Tested on ChromeOS Trogdor(msm)
Reviewed-by: Michel Dänzer mdaenzer@redhat.com Signed-off-by: Mark Yacoub markyacoub@chromium.org Signed-off-by: Sean Paul seanpaul@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20210812194917.1703356-1-marky... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_ioc32.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_ioc32.c b/drivers/gpu/drm/drm_ioc32.c index 2cf053fb8d54..1c691bdb8914 100644 --- a/drivers/gpu/drm/drm_ioc32.c +++ b/drivers/gpu/drm/drm_ioc32.c @@ -863,8 +863,6 @@ static int compat_drm_wait_vblank(struct file *file, unsigned int cmd, req.request.sequence = req32.request.sequence; req.request.signal = req32.request.signal; err = drm_ioctl_kernel(file, drm_wait_vblank_ioctl, &req, DRM_UNLOCKED); - if (err) - return err;
req32.reply.type = req.reply.type; req32.reply.sequence = req.reply.sequence; @@ -873,7 +871,7 @@ static int compat_drm_wait_vblank(struct file *file, unsigned int cmd, if (copy_to_user(argp, &req32, sizeof(req32))) return -EFAULT;
- return 0; + return err; }
#if defined(CONFIG_X86)
From: Ben Skeggs bskeggs@redhat.com
[ Upstream commit 6eaa1f3c59a707332e921e32782ffcad49915c5e ]
When booted with multiple displays attached, the EFI GOP driver on (at least) Ampere, can leave DP links powered up that aren't being used to display anything. This confuses our tracking of SOR routing, with the likely result being a failed modeset and display engine hang.
Fix this by (ab?)using the DisableLT IED script to power-down the link, restoring HW to a state the driver expects.
Signed-off-by: Ben Skeggs bskeggs@redhat.com Reviewed-by: Lyude Paul lyude@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 2 +- drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h | 1 + drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c | 9 +++++++++ 3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c index 818d21bd28d3..1d2837c5a8f2 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c @@ -419,7 +419,7 @@ nvkm_dp_train(struct nvkm_dp *dp, u32 dataKBps) return ret; }
-static void +void nvkm_dp_disable(struct nvkm_outp *outp, struct nvkm_ior *ior) { struct nvkm_dp *dp = nvkm_dp(outp); diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h b/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h index 428b3f488f03..e484d0c3b0d4 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h @@ -32,6 +32,7 @@ struct nvkm_dp {
int nvkm_dp_new(struct nvkm_disp *, int index, struct dcb_output *, struct nvkm_outp **); +void nvkm_dp_disable(struct nvkm_outp *, struct nvkm_ior *);
/* DPCD Receiver Capabilities */ #define DPCD_RC00_DPCD_REV 0x00000 diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c index c62030c96fba..4b1c72fd8f03 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c @@ -22,6 +22,7 @@ * Authors: Ben Skeggs */ #include "outp.h" +#include "dp.h" #include "ior.h"
#include <subdev/bios.h> @@ -216,6 +217,14 @@ nvkm_outp_init_route(struct nvkm_outp *outp) if (!ior->arm.head || ior->arm.proto != proto) { OUTP_DBG(outp, "no heads (%x %d %d)", ior->arm.head, ior->arm.proto, proto); + + /* The EFI GOP driver on Ampere can leave unused DP links routed, + * which we don't expect. The DisableLT IED script *should* get + * us back to where we need to be. + */ + if (ior->func->route.get && !ior->arm.head && outp->info.type == DCB_OUTPUT_DP) + nvkm_dp_disable(outp, ior); + return; }
From: Gerd Rausch gerd.rausch@oracle.com
[ Upstream commit fb4b1373dcab086d0619c29310f0466a0b2ceb8a ]
Function "dma_map_sg" is entitled to merge adjacent entries and return a value smaller than what was passed as "nents".
Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len") rather than the original "nents" parameter ("sg_len").
This old RDS bug was exposed and reliably causes kernel panics (using RDMA operations "rds-stress -D") on x86_64 starting with: commit c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Simply put: Linux 5.11 and later.
Signed-off-by: Gerd Rausch gerd.rausch@oracle.com Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/rds/ib_frmr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c index 06ecf9d2d4bf..ef6acd721118 100644 --- a/net/rds/ib_frmr.c +++ b/net/rds/ib_frmr.c @@ -131,9 +131,9 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr) cpu_relax(); }
- ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_len, + ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_dma_len, &off, PAGE_SIZE); - if (unlikely(ret != ibmr->sg_len)) + if (unlikely(ret != ibmr->sg_dma_len)) return ret < 0 ? ret : -EINVAL;
if (cmpxchg(&frmr->fr_state,
From: Filipe Manana fdmanana@suse.com
commit bc0939fcfab0d7efb2ed12896b1af3d819954a14 upstream.
We have a race between marking that an inode needs to be logged, either at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and between btrfs_sync_log(). The following steps describe how the race happens.
1) We are at transaction N;
2) Inode I was previously fsynced in the current transaction so it has:
inode->logged_trans set to N;
3) The inode's root currently has:
root->log_transid set to 1 root->last_log_commit set to 0
Which means only one log transaction was committed to far, log transaction 0. When a log tree is created we set ->log_transid and ->last_log_commit of its parent root to 0 (at btrfs_add_log_tree());
4) One more range of pages is dirtied in inode I;
5) Some task A starts an fsync against some other inode J (same root), and so it joins log transaction 1.
Before task A calls btrfs_sync_log()...
6) Task B starts an fsync against inode I, which currently has the full sync flag set, so it starts delalloc and waits for the ordered extent to complete before calling btrfs_inode_in_log() at btrfs_sync_file();
7) During ordered extent completion we have btrfs_update_inode() called against inode I, which in turn calls btrfs_set_inode_last_trans(), which does the following:
spin_lock(&inode->lock); inode->last_trans = trans->transaction->transid; inode->last_sub_trans = inode->root->log_transid; inode->last_log_commit = inode->root->last_log_commit; spin_unlock(&inode->lock);
So ->last_trans is set to N and ->last_sub_trans set to 1. But before setting ->last_log_commit...
8) Task A is at btrfs_sync_log():
- it increments root->log_transid to 2 - starts writeback for all log tree extent buffers - waits for the writeback to complete - writes the super blocks - updates root->last_log_commit to 1
It's a lot of slow steps between updating root->log_transid and root->last_log_commit;
9) The task doing the ordered extent completion, currently at btrfs_set_inode_last_trans(), then finally runs:
inode->last_log_commit = inode->root->last_log_commit; spin_unlock(&inode->lock);
Which results in inode->last_log_commit being set to 1. The ordered extent completes;
10) Task B is resumed, and it calls btrfs_inode_in_log() which returns true because we have all the following conditions met:
inode->logged_trans == N which matches fs_info->generation && inode->last_subtrans (1) <= inode->last_log_commit (1) && inode->last_subtrans (1) <= root->last_log_commit (1) && list inode->extent_tree.modified_extents is empty
And as a consequence we return without logging the inode, so the existing logged version of the inode does not point to the extent that was written after the previous fsync.
It should be impossible in practice for one task be able to do so much progress in btrfs_sync_log() while another task is at btrfs_set_inode_last_trans() right after it reads root->log_transid and before it reads root->last_log_commit. Even if kernel preemption is enabled we know the task at btrfs_set_inode_last_trans() can not be preempted because it is holding the inode's spinlock.
However there is another place where we do the same without holding the spinlock, which is in the memory mapped write path at:
vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) { (...) BTRFS_I(inode)->last_trans = fs_info->generation; BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid; BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit; (...)
So with preemption happening after setting ->last_sub_trans and before setting ->last_log_commit, it is less of a stretch to have another task do enough progress at btrfs_sync_log() such that the task doing the memory mapped write ends up with ->last_sub_trans and ->last_log_commit set to the same value. It is still a big stretch to get there, as the task doing btrfs_sync_log() has to start writeback, wait for its completion and write the super blocks.
So fix this in two different ways:
1) For btrfs_set_inode_last_trans(), simply set ->last_log_commit to the value of ->last_sub_trans minus 1;
2) For btrfs_page_mkwrite() only set the inode's ->last_sub_trans, just like we do for buffered and direct writes at btrfs_file_write_iter(), which is all we need to make sure multiple writes and fsyncs to an inode in the same transaction never result in an fsync missing that the inode changed and needs to be logged. Turn this into a helper function and use it both at btrfs_page_mkwrite() and at btrfs_file_write_iter() - this also fixes the problem that at btrfs_page_mkwrite() we were setting those fields without the protection of the inode's spinlock.
This is an extremely unlikely race to happen in practice.
Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/btrfs_inode.h | 15 +++++++++++++++ fs/btrfs/file.c | 10 ++-------- fs/btrfs/inode.c | 4 +--- fs/btrfs/transaction.h | 2 +- 4 files changed, 19 insertions(+), 12 deletions(-)
--- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -268,6 +268,21 @@ static inline void btrfs_mod_outstanding mod); }
+/* + * Called every time after doing a buffered, direct IO or memory mapped write. + * + * This is to ensure that if we write to a file that was previously fsynced in + * the current transaction, then try to fsync it again in the same transaction, + * we will know that there were changes in the file and that it needs to be + * logged. + */ +static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode) +{ + spin_lock(&inode->lock); + inode->last_sub_trans = inode->root->log_transid; + spin_unlock(&inode->lock); +} + static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation) { int ret = 0; --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2004,14 +2004,8 @@ static ssize_t btrfs_file_write_iter(str
inode_unlock(inode);
- /* - * We also have to set last_sub_trans to the current log transid, - * otherwise subsequent syncs to a file that's been synced in this - * transaction will appear to have already occurred. - */ - spin_lock(&BTRFS_I(inode)->lock); - BTRFS_I(inode)->last_sub_trans = root->log_transid; - spin_unlock(&BTRFS_I(inode)->lock); + btrfs_set_inode_last_sub_trans(BTRFS_I(inode)); + if (num_written > 0) num_written = generic_write_sync(iocb, num_written);
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9250,9 +9250,7 @@ again: set_page_dirty(page); SetPageUptodate(page);
- BTRFS_I(inode)->last_trans = fs_info->generation; - BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid; - BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit; + btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
--- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -160,7 +160,7 @@ static inline void btrfs_set_inode_last_ spin_lock(&BTRFS_I(inode)->lock); BTRFS_I(inode)->last_trans = trans->transaction->transid; BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid; - BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit; + BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->last_sub_trans - 1; spin_unlock(&BTRFS_I(inode)->lock); }
From: Linus Torvalds torvalds@linux-foundation.org
commit 2287a51ba822384834dafc1c798453375d1107c7 upstream.
As per the long-suffering comment.
Reported-by: Minh Yuan yuanmingbuaa@gmail.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jirislaby@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/vt/vt_ioctl.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/tty/vt/vt_ioctl.c +++ b/drivers/tty/vt/vt_ioctl.c @@ -484,16 +484,19 @@ int vt_ioctl(struct tty_struct *tty, ret = -EINVAL; goto out; } - /* FIXME: this needs the console lock extending */ - if (vc->vc_mode == (unsigned char) arg) + console_lock(); + if (vc->vc_mode == (unsigned char) arg) { + console_unlock(); break; + } vc->vc_mode = (unsigned char) arg; - if (console != fg_console) + if (console != fg_console) { + console_unlock(); break; + } /* * explicitly blank/unblank the screen if switching modes */ - console_lock(); if (arg == KD_TEXT) do_unblank_screen(1); else
From: Andrii Nakryiko andriin@fb.com
commit a23740ec43ba022dbfd139d0fe3eff193216272b upstream.
Maps that are read-only both from BPF program side and user space side have their contents constant, so verifier can track referenced values precisely and use that knowledge for dead code elimination, branch pruning, etc. This patch teaches BPF verifier how to do this.
Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20191009201458.2679171-2-andriin@fb.com Signed-off-by: Rafael David Tinoco rafaeldtinoco@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 55 insertions(+), 2 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2778,6 +2778,41 @@ static void coerce_reg_to_size(struct bp reg->smax_value = reg->umax_value; }
+static bool bpf_map_is_rdonly(const struct bpf_map *map) +{ + return (map->map_flags & BPF_F_RDONLY_PROG) && map->frozen; +} + +static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val) +{ + void *ptr; + u64 addr; + int err; + + err = map->ops->map_direct_value_addr(map, &addr, off); + if (err) + return err; + ptr = (void *)addr + off; + + switch (size) { + case sizeof(u8): + *val = (u64)*(u8 *)ptr; + break; + case sizeof(u16): + *val = (u64)*(u16 *)ptr; + break; + case sizeof(u32): + *val = (u64)*(u32 *)ptr; + break; + case sizeof(u64): + *val = *(u64 *)ptr; + break; + default: + return -EINVAL; + } + return 0; +} + /* check whether memory at (regno + off) is accessible for t = (read | write) * if t==write, value_regno is a register which value is stored into memory * if t==read, value_regno is a register which will receive the value from memory @@ -2815,9 +2850,27 @@ static int check_mem_access(struct bpf_v if (err) return err; err = check_map_access(env, regno, off, size, false); - if (!err && t == BPF_READ && value_regno >= 0) - mark_reg_unknown(env, regs, value_regno); + if (!err && t == BPF_READ && value_regno >= 0) { + struct bpf_map *map = reg->map_ptr;
+ /* if map is read-only, track its contents as scalars */ + if (tnum_is_const(reg->var_off) && + bpf_map_is_rdonly(map) && + map->ops->map_direct_value_addr) { + int map_off = off + reg->var_off.value; + u64 val = 0; + + err = bpf_map_direct_read(map, map_off, size, + &val); + if (err) + return err; + + regs[value_regno].type = SCALAR_VALUE; + __mark_reg_known(®s[value_regno], val); + } else { + mark_reg_unknown(env, regs, value_regno); + } + } } else if (reg->type == PTR_TO_CTX) { enum bpf_reg_type reg_type = SCALAR_VALUE;
From: Andrii Nakryiko andriin@fb.com
commit 2dedd7d2165565bafa89718eaadfc5d1a7865f66 upstream.
Fix "warning: cast to pointer from integer of different size" when casting u64 addr to void *.
Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars") Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/bpf/20191011172053.2980619-1-andriin@fb.com Cc: Rafael David Tinoco rafaeldtinoco@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2792,7 +2792,7 @@ static int bpf_map_direct_read(struct bp err = map->ops->map_direct_value_addr(map, &addr, off); if (err) return err; - ptr = (void *)addr + off; + ptr = (void *)(long)addr + off;
switch (size) { case sizeof(u8):
From: DENG Qingfang dqfext@gmail.com
commit 7428022b50d0fbb4846dd0f00639ea09d36dff02 upstream.
When a port leaves a VLAN-aware bridge, the current code does not clear other ports' matrix field bit. If the bridge is later set to VLAN-unaware mode, traffic in the bridge may leak to that port.
Remove the VLAN filtering check in mt7530_port_bridge_leave.
Fixes: 474a2ddaa192 ("net: dsa: mt7530: fix VLAN traffic leaks") Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530") Signed-off-by: DENG Qingfang dqfext@gmail.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/mt7530.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -842,11 +842,8 @@ mt7530_port_bridge_leave(struct dsa_swit /* Remove this port from the port matrix of the other ports * in the same bridge. If the port is disabled, port matrix * is kept and not being setup until the port becomes enabled. - * And the other port's port matrix cannot be broken when the - * other port is still a VLAN-aware port. */ - if (dsa_is_user_port(ds, i) && i != port && - !dsa_port_is_vlan_filtering(&ds->ports[i])) { + if (dsa_is_user_port(ds, i) && i != port) { if (dsa_to_port(ds, i)->bridge_dev != bridge) continue; if (priv->ports[i].enable)
From: Sean Christopherson seanjc@google.com
commit 112022bdb5bc372e00e6e43cb88ee38ea67b97bd upstream
Mark NX as being used for all non-nested shadow MMUs, as KVM will set the NX bit for huge SPTEs if the iTLB mutli-hit mitigation is enabled. Checking the mitigation itself is not sufficient as it can be toggled on at any time and KVM doesn't reset MMU contexts when that happens. KVM could reset the contexts, but that would require purging all SPTEs in all MMUs, for no real benefit. And, KVM already forces EFER.NX=1 when TDP is disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved for shadow MMUs.
Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210622175739.3610207-3-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com [sudip: use old path] Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/mmu.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -4666,7 +4666,15 @@ static void reset_rsvds_bits_mask_ept(st void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context) { - bool uses_nx = context->nx || + /* + * KVM uses NX when TDP is disabled to handle a variety of scenarios, + * notably for huge SPTEs if iTLB multi-hit mitigation is enabled and + * to generate correct permissions for CR0.WP=0/CR4.SMEP=1/EFER.NX=0. + * The iTLB multi-hit workaround can be toggled at any time, so assume + * NX can be used by any non-nested shadow MMU to avoid having to reset + * MMU contexts. Note, KVM forces EFER.NX=1 when TDP is disabled. + */ + bool uses_nx = context->nx || !tdp_enabled || context->mmu_role.base.smep_andnot_wp; struct rsvd_bits_validate *shadow_zero_check; int i;
From: Petr Vorel petr.vorel@gmail.com
commit f890f89d9a80fffbfa7ca791b78927e5b8aba869 upstream.
Reserve GPIO pins 85-88 as these aren't meant to be accessible from the application CPUs (causes reboot). Yet another fix similar to 9134586715e3, 5f8d3ab136d0, which is needed to allow angler to boot after 3edfb7bd76bd ("gpiolib: Show correct direction from the beginning").
Fixes: feeaf56ac78d ("arm64: dts: msm8994 SoC and Huawei Angler (Nexus 6P) support")
Signed-off-by: Petr Vorel petr.vorel@gmail.com Reviewed-by: Konrad Dybcio konrad.dybcio@somainline.org Link: https://lore.kernel.org/r/20210415193913.1836153-1-petr.vorel@gmail.com Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/msm8994-angler-rev-101.dts | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/arm64/boot/dts/qcom/msm8994-angler-rev-101.dts +++ b/arch/arm64/boot/dts/qcom/msm8994-angler-rev-101.dts @@ -30,3 +30,7 @@ }; }; }; + +&msmgpio { + gpio-reserved-ranges = <85 4>; +};
From: Qu Wenruo wqu@suse.com
commit e4571b8c5e9ffa1e85c0c671995bd4dcc5c75091 upstream.
[BUG] It's easy to trigger NULL pointer dereference, just by removing a non-existing device id:
# mkfs.btrfs -f -m single -d single /dev/test/scratch1 \ /dev/test/scratch2 # mount /dev/test/scratch1 /mnt/btrfs # btrfs device remove 3 /mnt/btrfs
Then we have the following kernel NULL pointer dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 649 Comm: btrfs Not tainted 5.14.0-rc3-custom+ #35 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:btrfs_rm_device+0x4de/0x6b0 [btrfs] btrfs_ioctl+0x18bb/0x3190 [btrfs] ? lock_is_held_type+0xa5/0x120 ? find_held_lock.constprop.0+0x2b/0x80 ? do_user_addr_fault+0x201/0x6a0 ? lock_release+0xd2/0x2d0 ? __x64_sys_ioctl+0x83/0xb0 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
[CAUSE] Commit a27a94c2b0c7 ("btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly") moves the "missing" device path check into btrfs_rm_device().
But btrfs_rm_device() itself can have case where it only receives @devid, with NULL as @device_path.
In that case, calling strcmp() on NULL will trigger the NULL pointer dereference.
Before that commit, we handle the "missing" case inside btrfs_find_device_by_devspec(), which will not check @device_path at all if @devid is provided, thus no way to trigger the bug.
[FIX] Before calling strcmp(), also make sure @device_path is not NULL.
Fixes: a27a94c2b0c7 ("btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly") CC: stable@vger.kernel.org # 5.4+ Reported-by: butt3rflyh4ck butterflyhuangxx@gmail.com Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/volumes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2168,7 +2168,7 @@ int btrfs_rm_device(struct btrfs_fs_info
if (IS_ERR(device)) { if (PTR_ERR(device) == -ENOENT && - strcmp(device_path, "missing") == 0) + device_path && strcmp(device_path, "missing") == 0) ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND; else ret = PTR_ERR(device);
From: Denis Efremov efremov@linux.com
commit c7e9d0020361f4308a70cdfd6d5335e273eb8717 upstream.
The patch breaks userspace implementations (e.g. fdutils) and introduces regressions in behaviour. Previously, it was possible to O_NDELAY open a floppy device with no media inserted or with write protected media without an error. Some userspace tools use this particular behavior for probing.
It's not the first time when we revert this patch. Previous revert is in commit f2791e7eadf4 (Revert "floppy: refactor open() flags handling").
This reverts commit 8a0c014cd20516ade9654fc13b51345ec58e7be8.
Link: https://lore.kernel.org/linux-block/de10cb47-34d1-5a88-7751-225ca380f735@com... Reported-by: Mark Hounschell markh@compro.net Cc: Jiri Kosina jkosina@suse.cz Cc: Wim Osterholt wim@djo.tudelft.nl Cc: Kurt Garloff kurt@garloff.de Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov efremov@linux.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/floppy.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-)
--- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -4063,22 +4063,21 @@ static int floppy_open(struct block_devi if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2;
- if (mode & (FMODE_READ|FMODE_WRITE)) { - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, &UDRS->flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, &UDRS->flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, &UDRS->flags)) + if (!(mode & FMODE_NDELAY)) { + if (mode & (FMODE_READ|FMODE_WRITE)) { + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, &UDRS->flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, &UDRS->flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, &UDRS->flags)) + goto out; + } + res = -EROFS; + if ((mode & FMODE_WRITE) && + !test_bit(FD_DISK_WRITABLE_BIT, &UDRS->flags)) goto out; } - - res = -EROFS; - - if ((mode & FMODE_WRITE) && - !test_bit(FD_DISK_WRITABLE_BIT, &UDRS->flags)) - goto out; - mutex_unlock(&open_lock); mutex_unlock(&floppy_mutex); return 0;
From: Helge Deller deller@gmx.de
commit f6a3308d6feb351d9854eb8b3f6289a1ac163125 upstream.
This reverts commit 83af58f8068ea3f7b3c537c37a30887bfa585069.
It turns out that at least the assembly implementation for strncpy() was buggy. Revert the whole commit and return back to the default coding.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v5.4+ Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/include/asm/string.h | 15 ---- arch/parisc/kernel/parisc_ksyms.c | 4 - arch/parisc/lib/Makefile | 4 - arch/parisc/lib/memset.c | 72 ++++++++++++++++++++ arch/parisc/lib/string.S | 136 -------------------------------------- 5 files changed, 74 insertions(+), 157 deletions(-) create mode 100644 arch/parisc/lib/memset.c delete mode 100644 arch/parisc/lib/string.S
--- a/arch/parisc/include/asm/string.h +++ b/arch/parisc/include/asm/string.h @@ -8,19 +8,4 @@ extern void * memset(void *, int, size_t #define __HAVE_ARCH_MEMCPY void * memcpy(void * dest,const void *src,size_t count);
-#define __HAVE_ARCH_STRLEN -extern size_t strlen(const char *s); - -#define __HAVE_ARCH_STRCPY -extern char *strcpy(char *dest, const char *src); - -#define __HAVE_ARCH_STRNCPY -extern char *strncpy(char *dest, const char *src, size_t count); - -#define __HAVE_ARCH_STRCAT -extern char *strcat(char *dest, const char *src); - -#define __HAVE_ARCH_MEMSET -extern void *memset(void *, int, size_t); - #endif --- a/arch/parisc/kernel/parisc_ksyms.c +++ b/arch/parisc/kernel/parisc_ksyms.c @@ -17,10 +17,6 @@
#include <linux/string.h> EXPORT_SYMBOL(memset); -EXPORT_SYMBOL(strlen); -EXPORT_SYMBOL(strcpy); -EXPORT_SYMBOL(strncpy); -EXPORT_SYMBOL(strcat);
#include <linux/atomic.h> EXPORT_SYMBOL(__xchg8); --- a/arch/parisc/lib/Makefile +++ b/arch/parisc/lib/Makefile @@ -3,7 +3,7 @@ # Makefile for parisc-specific library files #
-lib-y := lusercopy.o bitops.o checksum.o io.o memcpy.o \ - ucmpdi2.o delay.o string.o +lib-y := lusercopy.o bitops.o checksum.o io.o memset.o memcpy.o \ + ucmpdi2.o delay.o
obj-y := iomap.o --- /dev/null +++ b/arch/parisc/lib/memset.c @@ -0,0 +1,72 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#include <linux/types.h> +#include <asm/string.h> + +#define OPSIZ (BITS_PER_LONG/8) +typedef unsigned long op_t; + +void * +memset (void *dstpp, int sc, size_t len) +{ + unsigned int c = sc; + long int dstp = (long int) dstpp; + + if (len >= 8) + { + size_t xlen; + op_t cccc; + + cccc = (unsigned char) c; + cccc |= cccc << 8; + cccc |= cccc << 16; + if (OPSIZ > 4) + /* Do the shift in two steps to avoid warning if long has 32 bits. */ + cccc |= (cccc << 16) << 16; + + /* There are at least some bytes to set. + No need to test for LEN == 0 in this alignment loop. */ + while (dstp % OPSIZ != 0) + { + ((unsigned char *) dstp)[0] = c; + dstp += 1; + len -= 1; + } + + /* Write 8 `op_t' per iteration until less than 8 `op_t' remain. */ + xlen = len / (OPSIZ * 8); + while (xlen > 0) + { + ((op_t *) dstp)[0] = cccc; + ((op_t *) dstp)[1] = cccc; + ((op_t *) dstp)[2] = cccc; + ((op_t *) dstp)[3] = cccc; + ((op_t *) dstp)[4] = cccc; + ((op_t *) dstp)[5] = cccc; + ((op_t *) dstp)[6] = cccc; + ((op_t *) dstp)[7] = cccc; + dstp += 8 * OPSIZ; + xlen -= 1; + } + len %= OPSIZ * 8; + + /* Write 1 `op_t' per iteration until less than OPSIZ bytes remain. */ + xlen = len / OPSIZ; + while (xlen > 0) + { + ((op_t *) dstp)[0] = cccc; + dstp += OPSIZ; + xlen -= 1; + } + len %= OPSIZ; + } + + /* Write the last few bytes. */ + while (len > 0) + { + ((unsigned char *) dstp)[0] = c; + dstp += 1; + len -= 1; + } + + return dstpp; +} --- a/arch/parisc/lib/string.S +++ /dev/null @@ -1,136 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * PA-RISC assembly string functions - * - * Copyright (C) 2019 Helge Deller deller@gmx.de - */ - -#include <asm/assembly.h> -#include <linux/linkage.h> - - .section .text.hot - .level PA_ASM_LEVEL - - t0 = r20 - t1 = r21 - t2 = r22 - -ENTRY_CFI(strlen, frame=0,no_calls) - or,COND(<>) arg0,r0,ret0 - b,l,n .Lstrlen_null_ptr,r0 - depwi 0,31,2,ret0 - cmpb,COND(<>) arg0,ret0,.Lstrlen_not_aligned - ldw,ma 4(ret0),t0 - cmpib,tr 0,r0,.Lstrlen_loop - uxor,nbz r0,t0,r0 -.Lstrlen_not_aligned: - uaddcm arg0,ret0,t1 - shladd t1,3,r0,t1 - mtsar t1 - depwi -1,%sar,32,t0 - uxor,nbz r0,t0,r0 -.Lstrlen_loop: - b,l,n .Lstrlen_end_loop,r0 - ldw,ma 4(ret0),t0 - cmpib,tr 0,r0,.Lstrlen_loop - uxor,nbz r0,t0,r0 -.Lstrlen_end_loop: - extrw,u,<> t0,7,8,r0 - addib,tr,n -3,ret0,.Lstrlen_out - extrw,u,<> t0,15,8,r0 - addib,tr,n -2,ret0,.Lstrlen_out - extrw,u,<> t0,23,8,r0 - addi -1,ret0,ret0 -.Lstrlen_out: - bv r0(rp) - uaddcm ret0,arg0,ret0 -.Lstrlen_null_ptr: - bv,n r0(rp) -ENDPROC_CFI(strlen) - - -ENTRY_CFI(strcpy, frame=0,no_calls) - ldb 0(arg1),t0 - stb t0,0(arg0) - ldo 0(arg0),ret0 - ldo 1(arg1),t1 - cmpb,= r0,t0,2f - ldo 1(arg0),t2 -1: ldb 0(t1),arg1 - stb arg1,0(t2) - ldo 1(t1),t1 - cmpb,<> r0,arg1,1b - ldo 1(t2),t2 -2: bv,n r0(rp) -ENDPROC_CFI(strcpy) - - -ENTRY_CFI(strncpy, frame=0,no_calls) - ldb 0(arg1),t0 - stb t0,0(arg0) - ldo 1(arg1),t1 - ldo 0(arg0),ret0 - cmpb,= r0,t0,2f - ldo 1(arg0),arg1 -1: ldo -1(arg2),arg2 - cmpb,COND(=),n r0,arg2,2f - ldb 0(t1),arg0 - stb arg0,0(arg1) - ldo 1(t1),t1 - cmpb,<> r0,arg0,1b - ldo 1(arg1),arg1 -2: bv,n r0(rp) -ENDPROC_CFI(strncpy) - - -ENTRY_CFI(strcat, frame=0,no_calls) - ldb 0(arg0),t0 - cmpb,= t0,r0,2f - ldo 0(arg0),ret0 - ldo 1(arg0),arg0 -1: ldb 0(arg0),t1 - cmpb,<>,n r0,t1,1b - ldo 1(arg0),arg0 -2: ldb 0(arg1),t2 - stb t2,0(arg0) - ldo 1(arg0),arg0 - ldb 0(arg1),t0 - cmpb,<> r0,t0,2b - ldo 1(arg1),arg1 - bv,n r0(rp) -ENDPROC_CFI(strcat) - - -ENTRY_CFI(memset, frame=0,no_calls) - copy arg0,ret0 - cmpb,COND(=) r0,arg0,4f - copy arg0,t2 - cmpb,COND(=) r0,arg2,4f - ldo -1(arg2),arg3 - subi -1,arg3,t0 - subi 0,t0,t1 - cmpiclr,COND(>=) 0,t1,arg2 - ldo -1(t1),arg2 - extru arg2,31,2,arg0 -2: stb arg1,0(t2) - ldo 1(t2),t2 - addib,>= -1,arg0,2b - ldo -1(arg3),arg3 - cmpiclr,COND(<=) 4,arg2,r0 - b,l,n 4f,r0 -#ifdef CONFIG_64BIT - depd,* r0,63,2,arg2 -#else - depw r0,31,2,arg2 -#endif - ldo 1(t2),t2 -3: stb arg1,-1(t2) - stb arg1,0(t2) - stb arg1,1(t2) - stb arg1,2(t2) - addib,COND(>) -4,arg2,3b - ldo 4(t2),t2 -4: bv,n r0(rp) -ENDPROC_CFI(memset) - - .end
From: Peter Collingbourne pcc@google.com
commit d0efb16294d145d157432feda83877ae9d7cdf37 upstream.
A common implementation of isatty(3) involves calling a ioctl passing a dummy struct argument and checking whether the syscall failed -- bionic and glibc use TCGETS (passing a struct termios), and musl uses TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will copy sizeof(struct ifreq) bytes of data from the argument and return -EFAULT if that fails. The result is that the isatty implementations may return a non-POSIX-compliant value in errno in the case where part of the dummy struct argument is inaccessible, as both struct termios and struct winsize are smaller than struct ifreq (at least on arm64).
Although there is usually enough stack space following the argument on the stack that this did not present a practical problem up to now, with MTE stack instrumentation it's more likely for the copy to fail, as the memory following the struct may have a different tag.
Fix the problem by adding an early check for whether the ioctl is a valid socket ioctl, and return -ENOTTY if it isn't.
Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers") Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05... Signed-off-by: Peter Collingbourne pcc@google.com Cc: stable@vger.kernel.org # 4.19 Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/netdevice.h | 4 ++++ net/socket.c | 6 +++++- 2 files changed, 9 insertions(+), 1 deletion(-)
--- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3684,6 +3684,10 @@ int netdev_rx_handler_register(struct ne void netdev_rx_handler_unregister(struct net_device *dev);
bool dev_valid_name(const char *name); +static inline bool is_socket_ioctl_cmd(unsigned int cmd) +{ + return _IOC_TYPE(cmd) == SOCK_IOC_TYPE; +} int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr, bool *need_copyout); int dev_ifconf(struct net *net, struct ifconf *, int); --- a/net/socket.c +++ b/net/socket.c @@ -1053,7 +1053,7 @@ static long sock_do_ioctl(struct net *ne rtnl_unlock(); if (!err && copy_to_user(argp, &ifc, sizeof(struct ifconf))) err = -EFAULT; - } else { + } else if (is_socket_ioctl_cmd(cmd)) { struct ifreq ifr; bool need_copyout; if (copy_from_user(&ifr, argp, sizeof(struct ifreq))) @@ -1062,6 +1062,8 @@ static long sock_do_ioctl(struct net *ne if (!err && need_copyout) if (copy_to_user(argp, &ifr, sizeof(struct ifreq))) return -EFAULT; + } else { + err = -ENOTTY; } return err; } @@ -3228,6 +3230,8 @@ static int compat_ifr_data_ioctl(struct struct ifreq ifreq; u32 data32;
+ if (!is_socket_ioctl_cmd(cmd)) + return -ENOTTY; if (copy_from_user(ifreq.ifr_name, u_ifreq32->ifr_name, IFNAMSIZ)) return -EFAULT; if (get_user(data32, &u_ifreq32->ifr_data))
From: Richard Guy Briggs rgb@redhat.com
commit 67d69e9d1a6c889d98951c1d74b19332ce0565af upstream.
AUDIT_TRIM is expected to be idempotent, but multiple executions resulted in a refcount underflow and use-after-free.
git bisect fingered commit fb041bb7c0a9 ("locking/refcount: Consolidate implementations of refcount_t") but this patch with its more thorough checking that wasn't in the x86 assembly code merely exposed a previously existing tree refcount imbalance in the case of tree trimming code that was refactored with prune_one() to remove a tree introduced in commit 8432c7006297 ("audit: Simplify locking around untag_chunk()")
Move the put_tree() to cover only the prune_one() case.
Passes audit-testsuite and 3 passes of "auditctl -t" with at least one directory watch.
Cc: Jan Kara jack@suse.cz Cc: Will Deacon will@kernel.org Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Seiji Nishikawa snishika@redhat.com Cc: stable@vger.kernel.org Fixes: 8432c7006297 ("audit: Simplify locking around untag_chunk()") Signed-off-by: Richard Guy Briggs rgb@redhat.com Reviewed-by: Jan Kara jack@suse.cz [PM: reformatted/cleaned-up the commit description] Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/audit_tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -595,7 +595,6 @@ static void prune_tree_chunks(struct aud spin_lock(&hash_lock); } spin_unlock(&hash_lock); - put_tree(victim); }
/* @@ -604,6 +603,7 @@ static void prune_tree_chunks(struct aud static void prune_one(struct audit_tree *victim) { prune_tree_chunks(victim, false); + put_tree(victim); }
/* trim the uncommitted chunks from tree */
On 9/1/2021 5:27 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.144-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Wed, 01 Sep 2021 14:27:50 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.144-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.4: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 59 tests: 59 pass, 0 fail
Linux version: 5.4.144-rc1-gfa7f9f53436e Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 9/1/21 6:27 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.144-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On 2021/9/1 20:27, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.144-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Tested on arm64 and x86 for 5.4.144-rc1,
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-5.4.y Version: 5.4.144-rc1 Commit: 9bc4aee46b925fc92370b55814836e5430a85975 Compiler: gcc version 7.3.0 (GCC)
arm64: -------------------------------------------------------------------- Testcase Result Summary: total: 8906 passed: 8906 failed: 0 timeout: 0 --------------------------------------------------------------------
x86: -------------------------------------------------------------------- Testcase Result Summary: total: 8906 passed: 8906 failed: 0 timeout: 0 --------------------------------------------------------------------
Tested-by: Hulk Robot hulkrobot@huawei.com
On Wed, 1 Sept 2021 at 18:02, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.144-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.4.144-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.4.y * git commit: fa7f9f53436ea73ba28358d682e08577f20c1342 * git describe: v5.4.143-49-gfa7f9f53436e * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.14...
## No regressions (compared to v5.4.143-28-g66b6adc3ce6e)
## No fixes (compared to v5.4.143-28-g66b6adc3ce6e)
## Test result summary total: 82537, pass: 67479, fail: 741, skip: 12958, xfail: 1359
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 257 total, 257 passed, 0 failed * arm64: 35 total, 35 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 16 total, 16 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 51 total, 51 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 27 total, 27 passed, 0 failed * s390: 9 total, 9 passed, 0 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 35 total, 35 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest-android * kselftest-arm64 * kselftest-arm64/arm64.btitest.bti_c_func * kselftest-arm64/arm64.btitest.bti_j_func * kselftest-arm64/arm64.btitest.bti_jc_func * kselftest-arm64/arm64.btitest.bti_none_func * kselftest-arm64/arm64.btitest.nohint_func * kselftest-arm64/arm64.btitest.paciasp_func * kselftest-arm64/arm64.nobtitest.bti_c_func * kselftest-arm64/arm64.nobtitest.bti_j_func * kselftest-arm64/arm64.nobtitest.bti_jc_func * kselftest-arm64/arm64.nobtitest.bti_none_func * kselftest-arm64/arm64.nobtitest.nohint_func * kselftest-arm64/arm64.nobtitest.paciasp_func * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kvm-unit-tests * libgpiod * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On Wed, Sep 01, 2021 at 02:27:50PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 11.1.1 20210816): 65 configs -> no new failure arm (gcc version 11.1.1 20210816): 107 configs -> no new failure arm64 (gcc version 11.1.1 20210816): 2 configs -> no failure x86_64 (gcc version 10.2.1 20210110): 4 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
[1]. https://openqa.qa.codethink.co.uk/tests/81
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Wed, Sep 01, 2021 at 02:27:50PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.144 release. There are 48 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 443 pass: 443 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
linux-stable-mirror@lists.linaro.org